Ignite 2.8 Released: Less Stress in Production and Advances in Machine Learning
With thousands of changes contributed to Apache Ignite 2.8, touching almost every component of the platform, it’s easy to overlook some of the improvements that could convince you to upgrade sooner rather than later. While a quick check of the release notes will help you discover anticipated bug fixes, this article guides you through the enhancements every Ignite developer should be aware of.
New Subsystem for Production Monitoring and Tracing
Several months of constant work on IEP-35: Monitoring & Profiling have resulted in a robust and elastic subsystem for production monitoring and diagnostics (a.k.a. profiling). It was shaped by the needs of many developers who deployed Ignite in critical environments and asked for a foundation that can be integrated with many external monitoring tools and expanded easily.
The new subsystem consists of several registries, each grouping the individual metrics related to a specific Ignite component. For instance, you will find registries for the cache, compute, and service grid APIs. Since the registries are designed to be generic, specific exporters can expose the state of Ignite to a myriad of tools supporting various protocols. By default, Ignite 2.8 introduces exporters for traditional monitoring interfaces such as log files, JMX, and SQL views, and for contemporary ones such as OpenCensus.
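To make the split between registries and exporters more concrete, here is a minimal Python sketch of the pattern. It does not use the Ignite API: the `MetricRegistry` and `LogExporter` classes, the registry name, and the metric name are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Iterable, List


class MetricRegistry:
    """Hypothetical registry: groups the metrics of one component under a name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._suppliers: Dict[str, Callable[[], float]] = {}

    def register(self, metric: str, supplier: Callable[[], float]) -> None:
        # A metric is a named supplier that is read lazily at export time.
        self._suppliers[metric] = supplier

    def snapshot(self) -> Dict[str, float]:
        # Read every metric once and prefix it with the registry name.
        return {f"{self.name}.{m}": s() for m, s in self._suppliers.items()}


class LogExporter:
    """Hypothetical exporter: renders each metric as a 'name=value' log line."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def export(self, registries: Iterable[MetricRegistry]) -> None:
        for registry in registries:
            for name, value in sorted(registry.snapshot().items()):
                self.lines.append(f"{name}={value}")


# Usage: one registry for a cache component, one exporter reading it.
counters = {"puts": 0}
cache_registry = MetricRegistry("cache.orders")
cache_registry.register("puts", lambda: counters["puts"])

counters["puts"] += 3  # the component updates its own counter
exporter = LogExporter()
exporter.export([cache_registry])
```

Because the exporter only depends on `snapshot()`, another exporter (for JMX, SQL views, and so on) could be added without touching the registries, which is the point of keeping the registries generic.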
Presently, this new subsystem is released in experimental mode to give Ignite users some time to try the new API and suggest improvements. Since the developer community is already impatient to remove the experimental flag, don’t delay!
Advances in Ignite Machine Learning
The Machine Learning (ML) capabilities of Ignite 2.8 are so drastically different from previous versions that if you’ve been waiting for the best moment to use the API, the time has come. Let’s scratch the surface here; you can learn more details from the updated documentation pages.
Model training is usually a multi-step process that includes preprocessing, training, and evaluation/validation phases. A new pipelining API puts things in order by combining all the phases in a single workflow.
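As a rough illustration of what "combining all the phases in a single workflow" means, here is a small self-contained Python sketch. It is not the Ignite pipelining API: the `Pipeline`, `MinMaxScaler`, and `NearestCentroid` classes are hypothetical stand-ins, and the data set is made up.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


class MinMaxScaler:
    """Preprocessing phase: rescale every feature to the [0, 1] range."""

    def fit(self, rows: Sequence[Row]) -> "MinMaxScaler":
        columns = list(zip(*rows))
        self.lo = [min(c) for c in columns]
        self.hi = [max(c) for c in columns]
        return self

    def transform(self, rows: Sequence[Row]) -> List[Row]:
        return [
            tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(row, self.lo, self.hi))
            for row in rows
        ]


class NearestCentroid:
    """Training phase: remember the mean point of every class."""

    def fit(self, rows: Sequence[Row], labels: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, v in enumerate(row):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, rows: Sequence[Row]) -> List[int]:
        def dist2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [min(self.centroids, key=lambda y: dist2(row, self.centroids[y]))
                for row in rows]


class Pipeline:
    """Chains preprocessing, training, and evaluation into one workflow."""

    def __init__(self, preprocessors, trainer) -> None:
        self.preprocessors = preprocessors
        self.trainer = trainer

    def fit(self, rows, labels) -> "Pipeline":
        for p in self.preprocessors:
            rows = p.fit(rows).transform(rows)
        self.trainer.fit(rows, labels)
        return self

    def predict(self, rows) -> List[int]:
        for p in self.preprocessors:
            rows = p.transform(rows)
        return self.trainer.predict(rows)

    def evaluate(self, rows, labels) -> float:
        # Evaluation phase: plain accuracy.
        predictions = self.predict(rows)
        return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


train_rows = [(1.0, 100.0), (2.0, 120.0), (9.0, 900.0), (10.0, 950.0)]
train_labels = [0, 0, 1, 1]
pipeline = Pipeline([MinMaxScaler()], NearestCentroid()).fit(train_rows, train_labels)
accuracy = pipeline.evaluate(train_rows, train_labels)
```

The benefit of the single object is that the same preprocessing fitted during training is reapplied automatically at prediction and evaluation time.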
In addition to the pipelining APIs, Ignite 2.8 introduced ensemble methods, which allow combining several machine learning techniques into one predictive model to decrease variance (bagging) and bias (boosting), or improve predictions (stacking).
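For readers new to ensembles, the following Python sketch shows the bagging idea in isolation: train the same simple model on several bootstrap samples and let the models vote. It is an illustrative toy, not Ignite code; the function names, the one-dimensional decision stump used as the base model, and the data are all assumptions made for this example.

```python
import random
from collections import Counter
from typing import List, Sequence


def fit_stump(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Base model: pick the threshold t that minimizes errors of 'predict 1 if x > t'."""
    distinct = sorted(set(xs))
    candidates = ([float("-inf")]
                  + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
                  + [float("inf")])

    def errors(t: float) -> int:
        return sum((x > t) != bool(y) for x, y in zip(xs, ys))

    return min(candidates, key=errors)


def fit_bagging(xs: Sequence[float], ys: Sequence[int],
                n_models: int = 25, seed: int = 0) -> List[float]:
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds


def predict_bagging(thresholds: Sequence[float], x: float) -> int:
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ensemble = fit_bagging(xs, ys)
```

Each stump sees a slightly different sample, so individual thresholds vary; averaging their votes is what reduces the variance of the final prediction. An odd number of models avoids tied votes.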
Furthermore, you can now import Apache Spark or XGBoost models into Ignite for further inference, pipelining, and other tasks. Feel free to keep training a model with your favorite framework and convert it to the Ignite representation once the model needs to be deployed in production and executed at scale.
Beyond Java: Partition-Awareness and Other Changes
Even though Ignite is a Java middleware, it functions as a cross-platform database and compute platform that is used for applications developed in C#, C++, Python, and other programming languages.
The thin client protocol is what really enables support for other programming languages, and with Ignite 2.8 it received a significant performance optimization: partition awareness. Partition awareness allows thin clients to send query requests directly to the nodes that own the queried data. Without it, an application connected to the cluster via a thin client executes all queries and operations through a single server node that acts as a proxy for the incoming requests.
Check the detailed blog post by Pavel Tupitsyn, Ignite committer and PMC member, who elaborates on the partition-awareness feature and introduces other .NET-specific enhancements.
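The routing difference can be sketched in a few lines of Python. This is a conceptual model only: the `ThinClient` class, the CRC32-based key hashing, the round-robin partition assignment, and the partition count are simplifying assumptions for the example and do not reproduce Ignite's actual affinity function or client API.

```python
import zlib
from typing import Dict, List, Sequence

PARTITIONS = 1024  # assumed partition count for this sketch


def partition_for(key: str, partitions: int = PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 here, for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


class ThinClient:
    """Hypothetical client that decides which server node receives a request."""

    def __init__(self, nodes: Sequence[str], partition_aware: bool) -> None:
        self.nodes: List[str] = list(nodes)
        self.partition_aware = partition_aware
        # Simplified partition-to-node map; a real cluster distributes this map itself.
        self.owner: Dict[int, str] = {
            p: self.nodes[p % len(self.nodes)] for p in range(PARTITIONS)
        }

    def route(self, key: str) -> str:
        if self.partition_aware:
            # Send the request straight to the node that owns the key's partition.
            return self.owner[partition_for(key)]
        # Otherwise every request goes through one node that proxies it.
        return self.nodes[0]


nodes = ["node-a", "node-b", "node-c"]
proxy_client = ThinClient(nodes, partition_aware=False)
aware_client = ThinClient(nodes, partition_aware=True)

keys = [f"key-{i}" for i in range(100)]
proxy_targets = {proxy_client.route(k) for k in keys}
aware_targets = {aware_client.route(k) for k in keys}
```

With the proxy client, all one hundred keys land on the same node, which then forwards most of them elsewhere; the partition-aware client spreads the same requests across the owners and skips the extra network hop.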
Less Stress in Production
This section lists top improvements that might not have striking or catchy names but can bring relief by automating and optimizing things, and by avoiding data inconsistencies when you are already in production.
The stop-the-world pauses triggered by Java garbage collectors impact the performance, responsiveness, and throughput of our Java applications. Apache Ignite has a partition-map-exchange (PME) process that, like Java garbage collectors, has some phases that put all running operations on hold for the sake of cluster-wide consistency. For most Ignite usage scenarios, these phases complete promptly and go unnoticed. However, some low-latency or high-throughput use cases can detect a decline that might impact some business operations for a moment in time. This wiki page lists all the conditions that can trigger a distributed PME, and with Ignite 2.8, some of them were taken off the list -- the blocking PME no longer happens if a node belonging to the current baseline topology leaves the cluster or a thick client connects to it.
Next, we all know that things break, and what really matters is how a system handles failures. With Ignite 2.8, we revisited the way the cluster handles crash recovery on restarts while replaying write-ahead logs (check IGNITE-7196 and IGNITE-9420). Also, the read-repair feature was added to fix data inconsistencies between the primary and backup copies in the cluster on the fly.
Furthermore, it’s worth mentioning that Ignite 2.8 became more prudent about disk space consumption by supporting the compaction of data files and write-ahead logs of the native persistence. By sacrificing a few more CPU cycles for the needs of the compaction algorithms, you can save a lot on the storage end.
Last but not least is the auto-baseline feature, which changes the cluster topology for deployments with Ignite native persistence without the need for your intervention in many scenarios. Check this documentation page for more details.
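The general idea behind read repair can be shown with a short Python sketch. It is a generic illustration under stated assumptions, not Ignite's implementation: each copy is modeled as a dictionary of `(version, value)` pairs, and the "highest version wins" rule is an assumption chosen for this example.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, str]  # (version, value); this layout is assumed for the sketch


def read_with_repair(key: str, copies: List[Dict[str, Versioned]]) -> Optional[str]:
    """Read a key from all copies and fix any copy that disagrees with the newest one."""
    found = [copy[key] for copy in copies if key in copy]
    if not found:
        return None
    # Assumed conflict rule: the entry with the highest version is the correct one.
    winner = max(found, key=lambda entry: entry[0])
    for copy in copies:
        if copy.get(key) != winner:
            copy[key] = winner  # repair the stale or missing copy during the read
    return winner[1]


primary = {"order-1": (2, "shipped")}
backup = {"order-1": (1, "created")}  # stale backup copy
value = read_with_repair("order-1", [primary, backup])
```

After the call, the backup holds the same entry as the primary, so the inconsistency is gone without a separate maintenance job.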
Reach out to us on the community user list for more questions, details, and feedback.
Ignite contributors and committers
Posted at 12:00AM Mar 11, 2020 by Denis Magda in General
Apache Ignite 2.4 Brings Advanced Machine Learning and Spark DataFrames Capabilities
Usually, the Ignite community rolls out a new version once every three months, but we had to make an exception for Apache Ignite 2.4, which took five months in total. We could easily blame the Thanksgiving, Christmas, and New Year holidays for the delay and would be forgiven, but, in fact, we were forging a release you simply can't pass by.
Let's dive in and search for a big fish.
Machine Learning General Availability
Eight months ago, at the time of Apache Ignite 2.0, we put out the first APIs that formed the foundation of today's Ignite machine learning component. Since then, Ignite machine learning experts and enthusiasts have been meticulously moving the library toward general availability. Ignite 2.4 became the milestone that lets us consider the ML Grid production ready.
The component gained a variety of algorithms that can solve a myriad of regression and classification tasks, gained the ability to train models without ETL from Ignite to other systems, and paved the way to deep learning usage scenarios. All of that now equips Ignite users with tools for fraud detection, predictive analytics, and building recommendation systems... if you want. Note that ETL is optional, and the whole memory-centric cluster is at your service!
Moreover, the Machine Learning Grid welcomed a software donation by NetMillennium, Inc. in the form of genetic algorithms that solve optimization problems by simulating the process of biological evolution. The algorithms did not make it into Ignite 2.4; they are in the master branch, waiting for a later release. Once you get them, you can apply the biological evolution simulation to real-world applications including automotive design, computer gaming, robotics, investments, traffic/shipment routing, and more.
It's not a joke or a misprint. Spark users, DataFrames are now officially supported for you! Many of you have been anticipating them for years and, thanks to Nikolay Izhikov, who was "promoted" to Ignite committer for the contribution, you can now take advantage of them.
No need to be wordy here. Just go ahead and start with DataFrames in Ignite.
Expanding Ignite ecosystem
It was unfair that only Java, C#, and C++ developers could utilize the breadth and depth of Ignite APIs in their applications. Ignite 2.4 solved the injustice with its new low-level binary client protocol. The protocol communicates with an existing Ignite cluster without starting a full-fledged Ignite node. An application can connect to the cluster through a raw TCP socket from any programming language you like.
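To give a feel for how low-level the protocol is, here is a Python sketch that builds the opening handshake message by hand. The field layout (a 4-byte little-endian length, handshake code 1, three 2-byte version fields, client code 2), the 1.0.0 version, the default port 10800, and the shape of the response are written from my recollection of the binary client protocol documentation and should be treated as assumptions; verify them against the official specification before relying on them. The function names are hypothetical.

```python
import socket
import struct

DEFAULT_PORT = 10800  # assumed default thin-client port


def build_handshake(major: int = 1, minor: int = 0, patch: int = 0) -> bytes:
    """Build the handshake request (layout assumed, see the note above)."""
    # Payload: handshake code (1), version as three little-endian shorts, client code (2).
    payload = struct.pack("<bhhhb", 1, major, minor, patch, 2)
    # The message starts with the payload length as a little-endian 4-byte int.
    return struct.pack("<i", len(payload)) + payload


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly 'size' bytes, since recv() may return fewer."""
    data = bytearray()
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("socket closed before the full response arrived")
        data.extend(chunk)
    return bytes(data)


def handshake(host: str, port: int = DEFAULT_PORT) -> bool:
    """Open a raw TCP connection and perform the handshake; needs a running cluster."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_handshake())
        (length,) = struct.unpack("<i", _recv_exact(sock, 4))
        response = _recv_exact(sock, length)
        # Assumed: the first payload byte is 1 when the server accepts the handshake.
        return len(response) > 0 and response[0] == 1


message = build_handshake()
```

Nothing here needs more than a socket and a way to pack bytes, which is why a client can be written in practically any language.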
The beauty of the protocol is that you can develop a so-called Ignite thin client: a lightweight client that connects to the cluster and interacts with it using key-value, SQL, and other APIs. The .NET thin client is already at your service, and Node.js, Python, PHP, and Java thin clients are being developed for the next releases.
RPM repository and much more
So, now Apache Ignite can also be installed from the official RPM repository. Debian users, packages for your operating system will be assembled soon.
Overall, if I listed all the features and benefits Ignite 2.4 brings, only two people would read the article to the end: me and my dear mom. Thus, I'll let you discover the rest from the release notes.
Apache Ignite 2.0: Redesigned Off-heap Memory, DDL and Machine Learning
We released the long-awaited Apache Ignite version 2.0 on May 5. The community spent almost a year incorporating tremendous changes to the legacy Apache Ignite 1.x architecture. And all of that effort paid off. Our collective blood, sweat (and perhaps even a few tears) opened up new and exciting opportunities for the Apache Ignite project.
Have I piqued your interest about this new release yet? Let's walk through some of the main new features that have appeared under the hood of Apache Ignite 2.0.
Reengineered Off-Heap Memory Architecture.
The platform’s entire memory architecture was reengineered from scratch. In a nutshell, all of the data and indexes are now stored in a completely new manageable off-heap memory that has no issues with memory fragmentation, accelerates SQL Grid significantly and helps your application easily tolerate Java GC pauses.
Take a peek at the illustration below and try to guess what’s changed. Afterward, please read this documentation to see if your eye caught everything that’s new.
Here’s something extremely noteworthy: the architecture now integrates seamlessly with disk drives. Why do we care about this? Stay tuned!
Data Definition Language.
This release introduces support for Data Definition Language (DDL) as a part of its SQL Grid functionality. Now you can define -- and, what’s more important, alter -- indexes at runtime without the need to restart your cluster. Apache Ignite users have long awaited this feature! Even more exciting news: users can leverage this with standard SQL commands like CREATE INDEX or DROP INDEX. This is only the beginning! Go to this page to learn more about current DDL support.
Machine Learning Grid Beta - Distributed Algebra.
Apache Ignite is about more than in-memory storage. And it’s not just one more product for distributed computations or real-time streaming. It's much, much more than that. It's a hot blend of well-integrated distributed and highly concurrent modules that turned Apache Ignite into what it is today: a robust data fabric and framework with the goal of making your application thrive and outperform even the best of expectations.
But there was one thing missing until now. Drumroll, please: machine-learning support!
With Apache Ignite 2.0 you can check out the project’s own distributed algebra implementation. The distributed algebra is the foundation of the entire component. And soon you can expect to get distributed versions of widely used regression algorithms, decision trees, and more.
Spring Data Integration.
Spring Data integration lets you interact with an Apache Ignite cluster using the well-known and widely adopted Spring Data framework. You can connect to the cluster by means of Spring Data repositories and start executing distributed SQL queries as well as simple CRUD operations.
Are you using RocketMQ in your project and need to push data from RocketMQ to Ignite? Here is an easy solution.
Hibernate L2 cache users have been anticipating support of Hibernate 5 on Apache Ignite for quite a long time. Apache Ignite 2.0 grants this desire. The integration now supports Hibernate 5 and contains a number of bug fixes and improvements.
Ignite.NET has been enhanced with the addition of a plugin system that allows writing third-party .NET components and embedding them into Ignite.NET.
The Ignite.C++ part of the community finally came up with a way to execute arbitrary C++ code on remote cluster machines.
This approach was initially tested for continuous queries. You can now register continuous queries' remote filters on any cluster node you like. Going forward you can expect support for the Ignite.C++ compute grid and more.
Want to learn more? Please join me June 7 for a webinar titled, “Apache® Ignite™: What’s New in Version 2.0.” I hope to see you there!
P.S. Just in case you can’t wait until June… here's a full list of the changes inside Apache Ignite 2.0.