
Wednesday May 31, 2017

The Apache Software Foundation Announces Apache® SystemML™ as a Top-Level Project

Open Source Big Data machine learning platform in use at Cadent Technology and IBM Watson Health, among other organizations.

Forest Hill, MD –31 May 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® SystemML™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache SystemML is a machine learning platform optimized for Big Data that provides declarative, large-scale machine learning and deep learning. SystemML can run on top of Apache Spark, where it automatically scales to the size of the data, deciding line by line whether code should run on the driver or on the Apache Spark cluster.

"Today, the machine learning revolution is leading to thousands of life-altering innovations such as self-driving cars and computers that detect cancer," said Deron Eriksson, Vice President of Apache SystemML. "Apache SystemML enables and simplifies this process by executing optimized high-level algorithms on Big Data using proven technologies such as Apache Spark and Apache Hadoop MapReduce."

The core of Apache SystemML has been created from the ground up with the following design principles in mind: 

  • Performance and Scalability, as SystemML scales up on single nodes, and scales out on large clusters using Apache Spark or Apache Hadoop;
  • "Designed for data scientists", enabling data scientists to develop algorithms in a system with a strong foundation in linear algebra and statistical functions; and 
  • Cost-based optimization for scalable execution plans, which significantly shortens and simplifies the development and deployment cycle of algorithms for varying data characteristics and system configurations.

Using Apache SystemML, data scientists are able to implement algorithms using high-level language concepts without knowledge of distributed programming. Depending on data characteristics such as data size/shape and data sparsity (dense/sparse), and cluster characteristics such as cluster size and memory configurations, SystemML's cost-based optimizing compiler automatically generates hybrid runtime execution plans that are composed of single-node and distributed operations on Apache Spark or Apache Hadoop clusters for best performance.
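To make the idea of a hybrid plan concrete, here is a minimal Python sketch of the kind of decision such an optimizer makes for a single operation: estimate how much memory the inputs need, run the operation on the driver if they fit, and otherwise distribute it. The size formulas, the 0.4 sparsity threshold, the 70% memory budget, and the function names are illustrative assumptions, not SystemML's actual cost model, which is considerably more detailed.

```python
# Hypothetical, simplified cost model for illustration only (not SystemML's compiler).
from typing import Iterable, Tuple

Shape = Tuple[int, int, float]  # (rows, cols, sparsity = fraction of non-zero cells)

SPARSE_THRESHOLD = 0.4        # assumed: below this sparsity a matrix is stored sparse
MEMORY_BUDGET_FRACTION = 0.7  # assumed: share of driver memory one operation may use


def estimate_bytes(rows: int, cols: int, sparsity: float) -> float:
    """Rough in-memory size of a matrix in bytes."""
    if sparsity < SPARSE_THRESHOLD:
        # Sparse: ~8 bytes per value plus ~4 bytes per column index (row pointers ignored).
        return rows * cols * sparsity * 12
    # Dense: one 8-byte double per cell.
    return rows * cols * 8.0


def choose_execution(inputs: Iterable[Shape], driver_memory_bytes: int) -> str:
    """Run on the driver if all inputs fit in the memory budget, else distribute."""
    total = sum(estimate_bytes(r, c, s) for r, c, s in inputs)
    budget = driver_memory_bytes * MEMORY_BUDGET_FRACTION
    return "single-node" if total <= budget else "distributed"


GB = 1024 ** 3
print(choose_execution([(10_000, 1_000, 1.0)], 4 * GB))         # single-node (~80 MB dense)
print(choose_execution([(10_000_000, 1_000, 1.0)], 4 * GB))     # distributed (~80 GB dense)
print(choose_execution([(10_000_000, 1_000, 0.001)], 4 * GB))   # single-node (sparse, ~120 MB)
```

The third call shows why data characteristics matter: the same shape flips from distributed to single-node once sparsity is taken into account.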

"SystemML allows Cadent to implement advanced numerical programming methods in Apache Spark, empowering us to leverage specialized algorithms in our predictive analysis software," said Michael Zargham, Chief Scientist at Cadent Technology.

"SystemML is like SQL for Machine Learning, it enables Data Scientists to concentrate on the problem at hand, working in a high-level script language like R, and all the optimizations and rewrites are handled by the very powerful SystemML optimizer that considers data and available resources to produce the best execution plan for the application," said Luciano Resende, Architect at the IBM Spark Technology Center and Apache SystemML Incubator Mentor.

"IBM Watson Health VBC is using Apache SystemML on Apache Spark to build risk models on a very large EHR data set to predict emergency department visits," said Steve Beier, Vice President of Value Based Care Platform and Analytics at IBM Watson Health. "The models identify high-risk patients so that they can be targeted with preemptive strategies, thus potentially reducing care costs while at the same time leading to optimal outcomes for patients."

SystemML originated at IBM Research - Almaden in 2010 and was submitted to the Apache Incubator in November 2015. The SystemML team initiated research on compressed linear algebra, a differentiating feature of SystemML; the resulting paper received the VLDB 2016 Best Paper Award.
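Compressed linear algebra runs operations directly on compressed matrices instead of decompressing them first. The sketch below illustrates the idea with a simple per-column dictionary encoding and a matrix-vector product that multiplies each distinct value only once. It is a simplified assumption for illustration; the published SystemML work uses richer column-group encodings (such as offset lists and run-length encoding) and groups correlated columns together. The class and function names are hypothetical.

```python
from typing import Dict, List


class DictColumn:
    """One column stored as its distinct values plus a small integer code per row."""

    def __init__(self, values: List[float]):
        self.dictionary: List[float] = []
        index: Dict[float, int] = {}
        self.codes: List[int] = []
        for v in values:
            if v not in index:
                index[v] = len(self.dictionary)
                self.dictionary.append(v)
            self.codes.append(index[v])


def compress(matrix: List[List[float]]) -> List[DictColumn]:
    """Encode a row-major matrix column by column."""
    return [DictColumn([row[j] for row in matrix]) for j in range(len(matrix[0]))]


def matvec_compressed(columns: List[DictColumn], v: List[float]) -> List[float]:
    """Compute y = X v on the compressed columns, without decompressing X."""
    y = [0.0] * len(columns[0].codes)
    for j, col in enumerate(columns):
        # Pre-aggregate: scale each distinct value by v[j] once, not once per row.
        scaled = [d * v[j] for d in col.dictionary]
        for i, code in enumerate(col.codes):
            y[i] += scaled[code]
    return y


X = [[1, 0, 2],
     [1, 0, 2],
     [3, 0, 2],
     [1, 5, 2]]
cols = compress(X)
print(matvec_compressed(cols, [1, 2, 3]))  # [7.0, 7.0, 9.0, 17.0]
```

The savings grow with the number of repeated values per column: the third column above is stored as a single dictionary entry and is scaled only once.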

"The Apache Incubator is all about open collaboration and communication and was invaluable for everyone involved in SystemML," added Eriksson. "The Apache SystemML community sincerely encourages everyone interested in machine learning and deep learning to help build our community around this revolutionary technology."

Catch Apache SystemML in action at the Big Data Developers Silicon Valley MeetUp on 8 June 2017 in San Francisco, CA.

Availability and Oversight
Apache SystemML software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache SystemML, visit http://systemml.apache.org/ and https://twitter.com/ApacheSystemML

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit https://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "SystemML", "Apache SystemML", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday May 17, 2017

The Apache Software Foundation Announces Apache® Beam™ v2.0.0

Open Source unified programming model for batch and streaming Big Data processing in use at Google Cloud, PayPal, and Talend, among others.

Forest Hill, MD —17 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Beam™ v2.0.0, the first stable release of the unified programming model for both batch and streaming Big Data processing.

An Apache Top-Level Project (TLP) since December 2016, Beam includes Java and Python software development kits used to define data processing pipelines and runners to execute them on Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow, among other execution engines.
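The following pure-Python sketch (not the Beam SDK) illustrates the idea behind a unified model: a single event-time windowed aggregation that works unchanged on a bounded list (batch) and on an unbounded generator (stream). It assumes events arrive in timestamp order. Beam itself handles out-of-order data with watermarks and triggers, which are omitted here, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import count, islice
from typing import DefaultDict, Iterable, Iterator, Tuple

Event = Tuple[int, str, int]  # (event_time_seconds, key, value)


def window_start(timestamp: int, size: int) -> int:
    """Start of the fixed (tumbling) window that contains `timestamp`."""
    return timestamp - (timestamp % size)


def fixed_window_sums(events: Iterable[Event], size: int) -> Iterator[Tuple[int, str, int]]:
    """Yield (window_start, key, total) as soon as each window is complete.

    Assumes non-decreasing event times, so a window is complete once an
    event from a later window arrives.
    """
    current = None
    sums: DefaultDict[str, int] = defaultdict(int)
    for timestamp, key, value in events:
        start = window_start(timestamp, size)
        if current is not None and start != current:
            for k in sorted(sums):
                yield current, k, sums[k]
            sums.clear()
        current = start
        sums[key] += value
    # Bounded input (batch): flush the last window when the input ends.
    if current is not None:
        for k in sorted(sums):
            yield current, k, sums[k]


# Batch: a bounded list.
batch = [(0, "a", 1), (5, "b", 2), (9, "a", 3), (12, "a", 4)]
print(list(fixed_window_sums(batch, 10)))  # [(0, 'a', 4), (0, 'b', 2), (10, 'a', 4)]

# Streaming: an unbounded generator; results are emitted incrementally.
stream = ((t, "clicks", 1) for t in count())
print(list(islice(fixed_window_sums(stream, 10), 2)))  # [(0, 'clicks', 10), (10, 'clicks', 10)]
```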

Apache Beam has its roots in Google's internal work on data processing over the last decade, evolving from the initial MapReduce system, through FlumeJava and MillWheel, into Google Cloud Dataflow v1.x, which defined the unified programming model that became the heart of Apache Beam.

"The first stable release is an important milestone for the Apache Beam community," said Davor Bonaci, Vice President of Apache Beam. "This is a statement from the community that it intends to maintain API stability with all releases for the foreseeable future, making Beam suitable for enterprise deployment."

Apache Beam v2.0.0 improves user experience across the project, focusing on seamless portability across execution environments, including engines, operating systems, on-premise clusters, cloud providers, and data storage systems. Other highlights include:
  • API stability and future compatibility within this major version;
  • Stateful data processing paradigms that unlock efficient, data-dependent computations;
  • Support for user-extensible file systems, with built-in support for Hadoop Distributed File System, among others; and
  • A metrics subsystem for deeper insight into pipeline execution.
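Stateful processing lets a transform remember earlier elements for the same key, so what it emits can depend on data seen so far. The sketch below shows a common pattern, buffering values per key and emitting them in batches, in plain Python. It is a hypothetical illustration, not Beam's state API: Beam scopes state per key and window and would typically flush leftovers with a timer, whereas this sketch uses a dict, ignores windows, and flushes at the end of bounded input.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Iterator, List, Tuple


def batch_per_key(elements: Iterable[Tuple[str, int]],
                  batch_size: int) -> Iterator[Tuple[str, List[int]]]:
    """Buffer values per key and emit a batch whenever a key's buffer is full."""
    buffers: DefaultDict[str, List[int]] = defaultdict(list)  # per-key state
    for key, value in elements:
        buffer = buffers[key]
        buffer.append(value)
        if len(buffer) == batch_size:
            yield key, list(buffer)  # emission depends on previously seen elements
            buffer.clear()
    # End of bounded input: flush partial batches (a real runner would use a timer).
    for key, buffer in buffers.items():
        if buffer:
            yield key, list(buffer)


events = [("a", 1), ("b", 2), ("a", 3), ("a", 4)]
print(list(batch_per_key(events, 2)))  # [('a', [1, 3]), ('a', [4]), ('b', [2])]
```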

Apache Beam is in use at Google Cloud, PayPal, and Talend, among others.

"Apache Beam is a mature data processing API for the enterprise, with powerful semantics that solve real-world challenges of stream processing," said Tomer Pilossof, Big Data Manager at PayPal. "With Beam, we provide data processing solutions for a wide range of customers within the PayPal organization."

"We at Talend are thrilled to have contributed to Apache Beam reaching the 2.0.0 milestone and its first official stable release," said Laurent Bride, Chief Technology Officer at Talend. "Apache Beam is now part of the foundation of Talend products. Recently, we released Talend Data Preparation for Big Data which leverages Beam to create transformation pipelines that are portable across many execution engines. Later this year, we plan to deliver Talend Data Streams, taking the Apache Beam integration one step further by utilizing its powerful streaming semantics. Whether for batch, streaming, or real-time use cases, Apache Beam is a powerful framework that delivers the flexibility and advanced functionality our customers need."

"We congratulate the Apache Beam community for reaching the key milestone of a first stable release," said William Vambenepe, Lead Product Manager for Big Data, Google Cloud. "We look forward to our Google Cloud Dataflow customers taking full advantage of Beam's powerful programming model and newest features to run their data processing pipelines on Google Cloud."

Apache Beam v2.0.0 is making its debut at Apache: Big Data, taking place this week in Miami, FL, with four sessions featuring Apache Beam. Apache Beam will also be highlighted at numerous face-to-face meetups and conferences, including the Future of Data San Jose meetup, Strata Data Conference London, Berlin Buzzwords, and DataWorks Summit San Jose.

"I'd like to invite everyone to try out Apache Beam v2.0.0 today and consider joining our vibrant community," added Bonaci. "We welcome feedback, contribution and participation through our mailing lists, issue tracker, pull requests, and events."

Availability and Oversight
Apache Beam software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Beam, visit https://beam.apache.org/ and https://twitter.com/ApacheBeam

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server -- the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Beam", "Apache Beam", "Apex", "Apache Apex", "Flink", "Apache Flink", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday May 15, 2017

The Apache Software Foundation Announces Apache® Samza™ v0.13

Open Source Big Data distributed stream processing framework in production at Intuit, LinkedIn, Netflix, Optimizely, Redfin, and Uber, among other organizations.

Forest Hill, MD —15 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Samza™  v0.13, the latest version of the Open Source Big Data distributed stream processing framework.

An Apache Top-Level Project (TLP) since January 2015, Samza is designed to provide fault-tolerant, large-scale stream processing. Developers use Apache Samza to write applications that consume streams of data, helping organizations understand and respond to their data in real time. Apache Samza offers a unified API to process streaming data from pub-sub messaging systems like Apache Kafka and batch data from Apache Hadoop.

"The latest 0.13 release takes Apache Samza's data processing capabilities to the next level with multiple new features," said Yi Pan, Vice President of Apache Samza. "It also improves the simplicity and portability of real-time applications."

Apache Samza powers a range of real-time data processing needs, including real-time analytics on user data, message routing, combating fraud, anomaly detection, performance monitoring, real-time communication, and more. Apache Samza can process up to 1.1 million messages per second on a single machine. Highlights of v0.13 include:
  • A higher-level API that developers can use to express complex processing pipelines on streams more concisely;
  • Support for running Samza applications as a lightweight embedded library without relying on YARN;
  • Support for flexible deployment options; 
  • Support for rolling upgrade of running Samza applications;
  • Improved monitoring and failure detection using a built-in heartbeat mechanism;
  • Enabling better integrations with other cluster-manager frameworks and environments; and
  • Several bug fixes that improve the reliability, stability, and robustness of data processing.
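Running without YARN means the application's processors must agree among themselves on who consumes which input partitions, and heartbeats tell them when a peer has failed. The sketch below is a hypothetical illustration of that kind of coordination: drop processors whose last heartbeat is too old, then spread partitions round-robin over the survivors. It is not Samza's coordination protocol or API; the names and the timeout are assumptions.

```python
from typing import Dict, List


def live_processors(last_heartbeat: Dict[str, float], now: float,
                    timeout: float) -> List[str]:
    """Processors whose most recent heartbeat is no older than `timeout` seconds."""
    return sorted(p for p, seen in last_heartbeat.items() if now - seen <= timeout)


def assign_partitions(partitions: List[int],
                      processors: List[str]) -> Dict[str, List[int]]:
    """Spread input partitions round-robin over the live processors."""
    if not processors:
        raise ValueError("no live processors to assign partitions to")
    ordered = sorted(processors)
    assignment: Dict[str, List[int]] = {p: [] for p in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


heartbeats = {"proc-a": 100.0, "proc-b": 99.0, "proc-c": 80.0}
live = live_processors(heartbeats, now=105.0, timeout=10.0)  # proc-c missed its heartbeats
print(live)                                    # ['proc-a', 'proc-b']
print(assign_partitions(list(range(5)), live))  # {'proc-a': [0, 2, 4], 'proc-b': [1, 3]}
```

Sorting both lists makes the assignment deterministic, so every processor that computes it from the same membership view arrives at the same answer.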

Organizations such as Intuit, LinkedIn, Netflix, Optimizely, Redfin, TripAdvisor, and Uber rely on Apache Samza to power complex data architectures that process billions of events each day. A list of user organizations is available at https://cwiki.apache.org/confluence/display/SAMZA/Powered+By

"Apache Samza is a highly performant stream/data processing system that has been battle tested over the years of powering mission critical applications in a wide range of businesses," said Kartik Paramasivam, Head of Streams Infrastructure, and Director of Engineering at LinkedIn. "With this 0.13 release, the power of Samza is no longer limited to YARN based topologies. It can now be used in any hosting environment. In addition, it now has a new higher level API that makes it significantly easier to create arbitrarily complex processing pipelines."

"Apache Samza has been powering near real-time use cases at Uber for the last year and a half," said Chinmay Soman, Staff Software Engineer at Uber. "This ranges from analytical use cases such as understanding business metrics, feature extraction for machine learning as well as some critical applications such as Fraud detection, Surge pricing and Intelligent promotions. Samza has been proven to be robust in production and is currently processing about billions of messages per day, accounting for 100s of TB of data flowing through the system." 

"At Optimizely, we have built the world’s leading experimentation platform, which ingests billions of click-stream events a day from millions of visitors for analysis," said Vignesh Sukumar, Senior Engineering Manager at Optimizely. "Apache Samza has been a great asset to Optimizely's Event ingestion pipeline allowing us to perform large scale, real time stream computing such as aggregations (e.g. session computations) and data enrichment on a multiple billion events/day scale. The programming model, durability and the close integration with Apache Kafka fit our needs perfectly."

"It has been a phenomenal experience engaging with this vibrant international community of users and contributors, and I look forward to our continued growth. It is a great time to be involved in the project and we welcome new contributors to the Samza community," added Pan.

Catch Apache Samza in action at Apache: Big Data, 16-18 May 2017 in Miami, FL (http://apachecon.com/), where the community will be showcasing how Samza simplifies stream processing at scale.

Availability and Oversight
Apache Samza software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Samza, visit http://samza.apache.org/ , https://blogs.apache.org/samza/ , and https://twitter.com/samzastream

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", "Kafka", "Apache Kafka", "Samza", "Apache Samza", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday May 01, 2017

The Apache Software Foundation Announces Apache® CarbonData™ as a Top-Level Project

Open Source Big Data analytics accelerator in use at Bank of Communications, Hulu, Huawei, SAIC Motor, Zhejiang Mobile, among others.

Forest Hill, MD –1 May 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® CarbonData™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop and Apache Spark, among others) that can help make queries over petabytes of data an order of magnitude faster.

"We are very proud to complete the incubation process and graduate as an Apache Top-Level Project," said Liang Chen, Vice President of Apache CarbonData. "The CarbonData community grew rapidly over last ten months, both in terms of size and diversity. Since entering the Apache Incubator, we have completed 4 releases, and exceeded 90 contributors from 10 different organizations."

With the aim of using a unified file format to satisfy all kinds of data analysis cases, Apache CarbonData seamlessly integrates with Hadoop and Spark to improve Big Data analysis efficiency. In benchmarks, CarbonData's interactive queries ran approximately 10x faster than queries on standard column-oriented SQL-on-Hadoop data stores.

Highlights include:

  • Unique data organization to allow faster filtering and better compression;
  • Multi-level Indexing to enable faster search and speeding up query processing;
  • Deep Apache Spark Integration for dataframe + SQL compliance;
  • Advanced push-down optimization to minimize the amount of data being read, processed, converted, transmitted, and shuffled;
  • Efficient compression and global encoding schemes to further improve aggregation query performance;
  • Dictionary encoding for reduced storage space and faster processing; and
  • Data update + delete support using standard SQL syntax.
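Index-based filtering generally works by keeping summary statistics for each block of rows, so that a query can skip blocks that cannot contain matches. The sketch below shows generic min/max block pruning for a range filter. It is an illustrative assumption about how such an index can be used, not CarbonData's file layout, index structure, or API; the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    values: List[int]  # one column's values for the rows in this block
    min_value: int     # index entry: smallest value in the block
    max_value: int     # index entry: largest value in the block


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_between(blocks: List[Block], low: int, high: int) -> Tuple[int, int]:
    """Count values in [low, high]; return (matches, blocks_actually_scanned)."""
    matches = scanned = 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # pruned: the min/max entry proves no row here can match
        scanned += 1
        matches += sum(1 for v in block.values if low <= v <= high)
    return matches, scanned


column = list(range(100))  # e.g. a sorted timestamp or ID column
blocks = build_blocks(column, block_size=10)
print(count_between(blocks, 25, 34))  # (10, 2): only 2 of 10 blocks are read
```

Pruning pays off most when data is sorted or clustered on the filtered column; on randomly ordered data, most blocks' min/max ranges overlap the predicate and little is skipped.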


Apache CarbonData is in use at an array of organizations, including Bank of Communications, medical/pharma social platform DXY, Hulu, Huawei, online group-buying retailer MEITUAN, SAIC Motor, and Zhejiang Mobile, among others.

"CarbonData has very good performance as a ‘SQL on Hadoop’ solution," said Tan Sheng, Director of SAIC Motor’s Big Data team. "It is suitable for SAIC Motor to adopt as a central Big Data platform component. Not only do we use Apache CarbonData, we also actively participate in its community as contributors." 

"Apache CarbonData is great, as [it] helped our audit business to improve 7-10X performance based on 14 billion rows of data," said Wei Zhao, Senior Engineer at Bank of Communications.

"Apache CarbonData is very suitable for our filter query cases, and has averaged 20x improvement on performance," said William Zhu, Architecture team member at DXY. "And, as CarbonData supports data update and delete, this feature is very useful. We would consider CarbonData as our all-in-one solution to unify all analysis data."

CarbonData was first developed at Huawei in 2013. The project was submitted to the Apache Incubator in June 2016 and had its first official release two months later. The project won top honors in the Big Data category of Black Duck's 2016 Open Source Rookies of the Year awards.

"Apache CarbonData is a great example of the value of the incubation process," said Jean-Baptiste Onofré, Apache CarbonData Incubator Mentor and Project Management Committee member. "Helping grow the CarbonData developer and user communities has increased our visibility, which allowed us to extend our use cases and tests, and gather new ideas. The initial CarbonData committers did (and are still doing) great work to welcome new users and contributors, clearly understanding it's a step forward for the project."

"We will continue to put our efforts towards optimizing data format efficiency for Big Data ecosystem and provide a unified and high performance data storage solution," added Chen. "The Apache CarbonData community welcomes interested contributors to work with us on our journey forward."

Catch Apache CarbonData in action at ApacheCon (16-18 May/Miami) and Spark Summit (5-7 June/San Francisco).

Availability and Oversight
Apache CarbonData software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache CarbonData, visit http://carbondata.apache.org/ , https://twitter.com/ApacheCarbonDat , and https://www.facebook.com/carbondata/

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "CarbonData", "Apache CarbonData", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # # 

The Apache Software Foundation Announces Apache® Mahout™ v0.13.0

Open Source scalable machine learning and data mining library for Big Data artificial intelligence now more powerful and easier to use.

Forest Hill, MD —1 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Mahout™ v0.13.0, the latest version of the Open Source scalable machine learning library.

Apache Mahout provides an environment for quickly creating machine-learning applications that scale and run on the highest-performance parallel computation engines available. Mahout is the first scalable generalized tensor and linear algebra solving engine taking data scientists from interactive experiments to production use.

"Apache Mahout 0.13.0 is more powerful with its new algorithm framework that allows for easier implementation of machine learning algorithms," said Andrew Palumbo, Vice President of Apache Mahout. "The enhanced Mahout code base and development framework make machine learning even more accessible, which is a game changer in the field of artificial intelligence."

Mahout provides a wide variety of premade algorithms (Matrix Factorization, QR via ALS, SSVD, PCA, etc.) for Scala + Apache Spark, H2O, and Apache Flink, as well as on-GPU compute for performance improvements in very large tensor math. Apache Mahout provides the data science tools to automatically find meaningful patterns in Big Data sets by supporting the following main data science use cases:
  • Collaborative filtering – mines user behavior and makes product recommendations (such as eCommerce product recommenders);
  • Regression – estimates a numerical value based on values of other inputs;
  • Clustering – takes items in a particular class (such as Web pages or newspaper articles) and organizes them into naturally occurring groups, such that items belonging to the same group are similar to each other; and
  • Classifying – learns from existing categorizations and then assigns unclassified items to the best category.
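As a rough illustration of the collaborative filtering use case, the sketch below builds item cooccurrence counts from user histories and recommends unseen items that often appear alongside what a user already has. It is a hypothetical, minimal example using raw counts; production recommenders usually weight cooccurrences with a significance test (Mahout's cooccurrence-based work, for example, uses a log-likelihood ratio), and nothing here is Mahout's API.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import DefaultDict, Dict, List


def cooccurrence(histories: Dict[str, List[str]]) -> DefaultDict[str, Counter]:
    """Count how often each pair of items appears in the same user's history."""
    co: DefaultDict[str, Counter] = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            co[a][b] += 1
    return co


def recommend(user: str, histories: Dict[str, List[str]],
              co: DefaultDict[str, Counter], k: int = 2) -> List[str]:
    """Score unseen items by how often they co-occur with the user's items."""
    seen = set(histories[user])
    scores: Counter = Counter()
    for item in seen:
        for other, n in co.get(item, Counter()).items():
            if other not in seen:
                scores[other] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


histories = {
    "u1": ["a", "b"],
    "u2": ["a", "b", "c"],
    "u3": ["b", "c"],
    "u4": ["a", "d"],
}
co = cooccurrence(histories)
print(recommend("u1", histories, co))  # ['c', 'd']
print(recommend("u4", histories, co))  # ['b', 'c']
```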

New in v0.13.0
Apache Mahout now makes it easier to do matrix math on graphics cards, which is relevant for most modern machine-learning and deep-learning methods. In addition, v0.13.0 allows shared-nothing computation on GPUs, on multi-core CPUs, or in the JVM as appropriate, and provides a simplified framework for building new algorithms. Because Mahout comprises an interactive environment and library that support generalized scalable linear algebra and include many modern machine-learning algorithms, the project has also collaborated with developers of other projects, including the Open Source linear algebra library ViennaCL and the Java wrapper library JavaCPP, as well as with graphics processor manufacturer NVIDIA, to add CUDA bindings directly into Mahout for simpler development.
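Shared-nothing linear algebra relies on splitting a matrix into row blocks that workers process independently. For example, the Gram matrix A^T A equals the sum of A_k^T A_k over the row blocks A_k, so each worker can compute its partial product locally and only small n x n results need to be combined. The pure-Python sketch below illustrates this decomposition; it is not Mahout's implementation, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def partial_gram(rows: Matrix) -> Matrix:
    """A_k^T A_k for one row block A_k (n x n, where n is the number of columns)."""
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += r[i] * r[j]
    return g


def add_matrices(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def gram(partitions: List[Matrix]) -> Matrix:
    """A^T A computed as the sum of independent per-partition partial products."""
    result = partial_gram(partitions[0])
    for block in partitions[1:]:
        result = add_matrices(result, partial_gram(block))
    return result


# A = [[1, 2], [3, 4], [5, 6]] split into two row blocks, as two workers would hold it.
print(gram([[[1, 2], [3, 4]], [[5, 6]]]))  # [[35.0, 44.0], [44.0, 56.0]]
```

However the rows are partitioned, the combined result is the same, which is what lets each partial product run wherever is most efficient.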

The v0.13.0 release addresses 62 JIRA issues since v0.12.2, including numerous enhancements to Mahout-Samsara, the vector math experimentation environment with R-like syntax that works at scale. Complete release notes are at http://mahout.apache.org/release-notes/Apache-Mahout-0.13.0-Release-Notes.pdf

Future versions of Mahout will include support for native iterative solvers, a more robust algorithm library, and smarter probing and optimization of multiplications, among other features.

A comprehensive list of users of Apache Mahout is available at https://mahout.apache.org/general/powered-by-mahout.html ; current users are mostly researchers and developers actively involved in building distributed machine-learning pipelines and tools.

"We thank our community of developers and users who helped make this milestone release possible, and welcome new contributors to help us advance machine learning," added Palumbo.

Catch Apache Mahout in action at Apache: Big Data, where attendees learn first-hand from many original project creators and companies from the greater Mahout community. Apache: Big Data will be held 16-18 May 2017 in Miami, FL. To register, and for more information, visit http://apachecon.com/

Availability and Oversight
Apache Mahout software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Mahout, visit http://mahout.apache.org/ and https://twitter.com/ApacheMahout

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Flink", "Apache Flink", "Mahout", "Apache Mahout", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday April 25, 2017

The Apache Software Foundation Announces Apache® cTAKES™ v4.0

Widely adopted Open Source biomedical data extraction, annotation, and clinical information management platform now faster and easier to use.

Forest Hill, MD —25 April 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® cTAKES™ v4.0, the latest version of the Open Source natural language processing system for information extraction from health-related free-text.

Apache cTAKES (clinical Text Analysis Knowledge Extraction System) is a natural-language-processing-based information extraction platform for health-related text. It identifies signals important to the biomedical domain. These include clinical named entities such as drugs, diseases/disorders, signs/symptoms, anatomical sites, and procedures, mapped to biomedical terminologies and ontologies such as the Unified Medical Language System (UMLS). It also identifies their associated attributes, such as negation, uncertainty, and more.
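To make "named entities ... along with their associated attributes such as negation" concrete, here is a toy Python sketch of dictionary lookup plus a NegEx-style negation rule. It is not cTAKES code and does not use UMLS. The three-entry dictionary, the cue and terminator lists, and the five-token look-back window are made-up assumptions.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Made-up mini-dictionary; a real system would map terms to UMLS concepts.
DICTIONARY: Dict[str, str] = {
    "aspirin": "Drug",
    "chest pain": "SignSymptom",
    "pneumonia": "DiseaseDisorder",
}
NEGATION_CUES = ("no", "denies", "without", "negative for")  # assumed cue list
TERMINATORS = ("but", "however")  # assumed words that end a cue's scope
WINDOW = 5  # assumed number of tokens to look back from a mention

@dataclass
class Mention:
    text: str
    semantic_type: str
    negated: bool

def annotate(sentence: str) -> List[Mention]:
    """Find dictionary terms and flag those preceded by a negation cue."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    mentions: List[Mention] = []
    for term, semantic_type in DICTIONARY.items():
        term_tokens = term.split()
        n = len(term_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != term_tokens:
                continue
            # Tokens shortly before the mention, cut back to the last terminator.
            window = tokens[max(0, i - WINDOW):i]
            for j in range(len(window) - 1, -1, -1):
                if window[j] in TERMINATORS:
                    window = window[j + 1:]
                    break
            context = " ".join(window)
            negated = any(re.search(rf"\b{re.escape(cue)}\b", context) for cue in NEGATION_CUES)
            mentions.append(Mention(term, semantic_type, negated))
    return mentions
```

Consider "Patient denies chest pain but takes aspirin daily.". The sketch marks "chest pain" as a negated SignSymptom and "aspirin" as a non-negated Drug, because the terminator "but" ends the scope of "denies".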

"Apache cTAKES has helped considerably advance biomedical data extraction and clinical information management over the last several years," said Pei Chen, Vice President of Apache cTAKES. "We are proud to lead the development of a widely adopted, interoperable, community-driven solution for clinical decision support systems and clinical research. The improvements in v4 make cTAKES easier to use, thereby benefiting the greater medical community."

cTAKES originated in 2006 with a team of physicians, computer scientists, and software engineers at Mayo Clinic, and was submitted to the Apache Incubator in June 2012. cTAKES was built using the Apache UIMA (Unstructured Information Management Architecture) framework and the Apache OpenNLP machine-learning-based toolkit for processing health-related natural language text. Apache cTAKES components create rich linguistic and semantic annotations that have been used for a variety of biomedical use cases, including clinical decision support systems and clinical research.

Highlights of Apache cTAKES v4 include:
  • Dictionary Builder graphical user interface (GUI) for easy dictionary selection and build-up;
  • Pipe Bits, which describe cTAKES modules to programs that help users create pipelines. Descriptions of components and their inputs, outputs, parameters, and dependencies are implemented as Java annotations. This simplifies pipeline builders and indicates whether a component is a Collection Reader, an Annotator, or a CAS Consumer (Writer);
  • Piper files, allowing fast and easy creation and modification of custom pipelines with many capabilities;
  • A graphical user interface (GUI) for easy pipeline creation. Implemented using the new Pipe Bits, it lets users select cTAKES components and view their descriptions, inputs, outputs, parameters, and dependencies;
  • Example clinical documents: mock-up clinical narratives with manual expert annotations, created using the Open Source Anafora annotation tool (https://github.com/weitechen/anafora);
  • Temporal module for extraction of events, time expressions, and temporal relations; and
  • Numerous bug fixes, resulting in a more stable, more robust, and much faster release.

"The Apache cTAKES v4 release is a pivotal milestone that incorporates state-of-the-art methods for some of the most difficult tasks in clinical narrative processing and information extraction, namely coreference resolution and temporality. Integrating novel user-friendly interfaces and a scaled-up optimization of its core concept mapper, v4 provides the open-source and medical communities a stable, industrial-strength tool to mine clinical text," said Prof. Guergana Savova, ASF Member and Apache cTAKES Project Management Committee member, and Principal Investigator of the Natural Language Processing Lab at the Computational Health Informatics Program, Boston Children’s Hospital and faculty at Harvard Medical School. "The world-wide community involvement is exactly what we envisioned when we started cTAKES back in 2006. We are grateful to the community for its many contributions and are greatly appreciative of the efforts of Sean Finan and James Masanz, members of the Apache cTAKES Project Management Committee for leading this milestone release."

"We are using Apache cTAKES v4 to link phenotypic and genomic/genetic data for the Boston Children’s Hospital Precision Link Biobank," said Kenneth D. Mandl, Director of the Computational Health Informatics Program at Boston Children’s Hospital.

"We are using cTAKES to help identify people with multiple sclerosis from the electronic health records and investigate disease trajectory and treatment response in this chronic neurological disorder," said Zongqi Xia, MD, PhD, an Assistant Professor of Neurology and Biomedical Informatics at University of Pittsburgh.

"We have been using cTAKES in the VA Radiology Reports to look for word tokens that correlate with lung, liver and other findings," said Dr. Joe Erdos, faculty at Yale School of Medicine and associated scientist at the Veterans Affairs (VA) in Connecticut.

"We have been frequent users of cTAKES since the 3.x days, and are excited by the cTAKES release," said Chris Mattmann, Principal Data Scientist in the Engineering & Science Directorate at NASA Jet Propulsion Laboratory, and member of the Apache cTAKES Project Management Committee. "Our Shangridocs tool that allows for interactive text extraction and analysis from science research papers in the bioinformatics/clinical domain is built around Apache cTAKES and Apache OpenNLP. We plan on upgrading ASAP to cTAKES 4.0 and contributing to the platform. cTAKES scalability is something we are very interested in - and in the ability to extend the existing UMLS taxonomy with custom medical metadata and information and cTAKES 4.0 (and beyond) is the perfect platform for growth in this area."

Availability and Oversight
Apache cTAKES software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache cTAKES, visit http://ctakes.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "cTAKES", "Apache cTAKES", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday February 08, 2017

The Apache Software Foundation Announces Apache® Ranger™ as a Top-Level Project

Big Data security management framework for the Apache Hadoop ecosystem in use at ING, Protegrity, and Sprint, among other organizations.

Forest Hill, MD —8 February 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Ranger™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

The latest addition to the ASF’s more than three dozen projects in Big Data, Apache Ranger is a centralized framework used to define, administer and manage security policies consistently across Apache Hadoop components. Ranger also offers the most comprehensive security coverage, with native support for numerous Apache projects, including Atlas (incubating), HBase, HDFS, Hive, Kafka, Knox, NiFi, Solr, Storm, and YARN. 

"Graduating to a Top-Level Project reflects the maturity and growth of the Ranger Community," said Selvamohan Neethiraj, Vice President of Apache Ranger. "We are pleased to celebrate a great milestone and officially play an integral role in the Apache Big Data ecosystem."

Apache Ranger provides a simple and effective way to set access control policies and audit data access across the entire Hadoop stack, following industry best practices. One of Ranger's key benefits is that security administrators can manage access control policies from a single place, consistently across the Hadoop ecosystem. Ranger also enables the community to add authorization for new systems, even outside the Hadoop ecosystem, through a robust plugin architecture that can be extended with minimal effort. In addition, Apache Ranger provides many advanced features, such as:
  • Ranger Key Management Service (compatible with Hadoop’s native KMS API to store and manage encryption keys for HDFS Transparent Data Encryption);
  • Dynamic column masking and row filtering;
  • Dynamic policy conditions (such as prohibition of toxic joins);
  • User context enrichers (such as geo-location and time of day mappings); and
  • Classification- or tag-based policies for Hadoop ecosystem components via integration with Apache Atlas.
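Dynamic row filtering and column masking mean that the same query returns different results depending on who asks, without changing the stored data. The Python sketch below illustrates that idea only. The `AccessPolicy` class, the masking functions, and the sample data are hypothetical. They do not reflect Ranger's policy model, plugin API, or built-in mask types.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]

def show_last_4(value: str) -> str:
    """Replace every character except the last four with 'x'."""
    return "x" * max(0, len(value) - 4) + value[-4:]

def unmasked(value: str) -> str:
    """Identity mask for columns without a masking rule."""
    return value

@dataclass
class AccessPolicy:
    """Hypothetical per-user-group policy: one row filter plus per-column masks."""
    row_filter: Callable[[Row], bool]
    column_masks: Dict[str, Callable[[str], str]]

def query(table: List[Row], policy: AccessPolicy) -> List[Row]:
    """Apply the row filter, then the column masks, at read time.

    The stored rows are never modified; masked values appear only in the
    new dictionaries that are returned.
    """
    visible = []
    for row in table:
        if not policy.row_filter(row):
            continue
        visible.append({col: policy.column_masks.get(col, unmasked)(val)
                        for col, val in row.items()})
    return visible

CUSTOMERS: List[Row] = [
    {"name": "Ana", "region": "EU", "card": "4111111111111111"},
    {"name": "Bob", "region": "US", "card": "5500000000000004"},
]

# Hypothetical policy for an analyst who may only see EU customers.
EU_ANALYST = AccessPolicy(
    row_filter=lambda row: row["region"] == "EU",
    column_masks={"card": show_last_4},
)
```

With this policy, `query(CUSTOMERS, EU_ANALYST)` returns only Ana's row, with the card number reduced to its last four digits, while `CUSTOMERS` itself is unchanged.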

"As early adopters of Apache Ranger and having contributed to Apache Ranger, we have come to rely upon Apache Ranger as a key part of our security infrastructure for data," said Ferd Scheepers, Chief Information Architect at ING. "We are therefore pleased to learn that the project has now graduated to a TLP through the efforts of the Apache community. We believe that Apache Ranger represents the best-in-class Open Source security framework for authorization, encryption management, and auditing across the Hadoop ecosystem. We laud the community's efforts in building an extensible, enterprise-grade architecture for Apache Ranger, and for innovative features such as tag- or classification-based security (built in conjunction with Apache Atlas). We congratulate the Apache Ranger community on achieving this significant milestone and are confident Apache Ranger will evolve into the de facto standard security stack across the Hadoop ecosystem."

"As heavy users of Apache Ranger in production, we are pleased to see the project become a TLP through validation across community efforts," said Timothy R. Connor, Big Data & Advanced Analytics Manager at Sprint. "Apache Ranger has built a next-generation attribute-based access control (ABAC) model for authorization along with a robust data-centric Open Source security framework supporting advanced security capabilities such as dynamic row filtering and column masking. All of these point to Apache Ranger maturing into a robust and comprehensive security product for authorization, encryption management and auditing through the Apache community."

"It's great to see Apache Ranger become a TLP," said Dominic Sartorio, Senior Vice President of Products & Development at Protegrity. "Apache Ranger's comprehensive auditing and broad authorization coverage across the Hadoop ecosystem, along with its highly scalable and extensible architecture and rich set of APIs, integrates very well with Protegrity's fine-grained data protection capabilities. Our continued collaboration with the Apache Ranger community will help meet the data security requirements of the next generation of enterprise-grade production Hadoop deployments."

"As organizations entrust their enterprise data to Open Source data platforms such as Apache Hadoop, there is a critical need to use the most innovative techniques to safeguard this data," said Alan Gates, Co-Founder of Hortonworks and Apache Ranger incubation mentor. "The Apache Ranger community has taken the original, proprietary code base and used it to build a new and successful Apache project that employs an attribute-based approach to define and enforce authorization policies. This modern approach combines subject, action, resource, and environment attributes. It goes beyond role-based access control techniques that rely exclusively on mapping organizational roles to permissions. It has been a pleasure to be their mentor in this process and help them learn The Apache Way."
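The attribute-based approach described above can be sketched as follows. An access decision looks at attributes of the subject, the action, the resource, and the environment, rather than at a role-to-permission table alone. This is a hypothetical Python illustration, not Ranger's policy engine. The `Request` and `Rule` types, the attribute names, and the deny-overrides rule combination are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]

@dataclass
class Request:
    subject: Attributes      # who is asking, e.g. user name and groups
    action: str              # what they want to do, e.g. "select"
    resource: Attributes     # what they want to touch, e.g. table and tags
    environment: Attributes  # request context, e.g. hour of day

@dataclass
class Rule:
    effect: str                          # "allow" or "deny"
    condition: Callable[[Request], bool]

def is_allowed(rules: List[Rule], request: Request) -> bool:
    """Deny overrides allow; if no rule matches, access is denied by default."""
    effects = [rule.effect for rule in rules if rule.condition(request)]
    if "deny" in effects:
        return False
    return "allow" in effects

RULES = [
    # Members of the hypothetical "analysts" group may run SELECT queries.
    Rule("allow", lambda r: "analysts" in r.subject["groups"] and r.action == "select"),
    # Nobody may touch PII-tagged resources outside 09:00-17:00.
    Rule("deny", lambda r: "PII" in r.resource["tags"]
         and not 9 <= r.environment["hour"] < 17),
]

def analyst_request(action: str, hour: int) -> Request:
    """Build a sample request from an analyst against a PII-tagged table."""
    return Request(
        subject={"user": "maria", "groups": ["analysts"]},
        action=action,
        resource={"table": "customers", "tags": ["PII"]},
        environment={"hour": hour},
    )
```

With these two rules, an analyst may read the PII-tagged table at 10:00 but not at 22:00. A check based purely on roles would give the same answer at both times, because the time of day is an environment attribute rather than a role.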

"More and more users are adopting Apache Ranger to secure data in the Hadoop ecosystem," added Neethiraj. "We look forward to welcoming new Ranger users to our mailing lists and community events."

Availability and Oversight
Apache Ranger software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Ranger, visit https://ranger.apache.org/ and @ApacheRanger.

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 5,900 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Ranger", "Apache Ranger", "HBase", "Apache HBase", "HDFS", "Apache HDFS", "Hive", "Apache Hive", "Kafka", "Apache Kafka", "Knox", "Apache Knox", "NiFi", "Apache NiFi", "Solr", "Apache Solr", "Storm", "Apache Storm", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #


Wednesday January 18, 2017

The ASF asks: Have you met Apache Ignite?

Since 1999, The Apache Software Foundation (ASF) has been recognized as a leading source for an array of Open Source software and tools that meet the demand for interoperable, adaptable, and sustainable solutions. The all-volunteer ASF develops, stewards, and incubates more than 350 enterprise-grade Open Source projects that power mission-critical applications in financial services, aerospace, publishing, government, healthcare, research, infrastructure, and more. From Abdera to ZooKeeper, the demand for ASF's reliable, community-driven software continues to grow dramatically across many categories, including Cloud, IoT, Artificial Intelligence and Machine Learning, Mobile, and Big Data, where the Apache Hadoop ecosystem dominates the marketplace.

Did you know that numerous Fortune 500 enterprises depend on Apache Ignite's in-memory data platform to process large-scale data sets in real time, orders of magnitude faster than traditional technologies?

We are pleased to showcase Apache Ignite, the high-performance In-Memory Data Fabric that provides in-memory data caching, partitioning, processing, and querying components.

Quick peek: Apache Ignite is an integrated and distributed In-Memory Data Fabric for computing and transacting on large-scale data sets in real time, orders of magnitude faster than is possible with traditional disk-based or flash technologies. It is designed to easily power both existing and new applications in a distributed, massively parallel architecture on affordable, industry-standard hardware.

Background: Originally created at GridGain as its flagship in-memory computing (IMC) platform, Ignite entered the Apache Incubator in September 2014 and graduated as an Apache Top-Level Project in August 2015.

Why Ignite: Apache Ignite addresses today's Fast Data and Big Data needs by providing a comprehensive in-memory data fabric, which includes a data grid with SQL and transactional capabilities, in-memory streaming, an in-memory file system, and more.

Heavily benchmarked, Ignite has been built from the ground up to scale linearly to hundreds of nodes, with strong semantics for data locality and affinity-based data routing to reduce redundant data movement. The Ignite data grid is lightning fast and is one of the fastest implementations of transactional or atomic data in distributed clusters today.
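Affinity-based routing rests on two mappings: key to partition, and partition to node. The Python sketch below uses rendezvous (highest-random-weight) hashing for the second mapping. Ignite does ship an affinity function named `RendezvousAffinityFunction`, but this code is not its implementation. The partition count, node names, and helper functions are illustrative assumptions.

```python
import hashlib
from typing import Dict, List

NUM_PARTITIONS = 32  # illustrative partition count

def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash; Python's built-in hash() of str is salted per process."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def partition_for(affinity_key: str) -> int:
    """Key -> partition: equal affinity keys always land in the same partition."""
    return stable_hash(affinity_key) % NUM_PARTITIONS

def owner_of(partition: int, nodes: List[str]) -> str:
    """Partition -> node via rendezvous hashing: the highest-scoring node owns it."""
    return max(nodes, key=lambda node: stable_hash(f"{node}/{partition}"))

def partition_map(nodes: List[str]) -> Dict[int, str]:
    """Owner of every partition for a given set of nodes."""
    return {p: owner_of(p, nodes) for p in range(NUM_PARTITIONS)}

def partition_for_order(order_id: str, customer_id: str) -> int:
    """Hypothetical collocation rule: route an order by its customer, not its own ID.

    order_id is deliberately ignored so that all of a customer's orders
    share one partition and therefore one owning node.
    """
    return partition_for(customer_id)
```

Routing orders by customer ID places all of one customer's orders in the same partition. A computation over those orders can then run on the node that already holds them. Rendezvous hashing also keeps rebalancing small: when a node joins, each partition either stays where it was or moves to the new node.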

Unlike many other Big Data processing solutions, Apache Ignite treats RAM as a primary storage facility rather than using it exclusively for processing. Ignite's memory-first approach is therefore more efficient and faster, with improved system indexes, reduced data fetch time, and no delays in stream processing, among other benefits.

Additionally --and unique to Apache Ignite-- its SQL Grid eliminates the painful migration from a relational database to an in-memory data grid (IMDG), sparing developers from rewriting SQL-based code against an IMDG's native APIs. Developers can keep using existing applications and tools that were written for relational databases and rely on SQL, with little to no code modification. Ignite SQL Grid is horizontally scalable, fault tolerant, and ANSI SQL-99 compliant.

Using Apache Ignite, developers benefit from:
  • Data Grid --replicate or partition data in memory within the cluster;
  • SQL Grid --add in-memory distributed database capabilities;
  • Compute Grid --distribute computations across cluster nodes;
  • Service Grid --implement fault-tolerant, microservices-based solutions;
  • Streaming & CEP --easily stream large volumes of data into Ignite and process them in real time; and
  • Data Structures --distribute your own data structures across the cluster.

To solve real-time business issues and meet application requirements for the highest performance and scale, Apache Ignite leverages and integrates a host of Apache projects including Spark, Hadoop, YARN, and Mesos.

Latest release: Apache Ignite v1.8, released on 9 December 2016 under the Apache License v2.0. More details can be found below and in the release notes.

What's under the hood: New in Apache Ignite v1.8:
  • SQL Grid now fully supports all DML commands, including UPDATE, INSERT, and DELETE queries. Full support for DML and SELECT statements allows users to interact with Apache Ignite using standard SQL commands through the ODBC and JDBC drivers. This provides true cross-platform connectivity, even from languages such as PHP and Ruby that are not natively supported by the project.
  • A Redis protocol implementation, which enables users to store and retrieve distributed data in the Apache Ignite cache using any Redis-compatible client.
  • Ignite.NET provides a .NET Entity Framework second-level cache solution that stores data in the distributed Ignite cache. This is ideal for scenarios with multiple application servers using a single SQL database via Entity Framework: cached queries are shared between all machines in the cluster.
  • Ignite.NET implements an ASP.NET session-state caching provider that stores session data in the Ignite cache, distributing session state across multiple servers to provide high availability and fault tolerance.
  • The deadlock detection mechanism has been improved and now works for optimistic transactions and near caches.
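Standard SQL DML support means that data-access code written against ordinary SQL need not change. As a hedged illustration, the Python DB-API code below issues only standard INSERT, UPDATE, DELETE, and SELECT statements. The standard-library `sqlite3` module stands in for the database, and the `person` table is hypothetical. Connecting the same function to Ignite would go through its ODBC or JDBC driver. That connection setup, including which parameter placeholder style the driver expects, is outside the scope of this sketch. Schema creation is kept in a separate function because systems differ in how tables are defined.

```python
import sqlite3
from typing import List, Tuple

def create_schema(conn: sqlite3.Connection) -> None:
    """Stand-in schema setup; table definition varies between systems."""
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name VARCHAR(50), age INTEGER)")

def run_dml(conn) -> List[Tuple[int, str, int]]:
    """Use only standard SQL DML through the generic DB-API cursor interface."""
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO person (id, name, age) VALUES (?, ?, ?)",
        [(1, "Ada", 36), (2, "Linus", 27), (3, "Grace", 45)],
    )
    cur.execute("UPDATE person SET age = age + 1 WHERE name = ?", ("Ada",))
    cur.execute("DELETE FROM person WHERE age < ?", (30,))
    conn.commit()
    cur.execute("SELECT id, name, age FROM person ORDER BY id")
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a SQL-capable data store
    create_schema(conn)
    print(run_dml(conn))  # prints [(1, 'Ada', 37), (3, 'Grace', 45)]
    conn.close()
```

Only `create_schema` and the connection line are specific to the stand-in database. `run_dml` depends on nothing beyond standard SQL and the DB-API cursor methods.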

Check out the Apache Ignite blog for articles, insight, how-tos, and additional resources at https://ignite.apache.org/blogs.html

For downloads, documentation, examples, use cases, and more information, visit http://ignite.apache.org/

# # #

Monday November 21, 2016

The Apache Software Foundation Announces Apache® Geode™ as a Top-Level Project

Open Source Big Data in-memory data grid used by hundreds of enterprises to power mission-critical, low-latency, high-concurrency transactional applications at extreme scale.

Forest Hill, MD —21 November 2016— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Geode™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Geode is an Open Source in-memory data grid that provides transactional data management for scale-out applications that need low-latency response times under highly concurrent processing.

"Graduating as a Top-Level Project marks an important milestone for Apache Geode," said Mark Bretl, Vice President of Apache Geode. "Our community is proud to champion a diverse group of developers and users whose support has helped Geode reach a sustainable level of maturity."

The Geode codebase was originally developed by GemStone Systems in 2002. GemFire, the original commercial distribution of Geode, was first widely adopted by the financial sector as the transactional, low-latency data engine used in Wall Street trading platforms. Pivotal®, which owns the GemFire technology, submitted the Geode code to the Apache Incubator in April 2015.

"We are excited to see Geode graduate from the Apache Incubator to a Top-Level Project. It's quite a feat to transform a mature commercial product into a widely adopted open source project," said Elisabeth Hendrickson, VP of Big Data R&D at Pivotal. "The committers in Geode have worked hard at building community and making the project accessible to newcomers, paving the way for developers everywhere to benefit from a proven in-memory data grid technology."

Since entering the Apache Incubator, the project has had significant increases in the number of independent developers contributing to the code, as well as organizations incorporating Apache Geode in their deployments and solutions. Today, over 600 enterprises use the technology behind Apache Geode for high-scale business applications that must meet low latency and 24x7 availability requirements, such as financial risk analysis systems, high volume eCommerce Websites, and transportation & logistics management.

"zData has been deploying big solutions with the technology of Apache Geode well before it became open source software. We look forward to helping more of our customers enjoy the speed, reliability, and scale that Apache Geode brings to any application architecture."
-- Dillon Woods, CTO, zData Inc.

"Apache Geode is an important component of Capgemini's Business Data Lake and fast reacting business scale out analytics solutions. Capgemini congratulates the Apache Geode community on becoming a top level project in The Apache Software Foundation." 
-- Steve Jones, Global Vice President, Big Data, Capgemini

"Apache Apex provides direct support for Apache Geode. Geode helps Apex deployments by providing fast, fault-tolerant storage and query support for stream processing data. DataTorrent welcomes Apache Geode as a peer project of Apache Apex."
-- Amol Kekre, CTO at DataTorrent

"Apache Geode is an important component of Ampool Active Data Store. It provides scale-out in-memory processing with transactional consistency. We've been enthusiastic users of Apache Geode since its beginning, and look forward to this next phase."
-- Milind Bhandarkar, CEO at Ampool

"Through the incubation process we have worked to create an open and collaborative community for developers and users to work together, and look forward to seeing new contributions, feedback, bug reports, and subscribers to the Geode email lists," added Bretl.

The Apache Geode project welcomes contributions and community participation through mailing lists, face-to-face MeetUps, Geode Clubhouse online, and other events such as the Apache: Big Data conference series.

Availability and Oversight
Apache Geode software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Geode, visit http://geode.apache.org/ and @ApacheGeode.

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 5,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Geode", "Apache Geode", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday July 27, 2016

The Apache Software Foundation Announces Apache® Twill™ as a Top-Level Project

Open Source abstraction layer over Apache Hadoop® YARN simplifies developing distributed Hadoop applications.

Forest Hill, MD –27 July 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Twill™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed Hadoop applications, allowing developers to focus more on their application logic.

"The Twill community is excited to graduate from the Apache Incubator to a Top-Level Project," said Terence Yim, Vice President of Apache Twill and Software Engineer at Cask. "We are proud of the innovation, creativity and simplicity Twill demonstrates. We are also very excited to bring a technology so versatile in Hadoop into the hands of every developer in the industry."

Apache Twill provides rich built-in features for developing, deploying, and managing common distributed applications, greatly easing Hadoop cluster operation and administration.

"Enterprises use big data technologies - and specifically Hadoop - to drive more value," said Patrick Hunt, member of the Apache Software Foundation and Senior Software Engineer at Cloudera. "Apache Twill helps streamline and reduce complexity of developing distributed applications and its graduation to an Apache Top-Level Project means more people will be able to take advantage of Apache Hadoop YARN more easily."

"This is an exciting and major milestone for Apache Twill," said Keith Turner, member of the Apache Fluo (incubating) Project Management Committee, which used Twill in the development of Fluo, an Open Source project that makes it possible to update the results of a large-scale computation, index, or analytic as new data is discovered. "Early in development, we knew we needed a standard way to launch Fluo across a cluster, and we found Twill. With Twill, we quickly and easily had Fluo running across many nodes on a cluster." 

Apache Twill is in production use at several organizations across various industries, easing distributed Hadoop application development and deployment.

Twill originated at Cask in early 2013. After 7 major releases, the project was submitted to the Apache Incubator in November of 2013.

"Apache Twill has come a long way through The Apache Software Foundation, and we're thrilled it has become an ASF Top-Level Project," said Nitin Motgi, CTO of Cask. "Apache Twill has become a key component behind the Cask Data Application Platform (CDAP), using YARN containers and Java threads as the processing abstraction. CDAP is an Open Source integration and application platform that makes it easy for developers and organizations to quickly build, deploy and manage data applications on Apache Hadoop and Apache Spark."

"The Apache Twill community worked extremely well within the incubator environment, developing and collaborating openly to follow The Apache Way," said Henry Saputra, ASF Member and member of the Apache Twill Project Management Committee. "There is a tremendous demand for effective APIs and virtualization for developing big data applications and Apache Twill fills that need perfectly. We’re looking forward to continuing the journey with Apache Twill as a Top-Level Project."

Catch Apache Twill in action at:
  • JavaOne, 18-22 September 2016 in San Francisco
  • Strata+Hadoop World, 27-29 September 2016 in New York City

Availability and Oversight
Apache Twill software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Twill, visit http://twill.apache.org/ and follow @ApacheTwill

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

©The Apache Software Foundation. "Apache", "Twill", "Apache Twill", "Hadoop", "Apache Hadoop", "Apache Hadoop YARN", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday July 26, 2016

The Apache Software Foundation Announces Apache® Kudu™ as a Top-Level Project

Open Source columnar storage engine enables fast analytics across the Internet of Things, time series, cybersecurity, and other Big Data applications in the Apache Hadoop ecosystem

Forest Hill, MD –25 July 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Kudu™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Kudu is an Open Source columnar storage engine built for the Apache Hadoop ecosystem and designed to enable flexible, high-performance analytic pipelines.

"Under the Apache Incubator, the Kudu community has grown to more than 45 developers and hundreds of users," said Todd Lipcon, Vice President of Apache Kudu and Software Engineer at Cloudera. "Recognizing the strong Open Source community is a testament to the power of collaboration and the upcoming 1.0 release promises to give users an even better storage layer that complements Apache HBase and HDFS."

Optimized for lightning-fast scans, Kudu is particularly well suited to hosting time-series data and various types of operational data. In addition to its impressive scan speed, Kudu supports many operations available in traditional databases, including real-time insert, update, and delete operations. Kudu enables a "bring your own SQL" philosophy and can be accessed by multiple query engines, including other Apache projects such as Drill, Spark, and Impala (incubating).

Apache Kudu is in use at diverse companies and organizations across many industries, including retail, online service delivery, risk management, and digital advertising.

"Using Apache Kudu alongside interactive SQL tools like Apache Impala (incubating) has allowed us to deploy a next-generation platform for real-time analytics and online reporting," said Baoqiu Cui, Chief Architect at Xiaomi. "Apache Kudu has been deployed in production at Xiaomi for more than six months and has enabled us to improve key reliability and performance metrics for our customers. Kudu's graduation to a Top-Level Project allows companies like ours to operate a hybrid architecture without complexity. We look forward to continuing to contribute to its success."

"We are already seeing the many benefits of Apache Kudu. In fact we're using its combination of fast scans and fast updates for upcoming releases of our risk solutions," said Cory Isaacson, CTO at Risk Management Solutions, Inc. "Kudu is performing well, and RMS is proud to have contributed to the project’s integration with Apache Spark."

"The Internet of Things, cybersecurity and other fast data drivers highlight the demands that real-time analytics place on Big Data platforms," said Arvind Prabhakar, Apache Software Foundation member and CTO of StreamSets. "Apache Kudu fills a key architectural gap by providing an elegant solution spanning both traditional analytics and fast data access. StreamSets provides native support for Apache Kudu to help build real-time ingestion and analytics for our users."

"Graduation to a Top-Level Project marks an important milestone in the Apache Kudu community, but we are really just beginning to achieve our vision of a hybrid storage engine for analytics and real-time processing," added Lipcon. "As our community continues to grow, we welcome feedback, use cases, bug reports, patch submissions, documentation, new integrations, and all other contributions."

The Apache Kudu project welcomes contributions and community participation through mailing lists, a Slack channel, face-to-face meetups, and other events. Catch Apache Kudu in action at Strata + Hadoop World, 26-29 September 2016 in New York.

Availability and Oversight
Apache Kudu software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Kudu, visit http://kudu.apache.org/, @ApacheKudu, and http://kudu.apache.org/blog/.

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Kudu", "Apache Kudu", "Drill", "Apache Drill", "Hadoop", "Apache Hadoop", "Apache Impala (incubating)", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday June 29, 2016

The Apache Software Foundation Announces Apache® OODT™ v1.0

Open Source Big Data middleware metadata framework in use at Children's Hospital Los Angeles Virtual Pediatric Intensive Care Unit, DARPA MEMEX and XDATA, NASA Jet Propulsion Laboratory, and the National Cancer Institute, among others.

Forest Hill, MD –29 June 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® OODT™ v1.0, the Big Data middleware metadata framework.

OODT is a grid middleware framework for science data processing, information integration, and retrieval. As "middleware for metadata" (and vice versa), OODT is used for computer processing workflow, hardware and file management, information integration, and linking databases. The OODT architecture makes distributed computing and data resources searchable and usable by any end user.

"Apache OODT 1.0 is a great milestone in this project," said Tom Barber, Vice President of Apache OODT. "Effectively managing data pools has historically been problematic for some users, and OODT addresses a number of the issues faced. v1.0 allows us to prepare for some big changes within the platform with new UI designs for user-facing apps and data flow processing under the hood. It's an exciting time in the data management sector and we believe Apache OODT can be at the forefront of it."

OODT 1.0 signals a stage in the project where the initial scope of the platform is feature-complete and ready for general consumption. v1.0 features include:
  • Data ingestion and processing;
  • Automatic data discovery and metadata extraction;
  • Metadata management;
  • Workflow processing and support; and
  • Resource management

Originally created at NASA Jet Propulsion Laboratory in 1998 as a way to build a national framework for data sharing, OODT has been instrumental to the National Cancer Institute's Early Detection Research Network for managing distributed scientific data sets across 20+ institutions nationwide for more than a decade.

Apache OODT is in use in many scientific data system projects in Earth science, planetary science, and astronomy at NASA, such as the Lunar Mapping and Modeling Project (LMMP), NPOESS Preparatory Project (NPP) Sounder PEATE Testbed, the Orbiting Carbon Observatory-2 (OCO-2) project, and the Soil Moisture Active Passive mission testbed. In addition, OODT is used for large-scale data management and data preparation tasks in the DARPA MEMEX and XDATA efforts, and for supporting research and data analysis within the pediatric intensive care domain in collaboration with Children's Hospital Los Angeles (CHLA) and its Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit (VPICU), among many other applications.

"To watch Apache OODT grow from an internal NASA project to 1.0 where it is today and dozens of releases is an amazing feat. I truly believe having it at the ASF has allowed it to grow and prosper. We are doubling down on our commitment to Apache OODT, investing in its enhancement and use in several national-scale projects," said Chris Mattmann, member of the Apache OODT Project Management Committee, and Chief Architect, Instrument and Science Data Systems Section at NASA JPL. "Apache OODT processes some of the world's biggest data sets, distributes and manages them, and makes sure science happens in a timely and accurate fashion."

OODT entered the Apache Incubator in January 2010 and graduated as a Top-Level Project in November 2010.

Catch Apache OODT in action at ApacheCon Europe, 14-18 November 2016 in Seville, Spain: http://apachecon.com/

Availability and Oversight
Apache OODT software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache OODT, visit http://oodt.apache.org/ and https://twitter.com/apache_oodt

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "OODT", "Apache OODT", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

The Apache Software Foundation Announces Apache® Bahir™ as a Top-Level Project

Bolsters Big Data processing by providing extensions to distributed analytic platforms such as Apache Spark.

Forest Hill, MD –29 June 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Bahir™ has become a Top-Level Project (TLP).

Apache Bahir bolsters Big Data processing by serving as a home for existing connectors that originated under Apache Spark, as well as providing additional extensions/plugins for other related distributed systems, storage, and query execution systems.

"Apache Bahir is a new community that aims to be a place to curate extensions related to distributed analytic platforms following the Apache Governance," said Luciano Resende, Vice President of Apache Bahir and an Architect at IBM contributing to The Apache Software Foundation for over 10 years. "The project is initially offering a few Apache Spark extensions but it is definitely open for expanding to other platforms such as Apache Beam, Apache Flink and others."

Bahir code was extracted from the Apache Spark project and spun out as a standalone project to provide implementations of various Spark-related extensions/plugins, connectors, and other pluggable components. Current extensions include:
  • streaming-akka (Akka: Open Source toolkit and runtime simplifying the construction of concurrent and distributed applications on the Java Virtual Machine)
  • streaming-mqtt (MQTT: lightweight messaging protocol for small sensors and mobile devices, optimized for high-latency or unreliable networks)
  • streaming-twitter (Twitter: online social networking service; Bahir allows the processing of social data from Twitter)
  • streaming-zeromq (ZeroMQ: a high-performance asynchronous messaging library, aimed at use in distributed or concurrent applications)

In addition, Apache Bahir has a strong relationship with different storage layers; the project intends to extend that relationship to a number of other ASF projects and Apache-licensed initiatives.

"We are very interested in streaming-mqtt for remote sensing applications and control/monitoring. We have a lot of Big Data needs in Earth science especially in remote and difficult to access environments and plugins such as streaming-mqtt from Bahir provide a readily accessible and Apache-based solution to that," said Chris Mattmann, member of the Apache Bahir Project Management Committee, and Chief Architect, Instrument and Science Data Systems Section at NASA Jet Propulsion Laboratory.

"We are very motivated to increase the size and diversity of the Apache Bahir community," added Resende. "We welcome feedback, use cases, bug reports, patch submissions, code contributions, documentation, new extension proposals, and other ways to participate."

Availability and Oversight
Apache Bahir software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Bahir, visit http://bahir.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Bahir", "Apache Bahir", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday May 25, 2016

The Apache Software Foundation Announces Apache® Zeppelin™ as a Top-Level Project

Open Source Big Data analytics and visualization tool for distributed, interactive, and collaborative systems using Apache Flink, Apache Hadoop, Apache Spark, and more.

Forest Hill, MD –25 May 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Zeppelin™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Zeppelin is a modern, web-based notebook that enables interactive data analytics. Notebooks help developers, data scientists, and related users to handle data efficiently without worrying about command lines and cluster details.

"The Zeppelin community is pleased to graduate from the Apache Incubator," said Lee Moon Soo, Vice President of Apache Zeppelin. "With 118 worldwide contributors and widespread adoption in numerous commercial products, we are proud to officially be a part of the Apache Big Data ecosystem."

Zeppelin's collaborative data analytics and visualization capabilities make data exploration, visualization, sharing, and collaboration easy across distributed, general-purpose data processing systems that use Apache Flink, Apache Hadoop, and Apache Spark, among other Big Data platforms.

Apache Zeppelin is:
  • Multi-purpose --features data ingestion, exploration, analysis, visualization, and collaboration;
  • Robust --supports more than 20 backend systems, including Apache Spark, Apache Flink, Apache Hive, Python, R, and any JDBC (Java Database Connectivity) data source;
  • Easy to deploy --built on top of modern Web technologies (provides built-in Apache Spark integration, eliminating the need to build a separate module, plugin, or library);
  • Easy to use --with built-in visualizations and dynamic forms;
  • Flexible --allows users to mix different languages, exchange data between backends, adjust the layout;
  • Extensible --with pluggable architecture for interpreters, notebook storage, authentication, and visualizations (in progress); and
  • Advanced --allows interaction between custom visualizations and cluster resources

"With Apache Zeppelin, a wide range of users can make beautiful data-driven, interactive, and collaborative documents with SQL, Scala, and more," added Soo.

Apache Zeppelin is in use at an array of organizations and solutions, including Amazon Web Services, Hortonworks, JuJu, and Twitter, among others. 

"Congratulations to Apache Zeppelin community on graduation," said Tim Hall, Vice President of Product Management at Hortonworks. "Several members of our team have been working over the past year in the Zeppelin community to make it enterprise ready. We are excited to be associated with this community and look forward to helping our customers get the best insights out of their data with Apache Zeppelin."

"Apache Zeppelin is becoming an important tool at Twitter for creating and sharing interactive data analytics and visualizations," said Prasad Wagle, Technical Lead in the Data Platform team at Twitter. "Since it integrates seamlessly with all the popular data analytics engines, it is very easy to create and share reports and dashboards. With its extensible architecture and a vibrant Open Source community, I am looking forward to Apache Zeppelin advancing the state of the art in data analytics and visualization."

"Apache Zeppelin is the major user-facing piece of Memcore’s in-memory data processing Cloud offering. Building a technology stack might be quite exciting engineering challenge, however, if users can’t visualize and work with the data conveniently, it is as good as not having the data at all. Apache Zeppelin enables efficient user acquisition by anyone trying to build new products or service offerings in the Big- and Fast- Data markets, making innovations, collaboration, and development easier for anyone," said Dr. Konstantin Boudnik, Founder and CEO of Memcore.io. "I am very excited to see Apache Zeppelin graduating as an ASF Top Level Project. This shows that more people are joining the community, bringing the project to a new level, and adding more integration points with existing data analytics and transactional software systems. This directly benefits the community at-large."

Apache Zeppelin originated in 2013 at NFLabs as Peloton, a commercial data analytics product. Since entering the Apache Incubator in December 2014, the project has had three releases, and twice participated in Google Summer of Code under the Apache umbrella.

"It was an honor to help with the incubation of Zeppelin," said Ted Dunning, Vice President of the Apache Incubator. "I have been very impressed with the Zeppelin community and the software they have built. I see Apache Zeppelin being adopted all over the place where people need to apply a notebook style to a wide variety of kinds of computing."

Catch Apache Zeppelin in action during Berlin Buzzwords, 7 June 2016: https://s.apache.org/mV8E

Availability and Oversight
Apache Zeppelin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Zeppelin, visit http://zeppelin.apache.org/ and https://twitter.com/ApacheZeppelin

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Zeppelin", "Apache Zeppelin", "Ambari", "Apache Ambari", "Flink", "Apache Flink", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday May 23, 2016

The Apache Software Foundation Announces Apache® TinkerPop™ as a Top-Level Project

Powerful Open Source Big Data graph computing framework in use at Amazon, DataStax, and IBM, among others.

Forest Hill, MD –23 May 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® TinkerPop™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache TinkerPop is a graph computing framework that provides developers the tools required to build modern graph applications in any application domain and at any scale.

"Graph databases and mainstream interest in graph applications have seen tremendous growth in recent years," said Stephen Mallette, Vice President of Apache TinkerPop. "Since its inception in 2009, TinkerPop has been helping to promote that growth with its Open Source graph technology stack. We are excited to now do this same work as a top-level project within the Apache Software Foundation."

As a graph computing framework for both real-time, transactional graph databases (OLTP) and batch analytic graph processors (OLAP), TinkerPop is useful for working with small graphs that fit within the confines of a single machine, as well as massive graphs that can only exist partitioned and distributed across a multi-machine compute cluster.

TinkerPop unifies these highly varied graph system models, giving developers less to learn, faster time to development, and less risk associated with both scaling their system and avoiding vendor lock-in.

The Power to Process One Trillion Edges
The central component of Apache TinkerPop is Gremlin, a graph traversal machine and language, which makes it possible to write complex queries (called traversals) that can execute as real-time OLTP queries, as analytic OLAP queries, or as a hybrid of the two.

Because the Gremlin language is separate from the Gremlin machine, TinkerPop serves as a foundation for any query language to work against any TinkerPop-enabled system. Much like the Java Virtual Machine is host to Java, Groovy, Scala, Clojure, and the like, the Gremlin traversal machine is already host to Gremlin, SPARQL, SQL, and various host language embeddings in Python, JavaScript, etc. Once a language is compiled to a Gremlin traversal, the Gremlin machine can evaluate it against a graph database or processor. As a result, a language such as SPARQL can immediately run both long-running analytic jobs that touch large parts of the graph across a one-thousand-node cluster and sub-second queries within a small neighborhood.

Apache TinkerPop is in use at organizations such as DataStax and IBM, among many others. Amazon.com is currently using TinkerPop and Gremlin to process its order fulfillment graph, which contains approximately one trillion edges.

The core Apache TinkerPop release provides production-ready, reference implementations of a number of different data systems including Neo4j (OLTP), Apache Giraph (OLAP), Apache Spark (OLAP), and Apache Hadoop (OLAP). However, the bulk of the implementations are maintained within the larger TinkerPop ecosystem. These implementations include commercial and Open Source graph databases and processors, Gremlin language variants for various programming languages on and off the Java Virtual Machine, visualization applications for graph analysis and many other tools and libraries. The TinkerPop ecosystem is richly supported with many options for developers to choose from.

TinkerPop originated in 2009 at the Los Alamos National Laboratory. After two major releases (TinkerPop1 in 2011 and TinkerPop2 in 2012), the project was submitted to the Apache Incubator in January 2015.

"Following in a long line of Apache projects that revolutionized entire industries, starting with the Apache HTTP Server, continuing with Web Services, search, and Big Data technologies, Apache TinkerPop will no doubt reshape the Graph Computing landscape," said Hadrian Zbarcea, co-Vice President of ASF Fundraising and Incubator Mentor of Apache TinkerPop. "While TinkerPop has just graduated as an ASF Top-Level Project, it is already seven years old, a mature technology, backed by a number of vendors, a vibrant community, and absolutely brilliant developers."

The project welcomes those interested in contributing to Apache TinkerPop. For more information, visit http://tinkerpop.apache.org/docs/3.2.0-incubating/dev/developer/#_contributing

Availability and Oversight
Apache TinkerPop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache TinkerPop, visit http://tinkerpop.apache.org/ and https://twitter.com/apachetinkerpop

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "TinkerPop", "Apache TinkerPop", "Apache HTTP Server", "Giraph", "Apache Giraph", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark" and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #
