Entries tagged [opensource]

Monday August 30, 2021

The Apache Drill Project Announces Apache® Drill™ v1.19 Milestone Release

Open Source, enterprise-grade, schema-free Big Data SQL query engine used by thousands of organizations, including Ant Group, Cisco, Ericsson, Intuit, MicroStrategy, Tableau, TIBCO, TransUnion, Twitter, and more.

Wilmington, DE —30 August 2021— The Apache Drill Project announced the release of Apache® Drill™ v1.19, the schema-free Big Data SQL query engine for Apache Hadoop®, NoSQL, and Cloud storage.

"Drill 1.19 is our biggest release ever," said Charles Givre, Vice President of Apache Drill. "With an already short learning curve, Drill 1.19 makes it even easier for users to quickly query, analyze, and visualize data from disparate sources and complex data sets.”

An "SQL-on-Hadoop" engine, Apache Drill is easy to deploy, highly performant, able to quickly process trillions of records, and scalable from a single laptop to a 1000-node cluster. With its schema-free JSON model (the first distributed SQL query engine of its kind), Drill is able to query complex semi-structured data in situ without requiring users to define schemas or transform data. It provides plug-and-play integration with existing Hive and HBase deployments, and is extensible out-of-the-box to access multiple data sources, such as S3 and Apache HDFS, HBase, and Hive. Additionally, Drill can directly query data from REST APIs to include platforms like SalesForce and ServiceNow. 

Drill supports the ANSI SQL:2003 standard syntax as well as dozens of NoSQL databases and file systems, including Apache HBase, MongoDB, Elasticsearch, Cassandra, REST APIs, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, NAS, local files, and more. Drill works with familiar BI tools (such as Apache Superset, Tableau, MicroStrategy, QlikView, and Excel) as well as data virtualization and visualization tools, and runs interactive queries on Hive tables with different Hive metastores.
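
For illustration only (this sketch is not part of the release announcement), the snippet below shows one way a SQL query might be submitted to a local Drill instance over its REST API. It assumes Drill is running on the default HTTP port 8047; the file path and column name are placeholders for your own data.

  # Minimal sketch: submit a SQL query to a local Apache Drill instance over its REST API.
  # Assumes Drill is running on the default HTTP port (8047); the file path and column
  # names are placeholders. Drill queries the raw JSON file in place, with no schema step.
  import requests

  DRILL_URL = "http://localhost:8047/query.json"

  sql = """
      SELECT event_type, COUNT(*) AS events
      FROM dfs.`/data/logs/events.json`
      GROUP BY event_type
  """

  resp = requests.post(DRILL_URL, json={"queryType": "SQL", "query": sql}, timeout=60)
  resp.raise_for_status()

  for row in resp.json().get("rows", []):
      print(row)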

Apache Drill v1.19
Drill is designed from the ground up to support high-performance analysis of rapidly evolving data in modern Big Data applications. v1.19 reflects more than 100 changes, improvements, and new features that include:

  • New Connectors for Apache Cassandra, Elasticsearch, and Splunk

  • New Format Reader for XML without schemas

  • Added Avro support for the Kafka plugin

  • Integrated password vault for secure credential storage

  • Support for Linux ARM64 systems

  • Added limit pushdowns for file systems, HTTP REST APIs, and MongoDB

  • Added streaming for Drill's REST API

  • Integration with Apache Airflow


Developers, analysts, business users, and data scientists use Apache Drill for data exploration and analysis because of its enterprise-grade reliability, security, and performance. Drill's flexibility and ease of use have attracted thousands of users, including Ant Group, Cardlytics, Cisco, Ericsson, Intuit, MicroStrategy, Qlik, Tableau, TIBCO, TransUnion, Twitter, the National University of Singapore, and more.

"Individuals, businesses, and organizations of all types rely on Apache Drill's rich functionality," added Givre. "We invite everyone to participate in our user and developer lists as well as our Slack channel, and contribute to the project to build on our momentum and help improve the future experience for all Drill users."

Catch Apache Drill in action at ApacheCon@Home, taking place online 21-23 September 2021. For more information and to register, visit https://www.apachecon.com/ .

Availability and Oversight
Apache Drill software is released under the Apache License v2.0 and is overseen by a volunteer, self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases.

About Apache Drill
Apache Drill is the Open Source, schema-free Big Data SQL query engine for Apache Hadoop, NoSQL, and Cloud storage. For more information, including documentation and ways to become involved with Apache Drill, visit http://drill.apache.org/ , https://twitter.com/ApacheDrill , and https://apache-drill.slack.com/ .

© The Apache Software Foundation. "Apache", "Drill", "Apache Drill", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday August 02, 2021

The Apache Software Foundation Announces Apache® Pinot™ as a Top-Level Project

Open Source distributed real-time Big Data analytics infrastructure in use at Amazon-Eero, Doordash, Factual/FourSquare, LinkedIn, Stripe, Uber, Walmart, Weibo, and WePay, among others.

Wilmington, DE —2 August 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Pinot™ as a Top-Level Project (TLP).

Apache Pinot is a distributed Big Data analytics infrastructure created to deliver scalable real-time analytics at high throughput with low latency. The project was first created at LinkedIn in 2013, open-sourced in 2015, and entered the Apache Incubator in October 2018.

"We are pleased to successfully adopt 'the Apache Way' and graduate from the Apache Incubator," said Kishore Gopalakrishna, Vice President and original co-creator of Apache Pinot. "Pinot initially pushed the boundaries of real-time analytics by delivering insights to millions of Linkedin users. Today, as an Apache Top-Level Project, Pinot is in the hands of developers across the globe who are building it to power several user-facing  analytical applications and unlock the value of data within their organizations."

Scalable to trillions of records, Apache Pinot is an online analytical processing (OLAP) datastore that ingests both online and offline data from Apache Kafka, Apache Spark, Apache Hadoop HDFS, flat files, and Cloud storage in real time. Pinot is able to ingest millions of events and serve thousands of queries per second while providing unified analytics in a distributed, fault-tolerant fashion. Features include (a brief query sketch follows the list below):

  • Speed —answers OLAP queries with low latency on real-time data

  • Pluggable indexing —Sorted, Inverted, Text Index, Geospatial Index, JSON Index, Range Index, Bloom filters

  • Smart Materialized Views —fast aggregations via a star-tree index

  • Supports near real-time ingestion from different stream systems, including Apache Kafka, Confluent Kafka, and Amazon Kinesis, with customizable input formats and out-of-the-box support for Avro and JSON

  • Highly available, horizontally scalable, and fault tolerant

  • Supports lookup joins natively and full joins using PrestoDB/Trino
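
As noted above, here is a minimal, illustrative sketch (not taken from the announcement) of querying Pinot through a broker's SQL REST endpoint. It assumes a local quickstart-style deployment with the broker on port 8099; the table and column names are placeholders.

  # Minimal sketch: send a SQL query to an Apache Pinot broker over its REST endpoint.
  # Assumes a local broker listening on the common quickstart port 8099; the
  # "pageviews" table and its columns are placeholders for your own schema.
  import requests

  BROKER_URL = "http://localhost:8099/query/sql"

  sql = """
      SELECT country, COUNT(*) AS views
      FROM pageviews
      GROUP BY country
      ORDER BY views DESC
      LIMIT 10
  """

  resp = requests.post(BROKER_URL, json={"sql": sql}, timeout=30)
  resp.raise_for_status()

  result = resp.json()["resultTable"]          # column metadata plus result rows
  print(result["dataSchema"]["columnNames"])
  for row in result["rows"]:
      print(row)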

Apache Pinot is used to power internal and external analytics at Adbeat, Amazon-Eero, Cloud Kitchens, Confluera, Doordash, Factual/FourSquare, Guitar Center, LinkedIn, Publicis Sapient, Razorpay, Scale Unlimited, Startree, Stripe, Traceable, Uber, Walmart, Weibo, WePay, and more.

Examples of how Apache Pinot helps organizations across numerous verticals include: 1) a fintech company uses Pinot to achieve visibility across 500+ terabytes of financial data and to sustain half a million queries per second on financial transactions; 2) a food delivery service leveraged Pinot in the midst of the COVID-19 pandemic to analyze real-time data and provide a socially distanced pick-up experience for its riders and restaurants; and 3) a large retail chain with geographically distributed franchises and stores uses Pinot to uncover revenue-generating opportunities, analyzing real-time data for internal use cases as well as running real-time cart analysis to increase sales.

"We rely on Apache Pinot for all our real-time analytics needs at LinkedIn," said Kapil Surlaker, Vice President of Engineering at LinkedIn. "It's battle-tested at LinkedIn scale for hundreds of our low-latency analytics applications. We believe Apache Pinot is the best tool out there to build site-facing analytics applications and we will continue to contribute heavily and collaborate with the Apache Pinot community. We are very happy to see that it's now a Top-level Apache project."

"We use Apache Pinot in our real-time analytics platform to power external user-facing applications and critical operational dashboards," said Ujwala Tulshigiri, Engineering Manager at Uber. "With Pinot's multi-tenancy support and horizontal scalability, we have scaled to hundreds of use cases that run complex aggregations queries on terabytes of data at millisecond latencies, with the minimal overhead of cluster management."

"We've been using Apache Pinot since last year, and it's been a huge win for our client’s dashboard project," said Ken Krugler, President of Scale Unlimited. "Pinot's ability to rapidly generate aggregation results over billions of records, with modest hardware requirements, was critical for the success of the project. We've also been able to provide patches to add functionality and fix issues, which the Pinot community has quickly integrated and released. There was never any doubt in our minds that Pinot would graduate from the Apache incubator and become a successful top-level project."

"Last year, we started without analytics built into our product," said Pradeep Gopanapalli, technical staff member at Confluera. "By the end of the year, we were using Apache Pinot for real-time analytics in production. Not many of our competitors can even dream of having such results. We are very happy with our choice."

"Pinot is critical to our real-time analytics platform and allowed us to scale without degrading latency," said software engineer Elon Azoulay. "Pinot enables us to onboard large datasets effortlessly, run complex queries which return in milliseconds and is super reliable. We would like to emphasize how helpful and engaged the community is and are certain that we made the right choice with Pinot, it continues to impress us and satisfy our real-time analytics needs."

"We created Pinot at LinkedIn with the goal of tackling the low-latency OLAP problem for site-facing use cases at scale. We evolved it to solve numerous OLAP use cases, and open-sourced it because there aren't many technologies in that domain," said Subbu Subramaniam, member of the Apache Pinot Project Management Committee, and Senior Staff Engineer at LinkedIn. "It is heart-warming to see such a wide adoption and great contributions from the community in improving Pinot over time."

"We are at the beginning of this transformation and we cannot wait to see every software company build real-time applications using Apache Pinot," added Gopalakrishna. "We welcome everyone to join our community Slack channel and contribute to the project."

Catch Apache Pinot in action at ApacheCon Asia online on 7 August 2021. For more information and to register, visit https://www.apachecon.com/acasia2021/

Availability and Oversight
Apache Pinot software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Pinot, visit http://pinot.apache.org/ and https://twitter.com/ApachePinot

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $22B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 850+ individual Members and 200 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 8,200+ Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors that include Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Huawei, IBM, Indeed, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Talend, Tencent, Target, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Pinot", "Apache Pinot", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday May 04, 2021

Media Alert: Apache OpenOffice Recommends Upgrade to v4.1.10 to Mitigate Legacy Vulnerability

Wilmington, DE —4 May 2021— 


Who:
Apache OpenOffice, an Open Source office-document productivity suite comprising six productivity applications: Writer, Calc, Impress, Draw, Math, and Base. The OpenOffice suite is based around the OpenDocument Format (ODF), supports 41 languages, and ships for Windows, macOS, Linux 64-bit, and Linux 32-bit. Apache OpenOffice delivers up to 2.4 million downloads each month.

What: A recently reported vulnerability affects all versions of OpenOffice through 4.1.9: the suite can open non-http(s) hyperlinks in a way that could lead to untrusted code execution.

The Apache OpenOffice Project has filed a Common Vulnerabilities and Exposures report with MITRE Corporation’s national vulnerability reporting system:

> CVE-2021-30245: Code execution in Apache OpenOffice via non-http(s) schemes in Hyperlinks
>
> Severity: moderate
>
> Credit: Fabian Bräunlein and Lukas Euler of Positive Security https://positive.security/blog/url-open-rce#open-libreoffice


The complete CVE report is available at https://www.openoffice.org/security/cves/CVE-2021-30245.html

How: Applications of the OpenOffice suite handle non-http(s) hyperlinks in an insecure way, allowing for 1-click code execution on Windows and Xubuntu systems via malicious executable files hosted on Internet-accessible file shares.

Why: The mitigation in Apache OpenOffice 4.1.10 ensures that a security warning is displayed, giving users the option of whether to continue opening the hyperlink. As always, best practice dictates caution when opening documents from unknown and unverified sources.

When: The vulnerability predates OpenOffice entering the Apache Incubator. During the analysis of this issue, it was discovered that an incorrect bug fix was made by the StarOffice/OpenOffice.org developers preparing OpenOffice 2.0 in 2005, whilst under the auspices of Sun Microsystems. 


Where: Download Apache OpenOffice v4.1.10 at https://www.openoffice.org/download/

Apache OpenOffice Highlights

24 October 2020 — 300 million downloads of Apache OpenOffice
14 October 2020 — 20th anniversary of OpenOffice
18 October 2016 — 200 million downloads of Apache OpenOffice
17 April 2014 — 100 million downloads of Apache OpenOffice
17 October 2012 — OpenOffice graduated as an Apache Top Level Project (TLP)
13 June 2011 — OpenOffice.org entered the Apache Incubator

[downloads are binary installation files]

For more information, visit https://openoffice.apache.org/ and https://twitter.com/ApacheOO

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 850+ individual Members and 200 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with more than 8,100 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "OpenOffice", "Apache OpenOffice", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday March 24, 2021

The Apache® Software Foundation Celebrates 22 Years of Open Source Innovation "The Apache Way"

World's largest Open Source foundation provides $22B+ in community-led software 100% free of charge for the common good

Wilmington, DE —24 March 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today its 22nd Anniversary.

Originally established by the 21-member Apache Group, who oversaw the then-3-year-old Apache HTTP Server, the ASF today is the world's largest vendor-neutral Open Source foundation, comprising 800+ individual Members, 8,100+ Committers, and 40,000+ code contributors located on every continent. Conservatively valued at more than $22B, Apache’s 350+ projects and 37 incubating podlings are all freely available to the public at large, at 100% no cost, and with no licensing fees.

"Over the past 22 years the ASF has evolved to meet the growing needs of the greater community," said Sander Striker, Board Chair of The Apache Software Foundation. "The ASF enables people from all over the world to collaborate, develop, and shepherd the projects and communities that are helping individuals, sustaining businesses, and transforming industries."

Advancing its mission of providing software for the public good, the ASF's projects are integral to nearly every aspect of modern computing, benefitting billions worldwide. The "Apache Way" process of community-led, collaborative development has led to breakthrough innovations in Artificial Intelligence and Deep Learning, Big Data, Build Management, Cloud Computing, Content Delivery and Management, Edge Computing and IoT, Fintech, Identity Management, Integration, Libraries, Messaging, Mobile, Search, Security, Servers, and Web Frameworks, among other categories. Projects undergoing development in the Apache Incubator span AI, Big Data, blockchain, Cloud computing, cryptography, deep learning, email, IoT, machine learning, microservices, mobile, operating systems, testing, visualization, and more.

Nearly half a million people participate in ASF projects and initiatives, including ApacheCon, the ASF's official global conference series; Community Development, which oversees contributor onboarding and mentoring and programs such as Google Summer of Code; and Diversity & Inclusion, whose programs promote diversity, equity, and inclusion across the greater Apache community.

The ASF's influence is everywhere —countless ubiquitous and mission-critical applications across dozens of industries are powered by Apache projects; the Apache License 2.0 was the top-ranked Open Source license in 2020 (source: WhiteSource); the Apache Way is the backbone for open development and inner source environments; and new users, developers, and enthusiasts are onboarding to the greater Apache community every day (the ASF has been a Google Summer of Code mentoring organization for the past 16 years, since the program's inception). The ASF is the top-ranked Open Source not-for-profit organization with the most stars on GitHub (source: GitHub).

A just-released feature on the ASF in FOSSlife [1] states, "The Apache project has undeniably changed the world … Apache remains a crucial Web server, the most popular in the field. For building Open Source communities, the lessons learned by creating the project still resonate throughout the open source world. Every project is advised to respect the Apache value of 'community over code'."

ASF operations bolster Apache projects and their communities with infrastructure support, bandwidth, connectivity, servers, hardware, development environments, legal counsel, accounting services, trademark protection, marketing and publicity, educational events, and related administrative assistance. As a United States private 501(c)(3) not-for-profit charitable organization, the ASF's day-to-day operating expenses are offset through tax-deductible sponsorships, corporate contributions, and individual donations. Current ASF Sponsors are:

Platinum: Amazon Web Services, Facebook, Google, Huawei, Microsoft, Namebase, Pineapple Fund, Tencent, and Verizon Media.

Gold: Anonymous, Baidu, Bloomberg, Cloudera, Confluent, IBM, Indeed, Reprise Software, Union Investment, and Workday.

Silver: Aetna, Alibaba Cloud Computing, Capital One, Comcast, Didi Chuxing, Red Hat, and Target.

Bronze: Bestecasinobonussen.nl, Bookmakers, Casino2k, Cerner, Curity, GridGain, Gundry MD, Host Advice, HotWax Systems, Journal Review, LeoVegas Indian Online Casino, Miro-Kredit AG, Mutuo Kredit AG, Online Holland Casino, ProPrivacy, PureVPN, RX-M, RenaissanceRe, SCAMS.info, SevenJackpots.com, Start a Blog by Ryan Robinson, Talend, The Best VPN, The Blog Starter, The Economic Secretariat, Top10VPN, Twitter, and Writers Per Hour.

Targeted Platinum: Amazon Web Services, CloudBees, DLA Piper, Fastly, JetBrains, Leaseweb, Microsoft, OSU Open Source Labs, Sonatype, and Verizon Media.

Targeted Gold: Atlassian, Datadog, Docker, PhoenixNAP, and Quenda.

Targeted Silver: HotWax Systems, Manning Publications, and Rackspace.

Targeted Bronze: Bintray, Education Networks of America, Friend of Apache Cordova, Google, Hopsie, No-IP, PagerDuty, Peregrine Computer Consultants Corporation, Sonic.net, SURFnet, and Virtru.

"Baidu has always maintained close cooperation with Apache Software Foundation. In the past, we donated Apache ECharts, Apache Doris, Apache brpc, and Apache Teaclave. We are very grateful to Apache way for promoting the growth of these projects and enabling Baidu to make greater contributions to the open source world together with ASF."
—Zhenyu Hou, Corporate Vice President of Baidu Group

"Congratulations to the Apache Software Foundation on its twenty-second anniversary! If it were not for ASF's work to incubate and steward open source projects, the internet community would not be thriving to the same degree. Open source is enabling our digital prosperity, and the ASF plays a key, behind-the-scenes role in this. We share their vision for the availability of trustworthy open-source software and are proud to be a sponsor."
—Travis Spencer, CEO of Curity

"Congratulations to the 22nd anniversary of the Apache Software Foundation! Didi Chuxing is more than honored to join the Apache family as a corporate sponsor this year. At Didi, our developers utilize and contribute to many Apache projects such as Hadoop, Kylin, and Flink etc. Sharing the same “Community Over Code” principle, we hope to drive more innovations with Apache and we look forward to further collaborations!"
—Yunbo Wang, Director of Technical Community and Open Source at Didi Chuxing

"Facebook was originally built on a stack using the Apache HTTP Server, and it's one of the many reasons we've been sponsoring, advocating, utilizing, and contributing to the ASF for the past 10 years. We're proud to be a part of the ASF community and look forward to continued support of its mission to provide Open Source software for the public good."
—Joel Marcey, Open Source Developer Advocate and Ecosystem Lead at Facebook

"We are honored to be a part of and proud to support the ASF! The Apache community continues to be an incredibly valuable resource for HotWax. Contributing to and receiving from the ASF remains a central focal point for our business, and an important part of our team philosophy."
—Mike Bates, CEO of HotWax Systems

"It is an honor to support Apache, an organization responsible for such an astounding amount of Open Source projects that truly make up the fabric of the Internet. Here's to all that's been accomplished in the last 22 years – we can't wait to see what the future of open development brings."
—Robert van der Meulen, Global Product Strategy Lead at Leaseweb

"We're extending a big congratulations to the Apache Software Foundation on their 22nd anniversary! The ASF has been a key driver for the success of open source software models and community-led development for over two decades. Microsoft is honored to engage with and contribute to the Apache community across many facets of our business including Azure big data, Hadoop and Spark – and we look forward to continuing the collaboration."
—Stormy Peters, Director of Open Source Programs Office at Microsoft

"Congratulations to the Apache Software Foundation on its 22nd anniversary! Tencent has been a user and contributor to the projects at ASF. Many developers from Tencent have been actively involved with the ASF projects as Chair or PMC. We look forward to continuing our collaboration and creating more open-source innovations with 'The Apache Way'."
—Mark Shan, Chair of Tencent Open Source Alliance


[1] FOSSlife "How the Apache Project Boosted the Free and Open Source Software Movements" https://www.fosslife.org/how-apache-project-boosted-free-and-open-source-software-movements

Additional ASF Resources

 - "Trillions and Trillions Served" documentary on the ASF https://s.apache.org/Trillions-Feature

 - About The Apache Way http://apache.org/theapacheway/

 - The Apache Way to Sustainable Open Source Success https://s.apache.org/GhnI

 - FY2020 Annual Report https://s.apache.org/FY2020AnnualReport

 - Ways to support the ASF http://apache.org/foundation/contributing.html


About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world's largest Open Source foundation, stewarding 227M+ lines of code and providing more than $22B+ worth of software to the public at 100% no cost. The ASF's all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,100 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Huawei, IBM, Indeed, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Apache HTTP Server", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday March 04, 2021

The Apache Software Foundation Announces Apache® Daffodil™ as a Top-Level Project

Open Source universal data interchange implementation of the Data Format Description Language (DFDL) standard in use at DARPA, GE Research, Naval Postgraduate School, Owl Cyber Defense, Perspecta Labs, and Raytheon BBN Technologies, among others.

Wilmington, DE —4 March 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Daffodil™ as a Top-Level Project (TLP).

Apache Daffodil is an Open Source implementation of the Data Format Description Language 1.0 specification (DFDL; the Open Grid Forum open standard framework for describing the attributes of any data format [1]) to enable universal data interchange. The project was first created at the University of Illinois National Center for Supercomputing Applications (NCSA) in 2009, and entered the Apache Incubator in August 2017.

"We're extremely excited that Apache Daffodil has achieved this important milestone in its development. The Daffodil DFDL implementation is a game changer in complex text and binary data interfaces and creates massive opportunities for organizations to easily implement highly sophisticated processes like data decomposition, inspection, and reassembly," said Michael Beckerle, Vice President of Apache Daffodil. "Instead of spending a lot of time worrying about how to deal with so many kinds of data that you need to take in, from day one you can convert all sorts of data into XML, or JSON, or your preferred data structure, and convert back if you need to write data out in its original format."

Apache Daffodil is particularly useful in large-scale organizations, such as governments and large corporations, where massive amounts of complex and legacy data must be exchanged and made accessible every day. Daffodil is also particularly useful in cybersecurity, where data must be inspected for correctness and sanitized.
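
As an illustration (this sketch is not drawn from the announcement), the snippet below drives the Daffodil command-line tool to parse a binary file into an XML infoset and then unparse the infoset back into its original format. It assumes the daffodil launcher from the binary distribution is on the PATH; the schema and file names are placeholders.

  # Minimal sketch: round-trip a binary file through Daffodil using its CLI.
  # Assumes the `daffodil` command-line launcher is installed and on PATH;
  # the DFDL schema and data file names below are placeholders.
  import subprocess

  SCHEMA = "messages.dfdl.xsd"   # a DFDL schema describing the binary format

  # Parse: binary data in, XML infoset out.
  subprocess.run(
      ["daffodil", "parse", "-s", SCHEMA, "-o", "message.xml", "message.bin"],
      check=True)

  # Unparse: XML infoset in, original binary representation back out.
  subprocess.run(
      ["daffodil", "unparse", "-s", SCHEMA, "-o", "roundtrip.bin", "message.xml"],
      check=True)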

Apache Daffodil is in use at major global organizations that include DARPA, GE Research, Naval Postgraduate School, Owl Cyber Defense, Perspecta Labs, and Raytheon BBN Technologies, among others.

"We are using Daffodil to translate DFDL schema specifications into code for our Monitoring & INspection Device (MIND) as part of our work on DARPA’s Guaranteed Architecture for Physical Security (GAPS) program," said said Bill Smith, Principal Engineer at GE Research. "One of our engineers has joined the Apache Daffodil Project Management Committee and is building out the new DFDL-to-C backend on a dedicated Daffodil development branch. We are now translating DFDL schemas provided by other DARPA GAPS performers to C code suitable for the small resource-constrained controllers in our MIND device. When complete, Daffodil's DFDL-to-C backend will give us the ability to annotate DFDL schemas with security policies and rapidly reconfigure our MIND device for different mission security profiles."

"Apache Daffodil is an important asset to our cross domain solutions technology stack, allowing Owl to support our customers by extending our filtering capabilities to new data types faster and with less risk," said Ken Walker, CTO at Owl Cyber Defense. "It's directly in line with our company priorities, as supporters of the Open Source community, and highly beneficial to our product lines to have this high-quality Open Source implementation of DFDL to support challenging, sometimes proprietary data formats, such as Link16, VMF, USMTF, OSIsoft PI System, and JANAP-128, without the need to develop additional software. DFDL enables our Raise-the-Bar compliant cross domain solutions to support new data types without additional rounds of lengthy lab-based testing and recertification."

"The DFDL open spec and the Apache Daffodil implementation have helped us tremendously in parsing and transforming fixed-format data in a variety of different R&D projects at BBN," said Michael Atighetchi, Lead Scientist at Raytheon BBN Technologies. "Sharing parsers through a vendor-neutral XML representation is a game changer that enables a significant speedup in developing, maturing, and transitioning advanced capabilities to help war fighters."

"Our research on applying Data Format Description Language (DFDL) is exploring how to unlock and archive a plethora of diverse data streams from unmanned systems," said Don Brutzman, Naval Postgraduate School. "Both the DFDL standard and the Apache Daffodil open-source implementation provide a big benefit for these potential capabilities. Continuing work at Naval Postgraduate School (NPS) Consortium for Robotics and Unmanned Systems Education and Research (CRUSER) hopes to make telemetry from field experimentation and simulation repeatably tractable for Big Data analytics."

"Graduation to a TLP recognizes that the Apache Daffodil project follows the rigorous software development practices that have made so many of ASF projects trusted and successful," added Beckerle. "With the increasing interest in Big Data, interoperability, and protection from malicious data, we welcome new contributors to help us further grow the Apache Daffodil community."

[1] Data Format Description Language (DFDL) v1.0 Specification https://www.ogf.org/documents/GFD.240.pdf

Availability and Oversight
Apache Daffodil software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Daffodil, visit https://daffodil.apache.org/ and https://twitter.com/ApacheDaffodil 

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 200 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,100 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 

© The Apache Software Foundation. "Apache", "Daffodil", "Apache Daffodil", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday June 04, 2020

The Apache Software Foundation Announces Apache® Hudi™ as a Top-Level Project

Open Source data lake technology for stream processing on top of Apache Hadoop in use at Alibaba, Tencent, Uber, and more.

Wakefield, MA —4 June 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Hudi™ as a Top-Level Project (TLP).

Apache Hudi (Hadoop Upserts Deletes and Incrementals) data lake technology enables stream processing on top of Apache Hadoop-compatible cloud stores and distributed file systems. The project was originally developed at Uber in 2016 (code-named and pronounced "Hoodie"), open-sourced in 2017, and submitted to the Apache Incubator in January 2019.

"Learning and growing the Apache way in the incubator was a rewarding experience," said Vinoth Chandar, Vice President of Apache Hudi. "As a community, we are humbled by how far we have advanced the project together, while at the same time, excited about the challenges ahead."

Apache Hudi is used to manage petabyte-scale data lakes using stream processing primitives such as upserts and incremental change streams on the Apache Hadoop Distributed File System (HDFS) or cloud stores. Hudi data lakes provide fresh data while being an order of magnitude more efficient than traditional batch processing. Features include (a brief usage sketch follows the list below):

  • Upsert/Delete support with fast, pluggable indexing
  • Transactionally commit/rollback data
  • Change capture from Hudi tables for stream processing
  • Support for Apache Hive, Apache Spark, Apache Impala, and Presto query engines
  • Built-in data ingestion tool supporting Apache Kafka, Apache Sqoop, and other common data sources
  • Optimize query performance by managing file sizes and storage layout
  • Fast row-based ingestion format with async compaction into columnar format
  • Timeline metadata for audit tracking
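
As referenced above, the following is a minimal, illustrative PySpark sketch (not from the announcement) of an upsert into a Hudi table followed by a snapshot read. It assumes a Spark session launched with a matching hudi-spark bundle on the classpath; the table name, record fields, and storage path are placeholders.

  # Minimal sketch: upsert a record into a Hudi table with PySpark, then read it back.
  # Assumes Spark was started with the appropriate hudi-spark bundle on the classpath;
  # table name, record fields, and the storage path are placeholders.
  from pyspark.sql import SparkSession

  spark = (SparkSession.builder
           .appName("hudi-upsert-sketch")
           .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
           .getOrCreate())

  base_path = "file:///tmp/hudi/trips"
  df = spark.createDataFrame(
      [("id-1", "2020-06-04 10:00:00", "sf", 12.5)],
      ["uuid", "ts", "region", "fare"])

  (df.write.format("hudi")
     .option("hoodie.table.name", "trips")
     .option("hoodie.datasource.write.recordkey.field", "uuid")
     .option("hoodie.datasource.write.precombine.field", "ts")
     .option("hoodie.datasource.write.partitionpath.field", "region")
     .option("hoodie.datasource.write.operation", "upsert")
     .mode("append")
     .save(base_path))

  # Snapshot read of the table back into a DataFrame.
  # (Some Hudi releases require a glob path such as base_path + "/*/*".)
  spark.read.format("hudi").load(base_path).show()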

Apache Hudi is in use at organizations such as Alibaba Group, EMIS Health, Linknovate, Tathastu.AI, Tencent, and Uber, and is supported as part of Amazon EMR by Amazon Web Services. A partial list of those deploying Hudi is available at https://hudi.apache.org/docs/powered_by.html

"We are very pleased to see Apache Hudi graduate to an Apache Top-Level Project. Apache Hudi is supported in Amazon EMR release 5.28 and higher, and enables customers with data in Amazon S3 data lakes to perform record-level inserts, updates, and deletes for privacy regulations, change data capture (CDC), and simplified data pipeline development," said Rahul Pathak, General Manager, Analytics, AWS. “We look forward to working with our customers and the Apache Hudi community to help advance the project."

"At Uber, Hudi powers one of the largest transactional data lakes on the planet in near real time to provide meaningful experiences to users worldwide," said Nishith Agarwal, member of the Apache Hudi Project Management Committee. "With over 150 petabytes of data and more than 500 billion records ingested per day, Uber’s use cases range from business critical workflows to analytics and machine learning."

"Using Apache Hudi, end-users can handle either read-heavy or write-heavy use cases, and Hudi will manage the underlying data stored on HDFS/COS/CHDFS using Apache Parquet and Apache Avro," said Felix Zheng, Lead of Cloud Real-Time Computing Service Technology at Tencent.

"As cloud infrastructure becomes more sophisticated, data analysis and computing solutions gradually begin to build data lake platforms based on cloud object storage and computing resources," said Li Wei, Technical Lead on Data Lake Analytics, at Alibaba Cloud. "Apache Hudi is a very good incremental storage engine that helps users manage the data in the data lake in an open way and accelerate users' computing and analysis."

"Apache Hudi is a key building block for the Hopsworks Feature Store, providing versioned features, incremental and atomic updates to features, and indexed time-travel queries for features," said Jim Dowling, CEO/Co-Founder at Logical Clocks. "The graduation of Hudi to a top-level Apache project is also the graduation of the open-source data lake from its earlier data swamp incarnation to a modern ACID-enabled, enterprise-ready data platform."

"Hudi's graduation to a top-level Apache project is a result of the efforts of many dedicated contributors in the Hudi community," said Jennifer Anderson, Senior Director of Platform Engineering at Uber. "Hudi is critical to the performance and scalability of Uber's big data infrastructure. We're excited to see it gain traction and achieve this major milestone."

"Thus far, Hudi has started a meaningful discussion in the industry about the wide gaps between data warehouses and data lakes. We have also taken strides to bridge some of them, with the help of the Apache community," added Chandar. "But, we are only getting started with our deeply technical roadmap. We certainly look forward to a lot more contributions and collaborations from the community to get there. Everyone’s invited!"

Catch Apache Hudi in action at Virtual Berlin Buzzwords, 7-12 June 2020, as well as at MeetUps and other events.

Availability and Oversight
Apache Hudi software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hudi, visit http://hudi.apache.org/ and https://twitter.com/apachehudi 

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 200M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 765 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,600 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, Pineapple Fund, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 

© The Apache Software Foundation. "Apache", "Hudi", "Apache Hudi", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Thursday March 26, 2020

The Apache® Software Foundation Celebrates 21 Years of Open Source Leadership

World’s largest Open Source foundation advances community-led innovation "The Apache Way"


Wakefield, MA —26 March 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today its 21st Anniversary.


Advancing its mission of providing software for the public good, the ASF's all-volunteer community grew from 21 original Members overseeing the development of the Apache HTTP Server to 765 individual Members, 206 Apache Project Management Committees, and 7,600+ Committers shepherding 300 projects and 200M+ lines of Apache code valued at more than $20B.


Apache’s breakthrough technology touches every aspect of modern computing, powering most of the Internet, managing exabytes of data, executing teraflops of operations, and storing trillions of objects in virtually every industry. Apache projects are all freely-available, at 100% no cost, and with no licensing fees.


"Over the past two decades, The Apache Software Foundation has served as a trusted home for vendor-neutral, community-led collaboration," said David Nalley, Executive Vice President at The Apache Software Foundation. "Today, the ASF is a vanguard for Open Source, fostering project communities large and small, with a portfolio of best-in-class innovations upon which the world continues to rely."

The Apache Way

As a community-led organization, the ASF is strictly vendor-neutral. Its independence ensures that no organization, including ASF Sponsors and those who employ contributors to Apache projects, is able to control a project's direction or has special privileges of any kind.

The ASF’s community-focused development process known as "The Apache Way" guides existing projects and their communities, and continues to inspire a new generation of innovations from around the world. The Apache Way comprises:

  • Earned Authority: all individuals are given the opportunity to participate based on publicly earned merit, i.e., what they contribute to the community.

  • Community of Peers: individuals participate at the ASF as individuals; the merit they earn is lasting and independent of employer or employment status.

  • Open Communications: all communications related to code and decision-making are publicly accessible to ensure asynchronous collaboration within the ASF’s globally-distributed communities.

  • Consensus Decision Making: Apache Projects are overseen by a self-selected team of active volunteers who are contributing to their respective projects.

  • Responsible Oversight: The ASF governance model is based on trust and delegated oversight. 


The Apache Way has been a forerunner in collaborative computing, and has directly influenced the InnerSource methodology of applying Open Source and open development principles to an organization. The Apache Way has been adopted by countless organizations, including Capital One, Comcast, Ericsson, HP, IBM, Google, Microsoft, PayPal, SAP, T-Mobile, and many others.

The ASF’s focus on community is so integral to the Apache ethos that the maxim "Community Over Code" is an unwavering tenet. Vibrant, diverse communities keep code alive: no matter how well written, code cannot thrive without a community behind it. Members of the Apache community share their thoughts on “Why Apache” in the teaser for “Trillions and Trillions Served”, the upcoming documentary on the ASF https://s.apache.org/Trillions-teaser

Powerhouse Projects

Dozens of enterprise-grade Apache projects have defined industries and serve as the backbone for some of the most visible and widely used applications in Artificial Intelligence and Deep Learning, Big Data, Build Management, Cloud Computing, Content Management, DevOps, IoT and Edge Computing, Mobile, Servers, and Web Frameworks, among many other categories. 

 

No other software foundation serves the industry with such a wide range of projects. Examples of the breadth of applications that are "Powered by Apache" include:

 

  • China’s second largest courier, SF Express, uses Apache SkyWalking to ship critical COVID-19 coronavirus supplies worldwide;

  • Apache Guacamole’s clientless remote desktop gateway is helping thousands of individuals, businesses, and universities worldwide safely work from home without needing to be tied to a specific device, VPN, or client;

  • Alibaba uses Apache Flink to process more than 2.5 billion records per second for its merchandise dashboard and real-time customer recommendations;

  • the European Space Agency’s Jupiter spacecraft mission control is powered by Apache Karaf, Apache Maven, and Apache Groovy;

  • British Government Communications Headquarters (GCHQ)’s application Gaffer stores and manages petabytes of data using Apache Accumulo, Apache HBase, and Apache Parquet;

  • Netflix uses Apache Druid to manage its 1.5 trillion-row data warehouse, which determines what users see when tapping the Netflix icon or logging in from a browser across platforms;

  • Uber's 100-petabyte data lake is powered in near real-time using Apache Hudi (incubating), supporting everything from warehousing to advanced machine learning;

  • Boston Children's Hospital uses Apache cTAKES to link phenotypic and genomic data in electronic health records for the Precision Link Biobank for Health Discovery;

  • Amazon, DataStax, IBM, Microsoft, Neo4j, NBC Universal and many others use Apache Tinkerpop in their graph databases and to write complicated traversals; 

  • the Global Biodiversity Information Facility uses Apache Beam, Hadoop, HBase, Lucene, Spark, and others to integrate biodiversity data from nearly 1,600 institutions, making more than a million species and nearly 1.4 billion location records freely available for research;

  • the European Commission developed its new API Gateway infrastructure using Apache Camel;

  • China Telecom Bestpay uses Apache ShardingSphere (incubating) to scale 10 billion datasets for mobile payments distributed across more than 30 applications;

  • Apple’s Siri uses Apache HBase to complete full ring replication around the world in 10 seconds;

  • the US Navy uses Apache Rya to power smart drones, autonomous small robot swarms, manned-unmanned team advanced tactical communications, and more; and

  • hundreds of millions of Websites worldwide are powered by the Apache HTTP Server.

Additional Milestones

In addition to the ASF’s 21st Anniversary, the greater Apache community is celebrating milestone anniversaries of the following projects:

25 Years - Apache HTTP Server

21 Years - Apache OpenOffice (at the ASF since 2011), Xalan, Xerces

20 Years - Apache Jakarta (Apache Open Source Java projects), James, mod_perl, Tcl, APR/Portable Runtime, Struts, Subversion (at the ASF since 2009), Tomcat

19 Years - Apache Avalon, Commons, log4j, Lucene, Torque, Turbine, Velocity

18 Years - Apache Ant, DB, FOP, Incubator, POI, Tapestry

17 Years - Apache Cocoon, James, Logging Services, Maven, Web Services

16 Years - Apache Gump, Portals, Struts, Geronimo, SpamAssassin, Xalan, XML Graphics

15 Years - Apache Lucene, Directory, MyFaces, Xerces, Tomcat


The chronology of all Apache projects can be found at https://projects.apache.org/committees.html?date


The Apache Incubator is home to 45 projects undergoing development, spanning AI, Big Data, blockchain, Cloud computing, cryptography, deep learning, hardware, IoT, machine learning, microservices, mobile, operating systems, testing, visualization, and many other categories. The complete list of projects in the Incubator is available at http://incubator.apache.org/  

Support Apache 

The ASF advances the future of open development by providing Apache projects and their communities bandwidth, connectivity, servers, hardware, development environments, legal counsel, accounting services, trademark protection, marketing and publicity, educational events, and related administrative support.


As a United States private 501(c)(3) not-for-profit charitable organization, the ASF is sustained through tax-deductible corporate and individual contributions that offset day-to-day operating expenses. To support Apache, visit http://apache.org/foundation/contributing.html 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 200M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 765 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,600 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, ODPi, Pineapple Fund, Private Internet Access, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF


© The Apache Software Foundation. "Apache", "Accumulo", "Apache Accumulo", "Camel", "Apache Camel", "cTAKES", "Apache cTAKES", "Druid", "Apache Druid", "Flink", "Apache Flink", "Groovy", "Apache Groovy", "Guacamole", "Apache Guacamole", "HBase", "Apache HBase", "Apache HTTP Server", "Karaf", "Apache Karaf", "Maven", "Apache Maven", "Parquet", "Apache Parquet", "Rya", "Apache Rya", "SkyWalking", "Apache SkyWalking", "Tinkerpop", "Apache Tinkerpop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.


# # #

Monday November 04, 2019

The Apache Software Foundation Announces Apache® SINGA™ as a Top-Level Project

Open Source machine learning library in use at Citigroup, NetEase, and Singapore General Hospital, among others.

Wakefield, MA —4 November 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® SINGA™ as a Top-Level Project (TLP).

Apache SINGA is an Open Source distributed, scalable machine learning library. The project was originally developed in 2014 at the National University of Singapore, and was submitted to the Apache Incubator in March 2015.

"We are excited that SINGA has graduated from the Apache Incubator," said Wei Wang, Vice President of Apache SINGA and Assistant Professor at the National University of Singapore. "The SINGA project started at the National University of Singapore, in collaboration with Zhejiang University, focusing on scalable distributed deep learning. In addition to scalability, during the incubation process, built multiple versions to improve the project’s usability and efficiency. Incubating SINGA at the ASF brought opportunities to collaborate, grew our community, standardize the development process, and more."

Apache SINGA is a distributed machine learning library that facilitates the training of large-scale machine learning (especially deep learning) models over a cluster of machines. Various optimizations on efficiency, memory, communication and synchronization are implemented to speed it up and scale it out. Currently, the Apache SINGA project is working on SINGA-lite for deep learning on edge devices with 5G, and SINGA-easy for making AI usable by domain experts (without deep AI background).

Apache SINGA is in use at organizations such as Carnegie Technologies, CBRE, Citigroup, JurongHealth Hospital, National University of Singapore, National University Hospital, NetEase, Noblis, Shentilium Technologies, Singapore General Hospital, Tan Tock Seng Hospital, YZBigData, and others. Apache SINGA is used across applications in banking, education, finance, healthcare, real estate, software development, and other categories.

"So glad to see the first Apache project focusing on distributed deep learning become a Top-Level Project," said Beng Chin Ooi, Distinguished Professor of National University of Singapore who initialized the SINGA project, and a member of the Apache SINGA Project Management Committee. "It is essential to scale deep learning via distributed computing as the deep learning models are typically large and trained over big datasets, which may take hundreds of days using a single GPU."

"I am glad to witness the graduation of Apache SINGA as a TLP," said Gang Chen, Professor and Dean of Zhejiang University and Dean of ZJU-NetEase research lab. "We will continue to contribute to the development and use it for industry applications such as smart fabric printing, e-commerce recommendation and smart cities."

"Apache SINGA has a flexible distributed training framework," said Sheng Wang, Research Scientist at the DAMO Academy of Alibaba and a member of the Apache SINGA Project Management Committee. "SINGA can implement multiple popular distributed training strategies, including synchronous and asynchronous training. It achieved excellent scalability in comparison with other deep learning platforms."

"Apache SINGA has been applied to support many different healthcare applications at MZH Technologies," said Zhongle Xie, CTO of Hangzhou MZH Technologies and a member of the Apache SINGA Project Management Committee. "The performance of disease diagnoses based on X-Ray images could even pass the radiologists. We also built a food recognition app using SINGA to help patients monitor their food intake and log the nutrition automatically."

"We are working with cardiologists in Fuwai Hospital, Beijing, China, to develop a machine learning/deep learning cardiovascular disease prediction model, using cardiovascular risk factors and other indirect factors such as diet and exercise," said MZH Technologies co-founder and Beijing Institute of Technology Professor, Meihui Zhang. "We are also using Apache SINGA for data cleaning and integration."

"Besides scalability, SINGA team is continuously improving the library by adding new features to make it easier to use," said Moaz Reyad, Postdoctoral Researcher at Université Grenoble Alpes, and a member of the Apache SINGA Project Management Committee. "For example, SINGA has a sub-component called SINGA-auto (original name is Rafiki), which provides AutoML features like automatic hyper-parameter tuning."

"We would like to thank all our mentors for guiding the project and all contributors for helping on this project from incubation to graduation," added Wang. "Deep learning and other AI technologies are changing the world from many aspects. We welcome newcomers to join our community to make contributions to this exciting field!"

Availability and Oversight
Apache SINGA software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache SINGA, visit http://singa.apache.org/ and https://twitter.com/ApacheSINGA

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, Union Investment, Workday, and Verizon Media. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "SINGA", "Apache SINGA", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #

Wednesday April 24, 2019

The Apache Software Foundation Announces Apache® SkyWalking™ as a Top-Level Project

Open Source Application Performance Monitor (APM) tool in use at Alibaba, China Eastern Airlines, Huawei, and WeBank, among others.

Wakefield, MA —24 April 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® SkyWalking™ as a Top-Level Project (TLP).

Apache SkyWalking is an application performance monitor (APM) tool that provides an automatic, highly efficient way to instrument microservices, cloud native, and container-based applications. The project was originally developed in 2015, and entered the Apache Incubator in December 2017.

"This is a special day for the SkyWalking project and its community. We thank our mentors, contributors, and the Apache Incubator for helping us achieve this goal," said Sheng Wu, Vice President of Apache SkyWalking. "The original agenda behind SkyWalking was to help newcomers understand what is distributed tracing, and the community has grown bigger and stronger since we entered the Apache Incubator. Through The Apache Way, SkyWalking has a very active and diverse community, is used by over 70 companies, and has over 100 source contributors from dozens of different organizations."

Apache SkyWalking provides tracing, service mesh telemetry analysis, metric aggregation and visualization for the distributed system. The project landscape has expanded from a pure tracing system, to an observability analysis platform, and application performance management/monitoring system. Features include:

  • Distributed tracing-based APM: 100% of traces collected, with low overhead for the monitored system;
  • Cloud-native friendly: observes distributed systems powered by service mesh technologies such as Istio and Envoy;
  • Automated instrumentation: agents provided for multiple languages, including Java, .NET, and Node.js, with no manual source code changes required;
  • Easy to operate: does not require a Big Data stack to monitor large-scale distributed systems; and
  • Advanced visualization: used for traces, metrics, and topology maps.

Apache SkyWalking is in use at dozens of organizations that include 5i5j Group, Alibaba, autohome.com, China Eastern Airlines, China Merchants Bank, Daocloud, dangdang.com, guazi.com, Huawei, ke.com, iFLYTEK, primeton.com, Sinolink Securities, tetrate.io, tuhu.cn, tuya.com, WeBank, Yonghui Superstores, youzan.com, and more.

"Instrumentation is unquestionably the most time-consuming part of establishing a distributed tracing solution into an existing platform. I had the chance to code with some of the SkyWalking community earlier on and could see the quality being invested back then," said Mick Semb Wever, ASF Member and Apache SkyWalking Incubating Mentor. "When they were looking for mentors and a champion to help them create a proposal to become an Apache project, I was excited at the opportunity to help bring the project to the Apache Incubator, and was pleasantly surprised to see how prepared, and ASF-like, the SkyWalking community and project had already become. As was the case with Apache Kylin, SkyWalking has not only been a model project during the incubation process, they have also become ambassadors on open development The Apache Way to the greater Open Source community in China. Congratulations on graduating as an Apache Top-Level Project."

"SkyWalking is one of the only Open Source tracing systems where usability and user interface have been a focus, something missing in most Open Source projects," said Jonah Kowall, CTO at Kentik, and former VP Research at Gartner. "Making tracing and APM more easily used by developers and operations team is a key goal which makes Apache Skywalking a project to watch."

"Apache SkyWalking has done a lot of work in spreading modern cloud native observability in China and across the world," said Chris Aniczszyk, CTO and COO of the Cloud Native Computing Foundation. "We are happy to see Apache SkyWalking become a TLP and look forward to their community growing and collaborating with CNCF projects like Kubernetes, Envoy, Jaeger and more."

"I hear regularly from users that observability is the most important feature they're getting out of their service mesh," said Zack Butcher, Core Contributor to Istio. "By integrating Apache SkyWalking with Istio, the SkyWalking team has brought their incredible tools for deeply understanding system behavior to the mesh. We've already seen great results, and I can't wait to see what further insights users unlock using Apache SkyWalking together with Istio to observe and manage their deployments."

"At WeBank, we use different banking architectures, from distributed architecture to Open Source technologies. We’ve built a messaging bus called WeMQ based on Apache RocketMQ that fully utilizes the benefits of messaging by implementing various messaging techniques in different scenarios, such as message exchanges, pub/sub and request/reply models," said Eason Chen, WeBank Tech Specialist, and Apache RocketMQ Contributor. "However, after adding different messaging services that are critical to our business, we realized there is a need for a universal visual traceable system for the distributed message to help us to diagnosis problem of applications. We believe Apache SkyWalking can address our current challenges, and we look forward to contributing to its efforts."

"I am very glad to see SkyWalking has been promoted as Apache Top-Level Project," said Lie Mao, Architect at China Eastern Airlines IT Solution Department. "Apache SkyWalking is integrated into the China Eastern Airlines microservice architecture support platform. SkyWalking provides practical features and visualization capabilities about topology map and distributed tracing, to help us understand the distributed system. I hope the Open Source community can contribute more plugins to Apache SkyWalking to enhance its role in the multi-language hybrid architecture."

"I found SkyWalking in 2017. In two years, it has grown very fast, and the community is very active," said DongXue Si, Senior Software Engineer at CloudWise Inc. "The project is adopted by many companies, and is attracting a lot of developers. Apache SkyWalking makes application performance monitoring easier and more convenient. I believe it will be better and better powered by its diversity community: Bless it."

"As early adopters of SkyWalking, we are very glad to see it graduate as an Apache Top-Level Project," said Liang Zhang, Architect at JD.com, Podling Project Management Committee member of Apache ShardingSphere (incubating), and former Architect at dangdang.com. "Dangdang.com adopted SkyWalking much earlier before it joined the Apache Incubator: we have witnessed its development, new features, and community growth. It is a very good example for Apache ShardingSphere (incubating). I look forward to our projects cooperating on observability in databases, and building a better Open Source ecosystem together."

"Congratulations to SkyWalking for becoming an Apache Top Level project," said Yuqi Zhou, Middleware Development Manager at Sinolink Securities Co. "Apache SkyWalking’s elegant design and good performance solves the our tracing and monitoring needs. Thanks to the Open Source community for bringing us such an awesome project: I wish it continued success."

"In helping enterprise customers transform their business application from traditional architecture to a Microservices architecture, one of the most important aspects of the microservices governance platform is its observability to obtain invocation relationships between components, as well as inside service itself, and to generate statistics based on these data, including SLA of services provided to the outside world," said Grissom Wang, Chief Architect at DaoCloud. "We surveyed a number of similar Open Source technologies and eventually chose Apache SkyWalking as one of the core components of DaoCloud Microservices platform because of its openness, extendibility, high performance, excellent code quality, active community, and forward-looking integration with Istio."

"Congrats SkyWalking being an Apache TLP," said Niangang Xu, co-founder of Yonghui Cloud Computing. "Apache SkyWalking helps us to improve the design of microservice, and has been enabling us to manage and observe a lot of distributed systems at scale!"

"SkyWalking is on its way to becoming a world wide Open Source project," added Wu. "We welcome everyone to participate on our mailing lists, GitHub, and Slack channels, and to learn more through our events, presentations, Website, and documents."

Catch Apache SkyWalking in action at SkyWalking DevCon (Shanghai; 11 May 2019), GIAC (Shenzhen; 21-23 June 2019), KubeCon + CloudNativeCon China (Shanghai; 25-26 June 2019), ApacheCon North America (Las Vegas; 9-12 September 2019), and DevOps Stage (Kiev; 18-19 October 2019).

Availability and Oversight
Apache SkyWalking software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache SkyWalking, visit http://skywalking.apache.org/ and https://twitter.com/ASFSkyWalking

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects that provide $20B+ worth of Apache Open Source software to the public at 100% no cost. Through the ASF's merit-based process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting billions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, Union Investment, Workday, and Verizon Media. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "SkyWalking", "Apache SkyWalking", "Kylin", "Apache Kylin", "RocketMQ", "Apache RocketMQ", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday March 26, 2019

20 Years of Open Source Innovation, The Apache Way

by Jim Jagielski and Sally Khudairi

As the world's largest and one of the most influential Open Source foundations, The Apache Software Foundation (ASF) is home to more than 350 community-led projects and initiatives. The ASF's 731 individual Members and more than 7,000 Committers are global, diverse, and often embody a collective humility. We've assembled a list of 20 ubiquitous and up-and-coming Apache projects to celebrate the ASF's 20th Anniversary on 26 March 2019, applaud our all-volunteer community, and thank the billions of users who benefit from their Herculean efforts.


1. Apache HTTP Server
Web/Servers. http://httpd.apache.org/

The most popular Open Source HTTP server on the planet shot to fame just 13 months from its inception in 1995, and remains so today due to its ability to provide a secure, efficient, and extensible server delivering HTTP services that observe the latest HTTP standards. Serving modern operating systems including UNIX, Microsoft Windows, and Mac OS X, the Apache HTTP Server played a key role in the initial growth of the World Wide Web; its rapid adoption over all other Web servers combined was also instrumental to the wide proliferation of eCommerce sites and solutions. The Apache HTTP Server project was the ASF's flagship project at its launch, and served as the model that future Apache projects emulated, with its open, community-driven, merit-based development process known as "The Apache Way".


2. Apache Incubator
Innovation. http://incubator.apache.org/

The Apache Incubator is the ASF's nexus for innovation, serving as the entry path for projects and codebases wishing to officially become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects go through the incubation process to ensure all donations are in accordance with the ASF legal standards, and develop diverse communities that adhere to the ASF's guiding principles. Incubation is required of newly accepted projects until their infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. Whilst incubation is not necessarily a reflection of the completeness or stability of the code, nor a full endorsement by the ASF, its rigorous process of mentoring projects and their communities according to "The Apache Way" has led to the graduation of nearly 200 projects in the Incubator's 16-year history. Today 51 "podlings" are undergoing development in the Apache Incubator across an array of categories, including annotation, artificial intelligence, Big Data, cryptography, data science/storage/visualization, development environments, Edge and IoT, email, JavaEE, libraries, machine learning, serverless computing, and more.


3. Apache Kafka
Big Data. https://kafka.apache.org/

The Apache footprint as the foundation of the Big Data ecosystem continues to grow, from Accumulo to Hadoop to ZooKeeper, with fifty active projects to date and two dozen more in the Apache Incubator. Apache Kafka's highly-performant distributed, fault tolerant, real-time publish-subscribe messaging platform powers Big Data solutions at Airbnb, LinkedIn, MailChimp, Netflix, The New York Times, Oracle, PayPal, Pinterest, Spotify, Twitter, Uber, Wikimedia Foundation, and countless other businesses.
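To make the publish-subscribe model concrete, here is a minimal sketch using the third-party kafka-python client (not part of Apache Kafka itself); the broker address ("localhost:9092") and the "clickstream" topic name are assumptions for illustration only.

    from kafka import KafkaProducer, KafkaConsumer

    # Publish one event to an assumed local broker.
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("clickstream", b'{"user": "alice", "page": "/home"}')
    producer.flush()

    # Subscribe to the same topic and read it from the beginning.
    consumer = KafkaConsumer(
        "clickstream",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
    )
    for record in consumer:
        print(record.value)   # records arrive in publish order within a partition
        break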


4. Apache Maven
Build Management. http://maven.apache.org/

Spinning out of the Apache Turbine servlet framework project in 2004, Apache Maven has risen to the top as the hugely popular build automation tool that helps Java developers build and release software. Stable, flexible, and feature-rich, Maven streamlines continuous builds, integration, testing, and delivery processes with an impressive central repository and robust plug-in ecosystem, making it the go-to choice for developers who want to easily manage a project’s build, reporting, and documentation.


5. Apache CloudStack
Cloud Computing. http://cloudstack.apache.org/

Super-quick to deploy, well-documented, and with an easy production environment, one of the biggest draws to Apache CloudStack is that it "just works". Powering some of the industry's most visible Clouds –from global hosting providers to telcos to the Fortune 100 top 5% and more– the CloudStack community is cohesive, agile, and focused, leveraging 11 years of Cloud success to enable users to rapidly and affordably build fully featured clouds.


6. Apache cTAKES
Content. http://ctakes.apache.org/

Developed from real-world use at the Mayo Clinic in 2006, cTAKES was created by a team of physicians, computer scientists and software engineers seeking a natural language processing system for extraction of information from electronic medical record clinical free-text. Today Apache cTAKES is an integral part of the Mayo Clinic's electronic medical records and has processed more than 80 million clinical notes. Apache cTAKES is a growing standard for clinical data management infrastructure across hospitals and academic institutions that include Boston Children’s Hospital, Cincinnati Children’s Hospital, Massachusetts Institute of Technology, University of Colorado Boulder, University of Pittsburgh, and University of California San Diego, as well as companies such as Wired Informatics.


7. Apache Ignite
Data Management. https://ignite.apache.org/

Apache Ignite is used for transactional, analytical, and streaming workloads at petabyte scale for the likes of American Airlines, ING, Yahoo Japan and countless others on premises, on cloud platforms, or in hybrid environments. Apache Ignite's in-memory data fabric provides an in-memory data grid, compute grid, streaming, and acceleration solutions across the Apache Big Data ecosystem, including Apache Cassandra, Apache Hadoop, Apache Spark, and more.
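As a rough illustration of the in-memory data grid, the sketch below uses pyignite, the Python thin client for Apache Ignite; the host, port, cache name, and key/value shown are assumptions.

    from pyignite import Client

    client = Client()
    client.connect("127.0.0.1", 10800)      # assumed local node, default thin-client port

    # Keys and values live in Ignite's distributed in-memory cache.
    cache = client.get_or_create_cache("flight_status")
    cache.put("AA100", "on time")
    print(cache.get("AA100"))

    client.close()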


8. Apache CouchDB
Databases. http://couchdb.apache.org/

Thousands of organizations such as the BBC, GrubHub, and the Large Hadron Collider use Apache CouchDB for seamless data flow between every imaginable computing environment, from globally-distributed server clusters to mobile devices to Web browsers. Its Couch Replication Protocol allows you to store, retrieve, and replicate data safely on premises or in the Cloud with very high performance and reliability. Apache CouchDB does all the heavy lifting so you can sit back and relax.
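Because CouchDB speaks plain HTTP and JSON, a document round-trip and a replication request can be sketched with nothing more than the requests library; the server URL, admin credentials, and database names below are assumptions.

    import requests

    base = "http://localhost:5984"
    auth = ("admin", "password")      # assumed local admin credentials

    requests.put(f"{base}/inventory", auth=auth)                       # create a database
    requests.put(f"{base}/inventory/widget-42",                        # store a document
                 json={"qty": 7, "location": "warehouse-3"}, auth=auth)
    print(requests.get(f"{base}/inventory/widget-42", auth=auth).json())

    # Replication is also just HTTP: POST a source/target pair to /_replicate.
    requests.post(f"{base}/_replicate",
                  json={"source": f"{base}/inventory",
                        "target": f"{base}/inventory-backup",
                        "create_target": True},
                  auth=auth)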


9. Apache Edgent (incubating)
Edge computing. http://edgent.incubator.apache.org/

The boom of IoT –personal assistants, smart phones, smart homes, connected cars, Industry 4.0 and beyond– is producing an ever-growing amount of data streaming from millions of systems, sensors, equipment, vehicles and more. The demand for reliable, efficient real-time data has driven the need for the "Empowered Edge", where data collection and analysis is optimized by moving away from centralized sources towards the edges of the networks, where much of the data originates. Companies like IBM and SAP are leveraging Apache Edgent to accelerate analytics at the edge across the IoT ecosystem. Apache Edgent can be used in conjunction with many Apache data analytics solutions such as Apache Flink, Apache Kafka, Apache Samza, Apache Spark, Apache Storm, and more.


10. Apache OFBiz
Enterprise Resource Planning (ERP). https://ofbiz.apache.org/

Whereas most of the ASF projects are about running or creating infrastructure, we also realize the importance of running and handling a business. Apache OFBiz is a comprehensive suite of business applications, from accounting and CRM through warehousing and inventory control. The Java-based framework provides the power and the flexibility to serve as the core of one's B2B and B2C business management, and is easily expandable and customizable. Apache OFBiz is a complete ERP solution, flexible, free, and fully Open Source, and serves users from United Airlines to Cabi.


11. Apache SIS (Spatial Information System)
Geospatial. http://sis.apache.org/

The US National Oceanic and Atmospheric Administration, Vietnamese National Space Center, numerous spatial agencies, governments, and others rely on Apache SIS to create their own intelligent, standards-based interoperable geospatial applications. The Apache SIS toolkit handles spatial data, location awareness, geospatial data representation, and provides a unified metadata model for file formats used for real-time smart city visualization, geospatial dataset discovery, state-of-the-art location-enabled emergency management, earth observation, as well as information modeling for extra-terrestrial bodies such as Mars and asteroids.


12. Apache Syncope
Identity Management. http://syncope.apache.org/

Apache Syncope manages digital identity data in enterprise applications and environments to handle user information such as username, password, first name, last name, email address, etc. Identity management involves considering user attributes, roles, resources, and entitlements that control who has access to what data, when, how, and why. Apache Syncope users include the Italian Army, the University of Helsinki, University of Milan, and the SWITCH Swiss university network.


13. Apache PLC4X (incubating)
Internet of Things (IoT). http://plc4x.incubator.apache.org/

Connectivity and integration across many Industrial IoT edge gateways is often impossible with closed-source, proprietary legacy systems with incompatible protocols. Apache PLC4X provides a universal protocol adapter for creating Industrial IoT applications through a set of libraries that allow unified access to any type of industrial programmable logic controller (PLC) using a variety of protocols with a shared API. In addition, the project is planning modular integrations with Apache IoT projects that include Apache Brooklyn, Apache Camel, Apache Edgent, Apache Kafka, Apache Mynewt, and Apache NiFi.


14. Apache Commons
Libraries. http://commons.apache.org/

With 42%+ of Apache projects written in Java (that's 62+ million lines of code), having a set of stable, reusable Open Source Java software components available to all Apache projects and external users is both helpful and necessary. Apache Commons provides a suite of dozens of stable, reusable, easily deployed Java components, and a workspace for Commons contributors to collaborate on the development of new components.


15. Apache Spark
Machine Learning. http://spark.apache.org/

Big Data is growing exponentially each year, accelerated by industries such as agriculture, big business, FinTech, healthcare, IoT, manufacturing, mobile advertising and more. Apache Spark's unified analytics engine for large-scale data processing helps data scientists apply machine learning insights and an array of libraries to improve responsiveness and deliver more accurate results. Apache Spark runs workloads up to 100x faster on Apache Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud, and can access diverse data sources, from Apache Cassandra, Apache Hadoop HDFS, Apache HBase, and Apache Hive to hundreds of others.
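A minimal PySpark sketch of the unified analytics engine is shown below; the SparkSession and DataFrame calls are standard Spark APIs, while the input path and the "country" column are assumptions for illustration.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("events-summary").getOrCreate()

    # Read semi-structured JSON and run a distributed aggregation.
    events = spark.read.json("events/*.json")          # assumed local input path
    events.groupBy("country").count() \
          .orderBy("count", ascending=False) \
          .show(10)

    spark.stop()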


16. Apache Cordova
Mobile. https://cordova.apache.org/

Apache Cordova is the popular developer tool used to easily build cross-platform, cross-device mobile apps using a Write-Once-Run-Anywhere solution, enabling developers to create a single app that will appear the same across multiple mobile device platforms. Apache Cordova acts as an extensible container, and serves as the base that most mobile application development tools and frameworks are built upon, including mobile development platforms and commercial software products by Blackberry, Google, IBM, Intel, Microsoft, Oracle, Salesforce, and many others.


17. Apache Tomcat
Java/Servers. https://tomcat.apache.org/

Starting off as the Apache JServ project, designed to allow Java "servlets" to be run in a Web environment, Tomcat grew into a full-fledged, comprehensive Java application server and became the de facto reference implementation for the Java Servlet specifications. Since 2005, Apache Tomcat has formed, and still forms, the foundation of numerous Java-based Web infrastructures at companies such as eBay, E*Trade, WalMart, and The Weather Channel.


18. Apache Lucene/Solr
Search. http://lucene.apache.org/solr/

Adobe, AOL, Apple, AT&T, Bank of America, Bloomberg, Cisco, Disney, eTrade, Ford, The Guardian, Homeland Security, Instagram, MTV Networks, NASA Planetary Data System, Netflix, SourceForge, Verizon, Walmart, whitehouse.gov, Zappos, and countless others turn to Apache Lucene Solr to quickly and reliably index and search multiple sites and enterprise data such as documents and email. Popular features include near real-time indexing, automated failover and recovery, rich document parsing and indexing, user-extensible caching, design for high-volume traffic, and much more. 


19. Apache Wicket
Web Framework. http://wicket.apache.org/

The Apache Wicket component-based Web application framework is prized by many followers for its "Plain Old Java Object" (POJO) data model and markup/logic separation not common in most frameworks. Developers have been using Apache Wicket since 2004 to quickly create powerful, reusable components using object oriented methodology with Java and HTML. Wicket powers thousands of applications and sites for governments, stores, universities, cities, banks, email providers, and more, including Apress, DHL, SAP, Vodafone, and Xbox.com.


20. Apache Daffodil (incubating)
XML. http://daffodil.apache.org/

Governments handle massive amounts of complex and legacy data across security boundaries every day. In order for such data to be consumed, it must be inspected for correctness and sanitized of malicious data. Whilst traditional inspection methods are often proprietary, incomplete, and poorly maintained, Apache Daffodil streamlines the process with an Open Source implementation of the Data Format Description Language specification (DFDL) that fully describes a wide array of complex and legacy file formats down to the bit level. Daffodil can parse data to XML or JSON to allow for validation, sanitization, and transformation, and also serialize or "unparse" back to the original file format, effectively mitigating a large variety of common vulnerabilities.

The Apache Software Foundation is a leader in community-driven open source software and continues to innovate with dozens of new projects and their communities. Apache projects are managing exabytes of data, executing teraflops of operations, and storing billions of objects in virtually every industry. Apache software is an integral part of nearly every end user computing device, from laptops to tablets to phones. The commercially-friendly and permissive Apache License v2.0 has become an open source industry standard. As the demand for quality open source software continues to grow, the collective Apache community will continue to rise to the challenge of solving current problems and ideate tomorrow’s opportunities through The Apache Way of open development. Learn more at http://apache.org/

# # # 

Thursday March 21, 2019

The Apache Software Foundation Announces Apache® Unomi™ as a Top-Level Project

Powerful Open Source Customer Data Platform in use at Al-Monitor, Altola, Jahia, and Yupiik, among others. 

Wakefield, MA —21 March 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Unomi™ as a Top-Level Project (TLP).

Apache Unomi is a standards-based Customer Data Platform (CDP) that manages online customer, lead, and visitor information to provide personalized experiences that adhere to visitor privacy rules such as GDPR and “Do Not Track” preferences. The project was originally developed at Jahia, and was submitted to the Apache Incubator in October 2015.

"I am truly thankful to our community, especially our mentors, who have helped us achieve this milestone," said Serge Huber, Vice President of Apache Unomi. "The original vision behind Unomi was to ensure true privacy by making the technologies handling customer data completely Open Source and independent. Since it was submitted to the Apache Incubator, developing Unomi using the Apache Way will ensure the project grows its community to be more diverse and welcome new users and developers."

Apache Unomi is versatile, and features privacy management, user/event/goal tracking, reporting, visitor profile management, segmentation, personas, A/B testing, and more. It can be used as:

  • a personalization service for a Web CMS;

  • an analytics service for native mobile applications;

  • a centralized profile management system with segmentation capabilities; and

  • a consent management hub

Apache Unomi is the industry's first reference implementation of the upcoming OASIS CDP specification (established by the OASIS CXS Technical Committee, which sets standards as a core technology for enabling the delivery of personalized user experiences). As a reference implementation, Apache Unomi serves as a real-world example of how the standard will work in practice, and is quickly gaining traction among those interested in truly open and transparent customer data privacy. Apache Unomi is in use at organizations such as Al-Monitor, Altola, Jahia, Yupiik, and many others to create and deliver consistent personalized experiences across channels, markets, and systems.

"When Serge and I announced the launch of the Apache Unomi project at the 2015 ApacheCon Budapest, Apache Unomi, at that time, was the first proposal among the rising Customer Data Platform industry's segment, positioned as an 'ethical data-driven marketing' product that would respect the privacy of customers while leveraging the power of unified customers data," said Elie Auvray, Head of Business Development at Jahia. "Jahia's digital experience management solutions are based on Apache Unomi, and we can't wait to see how the project will now evolve with its growing community. Seeing today Apache Unomi becoming a Top-Level Project is a great reward for us as Open Source software believers. We are proud of this milestone, grateful to the Apache Software Foundation and our mentors, and we know it's only the beginning of a new –hopefully long and successful– journey."

"Under development at OASIS, the Customer Data Platform specification –for which Apache Unomi aims to be the reference implementation– lies at the crossroads of many solutions providers needs such as WCM, CRM, Big Data Platforms, Machine Learning, IoT and Digital Marketing," said Laurent Liscia, CEO of OASIS. "At a time when client data interoperability and built-in data privacy are mandatory foundations for legal, consistent, and personalized experiences across channel markets and systems, the CDP specification, together with Apache Unomi, is a clear and welcome answer to end-user concerns."

"Apache Unomi is the perfect solution to implement a user profile platform," said Jean-Baptiste Onofré, Fellow at Talend. "It fully addresses the user trust and privacy needs, allowing to easily create user profile and Web marketing features. As Unomi is powered by Apache Karaf, it's also a great platform for several use cases, such as digital marketing in Web applications, managing user profiles on IoT devices, and more."

"Apache Unomi enables Al-Monitor readers to be driven towards additional personalized content that corresponds, via content tags profiling and related automated segmentations, to what they have already accessed," said Valerie Voci, Head of Digital Strategy and Marketing at Al-Monitor. "This data follows our customers where they go, so it's a consistent experience whether they are getting these recommendations in their inbox or on the Website or both. And if a change takes place on one, that change is immediately reflected on the other. It helps us create a very cohesive marketing message and a great overall digital experience."

"As we were developing a progressive web app (PWA) for a client, we were looking for a Customer Data Platform (CDP) to store customer insights, such as behavioral and explicit customer data," said Lars Petersen, Co-Founder at Altola. "Privacy was table stake for us, along with the flexibility to customize data schema and open API. We selected Apache Unomi based on these parameters, we had it up and running on AWS in less than 30 min. and are very impressed with the maturity of the platform, its privacy by design and how easy it was to work with."

"In a digital world, customer data is very important to offer a better experience to users. However, data privacy and trust is not an option for users," said François Papon, CTO at Yupiik. "Apache Unomi is the best solution for our clients because it's an Open Source project managed by an independent foundation, there is no vendor lock-in. It's also based on other solutions like Apache Karaf that made it ready for modularity, scalability, cloud, devops, and more." 

"Apache Unomi is poised to disrupt the Customer Data Platform market," said Thomas Sigdestad, CTO at Enonic, and co-chair, with Serge Huber, of the CDP standards work at OASIS open. "The CDP marketplace is lacking from a standard way of exchanging data, and the vendor space is over-represented by closed source and proprietary cloud offerings. This effectively limits the potential and adoption of CDP in general. Apache Unomi is not merely Open Source, but also the reference implementation of the imminent CDP standard from OASIS. Companies using Unomi will benefit from faster and simpler integrations without locking their customer data into yet another proprietary silo." 

"Graduating as an Apache Top-Level Project is only the beginning," added Huber. "Unomi has a lot of potential that it still to be developed, and is a perfect opportunity for those interested in Customer Data Privacy to participate through our mailing lists and Slack channel, and to learn more about the project on our Website and presentations."

Catch Apache Unomi in action at ApacheCon North America (9-12 September 2019 in Las Vegas, Nevada), and ApacheCon Europe (22-24 October 2019 in Berlin, Germany) http://apachecon.com/ .

Availability and Oversight
Apache Unomi software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Unomi, visit http://unomi.apache.org/

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects seeking to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, Union Investment, Workday, and Verizon Media. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Unomi", "Apache Unomi", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

Tuesday March 19, 2019

The Apache Way to Sustainable Open Source Success

As Open Source software continues to grow in importance, it seems appropriate to reflect upon the ongoing success of The Apache Software Foundation (ASF) as it approaches its 20th anniversary. The Apache Way of community-driven development continues to gain momentum despite the compounding challenges of building software in the greater Open Source ecosystem.

This approach, The Apache Way, was defined over 24 years ago by the original Apache Group, prior to the establishment of the Foundation. It has led to our success as a foundation and we believe it has been fundamental to the triumph of Open Source as a whole.

While The Apache Way has been refined over the years, it remains true to the original goals of transparent, community-driven collaboration in a vendor-neutral environment that is accessible to all.

The Apache Way defines Open Source in terms of both a legal and a social framework for collaboration. It helps others understand what makes Open Source powerful and how participants are expected to behave. In this post we will examine The Apache Way in the context of the Foundation's mission:

"The mission of the Apache Software Foundation (ASF) is to provide software for the public good. We do this by providing services and support for many like-minded software project communities consisting of individuals who choose to participate in ASF activities." 

Let's dissect this mission statement. 

"Provide Software for the Public Good"

Key points in this section: 

  • We produce software that is non-excludable and non-rivalrous

  • Use of the software in any context does not reduce its availability to others

  • Users and contributors have no committed responsibility to the foundation, our projects or our communities

  • Use of a license that conforms to the Open Source Definition is necessary but not sufficient to deliver on our mission 

Investopedia defines a public good as "a product that one individual can consume without reducing its availability to another individual, and from which no one is excluded." On the surface, this is a good definition for our use of the term. However, there is a nuance in our use. Our mission is not to produce "public goods" but to "provide software for the public good". 

To understand why this is important, one needs to think about what motivates the ASF to produce software that is a public good.

Open Source software can be digitally copied and reused in an unlimited number of ways. Every user can modify it for their specific needs. They can combine it with other software. They can design innovative new products and services using it and can make a living from the proceeds. This is all possible without impacting other people's use of the software. As such, the ASF produces software that can be used for the public good in many different ways.

To allow us to deliver on this part of the mission, it is critical that we adopt a license that uses the law to protect the software curated here at the Foundation. For us that license is the Apache License, Version 2. In addition, we adopt an inbound licensing policy that defines which licenses are allowable on software reused within Apache projects. This policy can be summarized as: 

  • The license must meet the Open Source Definition (OSD).

  • The license, as applied in practice, must not impose significant restrictions beyond those imposed by the Apache License 2.0.

This means that you can be assured that software curated by projects within The Apache Software Foundation is both a public good and for the public good. You can use Apache software for any purpose and you have no responsibility to the Foundation or the project to contribute back (though as addressed in the next section, it is often in your interests to do so). 

It is important to recognize that there are software projects out there that adopt our license but do not adopt our inbound licensing policy. Such projects may bring restrictions that are not covered by our license; therefore, it is important to carefully examine the licensing policies of these projects. Using the Apache License alone may not provide you with the same options a Foundation project provides. 

Apache projects are successful, in large part, because of our diligence with respect to clearly-defined licensing policies. Such diligence makes it much easier for downstream users to understand what they can and cannot do with Apache software. The Apache License is deliberately permissive to ensure that everyone has an opportunity to participate in Open Source within the ASF or elsewhere. Modifications of our license are allowed, but modified licenses are neither the Apache License nor affiliated with or endorsed by The Apache Software Foundation. No modified license can be represented as such. Modified licenses that use the Apache name are strictly disallowed, as they are both confusing to users and undermine the Apache brand.

While we recognize that there are many ways to license software, whether Open Source or otherwise, we assert that only projects that use both our license (unmodified) and our inbound licensing policy truly follow and adhere to The Apache Way. 

While an OSD-approved license and associated policies are necessary for successful Open Source production, they are not sufficient. They provide a legal framework for the production of Open Source, but they do not provide a social framework, which brings us to the second sentence of our mission:

"The mission of the Apache Software Foundation is to provide software for the public good. We do this by providing services and support for many like-minded software project communities of individuals who choose to contribute to Apache projects."

"Like-Minded Software Project Communities of Individuals"

Key points in this section: 

  • The Apache Way provides a governance model designed to create a social framework for collaboration

  • The Apache Software Foundation develops communities, and those communities develop software

  • ASF project communities develop and reuse software components that in turn may be reused in products

  • Users of ASF software often build products and services using our software components

  • Our model, and others like it, have produced some of the largest and longest-lived Open Source projects that have literally revolutionized the industry 


There is a lot packed into these few words. It is an understanding of these words that makes the difference between software that is under an Open Source license and software that reaches sustainability through The Apache Way. These words underscore the fact that the Foundation does not directly produce software. That's right, The Apache Software Foundation, with upwards of $8Bn of software code, does not directly produce software. Rather than focus on software, we focus on the creation of and support of collaborative communities; the software is an intentional by-product. 

Our like-minded project communities come together because they share common problems that can be addressed in software. As the saying goes, "a problem shared is a problem halved". By bringing together individuals with their unique ideas and skills, we break down barriers to collaboration. 

The Apache Way is carefully crafted to create a social structure for collaboration, which complements the legal framework discussed above. Where the legal framework ensures an equal right to use the software, The Apache Way ensures an equal ability to contribute to the software. This is critically important to the long term sustainability of Open Source software projects. This social structure for collaboration is missing from many non-Apache projects, yet a robust social structure is invariably a key component in long-term successful projects outside of the ASF.

The Apache Way is fully inclusive, open, transparent and consensus-based. It promotes vendor neutrality to prevent undue influence (or control) from a single company. It ensures that any individual with a valuable contribution is empowered, and it seeks to assure that a project remains sustainable despite inevitable changes in community membership over time.

Apache projects typically produce software components that can be combined with other software (of any license) in different ways to solve different problems. This provides plenty of opportunity for participants to collaborate within a given software project independent of their relationship outside the Foundation. This is very different from the idea of licensing your product as a whole under an Open Source license. Our model offers more opportunities for reuse which, in turn, increase the pool of individuals likely to contribute to the project.

In addition, our merit-based system seeks to ensure that as people come and go, for whatever reason, there is always someone to take their place. As a result, some ubiquitous Apache projects have existed for over 20 years and helped commercialize the World Wide Web; while dozens of newer projects have defined industry segments such as Big Data and IoT (Internet of Things). 

A core tenet of The Apache Way is "Community Over Code", which encapsulates our deep belief that a healthy community is a far higher priority than good code. A strong community can always rectify a problem with the code, whereas an unhealthy community will likely struggle to maintain a codebase in a sustainable manner. Healthy communities ensure the Foundation has the stability to thrive for the next 20 years and beyond. Apache projects do not have the problem of scaling that others, who focus only on the legal frameworks of Open Source, suffer from. If you look around at projects that have grown up alongside the Apache projects, you will see a similar focus on scaling the governance model. This is no accident. 

Why this is Important

Software is a critical part of any modern economy. It touches every part of every life in the developed world, and is increasingly transforming everyday life, from womb to grave, everywhere.

At The Apache Software Foundation, we believe that every developer has their personal motivations for building software. We celebrate their right to choose when and how they build their software, including their right to use a non-open license. 

We will not dictate what is best for developers or for the software industry.

We care about the provision of software that enables our users, our contributors, and the general public to decide what is best for them.

We welcome you to use our software and contribute to our projects -- or not. It's up to you. 

We ask that you leave commercial interests at the door.

Countless organizations are proving that their team members who collaborate in a vendor-neutral environment often apply Open Innovation processes (such as The Apache Way) to their work. This helps create internal efficiencies and lays the groundwork for new external opportunities that may provide additional benefits.

Bringing only your intention of contributing what best serves the greater Apache community reinforces trust in the people and projects behind the Apache brand, and helps us realize our mission of providing software for the public good. 

We learn together and work together to deliver the best software we can. 

Apache software is available for all.

The freedom to choose is what makes the Foundation and Apache projects so strong.

Summary

The software industry has changed and continues to change. The ways software is delivered to end users have changed. Some of the leaders in our industry have retired and new leaders have emerged. But some things have not changed. Our model of collaborative software development, through a combination of licensing and social frameworks, remains one of the most successful models of software production.

Increasing the number of users, even those who do not contribute to code, should be seen as a benefit, not a problem, in Open Source. More users present an opportunity. At Apache, more users means more success since they are our future contributors.

As a US 501(c)(3) public charitable organization, The Apache Software Foundation helps individuals and organizations understand how Open Source at scale works in a highly competitive market. For more than two decades our focus has not been on producing software, but rather mentoring communities who produce software. The Apache Way advances sustainable Open Source communities: everything we do is Open Source so all kinds of users can benefit from our experience. Apache is for everyone.

# # #

Tuesday February 19, 2019

The Apache® Software Foundation Announces Apache Arrow™ Momentum

Open Source Big Data in-memory columnar layer adopted by dozens of Open Source and commercial technologies; exceeded 1,000,000 monthly downloads within first three years as an Apache Top-Level Project

Wakefield, MA —19 February 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced momentum with Apache® Arrow™, the Open Source Big Data in-memory columnar layer.

Since the founding of the project in January 2016, Apache Arrow has quickly become the de facto standard for representing and processing analytical data in memory, accelerating analytical processing and interchange by more than 100x.

"When we became a Top-Level Project, we projected that the majority of the world's data will be processed through Arrow within the next decade," said Jacques Nadeau, Vice President of Apache Arrow. "In just three years time, we are proud to see Arrow's substantial industry adoption and increased value across a wide range of analytical, machine learning, and artificial intelligence workloads."

Highlights of Apache Arrow's success include:

Industry Adoption —more than 20 major technologies adopted Arrow to accelerate in-memory analytics, including Apache Spark, NVIDIA RAPIDS, pandas, and Dremio, among others. A list of known Open Source and commercial implementations can be found at https://arrow.apache.org/powered_by/

Millions of Downloads —leveraging and integrating Apache Arrow into many other technologies has bolstered downloads to more than 1,000,000 each month.

New Language Support —as a cross-language development platform, Apache Arrow must support multiple programming languages. It has grown from supporting one language to eleven today, including C++, Java, Python, R, C#, JavaScript, and Ruby, among others.

Seamless Data Format Support —Arrow supports different data types, both simple and nested, located in arbitrary memory such as regular system RAM, memory-mapped files, or on-GPU memory. In addition, it can ingest data from popular storage formats such as Apache Parquet, CSV files, Apache ORC, JSON, and more; a minimal example follows these highlights.

Major Code Donations —Apache Arrow's new features and expanded functionality are due in part to code and component donations that include:
  • C# Library
  • Gandiva LLVM-based Expression Compiler
  • Go Library
  • JavaScript Library
  • Plasma Shared Memory Object Store
  • Ruby Libraries (Apache Arrow and Apache Parquet)
  • Rust Libraries (Parquet and DataFusion Query Engine)
Community and Contributor Growth —over the past 12 months, nearly 300 individuals have submitted more than 3,000 contributions that have grown the Apache Arrow code base by 300,000 lines of code. The Arrow community is welcoming approximately 10 new contributors each month.
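
To make the format interoperability above concrete, here is a minimal Python sketch using the pyarrow bindings (not taken from the release itself): it builds a small columnar table in memory, round-trips it through Parquet, and hands the result to pandas. The column names and the file name are hypothetical.

```
import pyarrow as pa
import pyarrow.parquet as pq

# Build an Arrow Table in columnar memory from plain Python data.
# Column names ("user_id", "event") and the file name are hypothetical.
table = pa.Table.from_pydict({
    "user_id": pa.array([1, 2, 3], type=pa.int64()),
    "event": pa.array(["click", "view", "click"]),
})

# Round-trip the same columnar data through Parquet on disk.
pq.write_table(table, "events.parquet")
restored = pq.read_table("events.parquet")

# Hand the columns to pandas for analysis; pyarrow copies only where
# the pandas memory model requires it.
df = restored.to_pandas()
print(df.head())
```

The same Table could just as well be produced by pyarrow's CSV or ORC readers; the in-memory columnar representation is identical regardless of the source format.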


In January the project announced its most recent release, Apache Arrow 0.12.0, which reflects more than 600 enhancements developed during Q4 2018. The Apache Arrow community is actively working on a number of impactful new initiatives that include solving high performance analytical problems and allowing for more efficient data distribution across entire clusters.

"Apache Arrow's rapid industry adoption and developer community growth supports our original thesis of the importance of a language-independent open standard for columnar data," said Wes McKinney, member of the Apache Arrow Project Management Committee, and creator of Python's pandas project. "Additionally, we are seeing productive collaborations take place not only between programming languages but also between the database systems and data science worlds. We look forward to welcoming more data system developers into our community."

About Apache Arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
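
As a hedged illustration of the zero-copy streaming messaging mentioned above, the Python sketch below uses the pyarrow bindings to serialize a record batch with Arrow's streaming IPC format into an in-memory buffer and read it back; the schema and field names are illustrative only.

```
import pyarrow as pa

# A record batch laid out in Arrow's columnar memory format.
# The schema and field names here are purely illustrative.
schema = pa.schema([("id", pa.int32()), ("score", pa.float64())])
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3], type=pa.int32()),
     pa.array([0.5, 0.7, 0.9], type=pa.float64())],
    schema=schema,
)

# Write the batch with the streaming IPC format into an in-memory buffer.
sink = pa.BufferOutputStream()
writer = pa.ipc.new_stream(sink, schema)
writer.write_batch(batch)
writer.close()

# Read it back; the reader maps the buffer rather than copying column data.
reader = pa.ipc.open_stream(sink.getvalue())
restored = reader.read_all()   # a pyarrow.Table with the same columns
print(restored.num_rows)       # 3
```

Because the stream follows the language-independent Arrow specification, an equivalent reader in Java, C++, or any other supported implementation can consume the same bytes without conversion.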

Availability and Oversight
Apache Arrow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Arrow, visit http://arrow.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, Union Investment, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Arrow", "Apache Arrow", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday January 23, 2019

The Apache Software Foundation Announces Apache® Hadoop® v3.2.0

Pioneering Open Source distributed enterprise framework powers US$166B Big Data ecosystem

Wakefield, MA —23 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced Apache® Hadoop® v3.2.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing.

Now in its 11th year, Apache Hadoop is the foundation of the US$166B Big Data ecosystem (source: IDC), enabling data applications to run and be managed on large hardware clusters in a distributed computing environment. "Apache Hadoop has been at the center of this big data transformation, providing an ecosystem with tools for businesses to store and process data on a scale that was unheard of several years ago," according to Accenture Technology Labs.

"This latest release unlocks the powerful feature set the Apache Hadoop community has been working on for more than nine months," said Vinod Kumar Vavilapalli, Vice President of Apache Hadoop. "It further diversifies the platform by building on the cloud connector enhancements from Apache Hadoop 3.0.0 and opening it up for deep learning use-cases and long-running apps."

Apache Hadoop 3.2.0 highlights include:
  • ABFS Filesystem connector —supports the latest Azure Data Lake Storage Gen2;
  • Enhanced S3A connector —including better resilience to throttled AWS S3 and DynamoDB IO;
  • Node Attributes Support in YARN —allows nodes to be tagged with multiple labels based on their attributes, and supports placing containers based on expressions over these labels;
  • Storage Policy Satisfier —enables HDFS (Hadoop Distributed File System) applications to move blocks between storage types as storage policies are set on files and directories;
  • Hadoop Submarine —enables data engineers to easily develop, train, and deploy deep learning models (in TensorFlow) on the very same Hadoop YARN cluster;
  • C++ HDFS client —enables asynchronous IO to HDFS, which helps downstream projects such as Apache ORC;
  • Upgrades for long-running services —supports seamless in-place upgrades of long-running containers via the YARN Native Service API (application program interface) and CLI (command-line interface).

"This is one of the biggest releases in Apache Hadoop 3.x line which brings many new features and over 1,000 changes," said Sunil Govindan, Apache Hadoop 3.2.0 release manager. "We are pleased to announce that Apache Hadoop 3.2.0 is available to take your data management requirements to the next level. Thanks to all our contributors who helped to make this release happen."

Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Huawei, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of educational and production users, as well as companies that offer Hadoop-related services at https://wiki.apache.org/hadoop/PoweredBy

Global Knowledge hails, "...the open-source Apache Hadoop platform changes the economics and dynamics of large-scale data analytics due to its scalability, cost effectiveness, flexibility, and built-in fault tolerance. It makes possible the massive parallel computing that today's data analysis requires."

Hadoop is proven at scale: Netflix captures 500+B daily events using Apache Hadoop. Twitter uses Apache Hadoop to handle 5B+ sessions a day in real time. Twitter’s 10,000+ node cluster processes and analyzes more than a zettabyte of raw data through 200B+ tweets per year. Facebook’s cluster of 4,000+ machines that store 300+ petabytes is augmented by 4 new petabytes of data generated each day. Microsoft uses Apache Hadoop YARN to run the internal Cosmos data lake, which operates over hundreds of thousands of nodes and manages billions of containers per day.

Transparency Market Research recently reported that the global Hadoop market is anticipated to rise at a staggering 29% CAGR with a market valuation of US$37.7B by the end of 2023.

Apache Hadoop remains one of the most active projects at the ASF: it ranks #1 for Apache project repositories by code commits, and is the #5 repository by size (3,881,797 lines of code).

"The Apache Hadoop community continues to go from strength to strength in further driving innovation in Big Data," added Vavilapalli. "We hope that developers, operators and users leverage our latest release in fulfilling their data management needs."

Catch Apache Hadoop in action at the Strata conference, 25-28 March 2019 in San Francisco, and dozens of Hadoop MeetUps held around the world, including on 30 January 2019 at LinkedIn in Sunnyvale, California.

Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official global conference series. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday January 08, 2019

The Apache Software Foundation Announces Apache® Airflow™ as a Top-Level Project

Open Source Big Data workflow management system in use at Adobe, Airbnb, Etsy, Google, ING, Lyft, PayPal, Reddit, Square, Twitter, and United Airlines, among others.

Wakefield, MA —8 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Airflow™ as a Top-Level Project (TLP).

Apache Airflow is a flexible, scalable workflow automation and scheduling system for authoring and managing Big Data processing pipelines of hundreds of petabytes. Graduation from the Apache Incubator as a Top-Level Project signifies that the Apache Airflow community and products have been well-governed under the ASF's meritocratic process and principles.

"Since its inception, Apache Airflow has quickly become the de-facto standard for workflow orchestration," said Bolke de Bruin, Vice President of Apache Airflow. "Airflow has gained adoption among developers and data scientists alike thanks to its focus on configuration-as-code. That has gained us a community during incubation at the ASF that not only uses Apache Airflow but also contributes back. This reflects Airflow’s ease of use, scalability, and power of our diverse community; that it is embraced by enterprises and start-ups alike, allows us to now graduate to a Top-Level Project."

Apache Airflow is used to easily orchestrate complex computational workflows. Through smart scheduling, database and dependency management, error handling, and logging, Airflow automates resource management, from single servers to large-scale clusters. Written in Python, the project is highly extensible and able to run tasks written in other languages, allowing integration with commonly used architectures and projects such as AWS S3, Docker, Apache Hadoop HDFS, Apache Hive, Kubernetes, MySQL, Postgres, Apache Zeppelin, and more. Airflow originated at Airbnb in 2014 and was submitted to the Apache Incubator in March 2016.
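
For readers unfamiliar with Airflow's configuration-as-code approach, the sketch below shows roughly what a minimal DAG looks like in Python; the DAG id, task names, schedule, and shell commands are hypothetical, and the import path reflects the Airflow 1.x series current at the time of this announcement.

```
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

default_args = {
    "owner": "data-team",                    # hypothetical owner
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# A two-step pipeline scheduled once a day; all names are illustrative only.
with DAG(
    dag_id="example_daily_etl",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extracting'")
    load = BashOperator(task_id="load", bash_command="echo 'loading'")

    extract >> load  # run "extract" before "load"
```

The scheduler tracks each run of the DAG and retries failed tasks according to default_args, which is the dependency management and error handling described above.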

Apache Airflow is in use at more than 200 organizations, including Adobe, Airbnb, Astronomer, Etsy, Google, ING, Lyft, NYC City Planning, Paypal, Polidea, Qubole, Quizlet, Reddit, Reply, Solita, Square, Twitter, and United Airlines, among others. A list of known users can be found at https://github.com/apache/incubator-airflow#who-uses-apache-airflow

"Adobe Experience Platform is built on cloud infrastructure leveraging open source technologies such as Apache Spark, Kafka, Hadoop, Storm, and more," said Hitesh Shah, Principal Architect of Adobe Experience Platform. "Apache Airflow is a great new addition to the ecosystem of orchestration engines for Big Data processing pipelines. We have been leveraging Airflow for various use cases in Adobe Experience Cloud and will soon be looking to share the results of our experiments of running Airflow on Kubernetes." 

"Our clients just love Apache Airflow. Airflow has been a part of all our Data pipelines created in past 2 years acting as the ring-master and taming our Machine Learning and ETL Pipelines," said Kaxil Naik, Data Engineer at Data Reply. "It has helped us create a Single View for our client's entire data ecosystem. Airflow's Data-aware scheduling and error-handling helped automate entire report generation process reliably without any human-intervention. It easily integrates with Google Cloud (and other major cloud providers) as well and allows non-technical personnel to use it without a steep learning curve because of Airflow’s configuration-as-a-code paradigm."

"With over 250 PB of data under management, PayPal relies on workflow schedulers such as Apache Airflow to manage its data movement needs reliably," said Sid Anand, Chief Data Engineer at PayPal. "Additionally, Airflow is used for a range of system orchestration needs across many of our distributed systems: needs include self-healing, autoscaling, and reliable [re-]provisioning."

"Since our offering of Apache Airflow as a service in Sept 2016, a lot of big and small enterprises have successfully shifted all of their workflow needs to Airflow," said Sumit Maheshwari, Engineering Manager at Qubole. "At Qubole, not only are we a provider, but also a big consumer of Airflow as well. For example, our whole Insight and Recommendations platform is built around Airflow only, where we process billions of events every month from hundreds of enterprises and generate insights for them on big data solutions like Apache Hadoop, Apache Spark, and Presto. We are very impressed by the simplicity of Airflow and ease at which it can be integrated with other solutions like clouds, monitoring systems or various data sources."

"At ING, we use Apache Airflow to orchestrate our core processes, transforming billions of records from across the globe each day," said Rob Keevil, Data Analytics Platform Lead at ING WB Advanced Analytics. "Its feature set, Open Source heritage and extensibility make it well suited to coordinate the wide variety of batch processes we operate, including ETL workflows, model training, integration scripting, data integrity testing, and alerting. We have played an active role in Airflow development from the onset, having submitted hundreds of pull requests to ensure that the community benefits from the Airflow improvements created at ING.  We are delighted to see Airflow graduate from the Apache Incubator, and look forward to see where this exciting project will be taken in future!"

"We saw immediately the value of Apache Airflow as an orchestrator when we started contributing and using it," said Jarek Potiuk, Principal Software Engineer at Polidea. "Being able to develop and maintain the whole workflow by engineers is usually a challenge when you have a huge configuration to maintain. Airflow allows your DevOps to have a lot of fun and still use the standard coding tools to evolve your infrastructure. This is 'infrastructure as a code' at its best."

"Workflow orchestration is essential to the (big) data era that we live in," added de Bruin. "The field is evolving quite fast and the new data thinking is just starting to make an impact. Apache Airflow is a child of the data era and therefore very well positioned, and is also young so a lot of development can still happen. Airflow can use bright minds from scientific computing, enterprises, and start-ups to further improve it. Join the community, it is easy to hop on!"

Availability and Oversight
Apache Airflow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Airflow, visit http://airflow.apache.org/ and https://twitter.com/ApacheAirflow

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Airflow", "Apache Airflow", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #
