The Apache Software Foundation Blog

Thursday October 14, 2021

Apache Software Foundation moves to CDN distribution for software

It’s not enough to create and release useful software. As an open source foundation, a major part of the Apache Software Foundation’s (ASF) job is to help get that software into the hands of users.

To do so, we’ve relied for many years on the contributions of individuals and organizations to provide mirror infrastructure to distribute our software. We’re now retiring that system in favor of a content distribution network (CDN), and taking a moment to say thank you to all the individuals and organizations who helped get ASF software into the hands of millions of users.

The history and function of the ASF mirror system

Today if you want to download the source or binaries for an ASF project, you’ll probably have it copied over before you can refill your coffee. But when the Apache Group (precursor to the foundation) first got its start, bandwidth was a lot more limited.

This was true both for users and for those with the limited resources available to distribute the software. As demand grew, it became more than a single site could handle.

To share the load, we began to use a “mirror” system: copies of the artifacts were distributed to mirror sites closer to the users who wanted the software. Instead of all requests being served by a central server, the mirrors could sync up with the main site and then serve a portion of the audience looking to download Apache software.

The first mirror sites became available in April 1995. Among the first mirror providers was SunSite, 'a network of Internet servers providing archives of information, software and other publicly available resources.'

In April 1997, Brian Behlendorf invited 66 people already hosting mirrors to join the 'mirror@' Apache mailing list. In June of the same year users could automatically be directed to a local mirror by a CGI script that would select the right mirror based on their country code.

Henk P. Penning joined the mirrors mailing list in 2002, and went on to become a major contributor to the system (among other things at the foundation). A mirror in 2002 would need to allocate a whopping 10 GB of space to handle all the artifacts available for download. Penning contributed to the ASF infrastructure until his passing in 2019.

Penning was joined in improving the mirror system by Gavin McDonald, who helped check for “stale” mirrors with out-of-date copies and sent reminders to the admins to keep them up to date. Eventually the team implemented a checker to do this automatically.

This elides a great deal of work, history, and dedication to providing open source software for the public good. Suffice it to say, the history of the mirror system (which you can read more about here) is the story of open source writ small: many individuals and organizations coming together to chop wood and carry water to lay infrastructure that many more will take for granted.

The present and future for distributing ASF software

Today, that 10 GB has grown to more than 180 GB for a mirror to carry all ASF software.

The industry has changed as well. Technology has advanced, bandwidth costs have dropped, and mirror systems are giving way to content delivery networks (CDNs).

After discussion and deliberation, the ASF’s Infrastructure team has decided to move our download system to a CDN with professional support and a service level appropriate to the foundation’s status in the technology world.

Our new delivery system is part of a global CDN with economies of scale and fast, reliable downloads around the world. We expect ASF users will see faster delivery of software, without the lag one might see with a mirror system while local mirrors sync from the main instance.

ASF projects won’t see any difference in their workflow, just a faster delivery of open source artifacts to their users.

Once again, we’d like to thank all the contributors who’ve helped stand up mirrors over the past 20+ years. Without the mirror system to deliver our software, we would never have made it this far.

Thursday October 07, 2021

The Apache Software Foundation Announces Apache® OpenOffice® 4.1.11

Updates to security and availability of leading Open Source office document productivity suite

Wilmington, DE —7 October 2021— The Apache® Software Foundation (ASF), the world’s largest Open Source foundation, announced today the release of Apache OpenOffice® 4.1.11, the latest version of the popular Open Source office-document productivity suite.

Used by millions of organizations, institutions, and individuals around the world, Apache OpenOffice has delivered 317M+ downloads* and provides more than $25M in value to users per day. Apache OpenOffice supports more than 40 languages, offers hundreds of ready-to-use extensions, and is the productivity suite of choice for governments seeking to meet mandates for using ISO/IEC standard Open Document Format (ODF) files.

"Users worldwide depend on OpenOffice to meet their office productivity needs," said Carl Marcum, Vice President of Apache OpenOffice. "We are proud to offer improved security and availability with our latest release. Businesses of all sizes across numerous industries, educational institutions, non-profits, digitally-inclusive communities, application developers, and countless others rely on Apache OpenOffice to efficiently create, manage, and deliver high-impact, integrated content."

Apache OpenOffice comprises six productivity applications: Writer (word processor), Calc (spreadsheet tool), Impress (presentation editor), Draw (vector graphics drawing editor), Math (mathematical formula editor), and Base (database management program). The OpenOffice suite ships for Windows, macOS, and Linux.

Apache OpenOffice v4.1.11
The 14th release under the auspices of the ASF, OpenOffice v4.1.11 reflects dozens of improvements, features, and bug fixes that include:

  • New Writer Fontworks gallery
  • Updated document types where hyperlink is allowed
  • Updated Windows Installer
  • Increased font size in Help


In addition, the project is mitigating five CVE (Common Vulnerabilities and Exposures) reports, three of which will be disclosed on 11 October, in coordination with The Document Foundation.

Apache OpenOffice delivers up to 2.4M downloads per month and is available as a free download to all users at 100% no cost, charge, or fees of any kind.

Apache OpenOffice is available on the Windows 11 Store as of 5 October 2021.

OpenOffice source code is available for anyone who wishes to enhance the applications. The Project welcomes contributions back to the project as well as its code community. Those interested in participating with Apache OpenOffice can learn more at https://openoffice.apache.org/get-involved.html .

* partial count: the number above reflects full-install downloads of Apache OpenOffice via SourceForge as of September 2021.

Tribute
Of special note, Apache OpenOffice 4.1.11 is dedicated to the memory of Dr. Patricia Shanahan, late member of the Apache OpenOffice Project Management Committee, former member of the ASF Board of Directors, former Vice President Apache River, and contributor to Apache Community Development. More information on Patricia can be found at the ASF's memorial page http://apache.org/memorials/patricia_shanahan.html . 

Availability and Oversight
Apache OpenOffice software is released under the Apache License v2.0 and is overseen by a volunteer, self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. The project strongly recommends that users download OpenOffice only from the official site https://www.openoffice.org/download/ to ensure that they receive the original software in the correct and most recent version.

About Apache OpenOffice
Apache OpenOffice is a leading Open Source office-document productivity suite comprising six productivity applications: Writer, Calc, Impress, Draw, Math, and Base. OpenOffice is based around the OpenDocument Format (ODF), supports 40+ languages, and ships for Windows, macOS, and Linux. OpenOffice originated in 1985 as "StarOffice", created by StarDivision, which was acquired by Sun Microsystems in 1999. The project was open-sourced under the name "OpenOffice.org", and continued development after Oracle Corporation acquired Sun Microsystems in 2010. OpenOffice entered the Apache Incubator in 2011 and graduated as an Apache Top-level Project in October 2012. Apache OpenOffice delivers up to 2.4 million downloads each month and is the productivity suite of choice for hundreds of educational institutions and government organizations seeking to meet mandates for using ISO/IEC standard Open Document Format (ODF) files. For more information, including documentation and ways to become involved with Apache OpenOffice, visit https://openoffice.apache.org/ and https://twitter.com/ApacheOO .

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $22B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 850+ individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 8,200+ Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Huawei, IBM, Indeed, Microsoft, Namebase, Pineapple Fund, Red Hat, Replicated, Reprise Software, Talend, Target, Tencent Cloud, Union Investment, Workday, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF .

© The Apache Software Foundation. "Apache", "OpenOffice", "Apache OpenOffice", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

#  #  #

Tuesday September 21, 2021

Apache Ranger response to incorrect analyst report on Cloud data security

Introduction

A recent industry analyst report by GigaOm, sponsored by Immuta, comparing Apache Ranger to Immuta paints an incorrect picture of the complexities of using Apache Ranger. We believe the report contains a number of errors and inconsistencies. Unfortunately, the Apache Ranger Project Management Committee (PMC) was not contacted by the analyst firm during preparation of the report.


We have attempted to contact the authors and members of the research team several times, requesting the opportunity to review the inaccuracies and have them corrected. Despite our many attempts to rectify the misinformation, no one from the analyst firm responded.


For the benefit of existing and potential users of Apache Ranger, it is important for the Apache Ranger PMC to respond to this report with facts.


Use cases

Let us now go through the scenarios covered in the report and see how the reported numbers change with appropriate use of Apache Ranger to address the requirements.


  • Scenario 1b: Mask All PII Data

    • lists 2 policy changes in Immuta vs 5 in Apache Ranger. In fact, only one Apache Ranger policy would be needed to address this requirement. 

    • Shows the author's lack of understanding of the Apache Ranger policy model. The series of allow/deny/deny-exception steps listed applies only to an access policy, not to a masking policy. Also, in access policies, allow/deny/deny-exception can be replaced by a single switch named denyAllElse.

    • With the use of user groups or roles, a time-tested best practice followed universally by access control systems, this requirement can be met by a single Apache Ranger policy, as shown in the policy screenshots below.

      Masking policy: [screenshot]

      Access policy: [screenshot]


  • Scenario 1c: Allow Email Domains Through the Masking Policy

    • lists 2 policy changes in Immuta vs 5 in Apache Ranger. In fact, only one Apache Ranger masking policy would be needed to address this requirement, the same as in the previous scenario.

    • Claim: Apache Ranger does not have a regular expression masking policy

    • Truth: instead of building a virtualization layer that can introduce significant complexity and performance penalties, Apache Ranger uses the native capabilities of the data processing application to perform masking and filtering. Since regular expressions are supported by such applications, it is simpler to create a custom expression to suit your needs (email addresses, account numbers, credit card numbers), importantly without having to involve the security software vendor.
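
      As an illustration only (not from the report), and assuming the underlying engine supports a Hive-style regexp_replace function, a custom masking expression that hides the local part of an email address while letting the domain through could look like the following, where {col} is Ranger's placeholder for the column being masked:

      regexp_replace({col}, '^[^@]+', '*****')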


  • Scenario 1d: Add Two Users Access to All PII Data

    • lists 1 policy change in Immuta vs 4 in Apache Ranger. However, the following step from the report suggests that each user must be updated in the Immuta UI to add the necessary attributes. Wouldn't the number of steps be as large as the number of users?

      • Added the AuthorizedSensitiveData > All attribute to each user in the Immuta UI.

    • counts 4 policy changes in Apache Ranger, while the only change needed is to add the users (2 or 200!) to a group or role. No policy changes are needed if time-tested best practices are followed, by referencing groups or roles in policies instead of individual users.


  • Scenario 2a: Share Data With Managers

    • lists 1 policy change in Immuta vs 101 in Apache Ranger. With the use of lookup tables, a common practice in enterprises, the requirement can be met with a single row-filter policy in Apache Ranger, for example:

ss_store_sk in (select store_id from store_authorization where user_name=current_user())
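
The store_authorization lookup table referenced in the filter above is just an ordinary table in the underlying engine. As a sketch only (Hive-style DDL; the sample values are hypothetical), it could be created and populated like this:

CREATE TABLE store_authorization (store_id INT, user_name STRING);
INSERT INTO store_authorization VALUES (10, 'manager_east'), (11, 'manager_west');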


  • Scenario 2b: Merging Groups

    • lists 0 policy changes in Immuta vs 1 in Apache Ranger. This is the same as the previous scenario, where the author chose not to follow the common practice of using lookup tables. With the use of a lookup table, as detailed above, no policy changes would be needed in Apache Ranger.


  • Scenario 2c: Share Additional Data With Managers

    • lists 0 policy changes in Immuta vs 102 in Apache Ranger. Once again, with the use of a lookup table, only 2 policies would be required in Apache Ranger:

table store:
s_store_sk in (select store_id from store_authorization where user_name=current_user())

table store_returns:
sr_store_sk in (select store_id from store_authorization where user_name=current_user())


  • Scenario 2d: Reorganize Managers Into Regions

    • lists 0 policy changes in Immuta vs 40 in Apache Ranger. Same as the previous scenarios: with the use of a lookup table, no policy changes would be needed in Apache Ranger.


  • Scenario 2e: Restrict Data Access to Specific Countries

    • lists 1 policy change in Immuta vs 71 in Apache Ranger. With the use of a lookup table, only one row-filter policy is needed in Apache Ranger.
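
      For example, a sketch of such a row-filter policy expression, with hypothetical table and column names for the country lookup:

      c_country in (select country_code from country_authorization where user_name = current_user())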


  • Scenario 2f: Grant New User Group Access to All Rows by Default

    • lists 0 policy changes in Immuta vs 30 in Apache Ranger. With the use of a lookup table, no additional policy would be needed in Apache Ranger.


  • Scenario 2g: Apply Policies to a Derived Data Mart

    • lists 0 policy changes in Immuta vs 140 in Apache Ranger for the addition of 15 tables. With Apache Ranger, new tables can either be added to existing policies, or new policies can be created. It would require 15 policy updates in Apache Ranger, not 140 as claimed by the author. Also, no details are provided on the changes to be made in Immuta (other than ‘0 policy changes’).


  • Scenario 3a: "AND" logic policy

    • says "unable to meet requirement" in Apache Ranger - which is incorrect. The author does suggest a good approach to meet this requirement in Apache Ranger - by creating a role with users who are both the groups, and referencing this role in policies. However, the point about Apache Ranger not supporting policies based on a user belonging to multiple groups is correct. However, this can easily be addressed with a custom condition extension. If there is enough interest from the user community, an enhancement to support this condition out of the box would be considered.


  • Scenario 3b: Conditional Policies

    • says "unable to meet requirement" in Apache Ranger - which is incorrect. As mentioned earlier, Apache Ranger leverages expressions supported by underlying data processing engine for masking and row-filtering. The requirement can easily be met with following expression in the masking policy:

      CASE WHEN (extract(year FROM current_date()) - birth_year) > 16 THEN {col} ELSE NULL END


There is no need to create views as suggested in the report.
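
To make the effect concrete (a sketch; the table and column names other than birth_year are hypothetical), with such a masking policy on an email column, a query such as SELECT email FROM customers is effectively evaluated by the engine as:

      SELECT CASE WHEN (extract(year FROM current_date()) - birth_year) > 16 THEN email ELSE NULL END AS email
      FROM customers;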


  • Scenario 3c: Minimization Policies

    • as mentioned in the report, Apache Ranger doesn’t support policies that limit the number of records accessed. If there is enough interest from the user community, this enhancement would be considered.


  • Scenario 3d: De-Identification Policies

    • says “unable to meet requirement” in Apache Ranger, which is incorrect. While Apache Ranger doesn’t address k-anonymity directly, the requirements can be implemented using Apache Ranger data masking policies, by setting up appropriate masking expressions for columns.

      • for columns that require a NULL value to be returned, set up a masking policy with type MASK_NULL

      • for columns that require a constant value, set up a masking policy with type CONSTANT and specify the desired value, such as “NONE”

      • for columns that require a ‘generalized’ value based on the existing value of the column, use custom expressions as shown below. This does require analyzing the table to arrive at generalized values:
        CASE WHEN {col} < 20 THEN 16
            WHEN {col} BETWEEN 20 AND 29 THEN 26
            WHEN {col} BETWEEN 30 AND 39 THEN 36
            WHEN {col} BETWEEN 40 AND 49 THEN 46
            WHEN {col} BETWEEN 50 AND 59 THEN 56
            WHEN {col} BETWEEN 60 AND 69 THEN 66
            WHEN {col} BETWEEN 70 AND 79 THEN 76
            WHEN {col} BETWEEN 80 AND 89 THEN 86
            WHEN {col} BETWEEN 90 AND 99 THEN 96
            ELSE 106
        END

 

What the report doesn’t talk about

It is important to take note of what the report doesn’t talk about. For example:


Extensibility: Apache Ranger’s open policy model and plugin architecture enable extending access control to other applications, including custom applications within an enterprise.


Adoption: wide acceptance of Apache Ranger by major cloud vendors like AWS, Azure, and GCP, and the availability of support from seasoned industry experts who continue to contribute to Apache Ranger and extend its reach.


Performance: the Apache Ranger policy engine is highly optimized for performance, which results in only a very small overhead (typically around 1 millisecond) to authorize accesses; importantly, there is no overhead in the data access path.


Security zones: Apache Ranger’s security zones allow different sets of policies to be applied to data in landing, staging, temp, and production zones. A security zone can consist of resources across applications, for example: S3 buckets/paths, Solr collections, Snowflake tables, Presto catalogs/schemas/tables, Trino catalogs/schemas/tables, Apache Kafka topics, and Synapse databases/schemas/tables.



Monday August 30, 2021

The Apache Drill Project Announces Apache® Drill™ v1.19 Milestone Release

Open Source, enterprise-grade, schema-free Big Data SQL query engine used by thousands of organizations, including Ant Group, Cisco, Ericsson, Intuit, MicroStrategy, Tableau, TIBCO, TransUnion, Twitter, and more.

Wilmington, DE —30 August 2021— The Apache Drill Project announced the release of Apache® Drill™ v1.19, the schema-free Big Data SQL query engine for Apache Hadoop®, NoSQL, and Cloud storage.

"Drill 1.19 is our biggest release ever," said Charles Givre, Vice President of Apache Drill. "With an already short learning curve, Drill 1.19 makes it even easier for users to quickly query, analyze, and visualize data from disparate sources and complex data sets.”

An "SQL-on-Hadoop" engine, Apache Drill is easy to deploy, highly performant, able to quickly process trillions of records, and scalable from a single laptop to a 1000-node cluster. With its schema-free JSON model (the first distributed SQL query engine of its kind), Drill is able to query complex semi-structured data in situ without requiring users to define schemas or transform data. It provides plug-and-play integration with existing Hive and HBase deployments, and is extensible out-of-the-box to access multiple data sources, such as S3 and Apache HDFS, HBase, and Hive. Additionally, Drill can directly query data from REST APIs to include platforms like SalesForce and ServiceNow. 

Drill supports the ANSI SQL:2003 standard syntax as well as dozens of NoSQL databases and file systems, including Apache HBase, MongoDB, Elasticsearch, Cassandra, REST APIs, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, NAS, local files, and more. Drill leverages familiar BI tools (such as Apache Superset, Tableau, MicroStrategy, QlikView, and Excel) as well as data virtualization and visualization tools, and runs interactive queries on Hive tables with different Hive metastores.

Apache Drill v1.19
Drill is designed from the ground up to support high-performance analysis on rapidly evolving data on modern Big Data applications. v1.19 reflects more than 100 changes, improvements, and new features that include:

  • New Connectors for Apache Cassandra, Elasticsearch, and Splunk.

  • New Format Reader for XML without schemas

  • Added Avro support for Kafka plugin

  • Integrated password vault for secure credential storage

  • Support for Linux ARM64 systems

  • Added limit pushdowns for file systems, HTTP REST APIs and MongoDB

  • Added streaming for Drill's REST API

  • Integration with Apache Airflow


Developers, analysts, business users, and data scientists use Apache Drill for data exploration and analysis for its enterprise-grade reliability, security, and performance. Drill's flexibility and ease-of-use have attracted thousands of users that include Ant Group, Cardlytics, Cisco, Ericsson, Intuit, MicroStrategy, Qlik, Tableau, TIBCO, TransUnion, Twitter, National University of Singapore, and more.

"Individuals, businesses, and organizations of all types rely on Apache Drill's rich functionality," added Givre. "We invite everyone to participate in our user and developer lists as well as our Slack channel, and contribute to the project to build on our momentum and help improve the future experience for all Drill users."

Catch Apache Drill in action at ApacheCon@Home, taking place online 21-23 September 2021. For more information and to register, visit https://www.apachecon.com/ .

Availability and Oversight
Apache Drill software is released under the Apache License v2.0 and is overseen by a volunteer, self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases.

About Apache Drill
Apache Drill is the Open Source, schema-free Big Data SQL query engine for Apache Hadoop, NoSQL, and Cloud storage. For more information, including documentation and ways to become involved with Apache Drill, visit http://drill.apache.org/ , https://twitter.com/ApacheDrill , and https://apache-drill.slack.com/ .

© The Apache Software Foundation. "Apache", "Drill", "Apache Drill", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

#  #  #

Monday August 02, 2021

The Apache Software Foundation Announces Apache® Pinot™ as a Top-Level Project

Open Source distributed real-time Big Data analytics infrastructure in use at Amazon-Eero, Doordash, Factual/FourSquare, LinkedIn, Stripe, Uber, Walmart, Weibo, and WePay, among others.

Wilmington, DE —2 August 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Pinot™ as a Top-Level Project (TLP).

Apache Pinot is a distributed Big Data analytics infrastructure created to deliver scalable real-time analytics at high throughput with low latency. The project was first created at LinkedIn in 2013, open-sourced in 2015, and entered the Apache Incubator in October 2018.

"We are pleased to successfully adopt 'the Apache Way' and graduate from the Apache Incubator," said Kishore Gopalakrishna, Vice President and original co-creator of Apache Pinot. "Pinot initially pushed the boundaries of real-time analytics by delivering insights to millions of Linkedin users. Today, as an Apache Top-Level Project, Pinot is in the hands of developers across the globe who are building it to power several user-facing  analytical applications and unlock the value of data within their organizations."

Scalable to trillions of records, Apache Pinot’s online analytical processing (OLAP) ingests both online and offline data sources from Apache Kafka, Apache Spark, Apache Hadoop HDFS, flat files, and Cloud storage in real time. Pinot is able to ingest millions of events and serve thousands of queries per second, providing unified analytics in a distributed, fault-tolerant fashion. Features include:

  • Speed —answers OLAP queries with low latency on real-time data

  • Pluggable indexing —Sorted, Inverted, Text Index, Geospatial Index, JSON Index, Range Index, Bloom filters

  • Smart materialized views —fast aggregations via a star-tree index

  • Supports different stream systems with near real-time ingestion —with Apache Kafka, Confluent Kafka, and Amazon Kinesis, as well as customizable input formats, with out-of-the-box support for Avro and JSON formats

  • Highly available, horizontally scalable, and fault tolerant

  • Supports lookup joins natively and full joins using PrestoDB/Trino
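
As an illustration of the user-facing analytics Pinot serves (a sketch; the table and column names are hypothetical), a typical low-latency aggregation query is expressed in standard SQL:

SELECT country, COUNT(*) AS views
FROM pageViewEvents
WHERE daysSinceEpoch >= 18000
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 10;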

Apache Pinot is used to power internal and external analytics at Adbeat, Amazon-Eero, Cloud Kitchens, Confluera, Doordash, Factual/FourSquare, Guitar Center, LinkedIn, Publicis Sapient, Razorpay, Scale Unlimited, Startree, Stripe, Traceable, Uber, Walmart, Weibo, WePay, and more.

Examples of how Apache Pinot helps organizations across numerous verticals include: 1) a fintech company uses Pinot to achieve financial data visibility across 500+ terabytes of data and sustain half a million queries per second on financial transactions; 2) a food delivery service leveraged Pinot in the midst of the COVID-19 pandemic to analyze real-time data to provide a socially-distanced pick-up experience for its riders and restaurants; and 3) a large retail chain with geographically distributed franchises and stores uses Pinot for revenue-generating opportunities by analyzing real-time data for internal use cases, as well as real-time cart analysis to increase sales.

"We rely on Apache Pinot for all our real-time analytics needs at LinkedIn," said Kapil Surlaker, Vice President of Engineering at LinkedIn. "It's battle-tested at LinkedIn scale for hundreds of our low-latency analytics applications. We believe Apache Pinot is the best tool out there to build site-facing analytics applications and we will continue to contribute heavily and collaborate with the Apache Pinot community. We are very happy to see that it's now a Top-level Apache project."

"We use Apache Pinot in our real-time analytics platform to power external user-facing applications and critical operational dashboards," said Ujwala Tulshigiri, Engineering Manager at Uber. "With Pinot's multi-tenancy support and horizontal scalability, we have scaled to hundreds of use cases that run complex aggregations queries on terabytes of data at millisecond latencies, with the minimal overhead of cluster management."

"We've been using Apache Pinot since last year, and it's been a huge win for our client’s dashboard project," said Ken Krugler, President of Scale Unlimited. "Pinot's ability to rapidly generate aggregation results over billions of records, with modest hardware requirements, was critical for the success of the project. We've also been able to provide patches to add functionality and fix issues, which the Pinot community has quickly integrated and released. There was never any doubt in our minds that Pinot would graduate from the Apache incubator and become a successful top-level project."

"Last year, we started without analytics built into our product," said Pradeep Gopanapalli, technical staff member at Confluera. "By the end of the year, we were using Apache Pinot for real-time analytics in production. Not many of our competitors can even dream of having such results. We are very happy with our choice."

"Pinot is critical to our real-time analytics platform and allowed us to scale without degrading latency," said software engineer Elon Azoulay. "Pinot enables us to onboard large datasets effortlessly, run complex queries which return in milliseconds and is super reliable. We would like to emphasize how helpful and engaged the community is and are certain that we made the right choice with Pinot, it continues to impress us and satisfy our real-time analytics needs."

"We created Pinot at LinkedIn with the goal of tackling the low-latency OLAP problem for site-facing use cases at scale. We evolved it to solve numerous OLAP use cases, and open-sourced it because there aren't many technologies in that domain," said Subbu Subramaniam, member of the Apache Pinot Project Management Committee, and Senior Staff Engineer at LinkedIn. "It is heart-warming to see such a wide adoption and great contributions from the community in improving Pinot over time."

"We are at the beginning of this transformation and we cannot wait to see every software company build real-time applications using Apache Pinot," added Gopalakrishna. "We welcome everyone to join our community Slack channel and contribute to the project."

Catch Apache Pinot in action at ApacheCon Asia online on 7 August 2021. For more information and to register, visit https://www.apachecon.com/acasia2021/

Availability and Oversight
Apache Pinot software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Pinot, visit http://pinot.apache.org/ and https://twitter.com/ApachePinot

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $22B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 850+ individual Members and 200 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 8,200+ Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors that include Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Huawei, IBM, Indeed, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Talend, Tencent, Target, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Pinot", "Apache Pinot", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday July 27, 2021

The Apache Cassandra Project Releases Apache® Cassandra™ v4.0, the Fastest, Most Scalable and Secure Cassandra Yet

Open Source enterprise-grade Big Data distributed database powers mission-critical deployments with improved performance and unparalleled levels of scale in the Cloud

Wilmington, DE —27 July 2021— The Apache Cassandra Project released today v4.0 of Apache® Cassandra™, the Open Source, highly performant, distributed Big Data database management platform.

"A long time coming, Cassandra 4.0 is the most thoroughly tested Cassandra yet," said Nate McCall, Vice President of Apache Cassandra. "The latest version is faster, more scalable, and bolstered with enterprise security features, ready-for-production with unprecedented scale in the Cloud."

As a NoSQL database, Apache Cassandra handles massive amounts of data across load-intensive applications with high availability and no single point of failure. Cassandra’s largest production deployments include Apple (more than 160,000 instances storing over 100 petabytes of data across 1,000+ clusters), Huawei (more than 30,000 instances across 300+ clusters), and Netflix (more than 10,000 instances storing 6 petabytes across 100+ clusters, with over 1 trillion requests per day), among many others. Cassandra originated at Facebook in 2008, entered the Apache Incubator in January 2009, and graduated as an Apache Top-Level Project in February 2010.

Apache Cassandra v4.0
Cassandra v4.0 effortlessly handles unstructured data, with thousands of writes per second. Three years in the making, v4.0 reflects more than 1,000 bug fixes, improvements, and new features that include:

  • Increased speed and scalability – streams data up to 5 times faster during scaling operations and delivers up to 25% faster throughput on reads and writes, providing a more elastic architecture, particularly in Cloud and Kubernetes deployments.

  • Improved consistency – incremental repair has been optimized for faster, more efficient operation, keeping data replicas in sync.

  • Enhanced security and observability – audit logging tracks user access and activity with minimal impact to workload performance, and new capture-and-replay capabilities enable analysis of production workloads to help ensure regulatory and security compliance with SOX, PCI, GDPR, or other requirements.

  • New configuration settings – exposed system metrics and configuration settings provide flexibility for operators and easy access to the data needed to optimize deployments (see the example query after this list).

  • Minimized latency – garbage collector pause times are reduced to a few milliseconds with no latency degradation as heap sizes increase.

  • Better compression – improved compression efficiency eases unnecessary strain on disk space and improves read performance.
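
For instance, the exposed settings and metrics mentioned above can be read with ordinary CQL through the virtual tables introduced in 4.0 (a sketch, assuming a default 4.0 installation):

SELECT name, value FROM system_views.settings;
SELECT * FROM system_views.clients;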


Cassandra 4.0 is community-hardened and tested by Amazon, Apple, DataStax, Instaclustr, iland, Netflix, and others that routinely run clusters as large as 1,000 nodes and with hundreds of real-world use cases and schemas. 

The Apache Cassandra community deployed several testing and quality assurance (QA) projects and methodologies to deploy the most stable release yet. During the testing and QA period, the community generated reproducible workloads that are as close to real-life as possible, while effectively verifying the cluster state against the model without pausing the workload itself.

"In our experience, nothing beats Apache Cassandra for write scaling, and we're looking forward to the performance and management improvements in the 4.0 release," said Elliott Sims, Senior Systems Administrator at Backblaze. "We rely on Cassandra to manage over one exabyte of customer data and serve over 50 billion files for our customers across 175 countries so optimizing Cassandra's capabilities and performance means a lot to us."

"Since 2016, software engineers at Bloomberg have turned to Apache Cassandra because it’s easy to use, easy to scale, and always available," said Isaac Reath, Software Engineering Team Lead, NoSQL Infrastructure at Bloomberg. "Today, Cassandra is used to support a variety of our applications, from low-latency storage of intraday financial market data to high-throughput storage for fixed income index publication. We serve up more than 20 billion requests per day on a nearly 1 PB dataset across a fleet of 1,700+ Cassandra nodes."

"Netflix uses Apache Cassandra heavily to satisfy its ever-growing persistence needs on its mission to entertain the world. We have been experimenting and partially using the 4.0 beta in our environments and its features like Audit Logging and backpressure," said Vinay Chella, Netflix Engineering Manager and Apache Cassandra Committer. "Apache Cassandra 4.0's improved performance helps us reduce infrastructure costs. 4.0's stability and correctness allow us to focus on building higher-level abstractions on top of data store compositions, which results in increased developer velocity and optimized data store access patterns. Apache Cassandra 4.0 is faster, secure, and enterprise-ready; I highly suggest giving it a try in your environments today."

"Apache Cassandra's contributors have worked hard to deliver Cassandra 4.0 as the project's most stable release yet, ready for deployment to production-critical Cloud services," said Scott Andreas, Apache Cassandra Contributor. "Cassandra 4.0 also brings new features, such as faster host replacements, active data integrity assertions, incremental repair, and better compression. The project's investment in advanced validation tooling means that Cassandra users can expect a smooth upgrade. Once released, Cassandra 4.0 will also provide a stable foundation for development of future features and the database's long-term evolution."

Apache Cassandra is in use at Activision, Apple, Backblaze, BazaarVoice, Best Buy, Bloomberg Engineering, CERN, Constant Contact, Comcast, DoorDash, eBay, Fidelity, GitHub, Hulu, ING, Instagram, Intuit, Macy's, Macquarie Bank, Microsoft, McDonalds, Netflix, New York Times, Monzo, Outbrain, Pearson Education, Sky, Spotify, Target, Uber, Walmart, Yelp, and thousands of other companies that have large, active data sets. In fact, Cassandra is used by 40% of the Fortune 100. Select Apache Cassandra case studies are available at https://cassandra.apache.org/case-studies/ 

In addition to Cassandra 4.0, the Project also announced a shift to a yearly release cycle, with releases to be supported for a three-year term.

Catch Apache Cassandra in action through presentations from the April 2021 Cassandra World Party https://s.apache.org/jjv2d .

Availability and Oversight
Apache Cassandra software is released under the Apache License v2.0 and is overseen by a volunteer, self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Cassandra, visit https://cassandra.apache.org/ and https://twitter.com/cassandra .

About Apache Cassandra
Apache Cassandra is an Open Source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients. Apache Cassandra is used in some of the largest data management deployments in the world, including nearly half of the Fortune 100.

© The Apache Software Foundation. "Apache", "Cassandra", "Apache Cassandra", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

#  #  #

Tuesday May 04, 2021

Media Alert: Apache OpenOffice Recommends upgrade to v4.1.10 to mitigate legacy vulnerability

Wilmington, DE —4 May 2021— 


Who:
Apache OpenOffice, an Open Source office-document productivity suite comprising six productivity applications: Writer, Calc, Impress, Draw, Math, and Base. The OpenOffice suite is based around the OpenDocument Format (ODF), supports 41 languages, and ships for Windows, macOS, Linux 64-bit, and Linux 32-bit. Apache OpenOffice delivers up to 2.4 Million downloads each month.

What: A recently reported vulnerability states that all versions of OpenOffice through 4.1.9 can open non-http(s) hyperlinks in a way that could lead to untrusted code execution.

The Apache OpenOffice Project has filed a Common Vulnerabilities and Exposures report with MITRE Corporation’s national vulnerability reporting system:

> CVE-2021-30245: Code execution in Apache OpenOffice via non-http(s) schemes in Hyperlinks
>
> Severity: moderate
>
> Credit: Fabian Bräunlein and Lukas Euler of Positive Security https://positive.security/blog/url-open-rce#open-libreoffice


The complete CVE report is available at https://www.openoffice.org/security/cves/CVE-2021-30245.html

How: Applications of the OpenOffice suite handle non-http(s) hyperlinks in an insecure way, allowing for 1-click code execution on Windows and Xubuntu systems via malicious executable files hosted on Internet-accessible file shares.

Why: The mitigation in Apache OpenOffice 4.1.10 ensures that a security warning is displayed, giving users the option of whether to continue opening the hyperlink. Best practice dictates being careful when opening documents from unknown and unverified sources. 

When: The vulnerability predates OpenOffice entering the Apache Incubator. During the analysis of this issue, it was discovered that an incorrect bug fix was made by the StarOffice/OpenOffice.org developers preparing OpenOffice 2.0 in 2005, whilst under the auspices of Sun Microsystems. 


Where: Download Apache OpenOffice v4.1.10 at https://www.openoffice.org/download/

Apache OpenOffice Highlights

24 October 2020 — 300 million downloads of Apache OpenOffice
14 October 2020 — 20 year anniversary of OpenOffice
18 October 2016 — 200 million downloads of Apache OpenOffice
17 April 2014 — 100 million downloads of Apache OpenOffice
17 October 2012 — OpenOffice graduated as an Apache Top Level Project (TLP)
13 June 2011 — OpenOffice.org entered the Apache Incubator

[downloads are binary installation files]

For more information, visit https://openoffice.apache.org/ and https://twitter.com/ApacheOO

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 850+ individual Members and 200 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with more than 8,100 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "OpenOffice", "Apache OpenOffice", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Sunday April 11, 2021

The Apache Software Foundation Welcomes 40 New Members

The Apache Software Foundation (ASF) welcomes the following new Members who were elected during the annual ASF Members' Meeting on 9 and 11 March 2021:

Maxime Beauchemin, Bolke de Bruin, Wei-Chiu Chuang, Jiangjie (Becket) Qin, Pablo Estrada, Dave Grove, Madhawa Kasun Gunasekara, Nathan Hartman, Tilman Hausherr, Georg Henzler, Xiangdong Huang, Nikita Ivanov, Yu Li, Geoff Macartney, Denis A. Magda, Carl Marcum, Matteo Merli, Aaron Morton, Aizhamal Nurmamat kyzy, Enrico Olivelli, Jaikiran Pai, Juan Pan, Pranay Pandey, Arun Patidar, Jarek Potiuk, Rodric Rabbah, Katia Rojas, Maruan Sahyoun, Aditya Sharma, Atri Sharma, Ankit Singhal, Michael Adam Sokolov, Simon Steiner, Benoit Tellier, Josh Thompson, Abhishek Tiwari, Sven Vogel, William Guo Wei, Ming Wen, Andrew Wetmore, and Liang Zhang.

The ASF incorporated in 1999 with a core membership of 21 individuals who oversaw the progress of the Apache HTTP Server. This group grew with Committers —developers who contributed code, patches, documentation, and other contributions, and were subsequently granted access by the Membership:

  •  to "commit" or "write" directly to Apache code repositories as well as make non-code contributions;
  •  the right to vote on community-related decisions; and
  •  the ability to propose an active contributor for Committership.

Those Committers who demonstrate merit in the Foundation's growth, evolution, and progress are nominated for ASF Membership by existing Members.

This election brings the total number of ASF Members to 853 today. Individuals elected as ASF Members legally serve as the "shareholders" of the Foundation https://www.apache.org/foundation/governance/members.html

For more information on how the ASF works, visit http://www.apache.org/foundation/how-it-works.html 

Apache Is Open https://blogs.apache.org/foundation/entry/apache-is-open and 

Briefing: The Apache Way http://apache.org/theapacheway/

# # #

Thursday March 11, 2021

Announcing New ASF Board of Directors

At The Apache Software Foundation (ASF) Annual Members' Meeting held this week, the following individuals were elected to the ASF Board of Directors:

  • Bertrand Delacretaz (current Director)
  • Roy Fielding (current Director)
  • Sharan Foga (new Director)
  • Justin Mclean (current Director)
  • Craig Russell (current Director)
  • Sam Ruby (current Director)
  • Roman Shaposhnik (former Director)
  • Sander Striker (current Director)
  • Sheng Wu (new Director)


The ASF thanks Shane Curcuru, Patricia Shanahan, and Niclas Hedhman (who resigned from the Board prior to the Members’ Meeting) for their service, and welcomes our new and returning directors.

An overview of the ASF's governance, along with the complete list of ASF Board of Directors, Executive Officers, and Project/Committee Vice Presidents, can be found at http://apache.org/foundation/

For more information on the Foundation's operations and structure, see http://apache.org/foundation/how-it-works.html#structure

# # #

Tuesday February 23, 2021

The Apache® Software Foundation Sustains its Mission of Providing Software for the Public Good through Corporate Sponsorships and Charitable Giving

World's largest Open Source foundation provides more than $22B worth of community-led software at 100% no charge to users worldwide.

Wilmington, DE —23 February 2021— The Apache® Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Corporate Sponsorship and Charitable Giving has enabled the Foundation to sustain its mission of providing software for the public good.

The ASF is the world's largest Open Source foundation. Apache software projects are integral to nearly every end-user computing device, benefit billions of users worldwide, with Web requests received from every Internet-connected country on the planet. Valued conservatively at more than $22B, Apache Open Source software is available to the public-at-large at 100% no cost. No payment of any kind is ever required to use, contribute to, or otherwise participate in Apache projects. The ASF depends on tax-deductible Sponsorships and donations to offset its operations expenses that include infrastructure, marketing and publicity, accounting, and legal services.

"We are proud of our Sponsors, whose generous support helps our volunteer community continue to develop essential software that keeps the world running," said Daniel Ruggeri, ASF Vice President of Fundraising. "ASF Sponsorship allows us to make great strides towards developing and improving our projects, enriching our communities, educating and mentoring newcomers, and encouraging and facilitating participation by under-represented groups. Fiscal support today secures the groundwork to ensure future Apache benefits can be shared by all."

ASF Sponsors include:

Platinum —Amazon Web Services, Facebook, Google, Huawei, Microsoft, Namebase, Pineapple Fund, Tencent, and Verizon Media.

Gold —Anonymous, Baidu, Bloomberg, Cloudera, Confluent, IBM, Indeed, Reprise Software, Union Investment, and Workday.

Silver —Aetna, Alibaba Cloud Computing, Capital One, Comcast, Didi Chuxing, Red Hat, and Target.

Bronze —Bestecasinobonussen.nl, Bookmakers, Casino2k, Cerner, Curity, Gundry MD, GridGain, Host Advice, HotWax Systems, LeoVegas Indian Online Casino, Miro-Kredit AG, Mutuo Kredit AG, Online Holland Casino, ProPrivacy, PureVPN, RX-M, RenaissanceRe, SCAMS.info, SevenJackpots.com, Start a Blog by Ryan Robinson, Talend, The Best VPN, The Blog Starter, The Economic Secretariat, Top10VPN, and Twitter.

In addition to ASF Sponsors, Targeted Sponsors provide in-kind support for select Foundation operations and initiatives that benefit Apache Projects and their communities. They include:

Platinum —Amazon Web Services, CloudBees, DLA Piper, JetBrains, Leaseweb, Microsoft, OSU Open Source Labs, Sonatype, and Verizon Media.

Gold —Atlassian, Datadog, Docker, PhoenixNAP, and Quenda.

Silver —HotWax Systems, Manning Publications, and Rackspace.

Bronze —Bintray, Education Networks of America, Friend of Apache Cordova, Hopsie, Google, No-IP, PagerDuty, Peregrine Computer Consultants Corporation, Sonic.net, SURFnet, and Virtru.

"We deeply appreciate the ongoing support over the course of this unprecedentedly challenging year," said Sally Khudairi, ASF Vice President of Sponsor Relations. "Widespread awareness of the value of The Apache Software Foundation has led organizations and individuals to reach deep and help ensure our day-to-day operations continue without interruption. We are grateful and humbled by the support."

Corporate Contributions
In addition to Sponsorship, a variety of Corporate Giving programs benefit the ASF. They include:

Annual Corporate Giving —organizations such as Bloomberg, IBM, Microsoft, PayPal, Vanguard, and many others offer tax benefits and provide their employees the ability to boost their support of a diverse set of nonprofit organizations that include the ASF.

Matching Gifts and Volunteer Grants —donations to the ASF can be doubled or tripled through a corporate matching gift program. Employers such as American Express, AOL, Bloomberg, IBM, and Microsoft match contributions and volunteer hours made by their employees.

Charitable Gifts and Payroll Giving —as an official charity in Benevity https://www.benevity.com/ , the Blackbaud Giving Fund https://blackbaudgivingfund.org/ , and other philanthropic giving distributors, the ASF benefits from numerous corporate giving initiatives, such as the Microsoft Tech Talent for Good volunteer program and Charles Schwab Charitable, among others.

Individual Donations
Individuals and organizations wishing to support Apache with one-time and recurring tax-deductible donations using a credit or debit card, PayPal, ACH electronic bank transfer, or Apple/Google/Microsoft Pay on their mobile device are invited to do so at https://donate.apache.org/ . Supporting Apache through an online purchase from Amazon, using cryptocurrency, mailing in a check, and other methods are also possible.

For more information, including ways to support the ASF, visit http://apache.org/foundation/contributing.html

Learn about the ASF's commitment to providing software for the public good in "Apache Everywhere" https://s.apache.org/ApacheEverywhere

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $22B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,100 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Confluent, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Namebase, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF .


© The Apache Software Foundation. "Apache", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday February 16, 2021

The Apache Software Foundation Announces Apache® Gobblin™ as a Top-Level Project

Open Source distributed Big Data integration framework in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Labs, Swisscom, Verizon, and more.

Wilmington, DE —16 February 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Gobblin™ as a Top-Level Project (TLP).

Apache Gobblin is a distributed Big Data integration framework used in both streaming and batch data ecosystems. The project originated at LinkedIn in 2014, was open-sourced in 2015, and entered the Apache Incubator in February 2017.

"We are excited that Gobblin has completed the incubation process and is now an Apache Top-Level Project," said Abhishek Tiwari, Vice President of Apache Gobblin and software engineering manager at LinkedIn. "Since entering the Apache Incubator, we have completed four releases and grown our community the Apache Way to more than 75 contributors from around the world."

Apache Gobblin is used to integrate hundreds of terabytes and thousands of datasets per day by simplifying the ingestion, replication, organization, and lifecycle management processes across numerous execution environments, data velocities, scale, connectors, and more.

"Originally creating this project, seeing it come to life and solve mission-critical problems at many companies has been a very gratifying experience for me and the entire Gobblin team," said Shirshanka Das, Founder and CTO at Acryl Data, and member of the Apache Gobblin Project Management Committee.

As a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems, Apache Gobblin makes the arduous task of creating and maintaining a modern data lake easy. It supports the three main capabilities required by every data team: 

  • Ingestion and export of data from a variety of sources and sinks into and out of the data lake while supporting simple transformations. 
  • Data Organization within the lake (e.g. compaction, partitioning, deduplication).
  • Lifecycle and Compliance Management of data within the lake (e.g. data retention, fine-grain data deletions) driven by metadata.

"Apache Gobblin supports deployment models all the way from a single-process standalone application to thousands of containers running in cloud-native environments, ensuring that your data plane can scale with your company’s growth," added Das.

Apache Gobblin is in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Laboratories, Swisscom, and Verizon, among many others.

"We chose Apache Gobblin as our primary data ingestion tool at Prezi because it proved to scale, and it is a swiss army knife of data ingestion," said Tamas Nemeth, Tech Lead and Manager at Prezi. "Today, we ingest, deduplicate, and compact more than 1200 Apache Kafka topics with its help, and this number is still growing. We are looking forward to continuing to contribute to the project and helping the community enable other companies to use Apache Gobblin."

"Apache Gobblin has been at the center stage of the data management story at LinkedIn. We leverage it for various use-cases ranging from ingestion, replication, compaction, retention, and more," said Kapil Surlaker, Vice President of Engineering at LinkedIn. "It is battle-tested and serves us well at exabyte scale. We firmly believe in the data wrangling capabilities that Gobblin has to offer, and we will continue to contribute heavily and collaborate with the Apache Gobblin community. We are happy to see that Gobblin has established itself as an industry standard and is now an Apache Top-Level Project."

"Open community and meritocracy are the key drivers for Apache Gobblin's success," added Tiwari. "We invite everyone interested in the data management space to join us and help shape the future of Gobblin."

Catch Apache Gobblin in action in the upcoming hackathon planned for late Q1 2021. Details will be posted on the Apache Gobblin mailing lists and Twitter feed listed below.

Availability and Oversight
Apache Gobblin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Gobblin, visit https://gobblin.apache.org/ and https://twitter.com/ApacheGobblin 

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 

© The Apache Software Foundation. "Apache", "Gobblin", "Apache Gobblin", "Hadoop", "Apache Hadoop", "MapReduce", "Apache MapReduce", "Mesos", "Apache Mesos", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday February 03, 2021

The Apache Software Foundation Announces Apache® DataSketches™ as a Top-Level Project

Open Source high-performance Big Data streaming algorithm library in use at Nielsen Identity, Permutive, Splice Machine, and Verizon Media, among others.

Wilmington, DE —3 February 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® DataSketches™ as a Top-Level Project (TLP).

Apache DataSketches is a highly performant Big Data analysis library for scalable approximate algorithms. The project originated at Yahoo in 2012, was open-sourced in 2015, and entered the Apache Incubator in March 2019.

"We are excited to be part of the ASF," said Lee Rhodes, Vice President of Apache DataSketches. "We have learned a great deal from the incubation process and look forward to working with new users of our library that want to take advantage of sketching technology."

Apache DataSketches’ library of specialized streaming algorithms —known as sketches— comprises small data structures that process data at massive scale. Sketches are ideal for queries that cannot afford the time or huge compute resources needed to generate exact results. Where approximate results are acceptable, sketches are the only viable alternative for interactive queries with real-time analysis. Apache DataSketches is:

  • Fast —produces approximate results orders of magnitude faster than traditional methods, with a user-configurable size versus accuracy tradeoff;
  • Efficient —sketch algorithms process data in a single pass for both real-time and batch;
  • Mergeable —allows for parallelization;
  • Optimized for large-scale computing environments that process Big Data —such as Apache Hadoop, Apache Spark, Apache Druid, Apache Hive, Apache Pig, PostgreSQL;
  • Binary compatible across multiple languages and platforms —available in Java, C++, and Python;
  • Expanded Analysis —including count distinct with set operations, quantiles, most frequent items (heavy hitters), matrix computations, and more; and
  • Mathematically defined and proven error properties —provides a priori and a posteriori error estimation and upper and lower bounds with statistically derived confidence intervals.
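
As a rough illustration of what the feature list above means in practice, here is a minimal sketch using the project's Python bindings (published on PyPI as `datasketches`). The class and method names shown (`update_theta_sketch`, `theta_union`, `kll_floats_sketch`) are assumptions drawn from the project's Python API as documented and should be checked against https://datasketches.apache.org before use.

```python
# Minimal illustration of sketch-based approximate analytics.
# Assumes the Apache DataSketches Python bindings (`pip install datasketches`);
# class and method names below are assumptions based on the project's Python API
# and should be verified against the official documentation.
from datasketches import update_theta_sketch, theta_union, kll_floats_sketch

# Count distinct users seen by two independent workers (single pass, bounded memory).
site_a = update_theta_sketch()
site_b = update_theta_sketch()
for user_id in range(0, 100_000):
    site_a.update(user_id)
for user_id in range(50_000, 150_000):
    site_b.update(user_id)

# Sketches are mergeable, so partial results can be combined after the fact.
union = theta_union()
union.update(site_a)
union.update(site_b)
print("approx distinct users overall:", union.get_result().get_estimate())

# Quantiles over a stream without storing it: a KLL sketch of response times.
latencies = kll_floats_sketch(200)  # k controls the size/accuracy tradeoff
for ms in (12.0, 35.5, 7.2, 110.0, 48.3):
    latencies.update(ms)
print("approx median latency:", latencies.get_quantile(0.5))
```

Because the union of two theta sketches is itself a sketch, partial results computed on separate machines can be merged later, which is what makes this approach parallelizable.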

Apache DataSketches is used in large-scale computing environments such as Nielsen Identity, Permutive, Splice Machine, and Verizon Media, among others, as well as Apache Druid and Apache Pinot (incubating).

"The Apache DataSketches project takes powerful algorithms for data summarization and analysis, and makes them available to everyone," said Professor Graham Cormode of the University of Warwick. "While these methods are tremendously useful in practice, their descriptions were previously only in highly technical scientific papers. This project has made robust, dependable and well-documented implementations available to all. Already the library has been used for a wide range of applications, including service quality, monitoring, ad analytics and the sciences."

"Using Apache DataSketches has enabled Apache Druid users to perform common tasks such as quantiles and unique counting in a highly performant and efficient manner," said Gian Merlino, Vice President of Apache Druid. "We have worked closely together over the years to make the power of DataSketches accessible to Apache Druid users, helping us provide real-time analytics at scale."

"Sketches are fundamental to calculating many of our key company metrics," said Tom Miller, Director of Software Development Engineering at Verizon Media. "It allows us to greatly simplify our data processing and reduce storage costs by allowing us to calculate non-additive metrics across user specified dimension combinations at report time instead of having to either retain raw data or pre-calculate for each set of dimensions."

"Combining Apache Druid and DataSketches allows us to provide our customers real-time insights into their target audiences and advertising campaigns," said Yakir Buskilla, Senior Vice President of Research and Development and General Manager Israel at Nielsen Identity. "The ability to evaluate set expressions make the Theta Sketch especially powerful for multi-set cardinality estimation as well as funnel analysis."

“Apache DataSketches has provided us with a solid theoretical foundation upon which we are able to store and process data at scale - in a simple, fast and cost-efficient manner," said David Cromberge, Senior Software Engineer at Permutive. "It has been a pleasure to engage with their creators and community who have been helpful at every step of the way.”

"We use DataSketches's Theta-Sketches for distinct-count aggregations that are used to solve large multi-set cardinality approximation," said Mayank Shrivastava, Committer and member of the Apache Pinot (incubating) Podling Project Management Committee. "The ability to evaluate set expressions make the Theta Sketch especially powerful for multi-set cardinality estimation as well as funnel analysis."

"We welcome those interested in streaming algorithms to visit us, learn about this exciting technology, and contribute to Apache DataSketches to make our project even better," added Rhodes.

Availability and Oversight
Apache DataSketches software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache DataSketches, visit https://datasketches.apache.org .

About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ .

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF .

© The Apache Software Foundation. "Apache", "DataSketches", "Apache DataSketches", "Druid", "Apache Druid", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "Pig", "Apache Pig", "Pinot (incubating)", "Apache Pinot (incubating)", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday January 26, 2021

The Apache Software Foundation Announces Apache® ECharts™ as a Top-Level Project

Adaptable, interactive, responsive Open Source charting and data visualization software in use at Alibaba, Amazon, Baidu, GitLab, Intel, and Tencent, among others.


Wilmington, DE —26 January 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® ECharts™ as a Top-Level Project (TLP).

Apache ECharts is an intuitive, interactive, and powerful charting and visualization library ideally suited for commercial-grade presentations. The project originated in 2013 at Baidu and entered the Apache Incubator in January 2018.

"Our decision to incubate ECharts at The Apache Software Foundation was a wise one," said Ovilia Zhang, Vice President of Apache ECharts. "Through the Apache Way, our community is healthier and more diverse, which has improved ECharts to become a more attractive, competitive choice for visualization professionals and enthusiasts."

Written in JavaScript and based on the ZRender rendering engine supporting both Canvas and SVG, Apache ECharts provides an array of dynamic, highly-customizable chart types that include line, column, scatter, pie, radar, candlestick, gauge, funnel, heatmap, and more. Features include:

  • Customized and amalgamated chart styles with more than 20 chart types

  • Multi-dimensional data analysis and coding

  • Interactive components available out-of-the-box

  • Cross-device responsiveness

  • Optimized dynamic scaling

  • Server side rendering

  • Immediate UI response for millions of streaming data points through progressive rendering

  • Extensions for:

    • 3-D visualization and other rich special effects

    • Python, R, Julia, and other languages

    • Platforms that include the WeChat app and Baidu Smart Program


Examples of ECharts' many data visualization options are available at https://echarts.apache.org/examples/ 
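
To give a concrete sense of how ECharts is driven, the snippet below builds the declarative, JSON-like "option" object that a page passes to an initialized chart. The dict is written in Python purely for illustration and serialized to JSON; the keys (title, tooltip, xAxis, yAxis, series) are standard ECharts option fields, while the data values are invented for the example.

```python
# A minimal, illustrative ECharts configuration. ECharts charts are driven by a
# declarative, JSON-like "option" object; this Python dict mirrors that structure
# for readability and is serialized to JSON for embedding in a page.
import json

option = {
    "title": {"text": "Weekly downloads"},
    "tooltip": {"trigger": "axis"},
    "xAxis": {"type": "category", "data": ["Mon", "Tue", "Wed", "Thu", "Fri"]},
    "yAxis": {"type": "value"},
    "series": [
        {"name": "downloads", "type": "bar", "data": [120, 200, 150, 80, 170]},
    ],
}

# In the browser, the same structure is passed as a JavaScript object to an
# initialized chart, e.g. echarts.init(container).setOption(option).
print(json.dumps(option, indent=2))
```

This declarative option model is what makes charts straightforward to serialize, template, and render server side.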

The project has recently released ECharts 5, which provides rendering ability for tens of millions of data points, and supports accessibility requirements in compliance with W3C’s Web Accessibility Initiative Accessible Rich Internet Applications Suite (WAI-ARIA) standards.


Building on ECharts’ core features, ECharts 5 makes it even easier for developers to tell the story behind the data through 15 new features and improvements in storytelling and data expression, optimized visualization and responsive design, interaction and performance enhancement, developer experience, internationalization, and more.


Apache ECharts is in use at Alibaba, Amazon, Baidu, GitLab, Intel, and Tencent, among others, as well as solutions such as Apache Superset data visualization software. The project continues to grow in popularity, with more than 44,000 stars on GitHub and 25,000 weekly downloads on npm to date. 


"The world we live in today is powered by software and data," said Erica Brescia, COO of GitHub. "With Apache ECharts, developers around the world have access to a powerful, free and open source library for data visualization. It is great to see the project flourishing on GitHub. Congrats to the Apache ECharts on their graduation to a top level project at the Apache Software Foundation."


"Apache ECharts helps visualization experts and data analysts easily create a wide variety of visualizations that are very helpful for us to analyze and explore the story behind the data," said visualization academia pioneer Professor Wei Chen of Zhejiang University.


"We are glad to witness ECharts’ pleasant process in the Apache Incubator," said Ming Zu, Senior Manager at Baidu. "Our community grew with individuals from many countries and organizations, who contributed to bug fixing, issue resolving, and new feature implementation."


"When the Apache Superset community looked into visualization libraries to rebuild the core visualization plugins, ECharts stood out as the absolute best fit," said Maxime Beauchemin, original creator of both Apache Airflow and Superset, and serves as Vice President of Apache Superset. "It has an unparalleled variety of visualizations, a rich and composable visual grammar, an intuitive and well designed API, a flexible and performant rendering engine, a very lean tree of dependencies, and the important set of guarantees that the ASF provides when committing long term to using an Open Source project."


"It was a pleasure guiding the ECharts community through the Apache Incubator," said Dave Fisher, ASF Member and Apache ECharts Incubating Mentor. "They have embraced the Apache Way of community-led development, encouraging those interested in helping improve ECharts to contribute and become part of its growing community.”


"This is an exciting time for the ECharts community," added Zhang. "We are enjoying continued growth, and invite those interested in contributing to the project to join us on our developer and user lists."


See the range of options available with ECharts in "Apache ECharts in 5 minutes", a new video created by members of the Apache ECharts community (in Mandarin Chinese with English subtitles) https://youtu.be/nKKK0orjSq8 


Availability and Oversight

Apache ECharts software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache ECharts, visit http://echarts.apache.org and https://twitter.com/ApacheECharts


About the Apache Incubator

The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/ 


About The Apache Software Foundation (ASF)

Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF 


© The Apache Software Foundation. "Apache", "ECharts", "Apache ECharts", "Airflow", "Apache Airflow", "Superset", "Apache Superset", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.


# # #

Monday January 25, 2021

Apache Software Foundation Security Report: 2020

Synopsis: This report explores the state of security across all Apache Software Foundation projects for the calendar year 2020. We review key metrics, specific vulnerabilities, and the most common ways users of ASF projects were affected by security issues.


Released: January 2021


Author: Mark Cox, Vice President Security, Apache Software Foundation

Background

The security committee of the Apache Software Foundation (ASF) oversees and coordinates the handling of vulnerabilities across all of the 340+ Apache projects.  Established in 2002 and composed entirely of volunteers, we have a consistent process for how issues are handled, and this process includes how our projects must disclose security issues.


Anyone finding security issues in any Apache project can report them to security@apache.org where they are recorded and passed on to the relevant dedicated security teams or private project management committees (PMC) to handle.  The security committee monitors all the issues reported across all the addresses and keeps track of the issues throughout the vulnerability lifecycle.


The security committee is responsible for ensuring that issues are dealt with properly and will actively remind projects of their outstanding issues and responsibilities.  As a board committee, we have the ability to take action, including blocking a project's future releases or, in the worst case, archiving the project if it is unresponsive in handling its security issues.  This, along with the Apache Software License, is a key part of the ASF’s general oversight function around official releases, allowing the ASF to protect individual developers and giving users confidence to deploy and rely on ASF software.


The oversight into all security reports, along with tools we have developed, gives us the ability to easily create metrics on the issues.  Our last report covered the metrics for 2019.

Statistics for 2020

In 2020 our security email addresses received a total of 18,000 emails. After spam filtering and thread grouping, these came to 946 (2019: 620) non-spam threads.  Unfortunately many security reports do look like spam, so the security team are careful to review all messages to ensure real reports are not missed for too long.

Diagram 1: Breakdown of ASF security email threads for calendar year 2020


Diagram 1 gives the breakdown of those 946 threads.  257 threads (27%) were from people confused by the Apache License.  Because many projects use the Apache License, not just those under the ASF umbrella, people can see the license and not understand what it is.  This is most common, for example, on mobile phones, where licenses are displayed in the settings menu, usually due to the inclusion of software by Google released under the Apache License.  We no longer reply to these emails.  This is nearly double the number we saw in 2019.


The next 220 of the 946 (23%) were email threads with people asking non-security (usually support-type) questions.


The next 93 of those reports were researchers reporting issues in an Apache web site.  These are almost always false positives: for example, a researcher reports us having directory listings enabled, source code visible, or the lack of various domain headers.  These reports are generally the unfiltered output of some publicly available scanning tool, and the reporter often asks us for some sort of monetary reward (bounty) for their report.


That left 376 (2019: 320) reports of new vulnerabilities in 2020, spanning 101 of the top level projects.  These 376 reports are a mix of external and internal reporters; for example, where a project found an issue itself and followed the ASF process to assign it a CVE name and address it, we still count it here.  We don’t keep metrics that would give the breakdown of internal vs external reports.


The next step is for the appropriate project to triage the report and determine whether it is really an issue.  Invalid reports, and reports of things that are not actually vulnerabilities, are rejected back to the reporter.  The remaining accepted issues are assigned appropriate CVE names, and fixes are eventually released.


As of January 1st 2021, 35 of those 376 reports were still under triage (i.e. the project had not yet determined whether the report would be accepted or rejected).


The remaining 341 closed reports (2019: 301) led to us assigning 151 (2019: 122) CVE names.  Some vulnerability reports may include multiple issues, some reports span multiple projects, and some reports are duplicates where the same issue is found by different reporters, so there isn't an exact one-to-one mapping of accepted reports to CVE names.  The Apache Security Committee handles CVE name allocation and is a Mitre CVE Numbering Authority (CNA), so all requests for CVE names in any ASF project are routed through us, even if the reporter is unaware and contacts Mitre directly or goes public with an issue before contacting us.

Noteworthy events

During 2020 there were a few events worth discussing, either because they were severe and high risk, had readily available exploits, or attracted media attention. These included:

  • February: An issue in Tomcat CVE-2020-1938 gained press interest when it was given branding and a name (“Ghostcat”) and was disclosed by a third-party coordination centre before Tomcat released an advisory (although after the issue was fixed in new releases of Tomcat). Although serious if exploited, it only affected Tomcat installations which exposed an unprotected AJP Connector to untrusted networks (which is already not a good thing to do even without this issue). That limits the number of affected installations.  Various proof-of-concept exploits are public for this issue, including a Metasploit exploit.

  • July: Apache Guacamole versions 1.1.0 and earlier were vulnerable to issues in its RDP support, CVE-2020-9497 and CVE-2020-9498.  If a user connected to a malicious or compromised RDP server, these could lead to memory disclosure and possible remote code execution. 

  • August: A vulnerability in Apache Struts (CVE-2019-0230) could lead to arbitrary code execution. In order to exploit the vulnerability, an attacker would need to inject malicious Object-Graph Navigation Language (OGNL) expressions into an attribute used within an OGNL expression. Although Struts has mitigations to address potential injected expressions, versions before 2.5.22 left an attack vector open, which was fixed in updates for this issue.  A Metasploit exploit exists for this issue.

  • November: Previously, each ASF project was responsible for writing up its own CVE entries and submitting them to Mitre. This led to delays in the CVE database being updated with Apache issues, as entries were often rejected due to problems with the legacy format. We released an internal tool giving projects dealing with security issues a way to edit, validate, and submit their entries to Mitre.  We aim to have the CVE database updated within a day of an issue being published.

  • December: The CVE project released a new automation API and the ASF became the first organisation to get a live CVE name using it. Instead of the security team holding a pool of names requested in advance, we now allocate them on demand, with the service taking care of emails to the PMC and other previously manual parts of the process. We expect more automation to become available during 2021, allowing us to streamline the CVE process for projects even further.

Timescales

Our security teams and project management teams are all volunteers and so we do not give any formal SLA on handling of issues.  However we can break down our aims and goals for each part of the process:


Triage: Our aim is to handle incoming mail to the security@apache.org alias within three working days.  We do not measure or report on this because we assess the severity of each incoming issue and apply the limited resources we have appropriately.  The alias is staffed by a very small number of volunteers drawn from the different project PMCs.  After the security team forwards a report to a PMC, they will reply to the reporter.  Therefore, if you have reported an issue to us and not received any response after a week, please send us a follow-up email.  Sometimes reporters send reports attaching large PDF files or even movies of exploitation that don’t make it to us, so please ensure any follow-ups are simple plain-text emails.


Investigation: Once a report is sent to the private list of the project's management committee, the process of triage and investigation varies in time depending on the project, the availability of resources, and the number of issues to be assessed.  Because we send reports to this private list, they do not reach every project committer, so there is a much smaller set of people in each project able to investigate and respond.  As a general guideline we try to ensure projects have triaged issues within 90 days of the report.  The ASF security team chases any untriaged issues over 90 days old.


Fix: Once a security issue is triaged and accepted, the timeline for the fixing of issues depends on the schedules of the projects themselves.  Issues of lower severity are most often held to future pre-planned releases.  


Announcement: Our process allows projects up to a few days between a fix release being pushed and the announcement of the vulnerability, to let mirrors catch up.  All vulnerabilities are announced via the announce@apache.org list.  We now aim to have them appear in the public Mitre list within a day of the announcement.

Conclusion

Apache Software Foundation projects are highly diverse and independent.  They have different languages, communities, management, and security models.  However one of the things every project has in common is a consistent process for how reported security issues are handled. The ASF Security Committee works closely with the project teams, communities, and reporters to ensure that issues get handled quickly and correctly.  This responsible oversight is a principle of The Apache Way and helps ensure Apache software is stable and can be trusted.


This report gave metrics for calendar year 2020, showing that from the 18,000 emails received we triaged over 370 vulnerability reports relating to ASF projects, leading to 151 fixed issues being assigned CVE names. The number of non-spam threads dealt with was up 53% from 2019, with the number of actual vulnerability reports up 13% and assigned CVE names up 24%.


If you have vulnerability information you would like to share, or comments on this report, please contact us.


# # #

Thursday January 21, 2021

The Apache Software Foundation Announces Apache® Superset™ as a Top-Level Project

Open Source enterprise-grade Big Data visualization and business intelligence Web application in use at Airbnb, American Express, Dropbox, Lyft, Netflix, Nielsen, Rakuten Viki, Twitter, and Udemy, among others.

Wilmington, DE —21 January 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Superset™ as a Top-Level Project (TLP).

Apache Superset is a modern, Open Source data exploration and visualization platform that  enables users to easily and quickly build and explore dashboards using its simple no-code visualization builder and state-of-the-art SQL editor. The project originated at Airbnb in 2015 and entered into the Apache Incubator program in May 2017.

"It's been amazing to be an active part of growing a welcoming, diverse and engaged community over the past five years while following the ASF principles around inclusion, openness and collaboration," said Maxime Beauchemin, Vice President of Apache Superset. "At the scale and level of diversity that the Superset project has achieved, it's critical to have a solid governance model in place like the one prescribed by the ASF."

Apache Superset v1.0
Superset helps streamline the analytics process by providing an intuitive interface to rapidly explore and visualize datasets, create interactive dashboards, and model real-time business intelligence insights at scale. The platform integrates with most SQL-speaking data sources, including modern cloud-native databases, data warehouses, and engines at petabyte scale. 

The Project also celebrates a major milestone with the release of Apache Superset 1.0. Features include: 

  • Rich library of visualizations with support for integrating custom visualizations
  • Thin caching layer to optimize performance of charts and dashboards 
  • Code-free visualization builder
  • State-of-the-art SQL editor and metadata workflow
  • Extensible enterprise authentication and security model 
  • Easy-to-use, lightweight semantic layer
  • Notification alerts and scheduled reports

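The sketch below illustrates how a deployment might wire up the caching layer and feature toggles from the list above in a superset_config.py file. It is an illustrative configuration, not an official reference: the key names (SQLALCHEMY_DATABASE_URI, CACHE_CONFIG, FEATURE_FLAGS) reflect common Superset deployment practice, while the hostnames, credentials, and flag choices are invented for the example and should be verified against the Superset documentation for the release in use.

```python
# superset_config.py -- a minimal, illustrative deployment sketch, not an
# official reference. Key names reflect common Superset configuration
# (Flask-Caching style CACHE_CONFIG, FEATURE_FLAGS) but should be checked
# against the documentation for the version being deployed.

# Metadata database where Superset stores dashboards, charts, and users.
SQLALCHEMY_DATABASE_URI = "postgresql://superset:superset@metadata-db:5432/superset"

# The thin caching layer mentioned above: chart and dashboard results are
# cached (here in Redis) so repeated queries do not hit the warehouse.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,          # seconds
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://cache:6379/0",
}

# Optional features such as alerts, scheduled reports, and native dashboard
# filters are toggled here.
FEATURE_FLAGS = {
    "ALERT_REPORTS": True,
    "DASHBOARD_NATIVE_FILTERS": True,
}

# Analytic data sources (cloud warehouses, engines, databases) are registered
# separately in the UI or API using SQLAlchemy URIs, e.g.
# "postgresql://analyst:...@warehouse:5432/analytics".
```
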

"Apache Superset 1.0 is a solid, mature, self-standing solution that fully solves business intelligence and data visualization needs for modern data teams," added Beauchemin. "Superset not only covers the table stakes, but also offers guarantees, features and a fresh approach that existing BI solutions can't match."

Apache Superset is in use at Airbnb, American Express, Dropbox, Lyft, Netflix, Nielsen, Rakuten Viki, Twitter, and Udemy, among others. A list of known users is available at https://github.com/apache/superset/blob/master/INTHEWILD.md .

"Apache Superset helps Airbnb democratize data insights and make data-informed decisions," said Jeff Feng, Product Lead at Airbnb and member of the Apache Superset Project Management Committee. "Superset uniquely connects SQL analysis with data exploration for thousands of our employees each week. It also serves as a flexible and reliable platform for visualizing metrics, helping executives and knowledge workers see and understand data."

"We had an amazing journey with Superset at Dropbox," said Chloe Wang, Senior Product Manager, Data Insights Platform at Dropbox. "Superset got introduced in 2019 and soon became the most widely adopted query engine within the analytical organization. As a result, our analysts are able to make timely and high confidence product decisions."

"Before Superset, we were paying for a patchwork of proprietary tools and we kept running into limitations when it came to customizing charts and dashboards," said Amit Miran, Software Team Lead for Media Application Framework group at Nielsen. "Once the Superset project supported adding of custom visualizations, that was the turning point for us at Nielsen to start adopting Superset in large projects. We’re very excited about native dashboard filters and future support for cross filtering, which will make our viz plugins even more powerful. The excitement for the project drove me to become involved in my first open source project."

"Apache Superset is an amazing project that enables engineers to easily execute data analysis," said Grace Guo, member of the Apache Superset Project Management Committee. "I have been a Superset user and a Superset builder for a few years. I run queries in SQL Lab, visualize data using one of the many supported chart types, and build dashboards, specifically focusing on performance and product adoption metrics. As an engineer, I appreciate the ability to contribute to the product. If I see some area to improve, or need a feature which doesn’t exist, I am happy to create a PR to fix it for myself and benefit other users."

"Apache Superset’s strength lies in its community," added Beauchemin. "We invite those interested in data visualization to join our mailing lists and help shape future versions of Superset."

Learn more about the latest in v1.0 at the Apache Superset community global MeetUp on 28 January. Registration is open to all and free of charge https://s.apache.org/3cm4f 


Availability and Oversight
Apache Superset software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Superset, visit https://superset.apache.org/


About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF


© The Apache Software Foundation. "Apache", "Superset", "Apache Superset", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #
