Entries tagged [bigdata]
The Apache Software Foundation Announces Apache® Gobblin™ as a Top-Level Project
Open Source distributed Big Data integration framework in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Labs, Swisscom, Verizon, and more.
Wilmington, DE —16 February 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Gobblin™ as a Top-Level Project (TLP).
Apache Gobblin is a distributed Big Data integration framework used in both streaming and batch data ecosystems. The project originated at LinkedIn in 2014, was open-sourced in 2015, and entered the Apache Incubator in February 2017.
"We are excited that Gobblin has completed the incubation process and is now an Apache Top-Level Project," said Abhishek Tiwari, Vice President of Apache Gobblin and software engineering manager at LinkedIn. "Since entering the Apache Incubator, we have completed four releases and grown our community the Apache Way to more than 75 contributors from around the world."
Apache Gobblin is used to integrate hundreds of terabytes and thousands of datasets per day by simplifying the ingestion, replication, organization, and lifecycle management processes across numerous execution environments, data velocities, scale, connectors, and more.
"Originally creating this project, seeing it come to life and solve mission-critical problems at many companies has been a very gratifying experience for me and the entire Gobblin team," said Shirshanka Das, Founder and CTO at Acryl Data, and member of the Apache Gobblin Project Management Committee.
As a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems, Apache Gobblin makes the arduous task of creating and maintaining a modern data lake easy. It supports the three main capabilities required by every data team:
- Ingestion and export of data from a variety of sources and sinks into and out of the data lake while supporting simple transformations.
- Data Organization within the lake (e.g. compaction, partitioning, deduplication).
- Lifecycle and Compliance Management of data within the lake (e.g. data retention, fine-grained data deletions) driven by metadata.
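As a rough illustration of two of the capabilities listed above, deduplication during compaction and time-based retention, here is a minimal Python sketch. It does not use Gobblin's API; the function names and the record fields `id` and `updated_at` are hypothetical and chosen only for this example.

```python
def deduplicate(records, key="id", version="updated_at"):
    """Keep only the newest record per key (a simple form of compaction)."""
    latest = {}
    for record in records:
        current = latest.get(record[key])
        # Replace the stored record only if this one is strictly newer.
        if current is None or record[version] > current[version]:
            latest[record[key]] = record
    return list(latest.values())


def apply_retention(records, cutoff, version="updated_at"):
    """Drop records whose version timestamp is older than the cutoff."""
    return [r for r in records if r[version] >= cutoff]


if __name__ == "__main__":
    raw = [
        {"id": 1, "updated_at": 10, "value": "a"},
        {"id": 1, "updated_at": 20, "value": "b"},
        {"id": 2, "updated_at": 5, "value": "c"},
    ]
    compacted = deduplicate(raw)
    print(compacted)
    print(apply_retention(compacted, cutoff=10))
```

A production framework would apply these steps per dataset and drive them from metadata, as the list describes; the sketch only shows the record-level logic.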
"Apache Gobblin supports deployment models all the way from a single-process standalone application to thousands of containers running in cloud-native environments, ensuring that your data plane can scale with your company’s growth," added Das.
Apache Gobblin is in use at Apple, CERN, Comcast, Intel, LinkedIn, Nerdwallet, PayPal, Prezi, Roku, Sandia National Laboratories, Swisscom, and Verizon, among many others.
"We chose Apache Gobblin as our primary data ingestion tool at Prezi because it proved to scale, and it is a swiss army knife of data ingestion," said Tamas Nemeth, Tech Lead and Manager at Prezi. "Today, we ingest, deduplicate, and compact more than 1200 Apache Kafka topics with its help, and this number is still growing. We are looking forward to continuing to contribute to the project and helping the community enable other companies to use Apache Gobblin."
"Apache Gobblin has been at the center stage of the data management story at LinkedIn. We leverage it for various use-cases ranging from ingestion, replication, compaction, retention, and more," said Kapil Surlaker, Vice President of Engineering at LinkedIn. "It is battle-tested and serves us well at exabyte scale. We firmly believe in the data wrangling capabilities that Gobblin has to offer, and we will continue to contribute heavily and collaborate with the Apache Gobblin community. We are happy to see that Gobblin has established itself as an industry standard and is now an Apache Top-Level Project."
"Open community and meritocracy are the key drivers for Apache Gobblin's success," added Tiwari. "We invite everyone interested in the data management space to join us and help shape the future of Gobblin."
Catch Apache Gobblin in action in the upcoming hackathon planned for late Q1 2021. Details will be posted on the Apache Gobblin mailing lists and Twitter feed listed below.
Availability and Oversight
Apache Gobblin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Gobblin, visit https://gobblin.apache.org/ and https://twitter.com/ApacheGobblin
About the Apache Incubator
The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B worth of software to the public at no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with nearly 8,000 Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion-dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cloudera, Comcast, Didi Chuxing, Facebook, Google, Handshake, Huawei, IBM, Microsoft, Pineapple Fund, Red Hat, Reprise Software, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Gobblin", "Apache Gobblin", "Hadoop", "Apache Hadoop", "MapReduce", "Apache MapReduce", "Mesos", "Apache Mesos", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 02:00PM Feb 16, 2021
by Sally in General
The Apache Software Foundation Announces Apache® Superset™ as a Top-Level Project
Open Source enterprise-grade Big Data visualization and business intelligence Web application in use at Airbnb, American Express, Dropbox, Lyft, Netflix, Nielsen, Rakuten Viki, Twitter, and Udemy, among others.
Wilmington, DE —21 January 2021— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Superset™ as a Top-Level Project (TLP).
Apache Superset is a modern, Open Source data exploration and visualization platform that enables users to easily and quickly build and explore dashboards using its simple no-code visualization builder and state-of-the-art SQL editor. The project originated at Airbnb in 2015 and entered into the Apache Incubator program in May 2017.
"It's been amazing to be an active part of growing a welcoming, diverse and engaged community over the past five years while following the ASF principles around inclusion, openness and collaboration," said Maxime Beauchemin, Vice President of Apache Superset. "At the scale and level of diversity that the Superset project has achieved, it's critical to have a solid governance model in place like the one prescribed by the ASF."
Apache Superset v1.0
Superset helps streamline the analytics process by providing an intuitive interface to rapidly explore and visualize datasets, create interactive dashboards, and model real-time business intelligence insights at scale. The platform integrates with most SQL-speaking data sources, including modern cloud-native databases, data warehouses, and engines at petabyte scale.
The Project also celebrates a major milestone with the release of Apache Superset 1.0. Features include:
- Rich library of visualizations with support for integrating custom visualizations
- Thin caching layer to optimize performance of charts and dashboards
- Code-free visualization builder
- State-of-the-art SQL editor and metadata workflow
- Extensible enterprise authentication and security model
- Easy-to-use, lightweight semantic layer
- Notification alerts and scheduled reports
"Apache Superset 1.0 is a solid, mature, self-standing solution that fully solves business intelligence and data visualization needs for modern data teams," added Beauchemin. "Superset not only covers the table stakes, but also offers guarantees, features and a fresh approach that existing BI solutions can't match."
Apache Superset is in use at Airbnb, American Express, Dropbox, Lyft, Netflix, Nielsen, Rakuten Viki, Twitter, and Udemy, among others. A list of known users is available at https://github.com/apache/superset/blob/master/INTHEWILD.md .
"Apache Superset helps Airbnb democratize data insights and make data-informed decisions," said Jeff Feng, Product Lead at Airbnb and member of the Apache Superset Project Management Committee. "Superset uniquely connects SQL analysis with data exploration for thousands of our employees each week. It also serves as a flexible and reliable platform for visualizing metrics, helping executives and knowledge workers see and understand data."
"We had an amazing journey with Superset at Dropbox," said Chloe Wang, Senior Product Manager, Data Insights Platform at Dropbox. "Superset got introduced in 2019 and soon became the most widely adopted query engine within the analytical organization. As a result, our analysts are able to make timely and high confidence product decisions."
"Before Superset, we were paying for a patchwork of proprietary tools and we kept running into limitations when it came to customizing charts and dashboards," said Amit Miran, Software Team Lead for Media Application Framework group at Nielsen. "Once the Superset project supported adding of custom visualizations, that was the turning point for us at Nielsen to start adopting Superset in large projects. We’re very excited about native dashboard filters and future support for cross filtering, which will make our viz plugins even more powerful. The excitement for the project drove me to become involved in my first open source project."
"Apache Superset is an amazing project that enables engineers to easily execute data analysis," said Grace Guo, member of the Apache Superset Project Management Committee. "I have been a Superset user and a Superset builder for a few years. I run queries in SQL Lab, visualize data using one of the many supported chart types, and build dashboards, specifically focusing on performance and product adoption metrics. As an engineer, I appreciate the ability to contribute to the product. If I see some area to improve, or need a feature which doesn’t exist, I am happy to create a PR to fix it for myself and benefit other users."
"Apache Superset’s strength lies in its community," added Beauchemin. "We invite those interested in data visualization to join our mailing lists and help shape future versions of Superset."
Learn more about the latest in v1.0 at the Apache Superset community global MeetUp on 28 January. Registration is open to all and free of charge https://s.apache.org/3cm4f
Availability and Oversight
Apache Superset software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Superset, visit https://superset.apache.org/
© The Apache Software Foundation. "Apache", "Superset", "Apache Superset", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 02:00PM Jan 21, 2021
by Sally in General
The Apache Software Foundation Announces Apache® IoTDB™ as a Top-Level Project
Open Source Internet of Things-native database integrates with the Apache Big Data ecosystem for high-speed data ingestion, massive data storage, and complex data analysis in the cloud, in the field, and on the edge.
Wakefield, MA —23 September 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® IoTDB™ as a Top-Level Project (TLP).
Apache IoTDB is an Open Source IoT database designed to meet the rigorous data, storage, and analytics requirements of large-scale Internet of Things (IoT) and Industrial Internet of Things (IIoT) applications. The project was first developed as a research project at Tsinghua University and entered the Apache Incubator in November 2018.
"The Internet of Things, especially Industrial IoT, has swept the globe with unimaginable volumes of data," said Xiangdong Huang, Vice President of Apache IoTDB. "To date, both Relational and Key Value-based database solutions struggle to meet the demands of IoT data management. Apache IoTDB is the missing link between current IoT data and IoT applications, and is redefining how IoT data is managed, both in the cloud and on the edge. We are proud to graduate as an Apache Top-Level Project, which is an important milestone in our project’s maturity."
Apache IoTDB provides a compact, time series-optimized columnar data file that stores and accesses time series data efficiently. The database engine is specially optimized for time series-oriented operations, such as aggregation queries, down-sampling, and time-alignment queries. Due to its lightweight structure, high performance, and deep integration with Apache Big Data ecosystem projects (such as Flink, Hadoop, and Spark), Apache IoTDB easily meets the requirements of storing massive data sets, ingesting high-speed data, and analyzing complex data, both on the edge and in the cloud. Features include:
- High-throughput read and write: supports high-speed write access for millions of low-power and intelligently networked devices, and provides lightning-quick read access for retrieving data on billions of data points.
- Efficient directory structure: organizes the complex metadata of IoT devices and large-scale time series data, with a fuzzy search strategy for navigating complex time series directories.
- Rich query semantics: supports time alignment for time series data across devices and sensors, computation on time series fields, and a wide range of aggregation functions along the time dimension.
- Flexible deployment: supports running on the edge (e.g., running on a Raspberry Pi), as well as forming a cluster in the cloud. It also provides a bridge tool for synchronizing data between cloud platforms and on-premises machines.
- Deep integration with Open Source Big Data projects: supports analysis ecosystems, including Apache Flink, Hadoop, PLC4X and Spark, as well as other Open Source applications.
- Low hardware cost: achieves a high compression ratio for on-disk storage.
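To make two of the time series operations named above more concrete, the following minimal Python sketch shows what down-sampling and time alignment mean in principle. It is plain in-memory Python, not IoTDB's query language or client API; the function names and sensor names are hypothetical.

```python
from collections import defaultdict


def downsample(points, interval):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        # The bucket is identified by the start of its interval.
        buckets[ts - ts % interval].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def align(series_by_sensor):
    """Join several series on timestamp; missing readings become None."""
    timestamps = sorted({ts for pts in series_by_sensor.values() for ts, _ in pts})
    lookup = {name: dict(pts) for name, pts in series_by_sensor.items()}
    names = sorted(lookup)  # fixed column order: sensor names, alphabetically
    return [(ts, *[lookup[n].get(ts) for n in names]) for ts in timestamps]


if __name__ == "__main__":
    print(downsample([(0, 1.0), (5, 3.0), (10, 5.0)], interval=10))
    print(align({"temp": [(0, 20.0), (10, 21.0)], "speed": [(10, 3.0)]}))
```

A database engine performs these operations over compressed columnar files rather than Python lists, but the input and output shapes are the same idea: fewer, aggregated points per series, and one row per timestamp across sensors.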
Apache IoTDB is in use at dozens of organizations that include ArcelorMittal AMERICA, BONC Ltd., the China Meteorological Administration, Datang Xianyi, Goldwind, Haier, Lenovo, NAVINFO, pragmatic industries GmbH, Shanghai Metro, Tsinghua University, Yangtze Optical Fiber and Cable Company, and more.
"IoTDB has attained Apache Top Level project status at a time of confluence of database, IoT and AI technologies in conjunction with a wider adoption of Industry 4.0 and automation approaches to further enable remote work and increased efficiencies," said Prof. C. Mohan, recently retired IBM Fellow, Former Chief Scientist of IBM India, and a member of the US National Academy of Engineering. "I am excited since this is the first Chinese University originated open-source project to reach this status. While I have been associated with the researchers behind IoTDB as a Distinguished Visiting Professor of the School of Software at China's prestigious Tsinghua University, I have seen this project reach maturity and build up a vibrant OSS community around it. It has a bright future ahead of it and I plan to collaborate on it."
"Apache IoTDB is a perfect fit for edge computing," said Dr. Julian Feinauer, CEO at pragmatic industries GmbH. "The high compression helps to use the (limited) amount of memory we have very efficiently. IoTDB is a perfect fit, especially in IIoT use cases, where network and compute capabilities are limited on the edge."
"Apache IoTDB was initially launched by a Chinese University and then incubated successfully in the Apache Community," said Prof. Hong Mei, an academician of the Chinese Academy of Sciences. "Following the Apache Way, it has created a healthy and active international open source community. It is a successful practice of open source education and culture advancement in China."
"Apache IoTDB has made many optimizations for different runtime environments, operating systems, and workloads in both the edge and the cloud. As a core infrastructure software in Industrial Internet, it innovates a series of IoT data management and analysis techniques," said Prof. Xiangke Liao, an academician of the Chinese Academy of Engineering. "Through the open source model, Apache IoTDB shares its creative techniques to the world."
"With the continuous growth of intelligent devices, machine-generated data is growing day by day, which poses extraordinary challenges on storing process, query speed, and storage space," said Dawei Liu, architect at AutoAI Inc., a subsidiary of NAVINFO, and member of the Apache IoTDB Project Management Committee. "We tried and tested a variety of solutions and finally chose IoTDB as our core database for its high performance, openness to the enterprise, and its active community. We built our Wecloud platform based on Apache IoTDB, which has served well for BMW, Toyota, and Great Wall Motors, among other auto manufacturers. The project deeply attracted me to become a part of the community. The coolest thing is that I finally became an IoTDB committer and now share our ideas to the community."
"Apache IoTDB is an open source project and software technology innovation developed for the need of AIoT Big Data applications," said Prof. Jianmin Wang, Dean of the Tsinghua University School of Software, who originally decided to donate the project to the ASF. "It is also a very beneficial attempt for training leading talents. There will be a long way to go and the future is promising."
"Apache IoTDB is on its way to becoming a standard IoT data management and analysis solution, and we’re excited to build upon our work thus far," added Huang. "We believe Apache IoTDB will help more users and companies to solve their real problems. The process to achieve the goal is exciting and honorable, and we invite more contributors to join us. Following the Apache Way, let's bring this interesting, meaningful, and powerful software to the whole world."
A published paper on Apache IoTDB written by members of the Apache IoTDB Project Management Committee is available at http://www.vldb.org/pvldb/vol13/p2901-wang.pdf . An introduction to Apache IoTDB from ApacheCon Europe 2019 is available on Feathercast https://feathercast.apache.org/2019/09/12/hello-world-introducing-apache-iotdb-a-database-for-the-internet-of-things-xiangdong-huang-julian-feinauer/
Catch Apache IoTDB in action at ApacheCon@Home, 29 September-1 October 2020 https://www.apachecon.com/acah2020/tracks/iot.html
Availability and Oversight
Apache IoTDB software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache IoTDB, visit http://iotdb.apache.org/ and https://twitter.com/ApacheIoTDB
About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 227M+ lines of code and providing more than $20B worth of software to the public at no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,800+ Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion-dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Inspur, Pineapple Fund, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "IoTDB", "Apache IoTDB", "Flink", "Apache Flink", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:01AM Sep 23, 2020
by Sally in General
The Apache Software Foundation Announces the 10th Anniversary of Apache® HBase™
Open Source distributed, scalable Big Data store celebrates a decade of processing zettabytes of data across highly scalable large tables for the Apache Hadoop ecosystem
Wakefield, MA —13 May 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the tenth Anniversary of Apache® HBase™, the distributed, scalable data store for the Apache Hadoop Big Data ecosystem.
"The success of Apache HBase is the success of Open Source," said Duo Zhang, Vice President of Apache HBase. "Ten years after graduating as a TLP, HBase is still among the most active projects at the ASF. We have hundreds of contributors all around the world. We speak different languages, we have different skills, but we all work together to make HBase better and better. Ten year anniversary is not the end, but a new beginning, I believe our strong community will lead the project to a bright future."
HBase originated at Powerset in 2006 as an Open Source system to run on Apache Hadoop’s Distributed File System (HDFS), similar to how BigTable ran on top of the Google File System. In 2007, a significant code contribution was added to the Apache Hadoop codebase and was integrated into the Apache Hadoop 0.15.0 release later that year. Development on HBase continued as a sub-project of Apache Hadoop, and graduated as an Apache Top-Level Project (TLP) in April 2010.
An Open Source, versioned, non-relational database, Apache HBase provides low latency random access to very large tables —billions of rows and millions of columns— atop clusters of non-specialized, commodity hardware. HBase reads, writes, and processes structured, semi-structured, and unstructured data in real-time environments.
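The "versioned, non-relational" table model described above can be pictured as a sparse, nested map from row key to column to timestamped values. The Python sketch below is a toy in-memory model for illustration only; it is not the HBase client API, and the class and method names are hypothetical. The column name `info:email` follows the family:qualifier naming that HBase uses.

```python
from collections import defaultdict


class VersionedTable:
    """Toy sparse table: row key -> column -> {timestamp: value}."""

    def __init__(self):
        # Cells that were never written take no space, so rows may differ
        # widely in which columns they contain.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell without overwriting older ones."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column, as_of=None):
        """Return the newest value, or the newest at or before `as_of`."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if as_of is None or ts <= as_of]
        return versions[max(candidates)] if candidates else None


if __name__ == "__main__":
    table = VersionedTable()
    table.put("user1", "info:email", "old@example.org", timestamp=1)
    table.put("user1", "info:email", "new@example.org", timestamp=2)
    print(table.get("user1", "info:email"))
    print(table.get("user1", "info:email", as_of=1))
```

A real deployment distributes such rows across many machines and persists them on HDFS; the sketch only conveys the shape of the data model.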
Apache HBase is in use at thousands of organizations, including Adobe, Airbnb, Alibaba, Bloomberg, Flipkart, Huawei, HP, Hubspot, IBM, Microsoft, NetEase, Pinterest, Salesforce, Shopee, Tencent, Twitter, Xiaomi, and Yahoo! (now Verizon Media), among others.
Testimonials
"Congratulations on the 10th birthday of Apache HBase! Alibaba started to use HBase since January 2011 and has witnessed its growth and come along with the community through the years. The Apache HBase community has always been an open and powerful team that produced many stable, production-ready and widely used versions. Today at Alibaba, we have HBase clusters with more than 10k nodes serving hundreds of petabytes of data, as well as more than 1,000 enterprise HBase users on Alibaba Cloud. We will continue collaborating with and contributing to the HBase community and wish us all ongoing success in future!"
—Chunhui Shen and Yu Li, members of the HBase team at Alibaba
"I have worked with Apache HBase for many years and I think it is a great product. It does what it says on the tin, so to speak. Ironically if you look around the NoSQL competitors, most of them are supported by start-ups, whereas HBase is only supported as part of Apache suite of products by vendors like Cloudera, Hortonworks, MapR, etc. For those who would prefer to use SQL on top, there is Apache Phoenix around which makes life easier for the most SQL-savvy world to work on HBase: problem solved. For TCO, HBase is still value for money compared to others. You don't need expensive RAM or SSD with HBase. That makes it easy to onboard it in no time. Also HBase can be used in a variety of different business applications, whereas other commercial ones are focused on narrower niche markets. Last but not least, happy 10th anniversary and hope HBase will go from strength to strength and we will keep using it for years to come!"
—Dr. Mich Talebzadeh, Chief Data Architect, Big Data
"Congratulations on the 10th anniversary of Apache HBase! Xiaomi started to use HBase in 2012, when our business started booming. Many key Xiaomi products and services, as well as Xiaomi's data analytics platform, require a new system to provide quick and random access to billions of rows of structured and semi-structured data. Traditional solutions are not able to handle the large volume of data brought by the quickly increasing Xiaomi user base. Among several available options, we choose HBase not only because it provides a rich set of features and excellent performance specs, but also because it has a very active, open and friendly community. Embracing open source has been part of Xiaomi's engineering culture, and our deep involvement in the development of Apache HBase demonstrates the best practices of Xiaomi's open source strategy. In the past several years, we have contributed tons of bug fixes and important features to HBase, and, in the meantime, we have contributed 9 committers and 3 PMC members to the HBase community. Looking forward, we will continue to work closely with the Apache HBase community to help the project grow, and we wish Apache HBase a wonderful future!"
—Dr. Baoqiu Cui, Vice President of Xiaomi Corporation and Technical Committee Chairman
“Congratulations on the 10th anniversary of Apache HBase, it’s great to see how the project has developed over the years and continues to have good community support around it! Salesforce has a large global footprint of Apache HBase in production storing multiple petabytes of customer data and serving several billions of queries per day for a wide variety of use cases including security, monitoring, collaboration portals, and performance caches to scale over RDBMS limitations. HBase has played a major role in Salesforce’s customer success in the BigData storage space and we continue to invest in it as one of the pillars of our multi-substrate database strategy along with Apache Phoenix for SQL access to data stored in HBase. We have contributed many features and bug fixes to HBase over the last several years, and we look forward to continuing to work with the Apache HBase community to develop the project further. Here’s to many more successful years for Apache HBase!”
—Sanjeev Lakshmanan, Senior Director, Software Development, Salesforce
“Happy 10th Apache HBase! It was around 8 years ago that we started looking at HBase to include as part of our Hosted Big Data Services stack. Fast-forward to today and it continues to be a critical offering in our stack, powering a diverse set of use cases and workloads such as ad targeting, content personalization, analytics, security, monitoring, etc. HBase enables these diverse workloads thanks to its high scalability, feature set and performance, all of which have been continuously refined through the years. In turn our footprint continues to grow, storing petabytes of data across thousands of machines. Our success is in part thanks to the project’s success as we benefit from our collaborations, the contributions and other efforts by the community (e.g., mailing list, meetups, HBaseCon, etc). This is a testament to the open, friendly and dedicated community around Apache HBase which is necessary for the success of any open source project. We wish the project continued success for years to come as we continue to collaborate with and be part of the community cultivating the project.”
—Francis Liu and Thiruvel Thirumoolan, HBase Big Data Team Members, Yahoo! (now Verizon Media)
“Congratulations on the 10th anniversary of Apache HBase! It’s great to see how this project has evolved from a big data project to one that runs business critical systems and continues to accelerate with a growing community and increasing pace of development! Cloudera has over 500 customers in production using it for use cases ranging from mission critical transactional applications to supporting data warehousing. Our largest customers have footprints in excess of 7,000 nodes storing over 70PB of data. Our customers choose HBase because of its resilience, with some customers able to realize 100% application uptime using HBase (over the past 3 years). We plan to continue to invest in HBase (and Apache Phoenix) to ensure that we can continue to both broaden support for a variety of hybrid transactional and analytical use cases and deepen support for existing use cases. Here's to many more successful years!”
—Arun C. Murthy, Chief Product Officer, Cloudera
“Many congratulations to the Apache HBase community on the 10th anniversary. Apache HBase provides rich functions and excellent performance, and has an open and friendly community. Huawei has been using HBase since 2010: HBase is widely used by multiple solutions of Huawei running on more than 10,000 nodes, storing hundreds of PBs of data to meet our requirements. Huawei FusionInsight provides the Best Practices of Huawei for HBase, which serves a lot of customers across many industries such as finance, operators, government, energy, medical, manufacturing, and transportation. Meanwhile, Huawei team members contributed a lot of bug fixes and features to HBase, and successfully hosted the first HBase Asia Technology Conference, HBaseCon Asia 2017, at Shenzhen. Going forward, Huawei will continue to work closely with the Apache HBase community to promote community development.”
—Wei Zhi, Kai Mo and Pankaj Kumar, members of the HBase team at Huawei
“Happy 10th anniversary, HBase! At Ultra Tendency, you have been the backbone of our Dual Lambda Streaming Architecture for many years! You have served billions of queries to our customers without interruption and at low latency. Your architecture guaranteed that you were always there when we needed you, never letting us or our customers down. You are the reason why our European clients today are running flourishing new business models backed by low-latency streaming products. Our committers and contributors will continue to fix bugs and provide feature enhancements. Ultra Tendency wishes you a bright and successful future!”
—Jan Hentschel, Chief Information Officer, Ultra Tendency
“Congratulations on the 10th anniversary of Apache HBase. I can't believe it's been 10 years since the first day when I tried to use Apache HBase and its ecosystem to help the business and company. Also, it is so great to see many colleagues and friends work, discuss, and cooperate together to make this system better. Some of them have also made great career progress, and some are still progressing. Shopee, one of the biggest e-commerce platforms in Southeast Asia, has several large Apache HBase clusters in production to support businesses that depend on several billions of queries per day. Apache HBase has played a significant role in Shopee and it is still expanding along with the business growth of Shopee. Apache HBase, as well as the community, helps us a lot and we will also continue to make contributions to Apache HBase. We look forward to continuing to work with the Apache HBase community to develop the project and its ecosystem further.”
—Li Luo, Manager of Data Infra department, Shopee
“At Microsoft, our mission is to empower every person and every organization on the planet to achieve more, and it’s this mission that drives our commitment to open source. Congratulations to the Apache HBase community on its 10th anniversary. Microsoft has been part of the vibrant HBase community since 2014; today we are proud to serve the numerous enterprise customers across industries who are leveraging HBase in Azure HDInsight for their most critical business applications.”
—Tomas Talius, Director of Engineering, Azure Data Services, Microsoft
Availability and Oversight
Apache HBase software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache HBase, visit http://hbase.apache.org/ and https://twitter.com/HBase
About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 200M+ lines of code and providing $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,600+ Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, ODPi, Pineapple Fund, Private Internet Access, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "HBase", "Apache HBase", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 01:00PM May 13, 2020
by Sally in General
The Apache Software Foundation Announces Apache® ShardingSphere™ as a Top-Level Project
- ShardingSphere-JDBC —a lightweight Java framework that provides extra service at the Java JDBC (“Java Database Connectivity”) layer. It provides service in the form of JAR (“Java ARchive”) that requires no additional deployment or dependencies. It can be considered as an enhanced JDBC driver, which is fully compatible with JDBC and all kinds of ORM (Object/Relational Mapping) frameworks.
- ShardingSphere-Proxy —a database proxy that provides a database server encapsulating the database binary protocol, so that it can be used from any development language and any terminal.
- ShardingSphere-Sidecar (TODO) —a Cloud-native database agent of the Kubernetes environment that controls the access to the database in the form of sidecar (supporting services deployed with the main application). It provides a mesh layer interacting with the database, known as “Database Mesh”.
- Completely distributed database solution that provides data sharding, distributed transactions, data migration, as well as database and data governance features.
- Independent SQL parser for multiple SQL dialects that can be used independently of ShardingSphere.
- Pluggable micro-kernel that enables all SQL dialects, database protocols and features to be plugged-in and pulled-out by service provider interfaces.
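As a rough illustration of what data sharding means in the list above, here is a minimal Python sketch. It is not ShardingSphere's API: the function name, the `t_order` table prefix, and the modulo routing rule are all hypothetical, chosen only to show how a sharding key can select one of several physical tables.

```python
def route_to_shard(sharding_key: int, shard_count: int, logical_table: str = "t_order") -> str:
    """Pick a physical table for a row by taking the key modulo the shard count.

    Hypothetical helper for illustration only; real sharding middleware
    supports many more strategies (range, hash, time-based, custom).
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return f"{logical_table}_{sharding_key % shard_count}"


if __name__ == "__main__":
    # Rows with consecutive keys alternate between the two physical tables.
    for order_id in (10, 11, 12):
        print(order_id, "->", route_to_shard(order_id, 2))
```

With this rule, every query that carries the sharding key can be sent to exactly one physical table, which is the property a sharding layer relies on to spread load.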
Posted at 03:38PM Apr 16, 2020
by Sally in General
The Apache Software Foundation Announces Apache® Rya® as a Top-Level Project
Scalable Open Source Big Data database processes queries in milliseconds; used in autonomous drones, federated situation-aware access control systems, and petabyte-scale graph modeling, among many other applications.
Wakefield, MA —24 September 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Rya® as a Top-Level Project (TLP).
Posted at 12:59PM Sep 24, 2019
by Sally in General
The Apache® Software Foundation Announces Apache Arrow™ Momentum
- C# Library
- Gandiva LLVM-based Expression Compiler
- Go Library
- JavaScript Library
- Plasma Shared Memory Object Store
- Ruby Libraries (Apache Arrow and Apache Parquet)
- Rust Libraries (Parquet and DataFusion Query Engine)
Posted at 11:00AM Feb 19, 2019
by Sally in General
The Apache Software Foundation Announces Apache® Hadoop® v3.2.0
- ABFS Filesystem connector —supports the latest Azure Datalake Gen2 Storage;
- Enhanced S3A connector —including better resilience to throttled AWS S3 and DynamoDB IO;
- Node Attributes Support in YARN —helps to tag multiple labels on the nodes based on their attributes and supports placing the containers based on expressions over these labels;
- Storage Policy Satisfier —supports HDFS (Hadoop Distributed File System) applications to move the blocks between storage types as they set the storage policies on files/directories;
- Hadoop Submarine —enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on the very same Hadoop YARN cluster;
- C++ HDFS client —helps to do async IO to HDFS which helps downstream projects such as Apache ORC;
- Upgrades for long running services —supports in-place seamless upgrades of long running containers via YARN Native Service API (application program interface) and CLI (command-line interface).
Posted at 11:00AM Jan 23, 2019
by Sally in General
The Apache Software Foundation Announces Apache® Airflow™ as a Top-Level Project
Open Source Big Data workflow management system in use at Adobe, Airbnb, Etsy, Google, ING, Lyft, PayPal, Reddit, Square, Twitter, and United Airlines, among others.
Wakefield, MA —8 January 2019— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today Apache® Airflow™ as a Top-Level Project (TLP).
Apache Airflow is a flexible, scalable workflow automation and scheduling system for authoring and managing Big Data processing pipelines of hundreds of petabytes. Graduation from the Apache Incubator as a Top-Level Project signifies that the Apache Airflow community and products have been well-governed under the ASF's meritocratic process and principles.
"Since its inception, Apache Airflow has quickly become the de-facto standard for workflow orchestration," said Bolke de Bruin, Vice President of Apache Airflow. "Airflow has gained adoption among developers and data scientists alike thanks to its focus on configuration-as-code. That has gained us a community during incubation at the ASF that not only uses Apache Airflow but also contributes back. This reflects Airflow’s ease of use, scalability, and power of our diverse community; that it is embraced by enterprises and start-ups alike, allows us to now graduate to a Top-Level Project."
Apache Airflow is used to easily orchestrate complex computational workflows. Through smart scheduling, database and dependency management, error handling and logging, Airflow automates resource management, from single servers to large-scale clusters. Written in Python, the project is highly extensible and able to run tasks written in other languages, allowing integration with commonly used architectures and projects such as AWS S3, Docker, Apache Hadoop HDFS, Apache Hive, Kubernetes, MySQL, Postgres, Apache Zeppelin, and more. Airflow originated at Airbnb in 2014 and was submitted to the Apache Incubator in March 2016.
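The dependency management mentioned above can be pictured with a small sketch. This is not Airflow code and does not use Airflow's API; it only assumes Python 3.9+ and the standard-library `graphlib` module, and the task names in `PIPELINE` are hypothetical. It shows the core idea a workflow scheduler builds on: run each task only after everything it depends on has run.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return task names in an order that respects upstream dependencies.

    `dependencies` maps each task to the set of tasks it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: each task maps to the set of tasks it must wait for.
PIPELINE = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

if __name__ == "__main__":
    for task in execution_order(PIPELINE):
        print("running", task)
```

A real scheduler adds retries, logging, and parallel execution of independent tasks on top of this ordering step; the sketch covers only the ordering.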
Apache Airflow is in use at more than 200 organizations, including Adobe, Airbnb, Astronomer, Etsy, Google, ING, Lyft, NYC City Planning, PayPal, Polidea, Qubole, Quizlet, Reddit, Reply, Solita, Square, Twitter, and United Airlines, among others. A list of known users can be found at https://github.com/apache/incubator-airflow#who-uses-apache-airflow
"Adobe Experience Platform is built on cloud infrastructure leveraging open source technologies such as Apache Spark, Kafka, Hadoop, Storm, and more," said Hitesh Shah, Principal Architect of Adobe Experience Platform. "Apache Airflow is a great new addition to the ecosystem of orchestration engines for Big Data processing pipelines. We have been leveraging Airflow for various use cases in Adobe Experience Cloud and will soon be looking to share the results of our experiments of running Airflow on Kubernetes."
"Our clients just love Apache Airflow. Airflow has been a part of all our Data pipelines created in the past 2 years, acting as the ring-master and taming our Machine Learning and ETL Pipelines," said Kaxil Naik, Data Engineer at Data Reply. "It has helped us create a Single View for our client's entire data ecosystem. Airflow's Data-aware scheduling and error-handling helped automate the entire report generation process reliably without any human intervention. It easily integrates with Google Cloud (and other major cloud providers) as well and allows non-technical personnel to use it without a steep learning curve because of Airflow’s configuration-as-a-code paradigm."
"With over 250 PB of data under management, PayPal relies on workflow schedulers such as Apache Airflow to manage its data movement needs reliably," said Sid Anand, Chief Data Engineer at PayPal. "Additionally, Airflow is used for a range of system orchestration needs across many of our distributed systems: needs include self-healing, autoscaling, and reliable [re-]provisioning."
"Since our offering of Apache Airflow as a service in Sept 2016, a lot of big and small enterprises have successfully shifted all of their workflow needs to Airflow," said Sumit Maheshwari, Engineering Manager at Qubole. "At Qubole, not only are we a provider, but also a big consumer of Airflow as well. For example, our whole Insight and Recommendations platform is built around Airflow only, where we process billions of events every month from hundreds of enterprises and generate insights for them on big data solutions like Apache Hadoop, Apache Spark, and Presto. We are very impressed by the simplicity of Airflow and the ease with which it can be integrated with other solutions like clouds, monitoring systems or various data sources."
"At ING, we use Apache Airflow to orchestrate our core processes, transforming billions of records from across the globe each day," said Rob Keevil, Data Analytics Platform Lead at ING WB Advanced Analytics. "Its feature set, Open Source heritage and extensibility make it well suited to coordinate the wide variety of batch processes we operate, including ETL workflows, model training, integration scripting, data integrity testing, and alerting. We have played an active role in Airflow development from the onset, having submitted hundreds of pull requests to ensure that the community benefits from the Airflow improvements created at ING. We are delighted to see Airflow graduate from the Apache Incubator, and look forward to seeing where this exciting project will be taken in the future!"
"We saw immediately the value of Apache Airflow as an orchestrator when we started contributing and using it," said Jarek Potiuk, Principal Software Engineer at Polidea. "Being able to develop and maintain the whole workflow by engineers is usually a challenge when you have a huge configuration to maintain. Airflow allows your DevOps to have a lot of fun and still use the standard coding tools to evolve your infrastructure. This is 'infrastructure as a code' at its best."
"Workflow orchestration is essential to the (big) data era that we live in," added de Bruin. "The field is evolving quite fast and the new data thinking is just starting to make an impact. Apache Airflow is a child of the data era and therefore very well positioned, and is also young so a lot of development can still happen. Airflow can use bright minds from scientific computing, enterprises, and start-ups to further improve it. Join the community, it is easy to hop on!"
Availability and Oversight
Apache Airflow software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Airflow, visit http://airflow.apache.org/ and https://twitter.com/ApacheAirflow
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 7,000 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, Tencent, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Airflow", "Apache Airflow", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:00AM Jan 08, 2019
by Sally in General
The Apache Software Foundation Announces Apache® HAWQ® as a Top-Level Project
- Exceptional performance: parallel processing architecture delivers high performance throughput and low latency —potentially near real time— query responses that can scale to petabyte-sized datasets;
- Robust ANSI SQL compliance: leverage familiar skills. Achieve higher levels of compatibility for SQL-based applications and BI/data visualization tools. Execute complex queries and joins, including roll-ups and nested queries; and
- Apache Hadoop ecosystem integration: integrate and manage with Apache YARN. Provision with Apache Ambari. Interface with Apache HCatalog. Supports Apache Parquet, Apache HBase, and others. Easily scales nodes up or down to meet performance or capacity requirements.
Apache HAWQ is in use at Alibaba, Haier, VMware, ZTESoft, and by hundreds of users around the world.
"We admire Apache HAWQ's flexible framework and ability to scale up in a Cloud ecosystem. HAWQ helps those seeking a heterogeneous computing system to handle ad-hoc queries and heavy batch workloads," said Kuien Liu, Computing Platform Architect at Alibaba. "Alibaba encourages more and more engineers to continue to embrace Open Source, and Apache HAWQ stands out as a star project. We are proud to have been collaborating with this community since 2015."
"Apache HAWQ is an attractive technology for Big Data applications," said Zixu Zhao, Architect at ZTESoft. "HAWQ serves as the foundation of our Big Data platform and it has been used in a lot of applications, such as interactive analytics and BI on telecom data. We congratulate HAWQ on becoming an Apache Top-Level Project."
"Becoming an Apache Top-Level Project is an important milestone," added Chang. "There is much work ahead of us, and we look forward to growing the HAWQ community and codebase."
Apache HAWQ software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache HAWQ, visit http://hawq.apache.org/ and https://twitter.com/ApacheHAWQ
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 730 individual Members and 6,800 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Aetna, Anonymous, ARM, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Indeed, Inspur, LeaseWeb, Microsoft, Oath, ODPi, Pineapple Fund, Pivotal, Private Internet Access, Red Hat, Target, and Union Investment. For more information, visit http://apache.org/ and https://twitter.com/TheASF
# # #
Posted at 10:00AM Aug 23, 2018
by Sally in General
The Apache® Software Foundation Announces Agenda, Keynotes, and Sponsors for ApacheCon™ North America 2018
Community-driven conference series to gather dozens of Apache projects and their communities in Montréal to share and learn about the latest Open Source innovations in Big Data, Cloud, Finance, IoT, Machine Learning, Search, Servers, and more in a collaborative, vendor-neutral environment
Wakefield, MA —17 May 2018— The Apache® Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the program for its official conference series, ApacheCon™, taking place 24-27 September 2018 in Montréal, Canada.
- Cliff Schmidt, Apache Member, former ASF Board member, and Literacy Bridge founder on how Amplio uses technology to educate and improve the quality of life of people living in very difficult parts of the world.
- Myrle Krantz, Apache Member and Vice President Apache Fineract, on how Open Source banking is helping the global fight against poverty.
- Bridget Kromhout, Principal Cloud Developer Advocate at Microsoft, on the really hard problem in software: the people.
- Euan McLeod, VP VIPER at Comcast, on the many ways that Apache software delivers your favorite shows to your living room.
Posted at 10:50AM May 17, 2018
by Sally in General
The Apache Software Foundation Announces Apache® Oozie™ v5.0.0
- moved launcher from MapReduce mapper to YARN ApplicationMaster;
- switched from Tomcat 6 to embedded Jetty 9;
- updated third party libraries;
- completely rewritten workflow graph generator;
- JDK 8 support;
- deprecated Instrumentation in favor of Metrics;
- added indexes to speed up DB queries; and
- fixed CVE-2017-15712.
Posted at 02:06PM Apr 18, 2018
by Sally in General
The Apache Software Foundation Announces Apache® Hadoop® v3.0.0 General Availability
- HDFS erasure coding —halves the storage cost of HDFS while also improving data durability;
- YARN Timeline Service v.2 (preview) —improves the scalability, reliability, and usability of the Timeline Service;
- YARN resource types —enables scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
- Federation of YARN and HDFS subclusters transparently scales Hadoop to tens of thousands of machines;
- Opportunistic container execution improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and
- Improved capabilities and performance improvements for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage System.
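The claim in the list above that HDFS erasure coding halves storage cost can be checked with simple arithmetic. The sketch below assumes three-way replication as the baseline and a Reed-Solomon layout with 6 data units and 3 parity units; both figures are assumptions made for illustration, since the announcement does not state which configuration the comparison uses, and the function names are hypothetical.

```python
def raw_bytes_per_logical_byte_replication(replicas: int) -> float:
    """With N full copies, every logical byte occupies N raw bytes."""
    return float(replicas)


def raw_bytes_per_logical_byte_erasure(data_units: int, parity_units: int) -> float:
    """With erasure coding, each group of data units carries extra parity units."""
    return (data_units + parity_units) / data_units


if __name__ == "__main__":
    # Assumed baseline: 3 replicas. Assumed erasure-coding layout: 6 data + 3 parity.
    replicated = raw_bytes_per_logical_byte_replication(3)
    erasure = raw_bytes_per_logical_byte_erasure(6, 3)
    print(f"replication: {replicated:.1f}x raw storage")
    print(f"erasure coding: {erasure:.1f}x raw storage")
    print(f"ratio: {erasure / replicated:.2f}")
```

Under these assumptions the erasure-coded layout needs half the raw storage of the replicated one, while still tolerating the loss of any three units in a group.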
Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of known users at https://wiki.apache.org/hadoop/PoweredBy
"It's tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today's release," said Doug Cutting, original co-creator of Apache Hadoop. "With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems. The Open Source community continues to respond to industrial demands."
Apache Hadoop's diverse community enjoys continued growth amongst the ASF's most active projects, and remains at the forefront of more than three dozen Apache Big Data projects.
Apache Hadoop has received countless awards, including top prizes at the Media Guardian Innovation Awards and Duke's Choice Awards, and has been hailed by industry analysts:
"...the lifeblood of organizational analytics…" —Gartner
"Hadoop Is Here To Stay" —Forrester
"...today Hadoop is the only cost-sensible and scalable open source alternative to commercially available Big Data management packages. It also becomes an integral part of almost any commercially available Big Data solution and de-facto industry standard for business intelligence (BI)." —MarketAnalysis.com/Market Research Media
"...commanding half of big data’s $100 billion annual market value...Hadoop is the go-to big data framework." —BigDataWeek.com
"Hadoop, and its associated tools, is currently the 'big beast' of the big data world and the Hadoop environment is undergoing rapid development..." —Bloor Research
"The opportunity to effect meaningful, even fundamental change in the Apache Hadoop project remains open," added Douglas. "Our new contributors uprooted the project from its historical strength in Web-scale analytics by introducing powerful, proven abstractions for data management, security, containerization, and isolation. Apache Hadoop drives innovation in Big Data by growing its community. We hope this latest release continues to draw developers, operators, and users to the ASF."
Catch Apache Hadoop in action at the Strata Data Conference in San Jose, CA, 5-8 March 2018, and at dozens of Hadoop Meetups held around the world.
Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server —the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, Union Investment, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:00AM Dec 14, 2017
by Sally in General
The Apache Software Foundation Announces Apache® RocketMQ™ as a Top-Level Project
Forest Hill, MD –25 September 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® RocketMQ™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache RocketMQ is an Open Source distributed messaging and streaming Big Data platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
"I am very excited to see Apache RocketMQ as a Top-Level Project and I would like to thank our mentors for all their help, the Apache Incubator Project Management Committee for its advice and guidance, everyone in the RocketMQ community, and Alibaba for publishing the research upon which RocketMQ is based," said Xiaorui Wang, Vice President of Apache RocketMQ. "During the incubation process, the RocketMQ community worked very hard to develop high-quality distributed software for messaging and streaming, in an open and inclusive manner in accordance with the Apache Way."
Apache RocketMQ features include:
- Low latency; more than 99.6% response latency within 1 millisecond under high pressure;
- Finance-oriented, high availability with tracking and auditing features;
- Industry-sustainable, trillion-level message capacity guaranteed;
- Vendor-neutral, supporting multiple messaging protocols such as JMS and OpenMessaging;
- Big Data friendly, batch transferring with versatile integration for flooding throughput; and
- Massive accumulation: given sufficient disk space, it accumulates messages without performance loss.
"New participants are more than welcome to join the project. To serve the community better, we created and maintained two repositories, one as our kernel version and the other one is for community contributions. The community contributed some integrated projects with some other Apache TLPs like Apache Storm, Apache Ignite, Apache Spark and Apache Flume," said Xinyu "yukon" Zhou, member of the Apache RocketMQ Project Management Committee. "We enthusiastically look forward to working together with all contributors to Apache RocketMQ in order to advance the state-of-the-art distributed messaging engine."
Availability and Oversight
Apache RocketMQ software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache RocketMQ, visit http://rocketmq.apache.org/ and https://twitter.com/ApacheRocketMQ
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 650 individual Members and 6,200 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, Inspur, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "RocketMQ", "Apache RocketMQ", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 10:00AM Sep 25, 2017
by Sally in General
The Apache Software Foundation Announces Apache® MADlib™ as a Top-Level Project
Big Data machine-learning library used for scalable in-database analytics
Forest Hill, MD –22 August 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® MADlib™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache MADlib is a comprehensive library for scalable in-database analytics. It provides parallel implementations of machine learning, graph, mathematical and statistical methods for structured and unstructured data.
"Graduating as a Top-Level Project is a very important milestone for Apache MADlib," said Aaron Feng, Vice President of Apache MADlib. "During the incubation process, the MADlib community worked very hard to develop high quality software for in-database analytics, in an open and inclusive manner in accordance with the Apache Way."
MADlib grew out of discussions between database engine developers, data scientists, IT architects and academics interested in new approaches to scalable, sophisticated in-database analytics. These discussions were written up in a paper from VLDB 2009 [1] that coined the term "MAD Skills" for data analysis. The MADlib software project began the following year as a collaboration between researchers at UC Berkeley and engineers and computer scientists at Pivotal (formerly EMC/Greenplum). In September 2015, MADlib joined the ASF community as an incubating project.
MADlib is deployed on a wide variety of industry and academic projects across many different verticals, including automotive, consumer, finance, government, healthcare, and telecommunications.
"MADlib was conceived from the outset as an open-source meeting ground for software developers, computing researchers and data scientists to collaborate on scalable, in-database machine learning and statistics," said Joe Hellerstein, Professor of Computer Science at UC Berkeley, Co-Founder and Chief Strategy Officer at Trifacta, and one of the original authors of MADlib. "It has been great to witness the growth of the MADlib community and codebase as an ASF incubating project, and I look forward to this continuing as a Top-Level Project."
"At Pivotal, we have seen our customers successfully deploy MADlib on large scale data science projects across a wide variety of industry verticals," said Elisabeth Hendrickson, Vice President, R&D for Data at Pivotal. "As MADlib graduates to a Top-Level Project at the ASF, we anticipate increased adoption in the enterprise given the mature level of the codebase and the active developer community."
"The potential of the Apache MADlib project is unbounded," said Jim Jagielski, Vice Chairman of the ASF. "The ability to perform in-depth and detailed analytics, on both structured and unstructured data, using SQL enables MADlib to be applicable in scenarios where others simply can't compete. As not only interest in, but real-world usage of, machine learning becomes commonplace, MADlib joins the growing roster of Apache projects that define innovation."
"Apache MADlib is a great example of the diversity at Apache," said Ted Dunning, Apache MADlib Incubator Mentor and Member of the ASF Board of Directors. "MADlib does state-of-the-art machine learning, but does so as an inherent part of a database. This is a radical approach that can provide important design flexibility. I am excited to see MADlib become a fully fledged project at Apache."
"New participants are more than welcome to join the project," added Feng. "We enthusiastically look forward to working together with all contributors to Apache MADlib in order to advance the state-of-the-art of scale-out data science tools."
[1] http://dl.acm.org/citation.cfm?id=1687576
Availability and Oversight
Apache MADlib software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache MADlib, visit http://madlib.apache.org/ and https://twitter.com/ApacheMADlib
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
© The Apache Software Foundation. "Apache", "MADlib", "Apache MADlib", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 10:00AM Aug 22, 2017
by Sally in General