Entries tagged [hadoop]
The Apache Software Foundation Announces the 10th Anniversary of Apache® HBase™
Open Source distributed, scalable Big Data store celebrates a decade of processing zettabytes of data across highly scalable large tables for the Apache Hadoop ecosystem
Wakefield, MA —13 May 2020— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the tenth Anniversary of Apache® HBase™, the distributed, scalable data store for the Apache Hadoop Big Data ecosystem.
"The success of Apache HBase is the success of Open Source," said Duo Zhang, Vice President of Apache HBase. "Ten years after graduating as a TLP, HBase is still among the most active projects at the ASF. We have hundreds of contributors all around the world. We speak different languages, we have different skills, but we all work together to make HBase better and better. Ten year anniversary is not the end, but a new beginning, I believe our strong community will lead the project to a bright future."
HBase originated at Powerset in 2006 as an Open Source system to run on Apache Hadoop’s Distributed File System (HDFS), similar to how BigTable ran on top of the Google File System. In 2007, a significant code contribution was added to the Apache Hadoop codebase and was integrated into the Apache Hadoop 0.15.0 release later that year. Development on HBase continued as a sub-project of Apache Hadoop, and graduated as an Apache Top-Level Project (TLP) in April 2010.
An Open Source, versioned, non-relational database, Apache HBase provides low latency random access to very large tables —billions of rows and millions of columns— atop clusters of non-specialized, commodity hardware. HBase reads, writes, and processes structured, semi-structured, and unstructured data in real-time environments.
Apache HBase is in use at thousands of organizations, including Adobe, Airbnb, Alibaba, Bloomberg, Flipkart, Huawei, HP, Hubspot, IBM, Microsoft, NetEase, Pinterest, Salesforce, Shopee, Tencent, Twitter, Xiaomi, and Yahoo! (now Verizon Media), among others.
Testimonials
"Congratulations on the 10th birthday of Apache HBase! Alibaba started to use HBase since January 2011 and has witnessed its growth and come along with the community through the years. The Apache HBase community has always been an open and powerful team that produced many stable, production-ready and widely used versions. Today at Alibaba, we have HBase clusters with more than 10k nodes serving hundreds of petabytes of data, as well as more than 1,000 enterprise HBase users on Alibaba Cloud. We will continue collaborating with and contributing to the HBase community and wish us all ongoing success in future!"
—Chunhui Shen and Yu Li, members of the HBase team at Alibaba
"I have worked with Apache HBase for many years and I think it is a great product. it does what it says on the tin so to speak. Ironically if you look around the NoSQL competitors, most of them are supported by start-ups, whereas HBase is only supported as part of Apache suite of products by vendors like Cloudera, Hortonworks, MapR, etc. For those who would prefer to use SQL on top, there is Apache Phoenix around which makes life easier for the most SQL-savvy world to work on HBase: problem solved. For TCO, HBase is still value for money compared to others. You don't need expensive RAM or SSD with HBase. That makes it easy to onboard it in no time. Also HBase can be used in a variety of different business applications, whereas other commercial ones are focused on narrower niche markets. Least but last happy 10th anniversary and hope HBase will go from strength to strength and we will keep using it for years to come!"
—Dr. Mich Talebzadeh, Chief Data Architect, Big Data
"Congratulations on the 10th anniversary of Apache HBase! Xiaomi started to use HBase in 2012, when our business started booming. Many key Xiaomi products and services, as well as Xiaomi's data analytics platform, require a new system to provide quick and random access to billions of rows of structured and semi-structured data. Traditional solutions are not able to handle the large volume of data brought by the quickly increasing Xiaomi user base. Among several available options, we choose HBase not only because it provides a rich set of features and excellent performance specs, but also because it has a very active, open and friendly community. Embracing open source has been part of Xiaomi's engineering culture, and our deep involvement in the development of Apache HBase demonstrates the best practices of Xiaomi's open source strategy. In the past several years, we have contributed tons of bug fixes and important features to HBase, and, in the meantime, we have contributed 9 committers and 3 PMC members to the HBase community. Looking forward, we will continue to work closely with the Apache HBase community to help the project grow, and we wish Apache HBase a wonderful future!"
—Dr. Baoqiu Cui, Vice President of Xiaomi Corporation and Technical Committee Chairman
“Congratulations on the 10th anniversary of Apache HBase, it’s great to see how the project has developed over the years and continues to have good community support around it! Salesforce has a large global footprint of Apache HBase in production storing multiple petabytes of customer data and serving several billions of queries per day for a wide variety of use cases including security, monitoring, collaboration portals, and performance caches to scale over RDBMS limitations. HBase has played a major role in Salesforce’s customer success in the BigData storage space and we continue to invest in it as one of the pillars of our multi-substrate database strategy along with Apache Phoenix for SQL access to data stored in HBase. We have contributed many features and bug fixes to HBase over the last several years, and we look forward to continue working with the Apache HBase community to develop the project further. Here’s to many more successful years for Apache HBase!”
—Sanjeev Lakshmanan, Senior Director, Software Development, Salesforce
“Happy 10th Apache HBase! It was around 8 years ago that we started looking at HBase to include as part of our Hosted Big Data Services stack. Fast-forward to today and it continues to be a critical offering in our stack, powering a diverse set of use cases and workloads such as ad targeting, content personalization, analytics, security, monitoring, etc. HBase enables these diverse workloads thanks to it’s high-scalability, feature set and performance, all of which have been continuously refined through the years. In turn our footprint continues to grow storing petabytes of data across thousands of machines. Our success is in part thanks to the project’s success as we benefit from our collaborations, the contributions and other efforts by the community (eg mailing list, meetups, HBaseCon, etc). This is a testament to the open, friendly and dedicated community around Apache HBase which is necessary for the success of any open source project. We wish the project continued success for years to come as we continue to collaborate with and be part of the community cultivating the project.”
—Francis Liu and Thiruvel Thirumoolan, HBase Big Data Team Members, Yahoo! (now Verizon Media)
“Congratulations on the 10th anniversary of Apache HBase! It’s great to see how this project has evolved from a big data project to one that runs business critical systems and continues to accelerate with a growing community and increasing pace of development! Cloudera has over 500 customers in production using it for a range of use cases ranging from mission critical transactional applications to supporting data warehousing. Our largest customers have footprints in excess of 7,000 nodes storing over 70PB of data. Our customers choose HBase because of its resilience with some customers able to realize 100% application uptime using HBase (over the past 3 years). We plan to continue to invest in HBase (and Apache Phoenix) to ensure that we can continue to both broaden support for a variety of hybrid transactional and analytical use cases and deepen support for existing use cases. Here's to many more successful years!"
—Arun C. Murthy, Chief Product Officer, Cloudera
“Many Congratulations to the Apache HBase community on the 10th anniversary. Apache HBase provides rich functions and excellent performance, and has an open and friendly community. Huawei started using HBase since 2010: HBase is widely used by multiple solutions of Huawei running on more than 10,000 nodes, storing hundreds of PBs data to meet our requirements. Huawei FusionInsight provides the Best Practices of Huawei for HBase, which serves a lot of customers across many industries such as finance, operators, government, energy, medical, manufacturing, and transportation. Meanwhile, Huawei team members contributed a lot of bug fixes and features to HBase, successfully hosted the first HBase Asia Technology Conference HBaseCon Asia 2017 at Shenzhen. Going forward, Huawei will continue to work closely with the Apache HBase community to promote community development.”
—Wei Zhi, Kai Mo and Pankaj Kumar, members of the HBase team at Huawei
“Happy 10th anniversary, HBase! At Ultra Tendency, you have been the backbone of our Dual Lambda Streaming Architecture for many years! You have served billions of queries to our customers without interruption and at low latency. Your architecture guaranteed that you were always there when we needed you, never letting us or our customers down. You are the reason why our European clients today are running flourishing new business models backed by low-latency streaming products. Our committers and contributors will continue to fix bugs and provide feature enhancements. Ultra Tendency wishes you a bright and successful future!”
—Jan Hentschel, Chief Information Officer, Ultra Tendency
“Congratulations on the 10th anniversary of Apache HBase, I can't believe it's been 10 years since the first day when I tried to use Apache HBase and its ecosystem to help the business and company. Also, it is so great to see many colleagues and friends work, discuss, cooperate together to make this system become better. Some of them also make great career development and some are still progress. Shopee, one of the biggest e-commerce platforms in Southeast Asia, has several large Apache HBase clusters in production to support businesses that depend on several billions of queries per day. Apache HBase has played a significant role in Shopee and it is still in expansion along with the business growth of Shopee. Apache HBase, as well as the community, helps us a lot and we also will continue to make contributions to Apache HBase. Looking forward to keeping working with the Apache HBase community to develop the project and its ecosystem further.”
—Li Luo, Manager of Data Infra department, Shopee
”At Microsoft, our mission is to empower every person and every organization on the planet to achieve more, and it’s this mission that drives our commitment to open source. Congratulations to the Apache HBase community on its 10th anniversary. Microsoft has been part of the vibrant HBase community since 2014, today we are proud to serve the numerous enterprise customers across industries who are leveraging HBase in Azure HDInsight for their most critical business applications.”
—Tomas Talius, Director of Engineering, Azure Data Services, Microsoft
Availability and Oversight
Apache HBase software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache HBase, visit http://hbase.apache.org/ and https://twitter.com/HBase
About The Apache Software Foundation (ASF)
Established in 1999, The Apache Software Foundation (ASF) is the world’s largest Open Source foundation, stewarding 200M+ lines of code and providing more than $20B+ worth of software to the public at 100% no cost. The ASF’s all-volunteer community grew from 21 original founders overseeing the Apache HTTP Server to 813 individual Members and 206 Project Management Committees who successfully lead 350+ Apache projects and initiatives in collaboration with 7,600+ Committers through the ASF’s meritocratic process known as "The Apache Way". Apache software is integral to nearly every end user computing device, from laptops to tablets to mobile devices across enterprises and mission-critical applications. Apache projects power most of the Internet, manage exabytes of data, execute teraflops of operations, and store billions of objects in virtually every industry. The commercially-friendly and permissive Apache License v2 is an Open Source industry standard, helping launch billion dollar corporations and benefiting countless users worldwide. The ASF is a US 501(c)(3) not-for-profit charitable organization funded by individual donations and corporate sponsors including Aetna, Alibaba Cloud Computing, Amazon Web Services, Anonymous, ARM, Baidu, Bloomberg, Budget Direct, Capital One, CarGurus, Cerner, Cloudera, Comcast, Facebook, Google, Handshake, Huawei, IBM, Indeed, Inspur, Leaseweb, Microsoft, ODPi, Pineapple Fund, Private Internet Access, Red Hat, Target, Tencent, Union Investment, Verizon Media, and Workday. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "HBase", "Apache HBase", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 01:00PM May 13, 2020
by Sally in General |
|
The Apache Software Foundation Announces Apache® Hadoop® v3.2.0
- ABFS Filesystem connector —supports the latest Azure Datalake Gen2 Storage;
- Enhanced S3A connector —including better resilience to throttled AWS S3 and DynamoDB IO;
- Node Attributes Support in YARN —helps to tag multiple labels on the nodes based on its attributes and supports placing the containers based on expression of these labels;
- Storage Policy Satisfier —supports HDFS (Hadoop Distributed File System) applications to move the blocks between storage types as they set the storage policies on files/directories;
- Hadoop Submarine —enables data engineers to easily develop, train and deploy deep learning models (in TensorFlow) on very same Hadoop YARN cluster;
- C++ HDFS client —helps to do async IO to HDFS which helps downstream projects such as Apache ORC;
- Upgrades for long running services —supports in-place seamless upgrades of long running containers via YARN Native Service API (application program interface) and CLI (command-line interface).
Posted at 11:00AM Jan 23, 2019
by Sally in General |
|
The Apache Software Foundation Announces Apache® Oozie(TM) v5.0.0
- moved launcher from MapReduce mapper to YARN ApplicationMaster;
- switched from Tomcat 6 to embedded Jetty 9;
- updated third party libraries;
- completely rewritten workflow graph generator;
- JDK 8 support;
- deprecated Instrumentation in favor of Metrics;
- added indexes to speed up DB queries; and
- fixed CVE-2017-15712
Posted at 02:06PM Apr 18, 2018
by Sally in General |
|
The Apache Software Foundation Announces Apache® Trafodion™ as a Top-Level Project
Apache Trafodion extends Apache Hadoop to guarantee transactional integrity and operational workloads for new kinds of Big Data applications that run on Hadoop.
"We are very excited to have been established as an Apache Top-Level Project," said Pierre Smits, Vice President of Apache Trafodion. "Graduation is a terrific milestone that culminates 2.5 years of contributions from around the globe to establishing a growing community committed to delivering a high-grade OLTP solution on top of the Apache Hadoop ecosystem."
- Fully functional ANSI SQL support, leveraging existing SQL skills;
- Distributed ACID data protection, guaranteeing data consistency across multiple tables and rows;
- Compile-Time and Run-Time Optimizers, delivering performance improvements for OLTP workloads;
- Parallel-aware Query Optimizer, supporting large data sets;
- Apache Spark integration, supporting streaming analysis;
- Interoperability with existing Apache Hadoop tools and solutions, such as Hive, Ambari, Flume, Kafka, and Oozie; and
- Apache Hadoop and Linux distribution neutrality.
Apache Trafodion is in use at China Mobile, China Unicom, Dell EMC, Esgyn Corporation, and Millersoft Limited, among others.
"As a member of the HP Core Team responsible for releasing Trafodion to The Apache Software Foundation, and responsible for the project’s name, I'm thrilled to see the Trafodion community be recognized with this major achievement. Congratulations to all who made it possible," said Ken Holt, COO at Esgyn Corporation. "Trafodion is the heart of EsgynDB, and the community is like its lifeblood — we at Esgyn are committed to continue to grow and support the community."
"Congratulations to the Trafodion community for becoming an Apache Top-Level Project," said Tianduo Gao, Senior Development Engineer of Software Technology (Suzhou) at China Mobile. "We are planning to use Trafodion to expand the business of China Mobile's Big Data platform: our data statistics of 4G real-time business in the country and provinces are more efficient than ever before."
"Becoming a core Apache Project is a major step forward for Trafodion. It will give Millersoft the confidence to introduce the technology to our Big Data clients," said Calum Miller, Director of Millersoft Limited. "Testing of our Open Source Data Vault engine running on top of Apache Trafodion is going well and we look forward to announcing a fully integrated product shortly."
"Apache Trafodion enhanced the operational efficiency of our Big Data platforms, and brought us better customer experience and broader application scenarios," said Charles Yu, Managing Director, Application Services at Dell EMC.
"Congratulations to Trafodion for officially becoming part of the Apache open source ecosystem," said Qingquan Gu, Senior Development Engineer of Internet of Things Marketing Service Center at China Unicom. "Using Trafodion provided China Unicom with the ability to build and integrate Big Data platforms, enhanced our operational efficiency, and brought us better customer experience."
"Becoming an Apache Top-Level Project is only the beginning," added Smits. "We are looking forward to growing the Trafodion community, reaching new adopters and contributors, and fostering a strong ecosystem around the project."
Availability and Oversight
Apache Trafodion software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Trafodion, visit http://trafodion.apache.org/ and https://twitter.com/Trafodion
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,300 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hewlett Packard, Hortonworks, Huawei, IBM, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, Union Investment, WANdisco, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Trafodion", "Apache Trafodion", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:00AM Jan 10, 2018
by Sally in General |
|
The Apache Software Foundation Announces Apache® Hadoop® v3.0.0 General Availability
- HDFS erasure coding —halves the storage cost of HDFS while also improving data durability;
- YARN Timeline Service v.2 (preview) —improves the scalability, reliability, and usability of the Timeline Service;
- YARN resource types —enables scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
- Federation of YARN and HDFS subclusters transparently scales Hadoop to tens of thousands of machines;
- Opportunistic container execution improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and
- Improved capabilities and performance improvements for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage System.
Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of known users at https://wiki.apache.org/hadoop/PoweredBy
"It's tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today's release," said Doug Cutting, original co-creator of Apache Hadoop. "With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems. The Open Source community continues to respond to industrial demands."
Apache Hadoop's diverse community enjoys continued growth amongst the ASF's most active projects, and remains at the forefront of more than three dozen Apache Big Data projects.
Apache Hadoop has received countless awards, including top prizes at the Media Guardian Innovation Awards and Duke's Choice Awards, and has been hailed by industry analysts:
"...the lifeblood of organizational analytics…" —Gartner
"Hadoop Is Here To Stay" —Forrester
"...today Hadoop is the only cost-sensible and scalable open source alternative to commercially available Big Data management packages. It also becomes an integral part of almost any commercially available Big Data solution and de-facto industry standard for business intelligence (BI)." —MarketAnalysis.com/Market Research Media
"...commanding half of big data’s $100 billion annual market value...Hadoop is the go-to big data framework." —BigDataWeek.com
"Hadoop, and its associated tools, is currently the 'big beast' of the big data world and the Hadoop environment is undergoing rapid development..." —Bloor Research
"The opportunity to effect meaningful, even fundamental change in the Apache Hadoop project remains open," added Douglas. "Our new contributors uprooted the project from its historical strength in Web-scale analytics by introducing powerful, proven abstractions for data management, security, containerization, and isolation. Apache Hadoop drives innovation in Big Data by growing its community. We hope this latest release continues to draw developers, operators, and users to the ASF."
Catch Apache Hadoop in action at the Strata Data Conference in San Jose, CA, 5-8 March 2018, and at dozens of Hadoop Meetups held around the world.
Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server —the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, Union Investment, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:00AM Dec 14, 2017
by Sally in General |
|
The Apache Software Foundation Announces Apache® Impala™ as a Top-Level Project
- A familiar SQL interface that data scientists and analysts already know;
- The ability to query high volumes of data (Big Data) in Apache Hadoop;
- Distributed queries in a cluster environment, for convenient scaling and to make use of cost-effective commodity hardware;
- The ability to share data files between different components with no copy or export/import step; for example, to write with Apache Pig, transform with Hive and query with Impala. Impala can read from and write to Hive tables, enabling simple data interchange using Impala for analytics on Hive-produced data; and
- A single system for big data processing and analytics, so customers can avoid costly modeling and ETL just for analytics.
"We use Apache Impala to boost performance of our SQL queries against our data lake," said Matteo Coloberti, Head of Analytics at Jobrapido. "Impala is an incredible service that gives us impressive performance on queries."
"We used to distribute Microsoft Excel reports to clients every one or two days but now they can search on their own by customer, sales deal, or even service type," said Andy Frey, CTO of Marketing Associates. "Apache Impala is used to query millions of rows to identify specific records that match the clients' criteria. We've even given clients a 'Query Hadoop' option that allows them to create simple SQL statements and query Hadoop directly via Impala. We're able to offer a faster, richer, and more accurate selection of services without the labor or latency concerns that we used to have."
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
© The Apache Software Foundation. "Apache", "Impala", "Apache Impala", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "Kudu", "Apache Kudu", "Pig", "Apache Pig", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:00AM Nov 28, 2017
by Sally in General |
|
The Apache Software Foundation Announces Momentum With Apache® Hadoop® v2.8
Major release of the cornerstone of the Big Data ecosystem, from which dozens of Apache Big Data projects and countless industry solutions originate.
Forest Hill, MD —5 June 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today momentum with Apache® Hadoop® v2.8, the latest version of the Open Source software framework for reliable, scalable, distributed computing.
Now ten years old, Apache Hadoop dominates the greater Big Data ecosystem as the flagship project and community amongst the ASF's more than three dozen projects in the category.
"Apache Hadoop 2.8 maintains the project's momentum in its stable release series," said Chris Douglas, Vice President of Apache Hadoop. "Our community of users, operators, testers, and developers continue to evolve the thriving Big Data ecosystem at the ASF. We're committed to sustaining the scalable, reliable, and secure platform our greater Hadoop community has built over the last decade."
Apache Hadoop supports processing and storage of extremely large data sets in a distributed computing environment. The project has been regularly lauded by industry analysts worldwide for driving market transformation. Forrester Research estimates that firms will spend US$800M in Hadoop software and related services in 2017. According to Zion Market Research, the global Hadoop market is expected to reach approximately US$87.14B by 2022, growing at a CAGR of around 50% between 2017 and 2022.
- Several important security related enhancements, including Hadoop UI protection of Cross-Frame Scripting (XFS) which is an attack that combines malicious JavaScript with an iframe that loads a legitimate page in an effort to steal data from an unsuspecting user, and Hadoop REST API protection of Cross site request forgery (CSRF) attack which attempt to force an authenticated user to execute functionality without their knowledge.
- Support for Microsoft Azure Data Lake as a source and destination of data. This benefits anyone deploying Hadoop in Microsoft's Azure Cloud. The Azure Data Lake service was actually developed for Hadoop and analytics workloads.
- The "S3A" client for working with data stored in Amazon S3 has been radically enhanced for scalability, performance, and security. The performance enhancements were driven by Apache Hive and Apache Spark benchmarks. In Hive TCP-DS benchmarks, Apache Hadoop is currently faster working with columnar data stored in S3 than Amazon EMR's closed-source connector. This shows the benefit of collaborative Open Source development.
- Several WebHDFS related enhancements include integrated CSRF prevention filter in WebHDFS, support OAuth2 in WebHDFS, disallow/allow snapshots via WebHDFS, and more.
- Integration with other applications has been improved with a separate jar for the hdfs-client than the hadoop-hdfs JAR with all the server side code. Downstream projects that access HDFS can depend on the hadoop-hdfs-client module to reduce the amount of transitive classpath dependencies.
- YARN NodeManager Resource Reconfiguration through RM Admin CLI for a live cluster that allows YARN clusters to have a more flexible resource model especially for a Cloud deployment.
In addition to physical Hadoop clusters, where the majority of storage and computation lies, Apache Hadoop is very popular within Cloud infrastructures. Contributions from Apache Hadoop's diverse community includes improvements provided by Cloud infrastructure vendors and large Hadoop-in-Cloud users. These improvements include: Azure and S3 storage and YARN reconfiguration in particular, improve Hadoop's deployment on and integration with Cloud Infrastructures. The improvements in Hadoop 2.8 enable Cloud-deployed clusters to be more dynamic in sizing, adapting to demand by scaling up and down.
"My colleagues and I are happy that tests of Apache Hive and Hadoop 2.8 show that we are able to provide a similar experience reading data in from S3 as Amazon EMR, with its closed-source fork/rewrite of S3," said Steve Loughran, member of the Apache Hadoop Project Management Committee.
Hailed as a "Swiss army knife of the 21st century" by the Media Guardian Innovation Awards and "the most important software you’ve never heard of…helped enable both Big Data and Cloud computing" by author Thomas Friedman, Apache Hadoop is used by an array of companies such as Alibaba, Amazon Web Services, AOL, Apple, eBay, Facebook, foursquare, IBM, HP, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, SAP, Tencent, Teradata, Tesla Motors, Uber, and Twitter. Yahoo, an early pioneer, hosts the world's largest known Hadoop production environment to date, spanning more than 38,000 nodes.
Catch Apache Hadoop in action at DataWorks Summit 13-15 June 2017 in San Jose, CA.
Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:00AM Jun 05, 2017
by Sally in General |
|
The Apache Software Foundation Announces Apache® Ranger™ as a Top-Level Project
- Ranger Key Management Service (compatible with Hadoop’s native KMS API to store and manage encryption keys for HDFS Transparent Data Encryption);
- Dynamic column masking and row filtering;
- Dynamic policy conditions (such as prohibition of toxic joins);
- User context enrichers (such as geo-location and time of day mappings); and
- Classification or tag based policies for Hadoop ecosystem components via integration with Apache Atlas.
# # #
Posted at 11:00AM Feb 08, 2017
by Sally in General |
|
The Apache Software Foundation Announces Apache® Kudu™ v1.0
- Support for redundant and highly available Kudu Master nodes;
- Support for manual management of range partitioning, critical for time series workloads;
- Rewritten integration with Apache Spark, including Spark SQL and Data Frame APIs;
- An officially supported client library for Python; and
- Substantial performance improvements both for random access and analytic workloads.
"Kudu 1.0 is the most performant, full-featured, and stable release of Kudu yet. Every day we see new users joining the community, deploying Kudu alongside other Apache projects such as Impala and Spark to solve valuable real-time use cases," added Lipcon. "Kudu expands the Apache Hadoop ecosystem's capabilities, enabling real-time data ingestion and updates while also serving high performance analytics with a substantially simplified architecture."
"The availability of Kudu 1.0 is an exciting milestone and my data science team is eager to evaluate it. We do a lot of work with time series workflows in science data systems and the speed-ups there should really help in our deployment of Kudu," said Chris Mattmann, Chief Architect in the Instrument and Science Data Systems Section at NASA Jet Propulsion Laboratory, and member of the Apache Kudu Project Management Committee.
Posted at 09:30AM Sep 20, 2016
by Sally in General |
|
Apache Software Foundation Announces Apache® Twill™ as a Top-Level Project
- JavaOne, 18-22 September 2016 in San Francisco
- Strata+Hadoop World, 27-29 September 2016 in New York City
Posted at 10:00AM Jul 27, 2016
by Sally in General |
|
The Apache Software Foundation Announces Apache® Kudu™ as a Top-Level Project
Posted at 08:11PM Jul 26, 2016
by Sally in General |
|
The Apache Software Foundation Announces Apache® Zeppelin™ as a Top-Level Project
- Multi-purpose --features data ingestion, exploration, analysis, visualization, and collaboration;
- Robust --supports 20+ more backend systems, including Apache Spark, Apache Flink, Apache Hive, Python, R, and any JDBC (Java Database Connectivity);
- Easy to deploy --built on top of modern Web technologies (provides built-in Apache Spark integration, eliminating the need to build a separate module, plugin, or library);
- Easy to use --with built-in visualizations and dynamic forms;
- Flexible --allows users to mix different languages, exchange data between backends, adjust the layout;
- Extensible --with pluggable architecture for interpreters, notebook storages, authentication, and visualizations (in progress); and
- Advanced --allows interaction between custom visualizations and cluster resources
Posted at 10:00AM May 25, 2016
by Sally in General |
|
The Apache Software Foundation Announces Apache® Apex™ as a Top-Level Project
- Apache: Big Data 9-12 May 2016 in Vancouver http://apachecon.com/
- Hadoop Summit 28-30 June 2016 in San Jose, CA http://hadoopsummit.org/san-jose/
- Spark & Hadoop User Group Munich 19 July 2016 http://www.meetup.com/Hadoop-User-Group-Munich/events/230313355/
Posted at 01:32PM Apr 25, 2016
by Sally in General |
|
Announcing creation of the Hadoop Software Foundation
The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the creation of the independent Hadoop Software Foundation (HSF), the new governing body for big data's biggest software program, Apache® Hadoop®.
[Read More]
Posted at 12:04PM Apr 01, 2016
by Sally in General |
|
The Apache Software Foundation Announces Apache™ Drill™ 1.0
The business intelligence (BI) partner ecosystem is embracing the power of Apache Drill. Organizations such as Information Builders, JReport (Jinfonet Software), MicroStrategy, Qlik®, Simba, Tableau, and TIBCO, are working closely with the Drill community to interoperate BI tools with Drill through standard ODBC and JDBC connectivity. This collaboration enables end users to explore data by leveraging sophisticated visualization tools and advanced analytics.
"We've been using Apache Drill for the past six months," said Andrew Hamilton, CTO of Cardlytics. "Its ease of deployment and use along with its ability to quickly process trillions of records has made it an invaluable tool inside Cardlytics. Queries that were previously insurmountable are now common occurrence. Congratulations to the Drill community on this momentous occasion."
"Drill's columnar execution engine and optimizer take full advantage of Apache Parquet's columnar storage to achieve maximum performance," said Julien Le Dem, Technical Lead of Analytics Data Pipeline at Twitter and Vice President of Apache Parquet. "The Drill team has been a key contributor to the Parquet project, including recent enhancements to Parquet types and vectorization. The Drill team’s involvement in the Parquet community is instrumental in driving the standard."
"Apache Drill 1.0 raises the bar for secure, reliable and scalable SQL-on-Hadoop," said Piyush Bhargava, distinguished engineer, IT, Cisco Systems. "Because Drill integrates with existing data virtualization and visualization tools, we expect it will improve adoption of self-service data exploration and large-scale BI queries on our advanced Hadoop platform at Cisco."
Posted at 04:44PM May 19, 2015
by Sally in General |
|