Entries tagged [bigdata]
The Apache® Software Foundation Announces Agenda, Keynotes, and Sponsors for ApacheCon™ North America 2018
Community-driven conference series to gather dozens of Apache projects and their communities in Montréal to share and learn about the latest Open Source innovations in Big Data, Cloud, Finance, IoT, Machine Learning, Search, Servers, and more in a collaborative, vendor-neutral environment
Wakefield, MA –17 May 2018– The Apache® Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the program for its official conference series, ApacheCon™, taking place 24-27 September 2018 in Montréal, Canada.
Keynote presentations include:
- Cliff Schmidt, Apache Member, former ASF Board member, and Literacy Bridge founder on how Amplio uses technology to educate and improve the quality of life of people living in very difficult parts of the world.
- Myrle Krantz, Apache Member and Vice President Apache Fineract, on how Open Source banking is helping the global fight against poverty.
- Bridget Kromhout, Principal Cloud Developer Advocate at Microsoft, on the really hard problem in software: the people.
- Euan McLeod, VP VIPER at Comcast, on the many ways that Apache software delivers your favorite shows to your living room.
Posted at 10:50AM May 17, 2018
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® Oozie™ v5.0.0
- moved launcher from MapReduce mapper to YARN ApplicationMaster;
- switched from Tomcat 6 to embedded Jetty 9;
- updated third party libraries;
- completely rewritten workflow graph generator;
- JDK 8 support;
- deprecated Instrumentation in favor of Metrics;
- added indexes to speed up DB queries; and
- fixed CVE-2017-15712
Posted at 02:06PM Apr 18, 2018
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® Hadoop® v3.0.0 General Availability
- HDFS erasure coding: halves the storage cost of HDFS while also improving data durability;
- YARN Timeline Service v.2 (preview): improves the scalability, reliability, and usability of the Timeline Service;
- YARN resource types: enables scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
- Federation of YARN and HDFS subclusters transparently scales Hadoop to tens of thousands of machines;
- Opportunistic container execution improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and
- Improved capabilities and performance for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage Service.
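The "halves the storage cost" claim for HDFS erasure coding comes down to simple arithmetic. The Python sketch below compares raw bytes stored per logical byte, assuming the classic 3x replication default and a Reed-Solomon layout with 6 data and 3 parity cells (such as Hadoop 3's RS-6-3-1024k policy). The function names are illustrative only, and "failures tolerated" counts lost replicas or lost cells per stripe, a simplification of real durability, which also depends on block placement.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per logical byte under n-way replication."""
    return Fraction(replicas)


def erasure_coding_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per logical byte under an RS(data, parity) stripe layout."""
    return Fraction(data_units + parity_units, data_units)


def replication_failures_tolerated(replicas: int) -> int:
    # Data survives as long as at least one replica remains.
    return replicas - 1


def erasure_coding_failures_tolerated(parity_units: int) -> int:
    # Any `parity_units` lost cells in a stripe can be reconstructed.
    return parity_units


if __name__ == "__main__":
    rep = replication_overhead(3)
    ec = erasure_coding_overhead(6, 3)
    print(f"3x replication: {rep} raw bytes per byte, "
          f"tolerates {replication_failures_tolerated(3)} losses")
    print(f"RS(6,3): {ec} raw bytes per byte, "
          f"tolerates {erasure_coding_failures_tolerated(3)} losses per stripe")
    print(f"EC / replication storage ratio: {ec / rep}")
```

Under these assumptions, 1 TB of data needs 3 TB of raw disk with replication but 1.5 TB with RS(6,3), while the stripe can survive three lost cells instead of two lost replicas.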
Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo. The project maintains a list of known users at https://wiki.apache.org/hadoop/PoweredBy
"It's tremendous to see this significant progress, from the raw tool of eleven years ago, to the mature software in today's release," said Doug Cutting, original co-creator of Apache Hadoop. "With this milestone, Hadoop better meets the requirements of its growing role in enterprise data systems. The Open Source community continues to respond to industrial demands."
Apache Hadoop's diverse community continues to grow, remains among the ASF's most active, and is at the forefront of more than three dozen Apache Big Data projects.
Apache Hadoop has received numerous awards, including top prizes at the Media Guardian Innovation Awards and Duke's Choice Awards, and has been hailed by industry analysts:
"...the lifeblood of organizational analytics…" —Gartner
"Hadoop Is Here To Stay" —Forrester
"...today Hadoop is the only cost-sensible and scalable open source alternative to commercially available Big Data management packages. It also becomes an integral part of almost any commercially available Big Data solution and de-facto industry standard for business intelligence (BI)." —MarketAnalysis.com/Market Research Media
"...commanding half of big data’s $100 billion annual market value...Hadoop is the go-to big data framework." —BigDataWeek.com
"Hadoop, and its associated tools, is currently the 'big beast' of the big data world and the Hadoop environment is undergoing rapid development..." —Bloor Research
"The opportunity to effect meaningful, even fundamental change in the Apache Hadoop project remains open," added Chris Douglas, Vice President of Apache Hadoop. "Our new contributors uprooted the project from its historical strength in Web-scale analytics by introducing powerful, proven abstractions for data management, security, containerization, and isolation. Apache Hadoop drives innovation in Big Data by growing its community. We hope this latest release continues to draw developers, operators, and users to the ASF."
Catch Apache Hadoop in action at the Strata Data Conference in San Jose, CA, 5-8 March 2018, and at dozens of Hadoop Meetups held around the world.
Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server —the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, Huawei, IBM, Inspur, iSIGMA, ODPi, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, Union Investment, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:00AM Dec 14, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® RocketMQ™ as a Top-Level Project
Forest Hill, MD –25 September 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® RocketMQ™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache RocketMQ is an Open Source distributed messaging and streaming Big Data platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
"I am very excited to see Apache RocketMQ as a Top-Level Project and I would like to thank our mentors for all their help, the Apache Incubator Project Management Committee for its advice and guidance, everyone in the RocketMQ community, and Alibaba for publishing the research upon which RocketMQ is based," said Xiaorui Wang, Vice President of Apache RocketMQ. "During the incubation process, the RocketMQ community worked very hard to develop high-quality distributed software for messaging and streaming, in an open and inclusive manner in accordance with the Apache Way."
Features include:
- Low latency: more than 99.6% of responses complete within 1 millisecond under high load;
- Finance-oriented: high availability with tracking and auditing features;
- Industry-sustainable: trillion-level message capacity guaranteed;
- Vendor-neutral: supports multiple messaging protocols such as JMS and OpenMessaging;
- Big Data friendly: batch transfer with versatile integration for high throughput; and
- Massive accumulation: given sufficient disk space, messages can accumulate without performance loss.
"New participants are more than welcome to join the project. To serve the community better, we created and maintain two repositories: one for our kernel version and the other for community contributions. The community has contributed integrations with other Apache TLPs such as Apache Storm, Apache Ignite, Apache Spark, and Apache Flume," said Xinyu "yukon" Zhou, member of the Apache RocketMQ Project Management Committee. "We enthusiastically look forward to working together with all contributors to Apache RocketMQ in order to advance the state-of-the-art distributed messaging engine."
Availability and Oversight
Apache RocketMQ software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache RocketMQ, visit http://rocketmq.apache.org/ and https://twitter.com/ApacheRocketMQ
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 650 individual Members and 6,200 Committers across six continents successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, Inspur, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "RocketMQ", "Apache RocketMQ", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 10:00AM Sep 25, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® MADlib™ as a Top-Level Project
Big Data machine-learning library used for scalable in-database analytics
Forest Hill, MD –22 August 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® MADlib™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache MADlib is a comprehensive library for scalable in-database analytics. It provides parallel implementations of machine learning, graph, mathematical and statistical methods for structured and unstructured data.
"Graduating as a Top-Level Project is a very important milestone for Apache MADlib," said Aaron Feng, Vice President of Apache MADlib. "During the incubation process, the MADlib community worked very hard to develop high quality software for in-database analytics, in an open and inclusive manner in accordance with the Apache Way."
MADlib grew out of discussions between database engine developers, data scientists, IT architects and academics interested in new approaches to scalable, sophisticated in-database analytics. These discussions were written up in a paper from VLDB 2009 [1] that coined the term "MAD Skills" for data analysis. The MADlib software project began the following year as a collaboration between researchers at UC Berkeley and engineers and computer scientists at Pivotal (formerly EMC/Greenplum). In September 2015, MADlib joined the ASF community as an incubating project.
MADlib is deployed on a wide variety of industry and academic projects across many different verticals, including automotive, consumer, finance, government, healthcare, and telecommunications.
"MADlib was conceived from the outset as an open-source meeting ground for software developers, computing researchers and data scientists to collaborate on scalable, in-database machine learning and statistics," said Joe Hellerstein, Professor of Computer Science at UC Berkeley, Co-Founder and Chief Strategy Officer at Trifacta, and one of the original authors of MADlib. "It has been great to witness the growth of the MADlib community and codebase as an ASF incubating project, and I look forward to this continuing as a Top-Level Project."
"At Pivotal, we have seen our customers successfully deploy MADlib on large scale data science projects across a wide variety of industry verticals," said Elisabeth Hendrickson, Vice President, R&D for Data at Pivotal. "As MADlib graduates to a Top-Level Project at the ASF, we anticipate increased adoption in the enterprise given the mature level of the codebase and the active developer community."
"The potential of the Apache MADlib project is unbounded," said Jim Jagielski, Vice Chairman of the ASF. "The ability to perform in-depth and detailed analytics, on both structured and unstructured data, using SQL enables MADlib to be applicable in scenarios where others simply can't compete. As not only interest in, but real-world usage of, machine learning becomes commonplace, MADlib joins the growing roster of Apache projects that define innovation."
"Apache MADlib is a great example of the diversity at Apache," said Ted Dunning, Apache MADlib Incubator Mentor and Member of the ASF Board of Directors. "MADlib does state-of-the-art machine learning, but does so as an inherent part of a database. This is a radical approach that can provide important design flexibility. I am excited to see MADlib become a fully fledged project at Apache."
"New participants are more than welcome to join the project," added Feng. "We enthusiastically look forward to working together with all contributors to Apache MADlib in order to advance the state-of-the-art of scale-out data science tools."
[1] http://dl.acm.org/citation.cfm?id=1687576
Availability and Oversight
Apache MADlib software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache MADlib, visit http://madlib.apache.org/ and https://twitter.com/ApacheMADlib
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
© The Apache Software Foundation. "Apache", "MADlib", "Apache MADlib", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 10:00AM Aug 22, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® Fluo™ as a Top-Level Project
Newest addition to Apache Big Data ecosystem used for continual, incremental processing of data at petabyte scale
Forest Hill, MD –26 July 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Fluo™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache Fluo is a distributed system for incrementally processing large data sets stored in Apache Accumulo (the sorted, distributed key/value store based on Google's Bigtable, built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift). With Fluo, users can continuously join new data into large existing data sets without reprocessing all data. Unlike batch and streaming frameworks, Fluo offers much lower latency and can operate on extremely large data sets.
"I am very excited to see Apache Fluo graduate and I would like to thank our mentors for all their help, the Apache Incubator Project Management Committee for its advice and guidance, everyone in the Fluo community, and Google for publishing the research upon which Fluo is based," said Keith Turner, Vice President of Apache Fluo. "As a result of collaboration within the community, we are graduating with a beautifully designed piece of software."
Based on Percolator (built on top of Bigtable to support incremental updates to the search index at Google), Fluo makes it possible to continually update the results of a large-scale computation, index, or analytic as new data is discovered.
"Apache Fluo is a very clever piece of software, elegantly supplementing Apache Accumulo's ability to store and maintain very large indexes," said Christopher Tubbs, ASF Member and Committer on Apache Accumulo and Apache Fluo. "Its support of transactions enables Accumulo to solve a whole new set of big data problems, and its observer framework makes designing ingest workflows fun."
An example of how Fluo works is the use case of counting phrases in unique documents. This could be accomplished with two MapReduce jobs: one to find the unique set of documents and a second to count phrases. When petabytes of documents are involved, rerunning both jobs over the entire corpus whenever a small amount of new data arrives is inefficient. Apache Fluo instead performs these two computations continuously and quickly as new data arrives, constantly emitting deltas of phrase counts. Any downstream system could consume the emitted deltas; for example, a query system could be continuously updated using them.
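The phrase-count example can be illustrated with a small, single-process Python sketch: documents are deduplicated by content hash, and only a newly seen unique document is processed, producing phrase-count deltas instead of recomputing over the whole corpus. This is not Fluo's API. Fluo applications are written in Java against its observer framework and run as distributed transactions over Accumulo. The class and function names here are hypothetical, and treating a "phrase" as a word bigram is an assumption.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterator, Set, Tuple

Phrase = Tuple[str, ...]


def phrases(text: str, n: int = 2) -> Iterator[Phrase]:
    """Yield word n-grams; treating a 'phrase' as a word bigram is an assumption."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])


class IncrementalPhraseCounter:
    """Hypothetical in-memory model: phrase counts over unique documents, updated by deltas."""

    def __init__(self) -> None:
        self.seen_hashes: Set[str] = set()  # stage 1 state: which documents are unique
        self.counts: Counter = Counter()     # stage 2 state: running phrase counts

    def add_document(self, text: str) -> Dict[Phrase, int]:
        """Process one new document and return the phrase-count deltas it causes."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self.seen_hashes:
            return {}  # duplicate content: counts are unchanged, so no deltas
        self.seen_hashes.add(digest)
        delta = Counter(phrases(text))
        self.counts.update(delta)  # only the new document is processed, not the corpus
        return dict(delta)


if __name__ == "__main__":
    counter = IncrementalPhraseCounter()
    print(counter.add_document("the quick fox the quick dog"))
    print(counter.add_document("the quick fox the quick dog"))  # duplicate, no deltas
    print(counter.add_document("quick fox"))
```

The returned dictionaries play the role of the emitted deltas: a downstream consumer could apply them to its own index without ever rescanning the documents already processed.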
"We are excited that Fluo is becoming a Top-Level Project at the Apache Software Foundation," said Dr. Adina Crainiceanu, Apache Rya (incubating) Committer and Associate Professor, Computer Science Department, United States Naval Academy. "Heartfelt congratulations to the Fluo community for achieving this important milestone. The Apache Rya project uses the observer framework in Fluo to cache and maintain answers to complex SPARQL queries for large RDF datasets. Using cached answers greatly improves Rya's performance for complex queries. Fluo complements Rya by allowing the incremental and continuous update of the cached answers. Fluo is particularly useful because it allows updates to happen as new data is ingested, reduces updates latency, avoids stale results, and circumvents the periodical reprocessing of the entire dataset. We are confident that Apache Fluo will become one of the important frameworks for updating indexing results in a dynamic data-acquiring context."
"Fluo fulfills an important role in the Apache Hadoop ecosystem, significantly expanding existing capabilities for working with large data sets," said Billie Rinaldi, ASF Member and former Vice President of Apache Accumulo. "I was excited to see this project come to the Apache Incubator, and am even more pleased to see it graduate to a top-level Apache project."
"We welcome new users and contributors to Apache Fluo," added Turner. "If you are interested in trying Fluo, check out the Fluo Tour on the project Website. Join our mailing lists to discuss how Fluo may be a good solution for your problem, as well as for help with debugging and finding starter issues."
Catch Apache Fluo in action and meet members of the Fluo community at Accumulo Summit, 16 October 2017 in Columbia, MD. http://accumulosummit.com/
Availability and Oversight
Apache Fluo software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Fluo, visit http://fluo.apache.org/ and https://twitter.com/ApacheFluo
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit https://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Fluo", "Apache Fluo", "Accumulo", "Apache Accumulo", "Rya", "Apache Rya", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 10:00AM Jul 26, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Momentum With Apache® Hadoop® v2.8
Major release of the cornerstone of the Big Data ecosystem, from which dozens of Apache Big Data projects and countless industry solutions originate.
Forest Hill, MD —5 June 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today momentum with Apache® Hadoop® v2.8, the latest version of the Open Source software framework for reliable, scalable, distributed computing.
Now ten years old, Apache Hadoop dominates the greater Big Data ecosystem as the flagship project and community amongst the ASF's more than three dozen projects in the category.
"Apache Hadoop 2.8 maintains the project's momentum in its stable release series," said Chris Douglas, Vice President of Apache Hadoop. "Our community of users, operators, testers, and developers continue to evolve the thriving Big Data ecosystem at the ASF. We're committed to sustaining the scalable, reliable, and secure platform our greater Hadoop community has built over the last decade."
Apache Hadoop supports processing and storage of extremely large data sets in a distributed computing environment. The project has been regularly lauded by industry analysts worldwide for driving market transformation. Forrester Research estimates that firms will spend US$800M on Hadoop software and related services in 2017. According to Zion Market Research, the global Hadoop market is expected to reach approximately US$87.14B by 2022, growing at a CAGR of around 50% between 2017 and 2022.
- Several important security-related enhancements, including Hadoop UI protection against Cross-Frame Scripting (XFS), an attack that combines malicious JavaScript with an iframe that loads a legitimate page in an effort to steal data from an unsuspecting user, and Hadoop REST API protection against cross-site request forgery (CSRF) attacks, which attempt to force an authenticated user to execute functionality without their knowledge.
- Support for Microsoft Azure Data Lake as a source and destination of data. This benefits anyone deploying Hadoop in Microsoft's Azure Cloud. The Azure Data Lake service was designed for Hadoop and analytics workloads.
- The "S3A" client for working with data stored in Amazon S3 has been radically enhanced for scalability, performance, and security. The performance enhancements were driven by Apache Hive and Apache Spark benchmarks. In Hive TPC-DS benchmarks, Apache Hadoop is currently faster working with columnar data stored in S3 than Amazon EMR's closed-source connector. This shows the benefit of collaborative Open Source development.
- Several WebHDFS-related enhancements, including an integrated CSRF prevention filter, OAuth2 support, the ability to allow/disallow snapshots via WebHDFS, and more.
- Integration with other applications has been improved by splitting the HDFS client into a separate JAR (hadoop-hdfs-client), apart from the hadoop-hdfs JAR that contains all the server-side code. Downstream projects that access HDFS can depend on the hadoop-hdfs-client module to reduce the number of transitive classpath dependencies.
- YARN NodeManager resource reconfiguration on a live cluster through the RM Admin CLI, giving YARN clusters a more flexible resource model, especially in Cloud deployments.
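The REST API and WebHDFS CSRF protections listed above rely on a custom-header check: a browser cannot attach an arbitrary custom header to a cross-site form submission without a CORS preflight, so requiring one on state-changing requests blocks forged requests. The Python sketch below is a simplified model of that check, based on our understanding of Hadoop's REST CSRF prevention filter. The header name X-XSRF-HEADER and the ignored methods GET, OPTIONS, HEAD, and TRACE are assumed defaults (configurable through properties such as dfs.webhdfs.rest-csrf.custom-header and dfs.webhdfs.rest-csrf.methods-to-ignore). The function is hypothetical; the real filter is a Java servlet filter and can, for example, restrict enforcement to browser user agents.

```python
from typing import Mapping

# Assumed defaults, modeled on Hadoop's REST CSRF prevention filter settings.
CUSTOM_HEADER = "X-XSRF-HEADER"
METHODS_TO_IGNORE = frozenset({"GET", "OPTIONS", "HEAD", "TRACE"})


def allow_request(method: str, headers: Mapping[str, str]) -> bool:
    """Return True if the request passes the custom-header CSRF check."""
    if method.upper() in METHODS_TO_IGNORE:
        # Read-only methods are not expected to change state, so they pass.
        return True
    # HTTP header names are case-insensitive; only presence matters, not the value.
    present = {name.lower() for name in headers}
    return CUSTOM_HEADER.lower() in present


if __name__ == "__main__":
    print(allow_request("GET", {}))                          # read-only: allowed
    print(allow_request("PUT", {}))                          # state-changing, no header: rejected
    print(allow_request("PUT", {"X-XSRF-HEADER": "true"}))   # header present: allowed
```

Checking only for the header's presence is the design point: the value carries no secret, because the protection comes from the browser's refusal to send custom headers cross-site without the server's consent.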
In addition to physical Hadoop clusters, where the majority of storage and computation lies, Apache Hadoop is very popular within Cloud infrastructures. Contributions from Apache Hadoop's diverse community include improvements provided by Cloud infrastructure vendors and large Hadoop-in-Cloud users. These improvements, in particular to Azure and S3 storage and to YARN reconfiguration, improve Hadoop's deployment on and integration with Cloud infrastructures. The improvements in Hadoop 2.8 enable Cloud-deployed clusters to be more dynamic in sizing, adapting to demand by scaling up and down.
"My colleagues and I are happy that tests of Apache Hive and Hadoop 2.8 show that we are able to provide a similar experience reading data in from S3 as Amazon EMR, with its closed-source fork/rewrite of S3," said Steve Loughran, member of the Apache Hadoop Project Management Committee.
Hailed as a "Swiss army knife of the 21st century" by the Media Guardian Innovation Awards and "the most important software you’ve never heard of…helped enable both Big Data and Cloud computing" by author Thomas Friedman, Apache Hadoop is used by an array of companies such as Alibaba, Amazon Web Services, AOL, Apple, eBay, Facebook, foursquare, IBM, HP, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, SAP, Tencent, Teradata, Tesla Motors, Uber, and Twitter. Yahoo, an early pioneer, hosts the world's largest known Hadoop production environment to date, spanning more than 38,000 nodes.
Catch Apache Hadoop in action at DataWorks Summit 13-15 June 2017 in San Jose, CA.
Availability and Oversight
Apache Hadoop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Hadoop, visit http://hadoop.apache.org/ and https://twitter.com/hadoop
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 680 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "Hadoop", "Apache Hadoop", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:00AM Jun 05, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® Beam™ v2.0.0
- API stability and future compatibility within this major version;
- Stateful data processing paradigms that unlock efficient, data-dependent computations;
- Support for user-extensible file systems, with built-in support for Hadoop Distributed File System, among others; and
- A metrics subsystem for deeper insight into pipeline execution.
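Stateful data processing, in the sense of the second bullet, means a pipeline step keeps state per key across elements, so its output can depend on data it has already seen for that key. The following is a minimal plain-Python sketch of that idea only. It does not use the Beam SDK, and the function name `assign_sequence_numbers` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def assign_sequence_numbers(
    events: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, str, int]]:
    """Tag each (key, value) event with a per-key running index.

    The `state` dict stands in for per-key state: the index emitted for an
    element depends on how many earlier elements shared its key.
    """
    state: Dict[str, int] = defaultdict(int)  # per-key counter, starts at 0
    for key, value in events:
        index = state[key]
        state[key] = index + 1  # update the key's state after reading it
        yield key, value, index


events = [("user-a", "click"), ("user-b", "view"), ("user-a", "buy")]
for row in assign_sequence_numbers(events):
    print(row)
```

A data-independent step could not produce these indices, because each output depends on the history of its key rather than on the element alone.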
Posted at 10:00AM May 17, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® CarbonData™ as a Top-Level Project
Open Source Big Data analytics accelerator in use at Bank of Communications, Hulu, Huawei, SAIC Motor, Zhejiang Mobile, among others.
Forest Hill, MD –1 May 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® CarbonData™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop and Apache Spark, among others) that can speed up queries over petabytes of data by an order of magnitude.
"We are very proud to complete the incubation process and graduate as an Apache Top-Level Project," said Liang Chen, Vice President of Apache CarbonData. "The CarbonData community grew rapidly over the last ten months, both in terms of size and diversity. Since entering the Apache Incubator, we have completed 4 releases, and exceeded 90 contributors from 10 different organizations."
With the aim of using a unified file format to satisfy all kinds of data analysis cases, Apache CarbonData seamlessly integrates with Hadoop and Spark to improve Big Data analysis efficiency. In benchmarks, CarbonData's interactive queries ran approximately 10x faster than those on standard column-oriented SQL-on-Hadoop data stores.
Highlights include:
- Unique data organization to allow faster filtering and better compression;
- Multi-level indexing to enable faster search and speed up query processing;
- Deep Apache Spark integration for DataFrame and SQL compliance;
- Advanced push-down optimization to minimize the amount of data being read, processed, converted, transmitted, and shuffled;
- Efficient compression and global encoding schemes to further improve aggregation query performance;
- Dictionary encoding for reduced storage space and faster processing; and
- Data update and delete support using standard SQL syntax.
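Dictionary encoding, mentioned in the list, replaces repeated column values with small integer codes that point into a table of distinct values. The sketch below shows the general technique in plain Python. It is an illustration, not CarbonData's on-disk format or its global dictionary implementation, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value with an integer code pointing into a dictionary."""
    dictionary: List[str] = []    # distinct values, in first-seen order
    code_of: Dict[str, int] = {}  # value -> code
    codes: List[int] = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def count_matches(dictionary: List[str], codes: List[int], target: str) -> int:
    """Evaluate `column == target` directly on the encoded data.

    The string comparison happens once, against the small dictionary; the
    scan over rows then compares only integers.
    """
    if target not in dictionary:
        return 0  # no row can match, so the row scan is skipped entirely
    code = dictionary.index(target)
    return sum(1 for c in codes if c == code)


city = ["Shanghai", "Beijing", "Shanghai", "Hangzhou", "Beijing", "Shanghai"]
dictionary, codes = dictionary_encode(city)
print(dictionary)  # ['Shanghai', 'Beijing', 'Hangzhou']
print(codes)       # [0, 1, 0, 2, 1, 0]
```

Storage shrinks because each row holds a small integer instead of a string, and filters get cheaper because most of the work becomes integer comparison.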
Apache CarbonData is in use at an array of organizations, including Bank of Communications, medical/pharma social platform DXY, Hulu, Huawei, group online retailer MEITUAN, SAIC Motor, Zhejiang Mobile, among others.
"CarbonData has very good performance as a ‘SQL on Hadoop’ solution," said Tan Sheng, Director of SAIC Motor’s Big Data team. "It is suitable for SAIC Motor to adopt as a central Big Data platform component. Not only do we use Apache CarbonData, we also actively participate in its community as contributors."
"Apache CarbonData is great, as it helped our audit business improve performance 7-10X on 14 billion rows of data," said Wei Zhao, Senior Engineer at Bank of Communications.
"Apache CarbonData is very suitable for our filter query cases, and has averaged a 20x improvement in performance," said William Zhu, Architecture team member at DXY. "And, as CarbonData supports data update and delete, this feature is very useful. We would consider CarbonData as our all-in-one solution to unify all analysis data."
CarbonData was first developed at Huawei in 2013. The project was submitted to the Apache Incubator in June 2016, and had its first official release two months later. The project won top honors in the Big Data category of Black Duck's 2016 Open Source Rookies of the Year.
"Apache CarbonData is a great example of the value of the incubation process," said Jean-Baptiste Onofré, Apache CarbonData Incubator Mentor and Project Management Committee member. "Helping grow the CarbonData developer and user communities has increased our visibility, which allowed us to extend our use cases and tests, and gather new ideas. The initial CarbonData committers did (and are still doing) great work to welcome new users and contributors, clearly understanding it's a step forward for the project."
"We will continue to put our efforts towards optimizing data format efficiency for the Big Data ecosystem and providing a unified, high-performance data storage solution," added Liang. "The Apache CarbonData community welcomes interested contributors to work with us on our journey forward."
Catch Apache CarbonData in action at ApacheCon (16-18 May/Miami), and Spark Summit (5-7 June/San Francisco).
Availability and Oversight
Apache CarbonData software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache CarbonData, visit http://carbondata.apache.org/ , https://twitter.com/ApacheCarbonDat , and https://www.facebook.com/carbondata/
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "CarbonData", "Apache CarbonData", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:01AM May 01, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® Mahout™ v0.13.0
- Collaborative filtering – mines user behavior and makes product recommendations (such as eCommerce product recommenders);
- Regression – estimates a numerical value based on values of other inputs;
- Clustering – takes items in a particular class (such as Web pages or newspaper articles) and organizes them into naturally occurring groups, such that items belonging to the same group are similar to each other; and
- Classifying – learns from existing categorizations and then assigns unclassified items to the best category.
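Collaborative filtering, the first technique listed, can be illustrated with a textbook item-based recommender: items are compared by the cosine similarity of their rating vectors, and a user's unrated items are scored by their similarity to items the user has already rated. This is a minimal plain-Python sketch under those assumptions, not Mahout's API or algorithm; the function names and sample ratings are hypothetical.

```python
import math
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(ratings: Ratings, item_a: str, item_b: str) -> float:
    """Cosine similarity between two items' rating vectors across all users."""
    dot = norm_a = norm_b = 0.0
    for prefs in ratings.values():
        a = prefs.get(item_a, 0.0)  # an unrated item counts as 0
        b = prefs.get(item_b, 0.0)
        dot += a * b
        norm_a += a * a
        norm_b += b * b
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / math.sqrt(norm_a * norm_b)


def recommend(ratings: Ratings, user: str, top_n: int = 2) -> List[str]:
    """Rank items the user has not rated by similarity-weighted ratings."""
    seen = ratings[user]
    all_items = {item for prefs in ratings.values() for item in prefs}
    scores: Dict[str, float] = {}
    for candidate in all_items - set(seen):
        scores[candidate] = sum(
            rating * cosine(ratings, candidate, item) for item, rating in seen.items()
        )
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


ratings: Ratings = {
    "alice": {"book": 5, "lamp": 3},
    "bob": {"book": 4, "lamp": 2, "pen": 5},
    "carol": {"book": 5, "desk": 4},
    "dave": {"lamp": 4, "pen": 4},
}
print(recommend(ratings, "alice"))  # ['pen', 'desk']
```

Item-to-item similarity is used here because it can be precomputed once and reused for every user, which is one common reason item-based variants are chosen for large catalogs.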
Posted at 10:00AM May 01, 2017
by Sally Khudairi in General
ApacheCon: Tomorrow's Technology Today.
- Timely Content --learn first-hand from the largest collection of global Apache project communities through detailed sessions, and standalone tracks such as Apache: Big Data, Flex Project Summit, TomcatCon, Apache: IoT, and CloudStack Collaboration Conference. Breaking industry news? You'll hear it here first.
- Innovation Insight --presentations from the Apache Incubator (the ASF's hub for Open Source innovations, where a record 64 projects are currently undergoing development) include the latest developments in data science, Cloud, embedded systems, and many other categories, as well as industry-specific areas such as climate, microfinance, and cryptography. Learn what's next.
- Knowledge Exchange --meet the people behind dozens of Apache projects through ample networking opportunities including BarCampApache, hackathons, BoFs, and corridor discussions. Driving a project in new directions? Starting a new initiative? Ideation sparks here.
- Education --gain the latest skills with in-depth tutorials, trainings, and workshops with a low student-to-instructor ratio. Classes are often led by the original creators and companies behind some of the most popular projects in Open Source.
- Sponsor Showcase and Expo --engage with some of the commercial products and service providers behind Apache project communities in a friendly, relaxed, non-sales environment.
- Conference Schedule published on 9 March 2017
Posted at 07:44PM Jan 26, 2017
by Sally Khudairi in General
The ASF asks: Have you met Apache Ignite?
- Data Grid --replicate or partition data in memory within the cluster;
- SQL Grid --add in-memory distributed database capabilities;
- Compute Grid --distribute computations across cluster nodes;
- Service Grid --implement fault-tolerant, microservices-based solutions;
- Streaming & CEP --easily stream large volumes of data into Ignite and process them in real time; and
- Data Structures --distribute your own data structures across the cluster.
- SQL Grid now fully supports all DML commands, including UPDATE, INSERT, and DELETE queries. Full-fledged support for DML and SELECT statements allows users to interact with Apache Ignite using standard SQL commands, connecting via ODBC and JDBC drivers. This provides true cross-platform connectivity, even from languages such as PHP and Ruby, which are not natively supported by the project.
- A Redis protocol implementation that enables users to store and retrieve distributed data from the Apache Ignite cache using any Redis-compatible client.
- Ignite.NET provides a .NET Entity Framework second-level cache solution that stores data in the distributed Ignite cache. This is ideal for scenarios with multiple application servers using a single SQL database via Entity Framework: cached queries are shared between all machines in the cluster.
- Ignite.NET implements an ASP.NET session caching provider that stores session data in the Ignite cache, which distributes session state across multiple servers in order to provide high availability and fault tolerance.
- The deadlock detection mechanism has been improved and now works for optimistic transactions and near caches.
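The Data Grid bullet describes partitioning data across cluster nodes. One common way to decide which node owns a key, and which nodes hold its backups, is rendezvous (highest-random-weight) hashing: every node gets a deterministic pseudo-random weight for each key, and the highest weights win. The sketch below illustrates that general technique in plain Python. It is not Ignite's code, and the node names and function names are hypothetical.

```python
import hashlib
from typing import List


def _weight(node: str, key: str) -> int:
    # Stable 64-bit weight for a (node, key) pair. hashlib gives the same
    # result in every process, unlike Python's randomized hash() for str.
    digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owners(key: str, nodes: List[str], backups: int = 1) -> List[str]:
    """Return the primary node followed by `backups` backup nodes for a key."""
    ranked = sorted(nodes, key=lambda n: _weight(n, key), reverse=True)
    return ranked[: 1 + backups]


nodes = ["node-1", "node-2", "node-3"]
for key in ["order:17", "order:42", "customer:7"]:
    print(key, owners(key, nodes))
```

A useful property of this scheme is that removing a node only moves the keys that node owned; every other key keeps its primary, so a topology change does not reshuffle the whole cluster.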
Posted at 10:10AM Jan 18, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® Eagle™ as a Top-Level Project
- Highly extensible - Apache Eagle builds its core framework around the application concept; the application itself includes the logic for monitoring source data collection, pre-processing, and normalization. Developers can easily develop out-of-the-box monitoring applications using Eagle's application framework, and deploy them into Eagle.
- Scalable - the project’s fundamental runtime is based on proven Big Data technologies, and applies a scalable core that adapts to the throughput of the data stream as well as the number of monitored applications.
- Real-time - provides a state-of-the-art alert engine to identify security breaches and performance issues.
- Dynamic - users can freely enable or disable a monitoring application and dynamically change their alert policies without any impact to the underlying runtime.
"It is great to see Apache Eagle graduate to a Top-Level Project within a year," said Seshu Adunuthula, Senior Director of Data Platforms at eBay. "It is a great product, uniquely positioned to fill the gap in monitoring and alerting for large-scale distributed computing environments, and well architected to allow communities to easily implement monitoring and alerting applications in different technical domains such as networking and database clusters. I would love to see the community grow fast in the coming years!"
The project welcomes contributions and community participation through mailing lists, Slack channel, face-to-face Meetups, and other events.
# # #
Posted at 10:29AM Jan 10, 2017
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® Geode™ as a Top-Level Project
Posted at 10:00AM Nov 21, 2016
by Sally Khudairi in General
The Apache Software Foundation Announces Apache® Kudu™ v1.0
- Support for redundant and highly available Kudu Master nodes;
- Support for manual management of range partitioning, critical for time series workloads;
- Rewritten integration with Apache Spark, including Spark SQL and DataFrame APIs;
- An officially supported client library for Python; and
- Substantial performance improvements both for random access and analytic workloads.
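Manual range partitioning suits time series because new rows land in the newest range, fresh ranges can be added ahead of incoming data, and expired ranges can be dropped whole. The sketch below models only that bookkeeping in plain Python, with half-open `[lower, upper)` date ranges. It is not Kudu's client API; the class and method names are hypothetical.

```python
import bisect
from datetime import date
from typing import List, Optional, Tuple

Range = Tuple[date, date]  # half-open [lower, upper)


class RangePartitions:
    """Manually managed, non-overlapping range partitions keyed by date."""

    def __init__(self) -> None:
        self.ranges: List[Range] = []  # kept sorted by lower bound

    def add_range(self, lower: date, upper: date) -> None:
        if not lower < upper:
            raise ValueError("lower bound must be before upper bound")
        for lo, hi in self.ranges:
            if lower < hi and lo < upper:
                raise ValueError(f"[{lower}, {upper}) overlaps [{lo}, {hi})")
        bisect.insort(self.ranges, (lower, upper))

    def drop_range(self, lower: date, upper: date) -> None:
        self.ranges.remove((lower, upper))  # e.g. expire the oldest period

    def find(self, ts: date) -> Optional[Range]:
        # Rightmost range whose lower bound is <= ts, then check its upper bound.
        i = bisect.bisect_right(self.ranges, (ts, date.max)) - 1
        if i >= 0 and ts < self.ranges[i][1]:
            return self.ranges[i]
        return None  # no range covers ts, so a write here would be rejected


parts = RangePartitions()
parts.add_range(date(2016, 8, 1), date(2016, 9, 1))
parts.add_range(date(2016, 9, 1), date(2016, 10, 1))
print(parts.find(date(2016, 9, 20)))   # (datetime.date(2016, 9, 1), datetime.date(2016, 10, 1))
parts.drop_range(date(2016, 8, 1), date(2016, 9, 1))
print(parts.find(date(2016, 8, 15)))   # None
```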
"Kudu 1.0 is the most performant, full-featured, and stable release of Kudu yet. Every day we see new users joining the community, deploying Kudu alongside other Apache projects such as Impala and Spark to solve valuable real-time use cases," added Lipcon. "Kudu expands the Apache Hadoop ecosystem's capabilities, enabling real-time data ingestion and updates while also serving high performance analytics with a substantially simplified architecture."
"The availability of Kudu 1.0 is an exciting milestone and my data science team is eager to evaluate it. We do a lot of work with time series workflows in science data systems and the speed-ups there should really help in our deployment of Kudu," said Chris Mattmann, Chief Architect in the Instrument and Science Data Systems Section at NASA Jet Propulsion Laboratory, and member of the Apache Kudu Project Management Committee.
Posted at 09:30AM Sep 20, 2016
by Sally Khudairi in General