Entries tagged [big]
The Apache Software Foundation Announces Apache® Beam™ v2.0.0
- API stability and future compatibility within this major version;
- Stateful data processing paradigms that unlock efficient, data-dependent computations;
- Support for user-extensible file systems, with built-in support for Hadoop Distributed File System, among others; and
- A metrics subsystem for deeper insight into pipeline execution.
Posted at 10:00AM May 17, 2017 by Sally in General
The Apache Software Foundation Announces Apache® Samza™ v0.13
- A higher-level API that developers can use to express complex processing pipelines on streams more concisely;
- Support for running Samza applications as a lightweight embedded library without relying on YARN;
- Support for flexible deployment options;
- Support for rolling upgrade of running Samza applications;
- Improved monitoring and failure detection using a built-in heartbeat mechanism;
- Better integration with other cluster-manager frameworks and environments; and
- Several bug fixes that improve the reliability, stability, and robustness of data processing.
Posted at 11:00AM May 15, 2017 by Sally in General
The Apache Software Foundation Announces Apache® CarbonData™ as a Top-Level Project
Open Source Big Data analytics accelerator in use at Bank of Communications, Hulu, Huawei, SAIC Motor, Zhejiang Mobile, among others.
Forest Hill, MD –1 May 2017– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® CarbonData™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.
Apache CarbonData is an indexed columnar store file format for fast analytics on Big Data platforms (including Apache Hadoop and Apache Spark, among others) that helps speed up queries by an order of magnitude over petabytes of data.
"We are very proud to complete the incubation process and graduate as an Apache Top-Level Project," said Liang Chen, Vice President of Apache CarbonData. "The CarbonData community grew rapidly over the last ten months, both in terms of size and diversity. Since entering the Apache Incubator, we have completed 4 releases, and exceeded 90 contributors from 10 different organizations."
With the aim of using a unified file format to satisfy all kinds of data analysis cases, Apache CarbonData seamlessly integrates with Hadoop and Spark to improve Big Data analysis efficiency. In benchmarks, CarbonData's interactive queries run approximately 10x faster than on standard column-oriented SQL-on-Hadoop data stores.
- Unique data organization to allow faster filtering and better compression;
- Multi-level indexing to enable faster search and speed up query processing;
- Deep Apache Spark Integration for dataframe + SQL compliance;
- Advanced push down optimization to minimize the amount of data being read, processed, converted, transmitted, and shuffled;
- Efficient compression and global encoding schemes to further improve aggregation query performance;
- Dictionary encoding for reduced storage space and faster processing; and
- Data update + delete support using standard SQL syntax.
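To make the dictionary-encoding item in the list above concrete, here is a minimal Python sketch of the general technique. It is not CarbonData's implementation or file layout; the function names (`dictionary_encode`, `dictionary_decode`, `filter_equals`) and the sample city column are hypothetical, chosen only for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def dictionary_encode(values: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Return (dictionary, codes) such that dictionary[codes[i]] == values[i]."""
    dictionary: List[Hashable] = []   # each distinct value, stored once
    index: Dict[Hashable, int] = {}   # value -> its integer code
    codes: List[int] = []             # one small integer per row
    for value in values:
        code = index.get(value)
        if code is None:
            # First time this value is seen: assign the next free code.
            code = len(dictionary)
            index[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary: Sequence[Hashable], codes: Sequence[int]) -> List[Hashable]:
    """Rebuild the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def filter_equals(dictionary: Sequence[Hashable], codes: Sequence[int], target: Hashable) -> List[int]:
    """Return row positions equal to target, comparing integer codes only."""
    if target not in dictionary:
        return []  # value never occurs, so no row can match
    target_code = list(dictionary).index(target)
    return [row for row, code in enumerate(codes) if code == target_code]


# Hypothetical sample column with repeated values.
cities = ["Shanghai", "Beijing", "Shanghai", "Shenzhen", "Beijing", "Shanghai"]
dictionary, codes = dictionary_encode(cities)
matching_rows = filter_equals(dictionary, codes, "Beijing")
```

Because each distinct value is stored once and every row holds only a small integer, a column with many repeated values takes less space, and an equality filter compares integers rather than strings.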
Apache CarbonData is in use at an array of organizations, including Bank of Communications, medical/pharma social platform DXY, Hulu, Huawei, group online retailer MEITUAN, SAIC Motor, and Zhejiang Mobile, among others.
"CarbonData has very good performance as a ‘SQL on Hadoop’ solution," said Tan Sheng, Director of SAIC Motor’s Big Data team. "It is suitable for SAIC Motor to adopt as a central Big Data platform component. Not only do we use Apache CarbonData, we also actively participate in its community as contributors."
"Apache CarbonData is great, as it helped our audit business improve performance 7-10X on 14 billion rows of data," said Wei Zhao, Senior Engineer at Bank of Communications.
"Apache CarbonData is very suitable for our filter query cases, and has averaged 20x improvement on performance," said William Zhu, Architecture team member at DXY. "And, as CarbonData supports data update and delete, this feature is very useful. We would consider CarbonData as our all-in-one solution to unify all analysis data."
CarbonData was first developed at Huawei in 2013. The project was submitted to the Apache Incubator in June 2016, and had its first official release two months later. The project won top honors in the BlackDuck 2016 Open Source Rookies of the Year's Big Data category.
"Apache CarbonData is a great example of the value of the incubation process," said Jean-Baptiste Onofré, Apache CarbonData Incubator Mentor and Project Management Committee member. "Helping grow the CarbonData developer and user communities has increased our visibility, which allowed us to extend our use cases and tests, and gather new ideas. The initial CarbonData committers did (and are still doing) great work to welcome new users and contributors, clearly understanding it's a step forward for the project."
"We will continue to put our efforts towards optimizing data format efficiency for the Big Data ecosystem and provide a unified and high performance data storage solution," added Liang. "The Apache CarbonData community welcomes interested contributors to work with us on our journey forward."
Catch Apache CarbonData in action at ApacheCon (16-18 May/Miami), and Spark Summit (5-7 June/San Francisco).
Availability and Oversight
Apache CarbonData software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache CarbonData, visit http://carbondata.apache.org/ , https://twitter.com/ApacheCarbonDat , and https://www.facebook.com/carbondata/
About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "CarbonData", "Apache CarbonData", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 11:01AM May 01, 2017 by Sally in General
The Apache Software Foundation Announces Apache® Ranger™ as a Top-Level Project
- Ranger Key Management Service (compatible with Hadoop’s native KMS API to store and manage encryption keys for HDFS Transparent Data Encryption);
- Dynamic column masking and row filtering;
- Dynamic policy conditions (such as prohibition of toxic joins);
- User context enrichers (such as geo-location and time of day mappings); and
- Classification or tag based policies for Hadoop ecosystem components via integration with Apache Atlas.
# # #
Posted at 11:00AM Feb 08, 2017 by Sally in General
Apache Software Foundation Announces Apache® Twill™ as a Top-Level Project
- JavaOne, 18-22 September 2016 in San Francisco
- Strata+Hadoop World, 27-29 September 2016 in New York City
Posted at 10:00AM Jul 27, 2016 by Sally in General
The Apache Software Foundation Announces Apache® Kudu™ as a Top-Level Project
Posted at 08:11PM Jul 26, 2016 by Sally in General
The Apache Software Foundation Announces Apache® OODT™ v1.0
- Data ingestion and processing;
- Automatic data discovery and metadata extraction;
- Metadata management;
- Workflow processing and support; and
- Resource management
Posted at 10:00AM Jun 29, 2016 by Sally in General
The Apache Software Foundation Announces Apache® Bahir™ as a Top-Level Project
- streaming-akka (akka: Open Source toolkit and runtime simplifying the construction of concurrent and distributed applications on the Java Virtual Machine)
- streaming-mqtt (mqtt: lightweight messaging protocol for small sensors and mobile devices, optimized for high-latency or unreliable networks)
- streaming-twitter (Twitter: online social networking service; Bahir allows the processing of social data from Twitter)
- streaming-zeromq (zeromq: a high-performance asynchronous messaging library, aimed at use in distributed or concurrent applications)
Posted at 09:00AM Jun 29, 2016 by Sally in General
The Apache Software Foundation Announces Apache® Zeppelin™ as a Top-Level Project
- Multi-purpose --features data ingestion, exploration, analysis, visualization, and collaboration;
- Robust --supports more than 20 backend systems, including Apache Spark, Apache Flink, Apache Hive, Python, R, and any JDBC (Java Database Connectivity);
- Easy to deploy --built on top of modern Web technologies (provides built-in Apache Spark integration, eliminating the need to build a separate module, plugin, or library);
- Easy to use --with built-in visualizations and dynamic forms;
- Flexible --allows users to mix different languages, exchange data between backends, adjust the layout;
- Extensible --with pluggable architecture for interpreters, notebook storages, authentication, and visualizations (in progress); and
- Advanced --allows interaction between custom visualizations and cluster resources
Posted at 10:00AM May 25, 2016 by Sally in General
The Apache Software Foundation Announces Apache® TinkerPop™ as a Top-Level Project
The central component to Apache TinkerPop is Gremlin, a graph traversal machine and language, which makes it possible to write complex queries (called traversals) that can execute either as real-time OLTP queries, analytic OLAP queries, or a hybrid of the two.
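As a rough illustration of what a traversal is, the following Python sketch chains steps over a tiny in-memory graph. The step names (`V`, `has`, `out`, `values`) are modeled loosely on Gremlin's, but the `Graph` and `Traversal` classes and the sample data are hypothetical; this is not the TinkerPop API, and it covers only the simplest case of walking edges from a starting vertex.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: vertices carry properties, edges carry a label."""

    def __init__(self):
        self.properties = {}                 # vertex id -> {key: value}
        self.out_edges = defaultdict(list)   # vertex id -> [(label, target id)]

    def add_vertex(self, vertex_id, **props):
        self.properties[vertex_id] = props

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))

    def V(self):
        # Start a traversal positioned at every vertex in the graph.
        return Traversal(self, list(self.properties))


class Traversal:
    """Each step returns a new Traversal, so steps can be chained."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = current  # vertex ids the traversal is currently at

    def has(self, key, value):
        # Keep only the vertices whose property `key` equals `value`.
        kept = [v for v in self.current
                if self.graph.properties[v].get(key) == value]
        return Traversal(self.graph, kept)

    def out(self, label):
        # Move along outgoing edges that carry the given label.
        reached = [target
                   for v in self.current
                   for edge_label, target in self.graph.out_edges[v]
                   if edge_label == label]
        return Traversal(self.graph, reached)

    def values(self, key):
        # End the traversal by reading one property from each vertex.
        return [self.graph.properties[v][key] for v in self.current]


# Hypothetical sample data.
g = Graph()
g.add_vertex(1, name="alice")
g.add_vertex(2, name="bob")
g.add_vertex(3, name="carol")
g.add_edge(1, "knows", 2)
g.add_edge(1, "knows", 3)
g.add_edge(2, "knows", 3)

# Read as: start at all vertices, keep "alice", follow "knows", read names.
friends_of_alice = g.V().has("name", "alice").out("knows").values("name")
```

The point of the sketch is the shape of the query: a traversal is a pipeline of small steps, each of which narrows or moves the set of graph elements being visited.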
Apache TinkerPop is in use at organizations such as DataStax and IBM, among many others. Amazon.com is currently using TinkerPop and Gremlin to process its order fulfillment graph, which contains approximately one trillion edges.
The core Apache TinkerPop release provides production-ready, reference implementations of a number of different data systems including Neo4j (OLTP), Apache Giraph (OLAP), Apache Spark (OLAP), and Apache Hadoop (OLAP). However, the bulk of the implementations are maintained within the larger TinkerPop ecosystem. These implementations include commercial and Open Source graph databases and processors, Gremlin language variants for various programming languages on and off the Java Virtual Machine, visualization applications for graph analysis and many other tools and libraries. The TinkerPop ecosystem is richly supported with many options for developers to choose from.
TinkerPop originated in 2009 at the Los Alamos National Laboratory. After two major releases (TinkerPop1 in 2011 and TinkerPop2 in 2012), the project was submitted to the Apache Incubator in January 2015.
"Following in a long line of Apache projects that revolutionized entire industries, starting with the Apache HTTP Server, continuing with Web Services, search, and Big Data technologies, Apache TinkerPop will no doubt reshape the Graph Computing landscape," said Hadrian Zbarcea, co-Vice President of ASF Fundraising and Incubator Mentor of Apache TinkerPop. "While TinkerPop has just graduated as an ASF Top Level Project, it is already seven years old, a mature technology, backed by a number of vendors, a vibrant community, and absolutely brilliant developers."
The project welcomes those interested in contributing to Apache TinkerPop. For more information, visit http://tinkerpop.apache.org/docs/3.2.0-incubating/dev/developer/#_contributing
Posted at 10:00AM May 23, 2016 by Sally in General
The Apache Software Foundation Announces Apache® Apex™ as a Top-Level Project
- Apache: Big Data 9-12 May 2016 in Vancouver http://apachecon.com/
- Hadoop Summit 28-30 June 2016 in San Jose, CA http://hadoopsummit.org/san-jose/
- Spark & Hadoop User Group Munich 19 July 2016 http://www.meetup.com/Hadoop-User-Group-Munich/events/230313355/
Posted at 01:32PM Apr 25, 2016 by Sally in General
The Apache® Software Foundation announces Apache Flink™ v1.0
- initiating backwards compatibility of public APIs against all 1.x.y versions;
- introducing functionality for complex event processing (CEP);
- supporting large state beyond memory limits;
- supporting state versioning and savepoints; and
- improving the system's monitoring functionality
- QCon (London, 7-9 March 2016)
- Strata/Hadoop World (San Jose, 28-31 March 2016)
- Hadoop Summit (Dublin, 13-14 April 2016)
- Kafka Summit (San Francisco, 26 April 2016)
- Apache: Big Data (Vancouver, 9-12 May 2016)
- OSCON (Austin, TX, 18-19 May 2016)
- Strata/Hadoop World (London, 31 May - 3 June 2016)
- Berlin Buzzwords (Berlin, 5-7 June 2016)
- Flink Forward (Berlin, 12-14 September 2016)
# # #
Posted at 12:00PM Mar 08, 2016 by Sally in General
The Apache® Software Foundation Announces Apache Arrow™ as a Top-Level Project
- Accelerates the performance of analytical workloads by more than 100x in some cases
- Enables multi-system workloads by eliminating cross-system communication overhead
© The Apache Software Foundation. "Apache", "Apache Arrow", "Arrow", "Apache Calcite", "Calcite", "Apache Cassandra", "Cassandra", "Apache Drill", "Drill", "Apache Hadoop", "Hadoop", "Apache HBase", "HBase", "Apache Impala", "Impala", "Apache Kudu (incubating)", "Kudu (incubating)", "Apache Parquet", "Parquet", "Apache Phoenix", "Phoenix", "Apache Spark", "Spark", "Apache Storm", "Storm", "ApacheCon", and their logos are registered trademarks or trademarks of The Apache Software Foundation in the U.S. and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Posted at 12:00PM Feb 17, 2016 by Sally in General
The Apache® Software Foundation Announces Strong Momentum; Enters 2016 More Influential, Innovative, Efficient, and with a New Look
Since its inception, the ASF has long been recognized as a leading source for Open Source network-server, network-client, and library tools that meet the demand for interoperable, adaptable, and sustainable solutions. Its reputation for producing reliable enterprise-grade software continues to grow dramatically across several categories, most notably Big Data, where Apache powerhouses such as Hadoop, Cassandra, Storm, and many others dominate the marketplace. According to Forrester Research, 100% of enterprises will embrace the Apache Hadoop ecosystem for data storage, processing, problem solving, and predictive analytics, particularly across Cloud environments.
© The Apache Software Foundation. "Apache", "Apache Abdera", "Apache Cassandra", "Apache Cordova", "Apache Flex", "Apache Hadoop", "Apache Hive", "Apache HTTP Server", "Apache Lucene/Solr", "Apache Maven", "Apache OpenOffice", "Apache Spark", "Apache Storm", "Apache Tomcat", "Apache Zookeeper", "ApacheCon", and their logos are registered trademarks or trademarks of The Apache Software Foundation in the U.S. and/or other countries. All other brands and trademarks are the property of their respective owners.
# # #
Applications now Open for Travel Assistance to ApacheCon North America 2016
The Apache Software Foundation's Travel Assistance Committee (TAC) has announced that applications for travel assistance to ApacheCon North America 2016 are now open.
ApacheCon North America will take place 9-13 May 2016 in Vancouver, British Columbia, Canada.
This year's ApacheCon is split into two separate themed events:
- Apache:Big Data --the Apache projects and people working in Big Data, ubiquitous computing, data engineering, and science
- ApacheCon:Core --the technologies, projects, and communities driving the future of Open Source, ubiquitous and emerging Web solutions, and Cloud computing
Event details are available at http://apachecon.com/ ; the Calls for Participation for both events are also open.
Important: due to the short timeframe of each event, only applications from those who are able to attend BOTH events will be considered. Applications close on 2 March 2016. Please ensure your application contains as much supporting material as required to efficiently and accurately process your request.
The TAC exists to help those that would like to attend ApacheCon events, but are unable to do so for financial reasons. For more information, and to apply, please visit http://www.apache.org/travel/ . As with previous years, the TAC team anticipates a high volume of applications from a diverse range of backgrounds, and therefore encourages those considering submitting an application to do so as soon as possible.
We look forward to seeing you in Vancouver!
-Sally Khudairi, on behalf of the TAC team
Posted at 03:10PM Dec 13, 2015 by Sally in General