Entries tagged [open]

Monday May 01, 2017

The Apache Software Foundation Announces Apache® Mahout™ v0.13.0

Open Source scalable machine learning and data mining library for Big Data artificial intelligence now more powerful and easier to use.

Forest Hill, MD —1 May 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Mahout™ v0.13.0, the latest version of the Open Source scalable machine learning library.

Apache Mahout provides an environment for quickly creating machine-learning applications that scale and run on the highest-performance parallel computation engines available. Mahout is the first scalable, generalized tensor and linear algebra solving engine, taking data scientists from interactive experimentation to production use.

"Apache Mahout 0.13.0 is more powerful with its new algorithm framework that allows for easier implementation of machine learning algorithms," said Andrew Palumbo, Vice President of Apache Mahout. "The enhanced Mahout code base and development framework make machine learning even more accessible, which is a game changer in the field of artificial intelligence."

Mahout provides a wide variety of premade algorithms (Matrix Factorization, QR via ALS, SSVD, PCA, etc.) for Scala + Apache Spark, H2O, and Apache Flink, as well as on-GPU compute for performance improvements in very large tensor math. Apache Mahout provides the data science tools to automatically find meaningful patterns in Big Data sets by supporting the following main data science use cases:
  • Collaborative filtering – mines user behavior and makes product recommendations (such as eCommerce product recommenders);
  • Regression – estimates a numerical value based on values of other inputs;
  • Clustering – takes items in a particular class (such as Web pages or newspaper articles) and organizes them into naturally occurring groups, such that items belonging to the same group are similar to each other; and
  • Classifying – learns from existing categorizations and then assigns unclassified items to the best category.

New in v0.13.0
Apache Mahout now makes it easier to do matrix math on graphics cards, which is relevant for most modern machine-learning and deep-learning methods. In addition, v0.13.0 allows shared-nothing computation on GPUs, on multi-core CPUs, or in the JVM as appropriate, and adds a simplified framework for building new algorithms. Because Mahout comprises an interactive environment and library that support generalized scalable linear algebra and include many modern machine-learning algorithms, the project has also collaborated with developers of other projects, including the Open Source linear algebra library ViennaCL, the Java wrapper library JavaCPP, and the graphics processor manufacturer NVIDIA, to add CUDA bindings directly into Mahout for simplicity of development.

The v0.13.0 release resolves 62 separate JIRA issues since v0.12.2, including numerous enhancements to Mahout-Samsara, the vector math experimentation environment with R-like syntax that works at scale. Complete release notes are at http://mahout.apache.org/release-notes/Apache-Mahout-0.13.0-Release-Notes.pdf
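
For illustration (this example is not from the announcement), here is a minimal Scala sketch of Samsara's R-like syntax, based on the shape of the published Samsara documentation; the imports and the Spark-backed distributed context are assumptions about a typical Mahout shell setup:

  import org.apache.mahout.math.scalabindings._
  import org.apache.mahout.math.drm._
  import org.apache.mahout.math.scalabindings.RLikeOps._
  import org.apache.mahout.math.drm.RLikeDrmOps._
  import org.apache.mahout.sparkbindings._

  // An implicit DistributedContext (here Spark-backed) is assumed in scope, e.g.
  //   implicit val ctx = mahoutSparkContext(masterUrl = "local", appName = "samsara-demo")
  val drmA = drmParallelize(dense((1, 2, 3), (3, 4, 5)), numPartitions = 2)

  // R-like operators: transpose and multiply distributed matrices, then
  // bring the small Gramian A'A back in-core
  val ata = (drmA.t %*% drmA).collect

The same expressions run unchanged whether the backing engine is Spark or another supported backend, which is the point of the engine-neutral Samsara layer.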

Future versions of Mahout will include support for native iterative solvers, a more robust algorithm library, and smarter probing and optimization of multiplications, among other features.

A comprehensive list of users of Apache Mahout is available at https://mahout.apache.org/general/powered-by-mahout.html; current users are mostly researchers and developers actively involved in building distributed machine-learning pipelines and tools.

"We thank our community of developers and users who helped make this milestone release possible, and welcome new contributors to help us advance machine learning," added Palumbo.

Catch Apache Mahout in action at Apache: Big Data, where attendees learn first-hand from many original project creators and companies from the greater Mahout community. Apache: Big Data will be held 16-18 May 2017 in Miami, FL. To register, and for more information, visit http://apachecon.com/

Availability and Oversight
Apache Mahout software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Mahout, visit http://mahout.apache.org/ and https://twitter.com/ApacheMahout

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 6,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Flink", "Apache Flink", "Mahout", "Apache Mahout", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday February 14, 2017

The Apache Software Foundation Announces Apache® MyFaces™ Tobago 3

Standards-based Open Source components library allows developers to quickly and easily create business Web applications without worrying about technical details 

Forest Hill, MD —14 February 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® MyFaces™ Tobago 3, the user interface components for creating business applications without the need for coding HTML, CSS, or JavaScript.

A sub-project of Apache MyFaces (the Open Source implementation of the JavaServer Faces Web application framework, which follows the Model-View-Controller paradigm), Tobago is a component library for JavaServer Faces (JSF). The project was originally created at Atanion GmbH in 2002, and was donated to the Apache Incubator in 2005. Tobago graduated as an Apache MyFaces sub-project in 2006.

"With a commitment to reduce the time and effort spent on development and deployment, the unofficial Tobago tagline is 'less magic, more standards'," said Udo Schnurpfeil, member of the Apache MyFaces Project Management Committee. "We are are happy that Tobago 3 helps users get their applications up and running even more quickly and easily."

By removing the need to code HTML, CSS, or JavaScript, Tobago allows users to easily create business Web applications, and emulates the development process of conventional user interfaces (rather than the creation of Web pages) via:
  1. UI components that are abstracted from HTML and free of any layout information that does not belong to the general page structure. The final output format is determined by the client/user-agent;

  2. A theming mechanism that makes it easy to change the look-and-feel and provides special implementations for certain browsers; and

  3. A layout manager used to arrange the components automatically. This means that no manual laying out using HTML tables or other constructs is needed.

Under The Hood
Apache MyFaces Tobago 3's increased responsiveness and standardization make it easier to integrate libraries and other projects. Features include:
  • Layout-management moved to CSS and JavaScript to natively achieve layout requirements and make rendering more efficient and responsive;

  • Themes using CSS library Bootstrap 4 make it easy to obtain a modern and rich design; and

  • Use of current technologies such as SCSS, CSS3, HTML5, AJAX, and JSF, along with theming on a pure CSS base, further simplifies the development experience.

Apache Tobago dramatically reduces developer resources and programming time, providing individuals and organizations with improved productivity and ease of implementation.

"For over 10 years we have been working closely with the Tobago team. The close collaboration has been mutually beneficial. Currently we are working on more than 60 intranet applications based on Apache Tobago. We see the new features from Tobago 3 as a significant architectural leap - in particular the innovations with ajax, theming, and responsive design. We expect a fast project adoption - even with the associated migration costs," said Rainer Rohloff, Senior Software Architect at Norddeutsche Landesbank. "We look forward to working on additional projects with the Tobago team in the future."

"It's great to see many users adopt Tobago," added Schnurpfeil. "We welcome new developers and users to join us on our mailing lists, MeetUps, and community events."

Availability and Oversight
Apache MyFaces software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, release notes, documentation, and more information on Apache MyFaces, visit http://myfaces.apache.org/ and https://twitter.com/MyFacesTeam

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 5,900 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "MyFaces", "Apache MyFaces", "Tobago", "Apache MyFaces Tobago", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday January 11, 2017

The Apache Software Foundation Announces Apache® Zest™ Renamed to Apache Polygene

Rebranded Open Source Composite Oriented Programming platform reflects growing codebase and community.

Forest Hill, MD —11 January 2017— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Zest™, the Composite Oriented Programming platform, has been renamed Apache Polygene.

Apache Polygene is a platform for Java enterprise developers to build applications with large domain models and complex business logic. Apache Polygene introduces multiple inheritance, aspect orientation (both typesafe and generic weaving), and persistence to both SQL and NoSQL storage systems. Apache Polygene also easily integrates with other technologies such as the Spring Framework, REST, OSGi, and many more.

"The name change was triggered to prevent confusion with other similarly named software such as the visualization toolkit from Eclipse," said Niclas Hedhman, Vice President of Apache Polygene. "Since becoming an official ASF project, our codebase and community continue to flourish. We are confident that our new identity will reflect ongoing innovation and increased productivity."

The resolution relating to the project's name change was approved at the ASF Board meeting in December 2016.

Project History
In 2007, Hedhman convinced Rickard Öberg to create an Open Source project based on Öberg’s Composite Oriented Programming (COP) concept, which launched as Qi4j. Since then, 28 people have contributed source to the project, with many others participating on mailing lists regarding direction, concepts, and design. In 2015 the project arrived at the ASF as Apache Zest, along with the unique designation of being the first project to enter the ASF as a Top-Level Project without passing through the Apache Incubator (the official entry path for projects and codebases wishing to become part of the ASF’s efforts). As part of its eligibility, the project had to meet the rigorous requirements of the Apache Maturity Model (http://s.apache.org/O4p), which addresses the integrity of a project's code, copyright, licenses, releases, community, consensus building, and independence, among other qualities. In March 2015 Apache Zest became an official ASF Top-Level Project, and it was renamed Apache Polygene in December 2016.

Availability and Oversight
Apache Polygene software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Polygene, visit http://polygene.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 620 individual Members and 5,900 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Polygene", "Apache Polygene", "Zest", "Apache Zest", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Friday December 16, 2016

Feedback from The Apache Software Foundation on the Free and Open Source Security Audit (FOSSA)

by Dirk-Willem van Gulik <dirkx(at)apache(punto)org>

December 2016, v1.09

Background

The important role of open source software in key infrastructures was brought to collective attention by two major security vulnerabilities in the core of the internet infrastructure: Heartbleed and Shellshock, both in 2014, caused significant concern. They made a lot of people realise how important the collective efforts around these open source infrastructures are, and how much key internet infrastructure relies on open source communities, such as the Apache community.

Two of those people were Julia Reda and Max Andersson, Members of the European Parliament. As a result they proposed (and directed Europe to fund) a pilot project, the "Free and Open Source Software Audit (FOSSA)", within a larger workstream that was about "€1 million to demonstrate security and freedom are not opposites".

One part of the money was for developing a methodology; the other for actually auditing some widely used open source software. After soliciting votes from the public, two projects "won": KeePass and the Apache Web Server.

Audit Process

The European Commission (easiest thought of as the executive part of Europe) commissioned the Spanish aerospace and defence company Everis to carry out the review of the Apache HTTPD server (and the associated APR). Their first draft had a considerable number of false positives and a fair bit of focus on some of the more arcane build tools (e.g. our libtool that is used on OS/2, where there is no gnu-libtool). At Apache, vulnerability scans are most valuable if we see analysis and at least a theory as to why something is vulnerable -- so we then worked with Everis to improve the report. Their final report on Apache HTTPD and APR has since gone live along with the other audit reports and results.

As none of the vulnerabilities found were particularly severe, we did not need to go through a responsible disclosure path, but could post the issues publicly to the developer mailing list.

Feedback on FOSSA

As part of this work, we were also asked for feedback - especially important now that Julia Reda and Max Andersson have managed to secure a recent vote in the European Parliament for additional budget.

So in the remainder of this post I'll try to outline some of the conflicting forces around a security issue report vs. a report of a vulnerability.

Security Reports

Infrastructure software needs constant maintenance to accommodate evolving platforms, and to back-port or propagate improvements and new learnings throughout the code. It is not a static piece of code with 'security holes' waiting to be found. 'Fixing' a hole without 'lifting the helicopter' is not net-positive by definition; in fact it can be negative: for example, if a 'fix' makes the code more complex, if it reduces the number of people that understand it, or if it has an adverse effect on systems that use a different CPU architecture, build environment or operating system.

So in general terms, the main metric is whether security overall gets better - and, indirectly, whether the available (existing and extra), but always limited, capacity and capabilities are used efficiently. At any given time there is both 1) a known backlog of deficiencies and loose ends and 2) a reservoir of unknown issues. Tackling the first will generally make things more secure, whereas searching the latter space only makes things more secure if one finds issues that are severe enough to warrant the time spent on the unknown versus the time not spent on the known deficiencies.

To illustrate this with examples: a report from a somewhat outdated automated vulnerability tool often reduces overall security. Time that could be spent on fixing real issues and cleanups is instead spent on dealing with the false positives and minor stuff. The opposite is also true: bringing a verified security issue to us with a modest bit of analysis as to how it is exploitable is virtually always a straight win. This is obviously even more true for a very severe issue (where it is immediately clear how it is exploitable).

But it is also true for the case where someone bestows time on us for a small deficiency (e.g. one initially found by a tool) - provided they spend significant time and engineering on handing us the 'fix' on a well-tested silver platter. And it is even more useful if a class of issues is tackled throughout, with things like updated test cases.

Throughout this it is very important to consider the threat model and what or whom the bad actors are that you are protecting against. This includes questions like: Is it when the server runs in production? Or also during build? What is the attack surface? This is particularly important when using (modern!) automated scanning tools (even after you laboriously winnow down the thousands of false positives for the one nugget).

The reason for this is that it is common for constructs such as:
  ....
  /* mallocOrDie() is assumed to wrap malloc() and abort the process on failure, e.g.:
   *   void *mallocOrDie(size_t n) { void *p = malloc(n); if (!p) abort(); return p; }
   */
  results = (results_t *) mallocOrDie(sizeof(results_t));
  results->sum = 0;
  /* note: no explicit bounds check on ptr->array[i]; array_len is trusted here */
  for (int i = 0; i < ptr->array_len; i++) {
    results->sum += ptr->array[i];
  }
  ....
to be automatically flagged by (old-fashioned) tools. This is because there is seemingly no error trapping on mallocOrDie() and because there is no bounds checking on ptr->array[i]. So in those cases you need to carefully analyse how this code is used, what assumptions there are in the API, how exposed it is, and so on (e.g. is array_len public or private to the API?).

The last thing you want (when the situation is more complex) is to add a whole load of sentinels to the above code. That would make the code harder to maintain and harder to test, and would introduce things like the risk of a dangling else going unnoticed; you would then have reduced security by tackling a non-existent issue. It would have been better to focus, for example, on making sure that mallocOrDie() always bombs out reliably when it fails to allocate.

People and Community versus Tools

So specifically, this means that people, rather than tools, spending a lot of time analysing issues is what is most valuable to open source communities.

By the time open source infrastructure code sees use in the market that is significant enough for the likes of FOSSA to consider it 'infrastructure and important' by some metric, it is likely that it is reasonably robust and secure.  As it is open source, it has some standing and is probably used by sizeable organisations that care about security or are regulated. Therefore, it has probably seen a fair bit of (automated and manual) security testing. 

In fact, once an open source project has become part of the landscape every security vendor worth their salt will probably test their tools on it - and try to use it as a wonderful (because you are public) example they can talk about in their sales pitches (that is, if they find something).

It also means that the issues that remain tend to be hard, and are more likely to require structural improvements (e.g. hardening an API) and large scale, systematic changes, which result in totally disproportionate amounts of time being spent on updating test cases, testing and manual validation - as otherwise it would probably already have been done. To some extent this also applies to automated tooling: we see that modern/complex tools that are hard to run, require a lot of manual work to update their rule bases for false positives, or require sizeable investments (such as certain types of fuzzing, code coverage tools, automated condition testing/swaps) are used less often (but thus tend to sometimes yield promising new strains of issues).

Secondly, there is the impact and the cost of dealing with the report and changes. Often the report will find a lot of 'low' issues and perhaps one or two serious ones. For the latter it is absolutely warranted to 'light up' the security response of an open source project, and have people rush into action to do triage, fix and follow up with responsible disclosure.

Given that the code is already open source, the same cannot be said for the 'low' issues. Generally anyone (bad actors and good actors alike) can find these too. So in a lot of cases it is better to work with the community to file these as bug reports; or even better - as simple issues usually have simple, non-controversial fixes - submit the fixes and associated test cases as contributions. (It is often less work for the finder of the bug to submit a technical patch and test case than to fully write up a nicely formatted PDF report.)

Bug Bounties - a Panacea?

One 'solution' which is getting a lot of media attention is that of bug bounties, where the romantic concept of the lone open source volunteer coding the internet is replaced by a lone bounty hunter - valiantly searching for holes and getting paid if they shoot first.

If we review that solution against the needs of large, stable communities that deal with relatively mature and stable infrastructure code (as opposed to commercial projects or new code that is still evolving), we have seen a number of counter-indications stack up:
  • Fees are not high enough to entice, 'in bulk', the expert volunteers one would need by the fee alone.

    Take the recent Azure-Linux update reporting or the Yahoo issue as examples. 5 to 10k is unlikely to come even close to the actual cost of the few weeks to a few months of engineering time at that quality level (or to compensating for the years invested in training) that was required to find, analyse and report such an issue.

  • The same applies to the higher 'competition' fees, topping out at 30-100k. In those cases only the first to report gets paid, so the actual payment per issue found is lower on average; with some 4 to 8 top global teams at this level and 2 to 4 high-value target events per year, that works out at well below 8k per team member per year on average.
That in itself has a number of ramifications:
  • The very best people will only engage in this as a hobby and (hence) for personal credit and pride; OR when they work for a vulnerability company that wants the PR and marketing.

BUT that means that it is personal credit & marketing that is the real driving value, not the money itself. So what then happens if we introduce money into this (already credit and marketing driven) situation? 

  • Very large numbers of people without sufficient skill may be tempted - but then one has to worry about the impact on the open source community: is dealing with reports at that level a better use of volunteers' time than having insiders look for things? Will time spent on these fixes distract from the important things?

    Should we ask people to pre-filter, or ask people managing bug hunting programmes to pre-vet or otherwise carry an administrative burden? (Keep in mind that there are third party bug-hunting programmes for Apache code that the Apache Software Foundation has no control over.)
Secondly - we know (from various dissertations and experience) that introducing money into a volunteer arrangement has an impact on group dynamics and how volunteers feel rewarded; or what work they seek to get rewarded for. 

With that - it may be so that:
  • It is likely that 'grunt' and 'boring' work in the security area will suffer - 'let that be done by paid folks';

  • It fundamentally shifts the non-monetary reward (and the monetary one - though that is not relevant here, as it is too low) away from writing secure/good code and caring/maintaining - towards the negative: finding a flaw in (someone else's) code. So feel-good, job-well-done and other feedback cycles now bypass the primary production process (that of writing good code), or at the very least, make that feedback loop involve a bug bounty party.
Finally - in complex/mature code - the class of vulnerabilities that we probably want to get fixed tends to be very costly to find and fix, and any avenue you go down has a high risk of turning up not a security issue but a design/quality issue.

Bug bounty finders, unlike the coding volunteers, are NOT incentivised to report or fix these.

On top of this, they are more likely to go for the higher reward/lower risk kind of niggle stuff. Stuff that, without digging deeper, is likely to cause higher layers of the code to get convoluted and messy. As these groups have no incentive to reduce complexity or fix deeper issues (in fact, if one were cynical - they have every reason to stay clear of such - as it means ripe hunting grounds during periods of drought).

So at some level bug bounties are about the trade-off between rewarding (paying) a single person versus saddling a community of motivated volunteers with the fallout - not so much of genuine reports, but of everything else.

So ultimately it is about the risks of what economists call "externalisation": imposing a cost on a party who did not choose to incur that cost, or denying that party a choice in how to spend their resources most effectively.

Summary and suggestions for the next FOSSA Audits

In summary:
  1. Submitting the results of automated validation (even with some human vetting) is generally a negative contribution to security. 

  2. Submitting a specific detailed vulnerability that includes some sort of analysis as to how it could be exploitable is generally a win.

  3. Broad classes of issues which (perhaps rightly!) give you hits all over the code base are generally only worth the time spent on them if there are additional resources willing to work on the structural fixes, write the test cases and test them on the myriad of platforms and settings -- and if a lot of the analysis and planning for this work has been done prior to submitting the issue (generally to a public mailing list).

    From this it also follows that narrow and specific (and hence more "new" and "unique") is generally more likely to increase overall security; while making public the results of something broad and shallow is at best not going to decrease security.

  4. Lighting up the security apparatus of an open source project is not 'free'. People are volunteers. So consider splitting your issues into: ones that need a responsible disclosure path; and ones that can go straight to the public lists. Keep in mind that, as the code is open source, you generally can err towards the open path a bit - other (bad) actors can run the same tools and processes as you.

  5. Consider raising the bar: rather than report a potential vulnerability, analyse it; have the resources to (help) solve it and support the community with expensive things, such as the human manpower for subsequent regression testing, documentation, unit tests or searching the code for similar issues.

  6. Security is a process over very long periods of time. So consider whether you can consistently spend resources over long periods on things which are hard for (isolated) volunteers to do. And if it is something like comprehensive fuzzing, code-coverage or condition/exchange testing - then consider the fact that it is only valuable if it is a) done over long periods of time and b) comes with a large block of human manpower to do things like analysis of the results and updates of test cases.

  7. Anything that increases complexity is a risk and may have long-term negative consequences, as it may lead to code which is harder to read, harder to maintain, or where the pool of people that can maintain it becomes disproportionally smaller. A broad, sweeping change that increases complexity may need to be backed by a significant (5-10+ years) commitment of maintenance in order to be safe to implement; especially if the security improvement it brings is modest.

  8. Carefully consider the threat model and the actors when you are classing something as a security hole - especially around APIs.

  9. Carefully consider what type of resources you want to mobilise in the wider community, and what incentivises the people and processes that are most likely to improve the overall security and safety. And take the overall, long-term health and social patterns of the receiving community into account when such forces for good are "external". It is all too easy, in effect, to cause a "Denial of Service" style effect, no matter how well intentioned.

  10. World-class expertise is rare; and by extension - the experts are often isolated. Bringing them together for long periods of time in relatively neutral settings gives synergy which is hard to get otherwise. Consider using a JRC or ENISA setting as a base for long term committed efforts. An effort that is perhaps more about strengthening and improving large scale (IT) infrastructures and (consumer) safety - rather than security.

  11. Bug bounties are not the only option. Some open source communities have benefited from "grants" or "stipends", where a specific issue got tackled or addressed. In some cases, such as Google's Summer of Code, these are focused on relatively young people and help train them up; in other cases they give established experts room for a (few) year(s) to really bottom out some long-standing issue.
With respect to the final point: security engineering (and its associated areas, such as privacy, trust and so on) is a "hard" thing to hire for; the market generally lacks capacity and capability - also in Europe.

While open source's access to 'lots of eyeballs' does help, it does not magically give us access to a lot of the right eyeballs.

Yet increasing both Capacity and Capability in society does help. And that is a long process that starts early.

# # #

Tuesday November 15, 2016

The Apache Software Foundation Announces Apache® jclouds™ v2.0



Monday October 03, 2016

Apache® POI™ Celebrates 15 Years at The Apache Software Foundation

Commemorates anniversary with the 78th release of Open Source libraries used for reading and writing files in Microsoft® Office formats.

Forest Hill, MD —3 October 2016— The Apache® POI™ project announced today their 15th Anniversary at The Apache Software Foundation (ASF), and the immediate availability of Apache POI v3.15, the latest version of the Open Source libraries used for reading and writing files in Microsoft® Office formats.

Using Apache POI APIs, developers are able to manipulate various file formats based on the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2CDF) using pure Java. Supported formats include those used by Word, PowerPoint, Excel, and other applications from the Microsoft Office suite.
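
By way of illustration (this example is not from the announcement), a minimal Scala sketch against POI's XSSF API writes a one-cell spreadsheet as follows; the file name and sheet name are arbitrary placeholders:

  import java.io.FileOutputStream
  import org.apache.poi.xssf.usermodel.XSSFWorkbook

  // Create a workbook in the Office Open XML spreadsheet format (.xlsx)
  val workbook = new XSSFWorkbook()
  val sheet = workbook.createSheet("Report")
  sheet.createRow(0).createCell(0).setCellValue("Generated by Apache POI")

  // Write the result to disk and release resources
  val out = new FileOutputStream("report.xlsx")
  try workbook.write(out) finally { out.close(); workbook.close() }

The same usermodel API shape applies to the other supported formats, e.g. HSSF for the older binary XLS format.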

"POI has gone from strength to strength," said Dominik Stadler, Vice President of Apache POI. "Developers around the world use Apache POI to read and write Microsoft Office documents, and we’re happy to commemorate our anniversary with the release of v3.15, POI's 78th official release."

Growth under the Foundation
POI's first public release was in August 2001. In February 2002 POI became a sub-project of Apache Jakarta (then the ASF's incubator for Java projects). Since then, the project has evolved rapidly through contributions by community volunteers. Development highlights include:
  • 2002: support for reading and writing OLE2-based XLS spreadsheets;
  • 2004: v2.0 released, adding formula support to spreadsheets and support for DOC (Microsoft Word);
  • 2007: POI became an Apache Top-Level Project in June; v3.0 added support for reading and writing the binary PowerPoint format; and
  • 2008: support added for the Office Open XML formats, coinciding with Microsoft's open format standardization efforts. POI later added support for additional file formats, including those used by Visio and Outlook.

Apache POI is used in a variety of ways, including automated generation of Microsoft Office documents from other data sources, updating and formatting existing documents, and enabling software systems to read rich, human-friendly data. Apache POI aims to serve a wide audience from small personal projects to large scale business applications. 

Numerous organizations, such as The Bank of Lithuania, Deutsche Bahn, IKAN Software, and Sunshine Systems use Apache POI. In addition, Apache Tika (content analysis toolkit) uses Apache POI to extract text and metadata from Microsoft Office documents to enable users to search and index these formats.

Community
In addition to the official POI developer and user mailing lists and the bugzilla bug tracking tool, developers are also volunteering their time to help users on community websites. An example of this is through improved localization: POI aims to work around the world despite differences in number formats, date formats, separators, character sets, and time zones.

The Apache POI community continually refreshes with new developers as prior developers move on to shepherd other projects. As with all Apache projects, and as demonstrated by Apache POI, community is the key to success and maintaining a healthy ecosystem.

"Decisions are better made by community consensus than individual opinion," added Stadler. "Our ongoing community refresh brings in new ideas and motivation, avoids stagnation, or unfairly burdening the original project creators. The one constant in our community has been Nick Burch, who has embodied The Apache Way as a contributor and mentor to POI for over a decade."

Through its healthy and vibrant user community and a number of dedicated committers, Apache POI is poised to enjoy continued active development for years to come. 

Availability and Oversight
Apache POI software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache POI, visit http://poi.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Capital One, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF
© The Apache Software Foundation. "Apache", "POI", "Apache POI", "Jakarta", "Apache Jakarta", "Tika", "Apache Tika", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday July 27, 2016

The Apache Software Foundation Announces Apache® Mesos™ v1.0

Mature Open Source cluster resource manager, container orchestrator, and distributed operating systems kernel in use at Netflix, Samsung, Twitter, and Yelp, among others.

Forest Hill, MD —27 July 2016— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® Mesos™ v1.0, the mature clustering resource management platform.

Apache Mesos provides efficient resource isolation and sharing across distributed applications in Cloud environments as well as private datacenters. Mesos is a cluster resource manager, a container orchestrator, and a distributed operating systems kernel.

"At Berkeley in 2009, we were thinking about a new way to manage clusters and Big Data, and Mesos was born," said Benjamin Hindman, Vice President of Apache Mesos, one of the original creators of the project, and Chief Architect/Co-Founder of Mesosphere. "Mesos v1.0 is a major milestone for the community."

Mesos entered the Apache Incubator in 2010 and has had 36 releases since becoming a Top-Level Project (TLP) in 2013.

Under The Hood
Apache Mesos 1.0 includes a number of new and important features:
  • New HTTP API: One of the main areas of improvement in the 1.0 release, this API simplifies writing Mesos frameworks by allowing developers to write frameworks in any language via HTTP. The HTTP API also makes it easy to run frameworks behind firewalls and inside containers (a sketch of a subscribe call follows this list).

  • Unified containerizer: This allows frameworks to launch Docker/Appc containers using the Mesos containerizer without relying on the Docker daemon (engine) or rkt. The isolation of the containers is done using isolators.

  • CNI support: The network/cni isolator has been introduced in the Mesos containerizer to implement the Container Network Interface (CNI) specification proposed by CoreOS. With CNI, the network/cni isolator is able to allocate a network namespace to Mesos containers and attach the container to different types of IP networks by invoking network drivers called CNI plugins.

  • GPU support: Support for using Nvidia GPUs as a resource in the Mesos "unified" containerizer. This support includes running containers with and without filesystem isolation (i.e., running both imageless containers as well as containers using a Docker image).

  • Fine-grained authorization: Many of Mesos' API endpoints have added authentication and authorization, so that operators can now control which users can view which tasks/frameworks in the web UI and API, in addition to fine-grained access control over other operator APIs such as reservations, volumes, weights, and quota.
  • Mesos on Windows: Support for running Mesos on the Windows operating system is currently in beta. The Mesos community is aiming for full support by late 2016.
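
As a hedged illustration of the new HTTP API (not from the announcement), the Scala sketch below sends the initial SUBSCRIBE call of a v1 scheduler; the master address and framework fields are placeholders, and a real framework would keep the connection open and parse the RecordIO-framed event stream rather than printing a few lines:

  import java.net.URI
  import java.net.http.{HttpClient, HttpRequest, HttpResponse}

  // SUBSCRIBE is the first call a v1 scheduler makes to the master
  val subscribe =
    """{"type": "SUBSCRIBE",
      | "subscribe": {"framework_info": {"user": "demo", "name": "hello-framework"}}}""".stripMargin

  val request = HttpRequest.newBuilder(URI.create("http://mesos-master:5050/api/v1/scheduler"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(subscribe))
    .build()

  // Events (SUBSCRIBED, OFFERS, ...) arrive on this long-lived connection;
  // here we just peek at the first few lines of the stream
  HttpClient.newHttpClient()
    .send(request, HttpResponse.BodyHandlers.ofLines())
    .body()
    .limit(4)
    .forEach(line => println(line))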

Over the years, Mesos has gained popularity with datacenter operators for being the first Open Source platform capable of running containers at scale in production environments, using both Docker containers and directly with Linux control groups (cgroups) and namespace technologies. Mesos' two-level scheduler distinguishes the platform as the only one that allows distributed applications such as Apache Spark, Apache Kafka, and Apache Cassandra to schedule their own workloads using their own schedulers within the resources originally allocated to the framework and isolated within a container.

"Initially the big breakthrough was this new way to run containers at scale, but the beauty of the design of Mesos and its two-level scheduler has proven to be its ability to run not only containers, but Big Data frameworks, storage services, and other applications all on the same cluster," added Hindman. "Mesos has become a core technology that serves as a kernel for other systems to be built on top, so the maturity on the API has been a big focus, and it’s one of the main areas of improvement in the 1.0 release." 

These capabilities have distinguished Apache Mesos as the kernel of choice for many Open Source and commercial offerings. One of Mesos' earliest and most notable users was Twitter, who leveraged the Mesos architecture to kill the "Fail Whale" by handling its massive growth in site traffic. Prominent Mesos contributors and users include IBM, Mesosphere, Netflix, PayPal, Yelp, and many more.

"We use Mesos regularly at NASA JPL - we are leveraging Mesos to manage cluster resources in concert with Apache Spark identify Mesoscale Convective Complexes (MCC) or extreme weather events in satellite infrared data. Mesos has performed well in managing a high memory cluster for our team," said Chris A. Mattmann, member of the Apache Mesos Project Management Committee, and Chief Architect, Instrument and Science Data Systems Section at NASA JPL. "We have also taken steps to integrate the Apache OODT data processing framework used in our missions with Apache Mesos."

Learn more about Apache Mesos at MesosCon Europe 2016 conference in Amsterdam 31 August-1 September 2016, and at MesosCon Asia 2016 in Hangzhou, China 18-19 November 2016.

Availability and Oversight
Apache Mesos software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Mesos, visit http://mesos.apache.org/ and https://twitter.com/ApacheMesos

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Mesos", "Apache Mesos", "Cassandra", "Apache Cassandra", "Kafka", "Apache Kafka", "OODT", "Apache OODT", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

The Apache Software Foundation Announces Apache® Twill™ as a Top-Level Project

Open Source abstraction layer over Apache Hadoop® YARN simplifies developing distributed Hadoop applications.

Forest Hill, MD –27 July 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Twill™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed Hadoop applications, allowing developers to focus more on their application logic.

"The Twill community is excited to graduate from the Apache Incubator to a Top-Level Project," said Terence Yim, Vice President of Apache Twill and Software Engineer at Cask. "We are proud of the innovation, creativity and simplicity Twill demonstrates. We are also very excited to bring a technology so versatile in Hadoop into the hands of every developer in the industry."

Apache Twill provides rich built-in features for common distributed applications, covering development, deployment, and management, and greatly easing Hadoop cluster operation and administration.
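
To make the programming model concrete, here is a rough Scala sketch adapted from the shape of Twill's getting-started examples; the runnable class and the ZooKeeper connection string are placeholders, and exact signatures may vary between Twill versions:

  import org.apache.hadoop.yarn.conf.YarnConfiguration
  import org.apache.twill.api.AbstractTwillRunnable
  import org.apache.twill.yarn.YarnTwillRunnerService

  // A trivial workload that Twill will run inside a YARN container
  class HelloWorldRunnable extends AbstractTwillRunnable {
    override def run(): Unit = println("Hello from a YARN container")
  }

  // "zkhost:2181" is a placeholder ZooKeeper connection string
  val runner = new YarnTwillRunnerService(new YarnConfiguration(), "zkhost:2181")
  runner.start()

  // Launch the runnable; the returned controller can monitor or stop it
  val controller = runner.prepare(new HelloWorldRunnable).start()

The appeal is that the runnable looks like an ordinary thread body, while Twill handles container negotiation, launching, and lifecycle on YARN.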

"Enterprises use big data technologies - and specifically Hadoop - to drive more value," said Patrick Hunt, member of the Apache Software Foundation and Senior Software Engineer at Cloudera. "Apache Twill helps streamline and reduce complexity of developing distributed applications and its graduation to an Apache Top-Level Project means more people will be able to take advantage of Apache Hadoop YARN more easily."

"This is an exciting and major milestone for Apache Twill," said Keith Turner, member of the Apache Fluo (incubating) Project Management Committee, which used Twill in the development of Fluo, an Open Source project that makes it possible to update the results of a large-scale computation, index, or analytic as new data is discovered. "Early in development, we knew we needed a standard way to launch Fluo across a cluster, and we found Twill. With Twill, we quickly and easily had Fluo running across many nodes on a cluster." 

Apache Twill is in production use at several organizations across various industries, easing distributed Hadoop application development and deployment.

Twill originated at Cask in early 2013. After 7 major releases, the project was submitted to the Apache Incubator in November 2013.

"Apache Twill has come a long way through The Apache Software Foundation, and we're thrilled it has become an ASF Top-Level Project," said Nitin Motgi, CTO of Cask. "Apache Twill has become a key component behind the Cask Data Application Platform (CDAP), using YARN containers and Java threads as the processing abstraction. CDAP is an Open Source integration and application platform that makes it easy for developers and organizations to quickly build, deploy and manage data applications on Apache Hadoop and Apache Spark."

"The Apache Twill community worked extremely well within the incubator environment, developing and collaborating openly to follow The Apache Way," said Henry Saputra, ASF Member and member of the Apache Twill Project Management Committee. "There is a tremendous demand for effective APIs and virtualization for developing big data applications and Apache Twill fills that need perfectly. We’re looking forward to continuing the journey with Apache Twill as a Top-Level Project."

Catch Apache Twill in action at:
  • JavaOne, 18-22 September 2016 in San Francisco
  • Strata+Hadoop World, 27-29 September 2016 in New York City
Availability and Oversight
Apache Twill software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Twill, visit http://twill.apache.org/ and follow @ApacheTwill

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server, the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

©The Apache Software Foundation. "Apache", "Twill", "Apache Twill", "Hadoop", "Apache Hadoop", "Apache Hadoop YARN", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Tuesday July 26, 2016

The Apache Software Foundation Announces Apache® Kudu™ as a Top-Level Project

Open Source columnar storage engine enables fast analytics across the Internet of Things, time series, cybersecurity, and other Big Data applications in the Apache Hadoop ecosystem.

Forest Hill, MD –25 July 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Kudu™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Kudu is an Open Source columnar storage engine built for the Apache Hadoop ecosystem designed to enable flexible, high-performance analytic pipelines.

"Under the Apache Incubator, the Kudu community has grown to more than 45 developers and hundreds of users," said Todd Lipcon, Vice President of Apache Kudu and Software Engineer at Cloudera. "Recognizing the strong Open Source community is a testament to the power of collaboration and the upcoming 1.0 release promises to give users an even better storage layer that complements Apache HBase and HDFS."

Optimized for lightning-fast scans, Kudu is particularly well suited to hosting time-series data and various types of operational data. In addition to its impressive scan speed, Kudu supports many operations available in traditional databases, including real-time insert, update, and delete operations. Kudu enables a "bring your own SQL" philosophy, and can be accessed by multiple query engines, including other Apache projects such as Drill, Spark, and Impala (incubating).
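
By way of illustration (not taken from the announcement), a minimal Scala sketch against Kudu's Java client might insert a row as follows; the master address, table, and columns are hypothetical, and client package names have varied across releases:

  import org.apache.kudu.client.KuduClient

  // Connect to a (hypothetical) Kudu master
  val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()

  // Assumes a table "metrics" with columns host (string) and value (double)
  val table = client.openTable("metrics")
  val session = client.newSession()

  val insert = table.newInsert()
  insert.getRow.addString("host", "web-01")
  insert.getRow.addDouble("value", 0.75)
  session.apply(insert) // real-time insert; updates and deletes follow the same pattern

  session.close()
  client.close()

Rows written this way become visible to scans almost immediately, which is what lets query engines layered on top serve low-latency analytics over freshly arriving data.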

Apache Kudu is in use at diverse companies and organizations across many industries, including retail, online service delivery, risk management, and digital advertising.

"Using Apache Kudu alongside interactive SQL tools like Apache Impala (incubating) has allowed us to deploy a next-generation platform for real-time analytics and online reporting," said Baoqiu Cui, Chief Architect at Xiaomi. "Apache Kudu has been deployed in production at Xiaomi for more than six months and has enabled us to improve key reliability and performance metrics for our customers. Kudu's graduation to a Top-Level Project allows companies like ours to operate a hybrid architecture without complexity. We look forward to continuing to contribute to its success."

"We are already seeing the many benefits of Apache Kudu. In fact we're using its combination of fast scans and fast updates for upcoming releases of our risk solutions," said Cory Isaacson, CTO at Risk Management Solutions, Inc. "Kudu is performing well, and RMS is proud to have contributed to the project’s integration with Apache Spark."

"The Internet of Things, cybersecurity and other fast data drivers highlight the demands that real-time analytics place on Big Data platforms," said Arvind Prabhakar, Apache Software Foundation member and CTO of StreamSets. "Apache Kudu fills a key architectural gap by providing an elegant solution spanning both traditional analytics and fast data access. StreamSets provides native support for Apache Kudu to help build real-time ingestion and analytics for our users."

"Graduation to a Top-Level Project marks an important milestone in the Apache Kudu community, but we are really just beginning to achieve our vision of a hybrid storage engine for analytics and real-time processing," added Lipcon. "As our community continues to grow, we welcome feedback, use cases, bug reports, patch submissions, documentation, new integrations, and all other contributions."

The Apache Kudu project welcomes contributions and community participation through mailing lists, a Slack channel, face-to-face MeetUps, and other events. Catch Apache Kudu in action at Strata + Hadoop World, 26-29 September 2016 in New York. 

Availability and Oversight
Apache Kudu software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For project updates, downloads, documentation, and ways to become involved with Apache Kudu, visit http://kudu.apache.org/ , @ApacheKudu, and http://kudu.apache.org/blog/.

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Kudu", "Apache Kudu", "Drill", "Apache Drill", "Hadoop", "Apache Hadoop", "Apache Impala (incubating)", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Wednesday June 29, 2016

The Apache Software Foundation Announces Apache® OODT™ v1.0

Open Source Big Data middleware metadata framework in use at Children's Hospital Los Angeles Virtual Pediatric Intensive Care Unit, DARPA MEMEX and XDATA, NASA Jet Propulsion Laboratory, and the National Cancer Institute, among others.

Forest Hill, MD —29 June 2016— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® OODT™ v1.0, the Big Data middleware metadata framework.

OODT is a grid middleware framework for science data processing, information integration, and retrieval. As "middleware for metadata" (and vice versa), OODT is used for computer processing workflow, hardware and file management, information integration, and linking databases. The OODT architecture allows distributed computing and data resources to be searchable and utilized by any end user.

"Apache OODT 1.0 is a great milestone in this project," said Tom Barber, Vice President of Apache OODT. "Effectively managing data pools has historically been problematic for some users, and OODT addresses a number of the issues faced. v1.0 allows us to prepare for some big changes within the platform with new UI designs for user-facing apps and data flow processing under the hood. It's an exciting time in the data management sector and we believe Apache OODT can be at the forefront of it."

OODT 1.0 signals a stage in the project where the initial scope of the platform is feature-complete and ready for general consumption. v1.0 features include:
  • Data ingestion and processing;
  • Automatic data discovery and metadata extraction;
  • Metadata management;
  • Workflow processing and support; and
  • Resource management

Originally created at NASA Jet Propulsion Laboratory in 1998 as a way to build a national framework for data sharing, OODT has been instrumental to the National Cancer Institute's Early Detection Research Network for managing distributed scientific data sets across 20+ institutions nationwide for more than a decade.

Apache OODT is in use in many scientific data system projects in Earth science, planetary science, and astronomy at NASA, such as the Lunar Mapping and Modeling Project (LMMP), NPOESS Preparatory Project (NPP) Sounder PEATE Testbed, the Orbiting Carbon Observatory-2 (OCO-2) project, and the Soil Moisture Active Passive mission testbed. In addition, OODT is used for large-scale data management and data preparation tasks in the DARPA MEMEX and XDATA efforts, and for supporting research and data analysis within the pediatric intensive care domain in collaboration with Children's Hospital Los Angeles (CHLA) and its Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit (VPICU), among many other applications.

"To watch Apache OODT grow from an internal NASA project to 1.0 where it is today and dozens of releases is an amazing feat. I truly believe having it at the ASF has allowed it to grow and prosper. We are doubling down on our commitment to Apache OODT, investing in its enhancement and use in several national-scale projects," said Chris Mattmann, member of the Apache OODT Project Management Committee, and Chief Architect, Instrument and Science Data Systems Section at NASA JPL. "Apache OODT processes some of the world's biggest data sets, distributes and manages them, and makes sure science happens in a timely and accurate fashion."

OODT entered the Apache Incubator in January 2010, and graduated as a Top-level Project in November 2010. 

Catch Apache OODT in action at ApacheCon Europe, 14-18 November 2016 in Seville, Spain http://apachecon.com/ .

Availability and Oversight
Apache OODT software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache OODT, visit http://oodt.apache.org/ and https://twitter.com/apache_oodt

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "OODT", "Apache OODT", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday June 06, 2016

The Apache® Software Foundation Announces Annual Report for 2015-2016 Fiscal Year

Trusted community recognized for accelerating Open Source innovation and advancing the future of open development.

Forest Hill, MD —6 June 2016— The Apache® Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of its annual report for its 2015-2016 fiscal year, which ended 30 April 2016. Highlights include:

Open Source Leadership -- 

  • 21st anniversary of the Apache HTTP Server Project
  • 17th anniversary of The Apache Software Foundation
  • 175 committees managing 291 Top-level Projects (TLPs) and dozens of sub-projects
  • Millions of people worldwide access the ASF's two dozen servers and 75 distinct hosts each day
  • Apache featured more than 400 times in Gartner Magic Quadrant reports

Innovation --

  • 20 new Apache Top-level Projects (TLPs)
  • Record 55 podlings undergoing development in the Apache Incubator, plus 39 initiatives in the Apache Labs
  • Apache Hadoop® ecosystem continues to dominate the Big Data marketplace
  • 743 repositories managed
  • 33% increase in signed Individual Contributor License Agreements (CLAs)
  • 3,425 ASF Committers and 5,922 Apache code contributors (21% increase) added nearly 20M lines of code; average 18,000 code commits per month
  • 315,533,038 lines of code changed (65% increase)

Community --

  • 58 new individual ASF Members elected
  • Apache projects overseen by 2,000+ Project Management Committee (PMC) members
  • Launched new "Help Wanted" application to match volunteers with Apache projects and activities seeking assistance
  • Held hundreds of events globally, including ApacheCon Europe and North America, plus countless conferences, workshops, and regional MeetUps
  • Participated in the Google Summer of Code as a Mentoring Organization for 12 consecutive years

Operations --

  • Apache services running 24x7x365 at near 100% uptime on an annual budget of less than US$5,000 per project
  • Launched new home.apache.org and lists.apache.org services
  • Initiated experiment with GitHub to automate user management and group permissions for 5,000+ Apache users
  • Launched new ASF brand identity
  • Ongoing trademarks, brand management, and legal support for dozens of existing and new projects
  • Bolstered organizational backing, with 37 ASF Sponsors and 11 Infrastructure partners
  • The ASF exited FY 2015 with revenue at $996K, ahead of projected budget

The full report is available online at https://s.apache.org/Ccml

# # #

© The Apache Software Foundation. "Apache" and "ApacheCon" are registered trademarks or trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

Wednesday May 25, 2016

The Apache Software Foundation Announces Apache® Zeppelin™ as a Top-Level Project

Open Source Big Data analytics and visualization tool for distributed, interactive, and collaborative systems using Apache Flink, Apache Hadoop, Apache Spark, and more.

Forest Hill, MD –25 May 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Zeppelin™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Zeppelin is a modern, web-based notebook that enables interactive data analytics. Notebooks help developers, data scientists, and related users to handle data efficiently without worrying about command lines and cluster details.

"The Zeppelin community is pleased to graduate from the Apache Incubator," said Lee Moon Soo, Vice President of Apache Zeppelin. "With 118 worldwide contributors and widespread adoption in numerous commercial products, we are proud to officially be a part of the Apache Big Data ecosystem."

Zeppelin's collaborative data analytics and visualization capabilities make data exploration, visualization, sharing, and collaboration easy over distributed, general-purpose data processing systems such as Apache Flink, Apache Hadoop, and Apache Spark, among other Big Data platforms.

Apache Zeppelin is:
  • Multi-purpose --features data ingestion, exploration, analysis, visualization, and collaboration;
  • Robust --supports 20+ backend systems, including Apache Spark, Apache Flink, Apache Hive, Python, R, and any JDBC (Java Database Connectivity) source;
  • Easy to deploy --built on top of modern Web technologies (provides built-in Apache Spark integration, eliminating the need to build a separate module, plugin, or library);
  • Easy to use --with built-in visualizations and dynamic forms;
  • Flexible --allows users to mix different languages, exchange data between backends, and adjust the layout;
  • Extensible --with pluggable architecture for interpreters, notebook storage, authentication, and visualizations (in progress); and
  • Advanced --allows interaction between custom visualizations and cluster resources

"With Apache Zeppelin, a wide range of users can make beautiful data-driven, interactive, and collaborative documents with SQL, Scala, and more," added Soo.

Apache Zeppelin is in use at an array of organizations and solutions, including Amazon Web Services, Hortonworks, JuJu, and Twitter, among others. 

"Congratulations to Apache Zeppelin community on graduation," said Tim Hall, Vice President of Product Management at Hortonworks. "Several members of our team have been working over the past year in the Zeppelin community 
to make it enterprise ready. We are excited to be associated with this community and look forward to helping our customers get the best insights out of their data with Apache Zeppelin."

"Apache Zeppelin is becoming an important tool at Twitter for creating and sharing interactive data analytics and visualizations," said Prasad Wagle, Technical Lead in the Data Platform team at Twitter. "Since it integrates seamlessly with all the popular data analytics engines, it is very easy to create and share reports and dashboards. With its extensible architecture and a vibrant Open Source community, I am looking forward to Apache Zeppelin advancing the state of the art in data analytics and visualization."

"Apache Zeppelin is the major user-facing piece of Memcore’s in-memory data processing Cloud offering. Building a technology stack might be quite exciting engineering challenge, however, if users can’t visualize and work with the data conveniently, it is as good as not having the data at all. Apache Zeppelin enables efficient user acquisition by anyone trying to build new products or service offerings in the Big- and Fast- Data markets, making innovations, collaboration, and development easier for anyone," said Dr. Konstantin Boudnik, Founder and CEO of Memcore.io. "I am very excited to see Apache Zeppelin graduating as an ASF Top Level Project. This shows that more people are joining the community, bringing the project to a new level, and adding more integration points with existing data analytics and transactional software systems. This directly benefits the community at-large."

Apache Zeppelin originated in 2013 at NFLabs as Peloton, a commercial data analytics product. Since entering the Apache Incubator in December 2014, the project has had three releases, and twice participated in Google Summer of Code under the Apache umbrella.

"It was an honor to help with the incubation of Zeppelin," said Ted Dunning, Vice President of the Apache Incubator. "I have been very impressed with the Zeppelin community and the software they have built. I see Apache Zeppelin being adopted all over the place where people need to apply a notebook style to a wide variety of kinds of computing."

Catch Apache Zeppelin in action during Berlin Buzzwords, 7 June 2016 https://s.apache.org/mV8E

Availability and Oversight
Apache Zeppelin software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Zeppelin, visit http://zeppelin.apache.org/ and https://twitter.com/ApacheZeppelin

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Zeppelin", "Apache Zeppelin", "Ambari", "Apache Ambari", "Flink", "Apache Flink", "Hadoop", "Apache Hadoop", "Hive", "Apache Hive", "Spark", "Apache Spark", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday May 23, 2016

The Apache Software Foundation Announces Apache® TinkerPop™ as a Top-Level Project

Powerful Open Source Big Data graph computing framework in use at Amazon, DataStax, and IBM, among others.

Forest Hill, MD –23 May 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® TinkerPop™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache TinkerPop is a graph computing framework that provides developers the tools required to build modern graph applications in any application domain and at any scale.

"Graph databases and mainstream interest in graph applications have seen tremendous growth in recent years," said Stephen Mallette, Vice President of Apache TinkerPop. "Since its inception in 2009, TinkerPop has been helping to promote that growth with its Open Source graph technology stack. We are excited to now do this same work as a top-level project within the Apache Software Foundation."

As a graph computing framework for both real-time, transactional graph databases (OLTP) and batch analytic graph processors (OLAP), TinkerPop is useful for working with small graphs that fit within the confines of a single machine, as well as massive graphs that can only exist partitioned and distributed across a multi-machine compute cluster.

TinkerPop unifies these highly varied graph system models, giving developers less to learn, faster time to development, and less risk associated with both scaling their system and avoiding vendor lock-in.

The Power to Process One Trillion Edges
The central component of Apache TinkerPop is Gremlin, a graph traversal machine and language, which makes it possible to write complex queries (called traversals) that can execute as real-time OLTP queries, as analytic OLAP queries, or as a hybrid of the two.

Because the Gremlin language is separate from the Gremlin machine, TinkerPop serves as a foundation for any query language to work against any TinkerPop-enabled system. Much like the Java virtual machine is host to Java, Groovy, Scala, Clojure, and the like, the Gremlin traversal machine is already host to Gremlin, SPARQL, SQL, and various host-language embeddings in Python, JavaScript, etc. Once a language is compiled to a Gremlin traversal, the Gremlin machine can evaluate it against a graph database or processor: a language such as SPARQL can then execute across a one-thousand-node cluster for long-running analytic jobs that touch large parts of the graph, or as sub-second queries within a small neighborhood.
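To make the traversal concept concrete, here is a minimal sketch against the TinkerPop Java API using the in-memory TinkerGraph reference implementation; the vertex labels, property names, and data are invented for the example.

```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

import java.util.List;

public class GremlinSketch {
    public static void main(String[] args) {
        // Build a tiny in-memory graph.
        Graph graph = TinkerGraph.open();
        Vertex marko = graph.addVertex(T.label, "person", "name", "marko");
        Vertex josh  = graph.addVertex(T.label, "person", "name", "josh");
        marko.addEdge("knows", josh);

        GraphTraversalSource g = graph.traversal();

        // A Gremlin traversal: who does marko know?
        List<Object> names = g.V().has("person", "name", "marko")
                              .out("knows")
                              .values("name")
                              .toList();
        System.out.println(names); // prints [josh]
    }
}
```

The same traversal, unchanged, can be evaluated by any other TinkerPop-enabled database or processor, which is the portability that separating the language from the machine buys.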

Apache TinkerPop is in use at organizations such as DataStax and IBM, among many others. Amazon.com is currently using TinkerPop and Gremlin to process its order fulfillment graph, which contains approximately one trillion edges.

The core Apache TinkerPop release provides production-ready reference implementations for a number of different data systems, including Neo4j (OLTP), Apache Giraph (OLAP), Apache Spark (OLAP), and Apache Hadoop (OLAP). However, the bulk of the implementations are maintained within the larger TinkerPop ecosystem. These implementations include commercial and Open Source graph databases and processors, Gremlin language variants for various programming languages on and off the Java Virtual Machine, visualization applications for graph analysis, and many other tools and libraries. The TinkerPop ecosystem is richly supported, with many options for developers to choose from.

TinkerPop originated in 2009 at the Los Alamos National Laboratory. After two major releases (TinkerPop1 in 2011 and TinkerPop2 in 2012), the project was submitted to the Apache Incubator in January 2015.

"Following in a long line of Apache projects that revolutionized entire industries, starting with with the Apache HTTP Server, continuing with Web Services, search, and Big Data technologies, Apache TinkerPop will no doubt reshape the Graph Computing landscape," said Hadrian Zbarcea, co-Vice President of ASF Fundraising and Incubator Mentor of Apache TinkerPop. "While TinkerPop has just graduated as an ASF Top Level Project, it is already seven years old, a mature technology, backed by a number of vendors, a vibrant community, and absolutely brilliant developers."

The project welcomes those interested in contributing to Apache TinkerPop. For more information, visit http://tinkerpop.apache.org/docs/3.2.0-incubating/dev/developer/#_contributing

Availability and Oversight
Apache TinkerPop software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache TinkerPop, visit http://tinkerpop.apache.org/ and https://twitter.com/apachetinkerpop

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, OPDi, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "TinkerPop", "Apache TinkerPop", "Apache HTTP Server", "Giraph", "Apache Giraph", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark" and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday April 25, 2016

The Apache Software Foundation Announces Apache® Apex™ as a Top-Level Project

Open Source enterprise-grade unified Big Data stream and batch processing engine for Apache Hadoop in use at GE, Silver Spring Networks, and more.

Forest Hill, MD –25 April 2016– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® Apex™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Apex is a large-scale, high-throughput, low-latency, fault-tolerant, unified Big Data stream and batch processing platform for the Apache Hadoop® ecosystem.

"It is very exciting to see Apex after nearly 4 years since inception becoming an ASF top-level project," said Thomas Weise, Vice President of Apache Apex. "It opens the strong capabilities and potential of the platform to a wider audience and we’re looking forward to a growing community to continue driving innovation in the stream processing space."

Recognized by InfoWorld for its "blazing speed and simplified programmability," Apex works in conjunction with Apache Hadoop YARN, a resource management platform for working with Hadoop clusters.

Apex was originally created at DataTorrent Inc. in 2012 (coinciding with the first alpha release of YARN), and entered the Apache Incubator in August 2015.

Apex enables streaming analytics on Apache Hadoop with an enterprise-grade platform. It has been built to leverage the underlying infrastructure provided by YARN and HDFS (Hadoop Distributed File System), including resource management, multi-tenancy and security. 

Faster to Deployment
Apache Apex meets the demands of today's Big Data applications with real-time reporting, monitoring, and learning with millisecond data-point precision. Its pipeline processing architecture can be used for real-time and batch processing in a unified architecture. Apex is highly performant, linearly scalable, fault tolerant, stateful, secure, and distributed, and is easily operable, offering low latency, no data loss, and exactly-once semantics.

Apex streamlines development and productization of Hadoop applications and lowers the barrier-to-entry by enabling developers to write or re-use generic Java code, minimizing the specialized expertise needed to write Big Data applications. This allows organizations to maximize developer productivity, accelerate development of business logic, and reduce time to market.

"Apache Apex is an example of the latest generation of advanced stream processing software that adds significant technology and capabilities over previous options," said Ted Dunning, Vice President of the Apache Incubator, Apache Apex Incubator Mentor, and Chief Application Architect at MapR Technologies. "That this project came to Apache and is now a fully fledged project is very exciting."

Apex comes with a comprehensive library of reusable operators (functional building blocks) that can be leveraged to quickly create new and non-trivial applications. The library also includes connectors that integrate with many external systems, including message buses, databases, file systems, and social media feeds; examples include Apache Cassandra, Apache HBase, JDBC, and Apache Kafka.
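As a hedged sketch of what this operator model looks like in code, the following wires two invented operators into a DAG with the Apex Java API; a real application would more typically reuse library operators and the connectors named above.

```java
import com.datatorrent.api.DAG;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;
import com.datatorrent.common.util.BaseOperator;
import org.apache.hadoop.conf.Configuration;

@ApplicationAnnotation(name = "WordSketch")
public class WordSketchApp implements StreamingApplication {

    /** Emits a test word whenever the engine polls for input. */
    public static class WordSource extends BaseOperator implements InputOperator {
        public final transient DefaultOutputPort<String> out = new DefaultOutputPort<>();

        @Override
        public void emitTuples() {
            out.emit("hello");
        }
    }

    /** Consumes words; the business logic is plain Java. */
    public static class WordSink extends BaseOperator {
        public final transient DefaultInputPort<String> in = new DefaultInputPort<String>() {
            @Override
            public void process(String word) {
                System.out.println(word);
            }
        };
    }

    @Override
    public void populateDAG(DAG dag, Configuration conf) {
        // Operators are the functional building blocks; streams connect their ports.
        WordSource source = dag.addOperator("source", new WordSource());
        WordSink sink = dag.addOperator("sink", new WordSink());
        dag.addStream("words", source.out, sink.in);
    }
}
```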

"Apache Apex is a battle-hardened technology, processing huge volumes of streaming data at some of the world’s largest enterprise and Internet companies," said technology advisor Eric Baldeschwieler. "Its successful Apache incubation has provided a tremendous boost to Apex, bringing many new members to its community of users and developers."

Enterprise Grade Unified Stream and Batch Processing
Apache Apex use cases include ingestion, fast real-time analytics, data movement, Extract-Transform-Load (ETL), fast batch, alerts, and real-time actions across diverse industries such as programmatic advertising, telecommunications, Internet of Things (IoT), and financial services.

"We are in the process of leveraging Big Data technologies to transform business processes and drive more value," explained Reid Levesque, Head of Solution Engineering at a financial services company. "We chose Apex to help us in this journey to do real-time ingestion and analytics on our various data sources and now we are proud to see it graduate to an Apache top level project."

Apex powers Big Data projects in production at numerous large enterprises such as GE Predix (IoT Cloud platform for industrial data and analytics), PubMatic (marketing automation software platform for publishers), and Silver Spring Networks (IoT solutions for smart cities).

"We at GE Predix data services have used Apex for our data pipeline product and look forward to our continued usage and contribution," said Parag Goradia, Executive Director of Predix Data Services. "We had great experience with Apache Apex and its capabilities. We believe Apex has a bright future as it will continue to solve big problems in the big data industry. We are proud to be associated with this project and excited that it is now in top level status."

"The Apex community has done a great job throughout the incubation process. They have built a robust community and demonstrated a firm understanding of The Apache Way," said P. Taylor Goetz, ASF Member and Apache Apex Incubator Mentor. "I'm pleased to see Apex graduate to a top-level project. These are exciting times in the world of stream processing."

"Congratulations to the Apache Apex community for working successfully through the incubation process and becoming part of the greater Apache Hadoop ecosystem," added Dunning.

Catch Apache Apex in action at:

Availability and Oversight
Apache Apex software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache Apex, visit http://apex.apache.org/ and https://twitter.com/ApacheApex

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects wishing to join the ASF enter through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF. For more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Apex", "Apache Apex", "Cassandra", "Apache Cassandra", "HBase", "Apache HBase", "Hadoop", "Apache Hadoop", "Kafka", "Apache Kafka", "YARN", "Apache YARN", and "ApacheCon" are registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #

Monday March 21, 2016

The Apache® Software Foundation Announces Apache PDFBox™ v2.0

Milestone release of Open Source Java tool for working with PDF documents features dozens of improvements and enhancements

Forest Hill, MD —21 March 2016— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today the availability of Apache® PDFBox™ v2.0, the Open Source Java tool for working with Portable Document Format (PDF) documents.

PDF was first released by Adobe Systems in 1993, and became an ISO International Standard (ISO 32000-1) in 2008. Apache PDFBox allows for the creation of new PDF documents; the manipulation, rendering, and signing of existing documents; and the extraction of content from documents. In addition, PDFBox includes several command line utilities. In February 2015, the project became the first Open Source Partner Organization of the PDF Association.

"PDF is a very popular and easy to use format for document exchange. It is used by millions of people every day, however the format itself is quite complicated and a real challenge to write a piece of software to work with it," said Andreas Lehmkühler, Vice President of Apache PDFBox. "This new major release of PDFBox includes a lot of improvements, fixes and new features which should make the life easier for our users."

Under The Hood
The Apache PDFBox library enables users to create new PDF documents, manipulate existing documents, extract content, digitally sign, print, and validate files against the PDF/A-1b standard. Its command line utilities include encrypt, decrypt, overlay, debugger, merger, PDFToImage, and TextToPDF.
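As a small illustration of the v2.0 API described above, the following sketch creates a one-page document and then extracts its text back out; the file name and text are arbitrary. (Note that v2.0 renamed several 1.8 content-stream methods, e.g. drawString became showText.)

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

public class PdfBoxSketch {
    public static void main(String[] args) throws IOException {
        File file = new File("hello.pdf");

        // Create a new single-page document and write a line of text.
        try (PDDocument doc = new PDDocument()) {
            PDPage page = new PDPage();
            doc.addPage(page);
            try (PDPageContentStream cs = new PDPageContentStream(doc, page)) {
                cs.beginText();
                cs.setFont(PDType1Font.HELVETICA, 12);
                cs.newLineAtOffset(72, 720);
                cs.showText("Hello, Apache PDFBox 2.0");
                cs.endText();
            }
            doc.save(file);
        }

        // Extract the text back out of the saved file.
        try (PDDocument doc = PDDocument.load(file)) {
            System.out.print(new PDFTextStripper().getText(doc));
        }
    }
}
```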

PDFBox v2.0 reflects 1,167 solved issues, 418 of which were back-ported to v1.8, as well as dozens of improvements and enhancements. Highlights include:
  • improved rendering and text extraction
  • Unicode support for PDF creation
  • overhauled interactive forms support
  • extended signing and encryption support
  • overhauled parser including a self-healing mechanism for malformed or corrupted PDFs
  • reduced memory/resources footprint including fine grained control of memory usage
  • enhanced preflight module for PDF/A-1b conformance checking
  • rearranged package structure to allow smaller runtime environments

A guide to migrating to v2.0 is available at http://pdfbox.apache.org/2.0/migration.html , with community support at http://pdfbox.apache.org/mailinglists.html

"We thank all the people from our small but fine community for their support," explained Lehmkühler. "Special thanks also goes to our fellow colleagues from the Apache Tika project for their cooperation in stress-testing with a corpus of 250,000 PDF files."

"We are grateful for the Google Summer of Code program," said PDFBox committer Tilman Hausherr. "The project allowed us to hire students to improve 3D rendering and the PDFDebugger stand-alone application, which also sped up our own bug finding." 

"Apache PDFBox v2.0 is a significant milestone as it took us several years to complete," added Lehmkühler. "This long-awaited release is the collective achievement of more than 150 individuals who have contributed code to date. Without their frequent contributions it wouldn't be possible to drive a project like PDFBox."

Availability and Oversight
Apache PDFBox software is released under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For downloads, documentation, and ways to become involved with Apache PDFBox, visit http://pdfbox.apache.org/ 

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members and 5,300 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera, Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma, LeaseWeb, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Produban, Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/ and https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "Apache PDFBox", "PDFBox", "ApacheCon", and their logos are registered trademarks or trademarks of The Apache Software Foundation in the U.S. and/or other countries. All other brands and trademarks are the property of their respective owners.

# # #
