Apache Tajo

Tuesday June 30, 2015

Apache Tajo 0.10.1 Released

The Apache Tajo team is proud to announce the release of Apache Tajo 0.10.1.

Apache Tajo (TM) is a big data warehouse system for various data
sources. It provides distributed and scalable analytical SQL
processing on Apache Hadoop (TM).

The release is available for immediate download:

This is a minor bug-fix release. In this release, the Apache Tajo
team resolved about 51 issues, including bug fixes, improvements, and
a few new features.

Some of the highlights:

* Support multi-byte delimiters for CSV/Text files (TAJO-1374, TAJO-1381)
* Fix JDBC programs hanging after closing (TAJO-1619)
* Fix NPE in INSERT INTO with wrong target columns (TAJO-1623)
* Add TajoStatement::setMaxRows method support (TAJO-1400)
* Fix NPE on natural join (TAJO-1574)
* Implement json_extract_path_text(string, string) function (TAJO-1529)
* Fix intermittent parsing errors with CURRENT_DATE (TAJO-1386)
* Fix simple queries not working in the Web UI (TAJO-1147)

For a complete list of new features and fixed problems, please see the
release notes:

We would like to thank the many contributors who made this release possible.

The Apache Tajo Team

Thursday May 01, 2014

Apache Tajo™ 0.8.0 Released!

The Apache Tajo™ team is pleased to announce the major release of Apache Tajo™ 0.8.0, an open source big data warehouse system on Hadoop. Apache Tajo™ provides low-latency and scalable batch SQL queries on large data sets stored on HDFS and other data sources.

The source and binary release tarballs are available for free download:

This is the first release as an Apache top-level project. In this release, the Apache Tajo team closed 363 issues, including 25 new features, 81 improvements, and 164 bug fixes.

Some of the highlights:

SQL support

  • Database support (TAJO-353)
  • Full/Left/Outer join (TAJO-34, TAJO-312)
  • Complex joins including derived subqueries and unions.
  • Datetime types and operator support
    • Date type support (TAJO-60, TAJO-438)
    • Time type support (TAJO-61, TAJO-439)
    • Timestamp type support (TAJO-62, TAJO-437)
    • Datetime operations and functions
      • extract(), totimestamp(), utcusec_to()
  • SQL standard and PostgreSQL-compatible string function support
    • ascii(), length(), chr(), bitlength(), hex(), octetlength(), reverse(), right(), left(), md5(), repeat(), substr(), strpos(), locate(), initcap(), lpad(), rpad(), concat(), concat_ws(), ltrim(), rtrim(), btrim(), trim(), and so on.
  • SQL standard math function support
    • mod(), div(), degrees(), radians(), cbrt(), abs(), exp(), sqrt(), sin(), sign(), pow(), ceiling(), round(), floor(), ceil(), and so on.
  • Other functions
    • sha1(), digest(), findinset(), to_bin(), and so on.
  • CREATE TABLE AS (CTAS) on partitioned tables (TAJO-460)
  • Quoted identifier support (TAJO-644)
  • Complex expression support in group-by, order-by, and having clauses
  • Distinct aggregation support (TAJO-601)
  • Explain statement support (TAJO-122)

Performance and Scalability

  • More I/O efficient M-way unbalanced external sort executor (TAJO-36, TAJO-584)
  • Reduced intermediate data volume (TAJO-435)
  • Star-schema broadcast support (TAJO-725)
  • Removal of duplicated expressions and more efficient projection push-down (TAJO-501)
  • Reduced memory consumption in expression evaluation (TAJO-539)
  • Reduced GC overhead and memory usage (TAJO-522, TAJO-537, TAJO-544, TAJO-548)
  • I/O efficient sort-based dynamic partition store method (TAJO-574)

Storage support

  • Configurable serializer/deserializer for Text files (TAJO-424)
  • Parquet file support (TAJO-30, TAJO-714)
  • Avro storage support (TAJO-711)
  • Amazon S3 support (TAJO-577)

Integration with the Hadoop ecosystem

  • Hadoop 2.2.0, 2.3.0 or 2.4.0 support
  • Improved Hive metastore integration (TAJO-289, TAJO-300, TAJO-301)
  • Hive-compatible table partition (TAJO-285, TAJO-284, TAJO-338)

Client and user interfaces

  • Improved Web UI
  • Revamped Tajo SQL shell (tsql)
  • Linux shell command and HDFS command support in tsql (TAJO-732)
  • Tajo JDBC Driver support and its improvements (TAJO-176, TAJO-745)
  • Fine-grained query progress indicator (TAJO-589)
  • Add killQuery feature (TAJO-305)

Release Notes:

For a complete list of new features and fixed problems, please see the release notes:

Many thanks to the contributors to the 0.8.0 release:

  • Alvin Henrick, DaeMyung Kang, David Chen, Hyoung Jun Kim, Hyunsik Choi, Ilhyun Suh, JaeHwa Jung, Jae Young Lee, Jinho Kim, Jihoon Son, Min Zhou, Keuntae Park, Seungun Choe, SeongHwa Ahn, Youngjun Park, Wan Heo.

Thursday November 21, 2013

Apache Tajo 0.2.0-incubating Released

Apache Tajo (incubating) 0.2 has been released

The Apache Tajo team is pleased to announce the release of Apache Tajo 0.2-incubating, a big data warehouse system on Hadoop that provides low-latency and scalable ad-hoc queries and ETL on large data sets stored on HDFS and other data sources.

This release is available for immediate download:

Apache Tajo 0.2-incubating resolved 193 issues, including 73 bug fixes and 56 improvements, and includes the following new features:

  * Add cost-based join optimization
  * Support inline views (i.e., table subqueries)
  * Add various string functions, such as upper, lower, (L|R)TRIM, split_part, and regexp_replace.
  * Support the IN predicate
  * Significantly improve scan performance
  * Add INSERT OVERWRITE statement
  * Add CREATE TABLE statement
  * Add HiveQL mode
  * Allow configurable NULL character for CSVFile format
  * Allow compression/decompression of CSVFile (all codecs supported by Hadoop)
  * Add an extensible rewrite rule engine
  * Add tajo_dump, a backup and restore utility
  * Support the BETWEEN predicate
  * Add Tajo Resource Manager specialized for low-latency queries

The Apache Tajo team is looking for more developers and, of course, users to help grow the community and give feedback. Mailing list information is at:

Visit Apache Tajo at http://tajo.incubator.apache.org for more information.

Apache Tajo is an effort undergoing incubation at The Apache Software Foundation (ASF) sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
