Apache Samza

Tuesday December 10, 2019

Announcing the release of Samza 1.3

We are thrilled to announce the release of Apache Samza 1.3.0.

Today Samza forms the backbone of hundreds of real-time production applications across a multitude of companies, including LinkedIn, VMware, Slack, and Redfin, among many others. This release adds a variety of features and capabilities to Samza’s existing arsenal, coupled with improved documentation, code snippets, and examples. Samza provides leading support for large-scale stateful stream processing with:

  • First-class support for local state (with a RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with an SSD.
  • Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.
  • A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.
  • A high-level API for expressing complex stream processing pipelines in a few lines of code.
  • A Beam Samza Runner that marries Beam’s best-in-class support for EventTime-based windowed processing and sophisticated triggering with Samza’s stable and scalable stateful processing model.
  • A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and output systems (HDFS, Kafka, ElastiCache etc.).
  • A Table API that provides a common abstraction for accessing remote or local databases and allows developers to "join" an input event stream with such a Table.
  • A flexible deployment model for running applications in any hosting environment and with cluster managers other than YARN.
  • Features like canaries, upgrades and rollbacks that support extremely large deployments with minimal downtime.

New Features, Upgrades and Bug Fixes

This release brings the following features, upgrades, and capabilities (highlights):
  • Startpoint support improvement
  • Samza SQL improvement
  • Table API improvement
  • Miscellaneous bug fixes
The full list of JIRAs addressed in this release can be found here.

Startpoint support improvement

  • SAMZA-2201 Startpoints - Integrate fan out with job coordinators
  • SAMZA-2215 StartpointManager fix for previous CoordinatorStreamStore refactor
  • SAMZA-2220 Startpoints - Fully encapsulate resolution of starting offsets in OffsetManager

Samza SQL improvement

  • SAMZA-2234 Samza SQL : Provide access to Samza context to the Samza SQL UDFs
  • SAMZA-2313 Samza-sql: Add validation for Samza sql statements
  • SAMZA-2354 Improve UDF discovery in samza-sql

Table API improvement

  • SAMZA-2191 support batching for remote tables
  • SAMZA-2200 Update table sendTo() and join() operation to accept additional arguments
  • SAMZA-2219 Add a dummy table read function
  • SAMZA-2309 Remote table descriptor requires read function

Miscellaneous bug fixes

  • SAMZA-2198 Containers process always takes task.shutdown.ms to shut down
  • SAMZA-2293 Propagate the watermark future to StreamOperatorTask correctly

Important Announcement

We may introduce backward-incompatible changes to Samza job submission in the upcoming 1.4 release. Details can be found in SEP-23: Simplify Job Runner.

Source downloads

A source download of Samza 1.3.0 is available here, and is also available in Apache’s Maven repository. See Samza’s download page for details and Samza’s feature preview for new features.


It’s a great time to get involved. You can start by reviewing the tutorials, signing up for the mailing list, and grabbing some newbie JIRAs.

