Apache Samza

Wednesday February 22, 2017

Announcing the release of Apache Samza 0.12.0

We are excited to announce that Apache Samza 0.12.0 has been released.

Samza has been powering real-time applications in production across several large companies (including LinkedIn, Netflix, Uber) for a few years now. Samza provides leading support for large-scale stateful stream processing with features such as:

  • First-class support for local state (with RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single SSD-based machine.
  • Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.
  • Minimal impact during application maintenance.
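
To make the difference between incremental checkpointing and full snapshots concrete, here is a minimal sketch. It is a toy in-memory model, not Samza's implementation: the class `ChangelogBackedStore` and its methods are hypothetical names chosen for illustration. Each write persists only its own delta to an append-only changelog, and state is rebuilt by replaying that log.

```python
class ChangelogBackedStore:
    """Toy key-value store that records every write in an append-only changelog.

    Hypothetical illustration only; this is not Samza's store API.
    """

    def __init__(self, changelog=None):
        self.changelog = changelog if changelog is not None else []
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
        # Only the changed entry is persisted, never the whole state.
        self.changelog.append((key, value))

    def delete(self, key):
        self.data.pop(key, None)
        # A None value marks the key as deleted.
        self.changelog.append((key, None))

    @classmethod
    def restore(cls, changelog):
        """Rebuild the state by replaying the changelog from the beginning."""
        store = cls(changelog=list(changelog))
        for key, value in changelog:
            if value is None:
                store.data.pop(key, None)
            else:
                store.data[key] = value
        return store
```

The cost of persisting a write is proportional to the size of that write, not to the size of the whole store, which is why this approach suits very large state.
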
In addition to general stream processing capabilities, Samza also supports:
  • A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and outputs (HDFS, Kafka, ElastiCache etc.). This allows applications to directly process data from various event sources without mandating that the data should be moved into Kafka.
  • A fully async programming model. This allows applications that make remote calls to increase parallelism very efficiently.
  • Features like canaries, upgrades and rollbacks that support extremely large deployments.
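
As a rough illustration of why an async model helps applications that make remote calls, the sketch below uses Python's asyncio rather than Samza's own API. The function `fake_remote_call` and the `max_concurrency` parameter are assumptions made for the example. Several calls are in flight at once, so the time spent waiting on the network overlaps instead of adding up.

```python
import asyncio


async def fake_remote_call(message):
    """Stand-in for a remote service call (hypothetical; just sleeps briefly)."""
    await asyncio.sleep(0.01)
    return message.upper()


async def process_all(messages, max_concurrency=4):
    """Process messages with up to max_concurrency remote calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def process_one(message):
        async with semaphore:
            return await fake_remote_call(message)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(process_one(m) for m in messages))


if __name__ == "__main__":
    print(asyncio.run(process_all(["a", "b", "c"])))
```
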
This 0.12.0 release adds several features to Samza to improve stability, performance and ease of use. Here are some highlights of this release.

Convergence of Batch and Real-time processing in Samza:
End of Stream support: Samza has always supported streaming input sources like Kafka. In such sources, it is assumed that the incoming stream of data is infinite. Samza now has an ‘end-of-stream’ notion to support consuming from input sources that are finite (for example, on-disk files). This enables the Samza job to shut down gracefully when it has finished consuming all data.
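
The idea can be sketched in a few lines. This is a simplified model under stated assumptions, not Samza's implementation: the `END_OF_STREAM` sentinel and the functions `finite_source` and `run_job` are hypothetical. Each finite partition emits a marker after its last record, and the job stops once every partition has delivered that marker.

```python
END_OF_STREAM = object()  # hypothetical sentinel marking the end of a partition


def finite_source(records):
    """Yield all records of a finite input, then the end-of-stream marker."""
    yield from records
    yield END_OF_STREAM


def run_job(partitions, process):
    """Consume round-robin from all partitions until each has ended."""
    active = dict(partitions)  # partition name -> iterator
    while active:
        for name in list(active):
            message = next(active[name])
            if message is END_OF_STREAM:
                del active[name]  # this partition is fully consumed
            else:
                process(name, message)
    # Every partition reached end-of-stream, so the job can stop cleanly.
    return "shutdown"
```
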

HDFS Consumer: Samza now provides first-class support for consuming data from HDFS files. This enables developers to define their processing logic once, and run it in both batch and streaming environments. This feature also allows for rapid experimentation with ETL’d HDFS data using Samza without the need to write a separate Hadoop job. (SAMZA-967)

Checkpoint Notifications:
Samza can now notify the SystemConsumer when performing a checkpoint. This can enable Samza to support consumers such as Amazon Kinesis, Amazon SQS, Azure ServiceBus Queues/Topics, Google Cloud Pub/Sub, ActiveMQ, etc., which each manage checkpointing on their own. This also enables consumers to implement smart retention policies (such as deleting data once it has been consumed). (SAMZA-1042)
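
A minimal sketch of the notification pattern follows. The classes `AckingConsumer` and `ToyContainer` and the method `on_checkpoint` are hypothetical names for this example and do not reproduce Samza's interfaces. The consumer retains messages until it is told that a checkpoint covers them, and only then releases them.

```python
class AckingConsumer:
    """Toy consumer for a system that keeps messages until they are released."""

    def __init__(self, messages):
        # offset -> message; stands in for data retained by the external system
        self.retained = dict(enumerate(messages))

    def on_checkpoint(self, offset):
        """Called after a checkpoint; drop everything at or below the offset."""
        for retained_offset in [o for o in self.retained if o <= offset]:
            del self.retained[retained_offset]


class ToyContainer:
    """Processes messages in offset order and checkpoints on request."""

    def __init__(self, consumer):
        self.consumer = consumer
        self.last_processed = None

    def process(self, count):
        start = 0 if self.last_processed is None else self.last_processed + 1
        for offset in range(start, start + count):
            if offset in self.consumer.retained:
                self.last_processed = offset

    def checkpoint(self):
        if self.last_processed is not None:
            # The notification step: tell the consumer what is safe to release.
            self.consumer.on_checkpoint(self.last_processed)
```

Because data is released only after the checkpoint, a failure between processing and checkpointing leaves the unacknowledged messages available for reprocessing.
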

Support for YARN Node Labels:
Often Samza YARN clusters have machines that are not homogeneous. For example, nodes could have different memory hardware, CPUs, spinning disks or SSDs. With this feature, users can assign “labels” to nodes in their YARN cluster and use them to specify where their Samza job should run. This feature allows flexibility in scheduling jobs based on trade-offs in resource requirements, performance and hardware costs. For example, stateful jobs can be configured to run on nodes with SSDs while stateless jobs can be configured to run on nodes with spinning disks. (SAMZA-1013)

Bug fixes:
This release also includes several critical bug fixes and improvements for operational stability. Some notable ones include:
  • HttpFileSystem timeout for blocking reads when localizing containers (SAMZA-1079).
  • SamzaContainer should catch all Throwables instead of only exceptions (SAMZA-1077).
  • Deadlock between KafkaSystemProducer and KafkaProducer from kafka-clients lib (SAMZA-1069).
  • Change the commit order to support at least once processing when deduping with local store (SAMZA-1065).
  • Upgraded Kafka version to 0.10. This enables us to take advantage of the critical fixes and improvements in Kafka.
  • Upgraded to Jetty 9 from Jetty 8.
  • Full support for Scala 2.11. All Samza jars will now have the scala version as 2.11 as a part of their file name. For example, samza-yarn_2.11-0.12.jar.
  • Samza is now source compatible with JDK 8 and above. Older JDKs are no longer supported.
Community Developments:
We have made great community progress since the last release. We held two successful meetups where we presented Samza’s roadmap and how Optimizely uses Samza. Several Samza use cases at Uber and LinkedIn were featured at QCon 2016.
Future:
There are a lot of exciting features to expect in our future releases. Here are some highlights:
  • Support for Disk quota enforcement and throttling (SAMZA-956)
  • Support for high-level programming API for stream processing (SAMZA-1073)
  • Support for running Samza in stand-alone mode (SAMZA-516)
It’s a great time to get involved. You can start by reviewing the hello-samza tutorial, signing up for the mailing list, and grabbing some newbie JIRAs. I'd like to close by thanking everyone who's been involved in the project. It's been a great experience to be involved in this community, and I look forward to its continued growth.


