Workshop: Agile Streams With Apache Kafka

Location: Cyril Magnin II

Duration: 9:00am - 4:00pm

Day of week: Monday

Level: Beginner


No Prerequisites

Apache Kafka is the de facto standard streaming data platform: widely deployed as a messaging system, with a robust data integration framework (Kafka Connect) and a stream processing API (Kafka Streams) to meet the needs that commonly attend real-time message processing. In 2017, Confluent open sourced KSQL, a declarative, SQL-like stream processing language that lets you define stream processing applications easily. This allows rapid development of streaming applications and, more importantly, rapid iteration and improvement.
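At the heart of the messaging system mentioned above is Kafka's core abstraction: a topic made of partitioned, append-only logs, with each consumer tracking its own read offset. The following is a teaching sketch in plain Python (not the real Kafka client API; all class and method names here are illustrative):

```python
# Illustrative model of a Kafka topic: a set of append-only partition logs.
# Consumers track their own offsets, so reads never remove data from the log.
# This is a sketch for intuition only, not Kafka's actual API.

class Topic:
    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Records with the same key land in the same partition, which is how
        # Kafka's default partitioner preserves per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p

class Consumer:
    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)  # one offset per partition

    def poll(self):
        # Return all records past the stored offsets, then advance them.
        records = []
        for p, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return records

orders = Topic()
orders.produce("alice", {"item": "book"})
orders.produce("bob", {"item": "pen"})
consumer = Consumer(orders)
batch = consumer.poll()
assert len(batch) == 2
assert consumer.poll() == []  # nothing new since the last poll
```

Because consuming only advances an offset, multiple independent consumers can read the same log at their own pace, which is what makes Kafka useful for integrating many downstream systems.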

In this workshop we will explore best practices and architectural patterns for modern data integration with Apache Kafka and its ecosystem. Data integration is one of Kafka's main use cases, and Kafka makes it possible to integrate data systems and microservices in completely new ways, leading to more performant, flexible, and robust data integrations. The workshop combines theoretical discussion of best practices, lessons learned, and architecture patterns we have found useful with hands-on experimentation with a variety of projects from the Apache Kafka ecosystem.

What you'll learn:

  • Apache Kafka basics
  • Data modeling for Apache Kafka
  • Use of Schemas and Schema Registry
  • Importance of Stream-Table duality for data integration
  • Stream enrichment and Stream-Join patterns
  • Rapid development of stream processing with KSQL
  • Power and Flexibility of Kafka’s Streams APIs
  • Hipster Stream Processing
  • Best practices for taking data pipelines to production
  • Common mistakes to avoid
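Two of the topics above, stream-table duality and stream enrichment, can be previewed with a plain-Python sketch: a table is simply the result of folding an update stream key by key, and replaying the stream reproduces the table. Kafka Streams exposes this duality as KStream and KTable; the code below is illustrative only, and the keys and values are invented for the example:

```python
# Stream-table duality, sketched in plain Python: a table is a fold over a
# changelog stream (latest value per key wins), and a stream-table join
# enriches each event with the table's current value for its key.

changelog = [
    ("alice", 10),
    ("bob", 5),
    ("alice", 12),   # a later update for the same key overwrites the earlier one
]

def to_table(stream):
    table = {}
    for key, value in stream:
        table[key] = value  # upsert: keep only the latest value per key
    return table

table = to_table(changelog)
assert table == {"alice": 12, "bob": 5}

# Stream enrichment: join a stream of click events against a profile table.
clicks = [("alice", "/home"), ("bob", "/cart")]
profiles = to_table([("alice", "US"), ("bob", "DE")])
enriched = [(user, page, profiles.get(user)) for user, page in clicks]
assert enriched == [("alice", "/home", "US"), ("bob", "/cart", "DE")]
```

The same fold-and-lookup pattern underlies the stream-join and enrichment patterns covered in the workshop, with Kafka handling partitioning, fault tolerance, and state management.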

Speaker: Tim Berglund

Senior Director of Developer Experience @Confluent

Tim is a teacher, author, and technology leader with Confluent, where he serves as the Senior Director of Developer Experience. He can frequently be found speaking at conferences in the United States and all over the world. He is the co-presenter of various O’Reilly training videos on topics ranging from Git to distributed systems, and is the author of Gradle Beyond the Basics. He tweets as @tlberglund, blogs very occasionally, and lives in Littleton, CO, USA with the wife of his youth and their youngest child, the other two having mostly grown up.

Speaker: Gwen Shapira

Principal Data Architect @Confluent, PMC Member @Kafka, & Committer Apache Sqoop

Gwen is a principal data architect at Confluent, helping customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating microservices with relational and big data technologies. She currently specializes in building real-time, reliable data processing pipelines using Apache Kafka. Gwen is a co-author of “Kafka: The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. She is also a committer on the Apache Kafka and Apache Sqoop projects. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.

Proposed Tracks

  • Real-World Data Engineering

    Showcasing DataEng tech and highlighting the strengths of each in real-world applications.

  • Deep Learning Applications & Practices

    Deep learning lessons using TensorFlow, Keras, PyTorch, and Caffe across machine translation and computer vision.

  • AI Meets the Physical World

    The track where AI touches the physical world: think drones, ROS, NVIDIA, TPUs, and more.

  • Data Architectures You've Always Wondered About

    How did they do that? Real-time predictive pipelines at places like Uber, self-driving cars at Google, and robotic warehouses from Ocado in the UK are all possible examples.

  • Applied ML for Software

    Practical machine learning inside the data centers and on software engineering teams.

  • Time Series Patterns & Practices

    Stocks, ad tech/real-time bidding, and anomaly detection. Patterns and practices for more effective Time Series work.