You are viewing content from a past/completed QCon.

Workshop: Agile Streams With Apache Kafka

Location: Cyril Magnin II

Duration: 9:00am - 4:00pm

Day of week: Monday

Level: Beginner


Prerequisites: None

Apache Kafka is a de facto standard streaming data processing platform. It is widely deployed as a messaging system, and it offers a robust data integration framework (Kafka Connect) and a stream processing API (Kafka Streams) that meet the needs commonly associated with real-time message processing. In 2017, Confluent open sourced KSQL, a declarative, SQL-like stream processing language that makes it easy to define stream processing applications. This allows rapid development of streaming applications and, more importantly, rapid iteration and improvement.

In this workshop, we will explore best practices and architectural patterns for modern data integration with Apache Kafka and its ecosystem. Data integration is one of Kafka's main use cases, and Kafka makes it possible to integrate data systems and microservices in entirely new ways, leading to more performant, flexible, and robust integrations. The workshop combines a discussion of the best practices, lessons learned, and architectural patterns we have found useful with hands-on experimentation using a variety of projects from the Apache Kafka ecosystem.

What you'll learn:

  • Apache Kafka basics
  • Data modeling for Apache Kafka
  • Use of schemas and Schema Registry
  • Importance of stream-table duality for data integration
  • Stream enrichment and stream-join patterns
  • Rapid development of stream processing with KSQL
  • Power and flexibility of Kafka’s Streams APIs
  • Hipster Stream Processing
  • Best practices for taking data pipelines to production
  • Common mistakes to avoid
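
The stream-table duality listed above can be illustrated with a minimal sketch. It assumes a simplified in-memory model rather than Kafka's actual API: the functions `stream_to_table` and `table_to_stream` are hypothetical, and a `None` value stands in for a tombstone (delete) record. Folding a stream of updates yields a table holding the latest value per key, and diffing two table states yields the stream of changes between them.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A changelog event is (key, value). A value of None acts as a tombstone,
# loosely mirroring how compacted Kafka topics treat null values.
# This is a hypothetical in-memory model, not the Kafka Streams API.
Event = Tuple[str, Optional[int]]


def stream_to_table(events: Iterable[Event]) -> Dict[str, int]:
    """Fold a stream of updates into a table of the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in events:
        if value is None:
            table.pop(key, None)  # tombstone: remove the key if present
        else:
            table[key] = value  # a later update overwrites an earlier one
    return table


def table_to_stream(old: Dict[str, int], new: Dict[str, int]) -> List[Event]:
    """Emit the changelog events that turn table `old` into table `new`."""
    changes: List[Event] = []
    for key, value in new.items():
        if old.get(key) != value:
            changes.append((key, value))  # inserted or updated key
    for key in old:
        if key not in new:
            changes.append((key, None))  # deleted key becomes a tombstone
    return changes


if __name__ == "__main__":
    events = [("alice", 10), ("bob", 5), ("alice", 12), ("bob", None)]
    table = stream_to_table(events)
    print(table)  # {'alice': 12}
    # Replaying a table's changelog rebuilds the same table.
    print(stream_to_table(table_to_stream({}, table)) == table)  # True
```

Because replaying the changelog rebuilds the table, the stream and the table can be treated as two views of the same data, which is roughly why a compacted changelog topic can be used to restore a table's state.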

Speaker: Gwen Shapira

Principal Data Architect @Confluent, PMC Member @Kafka, & Committer Apache Sqoop

Gwen is a principal data architect at Confluent, helping customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures that integrate microservices, relational, and big data technologies. She currently specializes in building reliable, real-time data processing pipelines using Apache Kafka. Gwen is an author of “Kafka: The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. She is also a committer on the Apache Kafka and Apache Sqoop projects. When Gwen isn't coding or building data pipelines, you can find her on her bike, exploring the roads and trails of California and beyond.


Speaker: Tim Berglund

Senior Director of Developer Experience @Confluent

Tim is a teacher, author, and technology leader with Confluent, where he serves as the Senior Director of Developer Experience. He can frequently be found speaking at conferences in the United States and all over the world. He is the co-presenter of various O’Reilly training videos on topics ranging from Git to distributed systems, and is the author of “Gradle Beyond the Basics”. He tweets as @tlberglund, blogs very occasionally at, and lives in Littleton, CO, USA with the wife of his youth and their youngest child, the other two having mostly grown up.


2019 Tracks

  • Sequential Data: Natural Language, Time Series, and Sound

    Techniques, practices, and approaches around time series and sequential data. Expect topics including image recognition, NLP/NLU, preprocessing, and number crunching with related algorithms.

  • ML in Action

    An applied track demonstrating how to train, score, and handle common machine learning use cases, with a heavy concentration on security and fraud.

  • Deep Learning in Practice

    Deep learning use cases around edge computing, deep learning for search, explainability, fairness, and perception.