Workshop: [SOLD OUT] Python-Based AI Workflows - From Notebook to Production Scale

Location: Mission

Duration: 9:00am - 4:00pm

Day of week: Monday

Level: Intermediate

Prerequisites

  • A laptop with the ability to ssh into a remote machine
  • Experience with Python
  • Some familiarity with the command line

We all love the notebook environment for exploring data, developing our models, and visualizing results, and we love Python for its huge ecosystem of AI/ML tooling and its ease of use. However, if all of our work stayed in local notebooks analyzing small local data, we wouldn't be creating real value for a business at production scale. We need to understand which Python tools to use as we scale our workflows beyond the notebook, and we need to understand how to manage and distribute our work on large data.

In this workshop, we will start with a set of Jupyter notebooks implementing an example ML/AI workflow in Python. We will then modify this code to get it ready for deployment as a set of scalable data pipeline stages. In that process, we will learn about various packages, tools, and frameworks in the Python ML/AI ecosystem (even touching on things like PyTorch) that are enabling data scientists to run AI workflows and transform data at scale. We will also learn how our Python processing can be deployed on infrastructure beyond our laptops with tools like Docker and Kubernetes, which power the largest technology companies on the planet. Each participant will deploy their own Python-based workflow in the cloud and will complete a number of related, hands-on exercises.
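
To give a flavor of the kind of refactoring involved, a notebook cell that trains a model might be pulled out into a standalone, parameterized script that can run unattended inside a container. The sketch below is illustrative only; the file names, the /pfs-style mount paths, and the logistic regression model are assumptions for the example, not the workshop's actual materials.

    # train.py - an illustrative pipeline stage refactored from a notebook cell.
    # It reads a CSV from an input directory, trains a simple model, and writes
    # the serialized model to an output directory, so the stage has no
    # dependence on an interactive session.
    import argparse

    import joblib
    import pandas as pd
    from sklearn.linear_model import LogisticRegression


    def main():
        parser = argparse.ArgumentParser()
        # Default paths follow the convention of container-mounted input/output
        # directories (e.g., Pachyderm mounts input repos under /pfs).
        parser.add_argument("--input", default="/pfs/training/data.csv")
        parser.add_argument("--output", default="/pfs/out/model.joblib")
        args = parser.parse_args()

        # Assumes a CSV with feature columns plus a "label" column.
        df = pd.read_csv(args.input)
        X, y = df.drop(columns=["label"]), df["label"]

        model = LogisticRegression(max_iter=1000)
        model.fit(X, y)

        joblib.dump(model, args.output)


    if __name__ == "__main__":
        main()

A script like this can then be packaged into a Docker image and scheduled on Kubernetes (or wired into a Pachyderm pipeline) as one stage of a larger workflow.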

Key Takeaways:

  • Knowledge about the landscape of Python ML/AI tooling and how it fits into production workflows
  • Confidence in taking exploratory analysis and scaling it to large data
  • Hands-on experience with a variety of Python packages and frameworks
  • Ability to solve common pain points in scaling Python workflows past a notebook environment
  • Hands-on experience with one set of methods for deploying and tracking AI workflows in production

Tools Utilized:

  • Python - pandas, numpy, scikit-learn, matplotlib, PyTorch
  • Jupyter
  • An editor of your choice (e.g., vim or PyCharm)
  • Docker, Kubernetes, Pachyderm

Speaker: Daniel Whitenack

Data Scientist, Lead Developer Advocate @pachydermIO

Daniel is a Ph.D.-trained data scientist working with Pachyderm (@pachydermIO). He develops innovative, distributed data pipelines that include predictive models, data visualizations, statistical analyses, and more. He has spoken at conferences around the world (Datapalooza, DevFest Siberia, GopherCon, and more), teaches data science/engineering with Ardan Labs (@ardanlabs), maintains the Go kernel for Jupyter, and actively helps organize contributions to various open source data science projects.

Tracks

  • Deep Learning Applications & Practices

    Deep learning lessons using tooling such as TensorFlow & PyTorch, across domains like large-scale cloud-native apps and fintech, and tackling concerns around the interpretability of ML models.

  • Predictive Data Pipelines & Architectures

    Best practices for building real-world data pipelines doing interesting things like predictions, recommender systems, fraud prevention, ranking systems, and more.

  • ML in Action

    Applied track demonstrating how to train, score, and handle common machine learning use cases, including a heavy concentration in the space of security and fraud.

  • Real-world Data Engineering

    Showcasing DataEng tech and highlighting the strengths of each in real-world applications.