You are viewing content from a past/completed QCon

Track: ML in Action

Location: Cyril Magnin III

Day of week: Tuesday

Applied Machine Learning track demonstrating how to train, score, and handle security and fraud use cases.

Track Host: Soups Ranjan

Director of Data Science @Coinbase

Soups Ranjan is the Director of Data Science at Coinbase, one of the largest bitcoin exchanges in the world. He manages the Risk & Data Science team, which is chartered with preventing avoidable losses to the company due to payment fraud or account takeovers. Soups has a PhD in ECE on network security from Rice University. He has previously led the development of Machine Learning pipelines to improve performance advertising at Yelp and Flurry. He is the founder of RiskSalon.org, a round-table forum for risk professionals in San Francisco to share ideas on stopping bad actors.

10:40am - 10:50am

When Do You Use ML vs. a Rules Based System?

Soups Ranjan, Director of Data Science @Coinbase

11:00am - 11:50am

Counterfactual Evaluation of Machine Learning Models

Stripe processes billions of dollars in payments a year and uses machine learning to detect and stop fraudulent transactions. Like models used for ad and search ranking, Stripe's models don't just score—they dictate actions that directly change outcomes. High-scoring transactions are blocked before they can ever get refunded or disputed by the card holder. Deploying an initial model that successfully blocks a substantial amount of fraud is a great first step, but since your model is altering outcomes, subsequent parts of the modeling process become more difficult:

  • How do you evaluate the model? You can't observe the eventual outcomes of the transactions you block (would they have been refunded or disputed?) or the ads you didn't show (would they have been clicked?). In general, how do you quantify the difference between the world with the model and the world without it?
  • How do you train new models? If your current model is blocking a lot of transactions, you have substantially fewer samples of fraud for your new training set. Furthermore, if your current model detects and blocks some types of fraud more than others, any new model you train will be biased towards detecting that residual fraud. Ideally, new models would be trained on the "unconditional" distribution that exists in the absence of the original model.

In this talk, I'll describe how injecting a small amount of randomness in the production scoring environment allows you to answer these questions. We'll see how to obtain estimates of precision and recall (standard measures of model performance) from production data and how to approximate the distribution of samples that would exist in a world without the original model so that new models can be trained soundly.
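One common way to realize this idea is inverse propensity weighting; the abstract does not say which estimator the talk uses, so the following is only a minimal sketch under that assumption. A small, hypothetical `explore_rate` fraction of transactions that would normally be blocked is let through, and each allowed transaction is logged with a weight of 1 / P(allowed). Weighted counts over the allowed transactions then approximate the counts that would be seen without blocking. The names `decide`, `weighted_precision_recall`, `threshold`, and `explore_rate` are illustrative and not taken from the talk.

```python
import random
from typing import Iterable, Tuple


def decide(score: float, threshold: float = 0.5,
           explore_rate: float = 0.05, rng=random) -> Tuple[bool, float]:
    """Return (allowed, weight) for one transaction.

    Hypothetical policy: transactions scoring below the threshold are always
    allowed (weight 1). Transactions at or above it are blocked, except for a
    random fraction `explore_rate` that is let through and logged with
    weight 1 / explore_rate.
    """
    if score < threshold:
        return True, 1.0
    if rng.random() < explore_rate:
        return True, 1.0 / explore_rate
    return False, 0.0  # blocked: the outcome is never observed


def weighted_precision_recall(
    records: Iterable[Tuple[float, float, bool]], threshold: float
) -> Tuple[float, float]:
    """Estimate precision and recall from allowed transactions only.

    Each record is (score, weight, is_fraud), where the weight is the one
    logged by `decide` when the transaction was allowed.
    """
    tp = fp = fn = 0.0
    for score, weight, is_fraud in records:
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += weight   # would have been blocked, and was fraud
        elif flagged:
            fp += weight   # would have been blocked, but was legitimate
        elif is_fraud:
            fn += weight   # was allowed and turned out to be fraud
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


if __name__ == "__main__":
    # Made-up illustration data: two explored high-score transactions
    # (weight 20 each) and two ordinary low-score ones (weight 1 each).
    logged = [(0.9, 20.0, True), (0.8, 20.0, False),
              (0.2, 1.0, True), (0.1, 1.0, False)]
    print(weighted_precision_recall(logged, threshold=0.5))
```

The same logged weights could plausibly be passed as per-sample weights when training a new model, so that the training set approximates the distribution that would exist without the original model.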

Michael Manapat, Head of Conversion Products @Stripe

12:50pm - 1:00pm

JupyterLab: The Next Generation Jupyter Web Interface

Jason Grout, Scientific Software Developer @Bloomberg & JupyterLab / Sage Core Contributor

1:10pm - 2:00pm

Measuring Business Impact of Machine Learning System

Jevin Bhorania, Cash Data Science Lead @Square

2:25pm - 2:35pm

Machine Learning: Predicting Demand in Fashion

Apparel and fashion retailers often have to buy inventory more than a quarter in advance, so they have to make bets on the total demand they expect to see in the relevant season. In addition, the set of products offered by the brands changes every year, and even the historical demand for previous seasons' products is only partially known, because each product is carried in only a subset of the stores.
In this talk, we will show how we (at Celect) use historical data (point-of-sale transactions, inventory, product attributes, product images, product descriptions) to build a SaaS solution that helps buyers and merchants predict the future demand for products in the upcoming season. The short talk will cover the real-life problem statement, high-level ML frameworks, and how the product is used by buyers and merchants.

Ritesh Madan, VP Engineering @celect

4:00pm - 4:10pm

Optimizing Fraud Model Thresholds @Airbnb

Dave Press, Data Science Manager @Airbnb

4:20pm - 5:10pm

Machine-Learning for Trust & Safety at Airbnb

In this talk, I will review some of the Trust & Safety challenges faced by Airbnb and other peer-to-peer marketplaces. Getting a deep understanding of the user’s identity is the foundation of trust for such marketplaces, where transactions are born online, but transition to offline and often intimate interactions. We shall cover the three crucial stages of establishing trustworthiness of a user:
(1) “verification” of the user’s identity; (2) “screening” the past of the user; (3) “predicting” the future risk in the behavior of this user.
We shall focus on the machine-learning challenges in each of these stages, and some of the solutions that have proven successful at Airbnb and Trooly.

Anish Das Sarma, Engineering Manager @Airbnb
