Presentation: pDB: Scalable Prediction Infrastructure With Precision and Provenance

Track: Predictive Data Pipelines & Architectures

Location: Mission

Duration: 4:20pm - 5:10pm

Day of week: Tuesday

We describe an extensible, cloud-independent data science platform based on Celect’s pDB framework for non-parametric machine learning. The pDB framework provides a common abstraction for almost all machine learning problems of interest, including classification, personalization, time series prediction, and linear and nonlinear regression. We developed an extensible and flexible data platform around the core pDB framework. This platform was born out of our need to provide scalable and flexible predictive analytics solutions for retailers and the federal government.
In this talk, I will describe the pDB formalism associated with the platform and the architectural aspects of data import/ETL, data transformation, compute and query architecture, cross-validation, cluster management, pipeline definition, and workflow orchestration. We will illustrate the use of the platform through multiple use cases such as online personalization, document classification, and geospatial anomaly detection.

Speaker: Balaji Rengarajan

Senior Data Scientist @Celect

Balaji Rengarajan is responsible for architecting and engineering key aspects of the cloud-agnostic data science platform based on Celect’s pDB framework for non-parametric machine learning. From 2013 to 2016, he was the lead algorithms architect at Plume Wifi, a startup focused on managing home WiFi access points from the cloud. There, Balaji was responsible for developing machine learning models and algorithms to predict spatial traffic demands in homes, as well as models for predicting interference levels and capacity on different WiFi channels. From 2009 to 2013, he held joint appointments as a researcher at IMDEA Networks Institute and University Carlos III in Madrid, Spain. Balaji received his master's and PhD from the University of Texas at Austin and is a recipient of a Marie-Curie ‘Amarout Europe Programme’ fellowship and a TxTEC graduate fellowship.

2019 Tracks

  • Groking Timeseries & Sequential Data

    Techniques, practices, and approaches around time series and sequential data. Expect topics including image recognition, NLP/NLU, preprocessing, and crunching of related algorithms.

  • Deep Learning in Practice

    Deep learning use cases around edge computing, deep learning for search, explainability, fairness, and perception.