Presentation: pDB: Scalable Prediction Infrastructure With Precision and Provenance

Track: Predictive Data Pipelines & Architectures

Location: Mission

Duration: 4:20pm - 5:10pm

Day of week: Tuesday

We describe an extensible, cloud-independent data science platform based on Celect’s pDB framework for non-parametric machine learning. The pDB framework provides a common abstraction for almost all machine learning problems of interest, including classification, personalization, time series prediction, and linear and nonlinear regression. We developed an extensible and flexible data platform around the core pDB framework. This platform was born out of our need to provide scalable and flexible predictive analytics solutions for retailers and the federal government.
In this talk, I will describe the pDB formalism associated with the platform and the architectural aspects of data import/ETL, data transformation, the compute and query architecture, cross-validation, cluster management, pipeline definition, and workflow orchestration. I will illustrate the use of the platform through multiple use cases such as online personalization, document classification, and geospatial anomaly detection.

Speaker: Balaji Rengarajan

Senior Data Scientist @Celect

Balaji Rengarajan is responsible for architecting and engineering key aspects of the cloud-agnostic data science platform based on Celect’s pDB framework for non-parametric machine learning. From 2013 to 2016, he was the lead algorithms architect at Plume Wifi, a startup focusing on managing home WiFi access points from the cloud. There, Balaji was responsible for developing machine learning models and algorithms to predict spatial traffic demands in homes, as well as models for predicting interference levels and capacity on different WiFi channels. From 2009 to 2013, he held joint appointments as a researcher at Institute IMDEA Networks and University Carlos III in Madrid, Spain. Balaji received his master's and PhD from the University of Texas at Austin and is a recipient of a Marie Curie ‘AMAROUT Europe Programme’ fellowship and a TxTEC graduate fellowship.

Proposed Tracks

  • Real-World Data Engineering

    Showcasing DataEng tech and highlighting the strengths of each in real-world applications.

  • Deep Learning Applications & Practices

    Deep learning lessons using TensorFlow, Keras, PyTorch, and Caffe across machine translation and computer vision.

  • AI Meets the Physical World

    The track where AI touches the physical world: think drones, ROS, NVIDIA, TPU, and more.

  • Data Architectures You've Always Wondered About

    How did they do that? Real-time predictive pipelines at places like Uber, self-driving cars at Google, and robotic warehouses from Ocado in the UK are all possible examples.

  • Applied ML for Software

    Practical machine learning inside the data centers and on software engineering teams.

  • Time Series Patterns & Practices

    Stocks, ad tech/real-time bidding, and anomaly detection. Patterns and practices for more effective Time Series work.