Presentation: pDB: Scalable Prediction Infrastructure With Precision and Provenance

Track: Predictive Data Pipelines & Architectures

Location: Mission

Duration: 4:20pm - 5:10pm

Day of week: Tuesday



We describe an extensible, cloud-independent data science platform based on Celect’s pDB framework for non-parametric machine learning. The pDB framework provides a common abstraction for almost all machine learning problems of interest, including classification, personalization, time series prediction, and linear and nonlinear regression. We developed an extensible and flexible data platform around the core pDB framework. This platform was born out of our need to provide scalable and flexible predictive analytics solutions for retailers and the federal government.
In this talk, I will describe the pDB formalism underlying the platform and the architectural aspects of data import/ETL, data transformation, compute and query architecture, cross-validation, cluster management, pipeline definition, and workflow orchestration. We will illustrate the use of the platform through multiple use cases such as online personalization, document classification, and geospatial anomaly detection.
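The abstract does not spell out pDB's interface, so what follows is only a hypothetical illustration of the idea that one non-parametric formalism can serve several problem types. It uses k-nearest neighbours as a stand-in technique (an assumption; this is not Celect's pDB method), and every name in it (`knn_predict`, `majority_vote`, `average`) is invented for this sketch. The same neighbour lookup answers a classification query or a regression query, depending only on how the neighbours' labels are aggregated, and no model parameters are fitted.

```python
from collections import Counter
from math import dist
from statistics import mean
from typing import Callable, Sequence

Point = Sequence[float]


def knn_predict(
    train_x: Sequence[Point],
    train_y: Sequence,
    query: Point,
    k: int,
    aggregate: Callable[[list], object],
):
    """Hypothetical non-parametric predictor (k-nearest neighbours stand-in,
    not pDB): find the k training points closest to `query` and combine
    their labels with `aggregate`."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    neighbours = [train_y[i] for i in order[:k]]
    return aggregate(neighbours)


def majority_vote(labels: list):
    """Classification: the most common label among the neighbours."""
    return Counter(labels).most_common(1)[0][0]


# Regression: the arithmetic mean of the neighbours' target values.
average = mean


if __name__ == "__main__":
    xs = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.1, 0.9)]
    labels = ["a", "a", "b", "b"]
    targets = [1.0, 2.0, 5.0, 7.0]
    # Same data and lookup, two problem types:
    print(knn_predict(xs, labels, (0.05, 0.1), k=2, aggregate=majority_vote))  # a
    print(knn_predict(xs, targets, (1.05, 0.95), k=2, aggregate=average))  # 6.0
```

Swapping the aggregation function (for example, a distance-weighted mean or a per-time-step mean for forecasting) changes the problem being solved without changing the lookup itself, which is the kind of shared abstraction the abstract attributes to pDB.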

Speaker: Balaji Rengarajan

Senior Data Scientist @Celect

Balaji Rengarajan is responsible for architecting and engineering key aspects of the cloud-agnostic data science platform based on Celect’s pDB framework for non-parametric machine learning. From 2013 to 2016, he was the lead algorithms architect at Plume Wifi, a startup focused on managing home WiFi access points from the cloud. Balaji was responsible for developing machine learning models and algorithms to predict spatial traffic demand in homes, as well as models for predicting interference levels and capacity on different WiFi channels. From 2009 to 2013, he held joint appointments as a researcher at IMDEA Networks Institute and University Carlos III in Madrid, Spain. Balaji received his master's and PhD from the University of Texas at Austin and is a recipient of a Marie-Curie ‘Amarout Europe Programme’ fellowship and a TxTEC graduate fellowship.



  • Deep Learning Applications & Practices

    Deep learning lessons using tooling such as TensorFlow & PyTorch, across domains like large-scale cloud-native apps and fintech, and tackling concerns around interpretability of ML models.

  • Predictive Data Pipelines & Architectures

    Best practices for building real-world data pipelines doing interesting things like predictions, recommender systems, fraud prevention, ranking systems, and more.

  • ML in Action

    Applied track demonstrating how to train, score, and handle common machine learning use cases, with a heavy concentration on security and fraud.

  • Real-world Data Engineering

    Showcasing data engineering technologies and highlighting the strengths of each in real-world applications.