You are viewing content from a past/completed QCon

Presentation: People You May Know: Fast Recommendations Over Massive Data

Track: Predictive Architectures in the Real World

Location: Cyril Magnin I

Duration: 10:40am - 11:20am

Day of week: Tuesday

This presentation is now available to view on InfoQ.com

Abstract

The “People You May Know” (PYMK) recommendation service helps LinkedIn’s members identify other members they might want to connect with and is the major driver of growth for LinkedIn's social network. The principal challenge in developing a service like PYMK is the sheer scale of computation needed to make precise recommendations with high recall. The PYMK service at LinkedIn has been operational for over a decade, during which it has evolved from an Oracle-backed system that took weeks to compute recommendations, to a Hadoop-backed system that took a few days, to its most modern embodiment, which computes recommendations in near real time.

This talk will present the evolution of PYMK to its current architecture. We will focus on the various systems we built along the way, with an emphasis on those built for our most recent architecture: Gaia, our real-time graph computing capability, and Venice, our online feature store with scoring capability. We will show how we integrate these individual systems to generate recommendations in a timely and agile manner while remaining cost-efficient. We will briefly discuss the lessons learned about the scalability limits of our past and current design choices and how we plan to tackle the scalability challenges of the next phase of growth.

Speaker: Sumit Rangwala

Senior Staff Software Engineer - Artificial Intelligence @LinkedIn

Sumit Rangwala is a Senior Staff Software Engineer, Artificial Intelligence, currently focusing on building scalable machine learning infrastructure at LinkedIn. Over the last 15+ years, Sumit has built technologies ranging from computer networking protocols and smart grid systems to a distributed K-V store, an ML scoring library, and a graph recommendation platform. Sumit earned his Master's and PhD from the University of Southern California, focusing on computer networking and distributed systems.

Speaker: Felix GV

Staff Software Engineer @LinkedIn

Felix GV is a software engineer working on LinkedIn's data infrastructure. He leads the Venice project, which sits at the intersection of offline processing, nearline processing, and online data serving, in order to enable relevance engineers to push the boundaries of AI.

Besides working on Venice, Felix keeps a close eye on Hadoop, Kafka, Samza, Azkaban, Zookeeper, Helix, Avro and RocksDB. Felix likes to push the limits of scalability by gently breaking every system and library Venice depends on (:

2019 Tracks

  • Predictive Data Pipelines & Architectures

    Case-study-focused look at end-to-end predictive pipelines from places like Salesforce, Uber, LinkedIn, & Netflix

  • Sequential Data: Natural Language, Time Series, and Sound

    Techniques, practices, and approaches around time series and sequential data. Expect topics including image recognition, NLP/NLU, preprocessing, & crunching of related algorithms.

  • ML in Action

    Applied track demonstrating how to train, score, and handle common machine learning use cases, with a heavy concentration on security and fraud