About
I'm a Software Engineering Manager at LinkedIn, leading a team that designs, builds, and operates exa-scale SQL data platforms on Trino (Presto). With 10+ years in distributed data processing, I've shipped systems that range from sub-second OLAP over petabytes to SQL data warehousing over an exabyte-scale data lake.
I'm a PMC member of Apache Druid and a committer on Apache Hive, with active contributions to Apache Calcite. I care deeply about query optimization, performance, and the kind of code that keeps running at 3 a.m.
Outside of computers, you'll find me on alpine granite, on a glacier, or skinning up something steep.
- Trino
- SQL
- OLAP
- Distributed Systems
- Data Infrastructure
- Apache Druid
- Apache Hive
- Apache Calcite
- Query Optimization
- Java
Education
- PhD, Computer Science · Institut Polytechnique de Grenoble (Grenoble INP)
  Multi-objective fault-tolerance scheduling for parallel and distributed systems.
Experience
- Software Engineering Manager
  Sep 2022 – Present · LinkedIn · Sunnyvale, CA
  Lead the team behind LinkedIn's exa-scale Trino platform: query optimization, reliability, and capacity for the largest analytics workloads at the company.
- Staff Software Engineer
  Mar 2020 – Feb 2023 · LinkedIn · Sunnyvale, CA
  Built and scaled core SQL infrastructure on Trino. Drove performance, query planning, and federation work.
- Staff Software Engineer
  Jun 2016 – Mar 2020 · Cloudera · Santa Clara, CA
  Brought Apache Druid to the Hortonworks Data Platform. Deep work across Hive and Druid integration.
- Software Engineer (Druid Project)
  Oct 2014 – Oct 2016 · Yahoo · Urbana-Champaign, IL
  Designed and implemented algorithms processing hundreds of billions of data points per day on Apache Druid.
- Postdoctoral Researcher
  Feb 2012 – Oct 2014 · INRIA & UIUC Joint Lab / Argonne National Laboratory · Urbana-Champaign · Greater Chicago
  Failure prediction, fault tolerance, and parallel job scheduling for large-scale supercomputers.
Now
- Scaling LinkedIn's Trino platform — query optimization and reliability at exa-scale.
- Pushing on Apache Druid PMC work and reviewing community PRs.
- Climbing granite in the Sierras and chasing spring corn on skis.
On the Board
Six years of Tension Board sessions, logged and charted.
Projects
Public work pulled live from GitHub.
- hive-druid-benchmark
  Shell · Hive Druid integration benchmark
- druid_examples
  Some examples used to drive the support training
- avro-kafka-producer
  Java · Avro Wiki fake data producer
- .dotfiles
  Shell · .dot files to carry around
- TerribleJavaTestingMadeGood
  Java · Blog examples of awful test patterns and ways to improve them.
- b-slim.github.io
  HTML · Personal web site
Publications
Selected peer-reviewed work.
- Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing
  SIGMOD 2019 · Proceedings of the 2019 International Conference on Management of Data
- Improving the Computing Efficiency of HPC Systems Using a Combination of Proactive and Preventive Checkpointing
  IEEE IPDPS 2013 · 27th International Symposium on Parallel & Distributed Processing