Projects
Key engineering projects I've designed and delivered across Data Engineering, Cloud Architecture, and AI/ML, spanning petabyte-scale data platforms, real-time pipelines, and enterprise migrations.
Petabyte-Scale Audience Segmentation Framework
Mobilewalla · 2024–Present
Designed and built a framework that ingests and processes petabyte-scale data to generate custom consumer behaviour segments, letting clients derive consumer insights from a range of predictive models at scale.
Fintech Real-Time Risk API Backend
Mobilewalla · 2024–Present
End-to-end Fintech API backend delivering near real-time feature and risk assessment data to clients. Multi-region, highly available, petabyte-scale backend with enhanced security, metering, and logging.
Feature Integrator
Mobilewalla · 2024–Present
Single source of truth for all features required by the predictive models (including age and gender models). Reads petabyte-scale aggregate data once, eliminating redundant computation and significantly reducing cost.
Clickstream Ingestion Pipeline
Walmart Labs · 2018–2019
Designed and built workflows to ingest high-volume clickstream data from Adobe Omniture into a Hive staging environment and produce primary DWH and secondary NoSQL aggregated feeds, including tuning of the Spark Streaming jobs.
Real-Time Analytics Platform
Deloitte · 2016–2018
Implemented real-time analytics with Apache Kafka & Spark Streaming. Built a custom Kafka consumer for network outage data and a scalable end-to-end data pipeline integration framework on Apache Spark and Alluxio.
Teradata–Hive Data Migration
Capgemini · 2014–2016
Led the migration of an enterprise data warehouse from Teradata to Hive using the Sqoop connector, Oozie orchestration, and custom ELT scripts. Resolved performance bottlenecks in Hive queries by optimising joins and aggregations.