Projects

Key engineering projects I've designed and delivered across Data Engineering, Cloud Architecture, and AI/ML — spanning petabyte-scale data platforms, real-time pipelines, and enterprise migrations.


Petabyte-Scale Audience Segmentation Framework

Mobilewalla  ·  2024–Present

Designed and built a framework that ingests and processes petabyte-scale data to generate custom consumer behaviour segments, letting clients derive consumer insights from various predictive models at scale.

Spark · AWS · Petabyte-scale · Predictive Models

Fintech Real-Time Risk API Backend

Mobilewalla  ·  2024–Present

End-to-end Fintech API backend delivering near real-time feature and risk assessment data to clients. Multi-region, highly available, petabyte-scale backend with enhanced security, metering, and logging.

API · Real-Time · Multi-Region · High Availability

Feature Integrator

Mobilewalla  ·  2024–Present

Single source of truth encompassing all features required for various predictive models (including age and gender models). Reads petabyte-scale aggregate data once, eliminating redundant computation and significantly reducing cost.

Feature Engineering · ML · Cost Optimisation · Spark

Clickstream Ingestion Pipeline

Walmart Labs  ·  2018–2019

Designed and built workflows to ingest high-volume clickstream data via Adobe Omniture into a Hive staging environment, producing primary DWH and secondary NoSQL aggregated feeds. Also tuned the Spark Streaming jobs involved.

Clickstream · Hive · Spark · NoSQL · DWH

Real-Time Analytics Platform

Deloitte  ·  2016–2018

Implemented real-time analytics with Apache Kafka and Spark Streaming. Built a custom Kafka consumer for network outage data, plus a scalable end-to-end data pipeline integration framework using Apache Spark and Alluxio.

Kafka · Spark Streaming · Alluxio · Real-Time

Teradata–Hive Data Migration

Capgemini  ·  2014–2016

Led the migration of an enterprise data warehouse from Teradata to Hive using the Sqoop connector, Oozie orchestration, and custom ELT scripts. Resolved performance bottlenecks in Hive queries through optimised joins and aggregations.

Teradata · Hive · Sqoop · Oozie · DWH