My Blog
Feb 20, 2026
The 60-Minute Protocol for Staying Sharp in the Age of AI
Jan 11, 2026
Engineers in 2026 Won’t Be Hired for Syntax. They’ll Be Hired for Leverage
Nov 26, 2025
I Built an AI Code Reviewer in a Weekend — Here’s the Exact Prompt
Sep 19, 2025
Integrating LLMs and AI Agents into Data Engineering Workflows
Sep 14, 2025
A Practical Guide to Spark Serialization and Deserialization
Aug 22, 2025
Zero-ETL & Cloud-Native Architectures: Building Real-Time Data Systems
Jul 11, 2025
Why Every Serious Data Engineer Should Understand Bloom Filters and HyperLogLog
Jul 6, 2025
Embedding-Based Retrieval Is Making Search Smarter
Jul 2, 2025
MLOps and Data Engineering: Bridging the Gap for Machine Learning Pipelines
Jun 17, 2025
Understanding Spark’s Catalyst Optimizer: Demystifying Query Optimization
May 27, 2025
Build Your First Baby Agent with OpenAI in 20 Minutes
May 21, 2025
Say Goodbye to Dirty Data: Build Trustworthy Pipelines with These Pro Tips
Apr 20, 2025
No SQL? No Problem: Ask Your Database Questions in Plain English
Apr 7, 2025
Catching Sneaky Data Drift Before It Wreaks Havoc
Mar 24, 2025
Your Spark Executors Are Wasting Memory — Here’s How to Fix It
Mar 8, 2025
Building a Data Lakehouse with Iceberg, Spark, and AWS Glue
Feb 11, 2025
From Data Lake to Lakehouse: A Migration Guide with Delta
Feb 5, 2025
Mastering CDC in Delta Tables: A Use-case in Spark
Jan 30, 2025
Indexing Strategies: B-Trees, Hash Indexes, Bitmaps & Beyond
Jan 20, 2025
Handling Bottlenecks in Spark Streaming: Lessons Learned
Jan 9, 2025
Demystifying Event-Driven Architecture with AWS
Dec 24, 2024
Hands-on Cloud: Build a Serverless To-Do List App on AWS
Dec 16, 2024
Zstd vs Snappy vs Gzip: The Compression King for Parquet Has Arrived
Dec 12, 2024
Building Real-Time ETL Pipelines with Flink? Here’s How You Can Nail It!
Nov 30, 2024
Building Real-Time Recommendations with Spark, ALS, and Kafka
Nov 24, 2024
Customer 360 in E-commerce: Real-Life Use Case with Delta Lake on Databricks
Nov 18, 2024
Real-Time Use-case: Fraud Detection in Financial Transactions with Kafka and Spark Streaming
Nov 12, 2024
Preventing Data Mix-ups: Understanding Database Isolation and Concurrency Management
Nov 9, 2024
Data Engineering for ML: Building a Customer Churn Prediction Pipeline with Airflow
Nov 3, 2024
Building End-to-End Customer Insights Pipeline by Integrating Multiple Data Sources in Spark with Airflow