My Blog

Feb 20, 2026

The 60-Minute Protocol for Staying Sharp in the Age of AI

Jan 11, 2026

Engineers in 2026 Won’t Be Hired for Syntax. They’ll Be Hired for Leverage

Nov 26, 2025

I Built an AI Code Reviewer in a Weekend — Here’s the Exact Prompt

Sep 19, 2025

Integrating LLMs and AI Agents into Data Engineering Workflows

Sep 14, 2025

A Practical Guide to Spark Serialization and Deserialization

Aug 22, 2025

Zero-ETL & Cloud-Native Architectures: Building Real-Time Data Systems

Jul 11, 2025

Why Every Serious Data Engineer Should Understand Bloom Filters and HyperLogLog

Jul 6, 2025

Embedding-Based Retrieval Is Making Search Smarter

Jul 2, 2025

MLOps and Data Engineering: Bridging the Gap for Machine Learning Pipelines

Jun 17, 2025

Understanding Spark’s Catalyst Optimizer: Demystifying Query Optimization

May 27, 2025

Build Your First Baby Agent with OpenAI in 20 Minutes

May 21, 2025

Say Goodbye to Dirty Data: Build Trustworthy Pipelines with These Pro Tips

Apr 20, 2025

No SQL? No Problem: Ask Your Database Questions in Plain English

Apr 7, 2025

Catching Sneaky Data Drift Before It Wreaks Havoc

Mar 24, 2025

Your Spark Executors Are Wasting Memory — Here’s How to Fix It

Mar 8, 2025

Building a Data Lakehouse with Iceberg, Spark, and AWS Glue

Feb 11, 2025

From Data Lake to Lakehouse: A Migration Guide with Delta

Feb 5, 2025

Mastering CDC in Delta Tables: A Use Case in Spark

Jan 30, 2025

Indexing Strategies: B-Trees, Hash Indexes, Bitmaps & Beyond

Jan 20, 2025

Handling Bottlenecks in Spark Streaming: Lessons Learned

Jan 9, 2025

Demystifying Event-Driven Architecture with AWS

Dec 24, 2024

Hands-on Cloud: Build a Serverless To-Do List App on AWS

Dec 16, 2024

Zstd vs Snappy vs Gzip: The Compression King for Parquet Has Arrived

Dec 12, 2024

Building Real-Time ETL Pipelines with Flink? Here’s How You Can Nail It!

Nov 30, 2024

Building Real-Time Recommendations with Spark, ALS, and Kafka

Nov 24, 2024

Customer 360 in E-commerce: Real-Life Use Case with Delta Lake on Databricks

Nov 18, 2024

Real-Time Use Case: Fraud Detection in Financial Transactions with Kafka and Spark Streaming

Nov 12, 2024

Preventing Data Mix-ups: Understanding Database Isolation and Concurrency Management

Nov 9, 2024

Data Engineering for ML: Building a Customer Churn Prediction Pipeline with Airflow

Nov 3, 2024

Building an End-to-End Customer Insights Pipeline by Integrating Multiple Data Sources in Spark with Airflow