Engineering Writing

Thoughts on Building Software

Deep dives into engineering decisions, lessons learned, and practical guides for building production-grade systems.

4 min read
That Time Your Pipeline Ran Successfully and Deleted 75% of Your Data
Your DAG completed. No errors. Success metrics green. Then your dashboard showed 75% fewer records than yesterday. Here's what happened — and why it kept happening.
Data Engineering
Airflow, Data Warehousing, Orchestration
5 min read
Your ML Model Passed All Tests. Then It Failed in Production.
Model evaluation: 94% accuracy. Production: wrong predictions everywhere. Your model is fine. Your features are lying to you.
Machine Learning
ML, Feature Stores, Data Warehousing
4 min read
Why Your Flawless AI Demo Failed in Production
Your AI demo was flawless. The model answered every question. Stakeholders approved the budget. You deployed to production. Two weeks later, it's falling apart. Here's why — and it's not the model.
Data Engineering
Kafka, Streaming, Data Architecture
6 min read
Why Every LLM Security Tool Misses Multi-Turn Attacks — And What That Costs You
Stateless tools score 0% on progressive extraction, rephrased blocked attempts, and cross-agent attacks. Here's why the architecture is the problem, and what a stateful approach looks like.
LLM Security
LLM, Security, AI Safety, OWASP
3 min read
Stateful LLM Security: Lessons from Building StreamGuard
Building StreamGuard taught me that stateful security is table stakes for production LLM apps. Here's what I learned about session history, multi-turn attacks, and architecture decisions.
Security
Python, LLM, Redis, DynamoDB, FastAPI
4 min read
Real-Time Data Processing: Flink vs Kafka Streams vs Spark Streaming
Three stream-processing frameworks, each with different strengths. Here's when to use each one, based on production experience.
Data Engineering
Kafka, Flink, Spark, Streaming
6 min read
Why Your Kafka Pipeline Will Break When You Add an LLM to It
Most teams wire LLM calls directly into Kafka consumers. Here's why that fails in production and what to do instead.
5 min read
Why Architecture Matters: Lessons from a Production Outage
A deep dive into how a single architectural decision caused a weekend-long outage and what we learned.
System Design
System Design, PostgreSQL, Next.js, Kubernetes

Enjoyed these articles?

Interested in discussing ideas from these articles, or want to collaborate on content? I'd love to hear from you.