Engineering Writing
Thoughts on Building Software
Deep dives into engineering decisions, lessons learned, and practical guides for building production-grade systems.
Next.js · System Design · Data Engineering · AI/ML · Best Practices
4 min read
That Time Your Pipeline Ran Successfully and Deleted 75% of Your Data
Your DAG completed. No errors. Success metrics green. Then your dashboard showed 75% fewer records than yesterday. Here's what happened — and why it kept happening.
5 min read
Your ML Model Passed All Tests. Then It Failed in Production.
Model evaluation: 94% accuracy. Production: wrong predictions everywhere. Your model is fine. Your features are lying to you.
4 min read
Why Your Flawless AI Demo Failed in Production
Your AI demo was flawless. The model answered every question. Stakeholders approved the budget. You deployed to production. Two weeks later, it's falling apart. Here's why — and it's not the model.
6 min read
Why Every LLM Security Tool Misses Multi-Turn Attacks — And What That Costs You
Stateless tools score 0% on progressive extraction, rephrased blocked attempts, and cross-agent attacks. Here's why the architecture is the problem, and what a stateful approach looks like.
3 min read
Stateful LLM Security: Lessons from Building StreamGuard
Building StreamGuard taught me that stateful security is table stakes for production LLM apps. Here's what I learned about session history, multi-turn attacks, and architecture decisions.
4 min read
Real-Time Data Processing: Flink vs Kafka Streams vs Spark Streaming
Three stream-processing frameworks, each with different strengths. Here's when to use each one, based on actual production experience.
6 min read
Why Your Kafka Pipeline Will Break When You Add an LLM to It
Most teams wire LLM calls directly into Kafka consumers. Here's why that fails in production and what to do instead.
Enjoyed these articles?
Interested in discussing ideas from these articles, or want to collaborate on content? I'd love to hear from you.