I've spent 5 years building streaming systems. The most common question I get: "Which stream processing framework should I use?" The answer depends on what you're actually trying to do.
The Three Contenders
Apache Flink
What it is: True stream processing engine with event time semantics, stateful transformations, and exactly-once guarantees.
Best for:
- Complex event processing (joins, aggregations, pattern detection)
- Low-latency requirements (sub-second)
- Stateful operations with long windows
- Integration with Kafka ecosystems
When I chose Flink: Building a fraud detection system that needed to flag suspicious transactions within 500ms. Flink's event time processing handled out-of-order events cleanly, and the state backend made complex aggregations manageable.
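The out-of-order handling rests on watermarks: an event is only released once the watermark passes its timestamp. A framework-free sketch, assuming the simple rule "watermark = max event time seen minus an allowed lateness" (the same rule as Flink's bounded-out-of-orderness strategy; the class name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Framework-free sketch of event-time reordering with a watermark.
// Assumption: watermark = max event time seen so far minus a fixed
// allowed lateness. Flink's operators do this internally; this class
// is illustrative, not a Flink API.
class WatermarkBuffer {
    private final long allowedLatenessMs;
    // Buffered events as {eventTime, payload} pairs, ordered by event time.
    private final PriorityQueue<long[]> buffer =
        new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
    private long maxEventTime = Long.MIN_VALUE;

    WatermarkBuffer(long allowedLatenessMs) {
        this.allowedLatenessMs = allowedLatenessMs;
    }

    // Accept a possibly out-of-order event; return every buffered event
    // the watermark now covers, in event-time order.
    List<long[]> onEvent(long eventTime, long payload) {
        maxEventTime = Math.max(maxEventTime, eventTime);
        buffer.add(new long[] {eventTime, payload});
        long watermark = maxEventTime - allowedLatenessMs;
        List<long[]> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek()[0] <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }
}
```

Late events within the allowed lateness get slotted back into order; anything later than that is a product decision (drop, side output, or reprocess).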
Tradeoffs:
- Steeper learning curve than Kafka Streams
- JVM tuning required at scale
- Overkill for simple transformations
Kafka Streams
What it is: Lightweight stream processing library built into Kafka. No separate cluster needed.
Best for:
- Simple stream transformations (filter, map, branch)
- Teams already using Kafka heavily
- Ecosystem simplicity (one fewer moving part)
- Java/Kotlin shops
When I chose Kafka Streams: Building a real-time analytics pipeline that needed to enrich events with customer data from a database. Simple enrichment, no complex state. Kafka Streams handled it with zero infrastructure overhead.
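Stripped of the Kafka Streams plumbing, the enrichment itself is just a per-event lookup. A sketch of the core step, where the in-memory Map and the "customerId:action" event format are stand-ins for the real database-backed table and schema:

```java
import java.util.Map;
import java.util.function.Function;

// Framework-free sketch of the enrichment step. In the real pipeline
// this was a Kafka Streams map over the event stream; here a plain
// Function stands in, and an in-memory Map stands in for the
// database-backed customer lookup.
class Enricher {
    static Function<String, String> enrichWith(Map<String, String> customers) {
        return event -> {
            // Assumed event format for this sketch: "customerId:action"
            String[] parts = event.split(":", 2);
            String name = customers.getOrDefault(parts[0], "unknown");
            return event + ":" + name;
        };
    }
}
```

Because the step is stateless from the stream's point of view, it parallelizes trivially across partitions, which is exactly the case Kafka Streams handles well.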
Tradeoffs:
- No native support for non-Kafka sources/sinks
- Less sophisticated windowing than Flink
- Scale-out is capped by the input topic's partition count
Spark Structured Streaming
What it is: Micro-batch stream processing built on Spark engine.
Best for:
- Batch + stream unified workloads
- Teams already invested in Spark
- Higher-latency tolerances (seconds to minutes)
- Complex ML pipelines on streaming data
When I chose Spark: Building a feature pipeline for ML models that needed to join streaming clickstream data with batch user profiles. Spark's unified batch/stream API made the codebase simpler, and micro-batch latency was acceptable for the use case.
Tradeoffs:
- Micro-batch = higher latency (100ms minimum, typically 1-10s)
- Not true streaming (events processed in batches)
- Overkill for simple transformations
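The micro-batch cost is easy to quantify: an event that arrives just after a trigger fires waits almost a full interval before processing even begins. A toy model with illustrative numbers (not Spark measurements):

```java
// Toy model of micro-batch waiting time: an event arriving at offset t
// within a trigger interval waits (interval - t) ms before its batch
// starts. Numbers here are illustrative, not Spark measurements.
class MicroBatchLatency {
    static double averageWaitMs(long intervalMs, long[] arrivalOffsetsMs) {
        double total = 0;
        for (long t : arrivalOffsetsMs) {
            total += intervalMs - t;   // time until the next batch boundary
        }
        return total / arrivalOffsetsMs.length;
    }
}
```

With uniform arrivals, the average added wait is roughly half the trigger interval, and that's before any per-batch scheduling overhead.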
Decision Framework
Use this flowchart:
Need sub-second latency?
├─ Yes: Flink
└─ No: Already using Spark?
   ├─ Yes: Spark Structured Streaming
   └─ No: Simple transformations only?
      ├─ Yes: Kafka Streams
      └─ No: Flink
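The flowchart transcribes directly into a function; the booleans and framework strings below just mirror the branches above:

```java
// Direct transcription of the decision flowchart. The three booleans
// correspond to the three questions, evaluated in the same order.
class FrameworkPicker {
    static String pick(boolean subSecondLatency, boolean alreadyOnSpark,
                       boolean simpleTransformsOnly) {
        if (subSecondLatency) return "Flink";
        if (alreadyOnSpark) return "Spark Structured Streaming";
        return simpleTransformsOnly ? "Kafka Streams" : "Flink";
    }
}
```

Note the asymmetry: Flink appears on both the first and last branch, which is why it ends up as my default later in this post.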
Real-World Example: LLM Enrichment Pipeline
I recently built a pipeline that enriches Kafka events with LLM-generated summaries. Here's why I chose Flink:
Requirement: Call OpenAI API for each event (1-3s latency), maintain ordering per partition, handle retries and timeouts gracefully.
Why not Kafka Streams: No async I/O operator. Would block the consumer thread on every LLM call.
Why not Spark: Micro-batch latency would compound with LLM latency. Events would wait for the batch boundary AND the LLM call.
Why Flink: Async I/O operator handles high-latency external calls natively. Queue in-flight requests, emit results when ready, maintain ordering per partition.
Code looked like this:
AsyncFunction<String, EnrichedRecord> llmEnrichment = new LLMAsyncClient();

DataStream<EnrichedRecord> enrichedStream = AsyncDataStream.orderedWait(
    rawStream,          // DataStream<String> of raw Kafka events
    llmEnrichment,      // issues the OpenAI call, completes a ResultFuture
    5000,               // per-request timeout
    TimeUnit.MILLISECONDS,
    100                 // max in-flight requests (capacity)
);
Ordered wait maintains per-partition ordering while letting LLM calls run concurrently: you get async throughput without breaking ordering guarantees.
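What orderedWait buys you can be sketched without Flink: requests are fired concurrently, but results are emitted in submission order. A minimal, framework-free version using CompletableFuture (no timeout or capacity handling here; Flink's operator adds both, plus checkpoint integration):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Minimal sketch of orderedWait's core idea: all requests run
// concurrently, but results are emitted in submission order by
// joining the in-flight futures FIFO.
class OrderedAsync {
    static <I, O> List<O> orderedWait(List<I> inputs,
                                      Function<I, CompletableFuture<O>> asyncCall) {
        Deque<CompletableFuture<O>> inFlight = new ArrayDeque<>();
        for (I in : inputs) {
            inFlight.add(asyncCall.apply(in));   // fire every request up front
        }
        List<O> out = new ArrayList<>();
        while (!inFlight.isEmpty()) {
            out.add(inFlight.poll().join());     // emit in submission order
        }
        return out;
    }
}
```

The slowest in-flight request gates emission, but it never gates submission, which is where the throughput win comes from.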
The "It Depends" Answer
The right tool depends on:
Latency requirements: Sub-second → Flink. Seconds → Spark or Kafka Streams.
Complexity: Simple transforms → Kafka Streams. Complex stateful ops → Flink.
Existing stack: Heavy Spark investment → Spark Structured Streaming. Heavy Kafka investment → Kafka Streams or Flink.
Team skills: Java/Kotlin → Kafka Streams or Flink. Python/Scala → Spark.
Operational complexity: Want fewer moving parts → Kafka Streams. Okay with separate cluster → Flink or Spark.
What I Reach For By Default
If I'm starting fresh and requirements are unclear: Flink.
Why? It handles the simple cases (simple transformations) and the complex cases (stateful joins, event time processing). The learning curve pays for itself when requirements evolve.
If the team is already heavily invested in Spark or Kafka, I'll default to those unless there's a clear reason to switch.
The Bottom Line
All three are production-grade. The differences matter at the edges — latency, operational complexity, ecosystem integration. Pick based on your actual constraints, not hypothetical future needs.
Start simple. Kafka Streams for simple enrichment, Flink for complex stateful processing, Spark when you're already in the ecosystem. You can always migrate later if requirements change.
Need help designing your streaming architecture? Let's talk.