Apache Kafka Streams

What is Kafka Streams

  • Easy data processing and transformation library within Kafka
  • Standard Java Application.
  • No need to create a separate cluster.
  • Highly scalable, elastic and fault tolerant
  • Exactly-once capabilities
  • One record at a time processing (no batching)
  • Works for any application size.

Kafka Streams vs Spark Streaming, NiFi, Flink

  • Micro-batch processing vs per-record streaming.
  • Cluster required vs no cluster required.
  • Scales easily by just adding Java processes (no re-configuration required)
  • Exactly-once semantics (vs at-least-once for Spark)
  • All code based.

Kafka Streams application terminology

  • A stream is a sequence of immutable data records that is fully ordered, can be replayed, and is fault-tolerant (think of a Kafka topic as a parallel)
  • A stream processor is a node in the processor topology (graph). It transforms incoming streams, record by record, and may create a new stream from them.
    • A source processor is a special processor that takes its data directly from a Kafka topic. It has no predecessors in the topology and doesn't transform the data.
    • A sink processor is a processor that has no children; it sends the stream data directly to a Kafka topic.
  • A topology is a graph of processors chained together by streams.
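
To make these terms concrete, here is a minimal sketch in the Java DSL (the topic names and the uppercasing step are just illustrations): builder.stream() creates the source processor, mapValues() adds a stream processor, and to() adds the sink processor; together they form the topology.

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;

    StreamsBuilder builder = new StreamsBuilder();

    // source processor: reads straight from a Kafka topic, has no predecessors
    KStream<String, String> source = builder.stream("input-topic");

    // stream processor: transforms records one by one, producing a new stream
    KStream<String, String> uppercased = source.mapValues(v -> v.toUpperCase());

    // sink processor: has no children, writes the stream back to a Kafka topic
    uppercased.to("output-topic");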

Writing a Kafka Streams application: two APIs

  • High level DSL
    • it is simple
    • it has all the operations we need to handle most transformation tasks
    • it contains a lot of syntax helpers to make our life easy
    • it's declarative
  • Low Level Processor API
    • it's an imperative API
    • can be used to implement the most complex logic, but it's rarely needed.
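
For a sense of its shape, below is a minimal sketch using the newer typed Processor interface (Kafka Streams 2.7+); the class name and the uppercasing logic are placeholders for illustration.

    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;

    // imperative, record-by-record processing: you receive each record and
    // decide explicitly what (if anything) to forward downstream
    public class UppercaseProcessor implements Processor<String, String, String, String> {

        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, String> record) {
            context.forward(record.withValue(record.value().toUpperCase()));
        }
    }

Such a processor would be wired into a topology via Topology#addProcessor() or KStream#process(); the High Level DSL achieves the same effect with a single mapValues() call.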

Streams app properties

  • a Streams application, when communicating with Kafka, leverages the Consumer and Producer APIs.
  • therefore, all the configurations we learned before are still applicable
  • bootstrap.servers: needed to connect to Kafka (usually port 9092)
  • auto.offset.reset: set to earliest to consume the topic from the start
  • application.id: specific to a Streams application, will be used for:
    • consumer group.id == application.id (most important one to remember)
    • default client.id prefix
    • prefix to internal changelog topics
  • default.key.serde: for serialization and deserialization of keys
  • default.value.serde: for serialization and deserialization of values
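
A minimal sketch of these properties in Java, assuming it sits inside your application's main method (the id wordcount-app and broker localhost:9092 are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsConfig;

    Properties config = new Properties();
    config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // connect to Kafka
    config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");      // consume the topic from the start
    config.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");     // group.id, client.id and internal topic prefix
    config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());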

Kafka Streams DSL operations

  • Remember: data in Kafka Streams is keys and values (the sketch after this list uses each operation below in order)
  • Stream
  • MapValues
  • FlatMapValues
  • SelectKey
  • GroupByKey
  • Count
  • To
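
As a sketch tying these operations together, here is the classic word-count shape (topic names and the application id are placeholders; the properties match the earlier snippet):

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class WordCountApp {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            KTable<String, Long> counts = builder
                .<String, String>stream("word-count-input")               // Stream: read a topic as a KStream
                .mapValues(line -> line.toLowerCase())                    // MapValues: transform each value
                .flatMapValues(line -> Arrays.asList(line.split("\\W+"))) // FlatMapValues: one record per word
                .selectKey((ignoredKey, word) -> word)                    // SelectKey: the word becomes the key
                .groupByKey()                                             // GroupByKey: group before aggregating
                .count();                                                 // Count: occurrences per key

            counts.toStream()                                             // To: write the result to a topic
                  .to("word-count-output", Produced.with(Serdes.String(), Serdes.Long()));

            Properties config = new Properties();
            config.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
            config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            KafkaStreams streams = new KafkaStreams(builder.build(), config);
            streams.start();
        }
    }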

Printing the Topology

  • printing the topology at the start of the application is helpful while developing (and even in production), as it helps you understand the application flow directly from the first lines of the logs.
  • Reminder: The topology represents all the streams and processors of your Streams application
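
Continuing the word-count sketch, one way to print it (builder and config as in the earlier snippets):

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.Topology;

    Topology topology = builder.build();
    System.out.println(topology.describe()); // prints every source, processor and sink in the graph

    KafkaStreams streams = new KafkaStreams(topology, config);
    streams.start();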

Internal topics

  • running a Kafka Streams application may eventually create internal intermediary topics.
  • two types:
    • repartitioning topics: in case you start transforming the key of your stream, a repartitioning will happen at some processor.
    • changelog topics: in case you perform aggregations, Kafka Streams will save compacted data in these topics.
  • internal topics:
    • are managed by Kafka Streams
    • are used by Kafka Streams to save/restore state and re-partition data
    • are prefixed by the application.id parameter
    • should never be deleted, altered or published to.
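
As an illustration, the internal topics become visible when listing topics; the names below are hypothetical but follow the <application.id>-<name>-repartition / -changelog pattern:

    kafka-topics.sh --bootstrap-server localhost:9092 --list
    # word-count-input
    # word-count-output
    # wordcount-app-KSTREAM-AGGREGATE-STATE-STORE-0000000004-repartition
    # wordcount-app-KSTREAM-AGGREGATE-STATE-STORE-0000000004-changelog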

Scaling our application

  • Our input topic has 2 partitions, so we can launch up to 2 instances of our application in parallel without any code changes
  • this is because a Kafka Streams application relies on the Kafka Consumer, and we saw in the Kafka Basics course that we can add consumers to a consumer group just by running the same code.
  • this makes scaling super easy, without the need for any application cluster (see the sketch below)
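
As a sketch (streams-app.jar is a placeholder for your packaged application), scaling out is just starting the same program again; both instances share the same application.id and therefore join the same consumer group:

    # terminal 1
    java -jar streams-app.jar

    # terminal 2: the 2 partitions are rebalanced across the two instances automatically
    java -jar streams-app.jar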