Kafka is a distributed messaging system for collecting and delivering high volumes of log data with low latency
stream processing: Flink
JMS messaging system offers rich set of delivery guarantees, which are overkill for collecting log data. LinkedIn is okay with lossing some data.
Log = segment files of 1GB
Broker keeps in-memory sorted list of offsets (indexing)
avoid explictly caching messages in memory but instead relies on underlying filesystem's page cache.
Stateless broker
consumer groups == load balancing concept in DDIA
Kafka uses Fixed number of partition as the solution.
Every consumer owns 1 partition ⇒ avoid locking and state maintainance