Data Engineering
Apache Kafka: Distributed Event Streaming Platform
Rini Susanti
2025-04-30
6 min read
Apache Kafka is a distributed event streaming platform. It was developed at LinkedIn, open-sourced in 2011, and is now an Apache Software Foundation project. Common use cases include messaging, activity tracking, log aggregation, stream processing, event sourcing, and serving as a commit log.

Core concepts: Topics are named categories of messages. Partitions are the unit of scalability, and messages are ordered within a partition. Producers publish messages, consumers subscribe to topics, and consumer groups allow parallel consumption. Brokers are the Kafka servers, and a cluster is made up of multiple brokers.

Architecture: Kafka is a distributed system that scales horizontally and replicates data for fault tolerance. Coordination was traditionally handled by ZooKeeper; KRaft mode removes the ZooKeeper dependency.

Messages: Each message is a key-value pair. When a key is present it determines the partition. Messages are immutable once written, and retention is configurable by time or by size.

Producers: Producers send messages to topics and choose the partition, either by hashing the key or through a custom partitioner. They select an acknowledgment level (acks: 0, 1, or all), batch messages for efficiency, and can compress them (gzip, snappy, lz4).

Consumers: Consumers subscribe to topics and fetch data using a pull model. Each consumer tracks its offset (its position within a partition) and commits offsets either automatically or manually. Consumer groups enable parallelism.

Partitions: Partitions enable parallelism, because within a group each partition is read by at most one consumer. Ordering is guaranteed within a partition but not across partitions. Increasing the partition count allows higher throughput.

Replication: Kafka uses a leader-follower model with a replication factor that is typically 3. Followers that are fully caught up form the in-sync replica set (ISR). By default, the leader handles reads and writes.

Streams API: The Streams API is used to build stream processing applications. It supports transformations (map, filter, aggregate), joins (stream-stream, stream-table), and windowing (tumbling, hopping, session).

Kafka Connect: Kafka Connect integrates Kafka with external systems. Source connectors import data into Kafka and sink connectors export data from Kafka. Hundreds of connectors are available, and they are set up through configuration rather than code.

Schema Registry: Schema Registry manages schemas (Avro, Protobuf, JSON Schema), supports schema evolution with compatibility checks, and integrates with producers and consumers.

Exactly-once semantics: These are built from idempotent producers, transactional writes, and consumers reading with read-committed isolation.
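To make the relationship between keys, partitions, and offsets more concrete, here is a minimal in-memory sketch. It talks to no real broker and uses no Kafka client library. The partition count, the helper names `choose_partition` and `append`, and the sample events are all hypothetical. The hash is also a stand-in: Kafka's Java client hashes keys with murmur2, while this sketch uses `zlib.crc32` only because it ships with Python.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the example topic


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash.

    zlib.crc32 is a stand-in, NOT the hash Kafka clients actually use.
    """
    return zlib.crc32(key) % num_partitions


def append(log: dict, key: bytes, value: str) -> tuple:
    """Append a message to its partition and return (partition, offset)."""
    partition = choose_partition(key)
    log[partition].append((key, value))
    offset = len(log[partition]) - 1  # offsets increase by one per partition
    return partition, offset


log = defaultdict(list)  # partition number -> ordered list of messages
events = [
    (b"user-1", "login"),
    (b"user-2", "login"),
    (b"user-1", "view"),
    (b"user-1", "logout"),
]
placements = [append(log, key, value) for key, value in events]

# All messages with the same key land in the same partition,
# so their relative order is preserved.
user1_partition = choose_partition(b"user-1")
user1_values = [v for k, v in log[user1_partition] if k == b"user-1"]
print(user1_values)  # ['login', 'view', 'logout']
```

The sketch shows why per-key ordering holds: the same key always hashes to the same partition, and each partition is an append-only sequence. Nothing here orders `user-1` events relative to `user-2` events, which mirrors the lack of ordering guarantees across partitions.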
Performance: A suitably sized cluster can handle millions of messages per second at high throughput, with latencies that can be under 10 ms in well-tuned deployments. Kafka scales horizontally and stores data efficiently by relying on sequential I/O.

Use cases: Real-time analytics (processing events as they arrive), log aggregation (centralizing logs from multiple services), event sourcing (storing state changes as events), CDC (Change Data Capture from databases), communication between microservices, and IoT data ingestion.

Monitoring: Metrics can be collected with tools such as Kafka Manager, Confluent Control Center, and Prometheus exporters. The key signals are consumer group lag and the JMX metrics exposed by brokers and clients.

Configuration: Broker configs cover log retention, replication, and ISR behavior. Producer configs cover batching, compression, and acks. Consumer configs cover fetch size and session timeout.

Best practices: Choose an appropriate partition count, use a replication factor of 3, monitor the cluster, set proper retention policies, use consumer groups, and enable idempotent producers.

Challenges: Operational complexity, the ZooKeeper dependency (mitigated by KRaft), the lack of ordering across partitions, consumer rebalancing, and schema management.

Managed services: Confluent Cloud, AWS MSK, Azure Event Hubs (Kafka-compatible), and Aiven.

Kafka Streams vs. alternatives: Apache Flink offers more mature stream processing, Apache Spark Streaming is micro-batch-oriented, and AWS Kinesis is fully managed and simpler to operate.

Ecosystem: ksqlDB (SQL for stream processing), Kafka REST Proxy, Schema Registry, and client libraries for many languages.

Real-world: Companies such as LinkedIn, Uber, Netflix, and Airbnb run Kafka at very large scale, with the largest deployments reported to process trillions of messages per day.

Learning curve: Moderate to steep. Distributed systems concepts are important, and operational knowledge is critical.

Security: SSL/TLS for encryption in transit, SASL for authentication, and ACLs for authorization. Encryption at rest is typically provided at the disk or volume level rather than by Kafka itself.

Kafka has transformed how companies handle real-time data. It is foundational for event-driven architectures and an essential skill for data engineers and backend developers, with high demand and competitive salaries. Stream processing with Kafka represents a paradigm shift away from batch processing.
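Consumer group lag, mentioned under monitoring above, is usually defined per partition as the log end offset minus the group's committed offset. The sketch below works that arithmetic out on made-up numbers. The function name `consumer_lag` and the offset values are hypothetical, and in a real deployment the offsets would come from a Kafka client or monitoring tool rather than from hard-coded dictionaries.

```python
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Return per-partition lag: log end offset minus committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Simplification for this sketch: a partition with no committed
        # offset is treated as unread from offset 0.
        lag[partition] = max(end - committed.get(partition, 0), 0)
    return lag


# Hypothetical offsets for a topic with three partitions.
end_offsets = {0: 1500, 1: 1420, 2: 1610}
committed = {0: 1500, 1: 1300}  # partition 2 has no committed offset yet

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
print(lag)        # {0: 0, 1: 120, 2: 1610}
print(total_lag)  # 1730
```

A total that keeps growing over time suggests the group is consuming more slowly than producers are writing, which is often the cue to add consumers (up to the partition count) or to investigate slow processing.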