Search Technology
Elasticsearch: Distributed Search and Analytics Engine
Ahmad Fauzi
2025-04-16
6 min read
Elasticsearch is a distributed search and analytics engine built on Apache Lucene and part of the Elastic Stack (ELK: Elasticsearch, Logstash, Kibana). Use cases include full-text search, log analytics, application performance monitoring, security analytics, and business analytics.

Core concepts: Index (a container for documents, loosely comparable to a database), Document (a JSON object, the basic unit of data), Field (a key-value pair in a document), Mapping (the schema definition), Shard (a horizontal partition for scalability), and Replica (a copy of a shard for availability). Architecture: a cluster consists of multiple nodes, a node is a single server, each index is distributed across shards, and shards are replicated.

CRUD operations: index a document (PUT/POST), get it (GET), update it (POST to the _update endpoint), delete it (DELETE), and use the _bulk API to batch many operations for efficiency.

Search: the Query DSL is JSON-based and offers term queries (exact match), match queries (full-text search), bool queries (combining multiple clauses with must, should, must_not, and filter), range queries, and aggregations (metrics and buckets). Full-text search features include tokenization, stemming, stop words, synonyms, fuzzy matching (typo tolerance), highlighting, and autocomplete/suggestions.

Analyzers: the standard analyzer, language-specific analyzers, and custom analyzers built from character filters, a tokenizer, and token filters. Aggregations: metrics (avg, sum, min, max, stats), bucket aggregations (terms, histogram, date histogram), and nested sub-aggregations.

Relevance scoring: TF-IDF (Term Frequency-Inverse Document Frequency) was the classic model, and BM25 has been the default similarity since Elasticsearch 5.0. Boosting increases the relevance of certain fields or terms.

Data ingestion: direct indexing via the API, Logstash (ETL pipeline), Beats (lightweight shippers: Filebeat, Metricbeat, Packetbeat), and Kafka integration. Kibana: a visualization tool with dashboards, Discover (explore data), Canvas (custom visualizations), Lens (drag-and-drop analytics), and alerting. Index management: index templates, lifecycle policies (ILM: hot-warm-cold-delete architecture), rollover, and snapshots for backup.
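To make the Query DSL and bulk operations above more concrete, here is a minimal sketch that only builds request bodies as Python data and sends nothing to a cluster. The index name "products" and the fields "title", "price", "brand", and "status" are hypothetical. The bool query puts the full-text match in scored "must" context and the exact-match and range clauses in non-scoring "filter" context. The bulk body follows the newline-delimited format: an action line, then a source line, with a trailing newline at the end.

```python
import json


def build_product_query(text, max_price, brand=None):
    """Bool query: scored full-text clause plus non-scoring filters."""
    filters = [{"range": {"price": {"lte": max_price}}}]  # numeric range, no scoring
    if brand is not None:
        filters.append({"term": {"brand": brand}})  # exact match, assumes a keyword field
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],  # analyzed full-text match, scored
                "filter": filters,
                "must_not": [{"term": {"status": "discontinued"}}],
            }
        }
    }


def build_bulk_body(index, docs):
    """NDJSON body for the _bulk API: one action line, then one source line, per doc."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


if __name__ == "__main__":
    print(json.dumps(build_product_query("wireless mouse", 50, brand="acme"), indent=2))
    print(build_bulk_body("products", [("1", {"title": "Wireless Mouse", "price": 25})]))
```

The query dict would be sent as the body of a search request against the index, and the bulk string as the body of a _bulk request with an NDJSON content type.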
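The analyzer chain described above (character filters, then a tokenizer, then token filters) feeds an inverted index that maps each term to the documents containing it. The following toy simulation in plain Python shows how these pieces fit together. It is not Lucene's implementation. The HTML-stripping filter, the regex tokenizer, the stop-word list, and the "drop a plural s" stemmer are all simplified stand-ins chosen for illustration.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of", "for"}  # illustrative stop-word list


def char_filter(text):
    # Character filter: strip HTML tags before tokenization (similar in spirit to html_strip)
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    # Tokenizer: split into alphanumeric runs (a rough stand-in for the standard tokenizer)
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filters(tokens):
    out = []
    for tok in tokens:
        tok = tok.lower()  # lowercase filter
        if tok in STOP_WORDS:  # stop filter
            continue
        if tok.endswith("s") and len(tok) > 3:  # toy stemmer: drop a plural "s"
            tok = tok[:-1]
        out.append(tok)
    return out


def analyze(text):
    return token_filters(tokenizer(char_filter(text)))


def build_inverted_index(docs):
    # Maps each term to a sorted list of the document ids that contain it
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}
```

For example, analyze("<b>Running</b> Shoes for Kids") yields ["running", "shoe", "kid"]. Because queries go through the same analysis, a search for "shoes" matches documents that were indexed as "Shoes".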
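Bucket and metric aggregations combine in the nested pattern described above. Here is a local sketch of a terms bucket with an avg sub-aggregation, next to the request body it imitates. The field names "category" and "price" are hypothetical. On a real cluster, terms counts are computed per shard and merged, so they can be approximate. This single-process version is exact, and it simply keeps insertion order for ties.

```python
from collections import defaultdict

# Request body this function imitates (hypothetical field names); "size": 0 skips hits
AGG_REQUEST = {
    "size": 0,
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}


def terms_with_avg(docs, bucket_field, metric_field, metric_name="avg_price", size=10):
    # Bucket aggregation (terms) with a nested metric aggregation (avg)
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals), metric_name: {"value": sum(vals) / len(vals)}}
        for key, vals in groups.items()
    ]
    # Order buckets by doc_count, largest first; the stable sort keeps ties in insertion order
    buckets.sort(key=lambda b: b["doc_count"], reverse=True)
    return buckets[:size]
```

With documents for two shoes priced 50 and 70 and one bag priced 30, the first bucket is "shoes" with doc_count 2 and an average of 60.0.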
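The BM25 default mentioned above differs from plain TF-IDF mainly because term frequency saturates and long documents are normalized. The sketch below uses the classic Okapi BM25 formula with Elasticsearch's default parameters k1 = 1.2 and b = 0.75, together with the Lucene-style IDF ln(1 + (N - n + 0.5) / (n + 0.5)). Lucene's own implementation differs in small details (for example, recent versions drop the constant (k1 + 1) factor, which does not change ranking), so absolute scores here will not match Elasticsearch's. Documents are assumed to be pre-tokenized lists of terms.

```python
import math


def bm25_idf(n_docs, doc_freq):
    # Lucene-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)), always positive
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    # Classic Okapi BM25 contribution of one query term to one document
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)  # length normalization
    return bm25_idf(n_docs, doc_freq) * tf * (k1 + 1) / (tf + norm)


def bm25_score(query_terms, doc_tokens, corpus):
    # Sum of per-term scores; corpus is a list of token lists
    avg_len = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc_tokens.count(term)
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        score += bm25_term_score(tf, len(doc_tokens), avg_len, len(corpus), df)
    return score
```

For a document of average length, raising tf from 1 to 10 increases the term's score by less than a factor of k1 + 1 = 2.2, whereas raw TF-IDF would grow it tenfold. This saturation is the main reason BM25 resists keyword stuffing better.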
Performance optimization: design mappings deliberately (keyword vs text), disable features you do not need, put non-scoring clauses in filter context rather than query context (filters skip scoring and are cacheable), use the bulk API, size shards appropriately (20-50 GB per shard is a common target), and tune the refresh interval. Monitoring: cluster health status (green, yellow, red), node statistics, slow logs, and a dedicated monitoring stack. Security: authentication (native, LDAP, SAML), role-based authorization, TLS encryption, and audit logging.

Use cases: e-commerce search (product catalogs, faceted search), log analysis (centralized logging, troubleshooting), APM (Application Performance Monitoring), SIEM (Security Information and Event Management), and metrics and time-series data. Scale: used by Wikipedia, GitHub, Uber, and Netflix; deployments handle petabytes of data and thousands of queries per second.

Challenges: resource-intensive (RAM hungry), complex cluster management, and cost at scale. Alternatives: Apache Solr (similar, Lucene-based), Algolia (SaaS, specialized for app search), Meilisearch (lightweight, open source), and OpenSearch (an AWS-initiated fork of Elasticsearch). Managed services: Elastic Cloud, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), and Elastic Cloud on Azure. Learning curve: moderate; the powerful Query DSL takes practice.

Best practices: plan mappings carefully, monitor cluster health, use aliases for zero-downtime reindexing, implement proper security, and back up regularly. Elasticsearch has transformed search and analytics by enabling real-time insights from massive datasets, and it is an essential tool for modern data-driven applications.
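The shard-sizing guideline above can be turned into a quick back-of-the-envelope calculation. This sketch is a rule of thumb rather than an official formula. It divides the expected primary data size by a target shard size (40 GB here, an assumed value inside the 20-50 GB range), rounds up, and multiplies by (1 + replicas) to get the total number of shard copies the cluster must host. The returned settings use the real index settings number_of_shards and number_of_replicas.

```python
import math


def plan_shards(primary_data_gb, target_shard_gb=40, replicas=1):
    # Rough shard plan: enough primaries to keep each shard near the target size
    if primary_data_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    primaries = max(1, math.ceil(primary_data_gb / target_shard_gb))
    return {
        "primary_shards": primaries,
        "total_shards": primaries * (1 + replicas),  # primaries plus replica copies
        "avg_shard_gb": primary_data_gb / primaries,
        # Index-creation settings matching the plan
        "settings": {"number_of_shards": primaries, "number_of_replicas": replicas},
    }
```

For example, 600 GB of primary data at a 40 GB target gives 15 primary shards and 30 shard copies with one replica. Because the primary shard count is fixed when an index is created, time-series data is usually handled with rollover rather than by oversizing a single index.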
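The best practice of using aliases for zero-downtime reindexing usually works like this: applications always query an alias, a new index is filled via _reindex, and then the alias is repointed in a single _aliases request whose actions are applied atomically. The sketch below builds those request bodies and simulates the alias switch locally. The alias "products" and the indices "products_v1" and "products_v2" are hypothetical names.

```python
def reindex_body(old_index, new_index):
    # Body for a _reindex request: copy documents from old_index into new_index
    return {"source": {"index": old_index}, "dest": {"index": new_index}}


def alias_swap_body(alias, old_index, new_index):
    # Body for an _aliases request; Elasticsearch applies all actions atomically
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def apply_alias_actions(aliases, body):
    # Local simulation: aliases maps alias name -> set of index names (input is not mutated)
    updated = {name: set(indices) for name, indices in aliases.items()}
    for action in body["actions"]:
        op, spec = next(iter(action.items()))
        targets = updated.setdefault(spec["alias"], set())
        if op == "add":
            targets.add(spec["index"])
        elif op == "remove":
            targets.discard(spec["index"])
    return updated
```

Applying alias_swap_body("products", "products_v1", "products_v2") to {"products": {"products_v1"}} yields {"products": {"products_v2"}}. Clients keep querying "products" throughout, and the old index can be deleted once the swap is verified.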