What is the ELK Stack and how does it work?

The ELK Stack consists of Elasticsearch (a distributed search and analytics engine for storing and querying data), Logstash (a data processing pipeline for collecting, transforming, and shipping logs, with 50+ input and output plugins), and Kibana (a visualisation platform for dashboards, alerting, and data exploration). The broader Elastic Stack adds Beats (lightweight data shippers such as Filebeat for logs and Metricbeat for metrics) and Elastic APM agents for distributed tracing. Data flows from applications through Beats and/or Logstash into Elasticsearch, with Kibana providing the query and visualisation layer.

How do you integrate Node.js applications with ELK?

Use Winston or Pino with an Elastic Common Schema (ECS) formatter for structured JSON logging. Deploy Filebeat to ship log files to Logstash or Elasticsearch. Install the Elastic APM Node.js agent (a single require line that must run before any other module is loaded) for automatic transaction tracing, database query monitoring, and error tracking. Propagate correlation IDs via middleware for cross-service request tracing, and use AsyncLocalStorage to enrich every log line with request context automatically.

How do you integrate Java applications with ELK?

Configure Logback with ECS Encoder (co.elastic.logging:logback-ecs-logging) for structured JSON output. Use SLF4J MDC for request-scoped context (trace IDs, user IDs). Deploy Filebeat or ship directly to Logstash. Install Elastic APM Java agent for automatic Spring Boot instrumentation. Export JVM metrics via Micrometer + Elastic registry for heap, GC, and thread pool monitoring.

How do you scale an ELK cluster for production workloads?

Separate node roles (master, data, ingest, coordinating) to prevent resource contention. Implement a hot-warm-cold architecture with ILM policies for automatic tiering. Target 20-50GB per shard, rolling indices over by size as well as age (for example, at 50GB per primary shard or after one day, whichever comes first). Scale Logstash horizontally behind a load balancer. Use Kafka as an ingestion buffer for traffic spikes. For multi-region deployments, use cross-cluster search rather than replicating data.
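As a rough worked example of the 20-50GB target, the TypeScript sketch below estimates how many primary shards a daily index needs for a given ingest volume. The 40GB target, the single replica, and one rollover per day are assumptions chosen for illustration, and planShards is a hypothetical helper, not an Elastic API.

```typescript
// Rough shard-count estimate for a daily index. All inputs are assumptions:
// primary data volume in GB, target shard size, replica count, 20-50GB window.
interface ShardPlan {
  primaries: number; // primary shards per daily index
  gbPerPrimary: number; // expected size of each primary shard
  totalShardsPerDay: number; // primaries plus replica copies
  withinTarget: boolean; // does gbPerPrimary fall inside [minGb, maxGb]?
}

function planShards(
  dailyPrimaryGb: number,
  targetGb = 40,
  replicas = 1,
  minGb = 20,
  maxGb = 50,
): ShardPlan {
  if (dailyPrimaryGb <= 0 || targetGb <= 0) {
    throw new RangeError("volumes must be positive");
  }
  // Round up so that no primary shard is expected to exceed targetGb.
  const primaries = Math.ceil(dailyPrimaryGb / targetGb);
  const gbPerPrimary = dailyPrimaryGb / primaries;
  return {
    primaries,
    gbPerPrimary,
    totalShardsPerDay: primaries * (1 + replicas),
    withinTarget: gbPerPrimary >= minGb && gbPerPrimary <= maxGb,
  };
}

const plan = planShards(300);
console.log(`${plan.primaries} primaries of ${plan.gbPerPrimary}GB, ${plan.totalShardsPerDay} shards/day`);
```

For 300GB of primary data per day this suggests 8 primaries of 37.5GB each (16 shards per day with one replica). At low volumes, say 10GB per day, withinTarget comes back false, which is a hint to roll over by size rather than every day.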

How do you secure an ELK Stack deployment?

Enable TLS for inter-node transport and HTTP API communication. Configure role-based access control (RBAC): read-only roles for dashboards, write roles for ingestion, and admin roles for management. Integrate with LDAP/AD or SAML for SSO. Use API keys for programmatic access. Configure audit logging for compliance. Implement network-level security with VPC/firewall rules restricting access to the cluster's HTTP (9200) and transport (9300) ports.
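The RBAC and API-key advice above can be sketched as request bodies for PUT _security/role/<name> and POST _security/api_key. The role names, the logs-* pattern, and the 90-day expiry are assumptions; the writer privileges (create_doc and auto_configure for appending to data streams, plus the monitor cluster privilege) approximate what Elastic documents for shippers, so check the docs for your version. Read-only dashboard users also need Kibana feature privileges, which are not shown here.

```typescript
// Request bodies for Elasticsearch security APIs; role and key names are hypothetical.
interface IndexPrivileges {
  names: string[];
  privileges: string[];
}

interface RoleDescriptor {
  cluster: string[];
  indices: IndexPrivileges[];
}

// PUT _security/role/logs_reader : search-only access to log data streams.
const logsReader: RoleDescriptor = {
  cluster: [],
  indices: [{ names: ["logs-*"], privileges: ["read", "view_index_metadata"] }],
};

// PUT _security/role/logs_writer : append-only access for shippers writing to data streams.
const logsWriter: RoleDescriptor = {
  cluster: ["monitor"],
  indices: [{ names: ["logs-*"], privileges: ["create_doc", "auto_configure"] }],
};

// POST _security/api_key : an expiring key for one service, limited to writer privileges.
function apiKeyRequest(service: string, days = 90) {
  return {
    name: `${service}-ingest`,
    expiration: `${days}d`,
    role_descriptors: { [`${service}_writer`]: logsWriter },
  };
}

console.log(JSON.stringify({ logsReader, apiKey: apiKeyRequest("checkout") }, null, 2));
```

Each service gets its own key, so one can be invalidated (DELETE _security/api_key) without touching the others.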

Setting Up a Complete ELK Stack to Monitor Distributed Systems with Node.js and Java

Introduction: Why the ELK Stack Powers Modern Observability

The ELK Stack (Elasticsearch, Logstash, and Kibana, now part of the Elastic Stack together with Beats) remains one of the most widely deployed platforms for centralised logging and observability, used by organisations ranging from startups to Fortune 500 enterprises. For distributed systems built with Node.js microservices and Java backend services, centralised logging is not optional: it is the foundation of debugging, performance monitoring, and incident response.

As of 2025, the Elastic Stack also includes Elastic Agent for unified data collection, cross-cluster search for multi-region deployments, and Elastic Security for SIEM integration. This guide covers Elasticsearch cluster architecture, Logstash pipeline design, Beats data shipping, Kibana dashboard creation, Node.js and Java integration patterns, cluster scaling strategies, alerting configuration, and security hardening for production ELK deployments.

Elasticsearch Cluster Architecture and Index Design

Design Elasticsearch clusters for reliability, performance, and cost efficiency:

Node Roles: Configure dedicated node roles — master-eligible nodes (3 minimum for quorum, lightweight) manage cluster state, data nodes store and search indices (CPU/memory-intensive), ingest nodes run preprocessing pipelines, and coordinating nodes route requests and aggregate results. Separating roles prevents resource contention — a data node running heavy searches shouldn't destabilise cluster management.
Index Lifecycle Management (ILM): Configure ILM policies for automatic index management — hot phase (active writes, fast SSD storage), warm phase (read-only, standard storage), cold phase (infrequent access, compressed), and delete phase (TTL-based removal). A typical log retention policy: 7 days hot, 30 days warm, 90 days cold, delete after 365 days.
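Shard Strategy: Each index splits into shards (default 1 primary + 1 replica). Target 20-50GB per shard for optimal performance: too many small shards waste heap and cluster-state overhead (every shard carries a fixed cost regardless of how much data it holds), while too few large shards create hot spots and slow down recovery and rebalancing. For time-series data, use data streams with ILM rollover triggered by primary shard size (for example max_primary_shard_size: 50gb) as well as age, so shard sizes stay consistent even when daily volume fluctuates.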
Index Templates and Mappings: Define index templates with explicit field mappings and avoid relying on dynamic mapping in production (by default it maps every new string as a text field with a keyword sub-field, which inflates index size and can lead to mapping explosions). Use keyword for log levels, hostnames, and IDs; text only for fields requiring full-text search; and date for timestamps with the strict_date_optional_time format.
Cross-Cluster Search: For multi-region deployments, use cross-cluster search (CCS) to query indices across clusters without replicating data. Configure remote cluster connections (in elasticsearch.yml or through the cluster settings API), then reference remote indices as cluster_alias:index_pattern in Kibana data views so a single dashboard can query every cluster.
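A minimal sketch of what the ILM and index-template guidance above might look like as request bodies, built in TypeScript so they can be type-checked before being sent with PUT _ilm/policy/<name> and PUT _index_template/<name>. The policy and template names, the logs-app-* pattern, the rollover thresholds, and the reading of the retention figures (warm after 7 days, cold after 37, delete at 365, with min_age counted from rollover) are assumptions to adjust for your own data.

```typescript
// Bodies for PUT _ilm/policy/logs-app-policy and PUT _index_template/logs-app-template.
// Names, thresholds, and phase ages are illustrative assumptions.
interface Phase {
  min_age?: string; // measured from rollover
  actions: Record<string, unknown>;
}

interface RetentionDays {
  hot: number; // days in hot before moving to warm
  warm: number; // days in warm before moving to cold
  deleteAfter: number; // age at which the index is deleted
}

function buildIlmPolicy(days: RetentionDays) {
  const coldAt = days.hot + days.warm;
  if (days.deleteAfter <= coldAt) {
    throw new RangeError("deleteAfter must be later than the start of the cold phase");
  }
  const phases: { hot: Phase; warm: Phase; cold: Phase; delete: Phase } = {
    hot: {
      actions: {
        // Roll over at 50GB per primary shard or after one day, whichever comes first.
        rollover: { max_primary_shard_size: "50gb", max_age: "1d" },
        set_priority: { priority: 100 },
      },
    },
    warm: {
      min_age: `${days.hot}d`,
      actions: {
        readonly: {},
        forcemerge: { max_num_segments: 1 },
        set_priority: { priority: 50 },
      },
    },
    cold: { min_age: `${coldAt}d`, actions: { set_priority: { priority: 0 } } },
    delete: { min_age: `${days.deleteAfter}d`, actions: { delete: {} } },
  };
  return { policy: { phases } };
}

function buildIndexTemplate(policyName: string) {
  return {
    index_patterns: ["logs-app-*"],
    data_stream: {},
    priority: 200, // higher than the built-in logs-*-* template
    template: {
      settings: {
        "index.lifecycle.name": policyName,
        "index.number_of_shards": 1,
        "index.number_of_replicas": 1,
      },
      mappings: {
        dynamic: false, // unmapped fields stay in _source but are not indexed
        properties: {
          "@timestamp": { type: "date", format: "strict_date_optional_time" },
          message: { type: "text" },
          log: { properties: { level: { type: "keyword" } } },
          service: { properties: { name: { type: "keyword" } } },
          host: { properties: { name: { type: "keyword" } } },
          trace: { properties: { id: { type: "keyword" } } },
        },
      },
    },
  };
}

console.log(JSON.stringify(buildIlmPolicy({ hot: 7, warm: 30, deleteAfter: 365 }), null, 2));
console.log(JSON.stringify(buildIndexTemplate("logs-app-policy"), null, 2));
```

Priority 200 is used so this template wins over the built-in logs-*-* template (priority 100) for data streams such as logs-app-default, and dynamic: false keeps unknown fields in _source without adding them to the mapping.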

Logstash Pipeline Design: Input, Filter, and Output

Build data processing pipelines that transform raw logs into structured data:

Input Plugins: Logstash supports 50+ input plugins, including beats (receive from Filebeat/Metricbeat), kafka (consume from Kafka topics for high-throughput buffering), http (receive JSON payloads via HTTP), jdbc (poll databases for changed rows), and syslog (receive syslog messages over TCP/UDP; the plugin parses RFC3164 by default, so other formats need a custom grok_pattern). For high-volume deployments, use Kafka as a buffer between Beats and Logstash to handle traffic spikes.
Filter Plugins: Transform and enrich log data. grok parses unstructured text into structured fields using named regex patterns (it ships with a library of built-in patterns for formats such as Apache logs, syslog, and Java stack traces), mutate renames, removes, and converts fields, date parses timestamps into @timestamp, geoip adds geolocation data from IP addresses, and dissect offers faster, regex-free parsing for delimited log formats.
Output Plugins: Route processed data — elasticsearch (primary output with index naming, pipeline routing, and bulk indexing), stdout (debugging), file (archive to disk), kafka (forward to downstream consumers), and s3 (long-term archive). Use conditional outputs to route different log types to different indices or destinations.
Pipeline Configuration: Optimise Logstash throughput with pipeline.workers (match CPU cores), pipeline.batch.size (increase for higher throughput, default 125), and pipeline.batch.delay (latency vs throughput tradeoff). Monitor pipeline metrics with /_node/stats/pipelines API to identify bottlenecks.
Persistent Queues: Enable persistent queues (queue.type: persisted) for at-least-once delivery: if Logstash crashes, queued events survive the restart. Configure queue.max_bytes to prevent disk exhaustion. In moderate-volume deployments, persistent queues can remove the need for an external message broker.
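To act on the pipeline metrics mentioned above, the following TypeScript sketch ranks filters by average milliseconds per event from the JSON returned by Logstash's GET /_node/stats/pipelines endpoint (port 9600 by default). It types only the fields it reads; the sample response is invented for illustration, and slowestFilters and maxInFlightEvents are hypothetical helpers. The in-flight bound reflects Logstash's documented rule that up to pipeline.workers × pipeline.batch.size events can be held in memory at once.

```typescript
// Ranks Logstash filters by average cost per event using the node stats API
// (GET http://localhost:9600/_node/stats/pipelines). Only the fields read here are typed.
interface PluginStats {
  id: string;
  name: string;
  events?: { in?: number; out?: number; duration_in_millis?: number };
}

interface NodePipelineStats {
  pipelines: Record<string, { plugins: { filters: PluginStats[] } }>;
}

interface FilterCost {
  pipeline: string;
  id: string;
  name: string;
  msPerEvent: number;
}

function slowestFilters(stats: NodePipelineStats, limit = 3): FilterCost[] {
  const costs: FilterCost[] = [];
  for (const [pipeline, p] of Object.entries(stats.pipelines)) {
    for (const f of p.plugins.filters) {
      const events = f.events?.in ?? 0;
      const millis = f.events?.duration_in_millis ?? 0;
      if (events > 0) {
        costs.push({ pipeline, id: f.id, name: f.name, msPerEvent: millis / events });
      }
    }
  }
  return costs.sort((a, b) => b.msPerEvent - a.msPerEvent).slice(0, limit);
}

// Upper bound on events one pipeline holds in memory: workers x batch size.
function maxInFlightEvents(workers: number, batchSize = 125): number {
  return workers * batchSize;
}

// Invented sample in the shape of the API response, for illustration only.
const sample: NodePipelineStats = {
  pipelines: {
    main: {
      plugins: {
        filters: [
          { id: "grok-app", name: "grok", events: { in: 10000, out: 10000, duration_in_millis: 4200 } },
          { id: "geoip-client", name: "geoip", events: { in: 10000, out: 10000, duration_in_millis: 900 } },
          { id: "dissect-nginx", name: "dissect", events: { in: 10000, out: 10000, duration_in_millis: 300 } },
        ],
      },
    },
  },
};

console.log(slowestFilters(sample, 1)); // grok-app, at 0.42 ms per event
console.log(maxInFlightEvents(8)); // 1000 events with the default batch size
```

In a real deployment you would feed slowestFilters the parsed output of `curl -s localhost:9600/_node/stats/pipelines`; a grok filter topping the list is often a candidate for dissect.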

Beats: Lightweight Data Shippers for Every Data Source

Deploy purpose-built data collectors across your infrastructure:

Filebeat: The most common Beat. It tails log files, handles log rotation, records file offsets in a registry so it can resume after restarts with at-least-once delivery, and ships to Logstash or Elasticsearch. Configure multiline patterns for Java stack traces (multiline.pattern: '^[[:space:]]' with multiline.negate: false and multiline.match: after) and use modules for pre-built configurations (Nginx, Apache, MySQL, PostgreSQL, System logs).
Metricbeat: Collects system and service metrics — CPU, memory, disk, network (system module), plus application-specific metrics for Kubernetes, Docker, MongoDB, Redis, PostgreSQL, and JVM/JMX. Ships metrics every 10 seconds (configurable) to Elasticsearch with pre-built Kibana dashboards for immediate visualisation.
APM Agent: Elastic APM agents (Node.js, Java, Python, .NET, Go) instrument applications automatically, capturing HTTP transactions, database queries, external API calls, and custom spans. APM data correlates with logs via trace IDs for end-to-end distributed tracing. The Node.js agent needs only a single require('elastic-apm-node').start() line, placed before any other require so it can instrument those modules.
Heartbeat: Monitors service availability with ICMP, TCP, and HTTP checks — verify endpoints are reachable, check TLS certificate expiry, validate HTTP response codes and body content. Configure uptime monitors for critical services and create Kibana uptime dashboards with SLA calculations.
Elastic Agent: The unified data shipper replacing individual Beats — a single agent managed via Fleet (centralised configuration) that collects logs, metrics, APM data, and security events. Fleet policies push configuration changes to thousands of agents without manual updates. Use Elastic Agent for new deployments; existing Beats installations continue working.
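To see why the simple multiline pattern is only a starting point for Java, the TypeScript sketch below mimics Filebeat's multiline pattern mode (negate: false, match: after) on a short stack trace. It is an illustration rather than Filebeat's implementation; the longer pattern is modelled on the Java example in Elastic's Filebeat multiline documentation, translated from RE2's [[:space:]] to JavaScript's \s.

```typescript
// Illustrative re-implementation of Filebeat's multiline "pattern" mode with
// negate: false and match: after. Not Filebeat code: Filebeat uses Go RE2 syntax
// such as [[:space:]]; JavaScript lacks POSIX classes, so \s stands in for it.
function groupMultiline(lines: string[], continuation: RegExp): string[] {
  const events: string[] = [];
  for (const line of lines) {
    if (events.length > 0 && continuation.test(line)) {
      // A matching line continues the previous event.
      events[events.length - 1] += "\n" + line;
    } else {
      // Any other line starts a new event.
      events.push(line);
    }
  }
  return events;
}

// '^[[:space:]]' : only indented lines are continuations.
const indentedOnly = /^\s/;
// '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:' : also joins "Caused by:" lines.
const javaTracePattern = /^\s+(at|\.{3})\s+\b|^Caused by:/;

const javaStackTrace = [
  'Exception in thread "main" java.lang.IllegalStateException: A book has a null property',
  "\tat com.example.myproject.Author.getBookIds(Author.java:38)",
  "Caused by: java.lang.NullPointerException",
  "\tat com.example.myproject.Book.getId(Book.java:22)",
  "\t... 1 more",
];

console.log(groupMultiline(javaStackTrace, indentedOnly).length); // 2: "Caused by:" starts a new event
console.log(groupMultiline(javaStackTrace, javaTracePattern).length); // 1: the whole trace is one event
```

With only indented lines treated as continuations, the unindented "Caused by:" line starts a second event, so one exception is split across two documents in Elasticsearch. Logging ECS JSON from the application, which keeps the whole stack trace in one field, avoids multiline parsing altogether.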

Node.js Application Logging and APM Integration

Instrument Node.js microservices for comprehensive observability:

Winston with ECS Format: Use Winston logger with @elastic/ecs-winston-format for Elastic Common Schema (ECS) compliance — structured JSON logs with standardised field names (log.level, message, service.name, trace.id). ECS format ensures consistent field mapping across Node.js, Java, and Python services in the same Elasticsearch cluster.
Pino for High Performance: For high-throughput services, use the Pino logger, which benchmarks substantially faster than Winston, with the pino-elasticsearch transport for direct Elasticsearch ingestion. Pino keeps serialisation cheap and runs transports in a worker thread, so shipping logs does not block the event loop. Use pino-pretty for development and ECS format (@elastic/ecs-pino-format) for production.
Correlation IDs: Propagate request correlation IDs (trace IDs) through Express/Fastify middleware — generate a UUID at the API gateway, pass via x-correlation-id header, and include in every log line. This enables filtering all logs for a single request across multiple microservices in Kibana. Elastic APM auto-generates trace IDs when using the agent.
Error Tracking: Log unhandled exceptions and promise rejections with full stack traces. Register process.on('uncaughtException') and process.on('unhandledRejection') handlers that capture and ship error details, then let the process exit, since its state is no longer reliable. Include request context (URL, user ID, request body, with secrets and personal data redacted) in error logs for faster debugging.
Structured Log Context: Add request metadata to every log line using Node's built-in AsyncLocalStorage (or the older cls-hooked continuation-local storage library). HTTP method, URL, user ID, response time, and status code are then attached automatically, without passing logger instances through every function call.
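A dependency-free sketch of the correlation-ID and AsyncLocalStorage pattern described above, in TypeScript for Node.js. It writes ECS-style JSON lines directly rather than through Winston or Pino, and the service name, the helper names, and storing the ID under labels.correlation_id are assumptions; in a real service you would hook the same store into your logger's formatter.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

// Hypothetical service name; normally read from configuration.
const SERVICE_NAME = "orders-service";

interface RequestContext {
  correlationId: string;
  method?: string;
  url?: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Reuse the caller's x-correlation-id (Node lower-cases header names) or mint a UUID.
function correlationIdFrom(headers: Record<string, string | string[] | undefined>): string {
  const raw = headers["x-correlation-id"];
  const value = Array.isArray(raw) ? raw[0] : raw;
  return value !== undefined && value.trim() !== "" ? value : randomUUID();
}

// Everything called from fn, including code after an await, sees the same context.
function runWithContext<T>(ctx: RequestContext, fn: () => T): T {
  return requestContext.run(ctx, fn);
}

// Writes one ECS-style JSON line to stdout and returns it.
function log(level: "debug" | "info" | "warn" | "error", message: string): string {
  const entry: Record<string, unknown> = {
    "@timestamp": new Date().toISOString(),
    "log.level": level,
    message,
    "service.name": SERVICE_NAME,
  };
  const ctx = requestContext.getStore();
  if (ctx) {
    entry["labels.correlation_id"] = ctx.correlationId;
    if (ctx.method) entry["http.request.method"] = ctx.method;
    if (ctx.url) entry["url.original"] = ctx.url;
  }
  const line = JSON.stringify(entry);
  process.stdout.write(line + "\n");
  return line;
}

// Typical wiring in a node:http handler or Express middleware:
//   const id = correlationIdFrom(req.headers);
//   res.setHeader("x-correlation-id", id); // echo it back and forward it on outbound calls
//   runWithContext({ correlationId: id, method: req.method, url: req.url }, () => next());
runWithContext({ correlationId: correlationIdFrom({}), method: "GET", url: "/orders/42" }, () =>
  log("info", "order fetched"),
);
```

Because AsyncLocalStorage follows the async call chain, code that runs after an await inside the handler still sees the same correlation ID, while logs emitted outside any request simply omit it.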

Java Application Logging: Logback, SLF4J, and JVM Metrics

Configure Java services for ELK-compatible structured logging:

Logback with ECS Encoder: Use co.elastic.logging:logback-ecs-logging for Elastic Common Schema output — JSON-formatted logs with ECS field names, automatic MDC (Mapped Diagnostic Context) inclusion, and stack trace serialisation. Configure in logback-spring.xml with EcsEncoder replacing PatternLayoutEncoder for production profiles.
SLF4J MDC for Context: Use SLF4J's MDC (Mapped Diagnostic Context) to attach request-scoped metadata — MDC.put("traceId", traceId) in a servlet filter or Spring interceptor. All subsequent log lines in the request thread include the trace ID, user ID, and session ID without explicit passing. Clear MDC in a finally block to prevent context leaking between requests.
Spring Boot Actuator Metrics: Export JVM metrics (heap usage, GC pauses, thread counts, connection pool stats) via Micrometer + Elastic registry — metrics ship to Elasticsearch for Kibana dashboards alongside logs. Monitor GC pause times (G1GC target: <200ms), heap pressure (>80% triggers investigation), and thread pool saturation.
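Log4j2 Async Logging: For high-throughput Java services, use Log4j2 asynchronous loggers backed by the LMAX Disruptor for lock-free logging; Log4j2's own benchmarks show substantially higher throughput than synchronous logging. Make all loggers asynchronous with log4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector, and tune log4j2.asyncLoggerRingBufferSize (default 262144) and log4j2.asyncLoggerWaitStrategy; spinning strategies such as Yield lower latency at the cost of CPU usage.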
Exception Grouping: Elastic APM groups errors automatically (each error receives a grouping key derived from its exception type and stack trace), so similar exceptions cluster together rather than creating thousands of individual error entries. To record HTTP request bodies for error investigation, set the Java agent's capture_body option to errors (or all, which also captures bodies for successful transactions).
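Elastic APM already computes a grouping key for every error (stored as error.grouping_key), so you rarely need to build one yourself. The idea behind grouping errors by exception class, a normalised message, and stack frames is still easy to show; the TypeScript sketch below (kept in the same language as the other examples) is illustrative only and is not APM's algorithm. It hashes the exception type, the message with numbers and UUIDs masked, and the top five frames without line numbers; all names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface StackFrame {
  className: string;
  method: string;
  line?: number;
}

interface CapturedError {
  type: string; // e.g. "java.sql.SQLTimeoutException"
  message: string;
  frames: StackFrame[]; // innermost frame first
}

// Mask volatile tokens so "Order 1234 not found" and "Order 98 not found" match.
function normaliseMessage(message: string): string {
  return message
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi, "<uuid>")
    .replace(/\d+/g, "<n>");
}

// Hash of type + normalised message + top frames (class and method only, so a
// redeploy that shifts line numbers does not split existing groups).
function groupingKey(err: CapturedError, topFrames = 5): string {
  const frames = err.frames
    .slice(0, topFrames)
    .map((f) => `${f.className}.${f.method}`)
    .join("|");
  return createHash("sha256")
    .update(`${err.type}\n${normaliseMessage(err.message)}\n${frames}`)
    .digest("hex");
}

function groupErrors(errors: CapturedError[]): Map<string, CapturedError[]> {
  const groups = new Map<string, CapturedError[]>();
  for (const err of errors) {
    const key = groupingKey(err);
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(err);
    } else {
      groups.set(key, [err]);
    }
  }
  return groups;
}

const frame = (className: string, method: string, line: number): StackFrame => ({ className, method, line });

const sampleErrors: CapturedError[] = [
  { type: "com.example.OrderNotFoundException", message: "Order 1234 not found",
    frames: [frame("com.example.OrderService", "find", 41)] },
  { type: "com.example.OrderNotFoundException", message: "Order 98 not found",
    frames: [frame("com.example.OrderService", "find", 44)] },
  { type: "java.sql.SQLTimeoutException", message: "Query timed out after 30000 ms",
    frames: [frame("com.example.OrderRepository", "load", 12)] },
];

console.log(groupErrors(sampleErrors).size); // 2 groups: both "not found" errors share a key
```

Dropping line numbers and masking IDs is the design choice that keeps one bug in one group across deployments and across different customers' data.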

Kibana Dashboards: Visualisation, Alerting, and SIEM

Build operational dashboards that provide actionable insights:

Dashboard Architecture: Create layered dashboards — Overview (system health, error rates, request volumes across all services), Service-Level (per-service latency, throughput, error breakdown), Infrastructure (CPU, memory, disk, network per node), and Investigation (log search, trace analysis, error deep-dive). Use dashboard drill-down links to navigate from overview to detail.
Key Visualisations: Use TSVB (Time Series Visual Builder) for real-time metric trends, Lens for drag-and-drop chart creation, Vega for custom visualisations (heatmaps, Sankey diagrams), and Maps for geolocation data. Create saved searches with KQL (Kibana Query Language) for common investigation patterns.
Alerting Rules: Configure Kibana alerting rules for operational alerts (Watcher, the older Elasticsearch alerting feature, remains available): error rate exceeding a threshold (>1% 5xx responses), latency degradation (p99 > 2s for 5 minutes), disk space warnings (<20% free), and log volume anomalies (ML-based). Route alerts through connectors to Slack, PagerDuty, email, or webhook endpoints with severity-based escalation.
Machine Learning Anomaly Detection: Elastic ML automatically detects anomalies in time-series data, such as unusual traffic patterns, error rate spikes, latency outliers, and log volume changes. ML jobs can be created from Kibana without building models yourself: the platform learns normal patterns and alerts on deviations. ML is a paid (Platinum/Enterprise) subscription feature.
Elastic Security (SIEM): Use Elastic Security for security information and event management — correlate security events across network, endpoint, and application logs. Pre-built detection rules identify common threats (brute force attacks, data exfiltration, privilege escalation). The Timeline investigation tool enables security analysts to piece together attack narratives.
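As an illustration of the first alerting rule above (more than 1% 5xx responses), the sketch below builds an Elasticsearch search body that counts all requests and server errors for one service over the last five minutes, then applies the threshold to the response. Kibana's rule types (for example the Elasticsearch query rule) run checks like this for you; the function names, the service name, and the assumption that your logs carry the ECS fields service.name and http.response.status_code are ours.

```typescript
// Builds a search body counting requests and 5xx responses for one service in a time window.
// Assumes logs carry the ECS fields service.name and http.response.status_code.
function buildErrorRateQuery(service: string, lookback = "5m") {
  return {
    size: 0, // only counts are needed, not documents
    track_total_hits: true, // exact total instead of the default 10,000 cap
    query: {
      bool: {
        filter: [
          { term: { "service.name": service } },
          { range: { "@timestamp": { gte: `now-${lookback}` } } },
        ],
      },
    },
    aggs: {
      server_errors: {
        filter: { range: { "http.response.status_code": { gte: 500, lte: 599 } } },
      },
    },
  };
}

// The subset of the search response the check reads.
interface ErrorRateResponse {
  hits: { total: { value: number } };
  aggregations: { server_errors: { doc_count: number } };
}

// Fires when the share of 5xx responses exceeds the threshold (1% by default).
function evaluateErrorRate(resp: ErrorRateResponse, threshold = 0.01) {
  const total = resp.hits.total.value;
  const errors = resp.aggregations.server_errors.doc_count;
  const rate = total === 0 ? 0 : errors / total;
  return { total, errors, rate, firing: rate > threshold };
}

console.log(JSON.stringify(buildErrorRateQuery("checkout"), null, 2));
// Made-up response: 30 server errors out of 2,000 requests is a 1.5% error rate.
console.log(evaluateErrorRate({ hits: { total: { value: 2000 } }, aggregations: { server_errors: { doc_count: 30 } } }));
```

Treating zero traffic as a 0% error rate avoids a divide-by-zero, but it also means a service that has stopped receiving requests entirely needs its own log-volume or uptime rule.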

Cluster Scaling, Security, and MDS ELK Services

Operate production ELK clusters with enterprise-grade reliability:

Horizontal Scaling: Add data nodes to increase storage and query capacity — Elasticsearch automatically rebalances shards across new nodes. Use hot-warm-cold architecture with different hardware tiers: NVMe SSDs for hot nodes (recent data, fast queries), HDDs for warm/cold nodes (historical data, lower cost). Scale Logstash horizontally behind a load balancer for ingestion throughput.
Security Configuration: Enable TLS encryption for all inter-node communication (xpack.security.transport.ssl) and HTTP API access (xpack.security.http.ssl). Configure RBAC (role-based access control) — read-only roles for dashboards, write roles for ingestion, and admin roles for cluster management. Integrate with LDAP/Active Directory or SAML for enterprise SSO.
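Backup and Recovery: Configure snapshot repositories (S3, GCS, Azure Blob) and use snapshot lifecycle management (SLM) for automated backups, for example a nightly snapshot with a retention period. Elasticsearch snapshots are incremental by design: each snapshot copies only data not already stored in the repository, so there is no separate full/incremental schedule to manage. Test restore procedures regularly. Use searchable snapshots to query archived data directly from object storage without restoring it to the cluster.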
Cost Optimisation: Reduce storage costs with ILM-driven tiering, force merging read-only indices to reduce segment count, the best_compression codec for cold indices (smaller on disk at some CPU cost; savings vary with the data), and field data type optimisation (keyword vs text, scaled_float vs double). Monitor index storage with GET _cat/indices?v&s=store.size:desc.
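A minimal sketch of an automated backup schedule, expressed as the body for PUT _slm/policy/<name> (snapshot lifecycle management). Each run is incremental by design, since Elasticsearch only copies data the repository does not already hold. The repository name, schedule, and retention numbers are hypothetical, and the repository must be registered beforehand with PUT _snapshot/<name>.

```typescript
// Body for PUT _slm/policy/nightly-snapshots. Names and numbers are illustrative assumptions.
interface SlmPolicy {
  schedule: string;
  name: string;
  repository: string;
  config: { indices: string[]; include_global_state: boolean };
  retention: { expire_after: string; min_count: number; max_count: number };
}

function buildSnapshotPolicy(repository: string, keepDays = 30): SlmPolicy {
  return {
    // Elasticsearch cron: seconds minutes hours day-of-month month day-of-week.
    schedule: "0 30 1 * * ?", // 01:30 every day
    // Date math in the name; SLM also appends a unique suffix to each snapshot.
    name: "<nightly-snap-{now/d}>",
    repository, // must already be registered via PUT _snapshot/<repository>
    config: { indices: ["*"], include_global_state: true },
    // Delete snapshots older than keepDays, but always keep at least 5 and never more than 50.
    retention: { expire_after: `${keepDays}d`, min_count: 5, max_count: 50 },
  };
}

console.log(JSON.stringify(buildSnapshotPolicy("logs-s3-repo"), null, 2));
```

Restore drills can then use POST _snapshot/logs-s3-repo/<snapshot>/_restore against a scratch cluster, which verifies the backups without touching production indices.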

MetaDesign Solutions delivers ELK Stack implementation and managed observability services — from cluster architecture design and Logstash pipeline development through Kibana dashboard creation, Node.js/Java integration, alerting configuration, security hardening, and ongoing cluster management for organisations building comprehensive monitoring across distributed systems.
