@shashank-reddy-nr
Contributor

Add OpenTelemetry Kafka Monitoring Documentation

📋 Overview

This PR adds comprehensive documentation for monitoring Apache Kafka using OpenTelemetry. The documentation covers self-hosted and Kubernetes (Strimzi) deployments with detailed configuration examples, metrics reference, and best practices.

🎯 What's Changed

New Documentation Pages

Created 4 new documentation pages under /docs/opentelemetry/integrations/kafka/:

| Page | Description |
| --- | --- |
| overview.mdx | High-level overview of monitoring approaches, architecture, and data viewing options |
| self-hosted.mdx | Complete guide for monitoring self-hosted Kafka with systemd service configuration |
| kubernetes-strimzi.mdx | Kubernetes deployment guide with Strimzi operator integration and dynamic pod discovery |
| metrics-reference.mdx | Comprehensive metrics reference with 250+ metrics, NRQL queries, and alert examples |

Documentation Structure

/docs/opentelemetry/integrations/kafka/
├── overview.mdx # Overview and architecture
├── self-hosted.mdx # Self-hosted deployment
├── kubernetes-strimzi.mdx # Kubernetes/Strimzi deployment
└── metrics-reference.mdx # Complete metrics reference

🔑 Key Features

Metrics Collection

  • Kafka Metrics Receiver: Cluster-level metrics (13 metrics)

    • Broker count, topic/partition metrics, consumer lag
  • JMX Receiver (Default): Broker and cluster metrics (21 metrics)

    • Message counts, request metrics, network I/O, partition health, controller status
  • JMX Receiver (Custom): Extended metrics (17 metrics)

    • Per-topic byte rates, cluster-level counts, broker-specific metrics, JVM metrics
  • Client Metrics: Application-level observability (200+ metrics)

    • Producer/consumer metrics via OpenTelemetry Java Agent instrumentation
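
For orientation, the two collector-side sources listed above could be declared roughly as follows. This is a minimal sketch rather than the configuration shipped in the new pages: the broker address, JMX port, JAR path, and collection intervals are assumptions chosen for illustration.

```yaml
receivers:
  # Cluster-level metrics (broker count, topic/partition metrics, consumer lag)
  kafkametrics:
    brokers: ["localhost:9092"]        # assumed bootstrap address
    scrapers: [brokers, topics, consumers]
    collection_interval: 30s           # assumed interval

  # Broker metrics scraped over JMX using the default Kafka target system
  jmx:
    jar_path: /opt/opentelemetry-jmx-metrics.jar   # assumed install path
    endpoint: localhost:9999                       # assumed JMX port
    target_system: kafka
    collection_interval: 30s                       # assumed interval
```

The custom JMX metrics and the client metrics are configured separately and are not shown here.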

Deployment Options

Self-hosted: Linux systemd service with dual-pipeline architecture
Kubernetes: Strimzi integration with receiver_creator for dynamic broker discovery
Client instrumentation: Zero-code Java Agent instrumentation for producers/consumers

Advanced Configuration

  • Dual-pipeline architecture: Separates broker-specific from cluster-level metrics to prevent duplication
  • Metric filtering: Cluster metrics are sent without the broker.id attribute
  • Topic aggregation: Automatic rollup of partition metrics by topic
  • OTLP receiver: Accepts telemetry from instrumented applications

📁 Files Changed

Added

  • /docs/opentelemetry/integrations/kafka/overview.mdx (new)
  • /docs/opentelemetry/integrations/kafka/self-hosted.mdx (new)
  • /docs/opentelemetry/integrations/kafka/kubernetes-strimzi.mdx (new)
  • /docs/opentelemetry/integrations/kafka/metrics-reference.mdx (new)
  • 5 dashboard screenshots in /static/images/

Modified

  • /docs/infrastructure/host-integrations/host-integrations-list/kafka/index.mdx
  • /src/nav/infrastructure.yml
  • /src/nav/opentelemetry.yml
  • /src/nav/message-queues-streaming.yml

🗺️ Navigation Updates

Updated navigation across three files for better discoverability:

infrastructure.yml - Under "On-host integrations"

- title: Kafka integration
  pages:
    - title: Overview
    - title: OpenTelemetry
      pages:
        - OTel Kafka overview
        - Self-hosted Kafka
        - Kubernetes (Strimzi)
        - Metrics reference

opentelemetry.yml - Under "Integrations"

- title: Kafka integration
  pages:
    - Overview
    - Self-hosted
    - Kubernetes (Strimzi)
    - Metrics reference

message-queues-streaming.yml - Under "Install integrations"

- title: Kafka
  pages:
    - OTel Kafka overview
    - Self-hosted Kafka
    - Kubernetes (Strimzi)
    - Metrics reference

📸 Screenshots Added

| Filename | Description |
| --- | --- |
| infrastructure_screenshot-crop_otel-kafka-dashboard-clusters-view.webp | Clusters overview dashboard |
| infrastructure_screenshot-crop_otel-kafka-entity-explorer.webp | Entity explorer with broker/cluster/topic entities |
| infrastructure_screenshot-crop_otel-kafka-Q&S-summary.webp | Queues & Streams summary view |
| infrastructure_screenshot-crop_otel-kafka-third-party-integrations.webp | Third-party services integration view |
| infrastructure_screenshot-crop_otel-kafka-dashboard-brokers-view.webp | Brokers dashboard view |

🎨 Kafka Index Page Improvements

Updated /docs/infrastructure/host-integrations/host-integrations-list/kafka/index.mdx:

  • ✅ Reorganized to feature OpenTelemetry monitoring approach
  • ✅ Added multiple data viewing paths:
    • Entity explorer (broker, cluster, topic entities)
    • Queues & Streams (provider=opentelemetry filter)
    • Third-party services (Kafka OpenTelemetry)
    • Pre-built dashboards
  • ✅ Updated NRQL query examples with proper filtering
  • ✅ Enhanced "Get help" section with OpenTelemetry resources

🔧 Technical Details

Dual-Pipeline Architecture

Broker Pipeline (includes broker.id):

metrics/broker:
  receivers: [receiver_creator, kafkametrics]
  processors: [filter/exclude_cluster_metrics, ...]
  exporters: [otlp]

Cluster Pipeline (removes broker.id):

metrics/cluster:
  receivers: [receiver_creator]
  processors: [filter/include_cluster_metrics, transform/remove_broker_id, ...]
  exporters: [otlp]
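
The processors named in the two pipelines above might be defined along these lines. This is a hedged sketch: the metric name `kafka.cluster.partition.count` is a hypothetical stand-in for whichever metrics the docs treat as cluster-level, and it assumes `broker.id` is carried as a data point attribute rather than a resource attribute.

```yaml
processors:
  # Broker pipeline: drop cluster-wide metrics so they are not reported once per broker.
  # The filter processor drops any metric for which a condition evaluates to true.
  filter/exclude_cluster_metrics:
    error_mode: ignore
    metrics:
      metric:
        - 'name == "kafka.cluster.partition.count"'   # hypothetical cluster metric

  # Cluster pipeline: drop everything except the cluster-wide metrics.
  filter/include_cluster_metrics:
    error_mode: ignore
    metrics:
      metric:
        - 'name != "kafka.cluster.partition.count"'   # hypothetical cluster metric

  # Cluster pipeline: strip broker.id so cluster metrics are not split by broker.
  transform/remove_broker_id:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - delete_key(attributes, "broker.id")
```

Keeping the include and exclude filters as mirror images of each other means each metric leaves the collector through exactly one pipeline, which is what prevents the duplication mentioned earlier.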

Client Telemetry Flow

Instrumented Apps → OTLP (4318) → OTel Collector → New Relic
(no api-key)                      (adds api-key)
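
A collector configuration matching this flow could look like the sketch below. The pipeline name `metrics/apps`, the environment variable `NEW_RELIC_LICENSE_KEY`, and the choice of the US gRPC endpoint are assumptions for illustration; the point is only that the key lives in the collector's exporter, not in the applications.

```yaml
receivers:
  # Instrumented applications send OTLP over HTTP to the collector, with no api-key.
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  # The collector attaches the api-key header when forwarding to New Relic.
  otlp:
    endpoint: otlp.nr-data.net:4317              # assumed region and port
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}      # assumed variable name

service:
  pipelines:
    metrics/apps:                                # hypothetical pipeline name
      receivers: [otlp]
      exporters: [otlp]
```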

Metrics Coverage Matrix

Source Broker Cluster Topic Partition Consumer JVM
Kafka receiver
JMX (default)
JMX (custom)
Client metrics

📚 Example Configurations

Self-hosted: Dual-Pipeline Setup

  • Systemd service configuration
  • Custom JMX metrics for per-topic and cluster metrics
  • Metric filtering and aggregation
  • Java Agent instrumentation examples

Kubernetes: Dynamic Broker Discovery

  • Receiver creator with K8s observer
  • Automatic pod discovery using Strimzi labels
  • RBAC configuration (ServiceAccount, ClusterRole, ClusterRoleBinding)
  • OTLP receiver for application telemetry
  • Application instrumentation with init containers
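
As a rough illustration of the dynamic discovery described above, a receiver creator paired with the Kubernetes observer might be wired as follows. The label selector `strimzi.io/kind == "Kafka"`, the JMX port 9999, and the JAR path are assumptions; the actual rule in kubernetes-strimzi.mdx may select pods differently.

```yaml
extensions:
  # Watches the Kubernetes API for pods; requires the RBAC objects listed above.
  k8s_observer:
    auth_type: serviceAccount
    observe_pods: true

receivers:
  receiver_creator:
    watch_observers: [k8s_observer]
    receivers:
      jmx:
        # Start one JMX receiver per discovered pod that matches the rule.
        rule: type == "pod" && labels["strimzi.io/kind"] == "Kafka"   # assumed label
        config:
          jar_path: /opt/opentelemetry-jmx-metrics.jar   # assumed path
          endpoint: '`endpoint`:9999'                    # pod IP plus assumed JMX port
          target_system: kafka

service:
  extensions: [k8s_observer]
```

With this shape, brokers added or removed by the Strimzi operator are picked up without editing the collector configuration.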

Metrics Reference

  • 13 Kafka receiver metrics (organized by category)
  • 21 default JMX metrics
  • 17 custom JMX metrics
  • 200+ client metrics (producer/consumer)
  • NRQL query examples for each category
  • Alert condition templates

✅ Testing Checklist

  • All internal links verified
  • Navigation renders correctly in all three locations
  • Code blocks use correct syntax
  • NRQL queries tested against New Relic
  • Screenshots display correctly
  • MDX collapsible sections work
  • External OpenTelemetry links valid
  • Configuration examples tested in lab environment

🚫 Breaking Changes

None. This is purely additive documentation.

🔍 SEO & Discoverability

Multiple navigation references improve discoverability without creating duplicate content:

  • Same canonical URLs referenced from different navigation contexts
  • Improves UX by placing docs where users expect them
  • Follows the cross-linking practice used in AWS, Azure, and other technical documentation

👥 Review Focus Areas

  1. Technical accuracy: Configuration examples and metric definitions
  2. Navigation structure: Cross-referencing across three nav files
  3. User journey: From overview → deployment → metrics → alerts
  4. Code examples: Kubernetes manifests and systemd configurations
  5. NRQL queries: Query accuracy and filter usage

📝 Notes for Reviewers

  • Documentation follows the "progressive disclosure" pattern: overview → specific guides → detailed reference
  • Collapsible sections in metrics reference improve readability
  • Dual-pipeline architecture is essential for accurate cluster-level metric reporting
  • Client instrumentation is optional but provides valuable application-level insights
  • All paths use the new /docs/opentelemetry/integrations/kafka/ location

Deployment: Self-hosted Linux and Kubernetes (Strimzi)
Monitoring Scope: Broker, cluster, topic, partition, consumer group, and application metrics
Integration Method: OpenTelemetry Collector + optional Java Agent instrumentation


This version removes all mentions of NRI-Kafka, nrdot, and New Relic Native integration while keeping the focus purely on the OpenTelemetry documentation additions.

@shashank-reddy-nr shashank-reddy-nr requested a review from a team as a code owner December 24, 2025 07:23
@github-actions

Hi @shashank-reddy-nr 👋

Thanks for your pull request! Your PR is in a queue, and a writer will take a look soon. We generally publish small edits within one business day, and larger edits within three days.

Please ensure the proposed changes look good by building them first in your local environment. Refer to this contribution guide to get the site up and running locally.

If you really require a preview URL, reach out to one of the writers and they will generate one for you.

@shashank-reddy-nr shashank-reddy-nr marked this pull request as draft December 24, 2025 07:23
@PallaviWrite
Contributor

netlify build fork

@svc-docs-eng-opensource-bot
Contributor

✅ Your PR has been mirrored to our repository as PR #22589.
Commit: 194009cf1c4052871c6434c924f394b202bdaf6b (194009c)
Our workflows will run in the mirrored PR linked above.
🚀 If the build is successful, a Netlify preview will be available shortly at: https://shashank-reddy-nr-add-otel-kafka-docs--docs-website-netlify.netlify.app
