Agent-to-gateway deployment pattern
Agents and gateways solve different problems. By combining them in your deployment, you can create an observability architecture that addresses the following issues:
- Separation of concerns: Avoid placing complex configuration and processing logic on every machine or in every node. Agent configurations stay small and focused, while central processors handle the heavier collection tasks.
- Scalable cost control: Make better sampling and batching decisions in gateways that can receive telemetry from multiple agents. Gateways can see the full picture, including complete traces, and can be independently scaled.
- Security and stability: Send telemetry over local networks from agents to gateways. Gateways become a stable egress point that can handle retries and manage credentials.
Example agent-to-gateway architecture
The following diagram shows an architecture for a combined agent-to-gateway deployment:
- Agent collectors run on each host in a DaemonSet pattern and collect telemetry from services running on the host, as well as the host's own telemetry. Agents can optionally load balance data across gateways.
- Gateway collectors receive data from agents, perform centralized processing, such as filtering and sampling, and then export the data to backends.
- Applications communicate with local agents using the internal host network, agents communicate with gateways over the internal cluster network, and gateways securely communicate with external backends using TLS.
```mermaid
graph TB
    subgraph "Local Networks"
        subgraph "Host 1"
            App1[Application]
            Agent1["Agent 1"]
        end
        subgraph "Host 2"
            App2[Application]
            Agent2["Agent 2"]
        end
        subgraph "Host N"
            AppN[Application]
            AgentN["Agent N"]
        end
    end
    subgraph "Cluster Network"
        subgraph "Gateway Tier"
            Gateway1["Gateway 1"]
            Gateway2["Gateway 2"]
        end
    end
    subgraph "External Network"
        Backend["Observability<br/>backend"]
    end
    App1 -->|"OTLP<br/>(local)"| Agent1
    App2 -->|"OTLP<br/>(local)"| Agent2
    AppN -->|"OTLP<br/>(local)"| AgentN
    Agent1 -->|"OTLP/gRPC<br/>(internal)"| Gateway1
    Agent1 -.->|"load balancing<br/>for tail sampling"| Gateway2
    Agent2 -->|"OTLP/gRPC<br/>(internal)"| Gateway1
    Agent2 -.->|"load balancing<br/>for tail sampling"| Gateway2
    AgentN -->|"OTLP/gRPC<br/>(internal)"| Gateway2
    Gateway1 -->|"OTLP/gRPC<br/>(TLS)"| Backend
    Gateway2 -->|"OTLP/gRPC<br/>(TLS)"| Backend
```

When to use this pattern
The agent-to-gateway pattern adds operational complexity compared to simpler deployment options. Use this pattern when you need one or more of the following capabilities:
- Host-specific data collection: You need to collect metrics, logs, or traces that are only available on the host where your applications run, such as host metrics, system logs, or resource detection. For example, receivers like the hostmetrics receiver or filelog receiver must be unique per host instance. Running multiple instances of these receivers on the same host results in duplicate data. Similarly, the resourcedetection processor adds information about the host where both the Collector and the application are running. Running this processor in a Collector on a separate machine from the application results in incorrect data.
- Centralized processing: You want to perform complex processing operations, such as tail-based sampling, advanced filtering, or data transformation, in a central location rather than on every host.
- Network isolation: Your applications run in a restricted network environment where only specific egress points can communicate with external backends.
- Cost optimization at scale: You need to make sampling decisions based on complete trace data or perform aggregation across multiple sources before sending data to backends.
When simpler patterns work better
You might not need the agent-to-gateway pattern if:
- Your applications can send telemetry directly to backends using OTLP.
- You don’t need to collect host-specific metrics or logs.
- You don’t require complex processing like tail-based sampling.
- You’re running a small deployment where operational simplicity is more important than the benefits this pattern provides.
For simpler use cases, consider using only agents or only gateways.
Configuration examples
The following examples show typical configurations for agents and gateways in an agent-to-gateway deployment.
While it is generally preferable to bind endpoints to localhost when all
clients are local, these example configurations use the "unspecified" address
0.0.0.0 as a convenience. The Collector defaults to localhost. For details on
the trade-offs of either choice as an endpoint configuration value, see
Protect against denial of service attacks.
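If every client runs on the same host as the agent, you can bind the receiver to the loopback interface instead of 0.0.0.0. A minimal sketch:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        # Only accept connections from processes on this host
        endpoint: localhost:4317
```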
Example agent configuration without load balancing
This example shows an agent configuration that collects application telemetry and host metrics, then forwards them to a gateway. If you plan to tail sample, convert cumulative metrics to delta, or need data-aware routing for another reason, see the next configuration for an example with data-aware load balancing.
```yaml
receivers:
  # Receive telemetry from applications
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  # Collect host metrics
  hostmetrics:
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:

processors:
  # Detect and add resource attributes about the host
  resourcedetection:
    detectors: [env, system, docker]
    timeout: 5s
  # Prevent memory issues
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  # Send to gateway
  otlp:
    endpoint: otel-gateway:4317
    # Absorb short gateway outages
    sending_queue:
      batch:
        sizer: items
        flush_timeout: 1s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection]
      exporters: [otlp]
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, resourcedetection]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection]
      exporters: [otlp]
```
Example agent configuration with load balancing
This example configures an agent to use the load balancing exporter, routing
telemetry based on traceID. Data-aware routing is necessary for some
processing, including tail-based sampling and cumulative-to-delta metric
conversions.
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  # Load balance by trace ID
  loadbalancing:
    resolver:
      dns:
        hostname: otel-gateway-headless
        port: 4317
    routing_key: traceID
    sending_queue:
      batch:
        sizer: items
        flush_timeout: 1s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [loadbalancing]
```
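The DNS resolver needs a name that resolves to the individual gateway instances rather than a single load-balanced address. In Kubernetes, this is typically a headless Service; the following manifest is a sketch, and the `otel-gateway-headless` name and `app: otel-gateway` selector are assumptions that must match your gateway deployment:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-gateway-headless
spec:
  clusterIP: None          # headless: DNS returns individual pod IPs
  selector:
    app: otel-gateway      # assumed label on the gateway pods
  ports:
    - name: otlp-grpc
      port: 4317
```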
Example gateway configuration
This example shows a gateway configuration that receives data from agents, performs tail sampling, and exports to backends:
```yaml
receivers:
  # Receive from agents
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Prevent memory issues with higher limits
  memory_limiter:
    check_interval: 1s
    limit_mib: 2048
  # Optional: tail-based sampling
  tail_sampling:
    policies:
      # Always sample traces with errors
      - name: errors-policy
        type: status_code
        status_code: { status_codes: [ERROR] }
      # Sample 10% of other traces
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }

exporters:
  # Export to your observability backend
  otlp:
    endpoint: your-backend:4317
    headers:
      api-key: ${env:BACKEND_API_KEY}
    # Absorb backend outages
    sending_queue:
      batch:
        sizer: items
        flush_timeout: 10s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter]
      exporters: [otlp]
```
Processors in agents and gateways
In an agent-to-gateway pattern, process telemetry with care to ensure the accuracy of your data.
Recommended processing
Both agents and gateways should include:
Memory limiter processor: This processor prevents out-of-memory issues by applying backpressure when memory usage is high. Configure this as the first processor in your pipeline. Agents typically need smaller limits, while gateways require more memory for batching and sampling operations. Adjust the limits based on the requirements of your workloads and your available resources.
Batching: You can improve efficiency by batching telemetry data before export. Configure agents with smaller batch sizes and shorter timeouts to minimize latency and memory usage. Configure gateways with larger batch sizes and longer timeouts for better throughput and backend efficiency.
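With the exporter-level batching used in the configuration examples above, the agent and gateway might differ only in batch size and flush timeout. The `max_size` values below are illustrative, not recommendations:

```yaml
# Agent exporter: small batches, short flush to limit latency and memory
exporters:
  otlp:
    endpoint: otel-gateway:4317
    sending_queue:
      batch:
        sizer: items
        max_size: 2048 # illustrative value
        flush_timeout: 1s
---
# Gateway exporter: larger batches, longer flush for backend throughput
exporters:
  otlp:
    endpoint: your-backend:4317
    sending_queue:
      batch:
        sizer: items
        max_size: 8192 # illustrative value
        flush_timeout: 10s
```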
Sampling considerations
Probabilistic sampling: When using probabilistic sampling across multiple collectors, ensure they use the same hash seed for consistent sampling decisions.
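As a sketch, the probabilistic sampler processor exposes a `hash_seed` setting; giving every Collector the same value makes their keep/drop decisions consistent for a given trace ID (the seed value here is arbitrary):

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 10
    # Use the same seed on every Collector so all instances make the
    # same decision for a given trace ID
    hash_seed: 22
```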
Tail-based sampling: Configure tail-based sampling on gateways only. The processor must see all spans from a trace to make sampling decisions. Use the loadbalancing exporter in your agents to distribute traces by trace ID to your gateway instances.

Note: The tail-sampling processor can make accurate decisions only if all spans for a trace arrive at the same Collector instance. While the load balancing exporter supports routing by trace ID, running tail sampling across multiple gateway instances is an advanced setup with practical caveats, such as routing re-splits when the backend set changes and cache or decision consistency. Test carefully, and prefer a single well-resourced tail-sampling gateway unless you have a robust sticky-routing strategy.
Example tail sampling architecture
The following diagram shows how trace-ID-based load balancing works with tail-based sampling across multiple gateway instances.
The loadbalancing exporter uses the trace ID to determine which gateway receives
the spans:
- All spans from traceID 0xf39 (from any agent) route to Gateway 1.
- All spans from traceID 0x9f2 (from any agent) route to Gateway 2.
- All spans from traceID 0x31c (from any agent) route to Gateway 3.
This configuration ensures each gateway sees all spans for a trace, enabling accurate tail-based sampling decisions.
```mermaid
graph LR
    subgraph Applications
        A1[App 1]
        A2[App 2]
        A3[App 3]
    end
    subgraph "Agent Collectors (DaemonSet)"
        AC1[Agent 1<br/>loadbalancing]
        AC2[Agent 2<br/>loadbalancing]
        AC3[Agent 3<br/>loadbalancing]
    end
    subgraph "Gateway Collectors"
        GC1[Gateway 1<br/>tail_sampling]
        GC2[Gateway 2<br/>tail_sampling]
        GC3[Gateway 3<br/>tail_sampling]
    end
    subgraph Backends
        B1[Observability<br/>backend]
    end
    A1 -->|OTLP| AC1
    A2 -->|OTLP| AC2
    A3 -->|OTLP| AC3
    AC1 -->|traceID 0xf39| GC1
    AC1 -->|traceID 0x9f2| GC2
    AC1 -->|traceID 0x31c| GC3
    AC2 -->|traceID 0xf39| GC1
    AC2 -->|traceID 0x9f2| GC2
    AC2 -->|traceID 0x31c| GC3
    AC3 -->|traceID 0xf39| GC1
    AC3 -->|traceID 0x9f2| GC2
    AC3 -->|traceID 0x31c| GC3
    GC1 -->|OTLP| B1
    GC2 -->|OTLP| B1
    GC3 -->|OTLP| B1
```

Other processing considerations
- Cumulative-to-delta calculations: Cumulative-to-delta metric processing requires data-aware load balancing because the calculation is only accurate if all points of a given metric series reach the same gateway Collector. When using the cumulativetodelta processor in an agent-to-gateway deployment, make sure to send each metric stream to a single Collector.
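One way to do this, sketched below, is to route metrics through the loadbalancing exporter with a routing key that pins each metric stream to one gateway. The `streamID` routing key shown here is an assumption; check the supported `routing_key` values for your Collector version:

```yaml
exporters:
  loadbalancing:
    # Route all points of a metric stream to the same gateway;
    # verify supported routing_key values for your Collector version
    routing_key: streamID
    resolver:
      dns:
        hostname: otel-gateway-headless
        port: 4317
```

The cumulativetodelta processor then runs on the gateways, each of which sees complete metric streams.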
Communication between agents and gateways
Agents need to reliably send telemetry data to gateways. Configure the communication protocol, endpoints, and security settings appropriately for your environment.
Protocol selection
Use the OTLP protocol for communication between agents and gateways. OTLP provides the best compatibility across the OpenTelemetry ecosystem. Configure the OTLP exporter in your agents to send data to the OTLP receiver in your gateways.
In Kubernetes environments, use service names for endpoint configuration. For
example, if your gateway service is named otel-gateway, configure your agent
exporter with endpoint: otel-gateway:4317.
Retries
Configure exporter queue and retry settings (for example, retry_on_failure or
sending_queue settings) on agents and gateways to handle temporary outages
between agents and gateways or between gateways and backends. Gateways often
need larger queues and retry policies to handle backend outages. Also consider
setting a max_size for batches to avoid transient backend rejections due to
oversized payloads.
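A sketch of a gateway exporter combining these settings follows; all numeric values are illustrative and should be tuned to your expected outage duration and backend limits:

```yaml
exporters:
  otlp:
    endpoint: your-backend:4317
    # Retry failed exports with exponential backoff
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      queue_size: 5000 # size for the outage duration you want to absorb
      batch:
        sizer: items
        max_size: 8192 # cap batch size to avoid oversized-payload rejections
        flush_timeout: 10s
```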
Scaling agents and gateways
As your telemetry volume grows, you need to scale your Collectors appropriately. Agents and gateways have different scaling characteristics and requirements.
Agents
Agents typically don’t require horizontal scaling because they run on each host. Instead, scale agents vertically by adjusting resource limits. You can monitor CPU and memory usage through Collector internal metrics.
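Internal metrics are configured under `service::telemetry`. A minimal sketch; in many Collector versions these metrics are exposed in Prometheus format on port 8888 by default:

```yaml
service:
  telemetry:
    metrics:
      # Emit the Collector's own metrics, including CPU, memory,
      # and exporter queue usage
      level: detailed
```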
Gateways
You can scale gateways both vertically and horizontally:
Without tail sampling: Use any load balancer or Kubernetes service with round-robin distribution. All gateway instances operate independently.
Note: When scaling gateway instances that export metrics, ensure your deployment follows the single-writer principle to avoid multiple Collectors writing the same time series concurrently. See the gateway deployment documentation for details.
With tail sampling: Deploy agents with the loadbalancing exporter to route spans by trace ID and ensure all spans for a trace go to the same gateway instance.
For automatic scaling in Kubernetes, use Horizontal Pod Autoscaling (HPA) based on CPU or memory metrics. Configure the HPA to scale gateways based on your workload patterns.
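A CPU-based HPA for the gateway tier might look like the following sketch; the Deployment name, replica counts, and utilization target are assumptions to adapt to your workload:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: otel-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: otel-gateway # assumed gateway Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out above 70% average CPU
```

Note that with tail sampling, scale-out events re-shard trace-ID routing, so prefer conservative scaling policies for tail-sampling gateways.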
Additional resources
For more information, see the following documentation:
- Collector benchmarks
- Collector configuration
- Cumulative-to-delta processor
- Load balancing exporter
- Memory limiter processor
- Tail sampling processor