# Chapter 7: OpenTelemetry Collector

## What the Collector is

The OTel Collector is a telemetry proxy: it receives data from your applications, processes it, and forwards it to one or more backends. It is vendor-neutral and configured in YAML.
```mermaid
flowchart LR
    APP1[App Python] -->|OTLP/gRPC| COL
    APP2[App Node.js] -->|OTLP/HTTP| COL
    APP3[App Go] -->|OTLP/gRPC| COL
    PROM[Prometheus\nscrape] -->|pull| COL
    subgraph COL[OTel Collector]
        R[Receivers] --> P[Processors] --> E[Exporters]
    end
    COL -->|traces| JAE[Jaeger / Tempo]
    COL -->|metrics| PRO[Prometheus / Thanos]
    COL -->|logs| LOK[Loki / Elasticsearch]
    COL -->|all signals| HNY[Honeycomb / Datadog]
```
Why use the Collector instead of exporting directly from each app:

- Switch backends without redeploying the apps
- Centralize authentication and TLS
- Add processing: filtering, sampling, enrichment
- Buffer data while a backend is down
- One endpoint feeding multiple backends simultaneously
## Architecture: pipelines

The unit of configuration is the pipeline. Each pipeline handles one signal type (traces, metrics, logs) and wires receivers → processors → exporters. The names referenced in a pipeline must match component keys defined elsewhere in the file:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, otlp/honeycomb]
    metrics:
      receivers: [otlp, prometheus]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
```
## Receivers

Receivers ingest telemetry from the sources:

### OTLP Receiver (the most common)

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins: ["https://my-app.com"]
```
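Since the Collector can centralize TLS for all your apps, it is worth knowing the receiver-side TLS settings. A minimal sketch (certificate paths are placeholders):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          cert_file: /etc/otel/certs/server.crt  # placeholder path
          key_file: /etc/otel/certs/server.key   # placeholder path
```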
### Prometheus Receiver (scrape)

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'my-service'
          static_configs:
            - targets: ['localhost:8080']
          scrape_interval: 30s
```
### Other useful receivers

```yaml
receivers:
  # Tail logs from files (like Fluentd/Logstash)
  filelog:
    include: [/var/log/app/*.log]
    operators:
      - type: json_parser
  # Host metrics
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:
      network:
  # Docker metrics
  docker_stats:
    endpoint: unix:///var/run/docker.sock
```
## Processors

Processors transform telemetry in transit:

### memory_limiter — memory control

Always first in the pipeline, so data can be dropped before the Collector runs out of memory:

```yaml
processors:
  memory_limiter:
    limit_mib: 512        # total limit
    spike_limit_mib: 128  # headroom for spikes
    check_interval: 5s
```

### batch — group before exporting

Always last before the exporters:

```yaml
processors:
  batch:
    send_batch_size: 1000
    timeout: 5s
    send_batch_max_size: 1500
```
### attributes — modify attributes

```yaml
processors:
  attributes:
    actions:
      # Add a fixed attribute
      - key: environment
        value: production
        action: insert
      # Hash sensitive data
      - key: enduser.email
        action: hash
      # Drop an attribute
      - key: http.request.header.authorization
        action: delete
      # Rename: copy the old value under the new key, then delete the old one
      - key: new.attribute.name
        from_attribute: old.attribute.name
        action: upsert
      - key: old.attribute.name
        action: delete
```
### filter — drop signals

Conditions listed under the same signal are OR'ed together:

```yaml
processors:
  filter:
    # Drop health-check spans
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/metrics"'
    # Drop a high-cardinality metric in dev
    metrics:
      metric:
        - 'name == "debug.internal.counter"'
```
### resource — modify resource attributes

```yaml
processors:
  resource:
    attributes:
      - key: service.namespace
        value: payments
        action: insert
      - key: cloud.provider
        value: aws
        action: insert
```
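Related, as a sketch: instead of hard-coding resource attributes, the contrib distribution's `resourcedetection` processor can populate them automatically from the environment (the detector list here is illustrative):

```yaml
processors:
  resourcedetection:
    detectors: [env, system]  # read OTEL_RESOURCE_ATTRIBUTES and host info
    timeout: 5s
```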
### tail_sampling — intelligent sampling

Tail sampling decides whether to keep a trace after seeing it in full; a trace is kept if any policy matches:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s     # wait 10s to see the full trace
    num_traces: 100000     # traces held in the buffer
    expected_new_traces_per_sec: 1000
    policies:
      # Always keep traces with errors
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      # Always keep slow traces (>500ms)
      - name: slow-traces
        type: latency
        latency: {threshold_ms: 500}
      # 1% of everything else
      - name: sample-all-else
        type: probabilistic
        probabilistic: {sampling_percentage: 1}
```
## Exporters

Exporters send data to the backends. Note that the dedicated `jaeger` exporter was removed from the Collector; recent Jaeger versions accept OTLP natively, so an `otlp` exporter instance is used instead:

```yaml
exporters:
  # Generic OTLP (Honeycomb, Datadog OTLP intake, etc.)
  otlp:
    endpoint: https://api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}
  # Jaeger via its native OTLP endpoint
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  # Prometheus remote write
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
  # Loki (logs)
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  # Debug (console)
  debug:
    verbosity: detailed
  # A second OTLP destination — the name after the slash distinguishes instances
  otlp/datadog:
    endpoint: https://trace.agent.datadoghq.com
    headers:
      DD-API-KEY: ${env:DD_API_KEY}
```
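The buffering against backend outages mentioned earlier is configured per exporter. A sketch of the standard retry and queue settings most exporters support (values are illustrative, not recommendations):

```yaml
exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s  # give up on a batch after 5 minutes
    sending_queue:
      enabled: true
      queue_size: 5000        # batches held in memory while the backend is down
```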
## Full example configuration

```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    limit_mib: 512
    spike_limit_mib: 128
    check_interval: 5s
  batch:
    send_batch_size: 512
    timeout: 5s
  filter:
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
  resource:
    attributes:
      - key: deployment.environment
        value: ${env:ENV}
        action: insert

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  debug:
    verbosity: basic

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, filter, resource, batch]
      exporters: [otlp/jaeger, debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [loki]
  telemetry:
    logs:
      level: info
    metrics:
      address: 0.0.0.0:8888  # the Collector's own metrics
```
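For orchestrators that probe container health, the contrib distribution ships a `health_check` extension that exposes an HTTP endpoint. A sketch of adding it to the config above:

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133  # default health-check port

service:
  extensions: [health_check]
```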
## Deployment: Agent vs. Gateway

```mermaid
flowchart TD
    subgraph HOSTS[Nodes / Pods]
        A1[App] --> AG1[Collector\nAgent]
        A2[App] --> AG2[Collector\nAgent]
        A3[App] --> AG3[Collector\nAgent]
    end
    AG1 --> GW[Collector\nGateway]
    AG2 --> GW
    AG3 --> GW
    GW --> JAEGER[Jaeger]
    GW --> PROM[Prometheus]
    GW --> LOKI[Loki]
```

**Agent** (sidecar or DaemonSet) — runs close to the application. Low overhead. Local batching and buffering.

**Gateway** — centralizes fan-out to multiple backends, tail sampling, and authentication.
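Tail sampling at the gateway tier requires every span of a given trace to reach the same gateway instance. A sketch of how the agents can route by trace ID using the contrib `loadbalancing` exporter (hostnames are placeholders):

```yaml
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames: [gateway-1:4317, gateway-2:4317]  # placeholder gateways
```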
## Running it with Docker Compose

```yaml
# docker-compose.yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP
      - "8888:8888"  # Collector's own metrics
    environment:
      - ENV=development
```
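To point an application at this Collector, the standard SDK environment variables are usually enough. A sketch of an app service in the same compose file (the service and image names are placeholders):

```yaml
services:
  my-app:                 # placeholder service
    image: my-app:latest  # placeholder image
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      - OTEL_SERVICE_NAME=my-app
    depends_on:
      - otel-collector
```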
## Distributions: Core vs. Contrib

| Distribution | Description |
|---|---|
| `otel/opentelemetry-collector` | Core — official components only |
| `otel/opentelemetry-collector-contrib` | Contrib — adds community-maintained components |
| `grafana/otelcol-distributions` | Grafana Labs distribution |

For most use cases, use contrib — it ships many more receivers and exporters.