OpenTelemetry Internals: Adding Node Name as a Label to Kubelet Metrics

Preface

The goal of this article is to take a simple use case, look at how you can solve it with OpenTelemetry, and maybe learn a bit about the inner workings of the OpenTelemetry ecosystem along the way.

Let’s say you want to use the Kubelet Stats Receiver to collect node, pod, container, and volume metrics from the kubelet’s API server and send them down the metrics pipeline for further processing.

Seems quite straightforward, even without using the presets that are shipped by most OTel collector distributions!
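
For reference, if you deploy the collector with the official OpenTelemetry Collector Helm chart, the preset route is just a couple of lines in your values.yaml. Here is a minimal sketch, assuming the chart’s kubeletMetrics preset; check the chart’s documentation for what exactly it generates in your version:

# values.yaml for the OpenTelemetry Collector Helm chart (assumed deployment method);
# the kubeletMetrics preset generates a kubeletstats receiver config
# similar to the one shown below
mode: daemonset
presets:
  kubeletMetrics:
    enabled: true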

As always with OTel, I recommend checking out the GitHub repositories if you want to find the most up-to-date examples, tests, and documentation for the various components in the ecosystem. In this case, let’s use this example from the docs.

Actually, this is the same configuration that the preset would autogenerate for you.

receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: ${env:K8S_NODE_NAME}:10250
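
A quick note on ${env:K8S_NODE_NAME}: the collector pod needs that environment variable injected, which is typically done via the Kubernetes downward API. Here is a sketch of the relevant container spec fragment (the surrounding DaemonSet/Deployment and the rest of the container definition are omitted; the presets and Helm chart can set this up for you):

# Fragment of the collector container spec: inject the node name via the
# downward API so the receiver can target the local kubelet
env:
- name: K8S_NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName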

Then, you can create a metrics pipeline that uses this receiver and exposes the metrics via the prometheusexporter.

While the name misleads many, the prometheusexporter simply exposes the metrics on a port that you specify in the exporter’s configuration. People often expect it to push metrics to a destination, but that’s not what it does.

Here’s a pipeline (and the aforementioned exporter) to achieve this:

exporters:
  prometheus:
    endpoint: 0.0.0.0:9090

service:
  pipelines:
    metrics:
      receivers:
      - kubeletstats
      processors: []
      exporters:
      - prometheus
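
Since the exporter only exposes an endpoint, something still has to scrape it. A hypothetical Prometheus scrape configuration for that could look like the following (the job name and target address are assumptions and depend on how you expose the collector service):

# Hypothetical Prometheus scrape config pointing at the collector's
# prometheus exporter endpoint; service name, namespace, and port are assumptions
scrape_configs:
- job_name: otel-collector
  static_configs:
  - targets: ['otel-collector.monitoring.svc.cluster.local:9090']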

The problem

Now, you can port-forward to the collector pod and check its /metrics endpoint.

# HELP k8s_node_cpu_time_seconds_total Node CPU time
# TYPE k8s_node_cpu_time_seconds_total counter
k8s_node_cpu_time_seconds_total 50263.641419
# HELP k8s_node_cpu_utilization_ratio Node CPU utilization
# TYPE k8s_node_cpu_utilization_ratio gauge
k8s_node_cpu_utilization_ratio 0.226553353
# HELP k8s_node_filesystem_available_bytes Node filesystem available
# TYPE k8s_node_filesystem_available_bytes gauge
k8s_node_filesystem_available_bytes 1.1995189248e+10
# HELP k8s_node_filesystem_capacity_bytes Node filesystem capacity
# TYPE k8s_node_filesystem_capacity_bytes gauge
k8s_node_filesystem_capacity_bytes 1.01390114816e+11
...

The Kubelet Stats Receiver works in the context of nodes: the kubelet is the node-level agent that runs on every node of a Kubernetes cluster and exposes various telemetry, by default on its secure port, 10250.

Notice that this is the same port that we specified when we configured the endpoint in our receiver config earlier.

You get all the metrics you were looking for, but notice that no additional labels get added. Once you ingest these metrics into Prometheus (or any OTel/remote_write-compatible backend), you might not be able to get anything meaningful out of them in their current form. Enriching them with, for example, the name of the node they originate from lets you build queries that show which nodes are running out of disk space.

To solve the issue above, you can add a label, e.g. k8s_node_name, so the metrics will be enriched with the required context.

Resource Attributes vs. Attributes

At this point we can start talking about the Metric data model and resource attributes.

Let’s take a look at this illustration from the metrics protobuf definition:

//     Metric
//  +------------+
//  |name        |
//  |description |
//  |unit        |     +------------------------------------+
//  |data        |---> |Gauge, Sum, Histogram, Summary, ... |
//  +------------+     +------------------------------------+
//
//    Data [One of Gauge, Sum, Histogram, Summary, ...]
//  +-----------+
//  |...        |  // Metadata about the Data.
//  |points     |--+
//  +-----------+  |
//                 |      +---------------------------+
//                 |      |DataPoint 1                |
//                 v      |+------+------+   +------+ |
//              +-----+   ||label |label |...|label | |
//              |  1  |-->||value1|value2|...|valueN| |
//              +-----+   |+------+------+   +------+ |
//              |  .  |   |+-----+                    |
//              |  .  |   ||value|                    |
//              |  .  |   |+-----+                    |
//              |  .  |   +---------------------------+
//              |  .  |                   .
//              |  .  |                   .
//              |  .  |                   .
//              |  .  |   +---------------------------+
//              |  .  |   |DataPoint M                |
//              +-----+   |+------+------+   +------+ |
//              |  M  |-->||label |label |...|label | |
//              +-----+   ||value1|value2|...|valueN| |
//                        |+------+------+   +------+ |
//                        |+-----+                    |
//                        ||value|                    |
//                        |+-----+                    |
//                        +---------------------------+

Source: https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/metrics/v1/metrics.proto

In short, we have metrics, described by their metadata (name, description, unit), and their actual data. The data contains data points, and these data points carry key-value pairs that will look familiar if you know the Prometheus data model, where they show up as labels.

Take a look at this excerpt from the OTel data model:

“Metric streams are grouped into individual Metric objects, identified by: the originating Resource attributes, the instrumentation Scope (e.g., instrumentation library name, version), and the metric stream’s name.”

You can find more information here. As you can see, besides the name and the data points, there is additional data identifying and describing the metrics, such as resource attributes and scope.

Here, we are interested in resource attributes: additional metadata that describes the resource emitting the telemetry data. These are usually added by the instrumentation layer; in the case of the collector, the receivers are responsible for adding them.

You can find the list of attributes added automatically by our receiver here: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/kubeletstatsreceiver/documentation.md#resource-attributes
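
Beyond the defaults, the receiver also lets you tune which metric groups it scrapes and attach extra metadata labels. Here is a sketch of a slightly tuned configuration (the option names come from the receiver’s documentation, the values are just examples):

receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: ${env:K8S_NODE_NAME}:10250
    # only scrape the metric groups you care about
    metric_groups:
    - node
    - pod
    - container
    # attach additional metadata, such as the container ID, to the emitted metrics
    extra_metadata_labels:
    - container.id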

If you enable a logging exporter and add it to your pipeline, you can see how the data model gets populated with actual telemetry data.

exporters:
  logging:
    loglevel: debug

Using a logging exporter like the one above is something I recommend: you can easily drop it into pipelines that you want to troubleshoot and check whether everything looks as expected.
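
One caveat: in newer collector releases the logging exporter is deprecated in favor of the debug exporter, so depending on your version the equivalent configuration might look like this instead (verify against your collector’s release notes):

exporters:
  debug:
    verbosity: detailed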

In the logs of the pod you will see that for any given metric, you have a Resource attributes section, where the corresponding attributes appear.

Resource SchemaURL:
Resource attributes:
     -> k8s.node.name: Str(k3d-playground-server-0)
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope otelcol/kubeletstatsreceiver 0.89.0
Metric #0
Descriptor:
     -> Name: k8s.node.cpu.time
     -> Description: Node CPU time
     -> Unit: s
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
StartTimestamp: 2023-12-07 17:11:35 +0000 UTC
Timestamp: 2023-12-15 13:48:09.316153636 +0000 UTC
Value: 54347.794286

In this case, we have k8s.node.name, since the metric in question is k8s.node.cpu.time, so container-specific attributes do not appear (as they would for a container metric such as container_memory_usage_bytes).

If you check the diagram above, you can see that resource attributes are not the same as regular attributes. The former describe the resource emitting the telemetry, while the latter describe the individual data points (as key-value pairs).

The solution

We want this information as a Prometheus label, so we need to add it to every data point, since that is what the Prometheus data model exposes as labels.

There’s a processor that can help with this problem, and it’s called transformprocessor.

Here’s the processor with the configuration that we need to implement this:

processors:
  transform/node-name:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["k8s_node_name"], resource.attributes["k8s.node.name"])

We are transforming metrics, hence the metric_statements key. Then, we need to apply the statement in the datapoint context, because contexts NEVER supply access to individual items lower in the protobuf definition; we cannot use the metric context, as “statements associated to a metric WILL NOT be able to access individual datapoints, but can access the entire datapoints slice”.

Finally, the actual statement is straightforward: we take the resource attribute k8s.node.name, which is already available thanks to our receiver, and set it as a data point attribute named k8s_node_name.
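
As a side note, OTTL statements also accept a where clause, so if you want to be defensive, you could copy the value only when the resource attribute is actually present. A hedged variation of the statement above:

processors:
  transform/node-name:
    metric_statements:
    - context: datapoint
      statements:
      # only set the label when the resource attribute exists
      - set(attributes["k8s_node_name"], resource.attributes["k8s.node.name"]) where resource.attributes["k8s.node.name"] != nil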

This is the final configuration:

receivers:
  kubeletstats:
    auth_type: serviceAccount
    collection_interval: 20s
    endpoint: ${env:K8S_NODE_NAME}:10250
processors:
  transform/node-name:
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["k8s_node_name"], resource.attributes["k8s.node.name"])
exporters:
  logging:
    loglevel: debug
  prometheus:
    endpoint: 0.0.0.0:9090

service:
  pipelines:
    metrics:
      receivers:
      - kubeletstats
      processors:
      - transform/node-name
      exporters:
      - logging
      - prometheus

Now, if you check your metrics again, you should see the desired label, with its value populated with the name of the node.

# HELP k8s_node_cpu_time_seconds_total Node CPU time
# TYPE k8s_node_cpu_time_seconds_total counter
k8s_node_cpu_time_seconds_total{k8s_node_name="k3d-playground-server-0"} 50263.641419
# HELP k8s_node_cpu_utilization_ratio Node CPU utilization
# TYPE k8s_node_cpu_utilization_ratio gauge
k8s_node_cpu_utilization_ratio{k8s_node_name="k3d-playground-server-0"} 0.226553353
# HELP k8s_node_filesystem_available_bytes Node filesystem available
# TYPE k8s_node_filesystem_available_bytes gauge
k8s_node_filesystem_available_bytes{k8s_node_name="k3d-playground-server-0"} 1.1995189248e+10
# HELP k8s_node_filesystem_capacity_bytes Node filesystem capacity
# TYPE k8s_node_filesystem_capacity_bytes gauge
k8s_node_filesystem_capacity_bytes{k8s_node_name="k3d-playground-server-0"} 1.01390114816e+11
...

You can also take a look at the logs of your collector, where the same data should now appear as a data point attribute:

Resource SchemaURL:
Resource attributes:
     -> k8s.node.name: Str(k3d-playground-server-0)
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope otelcol/kubeletstatsreceiver 0.89.0
Metric #0
Descriptor:
     -> Name: k8s.node.cpu.time
     -> Description: Node CPU time
     -> Unit: s
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> k8s_node_name: Str(k3d-playground-server-0)
StartTimestamp: 2023-12-07 17:11:35 +0000 UTC
Timestamp: 2023-12-15 10:06:04.85338542 +0000 UTC
Value: 49965.984025

Conclusion

While the original problem wasn’t especially hard to solve, it gave us the opportunity to learn a bit about how telemetry data is described and handled in OpenTelemetry, and to see how this workflow differs from the traditional Prometheus-based one.