Stacking Thanos Queries

Thanos is one of the most popular ways to get high availability and long-term metrics storage in a high-traffic, multi-cluster environment that already relies on the Prometheus ecosystem.

This post introduces a use case for the Query component of Thanos that might not be as widely adopted as it should be, despite being covered in the official docs and in some of the case studies out there.

Multiple approaches to solve cross-cluster integration

Let’s take a common example: you want a centralized monitoring platform built on Thanos that spans multiple (Observee) clusters, so that you have a single pane of monitoring (Observer) for all your infrastructure.

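Basically, you have three architectural options: expose the Thanos sidecars of the Observee clusters directly to the Observer, stack a lightweight Query in each Observee cluster, or push metrics to Thanos Receive so that no Thanos components run in the Observee clusters.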

Let’s take a closer look at the benefits and caveats of these options.

Sidecars only in Observee clusters

The first option might be the most common one, judging by the many GitHub issues and discussions with users, but there’s one thing to pay attention to: it relies entirely on the Thanos sidecars, and each of them needs its own dedicated endpoint.

Sidecar: It implements Thanos’ Store API on top of Prometheus’ remote-read API. This allows Querie(r)s to treat Prometheus servers as yet another source of time series data without directly talking to its APIs.

So you have to expose every sidecar individually to get an endpoint for each Store API. You cannot put a single load balancer in front of them, because each sidecar only serves the data of its own Prometheus and the Observer’s Query has to reach all of them.

Stacking Queries

In this case, you have a lightweight Query running locally in each Observee cluster, which autodiscovers the sidecars in its own cluster. You can expose this component instead of each sidecar, so the number of ingresses drops from one per Prometheus replica to a single one. Of course, it’s often advised to run an HA pair of this Query, but in that case you can expose the pair through a load balancer.
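To make the difference in exposed endpoints concrete, here is a minimal Python sketch that renders the arguments of the Observer’s Query for both layouts. The cluster names, hostnames and replica counts are hypothetical. The sketch also assumes the default gRPC port 10901 and the `--endpoint` flag of recent Thanos releases (older releases used `--store`).

```python
from typing import Dict, List

# Hypothetical inventory: cluster name -> number of Prometheus replicas,
# each replica running one Thanos sidecar.
CLUSTERS: Dict[str, int] = {"eu-west": 2, "us-east": 3}


def sidecar_endpoints(clusters: Dict[str, int]) -> List[str]:
    """Sidecar-only layout: one externally reachable address per sidecar."""
    return [
        f"sidecar-{i}.{name}.example.com:10901"
        for name, replicas in sorted(clusters.items())
        for i in range(replicas)
    ]


def stacked_endpoints(clusters: Dict[str, int]) -> List[str]:
    """Stacked layout: one address per cluster, served by the local Query."""
    return [f"query.{name}.example.com:10901" for name in sorted(clusters)]


def observer_args(endpoints: List[str]) -> List[str]:
    """Render the arguments passed to the Observer's `thanos query`."""
    return ["query"] + [f"--endpoint={e}" for e in endpoints]


if __name__ == "__main__":
    print(" ".join(observer_args(sidecar_endpoints(CLUSTERS))))
    print(" ".join(observer_args(stacked_endpoints(CLUSTERS))))
```

With two and three replicas in the example clusters, the sidecar-only layout needs five exposed addresses, while stacking needs two, one per cluster.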

Remote write to Receive

The third option also has its benefits and downsides. It uses a newer Thanos component called Receive, which implements the receiving end of Prometheus’ remote_write API and stores the incoming samples in local Prometheus TSDBs.
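As a rough illustration of the push direction, the following Python sketch renders the remote_write stanza that a Prometheus in an Observee cluster would need in order to send samples to Receive. The hostname is hypothetical, and the port 19291 and the `/api/v1/receive` path are assumed to be the Receive defaults; adjust them to your deployment.

```python
def remote_write_snippet(receive_host: str, port: int = 19291) -> str:
    """Return a Prometheus `remote_write` stanza pointing at Thanos Receive.

    `receive_host` is a placeholder for wherever Receive is exposed;
    the port and path are assumed defaults, not values from this post.
    """
    url = f"http://{receive_host}:{port}/api/v1/receive"
    return f"remote_write:\n  - url: {url}\n"


if __name__ == "__main__":
    print(remote_write_snippet("receive.observer.example.com"))
```

The Observee cluster only needs egress towards this URL, which is why this option fits egress-only environments.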

Thanos developers recommend this approach only if it is the only way to integrate the clusters, e.g. in egress-only environments, or if the tenant/client clusters are under another team’s jurisdiction and that team cannot introduce Thanos locally.
Also, be aware of the pros and cons of the push and pull approaches when evaluating this option.


All of the methods above can solve the problem of cross-cluster integration, so which one you settle on is up to you and your constraints.

I’d say that most of the time, Query stacking will solve your problem quite efficiently in terms of resource usage and architectural complexity.

Stacking the Query component in-cluster also has some benefits. For example, it is a simple way to work around the lack of per-store TLS configuration without injecting Envoy sidecars or adding other reverse proxies to handle TLS termination properly.
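One way to picture this workaround: since the gRPC client TLS flags of a Query apply to all of its stores, you can run one intermediate Query per TLS configuration and point a top-level Query at those. The Python sketch below groups stores by their TLS settings and renders the arguments for each intermediate Query. The store addresses and certificate paths are hypothetical, and the flag names are the gRPC client TLS flags of `thanos query` as I understand them, so check them against your Thanos version.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TLSConfig(NamedTuple):
    """Client-side TLS material for one group of stores (hypothetical paths)."""
    ca: str
    cert: str
    key: str
    server_name: str


def intermediate_query_args(stores: Dict[str, TLSConfig]) -> List[List[str]]:
    """Return one `thanos query` argument list per distinct TLS configuration."""
    groups: Dict[TLSConfig, List[str]] = defaultdict(list)
    for endpoint, tls in stores.items():
        groups[tls].append(endpoint)

    commands = []
    for tls, endpoints in groups.items():
        commands.append(
            [
                "query",
                "--grpc-client-tls-secure",
                f"--grpc-client-tls-ca={tls.ca}",
                f"--grpc-client-tls-cert={tls.cert}",
                f"--grpc-client-tls-key={tls.key}",
                f"--grpc-client-server-name={tls.server_name}",
            ]
            # Every store in this group shares the TLS settings above.
            + [f"--endpoint={e}" for e in sorted(endpoints)]
        )
    return commands
```

Each returned argument list corresponds to one in-cluster Query. The top-level Query then only needs to reach these intermediate Queries, not the individual stores.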