Got a lot of warn "failed DNS A record lookup" #4952

Closed
budwing opened this issue Nov 4, 2022 · 6 comments

Comments

@budwing

budwing commented Nov 4, 2022

Describe the bug
We recently upgraded Cortex from 1.10.0 to 1.13.1. All the components run well except Cortex Querier. When I start Cortex Querier, it produces many warning messages like the following:

Nov 04 08:01:28 cos-cortex-1.13.1[1791959]: level=warn ts=2022-11-04T07:01:28.301903104Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup https on xx.xx.xx.xx:53: no such host"
Nov 04 08:01:38 cos-cortex-1.13.1[1791959]: level=warn ts=2022-11-04T07:01:38.305297822Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup https on xx.xx.xx.xx:53: no such host"
Nov 04 08:01:48 cos-cortex-1.13.1[1791959]: level=warn ts=2022-11-04T07:01:48.30911884Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup https on xx.xx.xx.xx:53: no such host"
Nov 04 08:01:58 cos-cortex-1.13.1[1791959]: level=warn ts=2022-11-04T07:01:58.313291008Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup https on xx.xx.xx.xx:53: no such host"
Nov 04 08:02:08 cos-cortex-1.13.1[1791959]: level=warn ts=2022-11-04T07:02:08.316010447Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup https on xx.xx.xx.xx:53: no such host"
Nov 04 08:02:18 cos-cortex-1.13.1[1791959]: level=warn ts=2022-11-04T07:02:18.319513141Z caller=dns_resolver.go:225 msg="failed DNS A record lookup" err="lookup https on xx.xx.xx.xx:53: no such host"

xx.xx.xx.xx is the IP address of our DNS server. Despite the many warning logs, Querier behaves well.

To Reproduce
I guess the cause is the address of the Cortex Query Frontend configured by -querier.frontend-address.
Our Query-Frontend address is not an A record; it's a CNAME.
So the warning log can probably be reproduced whenever a CNAME address is specified for Query-Frontend.

Expected behavior
Suppress this warning log if it does not affect the functionality of Querier.

Maybe it's the same as #4601 which was just fixed in v1.13.0.

@alvinlin123
Contributor

alvinlin123 commented Nov 8, 2022

@budwing can you share your configuration value for -querier.frontend-address? The error message seems to indicate that you are resolving a host name called "https". Did you configure -querier.frontend-address to be something like https://host:port? If so, try just configuring it as host:port.
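
To illustrate why a scheme prefix can end up being looked up as a host name, here is a minimal Python sketch. It is not Cortex's resolver code: `naive_host` and `to_host_port` are hypothetical helpers, `query-frontend.example.com` is a placeholder, and the default port 9095 is an assumption for illustration only.

```python
from urllib.parse import urlsplit


def naive_host(address: str) -> str:
    """Take everything before the first ':' as the host.

    This mimics a parser that expects a plain host:port value
    (an assumption, not the actual Cortex implementation).
    """
    return address.split(":", 1)[0]


def to_host_port(address: str, default_port: int = 9095) -> str:
    """Strip any URL scheme and return a bare host:port string."""
    # urlsplit only fills in the network location when the input
    # starts with '//' or '<scheme>://', so add '//' to bare values.
    parts = urlsplit(address if "://" in address else "//" + address)
    if parts.hostname is None:
        raise ValueError(f"no host found in {address!r}")
    port = parts.port if parts.port is not None else default_port
    return f"{parts.hostname}:{port}"


# A host:port parser sees "https" as the host in a URL-style value.
print(naive_host("https://query-frontend.example.com:443"))
# The scheme-free form is what a host:port setting expects.
print(to_host_port("https://query-frontend.example.com:443"))
```

The sketch does not handle IPv6 literals; it only shows the difference between a URL and a host:port value.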

@budwing
Author

budwing commented Nov 8, 2022

Thanks for the reply.
Yes, the address is prefixed with "https". But if I remove "https", I get a lot of connection errors:

Nov 08 04:30:28 cos-cortex-1.13.1[3556861]: level=error ts=2022-11-08T03:30:28.463913462Z caller=frontend_processor.go:64 msg="error contacting frontend" address=xx.xx.xx.xx:443 err="rpc error: code = Unavailable desc = connection closed before server preface received"
Nov 08 04:30:28 cos-cortex-1.13.1[3556861]: level=error ts=2022-11-08T03:30:28.472094548Z caller=frontend_processor.go:64 msg="error contacting frontend" address=xx.xx.xx.xx:443 err="rpc error: code = Unavailable desc = connection closed before server preface received"
Nov 08 04:30:28 cos-cortex-1.13.1[3556861]: level=error ts=2022-11-08T03:30:28.483399033Z caller=frontend_processor.go:64 msg="error contacting frontend" address=xx.xx.xx.xx:443 err="rpc error: code = Unavailable desc = connection closed before server preface received"

Because we use Envoy as the proxy for our services, the address "xx.xx.xx.xx:443" is actually the IP of Envoy. So we don't want Cortex to resolve the URL to an IP address; the connection between Querier and Query-Frontend should be established over HTTPS. This works well in v1.10.0, but v1.13.1 produces the warning logs.

Can you confirm whether these warning logs indicate any impact on the functionality of Querier? Can we just ignore them?

@alvinlin123
Contributor

alvinlin123 commented Nov 9, 2022

TL;DR

Can you confirm whether these warning logs indicate any impact on the functionality of Querier? Can we just ignore them?

It does not impact the functionality of Querier if you query your metrics by calling Querier directly rather than through QueryFrontend.

Longer answer

The warning log means the querier cannot register itself with the query-frontend. So if you make a request to QueryFrontend, it won't be able to find a querier to handle the query, and the query call will fail.

However, the warning log does not affect Querier itself: if you call Querier directly, the query will work. But if you are not using QueryFrontend, you are missing out on features such as caching and query splitting.

Things appeared to work for you in v1.10.0 (i.e. no warning log) because dns_resolver.go was then using grpclog, which by default sets the log level to error; hence, warning logs were not printed :) In 1.13.0, dns_resolver.go uses Cortex's own logger, which has a default level of info; that's why you see the logs now.
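
The effect of the default log level can be sketched with Python's standard `logging` module. This is only an analogy for the behaviour described above, not the grpclog or Cortex logger code; `ListHandler` and `emitted_at` are hypothetical names.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def emitted_at(level: int) -> list:
    """Log one warning through a logger set to `level`; return what got through."""
    logger = logging.getLogger(f"resolver-demo-{level}")
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.setLevel(level)
    handler = ListHandler()
    logger.addHandler(handler)
    logger.warning("failed DNS A record lookup")
    logger.removeHandler(handler)
    return handler.messages


# Level "error": the warning is filtered out, so the failure stays invisible.
print(emitted_at(logging.ERROR))
# Level "info": the same warning is printed.
print(emitted_at(logging.INFO))
```

In both cases the lookup fails in the same way; only the visibility of the warning changes.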

If you really don't want to use QueryFrontend, you don't have to start it and can leave -querier.frontend-address empty. But I would recommend using QueryFrontend for optimal query performance.

@budwing
Author

budwing commented Nov 11, 2022

Hi @alvinlin123,

Thanks again for the reply.

We are actually using Query Frontend, but we configured both -querier.frontend-address for Querier and -frontend.downstream-url for Query-Frontend. Since the value of -querier.frontend-address contains https, -querier.frontend-address didn't work. Because v1.10.0 printed no log, we didn't notice that. So in our case, Queriers are working as the downstream of Query Frontend. Can you confirm which of these two modes is best in terms of performance?

My second question is: why does Querier not use DNS directly to connect to Query-Frontend? In our case, we use Envoy as the proxy for our services. If Cortex Querier uses the IP address directly, Envoy can't dispatch the requests, because the request header doesn't contain the domain information. If DNS can't be supported, I suggest that -querier.frontend-address should at least support multiple endpoints. Otherwise, how can Querier balance the jobs if there are two Query-Frontends?

BR//Taylor

@alvinlin123
Contributor

Can you confirm which of these two modes is best in terms of performance?

This really depends on what you mean by performance.

frontend.downstream-url simply asks the QueryFrontend to forward each request to any of the Queriers behind the URL, so you get proxying, round-robin behaviour for every query. If your queries are typically small in terms of time range and data set visited, then using frontend.downstream-url is good enough, but in that case you don't really need QueryFrontend and can just load balance across Queriers.

Generally, I would recommend using querier.frontend-address, because it enables query splitting and other benefits documented here.

Also note that you should configure only one of querier.frontend-address or frontend.downstream-url. If you configure both, frontend.downstream-url wins. We should probably update the Cortex documentation to call this out.
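
The precedence described above can be written down as a small sketch. `effective_mode` and its return labels are hypothetical, chosen for illustration; the function only encodes the rule stated in this comment and is not part of Cortex.

```python
from typing import Optional


def effective_mode(frontend_address: Optional[str],
                   downstream_url: Optional[str]) -> str:
    """Return which mode applies for a pair of settings.

    frontend_address stands for -querier.frontend-address and
    downstream_url for -frontend.downstream-url.
    """
    if downstream_url:
        # Per the comment above, downstream-url wins when both are set.
        return "downstream-url"
    if frontend_address:
        return "frontend-address"
    # Neither is set: clients query the queriers directly.
    return "direct"


print(effective_mode("query-frontend.example.com:9095",
                     "http://querier.example.com"))
```

A deployment check built on this could flag the case where both values are non-empty, since only one of them takes effect.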

why does Querier not use DNS directly to connect to Query-Frontend?
This is a good question, and it's because of how querier.frontend-address was implemented. When you configure querier.frontend-address, it essentially translates to "OK, Queriers, go talk to each of the QueryFrontEnd pods and let them know you are there, so you can get work from QueryFrontEnd". So Querier will:

  • Resolve the DNS name and find out exactly how many QueryFrontEnds (how many IP addresses) are behind it
  • Call each QueryFrontEnd to register itself

The end result is that when QueryFrontEnd gets a query:

  • It knows how many Queriers there are
  • So it can assign a subset of Queriers to a tenant, split the query, and send the requests concurrently to multiple Queriers
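
The resolve-and-register flow above can be modelled with a toy Python sketch. It is a simplified illustration under stated assumptions, not the Cortex protocol: `ToyFrontend`, `register_with_all`, the injected `resolve` callable, and the IP addresses are all hypothetical, and the round-robin assignment merely stands in for the real scheduling.

```python
from typing import Callable, Dict, List


class ToyFrontend:
    """Stand-in for one QueryFrontEnd instance behind the DNS name."""

    def __init__(self, ip: str):
        self.ip = ip
        self.queriers: List[str] = []

    def register(self, querier_id: str) -> None:
        if querier_id not in self.queriers:
            self.queriers.append(querier_id)

    def dispatch(self, sub_queries: List[str]) -> Dict[str, List[str]]:
        """Spread the pieces of one split query across known queriers."""
        if not self.queriers:
            # Mirrors the failure mode when no querier has registered.
            raise RuntimeError("no querier registered with this frontend")
        assignment: Dict[str, List[str]] = {q: [] for q in self.queriers}
        for i, piece in enumerate(sub_queries):
            assignment[self.queriers[i % len(self.queriers)]].append(piece)
        return assignment


def register_with_all(querier_id: str,
                      address: str,
                      resolve: Callable[[str], List[str]],
                      frontends: Dict[str, ToyFrontend]) -> List[str]:
    """Resolve `address` to every IP behind it and register with each one."""
    ips = resolve(address)
    for ip in ips:
        frontends[ip].register(querier_id)
    return ips


frontends = {ip: ToyFrontend(ip) for ip in ("10.0.0.1", "10.0.0.2")}
resolve = lambda address: sorted(frontends)  # fake DNS answer for the demo
for querier in ("querier-a", "querier-b"):
    register_with_all(querier, "query-frontend.example.com:9095",
                      resolve, frontends)
print(frontends["10.0.0.1"].dispatch(["day-1", "day-2", "day-3"]))
```

Because every querier registers with every frontend IP, each frontend can see the full set of queriers, which is what a single proxied address would hide.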

I understand the above may be unintuitive, but simply put, querier.frontend-address turns on a queuing mechanism between Querier and QueryFrontEnd. Does that make sense?

@budwing
Author

budwing commented Nov 16, 2022

I think I have all the answers and have found the solution. Closing this issue; thanks a lot for your patience.

@budwing budwing closed this as completed Nov 16, 2022