Skip to content
This repository was archived by the owner on Sep 30, 2020. It is now read-only.

[v0.17.x] Allow dnsmasq to be backed by a local copy of CoreDNS #1896

Merged
merged 1 commit into from
Aug 18, 2020

Conversation

kfr2
Copy link
Contributor

@kfr2 kfr2 commented Aug 5, 2020

This commit allows the user to specify that dnsmasq should be
backed by a pod-local copy of CoreDNS rather than relying on
the global CoreDNS service. If enabled, the dnsmasq-node
DaemonSet will be configured to use a local copy of CoreDN
for its resolution while setting the global CoreDNS service as
a fallback. This is handy in situations where the number of DNS
requests within a cluster grows large and causes resolution issues
as dnsmasq reaches out to the global CoreDNS service.

Additionally, several values passed to dnsmasq are now configurable
including its --cache-size and --dns-forward-max.

See this postmortem
for an investigation into this situation which was instrumental in
understanding issues we were facing. Many thanks to dominicgunn
for providing the manifests which I codified into this commit.


These features can be enabled and tuned by setting the following
values within cluster.yaml:

kubeDns:
  dnsmasq:
    coreDNSLocal:
      # When enabled, this will run a copy of CoreDNS within each DNS-masq pod and
      # configure the utility to use it for resolution.
      enabled: true

      # Defines the resource requests/limits for the coredns-local container.
      # cpu and/or memory constraints can be removed by setting the appropriate value(s)
      # to an empty string.
      resources:
        requests:
          cpu: 50m
          memory: 100Mi
        limits:
          cpu: 50m
          memory: 100Mi

    # The size of dnsmasq's cache.
    cacheSize: 50000

    # The maximum number of concurrent DNS queries.
    dnsForwardMax: 500

    # This option gives a default value for time-to-live (in seconds) which dnsmasq
    # uses to cache negative replies even in the absence of an SOA record.
    negTTL: 60

Related:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 5, 2020
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Aug 5, 2020
Copy link
Contributor

@dominicgunn dominicgunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a change to ensure we start up properly.

@@ -1074,6 +1074,11 @@ write_files:
"${mfdir}/kube-dns-de.yaml"
{{- end }}
{{ if .KubeDns.NodeLocalResolver -}}
{{ if .KubeDns.dnsmasq.CoreDNSLocal.Enabled -}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kfr2, could we flip some of the - locations in this templating? We're trimming whitespace on the end here, causing start up to try and (in some cases) deploy "${mfdir}/dnsmasq-node-coredns-local.yaml"deploy "${mfdir}/dnsmasq-node-ds.yaml" which is causing issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing!

@dominicgunn dominicgunn changed the title Allow dnsmasq to be backed by a local copy of CoreDNS [v0.17.x] Allow dnsmasq to be backed by a local copy of CoreDNS Aug 13, 2020
@dominicgunn dominicgunn added this to the v0.17.0 milestone Aug 13, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign justinsb
You can assign the PR to them by writing /assign @justinsb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This commit allows the user to specify that dnsmasq should be
backed by a pod-local copy of CoreDNS rather than relying on
the global CoreDNS service. If enabled, the dnsmasq-node
DaemonSet will be configured to use a local copy of CoreDN
for its resolution while setting the global CoreDNS service as
a fallback. This is handy in situations where the number of DNS
requests within a cluster grows large and causes resolution issues
as dnsmasq reaches out to the global CoreDNS service.

Additionally, several values passed to dnsmasq are now configurable
including its `--cache-size` and `--dns-forward-max`.

See [this postmortem](https://github.com/zalando-incubator/kubernetes-on-aws/blob/dev/docs/postmortems/jan-2019-dns-outage.md)
for an investigation into this situation which was instrumental in
understanding issues we were facing. Many thanks to dominicgunn
for providing the manifests which I codified into this commit.

---

These features can be enabled and tuned by setting the following
values within cluster.yaml:

```yaml
kubeDns:
  dnsmasq:
    coreDNSLocal:
      # When enabled, this will run a copy of CoreDNS within each DNS-masq pod and
      # configure the utility to use it for resolution.
      enabled: true

      # Defines the resource requests/limits for the coredns-local container.
      # cpu and/or memory constraints can be removed by setting the appropriate value(s)
      # to an empty string.
      resources:
        requests:
          cpu: 50m
          memory: 100Mi
        limits:
          cpu: 50m
          memory: 100Mi

    # The size of dnsmasq's cache.
    cacheSize: 50000

    # The maximum number of concurrent DNS queries.
    dnsForwardMax: 500

    # This option gives a default value for time-to-live (in seconds) which dnsmasq
    # uses to cache negative replies even in the absence of an SOA record.
    negTTL: 60
```
@dominicgunn dominicgunn merged commit 4a7646d into kubernetes-retired:master Aug 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants