# memory bandwidth exporter

The pod/container-grained memory bandwidth exporter provides users with memory bandwidth metrics for their running containers. The metrics include llc_occupancy, mbm_local_bytes, mbm_total_bytes, CPU utilization, and memory usage, and they are processed before being exposed. In addition to container-level metrics, it also provides class-level and socket-level metrics, and users can configure the list of metrics to be collected. It serves as an exporter that can be connected to Prometheus-like observability tools, and it can also be used as a telemetry provider.

The memory bandwidth exporter makes use of state-of-the-art technologies such as NRI to build a resource-efficient and well-maintained solution. It provides memory bandwidth observability for OPEA microservices and lays the groundwork for better scaling and autoscaling of OPEA. It can also be deployed separately in end-user environments, supporting any case where memory bandwidth metrics are required.

The memory bandwidth exporter currently supports only Intel platforms with RDT and will fail on other platforms. Node feature discovery will be added in the future.

## Setup

### Enable NRI in Containerd

```sh
# download the containerd binary; containerd v1.7.0 or higher is required
$ wget https://github.com/containerd/containerd/releases/download/v1.7.0/containerd-1.7.0-linux-amd64.tar.gz

# stop running containerd
$ sudo systemctl stop containerd

# replace old containerd
$ sudo tar Cxzvf /usr/local containerd-1.7.0-linux-amd64.tar.gz

# enable NRI in containerd
# add the following section to /etc/containerd/config.toml
[plugins."io.containerd.nri.v1.nri"]
  disable = false
  disable_connections = false
  plugin_config_path = "/etc/containerd/certs.d"
  plugin_path = "/opt/nri/plugins"
  socket_path = "/var/run/nri/nri.sock"
  config_file = "/etc/nri/nri.conf"

# restart containerd
$ sudo systemctl start containerd
$ sudo systemctl status containerd

# test nri
$ git clone https://github.com/containerd/nri
$ cd nri
$ make
$ ./build/bin/logger -idx 00
```
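
If the `logger` test plugin connects and starts printing pod and container events, NRI is working. As an additional sanity check, you can confirm that the NRI socket configured above was actually created once containerd restarted (a quick sketch; the paths follow the `socket_path` and plugin settings shown above):

```sh
# the NRI socket should exist after containerd restarts with NRI enabled
$ ls -l /var/run/nri/nri.sock

# optionally confirm the NRI section was picked up by containerd
$ sudo containerd config dump | grep -A 6 'io.containerd.nri.v1.nri'
```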

### Enable RDT

Mount resctrl to the directory `/sys/fs/resctrl`:

```sh
$ sudo mount -t resctrl resctrl /sys/fs/resctrl
```
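
The mount is not persistent across reboots, so you may also want to add a resctrl entry to `/etc/fstab`. To confirm that the platform exposes the monitoring features this exporter relies on, you can inspect the resctrl info directory (a sketch, assuming an RDT-capable Intel CPU with L3 monitoring):

```sh
# should list llc_occupancy, mbm_total_bytes and mbm_local_bytes
$ cat /sys/fs/resctrl/info/L3_MON/mon_features
```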

### Setup memory bandwidth exporter

Before setup, you need to configure the runc hook:

```sh
$ ./config/config.sh
```
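
`config.sh` sets up the OCI runtime hook that the exporter depends on. If you want to check the result, the hook configuration should land in the OCI hooks directory that is later bind-mounted into the exporter container (a sketch; the exact file name created by the script may differ):

```sh
# list the installed OCI hook configurations
$ ls /etc/containers/oci/hooks.d/
```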

#### How to build the binary and set it up?

```sh
$ make build
$ sudo ./bin/memory-bandwidth-exporter
# e.g., sudo ./bin/memory-bandwidth-exporter --collector.node.name=<node_name> --collector.container.namespaceWhiteList="calico-apiserver,calico-system,kube-system,tigera-operator"

# get memory bandwidth metrics
$ curl http://localhost:9100/metrics
```
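
Once the exporter is serving on port 9100, it can be scraped like any other Prometheus exporter. The snippet below is a minimal example of what to add under the `scrape_configs:` section of a Prometheus configuration; the job name and target address are placeholders, not part of this project:

```sh
# example scrape job to add under scrape_configs: in prometheus.yml
# (job name and target address are placeholders)
  - job_name: "memory-bandwidth-exporter"
    static_configs:
      - targets: ["<node_ip>:9100"]
```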

#### How to build the Docker image and set it up?

```sh
$ make docker.build
$ sudo docker run \
    -e NODE_NAME=<node_name> \
    -e NAMESPACE_WHITELIST="calico-apiserver,calico-system,kube-system,tigera-operator" \
    --mount type=bind,source=/etc/containers/oci/hooks.d/,target=/etc/containers/oci/hooks.d/ \
    --privileged \
    --cgroupns=host \
    --pid=host \
    --mount type=bind,source=/usr/,target=/usr/ \
    --mount type=bind,source=/sys/fs/resctrl/,target=/sys/fs/resctrl/ \
    --mount type=bind,source=/var/run/nri/,target=/var/run/nri/ \
    -d -p 9100:9100 \
    --name=memory-bandwidth-exporter \
    opea/memory-bandwidth-exporter:latest

# get memory bandwidth metrics
$ curl http://localhost:9100/metrics
```
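
If the metrics endpoint does not respond, checking the container logs is a good first step:

```sh
# follow the exporter logs
$ sudo docker logs -f memory-bandwidth-exporter
```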

#### How to deploy on the K8s cluster?

Build and push your image to the location specified by `MBE_IMG`, then apply the manifest:

```sh
$ make docker.build docker.push MBE_IMG=<some-registry>/opea/memory-bandwidth-exporter:<tag>
$ make change_img MBE_IMG=<some-registry>/opea/memory-bandwidth-exporter:<tag>
# If the "system" namespace does not exist, create it.
$ kubectl create ns system
$ kubectl apply -f config/manifests/memory-bandwidth-exporter.yaml
```

Check the installation result:

```sh
$ kubectl get pods -n system
NAME                              READY   STATUS    RESTARTS   AGE
memory-bandwidth-exporter-zxhdl   1/1     Running   0          3m
```

Get memory bandwidth metrics:

```sh
$ curl http://<memory_bandwidth_exporter_container_ip>:9100/metrics
```
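
If the pod IP is not directly reachable from where you run `curl`, port-forwarding through the API server is an alternative (substitute the actual pod name reported by `kubectl get pods -n system`):

```sh
# forward local port 9100 to the exporter pod
$ kubectl port-forward -n system pod/memory-bandwidth-exporter-zxhdl 9100:9100

# in another shell
$ curl http://localhost:9100/metrics
```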

#### How to delete the binary?

```sh
$ make clean
```

## More flags of the memory bandwidth exporter

The following flags help users make better use of the memory bandwidth exporter:

```sh
-h, --[no-]help                                Show context-sensitive help (also try --help-long and --help-man).
--collector.node.name=""                       Give node name.
--collector.container.namespaceWhiteList=""    Filter out containers whose namespaces belong to the namespace whitelist; namespaces are separated by commas, like "xx,xx,xx".
--collector.container.monTimes=10              Scan the pids of containers created before the exporter starts, to prevent the loss of pids.
--collector.container.metrics="all"            Enable container collector metrics.
--collector.class.metrics="none"               Enable class collector metrics.
--collector.node.metrics="none"                Enable node collector metrics.
--web.telemetry-path="/metrics"                Path under which to expose metrics.
--[no-]web.disable-exporter-metrics            Exclude metrics about the exporter itself (promhttp_*, process_*, go_*).
--web.max-requests=40                          Maximum number of parallel scrape requests. Use 0 to disable.
--runtime.gomaxprocs=1                         The target number of CPUs Go will run on (GOMAXPROCS) ($GOMAXPROCS).
--[no-]web.systemd-socket                      Use systemd socket activation listeners instead of port listeners (Linux only).
--web.listen-address=:9100 ...                 Addresses on which to expose metrics and the web interface. Repeatable for multiple addresses.
--web.config.file=""                           Path to a configuration file that can enable TLS or authentication. See: https://github.com/prometheus/exporter-toolkit/blob/master/docs/web-configuration.md
--collector.interval=3s                        Interval at which the memory bandwidth exporter collects metrics.
--NRIplugin.name="mb-nri-plugin"               Plugin name to register to NRI.
--NRIplugin.idx="11"                           Plugin index to register to NRI.
--[no-]disableWatch                            Disable watching hook directories for new hooks.
--log.level=info                               Only log messages with the given severity or above. One of: [debug, info, warn, error].
--log.format=logfmt                            Output format of log messages. One of: [logfmt, json].
--[no-]version                                 Show application version.
```
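
For example, to also collect class-level metrics, shorten the collection interval, and expose the metrics on a different port with debug logging, the exporter could be started as follows (a sketch using only the flags listed above; adjust the values to your environment):

```sh
$ sudo ./bin/memory-bandwidth-exporter \
    --collector.node.name=<node_name> \
    --collector.container.metrics="all" \
    --collector.class.metrics="all" \
    --collector.interval=1s \
    --web.listen-address=:9101 \
    --log.level=debug
```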