
Commit d982681

askervin authored and poussa committed
doc: update platform optimization document
- Add steps for policy and CPU affinity validation.
- Add steps for policy removal.
- Add a warning, link and a workaround for the text-embeddings-interface CPU affinity issue.
- Add a warning on response times due to memory moves.

Signed-off-by: Antti Kervinen <[email protected]>
1 parent a6fd418 commit d982681

File tree

1 file changed: +138 −10 lines changed


doc/platform-optimization/README.md

Lines changed: 138 additions & 10 deletions
@@ -45,7 +45,12 @@ underlying hardware topology.
 Warning: installing and reconfiguring the balloons policy can change
 allowed CPUs and memories of already running containers in the
 cluster. This may hurt containers that rely on the number of allowed
-CPUs being static.
+CPUs being static. Furthermore, if there are containers with gigabytes
+of memory allocated, reconfiguring the policy may cause the kernel to
+move large amounts of memory between NUMA nodes. This can result in
+extremely slow response times until the moves have finished. Therefore,
+it is recommended that nodes are empty or relatively lightly loaded
+when a new resource policy is applied.
 
 Install the balloons policy with helm:

@@ -61,9 +66,25 @@ Install the balloons policy with helm:
 ```
 
 Now the balloons policy is managing node resources in the cluster as a
-DaemonSet that communicates with the container runtime on every
-node. You should see `nri-resource-policy-balloons-...` pod running on
-every node.
+DaemonSet that communicates with the container runtime on every node.
+
+## Validate policy status
+
+The balloons policy is running on a node once you can find an
+`nri-resource-policy-balloons-...` pod on it.
+
+```
+kubectl get pods -A -o wide | grep nri-resource-policy
+
+default  nri-resource-policy-balloons-v6bvq  1/1  Running  0  12s  10.0.0.136  spr-2  <none>  <none>
+```
+
+The status of the policy on each node in the cluster can be read from
+the balloonspolicy custom resource. For instance, see the Status
+section in the output of
+
+```
+kubectl describe balloonspolicy default
+```
 
 ## Configure

@@ -78,7 +99,20 @@ application's Gaudi accelerated pipeline.
 
 In the
 [manifest](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/manifests/gaudi/chatqna.yaml)
-there are "tgi" and "tei" containers that will need a lot of CPUs.
+there are "tgi", "tei" and "teirerank" containers in the "chatqna-tgi",
+"chatqna-tei" and "chatqna-teirerank" deployments that will need a lot
+of CPUs. They implement the text-generation-inference and
+text-embeddings-inference services.
+
+Warning: an
+[issue](https://github.com/opea-project/GenAIExamples/issues/763) in
+the text-embeddings-inference service causes bad performance when its
+CPUs are managed. As a workaround, prevent CPU management of these
+containers by adding a pod annotation in both the "chatqna-tei" and
+"chatqna-teirerank" deployments:
+```
+cpu.preserve.resource-policy.nri.io: "true"
+```
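
For orientation, the sketch below shows where such a pod annotation sits in a Deployment manifest: under the pod template's metadata, not the Deployment's own metadata, because the NRI resource policy reads pod annotations. The surrounding fields are abbreviated by the editor, so treat this as an illustration rather than a complete manifest.

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatqna-tei
spec:
  template:
    metadata:
      annotations:
        # Tells the NRI resource policy to leave this pod's CPUs unmanaged.
        cpu.preserve.resource-policy.nri.io: "true"
```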

 A note on terminology: we refer to physical CPU cores as "CPU cores"
 and hyperthreads as vCPUs or just CPUs. When hyperthreading is on, the
@@ -116,8 +150,10 @@ spec:
         hideHyperthreads: true
         matchExpressions:
           - key: name
-            operator: Equals
-            values: ["tei"]
+            operator: In
+            values:
+              - tei
+              - teirerank
       - name: default
         hideHyperthreads: false
         namespaces:
@@ -198,14 +234,106 @@ For more information about the configuration and the balloons resource
 policy, refer to the balloons
 [documentation](https://containers.github.io/nri-plugins/stable/docs/resource-policy/policy/balloons.html).
 
+
+## Validate CPU affinity and hardware alignment in containers
+
+CPUs allowed in each container of the ChatQnA RAG pipeline can be
+listed by running grep in each container. Assuming that the pipeline
+is running in the "chatqna" namespace, this can be done as follows.
+
+```
+namespace=chatqna
+for pod in $(kubectl get pods -n $namespace -o name); do
+    echo $(kubectl exec -t -n $namespace $pod -- grep Cpus_allowed_list /proc/self/status) $pod
+done | sort
+
+Cpus_allowed_list: 0-30 chatqna-tgi-84c98dd9b7-26dhl
+Cpus_allowed_list: 32-39 chatqna-teirerank-7fd4d88d85-swjjv
+Cpus_allowed_list: 40-47 chatqna-tei-f5dd58487-vfv45
+Cpus_allowed_list: 56-62,120-126 chatqna-85fb984fb9-7rfrk
+Cpus_allowed_list: 56-62,120-126 chatqna-data-prep-5489d9b65d-szgth
+Cpus_allowed_list: 56-62,120-126 chatqna-embedding-usvc-64566dd669-hdr4k
+Cpus_allowed_list: 56-62,120-126 chatqna-llm-uservice-678dc9f98c-tvtqq
+Cpus_allowed_list: 56-62,120-126 chatqna-redis-vector-db-676fb75667-trqm6
+Cpus_allowed_list: 56-62,120-126 chatqna-reranking-usvc-74b5684cbc-28gdr
+Cpus_allowed_list: 56-62,120-126 chatqna-retriever-usvc-64fd64475b-f892k
+Cpus_allowed_list: 56-62,120-126 chatqna-ui-dd657bbf6-2wzhr
+```
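
Comparing such cpuset lists against NUMA node ranges is easier when the lists are expanded into individual CPU numbers. The `expand_cpulist` helper below is an editor's sketch, not part of the document or any tool mentioned in it; it only parses the `lo-hi,lo-hi` syntax seen above.

```shell
# Hypothetical helper: expand a cpuset list string such as
# "56-62,120-126" into the individual CPU numbers it covers.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        seq "$lo" "${hi:-$lo}"
    done
}

# Example: the shared pool seen above covers 14 CPUs.
expand_cpulist "56-62,120-126" | xargs
# -> 56 57 58 59 60 61 62 120 121 122 123 124 125 126
```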
+
+Alignment of the allowed CPU sets with the underlying hardware
+topology can be validated by comparing the above output to the CPUs in
+each NUMA node.
+
+```
+lscpu | grep NUMA
+
+NUMA node(s): 8
+NUMA node0 CPU(s): 0-7,64-71
+NUMA node1 CPU(s): 8-15,72-79
+NUMA node2 CPU(s): 16-23,80-87
+NUMA node3 CPU(s): 24-31,88-95
+NUMA node4 CPU(s): 32-39,96-103
+NUMA node5 CPU(s): 40-47,104-111
+NUMA node6 CPU(s): 48-55,112-119
+NUMA node7 CPU(s): 56-63,120-127
+```
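
The CPU-to-node lookup can also be done programmatically from lscpu-style lines. The `node_of_cpu` helper below is a hypothetical editor's sketch (not part of the document or lscpu): it reads `node cpulist` pairs from stdin and prints the node containing a given CPU.

```shell
# Hypothetical helper: given "node cpulist" pairs on stdin, print the
# NUMA node whose CPU list contains the given CPU number.
node_of_cpu() {
    cpu=$1
    while read -r node list; do
        # Split "32-39,96-103" into ranges and test each one.
        for range in $(echo "$list" | tr ',' ' '); do
            lo=${range%%-*}
            hi=${range##*-}
            if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
                echo "$node"
                return
            fi
        done
    done
}

# Example: CPU 42 falls in node5's 40-47 range.
printf 'node4 32-39,96-103\nnode5 40-47,104-111\n' | node_of_cpu 42
# -> node5
```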
+
+This shows that chatqna-tgi is executed on CPUs 0-30, that is, on NUMA
+nodes 0-3. All of these NUMA nodes are located in the same CPU socket,
+as they have the same physical package id:
+
+```
+cat /sys/devices/system/node/node[0-3]/cpu*/topology/physical_package_id | sort -u
+0
+```
+
+The output also shows that chatqna-teirerank and chatqna-tei have been
+given CPUs from two separate NUMA nodes (4 and 5) on the other CPU
+socket.
+
+```
+cat /sys/devices/system/node/node[4-5]/cpu*/topology/physical_package_id | sort -u
+1
+```
+
+Finally, taking a deeper look into the CPUs of chatqna-teirerank
+(32-39), we can see that each of them is selected from a separate
+physical CPU core in NUMA node4. That is, no two vCPUs (hyperthreads)
+come from the same core.
+
+```
+cat /sys/devices/system/node/node4/cpu3[2-9]/topology/core_id
+0
+1
+2
+3
+4
+5
+6
+7
+```
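
The "no two hyperthreads from the same core" claim can be checked mechanically by looking for repeated core ids. The check below is an editor's illustration using the sample values; on a live node one would pipe the `core_id` files through the same `sort | uniq -d` filter.

```shell
# Illustration: detect whether any physical core id appears twice,
# i.e. whether two hyperthreads of one core were both assigned.
# Live-node equivalent:
#   cat /sys/devices/system/node/node4/cpu3[2-9]/topology/core_id | sort | uniq -d
core_ids="0
1
2
3
4
5
6
7"
dups=$(printf '%s\n' "$core_ids" | sort | uniq -d)
if [ -n "$dups" ]; then
    echo "shared cores: $dups"
else
    echo "no shared cores"
fi
# -> no shared cores
```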
+
+## Remove a policy
+
+The balloons policy is uninstalled from the cluster with helm:
+
+```
+helm uninstall balloons
+```
+
+Note that removing the policy does not modify the CPU affinity
+(cgroups cpuset.cpus files) of running containers. For that, the
+containers need to be recreated or a new policy installed.
 
 ## NRI topology-aware resource policy
 
 NRI plugins include the topology-aware resource policy, too. Unlike
 balloons, it does not require configuration to start with. Instead, it
 will create CPU pools for containers purely based on their resource
 requests and limits, that must be set for effective use of the
-policy. Yet container and node type-specific configuration
-possibilities are more limited, the policy works well for ensuring
-NUMA alignment and more. See the topology-aware policy
+policy. Containers in the
+[Guaranteed](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#guaranteed)
+QoS class get dedicated CPUs. While container and node type-specific
+configuration possibilities are more limited, the policy works well
+for ensuring NUMA alignment and choosing CPUs with low-latency access
+to accelerators such as Gaudi cards. See the topology-aware policy
 [documentation](https://containers.github.io/nri-plugins/stable/docs/resource-policy/policy/topology-aware.html)
 for more information.
