CNI meeting notes

note: the notes are checked in after every meeting to https://github.com/containernetworking/meeting-notes

An editable copy is hosted at https://hackmd.io/jU7dQ49dQ86ugrXBx1De9w. Feel free to add agenda items there

2025-10-13

  • [cdc] Owing to illness, may be absent
  • [Lionel] Demo CNI-DRA-Driver

2025-09-29

2025-09-16

  • Lionel, Marcelo and Tomo joined at 11AM EST...
    • Did Casey start the call at 10AM EST?
    • As far as we can see in the CNI Google Calendar, the start time is the same as before, 11AM EST.
    • Casey, if you change the meeting time, please update the Google Calendar entry as well.
    • Sorry, was stuck on a train!

2025-09-01

2025-08-25

2025-08-18

  • Consider moving to bi-weekly?
  • Pushing on containerd's GC implementation
    • Casey to write up from Cilium's perspective
  • Discuss PR containernetworking/plugins#1195, adding a bridge uplink
    • This adds a "physical" interface to the bridge
    • Q: Is this in the plugin's wheelhouse, or should this be NMState (et al.)? A: This seems in scope, so it's not out of the question.
    • Q: This is vlan-specific. What happens if there are no vlans?
    • Q: What is the intended behavior?
    • Marcelo to write up these questions to the submitter.

2025-07-28

  • cdc on vacation Aug 04, Aug 11

2025-07-21

2025-07-14

Regrets: Casey (Childcare snafu)

2025-07-07

The deadline for the KubeCon November maintainer track is approaching.

We discuss the ownership of Multus, now that Tomo and Doug have wandered off to more excAIting work.

We would like to focus on gRPC. Whether or not this is determined to be CNI 2.0 is somewhat unimportant.

Open question: did containerd add GC support?

2025-06-30

We discuss the CNI-DRA driver. Lionel mentions that determining the initial set of resources is difficult. Question: should we add a new informational verb? Or should we expose them statically in the CNI configuration file?

Challenge: some resources, most notably the node's uplink, may be shared between, say, a macvlan and ipvlan device.

2025-06-23

Sebastian and Casey discuss DRA.

2025-06-16

  • regrets: tomo
  • [danwinship] returning errors from DEL
  • [danwinship] did people actually talk about containernetworking/cni#1162 (exec from container image) last week? I also added some thoughts to containernetworking/cni#821 (comment) (CNI 2.0 daemonization). Maybe if we try to solve a simpler problem we can make some progress on this...

2025-06-09

  • regrets: casey
  • [Doug] quick news (<5 minutes)
    • Doug starting a new role with new upstreams! (especially: vllm)
  • containernetworking/cni#1162 (just look at this one)

2025-06-02

2025-05-26

  • regrets: tomo

2025-05-19

  • Plan to fix the "doomed delete" problem:

    1. List certain ADD error codes as "nothing was created"
    2. Cache when an ADD fails with this error
    3. On DEL, if the DEL fails for that specific plugin, swallow the error.
    • additionally, flag certain DEL error codes as "deletion failed, but all resources are in the namespace".
  • Discuss the confusing vlan situation: https://docs.google.com/document/d/1fHrZ6f2Syaq-jiS1pdTXRYxY29mu4dYakVCqoMZtS4U/edit?addon_store&tab=t.0

    • proposal: vlans sub-field that disables all other vlan behavior when set
    • {"vlans": {"untagged": 1234, "tagged": [5, 6, 7]}}
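
The proposed vlans sub-field could be decoded along these lines. This is a minimal sketch, not the plugin's actual code: the type names `Vlans` and `NetConf`, the helper `effectiveVlans`, and the `vlan` key used as a stand-in for the existing vlan options are all assumptions made for illustration. It only shows the "disables all other vlan behavior when set" rule from the proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Vlans mirrors the proposed sub-field; key names follow the example in the notes.
type Vlans struct {
	Untagged int   `json:"untagged"`
	Tagged   []int `json:"tagged"`
}

// NetConf is a hypothetical, trimmed-down plugin config.
// LegacyVlan stands in for whatever vlan options exist today (assumption).
type NetConf struct {
	LegacyVlan int    `json:"vlan,omitempty"`
	Vlans      *Vlans `json:"vlans,omitempty"`
}

// parseConf decodes the JSON network configuration.
func parseConf(data []byte) (NetConf, error) {
	var c NetConf
	if err := json.Unmarshal(data, &c); err != nil {
		return NetConf{}, fmt.Errorf("parsing config: %w", err)
	}
	return c, nil
}

// effectiveVlans applies the proposed rule: when "vlans" is set,
// every other vlan option is ignored.
func effectiveVlans(c NetConf) (untagged int, tagged []int) {
	if c.Vlans != nil {
		return c.Vlans.Untagged, c.Vlans.Tagged
	}
	return c.LegacyVlan, nil
}

func main() {
	c, err := parseConf([]byte(`{"vlan": 100, "vlans": {"untagged": 1234, "tagged": [5, 6, 7]}}`))
	if err != nil {
		panic(err)
	}
	untagged, tagged := effectiveVlans(c)
	fmt.Println(untagged, tagged) // prints: 1234 [5 6 7]
}
```

Using a pointer for the sub-field lets the sketch tell "not set" apart from "set but empty", which matters if an empty vlans object should still override the older options.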

2025-05-12

  • regrets: cdc, tomo
  • [zappa] containerd: only passing in a subset of labels
    • everyone's querying for it
    • looking into a single line change to instead pass all of the labels
    • any issues?
    • [Doug] sgtm, I need to check what's up on the crio side, and how we're leveraging it.
    • There are limitations, a pod spec can only be a meg (which is configurable) in etcd. Then there's a kernel limitation, which is also set to a meg.
  • containernetworking/plugins#1175
    • Still pending some other options

2025-05-05

  • [zappa] We have an issue where the metaplugin fails and then the runtime keeps allocating an IP. What can we do on the CNI side to stop the bleeding here?
  • We discuss this for some time. The conclusion:
    • It is not safe to ignore an error on delete, as stale resources may be re-used (i.e. IP allocations and firewall rules). Need to delete both or neither
    • If there is a failure on ADD, we don't know if any resources were created, so a DEL is required
    • STATUS is the immediate fix -- if a chained plugin is unavailable, it should fail STATUS
  • TODO
    • Write best-practices document
    • File issue adding "no non-namespaced resources were added" error code set
      • This would be used by runtimes to know that certain error corner cases could be skipped, e.g. failed ADD
      • "Don't sweep the floor, I'm about to tear down the building!" h/t: raymond chen

2025-04-28

  • [cdc] Tagged plugins v1.7.0 (and v1.7.1, oops!)
    • lesson learned: don't do git push --tags, create tag via releases page.
  • [cdc] Thanks, Lionel, for dealing with the maintainers DBs
    • anything else outstanding?
  • [mlguerrero12] PR for review: containernetworking/plugins#1175

2025-04-21

2025-04-14

2025-04-07

  • [Doug] v1.2.4 release for CNI?
    • Commits since v1.2.3
    • Fairly minor, but, I'd like the safe subdirectory loading changes, please and thanks.

2025-03-24:

2025-03-17:

  • continue {DRA + NRI} CNI substitution
    • casey's idea for "merging" CNI and DRA:
      0. We give up chaining?
         • We don't necessarily have to give up anything; can we make the simple case simpler, and the complicated case possible?
      1. NRI returns IP addresses
      2. "infinite" / "virtual" / "dynamic" device creation (i.e. how do you represent a fully virtual, "free" device to the scheduler) (kep 5075)
      3. Some kind of formalized Primary Network

We discuss replacing CNI with NRI. We also discuss whether or not DRA needs to be involved at all.

2025-03-10:

2025-03-03:

  • DST: US DST is applied to this call
  • Discussion: should we work on a first draft of gRPC?
    • Tomo: meh

2025-02-24:

  • regrets:
    • Casey (on vacation)
    • Tomo (national holiday)

2025-02-17:

  • regrets:
    • Tomo (will be back when DST comes. BTW, is the CNI call aligned to US DST?)
  • Casey and Lionel chat about making DRA and CNI have similar APIs
    • Plan for now is to lift existing API in to gRPC without big changes

2025-02-10

  • Lionel: what's the status of CNI 2.0?
    • Nobody's really taken it on
  • Lionel: would like ability to report allocatable capacity
    • casey: What if we returned JSON from STATUS?
      • problem: when STATUS is error, then we lose that
    • See example from the kep:
kind: ResourceSlice
spec:
  driver: cni.dra.networking.x-k8s.io
  deviceSources:
  - name: eth1
    provisionLimit: 1000
    basic:
      attributes:
        name:
          string: "eth1"
      capacity:
        bandwidth:
          quantity: 10Gi

How can we get this information from the CNI plugin? A CAPACITY verb? Casey: is this the "straw that breaks the camel's back"? Should we move over to gRPC?
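
To make the CAPACITY idea concrete, here is a sketch of how a runtime or DRA driver might decode a plugin's reply. Everything here is hypothetical: no CAPACITY verb exists, and the `CapacityReply` and `DeviceCapacity` types and their JSON keys are invented for illustration, loosely following the fields in the ResourceSlice example (name, provisionLimit, bandwidth).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeviceCapacity is a HYPOTHETICAL per-device report, modeled on the
// fields of the ResourceSlice example from the KEP.
type DeviceCapacity struct {
	Name           string `json:"name"`
	ProvisionLimit int    `json:"provisionLimit"`
	Bandwidth      string `json:"bandwidth"` // quantity string such as "10Gi"
}

// CapacityReply is a HYPOTHETICAL reply to a CAPACITY-style verb.
type CapacityReply struct {
	CNIVersion string           `json:"cniVersion"`
	Devices    []DeviceCapacity `json:"devices"`
}

// parseCapacity decodes what the plugin wrote to stdout.
func parseCapacity(stdout []byte) (*CapacityReply, error) {
	var r CapacityReply
	if err := json.Unmarshal(stdout, &r); err != nil {
		return nil, fmt.Errorf("decoding capacity reply: %w", err)
	}
	return &r, nil
}

func main() {
	out := []byte(`{"cniVersion":"1.1.0","devices":[{"name":"eth1","provisionLimit":1000,"bandwidth":"10Gi"}]}`)
	r, err := parseCapacity(out)
	if err != nil {
		panic(err)
	}
	for _, d := range r.Devices {
		fmt.Printf("%s: limit=%d bandwidth=%s\n", d.Name, d.ProvisionLimit, d.Bandwidth)
	}
}
```

A separate verb would sidestep the problem raised above: capacity stays available even while STATUS is returning an error.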

2025-02-03:

  • regrets: casey (on a plane)
    • zappa oof today (containerd 2.0 status PR fix should be merged today)
    • Tomo oof

2025-01-27:

  • [mike] update on containerd 2.0 update
  • [lionel] update on cni-dra-driver
    • [cdc] should we have better embeddable types?
  • [tomo] FYI: multus-cni supports CNI 1.1
  • [Doug] Addressed comments on: containernetworking/plugins#1143 (further review appreciated, and thanks for review!)

Could the DRA config type be something like

type CNIConfig struct {
	metav1.TypeMeta `json:",inline"`

	// IfName represents the name of the network interface requested.
	IfName string `json:"ifName"`

	// Config represents the CNI Config.
	Config CNINetworkConfig

	Plugins []runtime.RawExtension `json:"plugins"`
}

type CNINetworkConfig struct {
	CNIVersion   string
	Name         string
	DisableCheck bool
	DisableGC    bool
}

2025-01-20:

2025-01-13:

  • CI failing with unshare: write failed /proc/self/uid_map: Operation not permitted. Anyone have any clues?
    • Should we just remove this for now to unblock CI?
  • Tomo will skip this call due to holiday. (But please see PR1137 below)
  • PR
    • containernetworking/plugins#1137 (just remove scripts/release.sh because it is no longer used, replaced with github action)
      • GitHub CI tests failed (even though they pass in my lab). I guess the GitHub CI needs to be fixed...
  • Review CNI v1.2 ideas
  • discuss VALIDATE
    • Relevant to conversation about cni-dra
    • wow, dra scheduling is complicated
  • jitsi dies, https://meet.google.com/gjm-mmmf-cra

2025-01-06: