sled-agent: move instance configuration generation to Nexus #8002

Open
wants to merge 2 commits into
base: main
from

gjcolombo
Contributor

One of the determinations in RFD 505 is that Nexus should be the component that's in charge of determining how to configure a VM given a set of database records describing its instance (the Instance itself, its attached Disks and NetworkInterfaces, etc.). To summarize the rationale in the RFD, the hope is that this will promote two nice properties:

  • Local reasoning about virtual platforms: All the logic that translates instance descriptions into VM specs now lives in a single module in Nexus. In past iterations of the code, Nexus transformed database records into an intermediate sled-agent type, and sled-agent would transform those into Propolis API types, which Propolis would then use to fill in virtual hardware details. Understanding where a VM's configuration came from required the reader to look at all these components; now all the relevant logic lives in Nexus.
  • Serviceability: Putting type transformations and platform policies into sled-agent and Propolis makes them marginally more painful to update, since updating these components requires the system to migrate VMs and reboot sleds. Putting the virtual platform policy in Nexus will make it much less expensive to update in the future.

To achieve this:

  • Move sled-agent's virtual platform logic (added in "ingest new Propolis VM creation API", #7211) into a new Nexus module. Sled-agent needs to hold onto a bit of logic to insert OPTE port names into instance specs before sending those specs to Propolis; this needs to live in the agent since it selects the relevant object names.
  • Update the sled-agent instance registration API to take a Propolis instance spec as a parameter (and rework some other types to distinguish a bit more clearly between "Propolis VM configuration" and "sled-agent objects that need to be created to support this VM").

The main pain point in this change is that sled-agent's API now includes types that it picked up from the propolis-client API, which caused sled-agent's OpenAPI document to balloon with "duplicate" schema descriptions it inherited from propolis-client's generated types. I'm not sure if there's a great way around this (aside from changing the generated Propolis client to replace all its generated types with their "native" counterparts); I'm open to suggestions here.

Tested by booting a VM in a dev cluster, booting a comparable VM on rack2, and comparing their instance specs (as returned by Propolis's /instance/spec API) to make sure they specified the same components with the same configuration.

Contributor Author

gjcolombo left a comment

I also want to test manually that Propolis-directed region replacements still work as intended with this change (they depend on the virtual platform module having used the relevant disk record's ID as the relevant Propolis backend ID).

@gjcolombo
Contributor Author

This will need a fresh commit hash/SHA from the Propolis repo after oxidecomputer/propolis#899 merges, but I think it is otherwise more or less ready for review (though it could probably use some unit tests of the new virtual platform logic...).

@gjcolombo gjcolombo requested a review from hawkw April 24, 2025 19:47
@iximeow iximeow self-requested a review April 24, 2025 20:21
//! sled-agent can use to initialize a Propolis VM that exposes the necessary
//! components.
//!
//! For more background on virtual platforms, see RFD 505.
Member

considered nitpicky: maybe there ought to be a link to the RFD here?

Comment on lines +28 to +45
//! Each component in a VM spec has a name, which is also its key in the
//! component map. Generally speaking, Propolis does not care how its clients
//! name components, but it does require that callers identify those components
//! by name in API calls that refer to a specific component. To make this as
//! easy as possible for the rest of Nexus, this module names components as
//! follows:
//!
//! - If a component corresponds to a specific control plane object (i.e.
//!   something like a disk or a NIC that has its own database record and a
//!   corresponding unique identifier):
//!   - If the component requires both a device and a backend, the *backend*
//!     uses the object's ID as a name, and the device uses a module-generated
//!     name.
//!   - If the component is unitary (i.e. it only has one component entry in the
//!     instance spec), this module uses the object ID as its name.
//! - "Default" components that don't correspond to control plane objects, such
//!   as a VM's serial ports, are named using the constants in the
//!   [`component_names`] module.
Member

thank you for documenting these rules, this is lovely.
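
For readers skimming the thread, the naming rules quoted above can be sketched roughly as follows. This is a minimal illustration, not the module's code: `Key`, `ObjectId`, `Component`, `add_disk`, `add_serial_port`, the `"disk-device-…"` name format, and the `COM1` constant are all hypothetical stand-ins chosen for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a control plane object ID (a UUID in practice).
type ObjectId = u128;

/// Hypothetical component-map key: an object ID or a generated name.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum Key {
    Id(ObjectId),
    Name(String),
}

/// Hypothetical constants for "default" components.
mod component_names {
    pub const COM1: &str = "com1";
}

/// Hypothetical, heavily simplified component entries.
#[derive(Debug, PartialEq)]
enum Component {
    DiskDevice { backend: Key },
    DiskBackend,
    SerialPort,
}

/// A disk needs a device and a backend: the backend is keyed by the disk's
/// object ID, and the device gets a generated name (format assumed here).
fn add_disk(components: &mut BTreeMap<Key, Component>, disk_id: ObjectId) {
    let backend = Key::Id(disk_id);
    let device = Key::Name(format!("disk-device-{disk_id:x}"));
    components.insert(device, Component::DiskDevice { backend: backend.clone() });
    components.insert(backend, Component::DiskBackend);
}

/// A default component has no control plane object, so it uses a constant.
fn add_serial_port(components: &mut BTreeMap<Key, Component>) {
    let key = Key::Name(component_names::COM1.to_string());
    components.insert(key, Component::SerialPort);
}

fn main() {
    let mut components = BTreeMap::new();
    add_disk(&mut components, 0x2a);
    add_serial_port(&mut components);
    for (key, component) in &components {
        println!("{key:?} => {component:?}");
    }
}
```

Under these assumptions, a caller that only knows a disk's ID can find the disk's backend entry directly, which is the property the region replacement path mentioned earlier in the thread relies on.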

/// storage and expect that those devices will always appear in the same places
/// when the system is stopped and restarted. Changing these mappings for
/// existing instances may break them!
fn get_pci_bdf(
Member

pretty nitpicky, feel free to disregard me: i would probably drop the "get" from this, i think of "get" as specifically being an accessor/lookup type thing, and the zero_padded_nvme_serial_from_str function in this module doesn't have a get_ prefix...
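
To make the stability concern in the quoted doc comment concrete, here is a rough sketch of a deterministic slot-to-BDF mapping. The base device numbers, the slot limits, and the `PciDeviceKind`, `PciPath`, and `pci_bdf` names are assumptions made for this example and are not taken from the PR.

```rust
/// Kinds of PCI devices this sketch knows how to place (hypothetical).
#[derive(Debug, Clone, Copy)]
enum PciDeviceKind {
    Disk,
    Nic,
}

/// A PCI bus/device/function triple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PciPath {
    bus: u8,
    device: u8,
    function: u8,
}

/// Maps a device's logical slot to a fixed PCI location. The same inputs
/// always produce the same output, so a guest sees each device at the same
/// address across stop/start. Base addresses and limits are illustrative.
fn pci_bdf(kind: PciDeviceKind, logical_slot: u8) -> Result<PciPath, String> {
    let (base, max_slots) = match kind {
        PciDeviceKind::Nic => (0x08, 8),
        PciDeviceKind::Disk => (0x10, 8),
    };
    if logical_slot >= max_slots {
        return Err(format!("slot {logical_slot} out of range for {kind:?}"));
    }
    Ok(PciPath { bus: 0, device: base + logical_slot, function: 0 })
}

fn main() {
    println!("{:?}", pci_bdf(PciDeviceKind::Disk, 0));
    println!("{:?}", pci_bdf(PciDeviceKind::Nic, 9));
}
```

Changing the base or limit for a device kind in a table like this would move existing devices, which is the kind of breakage the comment warns about.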

backend_id: SpecKey::Uuid(disk.id()),
pci_path,
serial_number: zero_padded_nvme_serial_from_str(
&disk.name().to_string(),
Member

couldn't this just be

Suggested change
- &disk.name().to_string(),
+ &disk.name().as_str(),

instead of to_stringing it just to throw it away?
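
A small sketch of the difference, assuming a name type that wraps a `String` and exposes `as_str`. The `Name` type and the `zero_padded_serial` helper here are hypothetical stand-ins, not the types or function from the PR; the 20-byte length matches the NVMe serial number field.

```rust
/// Hypothetical stand-in for a validated name type wrapping a `String`.
struct Name(String);

impl Name {
    /// Borrows the underlying string without allocating.
    fn as_str(&self) -> &str {
        &self.0
    }
}

/// Copies `s` into a 20-byte buffer, padding the tail with zeros.
/// Returns an error if `s` does not fit.
fn zero_padded_serial(s: &str) -> Result<[u8; 20], String> {
    let bytes = s.as_bytes();
    if bytes.len() > 20 {
        return Err(format!("serial is {} bytes, max is 20", bytes.len()));
    }
    let mut out = [0u8; 20];
    out[..bytes.len()].copy_from_slice(bytes);
    Ok(out)
}

fn main() {
    let name = Name("boot-disk".to_string());

    // Allocates a temporary String that is dropped right after the call.
    let via_to_string = zero_padded_serial(&name.as_str().to_string());

    // Borrows the existing data; no allocation.
    let via_as_str = zero_padded_serial(name.as_str());

    assert_eq!(via_to_string, via_as_str);
    println!("{:?}", via_as_str);
}
```

Since `as_str` already returns a `&str`, the leading `&` in the suggestion is not needed; `&name.as_str()` still compiles through deref coercion, but passing `name.as_str()` directly is the more usual spelling.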

}

/// A list of named components to add to an instance's spec.
struct Components(HashMap<String, ComponentV0>);
Member

i notice that we use HashMap here but BTreeMap for DisksById, any rationale for that?
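
For context on why the choice can matter: a `BTreeMap` iterates in key order, while a `HashMap` makes no ordering guarantee. Whether ordering matters for `Components` is not stated in the PR; the snippet below only illustrates the general difference, with made-up component names and hypothetical helper functions.

```rust
use std::collections::{BTreeMap, HashMap};

/// Returns the keys in the order a BTreeMap iterates them (sorted).
fn btree_key_order(names: &[&str]) -> Vec<String> {
    let map: BTreeMap<String, ()> =
        names.iter().map(|n| (n.to_string(), ())).collect();
    map.into_keys().collect()
}

/// Returns the keys in whatever order a HashMap happens to iterate them.
/// The order is unspecified and may differ between runs.
fn hash_key_order(names: &[&str]) -> Vec<String> {
    let map: HashMap<String, ()> =
        names.iter().map(|n| (n.to_string(), ())).collect();
    map.into_keys().collect()
}

fn main() {
    let names = ["nic-0", "disk-0", "com1"];
    println!("BTreeMap: {:?}", btree_key_order(&names));
    println!("HashMap:  {:?}", hash_key_order(&names));
}
```

If the map is only ever looked up by key and then serialized into a type that imposes its own ordering, either container behaves the same from the caller's point of view.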


/// Adds the set of disks in the supplied disk list to this component
/// list.
fn add_disks(&mut self, disks: DisksById) -> Result<(), Error> {
Member

take it or leave it: perhaps we ought to stick a

Suggested change
- fn add_disks(&mut self, disks: DisksById) -> Result<(), Error> {
+ fn add_disks(&mut self, disks: DisksById) -> Result<(), Error> {
+ self.0.reserve(disks.0.len() * 2);

since we know how many entries we're going to add up front?
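
A minimal sketch of the pre-reservation idea, assuming each disk contributes two map entries (a device and a backend). The `Components` type here is a simplified stand-in with string values, and the key format is invented for the example.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the component list in the PR.
#[derive(Default)]
struct Components(HashMap<String, &'static str>);

impl Components {
    /// Adds a device entry and a backend entry per disk, reserving space for
    /// all of them first so the map does not have to grow during the loop.
    fn add_disks(&mut self, disk_ids: &[u32]) {
        self.0.reserve(disk_ids.len() * 2);
        for id in disk_ids {
            self.0.insert(format!("disk-{id}-device"), "device");
            self.0.insert(format!("disk-{id}-backend"), "backend");
        }
    }
}

fn main() {
    let mut components = Components::default();
    components.add_disks(&[1, 2, 3]);
    println!(
        "{} entries, capacity {}",
        components.0.len(),
        components.0.capacity()
    );
}
```

`HashMap::reserve` guarantees room for at least the requested number of additional entries, so the insertions in the loop should not trigger a resize. For the handful of disks a typical instance has, the saving is small, which fits the "take it or leave it" framing.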


/// Adds the supplied set of NICs to this component manifest.
fn add_nics(&mut self, nics: &[NetworkInterface]) -> Result<(), Error> {
for nic in nics.iter() {
Member

similarly, we could

Suggested change
- for nic in nics.iter() {
+ self.0.reserve(nics.len() * 2);
+ for nic in nics.iter() {


let mut components = Components::default();

let mut volumes = vec![];
Member

also nitpicky: we could save a few potential reallocations by doing

Suggested change
- let mut volumes = vec![];
+ let mut volumes = Vec::with_capacity(disks.len());

async fn send_propolis_instance_ensure(
    &self,
    client: &PropolisClient,
    running_zone: &RunningZone,
    migrate: Option<InstanceMigrationTargetParams>,
) -> Result<(), Error> {
    // A bit of history helps to explain the workings of the rest of this
Member

:D

2 participants