Commit 358961f

Merge branch 'devlink-io-eqs'
Parav Pandit says:

====================
devlink: Add port function attribute for IO EQs

Currently, PCI SFs and VFs use IO event queues to deliver netdev per-channel
events. The number of netdev channels is a function of the IO event queues.
Similarly, for an RDMA device, the completion vectors are a function of the
IO event queues. Today, an administrator on the hypervisor has no means to
provision the number of IO event queues for an SF or VF device; the
device/firmware picks some arbitrary value. As a result, the number of SF
netdev channels is unpredictable, and consequently, so is the performance.

This short series introduces a new port function attribute: max_io_eqs. The
goal is to let administrators at the hypervisor level provision the maximum
number of IO event queues for a function. This gives the administrator the
control to provision the right number of IO event queues and get predictable
performance.

Example of an administrator provisioning (setting) the maximum number of IO
event queues in switchdev mode:

  $ devlink port show pci/0000:06:00.0/1
  pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
    function:
      hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10

  $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20

  $ devlink port show pci/0000:06:00.0/1
  pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
    function:
      hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20

This sets the maximum IO event queues of the function before it is
enumerated. Thus, when the VF/SF driver reads the capability from the device,
it sees the value provisioned by the hypervisor. The driver can then
configure the number of channels for the net device, as well as the number of
completion vectors for the RDMA device. The device/firmware also honors the
provisioned value, so any VF/SF driver attempting to create IO EQs beyond the
provisioned value gets an error.

With the above settings in place, the administrator achieved 2x performance
with the SF device with 20 channels. In a second example, when the SF was
provisioned for a container with 2 CPUs, the administrator provisioned only
2 IO event queues, thereby saving device resources.

changelog:
v2->v3:
- limited to 80 chars per line in devlink
- fixed comments from Jakub in mlx5 driver to fix missing mutex unlock on
  error path
v1->v2:
- limited comment to 80 chars per line in header file
- fixed set function variables for reverse christmas tree
- fixed comments from Kalesh
- fixed missing kfree in get call
- returning error code for get cmd failure
- fixed error msg copy paste error in set on cmd failure
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 parents: 4308811 + 93197c7

7 files changed: +209 −0 lines changed

Documentation/networking/devlink/devlink-port.rst

Lines changed: 33 additions & 0 deletions

@@ -134,6 +134,9 @@ Users may also set the IPsec crypto capability of the function using
 Users may also set the IPsec packet capability of the function using
 `devlink port function set ipsec_packet` command.

+Users may also set the maximum IO event queues of the function
+using `devlink port function set max_io_eqs` command.
+
 Function attributes
 ===================

@@ -295,6 +298,36 @@ policy is processed in software by the kernel.
   function:
     hw_addr 00:00:00:00:00:00 ipsec_packet enabled

+Maximum IO events queues setup
+------------------------------
+When a user sets the maximum number of IO event queues for an SF or
+a VF, the function's driver is limited to consuming only the enforced
+number of IO event queues.
+
+IO event queues deliver events related to IO queues, including network
+device transmit and receive queues (txq and rxq) and RDMA Queue Pairs (QPs).
+For example, the number of netdevice channels and RDMA device completion
+vectors are derived from the function's IO event queues. Usually, the number
+of interrupt vectors consumed by the driver is limited by the number of IO
+event queues per device, as each of the IO event queues is connected to an
+interrupt vector.
+
+- Get maximum IO event queues of the VF device::
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+      function:
+        hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10
+
+- Set maximum IO event queues of the VF device::
+
+    $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
+
+    $ devlink port show pci/0000:06:00.0/2
+    pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
+      function:
+        hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32
+
 Subfunction
 ============

drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c

Lines changed: 4 additions & 0 deletions

@@ -98,6 +98,8 @@ static const struct devlink_port_ops mlx5_esw_pf_vf_dl_port_ops = {
 	.port_fn_ipsec_packet_get = mlx5_devlink_port_fn_ipsec_packet_get,
 	.port_fn_ipsec_packet_set = mlx5_devlink_port_fn_ipsec_packet_set,
 #endif /* CONFIG_XFRM_OFFLOAD */
+	.port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get,
+	.port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set,
 };

 static void mlx5_esw_offloads_sf_devlink_port_attrs_set(struct mlx5_eswitch *esw,

@@ -143,6 +145,8 @@ static const struct devlink_port_ops mlx5_esw_dl_sf_port_ops = {
 	.port_fn_state_get = mlx5_devlink_sf_port_fn_state_get,
 	.port_fn_state_set = mlx5_devlink_sf_port_fn_state_set,
 #endif
+	.port_fn_max_io_eqs_get = mlx5_devlink_port_fn_max_io_eqs_get,
+	.port_fn_max_io_eqs_set = mlx5_devlink_port_fn_max_io_eqs_set,
 };

 int mlx5_esw_offloads_devlink_port_register(struct mlx5_eswitch *esw, struct mlx5_vport *vport)

drivers/net/ethernet/mellanox/mlx5/core/eswitch.h

Lines changed: 7 additions & 0 deletions

@@ -573,6 +573,13 @@ int mlx5_devlink_port_fn_ipsec_packet_get(struct devlink_port *port, bool *is_en
 int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port, bool enable,
 					  struct netlink_ext_ack *extack);
 #endif /* CONFIG_XFRM_OFFLOAD */
+int mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port,
+					u32 *max_io_eqs,
+					struct netlink_ext_ack *extack);
+int mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port,
+					u32 max_io_eqs,
+					struct netlink_ext_ack *extack);

 void *mlx5_eswitch_get_uplink_priv(struct mlx5_eswitch *esw, u8 rep_type);

 int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw,

drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c

Lines changed: 97 additions & 0 deletions

@@ -66,6 +66,8 @@

 #define MLX5_ESW_FT_OFFLOADS_DROP_RULE (1)

+#define MLX5_ESW_MAX_CTRL_EQS 4
+
 static struct esw_vport_tbl_namespace mlx5_esw_vport_tbl_mirror_ns = {
 	.max_fte = MLX5_ESW_VPORT_TBL_SIZE,
 	.max_num_groups = MLX5_ESW_VPORT_TBL_NUM_GROUPS,

@@ -4557,3 +4559,98 @@ int mlx5_devlink_port_fn_ipsec_packet_set(struct devlink_port *port,
 	return err;
 }
 #endif /* CONFIG_XFRM_OFFLOAD */
+
+int
+mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32 *max_io_eqs,
+				    struct netlink_ext_ack *extack)
+{
+	struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
+	int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+	u16 vport_num = vport->vport;
+	struct mlx5_eswitch *esw;
+	void *query_ctx;
+	void *hca_caps;
+	u32 max_eqs;
+	int err;
+
+	esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
+	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Device doesn't support VHCA management");
+		return -EOPNOTSUPP;
+	}
+
+	query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
+	if (!query_ctx)
+		return -ENOMEM;
+
+	mutex_lock(&esw->state_lock);
+	err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx,
+					    MLX5_CAP_GENERAL);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
+		goto out;
+	}
+
+	hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
+	max_eqs = MLX5_GET(cmd_hca_cap, hca_caps, max_num_eqs);
+	if (max_eqs < MLX5_ESW_MAX_CTRL_EQS)
+		*max_io_eqs = 0;
+	else
+		*max_io_eqs = max_eqs - MLX5_ESW_MAX_CTRL_EQS;
+out:
+	mutex_unlock(&esw->state_lock);
+	kfree(query_ctx);
+	return err;
+}
+
+int
+mlx5_devlink_port_fn_max_io_eqs_set(struct devlink_port *port, u32 max_io_eqs,
+				    struct netlink_ext_ack *extack)
+{
+	struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
+	int query_out_sz = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+	u16 vport_num = vport->vport;
+	struct mlx5_eswitch *esw;
+	void *query_ctx;
+	void *hca_caps;
+	u16 max_eqs;
+	int err;
+
+	esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
+	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Device doesn't support VHCA management");
+		return -EOPNOTSUPP;
+	}
+
+	if (check_add_overflow(max_io_eqs, MLX5_ESW_MAX_CTRL_EQS, &max_eqs)) {
+		NL_SET_ERR_MSG_MOD(extack, "Supplied value out of range");
+		return -EINVAL;
+	}
+
+	query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
+	if (!query_ctx)
+		return -ENOMEM;
+
+	mutex_lock(&esw->state_lock);
+	err = mlx5_vport_get_other_func_cap(esw->dev, vport_num, query_ctx,
+					    MLX5_CAP_GENERAL);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(extack, "Failed getting HCA caps");
+		goto out;
+	}
+
+	hca_caps = MLX5_ADDR_OF(query_hca_cap_out, query_ctx, capability);
+	MLX5_SET(cmd_hca_cap, hca_caps, max_num_eqs, max_eqs);
+
+	err = mlx5_vport_set_other_func_cap(esw->dev, hca_caps, vport_num,
+					    MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE);
+	if (err)
+		NL_SET_ERR_MSG_MOD(extack, "Failed setting HCA caps");
+
+out:
+	mutex_unlock(&esw->state_lock);
+	kfree(query_ctx);
+	return err;
+}

include/net/devlink.h

Lines changed: 14 additions & 0 deletions

@@ -1602,6 +1602,14 @@ void devlink_free(struct devlink *devlink);
 *				capability. Should be used by device drivers to
 *				enable/disable ipsec_packet capability of a
 *				function managed by the devlink port.
+ * @port_fn_max_io_eqs_get: Callback used to get port function's maximum number
+ *			    of event queues. Should be used by device drivers to
+ *			    report the maximum event queues of a function
+ *			    managed by the devlink port.
+ * @port_fn_max_io_eqs_set: Callback used to set port function's maximum number
+ *			    of event queues. Should be used by device drivers to
+ *			    configure maximum number of event queues
+ *			    of a function managed by the devlink port.
 *
 * Note: Driver should return -EOPNOTSUPP if it doesn't support
 * port function (@port_fn_*) handling for a particular port.

@@ -1651,6 +1659,12 @@ struct devlink_port_ops {
 	int (*port_fn_ipsec_packet_set)(struct devlink_port *devlink_port,
 					bool enable,
 					struct netlink_ext_ack *extack);
+	int (*port_fn_max_io_eqs_get)(struct devlink_port *devlink_port,
+				      u32 *max_eqs,
+				      struct netlink_ext_ack *extack);
+	int (*port_fn_max_io_eqs_set)(struct devlink_port *devlink_port,
+				      u32 max_eqs,
+				      struct netlink_ext_ack *extack);
 };

 void devlink_port_init(struct devlink *devlink,

include/uapi/linux/devlink.h

Lines changed: 1 addition & 0 deletions

@@ -686,6 +686,7 @@ enum devlink_port_function_attr {
 	DEVLINK_PORT_FN_ATTR_OPSTATE,	/* u8 */
 	DEVLINK_PORT_FN_ATTR_CAPS,	/* bitfield32 */
 	DEVLINK_PORT_FN_ATTR_DEVLINK,	/* nested */
+	DEVLINK_PORT_FN_ATTR_MAX_IO_EQS,	/* u32 */

 	__DEVLINK_PORT_FUNCTION_ATTR_MAX,
 	DEVLINK_PORT_FUNCTION_ATTR_MAX = __DEVLINK_PORT_FUNCTION_ATTR_MAX - 1

net/devlink/port.c

Lines changed: 53 additions & 0 deletions

@@ -16,6 +16,7 @@ static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_
 				 DEVLINK_PORT_FN_STATE_ACTIVE),
 	[DEVLINK_PORT_FN_ATTR_CAPS] =
 		NLA_POLICY_BITFIELD32(DEVLINK_PORT_FN_CAPS_VALID_MASK),
+	[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] = { .type = NLA_U32 },
 };

 #define ASSERT_DEVLINK_PORT_REGISTERED(devlink_port) \

@@ -182,6 +183,30 @@ static int devlink_port_fn_caps_fill(struct devlink_port *devlink_port,
 	return 0;
 }

+static int devlink_port_fn_max_io_eqs_fill(struct devlink_port *port,
+					   struct sk_buff *msg,
+					   struct netlink_ext_ack *extack,
+					   bool *msg_updated)
+{
+	u32 max_io_eqs;
+	int err;
+
+	if (!port->ops->port_fn_max_io_eqs_get)
+		return 0;
+
+	err = port->ops->port_fn_max_io_eqs_get(port, &max_io_eqs, extack);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+	err = nla_put_u32(msg, DEVLINK_PORT_FN_ATTR_MAX_IO_EQS, max_io_eqs);
+	if (err)
+		return err;
+	*msg_updated = true;
+	return 0;
+}
+
 int devlink_nl_port_handle_fill(struct sk_buff *msg, struct devlink_port *devlink_port)
 {
 	if (devlink_nl_put_handle(msg, devlink_port->devlink))

@@ -409,6 +434,18 @@ static int devlink_port_fn_caps_set(struct devlink_port *devlink_port,
 	return 0;
 }

+static int
+devlink_port_fn_max_io_eqs_set(struct devlink_port *devlink_port,
+			       const struct nlattr *attr,
+			       struct netlink_ext_ack *extack)
+{
+	u32 max_io_eqs;
+
+	max_io_eqs = nla_get_u32(attr);
+	return devlink_port->ops->port_fn_max_io_eqs_set(devlink_port,
+							 max_io_eqs, extack);
+}
+
 static int
 devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *port,
 				   struct netlink_ext_ack *extack)

@@ -428,6 +465,9 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
 	if (err)
 		goto out;
 	err = devlink_port_fn_state_fill(port, msg, extack, &msg_updated);
+	if (err)
+		goto out;
+	err = devlink_port_fn_max_io_eqs_fill(port, msg, extack, &msg_updated);
 	if (err)
 		goto out;
 	err = devlink_rel_devlink_handle_put(msg, port->devlink,

@@ -726,6 +766,12 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
 			}
 		}
 	}
+	if (tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS] &&
+	    !ops->port_fn_max_io_eqs_set) {
+		NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS],
+				    "Function does not support max_io_eqs setting");
+		return -EOPNOTSUPP;
+	}
 	return 0;
 }

@@ -761,6 +807,13 @@ static int devlink_port_function_set(struct devlink_port *port,
 		return err;
 	}

+	attr = tb[DEVLINK_PORT_FN_ATTR_MAX_IO_EQS];
+	if (attr) {
+		err = devlink_port_fn_max_io_eqs_set(port, attr, extack);
+		if (err)
+			return err;
+	}
+
 	/* Keep this as the last function attribute set, so that when
 	 * multiple port function attributes are set along with state,
 	 * Those can be applied first before activating the state.
