Skip to content

Allow list of dictionaries for @pytest.mark.parametrize #7568

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dbalabka opened this issue Jul 29, 2020 · 18 comments
Closed

Allow list of dictionaries for @pytest.mark.parametrize #7568

dbalabka opened this issue Jul 29, 2020 · 18 comments
Labels
topic: parametrize related to @pytest.mark.parametrize type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature

Comments

@dbalabka
Copy link

Right now it is only allowed list of tuples to pass into @pytest.mark.parametrize decorator:

import pytest

from datetime import datetime, timedelta

testdata = [
    (datetime(2001, 12, 12), datetime(2001, 12, 11), timedelta(1)),
    (datetime(2001, 12, 11), datetime(2001, 12, 12), timedelta(-1)),
]


@pytest.mark.parametrize("a,b,expected", testdata)
def test_timedistance_v0(a, b, expected):
    diff = a - b
    assert diff == expected

The large test data is complicated to manage testdata and it became less readable. To overcome this issue I proposing pass testdata as a list of dictionaries and keep arguments names only in testdata:

import pytest

from datetime import datetime, timedelta

testdata = [
    {
      'a': datetime(2001, 12, 12),
      'b': datetime(2001, 12, 11), 
      'expected': timedelta(1),
    },
    {
      'a': datetime(2001, 12, 11),
      'b': datetime(2001, 12, 12), 
      'expected': timedelta(-1),
    },
]


@pytest.mark.parametrize(testdata)
def test_timedistance_v0(a, b, expected):
    diff = a - b
    assert diff == expected
@The-Compiler
Copy link
Member

The-Compiler commented Jul 29, 2020

I'd love to have something like this, but the problem is that there isn't really an agreement what dictionaries mean exactly when passing them to parametrize - for a different take (which ended up being rejected) and some previous discussion, see #5487 and #5850.

I was wondering whether it'd make more sense for pytest.param to accept keyword arguments, i.e.:

((datetime(2001, 12, 12), datetime(2001, 12, 11), timedelta(1)),  # already works
pytest.param(datetime(2001, 12, 12), datetime(2001, 12, 11), timedelta(1)),  # already works
pytest.param(a=datetime(2001, 12, 12), b=datetime(2001, 12, 11), expected=timedelta(1)),  # new and seems logical

But the problem with that is that pytest.param already takes id and marks as keyword arguments, so that's kind of problematic as well. Possible solutions:

  • Use dicts like in your example (but that seems too ambiguous)
  • Make pytest.param handle everything mentioned in the pytest.mark.parametrize argument list as arguments (so, if id or marks is listed there, that'd be effectively ignored when passed as keyword argument) - but that's a bit magic and also backwards incompatible
  • Just be okay with the fact that we couldn't pass id and marks as keyword arguments to a test - but that means we could not add any new keyword arguments to param in the future
  • Introduce a new pytest.param_args(...) or whatever, but that seems confusing together with pytest.param

@Zac-HD Zac-HD added topic: parametrize related to @pytest.mark.parametrize type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature labels Jul 29, 2020
@dbalabka
Copy link
Author

@The-Compiler thanks for a quick reply. As I understood, previous discussion (#5487 and #5850) are mostly about test cases' descriptions/ids, which are also required to improve the developer experience (DX).

I will add more explanation to what I have written above. Probably, the most frustrating issue at the moment is that the developer should support a list of arguments that duplicates already declared function/method parameters method next to it:

@pytest.mark.parametrize("a,b,expected", testdata)
def test_timedistance_v0(a, b, expected):

It would be a great improvement to avoid writing this by extracting meta-information from the function/method signature. IMO it is worth to pay a small amount of performance and include some "magic" for DX enhancement.

@pytest.mark.parametrize(testdata)
def test_timedistance_v0(a, b, expected):

The proposed approach (allow maintain parameters names in test data as a dictionary) gives a possibility to have readable test data. It would be great to enrich it with test case id as well. So, in the end, we can come up with the following structure:

testdata = {
    'test case 1': {
      'a': datetime(2001, 12, 12),
      'b': datetime(2001, 12, 11), 
      'expected': timedelta(1),
    },
    'test case 2': {
      'a': datetime(2001, 12, 11),
      'b': datetime(2001, 12, 12), 
      'expected': timedelta(-1),
    },
}

Talking about id, pytest.param helps to specify test case id, but I found it too verbose.

Probably, it is possible to utilize @pytest.mark.parametrize and implement dictionary support if only the first parameter is passed:

@pytest.mark.parametrize(testdata)

@The-Compiler
Copy link
Member

Sometimes, explicit is better than implicit 😉 Arguments can also refer to fixtures, so having an explicit list of arguments to parametrize helps to get e.g. sensible error messages when there is a typo.

I also don't think it makes sense to have yet another "magic" data structure API for how to use parametrize, there are already various things in there (specifying a list of items for a single argument vs. a list of tuples for multiple, etc. etc.). This gets confusing fast.

Case in point: In your example above, you'd now always have to explicitly specify test IDs, which can get cumbersome fast. Perhaps you want something like pytest-cases though?

So, I'd still like to explore a more explicit solution involving pytest.param, but I'm -1 on any "dicts have some special meaning and then there's a lot of magic around it" solution because different people expect wildly different behaviors in that case, and I doubt it'd be good for pytest to guess what the user meant.

@graingert
Copy link
Member

graingert commented Jul 29, 2020

I actually do this:

import datetime as dt

import attr
import pytest



@attr.dataclass(fronzen=True)
class TimeDistanceParam:
    a: dt.datetime
    b: dt.datetime
    expected: dt.timedelta

    def pytest_id(self):
        return repr(self)  # usually something custom


@pytest.mark.parametrize(
    "p",
    [
        TimeDistanceParam(
            a=dt.datetime(2001, 12, 12),
            b=dt.datetime(2001, 12, 11),
            expected=dt.timedelta(1),
        ),
        TimeDistanceParam(
            a=dt.datetime(2001, 12, 11),
            b=dt.datetime(2001, 12, 12),
            expected=dt.timedelta(-1),
        ),
    ],
    ids=TimeDistanceParam.pytest_id,
)
def test_timedistance_v0(p):
    assert p.a - p.b == p.expected

@graingert
Copy link
Member

something that would help my usacase is a PytestParam abc so I can do:

class PytestParam(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def pytest_marks(self): ...

    @abc.abstractmethod
    def pytest_id(self): ...
@pytest.PytestParam.register
@attr.dataclass(frozen=True)
class TimeDistanceParam:
    a: dt.datetime
    b: dt.datetime

    def pytest_marks():
        ...

    def pytest_id():
        ...

and pytest.mark.parametrize can special case registered instances of PytestParam

@asottile
Copy link
Member

fwiw, I'm still -1 on this as per the previous (duplicate) discussions. parametrize is already type-complicated enough without introducing yet-another-way to do this by accepting mapping types

@graingert
Copy link
Member

graingert commented Jul 29, 2020

I'm -1 on this as well, as it seems fairly easy to do in user code:

import datetime as dt

import pytest


def params(d):
    return pytest.mark.parametrize(
        argnames=(argnames := sorted({k for v in d.values() for k in v.keys()})),
        argvalues=[[v.get(k) for k in argnames] for v in d.values()],
        ids=d.keys(),
    )


@params(
    {
        "test case 1": {
            "a": dt.datetime(2001, 12, 12),
            "b": dt.datetime(2001, 12, 11),
            "expected": dt.timedelta(1),
        },
        "test case 2": {
            "a": dt.datetime(2001, 12, 11),
            "b": dt.datetime(2001, 12, 12),
            "expected": dt.timedelta(-1),
        },
    }
)
def test_timedistance_v0(a, b, expected):
    assert a - b == expected

@dbalabka
Copy link
Author

dbalabka commented Jul 30, 2020

@The-Compiler

Sometimes, explicit is better than implicit 😉

In this case, it is too explicit because as I already said: "...developer should support a list of arguments that duplicates already declared function/method parameters method next to it..."

Arguments can also refer to fixtures, so having an explicit list of arguments to parametrize helps to get e.g. sensible error messages when there is a typo.

Unfortunately, I multiple times forgot to add parameters into this list and got an error that does not have relation to broken code or incorrect test logic. So I can conclude that the duplicated list of parameters does not help in most cases.
Could you please provide some typical developers mistakes that can be caught by an additional list of parameters?

I also don't think it makes sense to have yet another "magic" data structure API for how to use parametrize, there are already various things in there (specifying a list of items for a single argument vs. a list of tuples for multiple, etc. etc.). This gets confusing fast.

The proposed data structure is not an alternative to already existing. It is a logical continuation list of tuples for cases when a developer wants to explicitly provide parameters names and/or ids for test cases. So we are not introducing invariant of usage.

Case in point: In your example above, you'd now always have to explicitly specify test IDs, which can get cumbersome fast. Perhaps you want something like pytest-cases though?

Probably, I didn't mention but I would like to keep BC with the old structure. So the developers would not be required to provide IDs if they don't want to.


Findings

After some investigation of why parameters list is required, I found the place where the real magic lives. It is possible to stack decorators to get all possible combinations:

@pytest.mark.parametrize("x", [0, 1])
@pytest.mark.parametrize("y", [2, 3])
def test_foo(x, y):
    pass

In this case list of parameters is required to avoid invariant of list usage. It is impossible to say is it list parameters or list of test cases w/o explicit list of parameters.

IMO it is unfounded API complication that can be replaced with more readable variant using itertools:

@pytest.mark.parametrize("x,y", itertools.product([0,1], [2,3]))
def test_foo(x, y):
    pass

or single parameter parameterization should live in separate decorator.

Conclusion

Parametrize API already overcomplicated. IMO we should deprecate stacked decorators and use itertools IMO stacked decorators should be implemented by separate decorator to simplify @pytest.mark.parametrize functionality. This will give a possibility to rid off explicitly specified parameters list and avoid invariants in the arguments list. Otherwise proposed structure improvements will increase the complexity of implementation and will not invest much into better developers experience.

@RonnyPfannschmidt
Copy link
Member

Stacked decorator are not the only source of parameterization, any fixture as well as custom plugins can participate

It's a well used feature so your proposal doesn't exactly help/work for the project.

@dbalabka
Copy link
Author

dbalabka commented Jul 30, 2020

I appreciate everyone’s input into this ticket’s discussion. I'm closing this ticket because it is discussed.

@ckp95
Copy link

ckp95 commented Sep 24, 2020

I was directed to here from #7790 since I have a similar problem with readability and maintainability when using pytest.mark.parametrize.

(I hope it's okay to post here even though the thread is closed).

I use a helper function that wraps the pytest.mark.parametrize decorator:

from types import SimpleNamespace

class Case:
    def __init__(self, **kwargs):
        self.label = None
        self.kwargs = kwargs


class LabelledCase:
    def __init__(self, label):
        self.label = label

    def __call__(self, **kwargs):
        case = Case(**kwargs)
        case.label = self.label
        return case


def nicer_parametrize(*args):
    for case in args:
        if not isinstance(case, Case):
            raise TypeError(f"{case!r} is not an instance of Case")

    first_case = next(iter(args))
    first_attrs = first_case.kwargs.keys()
    argument_string = ",".join(sorted(list(first_attrs)))

    case_list = []
    ids_list = []
    for case in args:
        case_dict = case.kwargs
        attrs = case_dict.keys()

        if attrs != first_attrs:
            raise ValueError(
                f"Inconsistent argument signature: {first_case!r}, {case!r}"
            )

        case_tuple = tuple(value for key, value in sorted(list(case_dict.items())))
        case_list.append(case_tuple)
        ids_list.append(case.label)

    return pytest.mark.parametrize(
        argnames=argument_string, argvalues=case_list, ids=ids_list
    )

It's used like this:

@nicer_parametrize(
    LabelledCase("Strategy A")(
        flavour_prices={
            "Vanilla": 1.50,
            "Strawberry": 1.80,
            "Chocolate": 1.80,
            "Caramel": 1.65,
        },
        expected_revenue=1_200_000,
    ),
    LabelledCase("Strategy B")(
        flavour_prices={
            "Vanilla": 1.25,
            "Strawberry": 1.55,
            "Chocolate": 1.65,
            "Caramel": 2.10,
        },
        expected_revenue=1_350_000,
    ),
    # if no label is wanted/needed, just use Case
    Case(
        expected_revenue=2_400_000, # order is irrelevant
        flavour_prices={
            "Vanilla": 1.40,
            "Chocolate": 1.50,
            "Strawberry": 1.85,
            "Caramel": 1.35
        }
    )
)
@nicer_parametrize(
    # we can stack to get the Cartesian product just like mark.parametrize
    LabelledCase("USA")(country="United States"),
    LabelledCase("France")(country="France"),
    LabelledCase("Japan")(country="Japan")
)
def test_ice_cream_projections(flavour_prices, country, expected_revenue):
    ...

Gives:

tests/test_wrapper.py::test_ice_cream_projections[USA-Strategy A] PASSED                    [ 11%] 
tests/test_wrapper.py::test_ice_cream_projections[USA-Strategy B] PASSED                    [ 22%] 
tests/test_wrapper.py::test_ice_cream_projections[USA-2400000-flavour_prices2] PASSED       [ 33%] 
tests/test_wrapper.py::test_ice_cream_projections[France-Strategy A] PASSED                 [ 44%] 
tests/test_wrapper.py::test_ice_cream_projections[France-Strategy B] PASSED                 [ 55%] 
tests/test_wrapper.py::test_ice_cream_projections[France-2400000-flavour_prices2] PASSED    [ 66%] 
tests/test_wrapper.py::test_ice_cream_projections[Japan-Strategy A] PASSED                  [ 77%] 
tests/test_wrapper.py::test_ice_cream_projections[Japan-Strategy B] PASSED                  [ 88%] 
tests/test_wrapper.py::test_ice_cream_projections[Japan-2400000-flavour_prices2] PASSED     [100%]

The advantages of this over plain pytest.mark.parametrize are:

  • One less layer of indentation, fewer lines
  • More readable
  • More explicit
  • Related values (argument names, values, test id) are grouped together, rather than being smeared out over two lists and a string

The advantages over other suggestions in this thread are:

  • Does not force you to specify a test ID, and if you do specify one, it doesn't interfere with argument names (you can have argnames called id or mark or label or whatever)
  • Does not require the user to make a bespoke class for each kind of argument set
  • Does not alter the existing behaviour of pytest.mark.parametrize
  • Can be stacked with other parametrize decorators
  • Keyword arguments feel a bit safer to use than string keys, and are visually consistent with the arguments in the test function definition

Disadvantages:

  • It's a bit weird to specify the test ID in that currying-esque way
  • Probably needs better naming
  • ?

I think something like this would be a nice addition to pytest.

EDIT

Actually I overlooked a simpler way: we can use the positional-only argument feature of Python 3.8 to remove the need for the LabelledCase class:

class Case:
    def __init__(self, label=None, /, **kwargs):
        self.label = label
        self.kwargs = kwargs

That means we can just pass in an optional string at the beginning of the Case initialization and avoid the awkward currying:

Case(
    "Strategy B",
    flavour_prices={
        "Vanilla": 1.25,
        "Strawberry": 1.55,
        "Chocolate": 1.65,
        "Caramel": 2.10,
    },
    expected_revenue=1_350_000,
)

@asottile
Copy link
Member

Make sure you check out pytest.param which cuts out a lot of the boilerplate you've got https://docs.pytest.org/en/stable/example/parametrize.html#different-options-for-test-ids

@Samreay
Copy link

Samreay commented Aug 17, 2022

For anyone still looking for a simple solution, here is mine. In the top level conftest.py (or wherever you want), put this function:

def dict_parametrize(data, **kwargs):
    args = list(list(data.values())[0].keys())
    formatted_data = [[item[a] for a in args] for item in data.values()]
    ids = list(data.keys())
    return pytest.mark.parametrize(args, formatted_data, ids=ids, **kwargs)

And then in your tests you can use it like so:

from conftest import dict_parametrize


@dict_parametrize(
    {
        "these keys become the ids that identify each run": {
            "x": 1,
            "y": 2,
            "expected_value": 3,
        },
        "some_edge_case": {
            "x": 0,
            "y": 0,
            "expected_value": 0,
        },
    }
)
def test_some_func(x, y, expected_value):
    # Just ensure you declare the first dictionary args
    # in the same order as they're passed in.
    assert x + y == expected_value

The caveat for this very simple implementation is that the dictionary in the first scenarios needs to match the order of args. But the readability increase for larger test suites, especially linking the specific parameterization to a given id, is huge.

image

@ckp95
Copy link

ckp95 commented Aug 17, 2022

I also made a small package that does this, https://github.com/ckp95/pytest-parametrize-cases

@gsemet
Copy link

gsemet commented Aug 2, 2023

Pytest-paraterisarion does the job as well. https://pypi.org/project/pytest-parametrization/

Would be good to have this syntax by default, this helps maintaining long list of test cases (having the name of the test case far from the parameter is really a bad design choice from pytest.parametrize

@The-Compiler
Copy link
Member

pytest.param(value, id="name") already works and is about as close to the value as it's going to get.

@blakeNaccarato
Copy link

One way to inject readable parameter names close to their values is to define a NamedTuple and then unpack it into the pytest.param() call. This way you get "keyword arguments" but pytest.param() just sees positional arguments.

import typing

import numpy
import pytest

class Param(typing.NamedTuple):
    y: numpy.typing.ArrayLike
    expected: dict[str, float]


@pytest.mark.slow()
@pytest.mark.parametrize(
    ("y", "expected"),
    [
        pytest.param(
            *Param(y=np.array([...), expected={ T_s": 95.3, ... }),
            id="low",
        )
    ],
)
def test_model_fit(..., y, expected):
    """Test that the model fit is as expected."""
    ...

@samuell
Copy link

samuell commented Jan 10, 2025

I actually do this:

import datetime as dt

import attr
import pytest



@attr.dataclass(fronzen=True)
class TimeDistanceParam:
    a: dt.datetime
    b: dt.datetime
    expected: dt.timedelta

    def pytest_id(self):
        return repr(self)  # usually something custom


@pytest.mark.parametrize(
    "p",
    [
        TimeDistanceParam(
            a=dt.datetime(2001, 12, 12),
            b=dt.datetime(2001, 12, 11),
            expected=dt.timedelta(1),
        ),
        TimeDistanceParam(
            a=dt.datetime(2001, 12, 11),
            b=dt.datetime(2001, 12, 12),
            expected=dt.timedelta(-1),
        ),
    ],
    ids=TimeDistanceParam.pytest_id,
)
def test_timedistance_v0(p):
    assert p.a - p.b == p.expected

Just throwing in here that reading this discussion I realized one can also just do:

@pytest.mark.parametrize(
    "p",
    [
        {
            "a": dt.datetime(2001, 12, 12),
            "b": dt.datetime(2001, 12, 11),
            "expected": dt.timedelta(1),
        },
        {
            "a": dt.datetime(2001, 12, 11),
            "b": dt.datetime(2001, 12, 12),
            "expected": dt.timedelta(-1),
        },
    ],
)
def test_timedistance_v0(p):
    assert p["a"] - p["b"] == p["expected"]

... or some variation thereof.

I actually tend to like to do the following, that works well also for multiple return values:

@pytest.mark.parametrize(
    "p",
    [
        {
            "have": {
                "a": dt.datetime(2001, 12, 12), 
                "b": dt.datetime(2001, 12, 11), 
            },
            "want": {
                "result": dt.timedelta(1),      
            },
        },
        {
            "have": {
                "a": dt.datetime(2001, 12, 11), 
                "b": dt.datetime(2001, 12, 12), 
            },
            "want": {
                "result": dt.timedelta(-1),     
            },
        },
    ],
)
def test_timedistance_v0(p):
    assert p["have"]["a"] - p["have"]["b"] == p["want"]["result"]      

Hadn't thought about that before, so thank you! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: parametrize related to @pytest.mark.parametrize type: proposal proposal for a new feature, often to gather opinions or design the API around the new feature
Projects
None yet
Development

No branches or pull requests