[uss_qualifier] Standardize resource file loading #860


Merged

Conversation

BenjaminPelletier (Member):

This PR standardizes how data files are loaded in uss_qualifier. Previously, there were a number of different places that loaded YAML files or JSON files or both, sometimes supporting web links, sometimes not. This PR uses a single library (fileio.py) to load data files wherever practical within uss_qualifier and it supports:

  • JSON or YAML
  • Local files, web files, or package-based files
  • $ref and allOf similar to OpenAPI so that parts of other files can be referenced in the file of interest
  • All combinations of the above

This PR also makes all package paths relative to uss_qualifier -- previously, test suites were relative to uss_qualifier.suites, configurations relative to uss_qualifier.configurations, resources relative to uss_qualifier.resources, etc. This reduces the need for context to understand what configuration files, test suite definitions, etc are declaring, and makes reuse of standard loading procedures easier.

barroco (Contributor) left a comment:

Please see my comments before merging.


if file_name is None:
    raise NotImplementedError(
        f"Cannot load find a suitable file to load for {data_file}"
    )
Contributor:

Typo: Cannot --load--

BenjaminPelletier (Member Author):

Thanks, done.



def _drop_allof(content: dict) -> None:
    if "allOf" in content:
barroco (Contributor):

Why do we need to drop allOf keys? Seems like the intention is to flatten it in case of multiple $ref children, however, it seems that the fileio library may be used for files which will include allOf keys with other objects.

BenjaminPelletier (Member Author):

My understanding of how OpenAPI references work is:

Say we have some files:

a.json

{"foo": 123}

b.json

{"bar": 456}

c.json

{"baz": 789, "$ref": "a.json"}

The $ref in OpenAPI "works by replacing itself and everything on its level with the definition it is pointing at"[1], but that replacement behavior is actually a really bad idea in certain cases, so this implementation instead works by removing itself and overwriting all keys on its level with the keys in the definition it is pointing at. So, strict OpenAPI would probably render c.json as {"foo": 123} (assuming $ref were allowed at that location), but this more forgiving approach renders c.json as {"baz": 789, "foo": 123}. Here, $ref lets us combine local content with referenced content. But what if we wanted to add the content of both a.json and b.json to c.json? Consider:

d1.json

{"baz": 789, "$ref": "a.json", "$ref": "b.json"}

It turns out RFC 8259 doesn't actually prohibit this, but ECMA-262 says the first-level parser will see {"baz": 789, "$ref": "b.json"} because last value wins there. So, this isn't a good way to use the content of both a.json and b.json in c.json. This is, in my understanding, the main purpose of allOf. With allOf, we can rewrite d1.json:

d2.json

{"baz": 789, "allOf": [{"$ref": "a.json"}, {"$ref": "b.json"}]}

When resolving d2.json, first we can evaluate the $refs:

d2.json, $refs resolved

{"baz": 789, "allOf": [{"foo": 123}, {"bar": 456}]}

The purpose of _drop_allof is to apply the multiple schemas in allOf to the parent schema (to "drop" them a level down to the parent). Dropping the two allOf schemas in the $ref-resolved d2.json above results in:

d2.json, allOfs dropped

{"baz": 789, "foo": 123, "bar": 456}
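The resolution walked through above can be sketched as follows. This is a simplified, in-memory illustration (dict values only, refs looked up in a table rather than loaded from files), not the actual fileio.py implementation:

```python
# Simplified sketch of $ref/allOf resolution as described above; the real
# implementation also fetches local, web, and package-based files.
def resolve(content: dict, documents: dict) -> dict:
    """Resolve $ref and allOf in content, looking refs up in documents."""
    result = {}
    for key, value in content.items():
        if key in ("$ref", "allOf"):
            continue
        result[key] = resolve(value, documents) if isinstance(value, dict) else value
    if "$ref" in content and isinstance(content["$ref"], str):
        # $ref removes itself and overwrites same-level keys with the
        # keys of the referenced definition
        result.update(resolve(documents[content["$ref"]], documents))
    if "allOf" in content and isinstance(content["allOf"], list):
        # allOf "drops" each element's keys down into the parent
        for element in content["allOf"]:
            result.update(resolve(element, documents))
    return result

documents = {"a.json": {"foo": 123}, "b.json": {"bar": 456}}
c = {"baz": 789, "$ref": "a.json"}
d2 = {"baz": 789, "allOf": [{"$ref": "a.json"}, {"$ref": "b.json"}]}
print(resolve(c, documents))   # {'baz': 789, 'foo': 123}
print(resolve(d2, documents))  # {'baz': 789, 'foo': 123, 'bar': 456}
```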

The reason allOf does this relates to the fact that OpenAPI is based on JSON Schema which is designed merely to validate incoming data, not to express a data format. This lineage is most apparent in the "description at the same level as $ref" issue in OpenAPI -- from JSON Schema's perspective, that description is irrelevant because incoming data content is validated against the criteria in the $ref content and a usage of a data type is equivalent to the data type itself (saying "this field is an Address" is the same thing as saying "this field is Address").

allOf means "the incoming data object must validate against all of the specified conditions", so JSON Schema is expecting something like "allOf": [{"$ref": "#/HasValidPhoneNumber"}, {"$ref": "#/HasValidAddress"}]. The contents of HasValidPhoneNumber are probably something like "the field phone_number is a string with between 7 and 12 characters", so we often name that set of validation criteria something like PhoneNumber rather than HasValidPhoneNumber. So, the expectation would actually be written more like "allOf": [{"$ref": "#/PhoneNumber"}, {"$ref": "#/Address"}].

With that in mind, here's an example OpenAPI spec:

openapi.json

{
  "openapi": "3.1.0",
  "info": {
    "title": "Validator inheritance demo",
    "version": "1.0.0"
  },
  "components": {
    "schemas": {
      "Address": {
        "type": "object",
        "required": ["address"],
        "properties": {
          "address": {"type": "string"}
        }
      },
      "PhoneNumber": {
        "type": "object",
        "required": ["phone_number"],
        "properties": {
          "phone_number": {"type": "string"}
        }
      },
      "CustomerByRef": {
        "allOf": [
          {"$ref": "#/components/schemas/Address"},
          {"$ref": "#/components/schemas/PhoneNumber"}
        ]
      },
      "CustomerByValue": {
        "type": "object",
        "required": ["address", "phone_number"],
        "properties": {
          "address": {"type": "string"},
          "phone_number": {"type": "string"}
        }
      }
    }
  },
  "paths": {
    "/by_ref": { "get": { "responses": { "200": {
            "content": {
              "application/json": {
                "schema": {"$ref": "#/components/schemas/CustomerByRef"}
              }
            },
            "description": "N/A"
    } } } },
    "/by_value": { "get": { "responses": { "200": {
            "content": {
              "application/json": {
                "schema": {"$ref": "#/components/schemas/CustomerByValue"}
              }
            },
            "description": "N/A"
    } } } }
  }
}

In openapi.json, the return format for both paths is the same -- the schema components referenced by allOf get combined into a single schema that is the same as adding the content of each of the $refs. The implementation in this PR will not actually load OpenAPI specs correctly -- for instance, the required properties need to be the sum of both $ref'd schemas' required properties. But, it should produce the same spiritual result: allOf combines the content of its elements into its parent. It wouldn't make sense to correctly load OpenAPI in load_dict because load_dict is for loading concrete content whereas OpenAPI describes content formats.

All I really want here is the ability to inherit content from other files/locations. Since neither JSON nor YAML has a standard means to do this, I tried to use an approach as similar as possible to something repo users are most likely to already be familiar with. So, this is a custom adaptation of the parts of the OpenAPI spec relating to content inheritance, applied to concrete data objects, so some concepts like allOf have to be adapted to fit the use case.

barroco (Contributor):

Thanks for the context. I was actually referring to the fact that all objects with allOf keys (even those without a $ref) will be adapted. Since the library is called fileio, as a user of this method I would not expect the content of the file to be adapted. May I suggest either renaming the method to something like load_normalized_dict/load_flatten_dict or, as you proposed, limiting drops to allOf objects containing $refs?

BenjaminPelletier (Member Author):

Sounds good -- I've made changes to:

  1. Only perform reference-import operations on "$ref": <str> entries (e.g., "$ref": 123 would be ignored)
  2. Only do anything with allOf entries when the value is a list of objects, each with a properly-formed $ref in it
  3. Rename load_dict to load_dict_with_references (indicating the special functionality above)

@BenjaminPelletier BenjaminPelletier merged commit 21a225f into interuss:master Oct 26, 2022
@BenjaminPelletier BenjaminPelletier deleted the uss_qualifier/data-files branch October 26, 2022 19:53