Conversation

@frankie567 (Collaborator) commented on Sep 13, 2022:

Description

The goal of these changes is to implement an abstraction layer for querying the database, so that we always get proper Pydantic models to return from the API.

Basically, we have a BaseRepository class containing the common, generic logic for querying the database. Notice that this class expects the Pydantic model class and the name of the index as class variables.
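
For illustration, here is a minimal sketch of what such a class can look like, assuming the opensearch-py client (the exact method set and signatures in the PR may differ):

from typing import Any, ClassVar, Generic, Optional, TypeVar

from opensearchpy import OpenSearch
from opensearchpy.exceptions import NotFoundError
from pydantic import BaseModel

M = TypeVar("M", bound=BaseModel)


class BaseRepository(Generic[M]):
    # Subclasses provide the Pydantic model class and the index name.
    model_class: ClassVar[type[BaseModel]]
    index: ClassVar[str]

    def __init__(self, client: OpenSearch) -> None:
        self.client = client

    def query(self, query: dict[str, Any], size: int = 10) -> list[M]:
        # Run an arbitrary OpenSearch query and return parsed Pydantic objects.
        response = self.client.search(index=self.index, body=query, size=size)
        return [
            self._get_object_from_dict(result["_source"])
            for result in response["hits"]["hits"]
        ]

    def get(self, key: str) -> Optional[M]:
        # Fetch a single document by its key, or None if it doesn't exist.
        try:
            result = self.client.get(index=self.index, id=key)
        except NotFoundError:
            return None
        return self._get_object_from_dict(result["_source"])

    def _get_object_from_dict(self, d: dict[str, Any]) -> M:
        return self.model_class.parse_obj(d)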

For each model, we'll have a dedicated repository extending BaseRepository. To prove the concept, I currently just implemented DatasourceRepository.

With this pattern, it's easy to add specific queries or operations we want to reuse, like get_by_name in this example. This way, we avoid having OpenSearch queries leak into every part of the codebase: everything stays in the repository class.
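
Building on the sketch above, a dedicated repository could look like this (the index name and the queried field are illustrative assumptions):

from typing import Optional


class DatasourceRepository(BaseRepository[Datasource]):
    model_class = Datasource
    index = "datasource"  # illustrative index name

    def get_by_name(self, name: str) -> Optional[Datasource]:
        # The OpenSearch query lives here, not in the endpoint code.
        results = self.query(
            {"query": {"term": {"datasource_name.keyword": name}}},
            size=1,
        )
        return results[0] if results else None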

To instantiate those repositories, we define callable dependencies for FastAPI, like get_datasource_repository. It's a good pattern that may help us in the long run, especially if we want to write unit tests. For now, the underlying OpenSearch client is hard-wired, but it could also be made a dependency for convenience.
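
A sketch of such a dependency (the hard-wired client shown here stands in for however the app actually builds it):

from opensearchpy import OpenSearch

client = OpenSearch(hosts=["http://localhost:9200"])  # hard-wired for now


def get_datasource_repository() -> DatasourceRepository:
    return DatasourceRepository(client)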

Finally, we can use it in our API endpoints. By injecting the repository in the datasource endpoints, we are able to directly query the DB and get proper Pydantic objects.

For convenience, I've also implemented a generic shortcut get_by_key_or_404, which can get an object by key or automatically raise a 404 if not found.
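
A sketch of that helper and of an endpoint using the injected repository (the route path and signatures are illustrative; router is an existing APIRouter):

from fastapi import Depends, HTTPException, status


class BaseRepository(Generic[M]):
    # ...continuing the sketch from above
    def get_by_key_or_404(self, key: str) -> M:
        obj = self.get(key)
        if obj is None:
            raise HTTPException(status_code=status.HTTP_404_NOT_FOUND)
        return obj


@router.get("/{datasource_key}")
def get_datasource(
    datasource_key: str,
    repository: DatasourceRepository = Depends(get_datasource_repository),
):
    return repository.get_by_key_or_404(datasource_key)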

Please let me know what you think about this before I implement this pattern for the other models 😄

Type of change

  • Refactoring

Checklist:

  • I have performed a self-review of my own code
  • All GitHub workflows have passed
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

On Sep 16, 2022, @frankie567 changed the title from "[WIP] Implement database abstraction" to "Implement database abstraction".
@frankie567 (Collaborator, Author) commented:

So, it turns out this became quite a big refactoring! Here is a summary of what I did:

  • Implementation of a Repository pattern, with base methods to query, create, update and delete data in the DB
  • For the Datasource, Dataset and Expectation models, I implemented dedicated repositories, adding specific methods where needed, so that all queries stay in one place.
  • In the Datasource, Dataset and Expectation endpoints, I removed all direct use of the OpenSearch client wherever possible, in favor of the repository helpers

I also took this opportunity to improve the structure of the Pydantic models:

  • Implementation of a KeyModel mixin: models inheriting from this one will get a key property. If not provided, a UUID4 is automatically generated (see the sketch after this list).
  • Implementation of a CreateUpdateDateModel mixin: models inheriting from this one will get create_date and modified_date properties. Both will be automatically assigned to the current time if not provided.
    • The repository takes care of updating modified_date automatically during update.
  • Implementation of create and update model variations for Datasource, Dataset and Expectation. This allows us to better control the fields the user can and can't set, in particular the automatic ones like key, create_date and modified_date.
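
A rough sketch of those mixins (the defaults are my reading of the behavior described above, not necessarily the exact implementation):

import uuid
from datetime import datetime

from pydantic import BaseModel, Field


class KeyModel(BaseModel):
    # A UUID4 key is generated automatically when none is provided.
    key: str = Field(default_factory=lambda: str(uuid.uuid4()))


class CreateUpdateDateModel(BaseModel):
    # Both dates default to the current time; the repository bumps
    # modified_date automatically on every update.
    create_date: datetime = Field(default_factory=datetime.utcnow)
    modified_date: datetime = Field(default_factory=datetime.utcnow)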

Admittedly, this is quite a big PR. I've tested the changes as much as I could and noticed no breaking changes. Waiting for your feedback on this :)

@KentonParton (Contributor) left a comment:

The changes look great! Significantly cleaner, and this will make for a much easier implementation of a client when we get to it.

Left some comments and suggested updates.



@router.post("", response_model=Datasource)
@router.post("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@router.post("")
@router.post("", response_model=Datasource)

@@ -41,52 +40,22 @@ def list_supported_expectations():
     return JSONResponse(status_code=status.HTTP_200_OK, content=content)


-@router.put("/{expectation_id}/enable", response_model=Expectation)
+@router.put("/{expectation_id}/enable")
@KentonParton (Contributor) commented:
Are you wanting to add response models for Expectation in another update?

@frankie567 (Collaborator, Author) replied:

The problem is the same as stated above for Datasource: since we actually have objects inheriting from Expectation, we'll lose all their specific fields if we set the response_model, the output will be "down-casted" to a base Expectation.
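
To illustrate the issue with a contrived example (not the actual Swiple models): FastAPI filters the returned object through response_model, so subclass-specific fields are silently dropped.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Expectation(BaseModel):
    expectation_type: str


class ExpectColumnToExist(Expectation):
    column: str


@app.get("/demo", response_model=Expectation)
def demo():
    # The response is serialized as a plain Expectation,
    # so `column` never reaches the client.
    return ExpectColumnToExist(
        expectation_type="expect_column_to_exist", column="passenger_id"
    )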

On Sep 18, 2022, @KentonParton added the improvement (Improvement to application) label.
@frankie567 (Collaborator, Author) commented:

@KentonParton In 18f8f89, I implemented the discriminator approach we talked about. In the end, things went quite well.

There is one small thing, however, which is surprising but actually works very well: the model we use for annotations is actually the Union of all classes, not the discriminator model. This has several benefits:

  1. OpenAPI schema works
  2. Type annotation works well, with proper type hinting and auto-completion from the IDE

Hence, I named the discriminator model ExpectationInput and the union Expectation. When working with an ExpectationInput, the code takes care of returning its __root__.
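
A sketch of the pattern (the two concrete expectation classes are stand-ins; discriminated unions require Pydantic 1.9+):

from typing import Literal, Union

from pydantic import BaseModel, Field


class ExpectColumnToExist(BaseModel):
    expectation_type: Literal["expect_column_to_exist"] = "expect_column_to_exist"
    column: str


class ExpectTableRowCountToEqual(BaseModel):
    expectation_type: Literal["expect_table_row_count_to_equal"] = "expect_table_row_count_to_equal"
    value: int


# The union: used for type annotations, so OpenAPI and IDEs see all variants.
Expectation = Union[ExpectColumnToExist, ExpectTableRowCountToEqual]


class ExpectationInput(BaseModel):
    # The discriminator model: parsing picks the concrete class,
    # which is then available as `.__root__`.
    __root__: Expectation = Field(..., discriminator="expectation_type")


concrete = ExpectationInput.parse_obj(
    {"expectation_type": "expect_column_to_exist", "column": "passenger_id"}
).__root__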


We are also able to get rid of type_map, since we can list the available classes directly from the Union type:

for expectation in get_args(Expectation):
    ...


Another small refinement I made is to modify the schema so we have the actual value of expectation_type at hand, instead of manually fetching the first value of the enum. It helps us both in the backend and in the UI:

def schema_extra(schema: dict[str, Any], model: type['ExpectationBase']) -> None:
    expectation_type_schema = schema.get('properties', {}).get("expectation_type")
    if expectation_type_schema is not None:
        expectation_type_schema["value"] = expectation_type_schema["enum"][0]
        schema["properties"]["expectation_type"] = expectation_type_schema

On the backend side:

expectation_type = json_schema['properties']['expectation_type']['value']

And in the UI:

const transformExpectationsPayload = (payload) => {
  const cleanedPayload = clean(payload);
  const expectation = expectationsJsonSchema.filter((item) => (
    item.properties.expectation_type.value === payload.expectation_type))[0].properties;
  delete cleanedPayload.expectation_type;
  return {
    datasource_id: dataset.datasource_id,
    dataset_id: dataset.key,
    expectation_type: expectation.expectation_type.value,
    kwargs: {
      ...cleanedPayload,
    },
  };
};


Let me know what you think about it, and we can go forward with the same approach for Datasource.

@KentonParton (Contributor) commented:

This is looking great @frankie567 💪

I like that:

  1. we are able to get rid of the type_map
  2. we have input and response models
  3. code is much cleaner

Let's do the same for datasource 👍

P.S. I am getting a model validation error for GET /datasets

swiple_api             |   File "/code/./app/api/api_v1/endpoints/dataset.py", line 55, in list_datasets
swiple_api             |     return repository.query(query, size=1000)
swiple_api             |   File "/code/./app/repositories/base.py", line 27, in query
swiple_api             |     return [
swiple_api             |   File "/code/./app/repositories/base.py", line 28, in <listcomp>
swiple_api             |     self._get_object_from_dict(result["_source"]) for result in results
swiple_api             |   File "/code/./app/repositories/base.py", line 90, in _get_object_from_dict
swiple_api             |     return self.model_class.parse_obj(d)
swiple_api             |   File "/usr/local/lib/python3.9/site-packages/pydantic/main.py", line 521, in parse_obj
swiple_api             |     return cls(**obj)
swiple_api             |   File "/usr/local/lib/python3.9/site-packages/pydantic/main.py", line 341, in __init__
swiple_api             |     raise validation_error
swiple_api             | pydantic.error_wrappers.ValidationError: 1 validation error for Dataset
swiple_api             | key
swiple_api             |   none is not an allowed value (type=type_error.none.not_allowed)

@frankie567 (Collaborator, Author) replied:

> P.S. I am getting a model validation error for GET /datasets

Fixed!

So, here we are: Datasource is also ported to the discriminator approach. It looks much, much cleaner! OpenAPI is working well with proper annotations.

@KentonParton (Contributor) commented:

Nice work, LGTM!

@KentonParton merged commit e8c7c51 into main on Sep 22, 2022.
@KentonParton deleted the db-abstraction branch on Sep 22, 2022 at 10:46.
Labels: improvement (Improvement to application)
Projects: Status: Done