EDA: Report for Comparing DataFrames (create_diff_report) #702

**Is your feature request related to a problem? Please describe.**
Create a report that compares dataframes, similar to [sweetviz](https://github.com/fbdesignpro/sweetviz) and our existing `create_report` function.

**Describe the solution you'd like**
The API is similar to `create_report` and is as follows:
```Python
create_diff_report(
    dfs: Union[List[DataFrame], Dict[str, DataFrame]],
    config: Optional[Dict[str, Any]] = None,
    display: Optional[List[str]] = None,
    title: Optional[str] = "DataFrame Difference Report by DataPrep",
    mode: Optional[str] = "basic",
    progress: bool = True,
)
```

`dfs` is either a list or a dict of dataframes. For example, a user can call `create_diff_report([df1, df2])` or `create_diff_report({'train': df1, 'test': df2})`. In the former case, the dataframes are named 'df1', 'df2', and so on. In the latter case, each dict key is used as the name of the corresponding dataframe.
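A minimal sketch of how `dfs` could be normalized into a single `{name: DataFrame}` mapping before the report is built. The helper `normalize_dfs` is hypothetical; it assumes list inputs are named by position ("df1", "df2", ...) and that a comparison needs at least two dataframes, neither of which is confirmed API behavior.

```python
from typing import Dict, List, Union

import pandas as pd


def normalize_dfs(
    dfs: Union[List[pd.DataFrame], Dict[str, pd.DataFrame]],
) -> Dict[str, pd.DataFrame]:
    """Map the `dfs` argument to an ordered {name: DataFrame} dict (hypothetical helper)."""
    if isinstance(dfs, dict):
        # Dict input: each key is used as the dataframe's name.
        named = dict(dfs)
    elif isinstance(dfs, (list, tuple)):
        # List input: names are generated by position: "df1", "df2", ...
        named = {f"df{i}": df for i, df in enumerate(dfs, start=1)}
    else:
        raise TypeError("dfs must be a list or a dict of DataFrames")
    # Assumption: a comparison report needs at least two dataframes.
    if len(named) < 2:
        raise ValueError("create_diff_report needs at least two dataframes")
    return named
```

With this sketch, `normalize_dfs([df1, df2])` yields the keys `df1` and `df2`, while `normalize_dfs({'train': df1, 'test': df2})` keeps `train` and `test`, so every later section can work with one uniform mapping.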

The report layout is similar to that of `create_report`, with the following sections:

**1. Overview.** The overview section resembles the one in `create_report`; its content comes from `plot_diff([df1, df2])`, as shown in the following figure.
![image](https://user-images.githubusercontent.com/18078770/136629631-11c636ca-6fca-4aa3-bbfd-4effa2aed818.png)
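As a rough illustration of a side-by-side overview, the sketch below builds one column of dataset-level statistics per dataframe using plain pandas. The helper `overview_table` and the exact set of statistics are assumptions for illustration; the actual `plot_diff` overview may show different or additional statistics. The input is a `{name: DataFrame}` dict whose names follow the `dfs` naming rule described above.

```python
import pandas as pd


def overview_table(named: dict) -> pd.DataFrame:
    """Hypothetical sketch: one column of dataset-level stats per dataframe."""
    stats = {}
    for name, df in named.items():
        n_rows, n_cols = df.shape
        n_cells = n_rows * n_cols
        missing = int(df.isna().sum().sum())
        stats[name] = {
            "Number of Variables": n_cols,
            "Number of Rows": n_rows,
            "Missing Cells": missing,
            # Guard against empty dataframes when computing the percentage.
            "Missing Cells (%)": 100.0 * missing / n_cells if n_cells else 0.0,
            "Duplicate Rows": int(df.duplicated().sum()),
        }
    # Rows are statistics; columns are dataframe names, in input order.
    return pd.DataFrame(stats)
```

Keeping one column per dataframe makes the table extend naturally to more than two dataframes.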

**2. Variables**
The layout is similar to the Variables section in `create_report`, shown below:
![image](https://user-images.githubusercontent.com/18078770/136630383-e4d2a235-c99a-4665-a43b-c1a9ef6245c3.png)
The differences are:
  1) The statistics table shows per-dataframe statistics side by side instead of single-dataframe statistics, following the layout of `plot_diff([df1, df2], x)`:
![image](https://user-images.githubusercontent.com/18078770/136630545-1c218c0b-c5fe-4291-b5ba-a591e1f676e5.png)
  2) The figure shows a distribution comparison instead of a single distribution, e.g., a histogram comparison for numerical columns and a bar chart comparison for categorical columns, as in the following figure:
![image](https://user-images.githubusercontent.com/18078770/136630882-3f18f711-e019-4e3d-8a2b-cf4a4a05977e.png)
  3) Each tab behind the `show details` button is replaced by its multiple-dataframe version.
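The sketch below illustrates the data behind the per-variable content using pandas and NumPy: side-by-side statistics for one column, histogram counts over bin edges shared by all dataframes, and category frequencies aligned on the union of categories. The helpers `variable_stats`, `hist_comparison`, and `bar_comparison`, and the chosen statistics, are hypothetical and not DataPrep's implementation; plotting is left out. Each helper takes a `{name: DataFrame}` dict and a column name.

```python
from typing import Dict, Tuple

import numpy as np
import pandas as pd


def variable_stats(named: Dict[str, pd.DataFrame], col: str) -> pd.DataFrame:
    """Hypothetical sketch: per-dataframe statistics for one column, side by side."""
    stats = {}
    for name, df in named.items():
        s = df[col]
        stats[name] = {
            "Distinct": int(s.nunique(dropna=True)),
            "Missing": int(s.isna().sum()),
            "Count": int(s.notna().sum()),
        }
    return pd.DataFrame(stats)


def hist_comparison(
    named: Dict[str, pd.DataFrame], col: str, bins: int = 10
) -> Tuple[np.ndarray, Dict[str, np.ndarray]]:
    """Counts per bin for a numerical column, using bin edges shared by all dataframes."""
    values = [df[col].dropna().to_numpy(dtype=float) for df in named.values()]
    # Shared edges computed over all dataframes so the bars line up.
    edges = np.histogram_bin_edges(np.concatenate(values), bins=bins)
    counts = {name: np.histogram(v, bins=edges)[0] for name, v in zip(named, values)}
    return edges, counts


def bar_comparison(named: Dict[str, pd.DataFrame], col: str) -> pd.DataFrame:
    """Category frequencies per dataframe, aligned on the union of categories."""
    counts = {name: df[col].value_counts() for name, df in named.items()}
    # A category absent from a dataframe gets a count of 0.
    return pd.DataFrame(counts).fillna(0).astype(int)
```

Sharing bin edges is what makes the histograms comparable: calling `np.histogram` separately on each dataframe would give each its own bins, and the bars would no longer line up.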


**3. ...To be continued**

