Description
Is your feature request related to a problem? Please describe.
Create a report that compares dataframes. The report would be similar to sweetviz and to our create_report function.
Describe the solution you'd like
The API is similar to create_report and is as follows:
create_diff_report(
    dfs: Union[List[DataFrame], Dict[str, DataFrame]],
    config: Optional[Dict[str, Any]] = None,
    display: Optional[List[str]] = None,
    title: Optional[str] = "DataFrame Difference Report by DataPrep",
    mode: Optional[str] = "basic",
    progress: bool = True,
)
The dfs argument is a list of dataframes or a dict of dataframes. E.g., the user can call create_diff_report([df1, df2]) or create_diff_report({'train': df1, 'test': df2}). In the former case the dataframes are named 'df1', 'df2'. In the latter case the key is the name of the dataframe.
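The naming rule above could be sketched roughly as follows. This is only an illustration: normalize_dfs is a hypothetical helper name, not an existing DataPrep function, and the check for at least two dataframes is an assumption. The values are typed as Any so the sketch runs without pandas; in the real function they would be DataFrames.

```python
from typing import Any, Dict, List, Union


def normalize_dfs(dfs: Union[List[Any], Dict[str, Any]]) -> Dict[str, Any]:
    """Return a name -> dataframe mapping (hypothetical helper).

    A list gets the default names 'df1', 'df2', ...; a dict keeps its keys.
    """
    if isinstance(dfs, dict):
        named = dict(dfs)  # the key is the name of the dataframe
    elif isinstance(dfs, (list, tuple)):
        named = {f"df{i}": df for i, df in enumerate(dfs, start=1)}
    else:
        raise TypeError("dfs must be a list or a dict of dataframes")
    # Assumption: a difference report needs at least two dataframes.
    if len(named) < 2:
        raise ValueError("at least two dataframes are needed for a comparison")
    return named
```

With this, create_diff_report could work on a single name -> dataframe mapping internally, regardless of which form the user passed.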
The layout of the report is similar to that of create_report. It has the following sections:
1. Overview. The overview section is like the overview in create_report. The content comes from plot_diff([df1, df2]), as shown in the following figure.
2. Variables
The layout is similar to the Variables section in create_report, or
The differences are:
- For the content, we need to change the single-dataframe statistics to multi-dataframe statistics. The layout is like what we did in plot_diff([df1, df2], x):
- For the figure, we need to change it to a distribution comparison, e.g., a histogram comparison for a numerical column and a bar chart comparison for a categorical column. The following figures show the histogram comparison and the bar chart comparison:
- In the show details button, we change each tab to its multi-dataframe version.
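To make the distribution comparison concrete, here is a minimal sketch of the data that could sit behind the two comparison figures. It is not how plot_diff is implemented; shared_histograms and category_counts are hypothetical names, and the sketch uses plain Python sequences instead of DataFrame columns so it runs without extra dependencies. The point is that all dataframes are binned on the same edges (numerical) or aligned on the union of categories (categorical), so the bars are directly comparable.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def shared_histograms(
    columns: Dict[str, Sequence[float]], bins: int = 10
) -> Tuple[List[float], Dict[str, List[int]]]:
    """Bin one numerical column from several dataframes on common edges."""
    all_values = [v for values in columns.values() for v in values]
    lo, hi = min(all_values), max(all_values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    edges = [lo + i * width for i in range(bins + 1)]
    counts: Dict[str, List[int]] = {}
    for name, values in columns.items():
        hist = [0] * bins
        for v in values:
            # Clamp so the maximum value falls into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            hist[idx] += 1
        counts[name] = hist
    return edges, counts


def category_counts(
    columns: Dict[str, Sequence[str]]
) -> Dict[str, Dict[str, int]]:
    """Count each category per dataframe, aligned on the union of categories."""
    per_df = {name: Counter(values) for name, values in columns.items()}
    categories = sorted(set().union(*per_df.values()))
    return {cat: {name: per_df[name][cat] for name in per_df} for cat in categories}


# Example: the same column taken from a 'train' and a 'test' dataframe.
edges, counts = shared_histograms(
    {"train": [1, 2, 2, 3], "test": [2, 3, 4, 5]}, bins=4
)
# counts == {"train": [1, 2, 1, 0], "test": [0, 1, 1, 2]}
cats = category_counts({"train": ["a", "b", "a"], "test": ["b", "c"]})
# cats["a"] == {"train": 2, "test": 0}
```

Each key of the returned dicts would correspond to one dataframe name, so the same structure works for two dataframes or more.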
3. ...To be continued