Thanks for the good work!
Could you please provide more insight into how the thresholds are specified for each stage of the data cleaning pipeline? Since these predefined thresholds can have a substantial impact on the final processed dataset, it would be helpful to understand the rationale behind their selection and whether they are task-dependent, empirically tuned, or based on some general statistical criterion.