Releases · Diyago/Tabular-data-generation

29 Mar 15:12

Diyago

v3.2.0

fc970d7

v3.2.0: BayesianGenerator

What's New

BayesianGenerator (Gaussian Copula)

New generator using Gaussian Copula for fast, lightweight synthetic data generation
No neural network training required — works out of the box
Added examples to Colab notebook
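
To illustrate why a Gaussian copula needs no neural network training, here is a minimal standard-library sketch of the general technique. It is not the BayesianGenerator implementation, and every function name in it (normal_scores, cholesky, sample_gaussian_copula) is hypothetical. It assumes purely numeric columns that are not perfectly correlated.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def normal_scores(column):
    """Map a column to standard-normal scores through its ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), as inv_cdf requires.
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores

def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def cholesky(matrix):
    """Lower-triangular factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def sample_gaussian_copula(columns, n_samples, rng):
    """Fit a Gaussian copula to numeric columns and draw synthetic rows."""
    d = len(columns)
    scores = [normal_scores(c) for c in columns]
    corr = [[1.0 if i == j else correlation(scores[i], scores[j])
             for j in range(d)] for i in range(d)]
    lower = cholesky(corr)
    sorted_cols = [sorted(c) for c in columns]
    rows = []
    for _ in range(n_samples):
        z = [rng.gauss(0, 1) for _ in range(d)]
        # Multiplying by the Cholesky factor imposes the fitted correlations.
        correlated = [sum(lower[i][k] * z[k] for k in range(i + 1))
                      for i in range(d)]
        row = []
        for j, value in enumerate(correlated):
            # Map back to the data scale via the empirical quantiles.
            u = STD_NORMAL.cdf(value)
            idx = min(int(u * len(sorted_cols[j])), len(sorted_cols[j]) - 1)
            row.append(sorted_cols[j][idx])
        rows.append(row)
    return rows

# Toy data: two strongly related numeric columns.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [2 * v + rng.gauss(0, 0.5) for v in x]
synthetic = sample_gaussian_copula([x, y], n_samples=500, rng=rng)
```

Fitting amounts to ranking each column and estimating one correlation matrix, which is why this family of generators is fast compared with GAN training. Because this sketch maps samples back through empirical quantiles, it only ever reproduces values seen in the original columns; a real generator would typically interpolate or fit marginal distributions instead.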

AutoSynth & HuggingFace Hub Integration (v3.1.0)

AutoSynth: automatically selects the best generator for your dataset
HuggingFace Hub: push/pull synthetic datasets directly
Blog post with benchmarks and speed comparisons

Other Improvements

Execution timing for all generators and quality reports
HuggingFace Space demo (Gradio app)
Fixed HF Space: disabled SSR, made heavy deps optional
Updated PyPI description

Full Changelog: v3.0.2...v3.2.0

28 Mar 05:38

Diyago

v3.0.1

b0de9ba

TabGAN v3.0.1

What's New

Quality Report (HTML)

Generate self-contained HTML reports comparing original and synthetic data — column statistics, PSI per column, correlation heatmaps, distribution plots, and ML utility scores (TSTR vs TRTR).

from tabgan import QualityReport
report = QualityReport(original_df, synthetic_df, target_col="target").compute()
report.to_html("report.html")
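
For readers unfamiliar with PSI (Population Stability Index), the sketch below shows one common way to compute it for a single numeric column. The binning scheme (equal-width bins over the original column's range) and the function name psi are assumptions made for illustration; QualityReport may bin differently.

```python
import math

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` relative to `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant column: everything falls into the first bin

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the original range go into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e_shares = bin_shares(expected)
    a_shares = bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))

original = [i / 100 for i in range(1000)]
shifted = [v + 3 for v in original]

psi_same = psi(original, original)
psi_shifted = psi(original, shifted)
```

A PSI of zero means the binned distributions coincide, and larger values mean more drift between the original and synthetic column. A widely used rule of thumb treats values below roughly 0.1 as stable and values above roughly 0.25 as a substantial shift.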

Constraints System

Enforce business rules on generated data with four constraint types: RangeConstraint, UniqueConstraint, FormulaConstraint, and RegexConstraint. Constraints are integrated directly into generate_data_pipe().

from tabgan import GANGenerator, RangeConstraint
new_train, _ = GANGenerator().generate_data_pipe(
    train, target, test,
    constraints=[RangeConstraint("age", min_val=0, max_val=120)]
)
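
As a rough illustration of what the four rule kinds mean, the sketch below filters plain dictionaries with hand-written predicates. It does not use the tabgan constraint classes, and the helper names (in_range, matches, keep_valid_rows) are hypothetical. How the library treats violating rows (dropping, clipping, or resampling them) is not stated in these notes, so dropping is only an assumption here.

```python
import re

def in_range(column, min_val, max_val):
    """Range rule: the value must lie inside [min_val, max_val]."""
    return lambda row: min_val <= row[column] <= max_val

def matches(column, pattern):
    """Regex rule: the whole value must match the pattern."""
    compiled = re.compile(pattern)
    return lambda row: compiled.fullmatch(str(row[column])) is not None

def keep_valid_rows(rows, rules, unique_column=None):
    """Drop rows that break any rule or repeat a value in unique_column."""
    seen = set()
    kept = []
    for row in rows:
        if not all(rule(row) for rule in rules):
            continue
        if unique_column is not None:
            if row[unique_column] in seen:
                continue
            seen.add(row[unique_column])
        kept.append(row)
    return kept

rules = [
    in_range("age", 0, 120),
    matches("zip", r"\d{5}"),
    lambda row: row["net"] <= row["gross"],  # formula rule across two columns
]

rows = [
    {"id": 1, "age": 34, "net": 90, "gross": 100, "zip": "12345"},
    {"id": 2, "age": 150, "net": 90, "gross": 100, "zip": "12345"},  # range
    {"id": 3, "age": 20, "net": 120, "gross": 100, "zip": "54321"},  # formula
    {"id": 4, "age": 41, "net": 50, "gross": 80, "zip": "12A45"},    # regex
    {"id": 1, "age": 29, "net": 10, "gross": 20, "zip": "99999"},    # duplicate id
    {"id": 5, "age": 58, "net": 70, "gross": 70, "zip": "00001"},
]

valid = keep_valid_rows(rows, rules, unique_column="id")
```

Only the first and last rows survive in this toy data. Filtering shrinks the output, so a generator that enforces rules this way would usually need to oversample before filtering.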

Privacy Metrics

Assess re-identification risk with DCR (Distance to Closest Record), NNDR (Nearest Neighbor Distance Ratio), and membership inference risk. The summary includes an overall privacy score between 0 and 1.

from tabgan import PrivacyMetrics
pm = PrivacyMetrics(original_df, synthetic_df).summary()
print(pm["overall_privacy_score"])
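
The two distance-based metrics can be sketched in a few lines of standard-library Python. This is a generic illustration, not the PrivacyMetrics implementation: the function name dcr_and_nndr is hypothetical, records are assumed to be numeric tuples on comparable scales, and Euclidean distance is an assumed choice.

```python
import math

def dcr_and_nndr(original, synthetic):
    """Per-record DCR and NNDR of synthetic rows against original rows.

    DCR  = distance from a synthetic record to its closest original record.
    NNDR = that distance divided by the distance to the second-closest one.
    """
    dcr, nndr = [], []
    for s in synthetic:
        dists = sorted(math.dist(s, o) for o in original)
        d1, d2 = dists[0], dists[1]
        dcr.append(d1)
        # If the two nearest originals are both exact matches, the ratio is
        # undefined; 1.0 is an arbitrary convention chosen for this sketch.
        nndr.append(d1 / d2 if d2 > 0 else 1.0)
    return dcr, nndr

original = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
synthetic = [(0.0, 0.0), (3.0, 3.0)]

dcr, nndr = dcr_and_nndr(original, synthetic)
```

The first synthetic record is an exact copy of an original row, so both its DCR and its NNDR are zero, which is the pattern that signals re-identification risk. The second record sits between original rows, giving a positive DCR and an NNDR closer to 1.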

sklearn Pipeline Integration

TabGANTransformer is a drop-in sklearn transformer for data augmentation inside a Pipeline. It supports get_params/set_params, constraints, and all generator types.

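from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from tabgan import TabGANTransformer
clf = LogisticRegression()  # placeholder; any sklearn estimator works here
pipe = Pipeline([("augment", TabGANTransformer(gen_x_times=1.5)), ("model", clf)])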

Improvements

Refactored codebase: fixed mutable default arguments, nested test classes, a Warning() bug, a make_two_digit() bug, and use of the deprecated pkg_resources
Removed duplication in generator factories via a shared _BaseGenerator base class
Professional README with centered badges, pipeline diagram, CLI docs, new feature documentation
Python version classifiers added, python_requires updated to >= 3.9
Test coverage expanded: 39 → 115 tests