This repository is the official implementation of TabICLv2 (arXiv) and TabICL (ICML 2025).
State-of-the-art accuracy even without hyperparameter tuning: TabICLv2 is the new state-of-the-art model for tabular classification and regression on the TabArena and TALENT benchmarks. It does not require hyperparameter tuning and still outperforms heavily tuned XGBoost, CatBoost, or LightGBM on ~80% of TabArena datasets.
Easy to use: TabICL is pip-installable and scikit-learn compliant. It is also open source (including pre-training for v1), with a permissive license.
Speed: TabICL performs fit and predict jointly via a single forward pass through a pre-trained transformer model. For larger datasets, we recommend a GPU. On an H100 GPU, TabICLv2 can fit and predict a dataset with 50,000 samples and 100 features in under 10 seconds, which is 10x faster than TabPFN-2.5. Through KV caching, TabICL supports faster repeated inference on the same training data.
Scalability: TabICL shows excellent performance on benchmarks with 300 to 100,000 training samples and up to 2,000 features. It can scale to even larger datasets (e.g., 500K samples) through CPU and disk offloading, though its accuracy may degrade at some point.
```bash
pip install tabicl
```

For pretraining, use `pip install tabicl[pretrain]` instead.
```python
from tabicl import TabICLClassifier, TabICLRegressor

clf = TabICLClassifier()
clf.fit(X_train, y_train)  # downloads checkpoint on first use, otherwise cheap
clf.predict(X_test)        # in-context learning happens here

reg = TabICLRegressor()
reg.fit(X_train, y_train)
reg.predict(X_test)
```

To speed up repeated inference on the same training data, enable KV caching during fit. Note that this consumes additional memory to store the cached projections, so consider the trade-off for your use case:
```python
clf.fit(X_train, y_train, kv_cache=True)  # caches key-value projections for training data
clf.predict(X_test)                       # fast: only processes test data by reusing the cached context
```

Save and load a fitted classifier or regressor:
```python
clf.save(
    "classifier.pkl",
    save_model_weights=False,  # if False, reload from checkpoint on load
    save_training_data=True,   # if True, include training data; if False, discard it (requires KV cache)
    save_kv_cache=True,        # if True and KV cache exists, save it
)
clf = TabICLClassifier.load("classifier.pkl")
```

When a KV cache exists and is saved, you can set `save_training_data=False` to exclude the training data, which may be useful for data privacy.
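For example, to avoid storing the raw training data on disk, fit with the KV cache enabled and drop the training data at save time. This is only a sketch combining the options shown above; it assumes the saved KV cache is enough for `predict` to run without the raw training data:

```python
from tabicl import TabICLClassifier

clf = TabICLClassifier()
clf.fit(X_train, y_train, kv_cache=True)  # the cache makes storing the raw training data unnecessary

clf.save(
    "classifier_private.pkl",
    save_training_data=False,  # omit the raw training data (requires the KV cache)
    save_kv_cache=True,        # keep the cached context so inference still works
)

clf = TabICLClassifier.load("classifier_private.pkl")
clf.predict(X_test)
```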
TabICL offers a set of parameters to customize its behavior. The following example shows all available parameters with their default values and brief descriptions:
```python
from tabicl import TabICLClassifier

clf = TabICLClassifier(
    n_estimators=8,                # number of ensemble members, more = better but slower
    norm_methods=None,             # normalization methods to try
    feat_shuffle_method="latin",   # feature permutation strategy
    class_shuffle_method="shift",  # class permutation strategy
    outlier_threshold=4.0,         # z-score threshold for outlier detection and clipping
    softmax_temperature=0.9,       # temperature to control prediction confidence
    average_logits=True,           # average logits (True) or probabilities (False)
    support_many_classes=True,     # handle >10 classes automatically
    batch_size=8,                  # ensemble members processed together, lower to save memory
    model_path=None,               # path to checkpoint, None downloads from Hugging Face
    allow_auto_download=True,      # auto-download checkpoint if not found locally
    checkpoint_version="tabicl-classifier-v2-20260212.ckpt",  # pretrained checkpoint version
    device=None,                   # inference device, None auto-selects CUDA or CPU
    use_amp="auto",                # automatic mixed precision for faster inference
    use_fa3="auto",                # Flash Attention 3 for Hopper GPUs (e.g. H100)
    offload_mode="auto",           # automatically decide when to use cpu/disk offloading
    disk_offload_dir=None,         # directory for disk offloading
    random_state=42,               # random seed for reproducibility
    n_jobs=None,                   # number of PyTorch threads for CPU inference
    verbose=False,                 # print detailed information during inference
    inference_config=None,         # fine-grained inference control for advanced users
)
```

TabICLRegressor accepts the same parameters except for the classification-specific ones: `class_shuffle_method`, `softmax_temperature`, `average_logits`, and `support_many_classes`.
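As an illustration, here is `TabICLRegressor` configured with a few of the shared parameters; the values are chosen only for the example, not recommended settings:

```python
from tabicl import TabICLRegressor

reg = TabICLRegressor(
    n_estimators=8,    # same ensembling mechanism as the classifier
    batch_size=4,      # lower to reduce memory usage
    device="cuda",     # or None to auto-select CUDA/CPU
    random_state=42,
)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
```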
| Model | Classification checkpoint | Regression checkpoint |
|---|---|---|
| TabICLv2 (arXiv) | tabicl-classifier-v2-20260212.ckpt (default) | tabicl-regressor-v2-20260212.ckpt (default) |
| TabICLv1.1 (May 2025, no paper) | tabicl-classifier-v1.1-20250506.ckpt | — |
| TabICLv1 (ICML 2025) | tabicl-classifier-v1-20250208.ckpt | — |
- TabICLv2: Our state-of-the-art model, supporting both classification and regression. Strongly improved accuracy over v1 through better synthetic pre-training data, architectural improvements, and an improved pre-training procedure, with comparable runtime.
- TabICLv1.1: TabICLv1 post-trained on an early version of the v2 prior. Classification only.
- TabICLv1: Original model. Classification only.
TabICLv1 and v1.1 originally used `n_estimators=32`; we reduced the default to 8 afterwards.
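To reproduce the original setup, you can select an older checkpoint through the `checkpoint_version` parameter (see the table above) and restore the larger ensemble; the values below are just an illustrative combination:

```python
from tabicl import TabICLClassifier

clf = TabICLClassifier(
    checkpoint_version="tabicl-classifier-v1.1-20250506.ckpt",  # v1.1 checkpoint from the table above
    n_estimators=32,                                            # original v1/v1.1 ensemble size
)
```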
Pre-training code (including synthetic data generation) is currently available for the v1 model. The scripts folder provides the commands for stage 1, stage 2, and stage 3 of curriculum learning. Pre-training code for v2 will be released upon publication.
We provide a minimal implementation of the TabICLv2 architecture here, for educational and experimental purposes.
What is TabICL? TabICL is a tabular foundation model (like TabPFN). It uses in-context learning (ICL) to learn from new data in a single forward pass through a Transformer model: `y_pred = model(X_train, y_train, X_test)` (this is called inside `predict()`). It has acquired strong learning capabilities through pre-training on millions of synthetic datasets.
How fast is TabICL? On an H100 GPU, TabICLv2 can fit and predict a dataset with 50,000 samples and 100 features in under 10 seconds, roughly 10x faster than TabPFN-2.5. For larger datasets, we recommend a GPU.
What dataset sizes work well? TabICLv2 is pre-trained on datasets with between 300 and 48K training samples. However, it can generalize to larger datasets to some extent, and we see good results even on some datasets with 600K samples. We have not tested whether TabICL generalizes to datasets with fewer than 300 samples.
What about the number of columns? TabICLv2 is pre-trained on datasets between 2 and 100 columns. We see good generalization to more columns and don't know where the limit is.
If the input X to TabICL is a pandas DataFrame, TabICL will automatically:
- Detect and ordinal encode categorical columns (including string, object, category, and boolean types)
- Create a separate category for missing values in categorical features
- Perform mean imputation for missing numerical values (encoded as NaN)
If the input X is a numpy array, TabICL assumes that ordinal encoding and missing value imputation have already been performed.
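As a small illustration with toy data (not from the repository), a mixed-type DataFrame can be passed to `fit` directly, while a numpy array should already be encoded and imputed; the scikit-learn imputer below is just one way to do that yourself:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from tabicl import TabICLClassifier

# DataFrame path: categorical columns and missing values are handled automatically
X_train = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0, 37.0, 29.0],              # numerical, NaN is mean-imputed
    "city": ["Paris", "Lyon", None, "Nice", "Paris", "Lyon"],   # string categorical, missing value gets its own category
    "member": [True, False, True, False, True, False],          # boolean, treated as categorical
})
y_train = pd.Series([1, 0, 1, 0, 1, 0])

clf = TabICLClassifier()
clf.fit(X_train, y_train)  # columns are detected and encoded automatically

# numpy path: ordinal-encode categoricals and impute missing values yourself first
X_np = np.array([[25.0, 0.0], [32.0, 1.0], [np.nan, 0.0], [41.0, 1.0], [37.0, 0.0], [29.0, 1.0]])
X_np = SimpleImputer(strategy="mean").fit_transform(X_np)
TabICLClassifier().fit(X_np, y_train.to_numpy())
```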
For both input types, TabICL applies additional preprocessing:
- Outlier detection and removal
- Feature scaling and normalization
- Feature shuffling for ensemble diversity
Real-world datasets often contain complex heterogeneous data that benefits from more sophisticated preprocessing. For these scenarios, we recommend skrub, a powerful library designed specifically for advanced tabular data preparation.
Why use skrub?
- Handles diverse data types (numerical, categorical, text, datetime, etc.)
- Provides robust preprocessing for dirty data
- Offers sophisticated feature engineering capabilities
- Supports multi-table integration and joins
```bash
pip install skrub -U
```

Use skrub's TableVectorizer to transform your raw data before passing it to TabICLClassifier:
```python
from skrub import TableVectorizer
from sklearn.pipeline import make_pipeline
from tabicl import TabICLClassifier

pipeline = make_pipeline(
    TableVectorizer(low_cardinality="passthrough"),  # automatically handles various data types
    TabICLClassifier(),
)

pipeline.fit(X_train, y_train)  # X should be a DataFrame
predictions = pipeline.predict(X_test)
```

If you use TabICL for research purposes, please cite our papers for TabICL and TabICLv2:
```bibtex
@inproceedings{qu2025tabicl,
  title={Tab{ICL}: {A} Tabular Foundation Model for In-Context Learning on Large Data},
  author={Qu, Jingang and Holzm{\"u}ller, David and Varoquaux, Ga{\"e}l and Le Morvan, Marine},
  booktitle={International Conference on Machine Learning},
  year={2025}
}

@article{qu2026tabiclv2,
  title={{TabICLv2}: {A} better, faster, scalable, and open tabular foundation model},
  author={Qu, Jingang and Holzm{\"u}ller, David and Varoquaux, Ga{\"e}l and Le Morvan, Marine},
  journal={arXiv preprint arXiv:2602.11139},
  year={2026}
}
```


