- 🌍 Conference: The Web Conference (WWW 2026)
- 📅 Date: June 29-July 3, 2026
- 📍 Dubai, United Arab Emirates
-
Authors: Peng Cui, Xingxuan Zhang, Han-Jia Ye, Jintai Chen, Shuyang Li
-
Year: 2026
-
Abstract: Structured data constitutes one of the most ubiquitous data modalities in web-scale and enterprise applications, supporting tasks such as recommendation, forecasting, and user behavior analysis. Conventional modeling paradigms, ranging from generalized linear models and gradient boosting to deep structured networks, have provided strong baselines for predictive analytics and decision support. However, the recent emergence of foundation models and in-context learning (ICL) has sparked a new paradigm for structured modeling, moving from dataset-specific training toward universal, adaptable inference. Emerging structured foundation models illustrate how large-scale pretraining, synthetic data generation, and ICL-based inference can extend foundation-model principles to structured data. These developments open new possibilities for multi-task learning, zero-shot inference, and knowledge transfer across diverse structured settings. Yet, the space of structured foundation models remains largely unexplored, with open questions surrounding data generation, multi-task settings, pretraining objectives, and evaluation standards. This tutorial will provide a structured overview of both conventional modeling and recent ICL-based approaches. Participants will gain a comprehensive understanding of established methods, current advances in foundation models, and open research challenges. In particular, we will offer an in-depth introduction to structured ICL and review the most representative foundation models in this field. Key topics will be discussed, including pretraining data generation, multi-task learning, and other emerging directions in structured data modeling. This tutorial aims to bridge conventional machine learning and the emerging foundation-model paradigm, providing attendees with conceptual and practical insights into structured data modeling in the era of generalist foundation models.
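The shift described above, from fitting parameters per dataset to conditioning a single inference call on labeled examples, can be sketched with a toy, model-agnostic example. This is not the API of any particular system such as TabPFN; a simple nearest-neighbor rule stands in for the pretrained transformer, but the ICL-style interface (labeled context in, predictions out, no gradient updates) is the point being illustrated.

```python
# Toy stand-in for tabular in-context learning (ICL): labeled "context"
# rows are supplied at inference time and directly condition each
# prediction. Structured foundation models replace this nearest-neighbor
# rule with a pretrained network, but the interface is the same:
# one call per task, no per-dataset parameter fitting.

def icl_predict(context_X, context_y, query_X):
    """Predict a label for each query row from the labeled context rows."""
    preds = []
    for query in query_X:
        # squared Euclidean distance from the query to every context row
        dists = [sum((q - c) ** 2 for q, c in zip(query, row))
                 for row in context_X]
        # the context conditions the prediction; nothing is "trained"
        preds.append(context_y[dists.index(min(dists))])
    return preds

# Example: two labeled clusters as context; queries inherit the nearest label.
context_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
context_y = [0, 0, 1, 1]
print(icl_predict(context_X, context_y, [[0.2, 0.4], [5.5, 5.0]]))  # [0, 1]
```

Swapping in a different context (a new "dataset") requires no retraining, which is the practical appeal of the ICL paradigm for multi-task and zero-shot structured prediction.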
- 🏫 Affiliation: Tsinghua University
- 🔬 Research Area: data mining, machine learning, multimedia
- 🌐 Google Scholar: https://scholar.google.com/citations?user=G8x97ZgAAAAJ&hl=en&oi=ao
Bio: Peng Cui is an Associate Professor with tenure in the Department of Computer Science at Tsinghua University. He received his PhD degree from Tsinghua University in 2010. His research interests include causally-regularized machine learning, network representation learning, and social dynamics modeling. He has published more than 100 papers in prestigious conferences and journals in data mining and multimedia. His recent research won the IEEE Multimedia Best Department Paper Award, SIGKDD 2016 Best Paper Finalist, ICDM 2015 Best Student Paper Award, SIGKDD 2014 Best Paper Finalist, IEEE ICME 2014 Best Paper Award, ACM MM 2012 Grand Challenge Multimodal Award, and MMM 2013 Best Paper Award. He is PC co-chair of CIKM 2019 and MMM 2020, SPC or area chair of ICML, KDD, WWW, IJCAI, AAAI, etc., and an Associate Editor of IEEE TKDE, IEEE TBD, ACM TIST, and ACM TOMM. He received the ACM China Rising Star Award in 2015 and the CCF-IEEE CS Young Scientist Award in 2018. He is a Distinguished Member of ACM and CCF, and a Senior Member of IEEE.
- 🏫 Affiliation: Tsinghua University
- 🔬 Research Area: computer vision, OOD Generalization, Domain Generalization, Optimization
- 🌐 Google Scholar: https://scholar.google.com/citations?user=uutKFOYAAAAJ&hl=en&oi=ao
Bio: Xingxuan Zhang is an Assistant Researcher in the Department of Computer Science at Tsinghua University, where he also received his PhD. His research interests lie in Foundation Models and Trustworthy AI, with a specialized focus on developing fundamental modeling methods for structured data. He has published more than 30 papers in top-tier conferences and journals including ICLR, NeurIPS, ICML, WWW, CVPR, and ICCV, where he has also served as a reviewer and program committee member.
- 🏫 Affiliation: Nanjing University
- 🔬 Research Area: Machine Learning, Data Mining, Metric Learning, Meta-Learning
- 🌐 Google Scholar: https://scholar.google.com/citations?user=mgOYhtoAAAAJ&hl=en&oi=ao
Bio: Han-Jia Ye is an Associate Professor in the School of Artificial Intelligence at Nanjing University. His research centers on machine learning, especially representation learning, meta-learning, model reuse, and deep learning for tabular data. He received his Ph.D. in Computer Science from Nanjing University in 2019. He has served as Tutorial Co-Chair for SDM 2023 and Doctoral Forum Co-Chair for SDM 2022, and as an area chair for top-tier venues including ICML, NeurIPS, ICLR, AAAI, IJCAI, and CVPR. He leads the development of TALENT, a toolbox of representative deep tabular methods, and is organizing an AAAI 2026 tutorial on representation learning for structured tabular data.
- 🏫 Affiliation: The Hong Kong University of Science and Technology (Guangzhou)
- 🔬 Research Area: AI for tabular data and healthcare, foundation models
- 🌐 Google Scholar: https://scholar.google.com/citations?user=ZiY3xYEAAAAJ&hl=en&oi=ao
Bio: Jintai Chen is an Assistant Professor in the AI Thrust, Information Hub at The Hong Kong University of Science and Technology (Guangzhou). He received his PhD from Zhejiang University and later conducted postdoctoral research at the University of Illinois at Urbana–Champaign. His research centers on artificial intelligence for tabular data and healthcare, with a particular focus on developing foundation models for heterogeneous and noisy clinical and biomedical datasets. His broader interests include medical agents, clinical trial optimization, and drug design.
- 🏫 Affiliation: Tsinghua University
- 🔬 Research Area: Deep Learning, Time Series Forecasting, Mobile Networks
- 🌐 Google Scholar: https://scholar.google.com/citations?user=ItGCvS4AAAAJ&hl=en&oi=ao
Bio: Shuyang Li is a Postdoctoral Researcher in the Department of Computer Science at Tsinghua University. He received his PhD degree in Electrical, Electronic, and Communications Engineering from Politecnico di Torino, Italy. After obtaining his PhD, he worked as a researcher at Politecnico di Torino, collaborating on projects with Telecom Italia and conducting research on mobile traffic forecasting and predictive maintenance. His research has been published in journals and conferences such as Computer Networks, ICC, WCNC, and VTC. His main research interests include time series analysis, synthetic data generation, cellular network optimization, and mobile traffic modeling.
-
LimiX: Unleashing structured-data modeling capability for generalist intelligence
https://arxiv.org/abs/2509.03505 -
TabPFN: A transformer that solves small tabular classification problems in a second
https://arxiv.org/abs/2207.01848 -
A closer look at TabPFN v2: Understanding its strengths and extending its capabilities
https://arxiv.org/abs/2502.17361 -
TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
https://arxiv.org/abs/2511.08667 -
TabPFN unleashed: A scalable and effective solution to tabular classification problems
https://arxiv.org/abs/2502.02527 -
Accurate predictions on small data with a tabular foundation model
https://www.nature.com/articles/s41586-024-08328-6 -
TabICL: A tabular foundation model for in-context learning on large data
https://arxiv.org/abs/2502.05564 -
TabICLv2: A better, faster, scalable, and open tabular foundation model
https://arxiv.org/abs/2602.11139 -
TabDPT: Scaling tabular foundation models on real data
https://arxiv.org/abs/2410.18164 -
Why do tree-based models still outperform deep learning on typical tabular data?
https://proceedings.neurips.cc/paper_files/paper/2022/hash/0378c7692da36807bdec87ab043cdadc-Abstract-Datasets_and_Benchmarks.html -
On Finetuning Tabular Foundation Models
https://arxiv.org/abs/2506.08982 -
Retrieval & Fine-Tuning for In-Context Tabular Models
https://proceedings.neurips.cc/paper_files/paper/2024/hash/c40daf14d7a6469e65116507c21faeb7-Abstract-Conference.html -
Deep neural networks and tabular data: A survey
https://ieeexplore.ieee.org/abstract/document/9998482 -
Time-series forecasting with deep learning: a survey
https://royalsocietypublishing.org/rsta/article/379/2194/20200209/41189 -
A comprehensive survey of deep learning for time series forecasting: architectural diversity and open challenges
https://link.springer.com/article/10.1007/s10462-025-11223-9 -
Transformers can do Bayesian inference
https://arxiv.org/abs/2112.10510 -
Revisiting Deep Learning Models for Tabular Data
https://proceedings.neurips.cc/paper_files/paper/2021/hash/9d86d83f925f2149e9edb0ac3b49229c-Abstract.html -
Well-tuned Simple Nets Excel on Tabular Datasets
https://proceedings.neurips.cc/paper/2021/hash/c902b497eb972281fb5b4e206db38ee6-Abstract.html -
An Inductive Bias for Tabular Deep Learning
https://proceedings.neurips.cc/paper_files/paper/2023/file/8671b6dffc08b4fcf5b8ce26799b2bef-Paper-Conference.pdf -
On embeddings for numerical features in tabular deep learning
https://proceedings.neurips.cc/paper_files/paper/2022/file/9e9f0ffc3d836836ca96cbf8fe14b105-Paper-Conference.pdf -
Representation Learning for Tabular Data: A Comprehensive Survey
https://arxiv.org/pdf/2504.16109 -
Better by default: Strong pre-tuned MLPs and boosted trees on tabular data
https://proceedings.neurips.cc/paper_files/paper/2024/file/2ee1c87245956e3eaa71aaba5f5753eb-Paper-Conference.pdf -
TabR: Tabular deep learning meets nearest neighbors
https://arxiv.org/abs/2307.14338 -
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later
https://arxiv.org/pdf/2407.03257 -
TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling
https://arxiv.org/pdf/2410.24210 -
Unveiling the Role of Data Uncertainty in Tabular Deep Learning
https://arxiv.org/abs/2509.04430 -
Mechanism for feature learning in neural networks and backpropagation-free machine learning models
https://www.science.org/doi/abs/10.1126/science.adi5639 -
Danets: Deep abstract networks for tabular data classification and regression
https://ojs.aaai.org/index.php/AAAI/article/view/20309 -
A Closer Look at Deep Learning Methods on Tabular Datasets
https://arxiv.org/abs/2407.00956 -
TabArena: A living benchmark for machine learning on tabular data
https://arxiv.org/abs/2506.16791 -
TALENT: A Tabular Analytics and Learning Toolbox
https://www.jmlr.org/papers/v26/25-0512.html -
Making Pre-trained Language Models Great on Tabular Prediction
https://arxiv.org/abs/2403.01841 -
TabLLM: Few-shot Classification of Tabular Data with Large Language Models
https://proceedings.mlr.press/v206/hegselmann23a.html -
Large Scale Transfer Learning for Tabular Data via Language Modeling
https://proceedings.neurips.cc/paper_files/paper/2024/hash/4fd5cfd2e31bebbccfa5ffa354c04bdc-Abstract-Conference.html -
Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering
https://proceedings.neurips.cc/paper_files/paper/2023/hash/8c2df4c35cdbee764ebb9e9d0acd5197-Abstract-Conference.html -
Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
https://arxiv.org/abs/2404.09491 -
The tabular foundation model TabPFN outperforms specialized time series forecasting models based on simple features
https://arxiv.org/abs/2501.02945v3 -
Turning tabular foundation models into graph foundation models
https://arxiv.org/abs/2508.20906 -
Of graphs and tables: Zero-shot node classification with tabular foundation models
https://openreview.net/pdf?id=u3vyZf5Jv2 -
Tabular Foundation Models are Strong Graph Anomaly Detectors
https://arxiv.org/pdf/2601.17301
We thank the organizers of WWW 2026, all contributors to this tutorial, and Jiawei Chen (Beihang University, https://scholar.google.com/citations?user=2803pOEAAAAJ) for assisting us in preparing the materials.

