- 🌍 Conference: The Web Conference (WWW 2026)
- 📅 Date: June 29-July 3, 2026
- 📍 Dubai, United Arab Emirates
-
Authors: Peng Cui, Xingxuan Zhang, Han-Jia Ye, Jintai Chen, Shuyang Li
-
Year: 2026
-
Abstract: Structured data constitutes one of the most ubiquitous data modalities in web-scale and enterprise applications, supporting tasks such as recommendation, forecasting, and user behavior analysis. Conventional modeling paradigms, ranging from generalized linear models and gradient boosting to deep structured networks, have provided strong baselines for predictive analytics and decision support. However, the recent emergence of foundation models and in-context learning (ICL) has sparked a new paradigm for structured modeling, moving from dataset-specific training toward universal, adaptable inference. Emerging structured foundation models illustrate how large-scale pretraining, synthetic data generation, and ICL-based inference can extend foundation-model principles to structured data. These developments open new possibilities for multi-task learning, zero-shot inference, and knowledge transfer across diverse structured settings. Yet, the space of structured foundation models remains largely unexplored, with open questions surrounding data generation, multi-task settings, pretraining objectives, and evaluation standards. This tutorial will provide a structured overview of both conventional modeling and recent ICL-based approaches. Participants will gain a comprehensive understanding of established methods, current advances in foundation models, and open research challenges. In particular, we will offer an in-depth introduction to structured ICL and review the most representative foundation models in this field. Key topics will be discussed, including pretraining data generation, multi-task learning, and other emerging directions in structured data modeling. This tutorial aims to bridge conventional machine learning and the emerging foundation-model paradigm, providing attendees with conceptual and practical insights into structured data modeling in the era of generalist foundation models.
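The shift described above, from fitting parameters per dataset to conditioning a single inference call on labeled examples, can be sketched with a toy, model-agnostic example. This is not the API of any particular system such as TabPFN; a simple nearest-neighbor rule stands in for the pretrained transformer, but the ICL-style interface (labeled context in, predictions out, no gradient updates) is the point being illustrated.

```python
# Toy stand-in for tabular in-context learning (ICL): labeled "context"
# rows are supplied at inference time and directly condition each
# prediction. Structured foundation models replace this nearest-neighbor
# rule with a pretrained network, but the interface is the same:
# one call per task, no per-dataset parameter fitting.

def icl_predict(context_X, context_y, query_X):
    """Predict a label for each query row from the labeled context rows."""
    preds = []
    for query in query_X:
        # squared Euclidean distance from the query to every context row
        dists = [sum((q - c) ** 2 for q, c in zip(query, row))
                 for row in context_X]
        # the context conditions the prediction; nothing is "trained"
        preds.append(context_y[dists.index(min(dists))])
    return preds

# Example: two labeled clusters as context; queries inherit the nearest label.
context_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
context_y = [0, 0, 1, 1]
print(icl_predict(context_X, context_y, [[0.2, 0.4], [5.5, 5.0]]))  # [0, 1]
```

Swapping in a different context (a new "dataset") requires no retraining, which is the practical appeal of the ICL paradigm for multi-task and zero-shot structured prediction.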
- 🏫 Affiliation: Tsinghua University
- 🔬 Research Area: data mining, machine learning, multimedia
- 🌐 Google Scholar: https://scholar.google.com/citations?user=G8x97ZgAAAAJ&hl=en&oi=ao
Bio: Peng Cui is an Associate Professor with tenure in the Department of Computer Science at Tsinghua University. He received his PhD degree from Tsinghua University in 2010. His research interests include causally-regularized machine learning, network representation learning, and social dynamics modeling. He has published more than 100 papers in prestigious conferences and journals in data mining and multimedia. His recent research won the IEEE Multimedia Best Department Paper Award, SIGKDD 2016 Best Paper Finalist, ICDM 2015 Best Student Paper Award, SIGKDD 2014 Best Paper Finalist, IEEE ICME 2014 Best Paper Award, ACM MM 2012 Grand Challenge Multimodal Award, and MMM 2013 Best Paper Award. He is PC co-chair of CIKM 2019 and MMM 2020, SPC or area chair of ICML, KDD, WWW, IJCAI, AAAI, etc., and an Associate Editor of IEEE TKDE, IEEE TBD, ACM TIST, and ACM TOMM. He received the ACM China Rising Star Award in 2015 and the CCF-IEEE CS Young Scientist Award in 2018. He is a Distinguished Member of ACM and CCF, and a Senior Member of IEEE.
- 🏫 Affiliation: Tsinghua University
- 🔬 Research Area: computer vision, OOD Generalization, Domain Generalization, Optimization
- 🌐 Google Scholar: https://scholar.google.com/citations?user=uutKFOYAAAAJ&hl=en&oi=ao
Bio: Xingxuan Zhang is an Assistant Researcher in the Department of Computer Science at Tsinghua University, where he also received his PhD. His research interests lie in Foundation Models and Trustworthy AI, with a specialized focus on developing fundamental modeling methods for structured data. He has published more than 30 papers in top-tier conferences and journals including ICLR, NeurIPS, ICML, WWW, CVPR, and ICCV, where he has also served as a reviewer and program committee member.
- 🏫 Affiliation: Nanjing University
- 🔬 Research Area: Machine Learning, Data Mining, Metric Learning, Meta-Learning
- 🌐 Google Scholar: https://scholar.google.com/citations?user=mgOYhtoAAAAJ&hl=en&oi=ao
Bio: Han-Jia Ye is an Associate Professor in the School of Artificial Intelligence at Nanjing University. His research centers on machine learning, especially representation learning, meta-learning, model reuse, and deep learning for tabular data. He received his Ph.D. in Computer Science from Nanjing University in 2019. He has served as Tutorial Co-Chair for SDM 2023 and Doctoral Forum Co-Chair for SDM 2022, and as an area chair for top-tier venues including ICML, NeurIPS, ICLR, AAAI, IJCAI, and CVPR. He leads the development of TALENT, a toolbox of representative deep tabular methods, and is organizing an AAAI 2026 tutorial on representation learning for structured tabular data.
- 🏫 Affiliation: The Hong Kong University of Science and Technology (Guangzhou)
- 🔬 Research Area: AI for tabular data and healthcare, foundation models
- 🌐 Google Scholar: https://scholar.google.com/citations?user=ZiY3xYEAAAAJ&hl=en&oi=ao
Bio: Jintai Chen is an Assistant Professor in the AI Thrust, Information Hub at The Hong Kong University of Science and Technology (Guangzhou). He received his PhD from Zhejiang University and later conducted postdoctoral research at the University of Illinois at Urbana–Champaign. His research centers on artificial intelligence for tabular data and healthcare, with a particular focus on developing foundation models for heterogeneous and noisy clinical and biomedical datasets. His broader interests include medical agents, clinical trial optimization, and drug design.
- 🏫 Affiliation: Tsinghua University
- 🔬 Research Area: Deep Learning, Time Series Forecasting, Mobile Networks
- 🌐 Google Scholar: https://scholar.google.com/citations?user=ItGCvS4AAAAJ&hl=en&oi=ao
Bio: Shuyang Li is a Postdoctoral Researcher in the Department of Computer Science at Tsinghua University. He received his PhD degree in Electrical, Electronic, and Communications Engineering from Politecnico di Torino, Italy. After obtaining his PhD, he worked as a researcher at Politecnico di Torino, collaborating on projects with Telecom Italia and conducting research on mobile traffic forecasting and predictive maintenance. His research has been published in journals and conferences such as Computer Networks, ICC, WCNC, and VTC. His main research interests include time series analysis, synthetic data generation, cellular network optimization, and mobile traffic modeling.
-
LimiX: Unleashing structured-data modeling capability for generalist intelligence
https://arxiv.org/abs/2509.03505 -
TabPFN: A transformer that solves small tabular classification problems in a second
https://arxiv.org/abs/2207.01848 -
A closer look at TabPFN v2: Understanding its strengths and extending its capabilities
https://arxiv.org/abs/2502.17361 -
TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models
https://arxiv.org/abs/2511.08667 -
TabPFN unleashed: A scalable and effective solution to tabular classification problems
https://arxiv.org/abs/2502.02527 -
Accurate predictions on small data with a tabular foundation model
https://www.nature.com/articles/s41586-024-08328-6 -
TabICL: A tabular foundation model for in-context learning on large data
https://arxiv.org/abs/2502.05564 -
TabICLv2: A better, faster, scalable, and open tabular foundation model
https://arxiv.org/abs/2602.11139 -
TabDPT: Scaling tabular foundation models on real data
https://arxiv.org/abs/2410.18164 -
Why do tree-based models still outperform deep learning on typical tabular data?
https://proceedings.neurips.cc/paper_files/paper/2022/hash/0378c7692da36807bdec87ab043cdadc-Abstract-Datasets_and_Benchmarks.html -
On Finetuning Tabular Foundation Models
https://arxiv.org/abs/2506.08982 -
Retrieval & Fine-Tuning for In-Context Tabular Models
https://proceedings.neurips.cc/paper_files/paper/2024/hash/c40daf14d7a6469e65116507c21faeb7-Abstract-Conference.html -
Deep neural networks and tabular data: A survey
https://ieeexplore.ieee.org/abstract/document/9998482 -
Time-series forecasting with deep learning: a survey
https://royalsocietypublishing.org/rsta/article/379/2194/20200209/41189 -
A comprehensive survey of deep learning for time series forecasting: architectural diversity and open challenges
https://link.springer.com/article/10.1007/s10462-025-11223-9 -
Transformers can do Bayesian inference
https://arxiv.org/abs/2112.10510 -
Revisiting Deep Learning Models for Tabular Data
https://proceedings.neurips.cc/paper_files/paper/2021/hash/9d86d83f925f2149e9edb0ac3b49229c-Abstract.html -
Well-tuned Simple Nets Excel on Tabular Datasets
https://proceedings.neurips.cc/paper/2021/hash/c902b497eb972281fb5b4e206db38ee6-Abstract.html -
An Inductive Bias for Tabular Deep Learning
https://proceedings.neurips.cc/paper_files/paper/2023/file/8671b6dffc08b4fcf5b8ce26799b2bef-Paper-Conference.pdf -
On embeddings for numerical features in tabular deep learning
https://proceedings.neurips.cc/paper_files/paper/2022/file/9e9f0ffc3d836836ca96cbf8fe14b105-Paper-Conference.pdf -
Representation Learning for Tabular Data: A Comprehensive Survey
https://arxiv.org/pdf/2504.16109 -
Better by default: Strong pre-tuned MLPs and boosted trees on tabular data
https://proceedings.neurips.cc/paper_files/paper/2024/file/2ee1c87245956e3eaa71aaba5f5753eb-Paper-Conference.pdf -
TabR: Tabular deep learning meets nearest neighbors
https://arxiv.org/abs/2307.14338 -
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later
https://arxiv.org/pdf/2407.03257 -
TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling
https://arxiv.org/pdf/2410.24210 -
Unveiling the Role of Data Uncertainty in Tabular Deep Learning
https://arxiv.org/abs/2509.04430 -
Mechanism for feature learning in neural networks and backpropagation-free machine learning models
https://www.science.org/doi/abs/10.1126/science.adi5639 -
Danets: Deep abstract networks for tabular data classification and regression
https://ojs.aaai.org/index.php/AAAI/article/view/20309 -
A Closer Look at Deep Learning Methods on Tabular Datasets
https://arxiv.org/abs/2407.00956 -
TabArena: A living benchmark for machine learning on tabular data
https://arxiv.org/abs/2506.16791 -
TALENT: A Tabular Analytics and Learning Toolbox
https://www.jmlr.org/papers/v26/25-0512.html -
Making Pre-trained Language Models Great on Tabular Prediction
https://arxiv.org/abs/2403.01841 -
TabLLM: Few-shot Classification of Tabular Data with Large Language Models
https://proceedings.mlr.press/v206/hegselmann23a.html -
Large Scale Transfer Learning for Tabular Data via Language Modeling
https://proceedings.neurips.cc/paper_files/paper/2024/hash/4fd5cfd2e31bebbccfa5ffa354c04bdc-Abstract-Conference.html -
Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering
https://proceedings.neurips.cc/paper_files/paper/2023/hash/8c2df4c35cdbee764ebb9e9d0acd5197-Abstract-Conference.html -
Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
https://arxiv.org/abs/2404.09491 -
The tabular foundation model TabPFN outperforms specialized time series forecasting models based on simple features
https://arxiv.org/abs/2501.02945v3 -
Turning tabular foundation models into graph foundation models
https://arxiv.org/abs/2508.20906 -
Of graphs and tables: Zero-shot node classification with tabular foundation models
https://openreview.net/pdf?id=u3vyZf5Jv2 -
Tabular Foundation Models are Strong Graph Anomaly Detectors
https://arxiv.org/pdf/2601.17301
We thank the organizers of WWW 2026, all contributors to this tutorial, and Jiawei Chen (Beihang University, https://scholar.google.com/citations?user=2803pOEAAAAJ) for assisting us in preparing the materials.

