Bio Notes
Paolo Papotti received his Ph.D. from the University of Roma Tre (Italy) in 2007 and has been a professor in the Data Science department at EURECOM (France) since 2017. Before joining EURECOM, he was a scientist in the data analytics group at QCRI (Qatar) and an assistant professor at Arizona State University (USA). His research covers the broad areas of scalable data management and NLP, with a focus on data integration and information quality.
News
(Complete list)
- 9/2025 - Keynote on 'SQL and Large Language Models: A Marriage Made in Heaven?' at DaSH workshop at VLDB25 (slides).
- 9/2025 - Keynote on 'Reinforcement Learning to enable Reasoning LLMs for Text2SQL' at TaDA workshop at VLDB25 (slides).
- 9/2025 - Co-chaired panel on 'Tabular Foundation Models, LLMs... or both?' at VLDB 2025.
- 9/2025 - Two papers accepted at EMNLP 2025 (Main): 'SQUAB: Evaluating LLM's robustness to Ambiguous and Unanswerable Questions in Semantic Parsing' and 'Refining Attention for Explainable and Noise-Robust Fact-Checking with Transformers'.
- 6/2025 - 'TableKV: KV Cache Compression for In-Context Table Processing' accepted at the TRL Workshop@ACL.
- 5/2025 - Our Think2SQL (model) is the first reasoning LLM for Text2SQL (paper).
- 4/2025 - Distinguished meta-reviewer award at SIGMOD 2025.
- 3/2025 - Research paper 'Logical and Physical Optimizations for SQL Query Execution over Large Language Models' accepted at SIGMOD 2025 (code) (paper).
- 3/2025 - New work on KV cache compression: 'Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning' (pdf).
- 3/2025 - Distinguished reviewer award at EDBT 2025.
- 2/2025 - Long paper 'An LLM-Based Approach for Insight Generation in Data Analysis' accepted at NAACL 2025 (pdf).
- 2/2025 - Invited talk at the ELLIS workshop on Representation Learning and Generative Models for Structured Data.
- 1/2025 - Paper 'Automated Detection of Tropes In Short Texts' presented at COLING.
Recent Activities
(Complete list)
- Co-Chair: NOVAS@SIGMOD (2025), ICDE demo track (2026)
- Associate Editor: SIGMOD (2027, 2026, 2025), VLDBJ (since 2023), ICDE (2025)
- PC Member: SIGMOD (2024), VLDB (2026, 2024), EDBT (2025, 2024), SEBD (2025, 2024), NeurIPS (2024), TaDA@VLDB (2025, 2024), QDB@VLDB (2025), TRL@ACL (2025), TRL@NeurIPS (2024)
Selected Publications
Data Cleaning
- R. Cappuzzo, P. Papotti, S. Thirumuruganathan
Relational Data Imputation with Graph Neural Networks.
In EDBT, 2024. (.pdf) (code)
- R. Shrestha, O. Habibelahian, A. Termehchy, P. Papotti
Exploratory Training: When Annotators Learn About Data.
In SIGMOD, 2023. (.pdf)
- R. Cappuzzo, P. Papotti, S. Thirumuruganathan
Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks.
In SIGMOD, 2020. (.pdf) (code) (video)
- S. Ortona, V. Meduri, P. Papotti
Robust Discovery of Positive and Negative Rules in Knowledge-Bases.
In ICDE, 2018. (Tech. Report) (code) (.pdf)
- R. Singh, V. Meduri, A. Elmagarmid, S. Madden, P. Papotti, J. Quiane, N. Tang, A. Solar
Synthesizing Entity Matching Rules by Examples.
PVLDB, 2016. (.pdf)
- E. Veltri, D. Santoro, G. Mecca, P. Papotti, J. He, G. Li, N. Tang
Interactive and Deterministic Data Cleaning.
In SIGMOD, 2016. (.pdf)
- Z. Abedjan, X. Chu, D. Deng, R. Fernandez, I. Ilyas, M. Ouzzani, P. Papotti, M. Stonebraker, N. Tang
Detecting Data Errors: Where are we and what needs to be done?
PVLDB, 2016. (.pdf)
- F. Geerts, G. Mecca, P. Papotti, D. Santoro.
The LLUNATIC Data-Cleaning Framework.
PVLDB, 2013. (.pdf) (code)
- X. Chu, I. Ilyas, P. Papotti
Discovering Denial Constraints.
PVLDB, 2013. (.pdf)
Information Integrity and Computational Fact Checking
- G. Burel et al.
CimpleKG: A Continuously Updated Knowledge Graph on Misinformation, Factors and Fact-Checks.
(.pdf) ISWC, 2024 (Best resource paper award).
- J.F. Bussotti et al.
Unknown Claims: Generation of Fact-Checking Training Examples from Unstructured and Structured Data.
(.pdf) EMNLP, 2024.
- M. Mannino et al.
Data Void Exploits: Tracking & Mitigation Strategies.
(.pdf) CIKM, 2024 (Best paper award).
- R. Advani et al.
Maximizing Neutrality in News Ordering.
(.pdf) KDD, 2023.
- M. Saeed et al.
Crowdsourced Fact-Checking at Twitter: How Does the Crowd Compare With Experts?
(.pdf) CIKM, 2022.
- M. Mori et al.
Neural Machine Translation for Fact-Checking Temporal Claims.
(.pdf) FEVER, 2022.
- P. Nakov et al.
Automated Fact-Checking for Assisting Human Fact-Checkers.
IJCAI, 2021. (.pdf)
- G. Karagiannis, M. Saeed, P. Papotti, I. Trummer.
Scrutinizer: a mixed-initiative approach to large-scale, data-driven claim verification.
PVLDB, 2020. (.pdf) (code) (video)
- P. Huynh, P. Papotti.
A Benchmark for Fact Checking Algorithms Built on Knowledge Bases.
CIKM, 2019. (.pdf) (code)
- N. Ahmadi, J. Lee, P. Papotti, M. Saeed.
Explainable Fact Checking with Probabilistic Answer Set Programming.
Conference for Truth and Trust Online (TTO), 2019. (.pdf) (code)
Transformer Architecture
- G. Corallo, P. Papotti
FINCH: Prompt-guided Key-Value Cache Compression for Large Language Models.
In TACL, 2024. (.pdf) (code)
- M. Saeed, P. Papotti
You are my type! Type embeddings for pre-trained language models.
In EMNLP (Findings), 2022. (.pdf) (code)
- M. Saeed et al.
RuleBERT: Teaching Soft Rules to Pre-Trained Language Models.
EMNLP, 2021. (.pdf) (code)
Table Representation Learning
- J.F. Bussotti et al.
Generation of Training Examples for Tabular Natural Language Inference.
In SIGMOD, 2024. (pdf) (code)
- Saeed, De Cao, Papotti
Querying Large Language Models with SQL.
In EDBT (Vision), 2024. (code) (pdf) (blog post)
- Papicchio, Papotti, Cagliero
QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data.
In NeurIPS (Dataset and benchmark track), 2023. (code) (pdf)
- M. Hulsebos, X. Deng, H. Sun, P. Papotti
Models and Practice of Neural Table Representations.
In SIGMOD (Tutorial), 2023. (code) (slides) (video)
- G. Badaro, M. Saeed, P. Papotti
Transformers for Tabular Data Representation: A Survey of Models and Applications.
In Transactions of the ACL (TACL), 2023. (.pdf)
- E. Veltri, G. Badaro, M. Saeed, P. Papotti
Data Ambiguity Profiling for the Generation of Training Examples.
In ICDE, 2023. (.pdf) (code)
- G. Badaro, P. Papotti.
Transformers for Tabular Data Representation: Models and Applications.
VLDB (Tutorial), 2022. (.pdf) (slides)
- E. Veltri, D. Santoro, G. Badaro, M. Saeed, P. Papotti
Pythia: Unsupervised Generation of Ambiguous Textual Claims from Relational Data.
In SIGMOD (demo), 2022. (.pdf) (code)
- N. Ahmadi, A. Sand, P. Papotti.
Unsupervised Matching of Data and Text.
ICDE, 2022. (.pdf) (code)
Data Exchange
- P. Atzeni, L. Bellomarini, P. Papotti, R. Torlone.
Meta-Mappings for Schema Mapping Reuse.
PVLDB, 2019. (.pdf)
- B. Marnette, G. Mecca, P. Papotti.
Scalable Data Exchange with Functional Dependencies.
PVLDB, 2010. (.pdf) (.ppt) (code)
- G. Mecca, P. Papotti, S. Raunich.
Core Schema Mappings.
In SIGMOD Conference, 2009. (.pdf) (.ppt) (tech. report) (code)
- M.A. Hernandez, P. Papotti, W.C. Tan.
Data Exchange with Data-Metadata Translations.
In VLDB Conference, 2008. (.pdf) (.ppt)
- A. Raffio, D. Braga, S. Ceri, P. Papotti, M.A. Hernandez.
Clip: a Visual Language for Explicit Schema Mappings.
In ICDE Conference, 2008. (.pdf)
- A. Fuxman, M.A. Hernandez, H. Ho, R.J. Miller, P. Papotti, L. Popa.
Nested Mappings: Schema Mapping Reloaded.
In VLDB Conference, 2006. (.pdf) (.ppt)
Web Data Extraction and Integration
- M. Bronzi, V. Crescenzi, P. Merialdo, P. Papotti.
Extraction and Integration of Partially Overlapping Web Sources.
PVLDB, 2013. (.pdf)
- L. Blanco, V. Crescenzi, P. Merialdo, P. Papotti.
Probabilistic Models to Reconcile Complex Data from Inaccurate Data Sources.
In CAiSE Conference, 2010. (.pdf)
Schema Exchange
- P. Papotti and R. Torlone.
Schema exchange: Generic mappings for transforming data and metadata.
In Data & Knowledge Engineering, 2009. (.pdf)
- P. Papotti and R. Torlone.
Automatic Generation of Model Translations.
In CAiSE Conference, 2007. (.pdf)