I am an AI research scientist at Meta. Before Meta, I was a senior research scientist at Megagon Labs. I have worked on research topics in data management, database theory, and natural language processing. In particular, my recent research has focused on applying machine learning techniques to data preparation and integration tasks, including entity matching, data cleaning, data discovery, and table annotation.
Before joining Megagon, I received a PhD in Computer Science from UC San Diego (UCSD), advised by Alin Deutsch and Victor Vianu. My PhD thesis is on the verification of data-driven workflows, a research direction that lies at the intersection of database theory, software model checking, and business process management. Before UCSD, I obtained my undergraduate degree in Computer Science from the Hong Kong University of Science and Technology.
2024
Bryan Wang, Yuliang Li, Zhaoyang Lv, Haijun Xia, Yan Xu, Raj Sodhi, “LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing”, In IUI 2024 [Link][Video]
2023
Wang-Chiew Tan, Jane Dwivedi-Yu, Yuliang Li, Lambert Mathias, Marzieh Saeidi, Jing Nathan Yan, Alon Y. Halevy, “TimelineQA: A Benchmark for Question Answering over Timelines”, In ACL (Findings) 2023 [Repo]
Wang-Chiew Tan, Yuliang Li, Pedro Rodriguez, Richard James, Xi Victoria Lin, Alon Y. Halevy, Wen-tau Yih, “Reimagining Retrieval Augmented Language Models for Answering Queries”, In ACL (Findings) 2023 [Link]
Grace Fan, Jin Wang, Yuliang Li, Dan Zhang, Renée Miller, “Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning”, In VLDB 2023 [ArXiv]
Runhui Wang, Yuliang Li, Jin Wang, “Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation”, In ICDE 2023 [ArXiv]
Jin Wang, Yuliang Li, “Minun: Evaluating Counterfactual Explanations for Entity Matching”, In DEEM 2022 (co-located with SIGMOD; Best Paper Award) [Link]
Jin Wang, Yuliang Li, Wataru Hirota, Eser Kandogan, “Machop: an End-to-End Generalized Entity Matching Framework”, In aiDM 2022 (co-located with SIGMOD) [Link]
Yu-Ching Hu, Yuliang Li, Hung-Wei Tseng, “TCUDB: Accelerating Database with Tensor Processors”, In SIGMOD 2022 [ArXiv]
Yoshihiko Suhara, Jinfeng Li, Yuliang Li, Dan Zhang, Cagatay Demiralp, Chen Chen, Wang-Chiew Tan, “Annotating Columns with Pre-trained Language Models”, In SIGMOD 2022 [ArXiv]
2021
Jin Wang, Yuliang Li, Wataru Hirota, “Machamp: A Generalized Entity Matching Benchmark”, In CIKM 2021 [ArXiv][Datasets]
Yuliang Li, Xiaolan Wang, Zhengjie Miao, Wang-Chiew Tan, “Data Augmentation for ML-driven Data Preparation and Integration”, In VLDB Tutorial 2021 [Link][Videos][Slides]
Zhengjie Miao, Yuliang Li, Xiaolan Wang, “Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond”, In SIGMOD 2021 [Link][Blog][Demo][Code]
Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, Wang-Chiew Tan, “Deep Entity Matching with Pre-Trained Language Models”, In VLDB 2021 [ArXiv][Code]
2020
Jinfeng Li, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan, “Deep or Simple Models for Semantic Tagging? It Depends on your Data [Experiments]”, In VLDB 2020 [ArXiv]
Xiaolan Wang, Yoshihiko Suhara, Natalie Nuno, Yuliang Li, Jinfeng Li, Nofar Carmeli, Stefanos Angelidis, Eser Kandogan, Wang-Chiew Tan, “ExtremeReader: An interactive explorer for customizable and explainable review summarization”, In TheWebConf (WWW) 2020 (Demo track)
Zhengjie Miao, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan, “Snippext: Semi-supervised Opinion Mining with Augmented Data”, In TheWebConf (WWW) 2020 [ArXiv][Slides][Code]
Xiong Zhang, Jonathan Engel, Sara Evensen, Yuliang Li, Çagatay Demiralp, Wang-Chiew Tan, “Teddy: A System for Interactive Review Analysis”, In CHI 2020 [Paper][Video][Code]
2019
Yuliang Li, Aaron Xixuan Feng, Jinfeng Li, Saran Mumick, Alon Halevy, Vivian Li, Wang-Chiew Tan, “Subjective Databases”, In PVLDB 2019 (invited to VLDBJ as “one of the best paper candidates”; Finalist of the Recruit Engine Forum) [ArXiv][Slides][Poster]
Sara Evensen, Aaron Feng, Alon Halevy, Jinfeng Li, Vivian Li, Yuliang Li, Huining Liu, George Mihaila, John Morales, Natalie Nuno, Ekaterina Pavlovic, Wang-Chiew Tan, Xiaolan Wang, “Voyageur: An Experiential Travel Search Engine”, In the Web Conference (WWW) 2019 (Demo track) [ArXiv][Poster][Demo]
Yuliang Li, Jianguo Wang, Benjamin Pullman, Nuno Bandeira, Yannis Papakonstantinou, “Index-based High-dimensional Cosine Threshold Querying with Optimality Guarantees”, In ICDT 2019 (invited to the ToCS special issue on the best papers of ICDT 2019) [Link][Full version][Slides]
2018 or older
Tara Astigarraga, Xiaoyan Chen, Yaoliang Chen, Jingxiao Gu, Richard Hull, Limei Jiao, Yuliang Li, Petr Novotny, “Empowering Business-Level Blockchain Users with a Rules Framework for Smart Contracts”, In ICSOC 2018 [Link]
Yuliang Li, Alin Deutsch, Victor Vianu, “VERIFAS: A Practical Verifier for Artifact Systems”, In PVLDB 2018 [Link][ArXiv][Slides][Code]
Yuliang Li, “Practical Verification of Hierarchical Artifact Systems”, In VLDB PhD Workshop 2017 [Link][Slides]
Alin Deutsch, Yuliang Li, Victor Vianu, “Verification of Hierarchical Artifact Systems”, In PODS 2016 [Link][ArXiv][Slides]
PhD Thesis
Yuliang Li, “Verification of Hierarchical Data-Driven Workflows”, 2018 [Link][Slides]