About me
I am a fourth-year PhD student at Tsinghua University. I am fortunate to be advised by Prof. Juanzi Li and also work closely with Prof. Zhiyuan Liu. Previously, I received my B.E. in Computer Science and Technology from Tsinghua University in 2020. In 2019, I visited Mila and worked with Prof. Jian Tang. You can find my CV here.
My research interests lie in natural language processing and knowledge engineering. The research directions I am fascinated by and working on are:
- Understanding Language Models (Mechanistic Interpretability, Probing, etc.)
- How to understand the working mechanisms of language models, and how the findings can help us improve and steer them.
- Projects: Skill Neuron, Intrinsic Task Subspace, Conceptual Knowledge Probing
- Event Understanding (Event Extraction, Event Relation Extraction, etc.)
- How to enable models to understand complicated events and their interrelations, such as causality.
- Datasets: MAVEN, MAVEN-ERE
- Toolkit: OmniEvent, Evaluation Pitfalls
News
- [Jun. 2023] Check out KoLA, our new evolving world knowledge benchmark for LLMs.
- [Oct. 2022] Released OmniEvent, a nice event extraction toolkit. You are welcome to try it!
Highlighted Publications
Please refer to publications or my Google Scholar profile for the full list.
- Xiaozhi Wang*, Kaiyue Wen*, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li. Finding Skill Neurons in Pre-trained Transformer-based Language Models. The Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). [pdf] [code]
- Xiaozhi Wang*, Yulin Chen*, Ning Ding, Hao Peng, Zimu Wang, Yankai Lin, Xu Han, Lei Hou, Juanzi Li, Zhiyuan Liu, Peng Li, Jie Zhou. MAVEN-ERE: A Unified Large-scale Dataset for Event Coreference, Temporal, Causal, and Subevent Relation Extraction. The Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). [pdf] [code] [CodaLab]
- Hao Peng*, Xiaozhi Wang*, Shengding Hu, Hailong Jin, Lei Hou, Juanzi Li, Zhiyuan Liu, Qun Liu. COPEN: Probing Conceptual Knowledge in Pre-trained Language Models. The Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). [pdf] [code] [CodaLab]
- Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhengyan Zhang, Zhiyuan Liu, Juanzi Li, Jian Tang. KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. Transactions of the Association for Computational Linguistics (TACL), 2021. [pdf] [code] [dataset]
- Xiaozhi Wang, Ziqi Wang, Xu Han, Wangyi Jiang, Rong Han, Zhiyuan Liu, Juanzi Li, Peng Li, Yankai Lin, Jie Zhou. MAVEN: A Massive General Domain Event Detection Dataset. The Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). [pdf] [code] [CodaLab] [leaderboard]
Professional Services
- Program Committee Member/Reviewer (Conference): AAAI/IJCAI/COLING 2020, AAAI/ACL/EMNLP 2021, AAAI/COLING/SIGIR/CCKS/EMNLP 2022, AAAI/ACL/EMNLP/NeurIPS 2023, ACL Rolling Review.
- Reviewer (Journal): Neurocomputing, Complex & Intelligent Systems, AI Open, IEEE TASLP, Frontiers of Computer Science