Wai Man (Raymond) Si

CISPA – Helmholtz Center for Information Security

Saarbrücken, 66123 Saarland

I am a Ph.D. student at CISPA Helmholtz Center for Information Security, supervised by Prof. Michael Backes and advised by Dr. Yang Zhang. Prior to that, I received my B.S. (2018) and M.S. (2021) degrees from Georgia Institute of Technology, where I was fortunate to work with Prof. Alexander Lerch and Prof. Mark Riedl.

My research focuses on attacks targeting NLP models, including adversarial and poisoning attacks. I am also interested in developing safer models through post-training techniques. Currently, my work explores LLM behavior using mechanistic interpretability, as well as lightweight methods for steering or modifying model behavior.

Honors and Awards

2025	Education Scholarship, Education and Youth Development Bureau of Macau
2023	Best Paper Finalist, CSAW Europe
2022	CCS Best Paper Award Honorable Mention, ACM

News

Apr 2026	Our paper titled “Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs” was accepted to ACL 2026 Main!
Apr 2026	Our paper titled “Reward Yourself: Efficient Self Rewards for Trustworthy Sampling” was accepted to ACL 2026 Findings!
Oct 2025	Our paper titled “Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms” was accepted to NeurIPS 2025!
Apr 2025	Our paper titled “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation” was accepted to ICLR 2025!
Apr 2023	Our paper titled “Two-in-One: A Model Hijacking Attack Against Text Generation Models” was accepted to USENIX Security 2023!
Nov 2022	Our paper titled “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” received a Best Paper Award Honorable Mention at CCS 2022!
Aug 2022	Our paper titled “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” was accepted to CCS 2022!