Wai Man (Raymond) Si
CISPA – Helmholtz Center for Information Security
Saarbrücken, 66123 Saarland
I am a Ph.D. student at CISPA Helmholtz Center for Information Security, supervised by Prof. Michael Backes and advised by Dr. Yang Zhang. Prior to that, I received my B.S. (2018) and M.S. (2021) degrees from Georgia Institute of Technology, where I was fortunate to work with Prof. Alexander Lerch and Prof. Mark Riedl.
My research focuses on attacks targeting NLP models, including adversarial and poisoning attacks. I am also interested in developing safer models through post-training techniques. Currently, my work explores LLM behavior using mechanistic interpretability, as well as lightweight methods for steering or modifying model behavior.
Honors and Awards
- 2025: Education Scholarship, Education and Youth Development Bureau of Macau
- 2023: Best Paper Finalist, CSAW Europe
- 2022: CCS Best Paper Award Honorable Mention, ACM
News
| Date | News |
|---|---|
| Apr 2026 | Our paper titled “Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs” was accepted to ACL 2026 Main! |
| Apr 2026 | Our paper titled “Reward Yourself: Efficient Self Rewards for Trustworthy Sampling” was accepted to ACL 2026 Findings! |
| Oct 2025 | Our paper titled “Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms” was accepted to NeurIPS 2025! |
| Apr 2025 | Our paper titled “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation” was accepted to ICLR 2025! |
| Apr 2023 | Our paper titled “Two-in-One: A Model Hijacking Attack Against Text Generation Models” was accepted to USENIX Security 2023! |
| Nov 2022 | Our paper titled “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” received a Best Paper Award Honorable Mention at CCS 2022! |
| Aug 2022 | Our paper titled “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” was accepted to CCS 2022! |