Wai Man (Raymond) Si

CISPA – Helmholtz Center for Information Security

Saarbrücken, 66123 Saarland

I am a Ph.D. student at the CISPA Helmholtz Center for Information Security, supervised by Prof. Michael Backes and advised by Dr. Yang Zhang. Prior to that, I received my B.S. (2018) and M.S. (2021) degrees from the Georgia Institute of Technology, where I was fortunate to work with Prof. Alexander Lerch and Prof. Mark Riedl.

My research focuses on attacks against NLP models, including adversarial and poisoning attacks. I am also interested in developing safer models through post-training techniques. Currently, my work uses mechanistic interpretability to study LLM behavior and explores lightweight methods for steering or modifying that behavior.

Honors and Awards

  • 2025
    Education Scholarship, Education and Youth Development Bureau of Macau
  • 2023
    Best Paper Finalist, CSAW Europe
  • 2022
    CCS Best Paper Award Honorable Mention, ACM

News

Apr 2026 Our paper titled “Pruning Unsafe Tickets: A Resource-Efficient Framework for Safer and More Robust LLMs” was accepted to ACL 2026 (Main Conference)!
Apr 2026 Our paper titled “Reward Yourself: Efficient Self Rewards for Trustworthy Sampling” was accepted to ACL 2026 (Findings)!
Oct 2025 Our paper titled “Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms” was accepted to NeurIPS 2025!
Apr 2025 Our paper titled “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation” was accepted to ICLR 2025!
Apr 2023 Our paper titled “Two-in-One: A Model Hijacking Attack Against Text Generation Models” was accepted to USENIX Security 2023!
Nov 2022 Our paper titled “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” received the Best Paper Award Honorable Mention at CCS 2022!
Aug 2022 Our paper titled “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” was accepted to CCS 2022!