Wai Man (Raymond) Si

CISPA – Helmholtz Center for Information Security

Saarbrücken, 66123 Saarland

I am a Ph.D. student at CISPA Helmholtz Center for Information Security, advised by Prof. Michael Backes and Dr. Yang Zhang. Prior to that, I received my B.S. (2018) and M.S. (2021) degrees from Georgia Institute of Technology, where I was fortunate to work with Prof. Alexander Lerch and Prof. Mark Riedl.

My research focuses on attacks targeting NLP models, including adversarial and poisoning attacks. I am also interested in developing safer models through post-training techniques. Currently, my work explores LLM behavior using mechanistic interpretability, as well as lightweight methods for steering or modifying model behavior.

Honors and Awards

2025

Education Scholarship, Education and Youth Development Bureau of Macau
2023

Best Paper Finalist, CSAW Europe
2022

CCS Best Paper Award Honorable Mention, ACM

News

Apr 2025	Our paper “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation” was accepted to ICLR 2025!
Apr 2023	Our paper “Two-in-One: A Model Hijacking Attack Against Text Generation Models” was accepted to USENIX Security 2023!
Nov 2022	Our paper “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” received a Best Paper Award Honorable Mention at CCS 2022!
Aug 2022	Our paper “Why So Toxic? Measuring and Triggering Toxic Behavior in Open-Domain Chatbots” was accepted to CCS 2022!

Selected publications

NeurIPS

Finding and Reactivating Post-Trained LLMs’ Hidden Safety Mechanisms

Mingjie Li, Wai Man Si, Michael Backes, and 2 more authors

2025
ICLR

SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

Mingjie Li, Wai Man Si, Michael Backes, and 2 more authors

2025
USENIX

Two-in-One: A Model Hijacking Attack Against Text Generation Models

Wai Man Si, Michael Backes, Yang Zhang, and 1 more author

2023
CCS

Why So Toxic?: Measuring and Triggering Toxic Behavior in Open-Domain Chatbots

Wai Man Si, Michael Backes, Jeremy Blackburn, and 4 more authors

2022