Hannah PRO
hannahcyberey
AI & ML interests
None yet
Organizations
None yet
Steering the CensorShip
- Running on ZeroAgents2
DeepSeek-R1 Censorship Steering
Generate text with adjustable censorship control
- Running on ZeroAgents8
Refusal Censorship Steering
Generate text with adjustable censorship control
- Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Paper • 2504.17130 • Published • 1
models 0
None public yet
datasets 0
None public yet