Chirag Agarwal

AikyamLab

https://chirag-agarwall.github.io/

AI & ML interests

Explainability and Interpretability; AI Safety; AI Alignment

Recent Activity

upvoted a paper about 7 hours ago

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

submitted a paper about 7 hours ago

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

upvoted a paper 30 days ago

Towards Understanding the Robustness of Sparse Autoencoders

submitted a paper to Daily Papers about 7 hours ago

The Fragility of Chain-of-Thought Monitoring Across Typologically Diverse Languages

Paper • 2605.27901 • Published 2 days ago • 6

submitted a paper to Daily Papers 30 days ago

Towards Understanding the Robustness of Sparse Autoencoders

Paper • 2604.18756 • Published Apr 20 • 10

authored a paper about 2 years ago

Counterfactual Explanation Policies in RL

Paper • 2307.13192 • Published Jul 25, 2023

authored a paper over 2 years ago

SAM: The Sensitivity of Attribution Methods to Hyperparameters

Paper • 2003.08754 • Published Mar 4, 2020