DylanASHillier's Collections: Learning from feedback
Suppressing Pink Elephants with Direct Principle Feedback
Paper
• 2402.07896
• Published
• 11
Policy Improvement using Language Feedback Models
Paper
• 2402.07876
• Published
• 9
Direct Language Model Alignment from Online AI Feedback
Paper
• 2402.04792
• Published
• 34
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper
• 2401.01335
• Published
• 68
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Paper
• 2402.11450
• Published
• 22
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper
• 2402.10893
• Published
• 12
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Paper
• 2402.14830
• Published
• 24
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
Paper
• 2406.11817
• Published
• 13
Bootstrapping Language Models with DPO Implicit Rewards
Paper
• 2406.09760
• Published
• 41
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning
Paper
• 2406.00392
• Published
• 14
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper
• 2406.00888
• Published
• 33
Aligning Teacher with Student Preferences for Tailored Training Data Generation
Paper
• 2406.19227
• Published
• 25
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Paper
• 2406.18629
• Published
• 42
Can LLMs Learn by Teaching? A Preliminary Study
Paper
• 2406.14629
• Published
• 21
Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
Paper
• 2410.24218
• Published
• 6
RL Zero: Zero-Shot Language to Behaviors without any Supervision
Paper
• 2412.05718
• Published
• 4
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation
Paper
• 2412.04445
• Published
• 22
Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback
Paper
• 2506.11930
• Published
• 53
Provably Learning from Language Feedback
Paper
• 2506.10341
• Published
• 8