WaterDrum: Watermarking for Data-centric Unlearning Metric Paper • 2505.05064 • Published May 8, 2025 • 10
Article Announcing ReasoningLens — Visualizing and Diagnosing LLM Reasoning at a Glance Bowieee • Feb 3 • 7
OpenAI GPT-OSS - Steering Vectors & SAE Research Collection Open-source GPT models with steering vectors for controllable generation and behavior modification • 2 items • Updated Jan 16 • 1
Nemotron-Labs-Diffusion Collection Set of models of internal diffusion models • 7 items • Updated 2 days ago • 25
It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks Paper • 2602.12147 • Published Mar 4 • 4
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published 11 days ago • 45
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory Paper • 2605.15128 • Published 8 days ago • 60
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published Apr 7 • 121
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models Paper • 2605.14906 • Published 8 days ago • 73
Article Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality ibm-granite • 7 days ago • 29
Localized Sentiment Models Collection A group of sentiment detection models dedicated to specific languages • 2 items • Updated Jan 10, 2024 • 1
Finance Sentiment Collection A collection of models for detecting financial sentiment. • 8 items • Updated Jan 10, 2024 • 1