My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀
It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos
All powered by lightweight TIGER models for fast and efficient speech separation.
I’ve fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS
I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).
I’ve been working on a new mathematical approach to real-time video compositing and background removal, and I wanted to share a live demo.
Traditionally, real-time keyers either use 3D color-space bounding boxes (which struggle with semi-transparent hair and motion blur) or heavy Machine Learning models (which require massive GPU compute and often suffer from temporal "jitter" on the edges).
I wanted to see if I could solve this using purely deterministic math so it could run client-side in a standard browser.
The engine uses a custom mathematical framework I call CMT SRL SEFA. Instead of looking at raw color values or guessing semantics like an AI, it treats the video feed as complex-encoded sequences. It uses harmonic frequencies to map phase geometry and applies a "Stability Cost Function," searching for its global minimum. In short: it isolates the foreground from the background by measuring signal complexity and structural contradictions.
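The actual CMT SRL SEFA math isn't spelled out here, but the general idea of a per-pixel stability cost can be illustrated with a toy sketch. Everything below (the function name, the deviation-plus-gradient cost, the weights) is my own illustrative assumption, not the engine's real formula:

```python
import numpy as np

def stability_matte(frame, bg_ref, complexity_weight=0.5):
    """Toy 'stability cost' matte: pixels whose color stays close to a
    background reference are stable (cheap -> background); pixels that
    deviate, or sit on structural edges, are expensive (-> foreground).
    Purely illustrative, not the actual CMT SRL SEFA math."""
    # per-pixel color deviation from the background reference
    deviation = np.linalg.norm(frame.astype(float) - bg_ref.astype(float), axis=-1)
    # crude "signal complexity": gradient magnitude of the deviation field
    gy, gx = np.gradient(deviation)
    complexity = np.hypot(gx, gy)
    cost = deviation + complexity_weight * complexity
    # normalize to [0, 1] so the cost can be used directly as an alpha matte
    return (cost - cost.min()) / (np.ptp(cost) + 1e-8)

# usage: a flat green plate with one red foreground square
frame = np.zeros((8, 8, 3)); frame[:, :, 1] = 0.6
frame[2:5, 2:5] = [0.9, 0.2, 0.2]
bg = np.zeros_like(frame); bg[:, :, 1] = 0.6
matte = stability_matte(frame, bg)   # high inside the square, ~0 on the plate
```

A real-time keyer would of course need temporal terms to avoid edge jitter; this only shows the "cost minimum = background" framing on a single frame.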
Give it a try with your own messy plates. Since I'm not a VFX artist, I'm curious to hear your thoughts on what could be improved.
We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.
Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.
I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.
TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.
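One way to picture a joint text/acoustic sequence is a simple interleaving with offset vocabularies. The ratio, vocab size, and offset scheme below are assumptions for illustration only, not TADA's actual tokenization:

```python
TEXT_VOCAB = 32000  # assumed text vocabulary size (illustrative)

def interleave(text_tokens, acoustic_tokens, ratio=4):
    """Merge one text token with `ratio` acoustic tokens so a single
    transformer sees both streams in lockstep. Acoustic ids are shifted
    past the text vocabulary so the two token types never collide."""
    out = []
    audio = iter(acoustic_tokens)
    for t in text_tokens:
        out.append(t)  # text token in [0, TEXT_VOCAB)
        for _ in range(ratio):
            try:
                out.append(TEXT_VOCAB + next(audio))  # offset acoustic id
            except StopIteration:
                return out
    out.extend(TEXT_VOCAB + a for a in audio)  # trailing audio tokens
    return out

seq = interleave([1, 2], [10, 11, 12, 13, 14, 15, 16, 17, 18])
# -> [1, 32010, 32011, 32012, 32013, 2, 32014, ..., 32018]
```

The point of a scheme like this is that text and audio stay positionally aligned, so generation can stay "synchronized" without a separate aligner.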
The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.
This updated demo makes the process clearer:
• load the model
• prepare a reference voice (optionally with a transcript, or Whisper auto-transcription)
• generate speech conditioned on that reference
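The three steps above can be sketched as a script. All names here (`load_model`, `prepare_reference`, `generate`) are hypothetical stand-ins, not the real TADA API; the bodies are stubs that only mirror the demo's flow:

```python
# Illustrative stubs mirroring the demo's three steps; none of these
# names come from the actual TADA codebase.
def load_model():
    return {"name": "tada"}  # stand-in for loading real weights

def prepare_reference(audio_path, transcript=None):
    # if no transcript is supplied, the demo falls back to Whisper
    transcript = transcript or "<whisper auto-transcription>"
    return {"audio": audio_path, "text": transcript}

def generate(model, reference, prompt_text):
    # real generation would emit acoustic tokens conditioned on the
    # reference voice; here we just describe the call
    return f"{model['name']} speaking {prompt_text!r} as {reference['audio']}"

model = load_model()
ref = prepare_reference("speaker.wav")  # no transcript -> Whisper path
out = generate(model, ref, "Hello!")
```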
It also adds multilingual support.
Presets are included for a few languages, but the model supports more.
If you like it, give the demo a little star and send a shoutout to @MaxLSB, @jddqd, and @GAD-cell for absolutely obliterating the Pareto frontier of French language understanding.