deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages
Jan 26 Releases
robbyant/lingbot-world-base-cam • Image-to-Video • Updated Feb 2 • 327
nvidia/C-RADIOv4-H • Feature Extraction • Updated Jan 30 • 5.39k • 59
deepseek-ai/DeepSeek-OCR-2 • Image-Text-to-Text • Updated Feb 3 • 1.33M • 862
arcee-ai/Trinity-Large-Base • Text Generation • Updated Jan 27 • 135 • 52
Jan 19 Releases
Nemotron ColEmbed V2 Collection • State-of-the-Art Late Interaction Vision-Language Embedding Models • 3 items • Updated about 5 hours ago • 10
Qwen/Qwen3-TTS-12Hz-1.7B-Base • Updated Jan 23 • 2.29M • 348
fal/flux-2-klein-4B-outpaint-lora • Image-to-Image • Updated Jan 21 • • 65
Qwen/Qwen3-TTS-Tokenizer-12Hz • Audio-to-Audio • Updated Jan 29 • 99.5k • 51
Running on CPU Upgrade • 18 • Daggr Image To 3d 👀 • Convert images into 3D assets with background removal and enhancement
Running on Zero • Featured • 111 • SAM3 Video Segmentation 🐠 • Track and label objects in videos using text prompts or clicks