Post 4741 OpenAI is now open again! Check out OpenAI’s brand new gpt-oss-20b model hosted on ZeroGPU 🤗 merterbak/gpt-oss-20b-demo
Post 5029 Qwen 3 technical report released 🚀 Report: https://github.com/QwenLM/Qwen3/blob/main/Qwen3_Technical_Report.pdf
Papers
Attention Is All You Need • Paper • 1706.03762 • Published Jun 12, 2017 • 110
LoRA Learns Less and Forgets Less • Paper • 2405.09673 • Published May 15, 2024 • 91
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism • Paper • 2401.02954 • Published Jan 5, 2024 • 53
RAFT: Adapting Language Model to Domain Specific RAG • Paper • 2403.10131 • Published Mar 15, 2024 • 72
Qwen 3
Alibaba's Qwen 3 models
Qwen/Qwen3-0.6B • Text Generation • 0.8B • Updated Jul 26, 2025 • 10.2M • 1.07k
Qwen/Qwen3-1.7B • Text Generation • Updated Jul 26, 2025 • 4.08M • 420
Qwen/Qwen3-4B • Text Generation • 4B • Updated Jul 26, 2025 • 5.1M • 548
Qwen/Qwen3-8B • Text Generation • 8B • Updated Jul 26, 2025 • 4.68M • 924
pinned Running on Zero Featured 412 DeepSeek OCR 2 Demo 🚀 Try out DeepSeek-OCR-2 on your PDFs or images
Running on Zero 6 Seed Coder 8B Instruct 🚀 ByteDance Seed's coding focused Seed-Coder-8B-Instruct model
merterbak/Mistral-Small-3.1-24B-Instruct-2503-GGUF Text Generation • 24B • Updated Apr 27, 2025 • 147 • 1