Will support soon!
https://github.com/PKU-YuanGroup/Helios/issues/1
BestWishYsh (YSH)
Recent Activity
new activity 2 minutes ago: multimodalart/Helios-Distilled (test)
new activity about 11 hours ago: multimodalart/Helios-Distilled (Fix sth)
updated a Space about 11 hours ago: BestWishYsh/Helios-14B-RealTime
replied to their post 2 days ago
Post
2781
🚀 Introducing Helios: a 14B real-time long-video generation model!
It’s completely wild: faster than 1.3B models, and it achieves this without self-forcing. Welcome to the new era of video generation! 😎👇
💻 Code: https://github.com/PKU-YuanGroup/Helios
🏠 Page: https://pku-yuangroup.github.io/Helios-Page
📄 Paper: Helios: Real Real-Time Long Video Generation Model (2603.04379)
🔹 True Single-GPU Extreme Speed ⚡️
No need to rely on traditional workarounds like KV-cache, quantization, sparse/linear attention, or TinyVAE. Helios hits an end-to-end 19.5 FPS on a single H100!
Training is also highly accessible: a single 80 GB GPU can fit four 14B models.
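A quick back-of-the-envelope check of what 19.5 FPS means in practice. The only number taken from the post is the 19.5 FPS end-to-end figure; the 24 FPS playback rate is our assumption, not something the post states.

```python
# Back-of-the-envelope check of the real-time claim.
# 19.5 FPS end-to-end on one H100 is from the post; the 24 FPS
# playback rate below is an assumed, common video frame rate.
GEN_FPS = 19.5        # generation throughput reported in the post
PLAYBACK_FPS = 24     # assumed playback frame rate

# Time budget per generated frame.
frame_time_s = 1.0 / GEN_FPS

# Wall-clock time to generate a one-minute clip at the assumed playback rate.
frames_per_minute = PLAYBACK_FPS * 60
gen_time_s = frames_per_minute / GEN_FPS

print(f"{frame_time_s * 1000:.1f} ms per frame")               # ~51.3 ms
print(f"{gen_time_s:.1f} s to generate 60 s of 24 FPS video")  # ~73.8 s
```

So at the reported throughput, a minute of 24 FPS video takes a bit over a minute to generate; true real-time playback would hold at the native 19.5 FPS.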
🔹 Solving Long-Video "Drift" from the Core 🎥
Tired of visual drift and repetitive loops? We ditched traditional hacks (like error banks, self-forcing, or keyframe sampling).
Instead, our innovative training strategy simulates & eliminates drift directly, keeping minute-long videos incredibly coherent with stunning quality. ✨
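The drift problem the post refers to can be illustrated with a toy autoregressive rollout. This is a minimal sketch of error accumulation in general, not the Helios training procedure: the error model, rates, and seed below are all our own illustrative assumptions.

```python
import random

# Toy illustration of long-rollout drift: each generated frame is
# conditioned on the previous *generated* frame, so small per-step
# errors compound. This sketches the failure mode, not Helios itself.
random.seed(0)

def rollout(steps, per_step_error=0.01):
    """Autoregressive rollout where each step inherits the previous error."""
    state_error = 0.0
    errors = []
    for _ in range(steps):
        # The model sees its own (slightly wrong) previous output,
        # so existing error carries over and fresh error is added.
        state_error += per_step_error * random.uniform(0.5, 1.5)
        errors.append(state_error)
    return errors

short = rollout(24)      # roughly 1 second of frames
long = rollout(24 * 60)  # roughly 1 minute of frames: error keeps growing

print(f"error after ~1 s:  {short[-1]:.3f}")
print(f"error after ~60 s: {long[-1]:.3f}")
```

The accumulated error grows without bound in the toy rollout, which is why a training strategy that exposes the model to its own drifted outputs can matter for minute-long generation.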
🔹 3 Model Variants for Full Coverage 🛠️
With a unified architecture natively supporting T2V, I2V, and V2V, we are open-sourcing 3 flavors:
1️⃣ Base: Single-stage denoising for extreme high-fidelity.
2️⃣ Mid: Pyramid denoising + CFG-Zero for the perfect balance of quality & throughput.
3️⃣ Distilled: Adversarial Distillation (DMD) for ultra-fast, few-step generation.
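The "CFG-Zero" piece of the Mid variant can be sketched as classifier-free guidance whose effect is disabled for the earliest denoising steps. This is our illustrative reading of the name, not the exact Helios recipe; the function, values, and `zero_steps` cutoff below are all hypothetical.

```python
import numpy as np

# Toy sketch of classifier-free guidance with a CFG-Zero-style schedule:
# standard CFG blends conditional and unconditional predictions, and a
# zero-init schedule skips guidance on the earliest (noisiest) steps.
def guided_pred(cond, uncond, scale, step, zero_steps=2):
    """Classifier-free guidance, zeroed for the first `zero_steps` steps."""
    if step < zero_steps:
        # Early steps: no guidance, just the unconditional prediction.
        return uncond
    return uncond + scale * (cond - uncond)

cond = np.array([1.0, 2.0])
uncond = np.array([0.5, 1.0])

early = guided_pred(cond, uncond, scale=5.0, step=0)  # equals uncond
later = guided_pred(cond, uncond, scale=5.0, step=3)  # [3.0, 6.0]
print(early, later)
```

The intuition being sketched: at very high noise levels the conditional/unconditional gap is uninformative, so applying a large guidance scale there mostly amplifies noise.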
🔹 Day-0 Ecosystem Ready 🌍
We wanted deployment to be a breeze from the second we launched. Helios drops with comprehensive Day-0 hardware and framework support:
✅ Huawei Ascend-NPU
✅ HuggingFace Diffusers
✅ vLLM-Omni
✅ SGLang-Diffusion
Try it out and let us know what you think!
replied to their post 2 days ago
Inference Speed:
posted an update 2 days ago
replied to their post 9 months ago
reacted to AdinaY's post with 🔥 9 months ago
Post
2680
🔥 New benchmark & dataset for Subject-to-Video generation
OPENS2V-NEXUS by Peking University
✨ Fine-grained evaluation for subject consistency
BestWishYsh/OpenS2V-Eval
✨ 5M-scale dataset:
BestWishYsh/OpenS2V-5M
✨ New metrics – automatic scores for identity, realism, and text match
Thanks for sharing!
reacted to AdinaY's post with ❤️ 9 months ago
Post
3437
Introducing our new work: OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation 🚀
We tackle the core challenges of Subject-to-Video Generation (S2V) by systematically building the first complete infrastructure—featuring an evaluation benchmark and a million-scale dataset! ✨
🧠 Introducing OpenS2V-Eval—the first fine-grained S2V benchmark, with 180 multi-domain prompts + real/synthetic test pairs. We propose NexusScore, NaturalScore, and GmeScore to precisely quantify model performance across subject consistency, naturalness, and text alignment ✔
📊 Using this framework, we conduct a comprehensive evaluation of 16 leading S2V models, revealing their strengths/weaknesses in complex scenarios!
🔥 OpenS2V-5M dataset now available! A 5.4M 720P HD collection of subject-text-video triplets, enabled by cross-video association segmentation + multi-view synthesis for diverse subjects & high-quality annotations 🚀
All resources open-sourced: Paper, Code, Data, and Evaluation Tools 📄
Let's advance S2V research together! 💡
🔗 Links:
Paper: OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation (2505.20292)
Code: https://github.com/PKU-YuanGroup/OpenS2V-Nexus
Project: https://pku-yuangroup.github.io/OpenS2V-Nexus
LeaderBoard: BestWishYsh/OpenS2V-Eval
OpenS2V-Eval: BestWishYsh/OpenS2V-Eval
OpenS2V-5M: BestWishYsh/OpenS2V-5M
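NexusScore, NaturalScore, and GmeScore are the three OpenS2V-Eval axes named above. One common way leaderboards summarize such axes is a weighted average; the example values, 0-1 ranges, and equal weights below are our assumptions, not the actual OpenS2V-Eval aggregation.

```python
# Hypothetical aggregation of the three OpenS2V-Eval axes into one number.
# The metric names come from the post; the values and weights are made up.
scores = {
    "NexusScore": 0.72,    # subject consistency (example value)
    "NaturalScore": 0.81,  # naturalness (example value)
    "GmeScore": 0.77,      # text alignment (example value)
}
weights = {name: 1 / 3 for name in scores}  # assumed equal weighting

overall = sum(weights[name] * value for name, value in scores.items())
print(f"overall: {overall:.4f}")  # (0.72 + 0.81 + 0.77) / 3 ≈ 0.7667
```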
posted an update 9 months ago
Post
3591
🚨 Hot Take: GPT-4o might NOT be a purely autoregressive model! 🚨
There’s a high chance it has a diffusion head. 🤯 If true, this could be a game-changer for AI architecture. What do you think? 🤔👇
Code: https://github.com/PicoTrex/GPT-ImgEval
Dataset: Yejy53/GPT-ImgEval
Paper: GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation (2504.02782)
posted an update 11 months ago