# CosyVoice Version Information

## Current Version: v1.0-cosyvoice-300m

### Models Installed:
- CosyVoice-300M (Main model)
- CosyVoice-300M-SFT (Supervised Fine-Tuning)
- CosyVoice-300M-direct (Zero-shot inference)
- CosyVoice-ttsfrd (Required resources)

### Features:
- Multi-language TTS (Chinese, English, Japanese, Korean)
- Zero-shot voice cloning
- Cross-lingual synthesis
- GPU acceleration with RTX A5000

### Performance:
- Generation speed: ~1x real-time
- Model loading: 5-10 seconds
- GPU: RTX A5000 (24GB VRAM)

### Known Issues:
- Chinese accent in English/Portuguese synthesis
- Model trained primarily on Chinese data

### Next Version:
- CosyVoice2-0.5B (downloading)
- Improved English pronunciation
- Lower latency (150ms)
- 30-50% reduction in pronunciation errors
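
### Note on Generation Speed:
The "~1x real-time" figure listed under Performance can be read as a real-time factor (RTF) of roughly 1, meaning synthesis takes about as long as the audio it produces. Below is a minimal sketch of that calculation. The function name `real_time_factor` and the sample timings are illustrative assumptions, not measurements from this installation or part of the CosyVoice API.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Hypothetical timing: 4.8 s of compute to produce 5.0 s of audio.
rtf = real_time_factor(4.8, 5.0)
print(f"RTF = {rtf:.2f}")  # prints "RTF = 0.96"
```

An RTF below 1 means audio is generated faster than it plays back; an RTF above 1 means playback would have to wait for synthesis.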