# CosyVoice Version Information

## Current Version: v1.0-cosyvoice-300m

### Models Installed:
- CosyVoice-300M (Main model)
- CosyVoice-300M-SFT (Supervised Fine-Tuning)
- CosyVoice-300M-direct (Zero-shot inference)
- CosyVoice-ttsfrd (Required resources)
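
A quick way to confirm that the models listed above are actually on disk is sketched below. The `pretrained_models` directory and the `missing_models` helper are assumptions made for illustration; point the path at wherever the checkpoints were downloaded.

```python
from pathlib import Path

# Directory names copied from the "Models Installed" list above.
EXPECTED_MODELS = [
    "CosyVoice-300M",
    "CosyVoice-300M-SFT",
    "CosyVoice-300M-direct",
    "CosyVoice-ttsfrd",
]


def missing_models(root, expected=EXPECTED_MODELS):
    """Return the expected model names that have no directory under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # "pretrained_models" is an assumed location, not stated in this document.
    missing = missing_models("pretrained_models")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed models are present.")
```

The check only looks for directories by name; it does not verify that the checkpoint files inside them are complete.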

### Features:
- Multi-language TTS (Chinese, English, Japanese, Korean)
- Zero-shot voice cloning
- Cross-lingual synthesis
- GPU acceleration with RTX A5000

### Performance:
- Generation speed: ~1x real-time (roughly one second of compute per second of generated audio)
- Model loading: 5-10 seconds
- GPU: RTX A5000 (24 GB VRAM)
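
The generation speed figure can be re-measured as a real-time factor (synthesis time divided by the duration of the audio produced). The sketch below is a minimal, library-independent helper; the function names are hypothetical, and the 22050 Hz sample rate in the usage note is an assumption about the model's output that should be checked against the installed configuration.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """Return synthesis time divided by the duration of the generated audio.

    A value near 1.0 means generation runs at about real-time speed;
    values below 1.0 are faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, if synthesis took 4.0 seconds and produced 88200 samples at an assumed 22050 Hz, the audio lasts 4.0 seconds and `real_time_factor(4.0, 88200 / 22050)` evaluates to 1.0. Wrap the actual synthesis call in `timed(...)` to obtain the elapsed time.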

### Known Issues:
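- Noticeable Chinese accent in English and Portuguese synthesis (Portuguese is not among the supported languages listed under Features)
- Likely cause: the model was trained primarily on Chinese data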

### Next Version:
- CosyVoice2-0.5B (downloading)
- Improved English pronunciation
- Lower latency (as low as 150 ms in streaming mode)
- 30-50% reduction in pronunciation errors compared to v1.0