SkinTokens
Pretrained checkpoints for SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging.
This repository stores the model checkpoints used by the SkinTokens codebase, including:
- the FSQ-CVAE that learns the SkinTokens discrete representation of skinning weights, and
- the TokenRig autoregressive Transformer (Qwen3-0.6B architecture, GRPO-refined) that jointly generates skeletons and SkinTokens from a 3D mesh.
SkinTokens is the successor to UniRig (SIGGRAPH '25). While UniRig treats skeleton and skinning as decoupled stages, SkinTokens unifies both into a single autoregressive sequence via learned discrete skin tokens, yielding a 98–133% improvement in skinning accuracy and a 17–22% improvement in bone prediction over state-of-the-art baselines.
What Is Included
The repository is organized exactly like the experiments/ folder expected by the main SkinTokens codebase:
experiments/
├── articulation_xl_quantization_256_token_4/
│   └── grpo_1400.ckpt   # TokenRig autoregressive rigging model (GRPO-refined)
└── skin_vae_2_10_32768/
    └── last.ckpt        # FSQ-CVAE for SkinTokens (skin-weight tokenizer)
Total size: approximately 1.6 GB.
The training data (ArticulationXL splits and processed meshes) used to train these checkpoints will be released separately in a future update.
Checkpoint Overview
SkinTokens – FSQ-CVAE (skin-weight tokenizer)
File: experiments/skin_vae_2_10_32768/last.ckpt
Compresses sparse skinning weights into discrete SkinTokens using a Finite Scalar Quantized Conditional VAE with codebook levels [8, 8, 8, 5, 5, 5] (64,000 entries). Used both to tokenize ground-truth weights during training and to decode TokenRig's output tokens back into per-vertex skinning at inference.
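In FSQ, each latent dimension is quantized independently to one of the listed levels, so the implicit codebook size is simply the product of the per-dimension levels. A quick sanity check in plain Python:

```python
from math import prod

# FSQ quantizes each latent dimension to a fixed number of levels;
# the implicit codebook size is the product of the per-dimension levels.
levels = [8, 8, 8, 5, 5, 5]
codebook_size = prod(levels)
print(codebook_size)  # 8*8*8 * 5*5*5 = 512 * 125 = 64000
```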
TokenRig – autoregressive rigging model
File: experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt
Qwen3-0.6B-based Transformer trained on a composite of ArticulationXL 2.0 (70%), VRoid Hub (20%), and ModelsResource (10%), with quantization 256 and 4 skin tokens per bone, then refined with GRPO for 1,400 steps. This is the recommended checkpoint – it generates the skeleton and the SkinTokens in a single unified sequence.
Both checkpoints are required for end-to-end inference: TokenRig generates the rig as a token sequence, and the FSQ-CVAE decoder turns SkinTokens back into dense per-vertex skinning weights.
How To Use
The easiest way is to use the helper script in the main SkinTokens codebase, which downloads both checkpoints and the required Qwen3-0.6B config into the expected layout:
git clone https://github.com/VAST-AI-Research/SkinTokens.git
cd SkinTokens
python download.py --model
Option 1 – Download with hf CLI
hf download VAST-AI/SkinTokens \
  --repo-type model \
  --local-dir .
Option 2 – Download with huggingface_hub (Python)
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="VAST-AI/SkinTokens",
    repo_type="model",
    local_dir=".",
    local_dir_use_symlinks=False,  # deprecated and ignored by recent huggingface_hub versions
)
Option 3 – Download individual files
from huggingface_hub import hf_hub_download
tokenrig_ckpt = hf_hub_download(
    repo_id="VAST-AI/SkinTokens",
    filename="experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt",
)
skin_vae_ckpt = hf_hub_download(
    repo_id="VAST-AI/SkinTokens",
    filename="experiments/skin_vae_2_10_32768/last.ckpt",
)
Option 4 – Web UI
Browse the Files and versions tab and download the folders manually, keeping the experiments/... layout intact.
After download, you should have:
experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt
experiments/skin_vae_2_10_32768/last.ckpt
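To catch layout mistakes early, a small sanity check can confirm both files are where the code expects them. This is a hypothetical helper, not part of the SkinTokens codebase:

```python
from pathlib import Path

# The two checkpoint paths the SkinTokens code expects (see layout above).
EXPECTED = [
    "experiments/articulation_xl_quantization_256_token_4/grpo_1400.ckpt",
    "experiments/skin_vae_2_10_32768/last.ckpt",
]

def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that are absent under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", *missing, sep="\n  ")
    else:
        print("All checkpoints in place.")
```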
Run TokenRig With These Weights
Once the experiments/ folder is in place (and the environment is installed per the GitHub README), you can run:
python demo.py --input examples/giraffe.glb --output results/giraffe.glb --use_transfer
Or launch the Gradio demo:
python demo.py
Then open http://127.0.0.1:1024 in your browser.
Notes
- Keep the directory names unchanged. The SkinTokens code expects the exact experiments/.../*.ckpt layout shown above.
- TokenRig requires both checkpoints. grpo_1400.ckpt generates discrete tokens; the SkinTokens FSQ-CVAE (last.ckpt) is needed to decode them into per-vertex skinning weights.
- Qwen3-0.6B architecture. TokenRig adopts the Qwen3-0.6B architecture (GQA + RoPE) for its autoregressive backbone; the Qwen3 config is fetched automatically by download.py.
- Hardware. An NVIDIA GPU with at least 14 GB of memory is required for inference.
- Training data. The checkpoints were trained on a composite of ArticulationXL 2.0 (70%), VRoid Hub (20%), and ModelsResource (10%); the processed data splits will be released as a separate dataset repository later.
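For readers who want to reproduce a similar mixture in their own training loaders, proportional source sampling is straightforward with the standard library. This is an illustrative sketch only; the actual sampling logic used to train these checkpoints is not part of this repository:

```python
import random

# Dataset mixture reported for the TokenRig checkpoint (illustrative weights).
MIXTURE = {"ArticulationXL 2.0": 0.7, "VRoid Hub": 0.2, "ModelsResource": 0.1}

def sample_sources(n, seed=0):
    """Draw n training-sample sources according to the mixture weights."""
    rng = random.Random(seed)
    names, weights = zip(*MIXTURE.items())
    return rng.choices(names, weights=weights, k=n)

# Empirical proportions converge to roughly 70/20/10 as n grows.
counts = {name: 0 for name in MIXTURE}
for src in sample_sources(10_000):
    counts[src] += 1
```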
Related Links
- Your 3D AI workspace – Tripo: https://www.tripo3d.ai
- Project page: https://zjp-shadow.github.io/works/SkinTokens/
- Paper (arXiv): https://arxiv.org/abs/2602.04805
- Main code repository: https://github.com/VAST-AI-Research/SkinTokens
- Predecessor: UniRig (SIGGRAPH '25)
- More from VAST-AI Research: https://huggingface.co/VAST-AI
Acknowledgements
- UniRig – the predecessor to this work.
- Qwen3 – the LLM architecture used by the TokenRig autoregressive backbone.
- 3DShape2VecSet, Michelangelo – the shape encoder backbone used by the FSQ-CVAE.
- FSQ – Finite Scalar Quantization, the discretization scheme behind SkinTokens.
- GRPO – the policy-optimization method used for RL refinement.
Citation
If you find this work helpful, please consider citing our paper:
@article{zhang2026skintokens,
  title   = {SkinTokens: A Learned Compact Representation for Unified Autoregressive Rigging},
  author  = {Zhang, Jia-Peng and Pu, Cheng-Feng and Guo, Meng-Hao and Cao, Yan-Pei and Hu, Shi-Min},
  journal = {arXiv preprint arXiv:2602.04805},
  year    = {2026}
}