cosmos-task1-task2

LoRA fine-tunes of Cosmos-Predict2-2B-Video2World on the push-that-thing task1 and task2 datasets.

The checkpoints in this repo are unfused — the LoRA adapters (lora_A, lora_B) are still stored separately from the base layer weights. You must run the one-shot fusion step below before using them with the run_video2world.py inference pipeline.

Repository layout

Each training iteration writes four parallel files; they are uploaded as-is:

model/iter_<NNNNNNNNN>.pt       # net.* + net_ema.* weights with LoRA adapters
optim/iter_<NNNNNNNNN>.pt       # optimizer state (resume only)
scheduler/iter_<NNNNNNNNN>.pt   # LR scheduler state (resume only)
trainer/iter_<NNNNNNNNN>.pt     # grad scaler + iteration counter (resume only)

For inference you only need model/iter_<NNNNNNNNN>.pt. The other three folders are only required to resume training from this iteration.

Inference: fuse, then run

The inference pipeline does not apply LoRA adapters at load time, so an unfused checkpoint will load but produce garbage outputs. Fuse it first.
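A quick sanity check: an unfused state dict still contains lora_A / lora_B keys, a fused one does not. A minimal sketch (has_lora_keys is a hypothetical helper name; load the state dict with torch.load(path, map_location="cpu") first):

```python
def has_lora_keys(state_dict) -> bool:
    """True if any LoRA adapter weights remain, i.e. the checkpoint is unfused."""
    return any("lora_A" in key or "lora_B" in key for key in state_dict)

# Usage (assumes `state` was loaded with torch.load(path, map_location="cpu")):
# if has_lora_keys(state):
#     print("unfused checkpoint -- run fuse_lora_ckpt.py before inference")
```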

1. Download

ITER=000002500  # set to the iteration you want
hf download push-that-thing/cosmos-task1-task2 \
    model/iter_${ITER}.pt --local-dir ./ckpts

2. Fuse

fuse_lora_ckpt.py lives in push-that-thing/pdt-mimic under the mimic-video submodule. Clone with submodules first:

git clone --recurse-submodules https://github.com/push-that-thing/pdt-mimic.git
# or, if already cloned:
git submodule update --init --recursive

python pdt-mimic/mimic-video/model/scripts/fuse_lora_ckpt.py \
    ./ckpts/model/iter_${ITER}.pt
# writes ./ckpts/model/iter_${ITER}_fused.pt

Fusion is deterministic: it walks every key, finds matching lora_A / lora_B pairs, computes base + (alpha / rank) * B @ A, and replaces the base_layer entry with the merged tensor. Both net.* (regular) and net_ema.* (EMA) weights are fused in the same pass.
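The per-pair merge math can be sketched in NumPy (fuse_lora is an illustrative helper, not the actual script; alpha defaults to the hardcoded 32 noted in the ALPHA section below):

```python
import numpy as np

def fuse_lora(base: np.ndarray, A: np.ndarray, B: np.ndarray,
              alpha: float = 32.0) -> np.ndarray:
    """Merge one LoRA pair into its base weight: base + (alpha / rank) * B @ A."""
    rank = A.shape[0]  # A is (rank, in_features), B is (out_features, rank)
    return base + (alpha / rank) * (B @ A)
```

The fusion script applies this to every matching pair in both the net.* and net_ema.* namespaces and stores the result under the base_layer key.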

3. Run video2world

python pdt-mimic/mimic-video/model/scripts/run_video2world.py \
    --dit_path ./ckpts/model/iter_${ITER}_fused.pt \
    --input_path /path/to/conditioning.mp4 \
    --num_conditional_frames 5 \
    --prompt "Push the white object to the right into the goal white circle." \
    --save_path ./out.mp4

Important: ALPHA must match training

fuse_lora_ckpt.py hardcodes ALPHA = 32. These checkpoints were trained with the same value, so the default works as-is. If you ever re-train with a different LoRA alpha, you must update that constant before fusing, or the merged weights will be scaled incorrectly.

Resuming training

To resume training from a given iteration, download all four folders for that iteration, place the files under <job_dir>/checkpoints/{model,optim,scheduler,trainer}/iter_<NNNNNNNNN>.pt, and write iter_<NNNNNNNNN>.pt into <job_dir>/checkpoints/latest_checkpoint.txt. The Cosmos Checkpointer picks it up automatically.
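The staging steps above can be sketched as follows (stage_resume is a hypothetical helper name; the checkpoints/ layout and latest_checkpoint.txt filename follow the description above):

```python
import os
import shutil

def stage_resume(job_dir: str, ckpt_dir: str, it: str = "000002500") -> None:
    """Copy the four per-iteration files into the layout the Cosmos
    Checkpointer expects, then point latest_checkpoint.txt at them."""
    for sub in ("model", "optim", "scheduler", "trainer"):
        dst = os.path.join(job_dir, "checkpoints", sub)
        os.makedirs(dst, exist_ok=True)
        shutil.copy(os.path.join(ckpt_dir, sub, f"iter_{it}.pt"), dst)
    with open(os.path.join(job_dir, "checkpoints", "latest_checkpoint.txt"), "w") as f:
        f.write(f"iter_{it}.pt")
```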

Do not resume from a fused checkpoint — fusion deletes the lora_A/lora_B keys that the optimizer state references.
