Extend length of video which can be processed?

by Runzhi1949 - opened Aug 18, 2025

Runzhi1949

Aug 18, 2025

What is the maximum video length this model can process?
With the default parameters on an RTX 4090, it can handle videos up to about 6 minutes.
Can this be extended? I’d appreciate any advice on which parameters to adjust (are they all in config.json?).

Zhang199

Owner Aug 18, 2025

Are you using our eval.py file for testing? If so, you can refer to the video loading method in tinyllava/eval/eval_mlvu.py and modify the video loading part of tinyllava/eval/run_tiny_llava.py accordingly. On a single RTX 4090, it should be possible to run tests on benchmarks like MLVU (which contains videos lasting several hours).
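For readers following along, here is a minimal sketch of the general idea of sampling a fixed number of frames from a video of any length. It is not the code from tinyllava/eval/eval_mlvu.py; the function name `uniform_frame_indices` and the midpoint-of-segment strategy are assumptions made for illustration. It only computes which frame indices to decode, and the actual decoding (with whatever video reader the script uses) is left out.

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return up to `num_samples` evenly spaced frame indices.

    Hypothetical helper for illustration only: the video is split into
    `num_samples` equal segments and the middle frame of each is chosen.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # Short video: just use every frame.
    if num_samples >= total_frames:
        return list(range(total_frames))
    # Integer arithmetic avoids floating-point rounding surprises.
    return [((2 * i + 1) * total_frames) // (2 * num_samples)
            for i in range(num_samples)]


# A 100-frame clip sampled at 4 frames.
print(uniform_frame_indices(100, 4))  # [12, 37, 62, 87]
```

Because the number of sampled frames is fixed, the memory needed for the visual input stays roughly constant regardless of how long the source video is; only the spacing between sampled frames grows.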

Runzhi1949

Aug 18, 2025

Thanks for your help. I will try using a function like yours to extract keyframes in /eval/run_tiny_llava.py tomorrow.

Runzhi1949

Aug 21, 2025

edited Aug 21, 2025

I have read your code and modified /eval/run_tiny_llava.py as you suggested, and it works.
But I still have a question: on a single RTX 4090 with 24 GB of VRAM, the maximum number of frames it can handle seems to be only about 48. Does that mean it can only process 48 frames in total? If I want to handle more frames, do I need to switch to a GPU with more memory, or is there another solution? (I'm using the model to evaluate videos of about 15 minutes.)

Zhang199

Owner Aug 21, 2025

Sampling more frames will inevitably consume more GPU memory, which is also a current challenge in long video understanding. The default setting for our model is 16 frames.
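To make the trade-off concrete, the short calculation below shows how sparsely a video is covered for a given frame budget, assuming frames are spread evenly over the video. The helper name `seconds_per_sample` is hypothetical, and the 15-minute duration and the 16 and 48 frame counts are simply the figures mentioned in this thread.

```python
def seconds_per_sample(duration_s: float, num_frames: int) -> float:
    """Average gap in seconds between evenly spaced sampled frames."""
    if duration_s <= 0 or num_frames <= 0:
        raise ValueError("duration_s and num_frames must be positive")
    return duration_s / num_frames


duration_s = 15 * 60  # a 15-minute video
print(seconds_per_sample(duration_s, 16))  # 56.25
print(seconds_per_sample(duration_s, 48))  # 18.75
```

So under even sampling, the default 16 frames correspond to roughly one frame per minute of a 15-minute video, and 48 frames to about one frame every 19 seconds. Events shorter than that gap may be missed entirely.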
