Extend length of video which can be processed?

#2
by Runzhi1949 - opened

What is the maximum video length this model can process?
With the default parameters on an RTX 4090, it can handle videos up to about 6 minutes.
Can this be extended? I’d appreciate any advice on which parameters to adjust (are they all in config.json?).

Are you using our eval.py file for testing? If so, you can refer to the video loading method in tinyllava/eval/eval_mlvu.py and modify the video loading part of tinyllava/eval/run_tiny_llava.py accordingly. On a single RTX 4090, it should be possible to run tests on benchmarks like MLVU (which contains videos lasting several hours).

Thanks for your help. Tomorrow I will try using a function like yours to get keyframes in /eval/run_tiny_llava.py.

I have read your code and modified /eval/run_tiny_llava.py as you suggested, and it works.
But I still have a question: it seems that on a single RTX 4090 with 24 GB of VRAM, the maximum number of frames it can handle is only about 48. Does that mean it can only process 48 frames in total? If I want to handle more frames, do I need to switch to a GPU with more memory, or is there another solution? (I'm using the model to evaluate videos of about 15 minutes.)

Sampling more frames will inevitably consume more GPU memory, which is also a current challenge in long video understanding. The default setting for our model is 16 frames.
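To illustrate the trade-off, here is a minimal sketch of uniform frame sampling with a fixed frame budget. It is not the loader from tinyllava/eval/eval_mlvu.py; the function name `uniform_frame_indices` and the 30 fps figure are assumptions made for the example. The point is that the number of frames passed to the model stays fixed regardless of video length, so a longer video is sampled more sparsely instead of using more GPU memory.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_frames: int = 16) -> List[int]:
    """Pick up to `num_frames` evenly spaced frame indices (hypothetical helper)."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    if num_frames >= total_frames:
        # Short video: keep every frame.
        return list(range(total_frames))
    # Split the video into `num_frames` equal segments and take each midpoint.
    segment = total_frames / num_frames
    return [int(segment * i + segment / 2) for i in range(num_frames)]


# Assumed example: a 15-minute video at 30 fps has 27,000 frames.
indices = uniform_frame_indices(15 * 60 * 30, num_frames=16)
print(len(indices), indices[:3])
```

Raising `num_frames` (for example to 48) gives denser coverage of the video at the cost of more memory, which matches the limit reported above on a 24 GB card.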
