VBench Local Cache Dependency List
Cache Root Directory: Taken from the environment variable VBENCH_CACHE_DIR; if not set, it falls back to ~/.cache/vbench. The same variable name can also be set at the top level of an AISBench VBench example configuration (see "Specifying the Cache Directory in AISBench Configurations" below).
One-click Download Script: ais_bench/configs/vbench_examples/download_vbench_cache.sh automatically downloads and prepares resources following the layout below.
Target Directory Layout
The cache root (~/.cache/vbench/ by default) should contain at least:
ViCLIP/ViClip-InternVid-10M-FLT.pth
ViCLIP/bpe_simple_vocab_16e6.txt.gz
aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth
amt_model/amt-s.pth
bert_model/bert-base-uncased/... (Full snapshot of the HuggingFace BERT repo)
caption_model/tag2text_swin_14m.pth
clip_model/ViT-B-32.pt
clip_model/ViT-L-14.pt
dino_model/dino_vitbase16_pretrain.pth
dino_model/facebookresearch_dino_main/... (Clone of the official DINO repo)
grit_model/grit_b_densecap_objectdet.pth
pyiqa_model/musiq_spaq_ckpt-358bb6af.pth
raft_model/models/raft-things.pth (Plus other RAFT models extracted from the zip)
umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth
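As a quick sanity check, the layout above can be verified before launching an evaluation. The following is a minimal sketch, not part of AISBench or VBench: the helper names default_cache_root and missing_entries are hypothetical, and the check only tests for presence, not file integrity.

```python
import os
from pathlib import Path
from typing import List

# Files from the layout above, relative to the cache root.
REQUIRED_FILES = [
    "ViCLIP/ViClip-InternVid-10M-FLT.pth",
    "ViCLIP/bpe_simple_vocab_16e6.txt.gz",
    "aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth",
    "amt_model/amt-s.pth",
    "caption_model/tag2text_swin_14m.pth",
    "clip_model/ViT-B-32.pt",
    "clip_model/ViT-L-14.pt",
    "dino_model/dino_vitbase16_pretrain.pth",
    "grit_model/grit_b_densecap_objectdet.pth",
    "pyiqa_model/musiq_spaq_ckpt-358bb6af.pth",
    "raft_model/models/raft-things.pth",
    "umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth",
]
# Entries that are whole directories (BERT snapshot, DINO repo clone).
REQUIRED_DIRS = [
    "bert_model/bert-base-uncased",
    "dino_model/facebookresearch_dino_main",
]


def default_cache_root() -> Path:
    """Return $VBENCH_CACHE_DIR if set and non-empty, else ~/.cache/vbench."""
    value = os.environ.get("VBENCH_CACHE_DIR")
    return Path(value).expanduser() if value else Path.home() / ".cache" / "vbench"


def missing_entries(cache_root: Path) -> List[str]:
    """Hypothetical helper: list layout entries absent under cache_root."""
    missing = [rel for rel in REQUIRED_FILES if not (cache_root / rel).is_file()]
    missing += [rel + "/" for rel in REQUIRED_DIRS if not (cache_root / rel).is_dir()]
    return missing
```

Calling missing_entries(default_cache_root()) returns an empty list when every listed file and directory is in place; any returned path points to the dependency section to revisit.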
Dependencies and Download Sources
CLIP Models
Paths: clip_model/ViT-B-32.pt, clip_model/ViT-L-14.pt
Used by: background_consistency, appearance_style, aesthetic_quality, etc.
Sources:
ViT-B-32.pt: https://openaipublic.azureedge.net/clip/models/40d3657159.../ViT-B-32.pt
ViT-L-14.pt: https://openaipublic.azureedge.net/clip/models/b8cca3fd4.../ViT-L-14.pt
UMT Model (Human Action)
Path: umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth
Used by: human_action
Source: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth
AMT-S Model (Motion Smoothness)
Path: amt_model/amt-s.pth
Used by: motion_smoothness
Source: https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth
RAFT Optical Flow Model
Paths:
Root directory: raft_model/
Main model: raft_model/models/raft-things.pth
Used by: dynamic_degree, static_filter, etc.
Source (zip): https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip
DINO Model (subject_consistency, local mode)
Paths:
Repo: dino_model/facebookresearch_dino_main/
Weights: dino_model/dino_vitbase16_pretrain.pth
Used by: subject_consistency
Sources:
Repo: https://github.com/facebookresearch/dino
Weights: https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth
Aesthetic Predictor (LAION)
Path: aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth
Used by: aesthetic_quality
Source: https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true
MUSIQ / PyIQA (Image Quality)
Path: pyiqa_model/musiq_spaq_ckpt-358bb6af.pth
Used by: imaging_quality
Source: https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth
GRiT Dense Captioning Model
Path: grit_model/grit_b_densecap_objectdet.pth
Used by: object_class, multiple_objects, color, spatial_relationship
Source: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth
Tag2Text Scene Description Model
Path: caption_model/tag2text_swin_14m.pth
Used by: scene
Source: https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth
ViCLIP Video-Text Model + BPE Vocab
Paths:
Weights: ViCLIP/ViClip-InternVid-10M-FLT.pth
BPE: ViCLIP/bpe_simple_vocab_16e6.txt.gz
Used by: temporal_style, overall_consistency
Sources:
Weights: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth
BPE: https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz
BERT base (bert-base-uncased)
Path: bert_model/bert-base-uncased/ (Full HF repo snapshot)
Used by: Text encoding parts of Tag2Text and GRiT
Local Lookup Logic:
First, the directory pointed to by the VBENCH_BERT_PATH environment variable.
Otherwise, try CACHE_DIR/bert_model/bert-base-uncased.
If neither exists, fall back to the HuggingFace hub id bert-base-uncased.
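The three-step lookup can be sketched as follows. This is an illustration of the described order under stated assumptions, not the code VBench actually runs: resolve_bert_source is a hypothetical name, and the sketch assumes that "exists" means the path is an existing directory.

```python
import os
from pathlib import Path


def resolve_bert_source(cache_dir: str) -> str:
    """Hypothetical helper mirroring the lookup order described above."""
    # 1. Directory named by VBENCH_BERT_PATH, if it exists.
    env_path = os.environ.get("VBENCH_BERT_PATH")
    if env_path and Path(env_path).expanduser().is_dir():
        return str(Path(env_path).expanduser())
    # 2. Local snapshot under the cache root.
    local = Path(cache_dir).expanduser() / "bert_model" / "bert-base-uncased"
    if local.is_dir():
        return str(local)
    # 3. Hub id; loading it needs network access or a populated HF cache.
    return "bert-base-uncased"
```

On an offline machine, a return value of "bert-base-uncased" signals that the local snapshot is missing and the later model load is likely to fail.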
Recommended Download (consistent with the script):
Requires huggingface-cli, e.g.:
pip install "huggingface_hub[cli]"
huggingface-cli download bert-base-uncased --local-dir ~/.cache/vbench/bert_model/bert-base-uncased --local-dir-use-symlinks False
Usage
Make sure wget and git are installed. If you also need automatic BERT download, install huggingface-cli as well.
Run from the repository root:
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
To change the cache root directory, set it before running:
export VBENCH_CACHE_DIR=/your/custom/cache/dir
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
The script automatically skips existing files and is safe to run multiple times.
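For readers who prefer Python over the shell script, the skip-if-present behaviour can be approximated as below. This is a sketch under assumptions: download_if_missing and fetch_to are hypothetical names, and the check uses a non-empty destination file rather than whatever mechanism the script itself uses.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def fetch_to(url: str, dest: Path) -> None:
    """Default fetcher: plain HTTP(S) download via the standard library."""
    urllib.request.urlretrieve(url, str(dest))


def download_if_missing(
    url: str, dest: Path, fetch: Callable[[str, Path], None] = fetch_to
) -> bool:
    """Download url to dest unless dest already exists; return True if fetched."""
    if dest.is_file() and dest.stat().st_size > 0:
        return False  # already present, so re-running is a no-op
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    fetch(url, tmp)
    tmp.replace(dest)
    return True
```

Writing to a ".part" file and renaming afterwards is a design choice of this sketch: it keeps a failed transfer from being mistaken for a finished download on the next run.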
Specifying the Cache Directory in AISBench Configurations
In a VBench example configuration (such as eval_vbench_standard.py), define the variable at the top level alongside DATA_PATH, e.g.:
VBENCH_CACHE_DIR = "/your/custom/cache/dir"
The Python-style alias vbench_cache_dir is also supported; if both exist, VBENCH_CACHE_DIR takes precedence.
Priority (effective inside the evaluation subprocess that runs VBenchEvalTask, and only before vbench is imported for the first time):
If the configuration sets a non-empty VBENCH_CACHE_DIR or vbench_cache_dir, it is written to os.environ['VBENCH_CACHE_DIR'] (with ~ and $VAR expanded) and overrides any existing same-name environment variable in this subprocess.
If not set in the configuration, the VBENCH_CACHE_DIR exported in the shell before launching ais_bench is used.
Otherwise, vbench falls back to the default ~/.cache/vbench.
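The priority order above can be sketched as a small function. This is an illustration only, not the AISBench implementation: apply_cache_dir is a hypothetical name, config stands in for the top-level names of the configuration file, and environ stands in for os.environ in the evaluation subprocess.

```python
import os


def apply_cache_dir(config: dict, environ: dict) -> str:
    """Hypothetical sketch: return the cache dir after applying the priority."""
    # VBENCH_CACHE_DIR wins over the alias vbench_cache_dir; empty values are ignored.
    configured = config.get("VBENCH_CACHE_DIR") or config.get("vbench_cache_dir")
    if configured:
        # Expand $VAR and ~, then override the variable for this process.
        expanded = os.path.expanduser(os.path.expandvars(configured))
        environ["VBENCH_CACHE_DIR"] = expanded
    # Without a config value, a shell-exported variable is used;
    # failing that, the default location applies.
    return environ.get("VBENCH_CACHE_DIR") or os.path.expanduser("~/.cache/vbench")
```

For example, apply_cache_dir({"VBENCH_CACHE_DIR": "/a", "vbench_cache_dir": "/b"}, env) returns "/a" and leaves env["VBENCH_CACHE_DIR"] set to "/a", regardless of what the shell exported.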
Relationship with the One-click Script: download_vbench_cache.sh only reads shell environment variables and does not read the Python configuration file above. To keep the download directory consistent with the evaluation, export the same VBENCH_CACHE_DIR before running the script, or specify the same absolute path in both places.
Manual Download and Placement Guide (When the Script Fails)
When network or permission issues cause download_vbench_cache.sh to fail repeatedly, you can follow this section to manually download each dependency and place it in the corresponding path, bypassing the one-click script.
Global Notes
Cache Root Directory CACHE_DIR
If VBENCH_CACHE_DIR is not set: CACHE_DIR=~/.cache/vbench
If set: CACHE_DIR=$VBENCH_CACHE_DIR
Directory Preparation: Before manual download, it is recommended to create the subdirectories first, e.g.:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR"/{clip_model,umt_model,amt_model,raft_model,dino_model,aesthetic_model/emb_reader,pyiqa_model,grit_model,caption_model,ViCLIP,bert_model}
Hugging Face Mirror HF_ENDPOINT (Optional)
All https://huggingface.co/... links can be sped up by replacing the prefix with a mirror (e.g., https://hf-mirror.com):
Original: https://huggingface.co/xxx/yyy
Mirror: https://hf-mirror.com/xxx/yyy
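The prefix replacement can be automated with a few lines. This is a minimal sketch with a hypothetical helper name (with_mirror); it assumes the mirror serves the same path structure as huggingface.co, and reading HF_ENDPOINT as a fallback is a convenience of this sketch rather than documented script behaviour.

```python
import os

HF_PREFIX = "https://huggingface.co"


def with_mirror(url: str, endpoint: str = "") -> str:
    """Rewrite a huggingface.co URL to use a mirror; leave other URLs untouched."""
    # An explicit endpoint wins; otherwise fall back to HF_ENDPOINT if set.
    endpoint = (endpoint or os.environ.get("HF_ENDPOINT", "")).rstrip("/")
    if endpoint and url.startswith(HF_PREFIX + "/"):
        return endpoint + url[len(HF_PREFIX):]
    return url
```

Non-Hugging Face sources in this guide (GitHub, Dropbox, fbaipublicfiles, the OpenAI CDN) pass through unchanged, since the mirror only covers huggingface.co links.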
All "Target Path" entries below are relative to CACHE_DIR.
1. CLIP Models (ViT-B-32 / ViT-L-14)
Used by: background_consistency, appearance_style, aesthetic_quality, etc.
Target Paths:
clip_model/ViT-B-32.pt
clip_model/ViT-L-14.pt
Official Download Links:
ViT-B-32.pt: https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt
ViT-L-14.pt: https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/clip_model"
wget -O "$CACHE_DIR/clip_model/ViT-B-32.pt" \
"https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"
wget -O "$CACHE_DIR/clip_model/ViT-L-14.pt" \
"https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt"
Browser Method: Open both links in a browser to download, then move the files to:
ViT-B-32.pt → $CACHE_DIR/clip_model/ViT-B-32.pt
ViT-L-14.pt → $CACHE_DIR/clip_model/ViT-L-14.pt
2. UMT Model (Human Action)
Used by: human_action
Target Path: umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth
Official Download Link:
Original: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth
With a mirror, replace the prefix with the mirror site, e.g.: https://hf-mirror.com/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/umt_model"
wget -O "$CACHE_DIR/umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth" \
"https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth"
Browser Method: Open the link in a browser, then move to:
$CACHE_DIR/umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth
3. AMT-S Model (Motion Smoothness)
Used by: motion_smoothness
Target Path: amt_model/amt-s.pth
Official Download Link:
Original: https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/amt_model"
wget -O "$CACHE_DIR/amt_model/amt-s.pth" \
"https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth"
Browser Method: After downloading, move to:
$CACHE_DIR/amt_model/amt-s.pth
4. RAFT Optical Flow Model
Used by: dynamic_degree, static_filter, etc.
Target Root Directory: raft_model/
Key File: raft_model/models/raft-things.pth
Official Download Link (zip): https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/raft_model"
wget -O "$CACHE_DIR/raft_model/models.zip" \
"https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip"
cd "$CACHE_DIR/raft_model"
unzip -o models.zip
rm -f models.zip
Browser Method:
Download models.zip in a browser.
Place models.zip under $CACHE_DIR/raft_model/.
Extract in that directory: unzip models.zip.
Confirm that $CACHE_DIR/raft_model/models/raft-things.pth exists; the zip can then be deleted.
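On systems without unzip, the extract-and-confirm steps can be done with Python's standard library. A minimal sketch, assuming models.zip has already been downloaded; extract_raft_models is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_raft_models(zip_path: Path, raft_dir: Path) -> Path:
    """Extract models.zip into raft_dir and confirm the key file is present."""
    raft_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(raft_dir)  # overwrites existing files, like unzip -o
    key_file = raft_dir / "models" / "raft-things.pth"
    if not key_file.is_file():
        raise FileNotFoundError(f"{key_file} not found after extracting {zip_path}")
    return key_file
```

The function deliberately leaves the zip in place; delete it only after the returned path has been confirmed.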
5. DINO Model (subject_consistency, Local Mode)
Used by: subject_consistency
Target Paths:
Repo: dino_model/facebookresearch_dino_main/
Weights: dino_model/dino_vitbase16_pretrain.pth
Repo URL: https://github.com/facebookresearch/dino
Weights Download Link: https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth
Command-line Example (Recommended):
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/dino_model"
cd "$CACHE_DIR/dino_model"
git clone https://github.com/facebookresearch/dino facebookresearch_dino_main || true
wget -O "$CACHE_DIR/dino_model/dino_vitbase16_pretrain.pth" \
"https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth"
Browser Method:
Download the dino repository zip via Git GUI or browser, extract it, rename the directory to facebookresearch_dino_main, and place it under $CACHE_DIR/dino_model/.
Open the weights link in a browser, download it, and move to $CACHE_DIR/dino_model/dino_vitbase16_pretrain.pth.
6. Aesthetic Predictor (LAION)
Used by: aesthetic_quality
Target Path: aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth
Official Download Link: https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/aesthetic_model/emb_reader"
wget -O "$CACHE_DIR/aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth" \
"https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true"
Browser Method: Open the link (make sure ?raw=true is included), then move the download to the target path.
7. MUSIQ / PyIQA Image Quality Model
Used by: imaging_quality
Target Path: pyiqa_model/musiq_spaq_ckpt-358bb6af.pth
Official Download Link: https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/pyiqa_model"
wget -O "$CACHE_DIR/pyiqa_model/musiq_spaq_ckpt-358bb6af.pth" \
"https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth"
Browser Method: After downloading, move to
$CACHE_DIR/pyiqa_model/musiq_spaq_ckpt-358bb6af.pth.
8. GRiT Dense Captioning Model
Used by: object_class, multiple_objects, color, spatial_relationship
Target Path: grit_model/grit_b_densecap_objectdet.pth
Official Download Link:
Original: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/grit_model"
wget -O "$CACHE_DIR/grit_model/grit_b_densecap_objectdet.pth" \
"https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth"
Browser Method: After downloading, move to
$CACHE_DIR/grit_model/grit_b_densecap_objectdet.pth.
9. Tag2Text Scene Description Model
Used by: scene
Target Path: caption_model/tag2text_swin_14m.pth
Official Download Link:
Original: https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/caption_model"
wget -O "$CACHE_DIR/caption_model/tag2text_swin_14m.pth" \
"https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth"
Browser Method: After downloading, move to
$CACHE_DIR/caption_model/tag2text_swin_14m.pth.
10. ViCLIP Video-Text Model + BPE Vocab
Used by: temporal_style, overall_consistency
Target Paths:
Weights: ViCLIP/ViClip-InternVid-10M-FLT.pth
Vocab: ViCLIP/bpe_simple_vocab_16e6.txt.gz (If multiple copies are needed, manually copy them as bpe_simple_vocab_16e6.txt.gz.{1,2,3}.)
Official Download Links:
Weights (original): https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth
BPE: https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz
Command-line Example:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/ViCLIP"
wget -O "$CACHE_DIR/ViCLIP/ViClip-InternVid-10M-FLT.pth" \
"https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth"
wget -O "$CACHE_DIR/ViCLIP/bpe_simple_vocab_16e6.txt.gz" \
"https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz"
11. BERT base (bert-base-uncased)
Used by: Text encoding for Tag2Text and GRiT
Target Path: bert_model/bert-base-uncased/ (Full HF repo snapshot)
Lookup Logic Recap:
First, use the directory pointed to by the environment variable VBENCH_BERT_PATH.
Otherwise, try CACHE_DIR/bert_model/bert-base-uncased.
If still missing, download from Hugging Face online.
Method A: Use huggingface-cli (Recommended)
Install the tool:
pip install "huggingface_hub[cli]"
Log in (optional, only if necessary):
huggingface-cli login
Download to the cache directory:
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/bert_model/bert-base-uncased"
huggingface-cli download bert-base-uncased \
--local-dir "$CACHE_DIR/bert_model/bert-base-uncased" \
--local-dir-use-symlinks False
To point VBENCH_BERT_PATH to that directory:
export VBENCH_BERT_PATH="$CACHE_DIR/bert_model/bert-base-uncased"
Method B: Browser or Other Means
In a browser, visit https://huggingface.co/bert-base-uncased and download the entire model repository (e.g., via "Download files" or git lfs).
Rename the directory containing files such as config.json, pytorch_model.bin, and vocab.txt to bert-base-uncased, and place it under:
$CACHE_DIR/bert_model/bert-base-uncased/
Optional: Set VBENCH_BERT_PATH to point to that directory.
Notes: Coexistence with the One-click Script
After completing manual downloads, you do not need to run download_vbench_cache.sh again. As long as the paths and filenames match this guide, VBench can read them normally.
If the one-click script is run later, it adds .done marker files alongside the downloads so that they are skipped next time; this does not overwrite content you placed manually.
If you use a Hugging Face mirror, simply replace the prefix in the links above with the mirror domain; the rest of each path and the placement remain unchanged.