VBench Local Cache Dependency List

  • Cache Root Directory: Taken from the environment variable VBENCH_CACHE_DIR; if that is not set, it falls back to ~/.cache/vbench. The same variable name can also be set at the top level of an AISBench VBench example configuration (see "Specifying the Cache Directory in AISBench Configurations" below).

  • One-click Download Script: ais_bench/configs/vbench_examples/download_vbench_cache.sh automatically downloads/prepares resources following the layout below.

Target Directory Layout

The cache root (default: ~/.cache/vbench/) should contain at least:

  • ViCLIP/ViClip-InternVid-10M-FLT.pth

  • ViCLIP/bpe_simple_vocab_16e6.txt.gz

  • aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth

  • amt_model/amt-s.pth

  • bert_model/bert-base-uncased/... (Full snapshot of the HuggingFace BERT repo)

  • caption_model/tag2text_swin_14m.pth

  • clip_model/ViT-B-32.pt

  • clip_model/ViT-L-14.pt

  • dino_model/dino_vitbase16_pretrain.pth

  • dino_model/facebookresearch_dino_main/... (Clone of the official DINO repo)

  • grit_model/grit_b_densecap_objectdet.pth

  • pyiqa_model/musiq_spaq_ckpt-358bb6af.pth

  • raft_model/models/raft-things.pth (Plus other RAFT models extracted from the zip)

  • umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth
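
The layout above can be checked mechanically before an evaluation run. The following is a minimal sketch, not part of VBench or AISBench: the helper names default_cache_dir and missing_entries are hypothetical, and the path lists are copied from the layout list above.

```python
import os
from pathlib import Path
from typing import List

# Relative file paths copied from the layout list above.
REQUIRED_FILES = [
    "ViCLIP/ViClip-InternVid-10M-FLT.pth",
    "ViCLIP/bpe_simple_vocab_16e6.txt.gz",
    "aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth",
    "amt_model/amt-s.pth",
    "caption_model/tag2text_swin_14m.pth",
    "clip_model/ViT-B-32.pt",
    "clip_model/ViT-L-14.pt",
    "dino_model/dino_vitbase16_pretrain.pth",
    "grit_model/grit_b_densecap_objectdet.pth",
    "pyiqa_model/musiq_spaq_ckpt-358bb6af.pth",
    "raft_model/models/raft-things.pth",
    "umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth",
]
# Entries that are whole directories (repo snapshot / repo clone).
REQUIRED_DIRS = [
    "bert_model/bert-base-uncased",
    "dino_model/facebookresearch_dino_main",
]


def default_cache_dir() -> Path:
    """Hypothetical helper: VBENCH_CACHE_DIR if set, else ~/.cache/vbench."""
    return Path(os.environ.get("VBENCH_CACHE_DIR") or "~/.cache/vbench").expanduser()


def missing_entries(cache_dir: Path) -> List[str]:
    """Return the relative paths that are absent under cache_dir."""
    missing = [p for p in REQUIRED_FILES if not (cache_dir / p).is_file()]
    missing += [d + "/" for d in REQUIRED_DIRS if not (cache_dir / d).is_dir()]
    return missing
```

Calling missing_entries(default_cache_dir()) returns an empty list when every entry is present. The check only tests for existence; it does not validate file contents.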

Dependencies and Download Sources

  • CLIP Models

    • Paths: clip_model/ViT-B-32.pt, clip_model/ViT-L-14.pt

    • Used by: background_consistency, appearance_style, aesthetic_quality, etc.

    • Sources:

      • ViT-B-32.pt: https://openaipublic.azureedge.net/clip/models/40d3657159.../ViT-B-32.pt

      • ViT-L-14.pt: https://openaipublic.azureedge.net/clip/models/b8cca3fd4.../ViT-L-14.pt

  • UMT Model (Human Action)

    • Path: umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth

    • Used by: human_action

    • Source: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth

  • AMT-S Model (Motion Smoothness)

    • Path: amt_model/amt-s.pth

    • Used by: motion_smoothness

    • Source: https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth

  • RAFT Optical Flow Model

    • Paths:

      • Root directory: raft_model/

      • Main model: raft_model/models/raft-things.pth

    • Used by: dynamic_degree, static_filter, etc.

    • Source (zip): https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip

  • DINO Model (subject_consistency, local mode)

    • Paths:

      • Repo: dino_model/facebookresearch_dino_main/

      • Weights: dino_model/dino_vitbase16_pretrain.pth

    • Used by: subject_consistency

    • Sources:

      • Repo: https://github.com/facebookresearch/dino

      • Weights: https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth

  • Aesthetic Predictor (LAION)

    • Path: aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth

    • Used by: aesthetic_quality

    • Source: https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true

  • MUSIQ / PyIQA (Image Quality)

    • Path: pyiqa_model/musiq_spaq_ckpt-358bb6af.pth

    • Used by: imaging_quality

    • Source: https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth

  • GRiT Dense Captioning Model

    • Path: grit_model/grit_b_densecap_objectdet.pth

    • Used by: object_class, multiple_objects, color, spatial_relationship

    • Source: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth

  • Tag2Text Scene Description Model

    • Path: caption_model/tag2text_swin_14m.pth

    • Used by: scene

    • Source: https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth

  • ViCLIP Video-Text Model + BPE Vocab

    • Paths:

      • Weights: ViCLIP/ViClip-InternVid-10M-FLT.pth

      • BPE: ViCLIP/bpe_simple_vocab_16e6.txt.gz

    • Used by: temporal_style, overall_consistency

    • Sources:

      • Weights: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth

      • BPE: https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz

  • BERT base (bert-base-uncased)

    • Path: bert_model/bert-base-uncased/ (Full HF repo snapshot)

    • Used by: Text encoding parts of Tag2Text and GRiT

    • Local Lookup Logic:

      • First the directory pointed to by the VBENCH_BERT_PATH environment variable

      • Otherwise try CACHE_DIR/bert_model/bert-base-uncased

      • If neither exists, fall back to the HuggingFace hub id bert-base-uncased

    • Recommended Download (consistent with the script):

      • Requires huggingface-cli, e.g.:

        • pip install "huggingface_hub[cli]"

        • huggingface-cli download bert-base-uncased --local-dir ~/.cache/vbench/bert_model/bert-base-uncased --local-dir-use-symlinks False
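
The BERT lookup order described above can be sketched as follows. This is an illustration only: resolve_bert_source is a hypothetical name, and checking existence with os.path.isdir is an assumption about how "exists" is decided.

```python
import os


def resolve_bert_source(cache_dir: str) -> str:
    """Hypothetical helper mirroring the documented lookup order."""
    # 1. Directory named by VBENCH_BERT_PATH, if it exists.
    env_path = os.environ.get("VBENCH_BERT_PATH")
    if env_path and os.path.isdir(env_path):
        return env_path
    # 2. Snapshot under the cache root.
    local = os.path.join(cache_dir, "bert_model", "bert-base-uncased")
    if os.path.isdir(local):
        return local
    # 3. Fall back to the HuggingFace hub id (requires network access).
    return "bert-base-uncased"
```

A return value of "bert-base-uncased" therefore signals that no local copy was found and an online download would be attempted.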

Usage

  1. Make sure wget and git are installed. If you also need automatic BERT download, install huggingface-cli as well.

  2. Run from the repository root:

bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
  3. To change the cache root directory, set it before running:

export VBENCH_CACHE_DIR=/your/custom/cache/dir
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh

The script automatically skips existing files and is safe to run multiple times.
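
The skip-existing behavior can be illustrated in Python. This is a sketch of the general idea, not the script's actual logic (the script is a shell script and its internals are not shown here): download_if_missing is a hypothetical helper that skips any non-empty destination file and writes through a temporary .part file so an interrupted download is not mistaken for a finished one.

```python
import os
import urllib.request


def download_if_missing(url, dest, fetch=urllib.request.urlretrieve):
    """Download url to dest unless dest already exists; return True if fetched.

    `fetch` is any callable taking (url, target_path); it is injectable so the
    function can be exercised without network access.
    """
    if os.path.isfile(dest) and os.path.getsize(dest) > 0:
        return False  # already present, nothing to do
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    tmp = dest + ".part"
    fetch(url, tmp)        # write to a temporary name first
    os.replace(tmp, dest)  # rename only after the download finished
    return True
```

Running the function twice with the same arguments fetches once and returns False the second time, which is what makes repeated runs safe.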

Specifying the Cache Directory in AISBench Configurations

In a VBench example configuration (such as eval_vbench_standard.py), define the variable at the top level alongside DATA_PATH, e.g.:

VBENCH_CACHE_DIR = "/your/custom/cache/dir"

The Python-style alias vbench_cache_dir is also supported; if both exist, VBENCH_CACHE_DIR takes precedence.

Priority (effective inside the evaluation subprocess that runs VBenchEvalTask, and only before vbench is imported for the first time):

  1. If the configuration sets a non-empty VBENCH_CACHE_DIR or vbench_cache_dir, it is written to os.environ['VBENCH_CACHE_DIR'] (with ~ and $VAR expanded) and overrides any existing same-name environment variable in this subprocess.

  2. If not set in the configuration, the VBENCH_CACHE_DIR exported in the shell before launching ais_bench is used.

  3. Otherwise, vbench falls back to the default ~/.cache/vbench.
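
The priority order above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the AISBench implementation: apply_cache_dir_from_config is a hypothetical name, the configuration is modeled as a plain dict, and an empty string is treated the same as "not set".

```python
import os


def apply_cache_dir_from_config(cfg: dict, environ=os.environ) -> str:
    """Return the effective cache root, updating `environ` as rule 1 describes."""
    # Rule 1: a non-empty config value wins; the upper-case name takes
    # precedence over the Python-style alias.
    value = cfg.get("VBENCH_CACHE_DIR") or cfg.get("vbench_cache_dir")
    if value:
        expanded = os.path.expanduser(os.path.expandvars(str(value)))
        environ["VBENCH_CACHE_DIR"] = expanded
    # Rule 2: otherwise keep whatever the shell exported.
    # Rule 3: otherwise fall back to the default location.
    return environ.get("VBENCH_CACHE_DIR") or os.path.expanduser("~/.cache/vbench")
```

Per the note on timing, such a function would only have an effect if it ran before vbench is imported in the evaluation subprocess.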

Relationship with the One-click Script: download_vbench_cache.sh only reads shell environment variables and does not read the Python configuration file above. To keep the download directory consistent with the evaluation, export the same VBENCH_CACHE_DIR before running the script, or specify the same absolute path in both places.


Manual Download and Placement Guide (When the Script Fails)

When network or permission issues cause download_vbench_cache.sh to fail repeatedly, you can follow this section to manually download each dependency and place it in the corresponding path, bypassing the one-click script.

Global Notes

  • Cache Root Directory CACHE_DIR

    • If VBENCH_CACHE_DIR is not set: CACHE_DIR=~/.cache/vbench

    • If set: CACHE_DIR=$VBENCH_CACHE_DIR

  • Directory Preparation: Before manual download, it is recommended to create the subdirectories first, e.g.:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR"/{clip_model,umt_model,amt_model,raft_model,dino_model,aesthetic_model/emb_reader,pyiqa_model,grit_model,caption_model,ViCLIP,bert_model}
  • Hugging Face Mirror HF_ENDPOINT (Optional)

    • All https://huggingface.co/... links can be sped up by replacing the prefix with a mirror (e.g., https://hf-mirror.com):

      • Original: https://huggingface.co/xxx/yyy

      • Mirror: https://hf-mirror.com/xxx/yyy
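
The prefix replacement can also be done programmatically. A minimal sketch, assuming a hypothetical helper named to_mirror; links that do not start with the Hugging Face prefix are returned unchanged.

```python
HF_PREFIX = "https://huggingface.co/"


def to_mirror(url: str, mirror: str = "https://hf-mirror.com") -> str:
    """Swap the Hugging Face prefix for a mirror; leave other URLs untouched."""
    if not url.startswith(HF_PREFIX):
        return url
    return mirror.rstrip("/") + "/" + url[len(HF_PREFIX):]
```

Only the host part changes; the repository path, revision, and filename stay the same, so the target paths in this guide are unaffected.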

All "Target Path" entries below are relative to CACHE_DIR.


1. CLIP Models (ViT-B-32 / ViT-L-14)

  • Used by: background_consistency, appearance_style, aesthetic_quality, etc.

  • Target Paths:

    • clip_model/ViT-B-32.pt

    • clip_model/ViT-L-14.pt

  • Official Download Links:

    • ViT-B-32.pt: https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt

    • ViT-L-14.pt: https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/clip_model"

wget -O "$CACHE_DIR/clip_model/ViT-B-32.pt" \
  "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"

wget -O "$CACHE_DIR/clip_model/ViT-L-14.pt" \
  "https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt"
  • Browser Method: Open both links in a browser to download, then move the files to:

    • ViT-B-32.pt → $CACHE_DIR/clip_model/ViT-B-32.pt

    • ViT-L-14.pt → $CACHE_DIR/clip_model/ViT-L-14.pt
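
Both official CLIP links carry a 64-character hex string as their second-to-last path segment. Assuming that string is the SHA-256 digest of the file (the reference openai/CLIP loader validates its downloads against it), a manually downloaded file can be checked as sketched below. The helper names sha256_of and matches_url_digest are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_url_digest(path: str, url: str) -> bool:
    """Compare the file digest with the second-to-last URL path segment."""
    expected = url.rstrip("/").split("/")[-2]
    return sha256_of(path) == expected.lower()
```

A False result usually indicates a truncated or corrupted download; re-downloading the file is the simplest remedy.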


2. UMT Model (Human Action)

  • Used by: human_action

  • Target Path: umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth

  • Official Download Link:

    • Original: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth

    • With a mirror, replace the prefix with the mirror site, e.g.: https://hf-mirror.com/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/umt_model"

wget -O "$CACHE_DIR/umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth" \
  "https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth"
  • Browser Method: Open the link in a browser, then move to: $CACHE_DIR/umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth


3. AMT-S Model (Motion Smoothness)

  • Used by: motion_smoothness

  • Target Path: amt_model/amt-s.pth

  • Official Download Link:

    • Original: https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/amt_model"

wget -O "$CACHE_DIR/amt_model/amt-s.pth" \
  "https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth"
  • Browser Method: After downloading, move to: $CACHE_DIR/amt_model/amt-s.pth


4. RAFT Optical Flow Model

  • Used by: dynamic_degree, static_filter, etc.

  • Target Root Directory: raft_model/

  • Key File: raft_model/models/raft-things.pth

  • Official Download Link (zip): https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/raft_model"

wget -O "$CACHE_DIR/raft_model/models.zip" \
  "https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip"

cd "$CACHE_DIR/raft_model"
unzip -o models.zip
rm -f models.zip
  • Browser Method:

    1. Download models.zip in a browser.

    2. Place models.zip under $CACHE_DIR/raft_model/.

    3. Extract in that directory: unzip models.zip.

    4. Confirm $CACHE_DIR/raft_model/models/raft-things.pth exists, then the zip can be deleted.
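
Where unzip is not available, the extraction and confirmation steps can be done with Python's standard zipfile module. A minimal sketch: extract_raft_zip is a hypothetical helper, and it assumes (as the key-file path above implies) that the archive contains a top-level models/ directory.

```python
import zipfile
from pathlib import Path


def extract_raft_zip(zip_path: str, raft_dir: str) -> Path:
    """Extract models.zip into raft_dir and confirm the key file exists."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(raft_dir)  # creates raft_dir and subdirectories as needed
    key_file = Path(raft_dir) / "models" / "raft-things.pth"
    if not key_file.is_file():
        raise FileNotFoundError(f"expected {key_file} after extraction")
    return key_file
```

The function leaves models.zip in place; it can be deleted once the returned path has been confirmed.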


5. DINO Model (subject_consistency, Local Mode)

  • Used by: subject_consistency

  • Target Paths:

    • Repo: dino_model/facebookresearch_dino_main/

    • Weights: dino_model/dino_vitbase16_pretrain.pth

  • Repo URL: https://github.com/facebookresearch/dino

  • Weights Download Link: https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth

  • Command-line Example (Recommended):

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/dino_model"

cd "$CACHE_DIR/dino_model"
git clone https://github.com/facebookresearch/dino facebookresearch_dino_main || true

wget -O "$CACHE_DIR/dino_model/dino_vitbase16_pretrain.pth" \
  "https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth"
  • Browser Method:

    1. Download the dino repository zip via Git GUI or browser, extract it, rename the directory to facebookresearch_dino_main, and place it under $CACHE_DIR/dino_model/.

    2. Open the weights link in a browser, download it, and move to $CACHE_DIR/dino_model/dino_vitbase16_pretrain.pth.


6. Aesthetic Predictor (LAION)

  • Used by: aesthetic_quality

  • Target Path: aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth

  • Official Download Link: https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/aesthetic_model/emb_reader"

wget -O "$CACHE_DIR/aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth" \
  "https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true"
  • Browser Method: Open the link (make sure ?raw=true is included), then move the download to the target path.


7. MUSIQ / PyIQA Image Quality Model

  • Used by: imaging_quality

  • Target Path: pyiqa_model/musiq_spaq_ckpt-358bb6af.pth

  • Official Download Link: https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/pyiqa_model"

wget -O "$CACHE_DIR/pyiqa_model/musiq_spaq_ckpt-358bb6af.pth" \
  "https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth"
  • Browser Method: After downloading, move to $CACHE_DIR/pyiqa_model/musiq_spaq_ckpt-358bb6af.pth.


8. GRiT Dense Captioning Model

  • Used by: object_class, multiple_objects, color, spatial_relationship

  • Target Path: grit_model/grit_b_densecap_objectdet.pth

  • Official Download Link:

    • Original: https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/grit_model"

wget -O "$CACHE_DIR/grit_model/grit_b_densecap_objectdet.pth" \
  "https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth"
  • Browser Method: After downloading, move to $CACHE_DIR/grit_model/grit_b_densecap_objectdet.pth.


9. Tag2Text Scene Description Model

  • Used by: scene

  • Target Path: caption_model/tag2text_swin_14m.pth

  • Official Download Link:

    • Original: https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/caption_model"

wget -O "$CACHE_DIR/caption_model/tag2text_swin_14m.pth" \
  "https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth"
  • Browser Method: After downloading, move to $CACHE_DIR/caption_model/tag2text_swin_14m.pth.


10. ViCLIP Video-Text Model + BPE Vocab

  • Used by: temporal_style, overall_consistency

  • Target Paths:

    • Weights: ViCLIP/ViClip-InternVid-10M-FLT.pth

    • Vocab: ViCLIP/bpe_simple_vocab_16e6.txt.gz (If multiple copies are needed, manually copy them as bpe_simple_vocab_16e6.txt.gz.{1,2,3}.)

  • Official Download Links:

    • Weights (original): https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth

    • BPE: https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz

  • Command-line Example:

export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/ViCLIP"

wget -O "$CACHE_DIR/ViCLIP/ViClip-InternVid-10M-FLT.pth" \
  "https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth"

wget -O "$CACHE_DIR/ViCLIP/bpe_simple_vocab_16e6.txt.gz" \
  "https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz"

11. BERT base (bert-base-uncased)

  • Used by: Text encoding for Tag2Text and GRiT

  • Target Path: bert_model/bert-base-uncased/ (Full HF repo snapshot)

  • Lookup Logic Recap:

    • First use the directory pointed to by the environment variable VBENCH_BERT_PATH;

    • Otherwise try CACHE_DIR/bert_model/bert-base-uncased;

    • If still missing, download from Hugging Face online.

Method A: huggingface-cli (Consistent with the Script)

  1. Install the CLI and download the full snapshot into the target path:

pip install "huggingface_hub[cli]"
huggingface-cli download bert-base-uncased --local-dir "$CACHE_DIR/bert_model/bert-base-uncased" --local-dir-use-symlinks False

Method B: Browser or Other Means

  1. In a browser, visit https://huggingface.co/bert-base-uncased and download the entire model repository (e.g., via "Download files" or git lfs).

  2. Rename the directory containing files such as config.json, pytorch_model.bin, and vocab.txt to bert-base-uncased, and place it under:

$CACHE_DIR/bert_model/bert-base-uncased/
  3. Optional: Set VBENCH_BERT_PATH to point to that directory.


Notes: Coexistence with the One-click Script

  • After completing the manual downloads, you do not need to run download_vbench_cache.sh again. As long as the paths and filenames match this guide, VBench can read the files normally.

  • If the one-click script is run later, it adds .done marker files alongside the downloads to skip them next time; this does not overwrite content you placed manually.

  • If you use a Hugging Face mirror, simply replace the prefix in the links above with the mirror domain; the rest of the paths and placement remain unchanged.