## VBench Local Cache Dependency List

- **Cache Root Directory**: Defaults to the environment variable `VBENCH_CACHE_DIR`; if not set, it falls back to `~/.cache/vbench`. The same variable name can also be set at the **top level** of an AISBench VBench example configuration (see "Specifying the Cache Directory in AISBench Configurations" below).
- **One-click Download Script**: `ais_bench/configs/vbench_examples/download_vbench_cache.sh` automatically downloads/prepares resources following the layout below.

### Target Directory Layout

The (default) `~/.cache/vbench/` should at least contain:

- `ViCLIP/ViClip-InternVid-10M-FLT.pth`
- `ViCLIP/bpe_simple_vocab_16e6.txt.gz`
- `aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth`
- `amt_model/amt-s.pth`
- `bert_model/bert-base-uncased/...` (Full snapshot of the HuggingFace BERT repo)
- `caption_model/tag2text_swin_14m.pth`
- `clip_model/ViT-B-32.pt`
- `clip_model/ViT-L-14.pt`
- `dino_model/dino_vitbase16_pretrain.pth`
- `dino_model/facebookresearch_dino_main/...` (Clone of the official DINO repo)
- `grit_model/grit_b_densecap_objectdet.pth`
- `pyiqa_model/musiq_spaq_ckpt-358bb6af.pth`
- `raft_model/models/raft-things.pth` (Plus other RAFT models extracted from the zip)
- `umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth`

### Dependencies and Download Sources

- **CLIP Models**
  - **Paths**: `clip_model/ViT-B-32.pt`, `clip_model/ViT-L-14.pt`
  - **Used by**: `background_consistency`, `appearance_style`, `aesthetic_quality`, etc.
  - **Sources**:
    - `ViT-B-32.pt`: `https://openaipublic.azureedge.net/clip/models/40d3657159.../ViT-B-32.pt`
    - `ViT-L-14.pt`: `https://openaipublic.azureedge.net/clip/models/b8cca3fd4.../ViT-L-14.pt`

- **UMT Model (Human Action)**
  - **Path**: `umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth`
  - **Used by**: `human_action`
  - **Source**: `https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth`

- **AMT-S Model (Motion Smoothness)**
  - **Path**: `amt_model/amt-s.pth`
  - **Used by**: `motion_smoothness`
  - **Source**: `https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth`

- **RAFT Optical Flow Model**
  - **Paths**:
    - Root directory: `raft_model/`
    - Main model: `raft_model/models/raft-things.pth`
  - **Used by**: `dynamic_degree`, `static_filter`, etc.
  - **Source (zip)**: `https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip`

- **DINO Model (subject_consistency, local mode)**
  - **Paths**:
    - Repo: `dino_model/facebookresearch_dino_main/`
    - Weights: `dino_model/dino_vitbase16_pretrain.pth`
  - **Used by**: `subject_consistency`
  - **Sources**:
    - Repo: `https://github.com/facebookresearch/dino`
    - Weights: `https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth`

- **Aesthetic Predictor (LAION)**
  - **Path**: `aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth`
  - **Used by**: `aesthetic_quality`
  - **Source**: `https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true`

- **MUSIQ / PyIQA (Image Quality)**
  - **Path**: `pyiqa_model/musiq_spaq_ckpt-358bb6af.pth`
  - **Used by**: `imaging_quality`
  - **Source**: `https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth`

- **GRiT Dense Captioning Model**
  - **Path**: `grit_model/grit_b_densecap_objectdet.pth`
  - **Used by**: `object_class`, `multiple_objects`, `color`, `spatial_relationship`
  - **Source**: `https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth`

- **Tag2Text Scene Description Model**
  - **Path**: `caption_model/tag2text_swin_14m.pth`
  - **Used by**: `scene`
  - **Source**: `https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth`

- **ViCLIP Video-Text Model + BPE Vocab**
  - **Paths**:
    - Weights: `ViCLIP/ViClip-InternVid-10M-FLT.pth`
    - BPE: `ViCLIP/bpe_simple_vocab_16e6.txt.gz`
  - **Used by**: `temporal_style`, `overall_consistency`
  - **Sources**:
    - Weights: `https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth`
    - BPE: `https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz`

- **BERT base (bert-base-uncased)**
  - **Path**: `bert_model/bert-base-uncased/` (Full HF repo snapshot)
  - **Used by**: Text encoding parts of `Tag2Text` and `GRiT`
  - **Local Lookup Logic**:
    - First the directory pointed to by the `VBENCH_BERT_PATH` environment variable
    - Otherwise try `CACHE_DIR/bert_model/bert-base-uncased`
    - If neither exists, fall back to the HuggingFace hub id `bert-base-uncased`
  - **Recommended Download** (consistent with the script):
    - Requires `huggingface-cli`, e.g.:
      - `pip install "huggingface_hub[cli]"`
      - `huggingface-cli download bert-base-uncased --local-dir ~/.cache/vbench/bert_model/bert-base-uncased --local-dir-use-symlinks False`

### Usage

1. Make sure `wget` and `git` are installed. If you also need automatic BERT download, install `huggingface-cli` as well.
2. Run from the repository root:

```bash
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
```

3. To change the cache root directory, set it before running:

```bash
export VBENCH_CACHE_DIR=/your/custom/cache/dir
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
```

The script automatically skips existing files and is safe to run multiple times.

### Specifying the Cache Directory in AISBench Configurations

In a VBench example configuration (such as `eval_vbench_standard.py`), define the variable at the top level alongside `DATA_PATH`, e.g.:

```python
VBENCH_CACHE_DIR = "/your/custom/cache/dir"
```

The Python-style alias `vbench_cache_dir` is also supported; if both exist, `VBENCH_CACHE_DIR` takes precedence.

**Priority** (effective inside the evaluation subprocess that runs `VBenchEvalTask`, and only **before** vbench is imported for the first time):

1. If the configuration sets a **non-empty** `VBENCH_CACHE_DIR` or `vbench_cache_dir`, it is written to `os.environ['VBENCH_CACHE_DIR']` (with `~` and `$VAR` expanded) and **overrides** any existing same-name environment variable in this subprocess.
2. If not set in the configuration, the `VBENCH_CACHE_DIR` exported in the shell before launching `ais_bench` is used.
3. Otherwise, vbench falls back to the default `~/.cache/vbench`.

**Relationship with the One-click Script**: `download_vbench_cache.sh` only reads **shell environment variables** and does not read the Python configuration file above. To keep the download directory consistent with the evaluation, `export` the same `VBENCH_CACHE_DIR` before running the script, or specify the same absolute path in both places.

---

## Manual Download and Placement Guide (When the Script Fails)

When network or permission issues cause `download_vbench_cache.sh` to fail repeatedly, you can follow this section to **manually download each dependency and place it in the corresponding path**, bypassing the one-click script.

### Global Notes

- **Cache Root Directory `CACHE_DIR`**
  - If `VBENCH_CACHE_DIR` is not set: `CACHE_DIR=~/.cache/vbench`
  - If set: `CACHE_DIR=$VBENCH_CACHE_DIR`
- **Directory Preparation**: Before manual download, it is recommended to create the subdirectories first, e.g.:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR"/{clip_model,umt_model,amt_model,raft_model,dino_model,aesthetic_model/emb_reader,pyiqa_model,grit_model,caption_model,ViCLIP,bert_model}
```

- **Hugging Face Mirror `HF_ENDPOINT` (Optional)**
  - All `https://huggingface.co/...` links can be sped up by replacing the prefix with a mirror (e.g., `https://hf-mirror.com`):
    - Original: `https://huggingface.co/xxx/yyy`
    - Mirror: `https://hf-mirror.com/xxx/yyy`

All "Target Path" entries below are relative to `CACHE_DIR`.

---

### 1. CLIP Models (ViT-B-32 / ViT-L-14)

- **Used by**: `background_consistency`, `appearance_style`, `aesthetic_quality`, etc.
- **Target Paths**:
  - `clip_model/ViT-B-32.pt`
  - `clip_model/ViT-L-14.pt`
- **Official Download Links**:
  - `ViT-B-32.pt`:
    `https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt`
  - `ViT-L-14.pt`:
    `https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/clip_model"

wget -O "$CACHE_DIR/clip_model/ViT-B-32.pt" \
  "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt"

wget -O "$CACHE_DIR/clip_model/ViT-L-14.pt" \
  "https://openaipublic.azureedge.net/clip/models/b8cca3fd41ae0c99ba7e8951adf17d267cdb84cd88be6f7c2e0eca1737a03836/ViT-L-14.pt"
```

- **Browser Method**: Open both links in a browser to download, then move the files to:
  - `ViT-B-32.pt` → `$CACHE_DIR/clip_model/ViT-B-32.pt`
  - `ViT-L-14.pt` → `$CACHE_DIR/clip_model/ViT-L-14.pt`

---

### 2. UMT Model (Human Action)

- **Used by**: `human_action`
- **Target Path**: `umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth`
- **Official Download Link**:
  - Original: `https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth`
  - With a mirror, replace the prefix with the mirror site, e.g.:
    `https://hf-mirror.com/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/umt_model"

wget -O "$CACHE_DIR/umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth" \
  "https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/l16_ptk710_ftk710_ftk400_f16_res224.pth"
```

- **Browser Method**: Open the link in a browser, then move to:
  `$CACHE_DIR/umt_model/l16_ptk710_ftk710_ftk400_f16_res224.pth`

---

### 3. AMT-S Model (Motion Smoothness)

- **Used by**: `motion_smoothness`
- **Target Path**: `amt_model/amt-s.pth`
- **Official Download Link**:
  - Original: `https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/amt_model"

wget -O "$CACHE_DIR/amt_model/amt-s.pth" \
  "https://huggingface.co/lalala125/AMT/resolve/main/amt-s.pth"
```

- **Browser Method**: After downloading, move to: `$CACHE_DIR/amt_model/amt-s.pth`

---

### 4. RAFT Optical Flow Model

- **Used by**: `dynamic_degree`, `static_filter`, etc.
- **Target Root Directory**: `raft_model/`
- **Key File**: `raft_model/models/raft-things.pth`
- **Official Download Link (zip)**:
  `https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/raft_model"

wget -O "$CACHE_DIR/raft_model/models.zip" \
  "https://dl.dropboxusercontent.com/s/4j4z58wuv8o0mfz/models.zip"

cd "$CACHE_DIR/raft_model"
unzip -o models.zip
rm -f models.zip
```

- **Browser Method**:
  1. Download `models.zip` in a browser.
  2. Place `models.zip` under `$CACHE_DIR/raft_model/`.
  3. Extract in that directory: `unzip models.zip`.
  4. Confirm `$CACHE_DIR/raft_model/models/raft-things.pth` exists, then the zip can be deleted.

---

### 5. DINO Model (subject_consistency, Local Mode)

- **Used by**: `subject_consistency`
- **Target Paths**:
  - Repo: `dino_model/facebookresearch_dino_main/`
  - Weights: `dino_model/dino_vitbase16_pretrain.pth`
- **Repo URL**: `https://github.com/facebookresearch/dino`
- **Weights Download Link**:
  `https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth`
- **Command-line Example (Recommended)**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/dino_model"

cd "$CACHE_DIR/dino_model"
git clone https://github.com/facebookresearch/dino facebookresearch_dino_main || true

wget -O "$CACHE_DIR/dino_model/dino_vitbase16_pretrain.pth" \
  "https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth"
```

- **Browser Method**:
  1. Download the dino repository zip via Git GUI or browser, extract it, rename the directory to `facebookresearch_dino_main`, and place it under `$CACHE_DIR/dino_model/`.
  2. Open the weights link in a browser, download it, and move to `$CACHE_DIR/dino_model/dino_vitbase16_pretrain.pth`.

---

### 6. Aesthetic Predictor (LAION)

- **Used by**: `aesthetic_quality`
- **Target Path**: `aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth`
- **Official Download Link**:
  `https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/aesthetic_model/emb_reader"

wget -O "$CACHE_DIR/aesthetic_model/emb_reader/sa_0_4_vit_l_14_linear.pth" \
  "https://github.com/LAION-AI/aesthetic-predictor/blob/main/sa_0_4_vit_l_14_linear.pth?raw=true"
```

- **Browser Method**: Open the link (make sure `?raw=true` is included), then move the download to the target path.

---

### 7. MUSIQ / PyIQA Image Quality Model

- **Used by**: `imaging_quality`
- **Target Path**: `pyiqa_model/musiq_spaq_ckpt-358bb6af.pth`
- **Official Download Link**:
  `https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/pyiqa_model"

wget -O "$CACHE_DIR/pyiqa_model/musiq_spaq_ckpt-358bb6af.pth" \
  "https://github.com/chaofengc/IQA-PyTorch/releases/download/v0.1-weights/musiq_spaq_ckpt-358bb6af.pth"
```

- **Browser Method**: After downloading, move to `$CACHE_DIR/pyiqa_model/musiq_spaq_ckpt-358bb6af.pth`.

---

### 8. GRiT Dense Captioning Model

- **Used by**: `object_class`, `multiple_objects`, `color`, `spatial_relationship`
- **Target Path**: `grit_model/grit_b_densecap_objectdet.pth`
- **Official Download Link**:
  - Original: `https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/grit_model"

wget -O "$CACHE_DIR/grit_model/grit_b_densecap_objectdet.pth" \
  "https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/grit_b_densecap_objectdet.pth"
```

- **Browser Method**: After downloading, move to `$CACHE_DIR/grit_model/grit_b_densecap_objectdet.pth`.

---

### 9. Tag2Text Scene Description Model

- **Used by**: `scene`
- **Target Path**: `caption_model/tag2text_swin_14m.pth`
- **Official Download Link**:
  - Original: `https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/caption_model"

wget -O "$CACHE_DIR/caption_model/tag2text_swin_14m.pth" \
  "https://huggingface.co/spaces/xinyu1205/recognize-anything/resolve/main/tag2text_swin_14m.pth"
```

- **Browser Method**: After downloading, move to `$CACHE_DIR/caption_model/tag2text_swin_14m.pth`.

---

### 10. ViCLIP Video-Text Model + BPE Vocab

- **Used by**: `temporal_style`, `overall_consistency`
- **Target Paths**:
  - Weights: `ViCLIP/ViClip-InternVid-10M-FLT.pth`
  - Vocab: `ViCLIP/bpe_simple_vocab_16e6.txt.gz`
    (If multiple copies are needed, manually copy them as `bpe_simple_vocab_16e6.txt.gz.{1,2,3}`.)
- **Official Download Links**:
  - Weights (original):
    `https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth`
  - BPE:
    `https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz`
- **Command-line Example**:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/ViCLIP"

wget -O "$CACHE_DIR/ViCLIP/ViClip-InternVid-10M-FLT.pth" \
  "https://huggingface.co/OpenGVLab/VBench_Used_Models/resolve/main/ViClip-InternVid-10M-FLT.pth"

wget -O "$CACHE_DIR/ViCLIP/bpe_simple_vocab_16e6.txt.gz" \
  "https://raw.githubusercontent.com/openai/CLIP/main/clip/bpe_simple_vocab_16e6.txt.gz"
```

---

### 11. BERT base (bert-base-uncased)

- **Used by**: Text encoding for `Tag2Text` and `GRiT`
- **Target Path**: `bert_model/bert-base-uncased/` (Full HF repo snapshot)
- **Lookup Logic Recap**:
  - First use the directory pointed to by the environment variable `VBENCH_BERT_PATH`;
  - Otherwise try `CACHE_DIR/bert_model/bert-base-uncased`;
  - If still missing, download from Hugging Face online.

#### Method A: Use huggingface-cli (Recommended)

1. Install the tool:

```bash
pip install "huggingface_hub[cli]"
```

2. Log in (if necessary, optional): `huggingface-cli login`

3. Download to the cache directory:

```bash
export CACHE_DIR=${VBENCH_CACHE_DIR:-$HOME/.cache/vbench}
mkdir -p "$CACHE_DIR/bert_model/bert-base-uncased"

huggingface-cli download bert-base-uncased \
  --local-dir "$CACHE_DIR/bert_model/bert-base-uncased" \
  --local-dir-use-symlinks False
```

4. To point `VBENCH_BERT_PATH` to that directory:

```bash
export VBENCH_BERT_PATH="$CACHE_DIR/bert_model/bert-base-uncased"
```

#### Method B: Browser or Other Means

1. In a browser, visit `https://huggingface.co/bert-base-uncased` and download the entire model repository (e.g., via "Download files" or git lfs).
2. Rename the directory containing files such as `config.json`, `pytorch_model.bin`, and `vocab.txt` to `bert-base-uncased`, and place it under:

```text
$CACHE_DIR/bert_model/bert-base-uncased/
```

3. Optional: Set `VBENCH_BERT_PATH` to point to that directory.

---

### Notes: Coexistence with the One-click Script

- After completing manual downloads, you **may choose not to run** `scripts/download_vbench_cache.sh` again. As long as the paths and filenames match this guide, VBench can read them normally.
- If the one-click script is run later, it adds `.done` marker files alongside the downloads to skip them next time; this does not overwrite content you placed manually.
- If you use a Hugging Face mirror, simply replace the prefix in the links above with the mirror domain; the rest of the paths and placement remain unchanged.