VBench 1.0

VBench (VBench: Comprehensive Benchmark Suite for Video Generative Models) is a benchmark suite for video generative models. It organizes evaluation around perception-related metrics such as subject consistency, motion smoothness, temporal flickering, and spatial relationship (the official Standard suite has 16 dimensions), and provides matching prompts, pipelines, and validation methods for each dimension.

AISBench has adapted to VBench 1.0. The repository directory ais_bench/configs/vbench_examples/ contains standalone configuration file examples for running quality/semantic dimension evaluation on generated videos on GPU or NPU. AISBench currently does not include multimodal video generation, so please generate videos first and then run the evaluation. (For Standard mode, see the Dataset Generation section.)

Table of Contents

Dependencies and Environment
Quick Start
Configuration and Output
Score Aggregation (Quality / Semantic / Total)
Prompt Suite (Official Prompt Structure)
Dataset Generation
Sampling Pseudocode (Reference Official)
Format Requirements
VBench-1.0-mini (AISBench Official Sampled Subset)

Dependencies and Environment

decord (Video Decoding)

On x86_64, pip install decord usually works directly. On ARM and other environments without prebuilt wheels, build from source, for example:

git clone https://github.com/dmlc/decord
cd decord
mkdir build && cd build
cmake .. -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release
make
cd ../python
python3 setup.py install --user

detectron2 and GRiT

Dimensions like object_class, multiple_objects, color, and spatial_relationship depend on GRiT, which in turn depends on detectron2. AISBench uniformly uses the in-repo ais_bench/third_party/detectron2 (shared by GPU/NPU). Run an editable install from the repo root:

pip install -e ais_bench/third_party/detectron2 --no-build-isolation

torchvision on Ascend (Optional)

Some torchvision operators (such as nms and roi_align) may run only on CPU on Ascend, leading to low evaluation efficiency. If torch < 2.7.1, refer to Ascend torchvision adaptation to install a matching version for speedup.

Quick Start

Prepare the video directory For both Standard and Custom modes, set DATA_PATH in the corresponding configuration to the root directory of the generated videos (absolute or relative path). You can also copy the configuration file, change DATA_PATH, and then run ais_bench <your_config.py> --mode eval. (See Dataset Generation for video sampling notes.)
Download third-party dependencies to local cache VBench loads multiple small model weights for video generation quality evaluation. It is recommended to download them in advance. By default, the evaluation will also try to download dependencies automatically, but downloads may fail and break the evaluation. For details, see vbench_cache_dependencies.md.

# Use default cache directory ~/.cache/vbench
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh

# Or specify a custom cache directory
VBENCH_CACHE_DIR=/your/custom/cache/dir \
  bash ais_bench/configs/vbench_examples/download_vbench_cache.sh

Specify the cache in the config Setting VBENCH_CACHE_DIR = "/path/to/cache" (or the alias vbench_cache_dir) at the top of the example configuration overrides the environment variable in the evaluation subprocess before vbench is imported. If not set, the export VBENCH_CACHE_DIR from the shell is used.
Run the evaluation (must explicitly specify --mode eval)

# Standard (16 dimensions, official Prompt Suite)
ais_bench ais_bench/configs/vbench_examples/eval_vbench_standard.py --mode eval --max-num-workers 1

# Custom (10 dimensions, custom prompts)
ais_bench ais_bench/configs/vbench_examples/eval_vbench_custom.py --mode eval --max-num-workers 1

Note: It is recommended to set --max-num-workers <num> to evaluate on multiple devices in parallel for better throughput.

Configuration and Output

Common Configuration Items

Config Item / Environment Variable	Description
`DATA_PATH`	Root directory of videos to evaluate (required). See `ais_bench/configs/vbench_examples/eval_vbench_standard.py` and `ais_bench/configs/vbench_examples/eval_vbench_custom.py`
`VBENCH_CACHE_DIR` (env var or top-level config)	Cache root directory for small models and weights; default is `~/.cache/vbench`

Preset Configurations

Config Name	Description	Configuration File
eval_vbench_standard	Standard prompt evaluation, 16 dimensions; requires the video directory and an (optional) full_info json	`ais_bench/configs/vbench_examples/eval_vbench_standard.py`
eval_vbench_custom	Custom input (prompt from file or filename), 10 dimensions	`ais_bench/configs/vbench_examples/eval_vbench_custom.py`

Evaluation Result Path

Written per dimension:

{work_dir}/results/vbench_eval/vbench_<dim>.json

Standard mode uses VBenchSummarizer to aggregate Quality, Semantic, and Total; Custom mode uses DefaultSummarizer to output per-dimension scores. The implementation follows the official cal_final_score.py. See ais_bench/benchmark/summarizers/vbench.py.

Score Aggregation (Quality / Semantic / Total)

Per-Dimension Normalization and Weighting

For each dimension, the raw accuracy is linearly scaled to ((raw - Min)/(Max - Min)) using the dimension’s Min and Max in NORMALIZE_DIC, then multiplied by DIM_WEIGHT. If (Max - Min \le 0), the implementation falls back to the boundary value. Constants are aligned with the official ones, all in vbench.py. dynamic_degree has DIM_WEIGHT 0.5, while all other aggregated dimensions are 1.

Quality Group (Video Generation Quality)

Sum the per-dimension weighted scores within the Quality group, then divide by the sum of DIM_WEIGHTs (weighted average). Includes:

subject_consistency, background_consistency, temporal_flickering, motion_smoothness, aesthetic_quality, imaging_quality, dynamic_degree.

Semantic Group (Content and Semantic Consistency)

Computed similarly; all dimensions in this group have DIM_WEIGHT 1. Includes:

object_class, multiple_objects, human_action, color, spatial_relationship, scene, appearance_style, temporal_style, overall_consistency.

Total (Overall Score)

Total = (Quality × 4 + Semantic × 1) / 5 (corresponds to QUALITY_WEIGHT = 4 and SEMANTIC_WEIGHT = 1 in the code).

Missing Dimensions and Output Directory

When a dimension is missing from the results, aggregation treats it as 0 (consistent with normalized.get(k, 0)).

The default work_dir is outputs/default; use --work_dir to change it.

Prompt Suite (Official Prompt Structure)

Paths are relative to ais_bench/third_party/vbench/:

Path	Description
`prompts/prompts_per_dimension/`	Prompt files per evaluation dimension (~100 entries/dimension)
`prompts/all_dimension.txt`	Combined list across all dimensions
`prompts/prompts_per_category/`	8 categories: Animal, Architecture, Food, Human, Lifestyle, Plant, Scenery, Vehicles
`prompts/all_category.txt`	Combined across all categories
`prompts/metadata/`	Metadata that requires semantic parsing, such as `color` and `object_class`

Dimension and Prompt Suite Mapping (Standard, 16 Dimensions)

The following table shows the mapping of all 16 dimensions in Standard mode to the official Prompt Suite files and their entry counts. During evaluation, prompts are matched automatically via VBench_full_info.json:

Dimension	Prompt Suite	Prompt Count
subject_consistency	subject_consistency	72
background_consistency	scene	86
temporal_flickering	temporal_flickering	75
motion_smoothness	subject_consistency	72
dynamic_degree	subject_consistency	72
aesthetic_quality	overall_consistency	93
imaging_quality	overall_consistency	93
object_class	object_class	79
multiple_objects	multiple_objects	82
human_action	human_action	100
color	color	85
spatial_relationship	spatial_relationship	84
scene	scene	86
temporal_style	temporal_style	100
appearance_style	appearance_style	90
overall_consistency	overall_consistency	93

Inference Result (Video) Generation

This section is for users who need to generate evaluation videos using the official approach (and does not conflict with the Quick Start that runs evaluation on an existing directory).

Standard Dataset (eval_vbench_standard)

Data Source: The Prompt Suite under ais_bench/third_party/vbench/prompts/.
Metadata: Requires VBench_full_info.json (default file in the third-party directory above).
Sampling Scale: Typically 5 videos per prompt; temporal_flickering requires 25 videos so that enough samples remain after the static filter.
Random Seed: A different seed per video is recommended (e.g., index or seed+index) to balance diversity and reproducibility.
Directory Shape: A flat directory or per-dimension subdirectories are both supported. If you use the per-dimension subdirectory layout, it is recommended to use the same subdirectory naming as the evaluation reading side (see below). The mapping logic is in dim_to_subdir in ais_bench/third_party/vbench/__init__.py.

DATA_PATH Directory Layout for Standard Mode

DATA_PATH/
|-- subject_consistency/        # same-name dimension
|-- scene/                      # background_consistency is mapped here first
|-- overall_consistency/        # aesthetic_quality / imaging_quality are mapped here first
|-- object_class/
|-- multiple_objects/
|-- color/
|-- spatial_relationship/
|-- temporal_style/
|-- human_action/
|-- temporal_flickering/
`-- appearance_style/

Among them, the following dimensions preferentially use the mapped subdirectory during evaluation (if it exists):

background_consistency → scene/
aesthetic_quality → overall_consistency/
imaging_quality → overall_consistency/
motion_smoothness → subject_consistency/
dynamic_degree → subject_consistency/

Filenames are recommended to follow {prompt}-{i}.mp4. If you generated 0~24 for temporal_flickering (used by the static filter), at least make sure 0~4 exist; the evaluation side defaults to looking up 0~4 when constructing the temporary full_info.

Custom Dataset (eval_vbench_custom)

Data Source: A custom prompt list or prompt file.
Dimensions: Includes subject_consistency, background_consistency, aesthetic_quality, imaging_quality, temporal_style, overall_consistency, human_action, temporal_flickering, motion_smoothness, dynamic_degree. Excludes dimensions that require auxiliary_info (such as object_class, color, and spatial_relationship).

Sampling Pseudocode (Reference Official)

The pseudocode below aligns with the Standard reading-side logic: iterate over dimensions, save videos under the corresponding subdirectory, and use a larger sample count for temporal_flickering.

import os

# Directory mapping on the evaluation reading side (see ais_bench/third_party/vbench/__init__.py)
dim_to_subdir = {
    "background_consistency": "scene",
    "aesthetic_quality": "overall_consistency",
    "imaging_quality": "overall_consistency",
    "motion_smoothness": "subject_consistency",
    "dynamic_degree": "subject_consistency"
}

dimension_list = [
    "subject_consistency", "background_consistency", "aesthetic_quality", "imaging_quality",
    "object_class", "multiple_objects", "color", "spatial_relationship", "scene",
    "temporal_style", "overall_consistency", "human_action", "temporal_flickering",
    "motion_smoothness", "dynamic_degree", "appearance_style",
]

if args.seed is not None:
    torch.manual_seed(args.seed)

for dimension in dimension_list:
    prompt_file = f"ais_bench/third_party/vbench/prompts/prompts_per_dimension/{dimension}.txt"
    with open(prompt_file, "r") as f:
        prompt_list = [line.strip() for line in f if line.strip()]

    n = 25 if dimension == "temporal_flickering" else 5

    subdir = dim_to_subdir.get(dimension, dimension)
    save_dir = os.path.join(args.save_path, subdir)
    os.makedirs(save_dir, exist_ok=True)

    for prompt in prompt_list:
        for index in range(n):
            video = sample_func(prompt, index)
            save_path = os.path.join(save_dir, f"{prompt}-{index}.mp4")
            torchvision.io.write_video(save_path, video, fps=8)

sample_func denotes the function that connects your generative model to a prompt and produces a video.

Format Requirements

Standard Mode

Filename: {prompt}-{i}.mp4, where {prompt} is prompt_en from VBench_full_info.json and i ranges from 0 to 4.
Extensions: .mp4, .gif, .jpg, .png.

Custom Mode (More Flexible)

Method 1: Embed the prompt in the filename: get_prompt_from_filename parses xxx from {xxx}.mp4 or {xxx}-0.mp4.
Method 2: Provide a prompt_file (JSON: {video_path: prompt}); filename conventions can be ignored.

VBench-1.0-mini (AISBench Official Sampled Subset)

VBench-1.0-mini is a VBench 1.0 sampled subset provided by AISBench, randomly selecting a small number of prompts from each of the 16 dimensions in the Prompt Suite. It is intended for fast model capability validation and evaluation pipeline verification. Dataset URL: VBench-1.0-mini.

The core change in VBench-1.0-mini is replacing the original VBench_full_info.json with a condensed version that only contains the sampled prompts and their dimension mappings. All other evaluation code, dimension implementations, and model weights reuse the existing VBench system — no additional installation or modification is required.

Preparation

Download the VBench-1.0-mini dataset

Download the dataset from Modelers. After downloading and extracting, note the dataset root directory path (referred to below as <MINI_ROOT>).
Download third-party dependency cache

Same as the Standard mode, download VBench small model weights and resources in advance:
```
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
```

Replace VBench_full_info.json

VBench-1.0-mini provides a condensed VBench_full_info.json corresponding to the sampled prompts. Specify the mini JSON path in the configuration file (e.g., eval_vbench_standard.py) using the full_json_dir field under eval_cfg or dataset:

vbench_eval_cfg = dict(
    load_ckpt_from_local=True,
    full_json_dir="<MINI_ROOT>/VBench_full_info.json",
)

Inference Result (Video) Generation

VBench-1.0-mini works the same way as Standard mode — only with fewer prompts. It is still recommended to organize generated videos in per-dimension subdirectories:

import os

dim_to_subdir = {
    "background_consistency": "scene",
    "aesthetic_quality": "overall_consistency",
    "imaging_quality": "overall_consistency",
    "motion_smoothness": "subject_consistency",
    "dynamic_degree": "subject_consistency",
}

# Read the dimensions and prompts to generate from the mini full_info
import json
with open("ais_bench/third_party/vbench/VBench_full_info.json", "r") as f:
    full_info = json.load(f)

# Aggregate dimensions by prompt
from collections import defaultdict
prompt_dim_map = defaultdict(set)
for entry in full_info:
    prompt_dim_map[entry["prompt_en"]].update(entry["dimension"])

for prompt, dims in prompt_dim_map.items():
    for dim in dims:
        subdir = dim_to_subdir.get(dim, dim)
        save_dir = os.path.join(args.save_path, subdir)
        os.makedirs(save_dir, exist_ok=True)

        n = 25 if dim == "temporal_flickering" else 5
        for index in range(n):
            video = sample_func(prompt, index)
            save_path = os.path.join(save_dir, f"{prompt}-{index}.mp4")
            torchvision.io.write_video(save_path, video, fps=8)

Run Evaluation

After replacing VBench_full_info.json and preparing the video directory, the evaluation command is exactly the same as Standard mode:

# Must explicitly specify --mode eval, and set DATA_PATH to your video directory
ais_bench ais_bench/configs/vbench_examples/eval_vbench_standard.py --mode eval --max-num-workers 1

If the prompts come from a custom video directory and do not use the dimension mappings in VBench_full_info.json, you can also use the Custom mode configuration for evaluation.