VBench 1.0
VBench (Comprehensive Benchmark Suite for Video Generative Models) is a benchmark suite for video generative models. It breaks evaluation down into perception-related dimensions such as subject consistency, motion smoothness, temporal flickering, and spatial relationship (the official Standard suite has 16 dimensions), and provides matching prompts, pipelines, and evaluation methods for each dimension.
AISBench supports VBench 1.0. The repository directory ais_bench/configs/vbench_examples/ contains standalone example configuration files for running quality and semantic dimension evaluation on generated videos on GPU or NPU. AISBench does not currently include multimodal video generation, so generate the videos first and then run the evaluation. (For Standard mode, see the Inference Result (Video) Generation section.)
Dependencies and Environment
decord (Video Decoding)
On x86_64, pip install decord usually works directly. On ARM and other environments without prebuilt wheels, build from source, for example:
git clone https://github.com/dmlc/decord
cd decord
mkdir build && cd build
cmake .. -DUSE_CUDA=0 -DCMAKE_BUILD_TYPE=Release
make
cd ../python
python3 setup.py install --user
detectron2 and GRiT
Dimensions like object_class, multiple_objects, color, and spatial_relationship depend on GRiT, which in turn depends on detectron2. AISBench uniformly uses the in-repo ais_bench/third_party/detectron2 (shared by GPU/NPU). Run an editable install from the repo root:
pip install -e ais_bench/third_party/detectron2 --no-build-isolation
torchvision on Ascend (Optional)
Some torchvision operators (such as nms and roi_align) may run only on CPU on Ascend, leading to low evaluation efficiency. If torch < 2.7.1, refer to Ascend torchvision adaptation to install a matching version for speedup.
Quick Start
Prepare the video directory
For both Standard and Custom modes, set DATA_PATH in the corresponding configuration to the root directory of the generated videos (absolute or relative path). You can also copy the configuration file, change DATA_PATH, and then run ais_bench <your_config.py> --mode eval. (See Inference Result (Video) Generation for video sampling notes.)
Download third-party dependencies to the local cache
VBench loads several small model weights for video generation quality evaluation. Downloading them in advance is recommended: by default the evaluation also tries to download dependencies automatically, but a failed download breaks the evaluation. For details, see vbench_cache_dependencies.md.
# Use default cache directory ~/.cache/vbench
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
# Or specify a custom cache directory
VBENCH_CACHE_DIR=/your/custom/cache/dir \
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
Specify the cache in the config
Setting VBENCH_CACHE_DIR = "/path/to/cache" (or the alias vbench_cache_dir) at the top of the example configuration overrides the environment variable in the evaluation subprocess before vbench is imported. If it is not set, the VBENCH_CACHE_DIR exported from the shell is used.
Run the evaluation (--mode eval must be specified explicitly)
# Standard (16 dimensions, official Prompt Suite)
ais_bench ais_bench/configs/vbench_examples/eval_vbench_standard.py --mode eval --max-num-workers 1
# Custom (10 dimensions, custom prompts)
ais_bench ais_bench/configs/vbench_examples/eval_vbench_custom.py --mode eval --max-num-workers 1
Note: It is recommended to set --max-num-workers <num> to evaluate on multiple devices in parallel for better throughput.
Configuration and Output
Common Configuration Items
| Config Item / Environment Variable | Description |
|---|---|
| | Root directory of videos to evaluate (required). See |
| | Cache root directory for small models and weights; default is |
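The cache-directory precedence described in Quick Start (value in the config, then the shell environment variable, then the default) can be sketched as below. This is only an illustration, not the AISBench code: resolve_cache_dir is a hypothetical helper, and the default ~/.cache/vbench is taken from the comment on the download script.

```python
import os
from typing import Mapping, Optional

# Default cache location noted next to download_vbench_cache.sh
DEFAULT_CACHE_DIR = "~/.cache/vbench"


def resolve_cache_dir(config_value: Optional[str], environ: Mapping[str, str]) -> str:
    """Hypothetical helper: config value first, then the environment, then the default."""
    if config_value:
        return os.path.expanduser(config_value)
    env_value = environ.get("VBENCH_CACHE_DIR")
    if env_value:
        return os.path.expanduser(env_value)
    return os.path.expanduser(DEFAULT_CACHE_DIR)
```

Calling resolve_cache_dir(None, os.environ) mirrors the case where the config leaves VBENCH_CACHE_DIR unset.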
Preset Configurations
| Config Name | Description | Configuration File |
|---|---|---|
| eval_vbench_standard | Standard prompt evaluation, 16 dimensions; requires the video directory and an (optional) full_info json | |
| eval_vbench_custom | Custom input (prompt from file or filename), 10 dimensions | |
Evaluation Result Path
Written per dimension:
{work_dir}/results/vbench_eval/vbench_<dim>.json
Standard mode uses VBenchSummarizer to aggregate Quality, Semantic, and Total; Custom mode uses DefaultSummarizer to output per-dimension scores. The implementation follows the official cal_final_score.py. See ais_bench/benchmark/summarizers/vbench.py.
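To inspect the per-dimension files by hand, a small loader along the following lines may help. It is a sketch that relies only on the path pattern above; load_dimension_results is a hypothetical helper, and the JSON content is returned as-is because its schema is not described here.

```python
import json
from pathlib import Path


def load_dimension_results(work_dir: str) -> dict:
    """Map dimension name -> parsed JSON for every vbench_<dim>.json under work_dir."""
    result_dir = Path(work_dir) / "results" / "vbench_eval"
    results = {}
    for path in sorted(result_dir.glob("vbench_*.json")):
        # Strip the "vbench_" prefix to recover the dimension name
        dimension = path.stem[len("vbench_"):]
        with path.open("r", encoding="utf-8") as f:
            results[dimension] = json.load(f)
    return results
```

Pass the same work_dir that was used for the evaluation run.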
Score Aggregation (Quality / Semantic / Total)
Per-Dimension Normalization and Weighting
For each dimension, the raw score is linearly rescaled to (raw - Min) / (Max - Min) using the dimension's Min and Max from NORMALIZE_DIC, then multiplied by its DIM_WEIGHT. If Max - Min ≤ 0, the implementation falls back to the boundary value. The constants match the official ones and are all defined in vbench.py. dynamic_degree has a DIM_WEIGHT of 0.5; every other aggregated dimension has a DIM_WEIGHT of 1.
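As a rough sketch of this step, assuming a positive range: weighted_normalized_score is a hypothetical name, the numbers are illustrative rather than taken from NORMALIZE_DIC, and the boundary-value fallback for a degenerate range is deliberately not modeled.

```python
def weighted_normalized_score(raw: float, min_value: float, max_value: float,
                              weight: float = 1.0) -> float:
    """Rescale raw to (raw - Min) / (Max - Min), then apply the dimension weight."""
    span = max_value - min_value
    if span <= 0:
        # The real implementation falls back to a boundary value here;
        # this sketch does not reproduce that behavior.
        raise ValueError("this sketch requires Max - Min > 0")
    return (raw - min_value) / span * weight


# Illustrative values only; the real Min/Max live in NORMALIZE_DIC in vbench.py
example = weighted_normalized_score(raw=0.75, min_value=0.5, max_value=1.0, weight=0.5)
```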
Quality Group (Video Generation Quality)
Sum the per-dimension weighted scores within the Quality group, then divide by the sum of DIM_WEIGHTs (weighted average). Includes:
subject_consistency, background_consistency, temporal_flickering, motion_smoothness, aesthetic_quality, imaging_quality, dynamic_degree.
Semantic Group (Content and Semantic Consistency)
Computed similarly; all dimensions in this group have DIM_WEIGHT 1. Includes:
object_class, multiple_objects, human_action, color, spatial_relationship, scene, appearance_style, temporal_style, overall_consistency.
Total (Overall Score)
Total = (Quality × 4 + Semantic × 1) / 5 (corresponds to QUALITY_WEIGHT = 4 and SEMANTIC_WEIGHT = 1 in the code).
Missing Dimensions and Output Directory
When a dimension is missing from the results, aggregation treats it as 0 (consistent with normalized.get(k, 0)).
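The aggregation rules in this section can be put together in a short sketch. It assumes that normalized maps each dimension to its normalized score before weighting; group_score and total_score are hypothetical names, and the code is not a copy of VBenchSummarizer.

```python
QUALITY_DIMS = [
    "subject_consistency", "background_consistency", "temporal_flickering",
    "motion_smoothness", "aesthetic_quality", "imaging_quality", "dynamic_degree",
]
SEMANTIC_DIMS = [
    "object_class", "multiple_objects", "human_action", "color", "spatial_relationship",
    "scene", "appearance_style", "temporal_style", "overall_consistency",
]
# dynamic_degree is weighted 0.5; all other aggregated dimensions are weighted 1
DIM_WEIGHT = {dim: 1.0 for dim in QUALITY_DIMS + SEMANTIC_DIMS}
DIM_WEIGHT["dynamic_degree"] = 0.5
QUALITY_WEIGHT = 4
SEMANTIC_WEIGHT = 1


def group_score(normalized: dict, dims: list) -> float:
    """Weighted average over a group; a missing dimension counts as 0."""
    weighted_sum = sum(normalized.get(dim, 0) * DIM_WEIGHT[dim] for dim in dims)
    return weighted_sum / sum(DIM_WEIGHT[dim] for dim in dims)


def total_score(normalized: dict) -> tuple:
    """Return (quality, semantic, total) for a dict of normalized scores."""
    quality = group_score(normalized, QUALITY_DIMS)
    semantic = group_score(normalized, SEMANTIC_DIMS)
    total = (quality * QUALITY_WEIGHT + semantic * SEMANTIC_WEIGHT) / (
        QUALITY_WEIGHT + SEMANTIC_WEIGHT
    )
    return quality, semantic, total
```

Because missing dimensions count as 0, evaluating only a subset of dimensions lowers the group and total scores rather than being skipped.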
The default work_dir is outputs/default; use --work_dir to change it.
Prompt Suite (Official Prompt Structure)
Paths are relative to ais_bench/third_party/vbench/:
| Path | Description |
|---|---|
| | Prompt files per evaluation dimension (~100 entries/dimension) |
| | Combined list across all dimensions |
| | 8 categories: Animal, Architecture, Food, Human, Lifestyle, Plant, Scenery, Vehicles |
| | Combined across all categories |
| | Metadata that requires semantic parsing, such as |
Dimension and Prompt Suite Mapping (Standard, 16 Dimensions)
The following table shows the mapping of all 16 dimensions in Standard mode to the official Prompt Suite files and their entry counts. During evaluation, prompts are matched automatically via VBench_full_info.json:
| Dimension | Prompt Suite | Prompt Count |
|---|---|---|
| subject_consistency | subject_consistency | 72 |
| background_consistency | scene | 86 |
| temporal_flickering | temporal_flickering | 75 |
| motion_smoothness | subject_consistency | 72 |
| dynamic_degree | subject_consistency | 72 |
| aesthetic_quality | overall_consistency | 93 |
| imaging_quality | overall_consistency | 93 |
| object_class | object_class | 79 |
| multiple_objects | multiple_objects | 82 |
| human_action | human_action | 100 |
| color | color | 85 |
| spatial_relationship | spatial_relationship | 84 |
| scene | scene | 86 |
| temporal_style | temporal_style | 100 |
| appearance_style | appearance_style | 90 |
| overall_consistency | overall_consistency | 93 |
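Combined with the sampling scale given under Standard Dataset (5 videos per prompt, 25 for temporal_flickering), the prompt counts in the table give a rough estimate of how many videos each per-suite subdirectory should hold. The sketch below assumes the per-dimension subdirectory layout and counts each prompt suite once; expected_video_counts is a hypothetical helper.

```python
# Prompt counts per prompt suite, copied from the mapping table
PROMPT_COUNTS = {
    "subject_consistency": 72, "scene": 86, "temporal_flickering": 75,
    "overall_consistency": 93, "object_class": 79, "multiple_objects": 82,
    "human_action": 100, "color": 85, "spatial_relationship": 84,
    "temporal_style": 100, "appearance_style": 90,
}


def expected_video_counts(prompt_counts: dict) -> dict:
    """Videos per suite subdirectory: 25 per prompt for temporal_flickering, else 5."""
    return {
        suite: count * (25 if suite == "temporal_flickering" else 5)
        for suite, count in prompt_counts.items()
    }


counts = expected_video_counts(PROMPT_COUNTS)
total_videos = sum(counts.values())
```

Dimensions that share a prompt suite (for example motion_smoothness and subject_consistency) read the same subdirectory, so they do not add to the total.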
Inference Result (Video) Generation
This section is for users who need to generate evaluation videos using the official approach (and does not conflict with the Quick Start that runs evaluation on an existing directory).
Standard Dataset (eval_vbench_standard)
Data Source: The Prompt Suite under ais_bench/third_party/vbench/prompts/.
Metadata: Requires VBench_full_info.json (default file in the third-party directory above).
Sampling Scale: Typically 5 videos per prompt; temporal_flickering requires 25 videos so that enough samples remain after the static filter.
Random Seed: A different seed per video is recommended (e.g., index or seed+index) to balance diversity and reproducibility.
Directory Shape: A flat directory or per-dimension subdirectories are both supported. If you use the per-dimension subdirectory layout, it is recommended to use the same subdirectory naming as the evaluation reading side (see below). The mapping logic is in dim_to_subdir in ais_bench/third_party/vbench/__init__.py.
DATA_PATH Directory Layout for Standard Mode
DATA_PATH/
|-- subject_consistency/ # same-name dimension
|-- scene/ # background_consistency is mapped here first
|-- overall_consistency/ # aesthetic_quality / imaging_quality are mapped here first
|-- object_class/
|-- multiple_objects/
|-- color/
|-- spatial_relationship/
|-- temporal_style/
|-- human_action/
|-- temporal_flickering/
`-- appearance_style/
The following dimensions preferentially use the mapped subdirectory during evaluation (if it exists):
background_consistency → scene/
aesthetic_quality → overall_consistency/
imaging_quality → overall_consistency/
motion_smoothness → subject_consistency/
dynamic_degree → subject_consistency/
The recommended filename pattern is {prompt}-{i}.mp4. If you generated indices 0~24 for temporal_flickering (used by the static filter), make sure that at least indices 0~4 exist: by default the evaluation side looks up 0~4 when constructing the temporary full_info.
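Before running the evaluation, it can be worth checking that the expected files are actually present. The following is a minimal sketch, assuming the {prompt}-{i}.mp4 naming and indices 0 to 4; missing_videos is a hypothetical helper and not part of AISBench.

```python
from pathlib import Path


def missing_videos(subdir, prompts, indices=range(5), ext=".mp4"):
    """Return the expected '{prompt}-{i}{ext}' filenames that are absent from subdir."""
    subdir = Path(subdir)
    missing = []
    for prompt in prompts:
        for i in indices:
            name = f"{prompt}-{i}{ext}"
            if not (subdir / name).is_file():
                missing.append(name)
    return missing
```

For example, missing_videos(os.path.join(DATA_PATH, "scene"), scene_prompts) would list the gaps in the scene/ subdirectory, where scene_prompts is the list of prompts you generated for.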
Custom Dataset (eval_vbench_custom)
Data Source: A custom prompt list or prompt file.
Dimensions: Includes subject_consistency, background_consistency, aesthetic_quality, imaging_quality, temporal_style, overall_consistency, human_action, temporal_flickering, motion_smoothness, dynamic_degree. Excludes dimensions that require auxiliary_info (such as object_class, color, and spatial_relationship).
Sampling Pseudocode (Reference Official)
The pseudocode below aligns with the Standard reading-side logic: iterate over dimensions, save videos under the corresponding subdirectory, and use a larger sample count for temporal_flickering.
import argparse
import os

import torch
import torchvision

# Directory mapping on the evaluation reading side (see ais_bench/third_party/vbench/__init__.py)
dim_to_subdir = {
    "background_consistency": "scene",
    "aesthetic_quality": "overall_consistency",
    "imaging_quality": "overall_consistency",
    "motion_smoothness": "subject_consistency",
    "dynamic_degree": "subject_consistency",
}
dimension_list = [
    "subject_consistency", "background_consistency", "aesthetic_quality", "imaging_quality",
    "object_class", "multiple_objects", "color", "spatial_relationship", "scene",
    "temporal_style", "overall_consistency", "human_action", "temporal_flickering",
    "motion_smoothness", "dynamic_degree", "appearance_style",
]

# Assumed command-line arguments: output root and an optional base seed
parser = argparse.ArgumentParser()
parser.add_argument("--save_path", required=True)
parser.add_argument("--seed", type=int, default=None)
args = parser.parse_args()

if args.seed is not None:
    torch.manual_seed(args.seed)

for dimension in dimension_list:
    prompt_file = f"ais_bench/third_party/vbench/prompts/prompts_per_dimension/{dimension}.txt"
    with open(prompt_file, "r") as f:
        prompt_list = [line.strip() for line in f if line.strip()]
    # temporal_flickering needs more samples so that enough remain after the static filter
    n = 25 if dimension == "temporal_flickering" else 5
    subdir = dim_to_subdir.get(dimension, dimension)
    save_dir = os.path.join(args.save_path, subdir)
    os.makedirs(save_dir, exist_ok=True)
    for prompt in prompt_list:
        for index in range(n):
            # sample_func is supplied by you: (prompt, index) -> video tensor
            video = sample_func(prompt, index)
            save_path = os.path.join(save_dir, f"{prompt}-{index}.mp4")
            torchvision.io.write_video(save_path, video, fps=8)
sample_func denotes the function that connects your generative model to a prompt and produces a video.
Format Requirements
Standard Mode
Filename: {prompt}-{i}.mp4, where {prompt} is prompt_en from VBench_full_info.json and i ranges from 0 to 4.
Extensions: .mp4, .gif, .jpg, .png.
Custom Mode (More Flexible)
Method 1: Embed the prompt in the filename: get_prompt_from_filename parses xxx from {xxx}.mp4 or {xxx}-0.mp4.
Method 2: Provide a prompt_file (JSON: {video_path: prompt}); filename conventions can be ignored.
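For Method 2, a prompt_file can be generated from an existing directory. The sketch below is an assumption-laden illustration: prompt_from_filename and build_prompt_file are hypothetical helpers, not the library's get_prompt_from_filename; stripping any trailing "-<number>" is an assumption; and whether the JSON keys should be absolute or relative paths is not specified here, so the paths are written as found.

```python
import json
import re
from pathlib import Path

# Assumed convention: an optional trailing "-<index>" before the extension
_INDEX_SUFFIX = re.compile(r"-\d+$")


def prompt_from_filename(video_path) -> str:
    """Recover the prompt from '{prompt}.mp4' or '{prompt}-{i}.mp4' (illustrative only)."""
    return _INDEX_SUFFIX.sub("", Path(video_path).stem)


def build_prompt_file(video_dir, output_json, extensions=(".mp4",)) -> dict:
    """Write a {video_path: prompt} JSON for every matching video in video_dir."""
    mapping = {
        str(path): prompt_from_filename(path)
        for path in sorted(Path(video_dir).iterdir())
        if path.suffix.lower() in extensions
    }
    with open(output_json, "w", encoding="utf-8") as f:
        json.dump(mapping, f, ensure_ascii=False, indent=2)
    return mapping
```

If your filenames do not encode the prompt at all, write the same JSON structure by hand or from your generation logs instead.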
VBench-1.0-mini (AISBench Official Sampled Subset)
VBench-1.0-mini is a VBench 1.0 sampled subset provided by AISBench, randomly selecting a small number of prompts from each of the 16 dimensions in the Prompt Suite. It is intended for fast model capability validation and evaluation pipeline verification. Dataset URL: VBench-1.0-mini.
The core change in VBench-1.0-mini is replacing the original VBench_full_info.json with a condensed version that only contains the sampled prompts and their dimension mappings. All other evaluation code, dimension implementations, and model weights reuse the existing VBench system; no additional installation or modification is required.
Preparation
Download the VBench-1.0-mini dataset
Download the dataset from Modelers. After downloading and extracting, note the dataset root directory path (referred to below as <MINI_ROOT>).
Download third-party dependency cache
Same as the Standard mode, download VBench small model weights and resources in advance:
bash ais_bench/configs/vbench_examples/download_vbench_cache.sh
Replace VBench_full_info.json
VBench-1.0-mini provides a condensed VBench_full_info.json corresponding to the sampled prompts. Specify the mini JSON path in the configuration file (e.g., eval_vbench_standard.py) using the full_json_dir field under eval_cfg or dataset:
vbench_eval_cfg = dict(
    load_ckpt_from_local=True,
    full_json_dir="<MINI_ROOT>/VBench_full_info.json",
)
Inference Result (Video) Generation
VBench-1.0-mini works the same way as Standard mode, only with fewer prompts. It is still recommended to organize generated videos in per-dimension subdirectories:
import json
import os
from collections import defaultdict

import torchvision

dim_to_subdir = {
    "background_consistency": "scene",
    "aesthetic_quality": "overall_consistency",
    "imaging_quality": "overall_consistency",
    "motion_smoothness": "subject_consistency",
    "dynamic_degree": "subject_consistency",
}

# Read the dimensions and prompts to generate from the mini full_info
# (replace <MINI_ROOT> with the extracted dataset root)
with open("<MINI_ROOT>/VBench_full_info.json", "r") as f:
    full_info = json.load(f)

# Aggregate dimensions by prompt; each entry's "dimension" field is a list of names
prompt_dim_map = defaultdict(set)
for entry in full_info:
    prompt_dim_map[entry["prompt_en"]].update(entry["dimension"])

# args.save_path and sample_func are the same as in the Standard sampling pseudocode
for prompt, dims in prompt_dim_map.items():
    # Several dimensions can share one subdirectory; generate each subdirectory only once
    for subdir in sorted({dim_to_subdir.get(dim, dim) for dim in dims}):
        save_dir = os.path.join(args.save_path, subdir)
        os.makedirs(save_dir, exist_ok=True)
        n = 25 if subdir == "temporal_flickering" else 5
        for index in range(n):
            video = sample_func(prompt, index)
            save_path = os.path.join(save_dir, f"{prompt}-{index}.mp4")
            torchvision.io.write_video(save_path, video, fps=8)
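Before generating videos, it may be useful to see how many sampled prompts each dimension has in the mini full_info. The sketch below assumes the same entry structure as the code above (a list of entries, each with a "dimension" list); prompts_per_dimension is a hypothetical helper.

```python
import json
from collections import Counter


def prompts_per_dimension(full_info_path: str) -> dict:
    """Count how many full_info entries reference each dimension."""
    with open(full_info_path, "r", encoding="utf-8") as f:
        full_info = json.load(f)
    counts = Counter()
    for entry in full_info:
        # "dimension" is a list of dimension names for this prompt
        counts.update(entry["dimension"])
    return dict(counts)
```

Multiplying each count by the per-prompt sample count (5, or 25 for temporal_flickering) gives an estimate of the generation workload.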
Run Evaluation
After replacing VBench_full_info.json and preparing the video directory, the evaluation command is exactly the same as Standard mode:
# Must explicitly specify --mode eval, and set DATA_PATH to your video directory
ais_bench ais_bench/configs/vbench_examples/eval_vbench_standard.py --mode eval --max-num-workers 1
If the prompts come from a custom video directory and do not use the dimension mappings in VBench_full_info.json, you can also use the Custom mode configuration for evaluation.