AISBench Benchmark Tool

Extended Multimodal Generation Benchmarks

  • GEdit-Bench
    • Introduction to GEdit-Bench
    • AISBench GEdit-Bench Evaluation Practice
  • VBench 1.0
    • Table of Contents
    • Dependencies and Environment
    • Quick Start
    • Configuration and Output
    • Score Aggregation (Quality / Semantic / Total)
    • Prompt Suite (Official Prompt Structure)
    • Inference Result (Video) Generation
    • Sampling Pseudocode (Official Reference)
    • Format Requirements
    • VBench-1.0-mini (AISBench Official Sampled Subset)

© Copyright 2025, AISBench AI System Performance Evaluation Benchmark Committee.
