AISBench Benchmark Tool


Supported Evaluation Scenarios

  • Introduction to Evaluation Scenarios
    • Accuracy Evaluation
    • Performance Evaluation
  • Service-Oriented Accuracy Evaluation
    • Preconditions for Service-Oriented Accuracy Evaluation
    • Main Functional Scenarios
    • Other Functional Scenarios
  • Pure Model Accuracy Evaluation
    • Test Preparation
    • Main Functions
    • Other Functions
  • Guide to Service-Oriented Performance Evaluation
    • Introduction
    • Quick Start for Service-Oriented Performance Evaluation
  • Modification of Configuration Files Corresponding to Tasks
  • Execute Commands
    • View Task Execution Details
  • View Performance Results
  • View Performance Details
  • Preconditions for Service-Oriented Performance Evaluation
  • Main Functional Scenarios
    • Single-Task Evaluation
    • Multi-Task Evaluation
    • Other Functional Scenarios
    • Specifications for Service-Oriented Performance Testing

© Copyright 2025, AISBench AI System Performance Evaluation Benchmark Committee.
