AISBench Benchmark Tool


Explanation of Evaluation Results

  • Accuracy Evaluation Scenarios: Analysis of Evaluation Metrics
    • I. Relationship between n, k in Formulas and num_return_sequences in API Configuration Files
    • II. Definitions and Relationships of pass@k, cons@k, avg@n
    • III. Difference Analysis between accuracy (n runs average) and avg@n
  • Explanation of Performance Evaluation Results
    • 1. Performance Output Results for Individual Inference Requests
    • 2. End-to-End Performance Output Results
  • User Guide for Performance Test Visualization Concurrency Chart
    • I. Basic Interactive Operations
    • II. Usage of Advanced Functions
    • III. Cross-Platform Support Description
    • IV. FAQ (Frequently Asked Questions)

© Copyright 2025, AISBench AI System Performance Evaluation Benchmark Committee.
