Welcome to AISBench Benchmark Tool English Tutorial
Introduction
AISBench Benchmark is a model evaluation tool built on OpenCompass. It is compatible with OpenCompass's configuration system, dataset structure, and model backend implementation, and extends them with support for service-based models.
AISBench currently supports two main types of evaluation scenarios for inference tasks:
Accuracy Evaluation: Verifies the accuracy of service-based models and local models on a range of question-answering and reasoning benchmark datasets.
Performance Evaluation: Measures the latency and throughput of service-based models, and supports stress testing to find their performance limits.
Recommended Getting Started Path
To get started quickly with the AISBench Benchmark Tool, we recommend working through the documentation in the following order:
First, read the Installation Guide to make sure your environment is configured correctly.
The Quick Start then guides you through configuring and running a basic accuracy evaluation.
The Dataset Preparation Guide will help you understand the supported datasets and how to prepare them for evaluation.
The Basic Tutorial section covers Evaluation Scenario Introduction, Evaluation Result Explanation, and Detailed Parameter Description, which explain how to use the main evaluation scenarios.
For advanced usage of the AISBench Benchmark Tool, refer to the Advanced Tutorial.
The Best Practices section describes recommended ways to use the AISBench Benchmark Tool in different scenarios.
Finally, the Frequently Asked Questions section can help you resolve problems you run into while using the AISBench Benchmark Tool.