# Introduction to Evaluation Scenarios ### Accuracy Evaluation #### Service-Oriented Accuracy Evaluation - **Function Description**: Evaluate the prediction accuracy of a model deployed as a service on specific datasets. Currently supports accuracy evaluation based on generative and PPL (Perplexity-based) modes. - **Requirements**: The model has been deployed, and its actual service capabilities need to be tested. - **Model Tasks and Dataset Tasks Supported by This Scenario**: - **Model Tasks**: 📚 [Service-Oriented Inference Backend](../all_params/models.md#service-oriented-inference-backend) - **Dataset Tasks**: 📚 [Open-Source Datasets](../all_params/datasets.md#open-source-datasets) and 📚 [Custom Datasets](../all_params/datasets.md#custom-datasets) - **Constraint**: Currently, PPL mode accuracy evaluation tasks only support `vllm_api_general` and `vllm_api_general_chat` model configurations; other configurations are not supported. After selecting the **model task** and **dataset task** according to your usage needs, refer to the document for detailed usage of this scenario: 📚 [Service-Oriented Accuracy Evaluation Guide](accuracy_benchmark.md) #### Pure Model Accuracy Evaluation - **Function Description**: Evaluate the accuracy of locally loaded models (non-service-oriented) on different datasets. - **Requirements**: Offline model weights and a deployment environment. - **Supported Items**: - **Model Tasks**: 📚 [Local Model Backend](../all_params/models.md#local-model-backend) - **Dataset Tasks**: 📚 [Open-Source Datasets](../all_params/datasets.md#open-source-datasets) and 📚 [Custom Datasets](../all_params/datasets.md#custom-datasets) - **Constraint**: PPL mode evaluation tasks are not supported. After selecting the **model task** and **dataset task** according to your usage needs, refer to the document for detailed usage of this scenario: 📚 [Pure Model Accuracy Evaluation Guide](accuracy_benchmark_local.md) ### Performance Evaluation #### Service-Oriented Performance Evaluation - **Function Description**: Evaluate the operational efficiency (throughput, latency) of a service model in a real deployment environment. - **Requirements**: The model inference service must support access via a **streaming interface**. - **Supported Items**: - **Model Tasks**: Streaming interface types in 📚 [Service-Oriented Inference Backend](../all_params/models.md#service-oriented-inference-backend) - **Dataset Tasks**: All data types in 📚 [Supported Dataset Types](../all_params/datasets.md#supported-dataset-types) - **Note**: The cache size occupied by performance evaluation is proportional to the context length of requests and the number of requests, so it usually increases positively with the evaluation duration. - **Constraint**: PPL mode evaluation tasks are not supported. After selecting the **model task** and **dataset task** according to your usage needs, refer to the document for detailed usage of this scenario: 📚 [Service-Oriented Performance Evaluation Guide](performance_benchmark.md)