# 🔜 Coming Soon
- **[2025.9]** Provide industry-mainstream agent evaluation capabilities, with support for evaluating the DeepSeek V3.1 Search/Code Agent
- **[2025.10]** Support plug-and-play integration of cutting-edge benchmarks into the AISBench framework, to handle the industry's increasingly complex and diverse testing tasks
- **[2025.11]** Provide cutting-edge multimodal evaluation capabilities
- [x] **[2025.8]** Support performance evaluation on multi-turn dialogue datasets such as ShareGPT and BFCL.
- [x] **[2025.8]** Optimize the computing efficiency of the evaluation phase in performance testing, reduce the tool's memory usage, and expand the tool usage guidelines.
- [x] **[2025.7]** For custom datasets used in performance evaluation, support setting a maximum output length for each data entry.

# 🤝 Acknowledgments
- This project's code builds on and extends 🔗 [OpenCompass](https://github.com/open-compass/opencompass).
- Some datasets and prompt implementations in this project are adapted from [simple-evals](https://github.com/openai/simple-evals).
- The performance metrics tracked in this project's code are aligned with the [vLLM Benchmark](https://github.com/vllm-project/vllm/tree/main/benchmarks).
- This project's BFCL function-calling evaluation feature is implemented based on the [Berkeley Function Calling Leaderboard (BFCL)](https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard).