Supporting New Model Backends
Currently, AISBench supports the following model types:
Service Model Backends (executing inference tasks by accessing endpoints provided by service frameworks): vLLM, Triton, TGI, MindIE
Local Models Backends (executing inference tasks by loading local model files): HuggingFace, vllmOfflineVL, HuggingFaceVL
For certain custom service frameworks or inference backends, it is usually necessary to implement custom models to access services or call models. Currently, two methods are supported: adding new API models and local models.
Adding New Service Model Backends
To add a new service model backend, create a new file my_custom_api.py in ais_bench/benchmark/models/api_models, inherit from BaseAPIModel, and implement the corresponding functional interfaces according to usage scenarios. The currently supported extensible interfaces are as follows:
(Required)
get_request_body: Get the request body, used to construct the request body(Required)
_get_url: Get the request URL(Required when model supports non-streaming inference)
parse_text_response: Parse text response, called when model parameterstreamisFalse(Required when model supports streaming inference)
parse_stream_response: Parse stream response, called when model parameterstreamisTrue
Note:
When the model supports performance evaluation, the
parse_stream_responseinterface must be implemented
from typing import Dict, Optional, Union
from ais_bench.benchmark.utils.prompt import PromptList
from ais_bench.benchmark.models import BaseAPIModel
from ais_bench.benchmark.models.output import RequestOutput, Output
PromptType = Union[PromptList, str]
class MyCustomAPI(BaseAPIModel):
is_api: bool = True
def __init__(
self,
path: str = "", # Model vocabulary path, used to load model vocabulary
stream: bool = False, # Whether it is streaming inference
max_out_len: int = 4096, # Maximum output length
retry: int = 2, # Number of retries on request failure
api_key: str = "", # API key
host_ip: str = "localhost", # Host IP
host_port: int = 8080, # Host port
url: str = "", # Custom URL
trust_remote_code: bool = False, # Whether to trust remote code
generation_kwargs: Optional[Dict] = dict(), # Generation parameters, additional parameters passed to endpoint
):
super().__init__(
path=path,
stream=stream,
max_out_len=max_out_len,
retry=retry,
api_key=api_key,
host_ip=host_ip,
host_port=host_port,
url=url,
generation_kwargs=generation_kwargs,
)
self.url = self._get_url()
...
def _get_url(self):
"""Concatenate URI to form complete request URL"""
...
async def get_request_body(self, input: PromptType, max_out_len: int, output: RequestOutput, **args) -> dict:
"""Assemble into dict format request body according to endpoint protocol format, and save debugging information to output"""
...
async def parse_text_response(self, data: Dict, output: RequestOutput):
"""Parse text response returned by server according to endpoint protocol format, save response content to output"""
...
async def parse_stream_response(self, data: Dict, output: RequestOutput):
"""Parse stream response returned by server according to endpoint protocol format, save response content to output"""
...
It is recommended to add the new API model class to __init__.py for convenient automatic import later.
For detailed implementation, refer to: VLLMCustomAPIChat
To use the custom API model, add the following configuration in the configuration folder ais_bench/benchmark/configs/models:
from ais_bench.benchmark.models.api_models.my_custom_api import MyCustomAPI
models = [
dict(
attr="service", # (Required) Flag indicating the model is a service API model
type=MyCustomAPI, # (Required) Custom API model class
abbr='my_custom_api', # (Required) Unique model identifier
path="", # (Optional) Model vocabulary path, used to load model vocabulary, must be configured for performance evaluation
model="", # (Optional) Model name, some endpoints need this parameter to access services, can call self._get_service_model_path() to automatically obtain
stream=False, # (Optional) Whether it is a streaming interface
request_rate=0, # (Optional) Request sending rate, send 1 request to server every 1/request_rate seconds, if less than 0.1 then send all requests at once
retry=2, # (Optional) Maximum number of retries per request
api_key="", # (Optional) API key
host_ip="localhost", # (Optional) Host IP
host_port=8080, # (Optional) Host port
url="", # (Optional) Custom URL path
max_out_len=512, # (Optional) Maximum number of tokens output by inference service
batch_size=1, # (Optional) Maximum concurrency of request sending
trust_remote_code=False, # (Optional) Whether tokenizer trusts remote code, default False
generation_kwargs=dict( # (Optional) Model inference parameters, refer to endpoint documentation for configuration, AISBench evaluation tool does not process, attached in sent requests
temperature=0.01,
ignore_eos=False
),
)
]
Then execute the command to start service performance evaluation:
ais_bench --models my_custom_api --datasets demo_gsm8k_gen_4_shot_cot_chat_prompt
Adding New Local Model Backends
To add a new local model backend, create a new file my_custom_model.py in ais_bench/benchmark/models/local_models, inherit from BaseModel, and implement the corresponding functional interfaces according to usage scenarios. The currently supported extensible interfaces are as follows:
__init__: Initialize model and vocabularygenerate: Call the loaded local model to perform generative inference and return inference results
class MyCustomModel(BaseModel):
def __init__(self,
path: str, # (Required) Model vocabulary path, used to load model vocabulary
max_seq_len: int = 2048, # Maximum sequence length
tokenizer_only: bool = False, # Whether to only load vocabulary
meta_template: Optional[Dict] = None, # Meta template
generation_kwargs: Optional[Dict] = dict(), # Generation parameters
sync_rank: bool = False, # Whether to synchronize input
**kwargs):
super().__init__(
path,
max_seq_len,
tokenizer_only,
meta_template,
generation_kwargs,
sync_rank,
)
...
def generate(self, input: PromptType, max_out_len: int, **kwargs) -> List[str]:
"""Call the loaded local model to perform inference and return inference results"""
...
It is recommended to add the new local model class to __init__.py for convenient automatic import later.
For detailed implementation, refer to: HuggingFacewithChatTemplate
To use the custom local model, add the following configuration in the configuration folder ais_bench/benchmark/configs/models/:
from ais_bench.benchmark.models import MyCustomModel
models = [
dict(
attr="local", # (Required) Backend type identifier, fixed as `local` (local model) or `service` (service inference)
type=MyCustomModel, # (Required) Model type, custom model class
abbr='my_custom_model', # (Required) Unique model identifier
path='THUDM/chatglm-6b', # (Required) Model weight path and vocabulary path
max_out_len=100, # (Required) Maximum output token length
batch_size=1, # (Required) Batch size for each inference
max_seq_len=2048, # (Required) Maximum sequence length
generation_kwargs=dict( # (Optional) Generation parameters, refer to model documentation for configuration
temperature=0.0,
stop_token_ids=None
),
# ... Other optional parameters for model initialization and inference task configuration
)
]
Then execute the command to start the evaluation task:
ais_bench --models my_custom_model --datasets demo_gsm8k_gen_4_shot_cot_chat_prompt