Supporting New Model Backends

Currently, AISBench supports the following model types:

  • Service Model Backends (executing inference tasks by accessing endpoints provided by service frameworks): vLLM, Triton, TGI, MindIE

  • Local Models Backends (executing inference tasks by loading local model files): HuggingFace, vllmOfflineVL, HuggingFaceVL

For certain custom service frameworks or inference backends, it is usually necessary to implement custom models to access services or call models. Currently, two methods are supported: adding new API models and local models.

Adding New Service Model Backends

To add a new service model backend, create a new file my_custom_api.py in ais_bench/benchmark/models/api_models, inherit from BaseAPIModel, and implement the corresponding functional interfaces according to usage scenarios. The currently supported extensible interfaces are as follows:

  • (Required) get_request_body: Get the request body, used to construct the request body

  • (Required) _get_url: Get the request URL

  • (Required when model supports non-streaming inference) parse_text_response: Parse text response, called when model parameter stream is False

  • (Required when model supports streaming inference) parse_stream_response: Parse stream response, called when model parameter stream is True

Note:

  • When the model supports performance evaluation, the parse_stream_response interface must be implemented

from typing import Dict, Optional, Union

from ais_bench.benchmark.utils.prompt import PromptList
from ais_bench.benchmark.models import BaseAPIModel
from ais_bench.benchmark.models.output import RequestOutput, Output

PromptType = Union[PromptList, str]

class MyCustomAPI(BaseAPIModel):
    is_api: bool = True

    def __init__(
        self,
        path: str = "",             # Model vocabulary path, used to load model vocabulary
        stream: bool = False,       # Whether it is streaming inference
        max_out_len: int = 4096,    # Maximum output length
        retry: int = 2,             # Number of retries on request failure
        api_key: str = "",          # API key
        host_ip: str = "localhost", # Host IP
        host_port: int = 8080,      # Host port
        url: str = "",              # Custom URL
        trust_remote_code: bool = False, # Whether to trust remote code
        generation_kwargs: Optional[Dict] = dict(), # Generation parameters, additional parameters passed to endpoint
    ):
        super().__init__(
            path=path,
            stream=stream,
            max_out_len=max_out_len,
            retry=retry,
            api_key=api_key,
            host_ip=host_ip,
            host_port=host_port,
            url=url,
            generation_kwargs=generation_kwargs,
        )
        self.url = self._get_url()
        ...

    def _get_url(self):
        """Concatenate URI to form complete request URL"""
        ...

    async def get_request_body(self, input: PromptType, max_out_len: int, output: RequestOutput, **args) -> dict:
        """Assemble into dict format request body according to endpoint protocol format, and save debugging information to output"""
        ...

    async def parse_text_response(self, data: Dict, output: RequestOutput):
        """Parse text response returned by server according to endpoint protocol format, save response content to output"""
        ...

    async def parse_stream_response(self, data: Dict, output: RequestOutput):
        """Parse stream response returned by server according to endpoint protocol format, save response content to output"""
        ...

It is recommended to add the new API model class to __init__.py for convenient automatic import later.

For detailed implementation, refer to: VLLMCustomAPIChat

To use the custom API model, add the following configuration in the configuration folder ais_bench/benchmark/configs/models:

from ais_bench.benchmark.models.api_models.my_custom_api import MyCustomAPI
models = [
    dict(
        attr="service",         # (Required) Flag indicating the model is a service API model
        type=MyCustomAPI,       # (Required) Custom API model class
        abbr='my_custom_api',   # (Required) Unique model identifier
        path="",                # (Optional) Model vocabulary path, used to load model vocabulary, must be configured for performance evaluation
        model="",               # (Optional) Model name, some endpoints need this parameter to access services, can call self._get_service_model_path() to automatically obtain
        stream=False,           # (Optional) Whether it is a streaming interface
        request_rate=0,         # (Optional) Request sending rate, send 1 request to server every 1/request_rate seconds, if less than 0.1 then send all requests at once
        retry=2,                # (Optional) Maximum number of retries per request
        api_key="",             # (Optional) API key
        host_ip="localhost",    # (Optional) Host IP
        host_port=8080,         # (Optional) Host port
        url="",                 # (Optional) Custom URL path
        max_out_len=512,        # (Optional) Maximum number of tokens output by inference service
        batch_size=1,           # (Optional) Maximum concurrency of request sending
        trust_remote_code=False, # (Optional) Whether tokenizer trusts remote code, default False
        generation_kwargs=dict(   # (Optional) Model inference parameters, refer to endpoint documentation for configuration, AISBench evaluation tool does not process, attached in sent requests
            temperature=0.01,
            ignore_eos=False
        ),
    )
]

Then execute the command to start service performance evaluation:

ais_bench --models my_custom_api --datasets demo_gsm8k_gen_4_shot_cot_chat_prompt

Adding New Local Model Backends

To add a new local model backend, create a new file my_custom_model.py in ais_bench/benchmark/models/local_models, inherit from BaseModel, and implement the corresponding functional interfaces according to usage scenarios. The currently supported extensible interfaces are as follows:

  • __init__: Initialize model and vocabulary

  • generate: Call the loaded local model to perform generative inference and return inference results

class MyCustomModel(BaseModel):

    def __init__(self,
                 path: str,                # (Required) Model vocabulary path, used to load model vocabulary
                 max_seq_len: int = 2048,  # Maximum sequence length
                 tokenizer_only: bool = False, # Whether to only load vocabulary
                 meta_template: Optional[Dict] = None, # Meta template
                 generation_kwargs: Optional[Dict] = dict(), # Generation parameters
                 sync_rank: bool = False, # Whether to synchronize input
                 **kwargs):
        super().__init__(
            path,
            max_seq_len,
            tokenizer_only,
            meta_template,
            generation_kwargs,
            sync_rank,
        )
        ...

    def generate(self, input: PromptType, max_out_len: int, **kwargs) -> List[str]:
        """Call the loaded local model to perform inference and return inference results"""
        ...

It is recommended to add the new local model class to __init__.py for convenient automatic import later.

For detailed implementation, refer to: HuggingFacewithChatTemplate

To use the custom local model, add the following configuration in the configuration folder ais_bench/benchmark/configs/models/:

from ais_bench.benchmark.models import MyCustomModel

models = [
    dict(
        attr="local",               # (Required) Backend type identifier, fixed as `local` (local model) or `service` (service inference)
        type=MyCustomModel,         # (Required) Model type, custom model class
        abbr='my_custom_model',     # (Required) Unique model identifier
        path='THUDM/chatglm-6b',    # (Required) Model weight path and vocabulary path
        max_out_len=100,            # (Required) Maximum output token length
        batch_size=1,               # (Required) Batch size for each inference
        max_seq_len=2048,           # (Required) Maximum sequence length
        generation_kwargs=dict(     # (Optional) Generation parameters, refer to model documentation for configuration
            temperature=0.0,
            stop_token_ids=None
        ),
        # ... Other optional parameters for model initialization and inference task configuration
    )
]

Then execute the command to start the evaluation task:

ais_bench --models my_custom_model --datasets demo_gsm8k_gen_4_shot_cot_chat_prompt