Inferencer Overview

Inferencer is the core component in AISBench responsible for executing model inference. It connects datasets, retrievers (Retriever), and models, and is responsible for sending processed prompts to models for inference and collecting and managing inference results.

Core Functions

Inferencer undertakes the following core responsibilities in AISBench’s evaluation workflow:

Data Preparation: Get data list from retriever (Retriever), including input prompts, ground truth, and other information
Model Invocation: Adopt different methods to call models for inference according to model type (API models or local models)
- API Models: Call service inference interfaces through async HTTP requests
- Local Models: Directly call locally loaded models for batch inference
Result Management: Collect, process, and save inference results, including:
- Model-generated text content
- Inference status (success/failure)
- Performance metrics (such as latency, throughput, etc., in performance mode)
- Error information (if inference fails)
Status Tracking: In performance evaluation mode, track and statistics request status, including:
- Number of sent requests (post)
- Number of received responses (rev)
- Number of failed requests (failed)
- Number of completed requests (finish)

Architecture Design

Inferencer adopts a layered design, including the following base classes:

BaseInferencer: Base class for all inferencers, providing common functions such as model building and output processing
BaseApiInferencer: Base class for API model inferencers, providing async request processing, status tracking, and other functions
BaseLocalInferencer: Base class for local model inferencers, providing batch inference, data loading, and other functions

Inferencers can inherit from both BaseApiInferencer and BaseLocalInferencer as needed to support both API models and local models.

Currently Supported Inferencer Types

AISBench currently supports the following inferencer types:

1. GenInferencer (Generative Inferencer)

Function: Inferencer for generative tasks, supporting text generation, question answering, and other tasks.

Features:

Supports both API models and local models
Supports streaming and non-streaming inference
Supports performance evaluation mode
Supports custom stopping criteria (stopping_criteria)

Use Cases:

Text generation tasks
Question answering tasks
Code generation tasks
Mathematical reasoning tasks

Implementation File: icl_gen_inferencer.py

2. MultiTurnGenInferencer (Multi-turn Dialogue Inferencer)

Function: Inferencer for multi-turn dialogue tasks, supporting multi-turn interactive dialogue scenarios.

Features:

Supports both API models and local models
Supports multiple inference modes:
- every: Round-by-round inference, using model’s previous round output as next round input
- last: Only infer the last round
- every_with_gt: Round-by-round inference, but using ground truth instead of model output
Supports performance evaluation mode

Use Cases:

Multi-turn dialogue tasks
Tasks requiring contextual interaction
Conversational question answering tasks

Implementation File: icl_multiturn_inferencer.py

3. PPLInferencer (Perplexity Inferencer)

Function: Inferencer for Perplexity evaluation, selecting answers by calculating the perplexity of each option, mainly used for multiple choice question (MCQ) tasks.

Features:

Only supports API models (does not support local models)
Does not support streaming inference
Does not support performance evaluation mode
Selects the option with the lowest perplexity as the prediction result by calculating the perplexity of each candidate answer

Use Cases:

Multiple choice question (MCQ) tasks
Classification tasks requiring selection based on perplexity

Implementation File: ppl_inferencer.py

4. BFCLV3FunctionCallInferencer (Function Call Inferencer)

Function: Inferencer for function call tasks, supporting scenarios where models call external functions or tools.

Features:

Only supports API models
Supports multi-turn function calls
Supports holdout function mechanism
Supports result processing and feedback for function calls

Use Cases:

Function call tasks
Tool usage tasks
Tasks requiring models to call external APIs

Implementation File: icl_bfcl_v3_inferencer.py

Inferencer Selection Guide

Select the appropriate inferencer according to different task types and model types:

Task Type	Model Type	Recommended Inferencer
Text generation, Question answering	API models	GenInferencer
Text generation, Question answering	Local models	GenInferencer
Multi-turn dialogue	API models	MultiTurnGenInferencer
Multi-turn dialogue	Local models	MultiTurnGenInferencer
Multiple choice questions (MCQ)	API models	PPLInferencer
Function calls	API models	BFCLV3FunctionCallInferencer

Inferencer Overview

Core Functions

Architecture Design

Currently Supported Inferencer Types

1. GenInferencer (Generative Inferencer)

2. MultiTurnGenInferencer (Multi-turn Dialogue Inferencer)

3. PPLInferencer (Perplexity Inferencer)

4. BFCLV3FunctionCallInferencer (Function Call Inferencer)

Inferencer Selection Guide

Relationship with Related Components

Further Reading