Retriever

In dataset configuration files, there is a retriever field that indicates how to retrieve samples from the dataset as in-context examples. The most commonly used one is FixKRetriever, which means fixed use of k samples, thus it is k-shot. There is also ZeroRetriever, which means no samples are used, which in most cases means 0-shot.

On the other hand, in-context samples can also be directly specified in the dataset template. In this case, ZeroRetriever will also be used, but the evaluation at this time is not 0-shot, and needs to be determined according to the specific template. For details, please refer to prompt_template.

Currently, AISBench supports the following Retriever types:

ZeroRetriever: Does not use any samples as in-context examples
FixKRetriever: Fixed use of k samples as in-context examples
RandomRetriever: Random use of k samples as in-context examples

ZeroRetriever

ZeroRetriever is a zero-shot retriever that does not retrieve any samples from the training set as in-context. For each test sample, it returns an empty index list, so it is usually used to implement 0-shot evaluation.

Configuration Method

from ais_bench.benchmark.openicl.icl_retriever import ZeroRetriever

infer_cfg = dict(
    retriever=dict(type=ZeroRetriever),
    # ... Other configurations
)

Function Description

Return Value: For all test samples, returns an empty index list []
Use Cases:
- 0-shot evaluation scenarios
- When in-context samples are already hardcoded in the prompt template (in this case, although ZeroRetriever is used, it is actually not 0-shot)

Actual Example

Assume we have a question-answering dataset with the following samples in the training set:

Training Set (train):

Sample 0: {"question": "What is artificial intelligence?", "answer": "Artificial intelligence is a branch of computer science"}
Sample 1: {"question": "What is Python?", "answer": "Python is a programming language"}

Test Set (test):

Sample 0: {"question": "What is machine learning?", "answer": "Machine learning is a subfield of AI"}

When using ZeroRetriever, for test sample 0, no training samples will be retrieved, and the generated prompt will not contain any in-context examples.

FixKRetriever

FixKRetriever is a fixed K-sample retriever that will use a fixed K samples from the training set as in-context examples for all test samples. This is the most commonly used way to implement k-shot evaluation.

Configuration Method

from ais_bench.benchmark.openicl.icl_retriever import FixKRetriever

infer_cfg = dict(
    retriever=dict(
        type=FixKRetriever,
        fix_id_list=[0, 1, 2, 3, 4]  # Specify to use samples with indices 0,1,2,3,4 from training set
    ),
    # ... Other configurations
)

Parameter Description

fix_id_list (List[int]): Required parameter, specifies the list of training sample indices to use. All test samples will use the same these samples as in-context.

Function Description

Return Value: For all test samples, returns the same index list (i.e., fix_id_list)
Use Cases:
- k-shot evaluation scenarios (such as 5-shot, 10-shot, etc.)
- When it is necessary to ensure that all test samples use the same in-context examples to ensure evaluation consistency

Actual Example

Assume we have a reading comprehension dataset:

Training Set (train):

Sample 0: {"article": "Article A...", "question": "Question 1", "answer": "A"}
Sample 1: {"article": "Article B...", "question": "Question 2", "answer": "B"}
Sample 2: {"article": "Article C...", "question": "Question 3", "answer": "C"}
Sample 3: {"article": "Article D...", "question": "Question 4", "answer": "D"}
Sample 4: {"article": "Article E...", "question": "Question 5", "answer": "A"}

Test Set (test):

Sample 0: {"article": "Article X...", "question": "Test Question 1", "answer": "B"}
Sample 1: {"article": "Article Y...", "question": "Test Question 2", "answer": "C"}

Configuration example (5-shot):

retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4])

Workflow:

For test sample 0:
- Retrieve training samples [0, 1, 2, 3, 4]
- Use ice_template to format these samples as in-context examples
- Insert in-context examples into the test sample’s prompt
For test sample 1:
- Also retrieve training samples [0, 1, 2, 3, 4] (same as test sample 0)
- Use the same in-context examples

Generated Prompt Example (assuming using a simple template):

Read the article, and answer the question by replying A, B, C or D.

Article:
Article A...

Q: Question 1
Answer: A
Read the article, and answer the question by replying A, B, C or D.

Article:
Article B...

Q: Question 2
Answer: B
... (more examples)
Read the article, and answer the question by replying A, B, C or D.

Article:
Article X...

Q: Test Question 1
Answer:

Configuration Examples

The following are some usage examples from actual configuration files:

Example 1: 5-shot Configuration

# ais_bench/benchmark/configs/datasets/race/race_middle_gen_5_shot_chat.py
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4])

Example 2: Using range to Generate Index List

# ais_bench/benchmark/configs/datasets/triviaqa/triviaqa_gen_5_shot_chat_prompt.py
k = 5
retriever=dict(type=FixKRetriever, fix_id_list=list(range(k)))

Example 3: 10-shot Configuration

# ais_bench/benchmark/configs/datasets/hellaswag/hellaswag_gen_10_shot_chat_prompt.py
retriever=dict(type=FixKRetriever, fix_id_list=list(range(10)))

Example 4: 25-shot Configuration

# ais_bench/benchmark/configs/datasets/ARC_c/ARC_c_gen_25_shot_chat_prompt.py
retriever=dict(type=FixKRetriever, fix_id_list=[i for i in range(25)])

Notes

Index Range Check: Indices in fix_id_list must be within the valid range of the training set ([0, len(train))), otherwise an AISBenchValueError exception will be thrown.
Index Order: The order in fix_id_list determines the order in which in-context examples appear in prompts.
Use with ice_template: When using FixKRetriever, it is usually necessary to configure ice_template to format retrieved samples.

RandomRetriever

RandomRetriever is a random retriever that randomly selects K samples from the training set as in-context examples for each test sample. Unlike FixKRetriever, each test sample uses different in-context examples, all randomly selected.

Configuration Method

from ais_bench.benchmark.openicl.icl_retriever.icl_random_retriever import RandomRetriever

infer_cfg = dict(
    retriever=dict(
        type=RandomRetriever,
        ice_num=5,  # Specify the number of samples to retrieve for each test sample
        seed=43     # Random seed, used to ensure result reproducibility, default is 43
    ),
    # ... Other configurations
)

Parameter Description

ice_num (int): Required parameter, specifies the number of samples to retrieve for each test sample. Default is 1.
seed (Optional[int]): Optional parameter, random seed, used to ensure result reproducibility. Default is 43. If the same seed is set, multiple runs will get the same result.

Function Description

Return Value: For each test sample, returns a randomly selected index list with length ice_num
Randomness: In-context examples for each test sample are independently randomly selected
Reproducibility: By setting the seed parameter, reproducible results can be guaranteed under the same configuration
Use Cases:
- When different in-context examples need to be used for each test sample
- Research on the impact of different in-context examples on model performance
- When random sampling is needed to reduce overfitting risk

Actual Example

Assume we have a classification dataset:

Training Set (train):

Sample 0: {"text": "This is an article about technology", "label": "Technology"}
Sample 1: {"text": "This is an article about sports", "label": "Sports"}
Sample 2: {"text": "This is an article about entertainment", "label": "Entertainment"}
Sample 3: {"text": "This is an article about finance", "label": "Finance"}
Sample 4: {"text": "This is an article about education", "label": "Education"}
Sample 5: {"text": "This is an article about health", "label": "Health"}

Test Set (test):

Sample 0: {"text": "This is an article about AI", "label": "Technology"}
Sample 1: {"text": "This is an article about football", "label": "Sports"}

Configuration example (3-shot, seed=123):

retriever=dict(type=RandomRetriever, ice_num=3, seed=123)

Workflow:

For test sample 0:
- Randomly select 3 samples from training set (e.g., [1, 3, 5])
- Use ice_template to format these samples as in-context examples
- Insert in-context examples into the test sample’s prompt
For test sample 1:
- Randomly select 3 samples from training set (e.g., [0, 2, 4], may be different from test sample 0)
- Use ice_template to format these samples as in-context examples
- Insert in-context examples into the test sample’s prompt

Generated Prompt Example (assuming test sample 0 randomly selected training samples [1, 3, 5]):

</E>
Text: This is an article about sports
Label: Sports
</E>
Text: This is an article about finance
Label: Finance
</E>
Text: This is an article about health
Label: Health
</E>
Text: This is an article about AI
Label:

Reproducibility Note:

If the same seed value is used, multiple runs will get the same random result. For example:

# First run
retriever1 = RandomRetriever(dataset, ice_num=3, seed=123)
result1 = retriever1.retrieve()

# Second run (same configuration)
retriever2 = RandomRetriever(dataset, ice_num=3, seed=123)
result2 = retriever2.retrieve()

# result1 and result2 are identical

Notes

Not Fully Tested: The RandomRetriever class has a warning in the code indicating that it has not been fully tested and should be used with caution.
Difference from FixKRetriever:
- FixKRetriever: All test samples use the same in-context examples
- RandomRetriever: Each test sample uses different randomly selected in-context examples
Random Seed: If seed is not specified or different seed values are used each time, results will be different each run, which may affect evaluation result reproducibility.
Use with ice_template: When using RandomRetriever, it is usually necessary to configure ice_template to format retrieved samples.
Import Path: RandomRetriever is not exported in __init__.py and needs to be imported directly from the module path:
```
from ais_bench.benchmark.openicl.icl_retriever.icl_random_retriever import RandomRetriever
```

Complete Configuration Example

The following is a complete dataset configuration example showing how to use both ice_template and FixKRetriever:

from ais_bench.benchmark.openicl.icl_prompt_template import PromptTemplate
from ais_bench.benchmark.openicl.icl_retriever import FixKRetriever
from ais_bench.benchmark.openicl.icl_inferencer import GenInferencer

reader_cfg = dict(
    input_columns=['article', 'question', 'A', 'B', 'C', 'D'],
    output_column='answer',
)

infer_cfg = dict(
    ice_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>',
            round=[
                dict(role='HUMAN', prompt='Read the article, and answer the question by replying A, B, C or D.\n\nArticle:\n{article}\n\nQ: {question}\n\nA. {A}\nB. {B}\nC. {C}\nD. {D}\nAnswer:'),
                dict(role='BOT', prompt='{answer}'),
            ]
        ),
        ice_token='</E>',  # Used to identify the position of in-context examples
    ),
    retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),  # 5-shot
    inferencer=dict(type=GenInferencer),
)

Complete Configuration Example (RandomRetriever)

The following is a complete configuration example using RandomRetriever:

from ais_bench.benchmark.openicl.icl_prompt_template import PromptTemplate
from ais_bench.benchmark.openicl.icl_retriever.icl_random_retriever import RandomRetriever
from ais_bench.benchmark.openicl.icl_inferencer import GenInferencer

reader_cfg = dict(
    input_columns=['text'],
    output_column='label',
)

infer_cfg =dict(
    ice_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>',
            round=[
                dict(role='HUMAN', prompt='Text: {text}'),
                dict(role='BOT', prompt='Label: {label}'),
            ]
        ),
        ice_token='</E>',
    ),
    retriever=dict(type=RandomRetriever, ice_num=3, seed=123),  # 3-shot, random selection
    inferencer=dict(type=GenInferencer),
)