Retriever
In dataset configuration files, the retriever field specifies how samples are retrieved from the dataset to serve as in-context examples. The most commonly used retriever is FixKRetriever, which always uses the same k samples and therefore gives k-shot evaluation. ZeroRetriever uses no samples, which in most cases means 0-shot evaluation.
In-context samples can also be written directly into the dataset template. ZeroRetriever is used in this case as well, but the evaluation is then not 0-shot; the actual number of shots depends on the specific template. For details, please refer to prompt_template.
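The hardcoded-template case can be illustrated with a small standalone sketch. It does not use AISBench itself: the template text, the `build_prompt` helper, and the `Q:`/`A:` layout are assumptions made for illustration only. It shows that an empty retrieval result can still produce a prompt containing examples.

```python
# Hypothetical template with two examples written directly into the text.
HARDCODED_TEMPLATE = (
    "Q: What is artificial intelligence?\n"
    "A: Artificial intelligence is a branch of computer science\n"
    "Q: What is Python?\n"
    "A: Python is a programming language\n"
    "Q: {question}\n"
    "A:"
)


def build_prompt(template: str, retrieved_ids: list, question: str) -> str:
    """Fill the template; retrieved_ids is empty, as with ZeroRetriever."""
    assert retrieved_ids == [], "this sketch models the ZeroRetriever case only"
    return template.format(question=question)


prompt = build_prompt(HARDCODED_TEMPLATE, [], "What is machine learning?")
# Count the question lines and subtract the test question itself.
shots_in_prompt = prompt.count("Q: ") - 1
print(shots_in_prompt)
```

Nothing was retrieved here, yet the prompt is effectively 2-shot, which is why the shot count has to be read from the template rather than from the retriever type.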
Currently, AISBench supports the following Retriever types:
ZeroRetriever: Does not use any samples as in-context examples
FixKRetriever: Fixed use of k samples as in-context examples
RandomRetriever: Random use of k samples as in-context examples
ZeroRetriever
ZeroRetriever is a zero-shot retriever that does not retrieve any samples from the training set as in-context examples. For each test sample, it returns an empty index list, so it is usually used to implement 0-shot evaluation.
Configuration Method
from ais_bench.benchmark.openicl.icl_retriever import ZeroRetriever

infer_cfg = dict(
    retriever=dict(type=ZeroRetriever),
    # ... Other configurations
)
Function Description
Return Value: For all test samples, returns an empty index list []
Use Cases:
0-shot evaluation scenarios
When in-context samples are already hardcoded in the prompt template (in this case, although ZeroRetriever is used, it is actually not 0-shot)
Actual Example
Assume we have a question-answering dataset with the following samples in the training set:
Training Set (train):
Sample 0:
{"question": "What is artificial intelligence?", "answer": "Artificial intelligence is a branch of computer science"}
Sample 1:
{"question": "What is Python?", "answer": "Python is a programming language"}
Test Set (test):
Sample 0:
{"question": "What is machine learning?", "answer": "Machine learning is a subfield of AI"}
When using ZeroRetriever, for test sample 0, no training samples will be retrieved, and the generated prompt will not contain any in-context examples.
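The behavior described above can be mimicked in a few lines of plain Python. This is a sketch under stated assumptions, not the AISBench implementation: `zero_retrieve` and the `Q:`/`A:` prompt layout are hypothetical names and formats chosen for illustration.

```python
def zero_retrieve(num_test_samples: int) -> list:
    """Return one empty index list per test sample (no in-context examples)."""
    return [[] for _ in range(num_test_samples)]


train_set = [
    {"question": "What is artificial intelligence?",
     "answer": "Artificial intelligence is a branch of computer science"},
    {"question": "What is Python?", "answer": "Python is a programming language"},
]
test_set = [
    {"question": "What is machine learning?",
     "answer": "Machine learning is a subfield of AI"},
]

indices = zero_retrieve(len(test_set))

# With an empty index list, no training sample reaches the prompt.
examples = [train_set[i] for i in indices[0]]
prompt = "".join(f"Q: {ex['question']}\nA: {ex['answer']}\n" for ex in examples)
prompt += f"Q: {test_set[0]['question']}\nA:"
print(prompt)
```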
FixKRetriever
FixKRetriever is a fixed K-sample retriever: it uses the same K samples from the training set as in-context examples for every test sample. This is the most common way to implement k-shot evaluation.
Configuration Method
from ais_bench.benchmark.openicl.icl_retriever import FixKRetriever

infer_cfg = dict(
    retriever=dict(
        type=FixKRetriever,
        fix_id_list=[0, 1, 2, 3, 4],  # Specify to use samples with indices 0,1,2,3,4 from training set
    ),
    # ... Other configurations
)
Parameter Description
fix_id_list (List[int]): Required parameter. Specifies the list of training sample indices to use. All test samples use these same samples as in-context examples.
Function Description
Return Value: For all test samples, returns the same index list (i.e., fix_id_list)
Use Cases:
k-shot evaluation scenarios (such as 5-shot, 10-shot, etc.)
When all test samples must use the same in-context examples to keep the evaluation consistent
Actual Example
Assume we have a reading comprehension dataset:
Training Set (train):
Sample 0:
{"article": "Article A...", "question": "Question 1", "answer": "A"}
Sample 1:
{"article": "Article B...", "question": "Question 2", "answer": "B"}
Sample 2:
{"article": "Article C...", "question": "Question 3", "answer": "C"}
Sample 3:
{"article": "Article D...", "question": "Question 4", "answer": "D"}
Sample 4:
{"article": "Article E...", "question": "Question 5", "answer": "A"}
Test Set (test):
Sample 0:
{"article": "Article X...", "question": "Test Question 1", "answer": "B"}
Sample 1:
{"article": "Article Y...", "question": "Test Question 2", "answer": "C"}
Configuration example (5-shot):
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4])
Workflow:
For test sample 0:
Retrieve training samples [0, 1, 2, 3, 4]
Use ice_template to format these samples as in-context examples
Insert in-context examples into the test sample's prompt
For test sample 1:
Also retrieve training samples [0, 1, 2, 3, 4] (same as test sample 0)
Use the same in-context examples
Generated Prompt Example (assuming a simple template):
Read the article, and answer the question by replying A, B, C or D.
Article:
Article A...
Q: Question 1
Answer: A
Read the article, and answer the question by replying A, B, C or D.
Article:
Article B...
Q: Question 2
Answer: B
... (more examples)
Read the article, and answer the question by replying A, B, C or D.
Article:
Article X...
Q: Test Question 1
Answer:
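The workflow and prompt above can be reproduced with a standalone sketch. It is not the AISBench code path: `format_sample` and `build_fixk_prompt` are hypothetical helpers, and the layout simply copies the simple template shown in the example.

```python
INSTRUCTION = "Read the article, and answer the question by replying A, B, C or D."


def format_sample(sample: dict, with_answer: bool) -> str:
    """Render one sample; the answer is left blank for the test sample."""
    answer = f" {sample['answer']}" if with_answer else ""
    return (
        f"{INSTRUCTION}\n"
        f"Article:\n{sample['article']}\n"
        f"Q: {sample['question']}\n"
        f"Answer:{answer}"
    )


def build_fixk_prompt(train: list, test_sample: dict, fix_id_list: list) -> str:
    """Prepend the same fixed training samples, in list order, to the test sample."""
    shots = [format_sample(train[i], with_answer=True) for i in fix_id_list]
    return "\n".join(shots + [format_sample(test_sample, with_answer=False)])


train = [
    {"article": "Article A...", "question": "Question 1", "answer": "A"},
    {"article": "Article B...", "question": "Question 2", "answer": "B"},
    {"article": "Article C...", "question": "Question 3", "answer": "C"},
    {"article": "Article D...", "question": "Question 4", "answer": "D"},
    {"article": "Article E...", "question": "Question 5", "answer": "A"},
]
test = [
    {"article": "Article X...", "question": "Test Question 1", "answer": "B"},
    {"article": "Article Y...", "question": "Test Question 2", "answer": "C"},
]

fix_id_list = [0, 1, 2, 3, 4]
prompts = [build_fixk_prompt(train, sample, fix_id_list) for sample in test]
print(prompts[0])
```

Both prompts share the same five-example prefix and differ only in the final test block, which is the consistency property FixKRetriever is meant to provide.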
Configuration Examples
The following are some usage examples from actual configuration files:
Example 1: 5-shot Configuration
# ais_bench/benchmark/configs/datasets/race/race_middle_gen_5_shot_chat.py
retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4])
Example 2: Using range to Generate Index List
# ais_bench/benchmark/configs/datasets/triviaqa/triviaqa_gen_5_shot_chat_prompt.py
k = 5
retriever=dict(type=FixKRetriever, fix_id_list=list(range(k)))
Example 3: 10-shot Configuration
# ais_bench/benchmark/configs/datasets/hellaswag/hellaswag_gen_10_shot_chat_prompt.py
retriever=dict(type=FixKRetriever, fix_id_list=list(range(10)))
Example 4: 25-shot Configuration
# ais_bench/benchmark/configs/datasets/ARC_c/ARC_c_gen_25_shot_chat_prompt.py
retriever=dict(type=FixKRetriever, fix_id_list=[i for i in range(25)])
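The three spellings used in these examples (a literal list, `list(range(k))`, and a list comprehension) produce the same index list. A quick check in plain Python, independent of AISBench:

```python
k = 5

explicit = [0, 1, 2, 3, 4]
from_range = list(range(k))
from_comprehension = [i for i in range(k)]

# All three forms describe the same first-k selection.
all_equal = explicit == from_range == from_comprehension
print(all_equal, len(from_range))
```

In each case the number of shots is simply the length of the list.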
Notes
Index Range Check: Indices in fix_id_list must be within the valid range of the training set ([0, len(train))), otherwise an AISBenchValueError exception will be thrown.
Index Order: The order in fix_id_list determines the order in which in-context examples appear in prompts.
Use with ice_template: When using FixKRetriever, it is usually necessary to configure ice_template to format retrieved samples.
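The range check and the ordering rule can be sketched as follows. This is an illustration only: `validate_fix_id_list` is a hypothetical helper, and the built-in `ValueError` stands in for `AISBenchValueError`, whose exact message and import path are not shown here.

```python
def validate_fix_id_list(fix_id_list: list, train_size: int) -> list:
    """Return the indices unchanged if every one lies in [0, train_size)."""
    out_of_range = [i for i in fix_id_list if not 0 <= i < train_size]
    if out_of_range:
        # Stand-in for AISBenchValueError in this sketch.
        raise ValueError(
            f"fix_id_list indices {out_of_range} are outside [0, {train_size})"
        )
    # Order is preserved, so it also fixes the order of examples in the prompt.
    return list(fix_id_list)


ordered = validate_fix_id_list([4, 0, 2], train_size=5)
print(ordered)

try:
    validate_fix_id_list([0, 5], train_size=5)
except ValueError as err:
    print(f"rejected: {err}")
```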
RandomRetriever
RandomRetriever is a random retriever that randomly selects K samples from the training set as in-context examples for each test sample. Unlike FixKRetriever, the selection is made independently for each test sample, so different test samples generally receive different in-context examples.
Configuration Method
from ais_bench.benchmark.openicl.icl_retriever.icl_random_retriever import RandomRetriever

infer_cfg = dict(
    retriever=dict(
        type=RandomRetriever,
        ice_num=5,  # Specify the number of samples to retrieve for each test sample
        seed=43,  # Random seed, used to ensure result reproducibility, default is 43
    ),
    # ... Other configurations
)
Parameter Description
ice_num (int): The number of samples to retrieve for each test sample. Default is 1, so set it explicitly for k-shot evaluation with k > 1.
seed (Optional[int]): Optional parameter. Random seed, used to ensure result reproducibility. Default is 43. Runs that use the same seed produce the same result.
Function Description
Return Value: For each test sample, returns a randomly selected index list with length ice_num
Randomness: In-context examples for each test sample are independently randomly selected
Reproducibility: By setting the seed parameter, reproducible results can be guaranteed under the same configuration
Use Cases:
When different in-context examples need to be used for each test sample
Research on the impact of different in-context examples on model performance
When random sampling is needed to reduce overfitting risk
Actual Example
Assume we have a classification dataset:
Training Set (train):
Sample 0:
{"text": "This is an article about technology", "label": "Technology"}
Sample 1:
{"text": "This is an article about sports", "label": "Sports"}
Sample 2:
{"text": "This is an article about entertainment", "label": "Entertainment"}
Sample 3:
{"text": "This is an article about finance", "label": "Finance"}
Sample 4:
{"text": "This is an article about education", "label": "Education"}
Sample 5:
{"text": "This is an article about health", "label": "Health"}
Test Set (test):
Sample 0:
{"text": "This is an article about AI", "label": "Technology"}
Sample 1:
{"text": "This is an article about football", "label": "Sports"}
Configuration example (3-shot, seed=123):
retriever=dict(type=RandomRetriever, ice_num=3, seed=123)
Workflow:
For test sample 0:
Randomly select 3 samples from training set (e.g., [1, 3, 5])
Use ice_template to format these samples as in-context examples
Insert in-context examples into the test sample's prompt
For test sample 1:
Randomly select 3 samples from training set (e.g., [0, 2, 4], may be different from test sample 0)
Use ice_template to format these samples as in-context examples
Insert in-context examples into the test sample's prompt
Generated Prompt Example (assuming test sample 0 randomly selected training samples [1, 3, 5]):
</E>
Text: This is an article about sports
Label: Sports
</E>
Text: This is an article about finance
Label: Finance
</E>
Text: This is an article about health
Label: Health
</E>
Text: This is an article about AI
Label:
Reproducibility Note:
If the same seed value is used, multiple runs will get the same random result. For example:
# First run
retriever1 = RandomRetriever(dataset, ice_num=3, seed=123)
result1 = retriever1.retrieve()
# Second run (same configuration)
retriever2 = RandomRetriever(dataset, ice_num=3, seed=123)
result2 = retriever2.retrieve()
# result1 and result2 are identical
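Why a fixed seed reproduces the selection can be seen in a standalone sketch. It is not the RandomRetriever implementation: `random_retrieve` is a hypothetical function, and the use of Python's standard `random` module is an assumption made for illustration, so the concrete indices it draws will not necessarily match what AISBench selects.

```python
import random


def random_retrieve(train_size: int, num_test: int, ice_num: int, seed: int = 43) -> list:
    """Draw ice_num distinct training indices for each test sample."""
    rng = random.Random(seed)  # a private generator seeded once per run
    return [rng.sample(range(train_size), ice_num) for _ in range(num_test)]


# Two runs with the same configuration (6 training samples, 2 test samples).
run1 = random_retrieve(train_size=6, num_test=2, ice_num=3, seed=123)
run2 = random_retrieve(train_size=6, num_test=2, ice_num=3, seed=123)

print(run1 == run2)
```

Seeding once per run, rather than once per test sample, is what lets each test sample draw its own examples while the whole run stays repeatable.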
Notes
Not Fully Tested: The RandomRetriever class has a warning in the code indicating that it has not been fully tested and should be used with caution.
Difference from FixKRetriever:
FixKRetriever: All test samples use the same in-context examples
RandomRetriever: Each test sample uses its own randomly selected in-context examples
Random Seed: If different seed values are used between runs, the selected examples differ from run to run, which may affect the reproducibility of evaluation results. When seed is not specified, the default value of 43 applies.
Use with ice_template: When using RandomRetriever, it is usually necessary to configure ice_template to format retrieved samples.
Import Path: RandomRetriever is not exported in __init__.py and needs to be imported directly from the module path: from ais_bench.benchmark.openicl.icl_retriever.icl_random_retriever import RandomRetriever
Complete Configuration Example
The following is a complete dataset configuration example showing how to use both ice_template and FixKRetriever:
from ais_bench.benchmark.openicl.icl_prompt_template import PromptTemplate
from ais_bench.benchmark.openicl.icl_retriever import FixKRetriever
from ais_bench.benchmark.openicl.icl_inferencer import GenInferencer

reader_cfg = dict(
    input_columns=['article', 'question', 'A', 'B', 'C', 'D'],
    output_column='answer',
)

infer_cfg = dict(
    ice_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>',
            round=[
                dict(role='HUMAN', prompt='Read the article, and answer the question by replying A, B, C or D.\n\nArticle:\n{article}\n\nQ: {question}\n\nA. {A}\nB. {B}\nC. {C}\nD. {D}\nAnswer:'),
                dict(role='BOT', prompt='{answer}'),
            ]
        ),
        ice_token='</E>',  # Used to identify the position of in-context examples
    ),
    retriever=dict(type=FixKRetriever, fix_id_list=[0, 1, 2, 3, 4]),  # 5-shot
    inferencer=dict(type=GenInferencer),
)
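The configuration comment says that ice_token marks where in-context examples are placed. One plausible reading, sketched below in plain Python, is a simple token substitution. This is an assumption for illustration: `fill_ice_token` is a hypothetical helper, and the actual replacement logic inside PromptTemplate may differ.

```python
ICE_TOKEN = "</E>"


def fill_ice_token(prompt_with_token: str, formatted_examples: list,
                   ice_token: str = ICE_TOKEN) -> str:
    """Replace the placeholder token with the concatenated formatted examples."""
    return prompt_with_token.replace(ice_token, "".join(formatted_examples))


# The token sits at the start of the prompt, mirroring begin='</E>' above.
prompt_with_token = "</E>Q: Test Question 1\nAnswer:"
formatted_examples = [
    "Q: Question 1\nAnswer: A\n",
    "Q: Question 2\nAnswer: B\n",
]

final_prompt = fill_ice_token(prompt_with_token, formatted_examples)
print(final_prompt)
```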
Complete Configuration Example (RandomRetriever)
The following is a complete configuration example using RandomRetriever:
from ais_bench.benchmark.openicl.icl_prompt_template import PromptTemplate
from ais_bench.benchmark.openicl.icl_retriever.icl_random_retriever import RandomRetriever
from ais_bench.benchmark.openicl.icl_inferencer import GenInferencer

reader_cfg = dict(
    input_columns=['text'],
    output_column='label',
)

infer_cfg = dict(
    ice_template=dict(
        type=PromptTemplate,
        template=dict(
            begin='</E>',
            round=[
                dict(role='HUMAN', prompt='Text: {text}'),
                dict(role='BOT', prompt='Label: {label}'),
            ]
        ),
        ice_token='</E>',
    ),
    retriever=dict(type=RandomRetriever, ice_num=3, seed=123),  # 3-shot, random selection
    inferencer=dict(type=GenInferencer),
)