# Tutorial on Running AIGV Evaluations
This tutorial explains how to run evaluations using the AIGVE toolkit. Once you have prepared your dataset, customized the dataloader, and implemented your metric module, you can configure and launch the AIGVE evaluation loop with ease.
## Prerequisites
Before running an evaluation, make sure:
- Your dataset is formatted and organized correctly (Dataset Preparation)
- Your dataloader is defined and included in a config file (Customizable Dataloaders)
- Your metric is implemented and registered (Modular Metrics)
- Your configuration is ready (Configuration Files)
## Running an Evaluation from a Configuration File
Navigate to the AIGVE main directory and run your evaluation using the `main_aigve.py` script with your desired config:
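The general pattern, following the LightVQA+ example below (`path/to/your_config.py` is a placeholder for your chosen config), is:

```bash
cd AIGVE_Tool/aigve
python main_aigve.py path/to/your_config.py --work-dir ./output
```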
For example, to run the GSTVQA metric with its configuration file:
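A sketch following the same pattern, assuming the GSTVQA config is named `gstvqa.py` under the same `configs/` directory (verify the exact filename in your checkout):

```bash
python main_aigve.py AIGVE_Tool/aigve/configs/gstvqa.py --work-dir ./output
```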
Here are some other examples. For SimpleVQA:
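Again a sketch, assuming the SimpleVQA config is named `simplevqa.py` (check your checkout for the exact filename):

```bash
python main_aigve.py AIGVE_Tool/aigve/configs/simplevqa.py --work-dir ./output
```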
For LightVQAPlus:
```bash
cd AIGVE_Tool/aigve
python main_aigve.py AIGVE_Tool/aigve/configs/lightvqa_plus.py --work-dir ./output
```
Note: Some metrics require downloading pretrained models manually. Make sure they are downloaded correctly and placed in the paths specified in the config (e.g., `model_path`).
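For illustration only, a config might point at the downloaded weights like this (a hypothetical excerpt; the actual field names and layout depend on the metric):

```python
# Hypothetical config excerpt; field names vary by metric.
model_path = 'checkpoints/your_metric_weights.pth'  # manually downloaded checkpoint
```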
Under the AIGVE Toolkit design, each of these runs will, as sketched below:

- Load the dataset and dataloader
- Initialize the metric evaluator
- Run the evaluation loop
- Save the evaluation results
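Conceptually, the flow resembles the following sketch (illustrative names only, not the actual AIGVE internals):

```python
import json
from torch.utils.data import DataLoader

def run_evaluation(dataset, metric, output_path):
    """Illustrative outline of the evaluation flow; not the real AIGVE API."""
    loader = DataLoader(dataset, batch_size=1)  # 1. load the dataset and dataloader
    metric.reset()                              # 2. initialize the metric evaluator
    for batch in loader:                        # 3. run the evaluation loop
        metric.process(batch)
    results = metric.compute()
    with open(output_path, "w") as f:           # 4. save the evaluation results
        json.dump(results, f, indent=2)
    return results
```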
## Evaluation Output
The evaluation output depends on your metric class. Common output includes:
- Console logs (e.g., mean scores, per-sample results)
- A saved JSON file with the results (e.g., `output_results.json`); see the snippet after this list for a quick way to inspect it
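For a quick look at saved results, you can load the JSON directly (the exact schema depends on your metric class):

```python
import json

# Load the saved results; the key/value schema depends on the metric class.
with open("output_results.json") as f:
    results = json.load(f)

print(results)
```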
## Tips for Running AIGV Evaluations
- Start small: use the toy version of the data and small configs to verify your setup.
- Check paths: ensure the configuration file path and working directory are correct; a quick pre-flight check is sketched after this list.
- Pretrained models: as noted above, some metrics require manually downloaded models; make sure they are placed in the paths specified in the configuration files.
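For example, a pre-flight check using the paths from the LightVQA+ example above might be:

```bash
# Verify the config exists and the work dir is ready before a long run.
ls AIGVE_Tool/aigve/configs/lightvqa_plus.py
mkdir -p ./output
```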
## What's Next?
Once your evaluation runs successfully, you can:
- Try running different existing metrics (you can choose from here)
- Benchmark on our AIGVE-Bench dataset (the toy version is available here; the full version will be published soon!)
- Create your own AIGVE metrics by following the tutorial on customizing evaluation metrics
Enjoy using the AIGVE Toolkit for robust, modular, and reproducible AIGV evaluation!