ctf4science.benchmark_module.ModelBenchmarker
- class ctf4science.benchmark_module.ModelBenchmarker(config_path: str, num_runs: int = 5)
- Bases: object

Benchmarks a model with optimal hyperparameters for a given dataset and pair_id.
Runs multiple independent training and evaluation runs with different random seeds. Designed to be run from within each model directory.
- Parameters:
- config_path : str
Path to the configuration file (must exist).
- num_runs : int, optional
Number of independent evaluation runs to perform, by default 5.
Methods
run_benchmark(): Run multiple benchmarking evaluations and save results.
- Raises:
- FileNotFoundError
If config file does not exist.
- ValueError
If dataset pair_id is not a single integer or list of one integer.
Notes
Class Methods:
run_benchmark():
Run multiple benchmarking evaluations and save results. Executes all planned runs and computes statistics (mean/std) when at least 3 runs succeed.
- Returns:
Dict[str, Any]: benchmark results containing model_name, dataset_name, pair_id, planned_num_runs, successful_runs, run_results, statistics, performance_summary, timestamp, and output_file.
_construct_output_dir():
Construct the output directory path for benchmark results.
- Returns:
Path
results/benchmark_results/{dataset_name}/{model_name}/pair_id_{pair_id}/{timestamp}/.
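The directory layout above can be sketched with pathlib. The standalone function below is a hypothetical stand-in for the private method (which would read these values from the instance); the dataset and model names in the example are illustrative only:

```python
from pathlib import Path

def construct_output_dir(dataset_name: str, model_name: str,
                         pair_id: int, timestamp: str) -> Path:
    # Mirrors the documented layout:
    # results/benchmark_results/{dataset_name}/{model_name}/pair_id_{pair_id}/{timestamp}/
    return (Path("results") / "benchmark_results" / dataset_name
            / model_name / f"pair_id_{pair_id}" / timestamp)

out = construct_output_dir("ODE_Lorenz", "my_model", 1, "20240101_120000")
print(out.as_posix())
# results/benchmark_results/ODE_Lorenz/my_model/pair_id_1/20240101_120000
```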
_create_run_config(self, run_idx, seed):
Create a configuration file for a specific run with a given seed.
- Parameters:
run_idx : int. Index of the run (0-based).
seed : int. Random seed for this run.
- Returns:
Path to the created config file for the run.
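A minimal sketch of the per-run config step: copy the base configuration and inject the run's seed. The function name, the config keys, and the JSON format are assumptions (the real method presumably emits the project's YAML config format; JSON is used here only to keep the sketch dependency-free):

```python
import json
from pathlib import Path

def create_run_config(base_config: dict, run_idx: int, seed: int,
                      out_dir: Path) -> Path:
    # Copy the base config and inject the per-run seed so each of the
    # num_runs evaluations is independently seeded.
    cfg = dict(base_config)
    cfg["seed"] = seed
    path = out_dir / f"config_run_{run_idx}.json"
    path.write_text(json.dumps(cfg, indent=2))
    return path
```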
_run_single_evaluation(self, run_idx, seed):
Run a single evaluation of the model.
- Parameters:
run_idx : int. Index of the run.
seed : int. Random seed for this run.
- Returns:
Dict[str, Any] run results (run_idx, seed, duration, config_path, results, success), or error information if the run failed.
_find_and_load_results_for_run(self, run_idx):
Find and load the results from the most recent run (for this pair_id).
- Parameters:
run_idx : int. Index of the run.
- Returns:
Dict[str, Any] evaluation results loaded from evaluation_results.yaml.
_extract_run_results(self, all_runs):
Extract run results for each successful run, keyed by run identifier.
- Parameters:
all_runs : List[Dict]. List of all run results.
- Returns:
Dict[str, Any] run results keyed by run_{n}_seed_{seed}.
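The keying scheme can be sketched as follows, assuming each run dict carries the run_idx, seed, success, and results keys listed under _run_single_evaluation:

```python
from typing import Any, Dict, List

def extract_run_results(all_runs: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Keep only successful runs, keyed as run_{n}_seed_{seed}.
    return {
        f"run_{run['run_idx']}_seed_{run['seed']}": run["results"]
        for run in all_runs
        if run.get("success")
    }

runs = [
    {"run_idx": 0, "seed": 17, "success": True, "results": {"score": 0.91}},
    {"run_idx": 1, "seed": 42, "success": False, "results": {}},
]
print(extract_run_results(runs))  # only run_0_seed_17 survives
```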
_calculate_statistics(self, all_runs):
Calculate mean and standard deviation for all metrics; requires 3+ successful runs.
- Parameters:
all_runs : List[Dict]. List of all run results.
- Returns:
Dict[str, Any] of per-metric means, standard deviations, and timing statistics.
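The aggregation step can be sketched with the standard-library statistics module. The function name, the run-dict keys (success, results, duration), and the returned-when-too-few-runs behavior are assumptions based on the descriptions above:

```python
import statistics
from typing import Any, Dict, List

MIN_SUCCESSFUL_RUNS = 3  # threshold stated in the description above

def calculate_statistics(all_runs: List[Dict[str, Any]]) -> Dict[str, Any]:
    successful = [r for r in all_runs if r.get("success")]
    if len(successful) < MIN_SUCCESSFUL_RUNS:
        return {}  # too few runs for a meaningful mean/std
    out: Dict[str, Any] = {}
    # Per-metric mean and sample standard deviation across runs.
    for metric in successful[0]["results"]:
        values = [r["results"][metric] for r in successful]
        out[metric] = {"mean": statistics.mean(values),
                       "std": statistics.stdev(values)}
    # Timing statistics over the per-run durations.
    durations = [r["duration"] for r in successful]
    out["duration"] = {"mean": statistics.mean(durations),
                       "std": statistics.stdev(durations)}
    return out
```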