ctf4science.eval_module.evaluate#
- ctf4science.eval_module.evaluate(dataset_name: str, pair_id: int, prediction: ndarray, metrics: list[str] | None = None) → dict[str, float]#
Evaluate the prediction using specified metrics; ground truth is loaded internally.
Loads test data for dataset_name and pair_id, then computes the requested metrics. Use evaluate_custom to supply your own truth array.
- Parameters:
- dataset_name : str
Name of the dataset (e.g. 'ODE_Lorenz', 'PDE_KS').
- pair_id : int
ID of the train-test pair to use.
- prediction : ndarray
Predicted data array (same shape as the test data for this pair).
- metrics : list of str, optional
Metrics to compute (e.g. ['short_time', 'long_time', 'reconstruction']). If None, uses the pair's default metrics from config.
- Returns:
- dict[str, float]
Mapping from metric name to computed score.
- Raises:
- ValueError
If pair_id is invalid, an unknown metric is requested, or the dataset long-time evaluation type is unknown.
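The call returns a plain dict of metric names to float scores. Since the actual metric implementations live inside ctf4science, the sketch below stands in a hypothetical relative-L2 error for the 'reconstruction' metric to show the shape of the result; only the commented-out call reflects the documented API.

```python
import numpy as np

# Hypothetical stand-in for one metric: relative L2 error between a
# prediction and the truth. The real evaluate() loads the truth array
# for (dataset_name, pair_id) internally; here truth is synthetic.
def relative_l2_error(prediction: np.ndarray, truth: np.ndarray) -> float:
    return float(np.linalg.norm(prediction - truth) / np.linalg.norm(truth))

truth = np.sin(np.linspace(0, 2 * np.pi, 100)).reshape(10, 10)
prediction = truth + 0.01  # slightly perturbed prediction, same shape as truth

# The documented call would look like:
# scores = evaluate('ODE_Lorenz', pair_id=1, prediction=prediction,
#                   metrics=['reconstruction'])
# and returns a mapping of this shape:
scores = {'reconstruction': relative_l2_error(prediction, truth)}

assert set(scores) == {'reconstruction'}
assert 0.0 < scores['reconstruction'] < 0.1
```

Passing a prediction whose shape differs from the test data, or a metric name outside the supported set, is the kind of input that triggers the ValueError described above.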