ctf4science.eval_module.evaluate

ctf4science.eval_module.evaluate(dataset_name: str, pair_id: int, prediction: ndarray, metrics: list[str] | None = None) → dict[str, float]

Evaluate the prediction using specified metrics; ground truth is loaded internally.

Loads test data for dataset_name and pair_id, then computes the requested metrics. Use evaluate_custom to supply your own truth array.

Parameters:
dataset_name : str

Name of the dataset (e.g. 'ODE_Lorenz', 'PDE_KS').

pair_id : int

ID of the train-test pair to use.

prediction : ndarray

Predicted data array (same shape as the test data for this pair).

metrics : list of str, optional

Metrics to compute (e.g. ['short_time', 'long_time', 'reconstruction']). If None, uses the pair’s default metrics from config.

Returns:
dict

Mapping from metric name to computed score (float).

Raises:
ValueError

If pair_id is invalid, an unknown metric is requested, or the dataset long-time evaluation type is unknown.
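To illustrate the shape of the return value, here is a minimal sketch of the evaluation pattern this function follows: map each requested metric name to a score computed against the truth array, raising ValueError on an unknown name. The relative-error formula used for 'reconstruction' here is an assumption for illustration; the actual metric definitions live inside the package.

```python
import numpy as np

def evaluate_sketch(truth, prediction, metrics):
    """Sketch of evaluate()'s metric dispatch (truth passed explicitly here;
    the real function loads it internally from dataset_name and pair_id)."""
    def relative_error(t, p):
        # Hypothetical 'reconstruction' metric: relative Frobenius error.
        return float(np.linalg.norm(t - p) / np.linalg.norm(t))

    available = {"reconstruction": relative_error}
    scores = {}
    for name in metrics:
        if name not in available:
            raise ValueError(f"Unknown metric: {name}")
        scores[name] = available[name](truth, prediction)
    return scores

truth = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = truth.copy()
print(evaluate_sketch(truth, pred, ["reconstruction"]))  # {'reconstruction': 0.0}
```

In actual use, one would call e.g. `evaluate('ODE_Lorenz', pair_id=1, prediction=pred)` and receive a dict of the same shape, with metric names taken from the pair's config defaults when metrics is None.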