ctf4science.eval_module.evaluate

ctf4science.eval_module.evaluate(dataset_name: str, pair_id: int, prediction: ndarray, metrics: list[str] | None = None) → dict[str, float]

Evaluate the prediction using specified metrics; ground truth is loaded internally.

Loads test data for dataset_name and pair_id, then computes the requested metrics. Use evaluate_custom to supply your own truth array.

Parameters:
dataset_name : str

Name of the dataset (e.g. 'ODE_Lorenz', 'PDE_KS').

pair_id : int

ID of the train-test pair to use.

prediction : ndarray

Predicted data array (same shape as the test data for this pair).

metrics : list of str, optional

Metrics to compute (e.g. ['short_time', 'long_time', 'reconstruction']). If None, uses the pair’s default metrics from config.

Returns:
dict

Mapping from metric name to computed score (float).

Raises:
ValueError

If pair_id is invalid, an unknown metric is requested, or the dataset long-time evaluation type is unknown.
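To illustrate the shape of the return value, here is a minimal sketch of the evaluation pattern this function follows: map each requested metric name to a score computed against the truth array, raising ValueError on an unknown name. The relative-error formula used for 'reconstruction' here is an assumption for illustration; the actual metric definitions live inside the package.

```python
import numpy as np

def evaluate_sketch(truth, prediction, metrics):
    """Sketch of evaluate()'s metric dispatch (truth passed explicitly here;
    the real function loads it internally from dataset_name and pair_id)."""
    def relative_error(t, p):
        # Hypothetical 'reconstruction' metric: relative Frobenius error.
        return float(np.linalg.norm(t - p) / np.linalg.norm(t))

    available = {"reconstruction": relative_error}
    scores = {}
    for name in metrics:
        if name not in available:
            raise ValueError(f"Unknown metric: {name}")
        scores[name] = available[name](truth, prediction)
    return scores

truth = np.array([[1.0, 2.0], [3.0, 4.0]])
pred = truth.copy()
print(evaluate_sketch(truth, pred, ["reconstruction"]))  # {'reconstruction': 0.0}
```

In actual use, one would call e.g. `evaluate('ODE_Lorenz', pair_id=1, prediction=pred)` and receive a dict of the same shape, with metric names taken from the pair's config defaults when metrics is None.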