ctf4science.eval_module.evaluate_custom#

ctf4science.eval_module.evaluate_custom(dataset_name: str, pair_id: int, truth: ndarray, prediction: ndarray, metrics: list[str] | None = None, flexible_k: bool = False) → dict[str, float]#

Evaluate the prediction against a provided truth array using specified metrics.

Uses the given truth and prediction arrays together with the dataset config to determine evaluation parameters and the long-time evaluation type. Unlike evaluate, which loads the ground-truth test data internally, this function accepts the truth array directly.

Parameters:
dataset_name : str

Name of the dataset (e.g. 'ODE_Lorenz', 'PDE_KS').

pair_id : int

ID of the train-test pair (used to select config and metrics).

truth : ndarray

Ground truth data array.

prediction : ndarray

Predicted data array, same shape as truth.

metrics : list of str, optional

Metrics to compute. If None, uses the pair’s default metrics from config.

flexible_k : bool, optional

Whether to use a flexible k value. If True, the effective k for the metric is min(timesteps, k); if False, k is taken directly from the dataset config. Useful when hyperparameter optimization yields fewer timesteps than the k value in the dataset config.

Returns:
dict

Mapping from metric name to computed score (float).
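As a minimal sketch of what the returned mapping looks like, here is a hypothetical relative L2 metric computed with NumPy. The metric name and formula are illustrative assumptions, not part of the package API; the actual metric names come from the dataset config.

```python
import numpy as np

def relative_l2(truth: np.ndarray, prediction: np.ndarray) -> float:
    # Hypothetical example metric: relative L2 error between truth and prediction.
    return float(np.linalg.norm(truth - prediction) / np.linalg.norm(truth))

truth = np.ones((100, 3))
prediction = np.ones((100, 3)) * 1.1

# The result has the same shape as evaluate_custom's return value:
# a dict mapping metric name to a float score.
scores = {"relative_l2": relative_l2(truth, prediction)}
```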

Raises:
ValueError

If pair_id is invalid, an unknown metric is requested, or the dataset long-time evaluation type is unknown.
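The flexible_k behavior described above can be sketched as follows. clamp_k is a hypothetical helper written for illustration (it is not part of the package API), and it assumes time is the leading axis of the truth array.

```python
import numpy as np

def clamp_k(truth: np.ndarray, k: int, flexible_k: bool = False) -> int:
    """Return the effective k: clamped to the available timesteps when flexible_k is True."""
    timesteps = truth.shape[0]  # assumption: time is the leading axis
    return min(timesteps, k) if flexible_k else k

truth = np.zeros((50, 3))           # 50 timesteps, 3 state variables
print(clamp_k(truth, k=100, flexible_k=True))   # clamped to 50
print(clamp_k(truth, k=100, flexible_k=False))  # config value 100 kept as-is
```

With flexible_k=False, a k larger than the number of available timesteps would be passed through unchanged, which is why the flag exists for hyperparameter-optimization runs that produce short predictions.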