ctf4science.eval_module.evaluate_custom#

ctf4science.eval_module.evaluate_custom(dataset_name: str, pair_id: int, truth: ndarray, prediction: ndarray, metrics: list[str] | None = None, flexible_k: bool = False) → dict[str, float]#

Evaluate the prediction against a provided truth array using specified metrics.

Uses the given truth and prediction arrays together with the dataset config to determine evaluation parameters and the long-time evaluation type. Unlike evaluate, which loads the ground-truth test data internally, this function accepts the truth array directly.

Parameters:
dataset_name : str

Name of the dataset (e.g. 'ODE_Lorenz', 'PDE_KS').

pair_id : int

ID of the train-test pair (used to select config and metrics).

truth : ndarray

Ground truth data array.

prediction : ndarray

Predicted data array, same shape as truth.

metrics : list of str, optional

Metrics to compute. If None, uses the pair’s default metrics from config.

flexible_k : bool, optional

Whether to use a flexible k value. If True, the effective k for the metric is min(timesteps, k); if False, k is taken directly from the dataset config. Useful when hyperparameter optimization yields fewer timesteps than the k value in the dataset config.

Returns:
dict

Mapping from metric name to computed score (float).
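As a minimal sketch of what the returned mapping looks like, here is a hypothetical relative L2 metric computed with NumPy. The metric name and formula are illustrative assumptions, not part of the package API; the actual metric names come from the dataset config.

```python
import numpy as np

def relative_l2(truth: np.ndarray, prediction: np.ndarray) -> float:
    # Hypothetical example metric: relative L2 error between truth and prediction.
    return float(np.linalg.norm(truth - prediction) / np.linalg.norm(truth))

truth = np.ones((100, 3))
prediction = np.ones((100, 3)) * 1.1

# The result has the same shape as evaluate_custom's return value:
# a dict mapping metric name to a float score.
scores = {"relative_l2": relative_l2(truth, prediction)}
```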

Raises:
ValueError

If pair_id is invalid, an unknown metric is requested, or the dataset long-time evaluation type is unknown.
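The flexible_k behavior described above can be sketched as follows. clamp_k is a hypothetical helper written for illustration (it is not part of the package API), and it assumes time is the leading axis of the truth array.

```python
import numpy as np

def clamp_k(truth: np.ndarray, k: int, flexible_k: bool = False) -> int:
    """Return the effective k: clamped to the available timesteps when flexible_k is True."""
    timesteps = truth.shape[0]  # assumption: time is the leading axis
    return min(timesteps, k) if flexible_k else k

truth = np.zeros((50, 3))           # 50 timesteps, 3 state variables
print(clamp_k(truth, k=100, flexible_k=True))   # clamped to 50
print(clamp_k(truth, k=100, flexible_k=False))  # config value 100 kept as-is
```

With flexible_k=False, a k larger than the number of available timesteps would be passed through unchanged, which is why the flag exists for hyperparameter-optimization runs that produce short predictions.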