ctf4science.eval_module.evaluate_custom
- ctf4science.eval_module.evaluate_custom(dataset_name: str, pair_id: int, truth: ndarray, prediction: ndarray, metrics: list[str] | None = None, flexible_k: bool = False) → dict[str, float]
Evaluate the prediction against a provided truth array using specified metrics.
Uses the given truth and prediction arrays together with the dataset config to determine the evaluation parameters and the long-time evaluation type. To load the ground-truth test data internally instead, use evaluate.
- Parameters:
- dataset_name : str
Name of the dataset (e.g. 'ODE_Lorenz', 'PDE_KS').
- pair_id : int
ID of the train-test pair (used to select config and metrics).
- truth : ndarray
Ground truth data array.
- prediction : ndarray
Predicted data array, same shape as truth.
- metrics : list of str, optional
Metrics to compute. If None, uses the pair’s default metrics from config.
- flexible_k : bool, optional
Whether to use a flexible k value. If True, the k value used for the metric is min(timesteps, k); if False, the k value from the dataset config is used. Intended for cases where hyperparameter optimization produces fewer timesteps than the k value in the dataset config.
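The flexible_k behavior described above can be sketched as follows. This is an illustrative stand-in, not the library's implementation; the helper name effective_k is hypothetical.

```python
# Hypothetical sketch of the flexible_k logic (not the library's code).
def effective_k(num_timesteps: int, config_k: int, flexible_k: bool) -> int:
    """Return the k value used for the metric."""
    if flexible_k:
        # Cap k at the number of available timesteps.
        return min(num_timesteps, config_k)
    # Otherwise use the k value from the dataset config unchanged.
    return config_k
```

With flexible_k=True a prediction shorter than the configured k is still scorable; with flexible_k=False the configured k is used as-is.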
- Returns:
- dict
Mapping from metric name to computed score (float).
- Raises:
- ValueError
If pair_id is invalid, an unknown metric is requested, or the dataset long-time evaluation type is unknown.
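The return shape and the ValueError behavior can be illustrated with a minimal stand-in. This sketch is not the library's code: the metric name 'relative_l2' and its formula are hypothetical examples, since the real metric definitions come from the dataset config.

```python
import math

# Illustrative stand-in for evaluate_custom's contract:
# returns a dict mapping metric name -> float score, and raises
# ValueError for an unknown metric. 'relative_l2' is a made-up
# example metric, not necessarily one the library provides.
def evaluate_custom_sketch(truth, prediction, metrics=("relative_l2",)):
    if len(truth) != len(prediction):
        raise ValueError("prediction must have the same shape as truth")
    scores = {}
    for name in metrics:
        if name != "relative_l2":
            raise ValueError(f"unknown metric: {name}")
        # Relative L2 error: ||truth - prediction|| / ||truth||.
        num = math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, prediction)))
        den = math.sqrt(sum(t ** 2 for t in truth))
        scores[name] = num / den
    return scores
```

A perfect prediction scores 0.0 under this example metric, and requesting a metric the sketch does not know raises ValueError, mirroring the documented behavior.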