Evaluation

Compute evaluation metrics.

Overview

This section provides evaluation schemes for both Numerai Classic and Signals. The Evaluator takes a NumerFrame as input and returns a Pandas DataFrame containing metrics for each given prediction column.

0. Base

BaseEvaluator implements all the evaluation logic that is common to Numerai Classic and Numerai Signals. This includes:

- Mean, standard deviation and Sharpe for era returns.
- Max drawdown.
- Annual Percentage Yield (APY).
- Correlation with example predictions.
- Max feature exposure.
- Feature Neutral Mean (FNC), standard deviation and Sharpe.
- Exposure dissimilarity.
- Mean, standard deviation and Sharpe for TB200 (buy the top 200 stocks and sell the bottom 200 stocks).
- Mean, standard deviation and Sharpe for TB500 (buy the top 500 stocks and sell the bottom 500 stocks).
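To make the first few metrics concrete, here is a minimal sketch of how per-era statistics could be computed with plain pandas and NumPy. Column names and the drawdown variant are illustrative; the actual implementation lives in BaseEvaluator.

import numpy as np
import pandas as pd

def era_stats(df: pd.DataFrame, pred_col: str, target_col: str,
              era_col: str = "era") -> dict:
    """Mean, standard deviation, Sharpe and max drawdown over per-era scores."""
    # Score each era: correlation of percentile-ranked predictions with the target.
    per_era = df.groupby(era_col).apply(
        lambda d: np.corrcoef(d[pred_col].rank(pct=True), d[target_col])[0, 1]
    )
    mean, std = per_era.mean(), per_era.std(ddof=0)  # ddof=0, see the note below
    # Simplified drawdown on cumulative per-era scores, reported as a negative number.
    cumulative = per_era.cumsum()
    max_drawdown = -(cumulative.cummax() - cumulative).max()
    return {"mean": mean, "std": std, "sharpe": mean / std, "max_drawdown": max_drawdown}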


source

BaseEvaluator

 BaseEvaluator (era_col:str='era', fast_mode=False)

Evaluation functionality that is relevant for both Numerai Classic and Numerai Signals.

:param era_col: Column name pointing to eras.

Most commonly "era" for Numerai Classic and "friday_date" for Numerai Signals.

:param fast_mode: Will skip compute-intensive metrics if set to True, namely max_exposure, feature neutral mean, TB200 and TB500.
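A minimal instantiation sketch, assuming the evaluators are importable from numerblox.evaluation:

from numerblox.evaluation import NumeraiClassicEvaluator, NumeraiSignalsEvaluator

# Numerai Classic: eras live in the "era" column by default.
classic_evaluator = NumeraiClassicEvaluator(era_col="era")
# Numerai Signals: eras are weekly dates, commonly in a "friday_date" column.
# fast_mode=True skips the compute-intensive metrics mentioned above.
signals_evaluator = NumeraiSignalsEvaluator(era_col="friday_date", fast_mode=True)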

Note that we calculate standard deviation with ddof=0 (the population standard deviation). This differs slightly from the default Pandas calculation (which uses ddof=1), but is consistent with how NumPy computes standard deviation. More info: https://stackoverflow.com/questions/24984178/different-std-in-pandas-vs-numpy
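A quick illustration of the difference, using made-up scores:

import numpy as np
import pandas as pd

scores = pd.Series([0.01, 0.03, -0.02, 0.04])
print(scores.std())        # Pandas default (ddof=1, sample std): ~0.0265
print(scores.std(ddof=0))  # ddof=0 (population std): ~0.0229
print(np.std(scores))      # NumPy default is ddof=0, matching the line above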

1. Numerai Classic

NumeraiClassicEvaluator extends the base evaluation scheme with metrics specific to Numerai Classic.

The main additional metric is FNCv3: feature neutral correlation computed against Numerai's v3 feature set, as sketched below.
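The sketch below illustrates the neutralize-then-correlate idea behind FNC-style metrics with plain NumPy and pandas. Column names are illustrative and this is not the evaluator's exact implementation; FNCv3 additionally fixes the feature set to Numerai's v3 features.

import numpy as np
import pandas as pd

def feature_neutral_corrs(df: pd.DataFrame, pred_col: str, target_col: str,
                          feature_cols: list, era_col: str = "era") -> pd.Series:
    """Per-era correlation of feature-neutralized predictions with the target."""
    def _per_era(era_df: pd.DataFrame) -> float:
        preds = era_df[pred_col].rank(pct=True).to_numpy()
        feats = era_df[feature_cols].to_numpy()
        # Subtract the component of the predictions that is explained by the features.
        neutral = preds - feats @ (np.linalg.pinv(feats) @ preds)
        neutral = neutral / neutral.std()
        return np.corrcoef(neutral, era_df[target_col])[0, 1]
    return df.groupby(era_col).apply(_per_era)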


source

NumeraiClassicEvaluator

 NumeraiClassicEvaluator (era_col:str='era', fast_mode=False)

Evaluator for all metrics that are relevant in Numerai Classic.

2. Numerai Signals

NumeraiSignalsEvaluator extends the base evaluation scheme with metrics specific to Numerai Signals.


source

NumeraiSignalsEvaluator

 NumeraiSignalsEvaluator (era_col:str='friday_date', fast_mode=False)

Evaluator for all metrics that are relevant in Numerai Signals.
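A usage sketch with toy Signals-style data. The DataFrame, tickers and targets are made up, and the full_evaluation call simply mirrors the Classic example below:

import numpy as np
import pandas as pd
from numerblox.evaluation import NumeraiSignalsEvaluator

# Hypothetical Signals-style data: one row per (ticker, friday_date).
rng = np.random.default_rng(0)
signals_df = pd.DataFrame({
    "friday_date": ["20230303"] * 4 + ["20230310"] * 4,
    "ticker": ["AAPL US", "MSFT US", "GOOG US", "AMZN US"] * 2,
    "prediction": rng.uniform(size=8),
    "target": [0.0, 0.25, 0.75, 1.0] * 2,
})

evaluator = NumeraiSignalsEvaluator(era_col="friday_date", fast_mode=True)
signals_stats = evaluator.full_evaluation(
    dataf=signals_df,
    target_col="target",
    pred_cols=["prediction"],
    example_col="prediction",
)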

Example usage

NumeraiClassicEvaluator

We will test NumeraiClassicEvaluator on v4.1 validation data with example predictions. The baseline reference (example_col) will be random predictions.

import numpy as np
import pandas as pd

from numerblox.download import NumeraiClassicDownloader
from numerblox.evaluation import NumeraiClassicEvaluator
from numerblox.numerframe import create_numerframe

directory = "eval_test_1234321/"
downloader = NumeraiClassicDownloader(directory_path=directory)
No existing directory found at 'eval_test_1234321'. Creating directory...
downloader.download_single_dataset(filename="v4.1/validation.parquet",
                                   dest_path=directory + "validation.parquet")
downloader.download_single_dataset(filename="v4.1/validation_example_preds.parquet",
                                   dest_path=directory + "validation_example_preds.parquet")
πŸ“ Downloading 'v4.1/validation.parquet' πŸ“
2023-04-24 22:13:48,364 INFO numerapi.utils: starting download
eval_test_1234321/validation.parquet: 1.56GB [01:00, 25.8MB/s]                            
πŸ“ Downloading 'v4.1/validation_example_preds.parquet' πŸ“
2023-04-24 22:14:49,378 INFO numerapi.utils: starting download
eval_test_1234321/validation_example_preds.parquet: 58.4MB [00:02, 24.5MB/s]                            
np.random.seed(1234)
test_dataf = create_numerframe(directory + "validation.parquet",
                               columns=['era', 'data_type', 'feature_honoured_observational_balaamite',
                                        'feature_polaroid_vadose_quinze', 'target', 'target_nomi_v4_20', 'target_nomi_v4_60', 'id'])
example_preds = pd.read_parquet(directory + "validation_example_preds.parquet")
test_dataf = test_dataf.merge(example_preds, on="id", how="left").reset_index()
test_dataf = test_dataf.sample(10_000, random_state=1234)

test_dataf.loc[:, "prediction_random"] = np.random.uniform(size=len(test_dataf))
test_dataf.head(2)
| | id | era | data_type | feature_honoured_observational_balaamite | feature_polaroid_vadose_quinze | target | target_nomi_v4_20 | target_nomi_v4_60 | prediction | prediction_random |
|---|---|---|---|---|---|---|---|---|---|---|
| 2117478 | ndd72a206d37937a | 0990 | validation | 1.0 | 1.0 | 0.25 | 0.25 | 0.5 | 0.914756 | 0.191519 |
| 1101464 | n0510bc8ab84bb50 | 0794 | validation | 0.0 | 0.0 | 0.75 | 0.75 | 0.5 | 0.357319 | 0.622109 |

The Evaluator returns a Pandas DataFrame containing metrics for each column defined in pred_cols. Note that any column can be used as the example prediction. For practical use cases we recommend using the proper example predictions provided by Numerai instead of random predictions.

Fast evaluation

fast_mode skips max. feature exposure, feature neutral mean, FNCv3, Exposure Dissimilarity, TB200 and TB500 calculations, which can take a while to compute on full Numerai datasets.

evaluator = NumeraiClassicEvaluator(fast_mode=True)
val_stats_fast = evaluator.full_evaluation(
    dataf=test_dataf,
    target_col="target",
    pred_cols=["prediction", "prediction_random"],
    example_col="prediction_random",
)
val_stats_fast
WARNING: No suitable feature set defined for FNC. Skipping calculation of FNC.
| | target | mean | std | sharpe | max_drawdown | apy | corr_with_example_preds | legacy_mean | legacy_std | legacy_sharpe |
|---|---|---|---|---|---|---|---|---|---|---|
| prediction | target | 0.027599 | 0.242488 | 0.113815 | -0.999107 | 37.731619 | -0.006675 | 0.030432 | 0.238109 | 0.127808 |
| prediction_random | target | 0.006112 | 0.218777 | 0.027939 | -0.999991 | -33.733250 | 0.980596 | 0.004005 | 0.220446 | 0.018168 |

Full evaluation

The full evaluation computes all metrics from fast mode and additionally max. feature exposure, feature neutral mean, FNCv3, Exposure Dissimilarity, TB200 and TB500. Note that this can take a long time when computed on the full dataset using all features.

evaluator = NumeraiClassicEvaluator(fast_mode=False)
val_stats_full = evaluator.full_evaluation(
    dataf=test_dataf,
    target_col="target",
    pred_cols=["prediction", "prediction_random"],
    example_col="prediction_random",
)
val_stats_full
WARNING: No suitable feature set defined for FNC. Skipping calculation of FNC.
🤖 Neutralized 'prediction' with proportion '1.0' 🤖
New neutralized column = 'prediction_neutralized_1.0'.
✅ Finished step FeatureNeutralizer. Output shape=(10000, 11). Time taken for step: 0:00:02.064070. ✅
🤖 Neutralized 'prediction_random' with proportion '1.0' 🤖
New neutralized column = 'prediction_random_neutralized_1.0'.
✅ Finished step FeatureNeutralizer. Output shape=(10000, 12). Time taken for step: 0:00:02.025188. ✅
| | target | mean | std | sharpe | max_drawdown | apy | corr_with_example_preds | legacy_mean | legacy_std | legacy_sharpe | ... | feature_neutral_mean | feature_neutral_std | feature_neutral_sharpe | tb200_mean | tb200_std | tb200_sharpe | tb500_mean | tb500_std | tb500_sharpe | exposure_dissimilarity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| prediction | target | 0.027599 | 0.242488 | 0.113815 | -0.999107 | 37.731619 | -0.006675 | 0.030432 | 0.238109 | 0.127808 | ... | 0.027312 | 0.240111 | 0.113747 | 0.035923 | 0.239002 | 0.150306 | 0.035923 | 0.239002 | 0.150306 | 3.007952 |
| prediction_random | target | 0.006112 | 0.218777 | 0.027939 | -0.999991 | -33.733250 | 0.980596 | 0.004005 | 0.220446 | 0.018168 | ... | 0.004291 | 0.218485 | 0.019639 | 0.003355 | 0.218311 | 0.015367 | 0.003355 | 0.218311 | 0.015367 | 0.000000 |

2 rows × 21 columns

Plot correlations

The plot_correlations method uses matplotlib to plot per-era correlation scores over time. The plots default to a rolling window of 20 eras to best align with reputation scores as measured on the Numerai leaderboard.

evaluator.plot_correlations(
    test_dataf.fillna(0.5), pred_cols=["prediction", "prediction_random"], roll_mean=20
)

# Clean up environment
downloader.remove_base_directory()
⚠ Deleting directory for 'NumeraiClassicDownloader' ⚠
Path: '/home/clepelaars/numerblox/nbs/eval_test_1234321'