This section provides evaluation schemes for both Numerai Classic and Signals. The Evaluator takes a NumerFrame as input and returns a Pandas DataFrame containing metrics for each given prediction column.
BaseEvaluator implements all the evaluation logic that is common to Numerai Classic and Signals. This includes:
- Mean, standard deviation and Sharpe for era returns.
- Max drawdown.
- Annual Percentage Yield (APY).
- Correlation with example predictions.
- Max feature exposure.
- Feature neutral mean, standard deviation and Sharpe (FNC).
- Exposure dissimilarity.
- Mean, standard deviation and Sharpe for TB200 (buy the top 200 stocks and sell the bottom 200 stocks).
- Mean, standard deviation and Sharpe for TB500 (buy the top 500 stocks and sell the bottom 500 stocks).
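As a rough illustration of how the era-level statistics relate to per-era correlation scores, here is a minimal sketch. This is not the exact BaseEvaluator implementation, and the annualization factor of 52 rounds per year is an assumption made only for this example:

import pandas as pd

def era_summary(per_era_corr: pd.Series) -> dict:
    """Simplified sketch of era-level metrics from per-era correlation scores."""
    mean = per_era_corr.mean()
    std = per_era_corr.std(ddof=0)  # ddof=0, consistent with NumPy
    sharpe = mean / std
    # Max drawdown on the compounded cumulative era scores.
    cumulative = (per_era_corr + 1).cumprod()
    max_drawdown = (cumulative / cumulative.cummax() - 1).min()
    # APY, assuming 52 scoring rounds per year (assumption for this sketch).
    apy = ((1 + mean) ** 52 - 1) * 100
    return {"mean": mean, "std": std, "sharpe": sharpe,
            "max_drawdown": max_drawdown, "apy": apy}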
BaseEvaluator (era_col:str='era', fast_mode=False)
Evaluation functionality that is relevant for both Numerai Classic and Numerai Signals.
:param era_col: Column name pointing to eras.
Most commonly 'era' for Numerai Classic and 'friday_date' for Numerai Signals.
:param fast_mode: Will skip compute-intensive metrics if set to True, namely max_exposure, feature neutral mean, TB200 and TB500.
Note that standard deviations are calculated with ddof=0. This may differ slightly from the Pandas default (ddof=1), but is consistent with how NumPy computes the standard deviation. More info: https://stackoverflow.com/questions/24984178/different-std-in-pandas-vs-numpy
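A quick way to see the difference between the two conventions:

import numpy as np
import pandas as pd

scores = pd.Series([0.02, -0.01, 0.03, 0.00])
print(scores.std())               # Pandas default: ddof=1 (sample standard deviation)
print(scores.std(ddof=0))         # ddof=0, as used in the evaluators
print(np.std(scores.to_numpy()))  # NumPy default: ddof=0, matches the line above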
NumeraiClassicEvaluator extends the base evaluation scheme with metrics specific to Numerai Classic. The main addition is FNCv3: feature-neutral correlation computed against the FNCv3 feature set. If no suitable feature set is defined, the FNC calculation is skipped (see the warnings in the output below).
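To give an intuition for what feature neutralization means here, below is a minimal sketch that removes the linear feature exposure from a prediction vector before scoring it. This is a simplified illustration, not the exact NumerBlox FeatureNeutralizer used later in this section:

import numpy as np

def neutralize(predictions: np.ndarray, features: np.ndarray,
               proportion: float = 1.0) -> np.ndarray:
    """Subtract (a proportion of) the linear feature exposure from predictions."""
    # Least-squares fit of the predictions on the feature matrix.
    beta, *_ = np.linalg.lstsq(features, predictions, rcond=None)
    exposure = features @ beta
    neutral = predictions - proportion * exposure
    # Rescale so the neutralized scores have unit standard deviation.
    return neutral / neutral.std()

# Feature-neutral correlation: correlate the neutralized predictions with the target, e.g.
# fnc = np.corrcoef(neutralize(preds, feats), target)[0, 1]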
NumeraiClassicEvaluator (era_col:str='era', fast_mode=False)
Evaluator for all metrics that are relevant in Numerai Classic.
NumeraiSignalsEvaluator extends the base evaluation scheme with metrics specific to Numerai Signals.
NumeraiSignalsEvaluator (era_col:str='friday_date', fast_mode=False)
Evaluator for all metrics that are relevant in Numerai Signals.
We will test NumeraiClassicEvaluator on v4.1 validation data with example predictions. The baseline reference (example_col) will be random predictions.
from numerblox.download import NumeraiClassicDownloader
directory = "eval_test_1234321/"
downloader = NumeraiClassicDownloader(directory_path=directory)
No existing directory found at 'eval_test_1234321'. Creating directory...
downloader.download_single_dataset(filename="v4.1/validation.parquet",
dest_path=directory + "validation.parquet")
downloader.download_single_dataset(filename="v4.1/validation_example_preds.parquet",
dest_path=directory + "validation_example_preds.parquet")
Downloading 'v4.1/validation.parquet'
2023-04-24 22:13:48,364 INFO numerapi.utils: starting download
eval_test_1234321/validation.parquet: 1.56GB [01:00, 25.8MB/s]
Downloading 'v4.1/validation_example_preds.parquet'
2023-04-24 22:14:49,378 INFO numerapi.utils: starting download
eval_test_1234321/validation_example_preds.parquet: 58.4MB [00:02, 24.5MB/s]
import numpy as np
import pandas as pd
from numerblox.numerframe import create_numerframe

np.random.seed(1234)
test_dataf = create_numerframe(directory + "validation.parquet",
columns=['era', 'data_type', 'feature_honoured_observational_balaamite',
'feature_polaroid_vadose_quinze', 'target', 'target_nomi_v4_20', 'target_nomi_v4_60', 'id'])
example_preds = pd.read_parquet(directory + "validation_example_preds.parquet")
test_dataf = test_dataf.merge(example_preds, on="id", how="left").reset_index()
test_dataf = test_dataf.sample(10_000, random_state=1234)
test_dataf.loc[:, "prediction_random"] = np.random.uniform(size=len(test_dataf))
test_dataf.head(2)
|  | id | era | data_type | feature_honoured_observational_balaamite | feature_polaroid_vadose_quinze | target | target_nomi_v4_20 | target_nomi_v4_60 | prediction | prediction_random |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2117478 | ndd72a206d37937a | 0990 | validation | 1.0 | 1.0 | 0.25 | 0.25 | 0.5 | 0.914756 | 0.191519 |
| 1101464 | n0510bc8ab84bb50 | 0794 | validation | 0.0 | 0.0 | 0.75 | 0.75 | 0.5 | 0.357319 | 0.622109 |
The Evaluator returns a Pandas DataFrame containing metrics for each of the given prediction columns. Note that any column can be used as the example prediction column. For practical use cases we recommend using the proper example predictions provided by Numerai instead of random predictions.
fast_mode skips the max. feature exposure, feature neutral mean, FNCv3, exposure dissimilarity, TB200 and TB500 calculations, which can take a while to compute on full Numerai datasets.
evaluator = NumeraiClassicEvaluator(fast_mode=True)
val_stats_fast = evaluator.full_evaluation(
dataf=test_dataf,
target_col="target",
pred_cols=["prediction", "prediction_random"],
example_col="prediction_random",
)
val_stats_fast
WARNING: No suitable feature set defined for FNC. Skipping calculation of FNC.
|  | target | mean | std | sharpe | max_drawdown | apy | corr_with_example_preds | legacy_mean | legacy_std | legacy_sharpe |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| prediction | target | 0.027599 | 0.242488 | 0.113815 | -0.999107 | 37.731619 | -0.006675 | 0.030432 | 0.238109 | 0.127808 |
| prediction_random | target | 0.006112 | 0.218777 | 0.027939 | -0.999991 | -33.733250 | 0.980596 | 0.004005 | 0.220446 | 0.018168 |
The full evaluation computes all metrics from fast mode and additionally the max. feature exposure, feature neutral mean, FNCv3, exposure dissimilarity, TB200 and TB500 metrics. Note that this can take a long time when run on the full dataset with all features.
evaluator = NumeraiClassicEvaluator(fast_mode=False)
val_stats_full = evaluator.full_evaluation(
dataf=test_dataf,
target_col="target",
pred_cols=["prediction", "prediction_random"],
example_col="prediction_random",
)
val_stats_full
WARNING: No suitable feature set defined for FNC. Skipping calculation of FNC.
Neutralized 'prediction' with proportion '1.0'
New neutralized column = 'prediction_neutralized_1.0'.
Finished step FeatureNeutralizer. Output shape=(10000, 11). Time taken for step: 0:00:02.064070.
Neutralized 'prediction_random' with proportion '1.0'
New neutralized column = 'prediction_random_neutralized_1.0'.
Finished step FeatureNeutralizer. Output shape=(10000, 12). Time taken for step: 0:00:02.025188.
|  | target | mean | std | sharpe | max_drawdown | apy | corr_with_example_preds | legacy_mean | legacy_std | legacy_sharpe | ... | feature_neutral_mean | feature_neutral_std | feature_neutral_sharpe | tb200_mean | tb200_std | tb200_sharpe | tb500_mean | tb500_std | tb500_sharpe | exposure_dissimilarity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| prediction | target | 0.027599 | 0.242488 | 0.113815 | -0.999107 | 37.731619 | -0.006675 | 0.030432 | 0.238109 | 0.127808 | ... | 0.027312 | 0.240111 | 0.113747 | 0.035923 | 0.239002 | 0.150306 | 0.035923 | 0.239002 | 0.150306 | 3.007952 |
| prediction_random | target | 0.006112 | 0.218777 | 0.027939 | -0.999991 | -33.733250 | 0.980596 | 0.004005 | 0.220446 | 0.018168 | ... | 0.004291 | 0.218485 | 0.019639 | 0.003355 | 0.218311 | 0.015367 | 0.003355 | 0.218311 | 0.015367 | 0.000000 |

2 rows × 21 columns
The plot_correlations method uses matplotlib to plot per-era correlation scores over time. The plots default to a rolling window of 20 eras to best align with reputation scores as measured on the Numerai leaderboards.
evaluator.plot_correlations(
test_dataf.fillna(0.5), pred_cols=["prediction", "prediction_random"], roll_mean=20
)
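Finally, the temporary download directory can be removed again. A minimal cleanup step, assuming the downloader exposes a remove_base_directory method (which produces the log line below):

# Clean up all downloaded evaluation data
# (assumes NumeraiClassicDownloader provides remove_base_directory).
downloader.remove_base_directory()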
Deleting directory for 'NumeraiClassicDownloader'. Path: '/home/clepelaars/numerblox/nbs/eval_test_1234321'