Full API Reference
This section provides a detailed reference to all objects defined in NumerBlox.
Download
BaseDownloader
Bases: BaseIO
Abstract base class for downloaders.
:param directory_path: Base folder to download files to.
Source code in src/numerblox/download.py
__call__(*args, **kwargs)
The most common use case is getting weekly inference data, so calling an instance directly downloads the inference (live) data.
Source code in src/numerblox/download.py
download_live_data(*args, **kwargs)
abstractmethod
Download minimal amount of files needed for weekly inference.
Source code in src/numerblox/download.py
download_training_data(*args, **kwargs)
abstractmethod
Download all necessary files needed for training.
Source code in src/numerblox/download.py
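The interface above can be illustrated with a minimal sketch. The class names `SketchDownloader` and `DummyDownloader` are hypothetical stand-ins written for this example, not NumerBlox code; the sketch only assumes the behaviour described above (two abstract download methods, and `__call__` delegating to the live-data download).

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchDownloader(ABC):
    """Hypothetical stand-in for the downloader interface described above."""

    def __init__(self, directory_path: str):
        self.dir = Path(directory_path)
        # The base folder is created if it does not exist yet.
        self.dir.mkdir(parents=True, exist_ok=True)

    @abstractmethod
    def download_training_data(self, *args, **kwargs): ...

    @abstractmethod
    def download_live_data(self, *args, **kwargs): ...

    def __call__(self, *args, **kwargs):
        # Calling the instance is shorthand for fetching inference data.
        return self.download_live_data(*args, **kwargs)


class DummyDownloader(SketchDownloader):
    """Writes placeholder files instead of downloading anything."""

    def download_training_data(self, *args, **kwargs):
        (self.dir / "train.txt").write_text("training data")

    def download_live_data(self, *args, **kwargs):
        (self.dir / "live.txt").write_text("live data")


with tempfile.TemporaryDirectory() as tmp:
    downloader = DummyDownloader(str(Path(tmp) / "demo"))
    downloader()  # same as downloader.download_live_data()
    written = sorted(p.name for p in downloader.dir.iterdir())
```

Only the live file is written here, because calling the instance never touches the training download.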
BaseIO
Bases: ABC
Basic functionality for IO (downloading and uploading).
:param directory_path: Base folder for IO. Will be created if it does not exist.
Source code in src/numerblox/download.py
get_all_files
property
Return all paths of contents in directory.
is_empty
property
Check if directory is empty.
download_directory_from_gcs(bucket_name, gcs_path)
Copy full directory from GCS bucket to local environment.
:param gcs_path: Name of directory on GCS bucket.
Source code in src/numerblox/download.py
download_file_from_gcs(bucket_name, gcs_path)
Get file from GCS bucket and download to local directory.
:param gcs_path: Path to file on GCS bucket.
Source code in src/numerblox/download.py
remove_base_directory()
Remove directory with all contents.
Source code in src/numerblox/download.py
upload_directory_to_gcs(bucket_name, gcs_path)
Upload full base directory to GCS bucket.
:param gcs_path: Name of directory on GCS bucket.
Source code in src/numerblox/download.py
upload_file_to_gcs(bucket_name, gcs_path, local_path)
Upload file to a GCS bucket.
:param gcs_path: Path to file on GCS bucket.
Source code in src/numerblox/download.py
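As a rough illustration of the local directory helpers (`get_all_files`, `is_empty`, `remove_base_directory`), the sketch below reproduces their described behaviour with `pathlib` and `shutil`. `SketchIO` is a hypothetical class for this example; the actual BaseIO implementation may differ, and the GCS methods are left out.

```python
import shutil
import tempfile
from pathlib import Path


class SketchIO:
    """Hypothetical illustration of the directory helpers described above."""

    def __init__(self, directory_path: str):
        self.dir = Path(directory_path)
        # Base folder is created if it does not exist.
        self.dir.mkdir(parents=True, exist_ok=True)

    @property
    def get_all_files(self) -> list:
        # All paths of contents in the base directory.
        return list(self.dir.iterdir())

    @property
    def is_empty(self) -> bool:
        return not any(self.dir.iterdir())

    def remove_base_directory(self) -> None:
        # Removes the directory together with everything inside it.
        shutil.rmtree(self.dir)


with tempfile.TemporaryDirectory() as tmp:
    base_io = SketchIO(str(Path(tmp) / "base"))
    empty_at_start = base_io.is_empty
    (base_io.dir / "preds.csv").write_text("id,prediction\n")
    file_names = [p.name for p in base_io.get_all_files]
    base_io.remove_base_directory()
    exists_after_removal = base_io.dir.exists()
```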
EODDownloader
Bases: BaseDownloader
Download data from EOD historical data.
More info: https://eodhistoricaldata.com/
Make sure you have the underlying Python package installed: pip install eod.
:param directory_path: Base folder to download files to.
:param key: Valid EOD client key.
:param tickers: List of valid EOD tickers (Bloomberg ticker format).
:param frequency: Choose from [d, w, m]. Daily data by default.
Source code in src/numerblox/download.py
download_live_data()
Download one year of data for defined tickers.
Source code in src/numerblox/download.py
download_training_data(start=None)
Download the full date range available.
:param start: Starting date in %Y-%m-%d format.
Source code in src/numerblox/download.py
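The `start` argument is a date string in `%Y-%m-%d` format. A small sketch of validating such a string before passing it on is shown below; `parse_start` is a hypothetical helper for this example and is not part of NumerBlox.

```python
from datetime import datetime


def parse_start(start):
    """Hypothetical helper: validate a `start` value in %Y-%m-%d format."""
    if start is None:
        # None means "no explicit start date".
        return None
    # Raises ValueError when the string does not match the format.
    return datetime.strptime(start, "%Y-%m-%d")


parsed = parse_start("2020-01-31")

try:
    parse_start("31-01-2020")  # day-first strings do not match %Y-%m-%d
    rejected = False
except ValueError:
    rejected = True
```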
generate_full_dataf(start)
Collect all price data for a list of EOD ticker symbols (Bloomberg tickers).
:param start: Starting date in %Y-%m-%d format.
Source code in src/numerblox/download.py
generate_stock_dataf(ticker, start)
Generate a price DataFrame for a single ticker.
:param ticker: EOD ticker symbol (Bloomberg tickers). For example, Apple stock = AAPL.US.
:param start: Starting date in %Y-%m-%d format.
Source code in src/numerblox/download.py
get_numerframe_data(start)
Get NumerFrame data from some starting date.
:param start: Starting date in %Y-%m-%d format.
Source code in src/numerblox/download.py
KaggleDownloader
Bases: BaseDownloader
Download financial data from Kaggle.
For authentication, make sure you have a directory called .kaggle in your home directory that contains a kaggle.json file. kaggle.json should have the following structure:
{"username": USERNAME, "key": KAGGLE_API_KEY}
More info on authentication: github.com/Kaggle/kaggle-api#api-credentials
More info on the Kaggle Python API: kaggle.com/donkeys/kaggle-python-api
:param directory_path: Base folder to download files to.
Source code in src/numerblox/download.py
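The credential layout described above can be sketched as follows. `write_kaggle_credentials` is a hypothetical helper for this example; it writes into a temporary folder here, whereas in real use the parent would be your home directory (`Path.home()`). Restricting the file permissions is an assumption based on common practice for credential files.

```python
import json
import tempfile
from pathlib import Path


def write_kaggle_credentials(home: Path, username: str, key: str) -> Path:
    """Hypothetical helper: create <home>/.kaggle/kaggle.json."""
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    cred_path = kaggle_dir / "kaggle.json"
    cred_path.write_text(json.dumps({"username": username, "key": key}))
    # Keep the credentials readable by the owner only (assumption: POSIX system).
    cred_path.chmod(0o600)
    return cred_path


with tempfile.TemporaryDirectory() as tmp:
    path = write_kaggle_credentials(Path(tmp), "USERNAME", "KAGGLE_API_KEY")
    relative_location = path.relative_to(tmp).as_posix()
    stored = json.loads(path.read_text())
```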
download_live_data(kaggle_dataset_path)
Download arbitrary Kaggle dataset.
:param kaggle_dataset_path: Path on Kaggle (URL slug on kaggle.com/)
Source code in src/numerblox/download.py
download_training_data(kaggle_dataset_path)
Download arbitrary Kaggle dataset.
:param kaggle_dataset_path: Path on Kaggle (URL slug on kaggle.com/)
Source code in src/numerblox/download.py
NumeraiClassicDownloader
Bases: BaseDownloader
Download from NumerAPI for Numerai Classic data. More information: https://numer.ai/data
:param directory_path: Base folder to download files to. All kwargs will be passed to NumerAPI initialization.
Source code in src/numerblox/download.py
download_example_data(subfolder='', version='5.0', round_num=None)
Download all example prediction data in specified folder for given version.
:param subfolder: Specify folder to create folder within base directory root. Saves in base directory root by default.
:param version: Numerai dataset version. 5.0 = Atlas (default)
:param round_num: Numerai tournament round number. Downloads latest round by default.
Source code in src/numerblox/download.py
download_live_data(subfolder='', version='5.0', round_num=None)
Download all live data in specified folder for given version (i.e. minimal data needed for inference).
:param subfolder: Specify folder to create folder within directory root. Saves in directory root by default.
:param version: Numerai dataset version. 5.0 = Atlas (default)
:param round_num: Numerai tournament round number. Downloads latest round by default.
Source code in src/numerblox/download.py
download_meta_model_preds(subfolder='', filename='v4.3/meta_model.parquet')
Download Meta model predictions through NumerAPI.
:param subfolder: Specify folder to create folder within base directory root. Saves in base directory root by default.
:param filename: Name for meta model predictions file.
:return: Meta model predictions as DataFrame.
Source code in src/numerblox/download.py
download_single_dataset(filename, dest_path, round_num=None)
Download one of the available datasets through NumerAPI.
:param filename: Name as listed in NumerAPI (Check NumerAPI().list_datasets() for full overview)
:param dest_path: Full path where file will be saved.
:param round_num: Numerai tournament round number. Downloads latest round by default.
Source code in src/numerblox/download.py
download_training_data(subfolder='', version='5.0')
Get Numerai Classic training and validation data.
:param subfolder: Specify folder to create folder within base directory root. Saves in base directory root by default.
:param version: Numerai dataset version. 5.0 = Atlas (default)
Source code in src/numerblox/download.py
get_classic_features(subfolder='', filename='v5.0/features.json', *args, **kwargs)
Download feature overview (stats and feature sets) through NumerAPI and load as dict.
:param subfolder: Specify folder to create folder within base directory root. Saves in base directory root by default.
:param filename: Name for feature overview.
*args, **kwargs will be passed to the JSON loader.
:return: Feature overview dict.
Source code in src/numerblox/download.py
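A possible usage sketch for the Classic downloader is given below. It uses only the method names and parameters listed above; the folder names (`"data"`, `"train_val"`, `"live"`) are arbitrary choices for this example, and the import path is inferred from the source location `src/numerblox/download.py`. Running the function requires NumerBlox to be installed and a network connection, so the import is kept inside the function.

```python
def fetch_classic_data(directory: str = "data"):
    """Sketch: download Numerai Classic v5.0 data into `directory`."""
    # Imported lazily so the sketch can be defined without NumerBlox installed.
    from numerblox.download import NumeraiClassicDownloader

    downloader = NumeraiClassicDownloader(directory_path=directory)
    # Training and validation data go into <directory>/train_val.
    downloader.download_training_data(subfolder="train_val", version="5.0")
    # Minimal data needed for weekly inference goes into <directory>/live.
    downloader.download_live_data(subfolder="live", version="5.0")
    # Feature overview (stats and feature sets) loaded as a dict.
    return downloader.get_classic_features()
```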
NumeraiCryptoDownloader
Bases: BaseDownloader
Download Numerai Crypto data. More information: https://crypto.numer.ai/data
:param directory_path: Base folder to download files to.
Source code in src/numerblox/download.py
download_live_data(subfolder='', version='1.0')
Download all live data in specified folder (i.e. minimal data needed for inference).
:param subfolder: Specify folder to create folder within directory root. Saves in directory root by default.
:param version: Numerai dataset version.
Source code in src/numerblox/download.py
download_single_dataset(filename, dest_path)
Download one of the available datasets through CryptoAPI.
:param filename: Name as listed in CryptoAPI (Check CryptoAPI().list_datasets() for full overview)
:param dest_path: Full path where file will be saved.
Source code in src/numerblox/download.py
download_training_data(subfolder='', version='1.0')
Download all training data in specified folder for given version.
:param subfolder: Specify folder to create folder within directory root. Saves in directory root by default.
:param version: Numerai dataset version.
Source code in src/numerblox/download.py
NumeraiSignalsDownloader
Bases: BaseDownloader
Download Numerai Signals data through SignalsAPI. More information: https://signals.numer.ai/data
:param directory_path: Base folder to download files to.
All kwargs will be passed to SignalsAPI initialization.
Source code in src/numerblox/download.py
download_example_data(subfolder='', version='2.0')
Download all example prediction data in specified folder for given version.
:param subfolder: Specify folder to create folder within base directory root. Saves in base directory root by default.
:param version: Numerai dataset version.
Source code in src/numerblox/download.py
download_live_data(subfolder='', version='2.0')
Download all live data in specified folder (i.e. minimal data needed for inference).
:param subfolder: Specify folder to create folder within directory root. Saves in directory root by default.
:param version: Numerai dataset version.
Source code in src/numerblox/download.py
download_single_dataset(filename, dest_path)
Download one of the available datasets through SignalsAPI.
:param filename: Name as listed in SignalsAPI (Check SignalsAPI().list_datasets() for full overview)
:param dest_path: Full path where file will be saved.
Source code in src/numerblox/download.py
download_training_data(subfolder='', version='2.0')
Get Numerai Signals training and validation data.
:param subfolder: Specify folder to create folder within base directory root. Saves in base directory root by default.
:param version: Numerai Signals dataset version.
Source code in src/numerblox/download.py
NumerFrame
NumerFrame
Bases: DataFrame
Data structure which extends Pandas DataFrames and allows for additional Numerai specific functionality.
Source code in src/numerblox/numerframe.py
get_aux_data
property
All columns that are not features, targets or predictions.
get_dates_from_era_col
property
Column of all dates from era column.
get_era_data
property
Column of all eras.
get_eras_from_date_col
property
Column of all eras from date column.
get_feature_data
property
All columns for which name starts with 'feature'.
get_fncv3_feature_data
property
Get FNCv3 features.
get_medium_feature_data
property
Medium subset of the Numerai dataset for v5 data.
get_prediction_aux_data
property
All predictions columns and aux columns (for ensembling, etc.).
get_prediction_data
property
All columns for which name starts with 'prediction'.
get_single_target_data
property
Column with name 'target' (Main Numerai target column).
get_small_feature_data
property
Small subset of the Numerai dataset for v5 data.
get_target_data
property
All columns for which name starts with 'target'.
get_unique_eras
property
Get all unique eras in the data.
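The column-group properties above rely on column-name prefixes. The sketch below shows the idea on a plain list of column names; `split_columns` and the example column names are hypothetical and only mirror the described grouping (features, targets, predictions, and everything else as aux data).

```python
def split_columns(columns):
    """Group column names by prefix; anything unmatched counts as aux."""
    groups = {"feature": [], "target": [], "prediction": [], "aux": []}
    for name in columns:
        for prefix in ("feature", "target", "prediction"):
            if name.startswith(prefix):
                groups[prefix].append(name)
                break
        else:
            # No prefix matched: not a feature, target or prediction.
            groups["aux"].append(name)
    return groups


columns = ["era", "id", "feature_a", "feature_b",
           "target", "target_x_20", "prediction_model1"]
groups = split_columns(columns)
```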
__init_meta_attrs()
Dynamically track column groups.
Source code in src/numerblox/numerframe.py
__set_era_col()
Each NumerFrame should have an era column to benefit from all functionality.
Source code in src/numerblox/numerframe.py
get_column_selection(cols)
Return NumerFrame from selection of columns.
Source code in src/numerblox/numerframe.py
get_date_from_era(era)
staticmethod
Get the date from a specific era.
:param era: Era number for which to get date. Should be an integer which is at least 1.
:return: Datetime object representing the date of the given era.
Source code in src/numerblox/numerframe.py
get_date_range(start_date, end_date)
Get all eras between two dates.
:param start_date: Starting date (inclusive).
:param end_date: Ending date (inclusive).
:return: NumerFrame with all eras between start_date and end_date.
Source code in src/numerblox/numerframe.py
get_era_batch(eras, aemlp_batch=False, features=None, targets=None)
Get feature target pair batch of 1 or multiple eras.
:param eras: Selection of era names that should be present in era_col.
:param aemlp_batch: Specific target batch for autoencoder training. The y output will contain three components: features, targets and targets.
:param features: List of features to select. All by default.
:param targets: List of targets to select. All by default.
Source code in src/numerblox/numerframe.py
get_era_from_date(date_object)
staticmethod
Get the era number from a specific date.
:param date_object: Pandas Timestamp object for which to get era.
:return: Era number.
Source code in src/numerblox/numerframe.py
get_era_range(start_era, end_era)
Get all eras between two era numbers.
:param start_era: Era number to start from (inclusive).
:param end_era: Era number to end with (inclusive).
:return: NumerFrame with all eras between start_era and end_era.
Source code in src/numerblox/numerframe.py
get_feature_group(group)
Get feature group based on name or list of names.
Source code in src/numerblox/numerframe.py
get_feature_target_pair(multi_target=False)
Get split of feature and target columns.
:param multi_target: Returns only 'target' column by default. Returns all target columns when set to True.
Source code in src/numerblox/numerframe.py
get_last_n_eras(n)
Get data for the last n eras. Make sure eras are sorted in the way you prefer.
:param n: Number of eras to select.
:return: NumerFrame with last n eras.
Source code in src/numerblox/numerframe.py
get_pattern_data(pattern)
Get columns based on pattern (for example '_20' to get all 20-day Numerai targets).
:param pattern: A 'like' pattern (pattern in column_name == True)
Source code in src/numerblox/numerframe.py
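The 'like' pattern is a plain substring test on the column name. A minimal sketch of that selection rule on a list of names follows; `select_by_pattern` and the target names are made up for this example.

```python
def select_by_pattern(columns, pattern):
    """Keep every column whose name contains `pattern` as a substring."""
    return [name for name in columns if pattern in name]


# Hypothetical column names for illustration.
columns = ["target", "target_cyrus_20", "target_cyrus_60", "feature_a"]
twenty_day = select_by_pattern(columns, "_20")
```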
create_numerframe(file_path, columns=None, *args, **kwargs)
Convenience function to initialize a NumerFrame. Supports the most commonly used file formats for Pandas DataFrames (.csv, .parquet, .xls, .pkl, etc.). For more details check https://pandas.pydata.org/docs/reference/io.html
:param file_path: Relative or absolute path to data file.
:param columns: Which columns to read (all by default).
*args, **kwargs will be passed to the Pandas loading function.
Source code in src/numerblox/numerframe.py
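A possible usage sketch combining `create_numerframe` with `get_feature_target_pair` is shown below. The file path is a placeholder, and the import path is inferred from the source location `src/numerblox/numerframe.py`. The function needs NumerBlox installed and an existing data file to run, so the import is done inside it.

```python
def load_features_and_target(file_path: str = "data/train.parquet"):
    """Sketch: read a data file into a NumerFrame and split features and target."""
    # Imported lazily so the sketch can be defined without NumerBlox installed.
    from numerblox.numerframe import create_numerframe

    dataf = create_numerframe(file_path)
    # multi_target=False returns only the main 'target' column as y.
    X, y = dataf.get_feature_target_pair(multi_target=False)
    return X, y
```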
Preprocessing
Base Preprocessing
BasePreProcessor
Bases: TransformerMixin, BaseEstimator
Common functionality for preprocessors and postprocessors.
Source code in src/numerblox/preprocessing/base.py
Classic Preprocessing
GroupStatsPreProcessor
Bases: BasePreProcessor
Calculates group statistics for all data groups.
Note that this class only works with pd.DataFrame input. When using it in a Pipeline, make sure that the Pandas output API is set (.set_output(transform="pandas")).
:param groups: Groups to create features for. All groups by default.
Source code in src/numerblox/preprocessing/classic.py
get_feature_names_out(input_features=None)
Return feature names.
Source code in src/numerblox/preprocessing/classic.py
transform(X)
Check validity and add group features.
Source code in src/numerblox/preprocessing/classic.py
Signals Preprocessing
DifferencePreProcessor
Bases: BasePreProcessor
Add difference features based on given windows. Run LagPreProcessor first.
Usage in a Pipeline works only with the Pandas output API. Run .set_output(transform="pandas") on your pipeline first.
:param windows: All lag windows to process for all features.
:param feature_names: All features for which you want to create differences. All features that also have lags by default.
:param pct_change: Method to calculate differences. If True, will calculate differences with a percentage change. Otherwise calculates a simple difference. Defaults to False
:param abs_diff: Whether to also calculate the absolute value of all differences. Defaults to True
Source code in src/numerblox/preprocessing/signals.py
transform(X)
Create difference features from lag features.
:param X: DataFrame with lag features. NOTE: Make sure only lag features are present in the DataFrame.
Source code in src/numerblox/preprocessing/signals.py
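The two difference modes (`pct_change` and a simple difference, optionally with absolute values) can be illustrated on a plain list. This is a simplified sketch, not the DifferencePreProcessor implementation: `difference_features` is hypothetical, it works on a single series, and it computes the lag inline instead of reading precomputed lag columns.

```python
def difference_features(values, window, pct_change=False, abs_diff=True):
    """Per-row difference between a value and the value `window` rows earlier."""
    rows = []
    for i, current in enumerate(values):
        lagged = values[i - window] if i >= window else None
        if lagged is None:
            diff = None  # not enough history yet
        elif pct_change:
            diff = current / lagged - 1  # relative change
        else:
            diff = current - lagged  # simple difference
        row = {"diff": diff}
        if abs_diff:
            row["abs_diff"] = None if diff is None else abs(diff)
        rows.append(row)
    return rows


close = [10.0, 12.0, 9.0, 12.0]
simple = difference_features(close, window=1)
relative = difference_features(close, window=1, pct_change=True, abs_diff=False)
```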
EraQuantileProcessor
Bases: BasePreProcessor
Transform features into quantiles by era.
:param num_quantiles: Number of quantiles to use for quantile transformation.
:param random_state: Random state for QuantileTransformer.
:param cpu_cores: Number of CPU cores to use for parallel processing.
Source code in src/numerblox/preprocessing/signals.py
get_feature_names_out(input_features=None)
Return feature names.
Source code in src/numerblox/preprocessing/signals.py
transform(X, era_series=None)
Quantile all features by era.
:param X: Array or DataFrame containing features to be quantiled.
:param era_series: Series containing era information.
:return: Quantiled features.
Source code in src/numerblox/preprocessing/signals.py
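To give an intuition for quantiling by era, the sketch below bins a single feature into equally sized rank buckets within each era. This is a simplified rank-based stand-in written for this example; it is not the QuantileTransformer-based approach the class uses, and `quantile_by_era` is a hypothetical name.

```python
from collections import defaultdict


def quantile_by_era(values, eras, num_quantiles=5):
    """Rank-bin `values` within each era and scale the bins to [0, 1].

    Requires num_quantiles >= 2.
    """
    by_era = defaultdict(list)
    for idx, era in enumerate(eras):
        by_era[era].append(idx)

    result = [0.0] * len(values)
    for indices in by_era.values():
        # Sort row positions of this era by their feature value.
        ordered = sorted(indices, key=lambda i: values[i])
        n = len(ordered)
        for rank, idx in enumerate(ordered):
            bucket = rank * num_quantiles // n  # 0 .. num_quantiles - 1
            result[idx] = bucket / (num_quantiles - 1)
    return result


values = [3, 1, 2, 10, 30, 20]
eras = ["a", "a", "a", "b", "b", "b"]
quantiled = quantile_by_era(values, eras, num_quantiles=3)
```

Although era "b" has much larger raw values, both eras end up on the same 0 to 1 scale because ranking is done per era.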
HLOCVAdjuster
Bases: BasePreProcessor
Adjust HLOCV data for splits and dividends based on ratio of unadjusted and adjusted close prices.
NOTE: This step only works with DataFrame input. Usage in intermediate steps of a scikit-learn Pipeline works with the Pandas set_output API, i.e. pipeline.set_output(transform="pandas").
Source code in src/numerblox/preprocessing/signals.py
transform(X)
Adjust open, high, low, close and volume for splits and dividends.
:param X: DataFrame with columns: [high, low, open, close, volume] (HLOCV)
:return: Array with adjusted HLOCV columns
Source code in src/numerblox/preprocessing/signals.py
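A rough sketch of a ratio-based adjustment for a single row is given below. The column names, the `adjust_hlocv` helper, and the choice to scale volume inversely to prices are assumptions made for this example; the actual HLOCVAdjuster may name columns and handle volume differently.

```python
def adjust_hlocv(row):
    """Scale prices by adjusted_close / close; scale volume by the inverse."""
    ratio = row["adjusted_close"] / row["close"]
    return {
        "adjusted_open": row["open"] * ratio,
        "adjusted_high": row["high"] * ratio,
        "adjusted_low": row["low"] * ratio,
        "adjusted_close": row["close"] * ratio,
        # Assumption: volume moves inversely to price under a split.
        "adjusted_volume": row["volume"] / ratio,
    }


# Example: a 2-for-1 split halves the adjusted price.
row = {"open": 100.0, "high": 110.0, "low": 90.0,
       "close": 100.0, "adjusted_close": 50.0, "volume": 1000.0}
adjusted = adjust_hlocv(row)
```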
KatsuFeatureGenerator
Bases: BasePreProcessor
Effective feature engineering setup based on Katsu's starter notebook. Based on source by Katsu1110: https://www.kaggle.com/code1110/numeraisignals-starter-for-beginners
:param windows: Time interval to apply for window features:
- Percentage Rate of change
- Volatility
- Moving Average gap
:param ticker_col: Columns with tickers to iterate over.
:param close_col: Column name where you have closing price stored.
:param num_cores: Number of cores to use for multiprocessing.
:param verbose: Print additional information.
Source code in src/numerblox/preprocessing/signals.py
__ema1(series, span)
staticmethod
Exponential moving average
Source code in src/numerblox/preprocessing/signals.py
feature_engineering(dataf)
Feature engineering for single ticker.
Source code in src/numerblox/preprocessing/signals.py
get_feature_names_out(input_features=None)
Return feature names.
Source code in src/numerblox/preprocessing/signals.py
transform(dataf)
Multiprocessing feature engineering.
:param dataf: DataFrame with columns: [ticker, date, open, high, low, close, volume]
Source code in src/numerblox/preprocessing/signals.py
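Two of the window features named above (percentage rate of change and moving average gap) can be sketched for a single ticker as follows. The exact formulas are assumptions for illustration and may not match the KatsuFeatureGenerator implementation; volatility is left out to keep the sketch short, and `window_features` is a hypothetical name.

```python
def window_features(close, window):
    """Rate of change and moving-average gap for one ticker's closing prices."""
    feats = []
    for i, price in enumerate(close):
        if i < window:
            # For simplicity both features start once `window` rows of history exist.
            feats.append({"pct_change": None, "ma_gap": None})
            continue
        past = close[i - window]
        recent = close[i - window + 1 : i + 1]  # last `window` prices incl. today
        moving_average = sum(recent) / window
        feats.append({
            "pct_change": price / past - 1,     # rate of change over the window
            "ma_gap": price / moving_average,   # price relative to its moving average
        })
    return feats


close = [10.0, 10.0, 20.0, 40.0]
feats = window_features(close, window=2)
```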
LagPreProcessor
Bases: BasePreProcessor
Add lag features based on given windows.
:param windows: All lag windows to process for all features. [5, 10, 15, 20] by default (4 weeks lookback).
Source code in src/numerblox/preprocessing/signals.py
get_feature_names_out(input_features=None)
Return feature names.
Source code in src/numerblox/preprocessing/signals.py
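A lag feature is the value of a series a fixed number of rows earlier. The sketch below builds lag columns for one series; `add_lags` and the `lag<N>` naming are hypothetical, and a real Signals dataset would need the lags computed per ticker.

```python
def add_lags(series, windows=(5, 10, 15, 20)):
    """Return one lagged copy of `series` per window; missing history is None."""
    return {
        f"lag{w}": [series[i - w] if i >= w else None for i in range(len(series))]
        for w in windows
    }


lags = add_lags([1, 2, 3, 4], windows=(1, 2))
```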
MinimumDataFilter
Bases: BasePreProcessor
Filter dates and tickers based on minimum data requirements. NOTE: This step only works with DataFrame input.
:param min_samples_date: Minimum number of samples per date. Defaults to 200.
:param min_samples_ticker: Minimum number of samples per ticker. Defaults to 1200.
:param blacklist_tickers: List of tickers to exclude from the dataset. Defaults to None.
:param date_col: Column name for date. Defaults to "date".
:param ticker_col: Column name for ticker. Defaults to "bloomberg_ticker".
Source code in src/numerblox/preprocessing/signals.py
transform(X)
Filter dates and tickers based on minimum data requirements.
:param X: DataFrame with columns: [ticker_col, date_col, open, high, low, close, volume] (HLOCV)
:return: Array with filtered DataFrame
Source code in src/numerblox/preprocessing/signals.py
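The filtering rules can be sketched on a list of row dicts. The order of the steps (blacklist, then dates, then tickers) is an assumption for this example and may differ from the MinimumDataFilter implementation; `filter_minimum_data` and the ticker names are hypothetical.

```python
from collections import Counter


def filter_minimum_data(rows, min_samples_date=200, min_samples_ticker=1200,
                        blacklist_tickers=None, date_col="date",
                        ticker_col="bloomberg_ticker"):
    """Drop blacklisted tickers, then sparse dates, then sparse tickers."""
    blacklist = set(blacklist_tickers or [])
    rows = [r for r in rows if r[ticker_col] not in blacklist]

    per_date = Counter(r[date_col] for r in rows)
    rows = [r for r in rows if per_date[r[date_col]] >= min_samples_date]

    per_ticker = Counter(r[ticker_col] for r in rows)
    return [r for r in rows if per_ticker[r[ticker_col]] >= min_samples_ticker]


pairs = [("d1", "A US"), ("d1", "B US"), ("d1", "C US"), ("d1", "D US"),
         ("d2", "A US"), ("d2", "B US"), ("d3", "A US")]
rows = [{"date": d, "bloomberg_ticker": t} for d, t in pairs]
kept = filter_minimum_data(rows, min_samples_date=2, min_samples_ticker=2,
                           blacklist_tickers=["C US"])
kept_pairs = [(r["date"], r["bloomberg_ticker"]) for r in kept]
```

In this toy data, "C US" is blacklisted, date "d3" has too few samples, and "D US" appears only once, so only the "A US" and "B US" rows on "d1" and "d2" remain.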
PandasTaFeatureGenerator
Bases: BasePreProcessor
Generate features with pandas-ta.
https://github.com/twopirllc/pandas-ta
Usage in a Pipeline works only with the Pandas output API. Run .set_output(transform="pandas") on your pipeline first.
:param strategy: Valid Pandas Ta strategy. For more information on creating a strategy, see: https://github.com/twopirllc/pandas-ta#pandas-ta-strategy. By default, a strategy with RSI(14) and RSI(60) is used.
:param ticker_col: Column name for grouping by tickers.
:param num_cores: Number of cores to use for multiprocessing. By default, all available cores are used.
Source code in src/numerblox/preprocessing/signals.py
add_features(ticker_df)
The TA strategy is applied to the DataFrame here.
:param ticker_df: DataFrame for a single ticker.
:return: DataFrame with features added.
Source code in src/numerblox/preprocessing/signals.py
transform(X)
Main feature generation method.
:param X: DataFrame with columns: [ticker, date, open, high, low, close, volume]
:return: PandasTA features
Source code in src/numerblox/preprocessing/signals.py
ReduceMemoryProcessor
Bases: BasePreProcessor
Reduce memory usage as much as possible.
Credits to kainsama and others for writing about memory usage reduction for Numerai data: https://forum.numer.ai/t/reducing-memory/313
:param deep_mem_inspect: Introspect the data deeply by interrogating object dtypes. Yields a more accurate representation of memory usage if you have complex object columns.
:param verbose: Print memory usage before and after optimization.
Source code in src/numerblox/preprocessing/signals.py
get_feature_names_out(input_features=None)
Return feature names.
Source code in src/numerblox/preprocessing/signals.py
Meta
CrossValEstimator
Bases: TransformerMixin, BaseEstimator
Split your data into multiple folds and fit an estimator on each fold.
On transform, the predictions of all fold estimators are concatenated into a 2D array.
:param cv: Cross validation object that follows scikit-learn conventions.
:param estimator: Estimator to fit on each fold.
:param evaluation_func: Custom evaluation logic that is executed on validation data for each fold. Must accept y_true and y_pred as input. For example, evaluation_func can handle logging metrics for each fold. Anything that evaluation_func returns is stored in self.eval_results_.
:param predict_func: Name of the function that will be used for prediction. Must be one of 'predict', 'predict_proba', 'predict_log_proba'. For example, XGBClassifier has 'predict' and 'predict_proba' functions.
:param verbose: Whether to print progress.
Source code in src/numerblox/meta.py
__sklearn_is_fitted__()
Check fitted status.
Source code in src/numerblox/meta.py
fit(X, y, **kwargs)
Use cross validation object to fit estimators.
Source code in src/numerblox/meta.py
transform(X, model_idxs=None, **kwargs)
Use cross validation object to transform estimators.
:param X: Input data for inference.
:param y: Target data for inference.
:param model_idxs: List of indices of models to use for inference. By default, all fitted models are used.
:param kwargs: Additional arguments to pass to the estimator's predict function.
Source code in src/numerblox/meta.py
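The fit-per-fold and concatenate-on-transform idea can be sketched in plain Python. Everything here (`MeanEstimator`, `k_fold_indices`, `CrossValSketch`, `mean_abs_error`) is a hypothetical toy written for this example and does not use scikit-learn; it only mirrors the described behaviour, including storing whatever `evaluation_func` returns in `eval_results_`.

```python
class MeanEstimator:
    """Toy estimator: always predicts the mean of its training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def k_fold_indices(n_samples, n_splits):
    """Yield (train, validation) index lists for contiguous folds."""
    fold_size = n_samples // n_splits
    for k in range(n_splits):
        start = k * fold_size
        stop = n_samples if k == n_splits - 1 else start + fold_size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val


class CrossValSketch:
    def __init__(self, splits, estimator_factory, evaluation_func=None):
        self.splits = list(splits)
        self.estimator_factory = estimator_factory
        self.evaluation_func = evaluation_func

    def fit(self, X, y):
        self.estimators_, self.eval_results_ = [], []
        for train, val in self.splits:
            model = self.estimator_factory()
            model.fit([X[i] for i in train], [y[i] for i in train])
            self.estimators_.append(model)
            if self.evaluation_func is not None:
                y_pred = model.predict([X[i] for i in val])
                y_true = [y[i] for i in val]
                self.eval_results_.append(self.evaluation_func(y_true, y_pred))
        return self

    def transform(self, X):
        # One column per fitted fold model: shape (n_samples, n_models).
        columns = [model.predict(X) for model in self.estimators_]
        return [list(row) for row in zip(*columns)]


def mean_abs_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


X = [[0], [1], [2], [3]]
y = [1.0, 2.0, 3.0, 4.0]
cve = CrossValSketch(k_fold_indices(4, 2), MeanEstimator, mean_abs_error).fit(X, y)
preds = cve.transform(X)
```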
MetaEstimator
Bases: MetaEstimatorMixin, TransformerMixin, BaseEstimator
Helper for NumeraiPipeline and NumeraiFeatureUnion to use a model as a transformer.
:param estimator: Underlying estimator like XGBoost, Catboost, scikit-learn, etc.
:param predict_func: Name of the function that will be used for prediction. Must be one of 'predict', 'predict_proba', 'predict_log_proba'. For example, XGBClassifier has 'predict' and 'predict_proba' functions.
:param model_type: "regressor" or "classifier". Used to determine if the estimator is multi output.
Source code in src/numerblox/meta.py
fit(X, y, **kwargs)
Fit underlying estimator and set attributes.
Source code in src/numerblox/meta.py
predict(X, **kwargs)
Used when a MetaEstimator is the last step in the pipeline. Behaves the same as transform.
Source code in src/numerblox/meta.py
transform(X, **kwargs)
Apply the predict_func on the fitted estimator.
The output has shape (X.shape[0], ) if the estimator is not multi output, and (X.shape[0], y.shape[1]) otherwise.
All additional kwargs are passed to the underlying estimator's predict function.
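The idea of exposing a prediction method as transform can be sketched without any dependencies. This is an illustration only, not NumerBlox's implementation: `MeanRegressor` and `PredictAsTransform` are hypothetical names invented for the example.

```python
class MeanRegressor:
    """Toy estimator (hypothetical): predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class PredictAsTransform:
    """Hypothetical wrapper that exposes an estimator's prediction method as transform."""

    def __init__(self, estimator, predict_func="predict"):
        self.estimator = estimator
        self.predict_func = predict_func

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def transform(self, X, **kwargs):
        # Look up the configured prediction method and forward any extra kwargs.
        return getattr(self.estimator, self.predict_func)(X, **kwargs)

    def predict(self, X, **kwargs):
        # Same behavior as transform, for when the wrapper is the last step.
        return self.transform(X, **kwargs)


wrapped = PredictAsTransform(MeanRegressor()).fit([[0], [1], [2]], [1.0, 2.0, 3.0])
out = wrapped.transform([[5], [6]])
```

Because the wrapper has a transform method, later pipeline steps can consume the model's predictions as if they were features.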
Source code in src/numerblox/meta.py
MetaPipeline
Bases: Pipeline
Pipeline which turns all estimators into transformers by wrapping them in MetaEstimator. This allows pipeline steps to follow models, for example a FeatureNeutralizer after an XGBRegressor.
:param steps: List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an instance of BaseNeutralizer. :param memory: Used to cache the fitted transformers of the pipeline. :param verbose: If True, the time elapsed while fitting each step will be printed as it is completed. :param predict_func: Name of the function that will be used for prediction.
Source code in src/numerblox/meta.py
wrap_estimators_as_transformers(steps)
Converts all estimator steps (except the last step) into transformers by wrapping them in MetaEstimator. :param steps: List of (name, transform) tuples specifying the pipeline steps. :return: Modified steps with all estimators wrapped as transformers.
Source code in src/numerblox/meta.py
make_meta_pipeline(*steps, memory=None, verbose=False)
Convenience function for creating a MetaPipeline. :param steps: List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an instance of BaseNeutralizer. :param memory: Used to cache the fitted transformers of the pipeline. :param verbose: If True, the time elapsed while fitting each step will be printed as it is completed.
Source code in src/numerblox/meta.py
Ensemble
NumeraiEnsemble
Bases: TransformerMixin, BaseEstimator
Ensembler that standardizes predictions by era and averages them. :param weights: Sequence of weights (float or int), optional, default: None. If None, then uniform weights are used. :param n_jobs: The number of jobs to run in parallel for fit. Will revert to 1 CPU core if not defined. -1 means using all processors. :param donate_weighted: Whether to use Donate et al.'s weighted average formula. Often used when ensembling predictions from multiple folds over time. Paper Link: https://doi.org/10.1016/j.neucom.2012.02.053 Example donate weighting for 5 folds: [0.0625, 0.0625, 0.125, 0.25, 0.5]
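The example weighting for 5 folds is consistent with weights that halve for each step back in time, with the two oldest folds sharing the smallest weight so the total is 1. The sketch below reproduces that pattern; `donate_weights` is a hypothetical helper written for this illustration and may differ from how NumerBlox computes the weights internally.

```python
def donate_weights(n_folds):
    """Weights that halve going back in time; the two oldest folds share the smallest weight."""
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    oldest = [1 / 2 ** (n_folds - 1)]
    rest = [1 / 2 ** (n_folds - k + 1) for k in range(2, n_folds + 1)]
    return oldest + rest


weights = donate_weights(5)
print(weights)  # [0.0625, 0.0625, 0.125, 0.25, 0.5]
```

The most recent fold receives half of the total weight, which is why this scheme suits folds ordered over time.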
Source code in src/numerblox/ensemble.py
predict(X, era_series)
Used when a NumeraiEnsemble is the last step in the pipeline. Behaves the same as transform.
Source code in src/numerblox/ensemble.py
transform(X, era_series)
Standardize by era and ensemble. :param X: Input data where each column contains predictions from an estimator. :param era_series: Era labels (strings) for each row in X. :return: Ensembled predictions.
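A dependency-free sketch of the two steps, standardizing per era and then averaging, is shown below. It assumes that standardization means percentile-ranking each prediction column within its era; the helper names `percentile_rank` and `ensemble_by_era` are hypothetical and the tie-breaking rule is a simplification.

```python
from collections import defaultdict


def percentile_rank(values):
    """Map values to (0, 1) by rank; ties are broken by position (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = (rank - 0.5) / len(values)
    return ranks


def ensemble_by_era(columns, eras, weights=None):
    """Standardize each prediction column within each era, then take a weighted average."""
    n_models = len(columns)
    weights = weights or [1 / n_models] * n_models
    rows_by_era = defaultdict(list)
    for row, era in enumerate(eras):
        rows_by_era[era].append(row)

    result = [0.0] * len(eras)
    for rows in rows_by_era.values():
        for col, weight in zip(columns, weights):
            standardized = percentile_rank([col[r] for r in rows])
            for r, value in zip(rows, standardized):
                result[r] += weight * value
    return result


model_a = [0.1, 0.9, 0.4, 0.3]
model_b = [10.0, 30.0, 5.0, 7.0]
eras = ["era1", "era1", "era2", "era2"]
ensembled = ensemble_by_era([model_a, model_b], eras)
```

Ranking within each era puts models with very different output scales (here roughly 0 to 1 versus 5 to 30) on the same footing before they are averaged.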
Source code in src/numerblox/ensemble.py
PredictionReducer
Bases: TransformerMixin, BaseEstimator
Reduce multiclass and probability predictions to one column per model. This step is not needed if predictions were generated with a regressor or a regular predict call. :param n_models: Number of resulting columns. This indicates how many models were trained to generate the prediction array. :param n_classes: Number of classes for each prediction. If predictions were generated with predict_proba and binary classification -> n_classes = 2.
Source code in src/numerblox/ensemble.py
predict(X)
Used when PredictionReducer is the last step in the pipeline. Behaves the same as transform. :param X: Input predictions. :return: Reduced predictions of shape (X.shape[0], self.n_models).
Source code in src/numerblox/ensemble.py
transform(X)
:param X: Input predictions. :return: Reduced predictions of shape (X.shape[0], self.n_models).
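The reduction rule is not spelled out here. One plausible rule, assumed for this sketch, is to weight each class probability by its class index so that every model's block of n_classes columns collapses into a single expected-class value. `reduce_predictions` is a hypothetical function, not the library's method.

```python
def reduce_predictions(rows, n_models, n_classes):
    """Collapse each model's block of class probabilities into one expected-class value."""
    expected_cols = n_models * n_classes
    reduced = []
    for row in rows:
        if len(row) != expected_cols:
            raise ValueError(f"Expected {expected_cols} columns, got {len(row)}")
        out = []
        for m in range(n_models):
            block = row[m * n_classes:(m + 1) * n_classes]
            # Weight each class probability by its class index (assumed reduction rule).
            out.append(sum(idx * p for idx, p in enumerate(block)))
        reduced.append(out)
    return reduced


# Two models, binary classification: columns are [m0_c0, m0_c1, m1_c0, m1_c1].
probas = [
    [0.25, 0.75, 0.5, 0.5],
    [1.0, 0.0, 0.0, 1.0],
]
reduced = reduce_predictions(probas, n_models=2, n_classes=2)
```

For binary classification this rule simply keeps the probability of class 1 for each model.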
Source code in src/numerblox/ensemble.py
Neutralizers
BaseNeutralizer
Bases: TransformerMixin, BaseEstimator
Base class for neutralization so it is compatible with scikit-learn. :param new_col_name: Name of new neutralized column.
Source code in src/numerblox/neutralizers.py
fit_transform(X, features, era_series=None)
Convenience function for scikit-learn compatibility. Needed because fit and transform expect different arguments here.
Source code in src/numerblox/neutralizers.py
get_feature_names_out(input_features=None)
Get feature names for neutralized output.
:param input_features: Optional list of input feature names. :return: List of feature names for neutralized output.
Source code in src/numerblox/neutralizers.py
predict(X, features, era_series=None)
Convenience function for scikit-learn compatibility.
Source code in src/numerblox/neutralizers.py
FeatureNeutralizer
Bases: BaseNeutralizer
Classic feature neutralization by subtracting a linear model.
:param pred_name: Name of prediction column. For creating the new column name. :param proportion: Number in range [0...1] indicating how much to neutralize. :param suffix: Optional suffix that is added to new column name. :param num_cores: Number of cores to use for parallel processing. By default, all CPU cores are used.
Source code in src/numerblox/neutralizers.py
neutralize(dataf, columns, by, proportion)
Neutralize on CPU. :param dataf: DataFrame with features and predictions. :param columns: List of prediction column names. :param by: List of feature column names. :param proportion: Proportion to neutralize. :return: Neutralized predictions.
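Subtracting a linear model can be illustrated with a single feature, which keeps the example free of linear algebra libraries. This is a simplified sketch under that assumption: the real method works on many feature columns at once and may rescale the result. `neutralize_single_feature` is a hypothetical name.

```python
def neutralize_single_feature(preds, feature, proportion=1.0):
    """Subtract `proportion` of the least-squares fit of preds on one feature."""
    # Least-squares slope of preds on the feature (no intercept).
    beta = sum(f * p for f, p in zip(feature, preds)) / sum(f * f for f in feature)
    return [p - proportion * beta * f for p, f in zip(preds, feature)]


feature = [-1.0, 0.0, 1.0, 2.0]
preds = [0.1, 0.4, 0.6, 0.9]

neutral = neutralize_single_feature(preds, feature, proportion=1.0)
# After full neutralization the predictions have no linear exposure to the feature.
exposure = sum(f * n for f, n in zip(feature, neutral))
unchanged = neutralize_single_feature(preds, feature, proportion=0.0)
```

With proportion=0 the predictions are returned unchanged, and values in between remove only part of the linear exposure.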
Source code in src/numerblox/neutralizers.py
normalize(dataf)
staticmethod
Normalize predictions. 1. Rank predictions. 2. Normalize ranks. 3. Gaussianize ranks. :param dataf: DataFrame with predictions. :return: Gaussianized rank predictions.
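The three steps can be sketched with the standard library only. The rank scaling `(rank - 0.5) / n` and the tie-breaking by position are assumptions made for this illustration; `gaussianize` is a hypothetical helper.

```python
from statistics import NormalDist


def gaussianize(values):
    """Rank values, scale ranks into (0, 1), then map them through the normal inverse CDF."""
    n = len(values)
    # Step 1: rank (ties broken by position, an assumption).
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Step 2: normalize the rank so it lies strictly inside (0, 1).
        percentile = (rank - 0.5) / n
        # Step 3: gaussianize with the standard normal inverse CDF.
        out[idx] = NormalDist().inv_cdf(percentile)
    return out


scores = gaussianize([0.2, 0.9, 0.5])
```

Keeping the percentiles strictly between 0 and 1 matters because the inverse CDF is undefined at the endpoints.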
Source code in src/numerblox/neutralizers.py
normalize_and_neutralize(dataf, columns, by, proportion)
Gaussianize predictions and neutralize with one combination of prediction and proportion. :param dataf: DataFrame with features and predictions. :param columns: List of prediction column names. :param by: List of feature column names. :param proportion: Proportion to neutralize. :return: Neutralized predictions DataFrame.
Source code in src/numerblox/neutralizers.py
transform(X, features, era_series=None)
Main transform function. :param X: Input predictions to neutralize.
:param features: DataFrame with features for neutralization.
:param era_series: Series with era labels for each row in features.
Features, era_series and the prediction column must all have the same length. :return: Neutralized predictions NumPy array.
Source code in src/numerblox/neutralizers.py
Penalizers
BasePenalizer
Bases: TransformerMixin, BaseEstimator
Base class for penalization so it is compatible with scikit-learn. :param new_col_name: Name of new penalized column.
Source code in src/numerblox/penalizers.py
fit_transform(X, features, era_series)
Convenience function for scikit-learn compatibility. Needed because fit and transform expect different arguments here.
Source code in src/numerblox/penalizers.py
get_feature_names_out(input_features=None)
Get feature names for penalized output.
:param input_features: Optional list of input feature names. :return: List of feature names for penalized output.
Source code in src/numerblox/penalizers.py
predict(X, features, era_series)
Convenience function for scikit-learn compatibility.
Source code in src/numerblox/penalizers.py
FeaturePenalizer
Bases: BasePenalizer
Feature penalization with TensorFlow.
Source (by jrb): https://github.com/jonrtaylor/twitch/blob/master/FE_Clipping_Script.ipynb
Source of first PyTorch implementation (by Michael Oliver / mdo): https://forum.numer.ai/t/model-diagnostics-feature-exposure/899/12
:param max_exposure: Number in range [0...1] indicating how much to reduce max feature exposure to. :param pred_name: Prediction column name. Used for new column name.
:param suffix: Optional suffix that is added to new column name.
Source code in src/numerblox/penalizers.py
transform(X, features, era_series)
Main transform method. :param X: Input predictions to neutralize. :param features: DataFrame with features for neutralization. :param era_series: Series with era labels for each row in features. Features, eras and the prediction column must all have the same length. :return: Penalized predictions.
Source code in src/numerblox/penalizers.py
Prediction Loaders
BasePredictionLoader
Bases: TransformerMixin, BaseEstimator
Shared functionality for all Prediction Loaders.
Source code in src/numerblox/prediction_loaders.py
get_feature_names_out(input_features=None)
abstractmethod
Return feature names.
Source code in src/numerblox/prediction_loaders.py
transform(X=None, y=None)
abstractmethod
Return Predictions generated by model.
Source code in src/numerblox/prediction_loaders.py
ExamplePredictions
Bases: BasePredictionLoader
Load example predictions.
:param file_name: File to download from NumerAPI. 'v5.0/live_example_preds.parquet' (example predictions for v5.0 data) by default. Other example prediction files:
- v5.0 validation examples -> "v5.0/validation_example_preds.parquet"
- v5.0 live benchmark models -> "v5.0/live_benchmark_models.parquet"
- v5.0 validation benchmark models -> "v5.0/validation_benchmark_models.parquet"
:param round_num: Optional round number. Downloads most recent round by default.
:param keep_files: Whether to keep downloaded files. By default, files are deleted after the predictions are loaded.
Source code in src/numerblox/prediction_loaders.py
transform(X=None, y=None)
Return example predictions.
Source code in src/numerblox/prediction_loaders.py
Targets
BaseTargetProcessor
Bases: TransformerMixin, BaseEstimator
Common functionality for preprocessors and postprocessors.
Source code in src/numerblox/targets.py
BayesianGMMTargetProcessor
Bases: BaseTargetProcessor
Generate synthetic (fake) target using a Bayesian Gaussian Mixture model.
Based on Michael Oliver's GitHub Gist implementation:
https://gist.github.com/the-moliver/dcdd2862dc2c78dda600f1b449071c93
:param n_components: Number of components for fitting Bayesian Gaussian Mixture Model.
Source code in src/numerblox/targets.py
fit(X, y, era_series)
Fit Bayesian Gaussian Mixture model on coefficients and normalize. :param X: DataFrame containing features. :param y: Series containing real target. :param era_series: Series containing era information.
Source code in src/numerblox/targets.py
get_feature_names_out(input_features=None)
Return feature names.
Source code in src/numerblox/targets.py
transform(X, era_series)
Main method for generating fake target. :param X: DataFrame containing features. :param era_series: Series containing era information.
Source code in src/numerblox/targets.py
SignalsTargetProcessor
Bases: BaseTargetProcessor
Engineer targets for Numerai Signals.
More information on how Numerai Signals targets are implemented:
https://forum.numer.ai/t/decoding-the-signals-target/2501
:param price_col: Column from which target will be derived.
:param windows: Timeframes to use for engineering targets. 10 and 20-day by default.
:param bins: Binning used to create group targets. Nomi binning by default.
:param labels: Scaling for binned target. Must be same length as resulting bins (bins-1). Numerai labels by default.
Source code in src/numerblox/targets.py
get_feature_names_out(input_features=None)
Return feature names of Signals targets.
Source code in src/numerblox/targets.py
Evaluation
BaseEvaluator
Evaluation functionality that is relevant for both Numerai Classic and Numerai Signals.
Metrics include:
- Mean, Standard Deviation and Sharpe (Corrv2) for era returns.
- Max drawdown.
- Annual Percentage Yield (APY).
- Correlation with benchmark predictions.
- Max feature exposure: https://forum.numer.ai/t/model-diagnostics-feature-exposure/899.
- Feature Neutral Mean, Standard deviation and Sharpe: https://docs.numer.ai/tournament/feature-neutral-correlation.
- Smart Sharpe.
- Exposure Dissimilarity: https://forum.numer.ai/t/true-contribution-details/5128/4.
- Autocorrelation (1st order).
- Calmar Ratio.
- Churn: https://forum.numer.ai/t/better-lgbm-params-signals-v2-data-and-reducing-signals-churn/7638#p-17958-reducing-signals-churn-3.
- Performance vs. Benchmark predictions.
- Mean, Standard Deviation, Sharpe and Churn for TB200 (Buy top 200 stocks and sell bottom 200 stocks).
- Mean, Standard Deviation, Sharpe and Churn for TB500 (Buy top 500 stocks and sell bottom 500 stocks).
:param metrics_list: List of metrics to calculate. Default: FAST_METRICS. :param era_col: Column name pointing to eras. Most commonly "era" for Numerai Classic and "date" for Numerai Signals. :param custom_functions: Additional functions called in evaluation. Check out the NumerBlox docs on evaluation for more info on using custom functions. :param show_detailed_progress_bar: Show detailed progress bar for evaluation of each prediction column.
Note that we calculate the standard deviation with ddof=0 (the population standard deviation). It may differ slightly from the default Pandas calculation, which uses ddof=1, but is consistent with how NumPy computes standard deviation by default. More info: https://stackoverflow.com/questions/24984178/different-std-in-pandas-vs-numpy
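The difference between the two conventions can be seen with the standard library alone: `statistics.pstdev` divides by n (ddof=0, the NumPy default) and `statistics.stdev` divides by n - 1 (ddof=1, the Pandas default). The era scores below are made-up illustration values.

```python
from statistics import pstdev, stdev

era_corrs = [0.01, 0.03, -0.02, 0.04]  # made-up per-era scores

population_std = pstdev(era_corrs)  # ddof=0: divide by n
sample_std = stdev(era_corrs)       # ddof=1: divide by n - 1

print(population_std, sample_std)
```

The ddof=0 value is always the smaller of the two, and the gap shrinks as the number of eras grows.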
Source code in src/numerblox/evaluation.py
apy(era_corrs, stake_compounding_lag=4)
staticmethod
Annual percentage yield. :param era_corrs: Correlation scores by era. :param stake_compounding_lag: Compounding lag for Numerai rounds (4 for Numerai Classic).
Source code in src/numerblox/evaluation.py
autocorr1(era_corrs)
1st order autocorrelation. :param era_corrs: Correlation scores by era.
Source code in src/numerblox/evaluation.py
autocorr_penalty(era_corrs)
Adjusting factor for autocorrelation. Used in Smart Sharpe. :param era_corrs: Correlation scores by era.
Source code in src/numerblox/evaluation.py
churn(dataf, pred_col, target_col, tb=None)
Describes how the alpha scores of a signal change over time. More information: https://forum.numer.ai/t/better-lgbm-params-signals-v2-data-and-reducing-signals-churn/7638#p-17958-reducing-signals-churn-3
Uses Numerai's official scoring function for churn under the hood. More information: https://github.com/numerai/numerai-tools/blob/575ae46c97e66bb6d7258803a2f4196b93cb99e8/numerai_tools/signals.py#L12 :param dataf: DataFrame containing era_col, pred_col and target_col. :param pred_col: Prediction column to calculate churn for. :param target_col: Target column to calculate churn against. :param tb: How many of top and bottom predictions to focus on. For example, tb200_churn -> tb=200. tb500_churn -> tb=500. By default all predictions are considered. :return: Churn score for the given prediction column.
Source code in src/numerblox/evaluation.py
contributive_correlation(dataf, pred_col, target_col, other_col)
Calculate the contributive correlation of predictions with respect to the meta model. More information: https://docs.numer.ai/numerai-tournament/scoring/meta-model-contribution-mmc-and-bmc
Uses Numerai's official scoring function for contribution under the hood. More information: https://github.com/numerai/numerai-tools/blob/master/numerai_tools/scoring.py
Calculate contributive correlation by:
1. tie-kept ranking each prediction and the meta model
2. gaussianizing each prediction and the meta model
3. orthogonalizing each prediction wrt the meta model
3.5. scaling the targets to buckets [-2, -1, 0, 1, 2]
4. dot product the orthogonalized predictions and the targets, then normalize by the length of the target (equivalent to covariance)
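The steps above can be sketched for a single era using only the standard library. This is an illustration, not Numerai's official scoring code: the helper names are hypothetical, the rank scaling `(rank - 0.5) / n` is an assumption, and the target scaling is approximated by multiplying by 4 and centering on the mean.

```python
from statistics import NormalDist, fmean


def tie_kept_gaussianize(values):
    """Average-rank ties, scale ranks into (0, 1), then apply the normal inverse CDF."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of the 1-based ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    return [NormalDist().inv_cdf((r - 0.5) / n) for r in ranks]


def contribution_one_era(preds, meta_model, targets):
    p = tie_kept_gaussianize(preds)       # steps 1 and 2
    m = tie_kept_gaussianize(meta_model)  # steps 1 and 2
    # Step 3: remove the projection of p onto m.
    beta = sum(a * b for a, b in zip(p, m)) / sum(b * b for b in m)
    neutral = [a - beta * b for a, b in zip(p, m)]
    # Step 3.5: scale [0, 1] targets to [0, 4] and center them.
    scaled = [t * 4 for t in targets]
    center = fmean(scaled)
    centered = [t - center for t in scaled]
    # Step 4: dot product normalized by the number of rows.
    return sum(t * v for t, v in zip(centered, neutral)) / len(targets)


targets = [0.0, 0.25, 0.5, 0.75, 1.0]
meta_model = [0.5, 0.1, 0.4, 0.2, 0.3]
preds = [0.1, 0.2, 0.3, 0.4, 0.5]

score = contribution_one_era(preds, meta_model, targets)
self_score = contribution_one_era(meta_model, meta_model, targets)
```

A prediction identical to the meta model has nothing left after orthogonalization, so its contribution is zero regardless of the targets.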
:param dataf: DataFrame containing era_col, pred_col, target_col and other_col. :param pred_col: Prediction column to calculate MMC for. :param target_col: Target column to calculate MMC against. Make sure the range of targets is [0, 1] (inclusive). If the function is called from full_evaluation, this is guaranteed because of the checks. :param other_col: Meta model column containing predictions to neutralize against.
:return: A 1D NumPy array of contributive correlations by era.
Source code in src/numerblox/evaluation.py
cross_correlation(dataf, pred_col, other_col)
Corrv2 correlation with other predictions (like another model, example predictions or meta model prediction). :param dataf: DataFrame containing both pred_col and other_col. :param pred_col: Main Prediction. :param other_col: Other prediction column to calculate correlation with pred_col.
:return: Correlation between Corrv2's of pred_col and other_col.
Source code in src/numerblox/evaluation.py
evaluation_one_col(dataf, feature_cols, pred_col, target_col, benchmark_cols=None)
Perform evaluation for one prediction column against given target and benchmark column(s).
Source code in src/numerblox/evaluation.py
exposure_dissimilarity(dataf, pred_col, other_col, corr_method='pearson')
Dissimilarity between the feature exposure pattern of pred_col and that of another column. See TC details forum post: https://forum.numer.ai/t/true-contribution-details/5128/4 :param dataf: DataFrame containing both pred_col and other_col. :param pred_col: Main Prediction. :param other_col: Other prediction column to calculate exposure dissimilarity against. :param corr_method: Correlation method to use for calculating feature exposures. corr_method should be one of ['pearson', 'kendall', 'spearman']. Default: 'pearson'.
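As a rough sketch, exposure dissimilarity can be computed from two vectors of per-feature exposures. The formula used here, one minus the projection of the prediction's exposures onto the other column's exposures, is an assumption based on the linked forum post. The exposure vectors are taken as given rather than computed from a DataFrame.

```python
def exposure_dissimilarity(pred_exposures, other_exposures):
    """1 - (U . E) / (E . E), where U and E are per-feature exposure vectors."""
    dot_ue = sum(u * e for u, e in zip(pred_exposures, other_exposures))
    dot_ee = sum(e * e for e in other_exposures)
    return 1 - dot_ue / dot_ee


same = exposure_dissimilarity([0.5, -0.25], [0.5, -0.25])
orthogonal = exposure_dissimilarity([1.0, 0.0], [0.0, 1.0])
print(same, orthogonal)  # 0.0 1.0
```

A score near 0 means the two columns load on features in the same way, while a score near 1 means their exposure patterns are unrelated.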
Source code in src/numerblox/evaluation.py
feature_neutral_mean_std_sharpe(dataf, pred_col, target_col, feature_names)
Feature neutralized mean performance. More info: https://docs.numer.ai/tournament/feature-neutral-correlation
Source code in src/numerblox/evaluation.py
full_evaluation(dataf, pred_cols, target_col='target', benchmark_cols=None)
Perform evaluation for each prediction column in pred_cols. By default only the "prediction" column is evaluated. Evaluation is done against given target and benchmark prediction column. :param dataf: DataFrame containing era_col, pred_cols, target_col and optional benchmark_cols. :param pred_cols: List of prediction columns to calculate evaluation metrics for. :param target_col: Target column to evaluate against. :param benchmark_cols: Optional list of benchmark columns to calculate evaluation metrics for.
Source code in src/numerblox/evaluation.py
get_feature_exposures_corrv2(dataf, pred_col, feature_list, cpu_cores=-1)
Calculate feature exposures for each era using 'Numerai Corr'. Results will be similar to get_feature_exposures() but more accurate. This method will take longer to compute.
:param dataf: DataFrame containing predictions, features, and eras. :param pred_col: Prediction column to calculate feature exposures for. :param feature_list: List of feature columns in X. :param cpu_cores: Number of CPU cores to use for parallelization. Default: -1 (all cores). :return: DataFrame with Corrv2 feature exposures by era for each feature.
Source code in src/numerblox/evaluation.py
get_feature_exposures_pearson(dataf, pred_col, feature_list, cpu_cores=-1)
Calculate feature exposures for each era using Pearson correlation.
:param dataf: DataFrame containing predictions, features, and eras. :param pred_col: Prediction column to calculate feature exposures for. :param feature_list: List of feature columns in X. :param cpu_cores: Number of CPU cores to use for parallelization. :return: DataFrame with Pearson feature exposures by era for each feature.
Source code in src/numerblox/evaluation.py
legacy_contribution(dataf, pred_col, target_col, other_col)
Legacy contribution mean, standard deviation and Sharpe ratio. More info: https://forum.numer.ai/t/mmc2-announcement/93
:param dataf: DataFrame containing era_col, pred_col, target_col and other_col. :param pred_col: Prediction column to calculate MMC for. :param target_col: Target column to calculate MMC against. :param other_col: Meta model column containing predictions to neutralize against.
:return: List of legacy contribution scores by era.
Source code in src/numerblox/evaluation.py
max_drawdown(era_corrs)
staticmethod
Maximum drawdown per era.
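A minimal sketch of a drawdown calculation is given below, assuming era scores compound multiplicatively, the peak starts at the initial value of 1.0, and the result is reported as a negative fraction. The actual method may treat the starting point or the sign differently.

```python
def max_drawdown(era_corrs):
    """Largest peak-to-trough decline of the compounded score curve, as a negative fraction."""
    value = 1.0
    peak = 1.0
    worst = 0.0
    for corr in era_corrs:
        value *= 1 + corr  # compound the per-era score
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return -worst


drawdown = max_drawdown([0.1, -0.5, 0.2])
no_drawdown = max_drawdown([0.01, 0.02])
```

In the first series the curve peaks at 1.1 and falls to 0.55, a 50% decline that the later recovery does not erase.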
Source code in src/numerblox/evaluation.py
max_feature_exposure(dataf, feature_cols, pred_col)
Maximum exposure over all features.
Source code in src/numerblox/evaluation.py
mean_std_sharpe(era_corrs)
Average, standard deviation and Sharpe ratio for correlations per era.
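These three statistics can be sketched as follows, using a standard deviation with ddof=0 as the BaseEvaluator notes describe. The function below is an illustrative stand-in that works on a plain list rather than a Pandas Series.

```python
from statistics import fmean, pstdev


def mean_std_sharpe(era_corrs):
    """Return (mean, std with ddof=0, mean / std) for per-era scores."""
    mean = fmean(era_corrs)
    std = pstdev(era_corrs)  # population standard deviation (ddof=0)
    sharpe = mean / std
    return mean, std, sharpe


mean, std, sharpe = mean_std_sharpe([0.02, 0.04, 0.00, 0.02])
```

Sharpe here is simply the mean score divided by its variability across eras, so a steadier model scores higher for the same mean.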
Source code in src/numerblox/evaluation.py
numerai_corr(dataf, pred_col, target_col)
Computes 'Numerai Corr' aka 'Corrv2'. More info: https://forum.numer.ai/t/target-cyrus-new-primary-target/6303
Assumes original target col as input (i.e. in [0...1] range).
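A dependency-free sketch of Numerai Corr for one era is shown below, following the publicly documented recipe: rank and gaussianize the predictions, center the targets, raise both to the power 1.5 while keeping the sign, then take the Pearson correlation. Ties are broken by position here for brevity, which is a simplification, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def signed_pow(x, exponent=1.5):
    """Raise the magnitude to a power while keeping the sign."""
    return math.copysign(abs(x) ** exponent, x)


def pearson(a, b):
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def numerai_corr(preds, targets):
    n = len(preds)
    # Rank predictions and map the scaled ranks through the normal inverse CDF.
    order = sorted(range(n), key=lambda i: preds[i])
    gauss = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        gauss[idx] = NormalDist().inv_cdf((rank - 0.5) / n)
    # Center the original [0, 1] targets around zero.
    target_mean = fmean(targets)
    centered = [t - target_mean for t in targets]
    # The 1.5 power puts more weight on the tails of both series.
    return pearson([signed_pow(g) for g in gauss], [signed_pow(c) for c in centered])


targets = [0.0, 0.25, 0.5, 0.75, 1.0]
aligned = numerai_corr([0.1, 0.2, 0.3, 0.4, 0.5], targets)
reversed_score = numerai_corr([0.5, 0.4, 0.3, 0.2, 0.1], targets)
```

Because only the ranks of the predictions matter, any monotonic rescaling of the predictions leaves the score unchanged.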
Source code in src/numerblox/evaluation.py
per_era_corrs(dataf, pred_col, target_col)
Correlation between prediction and target for each era.
Source code in src/numerblox/evaluation.py
per_era_numerai_corrs(dataf, pred_col, target_col)
Numerai Corr between prediction and target for each era.
Source code in src/numerblox/evaluation.py
plot_correlations(dataf, pred_cols, corr_cols=None, target_col='target', roll_mean=20)
Plot per era correlations over time. :param dataf: DataFrame that contains at least all pred_cols, target_col and corr_cols. :param pred_cols: List of prediction columns to calculate per era correlations for and plot. :param corr_cols: Per era correlations already prepared to include in the plot. Optional; use it if you already have per era correlations prepared in your input dataf. :param target_col: Target column name to compute per era correlations against. :param roll_mean: How many eras should be averaged to compute a rolling score.
Source code in src/numerblox/evaluation.py
smart_sharpe(era_corrs)
Sharpe adjusted for autocorrelation. :param era_corrs: Correlation scores by era
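The relationship between autocorr1, autocorr_penalty and smart_sharpe can be sketched as below. The penalty formula and the use of the absolute autocorrelation and the sample standard deviation (ddof=1) are assumptions taken from the commonly shared Smart Sharpe recipe, not confirmed details of this implementation.

```python
import math
from statistics import fmean, stdev


def autocorr1(series):
    """Pearson correlation between the series and itself shifted by one era."""
    a, b = series[:-1], series[1:]
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def autocorr_penalty(series):
    """Adjustment factor that grows with the first-order autocorrelation."""
    n = len(series)
    p = abs(autocorr1(series))
    return math.sqrt(1 + 2 * sum(((n - i) / n) * p ** i for i in range(1, n)))


def smart_sharpe(series):
    """Sharpe ratio divided by the autocorrelation penalty."""
    return fmean(series) / (stdev(series) * autocorr_penalty(series))


era_corrs = [0.01, 0.02, 0.03, 0.02, 0.01, 0.02, 0.04]
penalty = autocorr_penalty(era_corrs)
plain = fmean(era_corrs) / stdev(era_corrs)
smart = smart_sharpe(era_corrs)
```

Since the penalty is at least 1 under these assumptions, Smart Sharpe never exceeds the plain Sharpe ratio for a series with a positive mean.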
Source code in src/numerblox/evaluation.py
tbx_mean_std_sharpe(dataf, pred_col, target_col, tb=200)
Calculate Mean, Standard deviation and Sharpe ratio when we focus on the x top and x bottom predictions. :param tb: How many of top and bottom predictions to focus on. TB200 and TB500 are the most common situations.
Source code in src/numerblox/evaluation.py
update_progress_bar(pbar, desc, update=1, close=False)
Update progress bar for evaluation. :param pbar: tqdm progress bar object. :param desc: Description to show in progress bar. :param update: Update progress by n steps. :param close: Close progress bar.
Source code in src/numerblox/evaluation.py
NumeraiClassicEvaluator
Bases: BaseEvaluator
Evaluator for all Numerai Classic metrics.
Source code in src/numerblox/evaluation.py
NumeraiSignalsEvaluator
Bases: BaseEvaluator
Evaluator for all metrics that are relevant in Numerai Signals.
Source code in src/numerblox/evaluation.py
__await_diagnostics(api, model_id, diagnostics_id, timeout_min, interval_sec=15)
staticmethod
Wait for diagnostics to be uploaded. Try every 'interval_sec' seconds until 'timeout_min' minutes have passed.
Source code in src/numerblox/evaluation.py
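The waiting behaviour described above (try every 'interval_sec' seconds until 'timeout_min' minutes have passed) can be sketched as a generic polling loop. The function name `await_result` and the injectable `check`, `sleep` and `clock` parameters are hypothetical; this is not the library's private method, only an illustration of the pattern.

```python
import time


def await_result(check, timeout_min, interval_sec=15, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns something other than None or the timeout expires.

    check, sleep and clock are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout_min * 60
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"No result after {timeout_min} minute(s).")
        sleep(interval_sec)


# Example: a fake check that only succeeds on the third attempt.
calls = []


def fake_check():
    calls.append(1)
    return "done" if len(calls) == 3 else None


outcome = await_result(fake_check, timeout_min=1, interval_sec=0, sleep=lambda seconds: None)
```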
get_diagnostics(val_dataf, model_name, key, timeout_min=2, col='validationFncV4')
Retrieve neutralized validation correlation by era.
Calculated on Numerai servers.
:param val_dataf: A DataFrame containing prediction, date, ticker and data_type columns.
data_type column should contain 'validation' instances.
:param model_name: Any model name for which you have authentication credentials.
:param key: Key object to authenticate upload of diagnostics.
:param timeout_min: How many minutes to wait for diagnostics to be computed on Numerai servers before timing out.
2 minutes by default.
:param col: Which column to return. Should be one of ['validationCorrV4', 'validationFncV4', 'validationIcV2', 'validationRic']. If None, all columns will be returned.
:return: Pandas Series with era as index and neutralized validation correlations (the column selected by col; 'validationFncV4' by default).
Source code in src/numerblox/evaluation.py
Submission
BaseSubmitter
Bases: BaseIO
Basic functionality for submitting to Numerai. Uses numerapi under the hood. More info: https://numerapi.readthedocs.io
:param directory_path: Directory to store and read submissions from.
:param api: NumerAPI, SignalsAPI or CryptoAPI.
:param max_retries: Maximum number of retries for uploading predictions to Numerai.
:param sleep_time: Time to sleep between uploading retries.
:param fail_silently: Whether to skip uploading to Numerai without raising an error. Useful if you are uploading many models in a loop and want to skip models that fail to upload.
Source code in src/numerblox/submission.py
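How max_retries, sleep_time and fail_silently interact can be sketched with a small retry loop. This is an illustration under stated assumptions, not the submitter's actual code: the function name `upload_with_retries` is hypothetical, the default values are illustrative, and max_retries is treated here as the total number of attempts.

```python
import time


def upload_with_retries(upload, max_retries=2, sleep_time=10, fail_silently=False, sleep=time.sleep):
    """Call upload() up to max_retries times, sleeping between failed attempts.

    Returns the upload result, or None if every attempt failed and fail_silently is True.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1.")
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return upload()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < max_retries:
                sleep(sleep_time)
    if fail_silently:
        return None  # skip this model and let the caller continue with the next one
    raise last_error


# Example: an upload that fails twice before succeeding.
attempts = []


def flaky_upload():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("temporary failure")
    return "uploaded"


status = upload_with_retries(flaky_upload, max_retries=3, sleep_time=0, sleep=lambda seconds: None)
```

With fail_silently=True a loop over many models keeps going after a failed upload; with the default False the last error is raised to the caller.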
get_model_mapping
property
Mapping between raw model names and model IDs.
__call__(dataf, model_name, file_name='submission.csv', cols='prediction', *args, **kwargs)
The most common use case will be to create a CSV and submit it immediately after that. full_submission handles this.
Source code in src/numerblox/submission.py
combine_csvs(csv_paths, aux_cols, era_col=None, pred_col='prediction')
Read in csv files and combine all predictions with a rank mean.
Multi-target predictions will be averaged out.
:param csv_paths: List of full paths to .csv prediction files.
:param aux_cols: ['id'] for Numerai Classic.
['ticker', 'last_friday', 'data_type'], for example, with Numerai Signals.
:param era_col: Column indicating era ('era' or 'last_friday').
Will be used for grouping the rank mean if given. The groupby is skipped if no era_col is provided.
:param pred_col: 'prediction' for Numerai Classic and 'signal' for Numerai Signals.
Source code in src/numerblox/submission.py
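The idea of a rank mean can be sketched with the standard library: each prediction set is converted to percentile ranks, and the ranks are then averaged per row. The helper names `pct_rank` and `rank_mean` are hypothetical, and the sketch ignores era grouping and CSV reading; with pandas the ranking step would typically be `rank(pct=True)`. Whether combine_csvs uses exactly this procedure is an assumption.

```python
from statistics import fmean


def pct_rank(values):
    """Percentile ranks in (0, 1]; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the current run of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank / len(values)
        i = j + 1
    return ranks


def rank_mean(prediction_sets):
    """Average the percentile ranks of several prediction sets, row by row."""
    ranked = [pct_rank(predictions) for predictions in prediction_sets]
    return [fmean(row) for row in zip(*ranked)]


# Two hypothetical models on very different scales; ranking makes them comparable.
combined = rank_mean([[0.1, 0.5, 0.9], [10, 30, 20]])
```

Ranking before averaging keeps a model with a wide output range from dominating the combined prediction.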
full_submission(dataf, model_name, cols, file_name='submission.csv', *args, **kwargs)
Save DataFrame to csv and upload predictions through API.
:param dataf: Main DataFrame containing cols.
:param model_name: Lowercase Numerai model name.
:param file_name: Path to save the submission file to, relative to base directory.
:param cols: Columns to be saved in submission file.
1 prediction column for Numerai Classic.
At least 1 prediction column and 1 ticker column for Numerai Signals.
*args, **kwargs are passed to numerapi API.
For example, the version argument in Numerai Classic submissions.
Source code in src/numerblox/submission.py
save_csv(dataf, file_name, cols, *args, **kwargs)
abstractmethod
For Numerai Classic: Save index column + 'cols' (targets) to CSV.
For Numerai Signals: Save ticker, date, data_type and signal columns to CSV.
Source code in src/numerblox/submission.py
upload_predictions(file_name, model_name, *args, **kwargs)
Upload CSV file to Numerai for given model name.
:param file_name: File name/path relative to directory_path.
:param model_name: Lowercase raw model name (for example, 'integration_test').
Source code in src/numerblox/submission.py
NumerBaySubmitter
Bases: BaseSubmitter
Submit to NumerBay to fulfill sale orders, in addition to submission to Numerai.
:param tournament_submitter: Base tournament submitter (NumeraiClassicSubmitter or NumeraiSignalsSubmitter). This submitter will use the same directory path.
:param upload_to_numerai: Whether to also submit to Numerai using the tournament submitter. Defaults to True; set to False to only upload to NumerBay.
:param numerbay_username: NumerBay username.
:param numerbay_password: NumerBay password.
Source code in src/numerblox/submission.py
__call__(dataf, model_name, numerbay_product_full_name, file_name='submission.csv', cols='prediction', *args, **kwargs)
The most common use case will be to create a CSV and submit it immediately after that. full_submission handles this.
Source code in src/numerblox/submission.py
full_submission(dataf, model_name, cols, numerbay_product_full_name, file_name='submission.csv', *args, **kwargs)
Save DataFrame to csv and upload predictions through API.
:param dataf: Main DataFrame containing cols.
:param model_name: Lowercase Numerai model name.
:param numerbay_product_full_name: NumerBay product full name in the format of [category]-[product name], e.g. 'numerai-predictions-numerbay'
:param file_name: Path to save the submission file to, relative to base directory.
:param cols: Columns to be saved in submission file.
1 prediction column for Numerai Classic.
At least 1 prediction column and 1 ticker column for Numerai Signals.
*args, **kwargs are passed to numerapi API.
For example, the version argument in Numerai Classic submissions.
Source code in src/numerblox/submission.py
upload_predictions(file_name, model_name, numerbay_product_full_name, *args, **kwargs)
Upload CSV file to NumerBay (and Numerai if 'upload_to_numerai' is True) for given model name and NumerBay product full name.
:param file_name: File name/path relative to directory_path.
:param model_name: Lowercase raw model name (for example, 'integration_test').
:param numerbay_product_full_name: NumerBay product full name in the format of [category]-[product name], e.g. 'numerai-predictions-numerbay'
Source code in src/numerblox/submission.py
NumeraiClassicSubmitter
Bases: BaseSubmitter
Submit for Numerai Classic.
:param directory_path: Base directory to save and read prediction files from.
:param key: Key object containing valid credentials for Numerai Classic.
:param max_retries: Maximum number of retries for uploading predictions to Numerai.
:param sleep_time: Time to sleep between uploading retries.
:param fail_silently: Whether to skip uploading to Numerai without raising an error. Useful if you are uploading many models in a loop and want to skip models that fail to upload.
*args, **kwargs will be passed to NumerAPI initialization.
Source code in src/numerblox/submission.py
save_csv(dataf, file_name='submission.csv', cols='prediction', *args, **kwargs)
:param dataf: DataFrame which should have at least the following columns:
1. id (as index column)
2. cols (for example, 'prediction_mymodel'). Will be saved in 'prediction' column.
:param file_name: .csv file path.
:param cols: Prediction column name. For example, 'prediction' or 'prediction_mymodel'.
Source code in src/numerblox/submission.py
NumeraiCryptoSubmitter
Bases: BaseSubmitter
Submit for Numerai Crypto.
:param directory_path: Base directory to save and read prediction files from.
:param key: Key object containing valid credentials for Numerai Crypto.
:param max_retries: Maximum number of retries for uploading predictions to Numerai.
:param sleep_time: Time to sleep between uploading retries.
:param fail_silently: Whether to skip uploading to Numerai without raising an error. Useful if you are uploading many models in a loop and want to skip models that fail to upload.
*args, **kwargs will be passed to CryptoAPI initialization.
Source code in src/numerblox/submission.py
save_csv(dataf, cols, file_name='submission.csv', *args, **kwargs)
:param dataf: DataFrame which should have at least the following columns:
1. symbol col
2. signal (Values between 0 and 1 (exclusive))
Additional columns if you include validation data (optional):
3. date (YYYY-MM-DD format date indication)
4. data_type ('val' and 'live' partitions)
:param cols: All cols that are saved in CSV. cols should contain at least 1 ticker column and a 'signal' column. For example: ['bloomberg_ticker', 'signal']
:param file_name: .csv file path.
Source code in src/numerblox/submission.py
NumeraiSignalsSubmitter
Bases: BaseSubmitter
Submit for Numerai Signals.
:param directory_path: Base directory to save and read prediction files from.
:param key: Key object containing valid credentials for Numerai Signals.
:param max_retries: Maximum number of retries for uploading predictions to Numerai.
:param sleep_time: Time to sleep between uploading retries.
:param fail_silently: Whether to skip uploading to Numerai without raising an error. Useful if you are uploading many models in a loop and want to skip models that fail to upload.
*args, **kwargs will be passed to SignalsAPI initialization.
Source code in src/numerblox/submission.py
save_csv(dataf, cols, file_name='submission.csv', *args, **kwargs)
:param dataf: DataFrame which should have at least the following columns:
1. One of supported ticker formats (cusip, sedol, ticker, numerai_ticker or bloomberg_ticker)
2. signal (Values between 0 and 1 (exclusive))
Additional columns if you include validation data (optional):
3. date (YYYY-MM-DD format date indication)
4. data_type ('val' and 'live' partitions)
:param cols: All cols that are saved in CSV. cols should contain at least 1 ticker column and a 'signal' column. For example: ['bloomberg_ticker', 'signal']
:param file_name: .csv file path.
Source code in src/numerblox/submission.py
Model Upload
NumeraiModelUpload
A class to handle the uploading of machine learning models to Numerai's servers.
:param key: API key object containing public and secret keys for NumerAPI authentication.
:param max_retries: Maximum number of attempts to upload the model.
:param sleep_time: Number of seconds to wait between retries.
:param fail_silently: Whether to suppress exceptions during upload.
Source code in src/numerblox/model_upload.py
get_model_mapping
property
Retrieves the mapping of model names to their IDs from the user's Numerai account.
:return: A dictionary mapping model names to model IDs.
__init__(key=None, max_retries=2, sleep_time=10, fail_silently=False, *args, **kwargs)
Initializes the NumeraiModelUpload class with the necessary configuration.
:param key: API key object containing public and secret keys for NumerAPI authentication.
:param max_retries: Maximum number of retry attempts for model upload.
:param sleep_time: Time (in seconds) to wait between retries.
:param fail_silently: If True, suppress errors during model upload.
:param *args: Additional arguments for NumerAPI.
:param **kwargs: Additional keyword arguments for NumerAPI.
Source code in src/numerblox/model_upload.py
create_and_upload_model(model, feature_cols=None, model_name=None, file_path=None, data_version=None, docker_image=None, custom_predict_func=None)
Creates a model prediction function, serializes it, and uploads the model to Numerai.
:param model: The machine learning model object.
:param feature_cols: List of feature column names for predictions. Defaults to None.
:param model_name: The name of the model to upload.
:param file_path: The file path where the serialized model function will be saved.
:param data_version: Data version to use for model upload.
:param docker_image: Docker image to use for model upload.
:param custom_predict_func: Custom prediction function to use instead of the model's predict method.
:return: Upload ID if the upload is successful, None otherwise.
Source code in src/numerblox/model_upload.py
get_available_data_versions()
Retrieves the available data versions for model uploads.
:return: A dictionary of available data versions.
Source code in src/numerblox/model_upload.py
get_available_docker_images()
Retrieves the available Docker images for model uploads.
:return: A dictionary of available Docker images.
Source code in src/numerblox/model_upload.py
Models
EraBoostedXGBRegressor
Bases: XGBRegressor
Custom XGBRegressor model that upweights the worst eras in the data. The worst eras are determined by Corrv2. NOTE: Currently only supports single target regression.
This idea was first proposed by Richard Craib in the Numerai forums: https://forum.numer.ai/t/era-boosted-models/189
Credits to Michael Oliver (mdo) for proposing the 1st XGBoost implementation of era boosting: https://forum.numer.ai/t/era-boosted-models/189/3
:param proportion: Proportion of eras to upweight.
:param trees_per_step: Number of trees to add per iteration.
:param num_iters: Number of total era boosting iterations.
Source code in src/numerblox/models.py
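The era selection step behind era boosting can be sketched without XGBoost: score every era, then pick the worst-scoring proportion to upweight in the next iteration. The function name `worst_eras`, the rounding with `math.ceil` and the default proportion of 0.5 are assumptions for illustration; the model's actual selection logic may differ.

```python
import math


def worst_eras(era_scores, proportion=0.5):
    """Return the era labels with the lowest scores, covering `proportion` of all eras.

    era_scores maps era label -> per era score (for example a correlation).
    """
    n_worst = math.ceil(len(era_scores) * proportion)
    ranked = sorted(era_scores, key=era_scores.get)  # lowest score first
    return ranked[:n_worst]


# Hypothetical per era scores after one boosting step.
scores = {"0001": 0.03, "0002": -0.01, "0003": 0.02, "0004": 0.00}
to_upweight = worst_eras(scores, proportion=0.5)
```

In each iteration the next batch of trees would then be fit with more emphasis on the selected eras.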
get_feature_names_out(input_features=None)
Get output feature names for transformation.
Source code in src/numerblox/models.py
Miscellaneous
AttrDict
Bases: dict
Access dictionary elements as attributes.
Source code in src/numerblox/misc.py
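A minimal sketch of the attribute-access pattern, assuming the common approach of pointing the instance `__dict__` at the dictionary itself. The class name `AttrDictSketch` is hypothetical and is used so the example does not depend on NumerBlox being installed.

```python
class AttrDictSketch(dict):
    """Dictionary whose keys can also be read and written as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Share storage: attribute access and item access hit the same mapping.
        self.__dict__ = self


config = AttrDictSketch(model_name="integration_test")
config.file_name = "submission.csv"  # attribute write is visible as a dictionary item
```

After this, `config.model_name` and `config["file_name"]` both work on the same object.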
Key
Numerai credentials.
Source code in src/numerblox/misc.py
load_key_from_json(file_path, *args, **kwargs)
Initialize Key object from JSON file.
Credentials file must have the following format:
{"pub_id": "PUBLIC_ID", "secret_key": "SECRET_KEY"}
Source code in src/numerblox/misc.py
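The credentials format above can be exercised with the standard library alone. In this sketch the `Credentials` dataclass and `load_credentials` function are hypothetical stand-ins for Key and load_key_from_json, so it runs without NumerBlox; only the JSON layout is taken from the documentation.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Credentials:
    """Hypothetical stand-in for a credentials object."""

    pub_id: str
    secret_key: str


def load_credentials(file_path):
    """Read a JSON file with 'pub_id' and 'secret_key' fields."""
    with open(file_path) as f:
        data = json.load(f)
    return Credentials(pub_id=data["pub_id"], secret_key=data["secret_key"])


# Write a credentials file in the documented format and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "credentials.json"
    path.write_text(json.dumps({"pub_id": "PUBLIC_ID", "secret_key": "SECRET_KEY"}))
    creds = load_credentials(path)
```

A missing field raises a KeyError here, which surfaces a malformed credentials file early.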