Postprocessing
Feature Neutralization
FeatureNeutralizer
provides classic feature neutralization by subtracting linear model influence, ensuring that predictions are not overly influenced by a specific set of features.
Why?
- Reduce Overfitting: By neutralizing predictions, you can potentially reduce the risk of overfitting to specific feature characteristics.
- Control Feature Influence: Allows you to have a granular control on how much influence a set of features can exert on the final predictions.
- Enhance Model Robustness: By limiting the influence of potentially noisy or unstable features, you might improve the robustness of your model's predictions across different data periods.
Quickstart
Make sure to pass both the features to use for penalization as a pd.DataFrame
and the accompanying era column as a pd.Series
to the predict
method.
Additionally, pred_name
and proportion
can be lists. In this case, the neutralization will be performed for each prediction name and proportion. For example, if pred_name=["prediction1", "prediction2"]
and proportion=[0.5, 0.7]
, then the result will be an array with 4 neutralized prediction columns.
All neutralizations will be performed in parallel.
Single column neutralization:
import pandas as pd
from numerblox.neutralizers import FeatureNeutralizer
predictions = pd.Series([0.24, 0.87, 0.6])
feature_data = pd.DataFrame([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
era_data = pd.Series([1, 1, 2])
neutralizer = FeatureNeutralizer(pred_name="prediction", proportion=0.5)
neutralizer.fit()
neutralized_predictions = neutralizer.predict(X=predictions, features=feature_data, era_series=era_data)
Multiple column neutralization:
import pandas as pd
from numerblox.neutralizers import FeatureNeutralizer
predictions = pd.DataFrame({"prediction1": [0.24, 0.87, 0.6], "prediction2": [0.24, 0.87, 0.6]})
feature_data = pd.DataFrame([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
era_data = pd.Series([1, 1, 2])
neutralizer = FeatureNeutralizer(pred_name=["prediction1", "prediction2"], proportion=[0.5, 0.7])
neutralizer.fit()
neutralized_predictions = neutralizer.predict(X=predictions, features=feature_data, era_series=era_data)
FeaturePenalizer
FeaturePenalizer
neutralizes predictions using TensorFlow based on provided feature exposures. It's designed to integrate seamlessly with scikit-learn.
Why?
- Limit Feature Exposure: Ensures that predictions are not excessively influenced by any individual feature, which can help in achieving more stable predictions.
- Enhanced Prediction Stability: By penalizing high feature exposures, it might lead to more stable and consistent predictions across different eras or data splits.
- Mitigate Model Biases: If a model is relying too heavily on a particular feature, penalizing can help in balancing out the biases and making the model more generalizable.
Quickstart
Make sure to pass both the features to use for penalization as a pd.DataFrame
and the accompanying era column as a pd.Series
to the predict
method.
from numerblox.penalizers import FeaturePenalizer
predictions = pd.Series([0.24, 0.87, 0.6])
feature_data = pd.DataFrame([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
era_data = pd.Series([1, 1, 2])
penalizer = FeaturePenalizer(max_exposure=0.1, pred_name="prediction")
penalizer.fit(X=predictions)
penalized_predictions = penalizer.predict(X=predictions, features=feature_data, era_series=era_data)