import pandas as pd
from sklearn import metrics

# `models` (fitted models plus test data) and `logistic` are assumed to be defined earlier
# Predicted scores from each model on the test set
test_X_ens = pd.DataFrame({'XGBoost': models['XGBoost'].predict_proba(models['test_X_ML'])[:,1],
                           'SVC': logistic(models['SVC'].decision_function(models['test_X_ML'])),
                           'ElasticNet': models['ElasticNet'].predict_proba(models['test_X_ML'])[:,1],
                           'LASSO': models['LASSO'].predict_proba(models['test_X_ML'])[:,1],
                           'logit': models['logit'].predict(models['test_pd'][models['vars']])})
# Rank each model's scores, then sum the ranks across models
rank_X_ens = test_X_ens.rank()
arank_X_ens = rank_X_ens.XGBoost + rank_X_ens.SVC + rank_X_ens.ElasticNet + rank_X_ens.LASSO + rank_X_ens.logit
# AUC and ROC curve for the rank-sum ensemble
auc = metrics.roc_auc_score(models['test_pd'].Restate_Int, arank_X_ens)
fpr, tpr, thresholds = metrics.roc_curve(models['test_pd'].Restate_Int, arank_X_ens)
display = metrics.RocCurveDisplay(fpr=fpr, tpr=tpr, roc_auc=auc)
display.plot()
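The rank-sum idea in the code above can be tried end to end without the course's fitted models. The following is a minimal sketch under stated assumptions: the data are synthetic (from `make_classification`), only two stand-in models are used rather than the five on the slide, and the helper name `rank_sum_ensemble` is hypothetical.

```python
import pandas as pd
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def rank_sum_ensemble(scores: pd.DataFrame) -> pd.Series:
    """Rank each model's scores within its column, then sum ranks across models."""
    return scores.rank().sum(axis=1)


# Synthetic, imbalanced data standing in for the misreporting sample (illustrative only)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Two stand-in models; the slide combines XGBoost, SVC, ElasticNet, LASSO, and logit
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
svc = LinearSVC(max_iter=5000).fit(X_train, y_train)

# One column of test-set scores per model
scores = pd.DataFrame({
    "logit": logit.predict_proba(X_test)[:, 1],
    "SVC": svc.decision_function(X_test),  # raw margins; ranking ignores their scale
})

# Combine by summed ranks and evaluate with AUC
ensemble = rank_sum_ensemble(scores)
auc_ens = metrics.roc_auc_score(y_test, ensemble)
print(f"Rank-sum ensemble AUC: {auc_ens:.3f}")
```

Because ranking depends only on the ordering of scores, a monotonic transform such as the logistic function applied to the SVC margins does not change the ranks, so this sketch uses the raw margins directly.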
Main application: Ensembling
- Idea: Can we predict instances of intentional misreporting?
- Testing: Predicting 10-K/A irregularities using finance, textual style, and topics
Dependent Variable
Intentional misreporting as stated in 10-K/A filings
Independent Variables
- 17 Financial measures
- 20 Style characteristics
- 31 10-K discussion topics
This test mirrors a subset of Brown, Crowley and Elliott (2020 JAR)
We will combine the models from the past two weeks