First we will import the needed packages. We will use the
tidyverse set of packages for code readability and
simplicity, along with glmnet for LASSO and Elastic
Net.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors
library(glmnet)
Loading required package: Matrix
Attaching package: ‘Matrix’
The following objects are masked from ‘package:tidyr’:
expand, pack, unpack
Loaded glmnet 4.1-7
library(yardstick)
Attaching package: ‘yardstick’
The following object is masked from ‘package:readr’:
spec
library(coefplot)
Next, we need to import the dataset for the session's exercises.
While the original csv file is ~38MB, the gzipped version is only 17MB. Using gzipped (or zipped) files is a good way to save storage space while working with csv files in R (and in python as well).
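As a minimal sketch of the gzip round trip, the example below writes a small made-up data frame to both a plain and a gzipped csv, then reads the compressed copy back. It uses base R's gzfile() rather than readr so that it runs without extra packages; the data, file paths, and object names are hypothetical.

```r
# Hypothetical toy data; the values are made up purely for illustration
toy <- data.frame(id = 1:1000,
                  group = rep(c("a", "b"), 500),
                  value = rep(0.5, 1000))

csv_path <- tempfile(fileext = ".csv")
gz_path <- tempfile(fileext = ".csv.gz")

# Write the same data uncompressed and gzip-compressed
write.csv(toy, csv_path, row.names = FALSE)
write.csv(toy, gzfile(gz_path), row.names = FALSE)

# Read the compressed file back through a gzfile() connection
toy_back <- read.csv(gzfile(gz_path))

# Compare on-disk sizes (in bytes)
sizes <- c(csv = file.size(csv_path), gz = file.size(gz_path))
print(sizes)
```

How much space is saved depends on how repetitive the data is, so the ratio here will differ from the one for the session's dataset.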
df <- read_csv('../../Data/S1_data.csv.gz')
Rows: 14301 Columns: 198
── Column specification ────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Filing, date filed_x, FYE_x, restate_filing, Form, Date, loc
dbl (191): gvkey, Firm, sic, year, logtotasset, rsst_acc, chg_recv, chg_inv, soft_assets, pct_chg_cashsales,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df <- df %>% mutate(Restate_Int_f = factor(Restate_Int, levels=c(0,1)))
There are multiple ways to split the data into training and testing samples. The most robust is to separate the two samples by time. We can do this using the techniques taught in part 1.
unique(df$year)
[1] 2002 2003 2004 1999 2000 2001
# Subset the final year to be the testing year
train <- df %>% filter(year < 2004)
test <- df %>% filter(year == 2004)
print(c(nrow(df), nrow(train), nrow(test)))
[1] 14301 11478 2823
The second approach is to randomly assign observations into the training and testing samples. This is a reasonable choice if predictive performance is not the primary goal of the analysis, or if the data is all from the same point in time. This can be done simply by using caret. The code below also stores the DV and IVs separately, although createDataPartition() only needs the outcome vector to build the split.
Y1 <- df$sdvol1
X1 = df %>% select(-sdvol1)
library(caret)
Loading required package: lattice
Registered S3 method overwritten by 'data.table':
method from
print.data.table
Attaching package: ‘caret’
The following objects are masked from ‘package:yardstick’:
precision, recall, sensitivity, specificity
The following object is masked from ‘package:purrr’:
lift
# p specifies the share of observations assigned to the training sample
index <- createDataPartition(df$sdvol1, p=0.8, list=FALSE)
# createDataPartition() returns a 1-column matrix; convert it to a vector for slice()
train2 <- df %>% slice(as.vector(index))
test2 <- df %>% slice(-as.vector(index))
print(c(nrow(df), nrow(train2), nrow(test2)))
[1] 14301 11441 2860
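For comparison, a purely random 80/20 split can be sketched in base R with sample(). This is a simplified illustration on made-up data, not a replacement for createDataPartition(), which additionally stratifies on the outcome; the seed, data, and object names are assumptions.

```r
set.seed(2023)  # arbitrary seed so the split is reproducible

# Hypothetical toy data
toy <- data.frame(y = rnorm(100), x = rnorm(100))

# Draw 80% of the row numbers at random for training
n_train <- floor(0.8 * nrow(toy))
train_rows <- sample(seq_len(nrow(toy)), size = n_train)

toy_train <- toy[train_rows, ]
toy_test <- toy[-train_rows, ]  # everything not drawn goes to testing

print(c(nrow(toy), nrow(toy_train), nrow(toy_test)))
```

Setting a seed before the split makes the assignment repeatable across runs, which matters when comparing models on the same samples.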
First, we will try a simple linear regression using lm() to
replicate a replication of Bao and Datta (2014) from Brown, Crowley and
Elliott (2020) (henceforth BCE). This is simply a linear regression of
future stock return volatility, sdvol1, on the topic
measures from the BCE study.
We will use this as a stepping stone to get to the LASSO model.
BD_eq <- as.formula(paste("sdvol1 ~ ", paste(paste0("Topic_",1:30,"_n_oI"), collapse=" + "), collapse=""))
model_lm <- lm(BD_eq, train)
summary(model_lm)
Call:
lm(formula = BD_eq, data = train)
Residuals:
Min 1Q Median 3Q Max
-0.18799 -0.01707 -0.00646 0.00904 0.49410
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0457521 0.0002674 171.114 < 2e-16 ***
Topic_1_n_oI 1.1709484 0.3404372 3.440 0.000585 ***
Topic_2_n_oI 0.5367261 0.2615383 2.052 0.040174 *
Topic_3_n_oI 0.4004462 0.4160324 0.963 0.335801
Topic_4_n_oI 0.6475066 0.2386256 2.713 0.006668 **
Topic_5_n_oI 0.6776698 0.2462900 2.752 0.005941 **
Topic_6_n_oI 0.5421747 0.3630189 1.494 0.135330
Topic_7_n_oI -0.6519468 0.2858123 -2.281 0.022565 *
Topic_8_n_oI 0.5089414 0.2529234 2.012 0.044219 *
Topic_9_n_oI 2.1940373 0.2245302 9.772 < 2e-16 ***
Topic_10_n_oI 0.6721560 0.2073181 3.242 0.001190 **
Topic_11_n_oI -1.2180112 0.2593631 -4.696 2.68e-06 ***
Topic_12_n_oI -0.0310882 0.2949973 -0.105 0.916072
Topic_13_n_oI 0.5372461 0.8110550 0.662 0.507726
Topic_14_n_oI -1.9815456 0.2785780 -7.113 1.20e-12 ***
Topic_15_n_oI 0.7314503 0.1908809 3.832 0.000128 ***
Topic_16_n_oI -1.8828359 0.4468124 -4.214 2.53e-05 ***
Topic_17_n_oI -0.2010616 0.3389921 -0.593 0.553115
Topic_18_n_oI 1.9874176 0.3746523 5.305 1.15e-07 ***
Topic_19_n_oI 1.4410208 0.2321345 6.208 5.56e-10 ***
Topic_20_n_oI -1.5942810 0.3681695 -4.330 1.50e-05 ***
Topic_21_n_oI 2.9558915 0.3383267 8.737 < 2e-16 ***
Topic_22_n_oI 0.8333680 0.2050302 4.065 4.84e-05 ***
Topic_23_n_oI 0.1696218 0.2128786 0.797 0.425583
Topic_24_n_oI 0.6899507 0.2453693 2.812 0.004934 **
Topic_25_n_oI 0.7990728 0.2134840 3.743 0.000183 ***
Topic_26_n_oI 0.5920543 0.2056951 2.878 0.004006 **
Topic_27_n_oI 1.5067016 0.5758417 2.617 0.008895 **
Topic_28_n_oI 1.2984541 0.2634863 4.928 8.42e-07 ***
Topic_29_n_oI 0.6240868 0.1997018 3.125 0.001782 **
Topic_30_n_oI -0.4545888 0.4645688 -0.979 0.327839
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02864 on 11447 degrees of freedom
Multiple R-squared: 0.1614, Adjusted R-squared: 0.1592
F-statistic: 73.45 on 30 and 11447 DF, p-value: < 2.2e-16
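The formula passed to lm() is assembled from strings with paste(). As a hedged alternative sketch, base R's reformulate() builds the same kind of formula from a character vector of variable names; the object names below are hypothetical.

```r
# Names of the 30 topic measures, following the naming pattern used in BD_eq
topic_vars <- paste0("Topic_", 1:30, "_n_oI")

# Approach 1: paste the formula together as a string
eq_paste <- as.formula(paste("sdvol1 ~", paste(topic_vars, collapse = " + ")))

# Approach 2: let reformulate() assemble the right-hand side
eq_reform <- reformulate(topic_vars, response = "sdvol1")

# Both approaches refer to the same variables in the same order
same_terms <- identical(all.vars(eq_paste), all.vars(eq_reform))
print(same_terms)
```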
For the logistic regression example, we will run a single window of
the intentional restatement prediction test from BCE. For this we need
to use glm instead of lm.
BCE_eq <- as.formula(paste("Restate_Int ~ logtotasset + rsst_acc + chg_recv + chg_inv +
soft_assets + pct_chg_cashsales + chg_roa + issuance +
oplease_dum + book_mkt + lag_sdvol + merger + bigNaudit +
midNaudit + cffin + exfin + restruct + bullets + headerlen +
newlines + alltags + processedsize + sentlen_u + wordlen_s +
paralen_s + repetitious_p + sentlen_s + typetoken +
clindex + fog + active_p + passive_p + lm_negative_p +
lm_positive_p + allcaps + exclamationpoints + questionmarks + ",
paste(paste0("Topic_",1:30,"_n_oI"), collapse=" + "), collapse=""))
model_logit <- glm(BCE_eq, train, family="binomial")
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model_logit)
Call:
glm(formula = BCE_eq, family = "binomial", data = train)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.634e+00 5.591e+00 -1.187 0.23541
logtotasset 9.363e-02 6.442e-02 1.454 0.14607
rsst_acc 3.269e-01 3.226e-01 1.013 0.31095
chg_recv 6.838e-01 1.307e+00 0.523 0.60085
chg_inv -1.428e+00 1.509e+00 -0.947 0.34378
soft_assets 1.451e+00 4.698e-01 3.088 0.00201 **
pct_chg_cashsales -1.230e-03 8.480e-03 -0.145 0.88472
chg_roa -2.584e-01 2.635e-01 -0.981 0.32666
issuance 2.336e-01 4.218e-01 0.554 0.57971
oplease_dum 1.529e-01 3.136e-01 0.488 0.62572
book_mkt 7.977e-03 4.436e-02 0.180 0.85731
lag_sdvol -4.005e-02 1.003e-01 -0.399 0.68984
merger -2.662e-01 2.563e-01 -1.039 0.29903
bigNaudit -1.544e-01 4.452e-01 -0.347 0.72877
midNaudit 3.926e-01 5.218e-01 0.752 0.45180
cffin 5.806e-01 2.970e-01 1.954 0.05065 .
exfin -2.024e-02 2.428e-02 -0.834 0.40447
restruct 6.353e-01 2.158e-01 2.945 0.00323 **
bullets 1.075e-05 3.477e-05 0.309 0.75726
headerlen 2.224e-05 1.128e-04 0.197 0.84368
newlines -3.370e-04 2.108e-04 -1.599 0.10991
alltags -1.430e-07 2.800e-07 -0.511 0.60962
processedsize 5.323e-06 1.840e-06 2.893 0.00381 **
sentlen_u -9.797e-02 1.108e-01 -0.884 0.37658
wordlen_s 9.048e-01 2.063e+00 0.438 0.66103
paralen_s 4.620e-02 2.120e-02 2.179 0.02935 *
repetitious_p 1.594e+00 1.984e+00 0.803 0.42181
sentlen_s -3.233e-02 3.069e-02 -1.053 0.29212
typetoken -8.385e+00 3.526e+00 -2.378 0.01739 *
clindex -3.223e-01 2.461e-01 -1.310 0.19026
fog 2.704e-01 2.311e-01 1.170 0.24198
active_p -8.746e-01 2.264e+00 -0.386 0.69924
passive_p 4.665e+00 3.770e+00 1.238 0.21590
lm_negative_p 9.207e+01 1.840e+01 5.003 5.65e-07 ***
lm_positive_p -4.306e+01 6.455e+01 -0.667 0.50467
allcaps -1.501e-04 1.941e-04 -0.773 0.43940
exclamationpoints -3.360e-02 5.750e-02 -0.584 0.55903
questionmarks -1.432e+01 4.971e+02 -0.029 0.97703
Topic_1_n_oI 1.252e+02 1.376e+02 0.910 0.36285
Topic_2_n_oI 7.744e+01 9.375e+01 0.826 0.40882
Topic_3_n_oI 5.209e+01 1.643e+02 0.317 0.75125
Topic_4_n_oI 1.301e+02 9.796e+01 1.329 0.18400
Topic_5_n_oI 3.121e+01 1.029e+02 0.303 0.76169
Topic_6_n_oI 4.627e+01 1.556e+02 0.297 0.76616
Topic_7_n_oI 1.846e+02 1.147e+02 1.610 0.10750
Topic_8_n_oI 2.201e+01 1.046e+02 0.210 0.83338
Topic_9_n_oI 1.211e+02 9.787e+01 1.238 0.21589
Topic_10_n_oI 9.114e+01 9.025e+01 1.010 0.31255
Topic_11_n_oI 2.348e+01 1.051e+02 0.223 0.82327
Topic_12_n_oI 1.916e+01 1.182e+02 0.162 0.87124
Topic_13_n_oI 2.393e+02 2.073e+02 1.154 0.24839
Topic_14_n_oI 7.659e+01 1.081e+02 0.709 0.47848
Topic_15_n_oI 8.002e+01 7.623e+01 1.050 0.29382
Topic_16_n_oI 1.969e+02 1.554e+02 1.267 0.20517
Topic_17_n_oI -1.373e+02 1.890e+02 -0.726 0.46774
Topic_18_n_oI 3.698e+01 1.511e+02 0.245 0.80666
Topic_19_n_oI 5.055e+01 9.465e+01 0.534 0.59326
Topic_20_n_oI -2.146e+02 2.667e+02 -0.805 0.42097
Topic_21_n_oI 7.994e+01 1.282e+02 0.624 0.53293
Topic_22_n_oI -1.302e+01 1.030e+02 -0.126 0.89944
Topic_23_n_oI 1.911e+02 8.272e+01 2.311 0.02085 *
Topic_24_n_oI 1.717e+00 1.633e+02 0.011 0.99161
Topic_25_n_oI 1.145e+02 8.230e+01 1.391 0.16425
Topic_26_n_oI 7.543e+01 8.073e+01 0.934 0.35013
Topic_27_n_oI -5.451e+01 2.389e+02 -0.228 0.81955
Topic_28_n_oI 8.490e+00 1.117e+02 0.076 0.93943
Topic_29_n_oI 8.017e+01 7.911e+01 1.013 0.31085
Topic_30_n_oI 6.847e+01 1.601e+02 0.428 0.66890
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1414.5 on 11477 degrees of freedom
Residual deviance: 1244.1 on 11410 degrees of freedom
AIC: 1380.1
Number of Fisher Scoring iterations: 20
For in-sample prediction, we can apply the predict()
function to our current data. For out-of-sample prediction, we can
likewise apply predict() to our testing data.
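A minimal sketch of the two uses of predict(), using the mtcars data that ships with R instead of the session's dataset. The odd/even row split and the model specification are arbitrary choices for illustration.

```r
# Arbitrary split of mtcars into two halves
train_toy <- mtcars[seq(1, nrow(mtcars), by = 2), ]
test_toy <- mtcars[seq(2, nrow(mtcars), by = 2), ]

# Fit on the training half only
fit_toy <- lm(mpg ~ wt + hp, data = train_toy)

# In-sample: predict on the data used for fitting
pred_in <- predict(fit_toy, train_toy)

# Out-of-sample: predict on data the model has not seen
pred_out <- predict(fit_toy, test_toy)

head(pred_out)
```

In-sample predictions coincide with the model's fitted values; only the out-of-sample predictions say anything about how the model generalizes.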
What we really want are quantitative measures of how well a model
works. In R, these tend to be scattered across packages, though
packages like yardstick attempt to provide an
all-around solution (typically only within a certain ecosystem). For
our uses, we will manually calculate metrics for linear regression, and
will use yardstick for the logistic regression
evaluation.
We will check in-sample and out-of-sample scores for the following: RMSE and MAE for the linear regression, and ROC AUC for the logistic regression.
The linear regression metrics are quite easy to calculate in R, given its vector-oriented nature.
apply_rmse <- function(v1, v2) {
sqrt(mean((v1 - v2)^2, na.rm=T))
}
apply_mae <- function(v1, v2) {
mean(abs(v1-v2), na.rm=T)
}
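As a quick check of these two helpers, the sketch below applies them to a tiny made-up example where the answer can be worked out by hand. The vectors are hypothetical, and the functions are repeated so the block runs on its own.

```r
apply_rmse <- function(v1, v2) sqrt(mean((v1 - v2)^2, na.rm = TRUE))
apply_mae <- function(v1, v2) mean(abs(v1 - v2), na.rm = TRUE)

# Errors are 0, 0, and 2; the last pair is dropped because of the NA
actual <- c(1, 2, 3, NA)
predicted <- c(1, 2, 5, 4)

rmse_toy <- apply_rmse(actual, predicted)  # sqrt((0 + 0 + 4) / 3)
mae_toy <- apply_mae(actual, predicted)    # (0 + 0 + 2) / 3

print(c(rmse = rmse_toy, mae = mae_toy))
```

RMSE squares the errors before averaging, so it penalizes the single large miss more heavily than MAE does.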
Next, we can apply them to our training and testing data.
# Linear, in-sample
Y_hat_train = predict(model_lm, train)
# Compute RMSE and MAE with the helper functions defined above.
rmse = apply_rmse(train$sdvol1, Y_hat_train)
mae = apply_mae(train$sdvol1, Y_hat_train)
print(paste0('RMSE: ', rmse, ' MAE: ', mae))
[1] "RMSE: 0.0286054321352924 MAE: 0.0190667869656718"
# Linear, out-of-sample
Y_hat_test = predict(model_lm, test)
# Compute RMSE and MAE with the helper functions defined above.
rmse = apply_rmse(test$sdvol1, Y_hat_test)
mae = apply_mae(test$sdvol1, Y_hat_test)
print(paste0('RMSE: ', rmse, ' MAE: ', mae))
[1] "RMSE: 0.0223173146894147 MAE: 0.0188355387294677"
As seen above, performance out-of-sample is quite similar to performance in-sample for predicting stock market volatility. This is a good sign for our model.
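To calculate ROC AUC, we will use the roc_auc() function from
yardstick. It expects the outcome as a factor, which is why we use
Restate_Int_f; event_level='second' marks the second factor level (1) as the event.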
# Logit, in-sample
train$Y_hat_train <- predict(model_logit, train, type="response")
auc_out <- train %>% roc_auc(Restate_Int_f, Y_hat_train, event_level='second')
print(paste0('ROC AUC: ', auc_out$.estimate))
[1] "ROC AUC: 0.779455349342667"
# Logit, out-of-sample
test$Y_hat_test <- predict(model_logit, test, type="response")
auc_out <- test %>% roc_auc(Restate_Int_f, Y_hat_test, event_level='second')
print(paste0('ROC AUC: ', auc_out$.estimate))
[1] "ROC AUC: 0.609252244304914"
In this case, we see that out-of-sample performance is much lower: the ROC AUC falls from about 0.78 in-sample to about 0.61 out-of-sample.
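To make the ROC AUC numbers more concrete: AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it from ranks on made-up data. The helper auc_by_rank() is hypothetical and is not necessarily how yardstick computes the metric internally.

```r
# Rank-based AUC (Mann-Whitney U statistic scaled to [0, 1])
auc_by_rank <- function(labels, scores) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # average ranks are used for tied scores
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical labels and predicted probabilities
labels_toy <- c(0, 0, 1, 1)
scores_toy <- c(0.1, 0.4, 0.35, 0.8)

# Of the 4 positive/negative pairs, 3 are ordered correctly
auc_toy <- auc_by_rank(labels_toy, scores_toy)
print(auc_toy)
```

A value of 0.5 corresponds to random ordering, which is why the diagonal line is the usual benchmark in an ROC plot.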
We can also plot an ROC curve by calculating the True Positive
Rate and False Positive Rate at each threshold. We can use roc_curve()
from yardstick to compute these, and ggplot2 to get a nice
visualization.
test$Y_hat_test <- predict(model_logit, test, type="response")
curve_test <- test %>%
roc_curve(Restate_Int_f, Y_hat_test, event_level='second')
train$Y_hat_train <- predict(model_logit, train, type="response")
curve_train <- train %>%
roc_curve(Restate_Int_f, Y_hat_train, event_level='second')
curve_test <- curve_test %>% group_by(sensitivity) %>% slice(c(1, n())) %>% ungroup()
curve_train <- curve_train %>% group_by(sensitivity) %>% slice(c(1, n())) %>% ungroup()
ggplot() +
geom_line(data=curve_test, aes(y=sensitivity, x=1-specificity, color="Testing")) +
geom_line(data=curve_train, aes(y=sensitivity, x=1-specificity, color="Training")) +
geom_abline(slope=1)
One of the simplest machine learning approaches to implement is
LASSO. A nice implementation of LASSO is available in
glmnet. LASSO is not far removed from the OLS and logistic
regressions we have seen – the only difference between them is that
LASSO adds an L1-norm penalty based on the coefficients in the model.
More specifically, the penalty is the sum of the absolute values of the
model coefficients times a penalty parameter.
Because there is a new parameter, though, minimizing the sum of squared errors alone no longer pins down the model. We also need to provide a value for the penalty parameter, or ideally optimize it.
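Written out for the linear case, the LASSO problem can be sketched as follows. The notation is an assumption introduced here: \(N\) observations, \(p\) predictors, intercept \(\beta_0\), coefficients \(\beta_j\), and penalty parameter \(\lambda\); the \(1/(2N)\) scaling follows the convention in the glmnet documentation.

```latex
\[
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\]
```

With \(\lambda = 0\) the penalty vanishes and the problem reduces to OLS. As \(\lambda\) grows, more coefficients are pushed to exactly zero.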
We will start by re-running our above analyses in a single iteration. Note that for glmnet, it is generally best to let it optimize the penalty rather than supplying your own value(s).
Note that glmnet expects us to pass matrices rather than
data.frames. As such, we need to get our data in shape first. This is
easy to accomplish using the Base R command
model.matrix().
x_lm <- model.matrix(BD_eq, data=train)[,-1] # [,-1] to remove intercept
y_lm <- model.frame(BD_eq, data=train)[,"sdvol1"]
fit_LASSO_lm <- glmnet(x=x_lm, y=y_lm,
family = "gaussian",
alpha = 1 # Specifies LASSO. alpha = 0 is ridge
)
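To illustrate what model.matrix() produces, here is a minimal sketch on a made-up data frame with one numeric and one factor variable. The data and object names are hypothetical.

```r
# Hypothetical toy data with a factor predictor
toy <- data.frame(y = c(1.2, 0.7, 3.1, 2.4),
                  size = c(10, 12, 9, 15),
                  sector = factor(c("a", "b", "c", "a")))
toy_eq <- y ~ size + sector

# Numeric design matrix; [, -1] drops the intercept column
x_toy <- model.matrix(toy_eq, data = toy)[, -1]

# Outcome taken from the matching model frame
y_toy <- model.frame(toy_eq, data = toy)[, "y"]

print(colnames(x_toy))
print(dim(x_toy))
```

The factor is expanded into dummy columns automatically. Taking the outcome from model.frame() with the same formula keeps x and y aligned, since both apply the same default handling of rows with missing values.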
To easily view the model’s coefficients, we can use the
coefplot package.
coefplot(fit_LASSO_lm, sort='magnitude')
To view the paths coefficients take for differing penalties, we can
simply call plot() on the model.
plot(fit_LASSO_lm)
The logistic version is pretty much the same as the linear one above, except we will swap out
family="gaussian" for family="binomial".
x <- model.matrix(BCE_eq, data=train)[,-1] # [,-1] to remove intercept
y <- model.frame(BCE_eq, data=train)[,"Restate_Int"]
fit_LASSO_logit <- glmnet(x=x, y=y,
family = "binomial",
alpha = 1 # Specifies LASSO. alpha = 0 is ridge
)
coefplot(fit_LASSO_logit, sort='magnitude')
plot(fit_LASSO_logit)
Using cross-validation is built right into glmnet. We
can simply call the correspondingly tweaked function, tell it how many
folds we want, and let it optimize!
For linear LASSO, the cross-validation approach will optimize \([R]MSE\) by default. We can specify some other measures, but \(R^2\) is not an option.
cvfit_lm = cv.glmnet(x=x_lm, y=y_lm, family = "gaussian", alpha = 1, type.measure="mse")
For the CV function, the default plot shows the path of the chosen measure type. The first dashed line corresponds to the optimal parameter, while the second corresponds to the variant that is simpler, yet within 1 standard error of the optimal parameter.
plot(cvfit_lm)
To see the parameters, we can call the following:
cvfit_lm$lambda.min
[1] 0.0002988144
cvfit_lm$lambda.1se
[1] 0.002313612
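To make the relationship between the two values concrete, the sketch below reproduces the one-standard-error rule by hand on simulated data, using the lambda, cvm, and cvsd components that cv.glmnet() returns. The simulated data, seed, and object names are assumptions for illustration.

```r
library(glmnet)

set.seed(1)  # arbitrary seed; folds are assigned at random
n <- 200
p <- 10
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- x_sim[, 1] - 0.5 * x_sim[, 2] + rnorm(n)

cv_sim <- cv.glmnet(x = x_sim, y = y_sim, family = "gaussian",
                    alpha = 1, type.measure = "mse", nfolds = 5)

# Position of the lowest cross-validated MSE
i_min <- which.min(cv_sim$cvm)

# Largest penalty whose CV error is within one standard error of that minimum
threshold <- cv_sim$cvm[i_min] + cv_sim$cvsd[i_min]
lambda_1se_manual <- max(cv_sim$lambda[cv_sim$cvm <= threshold])

print(c(min = cv_sim$lambda.min, one_se = cv_sim$lambda.1se,
        manual = lambda_1se_manual))
```

A larger penalty means a sparser model, so lambda.1se trades a small amount of cross-validated accuracy for simplicity.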
We can take a look at both models:
#coef(cvfit_lm, s = "lambda.min")
coefplot(cvfit_lm, lambda='lambda.min', sort='magnitude')
coefplot(cvfit_lm, lambda='lambda.1se', sort='magnitude')
The logistic case is again pretty similar, just changing the family and measure.
cvfit_logit = cv.glmnet(x=x, y=y, family = "binomial", alpha = 1, type.measure="auc")
plot(cvfit_logit)
cvfit_logit$lambda.min
[1] 0.003052213
cvfit_logit$lambda.1se
[1] 0.006424615
coefplot(cvfit_logit, lambda='lambda.min', sort='magnitude')
coefplot(cvfit_logit, lambda='lambda.1se', sort='magnitude')
We can quickly gauge model performance on our test data using
assess.glmnet().
newx <- model.matrix(BCE_eq, data=test)[,-1] # [,-1] to remove intercept
newy <- model.frame(BCE_eq, data=test)[,"Restate_Int"]
assess.glmnet(cvfit_logit, newx = newx, newy = newy, s='lambda.min')
$deviance
lambda.min
0.2319859
attr(,"measure")
[1] "Binomial Deviance"
$class
lambda.min
0.02479632
attr(,"measure")
[1] "Misclassification Error"
$auc
[1] 0.6392663
attr(,"measure")
[1] "AUC"
$mse
lambda.min
0.04829721
attr(,"measure")
[1] "Mean-Squared Error"
$mae
lambda.min
0.07534207
attr(,"measure")
[1] "Mean Absolute Error"
assess.glmnet(cvfit_logit, newx = newx, newy = newy, s='lambda.1se')
$deviance
lambda.1se
0.2401724
attr(,"measure")
[1] "Binomial Deviance"
$class
lambda.1se
0.02479632
attr(,"measure")
[1] "Misclassification Error"
$auc
[1] 0.6228167
attr(,"measure")
[1] "AUC"
$mse
lambda.1se
0.04861516
attr(,"measure")
[1] "Mean-Squared Error"
$mae
lambda.1se
0.07223063
attr(,"measure")
[1] "Mean Absolute Error"
One drawback of glmnet is that it does not supply a way to
cross-validate the weighting between L1 and L2 regularization (alpha in
glmnet). To tune alpha, one would need to do a grid search using something
like caret or parsnip + tune.
For now, we will just do a simple example that is halfway between LASSO (L1) and Ridge (L2).
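For reference, the elastic net penalty as parameterized in the glmnet documentation can be sketched as below. The symbols are assumptions introduced here: \(\lambda\) is the overall penalty strength, \(\alpha\) the mixing weight, and \(\beta_j\) the coefficients.

```latex
\[
\lambda \left[ \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert
\;+\; \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^{2} \right]
\]
```

Setting \(\alpha = 1\) gives the LASSO penalty and \(\alpha = 0\) gives ridge. With \(\alpha = 0.5\), the L1 term is weighted by \(0.5\lambda\) and the squared L2 term by \(0.25\lambda\).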
cvfit_en = cv.glmnet(x=x, y=y, family = "binomial", alpha = 0.5, type.measure="auc")
plot(cvfit_en)
cvfit_en$lambda.min
[1] 0.002407711
cvfit_en$lambda.1se
[1] 0.01410201
coefplot(cvfit_en, lambda='lambda.min', sort='magnitude')
coefplot(cvfit_en, lambda='lambda.1se', sort='magnitude')
assess.glmnet(cvfit_en, newx = newx, newy = newy, s='lambda.min')
$deviance
lambda.min
0.2285354
attr(,"measure")
[1] "Binomial Deviance"
$class
lambda.min
0.02479632
attr(,"measure")
[1] "Misclassification Error"
$auc
[1] 0.65775
attr(,"measure")
[1] "AUC"
$mse
lambda.min
0.04805499
attr(,"measure")
[1] "Mean-Squared Error"
$mae
lambda.min
0.07634622
attr(,"measure")
[1] "Mean Absolute Error"
assess.glmnet(cvfit_en, newx = newx, newy = newy, s='lambda.1se')
$deviance
lambda.1se
0.2430068
attr(,"measure")
[1] "Binomial Deviance"
$class
lambda.1se
0.02479632
attr(,"measure")
[1] "Misclassification Error"
$auc
[1] 0.6157127
attr(,"measure")
[1] "AUC"
$mse
lambda.1se
0.04869231
attr(,"measure")
[1] "Mean-Squared Error"
$mae
lambda.1se
0.07139904
attr(,"measure")
[1] "Mean Absolute Error"
One could do a simple optimization using something like the following, however:
alphas <- c(0, 0.2, 0.4, 0.6, 0.8, 1)
cv_ens <- lapply(alphas, function(alpha){
cv.glmnet(x=x, y=y, family="binomial", alpha=alpha, type.measure='auc')
})
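# cvm holds the cross-validated AUC at each lambda. Note that min() keeps the
# lowest AUC on each path; since higher AUC is better, max() is the usual choice.
aucs <- sapply(seq_along(alphas), function(i){
min(cv_ens[[i]]$cvm)
})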
aucs
[1] 0.6225651 0.5395982 0.5404898 0.5861872 0.5642322 0.5892308
model_to_choose <- which.max(aucs)
model_to_choose
[1] 1
selected_model <- cv_ens[[model_to_choose]]
# Usual output
plot(selected_model)
selected_model$lambda.min
[1] 0.01667203
selected_model$lambda.1se
[1] 7.051007
coefplot(selected_model, lambda='lambda.min', sort='magnitude')
coefplot(selected_model, lambda='lambda.1se', sort='magnitude')
assess.glmnet(selected_model, newx = newx, newy = newy, s='lambda.min')
$deviance
lambda.min
0.22894
attr(,"measure")
[1] "Binomial Deviance"
$class
lambda.min
0.02479632
attr(,"measure")
[1] "Misclassification Error"
$auc
[1] 0.6615381
attr(,"measure")
[1] "AUC"
$mse
lambda.min
0.04806021
attr(,"measure")
[1] "Mean-Squared Error"
$mae
lambda.min
0.07538013
attr(,"measure")
[1] "Mean Absolute Error"
assess.glmnet(selected_model, newx = newx, newy = newy, s='lambda.1se')
$deviance
lambda.1se
0.2444658
attr(,"measure")
[1] "Binomial Deviance"
$class
lambda.1se
0.02479632
attr(,"measure")
[1] "Misclassification Error"
$auc
[1] 0.6391936
attr(,"measure")
[1] "AUC"
$mse
lambda.1se
0.04872675
attr(,"measure")
[1] "Mean-Squared Error"
$mae
lambda.1se
0.07099867
attr(,"measure")
[1] "Mean Absolute Error"
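When comparing alpha values by cross-validated AUC, two details are worth sketching: each model is usually summarized by its best (highest) AUC along the lambda path, and the same fold assignment is passed to every fit through foldid so the alphas are compared on identical splits. The example below is a hedged illustration on simulated data; the data, seed, grid, and object names are all assumptions.

```r
library(glmnet)

set.seed(42)  # arbitrary seed for the simulated data and folds
n <- 400
p <- 8
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- rbinom(n, 1, plogis(x_sim[, 1] - x_sim[, 2]))

# One shared fold assignment so every alpha sees the same splits
foldid <- sample(rep(1:5, length.out = n))

alphas <- c(0, 0.5, 1)
cv_fits <- lapply(alphas, function(a) {
  cv.glmnet(x = x_sim, y = y_sim, family = "binomial", alpha = a,
            type.measure = "auc", foldid = foldid)
})

# Best cross-validated AUC along each lambda path (higher is better)
best_auc <- sapply(cv_fits, function(fit) max(fit$cvm))

best_alpha <- alphas[which.max(best_auc)]
print(rbind(alpha = alphas, auc = best_auc))
print(best_alpha)
```

Because the winning alpha is picked on cross-validated performance, the test sample should still be used only once, to assess the final chosen model.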