| Title: | Univariate-Guided Sparse Regression |
|---|---|
| Description: | Fit a univariate-guided sparse regression (lasso), by a two-stage procedure. The first stage fits p separate univariate models to the response. The second stage gives more weight to the more important univariate features, and preserves their signs. Conveniently, it returns an objects that inherits from class 'glmnet', so that all of the methods for 'glmnet' are available. See Chatterjee, Hastie and Tibshirani (2025) <doi:10.1162/99608f92.c79ff6db> for details. |
| Authors: | Trevor Hastie [aut, cre], Rob Tibshirani [aut], Sourav Chatterjee [aut] |
| Maintainer: | Trevor Hastie <[email protected]> |
| License: | GPL-2 |
| Version: | 2.11 |
| Built: | 2026-06-03 07:19:06 UTC |
| Source: | https://github.com/trevorhastie/unilasso |
Fit a univariate-guided sparse regression (lasso), by a two-stage procedure.
The first stage fits p separate univariate models to the response. The second stage gives more weight to the more important univariate features, and preserves their signs.
Conveniently, it returns an objects that inherits from glmnet, so that
all of the methods for glmnet can be applied, such as predict, plot, coef andprint.
Fit a univariate-guided regression, by a two-stage procedure.
The first stage fits p separate univariate models to the response. The second stage fits a regression model, preserving the univariate signs.
ci.uniReg( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, B = 500, alpha = 0.05, ... ) uniLasso( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, loo = TRUE, lower.limits = 0, standardize = FALSE, info = NULL, loob.nit = 2, loob.ridge = 0, loob.eps = 1e-06, ... ) uniReg( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, loo = TRUE, lower.limits = 0, standardize = FALSE, info = NULL, loob.nit = 2, loob.ridge = 0, loob.eps = 1e-04, hard.zero = TRUE, ... )ci.uniReg( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, B = 500, alpha = 0.05, ... ) uniLasso( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, loo = TRUE, lower.limits = 0, standardize = FALSE, info = NULL, loob.nit = 2, loob.ridge = 0, loob.eps = 1e-06, ... ) uniReg( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, loo = TRUE, lower.limits = 0, standardize = FALSE, info = NULL, loob.nit = 2, loob.ridge = 0, loob.eps = 1e-04, hard.zero = TRUE, ... )
x |
Input matrix, of dimension |
y |
Response variable. Quantitative for |
family |
one of "gaussian","binomial" or "cox". Currently only these families are implemented. In the future others will be added. |
weights |
optional vector of non-negative weights, default is NULL which results in all weights = 1. |
B |
Number of bootstrap samples. Default is 500. |
alpha |
size of confidence interval. |
loo |
TRUE (the default) means that uniLasso uses the prevalidated loo fits (approximate loo or 'alo' for "binomial" and "cox") for each univariate model as features to avoid overfitting.
|
lower.limits |
= 0 (default) means that uniLasso constrains the sign of the coefs produced in the second round to be the same as those in the univariate fits. (Since uniLasso uses the univariate fits as features, a positivity constraint at the second stage is equivalent.) |
standardize |
input argument to glmnet for final non-negative lasso fit. Strongly recommend |
info |
Users can supply results of |
loob.nit |
Number of Newton iterations for GLM or Cox in computing univariate linear predictors. Default is 2. |
loob.ridge |
A nonnegative number to apply ridge penalization to the slope parameters. This is helpful if some of the variables are near constant or have very small standard deviations. Default is 0.0. |
loob.eps |
A small number used in regularizing the Hessian for the Cox model. Default is 1e-6. |
hard.zero |
if |
... |
additional arguments passed to |
Fits a two stage lasso model. First stage replaces each feature by the univariate fit for that feature. Second stage fits a (positive) lasso using the first stage features (which preserves the signs of the first stage model). Hence the second stage selects and modifies the coefficients of the first stage model, similar to the adaptive lasso. Leads to sparser and more interpretable models.
For "binomial" family y is a binary response.
For "cox" family, y should be a Surv object for right censored data,
or a matrix with columns labeled 'time' and 'status'
Although glmnet has more flexible options say for binary responses, and for cox
responses, these are not yet implemented (but are possible and will appear in future versions).
Likewise, other glm families are possible as well, but not yet implemented.
loo = TRUE means it uses the prevalidated loo fits (approximate loo or 'alo' for binomial and cox) for each univariate model as features to avoid overfitting in the second stage. The coefficients are then multiplied into the original univariate coefficients to get the final model.
loo = FALSE means it uses the univariate fitted predictor, and hence it is a form of adaptive lasso, but tends to overfit.
lower.limits = 0 means uniLasso constrains the sign of the coefs in the second round to be that of the univariate fits.
An object that inherits from "glmnet". There is one additional parameter returned, which is info and has two components.
They are beta0 and beta, the intercepts and slopes for the usual (non-LOO) univariate fits from stage 1.
# ci.uniReg usage sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma ci <- ci.uniReg(x, y, B=100) print(ci) # uniLasso usage # Default uniLasso usage for Gaussian data sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma xtest=matrix(rnorm(n * p), n, p) ytest <- xtest %*% beta + rnorm(n)*sigma fit <- uniLasso(x, y) plot(fit) predict(fit,xtest[1:10,],s=1) #predict on test data # Two-stage variation where we carve off a small dataset for computing the univariate coefs. cset=1:20 info = uniInfo(x[cset,],y[cset]) fit_two_stage <- uniLasso(x[-cset,], y[-cset], info = info) plot(fit_two_stage) # Binomial response uniLasso yb =as.numeric(y>0) fitb = uniLasso(x, y) predict(fitb, xtest[1:10,], s=1, type="response") # uniLasso with same positivity constraints, but starting `beta` # from univariate fits on the same data. With loo=FALSE, does not tend to do as well, # probably due to overfitting. fit_pos_adapt <- uniLasso(x, y, loo = FALSE) plot(fit_pos_adapt) # uniLasso with no constraints, but starting `beta` from univariate fits. # This is a version of the adaptive lasso, which tends to overfit, and loses interpretability. fit_adapt <- uniLasso(x, y, loo = FALSE, lower.limits = -Inf) plot(fit_adapt) # Cox response uniLasso set.seed(10101) N = 1000 p = 30 nzc = p/3 x = matrix(rnorm(N * p), N, p) beta = rnorm(nzc) fx = x[, seq(nzc)] %*% beta/3 hx = exp(fx) ty = rexp(N, hx) tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator y = cbind(time = ty, status = 1 - tcens) # y=Surv(ty,1-tcens) with library(survival) fitc = uniLasso(x, y, family = "cox") plot(fitc) # uniReg usage sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma xtest=matrix(rnorm(n * p), n, p) fit <- uniReg(x, y) predict(fit,xtest[1:10,]) #predict on test data coef(fit) print(fit) fita <- uniReg(x, y, hard.zero = FALSE) print(fita) fitb <- uniReg(x, y>0, family = "binomial") coef(fitb) print(fitb)# ci.uniReg usage sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma ci <- ci.uniReg(x, y, B=100) print(ci) # uniLasso usage # Default uniLasso usage for Gaussian data sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma xtest=matrix(rnorm(n * p), n, p) ytest <- xtest %*% beta + rnorm(n)*sigma fit <- uniLasso(x, y) plot(fit) predict(fit,xtest[1:10,],s=1) #predict on test data # Two-stage variation where we carve off a small dataset for computing the univariate coefs. cset=1:20 info = uniInfo(x[cset,],y[cset]) fit_two_stage <- uniLasso(x[-cset,], y[-cset], info = info) plot(fit_two_stage) # Binomial response uniLasso yb =as.numeric(y>0) fitb = uniLasso(x, y) predict(fitb, xtest[1:10,], s=1, type="response") # uniLasso with same positivity constraints, but starting `beta` # from univariate fits on the same data. With loo=FALSE, does not tend to do as well, # probably due to overfitting. fit_pos_adapt <- uniLasso(x, y, loo = FALSE) plot(fit_pos_adapt) # uniLasso with no constraints, but starting `beta` from univariate fits. # This is a version of the adaptive lasso, which tends to overfit, and loses interpretability. fit_adapt <- uniLasso(x, y, loo = FALSE, lower.limits = -Inf) plot(fit_adapt) # Cox response uniLasso set.seed(10101) N = 1000 p = 30 nzc = p/3 x = matrix(rnorm(N * p), N, p) beta = rnorm(nzc) fx = x[, seq(nzc)] %*% beta/3 hx = exp(fx) ty = rexp(N, hx) tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator y = cbind(time = ty, status = 1 - tcens) # y=Surv(ty,1-tcens) with library(survival) fitc = uniLasso(x, y, family = "cox") plot(fitc) # uniReg usage sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma xtest=matrix(rnorm(n * p), n, p) fit <- uniReg(x, y) predict(fit,xtest[1:10,]) #predict on test data coef(fit) print(fit) fita <- uniReg(x, y, hard.zero = FALSE) print(fita) fitb <- uniReg(x, y>0, family = "binomial") coef(fitb) print(fitb)
Fit a univariate-guided sparse regression uniLasso model using cross-validation to select the second stage lasso penalty parameter. Conveniently, it returns an object that inherits from cv.glmnet, so that
all of the methods for cv.glmnet can be applied, such as predict, plot, coef, print,
and assess.glmnet.
Fit a cross-validated univariate-guided sparse regression uniLasso model, with a focus on the end of the path which corresponds to the uniReg fit. Conveniently, it returns an object that inherits from cv.glmnet,and methods such as predict, plot, coef, print all gives sensible results.
cv.uniLasso( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, loo = TRUE, lower.limits = 0, standardize = FALSE, info = NULL, loob.nit = 2, loob.ridge = 0, loob.eps = 1e-06, ... ) cv.uniReg( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, loo = TRUE, lower.limits = 0, standardize = FALSE, info = NULL, loob.nit = 2, loob.ridge = 0, loob.eps = 1e-06, hard.zero = TRUE, ... )cv.uniLasso( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, loo = TRUE, lower.limits = 0, standardize = FALSE, info = NULL, loob.nit = 2, loob.ridge = 0, loob.eps = 1e-06, ... ) cv.uniReg( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, loo = TRUE, lower.limits = 0, standardize = FALSE, info = NULL, loob.nit = 2, loob.ridge = 0, loob.eps = 1e-06, hard.zero = TRUE, ... )
x |
Input matrix, of dimension |
y |
Response variable. Quantitative for |
family |
one of "gaussian","binomial" or "cox". Currently only these families are implemented. In the future others will be added. |
weights |
optional vector of non-negative weights, default is NULL which results in all weights = 1. |
loo |
TRUE (the default) means that uniLasso uses the prevalidated loo fits (approximate loo or 'alo' for "binomial" and "cox") for each univariate model as features to avoid overfitting.
|
lower.limits |
= 0 (default) means that uniLasso constrains the sign of the coefs in the second round to be that of the univariate fits. |
standardize |
input argument to glmnet for final non-negative lasso fit. Strongly recommend |
info |
Users can supply results of |
loob.nit |
Number of Newton iterations for GLM or Cox in computing univariate linear predictors. Default is 2. |
loob.ridge |
A nonnegative number to apply ridge penalization to the slope parameters. This is helpful if some of the variables are near constant or have very small standard deviations. Default is 0.0. |
loob.eps |
A small number used in regularizing the Hessian for the Cox model. Default is 0.0001. |
hard.zero |
if |
... |
additional arguments passed to |
Fits a two stage lasso model and selects the penalty parameter by cross validation. First stage replaces each feature by the univariate fit for that feature. Second stage fits a (positive) lasso using the first stage features. Hence the second stage selects and modifies the coefficients of the first stage model, similar to the adaptive lasso. Leads to potentially sparser models.
For "binomial" family y is a binary response.
For "cox" family, y should be a Surv object for right censored data,
or a matrix with columns labeled 'time' and 'status'
Although glmnet has more flexible options say for binary responses, and for cox
responses, these are not yet implemented (but are possible and will appear in future versions).
Likewise, other glm families are possible as well, but not yet implemented.
This is a one-visit function that returns a 'cv.glmnet' object.
You can make predictions from the whole path, at 'lambda.min' etc just like you can for
a 'cv.glmnet object'.
loo = TRUE means it uses the prevalidated loo fits (approximate loo or 'alo' for binomial and cox) for each univariate model as features to avoid overfitting in the second stage. The coefficients are then multiplied into the original univariate coefficients to get the final model.
loo = FALSE means it uses the univariate fitted predictor, and hence it is a form of adaptive lasso, but tends to overfit.
lower.limits = 0 means uniLasso constrains the sign of the coefs in the second round to be that of the univariate fits.
An object that inherits from class "cv.glmnet". There is one additional parameter returned, which is info and has two components.
They are beta0 and beta, the intercepts and slopes for the usual (non-LOO) univariate fits from stage 1.
# cv.uniLasso examples # Default usage with Gaussian data sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma xtest=matrix(rnorm(n * p), n, p) ytest <- xtest %*% beta + rnorm(n)*sigma cvfit <- cv.uniLasso(x, y) plot(cvfit) predict(cvfit,xtest[1:10,], s="lambda.min") # predict at some test data points # Two-stage variation where we carve off a small dataset for computing the univariate coefs. cset=1:20 info = uniInfo(x[cset,],y[cset]) cvfit_two_stage <- cv.uniLasso(x[-cset,], y[-cset], info = info) plot(cvfit_two_stage) # Binomial response cv.uniLasso yb =as.numeric(y>0) cvfitb = cv.uniLasso(x, yb, family="binomial") predict(cvfitb, xtest[1:10,], type="response") # predict at default s = "lambda.1se" # cv.uniLasso with same positivity constraints, but starting `beta` # from univariate fits on the same data. With loo=FALSE, does not tend to do as well, # probably due to overfitting. cvfit_pos_adapt <- cv.uniLasso(x, y, loo = FALSE) plot(cvfit_pos_adapt) # cv.uniLasso with no constraints, but starting `beta` from univariate fits. # This is a version of the adaptive lasso, which tends to overfit, and loses interpretability. cvfit_adapt <- cv.uniLasso(x, y, loo = FALSE, lower.limits = -Inf) plot(cvfit_adapt) # Cox response cv.uniLasso set.seed(10101) N = 1000 p = 30 nzc = p/3 x = matrix(rnorm(N * p), N, p) beta = rnorm(nzc) fx = x[, seq(nzc)] %*% beta/3 hx = exp(fx) ty = rexp(N, hx) tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator y = cbind(time = ty, status = 1 - tcens) # y=Surv(ty,1-tcens) with library(survival) cvfitc = cv.uniLasso(x, y, family = "cox") plot(cvfitc) # cv.uniReg usage sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma xtest=matrix(rnorm(n * p), n, p) fit <- cv.uniReg(x, y) plot(fit) coef(fit) print(fit) predict(fit,xtest[1:10,]) #predict on test data fita <- cv.uniReg(x, y, hard.zero = FALSE) plot(fita) print(fita) fitb <- cv.uniReg(x, y>0, family = "binomial") plot(fitb) print(fitb)# cv.uniLasso examples # Default usage with Gaussian data sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma xtest=matrix(rnorm(n * p), n, p) ytest <- xtest %*% beta + rnorm(n)*sigma cvfit <- cv.uniLasso(x, y) plot(cvfit) predict(cvfit,xtest[1:10,], s="lambda.min") # predict at some test data points # Two-stage variation where we carve off a small dataset for computing the univariate coefs. cset=1:20 info = uniInfo(x[cset,],y[cset]) cvfit_two_stage <- cv.uniLasso(x[-cset,], y[-cset], info = info) plot(cvfit_two_stage) # Binomial response cv.uniLasso yb =as.numeric(y>0) cvfitb = cv.uniLasso(x, yb, family="binomial") predict(cvfitb, xtest[1:10,], type="response") # predict at default s = "lambda.1se" # cv.uniLasso with same positivity constraints, but starting `beta` # from univariate fits on the same data. With loo=FALSE, does not tend to do as well, # probably due to overfitting. cvfit_pos_adapt <- cv.uniLasso(x, y, loo = FALSE) plot(cvfit_pos_adapt) # cv.uniLasso with no constraints, but starting `beta` from univariate fits. # This is a version of the adaptive lasso, which tends to overfit, and loses interpretability. cvfit_adapt <- cv.uniLasso(x, y, loo = FALSE, lower.limits = -Inf) plot(cvfit_adapt) # Cox response cv.uniLasso set.seed(10101) N = 1000 p = 30 nzc = p/3 x = matrix(rnorm(N * p), N, p) beta = rnorm(nzc) fx = x[, seq(nzc)] %*% beta/3 hx = exp(fx) ty = rexp(N, hx) tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator y = cbind(time = ty, status = 1 - tcens) # y=Surv(ty,1-tcens) with library(survival) cvfitc = cv.uniLasso(x, y, family = "cox") plot(cvfitc) # cv.uniReg usage sigma =3 set.seed(1) n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma xtest=matrix(rnorm(n * p), n, p) fit <- cv.uniReg(x, y) plot(fit) coef(fit) print(fit) predict(fit,xtest[1:10,]) #predict on test data fita <- cv.uniReg(x, y, hard.zero = FALSE) plot(fita) print(fita) fitb <- cv.uniReg(x, y>0, family = "binomial") plot(fitb) print(fitb)
Plots the cross-validation cv.uniLasso curve (which is a cv.glmnet curve), and upper and lower standard deviation
curves, as a function of the lambda values used. It highlights the value at the end of the path, which is either lambda = 0, or else the smallest lambda if hard.zero = FALSE.
## S3 method for class 'cv.uniReg' plot(x, ...)## S3 method for class 'cv.uniReg' plot(x, ...)
x |
fitted |
... |
Other graphical parameters to plot |
A plot is produced, and nothing is returned.
A plot is produced, and nothing is returned.
Trevor Hastie and Rob Tibshirani
Maintainer:
Trevor Hastie [email protected]
cv.uniLasso and glmnet:::cv.glmnet.
set.seed(1010) n = 1000 p = 100 nzc = trunc(p/10) x = matrix(rnorm(n * p), n, p) beta = rnorm(nzc) fx = (x[, seq(nzc)] %*% beta) eps = rnorm(n) * 5 y = drop(fx + eps) cvob0 = cv.uniReg(x, y) plot(cvob0) cvob = cv.uniReg(x, y, hard.zero = FALSE) plot(cvob)set.seed(1010) n = 1000 p = 100 nzc = trunc(p/10) x = matrix(rnorm(n * p), n, p) beta = rnorm(nzc) fx = (x[, seq(nzc)] %*% beta) eps = rnorm(n) * 5 y = drop(fx + eps) cvob0 = cv.uniReg(x, y) plot(cvob0) cvob = cv.uniReg(x, y, hard.zero = FALSE) plot(cvob)
This function has two stages. In the first we fit a
univariate-guided sparse regression uniLasso model using
cross-validation to select the lasso penalty
parameter (using s = "lambda.min"). In the second stage, we use the predictions from this chosen
model as an offset, and fit a cross-validated unrestricted lasso
model. For squared error loss, this means we post-fit a lasso model
to the residuals. Conveniently, it returns an object that inherits
from cv.glmnet, in which the two models are stitched
together. What this means is that the chosen coefficients from the
first model are added to the coefficients from the second, and other related components are updated as well.
This means at predict time we do not have to fiddle with offsets. All
of the methods for cv.glmnet can be applied, such as
predict, plot, coef, print, and assess.glmnet.
polish.uniLasso( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, ... )polish.uniLasso( x, y, family = c("gaussian", "binomial", "cox"), weights = NULL, ... )
x |
Input matrix, of dimension |
y |
Response variable. Quantitative for |
family |
one of "gaussian","binomial" or "cox". Currently only these families are implemented. In the future others will be added. |
weights |
optional vector of non-negative weights, default is NULL which results in all weights = 1. |
... |
additional arguments passed to |
An object of class "polish.unilasso" that inherits
from class "cv.glmnet". The "glmnet.fit" is the
stitched second-stage model, from which predictions are
made. An additional component named "cv.uniLasso" is the
first stage model.
# Gaussian data, p=1000, n=300, SNR=1 "medium SNR" # use the built-in simulate function to create Gaussian data set.seed(101) data <- simulate_uniLasso("medium-SNR") attach(data) # has components "x","y","xtest","ytest","mutest","sigma" pfit <- polish.uniLasso(x,y) plot(pfit) pred <- predict(pfit, newx = xtest, s = "lambda.min") # ie predict from a "cv.glmnet" object. mean((ytest-pred)^2) # test error print(pfit) print(pfit$glmnet.fit) plot(pfit$glmnet.fit) # coefficient plot of the second stage plot(pfit$cv.uniLasso) # cv.glmnet plot of the first stage plot(pfit$cv.uniLasso$glmnet.fit) # coefficient plot of the first stage # Binomial response yb =as.numeric(y>0) pfitb = polish.uniLasso(x, yb, family="binomial") predict(pfitb, xtest[1:10,], type="response") # predict at default s = "lambda.1se" plot(pfitb) plot(pfitb$glmnet.fit) # plot second stage lasso coefficient path plot(pfitb$cv.uniLasso) # plot first stage cv.uniLasso results # Cox response set.seed(10101) N = 1000 p = 30 nzc = p/3 x = matrix(rnorm(N * p), N, p) beta = rnorm(nzc) fx = x[, seq(nzc)] %*% beta/3 hx = exp(fx) ty = rexp(N, hx) tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator y = cbind(time = ty, status = 1 - tcens) # y=Surv(ty,1-tcens) with library(survival) pfitc = polish.uniLasso(x, y, family = "cox") plot(pfitc) plot(pfitc$cv.uniLasso)# Gaussian data, p=1000, n=300, SNR=1 "medium SNR" # use the built-in simulate function to create Gaussian data set.seed(101) data <- simulate_uniLasso("medium-SNR") attach(data) # has components "x","y","xtest","ytest","mutest","sigma" pfit <- polish.uniLasso(x,y) plot(pfit) pred <- predict(pfit, newx = xtest, s = "lambda.min") # ie predict from a "cv.glmnet" object. mean((ytest-pred)^2) # test error print(pfit) print(pfit$glmnet.fit) plot(pfit$glmnet.fit) # coefficient plot of the second stage plot(pfit$cv.uniLasso) # cv.glmnet plot of the first stage plot(pfit$cv.uniLasso$glmnet.fit) # coefficient plot of the first stage # Binomial response yb =as.numeric(y>0) pfitb = polish.uniLasso(x, yb, family="binomial") predict(pfitb, xtest[1:10,], type="response") # predict at default s = "lambda.1se" plot(pfitb) plot(pfitb$glmnet.fit) # plot second stage lasso coefficient path plot(pfitb$cv.uniLasso) # plot first stage cv.uniLasso results # Cox response set.seed(10101) N = 1000 p = 30 nzc = p/3 x = matrix(rnorm(N * p), N, p) beta = rnorm(nzc) fx = x[, seq(nzc)] %*% beta/3 hx = exp(fx) ty = rexp(N, hx) tcens = rbinom(n = N, prob = 0.3, size = 1) # censoring indicator y = cbind(time = ty, status = 1 - tcens) # y=Surv(ty,1-tcens) with library(survival) pfitc = polish.uniLasso(x, y, family = "cox") plot(pfitc) plot(pfitc$cv.uniLasso)
This function makes predictions from a cross-validated uniReg model, using
the stored "glmnet.fit" object, and by default the smallest value of lambda used.
## S3 method for class 'cv.uniReg' predict(object, newx, s = c("zero", "lambda.1se", "lambda.min"), ...)## S3 method for class 'cv.uniReg' predict(object, newx, s = c("zero", "lambda.1se", "lambda.min"), ...)
object |
Fitted |
newx |
Matrix of new values for |
s |
Value(s) of the penalty parameter |
... |
Not used. Other arguments to predict. |
This function makes it easier to use the results of cross-validation to make a prediction.
The object returned depends on the ... argument which is passed
on to the predict method for glmnet objects.
Trevor Hastie and Rob Tibshirani
Maintainer:
Trevor Hastie [email protected]
print, and coef methods, and
cv.uniReg.
x = matrix(rnorm(100 * 20), 100, 20) y = rnorm(100) cv.fit = cv.uniReg(x, y) predict(cv.fit, newx = x[1:5, ]) coef(cv.fit)x = matrix(rnorm(100 * 20), 100, 20) y = rnorm(100) cv.fit = cv.uniReg(x, y) predict(cv.fit, newx = x[1:5, ]) coef(cv.fit)
Print a summary of the results of cross-validation for a uniReg model.
## S3 method for class 'cv.uniReg' print(x, digits = max(3, getOption("digits") - 3), ...)## S3 method for class 'cv.uniReg' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
fitted 'cv.uniReg' object |
digits |
significant digits in printout |
... |
additional print arguments |
A summary of the cross-validated uniReg fit is produced. This is an augmented summary of a cv.glmnet object, with an extra row corresponding to the smallest lambda in the path
A summary is printed, and nothing is returned.
Trevor Hastie and Rob Tibshirani
Maintainer:
Trevor Hastie [email protected]
x = matrix(rnorm(100 * 20), 100, 20) y = rnorm(100) fit1 = cv.uniReg(x, y) print(fit1)x = matrix(rnorm(100 * 20), 100, 20) y = rnorm(100) fit1 = cv.uniReg(x, y) print(fit1)
A particular counterexample where the first two features are strongly positively correlated, yet they have coefficients of opposite sign in a multiple regression.
simulate_counterexample(ntrain, ntest)simulate_counterexample(ntrain, ntest)
ntrain |
number of training examples. |
ntest |
number of test examples. |
a list with components "x", "y", "xtest", "ytest", "mutest", and "sigma", where "mutest" is the true test mean, and "ytest <- mutest + rnorm(ntest)*sigma."
dat = simulate_counterexample(300,3000) fit = cv.uniLasso(dat$x, dat$y) err = mean( (predict(fit, dat$xtest,s="lambda.min")- dat$mutest)^2)dat = simulate_counterexample(300,3000) fit = cv.uniLasso(dat$x, dat$y) err = mean( (predict(fit, dat$xtest,s="lambda.min")- dat$mutest)^2)
A simulator that builds a training and test set with particular characteristics, as used in our "uniLasso" paper.
simulate_Gaussian( ntrain = 300, ntest = 3000, p = 1000, snr = 1, rho = 0.8, sparsity = 0.1, homecourt = FALSE )simulate_Gaussian( ntrain = 300, ntest = 3000, p = 1000, snr = 1, rho = 0.8, sparsity = 0.1, homecourt = FALSE )
ntrain |
number of training examples. |
ntest |
number of test examples. |
p |
number of features. |
snr |
desired SNR (signal-to-noise ratio). |
rho |
for |
sparsity |
fraction of variables with nonzero coefficients. |
homecourt |
logical; if |
a list with components "x", "y", "xtest", "ytest", "mutest", and "sigma", where "mutest" is the true test mean, and "ytest <- mutest + rnorm(ntest)*sigma."
dat = simulate_Gaussian(300,3000,p=500,snr=1.2) fit = cv.uniLasso(dat$x, dat$y) mse = mean( (predict(fit, dat$xtest)- dat$mutest)^2)dat = simulate_Gaussian(300,3000,p=500,snr=1.2) fit = cv.uniLasso(dat$x, dat$y) mse = mean( (predict(fit, dat$xtest)- dat$mutest)^2)
simulate two class data
simulate_twoclass(ntrain, ntest, wide = TRUE)simulate_twoclass(ntrain, ntest, wide = TRUE)
ntrain |
number of training examples. |
ntest |
number of test examples. |
wide |
logical. If TRUE |
a list with components "x", "y", "xtest", "ytest", "mutest", and "sigma", where "mutest" is the true test mean, and "ytest <- mutest + rnorm(ntest)*sigma."
dat = simulate_twoclass(300,3000) fit = cv.uniLasso(dat$x, dat$y, family="binomial") misclass = mean( sign(predict(fit, dat$xtest,s="lambda.min"))== sign(dat$ytest-0.5))dat = simulate_twoclass(300,3000) fit = cv.uniLasso(dat$x, dat$y, family="binomial") misclass = mean( sign(predict(fit, dat$xtest,s="lambda.min"))== sign(dat$ytest-0.5))
We use some standard examples in our uniLasso paper, and for convenience we provide generators for these datasets.
simulate_uniLasso( example = c("low-SNR", "medium-SNR", "high-SNR", "home-court", "two-class", "counter-example"), wide = TRUE )simulate_uniLasso( example = c("low-SNR", "medium-SNR", "high-SNR", "home-court", "two-class", "counter-example"), wide = TRUE )
example |
which of the prepackaged examples to use. Choices are "low-SNR","medium-SNR","high-SNR","home-court","two-class","counter-example", as described in the uniLasso paper. The three SNRs used are 0.5 (low), 1.0 (medium) and 2.0 (high) (also used for home-court). The training sizes for the first four are 300, and test sizes 3000. |
wide |
logical variable which determines if p>n (default, 1000) or not (100). This function calls worker functions simulate_gaussian(), simulate_two-class(), and simulate_counterexample(), which are currently not documented. |
a list with components "x", "y", "xtest", "ytest", "mutest", and "sigma", where "mutest" is the true test mean, and "ytest <- mutest + rnorm(nrow(xtest))*sigma."
dat = simulate_uniLasso("high-SNR") fit = cv.uniLasso(dat$x, dat$y) mse = mean( (predict(fit, dat$xtest)- dat$mutest)^2)dat = simulate_uniLasso("high-SNR") fit = cv.uniLasso(dat$x, dat$y) mse = mean( (predict(fit, dat$xtest)- dat$mutest)^2)
For a cv.uniLasso object, compare the CV-selected nonzero coefficients to their univariate counterparts. Also works for a cv.glmnet object.
uniCoef(cv.object, info = NULL, s = c("lambda.min", "lambda.1se"), ...)uniCoef(cv.object, info = NULL, s = c("lambda.min", "lambda.1se"), ...)
cv.object |
a |
info |
the result of a call to |
s |
the value of lambda to be used, with default |
... |
other arguments to |
a three-columns data frame with the second column being the non-zero coefficients from the cv.object, the first the corresponding univariate coefficients, and the third an indication if there was a sign change.
@examples
sigma <- 3 set.seed(1) n <- 100 p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma
cvfit <- cv.uniLasso(x, y) uniCoef(cvfit) cvfit2 <- cv.glmnet(x,y) uniCoef(cvfit2, info=cvfit$info)
Fit p separate univariate fits, and if requested computes the loo fit matrix F.
It is called internally by uniLasso, or can be called externally on separate data and passed as input to uniLasso.
Currently this function can accommodate "gaussian", "binomial", and "Cox" families.
uniInfo( X, y, family = c("gaussian", "binomial", "cox"), weights = NULL, nit = 2, loo = FALSE, ridge = 0, eps = 1e-06 )uniInfo( X, y, family = c("gaussian", "binomial", "cox"), weights = NULL, nit = 2, loo = FALSE, ridge = 0, eps = 1e-06 )
X |
An n x p feature matrix |
y |
A response object, depending on the family. For "gaussian" it is just a response vector. For "binomial" either a binary vector, a two level factor, or a two column non-negative matrix with rows summing to 1. For "cox" it is a Surv object (currently for right censored data). |
family |
one of "gaussian","binomial" or "cox". Currently only these families are implemented. In the future others will be added. |
weights |
Vector of non-negative weights. Default is NULL, which results in all weights equal to 1. |
nit |
Number of iterations if Newton steps are required (in "binomial" and "cox"). Default is 2. In principal more is better, but in some cases can run into convergence issues. |
loo |
A logical, default=FALSE. If TRUE it computes the matrix of loo fits F. |
ridge |
A positive number that penalizes the square of the slope parameters. This is useful if some of the variables are nearly constant, or have very small variances. Default is 0.0. |
eps |
A small number to regularize the hessian for "cox"; default is 1e-6. |
a list with components $beta and $beta0, and if loo=TRUE, a n x p matrix F with the loo fits.
# Gaussian model set.seed(1) sigma=3 n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma info = uniInfo(x,y) names(info) yb = as.numeric(y>0) info = uniInfo(x,yb, family = "binomial", loo = TRUE) names(info)# Gaussian model set.seed(1) sigma=3 n <- 100; p <- 20 x <- matrix(rnorm(n * p), n, p) beta <- matrix(c(rep(2, 5), rep(0, 15)), ncol = 1) y <- x %*% beta + rnorm(n)*sigma info = uniInfo(x,y) names(info) yb = as.numeric(y>0) info = uniInfo(x,yb, family = "binomial", loo = TRUE) names(info)