Title: | Fit Regularization Path for Generalized Additive Models |
---|---|
Description: | Using overlap grouped-lasso penalties, 'gamsel' selects whether a term in a 'gam' is nonzero, linear, or a non-linear spline (up to a specified max df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, both for gaussian and binomial families. See <doi:10.48550/arXiv.1506.03850> for more details. |
Authors: | Alexandra Chouldechova [aut], Trevor Hastie [aut, cre], Balasubramanian Narasimhan [ctb], Vitalie Spinu [ctb], Matt Wand [ctb] |
Maintainer: | Trevor Hastie <[email protected]> |
License: | GPL-2 |
Version: | 1.8-5 |
Built: | 2024-10-25 03:30:53 UTC |
Source: | https://github.com/cran/gamsel |
Using overlap grouped lasso penalties, gamsel selects whether a term in a gam is nonzero, linear, or a non-linear spline (up to a specified max df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, both for gaussian and binomial families. Key functions are gamsel, predict.gamsel, plot.gamsel, print.gamsel, summary.gamsel, cv.gamsel, plot.cv.gamsel
Alexandra Chouldechova, Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Generate basis
basis.gen(x, df = 6, thresh = 0.01, degree = 8, parms = NULL, ...)
x |
A vector of values for |
df |
The degrees of freedom of the smoothing spline. |
thresh |
If the next eigenvector improves the approximation by less
than threshold, a truncated basis is returned. For |
degree |
The nominal number of basis elements. The basis returned has
no more than |
parms |
A parameter set. If included in the call, these are used to define the basis. This is used for prediction. |
... |
other arguments |
the basis
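The role of the parms argument in prediction can be illustrated with a minimal sketch. It assumes that the basis returned by basis.gen carries a parms attribute (as described for pseudo.bases elsewhere in this manual); the vectors x and x.new are simulated purely for illustration.

```r
library(gamsel)

set.seed(1)
x <- runif(100)                          # simulated training values (illustrative)
x.new <- seq(0.1, 0.9, length.out = 5)   # new values inside the training range

# Build the basis on the training values
B <- basis.gen(x, df = 6, degree = 8)

# Parameter set that defines the basis, stored as an attribute
parms <- attr(B, "parms")

# Re-create the same basis at the new values by supplying parms
B.new <- basis.gen(x.new, parms = parms)

dim(B)
dim(B.new)
```

Supplying parms means the new basis is defined by the training fit rather than re-estimated from x.new.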
A routine for performing K-fold cross-validation for gamsel.
cv.gamsel(
  x,
  y,
  lambda = NULL,
  family = c("gaussian", "binomial"),
  degrees = rep(10, p),
  dfs = rep(5, p),
  bases = pseudo.bases(x, degrees, dfs, parallel = parallel, ...),
  type.measure = c("mse", "mae", "deviance", "class"),
  nfolds = 10,
  foldid,
  keep = FALSE,
  parallel = FALSE,
  ...
)
x |
|
y |
response |
lambda |
Optional user-supplied lambda sequence. If |
family |
|
degrees |
|
dfs |
|
bases |
|
type.measure |
Loss function for cross-validated error calculation.
Currently there are four options: |
nfolds |
Number of folds (default is 10). Maximum value is |
foldid |
Optional vector of length |
keep |
If |
parallel |
If |
... |
Other arguments that can be passed to |
This function has the effect of running gamsel nfolds+1 times. The initial run uses all the data and gets the lambda sequence. The remaining runs fit the data with each of the folds omitted in turn. The error is accumulated, and the average error and standard deviation over the folds are computed. Note that cv.gamsel does NOT search for values of gamma. A specific value should be supplied; otherwise gamma=.4 is assumed by default. Users who would like to cross-validate gamma as well should call cv.gamsel with a pre-computed vector foldid, and then use this same fold vector in separate calls to cv.gamsel with different values of gamma. Note also that the results of cv.gamsel are random, since the folds are selected at random. Users can reduce this randomness by running cv.gamsel many times and averaging the error curves.
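The suggestion to cross-validate gamma with a shared fold vector might look like the following sketch. The grid of gamma values, the seed, and the choice of comparing the minimum of each error curve are illustrative assumptions, not recommendations from the package; gamma is assumed to reach gamsel through the ... argument.

```r
library(gamsel)

data <- readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
X <- data$X
y <- data$y

# Pre-compute the bases once so every call uses the same ones
bases <- pseudo.bases(X, degree = 10, df = 6)

# One fold assignment, reused for every value of gamma
set.seed(1)
foldid <- sample(rep(seq_len(10), length.out = nrow(X)))

gammas <- c(0.2, 0.4, 0.6)   # illustrative grid
cv.fits <- lapply(gammas, function(g) {
  cv.gamsel(X, y, bases = bases, foldid = foldid, gamma = g)
})

# Compare the best cross-validated error reached for each gamma
min.err <- sapply(cv.fits, function(f) min(f$cvm))
best.gamma <- gammas[which.min(min.err)]
best.gamma
```

Because the folds are identical across calls, differences between the error curves reflect gamma rather than the random fold assignment.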
An object of class "cv.gamsel" is returned, which is a list with the ingredients of the cross-validation fit.
lambda |
the values
of |
cvm |
The mean cross-validated
error - a vector of length |
cvsd |
estimate of
standard error of |
cvup |
upper curve = |
cvlo |
lower curve = |
nzero |
number of non-zero
coefficients at each |
name |
a text string indicating type of measure (for plotting purposes). |
gamsel.fit |
a fitted gamsel object for the full data. |
lambda.min |
value of |
lambda.1se |
largest value of |
fit.preval |
if |
foldid |
if
|
index.min |
the sequence number of the minimum lambda. |
index.1se |
the sequence number of the 1se lambda value. |
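As a hedged illustration of how the returned components fit together, the sketch below reads index.min and index.1se from a cv.gamsel object and uses the stored full-data fit for prediction. The seed and the choice of the 1se model are assumptions made for the example.

```r
library(gamsel)

data <- readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
X <- data$X
y <- data$y
bases <- pseudo.bases(X, degree = 10, df = 6)

set.seed(1)                      # folds are random, so fix the seed
gamsel.cv <- cv.gamsel(X, y, bases = bases)

# Positions of the two selected models along the lambda path
i.min <- gamsel.cv$index.min
i.1se <- gamsel.cv$index.1se
c(lambda.min = gamsel.cv$lambda.min, lambda.1se = gamsel.cv$lambda.1se)

# The full-data fit is stored in the object; predict takes an index,
# not a lambda value
fit <- gamsel.cv$gamsel.fit
pred.1se <- predict(fit, newdata = X, index = i.1se, type = "link")
```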
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection
gamsel, plot function for cv.gamsel object.
##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
par(mfrow=c(1,2),mar=c(5,4,3,1))
summary(gamsel.out)
gamsel.cv=cv.gamsel(X,y,bases=bases)
par(mfrow=c(1,1))
plot(gamsel.cv)
par(mfrow=c(3,4))
plot(gamsel.out,newx=X,index=20)
Using overlap grouped lasso penalties, gamsel selects whether a term in a gam is nonzero, linear, or a non-linear spline (up to a specified max df per variable). It fits the entire regularization path on a grid of values for the overall penalty lambda, both for gaussian and binomial families.
gamsel(
  x,
  y,
  num_lambda = 50,
  lambda = NULL,
  family = c("gaussian", "binomial"),
  degrees = rep(10, p),
  gamma = 0.4,
  dfs = rep(5, p),
  bases = pseudo.bases(x, degrees, dfs, parallel = parallel, ...),
  tol = 1e-04,
  max_iter = 2000,
  traceit = FALSE,
  parallel = FALSE,
  ...
)
x |
Input (predictor) matrix of dimension |
y |
Response variable. Quantitative for |
num_lambda |
Number of |
lambda |
User-supplied |
family |
Response type. |
degrees |
An integer vector of length |
gamma |
Penalty mixing parameter |
dfs |
Numeric vector of length |
bases |
A list of orthonormal bases for the non-linear terms for each
variable. The function |
tol |
Convergence threshold for coordinate descent. The coordinate
descent loop continues until the total change in objective after a pass over
all variables is less than |
max_iter |
Maximum number of coordinate descent iterations over all the
variables for each |
traceit |
If |
parallel |
passed on to the |
... |
additional arguments passed on to |
The sequence of models along the lambda path is fit by (block) coordinate descent. In the case of logistic regression the fitting routine may terminate before all num_lambda values of lambda have been used. This occurs when the fraction of null deviance explained by the model gets too close to 1, at which point the fit becomes numerically unstable. Each of the smooth terms is computed using an approximation to the Demmler-Reinsch smoothing spline basis for that variable, and the accompanying diagonal penalty matrix.
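Because a binomial path may stop early, code that indexes into the path can guard against asking for a model that was never fit. The following is a minimal sketch; the target index of 30 is an arbitrary choice for illustration.

```r
library(gamsel)

data <- readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
X <- data$X
yb <- data$yb

fit <- gamsel(X, yb, family = "binomial", num_lambda = 50)

# Number of lambda values actually fit; may be smaller than num_lambda
n.fit <- length(fit$lambdas)

# Clamp the requested index to the models that exist
index <- min(30, n.fit)
pred <- predict(fit, newdata = X, index = index, type = "response")
n.fit
```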
An object with S3 class gamsel.
intercept |
Intercept sequence of length |
alphas |
|
betas |
|
lambdas |
The sequence of lambda values used |
degrees |
Number of basis functions used for each variable |
parms |
A set of parameters that capture the bases used. This
allows for efficient generation of the bases elements for
|
, the predict
method for this class.
family |
|
nulldev |
Null deviance (deviance of the intercept model) |
dev.ratio |
Vector of
length |
call |
The call that produced this object |
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection, https://arxiv.org/abs/1506.03850
predict.gamsel, cv.gamsel, plot.gamsel, summary.gamsel, basis.gen
##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
par(mfrow=c(1,2),mar=c(5,4,3,1))
summary(gamsel.out)
gamsel.cv=cv.gamsel(X,y,bases=bases)
par(mfrow=c(1,1))
plot(gamsel.cv)
par(mfrow=c(3,4))
plot(gamsel.out,newx=X,index=20)
# Binomial model
gamsel.out=gamsel(X,yb,family="binomial")
par(mfrow=c(1,2),mar=c(5,4,3,1))
summary(gamsel.out)
par(mfrow=c(3,4))
plot(gamsel.out,newx=X,index=30)
Extract active variables of different kinds from a gamsel object
getActive( object, index = NULL, type = c("nonzero", "linear", "nonlinear"), EPS = 0 )
object |
gamsel object |
index |
index or vector of indices at which to obtain active
information. |
type |
type of active variables to report. One of |
EPS |
threshold for what is nonzero; default is 0 |
Returns a vector of indices of the variables having the desired properties.
vector of indices
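A short usage sketch, assuming the example data shipped with the package; the path index 20 is chosen only because the other examples in this manual use it.

```r
library(gamsel)

data <- readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
X <- data$X
y <- data$y
bases <- pseudo.bases(X, degree = 10, df = 6)
fit <- gamsel(X, y, bases = bases)

# Query the three kinds of active variables at one point on the path
act.nonzero   <- getActive(fit, index = 20, type = "nonzero")
act.linear    <- getActive(fit, index = 20, type = "linear")
act.nonlinear <- getActive(fit, index = 20, type = "nonlinear")

act.nonzero
act.linear
act.nonlinear
```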
Produces a cross-validation curve with standard errors for a fitted gamsel object.
## S3 method for class 'cv.gamsel'
plot(x, sign.lambda = 1, ...)
x |
|
sign.lambda |
Either plot against |
... |
Optional graphical parameters to plot. |
A plot showing cross-validation error is produced. Nothing is returned.
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection
##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
gamsel.cv=cv.gamsel(X,y,bases=bases)
par(mfrow=c(1,1))
plot(gamsel.cv)
gamsel Object
Produces plots of the estimated functions for specified variables at a given value of lambda.
## S3 method for class 'gamsel'
plot(x, newx, index, which = 1:p, rugplot = TRUE, ylims, ...)
x |
Fitted |
newx |
|
index |
Index of lambda value (i.e., model) for which plotting is desired. |
which |
Which values to plot. Default is all variables, i.e.
|
rugplot |
If |
ylims |
|
... |
Optional graphical parameters to plot. |
A plot of the specified fitted functions is produced. Nothing is returned.
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection
gamsel, and print.gamsel, summary.gamsel
##set.seed(1211)
##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
par(mfrow=c(3,4))
plot(gamsel.out,newx=X,index=20)
Make predictions from a gamsel object.
## S3 method for class 'gamsel'
predict(
  object,
  newdata,
  index = NULL,
  type = c("link", "response", "terms", "nonzero"),
  ...
)
object |
Fitted |
newdata |
|
index |
Index of model in the sequence for which prediction is desired. Note, this is NOT a lambda value. |
type |
Type of prediction desired. Type |
... |
Not used |
Either a vector or a matrix is returned, depending on type.
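The sketch below requests predictions for more than one model at once and for individual terms, and inspects the shape of what comes back. The indices 10 and 20 are arbitrary, and the exact dimensions of each result are not asserted here; inspect them with dim() or str() on your own fit.

```r
library(gamsel)

data <- readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
X <- data$X
y <- data$y
bases <- pseudo.bases(X, degree = 10, df = 6)
fit <- gamsel(X, y, bases = bases)

# Linear predictor for two models along the path (index, not lambda)
eta <- predict(fit, newdata = X, index = c(10, 20), type = "link")

# Contribution of each variable's fitted function for a single model
terms.20 <- predict(fit, newdata = X, index = 20, type = "terms")

str(eta)
str(terms.20)
```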
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection
gamsel, cv.gamsel, summary.gamsel, basis.gen
##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
preds=predict(gamsel.out,X,index=20,type="terms")
Print a summary of the gamsel path at each step along the path
## S3 method for class 'gamsel'
print(x, digits = max(3, getOption("digits") - 3), ...)
x |
fitted gamsel object |
digits |
significant digits in printout |
... |
additional print arguments |
The call that produced the object x is printed, followed by a five-column matrix with columns NonZero, Lin, NonLin, %Dev and Lambda. The first three columns say how many nonzero, linear and nonlinear terms there are. %Dev is the percent deviance explained (relative to the null deviance).
The matrix above is silently returned.
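Since the summary matrix is returned invisibly, it can be captured for further use. This is a minimal sketch assuming the example data shipped with the package.

```r
library(gamsel)

data <- readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
X <- data$X
y <- data$y
bases <- pseudo.bases(X, degree = 10, df = 6)
fit <- gamsel(X, y, bases = bases)

# print() displays the path summary and invisibly returns it
tab <- print(fit)

# Work with the captured summary, e.g. look at the first few steps
head(tab, 5)
```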
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection
predict.gamsel, cv.gamsel, plot.gamsel, summary.gamsel, basis.gen
Generate an approximation to the Demmler-Reinsch orthonormal bases for smoothing splines, using orthogonal polynomials. basis.gen generates a basis for a single x, and pseudo.bases generates a list of bases for each column of the matrix x.
pseudo.bases(x, degree = 8, df = 6, parallel = FALSE, ...)
x |
A vector of values for |
degree |
The nominal number of basis elements. The basis returned has
no more than |
df |
The degrees of freedom of the smoothing spline. |
parallel |
if TRUE, parallelize |
... |
other arguments for |
basis.gen starts with a basis of orthogonal polynomials of total degree. These are each smoothed using a smoothing spline, which allows for a one-step approximation to the Demmler-Reinsch basis for a smoothing spline of rank equal to the degree. See the reference for details. The function also approximates the appropriate diagonal penalty matrix for this basis, so that the approximate smoothing spline (generalized ridge regression) has the target df.
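The idea described above can be sketched in base R. This is not the package's implementation: it is an illustrative reconstruction that assumes the one-step approximation amounts to smoothing each orthogonal polynomial, forming the resulting small symmetric matrix, and rotating the polynomial basis by its eigenvectors. The data, the degree of 8, and the df of 6 are arbitrary choices.

```r
set.seed(1)
x <- sort(runif(200))          # simulated predictor (illustrative)
degree <- 8
df <- 6

# Orthonormal polynomial basis: n x degree, with crossprod(P) = identity
P <- poly(x, degree = degree)

# Smooth each polynomial column with a smoothing spline of the target df
SP <- apply(P, 2, function(p) predict(smooth.spline(x, p, df = df), x)$y)

# Small matrix approximating P' S P, where S is the smoother matrix;
# symmetrize to remove numerical asymmetry
M <- crossprod(P, SP)
M <- (M + t(M)) / 2

# Rotate the polynomial basis by the eigenvectors of M
eig <- eigen(M, symmetric = TRUE)
B <- P %*% eig$vectors

# The rotated basis is still orthonormal
max(abs(crossprod(B) - diag(degree)))

# Approximate eigenvalues of the smoother within the polynomial subspace
round(eig$values, 3)
```

The eigenvalues indicate how strongly the smoother shrinks each rotated basis function, which is the information a diagonal penalty would encode.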
An orthonormal basis is returned (a list for pseudo.bases). This has an attribute parms, which has elements
coefs |
Coefficients needed to generate the orthogonal polynomials |
rotate |
Transformation matrix for transforming the polynomial basis |
d |
penalty values for the diagonal penalty |
df |
df used |
degree |
number of columns |
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Hastie, T. (1996) Pseudosplines. JRSSB 58(2), 379-396.
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection
##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)
bases=pseudo.bases(X,degree=10,df=6)
## Not run:
require(doMC)
registerDoMC(cores=4)
bases=pseudo.bases(X,degree=10,df=6,parallel=TRUE)
## End(Not run)
This makes a two-panel plot of the gamsel object.
## S3 method for class 'gamsel'
summary(object, label = FALSE, ...)
object |
|
label |
if |
... |
additional arguments to summary |
A two-panel plot is produced that summarizes the linear components and the nonlinear components as a function of lambda. For the linear components, it shows the coefficient for each variable. For the nonlinear components, it shows the norm of the nonlinear coefficients.
Nothing is returned.
Alexandra Chouldechova and Trevor Hastie
Maintainer: Trevor Hastie [email protected]
Chouldechova, A. and Hastie, T. (2015) Generalized Additive Model Selection
gamsel, and methods plot, print and predict for cv.gamsel object.
##data=gamsel:::gendata(n=500,p=12,k.lin=3,k.nonlin=3,deg=8,sigma=0.5)
data = readRDS(system.file("extdata/gamsel_example.RDS", package = "gamsel"))
attach(data)
bases=pseudo.bases(X,degree=10,df=6)
# Gaussian gam
gamsel.out=gamsel(X,y,bases=bases)
par(mfrow=c(1,2),mar=c(5,4,3,1))
summary(gamsel.out)