glmnet is an R package for fitting the lasso and
elastic-net regularization paths of generalized linear models and
related regression problems. It has been on CRAN since 2008. Over the
intervening years it has grown from a single coordinate-descent Fortran
kernel for Gaussian, binomial, multinomial, Poisson, and Cox responses
into a layered system in which any GLM family can be fitted, in
which large-scale genomic problems are tractable, and in which every
numerical kernel is now written in modern C++. The package reached its
current shape in identifiable steps, each due to contributions from
several people.
This vignette is a history of those steps and an acknowledgement of
the contributions. While some historical information is in
NEWS.md, it misses the details. We hope that this document
goes beyond a listing of enhancements and bug fixes to explain why the
package has its current structure, and attributes contributions
accurately with references to published work where appropriate.
The statistical backbone of glmnet is older than the
package. The lasso was introduced in Tibshirani
(1996) and the elastic-net penalty,
which combines \(\ell_1\) and \(\ell_2\) regularization and smooths the
variable selection behavior of the lasso in the presence of correlated
features, was introduced in Zou and Hastie (2005). The algorithmic
insight, worked out in Friedman et al. (2007), that cyclical coordinate descent
is extraordinarily effective when combined with warm starts along a
decreasing sequence of regularization parameters \(\lambda_1 >
\lambda_2 > \cdots > \lambda_K\) was implemented in its
CRAN debut (version 1.1-1). The path is computed once; each \(\lambda_k\) solution starts from the
previous, and both the set of active variables and the number of inner
passes stay small.
Coordinate descent has a long history, and our team was not the first
to use it in conjunction with the lasso. Tibshirani’s Ph.D student
Wenjiang Fu at U. Toronto invented a version of coordinate descent
called the shooting algorithm in 1997. Ingrid Daubechies used
coordinate descent for \(\ell_1\) image
deblurring in a talk at Stanford in 2002. In her Ph.D. thesis with
Jacquie Meulman at U. Leiden in 2006, Anita van der Kooij used
coordinate descent in fitting elastic net problems. Others have used it
is well. The pathwise use of coordinate descent in
glmnet leads to great efficiency, and allows for a fairly
seamless transition to different loss functions.
The canonical reference for the package itself is Friedman et al. (2010), published in the Journal of
Statistical Software the year after the initial release. That paper
covers Gaussian, two-class logistic, multinomial, and Poisson
regression. The Cox proportional-hazards model for right-censored
survival data was added in version 1.2 (2010) by Noah Simon, then a PhD
student at Stanford, and documented in Simon et
al. (2011). At that point the available
families were exactly those exposed through the family
argument today: "gaussian", "binomial",
"multinomial", "poisson",
"mgaussian", and "cox". Each one had its own
Fortran subroutine, and the dispatch in glmnet.R was a
chain of if/else branches keyed on the family string:
glmnet before
v4.0. User-facing R in blue, internal R in orange, Fortran subroutines
in grey.A word on the implementation language is warranted, because it
explains most of what came later. The Fortran sources were not written
by hand in Fortran but generated from Mortran, a macro language
layered over Fortran 77. In the spirit of open source, the full Mortran
sources live in inst/mortran/ and were maintained by Jerome
Friedman for over a decade. The combination was extremely fast but it
was difficult to extend and hard for newcomers to read. Every new
feature that touched the inner loop had to be expressed in Mortran or
directly in Fortran, regenerated to Fortran, and then debugged under the
limitations of that toolchain. Furthermore, Fortran was dropping in
popularity and C/C++ was being used more and more for writing fast code
to interface with R. So too with glmnet.
NEWS.md and CRAN archive.
glmnet release milestones. Top row runs left-to-right; the path folds down at the right and the bottom row runs right-to-left, ending at v5.0. Evenly spaced for legibility rather than to reflect calendar time.
More responses. The mgaussian family
for multivariate Gaussian regression with a grouped-lasso penalty, and a
grouped option for the multinomial family, arrived in v1.8
(2012). Sparse x via Matrix::dgCMatrix landed
even earlier, in v1.6 (2011). This was a major step making it possible
to use glmnet with large, sparse design matrices.
Strong rules. Version 1.6 (April 2011) also
introduced strong rules screening (Tibshirani et al. 2012): at each
\(\lambda\) along the path a cheap
inner-product check predicts which variables are likely to stay
inactive, and the coordinate-descent sweep skips them, with a
KKT-condition re-check afterwards to catch any violations. For sparse
problems this screens out the vast majority of predictors and was a
step-change in speed for large \(p\).
Like several other glmnet milestones, the code landed in
CRAN a year before the paper appeared.
Coefficient constraints. Upper and lower limits on
individual coefficients, together with glmnet.control() for
tuning numerical parameters, appeared in v1.9-1 (2013). The ability to
fix a coefficient in an interval — or pin it to zero from outside via
exclude — is now routinely relied upon by downstream
packages.
Cross-validation maturity. The v2.0-1 (2015) release
rewrote cv.glmnet: each fold fits on its own \(\lambda\) sequence and is then evaluated at
the original path. The alignment subtlety that motivated this — paths
from different folds do not in general hit the same \(\lambda\) values — was subsequently
addressed again in v2.0-18 (2019) via the alignment
argument.
Cox refinements. A bug affecting ties between the
death set and the risk set was fixed in v2.0-19, and
coxgrad and an improved coxnet.deviance were
added in v2.0-20. These changes prepared the ground for the larger Cox
work that followed in v4.1.
Fortran maintenance. Substantial effort was spent in
making the Mortran generated Fortran build cleanly on every CRAN
platform: portable on every Fortran compiler (v2.0-15),
double-precision-consistent and -Wall-clean (v2.0-16,
2018), native Fortran registration (v2.0-8, 2017). These changes were
behind the scenes, less visible to users, but essential for passing CRAN
checks for another decade.
Version 3.0, released in November 2019, was the first release in
which the feature set grew faster than the implementation underneath it.
The headline additions, contributed by Trevor Hastie and Balasubramanian
Narasimhan, were: the relax argument (Hastie et al.
2020) to both glmnet and cv.glmnet,
which refits each active set in the path without regularization and
blends the two fits through a gamma parameter (a dedicated
relax vignette documents the usage);
assess.glmnet, roc.glmnet, and
confusion.glmnet for evaluating model performance;
makeX for building input matrices from raw data frames with
one-hot encoding of factors and NA handling; and bigGlm for
unpenalized GLM fits using the same algorithmic machinery. A
trace.it progress bar made long paths less opaque, and
print methods for cv.glmnet and the new
relaxed class were tidied up.
Importantly, the family dispatch at this point was still entirely
character-based. A call glmnet(x, y, family = "binomial")
passed through an if branch into lognet.R,
which called a Fortran subroutine dedicated to logistic regression.
Every family had its own subroutine. That was about to change.
Before Version 4.0, each family reference such as
"binomial" was routed to a dedicated routine. Version 4.0
(May 2020) generalized the objective
\[ \sum_{i=1}^n w_i\,\text{NLL}_i(\beta_0,\beta) + \lambda\,P_\alpha(\beta), \]
where \(\text{NLL}_i\) is the
negative log-likelihood contribution of observation \(i\) under any GLM family, and the
elastic-net penalty is \(P_\alpha(\beta) =
\alpha\|\beta\|_1 +
\tfrac{1-\alpha}{2}\|\beta\|_2^2\). Similar to classical GLMs, we
use an iteratively reweighted lasso or elastic net algorithm at each
value of lambda (along with cyclical coordinate descent) to
fit this model, which is implemented pathwise down the sequence of
lambda values. That is, at each value of
lambda, we implement the proximal Newton via
penalized iterative weighted lease squares
\[ \frac{1}{2n}\sum_{i=1}^n w_i \bigl(y_i - \beta_0 - x_i^\top\beta\bigr)^2 + \lambda\,P_\alpha(\beta). \]
This coordinate-descent kernel that serves this inner loop is the
weighted least squares routine wls, which becomes the
engine for every family the package does not handle with a dedicated
subroutine.
The programming was done by Kenneth Tay, then a PhD student at
Stanford, with supervision from Hastie and Narasimhan. The user-visible
payoff is that the family argument can now be an R
family() object — any object of class "family"
in the sense of stats::glm, including
MASS::negative.binomial, statmod::tweedie, or
a user-defined family. The published reference is Tay et al. (2023).
Note that the paper appeared three years after the release; the software
led the publication.
glmnet v4.0. The
new right-hand branch (glmnet.path,
glmnet.fit, wls) handles any
family() object; the original per-family subroutines remain
for the built-in string-named families, which stay on the fast
path.Two design decisions deserve explicit mention. First, the pre-v4
character-named families were retained and continue to go through their
dedicated subroutines, because a hand-written kernel will always beat a
generic IRLS loop. So the flexibility was added alongside the old code,
not on top of it. Second, the v4.0 warm-start convention was tightened:
the earliest form required passing a full glmnetfit object,
which was awkward; in v4.0.1 this was relaxed so a warm start can be a
simple list containing the intercept and coefficient vector. Consistency
with the pre-v4 numerical results — verified through KKT-condition
checks against the Fortran output — was the correctness criterion
throughout.
Version 4.1 (January 2021) brought the Cox model to something close
to feature parity with survival::coxph, which has served as
the reference implementation throughout glmnet’s Cox
history. Four things were added in a single release:
survival has long accepted.stratifySurv(), with the
interpretation of survival::coxph: a separate baseline
hazard per stratum and a single shared coefficient vector.dgCMatrix.survfit method for coxnet objects so
that fitted survival curves could be produced without leaving
glmnet.Much of this work was also implemented by Tay, building on the v4.0
IRLS machinery but targeting the Cox deviance specifically. In v4.1-2,
following a suggestion by Rob Tibshirani, the exclude
argument was extended from a fixed set of indices to a function
that is handed x, y, and weights
and returns the indices to drop. This is the user-level hook by which
strong-rules screening can be plugged into the path; see Tibshirani et al. (2012) and, for its application at
GWAS scale, Qian et al. (2020).
During the summer of 2021, first year statistics Ph.D student James
Yang was looking for a fun activity after completing the grueling
qualiying exams. He decided to rewrite the fortran backbone in
glmnet in C++! This caught the attention of Hastie (who
became his advisor) and Narasimhan, because Yang is a very experienced
programmer. Between v4.1-3 (November 2021) and v4.1-6 (November 2022),
Yang replaced essentially all of the Mortran/Fortran in the package with
a header-only C++ implementation built on Eigen. This
implementation, glmnetpp, is a single template kernel that
absorbs dense and sparse matrices and all the built-in families through
parametrization rather than code duplication.
The staging was deliberate. In v4.1-3 the weighted least squares
kernels (wls and its sparse counterpart) and the
elnet family of routines were ported first — these were the
engines that the new glmnet.fit path exercised the most,
and porting them delivered roughly 4–8\(\times\) speedups on the generic GLM path.
In v4.1-4 the rest of the Fortran was rewritten, leaving only the Cox
subroutine. In v4.1-6 the legacy Fortran that could be removed was
removed.
coxnet() remained in Fortran.Several thousand lines of template C++ replaced several thousand
lines of Fortran, with comparable or better performance because
arguments were no longer copied—R version 3.1+ changed this behavior and
dup = FALSE had no effect. This provided a much lower
barrier to future extension, and removed the dependence on a
generated-Fortran toolchain. The Cox subroutine was left alone because
its numerical structure — stratified risk sets, ties, left truncation —
is harder to express cleanly as a template, and because James Yang’s
priority was to get the rest of the package onto the new footing.
Resolving Cox would take another three years.
glmnet is also a piece of infrastructure on which other
statistical software is built, and some of those applications have
exerted pressure on the package’s interface. We mention three
briefly.
SNPnet (Qian et
al. 2020) applies the lasso at the scale of modern
genome-wide association studies — hundreds of thousands of samples by
hundreds of thousands of variants — by aggressively screening predictors
using the strong rules of Tibshirani et al. (2012). The work motivated the v4.1-2
generalization of exclude to accept a function, so that
strong-rules screening can be plugged into the standard
glmnet call rather than requiring a fork.
The data-shared lasso (Gross and Tibshirani 2016) uses
glmnet as its optimization backend to share information
across related datasets while retaining per-dataset coefficients, an
idiom close to multi-task learning.
Cooperative learning (Ding et al. 2022) is a framework for
multi-view regression in which multiple data modalities on the same
subjects (genomics, proteomics, imaging, and so on) are jointly
regressed against an outcome under a regularization term that penalizes
disagreement between the views. The companion CRAN package
multiview dispatches its fits to glmnet.
On the eve of v5.0 the package was in an unusual state: every kernel except the Cox subroutine had been rewritten in C++. The Cox kernel had survived because of the combinatorial complexity its users expect of it — tie handling, stratified risk sets, left truncation — and because producing a numerically stable deviance, gradient, and Hessian in all of these cases in a single clean implementation is harder than it looks.
The v5.0 release resolves this. Jonathan Taylor joined the team, and
contributed coxdev, a standalone C++ library that computes
the Cox partial log-likelihood deviance together with its gradient and
Hessian, handling Breslow and Efron tie methods, strata, and \((start, stop]\) data through a small number
of well-tested classes. coxdev is separately developed (a
nested git submodule of glmnetpp, which is itself a
submodule of glmnet), and the computational details are the
subject of a companion vignette. At the same time, Narasimhan replaced
the Fortran code for the Cox regularization path with C++, making use of
the coxdev library for the gradient and Hessian
computations, which automatically takes care of all the special cases.
From the point of view of glmnet, the essential change is
that Cox is now a first-class GLM type inside glmnetpp —
glm_type::cox in the type enumeration — and the same IRLS
loop that drives the Gaussian, binomial, and Poisson paths drives the
Cox path, with a thin CoxDevAdapter handing the deviance
and gradient queries off to coxdev. The last of the Fortran
leaves the package in this release.
glmnet v5.0
with the glmnetpp and coxdev submodules, the
Rcpp boundary, and the CoxDevAdapter bridge from the
glmnetpp IRLS loop to the coxdev deviance and
gradient routines.Two user-visible consequences of this unification are worth stating
plainly. First, the Cox model now takes a cox.ties argument
with choices "breslow" or "efron". The v5.0
default is cox.ties = "breslow", preserving the pre-v5.0
numerical behavior for upgraders; v5.1 will switch the default to
"efron" to match survival::coxph. Calls to
glmnet() with family = "cox" that do not set
cox.ties explicitly emit a transition warning until the
switch occurs. The performance picture is favourable. On the benchmark
set in tests/benchmark_cox.R, dense Cox paths are roughly 3
times faster than the legacy Fortran, sparse Cox paths roughly 90 times
faster, and stratified Cox paths — which had been the most penalized
case in the old implementation — roughly 300 times faster. Full
benchmark numbers appear in the NEWS entry for v5.0.
Development continues along several lines. A Python
glmnet is being prepared by Jonathan Taylor that shares a
common C++ codebase with the R package, so that the same kernels serve
both languages. The shared codebase opens the possibility of compiling
glmnet to WebAssembly for in-browser use. Upstream work in
coxdev is aimed at reducing per-call allocation in the IRLS
loop, which is the last remaining place where the Cox path pays a
measurable overhead for its extensibility.
glmnet is the work of many hands. Jerome Friedman wrote
the original coordinate-descent kernel and the Mortran infrastructure,
and is a co-author of Friedman et al. (2010), Simon et al.
(2011), and Friedman et al. (2007). Trevor Hastie has been co-author
throughout, as well as the package maintainer. He also wrote the bulk of
the R interface code, as well as methods for plotting, printing,
predicting and cross-validation, and the corresponding code for the
relaxed models (with help from Narasimhan).
Robert Tibshirani introduced the lasso (Tibshirani 1996) and co-authored the
strong-rules work on which the package depends. Noah Simon led the
original Cox path work and is the first author of Simon et al. (2011).
Kenneth Tay is responsible for the v4.0 GLM extension and much of the
v4.1 Cox expansion, and is first author of Tay et
al. (2023). Balasubramanian
Narasimhan has maintained the package through the v4.x and v5.0
architectural transitions and is a co-author of Tay et al. (2023).
James Yang carried out the v4.1-3 through v4.1-6 port from Fortran to
C++ and is the author of glmnetpp. Junyang Qian is first
author of Qian et al. (2020), whose screening machinery
influenced the v4.1-2 exclude-as-function interface.
Jonathan Taylor contributed coxdev and with Narasimhan
drove the v5.0 unification of the Cox path onto the C++ engine. Many bug
reports and fixes contributed by users over the years are recorded in
NEWS.md; the work of Tomas Kalibera, David Keplinger, and
others is gratefully acknowledged.