rstanarm

rstanarm R package for Bayesian applied regression modeling

Bayesian applied regression modeling (arm) via Stan

This is an R package that emulates other R model-fitting functions but uses
Stan (via the rstan package) for the back-end
estimation. The primary target audience is people who would be open to Bayesian
inference if using Bayesian software were easier but would use frequentist
software otherwise.

Fitting models with rstanarm is also useful for experienced Bayesian
software users who want to take advantage of the pre-compiled Stan programs
that are written by Stan developers and carefully implemented to prioritize
numerical stability and the avoidance of sampling problems.

The rstanarm package is an appendage to the rstan package, the R
interface to Stan. rstanarm enables many of the most
common applied regression models to be estimated using Markov Chain Monte Carlo,
variational approximations to the posterior distribution, or optimization. The
package allows these models to be specified using the customary R modeling
syntax (e.g., like that of glm with a formula and data.frame).
Additional arguments are provided for specifying prior distributions.
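
As a rough illustration of that syntax, the sketch below fits a linear regression with stan_glm using a formula, a data.frame, and explicit priors. The mtcars data set, the choice of predictors, the prior values, and the reduced chains and iter settings are assumptions made only to keep the example small; they are not recommendations from the package authors.

```r
library(rstanarm)

# Illustrative model: fuel economy as a function of weight and horsepower.
fit <- stan_glm(
  mpg ~ wt + hp,                 # same formula syntax as glm()
  data = mtcars,
  family = gaussian(),
  # Priors on the coefficients and the intercept (values are illustrative).
  prior = normal(location = 0, scale = 2.5, autoscale = TRUE),
  prior_intercept = normal(location = 20, scale = 10),
  # Small run sizes to keep the example quick; real analyses usually use more.
  chains = 2, iter = 1000, seed = 123, refresh = 0
)

print(fit, digits = 2)   # posterior medians and MAD_SD for each parameter
prior_summary(fit)       # the priors that were actually used
```

Calling prior_summary after fitting is a convenient way to confirm which priors were applied, including any rescaling done by autoscale = TRUE.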

The set of models supported by rstanarm is large (and will continue to
grow), but also limited enough that they can be integrated tightly with
the pp_check function, which produces graphical posterior predictive checks
using bayesplot, and the posterior_predict function, which makes it easy to
estimate the effect of specific manipulations of predictor variables or to
predict the outcome in a training set.
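
A minimal sketch of how pp_check and posterior_predict might be used together is given below. The model, the mtcars data, and the hypothetical new_cars data frame are assumptions chosen for illustration.

```r
library(rstanarm)

fit <- stan_glm(mpg ~ wt + hp, data = mtcars,
                chains = 2, iter = 1000, seed = 123, refresh = 0)

# Graphical posterior predictive check: density of the observed outcome
# overlaid with densities of data sets replicated from the posterior.
pp_check(fit, plotfun = "dens_overlay", nreps = 50)

# Effect of a hypothetical manipulation: two cars with the same horsepower
# but different weights (values are illustrative).
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(110, 110))
draws <- posterior_predict(fit, newdata = new_cars)  # one column per row of new_cars

# Posterior predictive distribution of the difference in mpg.
diff_mpg <- draws[, 2] - draws[, 1]
quantile(diff_mpg, probs = c(0.05, 0.5, 0.95))
```

Because posterior_predict returns a matrix of draws, any contrast between rows of newdata can be summarized directly from its columns.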

The fitted model objects returned by the rstanarm modeling functions are
called stanreg objects. In addition to all of the traditional methods
defined for fitted model objects, stanreg objects can also be used with the
loo package for leave-one-out cross-validation, model comparison, and model
weighting/averaging, and with the shinystan package for exploring the
posterior distribution and model diagnostics with a graphical user interface.
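
The following sketch shows one way stanreg objects could be passed to loo for model comparison and weighting, and to shinystan for interactive exploration. The two candidate models and the mtcars data are assumptions for illustration. With a data set this small, loo may warn about influential observations.

```r
library(rstanarm)

# Two candidate models (illustrative choices).
fit1 <- stan_glm(mpg ~ wt, data = mtcars,
                 chains = 2, iter = 1000, seed = 123, refresh = 0)
fit2 <- stan_glm(mpg ~ wt + hp, data = mtcars,
                 chains = 2, iter = 1000, seed = 123, refresh = 0)

# Approximate leave-one-out cross-validation for each model.
loo1 <- loo(fit1)
loo2 <- loo(fit2)

# Model comparison: the model with the highest expected log predictive
# density is listed first.
cmp <- loo::loo_compare(loo1, loo2)
print(cmp)

# Model weights (stacking by default) computed from the loo objects.
wts <- loo::loo_model_weights(list(fit1 = loo1, fit2 = loo2))
print(wts)

# Open the shinystan GUI only in an interactive session.
if (interactive()) {
  launch_shinystan(fit2)
}
```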

Check out the rstanarm vignettes
for examples and more details about the entire process.

The model estimating functions are described in greater detail in their
individual help pages and vignettes. Here we provide a very brief overview:

  • stan_lm, stan_aov, stan_biglm

    Similar to lm and aov but with novel regularizing priors on the model
    parameters that are driven by prior beliefs about R-squared, the proportion of
    variance in the outcome attributable to the predictors in a linear model.

  • stan_glm, stan_glm.nb

    Similar to glm but with various possible prior distributions for the
    coefficients and, if applicable, a prior distribution for any auxiliary
    parameter in a Generalized Linear Model (GLM) that is characterized by a
    family object (e.g. the shape parameter in Gamma models). It is also possible
    to estimate a negative binomial model similar to the glm.nb function
    in the MASS package.

  • stan_glmer, stan_glmer.nb, stan_lmer

    Similar to the glmer, glmer.nb, and lmer functions (lme4 package) in
    that GLMs are augmented to have group-specific terms that deviate from the
    common coefficients according to a mean-zero multivariate normal distribution
    with a highly-structured but unknown covariance matrix (for which rstanarm
    introduces an innovative prior distribution). MCMC provides more appropriate
    estimates of uncertainty for models that consist of a mix of common and
    group-specific parameters.

  • stan_nlmer

    Similar to nlmer (lme4 package) for nonlinear "mixed-effects"
    models, but flexible priors can be specified for all parameters in the model,
    including the unknown covariance matrices for the varying
    (group-specific) coefficients.

  • stan_gamm4

    Similar to gamm4 (gamm4 package), which augments a GLM (possibly with
    group-specific terms) with nonlinear smooth functions of the predictors to
    form a Generalized Additive Mixed Model (GAMM). Rather than calling
    lme4::glmer like gamm4 does, stan_gamm4 essentially calls stan_glmer,
    which avoids the optimization issues that often crop up with GAMMs and
    provides better estimates for the uncertainty of the parameter estimates.

  • stan_polr

    Similar to polr (MASS package) in that it models an ordinal response,
    but the Bayesian model also implies a prior distribution on the unknown
    cutpoints. Can also be used to model binary outcomes, possibly while
    estimating an unknown exponent governing the probability of success.

  • stan_betareg

    Similar to betareg (betareg package) in that it models an outcome that
    is a rate (proportion) but, rather than performing maximum likelihood
    estimation, full Bayesian estimation is performed by default, with
    customizable prior distributions for all parameters.

  • stan_clogit

    Similar to clogit (survival package) in that it models a binary outcome
    where the number of successes and failures is fixed within each stratum by
    the research design. There are some minor syntactical differences relative
    to survival::clogit that allow stan_clogit to accept
    group-specific terms as in stan_glmer.

  • stan_mvmer

    A multivariate form of stan_glmer, whereby the user can specify
    one or more submodels each consisting of a GLM with group-specific terms. If
    more than one submodel is specified (i.e. there is more than one outcome
    variable) then a dependence is induced by assuming that the group-specific
    terms for each grouping factor are correlated across submodels.

  • stan_jm

    Estimates shared parameter joint models for longitudinal and time-to-event
    (i.e. survival) data. The joint model can be univariate (i.e. one longitudinal
    outcome) or multivariate (i.e. more than one longitudinal outcome). A variety
    of parameterisations are available for linking the longitudinal and event
    processes (i.e. a variety of association structures).
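
To make two of the entries above concrete, the sketch below fits one model with stan_lm, where the prior is expressed through R-squared, and one with stan_lmer, which accepts lme4-style group-specific terms. The data sets (mtcars and lme4's sleepstudy), the prior values, and the short run lengths are assumptions for illustration only.

```r
library(rstanarm)

# stan_lm: the prior is stated as a belief about R-squared.
# Here the prior mean of R-squared is set to 0.5 (an illustrative guess).
fit_lm <- stan_lm(
  mpg ~ wt + hp,
  data = mtcars,
  prior = R2(location = 0.5, what = "mean"),
  chains = 2, iter = 1000, seed = 123, refresh = 0
)

# stan_lmer: intercepts and slopes for Days vary by Subject, with a
# decov prior on the unknown covariance matrix of those terms.
fit_lmer <- stan_lmer(
  Reaction ~ Days + (Days | Subject),
  data = lme4::sleepstudy,
  prior_covariance = decov(regularization = 2),
  chains = 2, iter = 1000, seed = 123, refresh = 0
)

fixef(fit_lmer)    # common coefficients
VarCorr(fit_lmer)  # estimated covariance of the group-specific terms
```

The other modeling functions follow the same pattern: the formula and data arguments mirror the emulated function, and the prior arguments are added on top.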

The modeling functions in the rstanarm package take an algorithm
argument that can be one of the following:

  • Sampling (algorithm="sampling"):

Uses Markov Chain Monte Carlo (MCMC) --- in particular, Stan's implementation
of Hamiltonian Monte Carlo (HMC) with a tuned but diagonal mass matrix ---
to draw from the posterior distribution of the parameters. This is the slowest
but most reliable of the available estimation algorithms and it is the
default and recommended algorithm for statistical inference.

  • Mean-field (algorithm="meanfield"):

Uses mean-field variational inference to draw from an approximation to the
posterior distribution. In particular, this algorithm finds the set of
independent normal distributions in the unconstrained space that --- when
transformed into the constrained space --- most closely approximate the
posterior distribution. Then it draws repeatedly from these independent
normal distributions and transforms them into the constrained space. The
entire process is much faster than HMC and yields independent draws but
is not recommended for final statistical inference. It can be useful to
narrow the set of candidate models in large problems, particularly when
specifying QR=TRUE in stan_glm, stan_glmer, and stan_gamm4, but is
only an approximation to the posterior distribution.

  • Full-rank (algorithm="fullrank"):

Uses full-rank variational inference to draw from an approximation to the
posterior distribution by finding the multivariate normal distribution in
the unconstrained space that --- when transformed into the constrained space
--- most closely approximates the posterior distribution. Then it draws
repeatedly from this multivariate normal distribution and transforms the
draws into the constrained space. This process is slower than mean-field
variational inference but is faster than HMC. Although still an
approximation to the posterior distribution and thus not recommended
for final statistical inference, the approximation is more realistic than
that of mean-field variational inference because the parameters are not
assumed to be independent in the unconstrained space. Nevertheless, full-rank
variational inference is a more difficult optimization problem and the
algorithm is more prone to non-convergence or convergence to a local
optimum.

  • Optimizing (algorithm="optimizing"):

Finds the posterior mode using a C++ implementation of the L-BFGS algorithm. If
there is no prior information, then this is equivalent to maximum likelihood,
in which case there is no great reason to use the functions in the rstanarm
package over the emulated functions in other packages. However, if priors are
specified, then the estimates are penalized maximum likelihood estimates, which
may have some redeeming value. Currently, optimization is only supported for
stan_glm.
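
The four algorithm choices described above can be compared on a single small model, as in the sketch below. The model, the mtcars data, and the seed are assumptions for illustration. The variational fits are approximations, may emit diagnostic warnings, and can occasionally fail to converge, so their estimates should not be expected to match the MCMC results exactly.

```r
library(rstanarm)

f <- mpg ~ wt + hp

# Full MCMC (the default and recommended algorithm).
fit_mcmc <- stan_glm(f, data = mtcars, algorithm = "sampling",
                     chains = 2, iter = 1000, seed = 123, refresh = 0)

# Variational approximations; QR = TRUE is specified as suggested above.
fit_mf <- stan_glm(f, data = mtcars, algorithm = "meanfield",
                   QR = TRUE, seed = 123)
fit_fr <- stan_glm(f, data = mtcars, algorithm = "fullrank",
                   QR = TRUE, seed = 123)

# Posterior mode (penalized maximum likelihood under the default priors).
fit_opt <- stan_glm(f, data = mtcars, algorithm = "optimizing", seed = 123)

# Point estimates side by side.
round(cbind(
  sampling   = coef(fit_mcmc),
  meanfield  = coef(fit_mf),
  fullrank   = coef(fit_fr),
  optimizing = coef(fit_opt)
), 3)
```

A workflow consistent with the descriptions above would use the faster algorithms to screen candidate models and then refit the chosen model with algorithm = "sampling" for final inference.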


Resources

Installation

Latest Release

The most recent rstanarm release can be installed from CRAN via

install.packages("rstanarm")

Development Version

To install from GitHub, first make sure that you can install the rstan
package and C++ toolchain by following these instructions.
Once rstan is successfully installed, you can install rstanarm from
GitHub using the devtools package by executing the following in R:

if (!require(devtools)) {
  install.packages("devtools")
  library(devtools)
}
install_github("stan-dev/rstanarm", build_vignettes = FALSE)

You can switch build_vignettes to TRUE, but installation then takes much
longer, and the vignettes are already separately available from the
Stan website and CRAN.
If installation fails, please let us know by filing an issue.

Contributing

If you are interested in contributing to the development of rstanarm please
see the developer notes page.

Key metrics

Overview

Name and owner: stan-dev/rstanarm
Primary language: R
Languages: R (8 languages in total)
License: GNU General Public License v3.0

Owner activity

Created: 2013-07-16 17:11:11
Last push: 2025-05-27 19:09:37
Last commit: 2025-05-27 13:09:37
Releases: 19
Latest release: v2.32.1
First release: v2.9.0-1

User engagement

Stars: 394
Watchers: 42
Forks: 137
Commits: 3.3k
Issues: 446
Open issues: 163
Pull requests: 159
Open pull requests: 8
Closed pull requests: 27