Title: | Tools for Analyzing MCMC Simulations from Bayesian Inference |
---|---|
Description: | Tools for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically displaying results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables, and functions to work with hierarchical/multilevel batches of parameters (Fernández-i-Marín, 2016 <doi:10.18637/jss.v070.i09>). |
Authors: | Xavier Fernández i Marín [aut, cre] |
Maintainer: | Xavier Fernández i Marín <[email protected]> |
License: | GPL-2 |
Version: | 1.5.1.1 |
Built: | 2024-12-24 04:14:21 UTC |
Source: | https://github.com/xfim/ggmcmc |
Calculate the autocorrelation of a single chain for a specified number of lags.
ac(x, nLags)
x |
Vector with a chain of simulated values. |
nLags |
Numerical value with the maximum number of lags to take into account. |
A matrix with the autocorrelations of every chain.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
Internal function used by ggs_autocorrelation.
# Calculate the autocorrelation of a simple vector
ac(cumsum(rnorm(10))/10, nLags=4)
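For readers who want to see what a per-lag autocorrelation table looks like without relying on the internal function, the following is a minimal base R sketch. It uses stats::acf() rather than the package's own code, and the names x, n_lags and ac_table are illustrative assumptions.

```r
# Illustrative sketch only: this is not the package's internal ac() code.
set.seed(1)
x <- cumsum(rnorm(200)) / 10   # a strongly autocorrelated toy chain
n_lags <- 4                    # hypothetical maximum lag

# stats::acf() returns lags 0..lag.max; drop lag 0, which is always 1.
rho <- as.numeric(acf(x, lag.max = n_lags, plot = FALSE)$acf)[-1]

ac_table <- data.frame(Lag = seq_len(n_lags), Autocorrelation = rho)
print(ac_table)
```

A random walk such as this one typically shows autocorrelations close to 1 at short lags, which is the pattern that signals slow mixing in a real chain.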
Simulate a dataset with one explanatory variable and one binary outcome variable using (y ~ dbern(mu); logit(mu) = theta[1] + theta[2] * X). The data loads two objects: the observed y values and the coda object containing simulated values from the posterior distribution of the intercept and slope of a logistic regression. The purpose of the dataset is only to show the possibilities of the ggmcmc package.
data(binary)
Two objects, namely:
s.binary: A coda object containing posterior distributions of the intercept (theta[1]) and slope (theta[2]) of a logistic regression with simulated data.
y.binary: A numeric vector containing the observed values of the outcome in the binary regression with simulated data.
Simulated data for ggmcmc
data(binary)
str(s.binary)
str(y.binary)
table(y.binary)
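The data-generating model stated for this dataset (y ~ dbern(mu); logit(mu) = theta[1] + theta[2] * X) can be sketched in base R as follows. The sample size and the values of theta are assumptions chosen for illustration; they are not the values used to build the shipped dataset.

```r
# Illustrative sketch: simulate data from a logistic regression.
# n and theta are hypothetical; the package's own values are not documented here.
set.seed(42)
n <- 100
X <- rnorm(n)                            # one explanatory variable
theta <- c(-0.5, 1.2)                    # assumed intercept and slope

mu <- plogis(theta[1] + theta[2] * X)    # inverse logit of the linear predictor
y <- rbinom(n, size = 1, prob = mu)      # y ~ Bernoulli(mu)

table(y)
```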
Compute the minimal elements to recreate a histogram manually by defining the total number of bins.
calc_bin(x, bins = bins)
x |
any vector or variable |
bins |
the number of requested bins |
Internal function to compute the minimal elements to recreate a histogram manually by defining the total number of bins, used by ggs_histogram, ggs_ppmean and ggs_ppsd.
A data frame with the x location, the width of the bars and the number of observations at each x location.
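A data frame of this shape (x location, bar width, count) can be built by hand in base R. The sketch below uses graphics::hist() with plot = FALSE and is only an approximation of the idea; the column names x, width and count are assumptions and may differ from what the internal function returns.

```r
# Illustrative sketch: recreate the elements of a histogram manually.
set.seed(7)
x <- rnorm(500)
bins <- 30   # total number of requested bins

# Equal-width breaks spanning the range of x; hist() does the counting.
h <- hist(x, breaks = seq(min(x), max(x), length.out = bins + 1), plot = FALSE)

hist_df <- data.frame(
  x = h$mids,             # centre of each bar
  width = diff(h$breaks), # width of each bar
  count = h$counts        # observations falling in each bar
)
head(hist_df)
```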
Generate a data frame with the limits of two credible intervals. Function used by ggs_caterpillar. "low" and "high" refer to the wide interval, whereas "Low" and "High" refer to the narrow interval. "median" is self-explanatory and is used to draw a dot in caterpillar plots. The data frame generated is of wide format, suitable for ggplot2::geom_segment().
ci(D, thick_ci = c(0.05, 0.95), thin_ci = c(0.025, 0.975))
D |
Data frame with the simulations. |
thick_ci |
Vector of length 2 with the quantiles of the thick band for the credible interval |
thin_ci |
Vector of length 2 with the quantiles of the thin band for the credible interval |
A data frame tibble with the Parameter names and 5 variables with the limits of the credible intervals (thin and thick), ready to be used to produce caterpillar plots.
data(linear)
ci(ggs(s))
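To make the naming of the five columns concrete, here is a base R sketch that computes the same kind of quantile summary with stats::quantile(). It is not the package's implementation, the toy data frame draws is an assumption, and the mapping of the thin band to "low"/"high" and the thick band to "Low"/"High" follows the description of the wide and narrow intervals given above.

```r
# Illustrative sketch: per-parameter limits of two credible intervals.
set.seed(3)
draws <- data.frame(
  Parameter = rep(c("beta[1]", "beta[2]"), each = 1000),
  value = c(rnorm(1000, 0, 1), rnorm(1000, 2, 0.5))
)
thick_ci <- c(0.05, 0.95)    # narrow interval, drawn thick
thin_ci  <- c(0.025, 0.975)  # wide interval, drawn thin

probs <- c(thin_ci[1], thick_ci[1], 0.5, thick_ci[2], thin_ci[2])
q <- t(sapply(split(draws$value, draws$Parameter), quantile, probs = probs))

intervals <- data.frame(Parameter = rownames(q), q, row.names = NULL)
names(intervals)[-1] <- c("low", "Low", "median", "High", "high")
print(intervals)
```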
Auxiliary function that sorts Parameter names taking into account numeric values
custom.sort(x)
x |
a character vector whose elements we want to sort |
X a character vector sorted with family parameters first and then numeric values
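The reason a dedicated sort is needed is that plain alphabetical ordering places beta[10] before beta[2]. The sketch below shows one way to sort by family name first and numeric index second in base R. It handles a single numeric index per parameter and is an assumption about the general approach, not a copy of custom.sort().

```r
# Illustrative sketch: sort parameter names by family, then by numeric index.
params <- c("beta[10]", "alpha[2]", "beta[2]", "sigma", "alpha[1]", "beta[1]")

# Family name: everything before the first "[" (or the whole name).
family <- sub("\\[.*$", "", params)

# Numeric index inside the brackets; NA for names without brackets.
index <- suppressWarnings(as.numeric(sub("^.*\\[([0-9]+).*$", "\\1", params)))

sorted <- params[order(family, index, na.last = TRUE)]
print(sorted)
# "alpha[1]" "alpha[2]" "beta[1]" "beta[2]" "beta[10]" "sigma"
```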
Internal function used by the graphical functions to get only some of the parameters that follow a given regular expression.
get_family(D, family = NA)
D |
Data frame with the data arranged and ready to be used by the rest of the ggmcmc functions. The dataframe has four columns, namely: Iteration, Parameter, value and Chain, and six attributes: nChains, nParameters, nIterations, nBurnin, nThin and description. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
D Data frame that is a subset of the given D dataset.
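Selecting a family by regular expression amounts to filtering rows on the Parameter column. The following base R sketch shows the idea on a hand-built data frame; it does not reproduce the attributes that a real ggs object carries, and the regular expression "^beta" is just an example.

```r
# Illustrative sketch: keep only the parameters matching a regular expression.
D <- data.frame(
  Iteration = rep(1:3, times = 3),
  Chain = 1L,
  Parameter = rep(c("beta[1]", "beta[2]", "sigma"), each = 3),
  value = (1:9) / 10
)

family <- "^beta"   # hypothetical family pattern
D_beta <- D[grepl(family, D$Parameter), ]

unique(D_beta$Parameter)
```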
ggmcmc() is simply a wrapper function that generates a pdf file with all the potential plots that the package can produce.
ggmcmc is a tool for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically displaying results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables.
ggmcmc( D, file = "ggmcmc-output.pdf", family = NA, plot = NULL, param_page = 5, width = 7, height = 10, simplify_traceplot = NULL, dev_type_html = "png", ... )
D |
Data frame with the simulations, previously arranged using ggs(). |
file |
Character vector with the name of the file to create. Defaults to "ggmcmc-output.pdf". When NULL, no pdf device is opened or closed. This allows the user to work with an opened pdf (or other) device. When the file has an html file extension the output is an Rmarkdown report with the figures embedded in the html file. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
plot |
character vector containing the names of the desired plots. By default (NULL), |
param_page |
Numerical, number of parameters to plot for each page. Defaults to 5. |
width |
Width of the pdf display, in inches. Defaults to 7. |
height |
Height of the pdf display, in inches. Defaults to 10. |
simplify_traceplot |
Numerical. A percentage of iterations to keep in the time series. It is an option intended only for the purpose of saving time and resources when doing traceplots. It is not a thinning operation, because it is not regular. It must be used with care. |
dev_type_html |
Character. Character vector indicating the type of graphical device for the html output. By default, png. See RMarkdown. |
... |
Other options passed to the pdf device. |
Notice that caterpillar plots are only created when there are multiple parameters within the same family. A family of parameters is considered to be all parameters that have the same name (usually the same Greek letter) but a different number within square brackets (such as alpha[1], alpha[2], ...).
http://xavier-fim.net/packages/ggmcmc/.
## Not run:
data(linear)
ggmcmc(ggs(s)) # Directly from a coda object
## End(Not run)
This function manages MCMC samples from different sources (JAGS, MCMCpack, STAN -both via rstan and via csv files-) and converts them into a data frame tibble. The resulting data frame has four columns (Iteration, Chain, Parameter, value) and six attributes (nChains, nParameters, nIterations, nBurnin, nThin and description). The ggs object returned is then used as the input of the ggs_* functions to actually plot the different convergence diagnostics.
ggs( S, family = NA, description = NA, burnin = TRUE, par_labels = NA, sort = TRUE, keep_original_order = FALSE, splitting = FALSE, inc_warmup = FALSE, stan_include_auxiliar = FALSE )
S |
Either a |
family |
Name of the family of parameters to process, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
description |
Character vector giving a short descriptive text that identifies the model. |
burnin |
Logical or numerical value. When logical and TRUE (the default), the number of samples in the burnin period will be taken into account, if it can be guessed by the extracting process. Otherwise, iterations will start counting from 1. If a numerical vector is given, the user then supplies the length of the burnin period. |
par_labels |
data frame with two columns. One named "Parameter" with the same names of the parameters of the model. Another named "Label" with the label of the parameter. When missing, the names passed to the model are used for representation. When there is no correspondence between a Parameter and a Label, the original name of the parameter is used. The order of the levels of the original Parameter does not change. |
sort |
Logical. When TRUE (the default), parameters are sorted first by family name and then by numerical value. |
keep_original_order |
Logical. When TRUE, parameters are sorted using the original order provided by the source software. Defaults to FALSE. |
splitting |
Logical. When TRUE, use the approach suggested by Gelman, Carlin, Stern, Dunson, Vehtari and Rubin (2014) Bayesian Data Analysis. 3rd edition. This implies splitting the sequences (original chains) in half and treating each half as a different Chain, therefore effectively doubling the number of chains. In this case, the first half of Chain 1 is still Chain 1, but the second half is turned into Chain 2, and the first half of Chain 2 into Chain 3, and so on. Defaults to FALSE. |
inc_warmup |
Logical. When dealing with stanfit objects from rstan, logical value whether the warmup samples are included. Defaults to FALSE. |
stan_include_auxiliar |
Logical value to include "lp__" parameter in rstan, and "lp__", "treedepth__" and "stepsize__" in stan running without rstan. Defaults to FALSE. |
D A data frame tibble with the data arranged and ready to be used by the rest of the ggmcmc functions. The data frame has four columns, namely: Iteration, Chain, Parameter and value, and six attributes: nChains, nParameters, nIterations, nBurnin, nThin and description. A data frame tibble is a wrapper to a local data frame, behaves like a data frame and its advantage is related to printing, which is compact. For more details, see as_tibble() in package dplyr.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
Gelman, Carlin, Stern, Dunson, Vehtari and Rubin (2014) Bayesian Data Analysis. 3rd edition. Chapman & Hall/CRC, Boca Raton.
# Assign 'S' to be a data frame suitable for ggmcmc functions from
# a coda object called s
data(linear)
S <- ggs(s) # s is a coda object
# Get samples from 'beta' parameters only
S <- ggs(s, family = "beta")
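The chain renumbering described for the splitting argument (first half of Chain 1 stays Chain 1, second half becomes Chain 2, first half of Chain 2 becomes Chain 3, and so on) can be illustrated on a toy data frame in base R. This is only a sketch of the renumbering rule; restarting the Iteration counter within each new chain is an assumption, and an even number of iterations is assumed.

```r
# Illustrative sketch: split each chain in half and renumber the chains.
D <- expand.grid(Iteration = 1:10, Chain = 1:2)
D$value <- seq_len(nrow(D))

half <- max(D$Iteration) / 2          # assumes an even number of iterations
second_half <- D$Iteration > half

D_split <- D
# Original chain c becomes 2c - 1 (first half) or 2c (second half).
D_split$Chain <- 2 * D$Chain - 1 + as.integer(second_half)
# Assumption: iterations are renumbered from 1 within each new chain.
D_split$Iteration <- ifelse(second_half, D$Iteration - half, D$Iteration)

table(D_split$Chain)
```

With two original chains of ten iterations, the result has four chains of five iterations each.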
Plot an autocorrelation matrix.
ggs_autocorrelation(D, family = NA, nLags = 50, greek = FALSE)
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
nLags |
Integer indicating the number of lags of the autocorrelation plot. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
A ggplot object.
data(linear)
ggs_autocorrelation(ggs(s))
Caterpillar plots are plotted combining all chains for each parameter.
ggs_caterpillar( D, family = NA, X = NA, thick_ci = c(0.05, 0.95), thin_ci = c(0.025, 0.975), line = NA, horizontal = TRUE, model_labels = NULL, label = NULL, comparison = NULL, comparison_separation = 0.2, greek = FALSE, sort = TRUE )
D |
Data frame with the simulations or list of data frames with simulations. If a list of data frames with simulations is passed, the names of the models are the names of the objects in the list. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
X |
data frame with two columns, Parameter and the value for the x location. Parameter must be a character vector with the same names as the parameters in the D object. |
thick_ci |
Vector of length 2 with the quantiles of the thick band for the credible interval |
thin_ci |
Vector of length 2 with the quantiles of the thin band for the credible interval |
line |
Numerical value indicating a concrete position, usually used to mark where zero is. By default, no line is plotted. |
horizontal |
Logical. When TRUE (the default), the plot has horizontal lines. When FALSE, the plot is reversed to show vertical lines. Horizontal lines are more appropriate for categorical caterpillar plots, because the x-axis is the only dimension that matters. But for caterpillar plots against another variable, the vertical position is more appropriate. |
model_labels |
Vector of strings that matches the number of models in the list. It is only used in case of multiple models and when the list of ggs objects given at |
label |
Character value with the name of the variable that contains the labels displayed in the plot. Defaults to NULL, which corresponds to using the Parameter name or the Label in case par_labels is used in the ggs() object. |
comparison |
Character value with the name of the variable that contains the focus of the comparison. Defaults to NULL, which corresponds to no comparison. It is not expected to be used together with X. |
comparison_separation |
Numerical value with the separation between the dodged parameters. Defaults to 0.2. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
sort |
Logical value indicating whether, in a horizontal display, y-axis labels must be sorted (the default) or not. |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_caterpillar(ggs(s))
ggs_caterpillar(list(A=ggs(s), B=ggs(s))) # silly example duplicating the same model
Auxiliary function that extracts information from a single chain.
ggs_chain(s)
s |
a single chain to convert into a data frame |
D data frame with the chain arranged
Density plots comparing the distribution of the whole chain with only its last part.
ggs_compare_partial(D, family = NA, partial = 0.1, rug = FALSE, greek = FALSE)
D |
Data frame with the simulations |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
partial |
Percentage of the chain to compare to. Defaults to the last 10 percent. |
rug |
Logical indicating whether a rug must be added to the plot. It is FALSE by default, since in large chains it may use a lot of resources and it is not central to the plot. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_compare_partial(ggs(s))
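The comparison behind this plot is between the density of the whole chain and the density of its last part. A rough base R sketch of that idea, using stats::density() on a toy chain, is shown below. The toy chain and the way the last part is selected are assumptions for illustration and may not match the package's internals.

```r
# Illustrative sketch: compare the whole chain with its last 10 percent.
set.seed(9)
chain <- cumsum(rnorm(1000)) / 30   # toy chain that drifts over time
partial <- 0.1                      # proportion of the chain to compare to

n_last <- round(partial * length(chain))
last_part <- tail(chain, n_last)

d_full <- density(chain)
d_last <- density(last_part)

# Overlay both densities; a chain that has converged should show similar shapes.
plot(d_full, main = "Whole chain vs. last part",
     ylim = range(0, d_full$y, d_last$y))
lines(d_last, lty = 2)
```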
Plot the Cross-correlation between-chains.
ggs_crosscorrelation(D, family = NA, absolute_scale = TRUE, greek = FALSE)
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
absolute_scale |
Logical. When TRUE (the default), the scale of the colour diverges between perfect inverse correlation (-1) to perfect correlation (1), whereas when FALSE, the scale is relative to the minimum and maximum cross-correlations observed. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_crosscorrelation(ggs(s))
Density plots with the parameter distribution. For multiple chains, use colours to differentiate the distributions.
ggs_density(D, family = NA, rug = FALSE, hpd = FALSE, greek = FALSE)
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
rug |
Logical indicating whether a rug must be added to the plot. It is FALSE by default, since in large chains it may use a lot of resources and it is not central to the plot. |
hpd |
Logical indicating whether HPD intervals (using the defaults from ci()) must be added to the plot. It is FALSE by default. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_density(ggs(s))
Get in a single tidy dataframe the results of the formal (non-visual) convergence analysis. Namely, the Geweke diagnostic (z, from ggs_geweke()), the Potential Scale Reduction Factor Rhat (Rhat, from ggs_Rhat()) and the number of effective independent draws (Effective, from ggs_effective()).
ggs_diagnostics( D, family = NA, version_rhat = "BDA2", version_effective = "spectral", proportion = TRUE )
D |
Data frame with the simulations |
family |
Name of the family of parameters to return, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
version_rhat |
Character variable with the name of the version of the potential scale reduction factor to use. Defaults to "BDA2", which refers to the second edition of _Bayesian Data Analysis_ (Gelman, Carlin, Stern and Rubin). The other available version is "BG98", which refers to Brooks & Gelman (1998) and is the one used in the "coda" package. |
version_effective |
Character variable with the name of the version of the calculation to use. Defaults to "spectral", which refers to the simple version estimating the spectral density at frequency zero used in the "coda" package. An alternative version "BDA3" is provided, which refers to the third edition of Bayesian Data Analysis (Gelman, Carlin, Stern, Dunson, Vehtari and Rubin). |
proportion |
Logical value whether to return the proportion of effective independent draws over the total (the default) or the number. |
Notice that at least two chains are required. Otherwise, only the Geweke diagnostic makes sense, and can be returned with its own function.
A tidy dataframe.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
Geweke, J. Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In _Bayesian Statistics 4_ (ed JM Bernardo, JO Berger, AP Dawid and AFM Smith). Clarendon Press, Oxford, UK.
Gelman, Carlin, Stern and Rubin (2003) Bayesian Data Analysis. 2nd edition. Chapman & Hall/CRC, Boca Raton.
Gelman, A and Rubin, DB (1992) Inference from iterative simulation using multiple sequences, _Statistical Science_, *7*, 457-511.
Brooks, S. P., and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. _Journal of computational and graphical statistics_, 7(4), 434-455.
Gelman, Carlin, Stern, Dunson, Vehtari and Rubin (2014) Bayesian Data Analysis. 3rd edition. Chapman & Hall/CRC, Boca Raton.
ggs_geweke, ggs_Rhat and ggs_effective for their respective options.
data(linear)
ggs_diagnostics(ggs(s))
Dotplot of the effective number of independent draws. The default version is the sample size adjusted for autocorrelation. An alternative from the third edition of Bayesian Data Analysis (Gelman, Carlin, Stern, Dunson, Vehtari and Rubin) is provided.
ggs_effective( D, family = NA, greek = FALSE, version_effective = "spectral", proportion = TRUE, plot = TRUE )
D |
Data frame with the simulations |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
version_effective |
Character variable with the name of the version of the calculation to use. Defaults to "spectral", which refers to the simple version estimating the spectral density at frequency zero used in the "coda" package. An alternative version "BDA3" is provided, which refers to the third edition of Bayesian Data Analysis (Gelman, Carlin, Stern, Dunson, Vehtari and Rubin). |
proportion |
Logical value whether to return the proportion of effective independent draws over the total (the default) or the number. |
plot |
Logical value indicating whether the plot must be returned (the default) or a tidy dataframe with the effective number of samples per Parameter. |
Notice that at least two chains are required.
A ggplot object, or a tidy data frame.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
Gelman, Carlin, Stern, Dunson, Vehtari and Rubin (2014) Bayesian Data Analysis. 3rd edition. Chapman & Hall/CRC, Boca Raton.
data(linear)
ggs_effective(ggs(s))
Dotplot of Geweke diagnostic.
ggs_geweke( D, family = NA, frac1 = 0.1, frac2 = 0.5, shadow_limit = TRUE, greek = FALSE, plot = TRUE )
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
frac1 |
Numeric, proportion of the first part of the chains selected. Defaults to 0.1. |
frac2 |
Numeric, proportion of the last part of the chains selected. Defaults to 0.5. |
shadow_limit |
logical. When TRUE (the default), a shadowed area between -2 and +2 is drawn. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
plot |
Logical value indicating whether the plot must be returned (the default) or a tidy dataframe with the results of the Geweke diagnostics per Parameter and Chain. |
A ggplot object, or a tidy data frame.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
Geweke, J. Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In _Bayesian Statistics 4_ (ed JM Bernardo, JO Berger, AP Dawid and AFM Smith). Clarendon Press, Oxford, UK.
data(linear)
ggs_geweke(ggs(s))
Generate a Figure with the Rhat shrinkage evolution over bins of simulations, known as the Gelman-Rubin-Brooks plot, or the Gelman plot. For the Potential Scale Reduction Factor (Rhat), proposed by Gelman and Rubin (1992), the version from the second edition of Bayesian Data Analysis (Gelman, Carlin, Stern and Rubin) is used, but the version used in the package "coda" can also be used (Brooks & Gelman 1998).
ggs_grb( D, family = NA, scaling = 1.5, greek = FALSE, version_rhat = "BDA2", bins = 50, plot = TRUE )
D |
Data frame with the simulations |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
scaling |
Value of the upper limit for the x-axis. By default, it is 1.5, to help contextualization of the convergence. When 0 or NA, the axis is not scaled. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
version_rhat |
Character variable with the name of the version of the potential scale reduction factor to use. Defaults to "BDA2", which refers to the second edition of _Bayesian Data Analysis_ (Gelman, Carlin, Stern and Rubin). The other available version is "BG98", which refers to Brooks & Gelman (1998) and is the one used in the "coda" package. |
bins |
Numerical value with the number of bins requested. Defaults to 50. |
plot |
Logical value indicating whether the plot must be returned (the default) or a tidy dataframe with the results of the Rhat diagnostics per Parameter. |
Notice that at least two chains are required.
A ggplot object, or a tidy data frame.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
Gelman, Carlin, Stern and Rubin (2003) Bayesian Data Analysis. 2nd edition. Chapman & Hall/CRC, Boca Raton.
Gelman, A and Rubin, DB (1992) Inference from iterative simulation using multiple sequences, _Statistical Science_, *7*, 457-511.
Brooks, S. P., and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. _Journal of computational and graphical statistics_, 7(4), 434-455.
data(linear)
ggs_grb(ggs(s))
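As background for the Rhat values shown in this plot, the sketch below computes a Potential Scale Reduction Factor from the between-chain and within-chain variances in base R, following the textbook formulation usually associated with Gelman and Rubin. It is offered as an assumption about the general calculation, not as the code behind the "BDA2" or "BG98" options, and it computes a single value rather than the evolution over bins.

```r
# Illustrative sketch: Rhat from between- and within-chain variances.
set.seed(11)
n <- 500   # iterations per chain
m <- 3     # number of chains (at least two are required)
chains <- matrix(rnorm(n * m), nrow = n, ncol = m)  # one column per chain

chain_means <- colMeans(chains)
B <- n * var(chain_means)             # between-chain variance
W <- mean(apply(chains, 2, var))      # average within-chain variance
var_plus <- (n - 1) / n * W + B / n   # pooled estimate of the posterior variance

Rhat <- sqrt(var_plus / W)
print(Rhat)
```

Because the toy chains are independent draws from the same distribution, the value should be very close to 1; chains that have not mixed produce larger values.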
Plot a histogram of each of the parameters. Histograms are plotted combining all chains for each parameter.
ggs_histogram(D, family = NA, bins = 30, greek = FALSE)
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
bins |
integer indicating the total number of bins in which to divide the histogram. Defaults to 30, which is the same as geom_histogram() |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_histogram(ggs(s))
Pairs style plots to evaluate posterior correlations among parameters.
ggs_pairs(D, family = NA, greek = FALSE, ...)
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
... |
Arguments to be passed to |
A ggpairs object that creates a plot matrix consisting of univariate density plots on the diagonal, correlation estimates in upper triangular elements, and scatterplots in lower triangular elements.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
## Not run:
library(GGally)
data(linear)

# default ggpairs plot
ggs_pairs(ggs(s))

# change alpha transparency of points
ggs_pairs(ggs(s), lower=list(continuous = wrap("points", alpha = 0.2)))

# with too many points, try contours instead
ggs_pairs(ggs(s), lower=list(continuous="density"))

# histograms instead of univariate densities on diagonal
ggs_pairs(ggs(s), diag=list(continuous="barDiag"))

# coloring results according to chains
ggs_pairs(ggs(s), mapping = aes(color = Chain))

# custom points on lower panels, black contours on upper panels
ggs_pairs(ggs(s),
  upper=list(continuous = wrap("density", color = "black")),
  lower=list(continuous = wrap("points", alpha = 0.2, shape = 1)))
## End(Not run)
Plot a histogram with the distribution of correctly predicted cases in a model against a binary response variable.
ggs_pcp(D, outcome, threshold = "observed", bins = 30)
D |
Data frame with the simulations. Notice that only the fitted / expected posterior outcomes are needed, so the previous call to ggs() should have limited the family of parameters to pass only the fitted / expected values. See the example below. |
outcome |
vector (or matrix or array) containing the observed outcome variable. Currently only a vector is supported. |
threshold |
Numerical value bounded between 0 and 1, or "observed" (the default). If "observed", the threshold above which an expected value is considered a realization of the event (1, success) is computed using the observed value in the data. Otherwise, a numerical value giving the threshold to use (typically, 0.5) can be supplied. |
bins |
integer indicating the total number of bins in which to divide the histogram. Defaults to 30, which is the same as geom_histogram() |
A ggplot object.
data(binary)
ggs_pcp(ggs(s.binary, family="mu"), outcome=y.binary)
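The quantity summarized by this histogram can be sketched in base R, without ggmcmc. The names `mu_draws` and `y_obs` are hypothetical stand-ins for the fitted probabilities and the observed outcome, and the reading of `threshold = "observed"` as the observed share of successes is an assumption, not something the text above confirms.

```r
set.seed(1)
n_obs  <- 50   # hypothetical number of observations
n_iter <- 200  # hypothetical number of posterior draws

# Simulated stand-ins for the observed outcome and the fitted probabilities
# (one row per iteration, one column per observation).
x        <- rnorm(n_obs)
y_obs    <- rbinom(n_obs, size = 1, prob = plogis(0.5 * x))
mu_draws <- matrix(
  plogis(rnorm(n_iter * n_obs, mean = rep(0.5 * x, each = n_iter), sd = 0.3)),
  nrow = n_iter, ncol = n_obs
)

# Assumed reading of threshold = "observed": the observed share of successes.
threshold <- mean(y_obs)

# Classify every case in every draw and compare with the observed outcome.
predicted <- mu_draws > threshold
correct   <- sweep(predicted, 2, y_obs == 1, FUN = "==")

# Proportion of correctly predicted cases, one value per iteration.
pcp <- rowMeans(correct)
summary(pcp)
```

The vector `pcp` holds one proportion per iteration; a histogram of it is the kind of display that ggs_pcp() produces.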
Histogram with the distribution of the predicted posterior means, compared with the mean of the observed outcome.
ggs_ppmean(D, outcome, family = NA, bins = 30)
D |
Data frame with the simulations. Notice that only the posterior outcomes are needed, so either the ggs() call limits the parameters to the outcomes or the user provides a family of parameters to limit it. |
outcome |
vector (or matrix or array) containing the observed outcome variable. Currently only a vector is supported. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
bins |
integer indicating the total number of bins in which to divide the histogram. Defaults to 30, which is the same as geom_histogram() |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_ppmean(ggs(s.y.rep), outcome=y)
Histogram with the distribution of the predicted posterior standard deviations, compared with the standard deviations of the observed outcome.
ggs_ppsd(D, outcome, family = NA, bins = 30)
D |
Data frame with the simulations. Notice that only the posterior outcomes are needed, so either the ggs() call limits the parameters to the outcomes or the user provides a family of parameters to limit it. |
outcome |
vector (or matrix or array) containing the observed outcome variable. Currently only a vector is supported. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
bins |
integer indicating the total number of bins in which to divide the histogram. Defaults to 30, which is the same as geom_histogram() |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_ppsd(ggs(s.y.rep), outcome=y)
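The comparison behind ggs_ppmean() and ggs_ppsd() can be sketched in base R. The objects `y_obs` and `y_rep` are hypothetical simulated stand-ins (not the packaged `y` and `s.y.rep`), and the tail proportions computed at the end are an illustrative summary that the functions above are not documented to return.

```r
set.seed(2)
n_obs  <- 40   # hypothetical number of observations
n_iter <- 500  # hypothetical number of posterior predictive draws

# Hypothetical observed outcome and replicated outcomes
# (one row per iteration, one column per observation).
y_obs <- rnorm(n_obs, mean = 3, sd = 1)
y_rep <- matrix(rnorm(n_iter * n_obs, mean = 3, sd = 1), nrow = n_iter)

# Statistic of each replicated dataset, to be compared with the observed one.
pp_means <- rowMeans(y_rep)
pp_sds   <- apply(y_rep, 1, sd)

# Share of replicated statistics above the observed value;
# shares close to 0 or 1 suggest the model misses that feature of the data.
p_mean <- mean(pp_means > mean(y_obs))
p_sd   <- mean(pp_sds > sd(y_obs))
c(mean = p_mean, sd = p_sd)
```

The histograms drawn by the two functions show the distributions of `pp_means` and `pp_sds`, with the observed statistic as the reference.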
Plot a dotplot of Potential Scale Reduction Factor (Rhat), proposed by Gelman and Rubin (1992). The version from the second edition of Bayesian Data Analysis (Gelman, Carlin, Stern and Rubin) is used, but the version used in the package "coda" can also be used (Brooks & Gelman 1998).
ggs_Rhat(D, family = NA, scaling = 1.5, greek = FALSE, version_rhat = "BDA2", plot = TRUE)
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
scaling |
Value of the upper limit for the x-axis. By default, it is 1.5, to help put the convergence in context. When 0 or NA, the axis is not scaled. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
version_rhat |
Character variable with the name of the version of the potential scale reduction factor to use. Defaults to "BDA2", which refers to the second edition of _Bayesian Data Analysis_ (Gelman, Carlin, Stern and Rubin). The other available version is "BG98", which refers to Brooks & Gelman (1998) and is the one used in the "coda" package. |
plot |
Logical value indicating whether the plot must be returned (the default) or a tidy dataframe with the results of the Rhat diagnostics per Parameter. |
Notice that at least two chains are required.
A ggplot object, or a tidy data frame.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
Gelman, Carlin, Stern and Rubin (2003) Bayesian Data Analysis. 2nd edition. Chapman & Hall/CRC, Boca Raton.
Gelman, A and Rubin, DB (1992) Inference from iterative simulation using multiple sequences, _Statistical Science_, *7*, 457-511.
Brooks, S. P., and Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. _Journal of computational and graphical statistics_, 7(4), 434-455.
data(linear)
ggs_Rhat(ggs(s))
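For intuition, the textbook form of the potential scale reduction factor can be computed by hand in base R. This is a sketch of the formula as given in Bayesian Data Analysis, applied to hypothetical simulated chains; whether ggs_Rhat() computes exactly these steps internally is an assumption.

```r
set.seed(3)
n <- 1000  # iterations per chain (hypothetical)
m <- 3     # number of chains; at least two are required

# Hypothetical chains for one parameter, one column per chain.
chains <- matrix(rnorm(n * m), nrow = n, ncol = m)

B <- n * var(colMeans(chains))       # between-chain variance
W <- mean(apply(chains, 2, var))     # average within-chain variance

# Pooled estimate of the posterior variance, then the ratio to W.
var_plus <- (n - 1) / n * W + B / n
rhat <- sqrt(var_plus / W)
rhat
```

Chains that sample the same distribution give a value close to 1; larger values indicate that the between-chain variance is still large relative to the within-chain variance.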
Receiver-Operator Characteristic (ROC) plot for models with binary outcomes
ggs_rocplot(D, outcome, fully_bayesian = FALSE)
D |
Data frame with the simulations. Notice that only the posterior outcomes are needed, so the previous call to ggs() should have limited the family of parameters to the predicted outcomes. |
outcome |
vector (or matrix or array) containing the observed outcome variable. Currently only a vector is supported. |
fully_bayesian |
Logical, FALSE by default. When not fully Bayesian, it uses the median of the predictions for each observation by iteration. When TRUE, the function plots as many ROC curves as iterations. This uses a lot of CPU and needs more memory, so use it with caution. |
A ggplot object.
data(binary)
ggs_rocplot(ggs(s.binary, family="mu"), outcome=y.binary)
Running means of the chains.
ggs_running(D, family = NA, original_burnin = TRUE, original_thin = TRUE, greek = FALSE)
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
original_burnin |
Logical. When TRUE (the default), start the iteration counter in the x-axis at the end of the burnin period. |
original_thin |
Logical. When TRUE (the default), take into account the thinning interval in the x-axis. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_running(ggs(s))
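A running mean is the mean of all draws up to each iteration. As a minimal base R sketch on a hypothetical simulated chain (the variable names are illustrative only):

```r
set.seed(4)
chain <- rnorm(500, mean = 2)   # hypothetical single chain

# Mean of the first t draws, for every t from 1 to the chain length.
running_mean <- cumsum(chain) / seq_along(chain)

# The last running mean equals the overall mean of the chain.
tail(running_mean, 1)
mean(chain)
```

In a well-behaved chain the running mean settles quickly around the overall mean, which is the pattern to look for in the plot.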
Plot a separation plot with the results of the model against a binary response variable.
ggs_separation(D, outcome, minimalist = FALSE, show_labels = FALSE, uncertainty_band = TRUE)
D |
Data frame with the simulations. Notice that only the fitted / expected posterior outcomes are needed, so the previous call to ggs() should have limited the family of parameters to pass only the fitted / expected values. See the example below. |
outcome |
vector (or matrix or array) containing the observed outcome variable. Currently only a vector is supported. |
minimalist |
logical, FALSE by default. It returns a minimalistic version of the figure with the bare minimum elements, suitable for being used inline as suggested by Greenhill, Ward and Sacks citing Tufte. |
show_labels |
logical, FALSE by default. If TRUE it adds the Parameter as the label of the case in the x-axis. |
uncertainty_band |
logical, TRUE by default. If FALSE it removes the uncertainty band on the predicted values. |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
Greenhill B, Ward MD and Sacks A (2011). The separation plot: A New Visual Method for Evaluating the Fit of Binary Models. _American Journal of Political Science_, 55(4), 991-1002, doi:10.1111/j.1540-5907.2011.00525.x.
data(binary)
ggs_separation(ggs(s.binary, family="mu"), outcome=y.binary)
Traceplot with the time series of the chains.
ggs_traceplot(D, family = NA, original_burnin = TRUE, original_thin = TRUE, simplify = NULL, hpd = FALSE, greek = FALSE)
D |
Data frame with the simulations. |
family |
Name of the family of parameters to plot, as given by a character vector or a regular expression. A family of parameters is considered to be any group of parameters with the same name but different numerical value between square brackets (as beta[1], beta[2], etc). |
original_burnin |
Logical. When TRUE (the default) start the Iteration counter in the x-axis at the end of the burnin period. |
original_thin |
Logical. When TRUE (the default) take into account the thinning interval in the x-axis. |
simplify |
Numerical. A percentage of iterations to keep in the time series. This option is intended only to save time and resources when drawing traceplots. It is not a thinning operation, because the retained iterations are not regularly spaced. It must be used with care. |
hpd |
Logical indicating whether HPD intervals (using the defaults from ci()) must be added to the plot. It is FALSE by default. |
greek |
Logical value indicating whether parameter labels have to be parsed to get Greek letters. Defaults to false. |
A ggplot object.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
data(linear)
ggs_traceplot(ggs(s))
Generate a factor with levels of unequal length.
gl_unq(n, k, labels = 1:n)
n |
number of levels |
k |
number of repetitions |
labels |
optional vector of labels |
Internal function to generate a factor with levels of unequal length, used by ggs_histogram.
A factor
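The behaviour described above can be sketched in base R. The function `gl_unq_sketch` is a hypothetical re-implementation for illustration, not the package source, and it assumes that `k` is a vector with one repetition count per level.

```r
# Hypothetical re-implementation for illustration; not the package source.
# Assumes k holds one repetition count for each of the n levels.
gl_unq_sketch <- function(n, k, labels = 1:n) {
  stopifnot(length(k) == n)
  factor(rep(seq_len(n), times = k), levels = seq_len(n), labels = labels)
}

# Three levels repeated 2, 1 and 4 times respectively.
f <- gl_unq_sketch(3, k = c(2, 1, 4), labels = c("a", "b", "c"))
table(f)
```

Unlike base R's gl(), which repeats every level the same number of times, this form lets each level have its own length.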
Simulate a dataset with one explanatory variable and one continuous outcome variable using (y ~ dnorm(mu, sigma); mu = beta[1] + beta[2] * X). The data loads three objects: the observed y values, a coda object containing simulated values from the posterior distribution of the intercept and slope of a linear regression, and a coda object containing simulated values from the posterior predictive distribution. The purpose of the dataset is only to show the possibilities of the ggmcmc package.
data(linear)
Three objects, namely:
s: A coda object containing posterior distributions of the intercept (beta[1]) and slope (beta[2]) of a linear regression with simulated data.
s.y.rep: A coda object containing simulated values from the posterior predictive distribution of the outcome of a linear regression with simulated data (y ~ N(mu, sigma); mu = beta[1] + beta[2] * X; y.rep ~ N(mu, sigma); where y.rep is a replicated outcome, originally missing data).
y: A numeric vector containing the observed values of the outcome in the linear regression with simulated data.
Simulated data for ggmcmc
data(linear)
str(s)
str(s.y.rep)
str(y)
Generate a data frame with at least columns for Parameter and Labels. This function is intended to work as a shortcut for the matching data frame necessary to pass the argument "par_labels" to ggs() calls for transforming the parameter names.
plab(parameter.name, match, subscripts = NULL)
parameter.name |
A character vector of length one with the name of the variable (family) without subscripts. Usually, it refers to a Greek letter. |
match |
A named list with the variable labels and the values of the factor corresponding to the dimension they map to. The order of the list matters, as ggmcmc assumes that the first dimension corresponds to the first element in the list, and so on. |
subscripts |
An optional character vector with the letters that correspond to each of the dimensions of the family of parameters. By default it uses the uninformative names "dim.1", "dim.2", etc. These usually correspond to the "i", "j", ... subscripts in classical textbooks, but it is recommended to keep them close to the subscripts used in the sampling software. |
A data frame (tibble) with the Parameter names and their matching meaningful variable Labels. The intermediate variables are also passed, to make it easier to work with the samples using meaningful variable names.
data(radon)
L.radon <- plab("alpha", match = list(County = radon$counties$County))
# Generates a data frame suitable for matching with the generated samples
# through the "par_labels" argument:
ggs_caterpillar(ggs(radon$s.radon, par_labels = L.radon, family = "^alpha"))
Using the radon example in Gelman & Hill (2007), the list contains several elements to show the possibilities of ggmcmc for applied Bayesian Hierarchical/multilevel analysis.
data(radon)
A list containing several elements (data and outputs of the analysis):
A data frame with the county label, ids and radon level.
A vector identifying counties in the data.
The outcome variable.
A coda object with simulated values from the posterior distribution of all parameters, with few iterations for each one.
A coda object containing simulated values from the posterior predictive distribution.
A coda object with simulated values from the posterior distribution of few parameters, with reasonable chain length.
http://www.stat.columbia.edu/~gelman/arm/examples/radon/
data(radon)
names(radon)
# Generate a data frame suitable for matching with the generated samples
# through the "par_labels" argument:
L.radon <- plab("alpha", match = list(County = radon$counties$County))
Internal function used by ggs_rocplot.
roc_calc(R)
R |
data frame with the 'value' (predicted probability) and the observed 'Outcome'. |
A data frame with the Sensitivity and the Specificity.
Fernández-i-Marín, Xavier (2016) ggmcmc: Analysis of MCMC Samples and Bayesian Inference. Journal of Statistical Software, 70(9), 1-20. doi:10.18637/jss.v070.i09
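The calculation of sensitivity and specificity over a grid of thresholds can be sketched in base R. The data frame `R` below is a hypothetical simulated input that mirrors the documented columns 'value' and 'Outcome'; the choice of thresholds and the output column `Threshold` are assumptions of this sketch, not a description of how roc_calc() is implemented.

```r
set.seed(5)
# Hypothetical input: predicted probabilities and observed binary outcomes.
R <- data.frame(value = runif(100))
R$Outcome <- rbinom(100, size = 1, prob = R$value)

# Evaluate every distinct predicted value, plus the two extremes.
thresholds <- sort(unique(c(0, R$value, 1)))

roc <- do.call(rbind, lapply(thresholds, function(th) {
  pred <- R$value >= th   # cases classified as events at this threshold
  data.frame(
    Threshold   = th,
    Sensitivity = sum(pred & R$Outcome == 1) / sum(R$Outcome == 1),
    Specificity = sum(!pred & R$Outcome == 0) / sum(R$Outcome == 0)
  )
}))
head(roc)
```

Plotting Sensitivity against 1 - Specificity from this data frame gives the ROC curve.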
A coda object containing simulated values from the posterior distribution of the intercept, slope and residual of a linear regression with fake data (y = beta[1] + beta[2] * X + sigma). The purpose of the dataset is only to show the possibilities of the ggmcmc package.
data(s)
A coda object containing posterior distributions of the intercept, slope and residual of a linear regression with fake data.
A coda object containing simulated values from the posterior distribution of the intercept and slope of a logistic regression with fake data (y ~ dbern(mu); logit(mu) = theta[1] + theta[2] * X), and the fitted / expected values (mu). The purpose of the dataset is only to show the possibilities of the ggmcmc package.
data(s.binary)
A coda object containing posterior distributions of the intercept (theta[1]) and slope (theta[2]) of a logistic regression with fake data, and of the fitted / expected values (mu).
A coda object containing simulated values from the posterior predictive distribution of the outcome of a linear regression with fake data (y ~ N(mu, sigma); mu = beta[1] + beta[2] * X; y.rep ~ N(mu, sigma); where y.rep is a replicated outcome, originally missing data). The purpose of the dataset is only to show the possibilities of the ggmcmc package.
data(s.y.rep)
A coda object containing simulated values from the posterior predictive distribution of a linear regression with fake data.
Compute the Spectral Density Estimate at Zero Frequency for a given chain.
sde0f(x)
x |
A time series |
Internal function to compute the Spectral Density Estimate at Zero Frequency for a given chain, used by ggs_geweke.
A vector with the spectral density estimate at zero frequency
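One common way to estimate the spectral density at frequency zero is to fit an autoregressive model to the chain and evaluate its spectrum at zero. The sketch below uses stats::ar() on a hypothetical simulated chain; whether sde0f() uses exactly this estimator is an assumption.

```r
set.seed(6)
# Hypothetical AR(1) chain standing in for MCMC output.
x <- as.numeric(arima.sim(model = list(ar = 0.5), n = 2000))

# Fit an autoregressive model; the order is selected by AIC.
fit <- ar(x)

# Spectral density at frequency zero implied by the fitted AR model:
# innovation variance divided by (1 - sum of AR coefficients)^2.
sde0 <- fit$var.pred / (1 - sum(fit$ar))^2
sde0
```

Dividing this value by the chain length gives a variance for the chain mean that accounts for autocorrelation, which is the kind of quantity a Geweke-type diagnostic relies on.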
A numeric vector containing the observed values of the outcome of a linear regression with fake data (y = beta[1] + beta[2] * X + sigma). The purpose of the dataset is only to show the possibilities of the ggmcmc package.
data(y)
A numeric vector containing the observed values of the outcome in the linear regression with fake data.
A numeric vector containing the observed values (y) of the outcome of a logistic regression with fake data (y ~ dbern(mu); logit(mu) = theta[1] + theta[2] * X). The purpose of the dataset is only to show the possibilities of the ggmcmc package.
data(y.binary)
A numeric vector containing the observed values of the outcome in the logistic regression with fake data.