MacMCMC
| Developer(s) | Michael P. McLaughlin |
|---|---|
| Initial release | 15 February 2019 |
| Stable release | 1.4 / 8 January 2020 |
| Written in | C, C++, Objective-C, Objective-C++, Flex, Bison |
| Operating system | Mac OS |
| Available in | English |
| Type | Statistical package |
| License | Freeware |
| Website | www |
MacMCMC is a free, full-featured, standalone software package for Bayesian data analysis. As the name implies, it was developed for Mac OS platforms. From a user perspective, MacMCMC is a typical document application incorporating an editor for the analysis model plus menus, dialogs and other controls for selecting options and producing graphs, etc. Although originally intended as a pedagogical tool, it is powerful enough to be used for most Bayesian analyses.
Brief overview
Bayesian data analysis is a methodology that combines a descriptive model for the observed (new) data, characterized by a likelihood function, with a prior probability encoding any information already known before looking at the new data.[1][2] Mathematical implementation of this model begins with Bayes' theorem but leads, in nearly all cases, to intractable integrals in a parameter space of many dimensions. To perform the analysis (extract information from the data), these integrals must be evaluated numerically. Currently, the state-of-the-art approach is to use a Markov Chain Monte Carlo (MCMC) algorithm.[3][4]
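In symbols, the relationship described above can be written as follows. The notation is an assumption made here for illustration (θ for the unknown parameters, D for the new data) and is not taken from the MacMCMC documentation.

```latex
p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\, p(\theta)}{p(D)},
\qquad
p(D) \;=\; \int p(D \mid \theta)\, p(\theta)\, d\theta
```

Here p(D | θ) is the likelihood, p(θ) is the prior, and p(θ | D) is the posterior. The integral for p(D), taken over the whole parameter space, is one of the integrals that is usually intractable in closed form.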
All MCMC algorithms are computationally intensive, so a software package is required in practice. The output from such a package includes, initially, a two-dimensional array describing the joint posterior probability for the unknowns. This posterior may be post-processed to generate quantitative results of many kinds, such as:
- Plots for the marginal distribution of a parameter, p, or of some derived quantity, f(·)
- A credible interval for any unknown
- A goodness-of-fit plot comparing the data to the posterior result
- The marginal likelihood of the model quantifying its overall quality independent of its parameters
- A summary report
MacMCMC does all of the above, and more, from a graphical user interface (GUI) designed for convenience. A complete list of features is given at its website (see External links).
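The post-processing listed above can be sketched in a few lines. This is a minimal illustration, not MacMCMC's own code: it assumes the posterior array has one row per sampled state and one column per unknown, and it uses synthetic draws in place of a real posterior file. The helper names `column` and `credible_interval` are hypothetical.

```python
import random
import statistics

def column(posterior, j):
    """Extract the marginal draws for unknown j from the 2-D posterior array."""
    return [row[j] for row in posterior]

def credible_interval(draws, level=0.95):
    """Equal-tailed credible interval read off the sorted draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    return ordered[int(tail * (n - 1))], ordered[int((1.0 - tail) * (n - 1))]

# Synthetic stand-in for a posterior file (ASSUMPTION): 20000 states, 2 unknowns.
rng = random.Random(0)
posterior = [[rng.gauss(10.0, 2.0), rng.gauss(3.0, 0.5)] for _ in range(20000)]

# Marginal summary for the first unknown.
a_draws = column(posterior, 0)
a_mean = statistics.fmean(a_draws)
a_lo, a_hi = credible_interval(a_draws)

# A derived quantity f(a, b) is evaluated state by state, then summarized the same way.
prod_draws = [a * b for a, b in posterior]
prod_lo, prod_hi = credible_interval(prod_draws)

print(f"a: mean {a_mean:.2f}, 95% interval ({a_lo:.2f}, {a_hi:.2f})")
print(f"a*b: 95% interval ({prod_lo:.2f}, {prod_hi:.2f})")
```

Because each row is a draw from the joint posterior, any function of the unknowns can be summarized without further integration: apply the function to every row and treat the results as draws of the derived quantity.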
MCMC algorithm
An MCMC process is a kind of computer simulation in which a "walker" meanders, somewhat at random, through the parameter space in such a way as to guarantee that, after equilibrium has been established, the set of states (multi-dimensional points) that it visits has a histogram matching the desired posterior distribution implied by the model. Thousands and, perhaps, millions of states are visited in this fashion. Usually, there are two or more independent walkers so that equilibrium and convergence may be assessed by comparing various features of their posteriors.
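A single walker of this kind can be sketched with the classic random-walk Metropolis rule. This is an illustrative stand-in, not the algorithm MacMCMC uses; the target density (a standard Normal), the step size, and the function names are all assumptions chosen for the example.

```python
import math
import random
import statistics

def log_target(x):
    """Log of an unnormalized target density (a standard Normal, for illustration)."""
    return -0.5 * x * x

def metropolis(log_p, start, n_steps, step_size, rng):
    """Random-walk Metropolis: return the list of states visited by one walker."""
    x = start
    lp = log_p(x)
    states = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        lp_new = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); otherwise stay put.
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        states.append(x)
    return states

rng = random.Random(1)
burn_in = 2000  # states discarded while the walker approaches equilibrium

# Two independent walkers started far apart, so their results can be compared.
walker_1 = metropolis(log_target, -5.0, 20000, 2.4, rng)[burn_in:]
walker_2 = metropolis(log_target, 5.0, 20000, 2.4, rng)[burn_in:]

mean_1, mean_2 = statistics.fmean(walker_1), statistics.fmean(walker_2)
var_1, var_2 = statistics.pvariance(walker_1), statistics.pvariance(walker_2)
print(f"walker 1: mean {mean_1:.3f}, variance {var_1:.3f}")
print(f"walker 2: mean {mean_2:.3f}, variance {var_2:.3f}")
```

If the two walkers have reached equilibrium, their means and variances should agree with each other (and, here, with the known values 0 and 1) to within sampling noise. Persistent disagreement would suggest that more steps are needed.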
MacMCMC uses an advanced algorithm known as ensemble MCMC.[5] In this approach, each walker is replaced with a set (ensemble) of walkers, which are used collectively to select a proposed next state for each member of that ensemble. Other MCMC variants make this selection in other ways. Replacing a walker with an ensemble substantially increases the runtime. MacMCMC mitigates much of this increase by running ensembles in parallel, so that small problems take only a few seconds.
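One well-known ensemble proposal is the "stretch move" from the Goodman and Weare paper cited above. The sketch below implements a serial version of that move under stated assumptions: the source does not say which ensemble variant MacMCMC implements or how it parallelizes the work, and the target density, ensemble size, and function names here are illustrative only.

```python
import math
import random
import statistics

def log_target(x):
    """Unnormalized log density: independent Normals with sd 1 and sd 10 (illustrative)."""
    return -0.5 * (x[0] ** 2 + (x[1] / 10.0) ** 2)

def stretch_move_sweep(walkers, log_p, rng, a=2.0):
    """Update every walker once, in place, using the Goodman-Weare stretch move."""
    n, dim = len(walkers), len(walkers[0])
    for k in range(n):
        # Pick a different walker j from the ensemble, uniformly at random.
        j = rng.randrange(n - 1)
        if j >= k:
            j += 1
        # Draw z from the density proportional to 1/sqrt(z) on [1/a, a].
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        # Propose a point on the line through walkers j and k.
        proposal = [xj + z * (xk - xj) for xk, xj in zip(walkers[k], walkers[j])]
        log_ratio = (dim - 1) * math.log(z) + log_p(proposal) - log_p(walkers[k])
        if rng.random() < math.exp(min(0.0, log_ratio)):
            walkers[k] = proposal

rng = random.Random(2)
walkers = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]

samples = []
for sweep in range(3000):
    stretch_move_sweep(walkers, log_target, rng)
    if sweep >= 500:  # discard early sweeps while the ensemble spreads out
        samples.extend(tuple(w) for w in walkers)

mean_x0 = statistics.fmean(s[0] for s in samples)
sd_x1 = statistics.pstdev([s[1] for s in samples])
print(f"mean of x0: {mean_x0:.3f}, sd of x1: {sd_x1:.2f}")
```

The appeal of this move is that the proposal scale comes from the current spread of the ensemble, so the very different widths of the two coordinates need no hand tuning. The cost is visible in the loop: every sweep evaluates the target once per walker, which is the runtime increase the text refers to.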
Trivial example
The following example illustrates a few of the features of MacMCMC as well as something of the flavor of data analysis via MCMC. However, it does not begin to show the power and flexibility of the Bayesian inference that this software implements. The references and links in this article are a good starting point for further exploration.
The problem consists of taking observations of atmospheric carbon dioxide concentration over the past two millennia and determining the year when an essentially flat trend in concentration vs. time changed to an exponential rise. MacMCMC solves this problem using two input (text) files: a model (editable in MacMCMC) and a datafile (created separately). Output consists of text files and graphic files (default = PDF).
This example is discussed in more detail in the ebook referenced in the second External link below.
The data
This dataset contains values for all years from 1 to 2016 CE.[6] Concentration is in units of parts per million by volume (ppmv) which is proportional to the concentration of CO2 molecules. The top of the datafile, created with a spreadsheet, is as follows:
year CO2conc
1 276.7
2 276.8
3 276.8
4 276.9
5 276.9
6 277
7 277.1
8 277.1
9 277.2
10 277.2
The model
MCMC models, usually input with some customized pseudocode, specify the names and dimensions of variables, the relationships between variables and, in MacMCMC, which variables are monitored (will appear in the posterior file). Relationships that are deterministic must be distinguished from those that are stochastic. Here, this is done using = and ~ signs, respectively.
As always, a priori uncertainty about the unknowns is encoded in the priors. The relationship between time and concentration is here described by a piecewise (linear → exponential) function. The likelihood of the observations is described as Normal(mu, sigma) which makes this model analogous to an unweighted least squares regression. The unknown of primary interest is the time, tc, when this relationship went from linear to exponential. Prior information associates this change with the industrial revolution which started at some point in the eighteenth century.[7] Therefore, an appropriately vague prior is assigned to parameter tc.
The pseudocode shown below illustrates MacMCMC syntax. Other MCMC packages will be similar.
Constants:
    N = 2016; // # of points
Data:
    year[N], CO2conc[N];
Variables:
    A, B, tc, baseline, mu, sigma, i;
Priors:
    A ~ Jeffreys(0.001, 0.1);
    B ~ Jeffreys(10, 100);
    tc ~ Normal(1750, 15);
    baseline ~ Uniform(250, 300);
    sigma ~ Jeffreys(1, 10);
Likelihood:
    for (i, 1:N) {
        mu = baseline + (year[i] >= tc)*A*B*(exp((year[i] - tc)/B) - 1);
        CO2conc[i] ~ Normal(mu, sigma);
    }
Extras:
Monitored:
    A, B, tc, baseline, sigma;
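For readers more familiar with general-purpose languages, the Likelihood section of the model can be transcribed into Python roughly as follows. This is a sketch under assumptions: the second argument of Normal(mu, sigma) is taken to be a standard deviation, the function names are hypothetical, and the parameter values in the usage lines are illustrative rather than fitted. The three data points are the first rows of the datafile shown earlier.

```python
import math

def mu(year, tc, A, B, baseline):
    """Piecewise mean: flat at `baseline` before tc, exponential rise from tc on."""
    if year < tc:
        return baseline
    return baseline + A * B * (math.exp((year - tc) / B) - 1.0)

def log_likelihood(years, conc, tc, A, B, baseline, sigma):
    """Sum of Normal(mu, sigma) log densities over all observations."""
    total = 0.0
    for t, y in zip(years, conc):
        resid = y - mu(t, tc, A, B, baseline)
        total += -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * (resid / sigma) ** 2
    return total

# First three rows of the datafile, with illustrative (not fitted) parameter values.
years = [1, 2, 3]
conc = [276.7, 276.8, 276.8]
ll = log_likelihood(years, conc, tc=1750, A=0.01, B=50, baseline=276.8, sigma=1.0)
print(f"log-likelihood of the first three points: {ll:.4f}")
```

Written this way, the roles of the parameters are easier to see: the mean is continuous at tc, its slope just after tc equals A (in ppmv per year), and B is the e-folding time of the rise in years.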
Some results
A Bayesian posterior contains a great deal of information: in principle, whatever is knowable about the unknowns given the data and the model. In this case, the report that MacMCMC outputs automatically gives a mean point estimate for tc of 1734, with a 95% credible interval of 1716 to 1751 (all rounded to the nearest year), consistent with the idea that the industrial revolution is responsible for the concentration increase. The report also includes, by default, a value for log(marginal likelihood), here -4664.79. This value can be used to compare the model with competing models in order to quantify, with no further assumptions or approximations, which model is best (has the highest posterior probability). The marginal likelihood is not computed from the posterior; MacMCMC performs a separate, optional numerical integration to obtain it.
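The comparison step can be sketched as follows. Only the value -4664.79 comes from the report above; the rival model's value is hypothetical and exists purely to make the arithmetic concrete. The sketch also assumes that the reported logarithm is a natural log and that the competing models are given equal prior probability unless stated otherwise.

```python
import math

def posterior_model_probs(log_marginals, prior_probs=None):
    """Posterior probability of each model from its log marginal likelihood."""
    n = len(log_marginals)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # ASSUMPTION: equal prior probability per model
    log_w = [lm + math.log(p) for lm, p in zip(log_marginals, prior_probs)]
    top = max(log_w)  # subtract the maximum so exp() cannot underflow to all zeros
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]

log_ml_this = -4664.79   # from the report discussed above
log_ml_rival = -4670.00  # HYPOTHETICAL value for a competing model

log_bayes_factor = log_ml_this - log_ml_rival
probs = posterior_model_probs([log_ml_this, log_ml_rival])
print(f"log Bayes factor: {log_bayes_factor:.2f}")
print(f"posterior probabilities: {probs[0]:.4f}, {probs[1]:.4f}")
```

Working with differences of logs matters here: exponentiating -4664.79 directly would underflow to zero in double precision, whereas the difference between two log marginal likelihoods is a modest number.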
Which results MacMCMC outputs is determined in a Setup dialog and, once the run is finished, by additional menu choices. The Setup dialog also provides both high-level and low-level control over the details of the MCMC process.
References
- Gregory, P. (2010). Bayesian Logical Data Analysis for the Physical Sciences. Cambridge University Press.
- Gelman, A.; et al. (2013). Bayesian Data Analysis. Chapman & Hall/CRC.
- Gilks, W. R.; et al. (1995). Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC.
- Brooks, S.; et al., eds. (2001). Handbook of Markov Chain Monte Carlo. Chapman & Hall/CRC.
- Goodman, J.; Weare, J. (2010). "Ensemble samplers with affine invariance". Communications in Applied Mathematics and Computational Science. 5 (1): 65–80.
- U. S. Dept. of Energy. "Earth Science Data". ESS-DIVE.
- Lindsey, Rebecca. "Climate Change: Atmospheric Carbon Dioxide". Climate.gov. U. S. National Oceanic and Atmospheric Administration (NOAA). Archived from the original on 2013-06-24. Retrieved 2020-05-16.
External links
- MacMCMC website with download link
- Data, Uncertainty and Inference an informal introduction to Bayesian data analysis (free ebook) with many MacMCMC examples
