On a multi-processor machine you can start to get good results in a few hours. On single processors you'll need to set aside rather longer. You can also run on a cluster.
By default CosmoMC uses a simple Metropolis algorithm or an optimized fast-slow sampling method (which works well for likelihoods with many fast nuisance parameters, like Planck). The program takes as inputs estimates of central values and posterior uncertainties of the various parameters. The proposal density can use information about parameter correlations from a supplied covariance matrix: use one if possible, as it will significantly improve performance. Covariance matrices are supplied for common sets of default base parameters. If you compile and run with MPI (to run across nodes in a cluster), there is an option to dynamically learn the proposal matrix from the covariance of the post-burn-in samples so far. The MPI option also allows you to terminate the computation automatically when a particular convergence criterion is matched. MPI is strongly recommended.
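CosmoMC itself is written in Fortran; the Python sketch below only illustrates the plain Metropolis idea described above, with a Gaussian proposal shaped by a supplied covariance matrix. The function name, the 2.4/sqrt(n) default step scale and the return layout are assumptions for illustration, not CosmoMC's API, and the fast-slow method is not shown.

```python
import numpy as np


def metropolis(log_like, x0, cov, n_steps, scale=None, seed=0):
    """Plain Metropolis sampler with a Gaussian proposal built from cov.

    log_like(x) returns ln L(x); return -np.inf to reject out-of-bounds points.
    Returns (weights, samples, minus_log_like) in a weighted-chain layout:
    a rejected proposal adds one to the weight of the current point instead
    of writing a duplicate row.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    ndim = len(x)
    if scale is None:
        # Common 2.4/sqrt(ndim) rule of thumb (an assumption, not
        # necessarily how CosmoMC applies its propose_scale parameter).
        scale = 2.4 / np.sqrt(ndim)
    chol = np.linalg.cholesky(cov) * scale
    lx = log_like(x)
    weights, samples, minus_log_like = [], [], []
    weight = 1
    for _ in range(n_steps):
        y = x + chol @ rng.standard_normal(ndim)
        ly = log_like(y)
        # Accept with probability min(1, L(y) / L(x)).
        if np.log(rng.random()) < ly - lx:
            weights.append(weight)
            samples.append(x.copy())
            minus_log_like.append(-lx)
            x, lx, weight = y, ly, 1
        else:
            weight += 1
    # Write out the final point with its accumulated weight.
    weights.append(weight)
    samples.append(x.copy())
    minus_log_like.append(-lx)
    return np.array(weights), np.array(samples), np.array(minus_log_like)
```

Because the proposal covariance matches the target in a test like the one below, the chain mixes quickly; with a badly matched covariance the same code still works but needs many more steps, which is why a good covariance matrix helps so much.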
Two Fortran programs are supplied: cosmomc and getdist. The first does the actual Monte Carlo and produces sets of .txt chain files and (optionally) .data output files (the binary .data files include the theoretical CMB power spectra etc.). The "getdist" program analyses the .txt files, calculating statistics and outputting files for the requested 1D, 2D and 3D plots (it could be used independently of the main cosmomc program). The "cosmomc" program also does post-processing on .data files, for example importance sampling with new data.
Please e-mail details of any bugs or enhancements to Antony Lewis. If you have any questions please ask in the CosmoCoffee computers and software forum. You can also read answers to other people's questions there.
Downloading and Compiling
You will need a Fortran 2003 (or higher) compiler: Intel Fortran 13 or higher works (earlier versions probably will not). Other compilers may work, but gfortran does not support Fortran 2003 well enough at the moment.
Using MPI simplifies running several chains and proposal optimization. MPI can be used with OpenMP: generally you want to use OpenMP to use all the shared-memory processors on each node of a cluster, and MPI to run multiple chains on different nodes (the program can also just be run on a single CPU).
If you are not using Intel, you also need to link to LAPACK (for doing matrix diagonalization, etc.); you may need to edit the Makefile to specify where this is on your system (Intel compilers use MKL).
Using Visual Fortran there's no need to use the Makefile, just open the project file in the source folder, and set params.ini as the program argument. This is not set up to compile with MPI, so mainly useful for development.
See the BibTex file for relevant citations.

See the supplied params.ini file for a fairly self-explanatory list of input parameters and options. The file_root entry gives the root name for all files produced. Running using MPI on a cluster is recommended if possible, as convergence testing and stopping can then be handled automatically. For example, to run four chains:

mpirun -np 4 ./cosmomc params.ini

There is also a supplied perl script runMPI.pl that you may be able to adapt for submitting jobs to a PBS queue, e.g.

perl runMPI.pl params 4

to run 4 chains over four nodes using the params.ini parameters file (the script is set up by default for the CITA cluster; edit ppn=2 to the number of CPUs per node you have). A couple of runMPI.pl variations are also supplied (specifically for a couple of Cambridge computers, but they may be generally adaptable).
If things go wrong check the .log and any error files in your cosmomc/scripts directory.
Parameters are specified in the .ini file using lines of the form

param[paramname] = center min max start_width propose_width

The start_width entry determines the randomly chosen dispersion of the starting position about the given center, and should be as large as or larger than the posterior width (for the convergence diagnostic to be reliable, chains should start at widely separated points). Chains are restricted to stay within the bounds determined by min and max.
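To make the meaning of the five numbers concrete, here is a small Python sketch that parses one such line and draws a starting point. The exact rule CosmoMC uses for the starting scatter is not spelled out above, so the Gaussian scatter redrawn until it falls inside [min, max] is an assumption, and the omegabh2 values in the usage are hypothetical.

```python
import re

import numpy as np

PARAM_LINE = re.compile(r"^param\[(\w+)\]\s*=\s*(.+)$")
FIELDS = ("center", "min", "max", "start_width", "propose_width")


def parse_param_line(line):
    """Parse 'param[name] = center min max start_width propose_width'."""
    match = PARAM_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a param[...] line: {line!r}")
    values = [float(v) for v in match.group(2).split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numbers, got {len(values)}")
    return match.group(1), dict(zip(FIELDS, values))


def draw_start(spec, rng, max_tries=10_000):
    """Assumed rule: Gaussian scatter of width start_width about center,
    redrawn until the point lies within [min, max]."""
    for _ in range(max_tries):
        x = spec["center"] + spec["start_width"] * rng.standard_normal()
        if spec["min"] <= x <= spec["max"]:
            return x
    raise RuntimeError("no starting point found inside [min, max]")
```

For example, parse_param_line("param[omegabh2] = 0.0221 0.005 0.1 0.0001 0.0001") returns the name "omegabh2" and a dictionary with center 0.0221, bounds 0.005 and 0.1, and both widths 0.0001.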
The sampler proposes changes in parameters using a proposal density function with a width determined by propose_width (multiplied by the value of the global propose_scale parameter). The propose_width should be of the order of the conditional posterior width (i.e. the expected error in that parameter when the other parameters are fixed). If you specify a propose_matrix (an approximate covariance matrix for the parameters), the parameter distribution widths are determined from its eigenvalues instead, and the proposal density changes the parameters along its eigenvectors. The covariance matrix can be computed using "getdist" once you have done one run (the file_root.covmat file), and the supplied planck_covmats/ directory contains many that you can use. The covariance matrix does not have to include all the parameters that are used: zero entries will be updated from the input propose widths of new parameters. (The propose width should be of the size of the conditional distribution of the parameter, typically quite a bit smaller than the marginalized posterior width; generally too small is better than too large.) The scale of the proposal density relative to the covariance is given by the propose_scale parameter. If your propose_matrix is significantly broader than the expected posterior, this number can be decreased.
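The two rules above (fill in parameters missing from the covariance matrix, then propose along its eigenvectors) can be sketched in Python as follows. This is an illustration under stated assumptions, not CosmoMC's proposal code: the helper names are hypothetical, a parameter counts as "missing" when its whole row and column are zero, and every eigen-direction is changed at once (CosmoMC's actual proposal may change only some directions per step).

```python
import numpy as np


def fill_missing_widths(covmat, propose_widths):
    """Parameters with an all-zero row and column in covmat get a diagonal
    entry of propose_width**2 (a sketch of the rule described above)."""
    cov = np.array(covmat, dtype=float)
    for i, width in enumerate(propose_widths):
        if not cov[i].any() and not cov[:, i].any():
            cov[i, i] = width**2
    return cov


def make_eigen_proposer(cov, propose_scale):
    """Return propose(x, rng): a Gaussian step along the eigenvectors of cov,
    with standard deviation propose_scale * sqrt(eigenvalue) in each direction."""
    evals, evecs = np.linalg.eigh(cov)
    # Clip tiny negative eigenvalues from rounding before taking square roots.
    widths = propose_scale * np.sqrt(np.clip(evals, 0.0, None))

    def propose(x, rng):
        return x + evecs @ (widths * rng.standard_normal(len(widths)))

    return propose
```

The eigendecomposition is done once when the proposer is built, so each step costs only one small matrix-vector product; the covariance of the proposed steps is propose_scale**2 times the filled-in matrix.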
If you don't have a propose_matrix, you can use estimate_propose_matrix = T to automatically estimate it by numerical fitting about the best fit point (with action=2 to stop, or action=0 to continue straight into an MCMC run).
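A generic way to estimate a covariance matrix from the best fit is to invert the Hessian of -ln(likelihood) there, which is exact for a Gaussian posterior. The Python sketch below uses central finite differences; the numerical scheme CosmoMC actually uses is not described here, so treat the step size and helper names as assumptions.

```python
import numpy as np


def numerical_hessian(f, x, h=1e-3):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            hess[i, j] = (
                f(x + ei + ej) - f(x + ei - ej) - f(x - ei + ej) + f(x - ei - ej)
            ) / (4.0 * h * h)
    return hess


def estimate_propose_matrix(minus_log_like, best_fit, h=1e-3):
    """Approximate the posterior covariance as the inverse Hessian of
    -ln L evaluated at the best-fit point."""
    return np.linalg.inv(numerical_hessian(minus_log_like, best_fit, h))
```

In practice the step h should be small compared with each parameter's width; for strongly non-Gaussian posteriors the result is only a starting guess that the MCMC run (or MPI proposal learning) can improve.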
Other sampling methods that were implemented in previous versions have not yet been updated for the latest version.
weight like param1 param2 param3 ...
The weight gives the number of samples (or importance weight) with these parameters. like gives -log(likelihood). The getdist program could be used completely independently of the cosmomc program.
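Because the chain format is so simple, basic statistics are easy to compute outside GetDist. The Python sketch below reads rows of the form weight, -ln(likelihood), parameters and returns weighted means, standard deviations and the best-fit sample. The function name is hypothetical, and treating ignore_rows < 1 as a fraction of rows is an assumption about how that option is interpreted.

```python
import numpy as np


def chain_stats(source, ignore_rows=0):
    """Weighted means, standard deviations and best-fit point of a chain.

    source: anything np.loadtxt accepts (file name, file object or list of
    lines), with columns 'weight like param1 param2 ...'.
    ignore_rows: rows to drop from the start (burn-in); if 0 < ignore_rows < 1
    it is treated as a fraction of the rows.
    """
    data = np.atleast_2d(np.loadtxt(source))
    if 0 < ignore_rows < 1:
        ignore_rows = int(ignore_rows * len(data))
    data = data[int(ignore_rows):]
    weights, minus_log_like, params = data[:, 0], data[:, 1], data[:, 2:]
    mean = np.average(params, axis=0, weights=weights)
    var = np.average((params - mean) ** 2, axis=0, weights=weights)
    best_fit = params[np.argmin(minus_log_like)]
    return mean, np.sqrt(var), best_fit
```

A row with weight 2 counts exactly like two identical rows with weight 1, which is why the weight column lets a sampler avoid writing repeated samples.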
Run getdist distparams.ini to process the chains specified in the parameter input file distparams.ini. This should be fairly self-explanatory, in the same format as the cosmomc parameter input file.
GetDist Parameters
Output Text Files
Plotting
GetDist produces script files to make simple 1D, 2D and 3D plots. These can be either Python or Matlab scripts: set plot_ext=py or plot_ext=m according to which you prefer. The script files produced are called
Parameter labels are set in distparams.ini; if any are blank the parameter is ignored. You can also specify which parameters to plot; if parameters are not specified for the 2D plots or for the colour of the 3D plots, GetDist automatically works out the most correlated variables and uses them.
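The rule GetDist uses to pick "the most correlated variables" is not spelled out here; as an assumption, the Python sketch below ranks parameter pairs by the absolute value of their correlation coefficient, computed from weighted samples. The function name and arguments are hypothetical.

```python
import numpy as np


def most_correlated_pairs(samples, weights, names, n_pairs=3):
    """Rank parameter pairs by |correlation| computed from weighted samples.

    samples: array of shape (n_samples, n_params); weights: per-row weights.
    Returns a list of (name_i, name_j, correlation), strongest first.
    """
    cov = np.cov(samples, rowvar=False, aweights=weights)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    rows, cols = np.triu_indices(len(names), k=1)
    order = np.argsort(-np.abs(corr[rows, cols]))[:n_pairs]
    return [
        (names[rows[k]], names[cols[k]], float(corr[rows[k], cols[k]]))
        for k in order
    ]
```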
The data files used by python and Matlab are output to the plot_data directory.
Performance of the MCMC can be improved by using parameters which have a close to Gaussian posterior distribution. The default parameters (which get implicit flat priors) are
Parameters like H0 and omegal (ΩΛ) are derived from the above. Using theta rather than H0 is more efficient as it is much less correlated with other parameters. There is an implicit prior 40 < H0 < 100 (which can be changed).
The .txt chain files list derived parameters after the base parameters.
The list of parameter names and labels used in the default parameterization is listed in the supplied params_CMB.paramnames file.
Since the program uses a covariance matrix for the parameters, it knows about (or will learn about) linear combination degeneracies. In particular ln[10^10 A_s] - 2*tau is well constrained, since exp(-2tau)A_s determines the overall amplitude of the observed CMB anisotropy (thus the above parameterization explores the tau-A degeneracy efficiently). The supplied covariance matrix will do this even if you add new parameters.
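Written out, the degeneracy is linear in the sampled variables because the logarithm of the product that sets the observed amplitude splits into a sum:

```latex
\ln\left(10^{10} A_s\, e^{-2\tau}\right) = \ln\left(10^{10} A_s\right) - 2\tau
```

A covariance matrix only describes linear correlations, so sampling in ln[10^10 A_s] and tau lets it capture this well-constrained direction exactly, whereas sampling A_s itself would make the degeneracy curved.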
Changing parameterization does in principle change the results, as each base parameter has a flat prior. However, for well-constrained parameters this effect is very small. In particular, using theta rather than H_0 has a small effect on marginalized results.
The above parameterization does make use of some knowledge about the physics, in particular the (approximate) formula for the sound horizon.
To change parameterization make a new .paramnames file, then change sources/params_CMB.f90 to change the mapping of physical parameters to MCMC array indices, and also to read in your new .paramnames file.
Likelihoods that you use may also have their own nuisance parameters.
You are encouraged to examine what the code is doing and consider carefully
changes you may wish to make. For example, the results can depend on the
parameterization. You may also want to use different CAMB modules, e.g.
slow-roll parameters for inflation, or use a fast approximator. The main
source code files you may want to modify are
The .ini file comments should explain the other options.
Example: since many people get this wrong, here is an illustration of what happens when generating plots from a set of chains from a tensor run (with prior r>0):
If you are not using parameter names you can set numbered limit parameters, e.g. limits12=0 N for parameter 12.
Incorrect result when limits[r02] is not set.
Correct result when setting limits[r02]=0 N.
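The reason the limit matters is that smoothing the samples near a hard prior edge leaks probability past the edge unless the code knows where the edge is. The Python sketch below shows the effect on samples of a half-normal "r" with a hard lower limit at 0. Simple reflection about the limit is used as the boundary correction; this is an assumption for illustration and not necessarily the correction GetDist applies.

```python
import numpy as np


def kde_1d(samples, x, h, lower=None):
    """Gaussian kernel density estimate at points x with bandwidth h.

    If lower is given, samples are mirrored about it (simple reflection),
    so smoothing does not leak probability below the prior limit.
    """
    samples = np.asarray(samples, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    n = len(samples)
    if lower is not None:
        samples = np.concatenate([samples, 2.0 * lower - samples])
    u = (x[:, None] - samples[None, :]) / h
    # Normalize by the original sample count so the density above the
    # limit integrates to one.
    density = np.exp(-0.5 * u**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))
    if lower is not None:
        density[x < lower] = 0.0
    return density
```

Without the limit, the estimated density at r=0 comes out at roughly half its true value, giving the artificial downturn at the edge; with the limit the edge value is recovered.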
For 2D plots, smooth_scale_2D is the smoothing scale relative to the bin size.
To compare two different sets of chains set compare_num=1 in the .ini file, and compare1 to the root name of some chains you have previously run GetDist on.
If plot_ext=py GetDist produces python '.py' files.
GetDist also produces a set of files in the plot_data directory that can be used by custom python scripts for plotting, independently of the plot_ext scripts described above.
See the CosmoMC python readme for details of how to use the plotting library.
If plot_ext=m GetDist produces Matlab '.m' files. Matlab support is not likely to be developed further, but is maintained from the previous version for convenience. Type file_root into a Matlab window set to the directory containing the .m files to produce 1D marginalized plots.
You can use the blue Matlab script (in the mscripts directory) to change to a B&W-friendly colourmap (see also the other colourmaps in that directory).
Custom Matlab plots
Some Matlab scripts are also supplied for making custom Matlab plots using the files produced by GetDist (see also CosmoloGui). The scripts are in the mscripts directory - you will probably want to add this to your Matlab path using e.g. addpath('mscripts'). confid2D makes marginalized contour plots like this.
However, the Python scripts now do this much better and more easily.
Convergence diagnostics
The getdist program will output convergence diagnostics, both short summary information when getdist is run, and also more detailed information in the file_root.converge file. When running with MPI the first two of the parameters below can also be calculated when running the chains for comparison with a stopping criterion (see the .ini input file).
Differences between GetDist and MPI run-time statistics
GetDist will cut out ignore_rows from the beginning of each chain, then compute the R statistic using all of the remaining samples. The MPI run-time statistic uses the last half of all of the samples. In addition, GetDist will use all the parameters, including derived parameters. If a derived parameter has poor convergence this may show up when running GetDist but not when running the chain (however, the eigenvalues of the covariance of the means are computed using only base parameters). The run-time values also use thinned samples (by default every tenth sample), whereas GetDist will use all of them. GetDist also allows you to use only subsets of the chains.
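The two ways of choosing samples described above can be compared with a simple per-parameter version of the Gelman-Rubin statistic: the variance of the chain means divided by the mean of the within-chain variances (often quoted as R-1). This Python sketch is an assumption about the variant meant; it ignores weights and thinning and does not compute the eigenvalue-based version over all base parameters.

```python
import numpy as np


def convergence_r(chains, ignore_rows=0, last_half=False):
    """Variance of chain means over mean within-chain variance, per parameter.

    chains: list of arrays of shape (n_samples, n_params), equal weights.
    ignore_rows: burn-in rows dropped from each chain (GetDist style).
    last_half: if True, use only the last half of each chain (MPI run-time style).
    """
    trimmed = []
    for chain in chains:
        chain = np.asarray(chain, dtype=float)
        trimmed.append(chain[len(chain) // 2:] if last_half else chain[ignore_rows:])
    means = np.array([c.mean(axis=0) for c in trimmed])
    within = np.mean([c.var(axis=0, ddof=1) for c in trimmed], axis=0)
    between = means.var(axis=0, ddof=1)
    return between / within
```

Well-mixed chains drawn from the same distribution give values close to zero, while chains stuck in different regions give values of order one or larger; since the two modes use different samples, the GetDist and run-time numbers will generally differ somewhat.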
Parameterizations
This defines what the input variables mean. Change this to use different
variables. You can change which parameterization file to use in the Makefile.
You can also change the num_cls number of (temperature plus polarization) Cls to compute and store, power spectrum parameter, etc.
This defines the maximum number of parameters and their types.
This reads in the CMB .dataset information and computes likelihoods.
You may wish to edit this, for example to use likelihood distributions
for the band powers, or to compute the likelihood from actual polarized data. This version assumes polarized data points are an arbitrary combination of the raw TT, TE, EE, and BB Cls, as specified in the window files in data/windows. WMAP data is handled as a special case.
This is the proposal density and related constants and subroutines. The efficiency
of MCMC is quite dependent on the proposal. Fast+slow and fast parameter subspaces are proposed separately. See the paper for a discussion of the proposal density and use of fast and slow parameters.
Routines for generating Cls, matter power spectra and sigma8 from CAMB.
Replace this file to use other generators, e.g. a fast approximator like
CMBfit, DASH, PICO, etc.
This is where you can add in new likelihood functions
Reads in .data files and re-calculates likelihoods or theory predictions. Unused in MCMC runs.
Calls the data likelihood functions, etc.
Main program that reads in parameters and calls MCMC or post-processing.
The "getdist" program for analysing chains. Write your own importance weighting function or parameter mapping.

Add-ons and extra datasets
Many old add-ons are unlikely to work with this version of CosmoMC. I will add ones here if and when they are updated.
Version History
Fixed problem initializing nuisance parameters. Updated CAMB to June 2008 version (fix for very closed models).
supernovae.f90 now replaced by default with UNION Supernovae Ia dataset (previous code now supernovae_ReissSNLS.f90; thanks to Anze Slosar).
Additions to Planck_like module; support for sampling and hence marginalizing over data nuisance parameters, point sources, beam uncertainty modes.
New GetDist option single_column_chain_files to support WMAP 5-year format chains (thanks to Mike Nolta): 1col_distparams.ini is a supplied sample input file. New GetDist option do_minimal_1d_intervals to calculate equal-likelihood 1-D limits (see 0705.0440, thanks to Jan Hamann). New GetDist option num_contours to produce more than two sets of limits.
Includes latest CAMB version with new reionization parameterization - default now assumes first ionization of helium happened at the same time as hydrogen, and z_re is defined as the point where xe is half its maximum (the optical depth and z_re are related in a way independent of the speed of the transition in the new parameterization). This changes the z_re numbers at the ~6% level. Fixed bug reading in mpk parameters.
Uses WMAP 5-year likelihood code. Added cmb_dataset_SZx and cmb_dataset_SZ_scalex parameters to specify (parameter independent) SZ template for each CMB dataset (WMAP_SZ_VBand.dat included from LAMBDA). Parameter 13 is now ASZ - the scaling of all the SZ templates, as used in WMAP3/WMAP5 papers. Updated supplied covariance params_CMB.covmat for WMAP5. Minor compatibility changes.
Added generic_mcmc in settings.f90 to easily use CosmoMC as generic sampling program without calling CAMB etc (write GenericLikelihoodFunction in calclike.f90 and use Makefile_nowmap). Added latest ACBAR dataset. CAMB update (including RECFAST 1.4). New Planck_like.f90 module for Cl likelihoods using approximation of arXiv:0801.0554 (also basic low-l likelihood). Added markerx GetDist parameters for adding vertical lines to parameter x in 1D Matlab plots. Various minor changes/compatibility fixes.
Updated CBI data. Compiler compatibility tweaks. Fixed error msg in mpk.f90. Minor CAMB update. Better error reporting in conjgrad_wrapper (thanks to Sam Leach).
(20th October 2006) Fixed k-scaling for SDSS LRG likelihood in mpk.f90. Changes for new version of WMAP likelihood code. Added out_dir and plot_data_dir options for GetDist. Minor compatibility fixes.
Added support for SDSS LRG data (astro-ph/0608632; thanks to Licia Verde, Hiranya Peiris and Max Tegmark). CAMB fixes and other minor changes.
Improved speed of GetDist 2D plot generation, added limitsxxx support when smoothing = F. Added sampling_method = 5,6, preliminary implementations of multicanonical and Wang-Landau sampling for nasty (e.g. multi-modal) distributions (currently does not compute Evidence, just generates weighted samples). Changed matter_power_minkh (cmbtypes) to work around potential rounding errors on some computers. Updated CAMB following August 2006 version. Added warning about missing limitsxx parameters to getdist. Added MPI_Max_R_ProposeUpdateNew parameter (when varying parameters that are fixed in covmat). Updated CBI data files.
Supernovae.f90 updated to use SNLS by default, edit to use Riess Gold. 2dF updated (twodf.f90 file deleted, use 2df_2005.dataset); covariance matrix support in mpk.f90.
Fixed bug using LSS with non-flat models. Improved error checking and Matlab 7 enhancements in getdist. Getdist auto column counting with columnnum=0, various hidden extra options now shown in sample distparams.ini. Extra fix for confid2D. Fixed MPI thinning bug in utils.F90. Makefile fixes. Fixed mpk.f90 analytical marginalization (since March 2006). SDSS likelihood now computed from k/h=1e-4.
Fixed bug in lya.f90 (SDSS lyman-alpha now the default; lya.f90 now includes Croft by default). Fixes to Confid2D Matlab script. Added .covmat files for WMAP with running and tensors, and basic Planck simulation. Fixed version confusion in GetDist (one-tail limits set to prior limit value).
Updated for 3-year WMAP. Added use_lya to include lyman-alpha data (standard LCDM only). Default in lya.f90 is LUQAS (can also compile with SDSSLy-a-v3.f90 for SDSS).
New Matlab scripts for producing solid contour and 4D plots. New checkpoint option to generate checkpoint files and continue terminated runs.
Added action=1 parameters redo_add (adds new likelihoods rather than recalculating) and redo_from_text (if you don't have .data files). Added pivot_k and inflation_consistency for use with default power spectrum parameterization.
Added get_sigma8 to force calculation of σ8. Updated .newdat CMB dataset format (also added B03 data files). New use_fast_slow parameter to turn on/off fast-slow optimizations. Fixed bug which resulted in occasional wrong tau values when importance sampling .data files. GetDist now outputs one/two-tail limit info in .margestats file. Updated CAMB version (support for non-linear matter power spectrum).
Updated CAMB for new accurate lensed Cl calculation of astro-ph/0502425. Minor changes to getdist (new Matlab_version input parameter, all_limits to set same limits for all parameters). cmbdata.f90 includes new format used by BOOMERANG/CBI for polarization.
Fixed bug in mpk.f90 when using 2df. Changes to GetDist for compatibility with Matlab 7. Fixed Makefile_intel (though now obsolete if you have Intel fortran v8).
Added mpk.f90 for reading in general (multiple) matter power spectrum data files in a similar way to CMB dataset files - corresponding changes to input parameter file. Included SDSS data files (note CosmoMC only models linear spectrum).
Various minor bug fixes and improved MPI error handling. Included (though not compiled by default) supernovae_riess_gold.f90 file to include more recent supernova data. Some mscripts fixes for compatibility with Matlab 7.
Improved proposal density for efficient handling of fast and slow parameters, plus more robust distance proposal (should see significant speed improvement). New sampling_method parameter: new options for slice sampling (robust) and directional gridding (robust and efficient use of fast parameters). Also option to use slice sampling for burn in (more robust than Metropolis in many cases), then switch to Metropolis (faster with good covariance matrix). See the notes for details. Improved MPI handling and minor bug fixes. Fixed effect of reionization on CAMB's lensed Cl results.
Uses June 2004 CAMB version: bessel_cache.camb file no longer produced or needed (prevents MPI problems). Increased sig figs of chain output and GetDist analysis. New parameter propose_scale, the ratio of proposal width to st. dev., default 2.4 (following Roberts, Gelman, Gilks, astro-ph/0405462) - often significantly speeds convergence (parameters in .ini file are now estimates of the st. dev., not desired proposal widths). Added MPI_R_StopProposeUpdate to stop updating proposal covariance matrix after a given convergence level has been reached. Added accuracy_level parameter to run at higher CAMB accuracy level (may be useful for forecasting).
Added new VSA and CBI datasets. Added first_band= option to .dataset files to cut out low l that aren't wanted. CAMB pivot scale for tensors changed to 0.05/Mpc (same as scalar). Fixed various compiler compatibility issues. Corrected CMB_lensing parameter in sample .ini file. Fixed minor typo in params_CMB.f90. Fixed reading in of MPI_Limit_Converge parameter in driver.F90.
Fixed bounds checking in MatterPowerAt (harmless with 2df). Added an exact likelihood calculation/data format to cmbdata.f90 for polarized full sky CMB Cl.
Added MPI support, with stopping on convergence and optional proposal density updating. Added calculation of matter power spectrum at different redshifts using CAMB (settings in cmbtypes.f90). Fixed bug when restarting chains using "continue_from" parameter [March 2006: now obsolete], and a few compiler compatibility issues. Updated CAMB for more accurate non-flat model results. Added output of parameter auto-correlations to GetDist, along with support for ignore_rows<1 to cut out a fraction of the chain and percentile split-test error estimators. Changed proposal density to propose a random number of parameter changes on each step. Added GetDist samples_are_chains option: if false, rows can be any samples of anything (starting in column one, without an importance weight or likelihood value as produced by CosmoMC), which is useful for analysing samples that don't come from CosmoMC. Added GetDist auto_label parameter to label parameters automatically by their parameter number.
Fixed bug in MCMC.f90 affecting all raw chains - weights and likelihoods were displaced by one row. Post-processed results were correct, and effect on parameters is very small. Minor bug fixes in GetDist. Can now make file_root.read file to be read by all chains file_root_1, file_root_2, etc (this file is not auto-deleted after being read).
Added support for 'triangle' plots to GetDist (set triangle_plot=T in the .ini file). If truncation is required, the covariance matrix for CMB data sets is now truncated (rather than truncating the inverse covariance). Fixed CAMB bug with non-flat models, and problem setting CAMB parameters when run separately from CosmoMC.
Fixed bug in GetDist: the .margestats file produced contained incorrect limits (the mean and stddev were OK).
Support for WMAP data (customized code fixes TE and amplitude bugs). CMB computation now uses Cl transfer functions - complete split possible between transfer functions and the initial power spectrum, so improved efficiency handling fast parameters. Bug fixes and tidying of proposal function. Initial power spectrum no longer assumed smooth for P_k. GetDist limitsxxx variables can be N to auto-size one end (margestats are still one tail). Support of IBM XL fortran (workarounds for bug on Seaborg). GetDist will automatically compute some chain statistics to help diagnose convergence and accuracy. CAMB updated, including more accurate and faster handling of tight coupling. Option to generate chains including CMB lensing effect. Various other changes.
Added support for polarization, and improved compatibility with different compilers and systems.
Reference links
See the BibTex file of CosmoMC references you should cite (including data likelihoods), along with some references of potential interest. These are the two main CosmoMC papers and some general sampling references:
call SetTheoryParameterNumbers(num_slow_params,num_semi_slow_params)

which sets the number of hard and semi-hard parameters (the latter being things like initial power spectrum parameters) and then

call this%Init(Ini,Names, 'params_CMB.paramnames')

to set the parameter names file.
Feel free to ask questions (and read answers to other people's) on the CosmoCoffee software forum. There is also a FAQ in the CosmoloGUI readme.