Theme III - Statistical Foundations

Editor: Mark Naylor

Authors: Mark Naylor, Katerina Orfanogiannaki, and David Harte

Abstract: This article will take you through an exploratory analysis of data contained in earthquake catalogues. The aim is to give the reader ideas about how to start investigating the properties of a new dataset in a straightforward and rigorous way. We also introduce some more advanced concepts, such as how to determine catalogue completeness, but leave detailed descriptions of these methodologies to other articles.

The target audience is undergraduate and graduate students who would like to use SSLib (Harte and Brownrigg 2010) and the R language (R Development Core Team 2010) to explore earthquake data. We have chosen R because it is freely available on all platforms, and we hope this makes the tutorial as accessible as possible. This article focusses on data exploration rather than being a comprehensive guide to the application.

You will learn about basic plotting tools that can be used to explore the properties of earthquake data and to identify visually the difficulties in choosing a subset of the total catalogue for subsequent analysis. This section provides an introductory overview but does not offer technical solutions to those problems.


Recommended statistical references

Regression

Practical Regression and Anova using R is available from the R Project website. The emphasis of this text is on the practice of regression and analysis of variance. The objective is to learn what methods are available and, more importantly, when they should be applied. Many examples are presented to clarify the use of the techniques and to show what conclusions can be drawn.
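As a minimal illustration of the simplest case such a text covers, the sketch below fits a straight line by ordinary least squares and returns the residuals. Inspecting the residuals is central to judging whether a regression model should be applied at all. The sketch is written in plain Python (standard library only) rather than R so that it is self-contained. The function name `least_squares_line` and the example values are our own assumptions and are not taken from the book.

```python
from statistics import fmean


def least_squares_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, residuals).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residuals: plotting these against x or the fitted values is the usual
    # check that a straight-line model is appropriate for the data.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return intercept, slope, residuals


if __name__ == "__main__":
    # Hypothetical paired observations, for illustration only.
    b0, b1, res = least_squares_line([0, 1, 2, 3], [1.2, 2.9, 5.1, 6.8])
    print(f"intercept={b0:.3f}, slope={b1:.3f}, residuals={res}")
```

A real analysis would go on to look at the residuals graphically, which is the kind of judgement the referenced text emphasises.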

Gaussian Process

The Gaussian Processes Web Site aims to provide an overview of resources concerned with probabilistic modeling, inference and learning based on Gaussian processes.
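For readers new to the topic, a core computation in Gaussian process regression is the posterior mean at new inputs, k*ᵀ (K + σ²I)⁻¹ y. Here K is the covariance matrix of the training inputs, k* holds the covariances between a new input and the training inputs, σ² is the observation-noise variance and y contains the observed values. The plain-Python sketch below implements this with a squared-exponential covariance and a hand-written Cholesky solve. The kernel, its default parameters and all function names are assumptions made for illustration, not taken from the website.

```python
import math


def squared_exponential(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a symmetric positive-definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward substitution, then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior_mean(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean of a zero-mean GP regression model at each test input."""
    n = len(x_train)
    # K + noise_var * I for the training inputs.
    K = [[squared_exponential(x_train[i], x_train[j], **kernel_args)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = cholesky_solve(cholesky(K), y_train)  # (K + s^2 I)^-1 y
    return [sum(squared_exponential(xs, xt, **kernel_args) * a
                for xt, a in zip(x_train, alpha)) for xs in x_test]


if __name__ == "__main__":
    # Hypothetical training data, for illustration only.
    print(gp_posterior_mean([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], [0.5, 1.5]))
```

Far from the training inputs, the posterior mean reverts towards the prior mean of zero. This behaviour follows from the zero-mean assumption made here.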

Bootstrapping

Bootstrapping is a general approach to statistical inference based on building a sampling distribution for a statistic by resampling from the data at hand. Bootstrapping Regression Models gives an introduction focussing on the nonparametric bootstrap.
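The resampling idea can be sketched in a few lines. The example below is written in plain Python rather than R and assumes independent observations. It draws resamples with replacement, recomputes a statistic on each resample and reads off a simple percentile interval. The function names, the default of 2000 resamples and the magnitude values in the usage section are illustrative assumptions, not taken from the referenced notes.

```python
import random
from statistics import fmean


def bootstrap_distribution(data, statistic, n_resamples=2000, seed=None):
    """Nonparametric bootstrap replicates of `statistic`.

    Each replicate recomputes the statistic on a resample of the same size
    as `data`, drawn with replacement from `data`.
    """
    if not data:
        raise ValueError("data must be non-empty")
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(replicates, level=0.95):
    """Simple percentile interval read from the sorted replicates."""
    ordered = sorted(replicates)
    alpha = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[int(alpha * last)], ordered[int((1.0 - alpha) * last)]


if __name__ == "__main__":
    mags = [2.1, 2.4, 2.2, 3.0, 2.7, 2.5, 3.4, 2.3]  # hypothetical magnitudes
    reps = bootstrap_distribution(mags, fmean, seed=1)
    print("mean:", fmean(mags), "95% interval:", percentile_interval(reps))
```

The percentile interval is the simplest bootstrap interval. Refinements such as bias-corrected intervals are among the topics that introductory texts like the one above discuss.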

MCMCglmm

MCMCglmm is an open-source package for R. The course notes provide an accessible introduction to generalised linear mixed models (GLMMs) and Markov chain Monte Carlo (MCMC) techniques, with good examples. They also include a useful discussion of related issues, such as overdispersion and how to deal with it.
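To make the MCMC idea concrete, here is a random-walk Metropolis sampler for a one-dimensional target density, written in plain Python. This is a generic textbook algorithm, not the sampler used inside MCMCglmm, which fits far richer models. The function name, default step size and burn-in length are illustrative assumptions.

```python
import math
import random


def metropolis(log_density, initial, n_samples, step=1.0, burn_in=1000, seed=None):
    """Random-walk Metropolis sampler for a one-dimensional target.

    `log_density` may be unnormalised and may return -math.inf outside the
    support; `initial` must lie inside the support.
    """
    rng = random.Random(seed)
    current = initial
    current_lp = log_density(current)
    samples = []
    for i in range(burn_in + n_samples):
        # Symmetric Gaussian proposal, so no Hastings correction is needed.
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:  # keep only draws after the burn-in period
            samples.append(current)
    return samples


if __name__ == "__main__":
    # Target: standard normal density, known only up to a constant.
    draws = metropolis(lambda x: -0.5 * x * x, initial=0.0, n_samples=20000, seed=42)
    print("sample mean:", sum(draws) / len(draws))
```

Successive draws are autocorrelated, so in practice one also checks convergence and mixing. The course notes cover these diagnostics in the GLMM setting.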

Model selection

Model Selection and Multimodel Inference is an excellent introductory book on the use of information criteria to choose between models.
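One widely used information criterion is the Akaike information criterion, AIC = 2k - 2 ln L. Here k is the number of estimated parameters and L is the maximised likelihood, and smaller AIC values indicate better-supported models. Differences in AIC can be converted into Akaike weights, which express the relative support for each model. The plain-Python sketch below compares two hypothetical models: a normal model with its mean fixed at zero and one with the mean estimated. The models, data and function names are assumptions chosen for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 ln L; smaller is better."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative support for each model, normalised to sum to one."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def normal_max_log_likelihood(data, fixed_mean=None):
    """Maximised log-likelihood of an i.i.d. normal model.

    The variance is always estimated by maximum likelihood; the mean is
    estimated too unless `fixed_mean` is given.
    """
    n = len(data)
    mu = sum(data) / n if fixed_mean is None else fixed_mean
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


if __name__ == "__main__":
    data = [4.8, 5.1, 5.3, 4.9, 5.2]  # hypothetical observations
    scores = [
        aic(normal_max_log_likelihood(data, fixed_mean=0.0), n_params=1),
        aic(normal_max_log_likelihood(data), n_params=2),
    ]
    print("AIC:", scores, "weights:", akaike_weights(scores))
```

The extra parameter in the second model is penalised by the 2k term. The second model is preferred only because its gain in likelihood outweighs that penalty.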