Short stay


Statistics - Ireland

Research topics


Cluster Analysis in the Human & Social Sciences

The goal of cluster analysis is to find meaningful groups, or clusters, in data. Members of a cluster have something in common that they do not share with members of other clusters. Grouping objects according to what they have in common goes back at least to the beginning of language. Aristotle, in the History of Animals, was amongst the first to cluster on the basis of empirical data. Empirical clustering became increasingly common over time, from Linnaeus' taxonomies of plants and animals through to the advent of cluster analysis.

Model-based clustering involves developing cluster analysis methods from a statistical modelling viewpoint. This approach allows for the development of methods that accurately account for the structure of the problem being studied.

This project will develop model-based clustering methods for the complex data types that arise in the humanities and social sciences. It will consider applications of clustering to social surveys, historical censuses, transportation, sustainability and voting.

Activities / Resume


Brendan is Full Professor of Statistics in the School of Mathematics and Statistics at University College Dublin.

His research interests include clustering, classification and latent variable modelling, with a particular interest in applications in the social sciences, food science, medicine and biology. He is the editor for Social Sciences and Government for the Annals of Applied Statistics and has recently co-authored a research monograph on Model-Based Clustering and Classification.



Charles Bouveyron, Gilles Celeux, T. Brendan Murphy, Adrian E. Raftery (2019) Model-Based Clustering and Classification for Data Science, Cambridge University Press.


Ng, T.L.J., Murphy, T.B., McCormick, T., Fosdick, B. and Westling, T. (2021) Modeling the social media relationships of Irish politicians using a generalized latent space stochastic blockmodel. Annals of Applied Statistics, To appear.

Murphy, K., Murphy, T.B., Piccarreta, R. and Gormley, I.C. (2021) Clustering longitudinal life-course sequences using mixtures of exponential-distance models. Journal of the Royal Statistical Society, Series A, To appear.

Cappozzo, A., Greselin, F. and Murphy, T.B. (2020) Anomaly and novelty detection for robust semi-supervised learning. Statistics & Computing, 30, 1545–1571.

Murphy, K. and Murphy, T.B. (2020) Gaussian parsimonious clustering models with covariates and a noise component. Advances in Data Analysis & Classification, 14(2), 293–325.

Fop, M., Murphy, T.B. and Scrucca, L. (2019) Model-based clustering with sparse covariance matrices. Statistics & Computing, 29(4), 791–819.

Cappozzo, A., Greselin, F. and Murphy, T.B. (2020) A robust approach to model-based classification based on trimming and constraints. Advances in Data Analysis & Classification, 14(2), 327–354.

Hu, S., O'Hagan, A. and Murphy, T.B. (2018) Motor insurance claim modeling with factor collapsing and Bayesian model averaging. Stat, 7(1), e180.

Fop, M. and Murphy, T.B. (2018) Variable selection methods for model-based clustering. Statistics Surveys, 12, 18–65.

Fop, M., Smart, K. and Murphy, T.B. (2017) Variable selection for latent class analysis with application to low back pain diagnosis. Annals of Applied Statistics, 11(4), 2085–2115.

Scrucca, L., Fop, M., Murphy, T.B. and Raftery, A.E. (2016) mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models. The R Journal, 8(1), 205–233.