
IEEE Trans Pattern Anal Mach Intell. 2016 Nov;38(11):2156-2169. doi: 10.1109/TPAMI.2016.2515599. Epub 2016 Jan 07.

Bayesian Non-Parametric Clustering of Ranking Data.

IEEE Transactions on Pattern Analysis and Machine Intelligence

Marina Meila, Harr Chen

PMID: 26761192 DOI: 10.1109/TPAMI.2016.2515599

Abstract

This paper studies the estimation of Dirichlet process mixtures over discrete incomplete rankings. The generative model for each mixture component is the generalized Mallows (GM) model, an exponential family model for permutations which extends seamlessly to top-t rankings. While the GM is remarkably tractable in comparison with other permutation models, its conjugate prior is not. Our main contribution is to derive the theory and algorithms for sampling from the desired posterior distributions under this DPM. We introduce a family of partially collapsed Gibbs samplers with, at one extreme, an exact algorithm based on slice sampling and, at the other, a fast approximate sampler with superior mixing that is still very accurate in all but the lowest ranks. We empirically demonstrate the effectiveness of the approximation in reducing mixing time, the benefits of the Dirichlet process approach over alternative clustering techniques, and the applicability of the approach to exploring large real-world ranking datasets.
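To illustrate why the GM model "extends seamlessly to top-t rankings", the sketch below uses the standard textbook definition of the GM model (Fligner and Verducci), not code from this paper. A top-t ranking pi is encoded relative to a central ranking sigma by t codes s_1..s_t, where s_j counts the items not yet placed that sigma prefers to pi(j). The probability factorizes as P(pi | sigma, theta) = exp(-sum_j theta_j s_j) / prod_j psi_j(theta_j), with psi_j(theta) = (1 - e^{-(n-j+1)theta}) / (1 - e^{-theta}). Because each code ranges independently over 0..n-j, the model is normalized for any t without summing over the unobserved tail. The function names `gm_codes` and `gm_log_prob` are hypothetical, and the sketch assumes every theta_j > 0.

```python
import math
from itertools import permutations


def gm_codes(pi, sigma):
    """Codes s_1..s_t of a top-t ranking `pi` relative to central ranking `sigma`.

    s_j = number of items not yet placed in pi[:j] that sigma ranks ahead of pi[j].
    """
    rank = {item: r for r, item in enumerate(sigma)}
    remaining = set(sigma)
    codes = []
    for item in pi:
        codes.append(sum(1 for other in remaining if rank[other] < rank[item]))
        remaining.remove(item)
    return codes


def gm_log_prob(pi, sigma, theta):
    """log P(pi | sigma, theta) under the generalized Mallows model (textbook form).

    Assumes theta[j] > 0 for every stage j < len(pi); extra entries are ignored.
    """
    if len(theta) < len(pi):
        raise ValueError("need one dispersion parameter per ranked position")
    n = len(sigma)
    logp = 0.0
    for j, (s, th) in enumerate(zip(gm_codes(pi, sigma), theta)):
        m = n - j  # items still available at this stage; s ranges over 0..m-1
        # log of psi = sum_{k=0}^{m-1} exp(-th * k), in closed form
        log_psi = math.log((1.0 - math.exp(-m * th)) / (1.0 - math.exp(-th)))
        logp += -th * s - log_psi
    return logp


if __name__ == "__main__":
    sigma = ["a", "b", "c", "d"]
    theta = [1.0, 0.5]
    # Probabilities of all top-2 rankings of 4 items sum to one.
    total = sum(math.exp(gm_log_prob(p, sigma, theta)) for p in permutations(sigma, 2))
    print(round(total, 12))
```

Only the first t dispersion parameters enter the likelihood of a top-t ranking, which is one reason the model handles incomplete rankings without imputing the missing positions. The mode of the distribution is the length-t prefix of sigma, since that ranking has all codes equal to zero.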
