Efficient clustering of populations using a minimal SNP panel.
Belcaid M, Baek K, Haymer D, Poisson G.
Belcaid M, Baek K, Haymer D, Poisson G. (2011) Efficient clustering of populations using a minimal SNP panel. Proceedings of the 2011 ACM Symposium on Applied Computing, 83-88.
The recent explosion in available SNP data calls for efficient methods of clustering a set of individuals into their respective populations. As a practical demonstration, we describe a modified Euclidean distance-based approach that successfully clusters 525 HapMap individuals into their 4 original populations: European, African, Japanese, and Chinese. Our approach relies on computing the Fst estimator using 4 distinct methods, and shows that k-means clustering of the 10 highest Fst-scoring SNPs is sufficient to produce an error-free description of the underlying population structure. A generalization of our approach represents a more “faithful” way of selecting the SNPs with the highest discriminating power, and thus of generating the most accurate population-specific assignments of individuals using the smallest possible SNP panel.
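The pipeline summarized in the abstract (score each SNP by Fst, keep the highest-scoring SNPs, then run k-means on the reduced panel) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the 4 Fst estimators or the modification to the Euclidean distance, so the sketch assumes Wright's basic heterozygosity form, Fst = (H_T - H_S) / H_T, and plain squared Euclidean distance. The function names (`allele_freq`, `fst`, `top_snps`, `kmeans`) and the 0/1/2 genotype coding are hypothetical choices made for illustration.

```python
import random


def allele_freq(genotypes):
    """Frequency of the counted allele, given genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def fst(freqs, sizes):
    """Wright's Fst for one SNP: (H_T - H_S) / H_T.

    freqs: allele frequency in each population.
    sizes: number of individuals in each population (used as weights).
    This is an assumed estimator, not one of the paper's four.
    """
    n = sum(sizes)
    p_bar = sum(p * s for p, s in zip(freqs, sizes)) / n
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # monomorphic SNP carries no information
    # size-weighted mean of within-population heterozygosity
    h_s = sum(2 * p * (1 - p) * s for p, s in zip(freqs, sizes)) / n
    return (h_t - h_s) / h_t


def top_snps(genotypes, labels, k=10):
    """Indices of the k SNPs with the highest Fst.

    genotypes[i][j] is the genotype of individual i at SNP j;
    labels[i] is the known population of individual i.
    """
    pops = sorted(set(labels))
    groups = {p: [g for g, l in zip(genotypes, labels) if l == p] for p in pops}
    sizes = [len(groups[p]) for p in pops]
    scores = []
    for j in range(len(genotypes[0])):
        freqs = [allele_freq([g[j] for g in groups[p]]) for p in pops]
        scores.append((fst(freqs, sizes), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means with squared Euclidean distance.

    points must be a list with at least k entries.
    Returns one cluster index per point.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```

With a genotype matrix `genotypes` and population labels `labels`, one would call `idx = top_snps(genotypes, labels, k=10)` and then `kmeans([[g[j] for j in idx] for g in genotypes], k=4)` to cluster individuals on the reduced panel. Note that selecting SNPs by Fst requires known population labels for a reference set, whereas the k-means step itself is unsupervised.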