PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells

Abstract

Motivation: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods do not scale adequately to large datasets and therefore cannot uncover the complex cellular heterogeneity.

Results: We introduce a highly scalable graph-based clustering algorithm, PARC (Phenotyping by Accelerated Refined Community-partitioning), for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering alg

Publication
Bioinformatics
Hayden Kwok-Hay So
Associate Professor