ClusterChirp: Scalable Interactive Exploration of Omics Data with Natural Language-Guided Analysis
Abstract
High-dimensional omics datasets are routinely visualized as heatmaps, where color intensities reveal co-expression patterns and correlations. However, modern omics technologies increasingly generate matrices so large that existing visual exploration tools require down-sampling or filtering, which can discard biologically important patterns. Further barriers arise from tools that require command-line expertise and from fragmented workflows for downstream biological interpretation. We present ClusterChirp, a web-based platform for real-time exploration of large-scale data matrices. The platform combines GPU-accelerated rendering, built on deck.gl, with hierarchical clustering parallelized across multiple CPU cores. ClusterChirp supports on-the-fly clustering, multi-metric sorting, feature search, and interactive visualization controls within a single interface. Uniquely, a natural language interface powered by a Large Language Model allows users to perform complex operations and build reproducible workflows through conversational commands. ClusterChirp further enables within-cluster correlation network analysis in 2D or 3D, and integrates functional enrichment through biological knowledge bases. Developed with iterative user feedback and adhering to FAIR4S principles, ClusterChirp enables users to extract insights from high-dimensional omics data with unprecedented ease and speed. It is freely available at clusterchirp.mssm.edu without login and is also distributed as a Dockerized application at ghcr.io/gumuslab/clusterchirp.
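To make the idea of within-cluster correlation network analysis concrete, the following is a minimal sketch and not ClusterChirp's implementation. It assumes Pearson correlation as the similarity measure and an absolute-value cutoff of 0.8 for drawing an edge; the abstract specifies neither. The function names `pearson` and `correlation_edges` and the three-feature cluster are hypothetical and serve only as illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant profile is treated as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_edges(profiles, threshold=0.8):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold.

    `profiles` maps a feature name to its values across samples; all
    features are assumed to belong to the same cluster.
    """
    edges = []
    for (name_a, x), (name_b, y) in combinations(profiles.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            edges.append((name_a, name_b, round(r, 3)))
    return edges


# Hypothetical cluster: three features measured across four samples.
cluster = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = correlation_edges(cluster, threshold=0.8)
print(edges)
```

In this toy cluster only geneA and geneB are linked, since their profiles are perfectly proportional, while geneC correlates weakly with both. The resulting edge list is the kind of structure a 2D or 3D network layout could then render; for matrices at the scale the abstract describes, the pairwise loop would need to be vectorized or parallelized.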