Zarr-Based Chunk-Level Cumulative Sums in Reduced Dimensions

Abstract

Analysis of massive multi-dimensional data, such as high-resolution time averaging or area averaging over large regions of geospatial data, often involves calculations over a very large number of data points. Although scalable, flexible distributed or cloud environments make such calculations possible, a full scan of large data volumes remains a computationally intensive bottleneck that incurs significant cost. This paper introduces a generic and comprehensive method to address these computational challenges. The method generates a small, size-tunable supplementary dataset that stores cumulative sums along a chosen subset of dimensions in addition to the raw data. This minor addition enables rapid, low-cost, high-resolution analysis over large regions, making calculations over large numbers of data points feasible with small instances or even microservices in the cloud. The method is general-purpose but is particularly well suited to data stored in chunked, cloud-optimized formats and to services running in distributed or cloud environments. We present a Zarr extension proposal that specifies the method and facilitates its straightforward implementation in general-purpose software applications. Benchmark tests demonstrate that this method, implemented in Amazon Web Services (AWS), significantly outperforms the brute-force approach used in on-premises services. With just 5% supplemental storage, the method runs 3-4 orders of magnitude (~10,000 times) faster than the brute-force approach while significantly reducing computational cost.
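To make the idea concrete, the following is a minimal Python/NumPy sketch of one possible reading of chunk-level cumulative sums, assuming a 2-D array stored in fixed-size chunks and a rectangular area-sum query. The helper names `build_chunk_cumsum` and `region_sum` are hypothetical, an in-memory NumPy array stands in for a chunked Zarr store, and the sketch does not follow the paper's Zarr extension specification. Each chunk is reduced to a single sum, a 2-D cumulative sum (a summed-area table) is built over the chunk grid, and a query then needs four table lookups for the fully covered chunks plus raw reads of only the partially covered border chunks.

```python
import numpy as np


def build_chunk_cumsum(data, chunks):
    """Hypothetical helper: sum each chunk of a 2-D array, then take a 2-D
    prefix sum over the chunk grid. A zero row/column is prepended so that
    table[i, j] is the total of all chunks with row index < i and col index < j."""
    cy, cx = chunks
    ny, nx = data.shape
    gy, gx = -(-ny // cy), -(-nx // cx)  # chunks per axis (ceiling division)
    chunk_sums = np.zeros((gy, gx))
    for i in range(gy):
        for j in range(gx):
            chunk_sums[i, j] = data[i * cy:(i + 1) * cy, j * cx:(j + 1) * cx].sum()
    table = np.zeros((gy + 1, gx + 1))
    table[1:, 1:] = chunk_sums.cumsum(axis=0).cumsum(axis=1)
    return table


def region_sum(data, table, chunks, y0, y1, x0, x1):
    """Hypothetical query: sum of data[y0:y1, x0:x1] using the chunk table for
    fully covered chunks and raw reads only for the partial border strips."""
    cy, cx = chunks
    # Chunk index ranges [a, b) and [c, d) lying entirely inside the window.
    a, b = -(-y0 // cy), y1 // cy
    c, d = -(-x0 // cx), x1 // cx
    if a >= b or c >= d:
        # No fully covered chunk: fall back to a direct (brute-force) read.
        return data[y0:y1, x0:x1].sum()
    # Inclusion-exclusion on the prefix-sum table: four lookups.
    inner = table[b, d] - table[a, d] - table[b, c] + table[a, c]
    ya, yb, xc, xd = a * cy, b * cy, c * cx, d * cx
    border = (
        data[y0:ya, x0:x1].sum()      # strip above the inner block
        + data[yb:y1, x0:x1].sum()    # strip below the inner block
        + data[ya:yb, x0:xc].sum()    # strip left of the inner block
        + data[ya:yb, xd:x1].sum()    # strip right of the inner block
    )
    return inner + border


# Usage: area average over a window that does not align with chunk boundaries.
rng = np.random.default_rng(0)
grid = rng.random((1000, 1200))
chunk_shape = (100, 100)
cum_table = build_chunk_cumsum(grid, chunk_shape)  # shape (11, 13)
total = region_sum(grid, cum_table, chunk_shape, 37, 913, 55, 1111)
area_mean = total / ((913 - 37) * (1111 - 55))
print(np.isclose(total, grid[37:913, 55:1111].sum()))  # True
```

In this sketch the supplementary table grows with the number of chunks rather than the number of data points, so the chunk shape controls its size, and query cost scales with the window's perimeter rather than its area. Dividing the sum by the point count gives the area average only when there are no missing values; float64 accumulation is assumed throughout.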
