Clouds play an important role in the Earth’s energy budget, and their behavior is one of the largest uncertainties in future climate projections. Over a decade of multispectral satellite observations should help in understanding cloud responses, but the complexity and size of this dataset have left it under-utilized. This study uses deep learning to reduce the dimensionality of satellite cloud observations, grouping them with a novel automated cloud classification technique in which a convolutional neural network provides ‘unsupervised classification’. By combining a rotation-invariant autoencoder with hierarchical agglomerative clustering, we generate cloud clusters that capture meaningful distinctions between cloud textures, using only raw multispectral imagery as input. That is, cloud classes are defined without reliance on location, time/season, derived physical properties, or pre-designated class definitions. We use this approach to generate a unique new cloud dataset, the AI-driven cloud classification atlas (AICCA), which clusters 22 years of ocean cloud images from the Moderate Resolution Imaging Spectroradiometer (MODIS) instruments on NASA’s Aqua and Terra satellites – 800 TB of data, or 160 million patches of roughly 100 km x 100 km (128 x 128 pixels) each – into 42 AI-generated cloud class labels. We show that the AICCA classes create meaningful distinctions that draw on information about spatial structure: while AICCA cloud classes show consistent physical properties, they cannot be reproduced solely from the mean cloud properties in each patch. AICCA classes also show distinct geographic and temporal distributions, capturing, for example, the stratocumulus decks along the west coasts of North and South America, or high-latitude clouds that appear only in local summer.
The data-driven unsupervised learning approach allows the discovery of unknown but relevant cloud patterns, and resulting atlases like AICCA can improve our understanding of cloud responses and cloud feedback in a warming climate. The AICCA dataset also helps democratize a vast climate dataset by facilitating access to the core data.
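
To make the clustering step concrete, the following is a minimal sketch of hierarchical agglomerative clustering applied to latent vectors, not the AICCA implementation. It assumes that each patch has already been encoded into a short latent vector by an autoencoder (the toy vectors below are invented stand-ins), and it assumes a Ward-style merge criterion, which the abstract does not specify. The names `ward_cost`, `merge`, `agglomerate`, and `toy_latents` are hypothetical.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = a["n"], b["n"]
    d2 = sum((x - y) ** 2 for x, y in zip(a["mean"], b["mean"]))
    return na * nb / (na + nb) * d2


def merge(a, b):
    """Combine two clusters, updating the size, centroid, and member list."""
    n = a["n"] + b["n"]
    mean = [(a["n"] * x + b["n"] * y) / n for x, y in zip(a["mean"], b["mean"])]
    return {"n": n, "mean": mean, "members": a["members"] + b["members"]}


def agglomerate(latents, n_classes):
    """Merge the cheapest pair of clusters until n_classes remain."""
    # Start with every latent vector in its own cluster.
    clusters = [
        {"n": 1, "mean": list(v), "members": [i]} for i, v in enumerate(latents)
    ]
    while len(clusters) > n_classes:
        # Find the pair whose merge increases within-cluster variance least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
        )
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    # Convert the cluster membership lists into one label per input vector.
    labels = [0] * len(latents)
    for label, cluster in enumerate(clusters):
        for member in cluster["members"]:
            labels[member] = label
    return labels


# Invented 2-D stand-ins for autoencoder latent vectors (assumption).
toy_latents = [
    (0.0, 0.1), (0.1, 0.0),   # one texture group
    (5.0, 5.1), (5.1, 5.0),   # a second group
    (9.0, 0.0), (9.1, 0.2),   # a third group
]
labels = agglomerate(toy_latents, n_classes=3)
print(labels)
```

This brute-force pair search is quadratic per merge and is only meant to show the idea. At the scale described above (160 million patches, 42 classes), a practical pipeline would need an optimized library routine and likely a subsample of patches to define the cluster centroids.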

Registration is required.