Back to Course

Cubes & Clouds

0% Complete
0/0 Steps
Lesson 12, Topic 1
In Progress

Cloud native data formats

What are cloud native data formats

Cloud native formats or cloud-optimized formats, are file formats specifically designed to optimize the storage, access, and processing of geospatial data in cloud computing environments. These formats are tailored to leverage the scalability, flexibility, and parallel processing capabilities of cloud infrastructure, enabling efficient handling of large-scale datasets.

YouTube

By loading the video, you agree to YouTube’s privacy policy.
Learn more

Load video

Video content in cooperation with Aimee Barciauskas (DevelopmentSeed) and Ryan Avery (DevelopmentSeed).

“Cloud-optimised means organizing so subsets of data can be read. Ideally, the data is also compressed. Both of these factors minimize the amount of data that has to be transferred across a network.”

Characteristics of cloud native data formats

Cloud-optimized means mainly optimized “read” access with partial reads and also parallel reads. Main characteristics common for cloud-optimized formats:

  • Data Chunking: Cloud native formats employ a chunk-based organization, where the data is divided into smaller chunks or blocks. This enables parallel processing and efficient retrieval of specific portions of the data, reducing the need to access the entire dataset.
  • Internal Indexing: These formats incorporate internal indexing structures that facilitate fast spatial and attribute queries. This enables efficient data access and retrieval operations without the need for extensive scanning or processing of the entire dataset.
  • Metadata Optimization: Cloud native formats optimize metadata storage and indexing, allowing for efficient access and retrieval of metadata associated with the data at once. This supports faster discovery and interpretation of data properties and characteristics.
  • Compression and Tiling: Cloud native formats often employ advanced compression techniques to reduce storage requirements while maintaining data quality. Additionally, they utilize tiling strategies to divide the data into smaller, manageable tiles that can be independently accessed and processed.

HTTP Range Request allows clients to request only a specific portion or range of data instead of a complete dataset.

Examples of cloud native data formats

  • COG – Cloud-Optimized GeoTIFF (COG) is an optimized version of the GeoTIFF format. It organizes raster data into chunks, utilizes internal tiling and compression, and uses HTTP range requests for efficient data access in the cloud.
  • ZARR Zarr is a format specifically designed for storing and accessing multidimensional arrays. It supports chunking, compression, and parallel processing, making it suitable for large-scale geospatial datasets, for example, weather data. Metadata is stored externally in data files itself.
  • FlatGeoBuf Cloud optimized vector data format. It is a binary encoding format for geodata and holds a collection of Simple Features.

Available Material