Properties and metadata used for filtering
In the previous lesson, we learned where to find data. Now it is time to look at how we can select only the data we want. Properties and metadata are a great help when we are not interested in the whole collection, but only in certain regions, times or even bands of selected satellite products.
In many data catalogs, we can filter products by specific property values. Let’s briefly go through the most common ones:
- Dataset name/identifier: Each dataset is assigned a unique name or identifier, so a product can be selected and referenced directly by it.
- Time range: The temporal coverage or specific dates associated with the data acquisition. With this filter, we can easily select products from the same location acquired on different dates. This is particularly relevant for time-series or multi-temporal datasets.
- Bounding box or other area of interest: The spatial extent or geographic coverage of the raster data, typically defined by the bounding coordinates (longitude and latitude) that encompass the dataset. Usually, you can draw your own area of interest as a rectangle or geometry, or use the current extent of the map window.
- Mission: You can select by the satellite mission used for data acquisition. In the case of the Sentinel missions, for example, you can select only Sentinel-1.
- Processing levels: You can usually also select the processing level you are interested in. Typically, Level-0 means unprocessed, raw data, and a higher number means that more corrections and processing steps have been applied.
- Sensors or instrument: Selection by the output of a specific sensor or instrument.
- Cloud coverage/polarization: Depending on the mission, products can be filtered by mission-specific parameters. For multi-spectral data such as those captured by Sentinel-2, we can typically filter by cloud coverage in percent, because most analyses work best on scenes with low cloud coverage. Similarly, specific polarizations can be selected for SAR data such as those captured by Sentinel-1.
- Orbit number: Typically an integer that selects the specific orbit the user is interested in.
- Orbit direction: For the majority of data types the selection is either ascending or descending.
- Availability status/Timeliness: Some missions make their data available at different latencies, allowing users to select Near Real-Time acquisitions (approximately 3 hours after acquisition), raw data with a certain delay, or processed data later. It is also common that infrequently accessed data, or data older than a certain threshold (years), can only be accessed ‘on demand’. Their ‘Availability status’ is usually set to ‘Archived’ or similar, and the user must request that they be brought online from archive storage. This operation usually takes minutes to hours to complete. The user can then check the catalog again later, by which time the product should be online.
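To make the filters above more concrete, here is a minimal sketch of how a catalog search might combine mission, time range, bounding box and cloud coverage. The product records, the field names and the `search` function are all hypothetical and only illustrate the idea; real catalogs expose their own APIs and property names.

```python
from datetime import date

# Hypothetical catalog records; the field names are illustrative only.
PRODUCTS = [
    {"id": "S2_A", "mission": "Sentinel-2", "date": date(2023, 6, 1),
     "bbox": (11.0, 46.0, 12.0, 47.0), "cloud_cover": 12.0},
    {"id": "S2_B", "mission": "Sentinel-2", "date": date(2023, 6, 11),
     "bbox": (11.0, 46.0, 12.0, 47.0), "cloud_cover": 85.0},
    {"id": "S1_A", "mission": "Sentinel-1", "date": date(2023, 6, 5),
     "bbox": (11.5, 46.5, 12.5, 47.5), "orbit_direction": "ascending"},
    {"id": "S2_C", "mission": "Sentinel-2", "date": date(2022, 1, 3),
     "bbox": (20.0, 50.0, 21.0, 51.0), "cloud_cover": 5.0},
]


def bboxes_intersect(a, b):
    """Return True if two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(products, mission=None, start=None, end=None,
           bbox=None, max_cloud_cover=None):
    """Return the ids of products that pass every filter that was given."""
    results = []
    for p in products:
        if mission is not None and p["mission"] != mission:
            continue
        if start is not None and p["date"] < start:
            continue
        if end is not None and p["date"] > end:
            continue
        if bbox is not None and not bboxes_intersect(p["bbox"], bbox):
            continue
        if max_cloud_cover is not None:
            # Products without a cloud cover value (e.g. SAR) are skipped
            # whenever a cloud cover filter is requested.
            cloud_cover = p.get("cloud_cover")
            if cloud_cover is None or cloud_cover > max_cloud_cover:
                continue
        results.append(p["id"])
    return results


matches = search(
    PRODUCTS,
    mission="Sentinel-2",
    start=date(2023, 6, 1),
    end=date(2023, 6, 30),
    bbox=(11.2, 46.2, 11.8, 46.8),
    max_cloud_cover=20.0,  # percent, on a 0 to 100 scale
)
print(matches)  # ['S2_A']
```

Each filter is optional, so the same function can express a broad query (only a mission) or a narrow one (mission, dates, area and cloud coverage together).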
In the examples above it also became more clear why a standardized way of expressing the metadata is important. For example, how do you know whether the cloud cover is a percentage expressed in a range from 0 to 1 or from 0 to 100? How would you filter on this property, if both scales were mixed?
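The cloud cover question can be illustrated with a short sketch. It assumes that every record carries an explicit `unit` field, which is a hypothetical convention invented for this example; without such information, a value like 0.5 cannot be interpreted reliably.

```python
def cloud_cover_percent(value, unit):
    """Convert a cloud cover value to percent (0 to 100).

    `unit` is a hypothetical metadata field: "percent" or "fraction".
    """
    if unit == "percent":
        percent = float(value)
    elif unit == "fraction":
        percent = float(value) * 100.0
    else:
        raise ValueError(f"unknown cloud cover unit: {unit!r}")
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"cloud cover out of range: {value} ({unit})")
    return percent


# The same number means very different things on the two scales.
records = [
    {"id": "A", "cloud_cover": 0.5, "unit": "fraction"},  # 50 %
    {"id": "B", "cloud_cover": 0.5, "unit": "percent"},   # 0.5 %
    {"id": "C", "cloud_cover": 15, "unit": "percent"},    # 15 %
]

low_cloud = [
    r["id"] for r in records
    if cloud_cover_percent(r["cloud_cover"], r["unit"]) <= 20.0
]
print(low_cloud)  # ['B', 'C']
```

With a standardized definition of the property, this conversion step and the extra `unit` field would not be needed at all.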
Standardisation
Metadata and properties which are not typically used for filtering
Many metadata fields are stored with the product but are not offered as filters on platforms or access hubs. Examples are the author of the dataset or its license. You should be able to find this information in the metadata when you access the data itself, or on the general information page of the dataset.
Unfortunately, it is usually not possible to filter or search by the data values themselves, e.g. by pixel values.
Dimensions
Dimensions describe how the data are organised and are an important part of their properties. Dimensions of data and datacubes were covered in more detail in the lecture about Datacubes.
- x, y and sometimes z – Spatial dimension of data
- temporal/time dimension – capturing the time aspect for time series analysis
Value Types (data types)
Once we have selected data products we are interested in, we can look directly into values to select what we are interested in. Common data types representing measured values are these:
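- Bitmask: 0 or 1
- UInt8 (8-bit unsigned integer): 0 to 255
- UInt16 (16-bit unsigned integer): 0 to 65,535
- Int16 (16-bit signed integer): -32,768 to 32,767
- Float32 (32-bit floating point)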
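The integer ranges follow directly from the number of bits and from whether the type is signed. The sketch below derives them in plain Python; the helper names `integer_range` and `is_flag_set` are made up for this illustration.

```python
def integer_range(bits, signed):
    """Return (minimum, maximum) for an integer type with the given bit width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def is_flag_set(value, bit):
    """Return True if the given bit (0 = least significant) is set in value."""
    return (value >> bit) & 1 == 1


print(integer_range(8, signed=False))   # (0, 255)
print(integer_range(16, signed=False))  # (0, 65535)
print(integer_range(16, signed=True))   # (-32768, 32767)

# A bitmask packs several 0/1 flags into a single integer value.
quality = 0b0101
print(is_flag_set(quality, 0))  # True
print(is_flag_set(quality, 1))  # False
```

Knowing the value range matters in practice: a value that does not fit into the chosen type cannot be stored without loss, so the data type limits what a band can represent.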