Sentinel-2

Sentinel-2 SAFE products (L1C and L2A)

UNEP
IMEO
MARS

Module: georeader/readers/S2_SAFE_reader.py (1845 LOC, the largest file in the package)

Role: read Sentinel-2 imagery in the official SAFE product format. Both Level-1C (top-of-atmosphere reflectance) and Level-2A (atmospherically corrected surface reflectance) are supported, from local folders or from Google Cloud’s free public bucket.


1. Why this module is so large

A Sentinel-2 SAFE product is not one file. It’s a folder hierarchy with:

The module’s job is to hide all of that behind a single class that behaves like a GeoData (Chapter 2) — s2.shape, s2.transform, s2.read_from_bounds(...), s2.load() all work as if S2 were one file.

Those 1845 lines cover SAFE-folder discovery, XML parsing, granule resolution, per-band JP2 stacking via RasterioReader, DN→radiance conversion, SRF extraction, multi-resolution band alignment, and Google Cloud Storage path translation. None of this shows through the public API, which is essentially three classes (S2Image, S2ImageL1C, S2ImageL2A) and one factory (s2loader).


2. L1C vs L2A — the two product levels

┌─────────────────────────────────────────────────────────────────────────┐
│                 SENTINEL-2 PROCESSING LEVELS                             │
│                                                                          │
│   Level-1C (L1C)                      Level-2A (L2A)                     │
│   ─────────────────                   ─────────────────                  │
│                                                                          │
│   ☀️ Sun                               ☀️ Sun                             │
│    │                                   │                                 │
│    ▼                                   ▼                                 │
│   ┌─────────┐                        ┌─────────┐                        │
│   │Atmosphere│ ◄─ NOT corrected      │Atmosphere│ ◄─ CORRECTED          │
│   └────┬────┘                        └────┬────┘                        │
│        │                                  │                              │
│        ▼                                  ▼                              │
│   ┌─────────┐                        ┌─────────┐                        │
│   │ Surface │                        │ Surface │                        │
│   └─────────┘                        └─────────┘                        │
│        │                                  │                              │
│        ▼ 🛰️                              ▼ 🛰️                           │
│                                                                          │
│   TOA Reflectance                     BOA Reflectance                   │
│   - Includes atmospheric effects      - Surface reflectance             │
│   - Globally available                - Atmospheric correction applied  │
│   - Can convert to radiance           - Scene Classification (SCL)     │
│   - 13 bands (incl. B10 cirrus)       - 12 bands (no B10)              │
│                                                                          │
│   Use for:                            Use for:                          │
│   - Radiance-based analysis           - Land cover mapping              │
│   - Custom atmospheric correction     - Vegetation indices (NDVI)       │
│   - Cloud studies (B10)               - Change detection                │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

The key technical differences:

For ML pipelines you almost always want L2A — the atmospheric correction is consistent across scenes and removes much of the haze/illumination variability that confuses CNN training. For radiative-transfer studies that need to handle their own correction, you want L1C.


3. Spectral bands — the canonical S2 table

Band │ Central λ │ Bandwidth │ Resolution │ L1C │ L2A │ Description
─────┼───────────┼───────────┼────────────┼─────┼─────┼─────────────────────
B01  │   443 nm  │   20 nm   │    60m     │  ✓  │  ✓  │ Coastal/Aerosol
B02  │   490 nm  │   65 nm   │    10m     │  ✓  │  ✓  │ Blue
B03  │   560 nm  │   35 nm   │    10m     │  ✓  │  ✓  │ Green
B04  │   665 nm  │   30 nm   │    10m     │  ✓  │  ✓  │ Red
B05  │   705 nm  │   15 nm   │    20m     │  ✓  │  ✓  │ Red Edge 1
B06  │   740 nm  │   15 nm   │    20m     │  ✓  │  ✓  │ Red Edge 2
B07  │   783 nm  │   20 nm   │    20m     │  ✓  │  ✓  │ Red Edge 3
B08  │   842 nm  │  115 nm   │    10m     │  ✓  │  ✓  │ NIR
B8A  │   865 nm  │   20 nm   │    20m     │  ✓  │  ✓  │ NIR Narrow
B09  │   945 nm  │   20 nm   │    60m     │  ✓  │  ✓  │ Water Vapour
B10  │  1375 nm  │   30 nm   │    60m     │  ✓  │  ✗  │ Cirrus (L1C only)
B11  │  1610 nm  │   90 nm   │    20m     │  ✓  │  ✓  │ SWIR 1
B12  │  2190 nm  │  180 nm   │    20m     │  ✓  │  ✓  │ SWIR 2

This table is the reference you’ll come back to. Three things worth highlighting:

The module exports the canonical lists as BANDS_S2 (full 13 for L1C), BANDS_S2_L1C (alias), and BANDS_S2_L2A (12 bands; no B10).
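As a quick illustration of how the L1C and L2A band lists relate, the table above can be re-created as a plain dictionary and filtered. This is a minimal sketch: the names `S2_BAND_TABLE`, `bands_l1c`, `bands_l2a`, and `bands_at` are hypothetical local names, not imports from georeader, and the values are copied from the table rather than read from the module.

```python
# Local re-creation of the band table above. All names here are illustrative
# and are NOT imported from georeader.
# band -> (central wavelength in nm, native resolution in m, present in L2A)
S2_BAND_TABLE = {
    "B01": (443, 60, True),
    "B02": (490, 10, True),
    "B03": (560, 10, True),
    "B04": (665, 10, True),
    "B05": (705, 20, True),
    "B06": (740, 20, True),
    "B07": (783, 20, True),
    "B08": (842, 10, True),
    "B8A": (865, 20, True),
    "B09": (945, 60, True),
    "B10": (1375, 60, False),  # cirrus band, L1C only
    "B11": (1610, 20, True),
    "B12": (2190, 20, True),
}

# All 13 bands are available in L1C; L2A drops B10.
bands_l1c = list(S2_BAND_TABLE)
bands_l2a = [band for band, (_, _, in_l2a) in S2_BAND_TABLE.items() if in_l2a]


def bands_at(resolution_m: int) -> list:
    """Return the bands whose native resolution equals resolution_m."""
    return [band for band, (_, res, _) in S2_BAND_TABLE.items() if res == resolution_m]
```

Filtering by native resolution this way makes it easy to see which bands would need resampling for a given output grid: only B02, B03, B04, and B08 are natively 10 m.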


4. The class hierarchy

S2Image                    # base class — do not instantiate directly
├── S2ImageL1C             # L1C-specific: DN→radiance, no SCL
└── S2ImageL2A             # L2A-specific: SCL band, no B10

S2Image lives at S2_SAFE_reader.py:295. It implements the GeoData protocol (Chapter 2 §3) plus S2-specific machinery:

S2ImageL1C (S2_SAFE_reader.py:1000) adds:

S2ImageL2A (S2_SAFE_reader.py:918) adds:


5. Constructor signature (common to both subclasses)

S2ImageL2A(
    s2folder,                    # path to .SAFE folder (local or gs://)
    polygon=None,                # Shapely polygon (EPSG:4326) for AOI
    granules=None,               # dict[band → JP2 path]; auto-discovered if None
    out_res=10,                  # 10, 20, or 60 — output resolution in meters
    window_focus=None,           # rasterio Window for sub-region (in out_res grid)
    bands=None,                  # list[str] — band subset; defaults to all 12/13
    metadata_msi=None,           # explicit path to MTD_MSI*.xml; auto-located if None
)
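To get a feel for what out_res implies, the sketch below computes the full-tile pixel grid for each allowed resolution and the resampling factor a band needs to reach that grid. It assumes the nominal Sentinel-2 tile side of 109.8 km; `full_tile_shape` and `resampling_factor` are hypothetical helpers written for this illustration, not part of the module.

```python
TILE_SIDE_M = 109_800  # nominal side of a Sentinel-2 tile, in metres


def full_tile_shape(out_res: int) -> tuple:
    """Pixel shape (rows, cols) of a full tile on the out_res grid."""
    if out_res not in (10, 20, 60):
        raise ValueError(f"out_res must be 10, 20 or 60, got {out_res}")
    side = TILE_SIDE_M // out_res
    return (side, side)


def resampling_factor(native_res: int, out_res: int) -> float:
    """Factor by which a band's pixel count grows (>1) or shrinks (<1)
    along each axis when moved from its native grid to the out_res grid."""
    return native_res / out_res


# A 60 m band such as B01 is upsampled 6x per axis to reach a 10 m grid,
# while a 10 m band such as B04 is downsampled 2x to reach a 20 m grid.
b01_to_10m = resampling_factor(60, 10)
b04_to_20m = resampling_factor(10, 20)
```

Because window_focus is expressed in the out_res grid, a window defined for out_res=10 covers a different ground area than the same pixel window at out_res=20.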

A few non-obvious points:


6. The factory: s2loader

def s2loader(
    s2folder,
    polygon=None,
    out_res=10,
    window_focus=None,
    bands=None,
    metadata_msi=None,
)

s2loader is defined at S2_SAFE_reader.py:1603. It is the entry point you should use about 95% of the time:

Two related convenience functions for common catalog systems:

Both wrap s2loader after extracting the SAFE path from the feature’s assets.
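How s2loader picks the right subclass is not spelled out here. One plausible approach, shown as a sketch under that assumption, is to read the processing level from the SAFE folder name, which carries either MSIL1C or MSIL2A in the standard product naming convention. `detect_level` and `CLASS_FOR_LEVEL` are hypothetical names, and the example product name is made up for illustration.

```python
# Hypothetical mapping from processing level to the class name documented above.
CLASS_FOR_LEVEL = {"L1C": "S2ImageL1C", "L2A": "S2ImageL2A"}


def detect_level(s2folder: str) -> str:
    """Infer the processing level from the last component of a SAFE path.

    Works for local paths and gs:// URLs because only the final path
    component is inspected.
    """
    name = s2folder.rstrip("/").rsplit("/", 1)[-1]
    if "_MSIL1C_" in name:
        return "L1C"
    if "_MSIL2A_" in name:
        return "L2A"
    raise ValueError(f"Cannot infer processing level from {name!r}")


# Made-up product name, used only to exercise the function.
example = "/data/S2A_MSIL2A_20240615T112121_N0510_R037_T29SND_20240615T160000.SAFE"
chosen_class = CLASS_FOR_LEVEL[detect_level(example)]
```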


7. The Google Cloud public bucket

gs://gcp-public-data-sentinel-2/ (constant: FULL_PATH_PUBLIC_BUCKET_SENTINEL_2) is a free, no-auth-required mirror of the entire Sentinel-2 archive. The s2_public_bucket_path(...) function (S2_SAFE_reader.py:1739) constructs paths from (tile_number_field, datetime, processing_baseline) — useful when you have a SAFE name from a catalog query and need to turn it into a gs:// URL.
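The translation from a SAFE name to a bucket URL can be sketched as follows. This is not the module's implementation: `safe_name_to_gs_url` is a hypothetical helper, the product name in the usage line is made up, and the `L2/tiles` prefix for L2A products is an assumption about the bucket layout that you should check against a real listing.

```python
PUBLIC_BUCKET = "gs://gcp-public-data-sentinel-2/"


def safe_name_to_gs_url(safe_name: str) -> str:
    """Build a public-bucket URL from a SAFE product name (illustrative only)."""
    stem = safe_name[:-5] if safe_name.endswith(".SAFE") else safe_name
    # Expected fields: mission, level, sensing time, processing baseline,
    # relative orbit, tile (e.g. T29SND), product discriminator.
    fields = stem.split("_")
    if len(fields) != 7 or not fields[5].startswith("T") or len(fields[5]) != 6:
        raise ValueError(f"Unexpected SAFE name: {safe_name!r}")
    level = fields[1]
    tile = fields[5][1:]  # "29SND"
    utm_zone, lat_band, grid_square = tile[:2], tile[2], tile[3:]
    # Assumption: L2A products live under an extra "L2/" prefix.
    prefix = "L2/tiles" if level == "MSIL2A" else "tiles"
    return f"{PUBLIC_BUCKET}{prefix}/{utm_zone}/{lat_band}/{grid_square}/{stem}.SAFE"


# Made-up L1C product name, used only to show the resulting layout.
url_l1c = safe_name_to_gs_url(
    "S2A_MSIL1C_20240615T112121_N0510_R037_T29SND_20240615T131000.SAFE"
)
```

The tile field is split into UTM zone, latitude band, and grid square, which is why tile 29SND appears as 29/S/ND in the path.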

The standard “load any S2 scene from anywhere” recipe:

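from georeader import read
from georeader.readers.S2_SAFE_reader import s2loader

# The "..." parts of the path are elided; substitute a real product path.
path = "gs://gcp-public-data-sentinel-2/tiles/29/S/ND/.../S2A_MSIL2A_20240615T...SAFE"
s2 = s2loader(path, out_res=10)

# s2 is an S2ImageL2A and behaves like a GeoData.
# my_aoi is a Shapely polygon in EPSG:4326 that you define beforehand.
gt = read.read_from_polygon(s2, my_aoi, crs_polygon="EPSG:4326")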

The lazy access pattern matches what you’d get with a single-file RasterioReader — which is the design goal.


8. SRF reading

read_srf(s2obj=None, mission=None, ...) (S2_SAFE_reader.py:1411) returns a pd.DataFrame with the published S2 SRFs — wavelength index, one column per band — exactly as ESA distributes them. Use this for the spectral-binning recipe in Chapter 11 §8:

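from georeader.readers.S2_SAFE_reader import read_srf
from georeader.reflectance import integrated_irradiance, radiance_to_reflectance  # module path assumed

srf_df = read_srf(mission="S2A")            # or pass s2obj
e_per_band = integrated_irradiance(srf_df)  # band-integrated solar irradiance
toa_refl = radiance_to_reflectance(s2_radiance, e_per_band, ...)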

The DataFrame uses the canonical band naming (B01, B02, ..., B12) so it lines up with the band axis of an S2 GeoTensor directly.


9. Function reference

Classes

Loaders

Helpers

Calibration

File / cloud

Constants


10. Sharp edges


11. Connection to geotoolz

The whole presets.s2 block in geotoolz.md §1.2 sits on top of this module:

Sentinel-2 is the most-used sensor in the package’s ecosystem, and this module’s design (single class subclassing GeoData, lazy granule access, multi-resolution alignment) is the template the other big sensor readers (emit, prisma, enmap) follow.

Next chapter: Hyperspectral — the curvilinear-sensor trio (EMIT, PRISMA, EnMAP) and their shared design pattern.