Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Save

Writing GeoTensors to COGs

UNEP
IMEO
MARS

Module: georeader/save.py (586 LOC; the empty save_cog.py is a deprecated stub) Role: the export side. Take a GeoTensor (or anything GeoData-shaped), write it to disk as a tiled GeoTIFF or a Cloud-Optimized GeoTIFF (COG). Handles cloud-storage destinations (gs://, s3://, az://) transparently.


1. Why COGs

A traditional GeoTIFF stores data as a single big image, with the header at the end of the file. To read any part of it you need to first download enough bytes to find the header, then seek into the data. Over HTTP this is awful — you pay for the round-trip on every read.

A Cloud Optimized GeoTIFF (COG) reorganises the same byte stream so:

┌─────────────────────────────────────────────────────────────────────────┐
│              CLOUD OPTIMIZED GEOTIFF STRUCTURE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Traditional GeoTIFF            Cloud Optimized GeoTIFF (COG)           │
│  ────────────────────           ─────────────────────────────           │
│                                                                          │
│  ┌─────────────────┐            ┌─────────────────┐                     │
│  │ Header (end)    │            │ Header (start)  │ ← HTTP range 0-N   │
│  │                 │            │ ─────────────── │                     │
│  │                 │            │ Overview 8x     │ ← Pyramid layers   │
│  │    Full        │            │ Overview 4x     │   for fast zoom    │
│  │    Resolution  │            │ Overview 2x     │                     │
│  │    Data        │            │ ─────────────── │                     │
│  │                 │            │ ┌───┬───┬───┐  │ ← Tiled structure  │
│  │                 │            │ │256│256│256│  │   (default 256x256)│
│  │                 │            │ ├───┼───┼───┤  │                     │
│  │                 │            │ │256│256│256│  │                     │
│  └─────────────────┘            │ └───┴───┴───┘  │                     │
│                                 └─────────────────┘                     │
│                                                                          │
│  Benefits:                                                               │
│  • Read any region without downloading whole file                       │
│  • Fast preview via overviews (pyramid layers)                          │
│  • Efficient streaming for web mapping applications                     │
│  • Compatible with all GeoTIFF readers                                  │
└─────────────────────────────────────────────────────────────────────────┘

The cost: COG files are slightly larger than minimal GeoTIFFs (overviews add ~33% on top of full resolution). The benefit: for any file you’ll touch over HTTP, COG is transformative — RasterioReader("https://...cog.tif") becomes interactive.

The practical rule: save everything as COG by default unless you have a reason not to.


2. Compression options

┌─────────────────────────────────────────────────────────────────────────┐
│                    COMPRESSION RECOMMENDATIONS                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Compression    Best For                    Speed    Size                │
│  ───────────    ────────                    ─────    ────                │
│  lzw           General purpose (DEFAULT)    Fast     Medium             │
│  deflate       Better compression ratio     Medium   Small              │
│  zstd          Modern, fast + small         Fast     Small              │
│  lzma          Maximum compression          Slow     Smallest           │
│  jpeg          RGB imagery (lossy)          Fast     Tiny               │
│  webp          RGB imagery (lossy/lossless) Fast     Tiny               │
│  none          Already compressed data      N/A      Original           │
│                                                                          │
│  For scientific data: Use lzw or zstd (lossless)                        │
│  For visualization: Consider jpeg or webp (lossy ok)                    │
│  For integer masks: Use deflate with predictor=2                        │
└─────────────────────────────────────────────────────────────────────────┘

The package default is lzw — a sensible “fast and lossless” choice that’s universally supported by GeoTIFF readers (going back decades). Three other choices show up often:

Override via profile_arg={"compress": "zstd"}.


3. Data type mapping

┌─────────────────────────────────────────────────────────────────────────┐
│  NumPy dtype     GeoTIFF dtype    Notes                                 │
│  ────────────    ─────────────    ─────                                 │
│  uint8           Byte             8-bit unsigned (0-255)                │
│  uint16          UInt16           16-bit unsigned (0-65535)             │
│  int16           Int16            16-bit signed                         │
│  uint32          UInt32           32-bit unsigned                       │
│  int32           Int32            32-bit signed                         │
│  float32         Float32          32-bit floating point                 │
│  float64         Float64          64-bit floating point                 │
│  complex64       CFloat32         Complex 32-bit                        │
│  bool            Byte             Converted to 0/1                      │
└─────────────────────────────────────────────────────────────────────────┘

The bool → Byte conversion is the only sneaky one — booleans are written as uint8 {0, 1} because GeoTIFF has no boolean type. Reading them back gives a uint8 array, and you’d usually > 0 to convert it back to bool. The validmask/invalidmask GeoTensor properties (Chapter 1) are aware of this.

profile_from_dtype(dtype) gives you the rasterio profile dict for any of these — used internally by save_cog.


4. The two save functions

save_cog(data_save, path_tiff_save, descriptions=None, tags=None, profile_arg=None, ...)

The standard export. Steps:

  1. Materialise data_save to an in-memory ndarray (calls .load() if it’s a reader).

  2. Build a profile from the dtype + the COG defaults (LZW, 256×256 blocks, BIGTIFF if needed).

  3. Apply user-supplied profile_arg overrides.

  4. Write a tiled GeoTIFF.

  5. Build overviews via rasterio.rio.overview (powers of 2 down to ≤256 pixels).

  6. Reorganise the file via rasterio_shutil.copy(..., COG_DRIVER) so the header and overviews land at the start.

The two-pass approach (write, then reorganise) is what GDAL’s COG driver does internally — this module just orchestrates it via rasterio.

save_tiled_geotiff(data_save, path_tiff_save, ...)

The “no overviews” variant. Same profile, same compression, same tiling, but skips the overview-building and reorganisation steps. Cheaper to write; you can’t preview at low res over HTTP. Use when:

Source: save.py:163 (save_tiled_geotiff), save.py:327 (save_cog).


5. Cloud destinations

save_cog("gs://bucket/key.tif", ...) works. The implementation:

The constant REMOTE_FILE_EXTENSIONS at the top of the module is the prefix list. Extending support to a new cloud (oss:// etc.) is mostly a one-line addition there + ensuring the right fsspec backend is installed.

For HTTP destinations, the upload phase requires a destination that accepts PUTs (presigned S3 URLs work; static HTTP URLs do not). Most “I want to write to https://...” workflows mean “actually I want to write to s3://, then serve via https://” — pass the s3:// URL.

The fs parameter is the escape hatch for custom auth: pass fs=gcsfs.GCSFileSystem(token=...) to use a non-default credential.


6. The signature

save_cog(
    data_save,                          # GeoData (GeoTensor or RasterioReader)
    path_tiff_save,                     # str — local path or gs://, s3://, etc.
    descriptions=None,                  # list[str] — per-band names
    tags=None,                          # dict[str, str] — file-level metadata
    profile_arg=None,                   # dict — overrides the default profile
    fs=None,                            # fsspec filesystem for cloud writes
    nodata=None,                        # override fill_value_default
)

A few non-obvious points:


7. Idiomatic usage

Save an in-memory GeoTensor as a COG with band names and provenance:

save.save_cog(
    ndvi_gt,
    "output/ndvi_2024_06.tif",
    descriptions=["NDVI"],
    tags={"source": "S2_L2A", "tile": "T29SND", "date": "2024-06-15"},
    profile_arg={"compress": "zstd"},
)

Save a lazy reader to cloud storage:

save.save_cog(
    rasterio_reader_for_huge_scene,
    "gs://bucket/output.tif",
)

save_cog calls data_save.load() internally (step 1 of section 4), so the full scene is materialised in memory before the file is written and uploaded. This is fine for scenes that fit in RAM; for genuinely huge scenes that don’t, this function is the wrong tool — use a manual window-loop pattern instead, opening the destination dataset with rasterio.open(path, "w", **profile) and calling dataset.write(arr, window=...) per chunk so that no buffer ever holds the full output.


8. Internal helpers worth knowing


9. Sharp edges


10. Connection to geotoolz

Two geotoolz operators wrap this module:

The empty save_cog.py file alongside save.py is the back-compat stub from when save_cog lived in its own module. Don’t import from it; use from georeader.save import save_cog (or the package-level re-export from __init__.py).

Next chapter: Utilities — the smaller utility modules (io.py, dataarray.py, plot.py).