
How to Extract Metadata from Geospatial Files (GeoTIFF, Shapefile)

Geospatial metadata describes the spatial properties of geographic data files: coordinate reference systems, bounding boxes, pixel resolution, band counts, and attribute schemas. This guide covers practical methods for extracting that metadata from GeoTIFF, Shapefile, GeoJSON, and GeoPackage formats using command-line tools, Python libraries, and desktop GIS software.

Fast.io Editorial Team 9 min read

What Geospatial Metadata Contains

Geospatial metadata describes the spatial properties of geographic data files, including coordinate reference systems, bounding boxes, resolution, band information, and attribute schemas. Without this metadata, a GeoTIFF is just a grid of numbers and a Shapefile is just a collection of coordinates with no real-world reference.

The specific metadata varies by format, but most geospatial files carry these core properties:

  • Coordinate Reference System (CRS): Defines how coordinates map to locations on Earth. Common systems include WGS 84 (EPSG:4326) for latitude/longitude and UTM zones for projected coordinates.
  • Bounding box (extent): The geographic rectangle that encloses all features or pixels in the file, expressed as min/max coordinates.
  • Resolution: For raster files like GeoTIFF, this is the ground distance each pixel represents. A 30-meter resolution means each pixel covers a 30x30 meter area.
  • Band information: Raster files often contain multiple bands (red, green, blue, near-infrared). Metadata describes band count, data type, and no-data values.
  • Attribute schema: Vector files like Shapefiles carry tabular attributes for each feature. The schema defines column names, data types, and field widths.
  • Provenance: Processing history, creation dates, source imagery details, and accuracy statements.
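The resolution and bounding box properties above are linked by simple arithmetic. As a minimal sketch, assuming a north-up raster with square or rectangular pixels, the extent follows from the top-left origin, the pixel size, and the pixel dimensions. The function name and the sample values are hypothetical.

```python
def raster_bounds(origin_x, origin_y, pixel_width, pixel_height, cols, rows):
    """Return (west, south, east, north) for a north-up raster.

    origin_x / origin_y are the coordinates of the top-left corner.
    Pixel sizes are positive ground distances in CRS units.
    """
    west = origin_x
    north = origin_y
    east = origin_x + cols * pixel_width
    south = origin_y - rows * pixel_height
    return (west, south, east, north)


# Hypothetical 30 m raster, 1000 x 800 pixels, in a projected CRS
bounds = raster_bounds(500000.0, 4100000.0, 30.0, 30.0, cols=1000, rows=800)
print(bounds)
```

Rotated or sheared rasters need the full affine transform instead of this shortcut.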

The Open Geospatial Consortium (OGC), with over 450 member organizations, publishes many of the format and service standards used in GIS, while ISO/TC 211 maintains ISO 19115 and ISO 19139, which define how geospatial metadata should be structured and shared. In practice, most day-to-day metadata extraction focuses on CRS and extent rather than full ISO-compliant metadata records.


Reading Metadata from GeoTIFF Files

GeoTIFF is the most common raster format in GIS. It embeds geospatial metadata directly in TIFF tags, so any GDAL-compatible tool can read it without sidecar files.
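To make "embedded in TIFF tags" concrete, here is a minimal sketch that scans the first image directory of a classic TIFF for the tag numbers the GeoTIFF specification reserves. It uses only the standard library and is meant as an illustration, not a replacement for GDAL: it ignores BigTIFF and any later directories, and the function name and the synthetic sample bytes are hypothetical.

```python
import struct

# Tag numbers reserved by the GeoTIFF specification
GEO_TAGS = {
    33550: "ModelPixelScaleTag",
    33922: "ModelTiepointTag",
    34264: "ModelTransformationTag",
    34735: "GeoKeyDirectoryTag",
}


def find_geotiff_tags(data: bytes) -> list[str]:
    """List GeoTIFF tag names found in the first IFD of a classic TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    (entry_count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    found = []
    for i in range(entry_count):
        # Each directory entry is 12 bytes; the tag number is the first 2
        start = ifd_offset + 2 + i * 12
        (tag,) = struct.unpack(endian + "H", data[start:start + 2])
        if tag in GEO_TAGS:
            found.append(GEO_TAGS[tag])
    return found


def _entry(tag):
    # Synthetic directory entry: tag, type SHORT, count 1, value 0
    return struct.pack("<HHII", tag, 3, 1, 0)


header = b"II" + struct.pack("<HI", 42, 8)
geo_sample = header + struct.pack("<H", 2) + _entry(256) + _entry(34735) + struct.pack("<I", 0)
plain_sample = header + struct.pack("<H", 1) + _entry(256) + struct.pack("<I", 0)

print(find_geotiff_tags(geo_sample))
print(find_geotiff_tags(plain_sample))
```

For a real file you would pass the file's bytes, since the first directory can sit anywhere in the file. The tools in the following sections do this parsing for you and also decode the tag contents.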

Using gdalinfo on the command line

The fastest way to inspect a GeoTIFF is gdalinfo, which ships with every GDAL installation:

gdalinfo elevation.tif

This prints the driver, file size, CRS in WKT format, the geotransform (origin and pixel size), corner coordinates in both projected and geographic units, band count, data type, and any embedded metadata tags. For a quick CRS check, pipe the output through grep:

gdalinfo elevation.tif | grep "EPSG"

To get machine-readable output, use the JSON flag:

gdalinfo -json elevation.tif

This returns a structured JSON object with coordinateSystem, cornerCoordinates, size, and bands fields that you can parse with jq or any scripting language.
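As a rough sketch of that parsing step in Python, the function below pulls a few fields out of an already-decoded gdalinfo JSON document. The sample dictionary is hand-written and heavily trimmed to the fields named above; real output contains many more keys, and the function name is hypothetical.

```python
import json

# Hand-written, trimmed stand-in for `gdalinfo -json` output
sample = {
    "size": [1000, 800],
    "coordinateSystem": {"wkt": 'PROJCRS["WGS 84 / UTM zone 33N"]'},
    "cornerCoordinates": {
        "upperLeft": [500000.0, 4100000.0],
        "lowerRight": [530000.0, 4076000.0],
    },
    "bands": [{"band": 1, "type": "Float32", "noDataValue": -9999.0}],
}


def summarize_gdalinfo(info: dict) -> dict:
    """Reduce a gdalinfo JSON document to a small metadata summary."""
    corners = info["cornerCoordinates"]
    west, north = corners["upperLeft"]
    east, south = corners["lowerRight"]
    width, height = info["size"]
    return {
        "width": width,
        "height": height,
        "bounds": (west, south, east, north),
        "band_count": len(info["bands"]),
        "dtypes": [band["type"] for band in info["bands"]],
        # coordinateSystem may be absent when the file has no CRS
        "has_crs": bool(info.get("coordinateSystem", {}).get("wkt")),
    }


# Round-trip through text to mimic reading the command's stdout
summary = summarize_gdalinfo(json.loads(json.dumps(sample)))
print(summary)
```

With GDAL installed, the text could come from subprocess.run(["gdalinfo", "-json", path], capture_output=True, text=True, check=True).stdout instead of the sample.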

Using rasterio in Python

Rasterio wraps GDAL in a clean Python API. Opening a file gives you immediate access to all spatial metadata:

import rasterio

with rasterio.open("elevation.tif") as src:
    print("CRS:", src.crs)
    print("Bounds:", src.bounds)
    print("Resolution:", src.res)
    print("Dimensions:", src.width, "x", src.height)
    print("Band count:", src.count)
    print("Data types:", src.dtypes)
    print("No-data value:", src.nodata)
    print("Transform:", src.transform)

The src.crs property returns a CRS object that you can convert to EPSG codes, WKT, or PROJ strings. The src.bounds property returns a BoundingBox named tuple with left, bottom, right, top attributes. The src.transform is an affine matrix that maps pixel coordinates to geographic coordinates.
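The affine mapping is easier to follow with the arithmetic written out. This is a minimal sketch in plain Python, assuming six coefficients in the order (a, b, c, d, e, f), which is the order rasterio's transform object uses. The function name and sample values are hypothetical.

```python
def pixel_to_coords(coeffs, col, row):
    """Apply x = a*col + b*row + c and y = d*col + e*row + f."""
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return (x, y)


# Hypothetical north-up raster: 30 m pixels, top-left corner at (500000, 4100000).
# The negative e term makes y decrease as the row index increases.
coeffs = (30.0, 0.0, 500000.0, 0.0, -30.0, 4100000.0)

top_left = pixel_to_coords(coeffs, 0, 0)
first_pixel_center = pixel_to_coords(coeffs, 0.5, 0.5)
print(top_left, first_pixel_center)
```

In rasterio itself the same result comes from multiplying the transform by a (col, row) pair, so hand-rolled arithmetic like this is mainly useful for understanding or debugging.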

For batch processing, wrap this in a function that returns a dictionary:

import rasterio
from pathlib import Path

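def extract_raster_metadata(filepath):
    """Return header-level metadata for one raster as a plain dictionary."""
    with rasterio.open(filepath) as src:
        return {
            "filename": Path(filepath).name,
            # src.crs is None when the file has no embedded CRS
            "crs": str(src.crs) if src.crs else None,
            "epsg": src.crs.to_epsg() if src.crs else None,
            "bounds": dict(zip(
                ["west", "south", "east", "north"],
                src.bounds
            )),
            "resolution": src.res,
            "dimensions": (src.width, src.height),
            "bands": src.count,
            # dtype of the first band; src.dtypes lists every band
            "dtype": src.dtypes[0],
            "nodata": src.nodata,
        }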

Using QGIS

Right-click any raster layer in QGIS and select Properties > Information. This panel shows the CRS, extent, pixel size, band count, and statistics for each band. You can also access this through the Processing Toolbox under Raster information, which generates a text report you can export.


Reading Metadata from Shapefiles

A Shapefile is actually a bundle of files that each store different information. The .shp file holds geometry, .dbf holds attribute data, .prj holds the CRS definition, and .shx holds a positional index of the geometry records (it is not a spatial index; optional .sbn/.sbx or .qix files serve that purpose). Optional files like .cpg (character encoding) and .shp.xml (XML metadata, often in FGDC or ISO format) carry additional information.
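Because the metadata is spread across sidecar files, it can help to check which ones are present before reading anything. The following is a minimal sketch using only pathlib; the function name is hypothetical, and it assumes lowercase extensions, which is not guaranteed on case-sensitive file systems.

```python
import tempfile
from pathlib import Path

# Sidecar extensions and the role each one plays
REQUIRED = {".shp": "geometry", ".shx": "record index", ".dbf": "attributes"}
OPTIONAL = {".prj": "CRS definition", ".cpg": "character encoding"}


def check_shapefile_bundle(shp_path):
    """Report which sidecar files exist next to a .shp file."""
    base = Path(shp_path)
    report = {}
    for ext in {**REQUIRED, **OPTIONAL}:
        report[ext] = base.with_suffix(ext).exists()
    missing_required = [ext for ext in REQUIRED if not report[ext]]
    return report, missing_required


# Demo on empty placeholder files in a temporary directory
demo_dir = Path(tempfile.mkdtemp())
for ext in (".shp", ".shx", ".dbf"):
    (demo_dir / f"parcels{ext}").touch()

report, missing_required = check_shapefile_bundle(demo_dir / "parcels.shp")
print(report)
print(missing_required)
```

A False entry for .prj is the case to watch for, since the layer will then load without a CRS.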

Using ogrinfo on the command line

ogrinfo is the vector equivalent of gdalinfo:

ogrinfo -so parcels.shp parcels

The -so flag gives a summary instead of dumping every feature. The output includes geometry type, feature count, spatial extent, CRS, and the full attribute schema with field names, types, and widths. For just the extent and CRS:

ogrinfo -so parcels.shp parcels | grep -E "Extent|Layer SRS"

Using Fiona in Python

Fiona provides a Pythonic interface to vector file metadata:

import fiona

with fiona.open("parcels.shp") as src:
    print("Driver:", src.driver)
    print("CRS:", src.crs)
    print("Schema:", src.schema)
    print("Feature count:", len(src))
    print("Bounds:", src.bounds)

The src.schema property returns a dictionary with geometry (the geometry type) and properties (an ordered dictionary of field names to field types like str, int, float, date). This is useful for validating that a Shapefile contains the expected columns before processing.
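A validation step like that could look roughly as follows. This is a sketch that works on a hand-written dictionary in the shape Fiona returns for src.schema, so it runs without Fiona installed. The function name and field names are hypothetical.

```python
def missing_fields(schema, expected):
    """Return expected field names that are absent from a Fiona-style schema."""
    present = set(schema["properties"])
    return sorted(set(expected) - present)


# Hand-written stand-in for the value of src.schema
schema = {
    "geometry": "Polygon",
    "properties": {
        "parcel_id": "str:20",
        "area_m2": "float:24.15",
        "zoning": "str:10",
    },
}

missing = missing_fields(schema, ["parcel_id", "owner", "zoning"])
print(missing)  # ['owner']
```

In a pipeline you might raise an error or skip the file when the returned list is not empty.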

Using GeoPandas

GeoPandas reads Shapefiles into a GeoDataFrame and exposes metadata through familiar pandas patterns:

import geopandas as gpd

gdf = gpd.read_file("parcels.shp")
print("CRS:", gdf.crs)
print("Bounds:", gdf.total_bounds)
print("Geometry types:", gdf.geom_type.unique())
print("Columns:", list(gdf.columns))
print("Feature count:", len(gdf))

The gdf.total_bounds attribute returns a NumPy array of [minx, miny, maxx, maxy]. You can also call gdf.crs.to_epsg() to get the EPSG code, or gdf.crs.to_wkt() for the full WKT string.

Fastio features

Catalog Your Geospatial Deliverables Without Scripts

Fast.io Metadata Views extract CRS, extent, and custom fields from uploaded files into a searchable spreadsheet. Upload your GeoTIFFs and Shapefiles, define the metadata columns you need, and share the results with your team. 50 GB free, no credit card required.

Working with GeoJSON and GeoPackage

GeoJSON and GeoPackage have become common alternatives to Shapefile, each with different metadata characteristics.

GeoJSON

GeoJSON files are plain JSON, so you can inspect metadata with any text editor or JSON parser. The CRS is almost always WGS 84 (EPSG:4326) because RFC 7946 requires it, with coordinates in longitude, latitude order; only files written to the older 2008 specification may declare something else. Feature properties are embedded directly in the JSON structure. You can read the feature count and property names with a few lines of Python:

import json

with open("buildings.geojson") as f:
    data = json.load(f)

features = data["features"]
print("Feature count:", len(features))
print("Properties:", list(features[0]["properties"].keys()))

For the bounding box, Fiona and GeoPandas work with GeoJSON the same way they work with Shapefiles. Just pass the .geojson file path to fiona.open() or gpd.read_file().
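If neither library is available, the extent can also be computed from the parsed JSON directly. The sketch below walks the nested coordinate arrays with the standard library only. The function names and the sample features are hypothetical, and GeometryCollection members are skipped for brevity.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def geojson_bbox(collection):
    """Return (minx, miny, maxx, maxy) for a FeatureCollection, or None."""
    xs, ys = [], []
    for feature in collection["features"]:
        geometry = feature.get("geometry")
        if not geometry:  # features may carry a null geometry
            continue
        # GeometryCollection has no "coordinates" key and is skipped here
        for x, y in iter_positions(geometry.get("coordinates", [])):
            xs.append(x)
            ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical sample: one point and one triangle
data = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [10.0, 50.0]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon", "coordinates": [
             [[9.0, 49.0], [11.0, 49.0], [11.0, 51.5], [9.0, 49.0]]
         ]}},
    ],
}

bbox = geojson_bbox(data)
print(bbox)
```

This reads every coordinate, so for large files a library that streams features is the better choice.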

GeoPackage

GeoPackage (.gpkg) is a SQLite-based format that can store multiple vector layers, raster tiles, and metadata tables in a single file. Its built-in gpkg_contents and gpkg_spatial_ref_sys tables hold layer-level metadata. You can query these directly with SQLite:

sqlite3 survey.gpkg "SELECT table_name, srs_id, min_x, min_y, max_x, max_y FROM gpkg_contents;"

With ogrinfo, list all layers first, then inspect a specific one:

ogrinfo survey.gpkg
ogrinfo -so survey.gpkg roads

GeoPackage supports raster and vector data in the same file, making it useful for projects that need both elevation models and feature layers stored together. The metadata tables also store descriptions, last-change timestamps, and data type indicators for each layer.

Batch Processing and Automation

Real-world GIS projects rarely involve a single file. A satellite imagery archive might contain thousands of GeoTIFFs across different CRS zones. A municipal dataset might include hundreds of Shapefiles from different departments. Extracting metadata at scale requires automation.

Python batch script

Here is a script that scans a directory for common geospatial formats and builds a metadata catalog as a CSV:

import csv
from pathlib import Path

import fiona
import rasterio

# One fixed column set, so raster and vector rows fit the same CSV
FIELDNAMES = ["file", "type", "crs", "bounds", "resolution", "features"]

def catalog_directory(directory, output_csv):
    records = []
    raster_exts = {".tif", ".tiff", ".geotiff"}
    vector_exts = {".shp", ".geojson", ".gpkg"}

    for path in Path(directory).rglob("*"):
        suffix = path.suffix.lower()
        if suffix in raster_exts:
            with rasterio.open(path) as src:
                records.append({
                    "file": str(path),
                    "type": "raster",
                    "crs": str(src.crs),
                    "bounds": str(src.bounds),
                    "resolution": str(src.res),
                })
        elif suffix in vector_exts:
            # For a multi-layer GeoPackage, fiona.open() reads the first layer only
            with fiona.open(path) as src:
                records.append({
                    "file": str(path),
                    "type": "vector",
                    "crs": str(src.crs),
                    "bounds": str(src.bounds),
                    "features": len(src),
                })

    with open(output_csv, "w", newline="") as f:
        # restval="" leaves a cell blank when a record lacks that column
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, restval="")
        writer.writeheader()
        writer.writerows(records)

This gives you a spreadsheet of every file's CRS, extent, and basic properties. You can sort by CRS to find files that need reprojection, or filter by bounds to find files covering a specific area.
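Sorting by CRS can also be done in code. As a small sketch, the function below groups catalog rows by their crs column using the standard csv module; it assumes the CSV has file and crs columns as in the script above, and the function name and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_files_by_crs(csv_file):
    """Map each CRS string in the catalog to the files that use it."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["crs"]].append(row["file"])
    return dict(groups)


# Hypothetical catalog rows; in practice, pass an open file handle instead
catalog = io.StringIO(
    "file,type,crs\n"
    "a.tif,raster,EPSG:32633\n"
    "b.tif,raster,EPSG:4326\n"
    "c.shp,vector,EPSG:32633\n"
)

groups = group_files_by_crs(catalog)
print(groups)
```

Any group other than the project's target CRS is a list of candidates for reprojection.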

Cloud-based extraction with Fast.io

For teams managing large collections of geospatial deliverables, manually running scripts on each batch gets tedious. Fast.io's Metadata Views let you describe the fields you want extracted in plain language, and AI builds a typed schema, matches files in your workspace, and populates a sortable, filterable spreadsheet. You can define columns like "coordinate system," "bounding box," "resolution," and "band count," then let the extraction run across your entire workspace. New files added later get processed automatically without rerunning scripts.

This works alongside traditional GIS tools rather than replacing them. Use GDAL and Python for detailed technical analysis, and Metadata Views for cataloging and sharing extracted metadata with team members who do not have GDAL installed. Fast.io workspaces also support file versioning and audit trails, so you can track when geospatial deliverables were updated and by whom.

Troubleshooting Common Issues

Missing or unknown CRS

Some GeoTIFF files lack embedded CRS information, returning None from rasterio or showing "Coordinate System is: (unknown)" in gdalinfo. This usually happens when the file was exported from software that did not write the CRS tags. If you know which CRS the data was actually created in, you can assign it without modifying the pixel data. This only labels the file; it does not reproject it:

gdal_edit.py -a_srs EPSG:4326 untagged.tif

For Shapefiles, a missing .prj file causes the same problem. Create a .prj file containing the WKT definition for the correct CRS, or use ogr2ogr to assign one during conversion.

CRS mismatch between files

When combining files from different sources, a CRS mismatch is a common cause of features appearing in the wrong location. Always check the CRS before any spatial operation. Rasterio and GeoPandas both provide reprojection methods:

gdf_reprojected = gdf.to_crs(epsg=4326)

For rasters, use gdalwarp to reproject to a target CRS:

gdalwarp -t_srs EPSG:4326 input.tif output.tif

Truncated attribute names in Shapefiles

The DBF format limits field names to 10 characters. Longer names get silently truncated during export, which can break downstream scripts that expect specific column names. If you need longer field names, consider migrating to GeoPackage or GeoJSON, which have no such limitation.
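Before exporting to Shapefile, a quick check can flag names that will be cut and names that become identical after cutting. This sketch assumes plain truncation to the first 10 characters of ASCII names; real writers may disambiguate duplicates in their own way. The function and field names are hypothetical.

```python
from collections import defaultdict

DBF_NAME_LIMIT = 10  # maximum field-name length in the DBF format


def truncation_report(field_names):
    """Return (names that exceed the limit, truncated names that collide)."""
    by_short = defaultdict(list)
    for name in field_names:
        by_short[name[:DBF_NAME_LIMIT]].append(name)
    truncated = [name for name in field_names if len(name) > DBF_NAME_LIMIT]
    collisions = {
        short: names for short, names in by_short.items() if len(names) > 1
    }
    return truncated, collisions


fields = ["parcel_id", "land_use_code", "land_use_category", "owner"]
truncated, collisions = truncation_report(fields)
print(truncated)
print(collisions)
```

Renaming the colliding fields before export keeps downstream column names predictable.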

Large file performance

Reading metadata from large GeoTIFFs (multi-gigabyte satellite scenes) is fast because tools like rasterio only read the file header, not the pixel data. However, if you accidentally trigger a full read (for example, calling .read() instead of just accessing .meta), memory usage will spike. Stick to metadata-only operations when cataloging files.

Frequently Asked Questions

How do I view metadata in a GeoTIFF file?

Run `gdalinfo yourfile.tif` on the command line for a full metadata report including CRS, extent, resolution, and band information. In Python, open the file with rasterio and access properties like `src.crs`, `src.bounds`, `src.res`, and `src.count`. In QGIS, right-click the layer and select Properties > Information.

What metadata is stored in a Shapefile?

A Shapefile stores geometry in the .shp file, attribute data in the .dbf file, a positional index of the geometry records in the .shx file, and CRS information in the .prj file. Optional files include .cpg for character encoding and .shp.xml for XML metadata. The .dbf file contains column names, data types, and field widths for all feature attributes.

How do I extract the CRS from a geospatial file?

For rasters, use `gdalinfo file.tif | grep EPSG` or open the file with rasterio and call `src.crs.to_epsg()`. For vector files, use `ogrinfo -so file.shp layername` and look for the Layer SRS line, or open with Fiona or GeoPandas and access the `.crs` property.

What tools read GeoTIFF metadata?

GDAL (gdalinfo command), rasterio and rioxarray (Python), QGIS (desktop), and ArcGIS Pro all read GeoTIFF metadata. GDAL is the underlying library for most of these tools. For programmatic access, rasterio provides the cleanest Python API.

What is the difference between GeoTIFF and regular TIFF?

A GeoTIFF embeds geospatial metadata in standard TIFF tags, including the coordinate reference system, geotransform (origin and pixel size), and map projection parameters. A regular TIFF contains only image data with no spatial reference. Any image viewer can open a GeoTIFF, but only GIS software interprets the spatial tags.

Can I extract metadata from GeoPackage files?

Yes. GeoPackage is SQLite-based, so you can query its metadata tables directly with SQL. The gpkg_contents table stores layer names, types, spatial extent, and SRS identifiers. You can also use ogrinfo, Fiona, or GeoPandas to read GeoPackage metadata the same way you would read Shapefile metadata.
