data Module

Data provider modules for GADM and OSM data.

class geo_sampling.data.GADMProvider(data_dir: str = 'data')

Bases: object

Provider for GADM administrative boundary data.

GADM_BASE_URL = 'https://geodata.ucdavis.edu/gadm/gadm4.1/shp/'
GADM_URL_FORMAT = 'gadm41_{0}_shp.zip'

download_country_data(country_code: str) → str

Download GADM shapefile data for a country.

Parameters:

country_code – Three-letter country code (e.g., ‘IND’)

Returns:

Path to the downloaded and extracted directory
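The two class constants suggest how the archive URL for a country is formed. The following is a minimal sketch, assuming the provider simply concatenates GADM_BASE_URL with GADM_URL_FORMAT filled in with the upper-cased country code; the helper name gadm_archive_url is hypothetical and not part of the library.

```python
GADM_BASE_URL = "https://geodata.ucdavis.edu/gadm/gadm4.1/shp/"
GADM_URL_FORMAT = "gadm41_{0}_shp.zip"


def gadm_archive_url(country_code: str) -> str:
    """Build the archive URL for a three-letter country code.

    Hypothetical helper: the concatenation is an assumption about how
    the two documented constants are combined.
    """
    code = country_code.strip().upper()
    if len(code) != 3 or not code.isalpha():
        raise ValueError(f"expected a three-letter country code, got {country_code!r}")
    return GADM_BASE_URL + GADM_URL_FORMAT.format(code)


url = gadm_archive_url("IND")
```

Validating the code before building the URL keeps a typo such as 'INDIA' from turning into a failed download later on.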

get_country_list() → List[str]

Get list of available countries from GADM.

Returns:

List of country names

load_boundaries(country_code: str, admin_level: int, region_name: str | None = None) → Tuple[List[str], Polygon, BoundingBox]

Load administrative boundaries for a country/region.

Parameters:
  • country_code – Three-letter country code

  • admin_level – Administrative level (1-4)

  • region_name – Specific region name to filter by

Returns:

Tuple of (region_names, combined_polygon, bounding_box)
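To illustrate the shape of the result, here is a pure-Python sketch of filtering regions by name and deriving a bounding box. It rests on several assumptions: BoundingBox is modeled as a hypothetical four-field named tuple, region_name is matched exactly, and each region is a plain list of (lon, lat) pairs with made-up names and coordinates. The combined polygon that the real method returns is not modeled here.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple

Ring = Sequence[Tuple[float, float]]


class BoundingBox(NamedTuple):
    """Hypothetical stand-in; the library's BoundingBox may use other fields."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def select_and_bound(
    regions: Dict[str, Ring], region_name: Optional[str] = None
) -> Tuple[List[str], BoundingBox]:
    """Filter regions by exact name (None keeps all) and bound what remains."""
    kept = {n: r for n, r in regions.items() if region_name in (None, n)}
    if not kept:
        raise ValueError(f"no region named {region_name!r}")
    lons = [lon for ring in kept.values() for lon, _ in ring]
    lats = [lat for ring in kept.values() for _, lat in ring]
    return sorted(kept), BoundingBox(min(lons), min(lats), max(lons), max(lats))


# Made-up regions, each given as one exterior ring of (lon, lat) pairs.
regions = {
    "North": [(0.0, 1.0), (2.0, 1.0), (2.0, 3.0), (0.0, 3.0)],
    "South": [(1.0, -1.0), (4.0, -1.0), (4.0, 1.0), (1.0, 1.0)],
}
names, bbox = select_and_bound(regions)
```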

class geo_sampling.data.OSMProvider(data_dir: str = 'data')

Bases: object

Provider for OpenStreetMap data via BBBike.org extract service.

BBBIKE_BASE_URL = 'http://extract.bbbike.org'
MAX_EXTRACT_POINTS = 300
MAX_WAIT_TIME = 50

check_extract_status(extract_id: str) → str | None

Check status of BBBike extract job.

Parameters:

extract_id – Job ID from submit_extract_request

Returns:

Download URL if ready, None if still processing
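Because the method returns None while the job is still processing, callers typically poll it. The sketch below shows one way to do that, assuming a fixed delay between checks and an attempt cap of 50. The unit of MAX_WAIT_TIME is not stated in this reference, so reusing 50 as a count is a guess. The function wait_for_extract and its parameters are hypothetical. The status checker is passed in as a callable so the loop can be exercised without network access.

```python
import time
from typing import Callable, Optional


def wait_for_extract(
    check_status: Callable[[str], Optional[str]],
    extract_id: str,
    max_attempts: int = 50,  # assumption: mirrors MAX_WAIT_TIME as a count
    delay_seconds: float = 60.0,  # assumption: one check per minute
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll check_status until it returns a URL or attempts run out."""
    for attempt in range(max_attempts):
        url = check_status(extract_id)
        if url is not None:
            return url
        if attempt < max_attempts - 1:
            sleep(delay_seconds)  # no point sleeping after the final check
    raise TimeoutError(f"extract {extract_id!r} not ready after {max_attempts} checks")


# Stand-in for OSMProvider.check_extract_status: ready on the third call.
calls = []


def fake_check(extract_id: str) -> Optional[str]:
    calls.append(extract_id)
    return "http://example.invalid/extract.zip" if len(calls) == 3 else None


result = wait_for_extract(fake_check, "job-1", max_attempts=5, sleep=lambda s: None)
```

With a real provider, the first argument would be the bound method, for example provider.check_extract_status.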

download_osm_data(region_polygon: Polygon, region_name: str, bbox: BoundingBox) → str

Download OSM data for a region and return path to road shapefile.

Parameters:
  • region_polygon – Shapely polygon defining region

  • region_name – Name for the extract

  • bbox – Bounding box coordinates

Returns:

Path to the roads shapefile (without .shp extension)
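Since the returned path omits the extension, the individual shapefile components have to be derived from it. A minimal sketch, assuming the three mandatory components (.shp, .shx, .dbf) sit next to each other; the helper name and the example path are made up for illustration.

```python
from pathlib import Path
from typing import Dict


def shapefile_components(base_path: str) -> Dict[str, Path]:
    """Map each mandatory shapefile extension to its expected path."""
    # Plain concatenation rather than Path.with_suffix, which would
    # replace part of a base name that itself contains a dot.
    return {ext: Path(base_path + ext) for ext in (".shp", ".shx", ".dbf")}


# Hypothetical base path of the kind download_osm_data might return.
parts = shapefile_components("data/delhi/roads")
missing = [str(p) for p in parts.values() if not p.exists()]
```

Checking `missing` before reading the file gives a clearer error than a failure deep inside a shapefile reader.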

extract_road_segments(shapefile_path: str, road_types: List[str] | None = None, segment_length: int = 500) → List[RoadSegment]

Extract road segments from OSM shapefile.

Parameters:
  • shapefile_path – Path to roads shapefile (without .shp)

  • road_types – List of road types to include (None = all)

  • segment_length – Target segment length in meters

Returns:

List of RoadSegment objects
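The segment_length parameter implies that long roads are cut into pieces of roughly that length. The sketch below shows one way to do this for a polyline whose coordinates are already in a planar, meter-based projection. This is an illustration only: the library's actual splitting and distance calculation are not documented here, and longitude/latitude input would need to be projected first. The function name split_polyline is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def split_polyline(points: Sequence[Point], segment_length: float = 500.0) -> List[List[Point]]:
    """Cut a polyline into consecutive pieces at most segment_length long."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if len(points) < 2:
        return []
    pieces: List[List[Point]] = []
    current: List[Point] = [points[0]]
    remaining = segment_length  # distance left before the current piece is full
    for end in points[1:]:
        start = current[-1]
        edge = math.dist(start, end)
        while edge > remaining:
            # Interpolate the cut point on this edge and close the piece there.
            t = remaining / edge
            cut = (start[0] + t * (end[0] - start[0]), start[1] + t * (end[1] - start[1]))
            current.append(cut)
            pieces.append(current)
            current = [cut]
            start, edge, remaining = cut, math.dist(cut, end), segment_length
        current.append(end)
        remaining -= edge
    if len(current) > 1:
        pieces.append(current)  # final piece may be shorter than the target
    return pieces


# A straight 1200 m road yields pieces of about 500 m, 500 m and 200 m.
pieces = split_polyline([(0.0, 0.0), (1200.0, 0.0)], segment_length=500.0)
```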

generate_extract_url(region_polygon: Polygon, region_name: str, bbox: BoundingBox) → str

Generate BBBike extract URL for a polygon region.

Parameters:
  • region_polygon – Shapely polygon defining the region

  • region_name – Name for the extract

  • bbox – Bounding box coordinates

Returns:

BBBike extract URL
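The MAX_EXTRACT_POINTS constant suggests that the polygon sent to BBBike is limited to 300 vertices. The following sketch shows one simple way to honor such a limit by keeping evenly spaced vertices. It is an assumption that the constant is a vertex cap, and the provider may reduce vertices differently (for example with a tolerance-based simplification); thin_ring is a hypothetical helper.

```python
from typing import List, Sequence, Tuple

MAX_EXTRACT_POINTS = 300  # value from the class constant; meaning assumed


def thin_ring(
    ring: Sequence[Tuple[float, float]], max_points: int = MAX_EXTRACT_POINTS
) -> List[Tuple[float, float]]:
    """Keep at most max_points evenly spaced vertices of an open ring."""
    if max_points < 3:
        raise ValueError("a polygon needs at least three vertices")
    n = len(ring)
    if n <= max_points:
        return list(ring)
    # i * n // max_points is strictly increasing when n > max_points,
    # so no vertex is picked twice and the first vertex is always kept.
    return [ring[i * n // max_points] for i in range(max_points)]


# A made-up ring with 1000 vertices is reduced to 300.
dense = [(float(i), float(i % 7)) for i in range(1000)]
thinned = thin_ring(dense)
```

Even spacing ignores geometry, so sharp corners can be lost. It is shown here only because it needs no third-party dependency.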

submit_extract_request(extract_url: str) → str

Submit extract request to BBBike and get job ID.

Parameters:

extract_url – BBBike extract URL

Returns:

Extract job ID for checking status

GADM (Global Administrative Areas) data provider.

class geo_sampling.data.gadm.GADMProvider(data_dir: str = 'data')

Bases: object

Provider for GADM administrative boundary data.

GADM_BASE_URL = 'https://geodata.ucdavis.edu/gadm/gadm4.1/shp/'
GADM_URL_FORMAT = 'gadm41_{0}_shp.zip'

download_country_data(country_code: str) → str

Download GADM shapefile data for a country.

Parameters:

country_code – Three-letter country code (e.g., ‘IND’)

Returns:

Path to the downloaded and extracted directory

get_country_list() → List[str]

Get list of available countries from GADM.

Returns:

List of country names

load_boundaries(country_code: str, admin_level: int, region_name: str | None = None) → Tuple[List[str], Polygon, BoundingBox]

Load administrative boundaries for a country/region.

Parameters:
  • country_code – Three-letter country code

  • admin_level – Administrative level (1-4)

  • region_name – Specific region name to filter by

Returns:

Tuple of (region_names, combined_polygon, bounding_box)

OpenStreetMap data provider via BBBike.org.

class geo_sampling.data.osm.OSMProvider(data_dir: str = 'data')

Bases: object

Provider for OpenStreetMap data via BBBike.org extract service.

BBBIKE_BASE_URL = 'http://extract.bbbike.org'
MAX_EXTRACT_POINTS = 300
MAX_WAIT_TIME = 50

check_extract_status(extract_id: str) → str | None

Check status of BBBike extract job.

Parameters:

extract_id – Job ID from submit_extract_request

Returns:

Download URL if ready, None if still processing

download_osm_data(region_polygon: Polygon, region_name: str, bbox: BoundingBox) → str

Download OSM data for a region and return path to road shapefile.

Parameters:
  • region_polygon – Shapely polygon defining region

  • region_name – Name for the extract

  • bbox – Bounding box coordinates

Returns:

Path to the roads shapefile (without .shp extension)

extract_road_segments(shapefile_path: str, road_types: List[str] | None = None, segment_length: int = 500) → List[RoadSegment]

Extract road segments from OSM shapefile.

Parameters:
  • shapefile_path – Path to roads shapefile (without .shp)

  • road_types – List of road types to include (None = all)

  • segment_length – Target segment length in meters

Returns:

List of RoadSegment objects

generate_extract_url(region_polygon: Polygon, region_name: str, bbox: BoundingBox) → str

Generate BBBike extract URL for a polygon region.

Parameters:
  • region_polygon – Shapely polygon defining the region

  • region_name – Name for the extract

  • bbox – Bounding box coordinates

Returns:

BBBike extract URL

submit_extract_request(extract_url: str) → str

Submit extract request to BBBike and get job ID.

Parameters:

extract_url – BBBike extract URL

Returns:

Extract job ID for checking status