Quick Start

Get started with geo-sampling in 5 minutes! This guide shows you how to extract and sample road segments using both the command-line interface and the Python API.

Command-Line Interface (CLI)

Complete Workflow in One Command

Extract roads and create a sample for Singapore in a single command:

geo-sampling workflow "Singapore" "Central" \
    --sample-size 100 \
    --output singapore_sample.csv \
    --plot

Step-by-Step Approach

For more control, use the step-by-step approach:

# 1. Extract all roads
geo-sampling extract "India" "NCT of Delhi" \
    --output delhi_roads.csv

# 2. Create a random sample
geo-sampling sample delhi_roads.csv \
    --sample-size 1000 \
    --strategy random \
    --output delhi_sample.csv \
    --plot

# 3. Get information about a region
geo-sampling info "Thailand" "Bangkok"

Python API

One-Liner Convenience Function

import geo_sampling as gs

# Quick sampling for research
sample = gs.sample_roads_for_region(
    "Singapore", "Central",
    n=100,
    strategy="random",
    seed=42
)

# Plot the results
gs.quick_plot(sample, title="Singapore Road Sample")

Step-by-Step with Full Control

import geo_sampling as gs

# Extract roads from a region
extractor = gs.RoadExtractor("India", "NCT of Delhi")
roads = extractor.get_roads(road_types=["primary", "secondary"])

# Create sampler and generate sample
sampler = gs.RoadSampler(roads)
sample = sampler.random_sample(1000, seed=42)

# Save and visualize
sampler.save_csv(sample, "delhi_sample.csv")
gs.plot_road_segments(sample, title="Delhi Road Sample")

Understanding the Output

CSV File Structure

The output CSV contains these columns:

Column                   Description
segment_id               Unique identifier for each road segment
osm_id                   OpenStreetMap way ID
osm_name                 Road name from OpenStreetMap
osm_type                 Road type (primary, secondary, residential, etc.)
start_lat, start_long    Starting coordinates of segment
end_lat, end_long        Ending coordinates of segment

Sample Data

Here’s what a few rows look like:

segment_id,osm_id,osm_name,osm_type,start_lat,start_long,end_lat,end_long
1,way_123,Orchard Road,primary,1.3048,103.8318,1.3052,103.8322
2,way_124,Marina Bay Drive,trunk,1.2966,103.8558,1.2970,103.8562
3,way_125,Residential Street,residential,1.3100,103.8400,1.3104,103.8404
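Because each row stores a start and an end coordinate, an approximate segment length can be derived from the CSV alone. The sketch below is an illustration using only the standard library; the helper names haversine_m and segment_lengths are hypothetical and are not part of geo_sampling. It applies the haversine formula to the sample rows shown above.

```python
import csv
import io
import math

# Rows copied from the sample data above.
SAMPLE_CSV = """segment_id,osm_id,osm_name,osm_type,start_lat,start_long,end_lat,end_long
1,way_123,Orchard Road,primary,1.3048,103.8318,1.3052,103.8322
2,way_124,Marina Bay Drive,trunk,1.2966,103.8558,1.2970,103.8562
3,way_125,Residential Street,residential,1.3100,103.8400,1.3104,103.8404
"""

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_lengths(csv_text):
    """Return {segment_id: length in metres} for each row of the CSV text."""
    lengths = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        lengths[row["segment_id"]] = haversine_m(
            float(row["start_lat"]), float(row["start_long"]),
            float(row["end_lat"]), float(row["end_long"]),
        )
    return lengths


lengths = segment_lengths(SAMPLE_CSV)
for segment_id, metres in lengths.items():
    print(f"Segment {segment_id}: {metres:.1f} m")
```

The result is a straight-line distance between the two endpoints, so it is only a reasonable proxy for road length when segments are short.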

Working with Sample Data

Load and analyze your samples:

import geo_sampling as gs
import pandas as pd

# Load the CSV back into Python
segments = gs.load_segments_from_csv("singapore_sample.csv")
print(f"Loaded {len(segments)} segments")

# Convert to pandas DataFrame for analysis (optional)
sampler = gs.RoadSampler(segments)
df = sampler.to_dataframe()

# Analyze road type distribution
road_type_counts = df['osm_type'].value_counts()
print("Road type distribution:")
print(road_type_counts)

Common Workflows

Research Study Design

import geo_sampling as gs

# 1. Get road summary to plan sample size
summary = gs.get_road_summary("Thailand", "Bangkok")
print(f"Total roads available: {summary['total_segments']:,}")
print("Road types:", list(summary['road_types']))

# 2. Extract and sample with stratification
sample = gs.sample_roads_for_region(
    "Thailand", "Bangkok",
    n=500,  # Sample size
    strategy="stratified",  # Maintain road type proportions
    road_types=["primary", "secondary", "tertiary"],  # Focus on major roads
    seed=42  # Reproducible results
)

# 3. Export for field work
sampler = gs.RoadSampler(sample)
sampler.save_csv(sample, "bangkok_fieldwork_sample.csv")

# 4. Create field maps
gs.plot_road_segments(sample, title="Bangkok Field Study Sites")
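To make the idea of "maintaining road type proportions" concrete, here is a minimal sketch of proportional stratified sampling in plain Python. It is not the library's implementation, which may differ; the function name proportional_stratified_sample and the toy population are assumptions made for illustration. Each stratum receives a share of the sample proportional to its size, with leftover slots assigned by largest remainder.

```python
import random
from collections import Counter, defaultdict


def proportional_stratified_sample(items, key, n, seed=None):
    """Draw n items so each stratum keeps roughly its population share."""
    if n > len(items):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)

    # Group the population by stratum (here: road type).
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    # Largest-remainder allocation using integer arithmetic.
    total = len(items)
    allocation = {name: n * len(members) // total for name, members in strata.items()}
    remainders = {name: n * len(members) % total for name, members in strata.items()}
    leftover = n - sum(allocation.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[name] += 1

    # Simple random sample without replacement inside each stratum.
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, allocation[name]))
    return sample


# Toy population of (segment_id, road_type) pairs: 60 / 30 / 10 split.
population = (
    [(i, "primary") for i in range(60)]
    + [(i, "secondary") for i in range(60, 90)]
    + [(i, "tertiary") for i in range(90, 100)]
)

toy_sample = proportional_stratified_sample(population, key=lambda s: s[1], n=20, seed=42)
type_counts = Counter(road_type for _, road_type in toy_sample)
print(type_counts)
```

With a 60/30/10 population and n=20, the allocation works out to 12 primary, 6 secondary, and 2 tertiary segments.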

Batch Processing Multiple Regions

#!/bin/bash
# Process multiple regions

REGIONS=("Bangkok" "Chiang Mai" "Phuket")

for region in "${REGIONS[@]}"; do
    echo "Processing $region..."

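    # Build a filename-safe slug: lowercase (bash 4+), spaces to underscores
    slug="${region,,}"
    slug="${slug// /_}"

    geo-sampling workflow "Thailand" "$region" \
        --sample-size 200 \
        --strategy stratified \
        --output "${slug}_sample.csv" \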
        --plot
done
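The same batch loop can be driven from Python, which is convenient on systems without bash 4. The sketch below only assembles the command lines, using the CLI flags shown in this guide; build_workflow_command is a hypothetical helper, and the subprocess call is left commented out so that nothing runs until you opt in.

```python
import shlex


def build_workflow_command(country, region, sample_size=200, strategy="stratified"):
    """Return the geo-sampling workflow command for one region as an argument list."""
    # Filename-safe slug: lowercase, spaces replaced by underscores.
    slug = region.lower().replace(" ", "_")
    return [
        "geo-sampling", "workflow", country, region,
        "--sample-size", str(sample_size),
        "--strategy", strategy,
        "--output", f"{slug}_sample.csv",
        "--plot",
    ]


commands = [
    build_workflow_command("Thailand", region)
    for region in ["Bangkok", "Chiang Mai", "Phuket"]
]

for command in commands:
    print(shlex.join(command))
    # To execute for real (requires geo-sampling to be installed):
    # import subprocess
    # subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems with region names that contain spaces.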

Tips for Success

1. Start Small

Begin with small administrative areas to test your workflow before scaling up.

2. Check Data Quality

Always inspect a few segments manually:

# Look at first few segments
for i, seg in enumerate(sample[:3]):
    print(f"Segment {i+1}: {seg.osm_name} ({seg.osm_type})")
    print(f"  From: {seg.start_lat:.4f}, {seg.start_long:.4f}")
    print(f"  To: {seg.end_lat:.4f}, {seg.end_long:.4f}")

3. Validate Geographic Bounds

Plot samples to ensure they cover your intended study area:

gs.quick_plot(sample, title="Sample Coverage Check")
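A numeric check can complement the visual one. The sketch below computes the bounding box of a set of segments and lists any segment with an endpoint outside an expected box. It works on plain coordinate tuples rather than library objects; the function names are hypothetical, and the Singapore bounds are rough, illustrative values that you should replace with your own study area.

```python
# Segments as (start_lat, start_long, end_lat, end_long), from the sample rows above.
segments = [
    (1.3048, 103.8318, 1.3052, 103.8322),
    (1.2966, 103.8558, 1.2970, 103.8562),
    (1.3100, 103.8400, 1.3104, 103.8404),
]

# Rough, illustrative bounds: (min_lat, min_long, max_lat, max_long).
EXPECTED_BOUNDS = (1.15, 103.59, 1.48, 104.10)


def bounding_box(segments):
    """Return (min_lat, min_long, max_lat, max_long) covering all endpoints."""
    lats = [lat for s in segments for lat in (s[0], s[2])]
    longs = [lon for s in segments for lon in (s[1], s[3])]
    return (min(lats), min(longs), max(lats), max(longs))


def segments_outside(segments, bounds):
    """Return the segments that have at least one endpoint outside the bounds."""
    min_lat, min_long, max_lat, max_long = bounds

    def inside(lat, lon):
        return min_lat <= lat <= max_lat and min_long <= lon <= max_long

    return [s for s in segments if not (inside(s[0], s[1]) and inside(s[2], s[3]))]


box = bounding_box(segments)
strays = segments_outside(segments, EXPECTED_BOUNDS)
print("Bounding box:", box)
print("Segments outside expected bounds:", len(strays))
```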

4. Document Your Methodology

Save your sampling parameters for reproducibility:

import json
from datetime import date

metadata = {
    "country": "Singapore",
    "region": "Central",
    "sample_size": len(sample),
    "strategy": "stratified",
    "road_types": ["primary", "secondary"],
    "seed": 42,
    # Record the actual creation date rather than a hardcoded one
    "date_created": date.today().isoformat()
}

with open("sample_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
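Recording the seed matters because a seeded draw can be repeated exactly from the same sampling frame. The following sketch demonstrates this with the standard library's random module on a stand-in frame of segment IDs. It does not show how geo_sampling uses its seed internally, and draw is a hypothetical helper.

```python
import random


def draw(frame, n, seed):
    """Draw n items without replacement using a dedicated seeded generator."""
    return random.Random(seed).sample(frame, n)


# Stand-in sampling frame: 1,000 segment IDs in a fixed order.
frame = list(range(1, 1001))

first = draw(frame, 10, seed=42)
second = draw(frame, 10, seed=42)

print("Same seed, same sample:", first == second)
```

Reproducibility also depends on the frame itself: if the underlying road data changes between runs, the same seed will not necessarily select the same segments, so it helps to keep the extracted roads file alongside the metadata.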

Troubleshooting

Empty results: Check that your region name matches exactly what’s in GADM. Use geo-sampling info to verify.
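If you already have a list of valid region names, fuzzy matching can point out near misses such as a missing space. This is a small sketch using difflib from the standard library; the region list is a hypothetical stand-in, not actual GADM output, and suggest_region is not part of geo_sampling.

```python
import difflib


def suggest_region(name, known_regions, cutoff=0.6):
    """Return up to three known region names that closely match the given name."""
    # Compare case-insensitively, then map back to the original spelling.
    lowered = {region.lower(): region for region in known_regions}
    matches = difflib.get_close_matches(name.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[match] for match in matches]


# Hypothetical list of valid names, for illustration only.
known_regions = ["Bangkok Metropolis", "Chiang Mai", "Chiang Rai", "Phuket"]

suggestions = suggest_region("Chiangmai", known_regions)
print("Did you mean:", suggestions)
```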

Too many segments: Use road type filtering or smaller administrative areas to reduce the sampling frame.

Plotting issues: Install matplotlib if you get visualization errors: pip install matplotlib

Next Steps

Ready for more advanced usage? Check out: