Migration Guide: v0.x to v1.0

This guide helps you migrate from allocator v0.x to the completely redesigned v1.0.

🚨 Breaking Changes Overview

Allocator v1.0 is a complete rewrite with no backward compatibility. The changes provide:

Modern Python API design
Better performance and reliability
Cleaner, more maintainable codebase
Rich structured results with metadata

📊 Data Format Changes

Column Names (BREAKING)

v0.x (Old):

# Accepted various column names
data = pd.DataFrame({
    'start_long': [101.0, 101.1],   # or 'lon', 'lng'
    'start_lat': [13.0, 13.1],      # or 'lat'
    'end_long': [101.2, 101.3],
    'end_lat': [13.2, 13.3]
})

v1.0 (New):

# Only accepts standard column names
data = pd.DataFrame({
    'longitude': [101.0, 101.1],    # REQUIRED
    'latitude': [13.0, 13.1],       # REQUIRED
    'location_id': ['A', 'B']       # Optional, other columns preserved
})

Migration:

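# Rename columns in existing data. Keep only the entries that match your
# columns: mapping two existing columns to the same name creates duplicates.
data = data.rename(columns={
    'start_long': 'longitude',
    'start_lat': 'latitude',
    'lon': 'longitude',
    'lat': 'latitude'
})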

🔄 API Changes

Clustering

v0.x (Old):

from allocator.cluster_kmeans import main
result = main(data, n_clusters=3, distance_method='euclidean')

v1.0 (New):

import allocator
result = allocator.cluster(data, n_clusters=3, method='kmeans', distance='euclidean')
# or
result = allocator.kmeans(data, n_clusters=3, distance='euclidean')

Routing/TSP

v0.x (Old):

from allocator.shortest_path_ortools import main
result = main(data, distance_method='euclidean')

v1.0 (New):

import allocator
result = allocator.shortest_path(data, method='ortools', distance='euclidean')
# or  
result = allocator.tsp_ortools(data, distance='euclidean')

Distance Assignment

v0.x (Old):

from allocator.sort_by_distance import main
result = main(points, workers, by_worker=False)

v1.0 (New):

import allocator
result = allocator.assign_to_closest(points, workers, distance='euclidean')
# or for sorting
result = allocator.sort_by_distance(points, workers, distance='euclidean')

🖥️ CLI Changes

Command Structure

v0.x (Old):

# Separate scripts for each function
python -m allocator.cluster_kmeans data.csv -n 3 --plot
python -m allocator.shortest_path_ortools data.csv -d euclidean
python -m allocator.sort_by_distance points.csv workers.csv

v1.0 (New):

# Unified CLI with subcommands
allocator cluster data.csv --clusters 3 --method kmeans
allocator route data.csv --method ortools --distance euclidean  
allocator assign points.csv workers.csv --distance euclidean

CLI Options Mapping

v0.x	v1.0	Description
`-n`, `--n_clusters`	`--clusters`	Number of clusters
`-d`, `--distance_method`	`--distance`	Distance metric
`--plot`	(removed)	Use Python API for plotting
`-o`, `--output`	`--output`	Output file path
(none)	`--format`	Output format (csv, json)
(none)	`--verbose`	Verbose output
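
The mapping above can be applied mechanically to existing automation. The following is a minimal sketch, not part of allocator: the helper name `translate_command` is hypothetical, and it assumes the v1.0 CLI accepts options in any order after the subcommand. It covers only the modules and options named in this guide.

```python
import shlex

# Illustrative translator for the option mapping above; not part of allocator.
MODULE_TO_SUBCOMMAND = {
    "allocator.cluster_kmeans": ["cluster", "--method", "kmeans"],
    "allocator.shortest_path_ortools": ["route", "--method", "ortools"],
    "allocator.sort_by_distance": ["assign"],
}
OPTION_RENAMES = {
    "-n": "--clusters",
    "--n_clusters": "--clusters",
    "-d": "--distance",
    "--distance_method": "--distance",
    "-o": "--output",
}
REMOVED_FLAGS = {"--plot"}


def translate_command(old_argv):
    """Translate ['python', '-m', '<module>', ...] into a v1.0 argument list."""
    if len(old_argv) < 3 or old_argv[:2] != ["python", "-m"]:
        raise ValueError("expected a 'python -m <module> ...' invocation")
    module, rest = old_argv[2], old_argv[3:]
    if module not in MODULE_TO_SUBCOMMAND:
        raise ValueError(f"unknown v0.x module: {module}")
    subcommand, *defaults = MODULE_TO_SUBCOMMAND[module]
    new_args = []
    for arg in rest:
        if arg in REMOVED_FLAGS:
            continue  # --plot has no CLI equivalent in v1.0
        new_args.append(OPTION_RENAMES.get(arg, arg))
    return ["allocator", subcommand, *new_args, *defaults]


old = "python -m allocator.cluster_kmeans data.csv -n 3 --plot"
print(shlex.join(translate_command(shlex.split(old))))
# prints: allocator cluster data.csv --clusters 3 --method kmeans
```

Review the translated commands before running them, since the sketch renames tokens purely by position-independent lookup and does not validate option values.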

CLI Examples

Clustering:

# Old
python -m allocator.cluster_kmeans locations.csv -n 5 -d haversine --plot

# New  
allocator cluster locations.csv --clusters 5 --distance haversine --output clusters.csv

Routing:

# Old
python -m allocator.shortest_path_ortools points.csv -d euclidean -o route.csv

# New
allocator route points.csv --method ortools --distance euclidean --output route.csv

Assignment:

# Old
python -m allocator.sort_by_distance points.csv workers.csv -o assignments.csv

# New  
allocator assign points.csv workers.csv --output assignments.csv

📦 Result Objects

v0.x Return Values

Old results were inconsistent:

# Different return types for different functions:
#   kmeans            -> dict with various keys
#   TSP               -> tuple of (distance, route)
#   sort_by_distance  -> raw pandas DataFrame

v1.0 Structured Results

New consistent result objects:

# ClusterResult
cluster_result = allocator.cluster(data, n_clusters=3)
print(cluster_result.labels)        # np.ndarray
print(cluster_result.centroids)     # np.ndarray  
print(cluster_result.converged)     # bool
print(cluster_result.inertia)       # float
print(cluster_result.data)          # pd.DataFrame with cluster column
print(cluster_result.metadata)      # dict with algorithm info

# RouteResult  
route_result = allocator.shortest_path(data)
print(route_result.route)           # list[int] - visiting order
print(route_result.total_distance)  # float
print(route_result.data)            # pd.DataFrame with route_order column
print(route_result.metadata)        # dict with algorithm info

# SortResult
sort_result = allocator.assign_to_closest(points, workers)
print(sort_result.data)             # pd.DataFrame with assignments
print(sort_result.distance_matrix)  # np.ndarray (if available)
print(sort_result.metadata)         # dict with algorithm info

🔧 Installation Changes

Dependencies

v0.x:

pip install "allocator==0.2.*"
# Dependencies automatically included

v1.0:

pip install allocator
# Core functionality included

# Optional algorithms:
pip install ortools          # For OR-Tools TSP
pip install googlemaps       # For Google Maps API  
pip install kahipwrapper     # For KaHIP clustering
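
Because the optional algorithms are no longer installed automatically, it can help to check which ones are available before calling them. A minimal sketch using only the standard library; it assumes each import name matches the pip package name shown above, which may not hold for every package, and `missing_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumption: import names equal the pip package names listed above.
OPTIONAL_PACKAGES = ["ortools", "googlemaps", "kahipwrapper"]


def missing_packages(names):
    """Return the names that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(OPTIONAL_PACKAGES)
if missing:
    print("Optional packages not installed:", ", ".join(missing))
```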

Python Version

v0.x: Python 2.7+ / 3.6+
v1.0: Python 3.11+ (modern Python required)
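
A quick way to confirm an environment meets the new minimum before upgrading is sketched below. The function name `supports_v1` is hypothetical; only the 3.11 threshold comes from this guide.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum stated for v1.0 in this guide


def supports_v1(version_info=sys.version_info):
    """Return True when the interpreter meets the v1.0 minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not supports_v1():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} "
        "is older than 3.11; upgrade before installing allocator v1.0"
    )
```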

📈 Performance Improvements

v1.0 provides significant performance improvements:

3x faster clustering with optimized algorithms
Reduced memory usage for large datasets
Vectorized operations with NumPy/pandas
Better error handling and progress reporting
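
Speedups depend on data size and algorithm, so it is worth measuring on your own workload. The sketch below is a generic timing helper, not part of allocator; `best_time` is a hypothetical name and the `sum` call is only a stand-in for a real v0.x or v1.0 call.

```python
import time


def best_time(func, *args, repeat=3, **kwargs):
    """Call func `repeat` times; return (fastest wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload; replace with your own clustering or routing call.
seconds, total = best_time(sum, range(100_000))
print(f"best of 3 runs: {seconds:.6f}s")
```

Taking the fastest of several runs reduces noise from other processes; run the old and new versions in separate environments, since they cannot be installed side by side.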

🛠️ Step-by-Step Migration

1. Update Installation

# Uninstall old version
pip uninstall allocator

# Install new version  
pip install allocator

# Install optional dependencies as needed
pip install ortools  # For TSP solving

2. Update Data Preparation

# Create a migration function
def migrate_data(old_data):
    """Convert v0.x data format to v1.0"""
    new_data = old_data.copy()
    
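    # Legacy names for each v1.0 column, in priority order
    legacy_names = {
        'longitude': ['start_long', 'lon', 'lng', 'end_long'],
        'latitude': ['start_lat', 'lat', 'end_lat'],
    }

    # Rename only the first legacy column found for each target.
    # Renaming several columns to the same name (for example both
    # 'start_long' and 'end_long') would create duplicate columns.
    for new_col, old_cols in legacy_names.items():
        if new_col in new_data.columns:
            continue  # already in v1.0 form
        for old_col in old_cols:
            if old_col in new_data.columns:
                new_data = new_data.rename(columns={old_col: new_col})
                break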
    
    # Validate required columns
    required = ['longitude', 'latitude']
    missing = [col for col in required if col not in new_data.columns]
    if missing:
        raise ValueError(f"Missing required columns after migration: {missing}")
    
    return new_data

# Use the migration function
migrated_data = migrate_data(your_old_data)

3. Update API Calls

# Replace old imports and calls
# OLD:
# from allocator.cluster_kmeans import main as cluster_kmeans
# result = cluster_kmeans(data, n_clusters=3)

# NEW:
import allocator
result = allocator.cluster(data, n_clusters=3, method='kmeans')

4. Update Result Handling

# OLD: 
# result was a dictionary, tuple, or DataFrame

# NEW: Use structured result objects
result = allocator.cluster(data, n_clusters=3)

# Access structured data
labels = result.labels
centroids = result.centroids  
clustered_data = result.data
algorithm_info = result.metadata

5. Update CLI Scripts

# Replace old CLI calls in scripts/automation
# OLD: python -m allocator.cluster_kmeans data.csv -n 3
# NEW: allocator cluster data.csv --clusters 3

🔍 Common Migration Issues

Issue 1: Column Name Errors

Error:

ValueError: Missing required columns: ['longitude', 'latitude']

Solution:

# Check current column names
print(data.columns.tolist())

# Rename as needed
data = data.rename(columns={'start_long': 'longitude', 'start_lat': 'latitude'})

Issue 2: Import Errors

Error:

ModuleNotFoundError: No module named 'allocator.cluster_kmeans'

Solution:

# OLD import
# from allocator.cluster_kmeans import main

# NEW import  
import allocator
result = allocator.cluster(data, n_clusters=3)
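
In a larger codebase it can be quicker to locate every old import up front than to fix them one error at a time. A minimal sketch, assuming the only v0.x modules in use are the three named in this guide; `find_old_imports` is a hypothetical helper, and the suggestions simply repeat the replacements listed under API Changes.

```python
import re

# v0.x modules named in this guide and the v1.0 calls that replace them
REPLACEMENTS = {
    "allocator.cluster_kmeans": "allocator.cluster(..., method='kmeans') or allocator.kmeans(...)",
    "allocator.shortest_path_ortools": "allocator.shortest_path(..., method='ortools') or allocator.tsp_ortools(...)",
    "allocator.sort_by_distance": "allocator.assign_to_closest(...) or allocator.sort_by_distance(...)",
}
OLD_IMPORT = re.compile(
    r"^\s*(?:from|import)\s+"
    r"(allocator\.(?:cluster_kmeans|shortest_path_ortools|sort_by_distance))\b"
)


def find_old_imports(source):
    """Return (line_number, old_module, suggestion) for each v0.x import line."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = OLD_IMPORT.match(line)
        if match:
            module = match.group(1)
            hits.append((number, module, REPLACEMENTS[module]))
    return hits


sample = "import pandas as pd\nfrom allocator.cluster_kmeans import main\n"
for number, module, suggestion in find_old_imports(sample):
    print(f"line {number}: {module} -> {suggestion}")
```

Commented-out imports are skipped because the pattern only matches lines that begin with `from` or `import`.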

Issue 3: Result Access

Error:

KeyError: 'labels'  # or TypeError: 'ClusterResult' object is not subscriptable

Solution:

# OLD: result was dict
# labels = result['labels']

# NEW: result is structured object
labels = result.labels
data_with_clusters = result.data
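
While migrating one function at a time, some code paths may still receive v0.x dictionaries while others receive v1.0 result objects. A small accessor can bridge both until the migration is complete. This is a sketch only: `get_field` is a hypothetical helper, and the `SimpleNamespace` below is a stand-in for a real result object.

```python
from types import SimpleNamespace


def get_field(result, name):
    """Read a field from a v0.x dict result or a v1.0 result object."""
    if isinstance(result, dict):
        return result[name]
    return getattr(result, name)


# Stand-ins for demonstration only; real results come from allocator.
old_style = {"labels": [0, 1, 1]}
new_style = SimpleNamespace(labels=[0, 1, 1])
assert get_field(old_style, "labels") == get_field(new_style, "labels")
```

Once every caller receives v1.0 results, plain attribute access is clearer and the helper can be removed.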

📚 Additional Resources

API Examples: docs/API_EXAMPLES.md
Full Documentation: https://geosensing.github.io/allocator/
GitHub Issues: Report migration problems at https://github.com/geosensing/allocator/issues

💡 Migration Tips

Start with data format - Fix column names first
Test incrementally - Migrate one function at a time
Use the Python API - More flexible than CLI for complex workflows
Leverage new features - Rich metadata and structured results
Check performance - v1.0 should be faster for most use cases

Need help? Open an issue on GitHub with your specific migration challenge!