Clustering API

Geographic point clustering for task allocation and zone creation.

Main Functions

allocator.cluster(data: str | DataFrame | ndarray | list, n_clusters: int = 3, method: str = 'kmeans', distance: str = 'euclidean', random_state: int | None = None, **kwargs) -> ClusterResult

Cluster geographic data points.

Parameters:
  • data – Input data (file path, DataFrame, numpy array, or list)

  • n_clusters – Number of clusters to create

  • method – Clustering method (‘kmeans’)

  • distance – Distance metric (‘euclidean’, ‘haversine’, ‘osrm’, ‘google’)

  • random_state – Random seed for reproducibility

  • **kwargs – Additional arguments for specific methods

Returns:

ClusterResult with labels, centroids, and metadata

Example

>>> from allocator import cluster
>>> result = cluster('data.csv', n_clusters=5, method='kmeans')
>>> print(result.labels)  # Cluster assignments
>>> print(result.centroids)  # Cluster centers
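The data parameter accepts a file path, DataFrame, numpy array, or list. The hypothetical helper to_lonlat_array below is a minimal sketch of how such inputs might be normalized into an (n, 2) array of longitude/latitude pairs. It assumes that files are CSVs with longitude and latitude header columns and that longitude comes first. allocator's actual loading logic may differ.

```python
import numpy as np


def to_lonlat_array(data):
    """Hypothetical helper: normalize supported inputs to an (n, 2) float array.

    Assumes longitude comes before latitude, matching the column order
    used in the usage example on this page.
    """
    if isinstance(data, str):
        # Assumed: a CSV file with 'longitude' and 'latitude' header columns
        table = np.genfromtxt(data, delimiter=",", names=True)
        return np.column_stack([table["longitude"], table["latitude"]])
    if hasattr(data, "columns"):
        # pandas DataFrame (duck-typed so pandas is not a hard dependency)
        return data[["longitude", "latitude"]].to_numpy(dtype=float)
    # numpy array or list of (longitude, latitude) pairs
    arr = np.asarray(data, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError(f"expected shape (n, 2), got {arr.shape}")
    return arr


points = to_lonlat_array([[100.5, 13.7], [100.6, 13.8]])
print(points.shape)  # (2, 2)
```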

Core Clustering Module

Modern clustering API for the allocator package.

allocator.api.cluster.cluster(data: str | DataFrame | ndarray | list, n_clusters: int = 3, method: str = 'kmeans', distance: str = 'euclidean', random_state: int | None = None, **kwargs) -> ClusterResult

Cluster geographic data points. This entry has the same parameters, return value, and example as allocator.cluster above.

allocator.api.cluster.kmeans(data: DataFrame | ndarray | list, n_clusters: int = 3, distance: str = 'euclidean', max_iter: int = 300, random_state: int | None = None, **kwargs) -> ClusterResult

K-means clustering of geographic data.

Parameters:
  • data – Input data (DataFrame, numpy array, or list)

  • n_clusters – Number of clusters

  • distance – Distance metric (‘euclidean’, ‘haversine’, ‘osrm’, ‘google’)

  • max_iter – Maximum number of iterations

  • random_state – Random seed for reproducibility

  • **kwargs – Additional distance-specific arguments

Returns:

ClusterResult with labels, centroids, and metadata
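The roles of max_iter and random_state are easiest to see in plain Lloyd's k-means. The sketch below is a NumPy illustration that uses Euclidean distance only. kmeans_sketch and _assign are hypothetical names, and this is not allocator's implementation. In particular, it does not show how the haversine, osrm, or google metrics are plugged in.

```python
import numpy as np


def _assign(points, centroids):
    """Index of the nearest centroid (Euclidean) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans_sketch(points, n_clusters=3, max_iter=300, random_state=None):
    """Plain Lloyd's k-means with Euclidean distance (illustrative only)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(random_state)
    # Initialise centroids with n_clusters distinct input points
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centroids = points[start].copy()
    for _ in range(max_iter):
        labels = _assign(points, centroids)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid
        new_centroids = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break  # centroids stopped moving before max_iter was reached
    # Final assignment so the labels match the returned centroids
    return _assign(points, centroids), centroids


pts = [[100.5, 13.7], [100.6, 13.8], [105.0, 18.0], [105.1, 18.1]]
labels, centroids = kmeans_sketch(pts, n_clusters=2, random_state=0)
print(labels, centroids)
```

A fixed random_state makes the random choice of starting centroids repeatable, so the result is repeatable too.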

Usage Examples

Basic K-means Clustering

import pandas as pd
import allocator

# Geographic data (longitude/latitude in decimal degrees)
data = pd.DataFrame({
    'longitude': [100.5, 100.6, 100.7, 100.8],
    'latitude': [13.7, 13.8, 13.9, 14.0],
})

# Create 2 clusters using haversine distance
result = allocator.cluster(
    data=data,
    n_clusters=2,
    distance='haversine',
    method='kmeans',
)

# Attach the cluster assignments to the original points
data['cluster'] = result.labels
print(f"Created {len(result.centroids)} clusters")
print(data[['longitude', 'latitude', 'cluster']])
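Once the labels sit in a cluster column, pandas can summarize each cluster. In this sketch the labels are hard-coded for illustration rather than produced by allocator. The mean longitude/latitude is only a rough center, which is reasonable for small areas.

```python
import pandas as pd

# Same points as the usage example; the cluster labels are hard-coded
# here for illustration rather than produced by allocator
data = pd.DataFrame({
    'longitude': [100.5, 100.6, 100.7, 100.8],
    'latitude': [13.7, 13.8, 13.9, 14.0],
    'cluster': [0, 0, 1, 1],
})

# Points per cluster and mean position of each cluster
summary = data.groupby('cluster').agg(
    n_points=('longitude', 'count'),
    mean_longitude=('longitude', 'mean'),
    mean_latitude=('latitude', 'mean'),
)
print(summary)
```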

Distance Methods

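  • euclidean: Straight-line distance in raw coordinate units; fast, but distorted for longitude/latitude data because a degree of longitude shrinks toward the poles

  • haversine: Great-circle distance on a spherical Earth; accurate for longitude/latitude data

  • osrm: Road-network distances from an OSRM (Open Source Routing Machine) server

  • google: Distances from the Google Maps API

  • custom: User-defined distance functions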
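The gap between euclidean and haversine arises because a degree of longitude shrinks with latitude. Below is the standard haversine formula in NumPy, offered as a sketch. The 6371 km mean Earth radius and the function names are assumptions, and allocator's own implementation may differ. The osrm and google methods depend on a routing service and are not sketched.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clip guards against tiny floating-point overshoot above 1
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def euclidean_deg(lon1, lat1, lon2, lat2):
    """Straight-line distance in raw degree units (no unit conversion)."""
    return np.hypot(lon2 - lon1, lat2 - lat1)


print(haversine_km(0, 60, 1, 60), euclidean_deg(0, 60, 1, 60))  # lon step
print(haversine_km(0, 60, 0, 61), euclidean_deg(0, 60, 0, 61))  # lat step
```

At 60° latitude, one degree of longitude spans roughly 55.6 km while one degree of latitude spans about 111.2 km, yet euclidean_deg reports 1.0 for both.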

Algorithm Options

Select the algorithm with the method parameter:

  • kmeans: K-means clustering (default, balanced clusters)

  • custom: Custom sklearn-compatible algorithms
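How allocator accepts a custom algorithm is not documented here. Assuming it follows the usual scikit-learn clustering conventions (fit returns self and stores labels_, fit_predict returns the labels), a minimal custom clusterer might look like the hypothetical LongitudeBands class below. It splits points into equal-size bands by longitude.

```python
import numpy as np


class LongitudeBands:
    """Hypothetical custom clusterer following scikit-learn conventions:
    fit() stores labels_ and returns self; fit_predict() returns the labels."""

    def __init__(self, n_clusters=3):
        self.n_clusters = n_clusters

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Sort points by longitude, then split them into n_clusters
        # bands holding (almost) the same number of points
        order = np.argsort(X[:, 0], kind="stable")
        labels = np.empty(len(X), dtype=int)
        for k, chunk in enumerate(np.array_split(order, self.n_clusters)):
            labels[chunk] = k
        self.labels_ = labels
        return self

    def fit_predict(self, X, y=None):
        return self.fit(X).labels_


pts = [[100.8, 14.0], [100.5, 13.7], [100.7, 13.9], [100.6, 13.8]]
print(LongitudeBands(n_clusters=2).fit_predict(pts))  # [1 0 1 0]
```

Subclassing sklearn.base.BaseEstimator and ClusterMixin would add get_params/set_params and a default fit_predict.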

Return Format

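The cluster function returns a ClusterResult object. Its documented contents are:

  • labels: Cluster assignment for each input point

  • centroids: Cluster centers

  • metadata: Additional clustering information

It may also expose the input data with a cluster column, the number of clusters created, and the method and distance used. Consult the ClusterResult class for the exact attribute names.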