Clustering API
Geographic point clustering for task allocation and zone creation.
Main Functions
- allocator.cluster(data: str | DataFrame | ndarray | list, n_clusters: int = 3, method: str = 'kmeans', distance: str = 'euclidean', random_state: int | None = None, **kwargs) → ClusterResult
Cluster geographic data points.
- Parameters:
data – Input data (file path, DataFrame, numpy array, or list)
n_clusters – Number of clusters to create
method – Clustering method (‘kmeans’)
distance – Distance metric (‘euclidean’, ‘haversine’, ‘osrm’, ‘google’)
random_state – Random seed for reproducibility
**kwargs – Additional arguments for specific methods
- Returns:
ClusterResult with labels, centroids, and metadata
Example
>>> from allocator import cluster
>>> result = cluster('data.csv', n_clusters=5, method='kmeans')
>>> print(result.labels)     # Cluster assignments
>>> print(result.centroids)  # Cluster centers
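The data parameter accepts a file path, DataFrame, numpy array, or list. The sketch below shows one plausible way to normalize those inputs into an (n, 2) array of [longitude, latitude] rows before clustering. The helper to_coordinates is hypothetical and not part of allocator; it assumes file paths point to CSV files and that coordinates live in longitude/latitude columns, as in the usage example further down.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def to_coordinates(data, lon_col="longitude", lat_col="latitude"):
    """Return an (n, 2) float array of [longitude, latitude] rows.

    Hypothetical helper (not an allocator function) covering the same
    input kinds that cluster() documents.
    """
    if isinstance(data, (str, Path)):
        # Assumption: paths refer to CSV files with lon/lat columns
        data = pd.read_csv(data)
    if isinstance(data, pd.DataFrame):
        # Keep only the coordinate columns, longitude first
        data = data[[lon_col, lat_col]].to_numpy()
    coords = np.asarray(data, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 2:
        raise ValueError(f"expected shape (n, 2), got {coords.shape}")
    return coords


# The list and DataFrame forms yield the same array
points = [[100.5, 13.7], [100.6, 13.8]]
frame = pd.DataFrame(points, columns=["longitude", "latitude"])
print(np.array_equal(to_coordinates(points), to_coordinates(frame)))  # True
```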
Core Clustering Module
Modern clustering API for the allocator package.
- allocator.api.cluster.cluster(data: str | DataFrame | ndarray | list, n_clusters: int = 3, method: str = 'kmeans', distance: str = 'euclidean', random_state: int | None = None, **kwargs) → ClusterResult
Cluster geographic data points. Same parameters, return value, and example as allocator.cluster() above.
- allocator.api.cluster.kmeans(data: DataFrame | ndarray | list, n_clusters: int = 3, distance: str = 'euclidean', max_iter: int = 300, random_state: int | None = None, **kwargs) → ClusterResult
K-means clustering of geographic data.
- Parameters:
data – Input data as a DataFrame, numpy array, or list
n_clusters – Number of clusters
distance – Distance metric (‘euclidean’, ‘haversine’, ‘osrm’, ‘google’)
max_iter – Maximum iterations
random_state – Random seed for reproducibility
**kwargs – Additional distance-specific arguments
- Returns:
ClusterResult with clustering information
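To illustrate how n_clusters, max_iter, and random_state interact in k-means, here is a minimal sketch of Lloyd's algorithm using Euclidean distance. The name kmeans_sketch is hypothetical; this is not allocator's implementation, and it does not cover the haversine, osrm, or google metrics.

```python
import numpy as np


def kmeans_sketch(points, n_clusters=3, max_iter=300, random_state=None):
    """Lloyd's k-means on (n, 2) coordinates with Euclidean distance.

    Returns (labels, centroids). Illustrative only, not allocator's code.
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(random_state)
    # Initialise centroids with n_clusters distinct input points
    centroids = points[rng.choice(len(points), size=n_clusters, replace=False)]
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of points
pts = [[100.5, 13.7], [100.51, 13.71], [101.5, 14.7], [101.51, 14.71]]
labels, centroids = kmeans_sketch(pts, n_clusters=2, random_state=0)
print(labels)
```

Fixing random_state fixes the initial centroid choice, which is why the same seed reproduces the same assignments.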
Usage Examples
Basic K-means Clustering
import pandas as pd
import allocator

# Geographic data in decimal degrees
data = pd.DataFrame({
    'longitude': [100.5, 100.6, 100.7, 100.8],
    'latitude': [13.7, 13.8, 13.9, 14.0],
})

# Create 2 clusters using haversine distance
result = allocator.cluster(
    data=data,
    n_clusters=2,
    method='kmeans',
    distance='haversine',
    random_state=42,  # fixed seed for reproducible assignments
)

# Attach each point's cluster assignment and show the result
data['cluster'] = result.labels
print(f"Created {len(result.centroids)} clusters")
print(data[['longitude', 'latitude', 'cluster']])
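Distance Methods
euclidean: Fast straight-line distance on raw coordinates; ignores the Earth's curvature
haversine: Great-circle distance on a spherical Earth; accurate for latitude/longitude data
osrm: Road-network travel distance from an OSRM (Open Source Routing Machine) service
google: Travel distance from the Google Maps API
custom: User-defined distance functions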
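The choice between euclidean and haversine matters because a degree of longitude covers less ground away from the equator. A minimal haversine sketch, assuming a spherical Earth with a mean radius of 6371 km; haversine_km is a hypothetical helper, not an allocator function.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Both pairs are 1.0 apart in raw degrees, but not on the ground
print(round(float(haversine_km(0, 0, 1, 0)), 1))    # 111.2 (equator)
print(round(float(haversine_km(0, 60, 1, 60)), 1))  # 55.6 (60 degrees N)
```

Because the function uses numpy operations, it also accepts arrays of coordinates and returns element-wise distances.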
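Algorithm Options
kmeans: K-means clustering (default, selected with method='kmeans'); groups points around n_clusters centroids. Standard k-means does not constrain cluster sizes, so clusters are not guaranteed to be balanced.
custom: Custom sklearn-compatible algorithms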
Return Format
The cluster function returns a ClusterResult (see Returns above) with labels, centroids, and metadata. The clustering information it carries includes:
data: DataFrame with the original data plus a cluster column
n_clusters: Number of clusters created
algorithm: Algorithm used
distance: Distance method used
metadata: Additional clustering information
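Once each point carries a cluster column, per-cluster summaries follow from a pandas groupby. The frame below uses illustrative values rather than real allocator output and assumes the longitude, latitude, and cluster column names described above.

```python
import pandas as pd

# Illustrative clustered data: original columns plus a "cluster" column
clustered = pd.DataFrame({
    "longitude": [100.5, 100.6, 100.7, 100.8],
    "latitude": [13.7, 13.8, 13.9, 14.0],
    "cluster": [0, 0, 1, 1],
})

# Number of points and mean position for each cluster
summary = clustered.groupby("cluster").agg(
    n_points=("longitude", "size"),
    mean_longitude=("longitude", "mean"),
    mean_latitude=("latitude", "mean"),
)
print(summary)
```

Counting points per cluster this way is a quick check on how evenly the work is split across zones.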