Cluster Locations Using K-Means
===============================

The function takes a CSV of data collection locations and clusters the locations into ``n`` groups, where ``n`` is specified by the user. ``n`` can be ``n_workers``, ``n_workers*n_days``, etc. The function uses k-means to cluster the locations and defaults to the Euclidean distance matrix.

**Input:**

- ``n``, the number of clusters.
- A CSV file containing the lat/longs of the remaining points. For instance, the output of ``geo_sampling`` with a few adjustments can work. The ``geo_sampling`` script produces a list of segments with a start and an end lat/long for each segment. Pick the start or end lat/long and rename the columns so that there are 3 columns named ``id``, ``lat``, ``long``.
- Distance function: ``-d euclidean``, ``-d haversine`` or ``-d osrm``. *Default* is ``-d euclidean``.
- Name of the output file. Default is ``cluster-kmeans-output.csv``.

**Output:**

- Appends a new column ``assigned_points`` that gives the cluster assignment for each row of the lat/long file. The column takes integer values: ``1, 2, 3, ..., n``.

**Usage:**

::

    usage: cluster_kmeans.py [-h] -n N_WORKERS [-m MAX_ITER]
                             [-d {euclidean,haversine,osrm}] [-c CENTROIDS]
                             [-o OUTPUT] [-r RANDOM_STATE] [--plot]
                             [--osrm-base-url OSRM_BASE_URL]
                             [--osrm-max-table-size OSRM_MAX_TABLE_SIZE]
                             input

    Random allocator based on K-Means clustering

    positional arguments:
      input                 Road segments input file

    optional arguments:
      -h, --help            show this help message and exit
      -n N_WORKERS, --n_workers N_WORKERS
                            Number of workers
      -m MAX_ITER, --max_iter MAX_ITER
                            Maximum number of iterations
      -d {euclidean,haversine,osrm}, --distance-func {euclidean,haversine,osrm}
                            Distance function for distance matrix
      -c CENTROIDS, --centroids CENTROIDS
                            Output file name of K-Means centroids
      -o OUTPUT, --output OUTPUT
                            Output file name
      -r RANDOM_STATE, --random-state RANDOM_STATE
                            Random state
      --plot                Plot the output
      --osrm-base-url OSRM_BASE_URL
                            Custom OSRM service URL
      --osrm-max-table-size OSRM_MAX_TABLE_SIZE
                            Maximum OSRM table size

**Examples:**

::

    python -m allocator.cluster_kmeans -n 10 allocator/examples/chonburi-roads-1k.csv --plot

The output file is saved as :download:`cluster-kmeans-output.csv <../../allocator/examples/kmeans/cluster-kmeans-output.csv>` unless a different name is specified with ``-o/--output``.

The k-means centroids are saved as :download:`cluster-kmeans-centroids-output.csv <../../allocator/examples/kmeans/cluster-kmeans-centroids-output.csv>` unless another name is specified with ``-c/--centroids``.

To see the plot, specify ``--plot``.

.. image:: ../../allocator/examples/kmeans/cluster-kmeans-output.png
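
For readers who want to see what the clustering step amounts to, the following is a minimal pure-Python sketch of k-means (Lloyd's algorithm) using Euclidean distance on raw lat/long pairs. It is an illustration only and not the code in ``allocator.cluster_kmeans``: the function name ``kmeans``, its parameters, and the sample coordinates are assumptions made for this example, and the ``haversine`` and ``osrm`` distance options are not covered.

```python
import random


def kmeans(points, n_clusters, max_iter=300, random_state=0):
    """Hypothetical helper: cluster (lat, long) tuples with Lloyd's algorithm.

    Uses squared Euclidean distance. Returns (labels, centroids), where
    labels are integers in 1..n_clusters, one per input point.
    """
    rng = random.Random(random_state)
    # Initialise centroids with n_clusters input points chosen at random.
    centroids = rng.sample(points, n_clusters)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(
                range(n_clusters),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    # Shift labels from 0-based to 1-based to match ``assigned_points``.
    return [lab + 1 for lab in labels], centroids


# Made-up rows in the expected id, lat, long layout.
rows = [
    {"id": "a", "lat": 13.00, "long": 100.00},
    {"id": "b", "lat": 13.01, "long": 100.01},
    {"id": "c", "lat": 14.00, "long": 101.00},
    {"id": "d", "lat": 14.01, "long": 101.01},
]
labels, centroids = kmeans([(r["lat"], r["long"]) for r in rows], n_clusters=2)
for row, label in zip(rows, labels):
    row["assigned_points"] = label  # new column, values in 1..n
```

Treating degrees of latitude and longitude as planar coordinates is only a reasonable approximation over small areas; for larger extents the ``-d haversine`` or ``-d osrm`` options of the script are the more appropriate choice.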