Contributing to Allocator

Thank you for your interest in contributing to allocator! This guide will help you get started.

Development Setup

Prerequisites

  • Python 3.11 or higher

  • Git

  • UV (recommended) or pip

Installation

  1. Clone the repository

git clone https://github.com/geosensing/allocator.git
cd allocator

  2. Install dependencies

# With UV (recommended)
uv sync --all-extras

# Or with pip
pip install -e ".[complete]"

  3. Set up pre-commit hooks

pre-commit install

Code Standards

Linting and Formatting

We use modern Python tooling:

# Linting and formatting with ruff
uv run ruff check .
uv run ruff format .

# Type checking with mypy
uv run mypy allocator/

# Traditional tools (if preferred)
uv run black .
uv run isort .

Testing

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=allocator --cov-report=html

# Run specific test files
uv run pytest tests/api/test_cluster_api.py

Documentation

# Build documentation locally
cd docs/
make html

# View documentation (macOS; on Linux, use xdg-open instead of open)
open build/html/index.html

Contribution Workflow

  1. Fork the repository on GitHub

  2. Create a feature branch from main

git checkout -b feature/your-feature-name

  3. Make your changes following our code standards

  4. Add tests for new functionality

  5. Update documentation as needed

  6. Run the test suite, linter, and type checker

uv run pytest
uv run ruff check .
uv run mypy allocator/

  7. Commit with clear messages

git commit -m "Add clustering algorithm validation

- Add parameter validation for n_clusters
- Add tests for edge cases
- Update documentation examples"

  8. Push your branch and open a Pull Request

Code Style Guidelines

Python Code

  • Follow PEP 8 style guidelines

  • Use type hints for all function signatures

  • Write docstrings in Google or NumPy style (the example below uses Google style)

  • Maximum line length: 100 characters

  • Use meaningful variable and function names

Example Function

# Lazy annotations: ClusterResult (the project's result type) is not evaluated at import time.
from __future__ import annotations

import pandas as pd


def cluster_locations(
    data: pd.DataFrame,
    n_clusters: int,
    distance: str = "haversine",
    random_state: int | None = None,
) -> ClusterResult:
    """Cluster geographic locations into balanced groups.

    Args:
        data: DataFrame with longitude/latitude columns
        n_clusters: Number of clusters to create
        distance: Distance calculation method
        random_state: Random seed for reproducible results

    Returns:
        ClusterResult object with cluster assignments and metadata

    Raises:
        ValueError: If n_clusters is invalid
        KeyError: If required columns are missing
    """

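The Raises section of the docstring above implies some input validation before clustering starts. The following is a minimal sketch of what such a check could look like, not the project's actual implementation: `validate_cluster_inputs` and `REQUIRED_COLUMNS` are hypothetical names, and the rule that n_clusters may not exceed the number of rows is an assumption. It takes column names and a row count rather than a DataFrame so that it runs without pandas.

```python
from collections.abc import Iterable

# Column names taken from the docstring above; the constant itself is hypothetical.
REQUIRED_COLUMNS = ("longitude", "latitude")


def validate_cluster_inputs(columns: Iterable[str], n_rows: int, n_clusters: int) -> None:
    """Raise early if the inputs cannot be clustered (hypothetical helper).

    Raises:
        KeyError: If a required column is missing.
        ValueError: If n_clusters is below 1 or larger than the number of rows.
    """
    available = set(columns)
    missing = [name for name in REQUIRED_COLUMNS if name not in available]
    if missing:
        raise KeyError(f"missing required column(s): {', '.join(missing)}")
    if n_clusters < 1:
        raise ValueError(f"n_clusters must be at least 1, got {n_clusters}")
    if n_clusters > n_rows:
        raise ValueError(f"n_clusters ({n_clusters}) exceeds the number of rows ({n_rows})")


# Valid input passes silently.
validate_cluster_inputs(["longitude", "latitude", "name"], n_rows=3, n_clusters=2)
```

With a DataFrame, a caller would pass `data.columns` and `len(data)`. Naming the missing column in the error message lets tests match on it, for example with `pytest.raises(KeyError, match="longitude")`.
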
Documentation

  • Write clear, concise documentation

  • Include practical examples for all public functions

  • Update relevant guides when adding features

  • Use Markdown for all new documentation

Testing Guidelines

Test Structure

tests/
├── api/              # API function tests
├── core/             # Core algorithm tests
├── distances/        # Distance calculation tests
├── integration/      # End-to-end tests
└── benchmarks/       # Performance tests

Writing Tests

import pytest
import pandas as pd
from allocator import cluster

def test_cluster_basic():
    """Test basic clustering functionality."""
    data = pd.DataFrame({
        'longitude': [100.5, 100.6, 100.7],
        'latitude': [13.7, 13.8, 13.9],
    })
    
    result = cluster(data, n_clusters=2)
    
    assert result['n_clusters'] == 2
    assert 'cluster' in result['data'].columns
    assert len(result['data']) == 3
    assert set(result['data']['cluster']) == {0, 1}

def test_cluster_invalid_input():
    """Test error handling for invalid input."""
    data = pd.DataFrame({'invalid': [1, 2, 3]})
    
    with pytest.raises(KeyError, match="longitude"):
        cluster(data, n_clusters=2)

Documentation Guidelines

API Documentation

  • All public functions must have complete docstrings

  • Include parameter types, return values, and examples

  • Document error conditions and edge cases
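
As an illustration of the three points above, here is a sketch of a docstring that states types, return value, an error condition, and a runnable example. The function `cluster_sizes` is hypothetical and is not part of allocator; it only serves to show the docstring layout.

```python
from collections import Counter
from collections.abc import Iterable


def cluster_sizes(labels: Iterable[int]) -> dict[int, int]:
    """Count how many points fall into each cluster.

    Args:
        labels: Cluster label for each point.

    Returns:
        Mapping from cluster label to number of points, ordered by label.

    Raises:
        ValueError: If labels is empty.

    Examples:
        >>> cluster_sizes([0, 1, 0, 1, 1])
        {0: 2, 1: 3}
        >>> cluster_sizes([])
        Traceback (most recent call last):
            ...
        ValueError: labels must not be empty
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    # Sort by label so the documented output is deterministic.
    return dict(sorted(counts.items()))
```

Because the examples use doctest syntax, they can be checked mechanically, for instance with `python -m doctest` on the file or with pytest's `--doctest-modules` option. Whether this project runs doctests in CI is not stated here.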

Examples

  • Provide realistic, practical examples

  • Use real-world data when possible

  • Include expected output

  • Show both basic and advanced usage

Guides

  • Write step-by-step tutorials for complex workflows

  • Include troubleshooting sections

  • Link related concepts and functions

Release Process

Version Numbering

We follow semantic versioning (SemVer):

  • Major (1.0.0): Breaking changes

  • Minor (1.1.0): New features, backward compatible

  • Patch (1.0.1): Bug fixes, backward compatible
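
The bump rules above can be sketched in a few lines. `bump_version` is a hypothetical helper, not a tool shipped with this project, and it assumes plain MAJOR.MINOR.PATCH strings without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return version with the given part incremented (hypothetical helper)."""
    # Unpacking fails with ValueError unless there are exactly three numeric fields.
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # New backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("0.9.3", "major"))  # 1.0.0
print(bump_version("1.0.0", "minor"))  # 1.1.0
print(bump_version("1.0.0", "patch"))  # 1.0.1
```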

Preparing a Release

  1. Update version in pyproject.toml

  2. Update CHANGELOG.md with release notes

  3. Ensure all tests pass and documentation is current

  4. Create release PR for review

  5. Tag release after merge: git tag v1.1.0

  6. Release to PyPI is automated via GitHub Actions

Getting Help

Areas for Contribution

We welcome contributions in these areas:

Core Features

  • New clustering algorithms

  • Additional routing methods

  • Performance optimizations

  • Better error handling

Documentation

  • More real-world examples

  • Tutorial improvements

  • API documentation enhancements

  • Translation support

Testing

  • Edge case coverage

  • Performance benchmarks

  • Integration tests

  • Property-based testing

Tooling

  • CLI improvements

  • Visualization enhancements

  • IDE integrations

  • CI/CD optimizations

Thank you for contributing to make allocator better for everyone!