Working with Geographic Data

This tutorial shows how to work with Canadian Census geographic boundaries and create maps using pycancensus.

Introduction

Census data becomes much more powerful when combined with geographic boundaries. pycancensus makes it easy to:

  • Retrieve census data with geographic boundaries

  • Work with different geographic levels (CMA, CSD, CT, DA)

  • Create maps and perform spatial analysis

  • Export data for use in GIS applications

import pycancensus as pc
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from IPython.display import display

# Set up plotting for notebook display
%matplotlib inline
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 8)

print("Libraries imported successfully!")
Libraries imported successfully!

Geographic Levels in Canadian Census

The Canadian Census organizes data at several geographic levels:

# Geographic hierarchy (from largest to smallest)
geo_levels = {
    'PR': 'Province/Territory',
    'CMA': 'Census Metropolitan Area', 
    'CA': 'Census Agglomeration',
    'CSD': 'Census Subdivision (Municipality)',
    'CT': 'Census Tract',
    'DA': 'Dissemination Area'
}

print("Canadian Census Geographic Hierarchy:")
print("="*50)
for code, name in geo_levels.items():
    print(f"{code:3} - {name}")
Canadian Census Geographic Hierarchy:
==================================================
PR  - Province/Territory
CMA - Census Metropolitan Area
CA  - Census Agglomeration
CSD - Census Subdivision (Municipality)
CT  - Census Tract
DA  - Dissemination Area

Getting Geographic Data

Let’s retrieve census data with geographic boundaries:

try:
    # Get census data with geometries for Vancouver CMA
    vancouver_data = pc.get_census(
        dataset="CA21",
        regions={"CMA": "59933"},  # Vancouver CMA
        vectors=["v_CA21_1", "v_CA21_434"],  # Population and median income
        level="CSD",  # Municipality level
        geo_format="geopandas",
        labels="short"
    )
    
    print(f"Retrieved data for {len(vancouver_data)} municipalities")
    print(f"CRS: {vancouver_data.crs}")

    # Show sample data
    display(vancouver_data[['name', 'v_CA21_1', 'v_CA21_434']].head())
    
except Exception as e:
    print(f"Error retrieving data: {e}")
    raise  # Fail if API call doesn't work - no fallbacks
📋 Request Preview:
   Dataset: CA21
   Level: CSD
   Regions: 1 region(s)
   Variables: 2 vector(s)
   Geography: geopandas
🔍 Estimated Size: medium (100 rows)
⏱️  Expected Time: 5-15 seconds
Downloading 100 regions with geography... 
Error retrieving data: Failed to process API response: Invalid or missing API key
💡 Suggestion: Get a free API key at https://censusmapper.ca/users/sign_up
---------------------------------------------------------------------------
AuthenticationError                       Traceback (most recent call last)
File ~/checkouts/readthedocs.org/user_builds/pycancensus/checkouts/latest/pycancensus/core.py:179, in get_census(dataset, regions, vectors, level, geo_format, resolution, labels, use_cache, quiet, api_key)
    176 if geo_format == "geopandas" and vectors:
    177     # The geo.geojson endpoint doesn't properly return vector data
    178     # Use dedicated function to fetch and merge geo + vector data
--> 179     result, data_version, geo_version = _fetch_census_with_geometry_and_vectors(
    180         base_url, request_data, resolution, vectors, labels
    181     )
    182 else:
    183     # Standard single-endpoint approach

File ~/checkouts/readthedocs.org/user_builds/pycancensus/checkouts/latest/pycancensus/core.py:307, in _fetch_census_with_geometry_and_vectors(base_url, request_data, resolution, vectors, labels)
    306 geo_multipart_data = {key: (None, value) for key, value in geo_request_data.items()}
--> 307 geo_response = get_session().post(
    308     f"{base_url}geo.geojson", files=geo_multipart_data
    309 )
    310 geo_version = geo_response.headers.get("data-version")

File ~/checkouts/readthedocs.org/user_builds/pycancensus/checkouts/latest/pycancensus/resilience.py:319, in ResilientSession.post(self, url, **kwargs)
    318 """Make a POST request."""
--> 319 return self.request("POST", url, **kwargs)

File ~/checkouts/readthedocs.org/user_builds/pycancensus/checkouts/latest/pycancensus/resilience.py:311, in ResilientSession.request(self, method, url, **kwargs)
    310 # Permanent error, or transient error with retries exhausted
--> 311 raise self._create_appropriate_exception(response)

AuthenticationError: Invalid or missing API key
💡 Suggestion: Get a free API key at https://censusmapper.ca/users/sign_up

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[3], line 3
      1 try:
      2     # Get census data with geometries for Vancouver CMA
----> 3     vancouver_data = pc.get_census(
      4         dataset="CA21",
      5         regions={"CMA": "59933"},  # Vancouver CMA
      6         vectors=["v_CA21_1", "v_CA21_434"],  # Population and median income
      7         level="CSD",  # Municipality level
      8         geo_format="geopandas",
      9         labels="short"
     10     )
     12     print(f"Retrieved data for {len(vancouver_data)} municipalities")
     13     print(f"CRS: {vancouver_data.crs}")

File ~/checkouts/readthedocs.org/user_builds/pycancensus/checkouts/latest/pycancensus/core.py:254, in get_census(dataset, regions, vectors, level, geo_format, resolution, labels, use_cache, quiet, api_key)
    252     raise RuntimeError(f"API request failed: {e}")
    253 except Exception as e:
--> 254     raise RuntimeError(f"Failed to process API response: {e}")

RuntimeError: Failed to process API response: Invalid or missing API key
💡 Suggestion: Get a free API key at https://censusmapper.ca/users/sign_up

Creating Basic Maps

Let’s create our first map:

# Create a simple population map
fig, ax = plt.subplots(1, 1, figsize=(12, 10))

vancouver_data.plot(
    column='v_CA21_1',
    cmap='YlOrRd',
    legend=True,
    ax=ax,
    edgecolor='white',
    linewidth=0.5,
    legend_kwds={'label': 'Population', 'shrink': 0.8}
)

ax.set_title('Population by Municipality\nVancouver CMA, 2021', fontsize=16)
ax.axis('off')  # Remove axes for cleaner look

plt.tight_layout()
display(fig)

Multi-Variable Mapping

Compare two variables side by side:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))

# Population map
vancouver_data.plot(
    column='v_CA21_1',
    cmap='YlOrRd',
    legend=True,
    ax=ax1,
    edgecolor='white',
    linewidth=0.5,
    legend_kwds={'label': 'Population', 'shrink': 0.8}
)
ax1.set_title('Population')
ax1.axis('off')

# Income map
vancouver_data.plot(
    column='v_CA21_434',
    cmap='YlGnBu',
    legend=True,
    ax=ax2,
    edgecolor='white', 
    linewidth=0.5,
    legend_kwds={'label': 'Median Income ($)', 'shrink': 0.8}
)
ax2.set_title('Median Income')
ax2.axis('off')

plt.suptitle('Vancouver CMA: Population vs Income', fontsize=16, y=1.02)
plt.tight_layout()
display(fig)

Working with Different Geographic Levels

Different levels provide different levels of detail:

try:
    # Compare data at different geographic levels
    levels = ['CMA', 'CSD', 'CT']
    level_names = ['Metro Area', 'Municipality', 'Census Tract']
    
    print("Data availability by geographic level:")
    print("="*45)
    
    for level, name in zip(levels, level_names):
        try:
            # Get count of regions at each level for Vancouver
            data = pc.get_census(
                dataset="CA21",
                regions={"CMA": "59933"},
                vectors=["v_CA21_1"],
                level=level,
                labels="short"
            )
            print(f"{name:15} ({level}): {len(data):4,} regions")
        except Exception as e:
            print(f"{name:15} ({level}): Error - {e}")
            
except Exception as e:
    print(f"Error comparing levels: {e}")
    
    # Show conceptual differences
    print("Geographic Level Detail (conceptual):")
    print("CMA  (1 region):    Entire metro area")
    print("CSD  (20+ regions): Individual cities/towns")  
    print("CT   (300+ regions): Neighborhoods within cities")
    print("DA   (1000+ regions): Small areas (400-700 people)")

Getting Geometry Only

Sometimes you just need the boundaries without data:

try:
    # Get just the geographic boundaries
    boundaries = pc.get_census_geometry(
        dataset="CA21",
        regions={"CMA": "59933"},
        level="CSD"
    )
    
    print(f"Retrieved {len(boundaries)} boundary polygons")
    print(f"Columns: {list(boundaries.columns)}")
    
    # Plot the boundaries
    fig, ax = plt.subplots(1, 1, figsize=(10, 8))
    boundaries.plot(ax=ax, edgecolor='blue', facecolor='lightblue', alpha=0.7)
    ax.set_title('Vancouver CMA Municipal Boundaries')
    ax.axis('off')
    display(fig)
    
except Exception as e:
    print(f"Error getting boundaries: {e}")
    raise  # Fail if API call doesn't work - no fallbacks

Spatial Analysis

Perform basic spatial analysis with geopandas:

# Reproject to a projected CRS for accurate area calculations
# EPSG:3347 is Statistics Canada Lambert Conformal Conic - designed for Canada
vancouver_projected = vancouver_data.to_crs('EPSG:3347')

# Calculate area and population density
vancouver_projected['area_km2'] = vancouver_projected.geometry.area / 1e6  # Convert m² to km²
vancouver_projected['pop_density'] = vancouver_projected['v_CA21_1'] / vancouver_projected['area_km2']

print("Population Density Analysis:")
print("="*30)
print(f"Highest density: {vancouver_projected['pop_density'].max():.0f} people/km²")
print(f"Lowest density:  {vancouver_projected['pop_density'].min():.0f} people/km²")
print(f"Average density: {vancouver_projected['pop_density'].mean():.0f} people/km²")

# Show top 3 densest areas
densest = vancouver_projected.nlargest(3, 'pop_density')[['name', 'pop_density']]
print(f"\nTop 3 densest municipalities:")
for idx, row in densest.iterrows():
    print(f"  {row['name']}: {row['pop_density']:.0f} people/km²")

Coordinate Reference Systems

Understanding and working with projections:

print("Working with Coordinate Reference Systems:")
print("="*45)
print(f"Original CRS: {vancouver_data.crs}")
print(f"Projected CRS: {vancouver_projected.crs}")

print(f"\nCommon CRS options for Canadian data:")
print("• EPSG:4326 - WGS84 (geographic, degrees)")
print("• EPSG:3347 - Statistics Canada Lambert (Canada-wide)")
print("• EPSG:3157 - NAD83 UTM Zone 10N (BC, Western Canada)")

# Example: Convert back to geographic for web mapping
vancouver_geo = vancouver_projected.to_crs('EPSG:4326')
print(f"\nConverted back to geographic: {vancouver_geo.crs}")

Exporting Geographic Data

Save your data for use in other applications:

# Export to various formats
try:
    # Prepare clean data for export (use original geographic CRS)
    export_data = vancouver_data[['geometry', 'name', 'v_CA21_1', 'v_CA21_434']].copy()

    # GeoJSON (web-friendly, supports long column names)
    export_data.to_file("vancouver_census.geojson", driver="GeoJSON")
    print("✓ Exported to GeoJSON")

    # Shapefile (GIS standard, has column name limitations)
    # Rename columns to be shapefile-friendly (10 char max)
    shp_data = export_data.rename(columns={
        'v_CA21_1': 'pop2021',
        'v_CA21_434': 'income'
    })
    shp_data.to_file("vancouver_census.shp")
    print("✓ Exported to Shapefile")

    # CSV with geometry as WKT (for Excel/analysis)
    csv_export = export_data.copy()
    csv_export['geometry_wkt'] = csv_export.geometry.to_wkt()
    csv_export.drop('geometry', axis=1).to_csv("vancouver_census.csv", index=False)
    print("✓ Exported to CSV")

except Exception as e:
    print(f"Export example: {e}")
    print("Note: Supported formats include GeoJSON, Shapefile, KML, GeoPackage")

Interactive Maps with Folium

Create interactive web maps:

try:
    import folium
    
    # Create interactive map
    center_lat = vancouver_data.geometry.centroid.y.mean()
    center_lon = vancouver_data.geometry.centroid.x.mean()
    
    m = folium.Map(
        location=[center_lat, center_lon],
        zoom_start=10,
        tiles='OpenStreetMap'
    )
    
    # Add choropleth layer
    folium.Choropleth(
        geo_data=vancouver_data,
        data=vancouver_data,
        columns=['GeoUID', 'v_CA21_1'],
        key_on='feature.properties.GeoUID',
        fill_color='YlOrRd',
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name='Population'
    ).add_to(m)
    
    print("Interactive map created successfully!")
    print("(In a Jupyter notebook, the map would display here)")
    
except ImportError:
    print("Folium not installed. Install with: pip install folium")
except Exception as e:
    print(f"Error creating interactive map: {e}")

Best Practices

Performance Tips

print("Performance Best Practices:")
print("="*30)
print("1. Use appropriate geographic levels:")
print("   • CMA/CA for regional analysis")
print("   • CSD for municipal comparisons") 
print("   • CT for neighborhood analysis")
print("   • DA for detailed local analysis")
print("\n2. Cache your data:")
print("   • pycancensus caches API responses automatically")
print("   • Save processed results to avoid re-computation")
print("\n3. Choose the right CRS:")
print("   • EPSG:4326 (WGS84) for web mapping")
print("   • EPSG:3347 (Stats Can Lambert) for Canada analysis")
print("   • Local UTM zones for precise measurements")

Data Quality

print("Data Quality Checks:")
print("="*20)

# Check for missing geometries
null_geom = vancouver_data.geometry.isnull().sum()
print(f"Missing geometries: {null_geom}")

# Check for invalid geometries
invalid_geom = (~vancouver_data.geometry.is_valid).sum()
print(f"Invalid geometries: {invalid_geom}")

# Check data completeness
null_pop = vancouver_data['v_CA21_1'].isnull().sum()
print(f"Missing population data: {null_pop}")

# Basic statistics
print(f"\nData summary:")
print(f"Total regions: {len(vancouver_data)}")
print(f"Total population: {vancouver_data['v_CA21_1'].sum():,}")
print(f"Average income: ${vancouver_data['v_CA21_434'].mean():,.0f}")

Summary

This tutorial covered the essential aspects of working with geographic census data:

Key Skills Learned:

  • Retrieving census data with geographic boundaries

  • Understanding Canadian census geographic levels

  • Creating choropleth maps with matplotlib

  • Performing basic spatial analysis

  • Working with coordinate reference systems

  • Exporting data to various formats

  • Creating interactive maps

Next Steps:

  • Try different census datasets (CA16, CA11, etc.)

  • Explore temporal analysis by comparing multiple census years

  • Combine with other geospatial data sources

  • Use advanced spatial analysis tools from PySAL or similar libraries

  • Create web applications with your maps using Streamlit or Dash

Additional Resources:

  • Geopandas Documentation: Comprehensive spatial data analysis

  • Folium Documentation: Interactive web mapping

  • Matplotlib Cartopy: Advanced cartographic projections

  • CensusMapper: Web interface for exploring census data

  • Statistics Canada: Official census documentation