Getting Started with pycancensus
This tutorial demonstrates the enhanced pycancensus functionality with clear hierarchy examples and real data access.
Key Features Demonstrated:
list_census_vectors() - Browse all available data variables
Vector Hierarchies - Navigate parent-child relationships
find_census_vectors() - Smart search functionality
Real Data Retrieval - Get actual census data
Note
You’ll need a free API key from CensusMapper to run these examples with real data.
Setup and Installation
First, let’s import pycancensus and set up our environment:
import pycancensus
from pycancensus import (
    list_census_datasets,
    list_census_vectors,
    get_census,
    parent_census_vectors,
    child_census_vectors,
    find_census_vectors,
)
import pandas as pd
# Set your API key (replace with your actual key)
# pycancensus.set_api_key("your_api_key_here")
print("pycancensus imported successfully!")
pycancensus imported successfully!
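Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming the key is kept in an environment variable. The variable name CANCENSUS_API_KEY and the helper load_api_key are hypothetical choices made for this example, not names defined by pycancensus.

```python
import os

# Hypothetical variable name chosen for this example.
ENV_VAR = "CANCENSUS_API_KEY"


def load_api_key(env_var=ENV_VAR):
    """Return the key stored in env_var, or None if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    return key or None


api_key = load_api_key()
if api_key is None:
    print(f"Set {ENV_VAR} before running the examples that need real data.")
else:
    # Pass the key to the setter shown in the setup cell:
    # pycancensus.set_api_key(api_key)
    print("API key loaded from the environment.")
```

Returning None for a missing or blank value lets the calling code decide whether to stop or to continue with cached data only.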
1. Exploring Census Vectors
The list_census_vectors() function shows all available data variables:
# List all vectors for 2021 Census
try:
    vectors_ca21 = list_census_vectors('CA21')
    print(f"CA21 Census has {len(vectors_ca21):,} vectors available")
    print(f"Columns: {list(vectors_ca21.columns)}")
    # Show how many vectors have parent relationships
    with_parents = vectors_ca21[vectors_ca21['parent_vector'].notna()]
    print(f"Vectors with parent relationships: {len(with_parents):,} out of {len(vectors_ca21):,}")
    print("\nSample hierarchy examples:")
    display(with_parents[['vector', 'parent_vector', 'label']].head())
except Exception as e:
    print(f"Error: {e}")
    print("Make sure you have set your API key!")
Reading vectors from cache...
CA21 Census has 7,709 vectors available
Columns: ['vector', 'label', 'type', 'units', 'aggregation', 'parent_vector', 'details']
Vectors with parent relationships: 7,448 out of 7,709
Sample hierarchy examples:
| | vector | parent_vector | label |
|---|---|---|---|
| 4 | v_CA21_5 | v_CA21_4 | Private dwellings occupied by usual residents |
| 10 | v_CA21_11 | v_CA21_8 | 0 to 14 years |
| 11 | v_CA21_12 | v_CA21_9 | 0 to 14 years |
| 12 | v_CA21_13 | v_CA21_10 | 0 to 14 years |
| 13 | v_CA21_14 | v_CA21_11 | 0 to 4 years |
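The vector and parent_vector columns are enough to walk the hierarchy by hand. The sketch below uses only the five sample rows shown above, stored in a plain dictionary, and follows parent links upward. The helper name ancestors is made up for this example and is not a pycancensus function. In the full table the chain may continue past the last ancestor found here.

```python
# Child -> parent pairs copied from the five sample rows above.
parent_of = {
    "v_CA21_5": "v_CA21_4",
    "v_CA21_11": "v_CA21_8",
    "v_CA21_12": "v_CA21_9",
    "v_CA21_13": "v_CA21_10",
    "v_CA21_14": "v_CA21_11",
}


def ancestors(vector, parent_of):
    """List the ancestors of vector, nearest first, by following parent links."""
    chain = []
    seen = {vector}  # guards against an accidental cycle in the mapping
    current = parent_of.get(vector)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return chain


print(ancestors("v_CA21_14", parent_of))  # ['v_CA21_11', 'v_CA21_8']
```

With the DataFrame from the cell above, the same mapping could be built with dict(zip(with_parents['vector'], with_parents['parent_vector'])).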
3. Enhanced Vector Search
The find_census_vectors() function provides smart search with relevance scoring:
try:
    # Search for income-related vectors
    income_vectors = find_census_vectors('CA21', 'income')
    print(f"Found {len(income_vectors)} income-related vectors")
    print(f"\nTop income vectors (sorted by relevance):")
    display(income_vectors[['vector', 'label', 'relevance_score']].head(3))
except Exception as e:
    print(f"Error searching vectors: {e}")
Reading vectors from cache...
Found 649 income-related vectors
Top income vectors (sorted by relevance):
| | vector | label | relevance_score |
|---|---|---|---|
| 4380 | v_CA21_4315 | % of tenant households spending 30% or more of... | 5.0 |
| 511 | v_CA21_554 | Income statistics in 2020 for the population a... | 5.0 |
| 512 | v_CA21_555 | Income statistics in 2020 for the population a... | 5.0 |
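How find_census_vectors() computes relevance_score is not described in this tutorial. To give a feel for what relevance scoring means in general, here is a toy sketch that ranks labels by how well they match a query. The scoring rule, the function score_label, and the vector IDs and labels are all invented for illustration. This is not the library's algorithm and these are not real census vectors.

```python
def score_label(label, query):
    """Toy score: 2 points per whole-word match, 1 per other substring match."""
    label_lower = label.lower()
    words = label_lower.replace(",", " ").split()
    score = 0
    for term in query.lower().split():
        if term in words:
            score += 2
        elif term in label_lower:
            score += 1
    return score


# Hypothetical IDs and labels, made up for this example.
labels = {
    "v_example_1": "Median total income of household",
    "v_example_2": "Low-income status",
    "v_example_3": "Total private dwellings",
}

# Sort the IDs from the best match to the worst.
ranked = sorted(labels, key=lambda v: score_label(labels[v], "income"), reverse=True)
for vector in ranked:
    print(vector, score_label(labels[vector], "income"))
```

A scheme like this produces many ties, which matches the sample output above where the top three rows all share a score of 5.0. When scores tie, reading the labels is still necessary.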
4. Real Data Retrieval
Finally, let’s get actual census data using our hierarchy vectors:
try:
    # Get real data for Toronto CMA using our income hierarchy vectors
    toronto_data = get_census(
        dataset='CA21',
        regions={'cma': '535'},  # Toronto CMA
        vectors=['v_CA21_923', 'v_CA21_939', 'v_CA21_942', 'v_CA21_943'],  # Income categories
        level='CMA',  # level names are case-sensitive
        use_cache=False
    )
    print("Toronto CMA Income Demographics:")
    print("\nHousehold Income Distribution:")
    total_households = toronto_data['v_CA21_923'].iloc[0]
    high_income = toronto_data['v_CA21_939'].iloc[0]  # $100,000+
    very_high_1 = toronto_data['v_CA21_942'].iloc[0]  # $150,000-$199,999
    very_high_2 = toronto_data['v_CA21_943'].iloc[0]  # $200,000+
    print(f"• Total households: {total_households:,}")
    print(f"• $100,000+ income: {high_income:,} ({high_income/total_households*100:.1f}%)")
    print(f" - $150,000-$199,999: {very_high_1:,} ({very_high_1/total_households*100:.1f}%)")
    print(f" - $200,000+: {very_high_2:,} ({very_high_2/total_households*100:.1f}%)")
except Exception as e:
    print(f"Error retrieving data: {e}")
    print("This requires a valid API key and internet connection")
Note
Level names are case-sensitive. An earlier version of this cell passed level='cma' and failed with the following message, which is why the call above uses level='CMA':
Error retrieving data: Invalid level: cma. Valid levels are: C, Regions, PR, CMA, CD, CSD, CT, DA, EA, DB
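The valid level names listed in the error message can be used to catch a wrong-case level before any request is made. The sketch below is one possible pre-check. The helper normalize_level is hypothetical and not part of pycancensus, and the list of names is copied from the error message rather than queried from the library.

```python
# Level names copied from the error message above.
VALID_LEVELS = ["C", "Regions", "PR", "CMA", "CD", "CSD", "CT", "DA", "EA", "DB"]


def normalize_level(level):
    """Map a level name written in any letter case to its canonical spelling."""
    lookup = {name.lower(): name for name in VALID_LEVELS}
    try:
        return lookup[level.strip().lower()]
    except KeyError:
        valid = ", ".join(VALID_LEVELS)
        raise ValueError(f"Invalid level: {level!r}. Valid levels are: {valid}") from None


print(normalize_level("cma"))      # CMA
print(normalize_level("regions"))  # Regions
```

The result can then be passed as the level argument, for example level=normalize_level('cma').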
Summary
This tutorial demonstrates the enhanced pycancensus capabilities:
list_census_vectors() - Browse 7,709+ available variables with explicit parent-child relationships
Hierarchy Navigation - Navigate through income hierarchies from main categories to detailed brackets
parent_census_vectors() & child_census_vectors() - Navigate up and down the hierarchy
find_census_vectors() - Smart search with relevance scoring
Real Data - Retrieve census data with get_census(), passing level names in their exact case (for example 'CMA')
🎯 Key Improvement: Unlike previous versions, these hierarchy functions now work with clear, well-defined parent-child relationships in the census data structure.
Next Steps:
Explore other hierarchies (income, education, housing)
Try different geographic levels (province, census division, etc.)
Use geo_format='geopandas' for spatial analysis
Check out the gallery examples for more advanced use cases
Getting Help
Documentation: Explore the API reference and other tutorials
Examples: Browse the example gallery for specific use cases
Issues: Report problems on GitHub
API Key: Get your free key at CensusMapper