Getting Started with pycancensus

This tutorial walks through browsing census vectors, navigating their parent-child hierarchies, and retrieving census data with pycancensus.

Key Features Demonstrated:

  • list_census_vectors() - Browse all available data variables

  • Vector Hierarchies - Navigate parent-child relationships

  • find_census_vectors() - Smart search functionality

  • Real Data Retrieval - Get actual census data

Note

You’ll need a free API key from CensusMapper to run these examples with real data.

Setup and Installation

First, let’s import pycancensus and set up our environment:

import pycancensus
from pycancensus import (
    list_census_datasets, 
    list_census_vectors, 
    get_census,
    parent_census_vectors,
    child_census_vectors,
    find_census_vectors
)
import pandas as pd
from IPython.display import display  # built into Jupyter; needed when running as a plain script

# Set your API key (replace with your actual key)
# pycancensus.set_api_key("your_api_key_here")
print("pycancensus imported successfully!")
pycancensus imported successfully!
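The setup cell leaves the key call commented out. One way to keep the key out of notebooks you share is to read it from an environment variable and then pass it to pycancensus.set_api_key(). The sketch below assumes a variable named CENSUSMAPPER_API_KEY. That name is a hypothetical choice for this example, not a variable pycancensus is known to read on its own.

```python
import os
from typing import Optional

# Hypothetical environment variable name; use whatever name fits your setup.
API_KEY_VAR = "CENSUSMAPPER_API_KEY"


def load_api_key(var_name: str = API_KEY_VAR) -> Optional[str]:
    """Return the key stored in an environment variable, or None if unset or blank."""
    key = os.environ.get(var_name, "").strip()
    return key or None


api_key = load_api_key()
if api_key is None:
    print(f"No API key found in ${API_KEY_VAR}; export it before fetching data.")
else:
    # Hand the key to pycancensus, e.g. pycancensus.set_api_key(api_key)
    print("API key loaded from the environment.")
```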

1. Exploring Census Vectors

The list_census_vectors() function shows all available data variables:

# List all vectors for 2021 Census
try:
    vectors_ca21 = list_census_vectors('CA21')
    print(f"CA21 Census has {len(vectors_ca21):,} vectors available")
    print(f"Columns: {list(vectors_ca21.columns)}")

    # Show how many vectors have parent relationships
    with_parents = vectors_ca21[vectors_ca21['parent_vector'].notna()]
    print(f"Vectors with parent relationships: {len(with_parents):,} out of {len(vectors_ca21):,}")
    print("\nSample hierarchy examples:")
    display(with_parents[['vector', 'parent_vector', 'label']].head())
    
except Exception as e:
    print(f"Error: {e}")
    print("Make sure you have set your API key!")
Reading vectors from cache...
CA21 Census has 7,709 vectors available
Columns: ['vector', 'label', 'type', 'units', 'aggregation', 'parent_vector', 'details']
Vectors with parent relationships: 7,448 out of 7,709

Sample hierarchy examples:
    vector     parent_vector  label
4   v_CA21_5   v_CA21_4       Private dwellings occupied by usual residents
10  v_CA21_11  v_CA21_8       0 to 14 years
11  v_CA21_12  v_CA21_9       0 to 14 years
12  v_CA21_13  v_CA21_10      0 to 14 years
13  v_CA21_14  v_CA21_11      0 to 4 years
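The parent_vector column stores only child-to-parent links. Inverting those links gives a lookup from each vector to its direct children. Note that v_CA21_11 appears both as a child and as a parent, which is how the hierarchy nests. The sketch below does this inversion in plain Python over the five rows printed above. It illustrates the idea and is not pycancensus's implementation.

```python
from collections import defaultdict

# (vector, parent_vector, label) rows copied from the sample output above.
rows = [
    ("v_CA21_5", "v_CA21_4", "Private dwellings occupied by usual residents"),
    ("v_CA21_11", "v_CA21_8", "0 to 14 years"),
    ("v_CA21_12", "v_CA21_9", "0 to 14 years"),
    ("v_CA21_13", "v_CA21_10", "0 to 14 years"),
    ("v_CA21_14", "v_CA21_11", "0 to 4 years"),
]

# Invert the child -> parent links into a parent -> [children] index.
children_by_parent = defaultdict(list)
for vector, parent, _label in rows:
    if parent is not None:  # root vectors have no parent
        children_by_parent[parent].append(vector)

print(dict(children_by_parent))
```

On the full table, vectors_ca21.groupby('parent_vector')['vector'].apply(list) builds the same mapping. pandas drops missing group keys by default, so root vectors are skipped there too.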

2. Vector Hierarchy Navigation

Each vector's parent_vector field links it to the broader vector it breaks down. Given a vector, child_census_vectors() returns the vectors beneath it. Here we start from the household income total:

try:
    # Household income vector: the root of the hierarchy explored below
    income_root = "v_CA21_923"  # Household total income groups in 2020
    
    # Look up the root's label for context (reuses vectors_ca21 from section 1)
    income_info = vectors_ca21[vectors_ca21['vector'] == income_root]
    if not income_info.empty:
        print("Household Income Hierarchy\n")
        print(f"ROOT: {income_root} - {income_info['label'].iloc[0][:50]}...")
        print("\nLEVEL 1 - Income Brackets:")
    
    # Get its direct children (income brackets)
    income_children = child_census_vectors(income_root, 'CA21')
    display(income_children[['vector', 'label', 'parent_vector']].head(8))  # Show first 8 brackets
    
except Exception as e:
    print(f"Error exploring hierarchy: {e}")
Household Income Hierarchy

ROOT: v_CA21_923 - Household total income groups in 2020 for private ...

LEVEL 1 - Income Brackets:
   vector      label               parent_vector
0  v_CA21_924  Under $5,000        v_CA21_923
1  v_CA21_925  $5,000 to $9,999    v_CA21_923
2  v_CA21_926  $10,000 to $14,999  v_CA21_923
3  v_CA21_927  $15,000 to $19,999  v_CA21_923
4  v_CA21_928  $20,000 to $24,999  v_CA21_923
5  v_CA21_929  $25,000 to $29,999  v_CA21_923
6  v_CA21_930  $30,000 to $34,999  v_CA21_923
7  v_CA21_931  $35,000 to $39,999  v_CA21_923
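Conceptually, a direct-children lookup is a filter on the parent_vector column. The hypothetical helper direct_children() below applies that filter to the eight rows printed above. It illustrates the idea and is not pycancensus's implementation. With the full table from section 1, the pandas equivalent is vectors_ca21[vectors_ca21['parent_vector'] == income_root].

```python
from typing import List, Tuple

# (vector, label, parent_vector) rows copied from the output above;
# only the first eight income brackets, not the full CA21 table.
Row = Tuple[str, str, str]
income_rows: List[Row] = [
    ("v_CA21_924", "Under $5,000", "v_CA21_923"),
    ("v_CA21_925", "$5,000 to $9,999", "v_CA21_923"),
    ("v_CA21_926", "$10,000 to $14,999", "v_CA21_923"),
    ("v_CA21_927", "$15,000 to $19,999", "v_CA21_923"),
    ("v_CA21_928", "$20,000 to $24,999", "v_CA21_923"),
    ("v_CA21_929", "$25,000 to $29,999", "v_CA21_923"),
    ("v_CA21_930", "$30,000 to $34,999", "v_CA21_923"),
    ("v_CA21_931", "$35,000 to $39,999", "v_CA21_923"),
]


def direct_children(parent: str, table: List[Row]) -> List[Row]:
    """Return the rows whose parent_vector equals `parent` (one level only)."""
    return [row for row in table if row[2] == parent]


for vector, label, _parent in direct_children("v_CA21_923", income_rows):
    print(vector, label)
```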

Drilling Down Further

try:
    # Drill down into the high-income bracket (shows grandparent -> parent -> child)
    high_income_bracket = "v_CA21_939"  # $100,000 and over
    print(f"LEVEL 2 - High-income sub-categories for '{high_income_bracket}':")

    # Get the children of the $100,000+ bracket
    high_income_subcats = child_census_vectors(high_income_bracket, 'CA21')
    display(high_income_subcats[['vector', 'label', 'parent_vector']])

    # Show the parent relationship for context
    parent_info = parent_census_vectors(high_income_bracket, 'CA21')
    if not parent_info.empty:
        print(f"\nParent of this bracket: {parent_info['vector'].iloc[0]} - {parent_info['label'].iloc[0][:50]}...")
    
except Exception as e:
    print(f"Error exploring detailed hierarchy: {e}")
LEVEL 2 - High-income sub-categories for 'v_CA21_939':
   vector      label                 parent_vector
0  v_CA21_940  $100,000 to $124,999  v_CA21_939
1  v_CA21_941  $125,000 to $149,999  v_CA21_939
2  v_CA21_942  $150,000 to $199,999  v_CA21_939
3  v_CA21_943  $200,000 and over     v_CA21_939

Parent of this bracket: v_CA21_923 - Household total income groups in 2020 for private ...
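The truncated outputs above do not show whether child_census_vectors() returns only direct children or every descendant. If you want an explicit tree with depths, you can rebuild it from child-to-parent links. The sketch below uses only links that appear in this tutorial's outputs: two of the level-1 brackets, plus the $100,000 and over branch, whose parent is printed just above. The helper descendants() is hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# child -> parent links taken from this tutorial's outputs (a small subset of CA21).
parent_of: Dict[str, str] = {
    "v_CA21_924": "v_CA21_923",
    "v_CA21_925": "v_CA21_923",
    "v_CA21_939": "v_CA21_923",
    "v_CA21_940": "v_CA21_939",
    "v_CA21_941": "v_CA21_939",
    "v_CA21_942": "v_CA21_939",
    "v_CA21_943": "v_CA21_939",
}


def descendants(root: str, parent_of: Dict[str, str]) -> Iterator[Tuple[int, str]]:
    """Yield (depth, vector) for every vector below `root`, depth-first."""
    # Invert the links once: parent -> [children], in insertion order.
    children: Dict[str, List[str]] = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    # Explicit stack instead of recursion; reversed() preserves the original order.
    stack = [(1, c) for c in reversed(children.get(root, []))]
    while stack:
        depth, vector = stack.pop()
        yield depth, vector
        stack.extend((depth + 1, c) for c in reversed(children.get(vector, [])))


for depth, vector in descendants("v_CA21_923", parent_of):
    print("  " * depth + vector)
```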

Finding Parent Vectors

You can also navigate upward in the hierarchy:

try:
    # Find parent of a specific income bracket
    income_bracket = "v_CA21_942"  # $150,000 to $199,999
    parent = parent_census_vectors(income_bracket, 'CA21')
    print(f"Finding parent of income bracket '{income_bracket}':")
    display(parent[['vector', 'label', 'parent_vector']])
    
except Exception as e:
    print(f"Error finding parent: {e}")
Reading vectors from cache...
Finding parent of income bracket 'v_CA21_942':
   vector      label              parent_vector
0  v_CA21_939  $100,000 and over  v_CA21_923
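parent_census_vectors() returned one parent row here. If you keep following parent_vector until a vector has no parent, you get the full path to the root. This is the grandparent -> parent -> child chain mentioned in the drill-down example. The sketch below uses the hypothetical helper ancestor_chain() and only the links printed in this tutorial. As in the tutorial, v_CA21_923 is treated as the root.

```python
from typing import Dict, List

# child -> parent links shown in the outputs above; v_CA21_923 is the root here,
# so it has no entry of its own.
parent_of: Dict[str, str] = {
    "v_CA21_942": "v_CA21_939",
    "v_CA21_939": "v_CA21_923",
}


def ancestor_chain(vector: str, parent_of: Dict[str, str]) -> List[str]:
    """Return [vector, parent, grandparent, ...] up to the root."""
    chain = [vector]
    seen = {vector}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:  # guard against a cycle in malformed data
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


print(" -> ".join(ancestor_chain("v_CA21_942", parent_of)))
# v_CA21_942 -> v_CA21_939 -> v_CA21_923
```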

3. Real Data Retrieval

Finally, let’s get actual census data using our hierarchy vectors:

try:
    # Get data for the Toronto CMA using the income hierarchy vectors
    toronto_data = get_census(
        dataset='CA21',
        regions={'CMA': '35535'},  # Toronto CMA (CensusMapper region ID)
        vectors=['v_CA21_923', 'v_CA21_939', 'v_CA21_942', 'v_CA21_943'],  # Income categories
        level='CMA',  # level codes are case-sensitive
        use_cache=False
    )
    
    print("Toronto CMA Income Demographics:")
    print("\nHousehold Income Distribution:")
    total_households = toronto_data['v_CA21_923'].iloc[0]
    high_income = toronto_data['v_CA21_939'].iloc[0]  # $100,000+
    very_high_1 = toronto_data['v_CA21_942'].iloc[0]  # $150,000-$199,999
    very_high_2 = toronto_data['v_CA21_943'].iloc[0]  # $200,000+
    
    print(f"• Total households: {total_households:,}")
    print(f"• $100,000+ income: {high_income:,} ({high_income/total_households*100:.1f}%)")
    print(f"  - $150,000-$199,999: {very_high_1:,} ({very_high_1/total_households*100:.1f}%)")
    print(f"  - $200,000+: {very_high_2:,} ({very_high_2/total_households*100:.1f}%)")
    
except Exception as e:
    print(f"Error retrieving data: {e}")
    print("Check your API key, internet connection, and the region and level codes")

Level codes must match the accepted spelling exactly. Passing level='cma' instead fails with:

Error retrieving data: Invalid level: cma. Valid levels are: C, Regions, PR, CMA, CD, CSD, CT, DA, EA, DB
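The percentages in the data-retrieval cell divide each count by the total number of households. A small pair of helpers makes that reusable and adds a consistency check that compares a parent count with the sum of its children. Statistics Canada randomly rounds published counts, so a small non-zero gap is expected, while a large one suggests a missing child vector. Note that v_CA21_942 and v_CA21_943 cover only part of v_CA21_939, so v_CA21_940 and v_CA21_941 would also be needed for that check. shares() and children_sum_gap() are hypothetical names for this sketch.

```python
from typing import Dict, Iterable


def shares(counts: Dict[str, float], total_key: str) -> Dict[str, float]:
    """Percentage of counts[total_key] represented by every other entry."""
    total = counts[total_key]
    if not total:
        raise ValueError(f"{total_key} is zero or missing; cannot compute shares")
    return {k: 100.0 * v / total for k, v in counts.items() if k != total_key}


def children_sum_gap(counts: Dict[str, float], parent: str, children: Iterable[str]) -> float:
    """Parent count minus the sum of its child counts (0 when they reconcile)."""
    return counts[parent] - sum(counts[c] for c in children)
```

With this section's data, a counts dictionary could be built as {v: toronto_data[v].iloc[0] for v in vectors}. This assumes, as the cell above does, that get_census() returns one column per vector code.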

Summary

This tutorial covered:

  1. list_census_vectors() - Browse all 7,709 CA21 vectors, each linked to its parent through the parent_vector column where one exists

  2. Hierarchy Navigation - Drill down from the household income total to income brackets and their sub-brackets

  3. parent_census_vectors() & child_census_vectors() - Navigate up and down the hierarchy

  4. find_census_vectors() - Smart search with relevance scoring (imported in the setup cell but not exercised in this walkthrough)

  5. Real Data - Retrieve household income counts for the Toronto CMA with get_census() and express them as shares of all households

🎯 Key Improvement: Unlike previous versions, the hierarchy functions now work from explicit parent-child relationships recorded in the census vector metadata (the parent_vector column).
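find_census_vectors() is imported in the setup cell but not exercised in this walkthrough, and its parameters and scoring are not shown here. The sketch below is only an offline stand-in. It ranks (vector, label) pairs by the fraction of query words found in each label. The helpers keyword_search() and tokenize() and their scoring rule are hypothetical, not pycancensus's algorithm.

```python
import re
from typing import List, Set, Tuple

# (vector, label) pairs copied from outputs earlier in this tutorial.
entries: List[Tuple[str, str]] = [
    ("v_CA21_5", "Private dwellings occupied by usual residents"),
    ("v_CA21_11", "0 to 14 years"),
    ("v_CA21_14", "0 to 4 years"),
    ("v_CA21_939", "$100,000 and over"),
    ("v_CA21_943", "$200,000 and over"),
]


def tokenize(text: str) -> Set[str]:
    """Lower-case word tokens; punctuation such as '$' and ',' is dropped."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_search(query: str, entries: List[Tuple[str, str]]) -> List[Tuple[float, str, str]]:
    """Return (score, vector, label) for labels sharing at least one query word.

    score = fraction of query words found in the label; ties are sorted by vector.
    """
    terms = tokenize(query)
    if not terms:
        return []
    hits = []
    for vector, label in entries:
        score = len(terms & tokenize(label)) / len(terms)
        if score > 0:
            hits.append((score, vector, label))
    return sorted(hits, key=lambda h: (-h[0], h[1]))


for score, vector, label in keyword_search("200,000 and over", entries):
    print(f"{score:.2f}  {vector}  {label}")
```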

Next Steps:

  • Explore other hierarchies (e.g., education, housing, age)

  • Try different geographic levels (province, census division, etc.)

  • Use geo_format='geopandas' for spatial analysis

  • Check out the gallery examples for more advanced use cases
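When switching geographic levels, note that level codes are matched exactly: the data-retrieval example rejected level='cma' and listed the accepted values. A small helper can map user input to those spellings before calling get_census(). normalize_level() is a hypothetical helper. VALID_LEVELS is copied from that error message for CA21, and other datasets may accept a different set.

```python
# Level codes as listed in the "Invalid level" error message for CA21.
VALID_LEVELS = ("C", "Regions", "PR", "CMA", "CD", "CSD", "CT", "DA", "EA", "DB")


def normalize_level(level: str) -> str:
    """Map a level name to its accepted spelling, ignoring case and surrounding spaces."""
    lookup = {name.lower(): name for name in VALID_LEVELS}
    try:
        return lookup[level.strip().lower()]
    except KeyError:
        raise ValueError(
            f"Unknown level {level!r}; expected one of {', '.join(VALID_LEVELS)}"
        ) from None


print(normalize_level("cma"))
```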

Getting Help

  • Documentation: Explore the API reference and other tutorials

  • Examples: Browse the example gallery for specific use cases

  • Issues: Report problems on GitHub

  • API Key: Get your free key at CensusMapper