
Basic Census Data Access

This example demonstrates how to access Canadian Census data using pycancensus, covering the essential functions for getting started with census data analysis.

Setting up pycancensus

First, we need to import pycancensus and set up our API key. You can get a free API key at: https://censusmapper.ca/users/sign_up

import os

import pandas as pd
import pycancensus as pc

# Read the API key from the CANCENSUS_API_KEY environment variable
# instead of hard-coding it in the script
api_key = os.environ.get('CANCENSUS_API_KEY')
if api_key:
    pc.set_api_key(api_key)
    print("API key configured")
else:
    print("No API key - examples will show code structure")
    print("Get your API key at: https://censusmapper.ca/users/sign_up")
API key set for current session.
API key configured
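
A key that is silently missing tends to surface later as a confusing API error, so it can help to fail early with a clear message. The sketch below is only an illustration: `require_api_key` is a hypothetical helper, not part of pycancensus, and it reads the same `CANCENSUS_API_KEY` variable used above.

```python
import os


def require_api_key(env=None, name="CANCENSUS_API_KEY"):
    """Return the API key stored in env[name], or raise a clear error.

    Hypothetical helper for illustration; it is not part of pycancensus.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Get a free key at "
            "https://censusmapper.ca/users/sign_up"
        )
    return key


# Demonstrate with a plain dict standing in for os.environ.
print(require_api_key({"CANCENSUS_API_KEY": "example-key"}))
```

The returned string can then be passed to `pc.set_api_key(...)` exactly as in the cell above.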

Exploring Available Datasets

Let’s start by exploring what Census datasets are available.

print("Available Census Datasets:")
try:
    datasets = pc.list_census_datasets()
    print(datasets)
except Exception as e:
    print(f"Error accessing datasets: {e}")
    print("Make sure you have set your API key!")
Available Census Datasets:
Querying CensusMapper API for available datasets...
Retrieved 29 datasets
    dataset                                        description geo_dataset  \
0    CA1996                                 1996 Canada Census      CA1996   
1      CA01                                 2001 Canada Census        CA01   
2      CA06                                 2006 Canada Census        CA06   
3      CA11                         2011 Canada Census and NHS        CA11   
4      CA16                                 2016 Canada Census        CA16   
5      CA21                                 2021 Canada Census        CA21   
6   CA01xSD  2001 Canada Census xtab - Structural type by D...        CA01   
7   CA06xSD  2006 Canada Census xtab - Structural type by D...        CA06   
8   CA11xSD  2011 Canada Census xtab - Structural type by D...        CA11   
9   CA16xSD  2016 Canada Census xtab - Structural type by D...        CA16   
10   TX2000                            2000 T1FF taxfiler data      CA1996   
11   TX2001                            2001 T1FF taxfiler data        CA01   
12   TX2002                            2002 T1FF taxfiler data        CA01   
13   TX2003                            2003 T1FF taxfiler data        CA01   
14   TX2004                            2004 T1FF taxfiler data        CA01   
15   TX2005                            2005 T1FF taxfiler data        CA01   
16   TX2006                            2006 T1FF taxfiler data        CA06   
17   TX2007                            2007 T1FF taxfiler data        CA06   
18   TX2008                            2008 T1FF taxfiler data        CA06   
19   TX2009                            2009 T1FF taxfiler data        CA06   
20   TX2010                            2010 T1FF taxfiler data        CA06   
21   TX2011                            2011 T1FF taxfiler data        CA06   
22   TX2012                            2012 T1FF taxfiler data        CA11   
23   TX2013                            2013 T1FF taxfiler data        CA11   
24   TX2014                            2014 T1FF taxfiler data        CA11   
25   TX2015                            2015 T1FF taxfiler data        CA11   
26   TX2016                            2016 T1FF taxfiler data        CA16   
27   TX2017                            2017 T1FF taxfiler data        CA16   
28   TX2018                            2018 T1FF taxfiler data        CA16   

                           attribution           reference  \
0                  StatCan 1996 Census            92-351-U   
1                  StatCan 2001 Census            92-378-X   
2                  StatCan 2006 Census            92-566-X   
3          StatCan 2011 Census and NHS  98-301-X, 99-000-X   
4                  StatCan 2016 Census            98-301-X   
5                  StatCan 2021 Census            98-301-X   
6   StatCan 2001 Census xtab, via CMHC            92-378-X   
7   StatCan 2006 Census xtab, via CMHC            92-566-X   
8   StatCan 2011 Census xtab, via CMHC            98-301-X   
9   StatCan 2016 Census xtab, via CMHC            98-301-X   
10         StatCan 2000 T1FF, via CMHC            72-212-X   
11         StatCan 2001 T1FF, via CMHC            72-212-X   
12         StatCan 2002 T1FF, via CMHC            72-212-X   
13         StatCan 2003 T1FF, via CMHC            72-212-X   
14         StatCan 2004 T1FF, via CMHC            72-212-X   
15         StatCan 2005 T1FF, via CMHC            72-212-X   
16         StatCan 2006 T1FF, via CMHC            72-212-X   
17         StatCan 2007 T1FF, via CMHC            72-212-X   
18         StatCan 2008 T1FF, via CMHC            72-212-X   
19         StatCan 2009 T1FF, via CMHC            72-212-X   
20         StatCan 2010 T1FF, via CMHC            72-212-X   
21         StatCan 2011 T1FF, via CMHC            72-212-X   
22         StatCan 2012 T1FF, via CMHC            72-212-X   
23         StatCan 2013 T1FF, via CMHC            72-212-X   
24         StatCan 2014 T1FF, via CMHC            72-212-X   
25         StatCan 2015 T1FF, via CMHC            72-212-X   
26         StatCan 2016 T1FF, via CMHC            72-212-X   
27         StatCan 2017 T1FF, via CMHC            72-212-X   
28         StatCan 2018 T1FF, via CMHC            72-212-X   

                                        reference_url  
0   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
1   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
2   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
3   https://www12.statcan.gc.ca/census-recensement...  
4   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
5   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
6   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
7   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
8   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
9   https://www150.statcan.gc.ca/n1/en/catalogue/9...  
10  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
11  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
12  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
13  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
14  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
15  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
16  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
17  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
18  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
19  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
20  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
21  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
22  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
23  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
24  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
25  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
26  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
27  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
28  https://www150.statcan.gc.ca/n1/en/catalogue/7...  
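
Because the listing is an ordinary pandas DataFrame, it can be filtered like any other table. The sketch below rebuilds a few rows of the output above by hand, so it runs without an API key, and uses the `dataset` and `geo_dataset` columns shown in that output. The regular expression used to pick out the main census releases is an illustrative choice, not a pycancensus feature.

```python
import pandas as pd

# A hand-copied subset of the listing printed above (not a live API call).
datasets = pd.DataFrame(
    {
        "dataset": ["CA16", "CA21", "CA16xSD", "TX2016", "TX2017", "TX2018"],
        "geo_dataset": ["CA16", "CA21", "CA16", "CA16", "CA16", "CA16"],
    }
)

# Which datasets are reported on 2016 census geography?
on_2016_geo = datasets.loc[datasets["geo_dataset"] == "CA16", "dataset"].tolist()
print(on_2016_geo)

# Keep only the main census releases: "CA" followed by digits and nothing else.
# This drops cross-tabulations such as CA16xSD and the TX taxfiler datasets.
is_census = datasets["dataset"].str.match(r"CA\d+$")
print(datasets.loc[is_census, "dataset"].tolist())
```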

Finding Census Regions

Next, let’s explore the geographic regions available in the Census.

print("\nExploring Census Regions:")
try:
    # Get regions for the 2021 Census
    regions = pc.list_census_regions("CA21")
    print(f"Found {len(regions)} regions in CA21 dataset")
    print("\nSample regions:")
    print(regions.head())
    
    # Search for specific regions (Vancouver)
    print("\nSearching for Vancouver regions:")
    vancouver_regions = pc.search_census_regions("Vancouver", "CA21")
    print(vancouver_regions[["region", "name", "level", "pop"]].head())
    
except Exception as e:
    print(f"Error accessing regions: {e}")
Exploring Census Regions:
Querying CensusMapper API for CA21 regions...
Retrieved 5518 regions
Found 5518 regions in CA21 dataset

Sample regions:
               name  region level         pop municipal_status  CMA_UID  \
0            Canada       1     C  36991981.0              NaN      NaN   
1           Ontario      35    PR  14223942.0             Ont.      NaN   
2            Quebec      24    PR   8501833.0             Que.      NaN   
3  British Columbia      59    PR   5000879.0             B.C.      NaN   
4           Alberta      48    PR   4262635.0            Alta.      NaN   

   CD_UID  PR_UID  
0     NaN     NaN  
1     NaN     NaN  
2     NaN     NaN  
3     NaN     NaN  
4     NaN     NaN  

Searching for Vancouver regions:
Reading regions from cache...
Found 7 regions matching 'Vancouver'
      region               name level        pop
16     59933          Vancouver   CMA  2642825.0
65      5915  Greater Vancouver    CD  2642825.0
364  5915022          Vancouver   CSD   662248.0
425  5915046    North Vancouver   CSD    88168.0
452  5915051    North Vancouver   CSD    58120.0
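
The `level` and `region` values in these results are what `get_census()` takes in its `regions` argument later in this example. The sketch below copies the search results above into a DataFrame by hand and builds that dictionary for the CMA row. Region IDs are stored as strings here to match how they are passed to `get_census()`; that storage type is an assumption. Several names in the results are shared by more than one region, so the region ID is a safer key than the name.

```python
import pandas as pd

# Hand-copied from the search output above (not a live API call).
vancouver_regions = pd.DataFrame(
    {
        "region": ["59933", "5915", "5915022", "5915046", "5915051"],
        "name": ["Vancouver", "Greater Vancouver", "Vancouver",
                 "North Vancouver", "North Vancouver"],
        "level": ["CMA", "CD", "CSD", "CSD", "CSD"],
        "pop": [2642825.0, 2642825.0, 662248.0, 88168.0, 58120.0],
    }
)

# Select the census metropolitan area and build {"CMA": "59933"}.
cma = vancouver_regions[vancouver_regions["level"] == "CMA"].iloc[0]
regions_arg = {cma["level"]: cma["region"]}
print(regions_arg)

# Names alone can be ambiguous: list names shared by more than one region.
name_counts = vancouver_regions["name"].value_counts()
ambiguous = sorted(name_counts[name_counts > 1].index)
print(ambiguous)
```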

Discovering Census Variables

Census data is organized into vectors (variables). Let’s explore what’s available.

print("\nExploring Census Variables:")
try:
    # List available vectors
    vectors = pc.list_census_vectors("CA21")
    print(f"Found {len(vectors)} vectors in CA21 dataset")
    print("\nSample vectors:")
    print(vectors[["vector", "label", "type"]].head())
    
    # Search for population-related vectors
    print("\nSearching for population vectors:")
    pop_vectors = pc.search_census_vectors("population", "CA21")
    print(pop_vectors[["vector", "label", "type"]].head())
    
except Exception as e:
    print(f"Error accessing vectors: {e}")
Exploring Census Variables:
🔍 Querying CensusMapper API for CA21 vectors...
✅ Retrieved 7709 vectors for CA21
📊 Large dataset: 7709 variables available
Found 7709 vectors in CA21 dataset

Sample vectors:
     vector                                          label   type
0  v_CA21_1                               Population, 2021  Total
1  v_CA21_2                               Population, 2016  Total
2  v_CA21_3     Population percentage change, 2016 to 2021  Total
3  v_CA21_4                        Total private dwellings  Total
4  v_CA21_5  Private dwellings occupied by usual residents  Total

Searching for population vectors:
Reading vectors from cache...
Found 6711 vectors matching 'population'
     vector                                          label   type
0  v_CA21_1                               Population, 2021  Total
1  v_CA21_2                               Population, 2016  Total
2  v_CA21_3     Population percentage change, 2016 to 2021  Total
3  v_CA21_4                        Total private dwellings  Total
4  v_CA21_5  Private dwellings occupied by usual residents  Total
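
A broad term such as "population" matches most of the dataset (6711 of 7709 vectors above), so it often helps to narrow the results with ordinary pandas string filters on the `label` column. The sketch below uses a hand-copied version of the five rows shown above and does not call the API.

```python
import pandas as pd

# Hand-copied from the vector listing above (not a live API call).
vectors = pd.DataFrame(
    {
        "vector": ["v_CA21_1", "v_CA21_2", "v_CA21_3", "v_CA21_4", "v_CA21_5"],
        "label": [
            "Population, 2021",
            "Population, 2016",
            "Population percentage change, 2016 to 2021",
            "Total private dwellings",
            "Private dwellings occupied by usual residents",
        ],
        "type": ["Total"] * 5,
    }
)

# Keep only rows whose label starts with "population" (case-insensitive).
starts_with_pop = vectors["label"].str.lower().str.startswith("population")
pop_only = vectors[starts_with_pop]
print(pop_only["vector"].tolist())

# Literal substring match; regex=False avoids surprises with special characters.
has_dwellings = vectors["label"].str.contains("dwellings", case=False, regex=False)
dwellings = vectors[has_dwellings]
print(dwellings["vector"].tolist())
```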

Getting Census Data

Now let’s retrieve actual census data for analysis.

print("\nRetrieving Census Data:")
try:
    # Get population data for Vancouver CMA
    data = pc.get_census(
        dataset="CA21",
        regions={"CMA": "59933"},  # Vancouver CMA
        vectors=["v_CA21_1", "v_CA21_2"],  # Population vectors
        level="CSD"  # Census Subdivision level
    )
    
    print(f"Retrieved data shape: {data.shape}")
    print("\nSample data:")
    print(data.head())
    
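    # Basic analysis: returned column names can include the label
    # (e.g. "v_CA21_1: Population, 2021"), so match on the vector ID
    pop_cols = [c for c in data.columns if c.split(":")[0] == "v_CA21_1"]
    if not data.empty and pop_cols:
        total_pop = data[pop_cols[0]].sum()
        print(f"\nTotal population in Vancouver CMA: {total_pop:,.0f}")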
        
except Exception as e:
    print(f"Error retrieving census data: {e}")
Retrieving Census Data:
📋 Request Preview:
   Dataset: CA21
   Level: CSD
   Regions: 1 region(s)
   Variables: 2 vector(s)
🔍 Estimated Size: small (100 rows)
⏱️  Expected Time: < 5 seconds
🔄 Querying CensusMapper API for 1 region(s)...
📊 Retrieving 2 variable(s) at CSD level...
✅ Successfully retrieved data for 38 regions
📈 Data includes 2 vector columns
Retrieved data shape: (38, 13)

Sample data:
    GeoUID Type      Region Name  Area (sq km)  Population  Dwellings  \
0  5915001  CSD     Langley (DM)      307.2193      132603      49011   
1  5915002  CSD     Langley (CY)       10.1796       28963      13271   
2  5915004  CSD      Surrey (CY)      316.1071      568322     195098   
3  5915007  CSD  White Rock (CY)        5.1736       21939      11541   
4  5915011  CSD       Delta (CY)      179.6628      108455      39736   

   Households  rpid rgid   ruid rguid  v_CA21_1: Population, 2021  \
0       46928  5915   59  59933   NaN                    132603.0   
1       12598  5915   59  59933   NaN                     28963.0   
2      185671  5915   59  59933   NaN                    568322.0   
3       10735  5915   59  59933   NaN                     21939.0   
4       38058  5915   59  59933   NaN                    108455.0   

   v_CA21_2: Population, 2016  
0                      117285  
1                       25888  
2                      517887  
3                       19952  
4                      102238  
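
With both population vectors in one table, the change between censuses is a column subtraction. The sketch below rebuilds the five sample rows above by hand and looks up each vector column by its ID prefix, since the returned column names carry the label as well (for example `v_CA21_1: Population, 2021`). The `column_for` helper is hypothetical and exists only for this illustration.

```python
import pandas as pd

# Hand-copied from the sample rows above (not a live API call).
data = pd.DataFrame(
    {
        "Region Name": ["Langley (DM)", "Langley (CY)", "Surrey (CY)",
                        "White Rock (CY)", "Delta (CY)"],
        "v_CA21_1: Population, 2021": [132603.0, 28963.0, 568322.0, 21939.0, 108455.0],
        "v_CA21_2: Population, 2016": [117285, 25888, 517887, 19952, 102238],
    }
)


def column_for(df, vector_id):
    """Return the single column named vector_id or starting with 'vector_id:'."""
    matches = [c for c in df.columns if c == vector_id or c.startswith(vector_id + ":")]
    if len(matches) != 1:
        raise KeyError(f"expected one column for {vector_id}, found {matches}")
    return matches[0]


pop_2021 = data[column_for(data, "v_CA21_1")]
pop_2016 = data[column_for(data, "v_CA21_2")]

# Absolute and percentage change per census subdivision.
change = data[["Region Name"]].copy()
change["change"] = (pop_2021 - pop_2016).astype(int)
change["pct_change"] = (100 * (pop_2021 - pop_2016) / pop_2016).round(1)
print(change.sort_values("pct_change", ascending=False))
```

Matching on `vector_id + ":"` rather than on the bare prefix keeps `v_CA21_1` from also matching IDs such as `v_CA21_10`.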

Working with Geographic Data

pycancensus can also retrieve geographic boundaries along with the data.

print("\nRetrieving Geographic Data:")
try:
    # Get census data with geographic boundaries
    geo_data = pc.get_census(
        dataset="CA21",
        regions={"CMA": "59933"},  # Vancouver CMA
        vectors=["v_CA21_1"],  # Population
        level="CSD",
        geo_format="geopandas"
    )
    
    print(f"GeoDataFrame shape: {geo_data.shape}")
    print(f"Columns: {list(geo_data.columns)}")
    if hasattr(geo_data, 'crs'):
        print(f"Coordinate Reference System: {geo_data.crs}")
    
    # Just the geometries
    geometries = pc.get_census_geometry(
        dataset="CA21",
        regions={"CMA": "59933"},
        level="CSD"
    )
    print(f"\nGeometries-only shape: {geometries.shape}")
    
except Exception as e:
    print(f"Error retrieving geographic data: {e}")
Retrieving Geographic Data:
📋 Request Preview:
   Dataset: CA21
   Level: CSD
   Regions: 1 region(s)
   Variables: 1 vector(s)
   Geography: geopandas
🔍 Estimated Size: medium (100 rows)
⏱️  Expected Time: 5-15 seconds
Downloading 100 regions with geography... 
✅ Retrieved 38 regions with 1 variables (0.5s)
GeoDataFrame shape: (38, 16)
Columns: ['geometry', 'a', 'q', 't', 'dw', 'hh', 'id', 'pop', 'dw16', 'hh16', 'name', 'rgid', 'rpid', 'ruid', 'pop16', 'v_CA21_1: Population, 2021']
Coordinate Reference System: None
📋 Request Preview:
   Dataset: CA21
   Level: CSD
   Regions: 1 region(s)
   Geography: geopandas
🔍 Estimated Size: medium (100 rows)
⏱️  Expected Time: 5-15 seconds
Downloading 100 regions with geography... 
✅ Retrieved 38 regions (0.3s)

Geometries-only shape: (38, 15)

Vector Hierarchy Navigation

pycancensus provides tools to navigate the hierarchical structure of census variables.

print("\nVector Hierarchy Navigation:")
try:
    # Find vectors using enhanced search
    income_vectors = pc.find_census_vectors("CA21", "income")
    print(f"Found {len(income_vectors)} income-related vectors")
    
    # Navigate vector hierarchies using household income as example
    # This demonstrates a real hierarchy: main category -> income brackets -> sub-brackets
    income_parent = "v_CA21_923"  # Household total income groups in 2020
    high_income_bracket = "v_CA21_939"  # $100,000 and over bracket
    
    # Find children of main income vector (all income brackets)
    income_brackets = pc.child_census_vectors(income_parent, dataset="CA21")
    print(f"Income brackets under '{income_parent}': {len(income_brackets)} categories")
    
    # Find grandchildren (sub-categories of high income bracket)  
    high_income_subcats = pc.child_census_vectors(high_income_bracket, dataset="CA21")
    print(f"High-income sub-categories: {len(high_income_subcats)} levels")
    
    # Find parent relationship (child -> parent navigation)
    parent_of_bracket = pc.parent_census_vectors(high_income_bracket, dataset="CA21")
    if not parent_of_bracket.empty:
        print(f"Parent of '{high_income_bracket}': {parent_of_bracket['vector'].iloc[0]}")
    
except Exception as e:
    print(f"Error with vector operations: {e}")
Vector Hierarchy Navigation:
Reading vectors from cache...
Found 649 income-related vectors
Reading vectors from cache...
Income brackets under 'v_CA21_923': 16 categories
Reading vectors from cache...
High-income sub-categories: 4 levels
Reading vectors from cache...
Parent of 'v_CA21_939': v_CA21_923
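
Conceptually, a hierarchy like this can be stored as a table in which each vector records its parent. The sketch below illustrates that idea on a small made-up table. The vector IDs, labels, the `parent_vector` column name, and the `children_of` helper are all assumptions for illustration; they are not taken from the CA21 dataset or from how pycancensus implements these functions.

```python
import pandas as pd

# A made-up hierarchy: total -> brackets -> sub-brackets.
toy_vectors = pd.DataFrame(
    {
        "vector": ["v_demo_1", "v_demo_2", "v_demo_3", "v_demo_4", "v_demo_5"],
        "label": ["Total", "Under $100,000", "$100,000 and over",
                  "$100,000 to $149,999", "$150,000 and over"],
        "parent_vector": [None, "v_demo_1", "v_demo_1", "v_demo_3", "v_demo_3"],
    }
)


def children_of(df, vector_id, recursive=False):
    """Return IDs of direct children, or of all descendants if recursive is True."""
    direct = df.loc[df["parent_vector"] == vector_id, "vector"].tolist()
    if not recursive:
        return direct
    found = []
    for child in direct:
        found.append(child)
        found.extend(children_of(df, child, recursive=True))
    return found


direct_children = children_of(toy_vectors, "v_demo_1")
all_descendants = children_of(toy_vectors, "v_demo_1", recursive=True)
print(direct_children)
print(all_descendants)

# Child -> parent lookup is a single row selection.
parent = toy_vectors.set_index("vector").loc["v_demo_4", "parent_vector"]
print(parent)
```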

Extracting Vector Metadata

The label_vectors() function extracts the vector IDs and their descriptive labels from a DataFrame returned by get_census().

print("\nExtracting Vector Metadata:")
try:
    # Get census data with vectors
    census_with_vectors = pc.get_census(
        dataset="CA21",
        regions={"PR": "59"},  # British Columbia
        vectors=["v_CA21_1", "v_CA21_2", "v_CA21_3"],
        level="PR",
        labels="detailed"
    )

    # Extract vector labels and metadata
    vector_labels = pc.label_vectors(census_with_vectors)

    print("Vector metadata extracted from census data:")
    # label_vectors() returns a table with "Vector" and "Detail" columns,
    # so iterate over its rows rather than its columns
    for _, row in vector_labels.iterrows():
        print(f"  {row['Vector']}: {row['Detail']}")

except Exception as e:
    print(f"Error extracting vector metadata: {e}")
Extracting Vector Metadata:
📋 Request Preview:
   Dataset: CA21
   Level: PR
   Regions: 1 region(s)
   Variables: 3 vector(s)
🔍 Estimated Size: small (1 rows)
⏱️  Expected Time: < 5 seconds
🔄 Querying CensusMapper API for 1 region(s)...
📊 Retrieving 3 variable(s) at PR level...
✅ Successfully retrieved data for 1 regions
📈 Data includes 3 vector columns
Vector metadata extracted from census data:
  v_CA21_1: Population, 2021
  v_CA21_2: Population, 2016
  v_CA21_3: Population percentage change, 2016 to 2021
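
The label table has a `Vector` column and a `Detail` column, as the output above shows, so it converts naturally into a lookup dictionary. The sketch below uses a hand-copied version of that table and a toy data frame whose columns are bare vector IDs. Whether your own result uses bare IDs as column names depends on how the data was requested, so treat the renaming step as an assumption to check against your columns.

```python
import pandas as pd

# Hand-copied from the label table above (not a live API call).
vector_labels = pd.DataFrame(
    {
        "Vector": ["v_CA21_1", "v_CA21_2", "v_CA21_3"],
        "Detail": [
            "Population, 2021",
            "Population, 2016",
            "Population percentage change, 2016 to 2021",
        ],
    }
)

# Build {vector ID: label} for quick lookups.
label_map = dict(zip(vector_labels["Vector"], vector_labels["Detail"]))
print(label_map["v_CA21_3"])

# Toy one-row table (British Columbia, 2021 population from the region
# listing earlier) with a bare vector ID as its column name.
toy = pd.DataFrame({"GeoUID": ["59"], "v_CA21_1": [5000879.0]})

# rename() leaves columns that are not in the mapping untouched.
readable = toy.rename(columns=label_map)
print(list(readable.columns))
```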

Dataset Attribution

dataset_attribution() returns the attribution text for one or more census datasets. Include it when you publish results, since the Statistics Canada Open Licence requires acknowledging the source.

print("\nDataset Attribution:")
try:
    # Get attribution for a single dataset
    single_attribution = pc.dataset_attribution(["CA21"])
    print(f"\nCA21 Attribution:\n{single_attribution}")

    # Get combined attribution for multiple datasets
    multi_attribution = pc.dataset_attribution(["CA16", "CA21"])
    print(f"\nCombined Attribution (CA16 + CA21):\n{multi_attribution}")

except Exception as e:
    print(f"Error getting dataset attribution: {e}")
Dataset Attribution:

CA21 Attribution:
['StatCan 2021 Census']

Combined Attribution (CA16 + CA21):
['StatCan 2016, 2021 Census']
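
The attribution comes back as a list of strings, as the output above shows, so it usually needs joining before it goes into a figure caption or footnote. A minimal sketch, using the printed value rather than a live call; the `format_source` helper and the "Source:" wording are illustrative choices, not something pycancensus prescribes.

```python
def format_source(attribution, prefix="Source: "):
    """Join a list of attribution strings into one caption line."""
    parts = [text.strip() for text in attribution if text and text.strip()]
    return prefix + "; ".join(parts) if parts else ""


# Value copied from the combined CA16 + CA21 output above.
caption = format_source(["StatCan 2016, 2021 Census"])
print(caption)
```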

Summary

This example covered the basic workflow for accessing Canadian Census data:

  1. Setup: Import pycancensus and set your API key

  2. Explore: Discover available datasets, regions, and variables

  3. Retrieve: Get census data for your areas and variables of interest

  4. Analyze: Work with the data using pandas/geopandas workflows

For more advanced examples, see the other gallery examples and tutorials.

print("\n" + "="*50)
print("Basic Census Data Access Example Complete")
print("="*50)
print("\nNext steps:")
print("1. Get your free API key at: https://censusmapper.ca/users/sign_up")
print("2. Set your API key: pc.set_api_key('your_key_here')")
print("3. Try running this example with real data!")
print("4. Explore the other examples in the gallery")
==================================================
Basic Census Data Access Example Complete
==================================================

Next steps:
1. Get your free API key at: https://censusmapper.ca/users/sign_up
2. Set your API key: pc.set_api_key('your_key_here')
3. Try running this example with real data!
4. Explore the other examples in the gallery