Basic Census Data Access
This example shows how to access Canadian Census data with pycancensus. It covers the core functions you need to get started: listing datasets, regions, and vectors; retrieving data and boundaries; labelling vectors; and generating attribution text.
Setting up pycancensus
First, import pycancensus and configure your API key. This example reads the key from the CANCENSUS_API_KEY environment variable. You can get a free API key at: https://censusmapper.ca/users/sign_up
import os

import pandas as pd
import pycancensus as pc

# Read the API key from the CANCENSUS_API_KEY environment variable
api_key = os.environ.get('CANCENSUS_API_KEY')
if api_key:
    pc.set_api_key(api_key)
    print("API key configured")
else:
    print("No API key found - the API calls below will print errors")
    print("Get your API key at: https://censusmapper.ca/users/sign_up")
API key set for current session.
API key configured
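If you would rather stop immediately than continue without a key, a small guard can replace the if/else above. This is a minimal sketch: require_api_key is a hypothetical helper written for this example, not part of pycancensus, and it assumes the key lives in the CANCENSUS_API_KEY environment variable as above.

```python
import os


def require_api_key(env_var: str = "CANCENSUS_API_KEY") -> str:
    """Return the CensusMapper API key from env_var, or raise if it is missing.

    Hypothetical helper for this example; it is not part of pycancensus.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; get a free key at "
            "https://censusmapper.ca/users/sign_up"
        )
    return key
```

Usage would then be a single line, pc.set_api_key(require_api_key()), so a missing key fails before any API call is attempted.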
Exploring Available Datasets
Let’s start by exploring what Census datasets are available.
print("Available Census Datasets:")
try:
    datasets = pc.list_census_datasets()
    print(datasets)
except Exception as e:
    print(f"Error accessing datasets: {e}")
    print("Make sure you have set your API key!")
Available Census Datasets:
Querying CensusMapper API for available datasets...
Retrieved 29 datasets
dataset description geo_dataset \
0 CA1996 1996 Canada Census CA1996
1 CA01 2001 Canada Census CA01
2 CA06 2006 Canada Census CA06
3 CA11 2011 Canada Census and NHS CA11
4 CA16 2016 Canada Census CA16
5 CA21 2021 Canada Census CA21
6 CA01xSD 2001 Canada Census xtab - Structural type by D... CA01
7 CA06xSD 2006 Canada Census xtab - Structural type by D... CA06
8 CA11xSD 2011 Canada Census xtab - Structural type by D... CA11
9 CA16xSD 2016 Canada Census xtab - Structural type by D... CA16
10 TX2000 2000 T1FF taxfiler data CA1996
11 TX2001 2001 T1FF taxfiler data CA01
12 TX2002 2002 T1FF taxfiler data CA01
13 TX2003 2003 T1FF taxfiler data CA01
14 TX2004 2004 T1FF taxfiler data CA01
15 TX2005 2005 T1FF taxfiler data CA01
16 TX2006 2006 T1FF taxfiler data CA06
17 TX2007 2007 T1FF taxfiler data CA06
18 TX2008 2008 T1FF taxfiler data CA06
19 TX2009 2009 T1FF taxfiler data CA06
20 TX2010 2010 T1FF taxfiler data CA06
21 TX2011 2011 T1FF taxfiler data CA06
22 TX2012 2012 T1FF taxfiler data CA11
23 TX2013 2013 T1FF taxfiler data CA11
24 TX2014 2014 T1FF taxfiler data CA11
25 TX2015 2015 T1FF taxfiler data CA11
26 TX2016 2016 T1FF taxfiler data CA16
27 TX2017 2017 T1FF taxfiler data CA16
28 TX2018 2018 T1FF taxfiler data CA16
attribution reference \
0 StatCan 1996 Census 92-351-U
1 StatCan 2001 Census 92-378-X
2 StatCan 2006 Census 92-566-X
3 StatCan 2011 Census and NHS 98-301-X, 99-000-X
4 StatCan 2016 Census 98-301-X
5 StatCan 2021 Census 98-301-X
6 StatCan 2001 Census xtab, via CMHC 92-378-X
7 StatCan 2006 Census xtab, via CMHC 92-566-X
8 StatCan 2011 Census xtab, via CMHC 98-301-X
9 StatCan 2016 Census xtab, via CMHC 98-301-X
10 StatCan 2000 T1FF, via CMHC 72-212-X
11 StatCan 2001 T1FF, via CMHC 72-212-X
12 StatCan 2002 T1FF, via CMHC 72-212-X
13 StatCan 2003 T1FF, via CMHC 72-212-X
14 StatCan 2004 T1FF, via CMHC 72-212-X
15 StatCan 2005 T1FF, via CMHC 72-212-X
16 StatCan 2006 T1FF, via CMHC 72-212-X
17 StatCan 2007 T1FF, via CMHC 72-212-X
18 StatCan 2008 T1FF, via CMHC 72-212-X
19 StatCan 2009 T1FF, via CMHC 72-212-X
20 StatCan 2010 T1FF, via CMHC 72-212-X
21 StatCan 2011 T1FF, via CMHC 72-212-X
22 StatCan 2012 T1FF, via CMHC 72-212-X
23 StatCan 2013 T1FF, via CMHC 72-212-X
24 StatCan 2014 T1FF, via CMHC 72-212-X
25 StatCan 2015 T1FF, via CMHC 72-212-X
26 StatCan 2016 T1FF, via CMHC 72-212-X
27 StatCan 2017 T1FF, via CMHC 72-212-X
28 StatCan 2018 T1FF, via CMHC 72-212-X
reference_url
0 https://www150.statcan.gc.ca/n1/en/catalogue/9...
1 https://www150.statcan.gc.ca/n1/en/catalogue/9...
2 https://www150.statcan.gc.ca/n1/en/catalogue/9...
3 https://www12.statcan.gc.ca/census-recensement...
4 https://www150.statcan.gc.ca/n1/en/catalogue/9...
5 https://www150.statcan.gc.ca/n1/en/catalogue/9...
6 https://www150.statcan.gc.ca/n1/en/catalogue/9...
7 https://www150.statcan.gc.ca/n1/en/catalogue/9...
8 https://www150.statcan.gc.ca/n1/en/catalogue/9...
9 https://www150.statcan.gc.ca/n1/en/catalogue/9...
10 https://www150.statcan.gc.ca/n1/en/catalogue/7...
11 https://www150.statcan.gc.ca/n1/en/catalogue/7...
12 https://www150.statcan.gc.ca/n1/en/catalogue/7...
13 https://www150.statcan.gc.ca/n1/en/catalogue/7...
14 https://www150.statcan.gc.ca/n1/en/catalogue/7...
15 https://www150.statcan.gc.ca/n1/en/catalogue/7...
16 https://www150.statcan.gc.ca/n1/en/catalogue/7...
17 https://www150.statcan.gc.ca/n1/en/catalogue/7...
18 https://www150.statcan.gc.ca/n1/en/catalogue/7...
19 https://www150.statcan.gc.ca/n1/en/catalogue/7...
20 https://www150.statcan.gc.ca/n1/en/catalogue/7...
21 https://www150.statcan.gc.ca/n1/en/catalogue/7...
22 https://www150.statcan.gc.ca/n1/en/catalogue/7...
23 https://www150.statcan.gc.ca/n1/en/catalogue/7...
24 https://www150.statcan.gc.ca/n1/en/catalogue/7...
25 https://www150.statcan.gc.ca/n1/en/catalogue/7...
26 https://www150.statcan.gc.ca/n1/en/catalogue/7...
27 https://www150.statcan.gc.ca/n1/en/catalogue/7...
28 https://www150.statcan.gc.ca/n1/en/catalogue/7...
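The table mixes census releases with cross-tabulations (the xSD datasets) and T1FF taxfiler datasets. The geo_dataset column shows which census geography each one uses; TX2018, for instance, uses CA16 boundaries. The pandas sketch below pulls out only the census releases and looks up a dataset's geography. In a live session, run it on the datasets DataFrame printed above. Here, a few rows are copied from that output so the snippet runs offline.

```python
import pandas as pd

# A few rows copied from the list_census_datasets() output above
datasets = pd.DataFrame({
    "dataset": ["CA1996", "CA16", "CA21", "CA16xSD", "TX2017", "TX2018"],
    "geo_dataset": ["CA1996", "CA16", "CA21", "CA16", "CA16", "CA16"],
})

# Census releases are "CA" followed only by digits (CA1996, CA01, ..., CA21)
census_only = datasets[datasets["dataset"].str.fullmatch(r"CA\d+")]
print(census_only["dataset"].tolist())

# Which census geography does a given dataset use?
geo_for = datasets.set_index("dataset")["geo_dataset"]
print(geo_for["TX2018"])
```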
Finding Census Regions
Next, let’s explore the geographic regions available in the Census.
print("\nExploring Census Regions:")
try:
    # Get regions for the 2021 Census
    regions = pc.list_census_regions("CA21")
    print(f"Found {len(regions)} regions in CA21 dataset")
    print("\nSample regions:")
    print(regions.head())

    # Search for specific regions (Vancouver)
    print("\nSearching for Vancouver regions:")
    vancouver_regions = pc.search_census_regions("Vancouver", "CA21")
    print(vancouver_regions[["region", "name", "level", "pop"]].head())
except Exception as e:
    print(f"Error accessing regions: {e}")
Exploring Census Regions:
Querying CensusMapper API for CA21 regions...
Retrieved 5518 regions
Found 5518 regions in CA21 dataset
Sample regions:
name region level pop municipal_status CMA_UID \
0 Canada 1 C 36991981.0 NaN NaN
1 Ontario 35 PR 14223942.0 Ont. NaN
2 Quebec 24 PR 8501833.0 Que. NaN
3 British Columbia 59 PR 5000879.0 B.C. NaN
4 Alberta 48 PR 4262635.0 Alta. NaN
CD_UID PR_UID
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
Searching for Vancouver regions:
Reading regions from cache...
Found 7 regions matching 'Vancouver'
region name level pop
16 59933 Vancouver CMA 2642825.0
65 5915 Greater Vancouver CD 2642825.0
364 5915022 Vancouver CSD 662248.0
425 5915046 North Vancouver CSD 88168.0
452 5915051 North Vancouver CSD 58120.0
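Region names are not unique. The search above returns both a CMA and a CSD named Vancouver, plus two different CSDs named North Vancouver (5915046 and 5915051). It is therefore safer to pass region codes to get_census(). The sketch below builds a regions dictionary in the {"CMA": "59933"} form used later in this example, and it refuses ambiguous matches. The rows are copied from the search output above, and regions_for is a hypothetical helper, not a pycancensus function.

```python
import pandas as pd

# Rows copied from the search_census_regions("Vancouver", "CA21") output above
vancouver_regions = pd.DataFrame({
    "region": ["59933", "5915", "5915022", "5915046", "5915051"],
    "name": ["Vancouver", "Greater Vancouver", "Vancouver", "North Vancouver", "North Vancouver"],
    "level": ["CMA", "CD", "CSD", "CSD", "CSD"],
    "pop": [2642825.0, 2642825.0, 662248.0, 88168.0, 58120.0],
})

# Names that occur more than once among the matches
dupes = vancouver_regions.loc[vancouver_regions["name"].duplicated(keep=False), "name"]
print(sorted(dupes.unique()))


def regions_for(matches: pd.DataFrame, level: str) -> dict:
    """Hypothetical helper: build a {level: code} dict from exactly one match."""
    codes = matches.loc[matches["level"] == level, "region"].astype(str)
    if len(codes) != 1:
        raise ValueError(f"expected one {level} match, found {len(codes)}")
    return {level: codes.iloc[0]}


print(regions_for(vancouver_regions, "CMA"))
```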
Discovering Census Variables
CensusMapper organizes census variables as vectors, each identified by an ID such as v_CA21_1 (Population, 2021). Let's explore what's available.
print("\nExploring Census Variables:")
try:
    # List available vectors
    vectors = pc.list_census_vectors("CA21")
    print(f"Found {len(vectors)} vectors in CA21 dataset")
    print("\nSample vectors:")
    print(vectors[["vector", "label", "type"]].head())

    # Search for population-related vectors
    print("\nSearching for population vectors:")
    pop_vectors = pc.search_census_vectors("population", "CA21")
    print(pop_vectors[["vector", "label", "type"]].head())
except Exception as e:
    print(f"Error accessing vectors: {e}")
Exploring Census Variables:
🔍 Querying CensusMapper API for CA21 vectors...
✅ Retrieved 7709 vectors for CA21
📊 Large dataset: 7709 variables available
Found 7709 vectors in CA21 dataset
Sample vectors:
vector label type
0 v_CA21_1 Population, 2021 Total
1 v_CA21_2 Population, 2016 Total
2 v_CA21_3 Population percentage change, 2016 to 2021 Total
3 v_CA21_4 Total private dwellings Total
4 v_CA21_5 Private dwellings occupied by usual residents Total
Searching for population vectors:
Reading vectors from cache...
Found 6711 vectors matching 'population'
vector label type
0 v_CA21_1 Population, 2021 Total
1 v_CA21_2 Population, 2016 Total
2 v_CA21_3 Population percentage change, 2016 to 2021 Total
3 v_CA21_4 Total private dwellings Total
4 v_CA21_5 Private dwellings occupied by usual residents Total
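The search for "population" matched 6,711 of the 7,709 vectors. The matches include dwelling counts such as v_CA21_4, so the search appears to consider more than the label text. One quick way to narrow the results is to filter the label column locally with pandas. This sketch copies the five rows printed above; in a live session, apply the same mask to pop_vectors.

```python
import pandas as pd

# The five rows printed above from search_census_vectors("population", "CA21")
pop_vectors = pd.DataFrame({
    "vector": ["v_CA21_1", "v_CA21_2", "v_CA21_3", "v_CA21_4", "v_CA21_5"],
    "label": [
        "Population, 2021",
        "Population, 2016",
        "Population percentage change, 2016 to 2021",
        "Total private dwellings",
        "Private dwellings occupied by usual residents",
    ],
    "type": ["Total"] * 5,
})

# Keep only vectors whose own label mentions population (case-insensitive)
mask = pop_vectors["label"].str.contains("population", case=False, regex=False)
print(pop_vectors.loc[mask, ["vector", "label"]])
```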
Getting Census Data
Now let’s retrieve actual census data for analysis.
print("\nRetrieving Census Data:")
try:
    # Get 2021 and 2016 population counts for every CSD in the Vancouver CMA
    data = pc.get_census(
        dataset="CA21",
        regions={"CMA": "59933"},  # Vancouver CMA
        vectors=["v_CA21_1", "v_CA21_2"],  # Population, 2021 and Population, 2016
        level="CSD"  # Census Subdivision level
    )
    print(f"Retrieved data shape: {data.shape}")
    print("\nSample data:")
    print(data.head())

    # Basic analysis. Vector columns carry their labels
    # (e.g. "v_CA21_1: Population, 2021"), so match on the vector ID before the colon.
    pop_cols = [c for c in data.columns if str(c).split(":")[0] == "v_CA21_1"]
    if not data.empty and pop_cols:
        total_pop = data[pop_cols[0]].sum()
        print(f"\nTotal population in Vancouver CMA: {total_pop:,.0f}")
except Exception as e:
    print(f"Error retrieving census data: {e}")
Retrieving Census Data:
📋 Request Preview:
Dataset: CA21
Level: CSD
Regions: 1 region(s)
Variables: 2 vector(s)
🔍 Estimated Size: small (100 rows)
⏱️ Expected Time: < 5 seconds
🔄 Querying CensusMapper API for 1 region(s)...
📊 Retrieving 2 variable(s) at CSD level...
✅ Successfully retrieved data for 38 regions
📈 Data includes 2 vector columns
Retrieved data shape: (38, 13)
Sample data:
GeoUID Type Region Name Area (sq km) Population Dwellings \
0 5915001 CSD Langley (DM) 307.2193 132603 49011
1 5915002 CSD Langley (CY) 10.1796 28963 13271
2 5915004 CSD Surrey (CY) 316.1071 568322 195098
3 5915007 CSD White Rock (CY) 5.1736 21939 11541
4 5915011 CSD Delta (CY) 179.6628 108455 39736
Households rpid rgid ruid rguid v_CA21_1: Population, 2021 \
0 46928 5915 59 59933 NaN 132603.0
1 12598 5915 59 59933 NaN 28963.0
2 185671 5915 59 59933 NaN 568322.0
3 10735 5915 59 59933 NaN 21939.0
4 38058 5915 59 59933 NaN 108455.0
v_CA21_2: Population, 2016
0 117285
1 25888
2 517887
3 19952
4 102238
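With detailed labels, the vector columns have names like "v_CA21_1: Population, 2021", so selecting them by vector ID needs a small lookup. The sketch below computes 2016 to 2021 population growth per CSD. vector_column is a hypothetical helper, and the five rows are copied from the sample output above.

```python
import pandas as pd

# Columns and values copied from the sample output above
data = pd.DataFrame({
    "Region Name": ["Langley (DM)", "Langley (CY)", "Surrey (CY)", "White Rock (CY)", "Delta (CY)"],
    "v_CA21_1: Population, 2021": [132603.0, 28963.0, 568322.0, 21939.0, 108455.0],
    "v_CA21_2: Population, 2016": [117285, 25888, 517887, 19952, 102238],
})


def vector_column(df: pd.DataFrame, vector: str) -> str:
    """Hypothetical helper: find the column for a vector ID, labelled or not.

    Matching "v_CA21_1:" (with the colon) avoids also matching v_CA21_10, v_CA21_11, ...
    """
    for col in df.columns:
        if col == vector or col.startswith(vector + ":"):
            return col
    raise KeyError(vector)


pop21 = data[vector_column(data, "v_CA21_1")]
pop16 = data[vector_column(data, "v_CA21_2")]
data["growth_pct"] = (pop21 - pop16) / pop16 * 100
print(data.sort_values("growth_pct", ascending=False)[["Region Name", "growth_pct"]].round(1))
```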
Working with Geographic Data
pycancensus can also retrieve geographic boundaries along with the data.
print("\nRetrieving Geographic Data:")
try:
    # Get census data with geographic boundaries
    geo_data = pc.get_census(
        dataset="CA21",
        regions={"CMA": "59933"},  # Vancouver CMA
        vectors=["v_CA21_1"],  # Population, 2021
        level="CSD",
        geo_format="geopandas"
    )
    print(f"GeoDataFrame shape: {geo_data.shape}")
    print(f"Columns: {list(geo_data.columns)}")
    if hasattr(geo_data, 'crs'):
        print(f"Coordinate Reference System: {geo_data.crs}")

    # Just the geometries
    geometries = pc.get_census_geometry(
        dataset="CA21",
        regions={"CMA": "59933"},
        level="CSD"
    )
    print(f"\nGeometries-only shape: {geometries.shape}")
except Exception as e:
    print(f"Error retrieving geographic data: {e}")
Retrieving Geographic Data:
📋 Request Preview:
Dataset: CA21
Level: CSD
Regions: 1 region(s)
Variables: 1 vector(s)
Geography: geopandas
🔍 Estimated Size: medium (100 rows)
⏱️ Expected Time: 5-15 seconds
Downloading 100 regions with geography...
✅ Retrieved 38 regions with 1 variables (0.5s)
GeoDataFrame shape: (38, 16)
Columns: ['geometry', 'a', 'q', 't', 'dw', 'hh', 'id', 'pop', 'dw16', 'hh16', 'name', 'rgid', 'rpid', 'ruid', 'pop16', 'v_CA21_1: Population, 2021']
Coordinate Reference System: EPSG:4326
📋 Request Preview:
Dataset: CA21
Level: CSD
Regions: 1 region(s)
Geography: geopandas
🔍 Estimated Size: medium (100 rows)
⏱️ Expected Time: 5-15 seconds
Downloading 100 regions with geography...
✅ Retrieved 38 regions (0.3s)
Geometries-only shape: (38, 15)
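The boundaries come back in EPSG:4326 (longitude/latitude degrees), so any polygon areas computed directly from the geometry column would be in square degrees. If you need areas from the shapes, reproject first, for example with GeoDataFrame.to_crs to an equal-area projection such as EPSG:3005 (BC Albers). For a quick population density table, the tabular get_census() result already includes an "Area (sq km)" column. Below is a minimal sketch using the five CSD rows shown earlier; the density column name is an arbitrary choice.

```python
import pandas as pd

# Columns and values copied from the CSD sample output of get_census() above
data = pd.DataFrame({
    "Region Name": ["Langley (DM)", "Langley (CY)", "Surrey (CY)", "White Rock (CY)", "Delta (CY)"],
    "Area (sq km)": [307.2193, 10.1796, 316.1071, 5.1736, 179.6628],
    "Population": [132603, 28963, 568322, 21939, 108455],
})

# People per square kilometre
data["density_per_sq_km"] = data["Population"] / data["Area (sq km)"]
print(data.sort_values("density_per_sq_km", ascending=False).round(1))
```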
Extracting Vector Metadata
The label_vectors() function extracts metadata for census vectors from DataFrames returned by get_census().
print("\nExtracting Vector Metadata:")
try:
    # Get census data with vectors
    census_with_vectors = pc.get_census(
        dataset="CA21",
        regions={"PR": "59"},  # British Columbia
        vectors=["v_CA21_1", "v_CA21_2", "v_CA21_3"],
        level="PR",
        labels="detailed"
    )

    # label_vectors() returns a DataFrame with "Vector" and "Detail" columns
    vector_labels = pc.label_vectors(census_with_vectors)
    print("Vector metadata extracted from census data:")
    for _, row in vector_labels.iterrows():
        print(f"  {row['Vector']}: {row['Detail'][:60]}")
except Exception as e:
    print(f"Error extracting vector metadata: {e}")
Extracting Vector Metadata:
📋 Request Preview:
Dataset: CA21
Level: PR
Regions: 1 region(s)
Variables: 3 vector(s)
🔍 Estimated Size: small (1 rows)
⏱️ Expected Time: < 5 seconds
🔄 Querying CensusMapper API for 1 region(s)...
📊 Retrieving 3 variable(s) at PR level...
✅ Successfully retrieved data for 1 regions
📈 Data includes 3 vector columns
Vector metadata extracted from census data:
  v_CA21_1: Population, 2021
  v_CA21_2: Population, 2016
  v_CA21_3: Population percentage change, 2016 to 2021
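Because label_vectors() returns a small DataFrame, it converts easily into a lookup table. That pairs well with renaming the long detailed-label columns down to bare vector IDs. Here is a sketch with hand-built frames that match the shapes shown above; the lookup and renaming steps are plain pandas, not pycancensus functions.

```python
import pandas as pd

# Shaped like the label_vectors() result above: one row per vector
vector_labels = pd.DataFrame({
    "Vector": ["v_CA21_1", "v_CA21_2", "v_CA21_3"],
    "Detail": [
        "Population, 2021",
        "Population, 2016",
        "Population percentage change, 2016 to 2021",
    ],
})
label_lookup = dict(zip(vector_labels["Vector"], vector_labels["Detail"]))

# Detailed-label columns as returned by get_census(), e.g. "v_CA21_1: Population, 2021"
data = pd.DataFrame({
    "GeoUID": ["5915001"],
    "v_CA21_1: Population, 2021": [132603.0],
    "v_CA21_2: Population, 2016": [117285],
})

# Keep only the text before the first colon, so vector columns become bare IDs
short = data.rename(columns=lambda c: c.split(":", 1)[0])
print(list(short.columns))
print(label_lookup["v_CA21_2"])
```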
Dataset Attribution
Get proper attribution text for census datasets to comply with Statistics Canada Open Data License requirements.
print("\nDataset Attribution:")
try:
    # Get attribution for a single dataset
    single_attribution = pc.dataset_attribution(["CA21"])
    print(f"\nCA21 Attribution:\n{single_attribution}")

    # Get combined attribution for multiple datasets
    multi_attribution = pc.dataset_attribution(["CA16", "CA21"])
    print(f"\nCombined Attribution (CA16 + CA21):\n{multi_attribution}")
except Exception as e:
    print(f"Error getting dataset attribution: {e}")
Dataset Attribution:
CA21 Attribution:
['StatCan 2021 Census']
Combined Attribution (CA16 + CA21):
['StatCan 2016, 2021 Census']
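As printed above, dataset_attribution() returns a list of strings, so it needs joining before it can go into a plot caption or report footer. This is a minimal sketch: attribution_caption and its "Source: " prefix are hypothetical choices, not part of pycancensus.

```python
def attribution_caption(attribution: list[str], prefix: str = "Source: ") -> str:
    """Join dataset_attribution() strings into a single caption line."""
    return prefix + "; ".join(attribution)


# Value printed above for pc.dataset_attribution(["CA16", "CA21"])
multi_attribution = ["StatCan 2016, 2021 Census"]
print(attribution_caption(multi_attribution))
```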
Summary
This example covered the basic workflow for accessing Canadian Census data:
Setup: Import pycancensus and set your API key
Explore: Discover available datasets, regions, and variables
Retrieve: Get census data for your areas and variables of interest
Analyze: Work with the data using pandas/geopandas workflows
For more advanced examples, see the other gallery examples and tutorials.
print("\n" + "="*50)
print("Basic Census Data Access Example Complete")
print("="*50)
print("\nNext steps:")
print("1. Get your free API key at: https://censusmapper.ca/users/sign_up")
print("2. Set your API key: pc.set_api_key('your_key_here')")
print("3. Try running this example with real data!")
print("4. Explore the other examples in the gallery")
==================================================
Basic Census Data Access Example Complete
==================================================
Next steps:
1. Get your free API key at: https://censusmapper.ca/users/sign_up
2. Set your API key: pc.set_api_key('your_key_here')
3. Try running this example with real data!
4. Explore the other examples in the gallery