Migration Guide: R to Python

Complete guide for R cancensus users migrating to Python pycancensus.

Quick Start

Installation

R:

install.packages("cancensus")
library(cancensus)

Python:

pip install pycancensus
import pycancensus as pc

API Key Setup

R:

set_cancensus_api_key("YOUR_API_KEY", install = TRUE)

Python:

pc.set_api_key("YOUR_API_KEY", install=True)
# Or: export CANCENSUS_API_KEY="YOUR_API_KEY"

Function Equivalence

All Core Functions Available

R Function                  Python Function             Equivalence
get_census()                get_census()                ✅ 100%
list_census_datasets()      list_census_datasets()      ✅ 100%
list_census_vectors()       list_census_vectors()       ✅ 100%
search_census_vectors()     search_census_vectors()     ✅ 100%
find_census_vectors()       find_census_vectors()       ✅ 100%
parent_census_vectors()     parent_census_vectors()     ✅ 100%
child_census_vectors()      child_census_vectors()      ✅ 100%
dataset_attribution()       dataset_attribution()       ✅ 100%
label_vectors()             label_vectors()             ✅ 100%
list_cancensus_cache()      list_cache()                ✅ 100%

Syntax Conversion

Core Syntax Differences

R Syntax                Python Syntax
list(CMA = "59933")     {'CMA': '59933'}
c("v1", "v2", "v3")     ['v1', 'v2', 'v3']
TRUE / FALSE            True / False
NULL                    None

Side-by-Side Examples

Example 1: Basic Data Retrieval

R:

library(cancensus)

census_data <- get_census(
  dataset = 'CA21',
  regions = list(CMA = "59933"),
  vectors = c("v_CA21_906"),
  level = 'CSD'
)

Python:

import pycancensus as pc

census_data = pc.get_census(
    dataset='CA21',
    regions={'CMA': '59933'},
    vectors=['v_CA21_906'],
    level='CSD'
)

Example 2: With Geography

R:

census_data <- get_census(
  dataset = 'CA21',
  regions = list(CMA = "35535"),
  vectors = c("v_CA21_906"),
  level = 'CSD',
  geo_format = 'sf'
)

Python:

census_data = pc.get_census(
    dataset='CA21',
    regions={'CMA': '35535'},
    vectors=['v_CA21_906'],
    level='CSD',
    geo_format='geopandas'  # returns a geopandas.GeoDataFrame
)

Example 3: Search Vectors

R:

income_vectors <- search_census_vectors("income", "CA21")

Python:

income_vectors = pc.search_census_vectors("income", "CA21")

Example 4: List Datasets

R:

datasets <- list_census_datasets()

Python:

datasets = pc.list_census_datasets()

Return Type Conversions

Data Structures

R Type                  Python Type               Notes
data.frame / tibble     pandas.DataFrame          Direct equivalent
sf object               geopandas.GeoDataFrame    Same spatial data
list                    list                      Direct equivalent
character               str                       Direct equivalent

Working with Results

R:

# Filter data
filtered <- census_data %>%
  filter(Population > 50000)

# Select columns
selected <- census_data %>%
  select(GeoUID, Population)

Python:

# Filter data
filtered = census_data[census_data['Population'] > 50000]

# Select columns
selected = census_data[['GeoUID', 'Population']]

Visualization Migration

Mapping

R (using ggplot2 + sf):

library(ggplot2)
library(sf)

ggplot(census_data) +
  geom_sf(aes(fill = v_CA21_906)) +
  scale_fill_viridis_c() +
  theme_minimal()

Python (using matplotlib + geopandas):

import matplotlib.pyplot as plt

census_data.plot(
    column='v_CA21_906',
    cmap='viridis',
    legend=True
)
plt.show()

Python (using plotly for interactive):

import plotly.express as px

fig = px.choropleth_mapbox(
    census_data,
    geojson=census_data.geometry,
    locations=census_data.index,
    color='v_CA21_906',
    mapbox_style='carto-positron'
)
fig.show()

Charts

R (ggplot2):

ggplot(census_data, aes(x = `Region Name`, y = Population)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45))

Python (matplotlib):

import matplotlib.pyplot as plt

census_data.plot.bar(x='Region Name', y='Population')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Python (plotly):

import plotly.express as px

fig = px.bar(census_data, x='Region Name', y='Population')
fig.show()

Common Migration Patterns

Pattern 1: Data Pipeline

R:

library(dplyr)
library(cancensus)

result <- get_census(
  dataset = 'CA21',
  regions = list(CMA = "35535"),
  vectors = c("v_CA21_906"),
  level = 'CSD'
) %>%
  filter(Population > 50000) %>%
  arrange(desc(v_CA21_906))

Python:

import pycancensus as pc

result = (
    pc.get_census(
        dataset='CA21',
        regions={'CMA': '35535'},
        vectors=['v_CA21_906'],
        level='CSD'
    )
    .query('Population > 50000')
    .sort_values('v_CA21_906', ascending=False)
)

Pattern 2: Multiple Regions

R:

regions <- list(
  CMA = c("59933", "35535", "24462")
)

data <- get_census(
  dataset = 'CA21',
  regions = regions,
  vectors = c("v_CA21_906"),
  level = 'CSD'
)

Python:

regions = {
    'CMA': ['59933', '35535', '24462']
}

data = pc.get_census(
    dataset='CA21',
    regions=regions,
    vectors=['v_CA21_906'],
    level='CSD'
)
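
When region IDs arrive as a flat list of (level, ID) pairs, for example from a spreadsheet, they need to be grouped into the dictionary shape shown above. The following is a minimal sketch of that grouping; build_regions is a hypothetical helper written for this guide, not part of pycancensus.

```python
from collections import defaultdict


def build_regions(pairs):
    """Group (level, region_id) pairs into a regions dict.

    Hypothetical helper for illustration; not part of pycancensus.
    """
    grouped = defaultdict(list)
    for level, region_id in pairs:
        # Keep region IDs as strings, as in the examples in this guide.
        grouped[level].append(str(region_id))
    # Collapse single-element lists to a bare string, matching the
    # {'CMA': '59933'} form used elsewhere in this guide.
    return {
        level: ids[0] if len(ids) == 1 else ids
        for level, ids in grouped.items()
    }


regions = build_regions([
    ('CMA', '59933'),
    ('CMA', '35535'),
    ('CMA', '24462'),
])
print(regions)  # {'CMA': ['59933', '35535', '24462']}
```

The resulting dictionary can be passed as the regions argument in the same way as the hand-written one.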

Pattern 3: Caching Control

R:

# Disable cache for this query
data <- get_census(
  dataset = 'CA21',
  regions = list(CMA = "59933"),
  vectors = c("v_CA21_906"),
  level = 'CSD',
  use_cache = FALSE
)

# Clear all cache (the paths argument is required)
remove_from_cancensus_cache(list_cancensus_cache()$path)

Python:

# Disable cache for this query
data = pc.get_census(
    dataset='CA21',
    regions={'CMA': '59933'},
    vectors=['v_CA21_906'],
    level='CSD',
    use_cache=False
)

# Clear all cache
pc.clear_cache()

Key Differences to Remember

  1. Dictionary vs Named List

    R uses named lists: list(CMA = "59933")

    Python uses dictionaries: {'CMA': '59933'}

  2. Vector vs List

    R uses c(): c("v1", "v2")

    Python uses []: ['v1', 'v2']

  3. Boolean Capitalization

    R: TRUE, FALSE

    Python: True, False

  4. NULL vs None

    R: NULL

    Python: None

  5. Function Parameter Order

    find_census_vectors() has a different parameter order:

    • R: find_census_vectors(query, dataset, ...)

    • Python: find_census_vectors(dataset, query, ...)
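
Because the positional order differs, passing arguments by name is one way to avoid silently swapping the query and the dataset when porting R calls. The sketch below uses a stand-in function with the Python parameter order listed above, so it runs without an API key. find_vectors_stub is hypothetical, and the assumption that the real function accepts dataset and query as keyword names is taken from the signature shown in this guide.

```python
def find_vectors_stub(dataset, query):
    """Stand-in with the Python parameter order (dataset, query).

    Hypothetical: it only echoes its arguments so the effect is visible.
    """
    return {'dataset': dataset, 'query': query}


# Positional call ported verbatim from the R order (query, dataset):
# the arguments land in the wrong slots and no error is raised.
swapped = find_vectors_stub('income', 'CA21')
print(swapped)  # {'dataset': 'income', 'query': 'CA21'}

# Keyword call: argument order no longer matters.
correct = find_vectors_stub(query='income', dataset='CA21')
print(correct)  # {'dataset': 'CA21', 'query': 'income'}
```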

Performance Comparison

In validation testing, pycancensus ran about 2.7x faster than R cancensus on equivalent operations. The difference is attributed primarily to:

  • More efficient HTTP connection pooling

  • Optimized pandas data operations

  • Better caching implementation
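
Timings depend on network conditions and cache state, so it may be worth measuring your own queries rather than relying on a single figure. The following is a minimal sketch using only the standard library. time_call is a hypothetical helper, and the lambda is a placeholder workload standing in for a real query.

```python
import time


def time_call(func, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs.

    Hypothetical helper for illustration; not part of pycancensus.
    """
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder workload; swap in a real query, for example:
# lambda: pc.get_census(dataset='CA21', regions={'CMA': '59933'},
#                       vectors=['v_CA21_906'], level='CSD',
#                       use_cache=False)
elapsed = time_call(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.4f} s")
```

Passing use_cache=False, as in Pattern 3, keeps a cached result from masking the cost of the request itself.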

Troubleshooting

Common Issues

Issue 1: Empty vector list causes API error

# ❌ This fails
data = pc.get_census(dataset='CA21', regions={'CSD': '123'}, vectors=[])

# ✅ Use None instead
data = pc.get_census(dataset='CA21', regions={'CSD': '123'}, vectors=None)
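
When the vector list is built dynamically, it can end up empty without anyone noticing. One way to guard against that is to normalize the value before the call. This is a minimal sketch; normalize_vectors is a hypothetical helper, not part of pycancensus.

```python
def normalize_vectors(vectors):
    """Return None for an empty or missing vector list, else a list.

    Hypothetical helper for illustration; not part of pycancensus.
    """
    if not vectors:
        # Covers None, [] and (): all become None.
        return None
    if isinstance(vectors, str):
        # Accept a single vector ID passed as a bare string.
        return [vectors]
    return list(vectors)


print(normalize_vectors([]))              # None
print(normalize_vectors('v_CA21_906'))    # ['v_CA21_906']
print(normalize_vectors(('v1', 'v2')))    # ['v1', 'v2']

# Intended use (requires an API key, so it is left commented out):
# data = pc.get_census(dataset='CA21', regions={'CSD': '123'},
#                      vectors=normalize_vectors(user_vectors))
```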

Issue 2: Function not found

Make sure you’ve imported pycancensus:

import pycancensus as pc
# Then use: pc.get_census(...)

Issue 3: API key not set

# Check if key is set
pc.show_api_key()

# Set key
pc.set_api_key("YOUR_KEY")
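
In scripts and scheduled jobs it can help to fail early with a clear message when no key is available. The sketch below checks an explicit value first and then the CANCENSUS_API_KEY environment variable named in the API Key Setup section. resolve_api_key is a hypothetical helper, not part of pycancensus, and the env parameter exists only so the example can run against a fake environment.

```python
import os


def resolve_api_key(explicit_key=None, env=os.environ):
    """Return the API key to use, preferring an explicit value.

    Hypothetical helper for illustration; not part of pycancensus.
    """
    key = explicit_key or env.get('CANCENSUS_API_KEY')
    if not key:
        raise RuntimeError(
            'No API key found. Call pc.set_api_key(...) or export '
            'CANCENSUS_API_KEY before running queries.'
        )
    return key


# Demonstration with a fake environment mapping instead of os.environ.
print(resolve_api_key(env={'CANCENSUS_API_KEY': 'demo-key'}))  # demo-key
```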

Complete Example

Here’s a complete analysis migrated from R to Python:

R Version:

library(cancensus)
library(dplyr)
library(ggplot2)
library(sf)

# Get data
toronto <- get_census(
  dataset = 'CA21',
  regions = list(CMA = "35535"),
  vectors = c("v_CA21_906"),
  level = 'CSD',
  geo_format = 'sf'
)

# Analyze
top_income <- toronto %>%
  filter(!is.na(v_CA21_906)) %>%
  top_n(10, v_CA21_906)

# Visualize
ggplot(top_income) +
  geom_sf(aes(fill = v_CA21_906)) +
  scale_fill_viridis_c() +
  labs(title = "Top 10 Highest Income Areas - Toronto CMA") +
  theme_minimal()

Python Version:

import pycancensus as pc
import matplotlib.pyplot as plt

# Get data
toronto = pc.get_census(
    dataset='CA21',
    regions={'CMA': '35535'},
    vectors=['v_CA21_906'],
    level='CSD',
    geo_format='geopandas'  # returns a geopandas.GeoDataFrame
)

# Analyze
top_income = (toronto
    .dropna(subset=['v_CA21_906'])
    .nlargest(10, 'v_CA21_906')
)

# Visualize
top_income.plot(
    column='v_CA21_906',
    cmap='viridis',
    legend=True
)
plt.title("Top 10 Highest Income Areas - Toronto CMA")
plt.axis('off')
plt.show()

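Both versions select the ten CSDs with the highest values and map the same variable. Two differences remain: top_n() keeps tied rows while nlargest() keeps only the first by default, and the rendered styling differs between ggplot2 and matplotlib.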

Further Reading