Migration Guide: R to Python
A complete guide for R cancensus users migrating to pycancensus in Python.
Quick Start
Installation
R:
install.packages("cancensus")
library(cancensus)
Python:
pip install pycancensus
import pycancensus as pc
API Key Setup
R:
set_cancensus_api_key("YOUR_API_KEY", install = TRUE)
Python:
pc.set_api_key("YOUR_API_KEY", install=True)
# Or set the environment variable in your shell: export CANCENSUS_API_KEY="YOUR_API_KEY"
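When the key may come from either place, it can help to make the lookup order explicit. The following is a minimal sketch: `resolve_api_key` is a hypothetical helper, not part of pycancensus. It prefers a key passed directly and falls back to the `CANCENSUS_API_KEY` environment variable.

```python
import os
from typing import Optional

ENV_VAR = "CANCENSUS_API_KEY"  # environment variable named in this guide


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Hypothetical helper: return an explicit key, else the one in the environment."""
    key = (explicit_key or os.environ.get(ENV_VAR, "")).strip()
    if not key:
        raise RuntimeError(f"No API key found; pass one or set {ENV_VAR}.")
    return key
```

The returned value can then be handed to `pc.set_api_key(...)`. Failing early with a clear message is usually easier to debug than a rejected API request later on.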
Function Equivalence
All Core Functions Available
| R Function | Python Function | Equivalence |
|---|---|---|
|  |  | ✅ 100% |
|  |  | ✅ 100% |
|  |  | ✅ 100% |
|  |  | ✅ 100% |
|  |  | ✅ 100% |
|  |  | ✅ 100% |
|  |  | ✅ 100% |
|  |  | ✅ 100% |
|  |  | ✅ 100% |
|  |  | ✅ 100% |
Syntax Conversion
Core Syntax Differences
| R Syntax | Python Syntax |
|---|---|
|  |  |
|  |  |
|  |  |
|  |  |
Side-by-Side Examples
Example 1: Basic Data Retrieval
R:
library(cancensus)
census_data <- get_census(
dataset = 'CA21',
regions = list(CMA = "59933"),
vectors = c("v_CA21_906"),
level = 'CSD'
)
Python:
import pycancensus as pc
census_data = pc.get_census(
dataset='CA21',
regions={'CMA': '59933'},
vectors=['v_CA21_906'],
level='CSD'
)
Example 2: With Geography
R:
census_data <- get_census(
dataset = 'CA21',
regions = list(CMA = "35535"),
vectors = c("v_CA21_906"),
level = 'CSD',
geo_format = 'sf'
)
Python:
census_data = pc.get_census(
dataset='CA21',
regions={'CMA': '35535'},
vectors=['v_CA21_906'],
level='CSD',
geo_format='sf'
)
Example 3: Search Vectors
R:
income_vectors <- search_census_vectors("income", "CA21")
Python:
income_vectors = pc.search_census_vectors("income", "CA21")
Example 4: List Datasets
R:
datasets <- list_census_datasets()
Python:
datasets = pc.list_census_datasets()
Return Type Conversions
Data Structures
| R Type | Python Type | Notes |
|---|---|---|
|  |  | Direct equivalent |
|  |  | Same spatial data |
|  |  | Direct equivalent |
|  |  | Direct equivalent |
Working with Results
R:
# Filter data
filtered <- census_data %>%
filter(Population > 50000)
# Select columns
selected <- census_data %>%
select(GeoUID, Population)
Python:
# Filter data
filtered = census_data[census_data['Population'] > 50000]
# Select columns
selected = census_data[['GeoUID', 'Population']]
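dplyr code often chains `filter()` and `select()` in one pipeline, and pandas can do the same in a single `.loc` call. The sketch below uses a small hand-built DataFrame with placeholder values (not real census data) so that it runs without an API key. Only the column names `GeoUID` and `Population` come from the examples above.

```python
import pandas as pd

# Placeholder rows for illustration only; these are not real census values.
census_data = pd.DataFrame({
    "GeoUID": ["A", "B", "C"],
    "Region Name": ["Alpha", "Beta", "Gamma"],
    "Population": [120000, 30000, 75000],
})

# Equivalent of:
#   census_data %>% filter(Population > 50000) %>% select(GeoUID, Population)
result = census_data.loc[census_data["Population"] > 50000, ["GeoUID", "Population"]]

# Tibbles carry no row index, so renumber the rows to mirror the R output.
result = result.reset_index(drop=True)
print(result)
```

Using `.loc` with a row mask and a column list avoids creating an intermediate DataFrame between the two steps.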
Visualization Migration
Mapping
R (using ggplot2 + sf):
library(ggplot2)
library(sf)
ggplot(census_data) +
geom_sf(aes(fill = v_CA21_906)) +
scale_fill_viridis_c() +
theme_minimal()
Python (using matplotlib + geopandas):
import matplotlib.pyplot as plt
census_data.plot(
column='v_CA21_906',
cmap='viridis',
legend=True
)
plt.show()
Python (using plotly for interactive):
import plotly.express as px
fig = px.choropleth_mapbox(
census_data,
geojson=census_data.geometry,
locations=census_data.index,
color='v_CA21_906',
mapbox_style='carto-positron'
)
fig.show()
Charts
R (ggplot2):
ggplot(census_data, aes(x = `Region Name`, y = Population)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 45))
Python (matplotlib):
import matplotlib.pyplot as plt
census_data.plot.bar(x='Region Name', y='Population')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Python (plotly):
import plotly.express as px
fig = px.bar(census_data, x='Region Name', y='Population')
fig.show()
Common Migration Patterns
Pattern 1: Data Pipeline
R:
library(dplyr)
library(cancensus)
result <- get_census(
dataset = 'CA21',
regions = list(CMA = "35535"),
vectors = c("v_CA21_906"),
level = 'CSD'
) %>%
filter(Population > 50000) %>%
arrange(desc(v_CA21_906))
Python:
import pycancensus as pc
result = (pc.get_census(
dataset='CA21',
regions={'CMA': '35535'},
vectors=['v_CA21_906'],
level='CSD'
)
.query('Population > 50000')
.sort_values('v_CA21_906', ascending=False)
)
Pattern 2: Multiple Regions
R:
regions <- list(
CMA = c("59933", "35535", "24462")
)
data <- get_census(
dataset = 'CA21',
regions = regions,
vectors = c("v_CA21_906"),
level = 'CSD'
)
Python:
regions = {
'CMA': ['59933', '35535', '24462']
}
data = pc.get_census(
dataset='CA21',
regions=regions,
vectors=['v_CA21_906'],
level='CSD'
)
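When region IDs come from another source, such as a spreadsheet or an earlier query, the dictionary can be assembled programmatically. This is a minimal sketch: `build_regions` is a hypothetical helper, not a pycancensus function. It groups (level, ID) pairs into the dictionary shape shown above, converts IDs to strings, and drops duplicates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_regions(pairs: Iterable[Tuple[str, object]]) -> Dict[str, List[str]]:
    """Hypothetical helper: group (level, id) pairs into a regions dict."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for level, region_id in pairs:
        region_id = str(region_id)  # keep IDs as strings, as in the examples above
        if region_id not in grouped[level]:  # drop duplicates, keep first-seen order
            grouped[level].append(region_id)
    return dict(grouped)


regions = build_regions([("CMA", "59933"), ("CMA", 35535), ("CMA", "59933")])
print(regions)  # {'CMA': ['59933', '35535']}
```

The result can be passed as the `regions` argument in the same way as the hand-written dictionary.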
Pattern 3: Caching Control
R:
# Disable cache for this query
data <- get_census(
dataset = 'CA21',
regions = list(CMA = "59933"),
vectors = c("v_CA21_906"),
level = 'CSD',
use_cache = FALSE
)
# Clear all cache (remove_from_cancensus_cache() needs the paths to remove)
remove_from_cancensus_cache(list_cancensus_cache()$path)
Python:
# Disable cache for this query
data = pc.get_census(
dataset='CA21',
regions={'CMA': '59933'},
vectors=['v_CA21_906'],
level='CSD',
use_cache=False
)
# Clear all cache
pc.clear_cache()
Key Differences to Remember
Dictionary vs Named List
R uses named lists: list(CMA = "59933")
Python uses dictionaries: {'CMA': '59933'}
Vector vs List
R uses c(): c("v1", "v2")
Python uses []: ['v1', 'v2']
Boolean Capitalization
R: TRUE, FALSE
Python: True, False
NULL vs None
R: NULL
Python: None
Function Parameter Order
find_census_vectors() has a different parameter order:
R: find_census_vectors(query, dataset, ...)
Python: find_census_vectors(dataset, query, ...)
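Passing arguments by keyword avoids the parameter-order difference entirely. The sketch below uses `find_census_vectors_stub`, a local stand-in that only mirrors the Python parameter order given above. It is not the real pycancensus function and makes no API call.

```python
def find_census_vectors_stub(dataset, query):
    """Local stand-in with the Python parameter order; returns its arguments."""
    return {"dataset": dataset, "query": query}


# A positional call copied from the R order silently swaps the two values.
swapped = find_census_vectors_stub("income", "CA21")

# A keyword call is correct regardless of parameter order.
correct = find_census_vectors_stub(query="income", dataset="CA21")

print(swapped)  # {'dataset': 'income', 'query': 'CA21'}
print(correct)  # {'dataset': 'CA21', 'query': 'income'}
```

Because both arguments are strings, a swapped positional call raises no error, which makes keyword arguments the safer habit when porting R code.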
Performance Comparison
Based on validation testing, Python pycancensus is typically 2.7x faster than R cancensus for equivalent operations, primarily due to:
More efficient HTTP connection pooling
Optimized pandas data operations
Better caching implementation
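Speed depends on the query, the network, and the cache state, so it may be worth measuring your own workload. The following is a minimal timing sketch using only the standard library. `median_seconds` is a hypothetical helper, and the workload shown is a stand-in rather than a census query.

```python
import statistics
import time
from typing import Callable


def median_seconds(func: Callable[[], object], repeats: int = 5) -> float:
    """Run func several times and return the median wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Stand-in workload; replace the lambda with your own call,
# for example a pc.get_census(...) query.
elapsed = median_seconds(lambda: sum(range(100_000)))
print(f"median: {elapsed:.6f} s")
```

When timing real queries, passing `use_cache=False` (see Pattern 3) keeps cached results from dominating the measurement. The median is used because a single slow network round trip can distort a mean.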
Troubleshooting
Common Issues
Issue 1: Empty vector list causes API error
# ❌ This fails
data = pc.get_census(dataset='CA21', regions={'CSD': '123'}, vectors=[])
# ✅ Use None instead
data = pc.get_census(dataset='CA21', regions={'CSD': '123'}, vectors=None)
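When the vector list is built dynamically, it can end up empty without anyone noticing. One way to guard against that is to normalize the value before the call. This is a sketch under that assumption: `normalize_vectors` is a hypothetical helper, not part of pycancensus.

```python
from typing import Iterable, List, Optional


def normalize_vectors(vectors: Optional[Iterable[str]]) -> Optional[List[str]]:
    """Hypothetical helper: turn an empty or missing vector collection into None."""
    if vectors is None:
        return None
    if isinstance(vectors, str):
        vectors = [vectors]  # a single vector ID passed as a bare string
    cleaned = list(vectors)
    return cleaned or None  # an empty list becomes None


print(normalize_vectors([]))              # None
print(normalize_vectors("v_CA21_906"))    # ['v_CA21_906']
print(normalize_vectors(["v_CA21_906"]))  # ['v_CA21_906']
```

The call then becomes `pc.get_census(..., vectors=normalize_vectors(my_vectors))`, where `my_vectors` is whatever list your code assembled.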
Issue 2: Function not found
Make sure you’ve imported pycancensus:
import pycancensus as pc
# Then use: pc.get_census(...)
Issue 3: API key not set
# Check if key is set
pc.show_api_key()
# Set key
pc.set_api_key("YOUR_KEY")
Getting Help
Documentation: https://pycancensus.readthedocs.io/
Validation Results: See R Equivalence Validation
GitHub Issues: https://github.com/dshkol/pycancensus/issues
R cancensus docs: https://mountainmath.github.io/cancensus/
Complete Example
Here’s a complete analysis migrated from R to Python:
R Version:
library(cancensus)
library(dplyr)
library(ggplot2)
library(sf)
# Get data
toronto <- get_census(
dataset = 'CA21',
regions = list(CMA = "35535"),
vectors = c("v_CA21_906"),
level = 'CSD',
geo_format = 'sf'
)
# Analyze
top_income <- toronto %>%
filter(!is.na(v_CA21_906)) %>%
top_n(10, v_CA21_906)
# Visualize
ggplot(top_income) +
geom_sf(aes(fill = v_CA21_906)) +
scale_fill_viridis_c() +
labs(title = "Top 10 Highest Income Areas - Toronto CMA") +
theme_minimal()
Python Version:
import pycancensus as pc
import matplotlib.pyplot as plt
# Get data
toronto = pc.get_census(
dataset='CA21',
regions={'CMA': '35535'},
vectors=['v_CA21_906'],
level='CSD',
geo_format='sf'
)
# Analyze
top_income = (toronto
.dropna(subset=['v_CA21_906'])
.nlargest(10, 'v_CA21_906')
)
# Visualize
top_income.plot(
column='v_CA21_906',
cmap='viridis',
legend=True
)
plt.title("Top 10 Highest Income Areas - Toronto CMA")
plt.axis('off')
plt.show()
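Both versions produce the same result unless values are tied: top_n() keeps every tied row, while nlargest() returns exactly 10 rows unless keep='all' is passed.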
Further Reading
R Equivalence Validation - See 96% validation pass rate with 24 examples
pycancensus Documentation - Package overview
Tutorials - Step-by-step tutorials
Example Gallery - Gallery of examples