{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# This cell is added by sphinx-gallery\n# It can be customized to whatever you like\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Basic Census Data Access\n\nThis example demonstrates how to access Canadian Census data using pycancensus,\ncovering the essential functions for getting started with census data analysis.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up pycancensus\n\nFirst, we need to import pycancensus and set up our API key.\nYou can get a free API key at: https://censusmapper.ca/users/sign_up\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pycancensus as pc\nimport pandas as pd\n\n# Set your API key (you'll need to replace this with your actual key)\nimport os\napi_key = os.environ.get('CANCENSUS_API_KEY')\nif api_key:\n pc.set_api_key(api_key)\n print(\"API key configured\")\nelse:\n print(\"No API key - examples will show code structure\")\n print(\"Get your API key at: https://censusmapper.ca/users/sign_up\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring Available Datasets\n\nLet's start by exploring what Census datasets are available.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Available Census Datasets:\")\ntry:\n datasets = pc.list_census_datasets()\n print(datasets)\nexcept Exception as e:\n print(f\"Error accessing datasets: {e}\")\n print(\"Make sure you have set your API key!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding Census Regions\n\nNext, let's explore the geographic regions available in the Census.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\nExploring Census Regions:\")\ntry:\n # Get regions for the 2021 Census\n regions = pc.list_census_regions(\"CA21\")\n print(f\"Found {len(regions)} regions in CA21 dataset\")\n print(\"\\nSample regions:\")\n print(regions.head())\n \n # Search for specific regions (Vancouver)\n print(\"\\nSearching for Vancouver regions:\")\n vancouver_regions = pc.search_census_regions(\"Vancouver\", \"CA21\")\n print(vancouver_regions[[\"region\", \"name\", \"level\", \"pop\"]].head())\n \nexcept Exception as e:\n print(f\"Error accessing regions: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Discovering Census Variables\n\nCensus data is organized into vectors (variables). Let's explore what's available.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\nExploring Census Variables:\")\ntry:\n # List available vectors\n vectors = pc.list_census_vectors(\"CA21\")\n print(f\"Found {len(vectors)} vectors in CA21 dataset\")\n print(\"\\nSample vectors:\")\n print(vectors[[\"vector\", \"label\", \"type\"]].head())\n \n # Search for population-related vectors\n print(\"\\nSearching for population vectors:\")\n pop_vectors = pc.search_census_vectors(\"population\", \"CA21\")\n print(pop_vectors[[\"vector\", \"label\", \"type\"]].head())\n \nexcept Exception as e:\n print(f\"Error accessing vectors: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Census Data\n\nNow let's retrieve actual census data for analysis.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\nRetrieving Census Data:\")\ntry:\n # Get population data for Vancouver CMA\n data = pc.get_census(\n dataset=\"CA21\",\n regions={\"CMA\": \"59933\"}, # Vancouver CMA\n vectors=[\"v_CA21_1\", \"v_CA21_2\"], # Population vectors\n level=\"CSD\" # Census Subdivision level\n )\n \n print(f\"Retrieved data shape: {data.shape}\")\n print(\"\\nSample data:\")\n print(data.head())\n \n # Basic analysis\n if not data.empty and 'v_CA21_1' in data.columns:\n total_pop = data['v_CA21_1'].sum()\n print(f\"\\nTotal population in Vancouver CMA: {total_pop:,}\")\n \nexcept Exception as e:\n print(f\"Error retrieving census data: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with Geographic Data\n\npycancensus can also retrieve geographic boundaries along with the data.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\nRetrieving Geographic Data:\")\ntry:\n # Get census data with geographic boundaries\n geo_data = pc.get_census(\n dataset=\"CA21\",\n regions={\"CMA\": \"59933\"}, # Vancouver CMA\n vectors=[\"v_CA21_1\"], # Population\n level=\"CSD\",\n geo_format=\"geopandas\"\n )\n \n print(f\"GeoDataFrame shape: {geo_data.shape}\")\n print(f\"Columns: {list(geo_data.columns)}\")\n if hasattr(geo_data, 'crs'):\n print(f\"Coordinate Reference System: {geo_data.crs}\")\n \n # Just the geometries\n geometries = pc.get_census_geometry(\n dataset=\"CA21\",\n regions={\"CMA\": \"59933\"},\n level=\"CSD\"\n )\n print(f\"\\nGeometries-only shape: {geometries.shape}\")\n \nexcept Exception as e:\n print(f\"Error retrieving geographic data: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vector Hierarchy Navigation\n\npycancensus provides tools to navigate the hierarchical structure of census variables.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\nVector Hierarchy Navigation:\")\ntry:\n # Find vectors using enhanced search\n income_vectors = pc.find_census_vectors(\"CA21\", \"income\")\n print(f\"Found {len(income_vectors)} income-related vectors\")\n \n # Navigate vector hierarchies using household income as example\n # This demonstrates a real hierarchy: main category -> income brackets -> sub-brackets\n income_parent = \"v_CA21_923\" # Household total income groups in 2020\n high_income_bracket = \"v_CA21_939\" # $100,000 and over bracket\n \n # Find children of main income vector (all income brackets)\n income_brackets = pc.child_census_vectors(income_parent, dataset=\"CA21\")\n print(f\"Income brackets under '{income_parent}': {len(income_brackets)} categories\")\n \n # Find grandchildren (sub-categories of high income bracket) \n high_income_subcats = pc.child_census_vectors(high_income_bracket, dataset=\"CA21\")\n print(f\"High-income sub-categories: {len(high_income_subcats)} levels\")\n \n # Find parent relationship (child -> parent navigation)\n parent_of_bracket = pc.parent_census_vectors(high_income_bracket, dataset=\"CA21\")\n if not parent_of_bracket.empty:\n print(f\"Parent of '{high_income_bracket}': {parent_of_bracket['vector'].iloc[0]}\")\n \nexcept Exception as e:\n print(f\"Error with vector operations: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extracting Vector Metadata\n\nThe label_vectors() function extracts metadata for census vectors\nfrom DataFrames returned by get_census().\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\nExtracting Vector Metadata:\")\ntry:\n # Get census data with vectors\n census_with_vectors = pc.get_census(\n dataset=\"CA21\",\n regions={\"PR\": \"59\"}, # British Columbia\n vectors=[\"v_CA21_1\", \"v_CA21_2\", \"v_CA21_3\"],\n level=\"PR\",\n labels=\"detailed\"\n )\n\n # Extract vector labels and metadata\n vector_labels = pc.label_vectors(census_with_vectors)\n\n print(\"Vector metadata extracted from census data:\")\n for vector_id, label in vector_labels.items():\n print(f\" {vector_id}: {label[:60]}...\")\n\nexcept Exception as e:\n print(f\"Error extracting vector metadata: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset Attribution\n\nGet proper attribution text for census datasets to comply with\nStatistics Canada Open Data License requirements.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\nDataset Attribution:\")\ntry:\n # Get attribution for a single dataset\n single_attribution = pc.dataset_attribution([\"CA21\"])\n print(f\"\\nCA21 Attribution:\\n{single_attribution}\")\n\n # Get combined attribution for multiple datasets\n multi_attribution = pc.dataset_attribution([\"CA16\", \"CA21\"])\n print(f\"\\nCombined Attribution (CA16 + CA21):\\n{multi_attribution}\")\n\nexcept Exception as e:\n print(f\"Error getting dataset attribution: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n\nThis example covered the basic workflow for accessing Canadian Census data:\n\n1. **Setup**: Import pycancensus and set your API key\n2. **Explore**: Discover available datasets, regions, and variables\n3. **Retrieve**: Get census data for your areas and variables of interest\n4. **Analyze**: Work with the data using pandas/geopandas workflows\n\nFor more advanced examples, see the other gallery examples and tutorials.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\n\" + \"=\"*50)\nprint(\"Basic Census Data Access Example Complete\")\nprint(\"=\"*50)\nprint(\"\\nNext steps:\")\nprint(\"1. Get your free API key at: https://censusmapper.ca/users/sign_up\")\nprint(\"2. Set your API key: pc.set_api_key('your_key_here')\") \nprint(\"3. Try running this example with real data!\")\nprint(\"4. Explore the other examples in the gallery\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.22" } }, "nbformat": 4, "nbformat_minor": 0 }