{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# This cell is added by sphinx-gallery\n# It can be customized to whatever you like\n%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Basic Census Data Access\n\nThis example demonstrates how to access Canadian Census data using pycancensus,\ncovering the essential functions for getting started with census data analysis.\n"
      ]
    },
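    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setting up pycancensus\n\nFirst, import pycancensus and configure your API key. The cell below reads the key\nfrom the `CANCENSUS_API_KEY` environment variable, so the key never has to be written into the notebook.\nYou can get a free API key at: https://censusmapper.ca/users/sign_up\n\n"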
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\n\nimport pandas as pd\nimport pycancensus as pc\n\n# Read the API key from the CANCENSUS_API_KEY environment variable\n# instead of hard-coding it in the notebook\napi_key = os.environ.get(\"CANCENSUS_API_KEY\")\nif api_key:\n    pc.set_api_key(api_key)\n    print(\"API key configured\")\nelse:\n    print(\"No API key found - the API calls below will likely fail and print an error\")\n    print(\"Get your API key at: https://censusmapper.ca/users/sign_up\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Exploring Available Datasets\n\nLet's start by exploring what Census datasets are available.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"Available Census Datasets:\")\ntry:\n    datasets = pc.list_census_datasets()\n    print(datasets)\nexcept Exception as e:\n    print(f\"Error accessing datasets: {e}\")\n    print(\"Make sure you have set your API key!\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Finding Census Regions\n\nNext, let's explore the geographic regions available in the Census.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"\\nExploring Census Regions:\")\ntry:\n    # Get regions for the 2021 Census\n    regions = pc.list_census_regions(\"CA21\")\n    print(f\"Found {len(regions)} regions in CA21 dataset\")\n    print(\"\\nSample regions:\")\n    print(regions.head())\n    \n    # Search for specific regions (Vancouver)\n    print(\"\\nSearching for Vancouver regions:\")\n    vancouver_regions = pc.search_census_regions(\"Vancouver\", \"CA21\")\n    print(vancouver_regions[[\"region\", \"name\", \"level\", \"pop\"]].head())\n    \nexcept Exception as e:\n    print(f\"Error accessing regions: {e}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Discovering Census Variables\n\nCensus data is organized into vectors (variables). Let's explore what's available.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"\\nExploring Census Variables:\")\ntry:\n    # List available vectors\n    vectors = pc.list_census_vectors(\"CA21\")\n    print(f\"Found {len(vectors)} vectors in CA21 dataset\")\n    print(\"\\nSample vectors:\")\n    print(vectors[[\"vector\", \"label\", \"type\"]].head())\n    \n    # Search for population-related vectors\n    print(\"\\nSearching for population vectors:\")\n    pop_vectors = pc.search_census_vectors(\"population\", \"CA21\")\n    print(pop_vectors[[\"vector\", \"label\", \"type\"]].head())\n    \nexcept Exception as e:\n    print(f\"Error accessing vectors: {e}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Getting Census Data\n\nNow let's retrieve actual census data: population figures for every\nCensus Subdivision (CSD) within the Vancouver CMA.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"\\nRetrieving Census Data:\")\ntry:\n    # Get population data for Vancouver CMA\n    data = pc.get_census(\n        dataset=\"CA21\",\n        regions={\"CMA\": \"59933\"},  # Vancouver CMA\n        vectors=[\"v_CA21_1\", \"v_CA21_2\"],  # Population vectors\n        level=\"CSD\"  # Census Subdivision level\n    )\n    \n    print(f\"Retrieved data shape: {data.shape}\")\n    print(\"\\nSample data:\")\n    print(data.head())\n    \n    # Basic analysis: sum population across the CSDs.\n    # Columns may be named \"v_CA21_1\" or carry a label suffix such as \"v_CA21_1: ...\",\n    # so match on the vector ID before the colon.\n    pop_cols = [c for c in data.columns if str(c).split(\":\")[0].strip() == \"v_CA21_1\"]\n    if not data.empty and pop_cols:\n        total_pop = data[pop_cols[0]].sum()\n        print(f\"\\nTotal population in Vancouver CMA: {total_pop:,}\")\n        \nexcept Exception as e:\n    print(f\"Error retrieving census data: {e}\")"
      ]
    },
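    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Working with Geographic Data\n\nPassing `geo_format=\"geopandas\"` to `get_census()` returns the data together with\ngeographic boundaries as a GeoDataFrame; `get_census_geometry()` returns the boundaries alone.\n\n"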
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"\\nRetrieving Geographic Data:\")\ntry:\n    # Get census data with geographic boundaries\n    geo_data = pc.get_census(\n        dataset=\"CA21\",\n        regions={\"CMA\": \"59933\"},  # Vancouver CMA\n        vectors=[\"v_CA21_1\"],  # Population\n        level=\"CSD\",\n        geo_format=\"geopandas\"\n    )\n    \n    print(f\"GeoDataFrame shape: {geo_data.shape}\")\n    print(f\"Columns: {list(geo_data.columns)}\")\n    if hasattr(geo_data, 'crs'):\n        print(f\"Coordinate Reference System: {geo_data.crs}\")\n    \n    # Just the geometries\n    geometries = pc.get_census_geometry(\n        dataset=\"CA21\",\n        regions={\"CMA\": \"59933\"},\n        level=\"CSD\"\n    )\n    print(f\"\\nGeometries-only shape: {geometries.shape}\")\n    \nexcept Exception as e:\n    print(f\"Error retrieving geographic data: {e}\")"
      ]
    },
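    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Vector Hierarchy Navigation\n\nCensus vectors are hierarchical: a parent vector such as a total breaks down into child categories.\n`child_census_vectors()` and `parent_census_vectors()` walk that hierarchy downward and upward.\n\n"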
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"\\nVector Hierarchy Navigation:\")\ntry:\n    # Find vectors using enhanced search\n    income_vectors = pc.find_census_vectors(\"CA21\", \"income\")\n    print(f\"Found {len(income_vectors)} income-related vectors\")\n    \n    # Navigate vector hierarchies using household income as example\n    # This demonstrates a real hierarchy: main category -> income brackets -> sub-brackets\n    income_parent = \"v_CA21_923\"  # Household total income groups in 2020\n    high_income_bracket = \"v_CA21_939\"  # $100,000 and over bracket\n    \n    # Find children of main income vector (all income brackets)\n    income_brackets = pc.child_census_vectors(income_parent, dataset=\"CA21\")\n    print(f\"Income brackets under '{income_parent}': {len(income_brackets)} categories\")\n    \n    # Find grandchildren (sub-categories of high income bracket)  \n    high_income_subcats = pc.child_census_vectors(high_income_bracket, dataset=\"CA21\")\n    print(f\"High-income sub-categories: {len(high_income_subcats)} levels\")\n    \n    # Find parent relationship (child -> parent navigation)\n    parent_of_bracket = pc.parent_census_vectors(high_income_bracket, dataset=\"CA21\")\n    if not parent_of_bracket.empty:\n        print(f\"Parent of '{high_income_bracket}': {parent_of_bracket['vector'].iloc[0]}\")\n    \nexcept Exception as e:\n    print(f\"Error with vector operations: {e}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Extracting Vector Metadata\n\nThe `label_vectors()` function extracts metadata for census vectors\nfrom DataFrames returned by `get_census()`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"\\nExtracting Vector Metadata:\")\ntry:\n    # Get census data with vectors\n    census_with_vectors = pc.get_census(\n        dataset=\"CA21\",\n        regions={\"PR\": \"59\"},  # British Columbia\n        vectors=[\"v_CA21_1\", \"v_CA21_2\", \"v_CA21_3\"],\n        level=\"PR\",\n        labels=\"detailed\"\n    )\n\n    # Extract vector labels and metadata\n    vector_labels = pc.label_vectors(census_with_vectors)\n\n    print(\"Vector metadata extracted from census data:\")\n    for vector_id, label in vector_labels.items():\n        print(f\"  {vector_id}: {label[:60]}...\")\n\nexcept Exception as e:\n    print(f\"Error extracting vector metadata: {e}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Dataset Attribution\n\n`dataset_attribution()` returns the attribution text for one or more census datasets.\nInclude it when you publish results to comply with the Statistics Canada Open Licence.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"\\nDataset Attribution:\")\ntry:\n    # Get attribution for a single dataset\n    single_attribution = pc.dataset_attribution([\"CA21\"])\n    print(f\"\\nCA21 Attribution:\\n{single_attribution}\")\n\n    # Get combined attribution for multiple datasets\n    multi_attribution = pc.dataset_attribution([\"CA16\", \"CA21\"])\n    print(f\"\\nCombined Attribution (CA16 + CA21):\\n{multi_attribution}\")\n\nexcept Exception as e:\n    print(f\"Error getting dataset attribution: {e}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n\nThis example covered the basic workflow for accessing Canadian Census data:\n\n1. **Setup**: Import pycancensus and set your API key\n2. **Explore**: Discover available datasets, regions, and variables\n3. **Retrieve**: Get census data, optionally with geographic boundaries, for your areas and variables of interest\n4. **Navigate and label**: Walk vector hierarchies and extract vector metadata\n5. **Attribute**: Generate attribution text for the datasets you used\n\nThe retrieved data arrives as pandas/geopandas objects, so it fits directly into standard analysis workflows.\n\nFor more advanced examples, see the other gallery examples and tutorials.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(\"\\n\" + \"=\" * 50)\nprint(\"Basic Census Data Access Example Complete\")\nprint(\"=\" * 50)\nprint(\"\\nNext steps:\")\nprint(\"1. Get your free API key at: https://censusmapper.ca/users/sign_up\")\nprint(\"2. Set your API key: pc.set_api_key('your_key_here')\")\nprint(\"3. Try running this example with real data!\")\nprint(\"4. Explore the other examples in the gallery\")"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.22"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}