{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# This cell is added by sphinx-gallery\n# It can be customized to whatever you like\n%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Geographic Analysis with Census Data\n\nThis example demonstrates how to work with geographic census data,\nincluding creating maps and performing spatial analysis.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import required libraries\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pycancensus as pc\nimport matplotlib.pyplot as plt\nimport pandas as pd\n\n# Set up the plotting style\nplt.style.use('default')\nplt.rcParams['figure.figsize'] = (12, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Geographic Census Data\n\nLet's retrieve census data with geographic boundaries for mapping.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"Retrieving geographic census data...\")\n\ntry:\n # Get population data with geometries for Vancouver CMA\n geo_data = pc.get_census(\n dataset=\"CA21\",\n regions={\"CMA\": \"59933\"}, # Vancouver CMA\n vectors=[\"v_CA21_1\", \"v_CA21_434\"], # Population and median income\n level=\"CT\", # Census Tract level for detailed analysis\n geo_format=\"geopandas\"\n )\n \n print(f\"Retrieved {len(geo_data)} census tracts\")\n print(f\"Columns: {list(geo_data.columns)}\")\n \nexcept Exception as e:\n print(f\"Error retrieving data: {e}\")\n print(\"Creating sample data for demonstration...\")\n \n # Create sample data for demonstration when API is not available\n import numpy as np\n from shapely.geometry import Point\n import geopandas as gpd\n \n # Sample coordinates around Vancouver area\n n_points = 50\n np.random.seed(42)\n lons = np.random.uniform(-123.3, -122.9, n_points)\n lats = np.random.uniform(49.15, 49.35, n_points)\n \n geo_data = gpd.GeoDataFrame({\n 'GeoUID': [f'59933{i:03d}' for i in range(n_points)],\n 'name': [f'Census Tract {i}' for i in range(n_points)],\n 'v_CA21_1': np.random.randint(1000, 8000, n_points), # Population\n 'v_CA21_434': np.random.randint(30000, 120000, n_points), # Median income\n 'geometry': [Point(lon, lat) for lon, lat in zip(lons, lats)]\n }, crs='EPSG:4326')\n \n print(\"Using sample data for demonstration\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating a Basic Map\n\nLet's create a simple map showing population distribution.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1, figsize=(12, 10))\n\n# Create choropleth map of population\nif 'v_CA21_1' in geo_data.columns:\n geo_data.plot(\n column='v_CA21_1',\n cmap='YlOrRd',\n legend=True,\n ax=ax,\n legend_kwds={'label': 'Population', 'shrink': 0.8}\n )\n\nax.set_title('Population Distribution by Census Tract\\nVancouver CMA, 2021', \n fontsize=16, pad=20)\nax.set_xlabel('Longitude')\nax.set_ylabel('Latitude')\n\n# Remove axes ticks for cleaner look \nax.tick_params(labelsize=10)\nplt.tight_layout()\nplt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multi-variable Analysis\n\nLet's compare two variables: population and median income.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if 'v_CA21_1' in geo_data.columns and 'v_CA21_434' in geo_data.columns:\n fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 7))\n \n # Population map\n geo_data.plot(\n column='v_CA21_1',\n cmap='YlOrRd',\n legend=True,\n ax=ax1,\n legend_kwds={'label': 'Population', 'shrink': 0.8}\n )\n ax1.set_title('Population by Census Tract')\n ax1.axis('off')\n \n # Median income map\n geo_data.plot(\n column='v_CA21_434',\n cmap='YlGnBu', \n legend=True,\n ax=ax2,\n legend_kwds={'label': 'Median Income ($)', 'shrink': 0.8}\n )\n ax2.set_title('Median Income by Census Tract')\n ax2.axis('off')\n \n plt.suptitle('Vancouver CMA: Population vs Income Distribution', \n fontsize=16, y=1.02)\n plt.tight_layout()\n plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistical Analysis\n\nLet's perform some basic statistical analysis on our geographic data.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if 'v_CA21_1' in geo_data.columns and 'v_CA21_434' in geo_data.columns:\n print(\"\\nStatistical Summary:\")\n print(\"=\"*50)\n \n # Population statistics\n pop_stats = geo_data['v_CA21_1'].describe()\n print(\"Population Statistics:\")\n print(f\" Mean: {pop_stats['mean']:.0f}\")\n print(f\" Median: {pop_stats['50%']:.0f}\")\n print(f\" Std Dev: {pop_stats['std']:.0f}\")\n print(f\" Range: {pop_stats['min']:.0f} - {pop_stats['max']:.0f}\")\n \n # Income statistics\n income_stats = geo_data['v_CA21_434'].describe()\n print(f\"\\nMedian Income Statistics:\")\n print(f\" Mean: ${income_stats['mean']:,.0f}\")\n print(f\" Median: ${income_stats['50%']:,.0f}\")\n print(f\" Std Dev: ${income_stats['std']:,.0f}\")\n print(f\" Range: ${income_stats['min']:,.0f} - ${income_stats['max']:,.0f}\")\n \n # Correlation analysis\n correlation = geo_data['v_CA21_1'].corr(geo_data['v_CA21_434'])\n print(f\"\\nCorrelation between Population and Income: {correlation:.3f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scatter Plot Analysis\n\nLet's visualize the relationship between population and income.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if 'v_CA21_1' in geo_data.columns and 'v_CA21_434' in geo_data.columns:\n fig, ax = plt.subplots(1, 1, figsize=(10, 8))\n \n scatter = ax.scatter(\n geo_data['v_CA21_1'], \n geo_data['v_CA21_434'],\n c=geo_data['v_CA21_1'],\n cmap='viridis',\n alpha=0.7,\n s=60\n )\n \n ax.set_xlabel('Population')\n ax.set_ylabel('Median Income ($)')\n ax.set_title('Population vs Median Income\\nVancouver CMA Census Tracts')\n \n # Add colorbar\n cbar = plt.colorbar(scatter, ax=ax)\n cbar.set_label('Population')\n \n # Add trend line\n z = np.polyfit(geo_data['v_CA21_1'], geo_data['v_CA21_434'], 1)\n p = np.poly1d(z)\n ax.plot(geo_data['v_CA21_1'], p(geo_data['v_CA21_1']), \n \"r--\", alpha=0.8, linewidth=2)\n \n plt.tight_layout()\n plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with Different Geography Levels\n\nCensus data is available at different geographic levels. Let's compare a few.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\nComparing Geographic Levels:\")\nprint(\"=\"*40)\n\nlevels_to_try = [\"CMA\", \"CSD\", \"CT\"]\nlevel_names = [\"Metro Area\", \"Municipality\", \"Census Tract\"]\n\nfor level, name in zip(levels_to_try, level_names):\n try:\n # Get count of regions at each level\n regions = pc.list_census_regions(\"CA21\", level=level)\n print(f\"{name} ({level}): {len(regions)} regions\")\n except:\n print(f\"{name} ({level}): Unable to retrieve\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Advanced Mapping with Folium\n\nFor interactive maps, we can use folium (when available).\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "try:\n import folium\n from folium import plugins\n \n print(\"\\nCreating interactive map with Folium...\")\n \n # Create base map centered on Vancouver\n center_lat = geo_data.geometry.centroid.y.mean()\n center_lon = geo_data.geometry.centroid.x.mean()\n \n m = folium.Map(\n location=[center_lat, center_lon],\n zoom_start=10,\n tiles='OpenStreetMap'\n )\n \n # Add choropleth layer\n if 'v_CA21_1' in geo_data.columns:\n folium.Choropleth(\n geo_data=geo_data,\n data=geo_data,\n columns=['GeoUID', 'v_CA21_1'],\n key_on='feature.properties.GeoUID',\n fill_color='YlOrRd',\n fill_opacity=0.7,\n line_opacity=0.2,\n legend_name='Population'\n ).add_to(m)\n \n print(\"Interactive map created successfully!\")\n print(\"(Map object created but not displayed in static documentation)\")\n \nexcept ImportError:\n print(\"Folium not available - install with: pip install folium\")\nexcept Exception as e:\n print(f\"Error creating interactive map: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary and Next Steps\n\nThis example demonstrated key geographic analysis capabilities:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(\"\\n\" + \"=\"*60)\nprint(\"Geographic Analysis Example Complete\")\nprint(\"=\"*60)\n\nprint(\"\\nWhat we covered:\")\nprint(\"\u2022 Retrieving census data with geographic boundaries\")\nprint(\"\u2022 Creating choropleth maps with matplotlib\")\nprint(\"\u2022 Multi-variable geographic visualization\")\nprint(\"\u2022 Statistical analysis of spatial data\")\nprint(\"\u2022 Scatter plot analysis of geographic variables\")\nprint(\"\u2022 Working with different geographic levels\")\nprint(\"\u2022 Interactive mapping with Folium\")\n\nprint(\"\\nNext steps for your analysis:\")\nprint(\"1. Experiment with different census variables\")\nprint(\"2. Try different geographic levels (CMA, CSD, CT, DA)\")\nprint(\"3. Combine multiple datasets for temporal analysis\")\nprint(\"4. Use spatial analysis tools from geopandas\")\nprint(\"5. Create interactive dashboards with your maps\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional Resources\n\nFor more advanced geographic analysis, consider exploring:\n\n- **geopandas**: Advanced spatial operations and analysis\n- **folium**: Interactive web maps\n- **plotly**: Interactive plotting and dashboards \n- **contextily**: Adding basemaps to static plots\n- **rasterio**: Working with raster data\n- **pysal**: Spatial analysis library\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.22" } }, "nbformat": 4, "nbformat_minor": 0 }