spellbook

commit 6056b2f4cf36a6c0e756942def2d1df47b1bbd24
parent 5c8240f4ebb0b8e91b64f4c697582b155f21e9fa
Author: Steve Gattuso <steve@stevegattuso.me>
Date:   Sun, 26 Feb 2023 21:19:59 +0100

add citibike methodology

Diffstat:
A citibike-methodology.md | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+), 0 deletions(-)

diff --git a/citibike-methodology.md b/citibike-methodology.md
@@ -0,0 +1,42 @@
+# Citibike topography methodology
+
+This document outlines the methodology and data sources used in _[Visualizing the topography of Citibike](https://www.stevegattuso.me/2021/11/28/citibike-topography.html)_
+
+## Data Sources
+* Used [this](https://gist.github.com/stevenleeg/c9815da685ea0736f77557032b222d48) Python script to download all citibike stations
+* Used [NYC Neighborhood Tabulation Area](https://www1.nyc.gov/site/planning/data-maps/open-data/census-download-metadata.page?tab=2) geography files.
+* Used [MapPLUTO](https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page) data for household calculations.
+* Fetched 2015-2019 ACS data and census tract geographies from the National Historical Geographic Information System [data portal](https://www.nhgis.org/).
+
+## Steps
+1. Areas within 0.5km of a Citibike station
+    * Imported Citibike station CSV from Python script
+    * Reproject to New York/Long Island CRS
+    * Create a buffer of 0.5km around each station
+    * Dissolve all buffers into a single polygon
+    * Clip the polygon using the NTA polygons
+2. Households served (within the 0.5km range)
+    * Imported MapPLUTO data
+    * Clip using the 0.5km station buffers
+    * Ran the `Basic statistics` operation on...
+        * Unclipped MapPLUTO data to get the total number of households
+        * Clipped MapPLUTO data to get the total number of households within 0.5km of a station
+    * Calculated percentages based on these values
+3. Neighborhood station capacity
+    * Imported Citibike station CSV from Python script
+    * Ran `Join attributes by location (summary)` operation
+    * Summed up `capacity` column of each station per neighborhood
+    * Created a new column: `capacity_count / ($area * 100)` to generate `capacity_per_100sqkm`
+    * Visualized the column onto the NTA map
+4. Neighborhood station capacity in NTAs below the poverty line
+    * Fetched and imported census tract geographies sourced from NHGIS
+    * Fetched and joined NHGIS 2015-2019 ACS median income per-household data
+    * Used NTA geography files
+    * Generate centroids of each polygon
+    * Run `Join attributes by location (summary)` operation to merge ACS data into NTA polygons
+    * Used the median of the median income field
+    * Filtered out NTAs below the poverty line of $35k
+    * Ran `Join attributes by location (summary)` to merge station data with NTA polygons
+    * Summed up `capacity` column
+    * Created a new column: `capacity_count / $area * 100` to generate `capacity_per_100sqkm`
+    * Visualized the column onto map along with station locations
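
The arithmetic behind the percentage in step 2 and the `capacity_per_100sqkm` column in steps 3 and 4 can be sketched in plain Python. This is a minimal sketch, not the QGIS expression used in the commit. It assumes neighborhood areas are in square meters, since the methodology does not state the units of `$area`. The function names `percent_served` and `capacity_per_100sqkm` are hypothetical.

```python
SQ_M_PER_SQ_KM = 1_000_000  # 1 km = 1,000 m, so 1 km^2 = 1,000,000 m^2


def percent_served(households_in_range: float, households_total: float) -> float:
    """Share of households inside the station buffers, as a percentage (step 2)."""
    if households_total <= 0:
        raise ValueError("households_total must be positive")
    return households_in_range / households_total * 100


def capacity_per_100sqkm(capacity_sum: float, area_sq_m: float) -> float:
    """Summed dock capacity per 100 km^2 for one neighborhood (steps 3 and 4).

    Assumes area_sq_m is in square meters; that unit is an assumption,
    not something the methodology states.
    """
    if area_sq_m <= 0:
        raise ValueError("area_sq_m must be positive")
    area_sq_km = area_sq_m / SQ_M_PER_SQ_KM
    # Docks per km^2, scaled up to docks per 100 km^2.
    return capacity_sum / area_sq_km * 100
```

With made-up inputs, `percent_served(300, 1200)` gives `25.0`, and `capacity_per_100sqkm(250, 2_500_000)` gives `10000.0` (250 docks over 2.5 km²). Writing the unit conversion out makes the grouping of the division and the factor of 100 explicit, which the two column expressions in steps 3 and 4 handle differently.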