# Citibike topography methodology

This document outlines the methodology and data sources used in _[Visualizing the topography of Citibike](https://www.stevegattuso.me/2021/11/28/citibike-topography.html)_.

## Data Sources
* Used [this](https://gist.github.com/stevenleeg/c9815da685ea0736f77557032b222d48) Python script to download all Citibike stations.
* Used [NYC Neighborhood Tabulation Area](https://www1.nyc.gov/site/planning/data-maps/open-data/census-download-metadata.page?tab=2) (NTA) geography files.
* Used [MapPLUTO](https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page) data for household calculations.
* Fetched 2015-2019 ACS data and census tract geographies from the National Historical Geographic Information System [data portal](https://www.nhgis.org/).

## Steps
1. Areas within 0.5km of a Citibike station
    * Imported Citibike station CSV from Python script
    * Reprojected to New York/Long Island CRS
    * Created a buffer of 0.5km around each station
    * Dissolved all buffers into a single polygon
    * Clipped the polygon using the NTA polygons
2. Households served (within the 0.5km range)
    * Imported MapPLUTO data
    * Clipped using the 0.5km station buffers
    * Ran the `Basic statistics` operation on:
        * Unclipped MapPLUTO data to get the total number of households
        * Clipped MapPLUTO data to get the total number of households within 0.5km of a station
    * Calculated percentages based on these values
3. Neighborhood station capacity
    * Imported Citibike station CSV from Python script
    * Ran `Join attributes by location (summary)` operation
        * Summed up `capacity` column of each station per neighborhood
    * Created a new column: `capacity_count / ($area * 100)` to generate `capacity_per_100sqkm`
    * Visualized the column onto the NTA map
4. Neighborhood station capacity in NTAs below the poverty line
    * Fetched and imported census tract geographies sourced from NHGIS
    * Fetched and joined NHGIS 2015-2019 ACS median income per-household data
    * Used NTA geography files
    * Generated centroids of each polygon
    * Ran `Join attributes by location (summary)` operation to merge ACS data into NTA polygons
        * Used the median of the median income field
    * Filtered for NTAs below the poverty line of $35k
    * Ran `Join attributes by location (summary)` to merge station data with NTA polygons
        * Summed up `capacity` column
    * Created a new column: `capacity_count / $area * 100` to generate `capacity_per_100sqkm`
    * Visualized the column onto map along with station locations
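
The aggregation in steps 3 and 4 can be sketched in plain Python, outside of QGIS. This is only an illustrative sketch, not the workflow used for the article. It assumes the spatial joins have already tagged each station and census tract with an NTA code, that areas are given in square kilometres, and that density follows the step 4 expression `capacity_count / $area * 100`. The function names and the toy inputs are hypothetical.

```python
from collections import defaultdict
from statistics import median

POVERTY_LINE = 35_000  # threshold stated in step 4


def capacity_per_100sqkm(capacity_count, area_sqkm):
    """Docks per 100 sq km. Assumes the area is already in sq km."""
    return capacity_count / area_sqkm * 100


def summarize(stations, tract_incomes, nta_areas_sqkm):
    """Return {nta: density} for NTAs whose median tract income is below the line.

    stations:        iterable of (nta, capacity) pairs
    tract_incomes:   iterable of (nta, median_household_income) pairs
    nta_areas_sqkm:  {nta: area in sq km}
    """
    # Sum station capacity per NTA (the "summary" part of the join).
    capacity = defaultdict(int)
    for nta, docks in stations:
        capacity[nta] += docks

    # Collect tract-level median incomes per NTA.
    incomes = defaultdict(list)
    for nta, income in tract_incomes:
        incomes[nta].append(income)

    # Keep only NTAs whose median of tract medians is below the threshold.
    result = {}
    for nta, values in incomes.items():
        if median(values) < POVERTY_LINE:
            result[nta] = capacity_per_100sqkm(capacity[nta], nta_areas_sqkm[nta])
    return result


# Made-up toy inputs, for illustration only (not the article's data).
stations = [("A", 20), ("A", 30), ("B", 40)]
tract_incomes = [("A", 30_000), ("A", 34_000), ("B", 60_000), ("B", 80_000)]
nta_areas_sqkm = {"A": 2.0, "B": 4.0}

densities = summarize(stations, tract_incomes, nta_areas_sqkm)
print(densities)
```

If the layer's area is measured in other units, such as square feet or square metres from a projected CRS, it would need converting to square kilometres before applying this formula.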