forerad

Utilities for collecting and analyzing Citibike data in Python

# Forerad
This repository is a collection of utilities for working with Citibike data. It lets you download all of Citibike's ride history archives, transform them as you see fit, and throw them into a SQLite database for easy querying.

This repository is what I use to build the SQLite database behind [Citibike Explorer](https://citibike.stevegattuso.me). It may also be useful if you don't feel like writing your own scraper to download, unzip, and load trip history archives into a `pd.DataFrame`.

## Installation and usage
Clone the repository, `cd` into the directory, and run:

```bash
$ python -m virtualenv .venv
$ source .venv/bin/activate
$ pip install -r ./requirements.txt
```

Once the requirements are installed, you can use `./bin/scraper` to download the trip archives individually or all at once. See `./bin/scraper --help` for details.

There is also `./bin/hourly-volume-rollup`, which parses all available archives and rolls up the trip data into an hourly time series. Note that this requires provisioning a SQLite database, which can be done by running `yoyo apply`.
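
For a sense of what an hourly rollup involves, here is a minimal pandas sketch. It is not the code behind `./bin/hourly-volume-rollup`: the `hourly_volume` helper and the `started_at` column name are assumptions made for illustration, so adjust them to match the columns in your archives.

```python
import pandas as pd


def hourly_volume(trips: pd.DataFrame, time_col: str = "started_at") -> pd.Series:
    """Count trips per hour; hours with no trips get a count of 0."""
    # Parse the start timestamps (the column name is an assumption, not the repo's schema)
    started = pd.to_datetime(trips[time_col])
    # Index the rows by start time, then count the rows in each hourly bucket
    counts = trips.set_index(started).resample("h").size()
    return counts.rename("trip_count")


# Tiny made-up sample standing in for a real trip archive
trips = pd.DataFrame(
    {
        "started_at": [
            "2024-01-01 08:05:00",
            "2024-01-01 08:45:00",
            "2024-01-01 10:15:00",
        ]
    }
)

hourly = hourly_volume(trips)
print(hourly)
```

Resampling (as opposed to a plain group-by on the truncated hour) keeps empty hours in the series as zeros, which tends to matter for time series work.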

If you're just looking to load an archive into pandas, here's the code snippet you're looking for:

```python
import forerad.scrapers.historical as historical

# List the cached trip archives
archives = historical.HistoricalTripArchive.list_cached()
# Load the first archive into a pandas DataFrame
df = archives[0].fetch_df()

print(df)
```
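
Once you have a DataFrame, you can push it into SQLite yourself with pandas and the standard library. This is only a sketch under assumptions: the `load_into_sqlite` helper, the `trips` table name, and the sample columns are hypothetical, and none of it reflects the schema this repository creates via `yoyo apply`.

```python
import sqlite3

import pandas as pd


def load_into_sqlite(df: pd.DataFrame, conn: sqlite3.Connection, table: str = "trips") -> int:
    """Append a DataFrame to a SQLite table and return the table's row count."""
    # to_sql creates the table on first use; if_exists="append" adds rows after that
    df.to_sql(table, conn, if_exists="append", index=False)
    # The table name comes from trusted code here; never interpolate user input into SQL
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count


# Stand-in for the DataFrame returned by fetch_df(); the columns are made up
df = pd.DataFrame(
    {
        "ride_id": ["a1", "b2"],
        "started_at": ["2024-01-01 08:05:00", "2024-01-01 09:30:00"],
    }
)

conn = sqlite3.connect(":memory:")  # use a file path instead to persist the database
total_rows = load_into_sqlite(df, conn)
print(total_rows)
```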

## FAQ
### What's with the stupid name?
I originally wanted to build a forecast of daily trip volume but ended up scaling back my ambitions (maybe just for now). `Fore` is for forecast, `rad` is for das Fahrrad, the German word for bike.