Mini Project #01: Fiscal Characteristics of Major US Public Transit Systems

Author

Lavanya

Published

September 20, 2024

Introduction In this mini-project, I analyze the fiscal characteristics of major US public transit systems, focusing on farebox recovery, unlinked passenger trips (UPT), and vehicle revenue miles (VRM) for 2022. The data is sourced from the National Transit Database and includes various transit agencies and modes of transportation.

Methodology The analysis is based on data from:

  1. 2022 Fare Revenue report
  2. Operating Expenses report
  3. Monthly Ridership data for 2022

Data Cleaning and Merging

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 3744 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Agency, City, State, NTD ID, Organization Type, Reporter Type, UZA...
dbl  (2): Report Year, UACE Code
num (10): Primary UZA Population, Agency VOMS, Mode VOMS, Vehicle Operations...
lgl  (7): Vehicle Operations Questionable, Vehicle Maintenance Questionable,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning in inner_join(TRIPS, MILES, by = c("NTD ID", "Agency", "UZA Name", : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 97 of `x` matches multiple rows in `y`.
ℹ Row 97 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.

Table Summarization

# A tibble: 6 × 6
  `NTD ID` Agency      metro_area          Mode  Total_UPT Total_VRM
  <chr>    <chr>       <chr>               <chr>     <dbl>     <dbl>
1 00001    King County Seattle--Tacoma, WA DR      1326018  12860448
2 00001    King County Seattle--Tacoma, WA FB       400407     51236
3 00001    King County Seattle--Tacoma, WA MB    107967282  61632644
4 00001    King County Seattle--Tacoma, WA SR      1117605    180369
5 00001    King County Seattle--Tacoma, WA TB      9575043   2635705
6 00001    King County Seattle--Tacoma, WA VP       702709   4413910

Analysis

  1. Which Transit System Had the Most UPT in 2022? The transit system with the most Unlinked Passenger Trips (UPT) is:
# A tibble: 1 × 4
  `NTD ID` Agency                    Mode   Total_UPT
  <chr>    <chr>                     <chr>      <dbl>
1 20008    MTA New York City Transit HR    1793073801
  1. Which Transit System Had the Highest Farebox Recovery?
# A tibble: 1 × 13
  `NTD ID` Agency  `UZA Name` Mode  `3 Mode.x` month        UPT `3 Mode.y`   VRM
  <chr>    <chr>   <chr>      <chr> <chr>      <date>     <dbl> <chr>      <dbl>
1 50157    Butler… Cincinnat… CB    Bus        2012-01-01   947 Bus         9182
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
#   Farebox_Recovery <dbl>
  1. Which Transit System Has the Lowest Expenses per UPT?
# A tibble: 1 × 13
  `NTD ID` Agency  `UZA Name` Mode  `3 Mode.x` month        UPT `3 Mode.y`   VRM
  <chr>    <chr>   <chr>      <chr> <chr>      <date>     <dbl> <chr>      <dbl>
1 60017    Centra… Oklahoma … VP    Bus        2016-11-01    89 Bus          460
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
#   Expenses_per_UPT <dbl>
  1. Which Transit System Has the Highest Fares per UPT?
# A tibble: 1 × 13
  `NTD ID` Agency  `UZA Name` Mode  `3 Mode.x` month        UPT `3 Mode.y`   VRM
  <chr>    <chr>   <chr>      <chr> <chr>      <date>     <dbl> <chr>      <dbl>
1 10001    Rhode … Providenc… DR    Bus        2020-05-01     0 Bus        90781
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
#   Fares_per_UPT <dbl>
  1. Which Transit System Has the Lowest Expenses per VRM?
# A tibble: 1 × 13
  `NTD ID` Agency  `UZA Name` Mode  `3 Mode.x` month        UPT `3 Mode.y`   VRM
  <chr>    <chr>   <chr>      <chr> <chr>      <date>     <dbl> <chr>      <dbl>
1 60017    Centra… Oklahoma … VP    Bus        2016-11-01    89 Bus          460
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
#   Expenses_per_VRM <dbl>
  1. Which Transit System Has the Highest Total Fares per VRM?
# A tibble: 1 × 13
  `NTD ID` Agency  `UZA Name` Mode  `3 Mode.x` month        UPT `3 Mode.y`   VRM
  <chr>    <chr>   <chr>      <chr> <chr>      <date>     <dbl> <chr>      <dbl>
1 10001    Rhode … Providenc… DR    Bus        2020-05-01  7350 Bus            0
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
#   Fares_per_VRM <dbl>

Conclusion Based on the analysis, I believe, the most efficient transit system varies depending on which factor of efficiency is prioritized:

Farebox recovery: Butler County Regional Transit Authority (Commuter Bus) demonstrates the highest farebox recovery, meaning it operates with a high degree of financial self-sustainability. Operational Scale: MTA New York City Transit (Heavy Rail) handles the most UPT and VRM, making it highly efficient in serving a large population.