Introduction In this mini-project, I analyze the fiscal characteristics of major US public transit systems, focusing on farebox recovery, unlinked passenger trips (UPT), and vehicle revenue miles (VRM) for 2022. The data is sourced from the National Transit Database and includes various transit agencies and modes of transportation.
Methodology The analysis is based on data from:
- 2022 Fare Revenue report
- Operating Expenses report
- Monthly Ridership data for 2022
Data Cleaning and Merging
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 3744 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Agency, City, State, NTD ID, Organization Type, Reporter Type, UZA...
dbl (2): Report Year, UACE Code
num (10): Primary UZA Population, Agency VOMS, Mode VOMS, Vehicle Operations...
lgl (7): Vehicle Operations Questionable, Vehicle Maintenance Questionable,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning in inner_join(TRIPS, MILES, by = c("NTD ID", "Agency", "UZA Name", : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 97 of `x` matches multiple rows in `y`.
ℹ Row 97 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
Table Summarization
# A tibble: 6 × 6
`NTD ID` Agency metro_area Mode Total_UPT Total_VRM
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 00001 King County Seattle--Tacoma, WA DR 1326018 12860448
2 00001 King County Seattle--Tacoma, WA FB 400407 51236
3 00001 King County Seattle--Tacoma, WA MB 107967282 61632644
4 00001 King County Seattle--Tacoma, WA SR 1117605 180369
5 00001 King County Seattle--Tacoma, WA TB 9575043 2635705
6 00001 King County Seattle--Tacoma, WA VP 702709 4413910
Analysis
- Which Transit System Had the Most UPT in 2022? The transit system with the most Unlinked Passenger Trips (UPT) is:
# A tibble: 1 × 4
`NTD ID` Agency Mode Total_UPT
<chr> <chr> <chr> <dbl>
1 20008 MTA New York City Transit HR 1793073801
- Which Transit System Had the Highest Farebox Recovery?
# A tibble: 1 × 13
`NTD ID` Agency `UZA Name` Mode `3 Mode.x` month UPT `3 Mode.y` VRM
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <chr> <dbl>
1 50157 Butler… Cincinnat… CB Bus 2012-01-01 947 Bus 9182
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
# Farebox_Recovery <dbl>
- Which Transit System Has the Lowest Expenses per UPT?
# A tibble: 1 × 13
`NTD ID` Agency `UZA Name` Mode `3 Mode.x` month UPT `3 Mode.y` VRM
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <chr> <dbl>
1 60017 Centra… Oklahoma … VP Bus 2016-11-01 89 Bus 460
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
# Expenses_per_UPT <dbl>
- Which Transit System Has the Highest Fares per UPT?
# A tibble: 1 × 13
`NTD ID` Agency `UZA Name` Mode `3 Mode.x` month UPT `3 Mode.y` VRM
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <chr> <dbl>
1 10001 Rhode … Providenc… DR Bus 2020-05-01 0 Bus 90781
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
# Fares_per_UPT <dbl>
- Which Transit System Has the Lowest Expenses per VRM?
# A tibble: 1 × 13
`NTD ID` Agency `UZA Name` Mode `3 Mode.x` month UPT `3 Mode.y` VRM
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <chr> <dbl>
1 60017 Centra… Oklahoma … VP Bus 2016-11-01 89 Bus 460
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
# Expenses_per_VRM <dbl>
- Which Transit System Has the Highest Total Fares per VRM?
# A tibble: 1 × 13
`NTD ID` Agency `UZA Name` Mode `3 Mode.x` month UPT `3 Mode.y` VRM
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <chr> <dbl>
1 10001 Rhode … Providenc… DR Bus 2020-05-01 7350 Bus 0
# ℹ 4 more variables: `Agency Name` <chr>, `Total Fares` <dbl>, Expenses <dbl>,
# Fares_per_VRM <dbl>
Conclusion Based on the analysis, I believe, the most efficient transit system varies depending on which factor of efficiency is prioritized:
Farebox recovery: Butler County Regional Transit Authority (Commuter Bus) demonstrates the highest farebox recovery, meaning it operates with a high degree of financial self-sustainability. Operational Scale: MTA New York City Transit (Heavy Rail) handles the most UPT and VRM, making it highly efficient in serving a large population.