Exploring the Use of Machine Learning to Estimate Transit Ridership Based on Socio-Demographic Variables and Transportation System Characteristics

Sponsored by North Central Texas Council of Governments (NCTCOG)

The increased availability of powerful computing resources, paired with unprecedented access to data, has driven significant advances in machine learning (ML) techniques over the past decade. ML techniques offer a promising way to leverage abundant data in characterizing and solving transportation problems. Tested applications include system and service planning, asset management, system operations, communication and information, business administration, and public safety and enforcement.

The focus of this project is to apply ML techniques to estimate transit ridership based on socio-demographic variables and the characteristics of the transportation system. The outcomes of selected ML techniques will be compared to the results of simpler statistical methods, such as regression, and to the estimates provided by traditional planning models.
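To make the comparison between an ML technique and a simpler regression concrete, the following is a minimal sketch and not the project's actual method. It assumes two hypothetical predictors (population density and service frequency), a purely synthetic data set with an invented relationship, and k-nearest-neighbors regression as a stand-in for whichever ML techniques the project selects. All function names are illustrative.

```python
import random

def make_synthetic_stops(n, seed=0):
    """Generate purely synthetic (density, frequency, boardings) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        density = rng.uniform(1.0, 10.0)    # hypothetical socio-demographic variable
        frequency = rng.uniform(1.0, 8.0)   # hypothetical service characteristic
        # Invented nonlinear relationship plus noise, for illustration only.
        boardings = 5.0 * density * frequency ** 0.5 + rng.gauss(0.0, 5.0)
        rows.append((density, frequency, boardings))
    return rows

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(train):
    """Ordinary least squares baseline: boardings ~ 1 + density + frequency."""
    X = [[1.0, d, f] for d, f, _ in train]
    y = [row[2] for row in train]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    beta = solve(xtx, xty)
    return lambda d, f: beta[0] + beta[1] * d + beta[2] * f

def fit_knn(train, k=5):
    """k-nearest-neighbors regression on unscaled features (a simple ML stand-in)."""
    def predict(d, f):
        nearest = sorted(train, key=lambda r: (r[0] - d) ** 2 + (r[1] - f) ** 2)[:k]
        return sum(r[2] for r in nearest) / k
    return predict

def mean_abs_error(model, rows):
    """Mean absolute error of a fitted model on held-out rows."""
    return sum(abs(model(d, f) - y) for d, f, y in rows) / len(rows)

rows = make_synthetic_stops(200)
train, test = rows[:150], rows[150:]   # simple holdout split
ols_mae = mean_abs_error(fit_ols(train), test)
knn_mae = mean_abs_error(fit_knn(train), test)
print(f"OLS MAE: {ols_mae:.2f}  k-NN MAE: {knn_mae:.2f}")
```

Both models see the same predictors and the same holdout rows, so the difference in error reflects the model form and not the inputs. Because the data are synthetic, the printed errors say nothing about real ridership.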

The performance evaluation of selected models will involve comparing model ridership estimates at different levels (e.g., system-wide, route, stop, market segment, mode of access) to expanded data from an on-board survey conducted in 2014. If time permits, analyses will be repeated using present-year automated passenger count (APC) data for validation, provided that adequate sources of socio-demographic data are available.
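The multi-level comparison described above might be sketched as follows. This is an illustration under stated assumptions: the record layout, the route and stop labels, and every number are invented, and percent error is only one of several possible comparison measures. None of the values come from the 2014 survey.

```python
from collections import defaultdict

# Hypothetical records: (route, stop, estimated boardings, observed boardings).
# All values are invented for illustration.
records = [
    ("R1", "A", 120.0, 100.0),
    ("R1", "B", 80.0, 90.0),
    ("R2", "C", 45.0, 50.0),
    ("R2", "D", 60.0, 60.0),
]

def aggregate(records, key):
    """Sum estimated and observed boardings within each group defined by key."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for rec in records:
        group = key(rec)
        totals[group][0] += rec[2]
        totals[group][1] += rec[3]
    return {group: tuple(pair) for group, pair in totals.items()}

def percent_error(estimated, observed):
    """Signed percent error of the estimate relative to the observed count."""
    return 100.0 * (estimated - observed) / observed

# Each level maps a record to the group it belongs to at that level.
levels = {
    "stop": lambda r: (r[0], r[1]),
    "route": lambda r: r[0],
    "system": lambda r: "all",
}

errors = {
    level: {g: percent_error(e, o) for g, (e, o) in aggregate(records, key).items()}
    for level, key in levels.items()
}

for level, by_group in errors.items():
    for group, err in by_group.items():
        print(f"{level:>6} {group!s:>12}: {err:+.1f}%")
```

In this toy data, stop-level errors partly cancel when summed to the route and system levels, which is one reason to report model performance at several levels of aggregation.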

Read the NCTCOG report.