Connected vehicles (CVs) can exchange messages containing location and other safety-related information with other vehicles and with devices affixed to roadside infrastructure. While the main purpose of vehicle connectivity is to enhance safety, the data generated by CVs has enormous potential to support transportation planning and operations. However, handling the vast volume of data produced by CVs presents considerable challenges for researchers in the transportation domain.
This research presents a case study of using HIVE to facilitate CV data analysis, based on the largest CV data set publicly released to date. It characterizes the data analysis tasks that are expected to enable transportation planning research and investigates several approaches to increasing the corresponding query efficiency and throughput. The study compares the use of HIVE in conjunction with the MapReduce and Spark programming frameworks, analyzes its performance with different data storage formats, and documents potential use cases.