EVALUATING AND CHOOSING A MACHINE LEARNING ALGORITHM FOR CLASSIFYING ROAD SURFACE QUALITY DATA

ABSTRACT Considering the importance of roads to a community, stakeholders (governments, motorists, etc.) need up-to-date information about the state of roads for decision making. This problem inspired Vorgbe’s (2014) work on implementing a machine learning classifier that could accurately classify roads as “good”, “fair” or “bad”, so that this information can then be visualised on Google Maps. However, because that classifier failed to classify some roads accurately, this project evaluates five classification algorithms to determine which one is best suited to classifying road surface quality data. To do this, we collected x, y, z acceleration and location data, extracted the desired features from it, trained each model using 10-fold cross-validation to choose the best one, and then tested on a new set of examples to determine which model classifies the data most accurately. On the data available, the decision tree model produced the best performance, correctly classifying 97% of bad roads, 81% of fair roads and 93% of good roads. Its overall accuracy on the test set is 92%, with a precision of 92% and a recall of 90%. This means that this model is more likely than the others to assign a new data point to its true class. The other algorithms (Logistic Regression, Random Forests, Support Vector Machines and Nearest Neighbour) performed well when classifying the “good” and “bad” road data but misclassified the “fair” road data as “good”.
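
As a minimal sketch of the evaluation procedure summarised in the abstract, the following Python code shows one way the 10-fold cross-validation splits and the per-class precision and recall could be computed. The function names, the random seed and the toy labels are illustrative assumptions; they are not taken from the project's implementation or data, and the classifiers themselves are left out.

```python
import random
from collections import Counter


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def per_class_metrics(y_true, y_pred, labels):
    """Return overall accuracy and per-class precision and recall."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in labels)  # all predicted as c
        actual = sum(pairs[(c, p)] for p in labels)     # all truly c
        report[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return accuracy, report


# Hypothetical labels, for illustration only (not the project's data).
LABELS = ["bad", "fair", "good"]
toy_true = ["bad", "bad", "fair", "fair", "good", "good"]
toy_pred = ["bad", "bad", "good", "fair", "good", "good"]

toy_accuracy, toy_report = per_class_metrics(toy_true, toy_pred, LABELS)
```

In this toy example one “fair” sample is predicted as “good”, which lowers the recall for “fair” and the precision for “good”. This is the same kind of confusion the abstract reports for the models other than the decision tree. In a full pipeline, each candidate model would be trained on `train_idx` and scored on `test_idx` for every fold, and the fold scores averaged before comparing models.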