3D Scan Campaign Classification with Representative Training Scan Selection

Abstract

Point cloud classification has been shown to be effective for labelling points in 3D scans, and can accelerate manual tasks such as the removal of unwanted points from cultural heritage scans. However, a classifier's performance depends on the choice of classifier and feature set, and making this choice is difficult because findings from previous studies may not generalise to new domains. Furthermore, when choosing training scans for campaign-based classification, it is important to identify a descriptive set of scans that represents the rest of the campaign. This task becomes increasingly onerous for large and diverse campaigns, and randomly selecting scans does not guarantee a descriptive training set. To address these challenges, a framework was developed that includes three classifiers (Random Forest (RF), Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP)), a range of point features and several feature selection methods. The framework also includes a proposed automatic representative scan selection method, which uses segmentation and clustering to identify balanced, similar or distinct training scans. The framework was evaluated on four labelled datasets, including two cultural heritage campaigns, to compare the speed and accuracy of the implemented classifiers and feature sets, and to determine whether the proposed selection method identifies scans that yield a more accurate classifier than random selection. It was found that the RF, paired with a complete multi-scale feature set comprising covariance, geometric and height-based features, consistently achieved the highest overall accuracy on the four datasets. However, the other classifiers and reduced sets of selected features achieved similar accuracy and, in some cases, greatly reduced training and prediction times. It was also found that the proposed training scan selection method can, on particularly diverse campaigns, yield a more accurate classifier than random selection. For homogeneous campaigns, where variations to the training set have limited impact, the method offers less benefit. Furthermore, it depends on the quality of the segmentation and clustering output, which requires campaign-specific parameter tuning and may be imprecise.
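
To make the clustering-based idea behind representative scan selection concrete, the following is a minimal illustrative sketch: per-scan feature descriptors are clustered and the scan closest to each cluster centre is chosen as a training scan. The descriptor choice, the use of k-means, and all names and parameters below are assumptions for illustration only, not the implementation described in this work.

```python
# Illustrative sketch: pick representative training scans by clustering
# fixed-length per-scan descriptors (hypothetical; not the authors' code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def select_representative_scans(scan_descriptors: dict, n_training_scans: int = 3) -> list:
    """Return one scan identifier per cluster, nearest to each centroid.

    scan_descriptors maps a scan identifier to a fixed-length vector
    summarising that scan (e.g. averaged multi-scale point features).
    """
    names = list(scan_descriptors)
    X = StandardScaler().fit_transform(
        np.vstack([scan_descriptors[n] for n in names]))

    km = KMeans(n_clusters=n_training_scans, n_init=10, random_state=0).fit(X)

    selected = []
    for k, centre in enumerate(km.cluster_centers_):
        members = np.flatnonzero(km.labels_ == k)
        closest = members[np.argmin(np.linalg.norm(X[members] - centre, axis=1))]
        selected.append(names[closest])
    return selected


# Example usage with synthetic descriptors for five hypothetical scans.
rng = np.random.default_rng(0)
descriptors = {f"scan_{i:02d}": rng.normal(size=8) for i in range(5)}
print(select_representative_scans(descriptors, n_training_scans=2))
```

Choosing the scan nearest each centroid favours a balanced training set; selecting all scans from one cluster, or one scan from each of the most widely separated clusters, would instead correspond to the "similar" and "distinct" selection strategies mentioned above.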