# Prediction of aircraft surface roughness after coating removal based on optical image and deep learning

### Experimental environment and configuration

Table 2 shows the experimental environment used in this work. The batch size is 32 and the maximum number of epochs is 100; the weights from the last epoch are taken as the training result of the model. When the Adam optimization algorithm is used, the learning rate is 0.001 and the momentum is 0.9.

### Comparative experiments of different models

In this paper, we first compare the prediction performance of three SSEResNet regression models on three datasets of different sizes, using a simple gradient descent (GD) optimization algorithm. The experimental conditions are the original dataset without data augmentation, a fixed learning rate of 0.0025, 100 epochs, and otherwise identical parameter configurations. The mean squared error (MSE) loss replaces the cross-entropy loss previously used for classification tasks and serves as the evaluation index for the experiments. MSE is suitable for regression tasks and is calculated by Eq. (5), where \(y_{i}\) represents the true value and \(p_{i}\) represents the predicted value. Table 3 shows the experimental results of the different SSEResNet regression models on the different datasets.

$$MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {p_{i} - y_{i} } \right)^{2} }$$

(5)
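As a minimal sketch of Eq. (5), the MSE can be computed as below. The function name and the sample values are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def mean_squared_error(predicted, actual):
    """Return the MSE of Eq. (5): the mean of (p_i - y_i)^2 over n samples."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return sum((p - y) ** 2 for p, y in zip(predicted, actual)) / n


# Hypothetical roughness values, for illustration only.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
mse = mean_squared_error(y_pred, y_true)  # (0.25 + 0 + 1) / 3
```

Because the errors are squared, a single large deviation (here the third sample) dominates the result, which is why MSE penalizes outliers more strongly than an absolute-error measure.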

The experimental data in Table 3 show that a model must be matched to a dataset of appropriate size to achieve good results: the deeper the network, the larger the dataset it requires. This is because a shallow model extracts features insufficiently and has limited image processing capability on large datasets, while a deep model easily overfits on small datasets. Considering both the training time and the prediction performance, the SSEResNet101 regression model and the 8000-image dataset were selected for the subsequent comparison experiments with other models.

Next, four optimization methods are compared, and then the SSEResNet101 model is compared with seven other CNN backbones. Using the SSEResNet101 regression model, the Adam optimization method is tested under the same conditions as three other optimization methods commonly used in deep learning: SGD, Momentum, and RMSprop. The experimental conditions are a fixed learning rate of 0.0025, 100 epochs, and otherwise identical parameter configurations. The MSE loss, the mean absolute error (MAE) and R-Squared (R2) are selected as evaluation indices. Equations (6) and (7) give the calculation of MAE and R2, respectively, where \(\overline{y}\) is the mean of the label values. When the predicted value equals the label value, MAE is 0, and the larger the error, the larger the MAE. R2 values normally lie in [0, 1] (the value can become negative only if the model fits worse than predicting the mean). A value near 0 indicates a poor fit, and a value of 1 indicates a perfect fit; the larger the R2, the better the model fits. Table 4 shows the experimental results of the different optimization methods.

$$MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {p_{i} - y_{i} } \right|}$$

(6)

$$R2 = 1 - \frac{{\sum\limits_{i} {(p_{i} - y_{i} )^{2} } }}{{\sum\limits_{i} {(\overline{y} - y_{i} )^{2} } }}$$

(7)
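The two indices in Eqs. (6) and (7) can be sketched in a few lines. The function names and sample values below are hypothetical and serve only to illustrate the formulas.

```python
def mean_absolute_error(predicted, actual):
    """MAE of Eq. (6): mean of |p_i - y_i|."""
    return sum(abs(p - y) for p, y in zip(predicted, actual)) / len(actual)


def r_squared(predicted, actual):
    """R2 of Eq. (7): 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((p - y) ** 2 for p, y in zip(predicted, actual))
    ss_tot = sum((mean_y - y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical label and prediction values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(y_pred, y_true)  # (0.5 + 0 + 0.5 + 0) / 4
r2 = r_squared(y_pred, y_true)             # 1 - 0.5 / 5.0
```

Predicting the label mean for every sample makes the numerator equal to the denominator and gives R2 = 0, which is the sense in which 0 marks a poor fit.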

The experimental data in Table 4 show that the Adam optimization method performs better than the other three optimization methods, both with and without data augmentation. Compared with Momentum, RMSprop and the traditional SGD algorithm, Adam combines the advantages of Momentum and RMSprop. The advantage of Momentum is that it speeds up the learning of parameters whose gradient direction stays the same and damps the updates of parameters whose gradient direction changes, so that parameters moving in a consistent direction converge quickly. RMSprop is an adaptive learning rate optimization algorithm. Its advantage is that the learning rate is large at the beginning of training, which accelerates convergence, and small in the later training phase, which helps to suppress oscillation and to avoid skipping over the optimal solution. Therefore, we use the Adam optimization method to conduct a comparative experiment between SSEResNet101 and seven other CNN backbone networks. The experimental conditions are a fixed learning rate of 0.0025, 100 epochs, and otherwise identical parameter configurations. Table 5 shows the experimental results of the regression prediction models.
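To make the relationship between the three optimizers concrete, the following sketch writes their textbook update rules for a single scalar parameter. It is not the training code used in the experiments: the decay coefficients (`beta`, `alpha`, `beta1`, `beta2`), `eps`, and the toy objective f(w) = w^2 are common defaults assumed here for illustration, and only the learning rate of 0.0025 comes from the text.

```python
import math


def momentum_step(w, grad, state, lr=0.0025, beta=0.9):
    """SGD with momentum: a velocity term accumulates past gradients."""
    state["v"] = beta * state.get("v", 0.0) + grad
    return w - lr * state["v"]


def rmsprop_step(w, grad, state, lr=0.0025, alpha=0.99, eps=1e-8):
    """RMSprop: scale the step by a running average of squared gradients."""
    state["s"] = alpha * state.get("s", 0.0) + (1 - alpha) * grad ** 2
    return w - lr * grad / (math.sqrt(state["s"]) + eps)


def adam_step(w, grad, state, lr=0.0025, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum-style first moment plus RMSprop-style second moment."""
    state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimise(step_fn, w0=1.0, steps=200):
    """Run an optimizer on the toy objective f(w) = w^2 (gradient 2w)."""
    w, state = w0, {}
    for _ in range(steps):
        w = step_fn(w, 2.0 * w, state)
    return w
```

The first moment `m` in `adam_step` plays the role of the Momentum velocity, and the division by the square root of `v_hat` is the RMSprop-style per-parameter scaling, which is the sense in which Adam incorporates both.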

The experimental data in Table 5 show that the MSE and MAE values of the models are reduced after data augmentation, indicating that the data augmentation operation effectively improves model performance. Data augmentation generates many similar but distinct training samples by applying a series of random changes to the training images, thus enlarging the training dataset. Moreover, these random changes make the model less dependent on particular attributes of the training samples, which improves its generalizability. Compared with the other CNN models, our model has the smallest MSE and MAE values and the largest R2 value. With data augmentation, the MSE of our model is only 0.0285, which is 0.0097 less than that of ResNet101 and 0.0032 less than that of SEResNet101. Meanwhile, the MSE of SEResNet101 is 0.0065 smaller than that of ResNet101. These comparisons show that the SE module and the CSPNet reinforcement module play an important role in improving the prediction ability of the model. The SE module exploits the correlation of features between channels, and the CSPNet reinforcement module helps to extract deep semantic information. After feature fusion and mapping with shallow semantic information, richer semantic information can be obtained. Under the joint action of these two modules, the model can better extract detailed features from the optical images and then learn the mapping between these features and the surface roughness.
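The idea of enlarging a dataset through random changes can be sketched as follows. The specific transforms (random flips and a small brightness shift), the assumption that the roughness label is unchanged by them, and all function names are illustrative choices made here; the text does not state which transforms were actually used.

```python
import random


def random_augment(image, rng):
    """Return a randomly flipped, brightness-shifted copy of a 2D grayscale image.

    `image` is a list of rows with pixel values in [0, 1]; the input is not modified.
    """
    out = [row[:] for row in image]
    if rng.random() < 0.5:            # random horizontal flip
        out = [row[::-1] for row in out]
    if rng.random() < 0.5:            # random vertical flip
        out = out[::-1]
    shift = rng.uniform(-0.1, 0.1)    # small brightness change (assumed range)
    return [[min(1.0, max(0.0, px + shift)) for px in row] for row in out]


def augment_dataset(images, labels, copies, seed=0):
    """Keep the originals and add `copies` augmented variants per image.

    Each variant reuses the label of its source image (an assumption).
    """
    rng = random.Random(seed)
    aug_images, aug_labels = list(images), list(labels)
    for img, lab in zip(images, labels):
        for _ in range(copies):
            aug_images.append(random_augment(img, rng))
            aug_labels.append(lab)
    return aug_images, aug_labels
```

With `copies=3`, for example, a set of N labelled images grows to 4N samples while the set of distinct roughness labels stays the same.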

### Influence of learning rate decay strategy on model training and prediction effect

We also investigate the effect of different learning rate decay strategies on model training. Fig. 6 shows the validation-set MSE loss curves of the model trained on the 8000-image dataset using StepLR, MultiStepLR, CosineAnnealingLR and CosineAnnealing with warm restart.

The experimental results in Fig. 6 show that the learning rate decay method of CosineAnnealing with warm restart has the best convergence behavior and the smallest MSE loss. The CosineAnnealingLR method gives the second-best training result and StepLR gives the worst. With CosineAnnealing with warm restart, the learning rate drops to a certain value, restarts by jumping back to its initial value, and then decays again in the next cycle. Such a learning rate schedule can allow a model that has converged to a local optimum to escape from it and continue updating toward the global optimum.
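The restart behavior described above can be sketched with the standard cosine annealing formula, reset at the start of each cycle. The initial learning rate, minimum learning rate, cycle length and StepLR settings below are assumed values for illustration; the text does not report the settings used in the experiments.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr=0.0025, min_lr=0.0, cycle_len=25):
    """Cosine annealing with warm restarts, using fixed-length cycles.

    Within a cycle the rate decays from base_lr toward min_lr along a
    half cosine; at the start of each new cycle it jumps back to base_lr.
    """
    t_cur = epoch % cycle_len  # epochs elapsed since the last restart
    cosine = 1 + math.cos(math.pi * t_cur / cycle_len)
    return min_lr + 0.5 * (base_lr - min_lr) * cosine


def step_lr(epoch, base_lr=0.0025, step_size=25, gamma=0.1):
    """StepLR for comparison: multiply the rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate per epoch over 100 epochs (four cycles of 25 epochs).
schedule = [cosine_warm_restart_lr(e) for e in range(100)]
```

In this sketch the rate returns to its initial value at epochs 25, 50 and 75, whereas `step_lr` only ever decreases, so a StepLR run has no later phase of large steps with which to leave a poor local optimum.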

To show more intuitively how well surface roughness can be predicted from optical images with deep learning regression models, we plot the test-set results as a scatter plot of the values predicted by the regression model against the true label values, as shown in Fig. 7.

The experimental results in Fig. 7 show that the surface roughness predicted by the regression model from optical images is close to the actual value, which indicates that the regression model we designed predicts well and can directly and accurately predict surface roughness from optical images. In particular, the learning rate decay method of CosineAnnealing with warm restart gives the best prediction: its MAE on the test set is 0.245 μm, and its surface roughness predictions are the most consistent with the actual values.