### I. Introduction

### II. Methods

### 1. Dataset and Preprocessing

### 2. Model Architecture

### 3. Reducing the Effect of Class Imbalance

### 4. Data Augmentation

• Transposition: the rows and columns of the image were swapped with a probability of 0.5;

• Vertical and horizontal flip: each was applied independently with a probability of 0.5;

• Random brightness and contrast change: brightness and contrast were each changed independently by up to 20% in the positive or negative direction, applied with a probability of 0.75;

• Blur: applied with a probability of 0.7 using either motion blur, median blur, Gaussian blur, or Gaussian noise;

• Distortion: applied with a probability of 0.7 using optical distortion, grid distortion, or elastic distortion;

• Random HSV (hue, saturation, and value) shift: each applied with a probability of 0.5, with a hue shift limit of 10, a saturation shift limit of 20, and a value shift limit of 10, in both the positive and negative direction;

• Random shifting: applied with a probability of 0.85 and a shift limit of 0.1 (in units of image length);

• Random scaling: with a probability of 0.85, rescaled with a random scale factor between 0.9 and 1.1; and

• Random rotation: with a probability of 0.85, the image was rotated by a random angle (±15°).
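The way these probability-gated transforms compose can be illustrated with a minimal sketch covering only the first three items of the list (transposition, flips, and brightness/contrast). It is not the authors' pipeline: the function names, the list-of-rows grayscale image with pixel values in [0, 1], and the linear form `alpha * p + beta` for the brightness/contrast change are all assumptions made for illustration.

```python
import random


def transpose(img):
    """Swap rows and columns of an image stored as a list of rows."""
    return [list(col) for col in zip(*img)]


def vertical_flip(img):
    """Reverse the order of the rows."""
    return [list(row) for row in reversed(img)]


def horizontal_flip(img):
    """Reverse the order of the pixels within each row."""
    return [list(reversed(row)) for row in img]


def brightness_contrast(img, rng, limit=0.2):
    """Assumed form: contrast factor 1 + U(-limit, limit), brightness offset U(-limit, limit)."""
    alpha = 1.0 + rng.uniform(-limit, limit)
    beta = rng.uniform(-limit, limit)
    # Clip to the valid pixel range [0, 1] after the linear change.
    return [[min(1.0, max(0.0, alpha * p + beta)) for p in row] for row in img]


def augment(img, rng):
    """Apply each transform independently with the probability given in the list."""
    if rng.random() < 0.5:
        img = transpose(img)
    if rng.random() < 0.5:
        img = vertical_flip(img)
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    if rng.random() < 0.75:
        img = brightness_contrast(img, rng)
    return img
```

Because every transform is gated by its own random draw, a given training image can pass through any subset of them in a single call to the hypothetical `augment`, which is what makes the effective variety of the training set much larger than the number of stored images.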

### 5. Training and Validation

^{−5}. Five-fold cross-validation was also used to obtain more reliable estimates of the model's performance [15]. The values in Section III (Results) are averages across the folds, each computed on the validation set of the respective fold.
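The fold-averaging procedure can be sketched as follows. This is a generic illustration, not the authors' code: `k_fold_indices`, `cross_validate`, and the `train_and_score` callback are hypothetical names, and the plain shuffled split (without stratification by class) is an assumption.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th index starting at i, so fold sizes differ by at most one.
    return [indices[i::n_folds] for i in range(n_folds)]


def cross_validate(n_samples, train_and_score, n_folds=5, seed=0):
    """Average a validation score over the folds.

    train_and_score(train_idx, val_idx) is a user-supplied (hypothetical) callback
    that trains on train_idx and returns a score measured on val_idx.
    """
    folds = k_fold_indices(n_samples, n_folds, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return sum(scores) / len(scores)
```

Each sample appears in exactly one validation fold, so the averaged score uses every sample for validation once while never scoring a model on data it was trained on.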

### 6. Metrics

$I$ is the indicator function, and $c(x)$ and $\hat{c}(x)$ are the true and predicted classes, respectively, for the validation sample $x$; this quantity can also be considered the average true positive rate (ATPR) (for a given class, “positive” is that class, and an average is then taken over the classes). The average true negative rate (ATNR) is defined similarly: for a class $k$, the “negative” is any class except $k$; due to this definition, this multiclass metric tends to be high. Finally, the F1-score generalized to the multiclass context is the harmonic mean of the multiclass sensitivity (ATPR) and multiclass precision (defined similarly to the quantities above).
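A minimal sketch of these metrics, under the reading that each one is a per-class rate computed one-vs-rest and then averaged over the classes, is given here. The function name, the returned dictionary keys, and the convention of scoring a class with an empty denominator as 0 are assumptions, not details taken from the text.

```python
def multiclass_metrics(y_true, y_pred, classes):
    """Compute ATPR, ATNR, class-averaged precision, and the F1-score built from them."""
    tpr, tnr, prec = [], [], []
    pairs = list(zip(y_true, y_pred))
    for k in classes:
        # One-vs-rest counts: class k is "positive", every other class is "negative".
        tp = sum(1 for t, p in pairs if t == k and p == k)
        fn = sum(1 for t, p in pairs if t == k and p != k)
        fp = sum(1 for t, p in pairs if t != k and p == k)
        tn = sum(1 for t, p in pairs if t != k and p != k)
        # Assumed convention: a rate with an empty denominator counts as 0.
        tpr.append(tp / (tp + fn) if tp + fn else 0.0)
        tnr.append(tn / (tn + fp) if tn + fp else 0.0)
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
    atpr = sum(tpr) / len(tpr)
    atnr = sum(tnr) / len(tnr)
    precision = sum(prec) / len(prec)
    # Harmonic mean of multiclass sensitivity (ATPR) and multiclass precision.
    f1 = 2 * atpr * precision / (atpr + precision) if atpr + precision else 0.0
    return {"atpr": atpr, "atnr": atnr, "precision": precision, "f1": f1}
```

The sketch also makes it visible why the ATNR tends to be high: for each class, the true negatives are drawn from all samples of every other class, so the denominator is dominated by samples that are easy to label as "not class $k$".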

### III. Results

### 1. Performance Measurements

### 2. Performance Projection for a Larger Dataset

$x = N/N_0$ is the ratio of the dataset size $N$ to the size of the original dataset ($N_0 = 247$). The mean squared error was minimized by parameter values of $a = -0.68$ and $b = -0.81$.