The aims of the study were:
- To assess the association between clinical variables and the type of cancer;
- To create a model that predicts the type of cancer for new cases;
- To group patients into clusters according to their clinical characteristics.
Building of neural networks
We constructed 200 classification neural networks using the automatic search mode. All variables in the database, except for the patient identification number, were entered into the model. The cases were divided into training, test and validation samples in a 70:15:15 ratio.
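The 70:15:15 division can be illustrated with a minimal sketch. The function name `split_cases`, the fixed random seed and the example of 200 cases are assumptions made for illustration only; the source does not describe how the software assigned cases to samples.

```python
import random

def split_cases(case_ids, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle case identifiers and split them into three samples."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle (assumed seed)
    n_train = round(ratios[0] * len(ids))
    n_test = round(ratios[1] * len(ids))
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    # The validation sample takes the remainder, so no case is lost to rounding.
    validation = ids[n_train + n_test:]
    return train, test, validation

# Hypothetical example with 200 case identifiers.
train, test, validation = split_cases(range(200))
```

Each case ends up in exactly one of the three samples, which is what allows the test and validation results to be compared later.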
For further quality assessment, we selected the six networks with the highest stability between the test and validation samples. An additional criterion was a quality of at least 96%.
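The selection rule can be sketched as follows. This is only one possible reading: it assumes that "quality" means classification accuracy, that the 96% threshold applies to both the test and the validation sample, and that "stability" means a small absolute difference between the two. The dictionary keys and the example values are hypothetical.

```python
def select_networks(results, min_quality=0.96, keep=6):
    """Keep networks that meet the quality threshold, most stable first.

    results: list of dicts with hypothetical keys 'id', 'test' and
    'validation' (accuracies expressed as fractions between 0 and 1).
    """
    eligible = [
        r for r in results
        if min(r["test"], r["validation"]) >= min_quality
    ]
    # A smaller gap between test and validation accuracy is treated as
    # higher stability (assumed definition).
    eligible.sort(key=lambda r: abs(r["test"] - r["validation"]))
    return eligible[:keep]

# Hypothetical results for three networks.
results = [
    {"id": 1, "test": 0.99, "validation": 0.90},
    {"id": 2, "test": 0.97, "validation": 0.97},
    {"id": 3, "test": 0.98, "validation": 0.96},
]
selected = select_networks(results, keep=2)
```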
Quality assessment of the network
The error matrix showed that network number 195, with 19 hidden neurons, classified all patients without error. Global sensitivity analysis indicated that three variables could be removed from the model, so a new set of networks was built. As in the first step, networks were selected on the basis of quality (at least 96%) and stability between the test and validation samples. Assessment of the error matrices showed that the most reliable of the selected networks was network number 72, with 11 hidden neurons.
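An error matrix of the kind used here can be built as in the sketch below. The class labels "A" and "B" and the helper names are hypothetical; the sketch only shows the structure of such a matrix and the check for an error-free classification.

```python
from collections import Counter

def error_matrix(actual, predicted):
    """Return the class labels and a matrix of counts (rows: actual, columns: predicted)."""
    counts = Counter(zip(actual, predicted))
    labels = sorted(set(actual) | set(predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix

def is_error_free(matrix):
    """True when every off-diagonal cell is zero, i.e. no case was misclassified."""
    size = len(matrix)
    return all(matrix[i][j] == 0
               for i in range(size) for j in range(size) if i != j)

# Hypothetical labels for five patients.
actual = ["A", "A", "B", "B", "B"]
predicted = ["A", "A", "B", "B", "B"]
labels, matrix = error_matrix(actual, predicted)
```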
Sensitivity analysis indicated that only one variable was a candidate for exclusion. Because the value of its deciding factor (0.998) was very close to the decision threshold (1.0), we decided not to remove this variable. The model was saved as a PMML file.
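The source does not define how the deciding factor is computed. One common form of sensitivity analysis, assumed here purely for illustration, is an error ratio: the model error after a variable has been replaced by its mean, divided by the error with the original data. Under that assumption, a ratio at or below 1.0 suggests that the variable contributes nothing. The toy model and all names below are hypothetical.

```python
def sensitivity_ratios(error_fn, rows, n_vars):
    """Error with variable j replaced by its mean, divided by the baseline error."""
    baseline = error_fn(rows)  # must be non-zero for the ratio to be defined
    ratios = []
    for j in range(n_vars):
        mean_j = sum(r[j] for r in rows) / len(rows)
        neutral = [r[:j] + [mean_j] + r[j + 1:] for r in rows]
        ratios.append(error_fn(neutral) / baseline)
    return ratios

# Toy example: the target depends on variable 0 only; variable 1 is noise.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0]]
targets = [2.0, 4.0, 6.0]

def toy_error(feature_rows):
    """Mean squared error of a hypothetical model that predicts 2 * x0 + 0.5."""
    preds = [2.0 * r[0] + 0.5 for r in feature_rows]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

ratios = sensitivity_ratios(toy_error, rows, n_vars=2)
```

In this toy case the ratio for variable 0 is well above 1.0, while the ratio for the noise variable stays at 1.0, which is the borderline situation described in the text.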
Construction of the Kohonen network
To construct the Kohonen network, we used the set of variables tested in the model described above.
When building the Kohonen network, the number of epochs was set to 1000, because the learning error was still decreasing at 200, 400 and 600 epochs. Training was not extended further for fear of losing the model due to long-term stabilization.
The number of neurons was determined experimentally, so that the distribution of results formed a visually distinguishable pattern in the Kohonen graph; this was achieved with an 8 x 8 square grid of neurons.
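For readers unfamiliar with Kohonen networks, the training procedure can be sketched in plain Python. This is not the software used in the study: the linearly decaying learning rate, the Gaussian neighbourhood, the initial values and the example data are all assumptions. Only the 8 x 8 grid and the default of 1000 epochs come from the text.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position (row, column) of the neuron closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best

def train_som(data, rows=8, cols=8, epochs=1000, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on data scaled to the range 0..1."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    sigma0 = max(rows, cols) / 2  # initial neighbourhood radius (assumed)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 0.5)    # neighbourhood shrinks over time
        for x in data:
            bi, bj = best_matching_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    dist2 = (i - bi) ** 2 + (j - bj) ** 2
                    h = math.exp(-dist2 / (2 * sigma ** 2))
                    w = weights[i][j]
                    for k in range(dim):
                        # Move the neuron towards x, scaled by lr and proximity to the winner.
                        w[k] += lr * h * (x[k] - w[k])
    return weights

# Hypothetical, already scaled data; a short run is used here to keep the example fast.
data = [[0.0, 0.0], [0.05, 0.05], [1.0, 1.0], [0.95, 0.95]]
weights = train_som(data, epochs=50)
positions = [best_matching_unit(weights, x) for x in data]
```

Each case is then assigned to the grid position of its winning neuron, and these positions form the pattern inspected in the Kohonen graph.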
A cumulative table grouping patients by type of cancer was generated from the Kohonen chart. Tables were then generated for each cluster separately, with ranking based on the colour scale. Each cluster was assigned a serial number, and a table was created. The mean, standard deviation, median and quartiles were then calculated for all variables, using the cluster number as the grouping variable. Neither normality nor the significance of the differences was assessed. Such a table can be subjected to further analyses, e.g. analysis of variance.
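The per-cluster descriptive statistics can be reproduced with the Python standard library, as in the sketch below. The record layout, the variable name `age` and the values are hypothetical, and the quartile method (`inclusive`) is an assumption, since the source does not state which definition was used.

```python
import statistics
from collections import defaultdict

def describe_by_cluster(records, variable):
    """Summarise one variable per cluster (each cluster needs at least two cases)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["cluster"]].append(rec[variable])
    summary = {}
    for cluster, values in sorted(groups.items()):
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[cluster] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample standard deviation
            "median": statistics.median(values),
            "q1": q1,
            "q3": q3,
        }
    return summary

# Hypothetical records: cluster number plus one clinical variable.
records = [
    {"cluster": 1, "age": 40}, {"cluster": 1, "age": 50}, {"cluster": 1, "age": 60},
    {"cluster": 2, "age": 30}, {"cluster": 2, "age": 70},
]
summary = describe_by_cluster(records, "age")
```

Calling the function once per variable yields a table of the kind described above, with the cluster number as the grouping variable.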
In the next step, values were predicted for the test and validation groups. The predicted types of cancer matched the actual ones in every case. This indicates that the neural network obtained in this experiment is suitable for predicting values for new cases.