Utilizing Synthetic Data and Artificial Neural Networks for Clinical Phenotype Prediction in Precision Medicine: A Targeted Metabolomic Analysis of Urinary Organic Acids in Autoimmune Diseases


Published: Jul 9, 2024
Keywords:
Artificial Neural Networks, Synthetic Data, total Organic Acids, metabolomics, precision medicine
Vasileios Fragoulakis
Athanassios Vozikis
Abstract

This study aimed to generate synthetic data and compare its predictive accuracy with that of the original data when used as input to a binary feed-forward, back-propagation Artificial Neural Network (ANN) for targeted analysis of urinary Organic Acids (OAs). The original dataset came from a case-control study of 392 participants (patients with autoimmune diseases and healthy individuals). Two types of synthetic data were generated to replace the original values: one using a non-parametric bootstrap replication technique and one using a Classification and Regression Tree (CART) model. Support Vector Machine (SVM) analysis was employed to pinpoint potentially crucial biomarkers for inclusion in the ANN. The accuracy of the ANN models was evaluated through the Receiver Operating Characteristic (ROC) curve, along with standard performance measures: Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, False Positive Rate, False Negative Rate and Overall performance. To cross-validate the models and guard against overfitting, the data were randomly divided into three distinct sets: training data (50%), testing data (25%), and holdout data (25%). The optimal architecture for all ANN models consisted of a shallow structure with one hidden layer, a hyperbolic activation function, and SoftMax as the output function. SVM analysis did not detect variations among biomarkers, indicating their equal importance. The predictive accuracy of the ANN using real data was approximately 77.3%, compared to 66.6% for bootstrap-synthetic data and 51.27% for the ANN-CART model. None of the models exhibited signs of overfitting. The relatively poor performance of the ANN-CART model could be improved by adopting simpler modeling approaches and integrating alternative strategies for biomarker selection.
The quality of synthetic data can be enhanced through advanced statistical methodologies, and such data may serve as a reasonable alternative input to an ANN model while maintaining comparable accuracy in autoimmune disease prediction.
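
The data handling and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`split_50_25_25`, `bootstrap_replicate`, `binary_metrics`) are hypothetical, the bootstrap is assumed to resample whole participant rows with replacement, and labels are assumed to be coded 1 (patient) and 0 (healthy). The ANN, SVM, and CART steps are left out.

```python
import random


def split_50_25_25(rows, seed=0):
    """Shuffle rows, then split into training (50%), testing (25%), holdout (rest)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_test = len(shuffled) // 4
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    holdout = shuffled[n_train + n_test:]
    return train, test, holdout


def bootstrap_replicate(rows, seed=0):
    """Assumed scheme: draw len(rows) rows with replacement (non-parametric bootstrap)."""
    rng = random.Random(seed)
    rows = list(rows)
    return [rng.choice(rows) for _ in range(len(rows))]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_pred):
    """Compute the performance measures listed in the abstract from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "fpr": _ratio(fp, fp + tn),
        "fnr": _ratio(fn, fn + tp),
        "overall": _ratio(tp + tn, len(pairs)),
    }


if __name__ == "__main__":
    participants = list(range(392))  # stand-in IDs, not real data
    train, test, holdout = split_50_25_25(participants, seed=1)
    print(len(train), len(test), len(holdout))
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

Comparing the metrics on the training, testing, and holdout sets is one simple way to look for overfitting: a model that scores well on the training set but markedly worse on the holdout set has likely memorized the training data.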

Article Details
  • Section: Research Articles