Development of machine learning models for predicting mutation likelihood in SARS-CoV-2 Mᴾᴿᴼ and the NSP10-NSP16 complex using molecular dynamics simulation data

dc.contributor.advisor: Bishop, Özlem Taştan
dc.contributor.author: Morgan, Emily Clare
dc.copyrightDate: 2025
dc.date.accessioned: 2026-03-18T13:53:22Z
dc.dateIssued: 2025-10-10
dc.description.abstract: The COVID-19 pandemic, caused by the SARS-CoV-2 virus, highlights the critical need for innovative methods to understand virus evolution and develop effective treatments. Mutations in SARS-CoV-2 proteins can increase virulence, prevent virus detection, and reduce the efficacy of treatments and vaccines. While SARS-CoV-2 mutation research generally focuses on the spike protein, some non-structural proteins (NSPs) warrant attention, such as NSP10, NSP16 and main protease (Mpro), also known as the 3C-like protease (3CLpro). These proteins are essential to the replication and immune capabilities of viruses, making them valuable targets for viral therapies. This study begins with an extension of the residue mutation predictions performed in Barozi et al. (2024), where the Python artificial neural network (ANN) and random forest (RF) models we had developed were fine-tuned and additional support vector machine (SVM) models were produced. All models were trained using the original Mpro dataset from Barozi et al. (2024), achieving moderate performance with an average accuracy of up to 76% on test subsets. In an attempt to improve the mutation prediction performance, an alternative dataset using raw Mpro MD trajectory coordinates was processed using convolutional neural networks (CNNs). However, the CNNs performed worse than the models trained on the processed Mpro trajectory data. Finally, the generalisability of the ANN, RF and SVM models when applied to other SARS-CoV-2 protein data was investigated using the NSP10-NSP16 complex. To obtain a comparable dataset, molecular dynamics (MD) simulations of the NSP10-NSP16 complex were conducted with and without the SAM ligand. Stable trajectories were analysed through dynamic residue network (DRN) analysis, root mean square fluctuation (RMSF), solvent accessible surface area (SASA), B-factor, and BLOcks SUbstitution Matrix (BLOSUM) metrics to create machine learning (ML) input datasets for NSP10 and NSP16. These datasets were tested using the Mpro-trained models, resulting in a decline in performance compared to the Mpro test sets, indicating limited transferability. This study identified critical ML-based residue mutation prediction limitations, including small datasets, class imbalances, and structural instabilities during molecular dynamics simulations. However, it established a foundation for further research by demonstrating the importance of feature selection and the potential of ML models to predict viral residue mutations.
dc.description.degree: Master of Science
dc.description.degree: Master's theses
dc.description.degreelevel: Master's
dc.digitalOrigin: born digital
dc.discipline: Bioinformatics
dc.extent: 1 online resource (179 pages)
dc.form: pdf
dc.form.carrier: online resource
dc.form.media: computer
dc.identifier.other: Bishop, Özlem Taştan (https://orcid.org/0000-0001-6861-7849) [Rhodes University]
dc.identifier.uri: https://researchrepository.ru.ac.za/handle/123456789/10059
dc.internetMediaType: application/pdf
dc.language.iso: eng
dc.language.iso: English
dc.note.thesis: Thesis (MSc) -- Faculty of Science, Biochemistry, Microbiology and Bioinformatics, 2025
dc.placeTerm.code: sa
dc.placeTerm.text: South Africa
dc.publisher: Rhodes University
dc.publisher: Faculty of Science, Biochemistry, Microbiology and Bioinformatics
dc.rights: Morgan, Emily Clare
dc.rights: Use of this resource is governed by the terms and conditions of the Creative Commons "Attribution-NonCommercial-ShareAlike" License (http://creativecommons.org/licenses/by-nc-sa/2.0/)
dc.subject: Uncatalogued
dc.title: Development of machine learning models for predicting mutation likelihood in SARS-CoV-2 Mᴾᴿᴼ and the NSP10-NSP16 complex using molecular dynamics simulation data
dc.type: Academic theses
dc.typeOfResource: text

Files

Original bundle

Name: MORGAN-MSc-TR25-284_-_Thesis.pdf
Size: 1.97 MB
Format: Adobe Portable Document Format