Failure to account for genetic diversity of antigens during vaccine design may lead to vaccine escape. To evaluate the vaccine escape potential of antigens used in vaccines currently in development or clinical testing, we surveyed the genetic diversity, measured population differentiation, and performed in silico prediction and analysis of T-cell epitopes of ten such Plasmodium falciparum pre-erythrocytic-stage antigens using whole-genome sequence data from 1010 field isolates. Of these, 699 were collected in Africa (Burkina Faso, Cameroon, Guinea, Kenya, Malawi, Mali, and Tanzania), 69 in South America (Brazil, Colombia, French Guiana, and Peru), 59 in Oceania (Papua New Guinea), and 183 in Asia (Cambodia, Myanmar, and Thailand).

In recent malaria research, the complexity of the disease has been explored using machine learning models applied to blood smear images, environmental data, and even RNA-Seq data. However, a machine learning model based on genetic variation data is still required to fully explore individual malaria risk. Furthermore, many Genome-Wide Association Studies (GWAS) have associated specific genetic markers, i.e., single nucleotide polymorphisms (SNPs), with malaria. Thus, the present study improves the current state-of-the-art genetic risk score by incorporating SNP mutation location on large-scale genetic variation data obtained from GWAS. Nevertheless, hyperparameter optimization becomes computationally expensive on large-scale datasets. Therefore, this study proposes a machine learning model that incorporates mutation location, together with a Genetic Algorithm (GA) to optimize hyperparameters. In addition, a deep learning model is proposed as an alternative approach to predicting individual malaria risk. The analysis is performed on the Malaria Genomic Epidemiology Network (MalariaGEN) dataset comprising 20,817 individuals from 11 populations. The findings of this study demonstrated that the proposed GA could overcome the curse of dimensionality and improve resource efficiency compared to commonly used methods. Moreover, incorporating the mutation location significantly improved the machine learning models in predicting individual malaria risk, with a Mean Absolute Error (MAE) score of 8.00E−06. The deep learning model obtained MAE scores close to those of the machine learning models, indicating that it is a viable alternative approach. Thus, this study provides relevant knowledge of genetic and technical considerations that can improve the state-of-the-art methods for predicting individual malaria risk.

Malaria risk prediction is currently limited to advanced statistical methods, such as time series and cluster analysis, applied to epidemiological data. Nevertheless, machine learning models have been explored to study the complexity of malaria through blood smear images and environmental data. However, to the best of our knowledge, no study has analysed the contribution of Single Nucleotide Polymorphisms (SNPs) to malaria using a machine learning model. More specifically, this study aims to quantify an individual's susceptibility to the development of malaria by using risk scores obtained from the cumulative effects of SNPs, known as weighted genetic risk scores (wGRS). We proposed an SNP-based feature extraction algorithm that incorporates the susceptibility information of an individual to malaria to generate the feature set. However, it can become computationally expensive for a machine learning model to learn from many SNPs. Therefore, we reduced the feature set by employing the Logistic Regression and Recursive Feature Elimination (LR-RFE) method to select SNPs that improve the efficacy of our model. Next, we calculated the wGRS of the selected feature set, which is used as the model's target variable. Moreover, as a comparison against the wGRS-only model, we calculated and evaluated the combination of wGRS with genotype frequency (wGRS + GF). Finally, Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and Ridge regression algorithms are utilized to establish the machine learning models for malaria risk prediction. Our proposed approach identified SNP rs334 as the most contributing feature, with an importance score of 6.224 compared to a baseline importance score of 1.1314. This is an important result, as prior studies have shown that rs334 is a major genetic risk factor for malaria. The analysis and comparison of the three machine learning models demonstrated that LightGBM achieves the highest model performance, with a Mean Absolute Error (MAE) score of 0.0373. Furthermore, based on wGRS + GF, all models performed significantly better than with wGRS alone, with LightGBM obtaining the best performance (0.0033 MAE score).
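As a rough illustration of the weighted genetic risk score (wGRS) and the MAE metric mentioned above, here is a minimal Python sketch. It assumes the common formulation in which each SNP contributes its risk-allele count (0, 1, or 2) multiplied by a weight, taken here as the natural log of a GWAS odds ratio. The abstracts do not spell out the exact weighting, so the formula, the function names, and the odds ratios below are all illustrative assumptions, not values or code from the studies.

```python
import math


def weighted_genetic_risk_score(allele_counts, odds_ratios):
    """Sum of risk-allele counts weighted by log odds ratios (assumed formula)."""
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    return sum(math.log(odds) * count
               for count, odds in zip(allele_counts, odds_ratios))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical individual: risk-allele counts at three SNPs.
allele_counts = [1, 0, 2]
# Hypothetical per-SNP odds ratios (not taken from any GWAS).
odds_ratios = [1.5, 0.8, 1.2]

score = weighted_genetic_risk_score(allele_counts, odds_ratios)
print(f"wGRS = {score:.4f}")

# Comparing made-up predictions against made-up target scores.
print(f"MAE  = {mean_absolute_error([0.2, 0.5, 0.9], [0.25, 0.45, 0.8]):.4f}")
```

In a setup like the one described, a score of this kind would serve as the regression target, and MAE would summarise how far a model's predicted scores fall from it.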
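The idea of using a Genetic Algorithm to optimize hyperparameters can be sketched generically as follows. The abstracts do not describe the GA's operators or the hyperparameters searched, so everything here is an assumption for illustration: tournament selection, uniform crossover, Gaussian mutation, elitism, the `genetic_search` function itself, and the toy objective that stands in for a validation MAE.

```python
import random


def genetic_search(objective, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Minimise `objective` over real-valued hyperparameters within `bounds`."""
    rng = random.Random(seed)
    names = list(bounds)

    def random_individual():
        return {n: rng.uniform(*bounds[n]) for n in names}

    def tournament(scored):
        # Pick the better of two randomly chosen individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    def crossover(p1, p2):
        # Uniform crossover: each gene is copied from either parent.
        return {n: (p1[n] if rng.random() < 0.5 else p2[n]) for n in names}

    def mutate(ind):
        # Gaussian mutation, clipped so values stay inside the bounds.
        child = dict(ind)
        for n in names:
            if rng.random() < mutation_rate:
                lo, hi = bounds[n]
                step = rng.gauss(0.0, 0.1 * (hi - lo))
                child[n] = min(hi, max(lo, child[n] + step))
        return child

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best score seen in each generation
    for _ in range(generations):
        scored = sorted(((objective(ind), ind) for ind in population),
                        key=lambda pair: pair[0])
        history.append(scored[0][0])
        elite = scored[0][1]  # elitism: the best individual survives unchanged
        children = [mutate(crossover(tournament(scored), tournament(scored)))
                    for _ in range(pop_size - 1)]
        population = [elite] + children

    best_score, best = min(((objective(ind), ind) for ind in population),
                           key=lambda pair: pair[0])
    return best, best_score, history


def toy_validation_error(params):
    # Hypothetical stand-in for "train a model, return validation MAE".
    return (params["learning_rate"] - 0.1) ** 2 + ((params["max_depth"] - 6) / 10) ** 2


# Hypothetical search space; real hyperparameter names depend on the model.
bounds = {"learning_rate": (0.001, 0.5), "max_depth": (2.0, 12.0)}
best, best_score, history = genetic_search(toy_validation_error, bounds)
print(best, best_score)
```

In practice the objective would train and validate a model for each candidate, which is where the computational cost arises. A GA evaluates only a population-sized batch per generation rather than an exhaustive grid.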