Abstract
Knowing the most likely clinical prognosis for a patient infected with SARS-CoV-2 could offer guidelines for tracking their clinical course, improving care, and allocating resources. Aiming to assess a patient's status quantitatively, we analyze existing clinical information using data-driven methods. Our goal is to extract the characteristics that distinguish COVID-19 patients who improve from those who die. In our approach, we select the relevant features using the Boruta algorithm, a wrapper method that relies on a classifier's importance measures to assess the relevance of each predictor. Using the selected features, we train machine learning classifiers, including Random Forests, Support Vector Machines, Extreme Gradient Boosting, and Neural Networks. We assess the performance of the classifiers using Precision-Recall and ROC analysis, establishing the ranges at which risk assessment permits effective decision-making. Our research shows that local regions present unique sets of essential features, that effective classifiers can be constructed from clinical data, and that an ensemble of classifiers yields the best-performing discriminant.
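To make the evaluation step more concrete, here is a minimal Python sketch of how an ensemble's scores might be formed and then summarized with ROC and Precision-Recall measures. It assumes unweighted soft voting (averaging each classifier's predicted probability of death), which the abstract does not specify; the function names `soft_vote`, `roc_auc`, and `average_precision` and all scores are hypothetical and are not taken from the authors' repository. ROC AUC is computed as the probability that a randomly chosen positive patient outranks a randomly chosen negative one, and average precision as the step-wise area under the Precision-Recall curve.

```python
from typing import List, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted soft voting: average each patient's score across classifiers.

    Assumption (hypothetical): every classifier outputs a probability-like score
    for the positive class (here, death) for the same patients in the same order.
    """
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all classifiers must score the same patients")
    return [sum(p[i] for p in prob_lists) / len(prob_lists) for i in range(n)]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the Precision-Recall curve: sum of (R_k - R_{k-1}) * P_k
    over the distinct score thresholds, from highest to lowest."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every patient tied at this threshold before computing P and R.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Hypothetical toy data: 1 = patient died, 0 = patient improved.
    y = [1, 1, 0, 0]
    model_a = [0.9, 0.3, 0.4, 0.2]  # made-up scores from one classifier
    model_b = [0.4, 0.8, 0.2, 0.6]  # made-up scores from another classifier
    ensemble = soft_vote([model_a, model_b])
    for name, s in [("A", model_a), ("B", model_b), ("ensemble", ensemble)]:
        print(name, round(roc_auc(y, s), 3), round(average_precision(y, s), 3))
```

On these made-up scores, each model alone misranks one death below a survivor (ROC AUC 0.75, average precision about 0.833), while their average ranks both deaths first (1.0 on both measures). This only illustrates how averaging can cancel errors that the individual classifiers do not share; it is not a reproduction of the study's results.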
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was partially funded by SIP-IPN 20201357 for Joaquin Salas.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Our data science study does not require IRB or equivalent oversight.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The code and data have been made publicly available on GitHub:
https://github.com/joaquinsalas/COVID19-DataDriven-Classifier