ISSN: 0976-4860
Samy Ghoneimy, Hossam M. Faheem, Noha Gamal
One advantage we have today in the fight against coronavirus (COVID-19) that was far less developed during the SARS outbreak of 2003 is big data analytics, together with major advances in machine intelligence and artificial intelligence technologies. Statistical surveillance in the United States of America lists pneumonia/influenza as the seventh leading cause of death. Severe influenza seasons can result in more than 60,000 excess deaths and more than 200,000 hospitalizations. During the influenza outbreak of 2018, the US recorded 55,000 deaths caused by pneumonia/influenza out of a total of 900,000 deaths (6.0%). Patients aged 65 years or older are at particular risk of death from viral pneumonia as well as from influenza not complicated by pneumonia. Deaths in these patients account for 89% of all pneumonia and/or influenza deaths. The healthcare industry needs researchers who are interested in applying machine learning to the surveillance, prediction and diagnosis of diseases. Many healthcare-related studies state that machine learning (ML) is a lifesaving technology that will transform healthcare services. This technology challenges the traditional reactive approach to healthcare. Its predictive, proactive, and preventive qualities make it a critically essential capability in every health system. To help predict pneumonia/influenza outbreaks, regression and classification techniques such as Ridge, Decision Tree Regression/Classification, Multiple Linear Regression, Logistic Regression Classification, K-Nearest Neighbor and Support Vector Machine Regression can be applied to predict forthcoming instances based on trustworthy training and validation datasets. Accurate predictions will help healthcare stakeholders and governments address medical and physical needs during the outbreak season.
In this paper we present a methodology for predicting the number of deaths due to influenza and pneumonia in US cities using different supervised machine learning algorithms. Each algorithm is implemented, fitted to the training dataset, validated on the validation dataset, and evaluated by means of the Root Mean Square Error (RMSE) and R² metrics. KNN fits the dataset best, giving 92.6% accuracy. The least well fitted algorithm is Logistic Regression, giving 51% accuracy. The remaining tested algorithms give accuracy levels from 80% to 92%. The evaluation metrics, R² and RMSE, are obtained both analytically and programmatically using a Python-based simulation, and the results from the two methods are well matched. These promising results encourage the idea of enhancing the performance of the predictor. A new predictor (KMR-Stack) is implemented by integrating the three best fitted algorithms (KNN, Multiple Linear Regression, Ridge) in one stack. KMR-Stack exceeds the accuracy of KNN, giving 94.9% accuracy. KMR-Stack also improves on other stacking models introduced in the literature: the base-model regressors are chosen dynamically. Hence, the stacked, integrated use of different machine learning algorithms shows higher prediction accuracy than the use of each individual algorithm. It therefore improves influenza surveillance and potentially contributes to developing a robust defence strategy, which will collectively enhance human health.
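As a minimal sketch of the two evaluation metrics named above, RMSE and R2 (the coefficient of determination) can be computed directly from their standard definitions. The function names and the sample values below are illustrative assumptions and are not taken from the paper's dataset or code.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted death counts (illustrative only).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

print("RMSE:", rmse(observed, predicted))
print("R2:", r2_score(observed, predicted))
```

A lower RMSE and an R2 closer to 1 indicate a closer fit. Computing both by hand in this way and comparing them with library output is one way to cross-check analytical and programmatic results.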