Abstract
Big data in healthcare contain a huge amount of tacit knowledge that brings great value to healthcare activities such as diagnosis, decision support, and treatment. However, effectively exploring and exploiting knowledge in such big data sources poses many challenges for both managers and technologists. In this study, we therefore propose a healthcare knowledge management system that ensures a systematic knowledge development process over the various data held in hospitals. It leverages big data technologies to capture, organize, transfer, and manage large volumes of medical knowledge, which cannot be handled with traditional data-processing technologies. In addition, machine-learning algorithms are used to derive higher-level knowledge in support of diagnosis and treatment. The orchestration of a knowledge system, big data, and artificial intelligence brings many advances to healthcare. Our research results show that the system supports the full knowledge development process, serving knowledge exploration and exploitation to improve decision-making in healthcare. The knowledge system is illustrated with the detection and classification of high blood pressure and brain hemorrhage from hospital medical records, in text and CT/MRI image formats, respectively. It can help doctors diagnose these diseases accurately and choose appropriate treatment regimens.
Introduction
Knowledge management systems (KMS) are crucial for capturing, organizing, transferring, and applying both explicit and tacit knowledge, particularly in knowledge-intensive industries like healthcare. The rapid growth of healthcare data from various sources, including hospital and national databases, presents challenges in managing and analyzing this information effectively. To address these challenges, a big-data-driven healthcare KMS is proposed, designed to support diagnostic decision-making in a parallel and distributed environment. The system includes four layers: data, information, knowledge, and application, and incorporates machine learning for hypertension and brain hemorrhage diagnosis. Decision trees are used for hypertension, while deep-learning techniques, specifically the Faster R-CNN Inception ResNet v2 model, are employed for brain hemorrhage detection, achieving a mean average precision of 79% in classifying four types of brain hemorrhage. The system processes data collected from hospitals and health-monitoring devices to provide real-time diagnostic support, helping improve outcomes in conditions like hypertension and stroke.
Blood pressure is the force of blood against vessel walls as it moves through them. It's measured in two numbers: systolic (higher, when the heart beats) and diastolic (lower, when the heart rests). High blood pressure is typically defined as 140/90 mmHg or higher in medical settings.
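The threshold above can be expressed as a one-line check. This is a minimal sketch: the function name is hypothetical, and treating either number reaching its limit as sufficient is an assumption, since the text only states the 140/90 mmHg cut-off.

```python
def is_high_blood_pressure(systolic: int, diastolic: int) -> bool:
    """Flag a reading at or above 140/90 mmHg.

    Assumption: either value reaching its limit is enough to flag the reading.
    """
    return systolic >= 140 or diastolic >= 90


print(is_high_blood_pressure(150, 85))  # True: systolic is at least 140
print(is_high_blood_pressure(120, 80))  # False: both values are below the limits
```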
Proposed Method
The problem addressed in this study is how to build a knowledge management system that supports medical diagnosis decisions in a big data environment. We propose a four-layer architecture for such a system:
- Data layer: The study uses historical datasets stored in HDFS and real-time data from wearables ingested via Apache Kafka, streaming to Spark for processing and storage.
- Information layer: Data is organized, transformed, and stored in HBase, with Apache Spark handling both batch and real-time processing to extract relevant medical diagnostic data.
- Knowledge layer: Machine learning models, including decision trees for text data and deep learning for medical images, are used in Spark to generate knowledge, with data split for training and testing.
- Application layer: Applications are built to input patient data, execute queries, and provide diagnostic and disease classification outputs in a distributed environment.
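The flow of data through the four layers can be sketched in plain Python. This is only an illustration under stated assumptions: in-memory lists stand in for HDFS and Kafka, a dictionary stands in for HBase, a fixed 140/90 rule stands in for a trained model, and all function names and sample readings are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Data layer: in-memory stand-ins for historical (HDFS) and streamed (Kafka) readings.
historical = [
    {"patient": "p1", "systolic": 150, "diastolic": 95},
    {"patient": "p2", "systolic": 118, "diastolic": 76},
]
stream = [
    {"patient": "p1", "systolic": 146, "diastolic": 92},
    {"patient": "p2", "systolic": 122, "diastolic": 80},
]


def information_layer(*sources):
    """Organize raw readings per patient (stand-in for the HBase store)."""
    store = defaultdict(list)
    for source in sources:
        for reading in source:
            store[reading["patient"]].append(
                (reading["systolic"], reading["diastolic"])
            )
    return store


def knowledge_layer(store):
    """Derive per-patient features that a trained model would consume."""
    return {
        patient: {
            "systolic": mean(s for s, _ in readings),
            "diastolic": mean(d for _, d in readings),
        }
        for patient, readings in store.items()
    }


def answer_query(features, patient):
    """Last layer: answer a diagnostic query (placeholder 140/90 rule, not a model)."""
    f = features[patient]
    return f["systolic"] >= 140 or f["diastolic"] >= 90


features = knowledge_layer(information_layer(historical, stream))
print({patient: answer_query(features, patient) for patient in features})
```

Keeping each layer as a separate function mirrors the separation of concerns in the architecture: the storage or model behind one layer can be swapped without touching the others.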
Results
Decision Tree for High Blood Pressure Detection and Classification
In addition to doctors' previous diagnosis results, Table 1 is used to label high blood pressure levels based on systolic and diastolic measurements. Decision trees are built on the extracted feature dataset, with maxDepth as a key parameter. The precision of the detection models at different tree depth levels ranges from 84% to 87%. Classification of the disease is performed after detection, and precision exceeds 92% for all three models.
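To illustrate how the depth limit shapes a decision tree, the sketch below builds a small Gini-based tree in plain Python. It is not the study's Spark implementation: the `max_depth` argument only plays the role of maxDepth, the toy readings are invented, and their labels follow a simple 140/90 rule rather than Table 1.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, max_depth):
    """Grow a tree recursively; max_depth=0 yields a single leaf."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, below, at_or_above = tree
        tree = below if row[f] < t else at_or_above
    return tree


def precision(y_true, y_pred, positive=1):
    """Share of predicted positives that are truly positive."""
    hits = [yt for yt, yp in zip(y_true, y_pred) if yp == positive]
    return sum(yt == positive for yt in hits) / len(hits) if hits else 0.0


# Toy (systolic, diastolic) readings; label 1 = high blood pressure (assumed rule).
train = [(118, 76), (125, 82), (132, 85), (138, 88), (142, 85),
         (150, 95), (135, 92), (160, 100), (128, 94), (145, 88)]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
test = [(120, 80), (139, 89), (141, 80), (130, 95), (170, 105)]
test_labels = [0, 0, 1, 1, 1]

for depth in (1, 2, 3):
    tree = build_tree(train, train_labels, depth)
    predictions = [predict(tree, row) for row in test]
    print(depth, precision(test_labels, predictions))
```

Running the loop for several depths is the toy analogue of comparing models at different depth levels; on real data, a deeper tree fits the training set more closely but may generalize worse.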
Application
The application is designed for users to enter the medical information needed for the diagnosis and classification of high blood pressure. For brain hemorrhage, the machine-learning algorithm used in the knowledge layer is deep learning, specifically the Faster R-CNN Inception ResNet v2 model. The average precisions (AP) of the proposed model for the four types of brain hemorrhage (EDH, SDH, SAH, and ICH) are 0.7, 0.59, 0.72, and 0.71, respectively. This gives an mAP of 0.68 for the detection and classification of the four classes of brain hemorrhage.
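The mAP figure is the unweighted mean of the per-class AP values reported above, which can be checked with a short calculation. The variable names are illustrative only.

```python
# Per-class average precision as reported for the four hemorrhage types.
ap_per_class = {"EDH": 0.70, "SDH": 0.59, "SAH": 0.72, "ICH": 0.71}

# Mean average precision: unweighted mean over the classes.
map_score = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP = {map_score:.2f}")  # prints "mAP = 0.68"
```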
Conclusions
In this study, we propose a healthcare knowledge management system designed to enhance medical diagnosis through the integration of big data and artificial intelligence. The system facilitates knowledge exploration via machine-learning algorithms and knowledge exploitation through the application of these models, all within a big data environment built on Spark, HDFS, HBase, and Kafka. We demonstrate the system's effectiveness in detecting and classifying hypertension and brain hemorrhage. Decision tree models, achieving a precision of over 84% in high blood pressure detection and over 92% in classification, are used in the system's knowledge layer. Feature importance is used to improve model accuracy and reduce training time by removing unnecessary data fields. For brain hemorrhage detection, a deep neural network based on Faster R-CNN Inception ResNet v2 is used, achieving an mAP of 0.68. Data were collected from hospitals in Vietnam's Mekong Delta and from health-monitoring devices, with plans to expand to larger databases such as the Premier Healthcare Database to broaden the system's applicability across different regions and diseases.