Institute of Automation and Control Processes FEB RAS, Vladivostok, Russia.
*Corresponding author: Elena A Shalfeeva
Institute of Automation and Control Processes FEB RAS,
Vladivostok, Russia.
Email: shalf@iacp.dvo.ru
Received: Jun 21, 2025
Accepted: Jul 24, 2025
Published Online: Jul 31, 2025
Journal: Journal of Artificial Intelligence & Robotics
Copyright: © Shalfeeva EA (2025). This Article is distributed under the terms of Creative Commons Attribution 4.0 International License.
Citation: Gribova VV, Shalfeeva EA, Petryaeva MV. Ontology for the formalization of knowledge for intelligent systems for medical diagnostic. J Artif Intell Robot. 2025; 2(2): 1025.
Ensuring the quality and relevance of knowledge in clinical decision support systems is an important research direction in decision science. Knowledge that helps specialists establish a diagnosis takes into account the forms of the disease, its etiology, pathogenesis, and course variants. It is known that few signs and symptoms are 100% significant and reliable, and reliable signs are often invasive and costly. More than 90% of the information that doctors work with is fuzzy. Using information from datasets or from medical specialists, programmers build static “portraits” of diseases from signs of different modalities, both frequently occurring and rarely manifested. Popular and available symptom checkers and diagnostic services rely on the degree to which patient information matches these disease profiles. The output of such services is almost always interesting but of little practical use. Under these conditions, two research questions arise: (1) what structure should be used to represent and store knowledge so that it contains enough of the clinical “details” accumulated and passed on by experienced experts; (2) how to interpret complex knowledge programmatically so as to provide every clinician who needs it with a qualified third opinion on critical decisions. Many years of work in medicine, systems analysis, and ontological engineering allowed a team of applied mathematicians to create a semantic diagnostic model that provides a sufficient format for knowledge about any disease, together with a single interpreter for such knowledge. This paper systematically fills two research gaps: (1) it presents a specific ontology for formalizing temporal knowledge with fuzzy scales; (2) it presents a new technology for producing diagnostic systems with constantly expanding and updated knowledge.
The ontology is implemented on the IACPaaS cloud platform and is currently actively used by specialists to create and interpret knowledge bases in various fields of medicine. The paper provides examples both of using the ontology to form a knowledge base and of using knowledge bases in clinical decision support systems. Some experimental comparisons are provided to illustrate the feasibility and practicability of the introduced method.
Keywords: Intelligent system; Disease diagnosis; Knowledge base; Ontology; Disease symptom complex; Decision support system.
For making diagnostic decisions, a doctor needs to take into account multiple factors: symptoms and syndromes of the disease, its nosological forms, etiology, pathogenesis, and clinical manifestations depending on the individual characteristics of patients. It is getting harder and harder to keep all this in mind and to make correct and timely decisions. New knowledge is constantly emerging, while the time available to a doctor for making an appropriate decision does not increase. As a result, the number of medical errors is growing, reaching up to 30% in some countries [1,2].
Software systems that assist a doctor are expected to provide help in difficult, real-world situations, to give the arguments behind their advice or decisions, and to keep their knowledge continuously developing in line with medical updates and discoveries.
At present, a huge range of intelligent systems for the diagnosis of diseases has been created. They are implemented using knowledge-based and machine learning methods; less often, a precedent-based approach is used. Despite the wide range of systems already developed, they are introduced into a doctor's daily practice very slowly because they lack some of the necessary properties.
Extensive knowledge is required to assist in solving real, rather than simplified, tasks. It is naive to hope that such knowledge can be obtained in only one way: by having experts describe everything they know, by processing hundreds of datasets with thousands of case histories, or by converting clinical guidelines into machine-readable form.
Each of these methods of knowledge acquisition continues to evolve. However, there are no software systems in which all of them are integrated and implemented simultaneously.
The ontological basis for knowledge formalization is the de facto standard, but we have not found a universal medical ontology that could become a single basis for multifaceted diagnostic knowledge.
Most ontologies are highly specialized, so diagnostic systems are implemented for a single disease or, at best, for a small group of diseases. However, the main problems of initial diagnosis involve searching for possible hypotheses among a wide range of diseases with similar symptoms (often belonging even to different branches of medicine), searching for comorbidity (several diseases in one patient), or identifying acute diseases that develop against the background of chronic ones, which are sometimes absent from the anamnesis.
Often the knowledge model is simplified: it does not allow describing the dynamics of disease development, including fuzzy information in the knowledge base, or making a diagnosis that takes into account the severity, form, and variant of the disease course.
Currently, the search for new technologies for creating systems with knowledge bases continues.
One solution to these problems is the creation of ontologies and ontological toolkits that cover a wide range of diseases and the spectrum of their representations (ways of formalizing knowledge). Thus, depending on the method of knowledge acquisition, a diagnosed disease can be described as:
− Several defining signs or syndromes,
− Variants of combinations of signs of manifestations depending on dozens of important factors (everything that the expert knows),
− Detailed descriptions of changes in the dynamics of each significant sign (everything that is written in textbooks),
− Dynamics of manifestations with different modality (identified by dataset or archive of case records).
All kinds of signs should be taken into account, regardless of the method of observation (externally observed, measured, or received as reactions to impacts) and of the scale of values.
Using such an ontology, it is possible to provide complete argumentation for advice or decisions made in accordance with the knowledge base. In addition, the applied decision support systems created with it would be compatible with each other across different fields of medicine.
The aim of this paper is to describe the diagnosis ontology. It is the result of many years of ontological engineering in medical diagnostics and of testing the ontology for knowledge representation in ophthalmology, urology, pulmonology, gastroenterology, etc.
It provides a single basis for forming all types of diagnostic knowledge. The ontology is also the core of a decision support toolkit for the differential diagnosis of diseases and syndromes that takes into account their development dynamics and the fuzziness of sign (symptom) manifestation.
A well-known way to organize the formation of knowledge bases when the developer is also a domain expert is the ontological approach [3-7]. According to the definitions [8,9], an ontology is an explicit specification, in some language, of the meaning of the terms that define a conceptualization of the domain. It consists of a terminology $\Sigma$ and a set of statements (axioms) $A_\Sigma$ in the language $\Sigma$ that represent those properties of ontology terms that are determined by the agreements of domain specialists. The representation model (of the knowledge and the ontology) plays an important role in involving experts in knowledge formation and subsequent knowledge maintenance.
Various knowledge representation models have been used over the course of medical diagnostic system development: logical, production, frame, semantic networks of multiple types (from arbitrary networks to networks with a root node or networks similar to decision trees), object-oriented concept models, etc. [5,7,10,11]. Arbitrary semantic networks are also used to visualize content written in OWL, which is intended for data storage or digital data processing [3,4]. Diagnostic process flowcharts are also often used for visualization, while the knowledge of this process is implemented as procedural knowledge in decision support systems; tables are used as a complementary way of specifying relationships between medical entities [3,11]. With the advent of Protégé, ontologies are more often represented as hierarchies of concept classes and properties of these concepts [4,6,7]. However, as noted in a number of papers [5-7,12-14] and as the long-term experience of the authors of this article has shown, it is the ontology-oriented semantic approach that is understandable to a user and provides a natural order of knowledge formation (in contrast to the object-oriented approach): top-down, from general concepts to specific ones [15].
There are known projects that involve experts in creating and maintaining medical knowledge, for example, those based on the SynOnt ontology, the Doknosis diagnostic assistant, and the PROforma & OpenClinical.net toolkit [16-18].
As a rule, the diagnostic ontologies underlying most known expert systems cover a single disease or a small group of diseases [8,19-21].
Even so, they are significantly simplified compared to real conceptualizations of the domain. Usually, they do not consider the development of pathological processes over time, the interaction of various types of causal relationships, or fuzzy scales for specifying different degrees of symptom influence on the diagnosis.
The Protégé editor does not provide structural concepts that are natural for medicine; nevertheless, owing to the tool's availability, many medical ontologies have been created in it (for example, SynOnt).
Ontologies created in the Protégé editor (Figure 1) represent the concepts and relationships of a domain as a terminology block (TBox), and their instances and statements about class objects as an assertion block (ABox). Neither the TBox nor the ABox expresses causal relationships or the knowledge inherent in the decision-making logic of specialists. What they can represent are important concepts in the form of classifications and binary relationships between concepts [22].
Formalization of specific knowledge (even in a simple ontology) is not very declarative (Figure 2), and experts are unlikely to understand queries to declared classes or entities, properties, and instances in the ontology written in a query language such as SQWRL (Figure 3). Ontologies often attach unformalized text descriptions to specification elements, and interpreting them requires special methods and algorithms.
Progress has been achieved in building ontologies with sufficient generality and interpretability (“an executable model of decision making and ways of providing assistance” [16]). This made it possible to process different knowledge bases, or different versions of a knowledge base, with one solver [17].
The Journal of Biomedical Informatics (JBI) contains many publications on Computer-Interpretable Guidelines (CIGs) research and methodologies, covering both knowledge base modeling languages and CIG execution engines and supporting tools. CIGs are machine-readable representations of Clinical Practice Guidelines that serve as the knowledge base in many knowledge-based systems for clinical decision support [24,25]. At the same time, the use of ontological knowledge bases (knowledge bases formed on the basis of an ontology) is considered a way to eliminate the significant disadvantage noted above. Accordingly, specialized ontologies [26-28] and diagnostic systems using the ontological approach [12-14] are being actively developed.
The downside of this generality is a simplified model that does not cover the real relationships and dependencies in medical diagnostics: well-known ontologies suitable for representing a wide range of diseases do not describe the complex interdependencies required for complex problems. They provide solutions at the “symptom checker” level [29-32].
Many ontologies allow the fuzziness of knowledge to be represented and taken into account in reasoning. However, it is difficult to find modern ontologies with temporality; this is due to the limitations of the languages and ontology tools used (OWL, Protégé) and, possibly, to the experience gained from the painstaking implementation of early works.
Nevertheless, existing ontologies with generality and executability have demonstrated a “step forward” in the ease of maintaining the medical systems built on them. In this case, systems can be maintained independently at two levels: the implementation level (for example, software) and the level of practice (knowledge bases), as with PROforma & OpenClinical.net. And when a term classifier is maintained independently and integrated with the ontologies, there is progress in integration with existing electronic medical records [17].
The capacity to give a clear explanation also depends on the knowledge ontology. Some tools (for example, PROforma & OpenClinical.net) can generate explanations: “If an argument expression is evaluated against current patient data or other information and found to be valid, it is recorded as a reason for the relevant option with an explanation and, if required, supporting evidence that justifies the argument [16]”. However, these explanations are not very detailed; rather, they state the obvious.
One of the first ontologies of medical diagnosis close to real-life medical concepts was the “Ontology of Medical Diagnosis of Acute Diseases” [33]. It models the clinical picture of diseases in the dynamics of the pathological process (over time), as well as the impact of therapeutic measures and other events on disease manifestations. Based on this ontology, knowledge bases were developed for diagnosing diseases of several body systems: respiratory organs (bronchial asthma, pneumonia), digestive organs (peptic ulcer, acute appendicitis, acute and chronic pancreatitis, acute and chronic colitis), organs of vision (conjunctivitis, keratitis, glaucoma), etc. [34,35], and a number of software services have been implemented. More than ten years of using the ontology to form knowledge bases for diagnosing a number of acute diseases revealed several of its limitations: the impossibility of specifying clinical manifestations of a disease for specific patient groups; the lack of support for the syndromic approach to diagnosis; limited fuzzy scales (only two modality values for signs, obligatoriness and possibility); and the impossibility of specifying an alternative diagnosis or of taking into account disease forms, course types, and severity in the final diagnosis.
Considering these factors, it is relevant to improve the ontology of diagnostic knowledge and the system of medical concepts (i.e., to develop an updated ontology) that allow formalizing disease diagnosis for any medical field. The goal of the improvement is to keep the ontology consistent with the modern diagnostic process and the current level of medical knowledge, and to enable differentiated consideration of patient characteristics during diagnosis.
Main characteristics of the ontology
The ontology of medical diagnosis should have the following main characteristics.
Formation of disease symptom complexes taking into account patient categories: Using reference ranges instead of fixed “norms” allows clinical manifestations, laboratory and instrumental data to be described most accurately. When examination results are evaluated for different groups of people, it becomes obvious that the “normal” values of an indicator for one group are not always normal for another. For example, during pregnancy many biochemical indicators of a woman's body change; therefore, special reference ranges are determined for this category. For children and adolescents, a high level of alkaline phosphatase is not only normal but also desirable, as the child needs to grow healthy bones. However, the same level in an adult indicates diseases such as osteoporosis or bone tumor metastases. Specifying reference values (taking into account gender, age, profession, pregnancy, sports, etc.) for most clinical manifestations, laboratory and instrumental signs increases the informational significance of any symptom complex [36].
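Category-dependent reference ranges can be sketched as a lookup keyed by patient category. This is a minimal illustration under assumptions: the names `ReferenceRange` and `within_reference`, the category keys, the fallback to the adult range, and the numeric bounds (arbitrary units, not clinical values) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        """True if the value lies within the inclusive reference range."""
        return self.low <= value <= self.high


# Hypothetical ranges in arbitrary units; they are NOT clinical reference values.
ALKALINE_PHOSPHATASE = {
    "adult": ReferenceRange(1.0, 2.0),
    "child": ReferenceRange(1.0, 5.0),
    "pregnancy": ReferenceRange(1.0, 3.0),
}


def within_reference(value: float, category: str, ranges: dict) -> bool:
    """Evaluate a value against the range for the patient's category.

    Falls back to the "adult" range when no category-specific range exists
    (an assumption made for this sketch).
    """
    reference = ranges.get(category, ranges["adult"])
    return reference.contains(value)


# The same value is within the range for a child but not for an adult.
print(within_reference(4.0, "child", ALKALINE_PHOSPHATASE))  # True
print(within_reference(4.0, "adult", ALKALINE_PHOSPHATASE))  # False
```

Keying ranges by category keeps the sign itself unchanged while its interpretation varies with the patient group.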
Uniform formalization of disease development: In modern systems, it should be possible to make a diagnosis and carry out differential diagnosis with other diseases at different periods of disease development. This requires analyzing the development of the disease before the visit to the doctor and taking into account that the patient may see the doctor at different times from the onset of the disease, from the first hours to the moment when the symptoms fade away. The formalization of the stages of chronic diseases and of the periods of development of acute diseases differs fundamentally only in duration and units of change. The principles of taking their duration into account when making decisions are the same. Therefore, it is advisable to provide a unified way of formalizing stages and periods of development.
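Because stages of chronic diseases and periods of acute ones differ only in duration and units, one record type can describe both. The sketch below normalizes period bounds to hours so that the same comparison code serves periods lasting hours and stages lasting months. The names `DevelopmentPeriod` and `HOURS_PER_UNIT`, the conversion of a month to 30 days and a year to 365 days, and the example bounds are assumptions for illustration.

```python
from dataclasses import dataclass

# Common base unit: hours. Month = 30 days and year = 365 days are assumptions.
HOURS_PER_UNIT = {"hour": 1, "day": 24, "week": 24 * 7, "month": 24 * 30, "year": 24 * 365}


@dataclass(frozen=True)
class DevelopmentPeriod:
    """A period of an acute disease or a stage of a chronic one (same structure)."""
    name: str
    min_duration: float
    max_duration: float
    unit: str  # unit of measurement of both duration bounds

    def bounds_in_hours(self) -> tuple:
        factor = HOURS_PER_UNIT[self.unit]
        return self.min_duration * factor, self.max_duration * factor

    def admits(self, elapsed: float, unit: str) -> bool:
        """True if an observed duration (in any known unit) fits the period's bounds."""
        low, high = self.bounds_in_hours()
        return low <= elapsed * HOURS_PER_UNIT[unit] <= high


# Hypothetical bounds, for illustration only.
peak = DevelopmentPeriod("peak period", 12, 72, "hour")
stage_1 = DevelopmentPeriod("stage I", 1, 6, "month")
print(peak.admits(2, "day"))       # True: 48 h lies in 12..72 h
print(stage_1.admits(10, "week"))  # True: 1680 h lies in 720..4320 h
```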
Extended range of modality values: The need to expand the concept of modality is associated with ranking disease symptoms by specificity. For a highly specific symptom, modality = “necessity” (constricting pain behind the sternum in angina pectoris); for a specific symptom, modality = “specificity” (dyspnea in chronic obstructive bronchitis or bronchial asthma); if the symptom is not very specific, modality = “possibility” (weakness, fever, headache in various diseases).
In addition to the modality specified by a qualitative value, a quantitative specifier can be defined. In this case, a modality can be assigned both to a separate sign and to diagnostic groups of signs.
For example, the diagnosis of rheumatoid arthritis (according to the American College of Rheumatology, 1987) is made in the presence of 4 out of 7 criteria:
1. Morning stiffness.
2. Arthritis of 3 or more joint zones (edema or effusion in at least three joints identified by a doctor).
3. Arthritis of the joints of the hands (edema of at least one joint area of the wrist, metacarpophalangeal, proximal interphalangeal joints).
4. Symmetrical arthritis (simultaneous lesion of the same joint zones on both parts of the body).
5. Rheumatoid nodules (subcutaneous nodules localized on protruding areas of the body or extensor surfaces).
6. Rheumatoid factor in serum.
7. Radiological changes (typical of rheumatoid arthritis in the hands and feet, including erosion or unmistakable bone decalcification, localized or most pronounced in the affected joints).
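A group-level modality with a quantitative specifier, such as “4 out of 7” above, can be sketched as a sign group with a minimum count. This is an illustrative sketch, not the ontology's actual encoding: `Modality`, `SignGroup`, the shortened sign names, and the choice of `NECESSITY` for the group are assumptions. It only counts criteria and ignores any additional conditions of the original classification.

```python
from dataclasses import dataclass
from enum import IntEnum


class Modality(IntEnum):
    """Qualitative modality values, ordered by specificity of the sign."""
    POSSIBILITY = 1   # not very specific (e.g. weakness, fever, headache)
    SPECIFICITY = 2   # specific
    NECESSITY = 3     # highly specific


@dataclass(frozen=True)
class SignGroup:
    """A diagnostic group of signs with a qualitative and a quantitative modality."""
    name: str
    signs: frozenset
    modality: Modality
    min_present: int  # quantitative specifier: at least this many signs must be present

    def satisfied_by(self, observed: set) -> bool:
        return len(self.signs & observed) >= self.min_present


RA_CRITERIA = SignGroup(
    name="rheumatoid arthritis criteria (ACR, 1987)",
    signs=frozenset({
        "morning stiffness", "arthritis of 3 or more joint zones",
        "arthritis of hand joints", "symmetrical arthritis",
        "rheumatoid nodules", "serum rheumatoid factor", "radiological changes",
    }),
    modality=Modality.NECESSITY,  # assumed for illustration
    min_present=4,
)

patient = {"morning stiffness", "symmetrical arthritis",
           "arthritis of hand joints", "serum rheumatoid factor"}
print(RA_CRITERIA.satisfied_by(patient))  # True
```

Using an ordered enum lets a solver compare modalities directly (for example, to prefer highly specific signs when ranking hypotheses).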
Taking into account different options for representing attribute values [37,38]: These can be descriptions of exact numeric ranges, the direction of change of numerical values (trend), or designated intervals of various deviations from the norm. This makes it possible to accommodate many traditions of representing rules and diagnostic methods and to keep them consistent with each other. Integrating qualitative values, composed of individual elementary values, allow concept values to be defined more compactly.
1. Formation of alternative symptom complexes. A number of diseases manifest differently in different groups of patients or under different external circumstances. Variants of manifestation differ in the number of symptoms, the moments at which characteristic signs appear, the sequence of alternating values, and the duration of their obvious manifestation. Alternative symptom complexes in the ontology make it possible to take into account the diversity of the course of the same disease in different patients, thereby capturing medical experience. They are also needed to formalize different approaches to identifying reliable signs of the disease, so that the most gentle, fast, or inexpensive one can be chosen in the diagnostic process.
2. Clarification of diagnoses, taking into account etiology, pathogenesis, course variant, severity, presence of complications, etc. for differential diagnosis of diseases and selection of appropriate treatment methods.
3. Determination of the necessary conditions for diseases and of provoking factors [33]. Some conditions predispose to the disease or contribute to its development, some prevent the onset and development of the disease, and some are modifying conditions that alter the action of the causative agent and give the disease specific features. The role of conditions in the occurrence of pathological processes and diseases varies: it can be decisive or insignificant, which in turn determines their diagnostic value.
4. Isolation of signs (and their complexes) for groups of diseases. Such grouping makes searching for and refuting hypotheses based on the knowledge base more efficient [23,39].
5. Formation of the clinical picture of the syndrome. The syndromic approach to diagnosis has become widely accepted in medical practice, complementing the classical approach focused on a set of specific and nonspecific symptoms (symptom complex). A syndrome is a combination (group) of symptoms united by a common pathogenesis. In modern conditions, the syndromic level of diagnosis has certain advantages, especially at the pre-hospital stage of the diagnostic process. It plays an important role in determining the nosological nature of the most important manifestations of the disease or its complications. The diagnosis can be established quickly with a minimal number of diagnostic studies, and at the same time it is sufficient to justify pathogenetic therapy or referral of the patient to the hospital for surgery (for example, in acute abdominal syndrome).
6. Taking into account the values of characteristics changed by the impact of events [40]. This element of cause-and-effect relationships makes it possible to take into account external influences on the patient's body at different stages of the disease. The most significant external conditions include environmental factors (polluted air and water, harmful industrial, agricultural, and domestic factors); quantitative and qualitative inadequacy of food; disruption of a regular routine and of the optimal balance between work and active rest; and social factors.
Diagnostics in clinical medicine is the field that studies the content, methods, and sequential steps of recognizing diseases by their symptoms (signs of a disease). Detection of a pathognomonic symptom, i.e. one that occurs only in this disease, is sufficient to establish a reliable diagnosis. However, the number of pathognomonic symptoms is limited; therefore, the diagnosis of most diseases is usually guided by symptom complexes [23]. The new ontology represents the cause-and-effect relationships between the elements of symptom complexes and the diseases used in medical diagnostics.
Each disease is represented by alternative symptom complexes and the necessary conditions for this disease, and may contain details of the corresponding diagnosis. The symptom complex of a disease consists of a complex of complaints and objective examination, a complex of laboratory and instrumental examination, and the necessary conditions for the symptom complex. The number of symptom complexes is determined by the type of disease and by the need to take into account premorbid biological, personal, and other factors. Symptom complexes are important because they combine diagnostically valuable signs within the “framework” of a certain condition.
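A minimal sketch of checking a disease with alternative symptom complexes against patient findings. It assumes, for simplicity, that a complex matches only when every listed sign has one of its admissible values; a real interpreter would also weigh modalities and dynamics. The class names and the sign names in the example are illustrative, not a clinical description; only the necessary condition (a tick bite) comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class SymptomComplex:
    complaints_and_exam: dict            # sign -> set of admissible values
    lab_and_instrumental: dict           # sign -> set of admissible values
    necessary_conditions: set = field(default_factory=set)

    def matches(self, findings: dict, conditions: set) -> bool:
        if not self.necessary_conditions <= conditions:
            return False
        required = {**self.complaints_and_exam, **self.lab_and_instrumental}
        return all(findings.get(sign) in values for sign, values in required.items())


@dataclass
class Disease:
    name: str
    complexes: list                      # alternative symptom complexes
    necessary_conditions: set = field(default_factory=set)

    def matching_complexes(self, findings: dict, conditions: set) -> list:
        """Indices of the alternative complexes consistent with the findings."""
        if not self.necessary_conditions <= conditions:
            return []
        return [i for i, c in enumerate(self.complexes) if c.matches(findings, conditions)]


encephalitis = Disease(
    name="tick-borne encephalitis",
    complexes=[
        SymptomComplex({"fever": {"present"}, "headache": {"present"}}, {}),
        SymptomComplex({"fever": {"present"}}, {"illustrative lab sign": {"positive"}}),
    ],
    necessary_conditions={"tick bite"},
)
findings = {"fever": "present", "headache": "present"}
print(encephalitis.matching_complexes(findings, {"tick bite"}))  # [0]
print(encephalitis.matching_complexes(findings, set()))          # []
```

Returning the indices of matching complexes, rather than a bare yes/no, keeps the information needed to explain which variant of the disease the findings support.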
A prerequisite for a disease is an event without which the disease would not have happened, for example: a tick bite for tick-borne encephalitis, a penetrating wound for traumatic keratitis.
The complexes of complaints and objective examination contain many signs whose changed values are symptoms of the disease, whereas the block of laboratory and instrumental examinations more often contains signs that reliably identify diseases and their forms.
Possible causes of the disease are represented by events or etiological factors that led or contributed to the development of the disease. They are described by modality and temporal characteristics, which in turn include the interval before the onset of the disease and the duration of the event.
Details of the diagnosis (form, variant, severity, stage, etc.) are given by a set of signs (or a symptom complex) that allows the main diagnosis to be refined accordingly. For a more detailed (in-depth) diagnosis and differential diagnosis, the disease must be described in terms of form, course, and severity, and the corresponding complications and functional disorders must be specified.
The knowledge model $A_\Sigma$ contains sentences of the form “a variant of the process of changing sign values, typical for a certain disease”, with the following structure:
(A-I) $\langle \text{diagnosis}_j,\ \{\text{symptom complex}_{kj}\},\ [\text{necessary condition}_j] \rangle$;
(A-II) $\langle \text{symptom complex}_k,\ \{\text{feature}_j,\ \text{range}_{kj} \text{ of values of feature}_j\} \rangle$;
(A-IIa) $\langle \text{symptom complex}_k,\ \{\text{attribute}_j,\ \{\text{period}_i,\ \text{duration of period}_i,\ \text{range}_{ij} \text{ of values of attribute}_j \text{ in period}_i\}\} \rangle$.
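The sentence schemas (A-I), (A-II), and (A-IIa) can be mirrored as nested records. This is a sketch under assumptions: the class names, the use of string value sets, and the representation of period duration as a (lower, upper) pair are hypothetical choices, not the IACPaaS storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class PeriodValues:
    """One period of (A-IIa): its name, duration bounds and value range."""
    period: str
    duration: Tuple[float, float]   # (lower, upper) bounds, e.g. in days
    values: FrozenSet[str]


@dataclass
class AttributeDynamics:
    """(A-IIa): an attribute together with its value ranges over periods."""
    attribute: str
    periods: List[PeriodValues]


@dataclass
class SymptomComplex:
    """(A-II) static features and (A-IIa) dynamic attributes of one symptom complex."""
    static: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    dynamic: List[AttributeDynamics] = field(default_factory=list)


@dataclass
class DiagnosisSentence:
    """(A-I): a diagnosis, its symptom complexes and an optional necessary condition."""
    diagnosis: str
    complexes: List[SymptomComplex]
    necessary_condition: Optional[str] = None

    def attributes(self) -> set:
        """All features and attributes mentioned in any of the complexes."""
        names = set()
        for complex_ in self.complexes:
            names.update(complex_.static)
            names.update(d.attribute for d in complex_.dynamic)
        return names


sentence = DiagnosisSentence(
    diagnosis="disease X",  # hypothetical
    complexes=[SymptomComplex(
        static={"thirst": frozenset({"present"})},
        dynamic=[AttributeDynamics("body temperature", [
            PeriodValues("increase", (1, 2), frozenset({"subfebrile"})),
            PeriodValues("peak", (2, 5), frozenset({"febrile"})),
        ])],
    )],
)
print(sorted(sentence.attributes()))  # ['body temperature', 'thirst']
```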
This makes it possible to describe diseases while addressing one of the main difficulties of the diagnostic process in medicine: the need to recognize a continuously developing process (the disease). Each disease develops over a shorter or longer time. By rate of development, diseases are classified as most acute (up to 4 days), acute (about 5-14 days), subacute (15-40 days), and chronic (lasting months and years). In the development of a disease, the following stages can almost always be distinguished:
• The onset of the disease (sometimes called the latent period);
• The stage of the actual disease;
• The outcome of the disease.
Diagnostics, as a rule, is carried out at the stage of “the actual disease”. At this stage, the following periods of development are distinguished: 1) a period of increasing manifestations of the disease; 2) the peak period (maximum severity of symptoms); 3) the period of extinction of the manifestations of the disease (the gradual disappearance of clinical symptoms).
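The duration bands given above translate directly into a classification function. A minimal sketch: the thresholds come from the text, while the function name and the handling of durations that fall between the textual bands (for example, 4.5 days goes to the next band, and anything beyond 40 days counts as chronic) are assumptions.

```python
def course_by_duration(days: float) -> str:
    """Classify a disease by its rate of development, using the bands from the text.

    Assumption: values between the textual bands are assigned to the next band.
    """
    if days <= 4:
        return "most acute"
    if days <= 14:
        return "acute"
    if days <= 40:
        return "subacute"
    return "chronic"


for d in (2, 10, 30, 180):
    print(d, course_by_duration(d))
```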
The ontology of medical diagnostics sets the structure for describing groups of diseases, diseases, and syndromes.
The knowledge model $A_\Sigma$ contains sentences of the form
(A-III) $\langle \text{diagnosis group}_m,\ \{(\text{diagnosis}_{km} \mid \text{diagnosis group}_k)\} \rangle$.
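Because an (A-III) group may contain both diagnoses and nested groups, a solver needs to expand a group into its diagnoses, for example to refute all of them at once when a group-level sign is absent. A minimal sketch, assuming a hypothetical dictionary encoding in which any member that is itself a key is a nested group; the group "pancreatitis" is illustrative, the diagnoses come from examples earlier in the text.

```python
from typing import Dict, List, Optional, Set

# Hypothetical (A-III) hierarchy: a group lists diagnoses and/or nested groups.
GROUPS: Dict[str, List[str]] = {
    "diseases of the digestive organs": ["pancreatitis", "peptic ulcer", "acute appendicitis"],
    "pancreatitis": ["acute pancreatitis", "chronic pancreatitis"],
}


def diagnoses_in(group: str, groups: Dict[str, List[str]],
                 _visited: Optional[Set[str]] = None) -> List[str]:
    """Collect, depth-first, every diagnosis reachable from a group (cycle-safe)."""
    visited = set() if _visited is None else _visited
    if group in visited:
        return []
    visited.add(group)
    found: List[str] = []
    for member in groups.get(group, []):
        if member in groups:          # nested diagnosis group
            found.extend(diagnoses_in(member, groups, visited))
        else:                         # individual diagnosis
            found.append(member)
    return found


# Refuting a whole group removes all its diagnoses from the hypotheses at once.
hypotheses = {"acute pancreatitis", "chronic pancreatitis", "peptic ulcer", "pneumonia"}
hypotheses -= set(diagnoses_in("pancreatitis", GROUPS))
print(sorted(hypotheses))  # ['peptic ulcer', 'pneumonia']
```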
The description of syndromes is organized into syndromes and groups of syndromes. Each syndrome contains a clinical picture consisting of a set of signs.
The knowledge model can specify the special conditions necessary for a certain disease to occur:
(A-IV) $\langle \text{diagnosis}_k,\ \text{event}_u\ [,\ \text{time interval}_{ku}] \rangle$, or
(A-IVa) $\langle \text{diagnosis}_k,\ \text{event}_u,\ \text{time interval}_{ku}\ [,\ \text{characteristic}_i \text{ of the event} \mid \text{range}_i \text{ of event values}] \rangle$, if the event is characterized not only by the moment of its occurrence but also by some qualitative or quantitative characteristic;
(A-V) $\langle \text{diagnosis}_k,\ \text{factor}_j\ [,\ \text{range of values of factor}_j] \rangle$.
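An (A-IV) sentence can be checked against a patient's event history by comparing dates. This sketch assumes that the time interval means “the event precedes disease onset by at most this long”; that reading, the names `NecessaryEvent` and `condition_met`, and the 30-day value (a placeholder, not a clinical incubation period) are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class NecessaryEvent:
    """(A-IV): <diagnosis_k, event_u [, time interval_ku]>."""
    diagnosis: str
    event: str
    max_interval: Optional[timedelta] = None  # assumed: event precedes onset by at most this


def condition_met(rule: NecessaryEvent, events: Dict[str, date], onset: date) -> bool:
    """True if the required event occurred before onset and within the interval."""
    when = events.get(rule.event)
    if when is None or when > onset:
        return False
    return rule.max_interval is None or onset - when <= rule.max_interval


tbe = NecessaryEvent("tick-borne encephalitis", "tick bite", timedelta(days=30))
onset = date(2025, 6, 20)
print(condition_met(tbe, {"tick bite": date(2025, 6, 10)}, onset))  # True
print(condition_met(tbe, {}, onset))                                 # False
```

When the optional interval is omitted, only the order of event and onset is checked, which corresponds to the bracketed, optional part of (A-IV).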
To describe how the development of the disease changes under external influences on the body, sentences of the type “a variant of the reaction of the functioning process to the effect of an event” are needed, with the following structure:
(A-VI) $\langle \text{symptom complex}_k,\ \text{sign}_j,\ [\text{range of values}_i,]\ \text{event}_u,\ [\text{time interval}_{kju},]\ \text{range of changed values}_i \rangle$, if the phasing of changes is not important, or
(A-VIa) $\langle \text{symptom complex},\ \text{sign}_j,\ \text{event}_u,\ [\text{time interval}_{kju},]\ \{\text{period}_i,\ \text{range of values}_{uk} \text{ of sign}_{ji}\} \rangle$.
An event can be a composite event, a collection of events, or a collection of events and factors. Then sentences of the form “a variant of the reaction of the process to the effect of a combination of factors” are needed:
(A-VII) $\langle \text{symptom complex}_k,\ \text{attribute}_j,\ [\text{range of values}_i,]\ \text{event}_u,\ \text{quantitative characteristic of event}_u,\ [\text{time interval}_{kju},]\ \{\text{factor}_j,\ [\text{range of values of factor}_j,]\},\ \text{attribute}_j \text{ of interest},\ \text{range of changed values of attribute}_j \rangle$.
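An (A-VI) sentence tells the interpreter which value range to expect once an event has occurred. The sketch below covers only the single-event case; (A-VII) would add factors and quantitative event characteristics. It assumes the time interval means “the changed values are expected only if the event happened no more than N hours before the examination”. That reading, the class and function names, and the analgesic example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class EventEffect:
    """(A-VI): a sign's value range before an event and the range it changes to."""
    sign: str
    values_before: FrozenSet[str]
    event: str
    max_hours_after_event: Optional[float]  # assumed meaning of the time interval
    changed_values: FrozenSet[str]


def expected_values(effect: EventEffect, hours_since_events: Dict[str, float]) -> FrozenSet[str]:
    """Values to expect at examination, given how long ago each event happened."""
    elapsed = hours_since_events.get(effect.event)
    if elapsed is None:
        return effect.values_before
    if effect.max_hours_after_event is not None and elapsed > effect.max_hours_after_event:
        return effect.values_before
    return effect.changed_values


# Illustrative only: pain intensity expected to drop after a self-administered analgesic.
effect = EventEffect("abdominal pain intensity", frozenset({"strong", "sharp"}),
                     "analgesic taken", 6.0, frozenset({"weak", "moderate"}))
print(sorted(expected_values(effect, {"analgesic taken": 2.0})))  # ['moderate', 'weak']
print(sorted(expected_values(effect, {})))                         # ['sharp', 'strong']
```

Without such a rule, a patient who reduced the pain with medication before the visit would appear not to match the disease at all.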
The ontology allows diseases to be described with regard to etiology, pathogenesis, course, stage, etc., by forming additional symptom complexes for more detailed (in-depth) diagnosis or differential diagnosis of the disease. As the diagnosis becomes more detailed, additional or specific symptoms are described for each unit of the pathological process.
A symptom of a disease can be simple or composite; its values are presented according to the periods of the dynamics of the symptom or of the disease as a whole. Each period of dynamics is characterized by the upper and lower bounds of the period duration and the unit of measurement of the bounds. A simple symptom has a modality (of its appearance in the clinical picture), and in each period a sign (symptom) can have more than one variant of values. Each variant of dynamics specifies a set of possible values of the sign (symptom), the necessary conditions for the presence of the sign, and a description of how the value of this sign changes under the influence of certain events.
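With consecutive periods whose durations are known only within bounds, an observation at time t after onset may fall into several candidate periods. The sketch below checks whether an observed value fits any value variant of any candidate period. It assumes that periods follow one another without gaps and that all bounds share one unit; the function name and the temperature example are hypothetical.

```python
from typing import List, Set, Tuple

# One period: (lower duration bound, upper duration bound, alternative value variants).
Period = Tuple[float, float, List[Set[str]]]


def consistent_with_dynamics(periods: List[Period], time_since_onset: float,
                             observed_value: str) -> bool:
    """Check an observed sign value against consecutive periods of its dynamics.

    A period can contain the observation time if time_since_onset lies between the
    earliest possible start of the period (sum of preceding lower bounds) and its
    latest possible end (sum of upper bounds up to and including it).
    """
    earliest_start = latest_start = 0.0
    for lower, upper, variants in periods:
        if earliest_start <= time_since_onset <= latest_start + upper:
            if any(observed_value in variant for variant in variants):
                return True
        earliest_start += lower
        latest_start += upper
    return False


# Hypothetical dynamics of body temperature, bounds in days.
temperature = [
    (1, 2, [{"normal", "subfebrile"}]),
    (2, 5, [{"febrile"}, {"hectic"}]),
    (3, 7, [{"subfebrile", "normal"}]),
]
print(consistent_with_dynamics(temperature, 3, "febrile"))    # True
print(consistent_with_dynamics(temperature, 0.5, "febrile"))  # False
```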
Examples of simple signs are patient complaints such as thirst, heartburn, nausea, and loss of appetite, with the values “present” and “absent”. A composite sign is described by sets of its characteristics, which vary over periods, together with the modality of its appearance in the clinical picture. Each sign can have many characteristics, and each characteristic can have many value variants.
An example of a composite symptom is a patient complaint such as abdominal pain. This sign has several characteristics: character, localization, intensity, severity, irradiation, frequency, and cause of intensification. Each characteristic can have one or more values. For example, the character of abdominal pain can be sharp, dull, stabbing, cutting, pulsating, pressing, or pulling. Values of localization: epigastrium, right hypochondrium, left hypochondrium, right iliac region, left iliac region, mesogastrium, etc. Values of intensity: weak, moderate, strong, sharp, sharpest. Each variant of characteristic values contains a set of possible values of the characteristic, the necessary conditions for the presence of the characteristic, and a description of how its value changes under the influence of certain events. The term “value changed by the effect of an event” makes it possible to describe a change of a symptom over time if, between the onset of the disease and the visit to the doctor, the patient took some measures, or if sign values (complaints, objective state) change under the influence of other events or of manipulations performed by a doctor.
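A composite sign can be compared characteristic by characteristic, so that an explanation can state which characteristics matched, which contradicted the knowledge, and which were not yet examined. This is a sketch under assumptions: the function name and the expected values for “abdominal pain” are a hypothetical knowledge-base fragment, with value names taken from the lists above.

```python
from typing import Dict, Optional, Set


def match_composite_sign(expected: Dict[str, Set[str]],
                         observed: Dict[str, str]) -> Dict[str, Optional[bool]]:
    """Compare each characteristic of a composite sign separately.

    Returns True/False per characteristic, or None when the characteristic
    was not observed, so that an explanation can list what is missing.
    """
    result: Dict[str, Optional[bool]] = {}
    for characteristic, admissible in expected.items():
        value = observed.get(characteristic)
        result[characteristic] = None if value is None else value in admissible
    return result


# Hypothetical knowledge-base fragment for the composite sign "abdominal pain".
expected_pain = {
    "character": {"dull", "pressing"},
    "localization": {"epigastrium"},
    "intensity": {"moderate", "strong"},
}
observed_pain = {"character": "dull", "localization": "right iliac region"}
print(match_composite_sign(expected_pain, observed_pain))
# {'character': True, 'localization': False, 'intensity': None}
```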
The specified ontology is hosted on the IACPaaS cloud platform (in the format of a hierarchical semantic network) and is used to formalize knowledge [41]. Figure 4 shows a fragment of this ontology.
In addition to the types of relations between concepts, the ontology includes conventions on the rules for matching facts from Reality with Knowledge during reasoning and decision-making. Examples of such conventions are:
− Statements about the correspondence of facts from Situations of Reality to the manifestations or necessary Conditions that are important according to the Knowledge of the internal processes and subsequent periods of the course of the disease in question;
− Statements about the correspondence of the confirmed manifestations (conditions) of the diseases under consideration, i.e., the facts from Reality, to the observations themselves.
Explicitly defined conventions on the rules for matching facts to knowledge become the basis for building software solvers for decision support services.
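As a rough illustration of such a matching convention, the sketch below checks whether an observed value of a sign is consistent with Knowledge: some value variant must admit the value, and all necessary conditions of that variant must be among the known facts. This is a simplified, hypothetical rule (it ignores time and modality) and not the solver's actual logic; `Variant`, `fact_matches`, and "condition A" are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A variant of values of a sign in Knowledge (hypothetical minimal form)."""
    possible_values: frozenset[str]
    necessary_conditions: frozenset[str] = frozenset()


def fact_matches(value: str, variants: list[Variant], known_facts: set[str]) -> bool:
    """True if some variant admits the observed value and all of that
    variant's necessary conditions are among the facts known about the case."""
    return any(
        value in v.possible_values and v.necessary_conditions <= known_facts
        for v in variants
    )


character_of_pain = [
    Variant(frozenset({"sharp", "dull"})),
    # "condition A" is a placeholder for a necessary condition in a knowledge base
    Variant(frozenset({"pressing"}), frozenset({"condition A"})),
]
```

For example, `fact_matches("pressing", character_of_pain, set())` is `False`, but it becomes `True` once "condition A" is among the known facts.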
The ontology and the knowledge formed according to it are represented by a hierarchical semantic network that uses the terminology accepted in medicine. Knowledge can be formed according to the ontology by a domain expert, by inductive methods based on training samples, or by automatic text recognition.
Using the ontology to form medical knowledge bases
Any knowledge-based intelligent medical system relies on a base of formalized knowledge whose quality and volume directly affect the effectiveness of the system. For knowledge bases to be presented uniformly and interpreted unambiguously by development participants and users (including those from different institutions), a common set of all terms used in practice is required. The terminology should be generally accepted and understandable to medical professionals, i.e., it should be the result of an ontological agreement in the field of medicine. This “Base of Medical Terminology and Observations” was formed by experts; it is a universal resource used to form diagnostic (and other) knowledge bases for various fields (profiles) of medicine. The common set of terms contains the names of observations, all their possible values, and the common synonyms used when filling out medical records. Since medical terminology is characterized by a “mobile lexical composition” and constant development, the base makes it possible to store national and international terminological synonyms and doublets (shortness of breath - dyspnoea, skin - dermis), partially coinciding synonyms, and eponyms, in which the name of a method or symptom includes the name of its discoverer (symptom of peritoneal irritation - Shchetkin-Blumberg symptom; slip symptom - “shirt” symptom - Voskresensky symptom).
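One way a common terminology base can serve several knowledge bases is by mapping every recorded variant of a term to a single canonical name. The sketch below assumes a flat synonym dictionary; the dictionary contents (taken from the examples above), the choice of canonical term, and the `normalize_term` helper are hypothetical, and the real base is a semantic network rather than a lookup table.

```python
# Hypothetical excerpt of a terminology base: variant -> canonical term.
SYNONYMS: dict[str, str] = {
    "dyspnoea": "shortness of breath",
    "dermis": "skin",
    "shchetkin-blumberg symptom": "symptom of peritoneal irritation",
    "shirt symptom": "slip symptom",
    "voskresensky symptom": "slip symptom",
}


def normalize_term(term: str, synonyms: dict[str, str]) -> str:
    """Map a term from a medical record to its canonical name.

    Case and repeated whitespace are ignored; unknown terms are returned
    in normalized form unchanged."""
    key = " ".join(term.lower().split())
    return synonyms.get(key, key)
```

With this table, both "Voskresensky symptom" and "shirt symptom" resolve to "slip symptom", so knowledge written with either term refers to the same observation.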
On the basis of the described ontology and the “Base of Medical Terminology and Observations”, knowledge bases of diseases and syndromes from various nosological groups were formed for decision support systems in medicine. They are combined into a common information resource, the “Knowledge base on diagnostics of diseases and syndromes”, which is located on the IACPaaS platform [Gribova, 2017]. The grouping of diseases follows the structure of ICD-10. The knowledge base includes descriptions of diseases from sections such as “Diseases of the circulatory system”, “Diseases of the digestive system”, “Diseases of the genitourinary system”, “Infectious and parasitic diseases”, etc. Groups of diseases unite conceptually related diseases. For example, the group “Diseases of the digestive system” consists of the groups “Diseases of the esophagus”, “Diseases of the stomach and duodenum”, “Diseases of the intestine”, “Diseases of the liver”, “Diseases of the gallbladder, biliary tract”, etc. Each group unites diseases with a common complex of diagnostic signs, which includes the signs characteristic of this group. Thus, for the group “Diseases of the digestive system”, the complex of diagnostic signs includes a single symptom, “Pain in the abdomen”. For the group “Diseases of the gallbladder and biliary tract”, the diagnostic complex includes 5 signs: “Abdominal pain”, “Nausea”, “Itching”, “Increased body temperature”, “Tension of the muscles of the anterior abdominal wall”. For the group “Cholecystitis”, more than 20 signs are diagnostic: all of the above plus “Flatulence”, “Vomiting”, “Belching”, “Nausea”, “Bitterness in the mouth”, “Leukocytes”, “ESR”, “Gallbladder wall thickness on ultrasound”, etc. A fragment of the information resource “Knowledge Base on Diagnosis of Diseases” is shown in Figure 5.
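The nesting of groups described above means that a subgroup's diagnostic complex contains its parent's signs plus its own. The sketch below, a hypothetical illustration rather than the platform's implementation, walks up a parent map and unions the signs. It uses only a partial sign list for “Cholecystitis” (the text lists more than 20), and it treats “Pain in the abdomen” and “Abdominal pain” as one normalized term.

```python
from typing import Optional

# group -> (parent group, signs added at this level); partial data from the text
GROUPS: dict[str, tuple[Optional[str], set[str]]] = {
    "Diseases of the digestive system": (None, {"Abdominal pain"}),
    "Diseases of the gallbladder and biliary tract": (
        "Diseases of the digestive system",
        {"Nausea", "Itching", "Increased body temperature",
         "Tension of the muscles of the anterior abdominal wall"},
    ),
    "Cholecystitis": (
        "Diseases of the gallbladder and biliary tract",
        {"Flatulence", "Vomiting", "Belching", "Bitterness in the mouth",
         "Leukocytes", "ESR", "Gallbladder wall thickness on ultrasound"},
    ),
}


def diagnostic_complex(group: str,
                       groups: dict[str, tuple[Optional[str], set[str]]]) -> set[str]:
    """Collect the diagnostic signs of a group together with those of all its ancestors."""
    signs: set[str] = set()
    current: Optional[str] = group
    while current is not None:
        parent, own = groups[current]
        signs |= own
        current = parent
    return signs
```

Here the gallbladder group yields exactly the 5 signs listed in the text, and “Cholecystitis” yields those 5 plus its own 7 from this partial list.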
The description of a disease includes the disease code according to ICD-10, the cause of the disease, the necessary condition or event that led to its onset, a number of different symptom complexes, and the details of the diagnosis. The need for several symptom complexes is dictated by the clinical features of the course of the disease in different categories of patients: children, elderly and senile people, pregnant women, athletes, etc. The disease can also manifest itself in an erased form (with unexpressed or mild symptoms) or an abortive form (with a shortened course, rapid disappearance of all painful manifestations, and sudden recovery). For example, acute appendicitis runs a more violent course in children than in adults: the clinical picture is dominated by symptoms such as high temperature (39-40 °C), diarrhea, and repeated vomiting, and the abdominal pain is cramping and has no clear dynamics. The pulse often does not correspond to the temperature, the symptoms of intoxication are pronounced, and the tension of the abdominal wall muscles may be slight. In older people, on the contrary, the clinical picture is blurred due to the decreased reactivity of the body: the temperature is more often normal or subfebrile, the symptom of peritoneal irritation is often absent, and the pain syndrome is insignificant. In pregnant women in the second half of pregnancy, the pain threshold and the body’s protective reaction to the inflammatory process decrease; therefore, the tension of the anterior abdominal wall muscles and the symptoms of peritoneal irritation (Shchetkin-Blumberg, Voskresensky) may be negative, whereas the Obraztsov and Bartomier-Michelson symptoms are well expressed. There is also a physiological increase in temperature to subfebrile values due to the increased level of progesterone in the body. Figure 6 shows a fragment of the knowledge base description of a disease group.
The clinical picture of many diseases can be very diverse, and doctors have recently noted an increasing incidence of atypical forms of diseases. For example, myocardial infarction has 9 variants of course that differ markedly from the typical form in their mechanism of development and clinical manifestations. Pain may occur not behind the breastbone but in the abdomen, in the armpit, or elsewhere, or it may be absent altogether. Atypical symptoms such as choking, vomiting, nausea, flashing “flies” before the eyes, edema, etc. may occur. This is often the cause of medical errors or late diagnosis. When describing such diseases, we formed symptom complexes of atypical forms that have a different set of clinical signs and symptoms. Figure 7 shows an example of the description of the disease “Acute myocardial infarction” with various symptom complexes.
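Keeping several symptom complexes per disease (for patient categories, or for atypical forms) implies a step that selects the complexes applicable to a given patient and compares them with the observations. The sketch below is a hypothetical illustration: the coverage-fraction score is an assumed rule, not the authors' solver, and the sign names abridge the acute appendicitis examples above. An atypical-form complex could be modeled the same way with an empty category set, meaning it applies to every patient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymptomComplex:
    label: str
    patient_categories: frozenset[str]  # empty set: applies to every patient
    expected_signs: frozenset[str]


def coverage(c: SymptomComplex, observed: set[str]) -> float:
    """Share of the complex's expected signs that were observed."""
    return len(c.expected_signs & observed) / max(len(c.expected_signs), 1)


def best_complex(complexes: list[SymptomComplex], category: str,
                 observed: set[str]) -> Optional[SymptomComplex]:
    """Among complexes applicable to the patient's category, return the best covered one."""
    applicable = [c for c in complexes
                  if not c.patient_categories or category in c.patient_categories]
    return max(applicable, key=lambda c: coverage(c, observed), default=None)


# Acute appendicitis, abridged from the text; sign names are simplified.
APPENDICITIS = [
    SymptomComplex("children", frozenset({"children"}),
                   frozenset({"high temperature", "diarrhea", "repeated vomiting",
                              "cramping abdominal pain"})),
    SymptomComplex("elderly", frozenset({"elderly"}),
                   frozenset({"normal or subfebrile temperature", "mild abdominal pain"})),
    SymptomComplex("pregnancy, second half", frozenset({"pregnant"}),
                   frozenset({"Obraztsov symptom", "Bartomier-Michelson symptom",
                              "subfebrile temperature"})),
]
```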
Each symptom complex includes a complex of complaints and objective examination data and a complex of laboratory and instrumental studies, which in turn include descriptions of pathognomonic, specific, and non-specific signs of the disease with their modality. The ontology makes it possible to describe all variants of values of simple and composite signs with characteristics in all periods of the course of the disease. For example, the course of the disease “gastric ulcer with perforation” has three periods of dynamics: the period of pain shock (2-6 hours), the period of imaginary well-being (6-12 hours), and the period of development of peritonitis (6-48 hours). The symptom of abdominal pain changes in character, localization, irradiation, and severity. In the first period, the pain is acute, burning, “dagger-like”, and localized in the upper abdomen, more to the right of the midline. During the period of imaginary well-being, the abdominal pain decreases and may disappear completely in some patients. During the period of purulent peritonitis, the pain becomes dull, pressing, and bursting; it spreads along the right half of the abdomen, including the right iliac region, and then captures all of its parts.
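When each period is described only by lower and upper bounds of its duration, a solver has to work out which periods a given moment after onset can fall into. The sketch below assumes that the figures in parentheses for the perforated ulcer are duration bounds of consecutive periods, matching the ontology's “upper and lower boundaries of the period duration”. It also treats periods as half-open intervals. Both readings are assumptions, and the function is a hypothetical helper.

```python
from typing import NamedTuple


class PeriodBounds(NamedTuple):
    name: str
    min_hours: float  # lower bound of the period's duration
    max_hours: float  # upper bound of the period's duration


def possible_periods(periods: list[PeriodBounds], elapsed: float) -> list[str]:
    """Names of the periods that may contain the moment `elapsed` hours after onset.

    Period k starts between the sum of the minimal and the sum of the maximal
    durations of the earlier periods and lasts at most max_hours, so (treating
    periods as half-open intervals) it can contain t iff
    earliest_start <= t < latest_start + max_hours.
    """
    names = []
    earliest_start = latest_start = 0.0
    for p in periods:
        if earliest_start <= elapsed < latest_start + p.max_hours:
            names.append(p.name)
        earliest_start += p.min_hours
        latest_start += p.max_hours
    return names


PERFORATED_ULCER = [
    PeriodBounds("pain shock", 2, 6),
    PeriodBounds("imaginary well-being", 6, 12),
    PeriodBounds("peritonitis", 6, 48),
]
```

Under these assumptions, `possible_periods(PERFORATED_ULCER, 10)` returns `["imaginary well-being", "peritonitis"]`. Both periods stay open, so the observed character of the pain must then decide between them.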
An accurate, complete, and reliable diagnosis determines the effectiveness of medical diagnostics. For this reason, in modern decision support systems a clinical diagnosis cannot be limited to naming the underlying disease. The diagnosis must be detailed: it must contain an additional characterization of the pathological processes (clinical and anatomical form, type of course, degree of activity, stage of the process, complications, functional disorders) and include all morphological, clinical, laboratory, and other data known in the particular case. The ontology makes it possible to form knowledge bases with such a set of data. As an example, consider a fragment of the knowledge base of the disease “Acute cholecystitis” with a diagnosis detailed by form (calculous, non-calculous), variant (catarrhal, phlegmonous, gangrenous), severity (I, II, III), and complications. For each element of the diagnosis, a complex of diagnostic signs with reliable values of the signs is described.
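A detailed diagnosis can be thought of as filling in one value per diagnosis element, where each value has its own complex of required sign values. The sketch below shows only this mechanism. All sign names are hypothetical placeholders rather than clinical criteria, severity and complications are omitted for brevity, and the rule of accepting a value only when it is the single match is an assumption.

```python
from typing import Optional

# element of the diagnosis -> {value -> required sign values}.
# All sign names below are hypothetical placeholders, not clinical criteria.
CHOLECYSTITIS_DETAILS: dict[str, dict[str, dict[str, str]]] = {
    "form": {
        "calculous": {"placeholder_sign_1": "present"},
        "non-calculous": {"placeholder_sign_1": "absent"},
    },
    "variant": {
        "catarrhal": {"placeholder_sign_2": "low"},
        "phlegmonous": {"placeholder_sign_2": "medium"},
        "gangrenous": {"placeholder_sign_2": "high"},
    },
}


def refine_diagnosis(observed: dict[str, str],
                     details: dict[str, dict[str, dict[str, str]]]) -> dict[str, Optional[str]]:
    """For each diagnosis element, pick the single value whose required sign
    values all match the observations; None if zero or several values match."""
    refined: dict[str, Optional[str]] = {}
    for element, values in details.items():
        matching = [value for value, required in values.items()
                    if all(observed.get(sign) == v for sign, v in required.items())]
        refined[element] = matching[0] if len(matching) == 1 else None
    return refined


def format_diagnosis(disease: str, refined: dict[str, Optional[str]]) -> str:
    """Join the disease name with every element value that could be determined."""
    return ", ".join([disease] + [v for v in refined.values() if v is not None])
```

An element left as `None` marks a part of the diagnosis that still needs data, such as an additional examination.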
The paper describes the ontology of medical diagnostics designed to create medical knowledge bases for the diagnosis of acute and chronic diseases. This ontology is the result of a literature review and of the team’s more than 20 years of experience in creating knowledge-based medical intelligent systems.
The main tasks that the authors set when creating it are as follows:
1. Develop an ontology that would not depend on a particular disease or group of diseases.
The development of a unified ontology made it possible to implement completely new functionality that, despite the achievements in AI and machine learning, had not yet been implemented: the search for diagnostic hypotheses and the diagnosis of combined pathology.
Doctors make a large number of mistakes at the stage of preliminary diagnosis [42], when, based on the patient’s complaints, it is important to understand which disease or group of diseases the complaints may relate to (i.e., to find possible hypotheses) before the disease is diagnosed precisely. Figure 8 shows one of the typical examples that cause diagnostic difficulties.
Without a single unified ontology, this task cannot be solved. With a set of disparate ontologies and diagnostic systems built on them, it is difficult to choose both the necessary subset of such systems and the minimum set of additional examination methods (objective, laboratory, and instrumental) that will help to make the correct diagnosis.
The proposed ontology makes it possible to solve this important problem. We have tested the ontology over a long period. The number of knowledge bases implemented with it confirms that the ontology is indeed unified and does not depend on a specific disease or group of diseases: using this ontology, knowledge bases have been developed for 15 diseases in gastroenterology, 10 in cardiology, 7 in endocrinology, 1 in urology, and 12 in the field of infectious and parasitic diseases [43,44].
At the same time, the ontology makes it possible to describe the diagnosis of diseases depending on their form, stage, etiology, etc.; the system generates an explanation in the experts’ terminology and suggests which additional examinations (laboratory, instrumental, or objective) should be carried out to confirm or refute any hypothesis about the disease.
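The hypothesis search and the suggestion of additional examinations can be sketched as follows. Every disease that shares at least one present sign with the observations becomes a hypothesis, and its signs that have not yet been examined become candidate examinations. The ranking by the number of matched signs and the disease names are hypothetical; the actual solver also uses periods, modalities, and necessary conditions, which this sketch ignores.

```python
# Hypothetical knowledge: disease -> diagnostic signs (names are placeholders).
KB: dict[str, set[str]] = {
    "disease_A": {"abdominal pain", "nausea", "itching"},
    "disease_B": {"abdominal pain", "vomiting", "flatulence"},
    "disease_C": {"thirst", "heartburn"},
}


def hypotheses(observations: dict[str, bool],
               kb: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Rank diseases sharing at least one present sign with the observations,
    and list for each the signs that have not been examined yet."""
    present = {sign for sign, is_present in observations.items() if is_present}
    ranked = []
    for disease, signs in kb.items():
        matched = len(signs & present)
        if matched == 0:
            continue
        unexamined = sorted(signs - set(observations))
        ranked.append((-matched, disease, unexamined))
    ranked.sort()  # most matched signs first, ties broken by name
    return [(disease, unexamined) for _, disease, unexamined in ranked]
```

For the observations `{"abdominal pain": True, "nausea": True, "vomiting": False}`, this returns `disease_A` first (suggesting a check for itching) and then `disease_B` (suggesting a check for flatulence).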
Another important functionality that is almost impossible to implement without a unified ontology is the diagnosis of combined pathology (when the patient has several different diseases). This problem is also extremely relevant for medicine: today many patients have several pathologies, and it is very important to identify all of the patient’s diseases.
As the number of knowledge bases implemented on the basis of this ontology gradually grows, the search for diagnostic hypotheses and combined pathology will cover an increasing number of diseases.
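To illustrate the combined-pathology problem, the sketch below looks for a small set of diseases that together explain all present signs. It uses the standard greedy set-cover heuristic. This technique is swapped in for illustration and is not the method used by the authors' solver. The knowledge base is hypothetical, and any signs that no disease explains are returned separately.

```python
def explain_all(present: set[str],
                kb: dict[str, set[str]]) -> tuple[list[str], set[str]]:
    """Greedy set cover: repeatedly choose the disease that explains the most
    still-unexplained signs. Returns the chosen diseases and leftover signs."""
    unexplained = set(present)
    chosen: list[str] = []
    while unexplained and kb:
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(kb), key=lambda d: len(kb[d] & unexplained))
        gain = kb[best] & unexplained
        if not gain:
            break  # no disease explains the remaining signs
        chosen.append(best)
        unexplained -= gain
    return chosen, unexplained


# Hypothetical knowledge base; disease names are placeholders.
KB = {
    "disease_A": {"abdominal pain", "nausea", "itching"},
    "disease_B": {"thirst", "increased body temperature"},
    "disease_C": {"abdominal pain", "thirst"},
}
```

Greedy covering is fast but not guaranteed to find the smallest set. A real system would also check each selected disease against its full knowledge base rather than against raw sign overlap.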
2. The form of the ontology representation should allow doctors to form and maintain knowledge bases independently (without intermediaries).
This requirement is implemented by three main solutions:
a) The graphical representation of the ontology (in the form of a semantic network), in contrast to production or object-oriented models, provides a clear and natural “top-down” formation of the knowledge base, moving from general concepts to their refinement and detailing.
b) The terminology used. The semantics of the ontology terms (disease, symptom complex, necessary condition for the disease, periods of disease development, and many others) is understandable to doctors of all profiles; moreover, the structure of their descriptions also guides doctors in forming knowledge. The use of a single terminological base when describing knowledge about a disease ensures both a single interpretation of all terms used and their reuse.
c) The knowledge base editor generated from the ontology on the IACPaaS platform also greatly simplifies knowledge formation. It checks the integrity and completeness of the description against the ontology, detects syntax errors, and offers several types of user interfaces implemented by the team in which the authors work.
3. The knowledge base formed according to the ontology should correspond to real knowledge in the field of medicine (it should not be simplified).
To implement this task, the ontology must provide the following important properties:
a) Description of chronic and acute diseases. A peculiarity of acute diseases is their development over time. An analysis of the literature has shown that different approaches are used to describe the dynamics of the process: in some cases, the course of the disease is divided into several periods, each described by its own symptom complex; in other cases, the development of each sign over time is described.
b) Means of describing the diagnosis of diseases for different groups of patients or for the conditions under which the disease occurred (for example, the clinical picture of some diseases is known to differ in children, adults, pregnant women, etc.).
c) The use of fuzzy scales and sets in the formation of knowledge bases. When diseases are diagnosed, not all signs included in the clinical picture of the disease need to be present in the patient. Therefore, fuzzy scales need to be introduced. There are several approaches to describing them, for example, assigning each sign or characteristic a fuzzy scale that describes the likelihood of its manifestation in the patient (necessarily, characteristically, possibly).
In addition, the ontology makes it possible to specify, for each disease, the necessary conditions of the disease (season of the year, contact with animals, and so on) and provoking factors.
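A fuzzy modality scale and necessary conditions could combine in a solver roughly as sketched below. A hypothesis is rejected when a necessary condition (such as contact with animals) is not reported or when a sign marked “necessarily” is known to be absent. Otherwise it receives a weighted score. The numeric weights and the scoring formula are hypothetical assumptions, since the ontology defines the scale values but not these numbers.

```python
from enum import Enum
from typing import Optional


class Modality(Enum):
    NECESSARILY = "necessarily"
    CHARACTERISTICALLY = "characteristically"
    POSSIBLY = "possibly"


# Hypothetical weights; the ontology fixes the scale, not these numbers.
WEIGHTS = {Modality.NECESSARILY: 1.0,
           Modality.CHARACTERISTICALLY: 0.6,
           Modality.POSSIBLY: 0.2}


def assess(signs: dict[str, Modality], present: set[str], absent: set[str],
           necessary_conditions: set[str], facts: set[str]) -> Optional[float]:
    """Return None if the hypothesis is rejected, else a score in [0, 1]."""
    if not necessary_conditions <= facts:
        return None  # e.g. required season or contact with animals not reported
    if any(m is Modality.NECESSARILY and s in absent for s, m in signs.items()):
        return None  # a mandatory sign is known to be absent
    total = sum(WEIGHTS[m] for m in signs.values())
    found = sum(WEIGHTS[m] for s, m in signs.items() if s in present)
    return found / total if total else 0.0


SIGNS = {"abdominal pain": Modality.NECESSARILY,
         "nausea": Modality.CHARACTERISTICALLY,
         "itching": Modality.POSSIBLY}
```

A missing “possibly” sign lowers the score only slightly, which is the intended effect of a fuzzy scale. A contradicted “necessarily” sign, by contrast, rules the hypothesis out.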
4. Reducing labor intensity in the development of decision support systems for the diagnosis of diseases.
When developing the ontology, we were guided by the following principle: the ontology should be clearly separated from the knowledge bases created on its basis. The problem solver is built on the basis of the ontology. Like the ontology, it is unified and does not depend on the content of the knowledge base. In this setting, the ontology acts as a “formal parameter” of the solver, and the knowledge base acts as the “actual parameter” (Figure 10).
Thus, to “deploy” the diagnosis of a new disease, it is only necessary to develop its knowledge base (and verify it). No programming is required! This is a very important property of the proposed approach, which significantly reduces the complexity of creating decision support systems. Figure 11 compares the complexity of creating a system with and without the proposed approach (the estimates are the average ratings from a survey of 10 developers who used the proposed technology and other technologies).
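The “formal parameter / actual parameter” separation can be illustrated with a generic interpreter that contains no disease-specific code and receives the knowledge base as data. In the sketch below, the JSON layout, the disease names, and the all-signs-present matching rule are hypothetical simplifications. The point is that adding a disease changes only the data, never `diagnose`.

```python
import json

# The "actual parameter": a knowledge base stored as data (hypothetical format).
KB_JSON = """
{"diseases": [
  {"name": "disease_A", "signs": ["thirst", "nausea"]}
]}
"""


def diagnose(kb: dict, observed: set[str]) -> list[str]:
    """Generic interpreter: contains no disease-specific code. A disease is
    reported when all of its listed signs are observed (a deliberately
    simplistic rule, for illustration only)."""
    return sorted(d["name"] for d in kb["diseases"] if set(d["signs"]) <= observed)


kb = json.loads(KB_JSON)
case = {"thirst", "nausea", "heartburn"}
print(diagnose(kb, case))  # ['disease_A']

# "Deploying" a new disease = adding knowledge; diagnose() is unchanged.
kb["diseases"].append({"name": "disease_B", "signs": ["heartburn"]})
print(diagnose(kb, case))  # ['disease_A', 'disease_B']
```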
As Figure 11 shows, the proposed approach excludes some stages of system development: the systems analysis for building an ontology of medical diagnostics and the construction of a solver.
Ontological engineering of medical diagnostics, which resulted in the model proposed in this work, took two person-years. Formalizing this ontology in the language of IACPaaS semantic networks took 2 weeks. The development of a unified diagnostic solver required about 10 person-months.
Formalizing our ontology in any other language and implementing another ontology-based diagnostic solver would likewise require about 2 weeks plus 10 person-months. Importantly, this is one-time work, since the ontology and the solver are then reused when creating new diagnostic systems.
Developers, as a rule, want to have their own knowledge base. The effort required to form it in terms of the ontology depends on the method: with satisfactory textual sources and an ontology-driven text interpreter, it is fast, whereas manual formation takes longer. Preparing datasets for knowledge extraction is also laborious, although the subsequent building of knowledge from them is acceptably fast.
Assessing quality against standards takes 2-3 person-months. If the knowledge base is formed “manually”, an expert who builds the semantic representation himself (without intermediaries) needs half as much time as in the setting where the expert dictates “words” and a knowledge engineer turns them into predicates or rules.
What is fundamentally important is that the system generates explanations in the experts’ terminology and suggests which additional examinations (laboratory, instrumental, or objective) should be carried out to confirm or refute any hypothesis about the disease.
The time it takes domain experts to create a knowledge base depends on the volume and complexity of diagnosing a specific disease and ranges from several hours to two weeks. For example, the development of an intelligent system for diagnosing COVID-19 (recall that developing the system reduces to developing its knowledge base, while a detailed explanation is still provided) took 5 days; the knowledge base was first developed in English and then translated into Chinese. The system was developed at the request of Chinese colleagues for use in Wuhan at the height of the coronavirus epidemic (early February 2020), which was widely discussed in the Russian [45] and Chinese press (www.cankaoxiaoxi.com).
Currently, work in this direction is continuing. The authors are developing methods for automated formation of knowledge bases based on the described ontology, using methods of inductive generalization of training samples and extraction of knowledge from the texts of scientific publications.
The paper presents an ontology of medical diagnostics that corresponds to the modern representation of knowledge in medicine, does not depend on the branch of medicine, and makes it possible to describe acute (developing over time) and chronic diseases using fuzzy scales and sets. A single ontology-based solver makes it possible to deploy a new decision support system without programming and to generate a detailed explanation. It is only necessary to develop a knowledge base according to the proposed ontology.
The ontology of knowledge for diagnosing diseases and syndromes is hosted on the IACPaaS platform and is currently actively used by specialists to create knowledge bases in various fields of medicine.
The community of medical experts and developers of intelligent medical systems who are interested in creating knowledge bases and using them to develop intelligent medical systems can join this process (create their own knowledge bases, improve and develop existing ones, and use them in software systems) by registering on the IACPaaS platform and requesting the ontology of medical diagnostics from its developers.
Acknowledgments: The research was supported by the Government research assignment for Far Eastern Federal University, project FZNS‒2023‒0010.