Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for responding to a query. In some implementations, a computer obtains a query. The computer determines a meaning for each term in the query. The computer identifies one or more ontologies based on the meanings for at least some of the terms. The computer identifies a knowledge graph based on the identified ontologies and the user data. The computer generates a response to the query by traversing a path of the identified knowledge graph to identify items in the knowledge graph based on the determined meaning for each of the terms. The computer generates path data that represents the path taken by the computer through the identified knowledge graph. The computer provides the generated response and the path to the client device.
Background
A clinical trial or program can be a single research study or multiple research studies that prospectively assign human participants/subjects or groups of human subjects to one or more health-related interventions to evaluate the effects on health outcomes.
Summary
As part of the healthcare process, physicians or other medical care providers may perform clinical trials, programs, and other activities to evaluate the safety and efficacy of a particular pharmaceutical drug or other medical treatment option. Conducting health-related clinical trials can help to identify medical treatment options, such as novel treatments, for improving overall patient health and reducing health system costs. Investigators that use a particular geographic site location to interact with study subjects generally conduct clinical trials and other controlled programs. In some instances, a physician for a patient can be associated with a clinical trial and the physician can refer a patient as a candidate for participation in a trial based on a diagnosed condition of the patient. An investigator, a geographic site location, or both, can form an entity that executes a program. The effectiveness of a trial program can depend on a variety of factors, such as obtaining a sufficient number of subjects that are suitable for participation in a trial, the accuracy of diagnosed conditions for each patient or subject in the program, or certain types of conditions that may prospectively affect a subject. In some cases, factors that impact the effectiveness of a clinical trial can vary depending on the treatment options being evaluated and the criteria that are associated with the trial.
Based on the above context, the techniques described in this specification enable a computing system that uses specific computing rules or instructions to predict or generate a response to a query, e.g., clinical trial questions, provided by a user. In particular, the computing system utilizes a model that contains a natural language processor (NLP) module and a Machine Reasoning as a Service (MRaaS) module to generate the response. The NLP module can recognize or identify the contextual meanings of each of the words or phrases in the query. The MRaaS module can generate the response using the contextual meanings of each of the words from the query provided by the NLP module.
The MRaaS module can rely on an artificial intelligence (AI) system that employs ontologies, knowledge graphs, reasoning through use of an inference engine and an explainability module, and clinical trial data provided by external sources, e.g., clinical trial expert knowledge. The MRaaS module can provide a response with a high confidence level. More specifically, the MRaaS module can use the inference engine and the explainability module to generate and provide to the user the reasoning and identified sources behind the response. The computing system can capture interactions from the user responding to the identified reasoning, and provide feedback and various other types of recommendations to improve the model's accuracy.
In some implementations, the computing system utilizes the MRaaS module and an NLP module to derive a response to the query. The MRaaS module can process terms provided by the NLP module, which recognizes and extracts the terms from the query. The MRaaS module can semantically understand relevant terms, e.g., medical and clinical terms, and their relations to other medical terms. Additionally, the NLP module can extract terms from external information or sources, such as textbooks, online resources, and medical journals, or from unstructured datasets such as electronic medical data for multiple healthcare patients. In some implementations, the NLP module can include one or more machine learning algorithms. The NLP module can be trained using the one or more machine learning algorithms, e.g., deep learning algorithms. In some cases, the NLP module can be external to the computing system.
The NLP module can provide the extracted terms and their contextual meanings to the MRaaS module. In some implementations, the MRaaS module can identify a set of ontologies based on a domain of a specific subject and associated with the specific user that provided the query. The MRaaS module can use characteristics associated with the user to identify the set of ontologies, such as user identification, job type, and user location, to name a few examples. In response to identifying the set of ontologies, the MRaaS module can select a subset of ontologies from the set of ontologies based on the extracted terms and their contextual meanings identified by the NLP module. The MRaaS module can then identify one or more knowledge graphs based on the selected subset of ontologies. If the MRaaS module identifies that one or more knowledge graphs are not returned, the MRaaS module can generate the one or more knowledge graphs based on the selected subset of ontologies.
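The selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Ontology` record, the function names, the use of a domain string as the user characteristic, and the placeholder graph construction are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ontology:
    """Hypothetical ontology record used only for this sketch."""
    name: str
    domain: str          # subject domain, e.g. "oncology"
    concepts: frozenset  # concept labels the ontology defines


def retrieve_for_user(ontologies, user_domains):
    """Step 1: keep ontologies whose domain matches the user's characteristics."""
    return [o for o in ontologies if o.domain in user_domains]


def select_subset(candidates, term_meanings):
    """Step 2: keep ontologies that share at least one concept with the query meanings."""
    meanings = set(term_meanings)
    return [o for o in candidates if o.concepts & meanings]


def find_or_build_graph(subset, stored_graphs):
    """Step 3: return the stored graph for this subset, generating one if none exists."""
    key = frozenset(o.name for o in subset)
    if key not in stored_graphs:
        # Placeholder construction (assumption): one node per concept, no edges yet.
        stored_graphs[key] = {c: [] for o in subset for c in o.concepts}
    return stored_graphs[key]


ONTOLOGIES = [
    Ontology("onc-drugs", "oncology", frozenset({"drug", "tumor"})),
    Ontology("onc-sites", "oncology", frozenset({"site", "investigator"})),
    Ontology("cardio", "cardiology", frozenset({"drug", "stent"})),
]

candidates = retrieve_for_user(ONTOLOGIES, {"oncology"})
subset = select_subset(candidates, ["drug", "dose"])
stored = {}
graph = find_or_build_graph(subset, stored)
```

In this toy run, the user's domain narrows three ontologies to two, the term meanings narrow those to one, and because no stored graph exists for that subset, a new graph is generated and cached for later queries.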
In some implementations, the MRaaS module traverses the one or more knowledge graphs to identify a response. For example, the MRaaS module can traverse a path over multiple nodes in the one or more knowledge graphs that match the extracted terms and contextual meanings of those terms provided by the NLP module. The end nodes of the path can correspond to the responses to the query provided by the user. Additionally, the MRaaS module can track the path traversed from start to finish in the one or more knowledge graphs. The MRaaS module can then generate a response to the query that includes the data identified at the end nodes of the path. The data may include, for example, structured medical documents, results of previous clinical trials, answers to medical questions, or other types of medical information. The MRaaS module can transmit the generated response to the client device of the user. Additionally, the MRaaS module can transmit data illustrating the path traversed from start to finish in the one or more knowledge graphs, showing how the MRaaS module arrived at its answer. Because the computing system provides both the generated response and the traversed path, a user can analyze results of the generated response and visualize how the computing system arrived at the results.
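As a rough illustration of traversal with path tracking, the sketch below follows one edge per relationship and records every hop. The adjacency-list graph, the node and relationship names, and the rule of taking the first matching edge are assumptions made for this example; the specification does not prescribe a particular data structure or search strategy.

```python
# Hypothetical toy graph: node -> list of (relationship, neighbor) edges.
GRAPH = {
    "drug_x": [("treats", "melanoma"), ("studied_in", "trial_17")],
    "melanoma": [("studied_in", "trial_17")],
    "trial_17": [("reported", "result_17")],
    "result_17": [],
}

# Items stored at nodes; only the end node is read when building the response.
ITEMS = {"result_17": "Summary of prior results for trial_17"}


def traverse(graph, start, relations):
    """Follow one edge per relation, in order, recording each hop taken."""
    path = [start]
    node = start
    for relation in relations:
        targets = [t for r, t in graph.get(node, []) if r == relation]
        if not targets:
            return None, path  # no matching edge: return the partial path
        node = targets[0]      # simplification: take the first matching edge
        path.extend([relation, node])
    return node, path


def answer(graph, items, start, relations):
    """Build a response from the end node and attach the path data."""
    end, path = traverse(graph, start, relations)
    response = items.get(end) if end is not None else None
    return {"response": response, "path": path}


result = answer(GRAPH, ITEMS, "drug_x", ["studied_in", "reported"])
```

Returning the path alongside the response is what lets a client render how the answer was reached; when no edge matches, the partial path still shows where the traversal stopped.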
In some implementations, the user can provide feedback to the computing system based on the generated results and the derivation of the computing system's response. For example, if the user does not agree with the results or determines that the computing system missed important features when deriving the response, the user can provide feedback to the computing system with the corrections. The computing system can update its processes, generate a new response with data identifying how the new response was derived, and provide this information to the user. In this manner, one benefit of this system includes this feedback loop, in which the computing system can continuously improve predicting or generating responses to queries over time. By providing the derived reasoning for each generated response and receiving feedback to update the derived reasoning, the computing system can over time move closer to ideal responses for user queries.
In one general aspect, a method performed by one or more computers includes: obtaining, by the one or more computers, a query from a client device associated with a user, the query comprising a plurality of terms; determining, by the one or more computers, a meaning for each term or group of terms in the plurality of terms; determining, by the one or more computers, user data for the user that submitted the query; identifying, by the one or more computers, one or more ontologies based on the meanings for at least some of the terms in the plurality of terms; identifying, by the one or more computers, a knowledge graph based on the one or more identified ontologies and the user data; generating, by the one or more computers, a response to the query by traversing a particular path of the identified knowledge graph to identify one or more items in the knowledge graph based on the determined meaning for each of the terms or group of terms; generating, by the one or more computers, path data that represents the particular path taken by the one or more computers through the identified knowledge graph; and providing, by the one or more computers, the generated response that comprises the one or more identified items and the path data to the client device for output.
Other embodiments of this and other aspects of the disclosure include corresponding systems, apparatus, and computer programs, configured to perform the actions of the methods, encoded on computer storage devices. A system of one or more computers can be so configured by virtue of software, firmware, hardware, or a combination of them installed on the system that in operation cause the system to perform the actions. One or more computer programs can be so configured by virtue of having instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
The foregoing and other embodiments can each optionally include one or more of the following features, alone or in combination. For example, one embodiment includes all the following features in combination.
In some implementations, the method includes wherein the determined user data comprises one or more of a location of the user or a job position of the user.
In some implementations, the method includes wherein determining the meaning for each term in the plurality of terms further comprises: parsing, by the one or more computers, the plurality of terms into separate terms; and identifying, by the one or more computers, entities and relationships from each of the separate terms, wherein the entities correspond to clinical terms and the relationships correspond to clinical actions performed by or for the entities.
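The parsing and identification steps can be illustrated with a deliberately simple lexicon lookup. This is only a stand-in for the trained NLP module described elsewhere in this specification: the lexicons, the category labels, and the function names are hypothetical.

```python
import re

# Hypothetical lexicons; a trained NLP model would replace these lookups.
ENTITY_LEXICON = {
    "aspirin": "drug",
    "stroke": "disease",
    "patients": "population",
}
RELATION_LEXICON = {
    "prevent": "prevents",
    "prevents": "prevents",
    "treat": "treats",
    "treats": "treats",
}


def parse_terms(query):
    """Parse the query into separate lowercase terms."""
    return re.findall(r"[a-z0-9]+", query.lower())


def extract(query):
    """Identify entities (clinical terms) and relationships (clinical actions)."""
    entities, relationships = [], []
    for term in parse_terms(query):
        if term in ENTITY_LEXICON:
            entities.append((term, ENTITY_LEXICON[term]))
        elif term in RELATION_LEXICON:
            relationships.append(RELATION_LEXICON[term])
    return entities, relationships


entities, relationships = extract("Does aspirin prevent stroke in patients?")
```

Terms found in neither lexicon, such as "does" and "in", are simply dropped in this sketch.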
In some implementations, the method includes wherein identifying the one or more ontologies based on at least some of the meanings for each of the terms in the plurality of terms comprises: storing, by the one or more computers, a plurality of ontologies in an ontology knowledge base, each ontology in the plurality of ontologies built using historical medical data that comprises at least one of previous clinical trial data, clinical journals, biology data, life sciences data, genetic data, disease data, and pharmacological data, and the historical medical data is stored in each of the ontologies in a structured manner; retrieving, by the one or more computers, one or more ontologies based on the user data; and selecting, by the one or more computers, a subset of ontologies from the one or more retrieved ontologies based on a match between historical data in the subset of ontologies to at least some of the meanings for each of the terms in the plurality of terms.
In some implementations, the method includes wherein identifying the knowledge graph based on the one or more identified ontologies and the user data further comprises: identifying, by the one or more computers, the knowledge graph from a plurality of stored knowledge graphs, each stored knowledge graph of the plurality of stored knowledge graphs connecting entities and corresponding relationships; wherein identifying the knowledge graph from the plurality of stored knowledge graphs further comprises: identifying, by the one or more computers, the knowledge graph that matches to the subset of the ontologies; and providing, by the one or more computers, the identified knowledge graph to an inference engine for determining the path data.
In some implementations, the method includes wherein generating the response to the query by traversing the particular path of the identified knowledge graph to identify one or more items based on the determined meaning for each term further comprises generating, by the one or more computers, the response to the query from the client device by (i) traversing the particular path of the identified knowledge graph based on the determined meaning for each term and (ii) accessing the one or more items located at an end of the particular path.
In some implementations, the method includes wherein generating the path data that represents the particular path taken by the one or more computers through the identified knowledge graph further comprises tracking, by the one or more computers, the particular path taken by the one or more computers through the identified knowledge graph that comprises matching the meaning for each term in the plurality of terms to entities and relationships in the identified knowledge graph.
In some implementations, the method includes wherein providing (i) the generated response to the query and (ii) the particular path tracked and taken by the one or more computers through the identified knowledge graph to the client device further comprises transmitting, by the one or more computers, the response to the client device over a network.
In some implementations, the method includes, in response to providing the generated response to the query to the client device for output, receiving, by the one or more computers, an update from the client device that comprises one or more modifications to the one or more identified ontologies and to the identified knowledge graph; and modifying, by the one or more computers, the one or more identified ontologies or the identified knowledge graph based on the update that comprises the one or more modifications.
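One way to picture the modification step is as a small set of edge additions and removals applied to the knowledge graph. The update format used here (`add_edges` and `remove_edges` lists of source, relationship, target triples) is an assumption for illustration; the specification does not define the structure of the update.

```python
def apply_update(graph, update):
    """Return a modified copy of the graph; the original is left untouched."""
    new_graph = {node: list(edges) for node, edges in graph.items()}
    for source, relation, target in update.get("add_edges", []):
        new_graph.setdefault(source, [])
        new_graph.setdefault(target, [])
        if (relation, target) not in new_graph[source]:
            new_graph[source].append((relation, target))
    for source, relation, target in update.get("remove_edges", []):
        if source in new_graph and (relation, target) in new_graph[source]:
            new_graph[source].remove((relation, target))
    return new_graph


# Hypothetical graph and user-supplied correction.
graph = {"drug_x": [("treats", "melanoma")], "melanoma": []}
update = {
    "add_edges": [("drug_x", "contraindicated_with", "drug_y")],
    "remove_edges": [("drug_x", "treats", "melanoma")],
}
updated = apply_update(graph, update)
```

Working on a copy keeps the previous graph available, which would allow a response generated before the update to be compared with one generated after it.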
The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
Description
Current search systems can store and access medical information in an unstructured data format and the medical information may be dispersed across a vast number of systems, which makes it difficult for information to be linked and ultimately, queried by external users. Many sources of this medical information are text-based sources and the information can be formed from a variety of elements. However, there is no international or existing categorization of these different elements that is widely followed.
When querying information from these medical sources, domain expertise is often required to recognize relevant terms, categorize the terms based on meaning, and convert the terms to a suitable format that can then be used for data querying. The ability to model and structure this medical information enables users to derive certain insights that can improve health conditions, improve medical clinical trial practices, and assist with postulating new medical theories, to name a few examples. Additionally, if responses to queries contain inaccuracies or do not specifically respond to the query, users may perform additional tasks based on misleading information, causing inefficiencies and potentially detrimental health effects to patients. To address these inaccuracies, a system is needed that can provide responses to complex medical queries and identify how the responses were generated.
In this context, techniques are described in this specification for generating a predictive model that recognizes clinical terms and a relationship between the terms, and provides a response to the query using a Machine Reasoning as a Service (MRaaS) module. In particular, a computing system can receive a query from a user or other structured and unstructured datasets, e.g., external medical databases, medical journals, textbooks, and other data sources, using a natural language processor (NLP) module and other deep learning algorithms. The techniques use the NLP module to recognize, extract, and categorize medical entities, e.g., indications, drugs, diseases, procedures, etc., as well as determine relationships between the medical entities with reference to the terms that describe the entities.
The information extracted by the NLP module is provided to the MRaaS module for querying and providing a response to the query. In particular, the MRaaS module can derive a response using various ontologies, knowledge graphs, and other historical medical data. Typically, the queries provided to the computing system are focused on complex medical terms that are relevant to a clinical study, a clinical procedure, or clinical research, to name a few examples. The MRaaS module can provide a response to the query as well as data identifying how the MRaaS module arrived at the response.
Benefits
- User searches for information using natural language
- Draws knowledge from multiple ontologies and knowledge graphs
- Reasoning over knowledge is performed by an inference engine
- Stores results for future reuse and updates the knowledge
- Explains results to the user
Patent Information
- Patent number: 11,610,690
- Date filed: July 13, 2021
- Date granted: March 21, 2023
- ...