What is electronic phenotyping? 

In the context of clinical data, a phenotype is a measurable marker, whether biological, behavioral, or cognitive, that is characteristic of individuals with a disease or condition. Electronic phenotyping is a computational procedure for determining whether a patient does or does not have a condition of interest based on electronic medical records. It can also answer when the condition started and when it ended. Phenotype queries can identify patients with particular conditions and support a variety of purposes, including population management, quality measurement, and observational and interventional research.

Importance of electronic phenotyping: 

  • Using observational data for research
  • Recruiting patients into clinical trials
  • Calculating publicly reported quality metrics for healthcare systems
  • Finding similar patients
  • Sharing definitions to facilitate cross-site research

How does electronic phenotyping work?

In an electronic health record (EHR) system, clinicians record both unstructured and structured patient data. Unstructured data include descriptions of patients' signs and symptoms, radiology and pathology reports, discharge summaries, and family histories. ICD codes, lab results, and medications are examples of structured data. Electronic phenotyping uses EHR and other machine-readable patient data to characterize a patient's conditions. Genomic data, diagnostic images, patient-generated data, and environmental data are other kinds of data that can be used.

How to make one?

Electronic phenotyping starts with defining the research question and understanding what patient data are generated for the condition of interest, and how.

Pose a research question ⇒ identify data source ⇒ extract and transform data ⇒ analyze the data and conclude 

The patient timeline and the patient feature matrix are the two most important representations of healthcare data. A phenotype definition should state the necessary and sufficient conditions: which features must be present or absent, and which values they must take, to determine whether an exposure or outcome of interest happened to the patient. It should also contain criteria for identifying the start and end times of the exposure. Defining when a condition starts is usually easier than defining when it ends. For example, when does pneumonia end? Symptoms such as cough or difficulty breathing may last beyond the period of acute infection, so it can be hard to say exactly when a patient's pneumonia ended. (We will discuss the patient timeline in a separate article.)
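To make the two representations concrete, here is a minimal sketch of turning a patient timeline into a patient feature matrix. The event codes, the tuple layout, and the `build_feature_matrix` helper are all assumptions made for illustration; they do not come from any particular EHR system or coding standard.

```python
from datetime import date

# Hypothetical patient timeline: (patient_id, event_date, event_code).
# The codes are illustrative placeholders, not a real vocabulary.
timeline = [
    ("p1", date(2023, 1, 5), "DX_PNEUMONIA"),
    ("p1", date(2023, 1, 6), "RX_ANTIBIOTIC"),
    ("p1", date(2023, 3, 1), "LAB_GLUCOSE_HIGH"),
    ("p2", date(2023, 2, 10), "LAB_GLUCOSE_HIGH"),
    ("p2", date(2023, 2, 11), "RX_METFORMIN"),
]


def build_feature_matrix(events, start, end):
    """Count each event code per patient within the window [start, end]."""
    features = sorted({code for _, _, code in events})
    patients = sorted({pid for pid, _, _ in events})
    column = {code: i for i, code in enumerate(features)}
    rows = {pid: [0] * len(features) for pid in patients}
    for pid, day, code in events:
        if start <= day <= end:
            rows[pid][column[code]] += 1
    return patients, features, [rows[pid] for pid in patients]


# One row per patient, one column per event code, restricted to a time window.
patients, features, matrix = build_feature_matrix(
    timeline, date(2023, 1, 1), date(2023, 2, 28)
)
for pid, row in zip(patients, matrix):
    print(pid, dict(zip(features, row)))
```

The time window matters: the same timeline gives a different matrix for a different window, which is one reason start and end criteria belong in the phenotype definition.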

When specifying a phenotype to answer your question, it is important to be clear about the intended meaning of the phenotype definition. For example, does it refer to a condition the patient has now, or to the patient's past medical history, a condition they had before?

Once you have a phenotype definition, you should evaluate whether it actually finds the patients it claims to find. You can do this by comparing its output against a complete review of the patients' charts by trained clinicians. However, this is time intensive and often not feasible at scale, and reviewers may not fully agree with one another.
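As a rough illustration of that comparison, the sketch below scores a phenotype's output against chart-review labels. Sensitivity, specificity, and positive predictive value are common choices for this kind of check, but the article does not prescribe specific metrics, so treat them as an assumption. The patient IDs and labels are made up.

```python
def evaluate_phenotype(predicted, reference):
    """Compare phenotype output with chart-review labels.

    Both arguments map patient_id -> bool. Patients missing from
    `predicted` are treated as not flagged by the phenotype.
    """
    tp = fp = tn = fn = 0
    for pid, truth in reference.items():
        flagged = predicted.get(pid, False)
        if flagged and truth:
            tp += 1
        elif flagged and not truth:
            fp += 1
        elif not flagged and truth:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return None when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Hypothetical chart-review labels and phenotype output for five patients.
reference = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": True}
predicted = {"p1": True, "p2": False, "p3": True, "p4": False, "p5": True}

metrics = evaluate_phenotype(predicted, reference)
print(metrics)
```

Because chart review is expensive, such an evaluation is usually run on a sample of patients rather than the whole cohort.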

What are the ways of doing electronic phenotyping?

There are two approaches to phenotyping:

1. The first approach is called rule-based phenotyping. The rules comprise explicit inclusion and exclusion criteria constructed by experts who reach consensus on the criteria iteratively, often by looking at sample case records.

Steps for rule-based electronic phenotyping:

  • Identify the data elements or features that should be present in the medical record
  • Use a relevant knowledge graph to convert those data elements into specific identifiers
  • Create the phenotype definition by specifying the criteria
  • Iterate by comparison to a reference standard, e.g., clinician review of the full chart

In each iteration we need to answer the following questions:

  • Does the definition provide necessary and sufficient conditions for identifying the phenotype?
  • Does the definition identify when the condition starts and when it ends?
  • What kinds of data elements do we need for the definition?
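A rule-based definition of this kind can be sketched in a few lines. The example below is a hypothetical phenotype with one inclusion rule, one exclusion rule, and a start-date rule. The code sets are placeholders invented for illustration; in practice they would be resolved through a vocabulary or knowledge graph and agreed on by clinical experts.

```python
from datetime import date

# Hypothetical code sets (placeholders, not real clinical codes).
DIAGNOSIS_CODES = {"DX_T2DM"}
MEDICATION_CODES = {"RX_METFORMIN"}
EXCLUSION_CODES = {"DX_T1DM"}
QUALIFYING_CODES = DIAGNOSIS_CODES | MEDICATION_CODES


def rule_based_phenotype(events):
    """Apply inclusion and exclusion rules to one patient's events.

    `events` is a list of (event_date, code) tuples.
    Returns (has_phenotype, start_date); start_date is None if not matched.
    """
    codes = {code for _, code in events}
    # Exclusion criterion: any excluding code rules the patient out.
    if codes & EXCLUSION_CODES:
        return False, None
    # Inclusion criterion: at least one diagnosis AND one medication code.
    if not (codes & DIAGNOSIS_CODES and codes & MEDICATION_CODES):
        return False, None
    # Start-time rule (an assumption): earliest qualifying event.
    start = min(day for day, code in events if code in QUALIFYING_CODES)
    return True, start


patient_a = [(date(2022, 5, 1), "DX_T2DM"), (date(2022, 4, 20), "RX_METFORMIN")]
patient_b = patient_a + [(date(2020, 1, 1), "DX_T1DM")]
patient_c = [(date(2022, 5, 1), "DX_T2DM")]

for name, events in [("a", patient_a), ("b", patient_b), ("c", patient_c)]:
    print(name, rule_based_phenotype(events))
```

This sketch deliberately leaves out an end-time rule. As noted earlier, deciding when a condition ends is the harder part and usually needs clinical input.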

2. The second approach is called probabilistic phenotyping. It uses machine learning instead of expert consensus to learn a function that assigns each patient record a probability of having the exposure or outcome of interest.

Rule-based phenotyping is labor and time intensive, so it does not scale well to large numbers of patients. The process is also hard to speed up without significantly increasing cost.

Probabilistic phenotyping uses supervised machine learning methods. Supervised learning starts with a training set in which each patient is explicitly labeled as having or not having the condition of interest, with the data structured on the patient timeline. This dataset is used to build computational models that can classify new, unseen data. Labeling the dataset by hand is time intensive, and it can be avoided by deriving labels from suitable terms in the data. For example, you can look for a medicine that is prescribed specifically for the condition in your research question and use its presence as the label.
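The idea can be sketched with a small logistic regression written in plain Python. Everything here is an assumption made for illustration: the three binary features, the toy data, and the choice of logistic regression itself, since the article does not name a specific model. The labels stand in for "was prescribed the condition-specific drug", and that drug is deliberately not one of the features, otherwise the model would simply relearn the labeling rule.

```python
import math

# Hypothetical binary features per patient (columns of a feature matrix).
FEATURES = ["dx_code_present", "abnormal_lab", "mention_in_notes"]
X = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
# Noisy labels: 1 if the condition-specific drug was prescribed, else 0.
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(row, weights, bias):
    """Probability that a patient record has the phenotype."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=1000):
    """Fit weights with full-batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            error = predict_probability(row, weights, bias) - label
            for j, value in enumerate(row):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train_logistic_regression(X, y)

# Score two unseen (hypothetical) patient records.
p_strong = predict_probability([1, 1, 1], weights, bias)
p_weak = predict_probability([0, 0, 0], weights, bias)
print(f"strong evidence: {p_strong:.2f}, no evidence: {p_weak:.2f}")
```

The output is a probability rather than a yes/no answer, so a threshold still has to be chosen, and the model should be checked against a reference standard just like a rule-based definition.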

Benefits and applications: use cases of electronic phenotyping 

  • Identifying people with specific conditions
  • Public health and safety surveillance
  • Administrative purposes
  • Clinical research studies
  • Precision medicine ("patients like me")

Challenges and considerations 

Completeness: In electronic health records, data can be missing in several ways. Less data is recorded than expected. Patients move across institutions for their care, and each record is often limited to the information needed to address the issue at hand, which results in fragmented data. Data are also missing in the sense that they are recorded only during healthcare episodes, that is, during illness.

Accuracy: In an EHR, errors can occur anywhere in the process, from observing the patient and conceptualizing the results to recording them in the system. Records are also often influenced by billing requirements and avoidance of liability, which introduces systematic error. Accuracy is further compromised by mismatches between the standard definition of a term and the meaning intended by the author (who can be anyone in the clinical care setting).

Complexity: Healthcare data are highly complex, mixing many continuous variables with a large number of discrete concepts. Many knowledge structures are used to describe healthcare data, such as formal definitions of diagnostic concepts, classification hierarchies, and inter-concept relationships. There are also local variations in definitions and usage. The patient timeline is central to healthcare records, and working with time scales from seconds to years, with different levels of uncertainty, is highly complex.

Bias: Bias and error in electronic health records can arise at different points along the patient journey. It starts with whether a patient seeks care at all when symptoms or acute illness occur; this is called selection bias. Misclassification bias can occur while a patient is in clinical care, during diagnostics, diagnosis, and interventions, including drug prescribing. Missing data bias can occur in provider notes. Selection bias and missing data bias can also occur when claims are coded and filed.

Conclusion

While electronic health records are a valuable source of knowledge, it's essential to be aware of biases and errors that can occur in the healthcare system when working with electronic phenotyping. Mitigating these biases requires careful consideration and methodological approaches.

References

https://ehealthresearch.no/en/fact-sheets/exploring-electronic-phenotyping-for-clinical-practice

https://rethinkingclinicaltrials.org/chapters/conduct/electronic-health-records-based-phenotyping/electronic-health-records-based-phenotyping-introduction/

https://www.researchgate.net/publication/230810877_Next-generation_phenotyping_of_electronic_health_records
