Medical records are fundamental to any healthcare ecosystem. They contain invaluable information that, when put together coherently, can result in massive improvements in healthcare outcomes. In India, this wealth of information is either fading away on printed paper, locked away in the archaic software systems of private hospitals, or scattered across hundreds of email and WhatsApp messages.
Have you ever wondered why we can easily track 10 rupees paid to a chai shop 2 years ago, but find it nearly impossible to gather all the thyroid reports done over the last few years? It can primarily be attributed to:
As a result, the end-user ends up with an unstructured, hard-copied version of their invaluable health data. Thankfully, recent efforts by NRCeS within the NDHM are addressing some of these challenges at a systemic level. If implemented as planned, such efforts will resolve concerns like data ownership, security, data interoperability, and consent to share.
While systemic changes in India bear fruit over the years, EkaCare is empowering consumers and making it easier for them to tap the potential of their health records. The Eka mobile app already provides a number of valuable features.
Smart report functionality in Eka.Care converts your lab report into a medically and semantically rich digital format which involves codifying each vital using a LOINC identifier. This feature allows consolidating all your vitals across reports over time into an intuitive graph that displays the trend of your vitals. In this section, we highlight some of the key challenges and learnings we have had in the process.
Isn’t extracting lab parameters from a report about performing Optical Character Recognition (OCR) on the images?
Well, while OCR is certainly the first step, it doesn’t comprise even 5% of the task. Some of the technical difficulties and nuances are described in the following sections.
OCR provides us with the textual content and its spatial location, but which of these elements represent lab tests, their values, units, and ranges needs to be figured out. We exploit both the textual content and its spatial location in our neural network-based machine learning models to perform this task. In this process, one of the biggest challenges is to handle different layouts of the reports across labs. Here are some examples:
It’s not just the structural layout that varies; different labs also often use different local terms for the same test. This problem is compounded by the fact that OCR also produces errors. Given the diversity in names, spelling mistakes, and OCR errors, identifying a lab test becomes a challenging task. Here are some examples.
The real challenge here is to design an algorithm that can truly measure semantic similarity between two concepts. Shallow, domain-unaware fuzzy string-matching algorithms would match Vitamin D2 with Vitamin D3, and T3 Free with T4 Free, for instance, since each pair differs by just one character.
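To make the pitfall concrete, here is a minimal sketch of a plain edit-distance comparison. It is not the matching algorithm used in the app; the function name `edit_distance` and the Haemoglobin/Hemoglobin pair are assumptions added purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


pairs = [
    ("Vitamin D2", "Vitamin D3"),   # two different tests
    ("T3 Free", "T4 Free"),         # two different tests
    ("Haemoglobin", "Hemoglobin"),  # one test, two spellings (illustrative)
]
for left, right in pairs:
    print(left, "|", right, "|", edit_distance(left.lower(), right.lower()))
```

All three pairs are a single edit apart, so a character-level score alone cannot tell a harmless spelling variant from a clinically different test. That is why the matching has to be aware of the domain.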
This is the most critical and challenging step in converting your lab reports to graphs. Plotting test results on a graph would make sense only when all the data points are mapped to the same LOINC identifier. Let’s understand the LOINC linking step with the help of an example.
Let's say we encounter Creatinine as the test name. What more do we need to link this string to a LOINC identifier? Creatinine can be measured from both serum and urine specimens, so to correctly link and interpret the value one first has to identify the specimen from the report. Even for urine, the sample could be a spot urine sample or a 24-hour sample. In addition, the Creatinine value is reported in two different forms, moles/volume and mass/volume, which has to be inferred from the units (umol/L or mg/dL, respectively). Both of these units might also have different surface-level variations. LOINC identifiers also differ based on the method used for the test. Only when we correctly identify all of this contextual information can we successfully link the test to a LOINC identifier.
Our system gathers this nuanced information by contextually parsing different components of your lab report such as panel names, specimens, method of the test, and so on.
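The Creatinine example can be sketched as a lookup keyed on the test name plus its context. This is a simplified illustration, not the production linking system: `LabContext`, `UNIT_TO_PROPERTY`, `LOINC_TABLE`, and `link_to_loinc` are hypothetical names, the table covers only a few cases, and the LOINC codes shown should be verified against the official LOINC database before any real use.

```python
from typing import NamedTuple, Optional


class LabContext(NamedTuple):
    analyte: str
    specimen: str
    prop: str  # "mass/volume" or "moles/volume"


# The reported unit tells us which form of the measurement was used.
UNIT_TO_PROPERTY = {
    "mg/dl": "mass/volume",
    "umol/l": "moles/volume",
}

# Tiny illustrative table; codes are for illustration and should be verified.
LOINC_TABLE = {
    LabContext("creatinine", "serum", "mass/volume"): "2160-0",
    LabContext("creatinine", "serum", "moles/volume"): "14682-9",
    LabContext("creatinine", "urine", "mass/volume"): "2161-8",
}


def link_to_loinc(name: str, specimen: str, unit: str) -> Optional[str]:
    """Return a LOINC code only when every piece of context is resolved."""
    prop = UNIT_TO_PROPERTY.get(unit.strip().lower())
    if prop is None:
        return None  # unknown unit: refuse to guess
    key = LabContext(name.strip().lower(), specimen.strip().lower(), prop)
    return LOINC_TABLE.get(key)


print(link_to_loinc("Creatinine", "Serum", "mg/dL"))
print(link_to_loinc("Creatinine", "Urine", "mg/dL"))
print(link_to_loinc("Creatinine", "Serum", "g/L"))
```

The same test name resolves to different identifiers depending on specimen and unit, and returns nothing when the context is incomplete. A fuller version would also key on the timing of the sample (spot versus 24-hour) and the method.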
If 4 historical values of the platelet count are to be plotted in a graph, they need to be first converted to the same scale. Since there is no standardization of units in which test results can be reported, there exists a wide variation in both scale and surface representation of these units. For example, the platelet count can be reported in 10*3/uL or in 10*5/uL or in lakh/uL and many other variants. For creating graphs one has to understand these units and convert them to the same scale.
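As a rough sketch of the conversion step, the snippet below brings platelet counts reported in the three units mentioned above onto a single scale (thousands per microlitre). The names `UNIT_FACTORS`, `normalize_unit`, and `to_thousands_per_ul` are hypothetical, and a real system would need to recognise many more surface variants.

```python
import re

# Multiplier that converts a reported value to cells per microlitre.
# One lakh is 100,000, so lakh/uL and 10*5/uL share the same factor.
UNIT_FACTORS = {
    "10*3/ul": 1e3,
    "10*5/ul": 1e5,
    "lakh/ul": 1e5,
}


def normalize_unit(unit: str) -> str:
    """Collapse whitespace, case, and micro-sign variants of a unit string."""
    compact = re.sub(r"\s+", "", unit).lower()
    return compact.replace("\u00b5", "u").replace("\u03bc", "u")


def to_thousands_per_ul(value: float, unit: str) -> float:
    """Convert a platelet count to thousands per microlitre (10*3/uL)."""
    factor = UNIT_FACTORS[normalize_unit(unit)]  # KeyError for unknown units
    return value * factor / 1e3


readings = [(250, "10*3/uL"), (2.5, "10*5/uL"), (2.5, "Lakh / uL")]
for value, unit in readings:
    print(value, unit, "->", to_thousands_per_ul(value, unit))
```

All three readings describe the same platelet count once converted, so they can be plotted as comparable points on one graph. Raising an error for an unrecognised unit is a deliberate choice here: silently plotting a value on the wrong scale would be worse than leaving it out.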
Let's jump into some action and see this feature working in the demo video below!
Excited to try it yourself? Use the link below to download our mobile app and convert your lab reports to smart reports and visualise trends. We would love to hear your feedback and suggestions.
**At the time of writing this blog, the feature is released in beta.