My Data Science Internship at Eka

July 21, 2023

Eka is building a digitally enabled and connected healthcare ecosystem for better health outcomes.

I started working here as a Data Science intern on 1st April 2021.

Our team at Eka is trying to revolutionize the way we store, access, and share our health records. The idea is to digitize these documents by extracting information from them with machine learning.

Indexing for medical prescriptions

On the first day of my internship, I was introduced to the different ideas and challenges we planned to work on. My first task was to figure out the best way to extract facility names and doctors' names from medical prescriptions. The challenge was that we wanted to jointly exploit the document's textual and spatial information, a problem known as Document Layout Analysis. I spent the next few days learning how people usually approach such problems, discussing my ideas with the team, and taking their input. I tried a few algorithms and finally proposed the one that showed the most promise.
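To make "jointly exploit the textual and spatial information" concrete, here is a minimal sketch. It is not the model we used; it only illustrates the two kinds of signal. Layout-aware models such as LayoutLM consume each OCR word together with its bounding box scaled to a 0-1000 grid, and the toy rule below combines a textual cue (a "Dr" token) with a spatial one (the word sits in the header band of the page). The `OcrWord` type, the 25% header band, and the line-grouping tolerance are all hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    text: str
    x0: int  # left edge, pixels
    y0: int  # top edge, pixels
    x1: int  # right edge, pixels
    y1: int  # bottom edge, pixels

def normalize_box(word, page_width, page_height):
    """Scale a pixel box to the 0-1000 grid used by LayoutLM-style models."""
    return [
        int(1000 * word.x0 / page_width),
        int(1000 * word.y0 / page_height),
        int(1000 * word.x1 / page_width),
        int(1000 * word.y1 / page_height),
    ]

def group_lines(words, tol=10):
    """Cluster words into text lines by vertical centre, then sort each line left to right."""
    lines = []
    for w in sorted(words, key=lambda w: ((w.y0 + w.y1) / 2, w.x0)):
        cy = (w.y0 + w.y1) / 2
        if lines and abs(lines[-1][0] - cy) <= tol:
            lines[-1][1].append(w)
        else:
            lines.append((cy, [w]))
    return [sorted(ws, key=lambda w: w.x0) for _, ws in lines]

def doctor_name_candidates(words, page_height, top_fraction=0.25):
    """Toy rule mixing text and layout: the words after 'Dr' on a line in the header band."""
    header = [w for w in words if w.y1 <= top_fraction * page_height]
    names = []
    for line in group_lines(header):
        for i, w in enumerate(line):
            if w.text.rstrip(".").lower() == "dr" and i + 1 < len(line):
                names.append(" ".join(x.text for x in line[i + 1:]))
                break
    return names
```

A hand-written rule like this breaks as soon as a prescription deviates from the template (a "Dr." in the footer signature, a name on its own line), which is exactly why a learned model over text plus boxes is the better fit.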

We spent the next few weeks collecting, cleaning, and annotating data for the task, running experiments, writing inference and post-processing scripts, and deploying the model to production. The extracted data will be used to index and search the documents.
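As a rough illustration of how extracted fields can power search, the sketch below builds a toy in-memory inverted index from (field, token) pairs to document IDs. The field names `doctor` and `facility` and the document IDs are hypothetical; a production system would more likely push these fields into a dedicated search engine.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit ('Dr. Rao' -> ['dr', 'rao'])."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map (field, token) -> set of document IDs whose extracted field contains that token."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            for token in tokenize(value):
                index[(field, token)].add(doc_id)
    return index

def search(index, field, query):
    """Return IDs of documents whose field contains every query token (AND semantics)."""
    postings = [index.get((field, t), set()) for t in tokenize(query)]
    return set.intersection(*postings) if postings else set()
```

For example, `search(index, "doctor", "rao")` returns every prescription from any Dr. Rao, while adding a first name narrows it down to a single doctor.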

Deep parsing for medical lab reports

Medical lab reports are one of the few quantitative records of a person's health. Used well, this data can yield valuable insights, such as trends in one's health over time.

After prescriptions, we started extracting data from lab reports, which was a different problem. We didn't just need to extract the test names and their values; we had to link each test to its corresponding value, unit, and reference range on the report. On top of that, the images could be tilted or warped, the tests and values might have no clear spatial alignment, and the same test could be reported on different scales or in different units on two different reports.
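To show what "linking" means, here is a naive geometric baseline, not the approach we ended up with. It assumes an upstream model has already labelled each OCR span as `test`, `value`, `unit`, or `range` (the `Entity` type and the `max_dy` tolerance are hypothetical), and attaches to each test the closest span of each kind that lies to its right on roughly the same row. It fails on precisely the tilted, warped, or misaligned reports described above, which is what motivates a learned linker.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    text: str
    label: str  # "test", "value", "unit" or "range" (assumed to come from an upstream tagger)
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cy(self):
        return (self.y0 + self.y1) / 2

def link_rows(entities, max_dy=8.0):
    """Attach to each test the nearest value/unit/range to its right on roughly the same row."""
    records = []
    for t in (e for e in entities if e.label == "test"):
        record = {"test": t.text}
        for label in ("value", "unit", "range"):
            candidates = [
                e for e in entities
                if e.label == label and e.x0 >= t.x1 and abs(e.cy - t.cy) <= max_dy
            ]
            # Prefer the smallest vertical offset, then the smallest horizontal gap.
            best = min(candidates, key=lambda e: (abs(e.cy - t.cy), e.x0 - t.x1), default=None)
            record[label] = best.text if best else None
        records.append(record)
    return records
```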

After a few days of exploring approaches, we settled on a solution and implemented it. We trained a model and presented a proof of concept (POC) to the product team, who were impressed and decided to take it to production. We then started improving the algorithm.

Left: a sample lab report | Right: the extracted data

There were several tasks here: post-processing the model's output to filter out false positives, using the reference range to decide whether a test value is high or low, linking every test to its LOINC code, converting units to standard units, handling OCR errors, improving the model, and many small details that matter a lot for the overall experience.
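A few of these post-processing steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not our production code: `CANONICAL_TESTS` is a hypothetical stand-in for a LOINC lookup table, fuzzy matching with `difflib` is one simple way to absorb OCR errors such as "Haemog1obin", and `UNIT_FACTORS` covers only two tests. The conversion factors themselves follow from the unit definitions (1 g/dL = 10 g/L; 1 cumm = 1 µL).

```python
import difflib
import re

# Hypothetical canonical vocabulary; a real system would map each entry to a LOINC code.
CANONICAL_TESTS = ["Hemoglobin", "WBC Count", "Platelet Count", "Fasting Blood Sugar"]
_BY_LOWER = {name.lower(): name for name in CANONICAL_TESTS}

# (test, reported unit) -> (standard unit, multiplier). Illustrative subset only.
UNIT_FACTORS = {
    ("Hemoglobin", "g/dL"): ("g/dL", 1.0),
    ("Hemoglobin", "g/L"): ("g/dL", 0.1),          # 1 g/dL = 10 g/L
    ("WBC Count", "/cumm"): ("10^3/uL", 0.001),    # 1 cumm = 1 uL
    ("WBC Count", "10^3/uL"): ("10^3/uL", 1.0),
}

RANGE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$")
BOUND_RE = re.compile(r"^\s*([<>])=?\s*(\d+(?:\.\d+)?)\s*$")

def canonical_test_name(raw, cutoff=0.8):
    """Snap a possibly OCR-garbled test name to the closest canonical name, or None."""
    match = difflib.get_close_matches(raw.strip().lower(), list(_BY_LOWER), n=1, cutoff=cutoff)
    return _BY_LOWER[match[0]] if match else None

def to_standard_unit(test, value, unit):
    """Convert a value to the test's standard unit; raises KeyError for unknown pairs."""
    std_unit, factor = UNIT_FACTORS[(test, unit)]
    return value * factor, std_unit

def parse_range(text):
    """Parse '13.0-17.0', '<200' or '>40' into (low, high), with None for an open side."""
    m = RANGE_RE.match(text)
    if m:
        return float(m.group(1)), float(m.group(2))
    m = BOUND_RE.match(text)
    if m:
        bound = float(m.group(2))
        return (None, bound) if m.group(1) == "<" else (bound, None)
    raise ValueError(f"unrecognised range: {text!r}")

def flag(value, range_text):
    """Label a value LOW / HIGH / NORMAL against its reference range (bounds count as normal)."""
    low, high = parse_range(range_text)
    if low is not None and value < low:
        return "LOW"
    if high is not None and value > high:
        return "HIGH"
    return "NORMAL"
```

Note that the value and its reference range must be in the same unit before calling `flag`, so in practice the range would be converted with the same factor as the value.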

We were also short on data. There were many things we wanted the model to learn, but there wasn't enough data for it to learn them. For example, in most lab reports the value column sits right next to the test-name column, so the model failed on reports where it didn't. We therefore decided to generate additional synthetic data using layout and content augmentation.
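Here is a minimal sketch of what layout and content augmentation could look like for tabular lab reports, assuming each annotated row is a dict with `test`, `value`, `unit`, and `range` keys. The column layout, widths, gap sizes, and the "half a range width on either side" sampling rule are all hypothetical choices, not the pipeline we built. Layout augmentation shuffles column order and spacing so the model stops relying on "value is always next to the test name"; content augmentation swaps in new plausible values while the labels stay correct for free.

```python
import random

COLUMNS = ["test", "value", "unit", "range"]

def layout_augment(rows, rng, col_width=150, row_height=30):
    """Shuffle column order and spacing, emitting (text, label, box) cells for a synthetic page."""
    order = COLUMNS[:]
    rng.shuffle(order)          # the value column no longer always follows the test name
    gap = rng.randint(0, 100)   # random extra whitespace between columns
    cells = []
    for r, row in enumerate(rows):
        y0 = r * row_height
        for c, label in enumerate(order):
            x0 = c * (col_width + gap)
            cells.append((row[label], label, (x0, y0, x0 + col_width, y0 + row_height)))
    return cells

def content_augment(row, rng):
    """Replace the value with a random one around the reference range, clamped at zero."""
    low, high = (float(p) for p in row["range"].split("-"))
    span = high - low
    value = rng.uniform(max(0.0, low - 0.5 * span), high + 0.5 * span)
    return {**row, "value": f"{value:.1f}"}
```

Sampling values on both sides of the range also yields plenty of abnormal results, which are rarer in real data but matter most to users.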

We implemented all of this and are still improving it. The results have been worth the hard work, and these features are now live in our app.

Left: smart report | Right: Hb (hemoglobin) trend across my reports

Check it out

Steps: download the app >> log in >> go to the “records” section >> upload a lab report >> check the “smart report” and “vitals” sections

It shows historical graphs of your lab test results across reports, even when those reports use different units.

Summing up

This summer was an enriching experience in many ways. I got to work on amazing projects and learn state-of-the-art machine learning approaches for document information extraction, as well as software engineering in general. I also learned a lot about startup culture and how startups work, which was one of the reasons I wanted to join a startup in the first place.

I gained a lot of self-confidence. I was part of a team where my opinions were valued and my ideas were taken seriously, no matter how stupid they seemed.

Most importantly, I learned how research techniques and state-of-the-art deep learning get applied in the real world.

P.S.: I have accepted a pre-placement offer (PPO) from Eka, and they offered to extend my internship through the rest of the year. Thanks @Sankalp Gulati and @Vikalp Sahni for the offer 🙂