
Developing a hybrid automatic prescription structuring system

In this article, we will explore the challenges involved in automatically structuring dosages, a vital and complicated step in digitizing prescriptions. We will delve into what makes this task challenging and detail the artificial intelligence methods we chose to tackle each aspect. We will also provide implementation examples and compare results between the models we tested.


Introduction

At Posos, the first dynamic therapeutic advice solution designed with healthcare professionals in mind, our primary objective is to reduce the amount of time physicians spend on non-care tasks. By doing so, they can devote more time to patients and provide care that is tailored to each patient's unique needs. That is why we took on the challenge of automatically analyzing pre-existing prescriptions. This eliminates the need for manual data entry into the hospital information system and enables rapid analysis of potential interactions and contraindications.

Prescription texts can be difficult to parse due to the variability of prescriptions. The use of different abbreviations, prepositions, and even misspellings can make it difficult to accurately extract and structure the necessary information. However, thanks to the increasing capabilities of language models, it is now possible to automate this process and ensure greater accuracy and efficiency.

In this article, we will explore the use of a hybrid system, consisting of a rule-based system and a deep learning Named Entity Recognition (NER) model, to automatically structure posology texts into the Fast Healthcare Interoperability Resources (FHIR) data model. We will discuss the challenges associated with structuring posology texts and how the combination of a rule-based system and an NER model can overcome these challenges.

By the end of this article, readers will have a better understanding of the benefits of automated structuring of posology texts and how it can improve the efficiency and accuracy of prescription management.

Since our system is dedicated to structuring prescriptions written in French, the examples we use are French sentences drawn from real-life prescriptions; we translate each of them to ensure complete comprehension.

Pipeline

As mentioned in the introduction, we will show here the motivations behind and the solutions we employed to build a hybrid system, consisting of a rule-based system and a Transformer-based Named Entity Recognition model, in order to automatically structure drug dosages.

Note: the extraction of text from prescription scans is done using an off-the-shelf OCR solution and is outside the scope of this article.

Defining a rule-based baseline

One of the main challenges in structuring drug prescriptions is the variety of ways to write a dosage. For example, a physician may write Xeplion 50mg 1 comprimé le matin (one pill in the morning), Xeplion 50 mg 1 cp au petit-déjeuner (one pill during breakfast), or even Xeplion 50 mg 1 cpr à 7h (one pill at 7 a.m.). Alternatively, a simpler prescription may read Xeplion 50 mg comprimé 1 0 0 (pill 1 0 0), where the three digits represent the number of pills that the patient should take at each meal. Despite these variations, all of these examples mean the same thing and should be structured into the following FHIR object.

Dosage object representation of ‘Xeplion 50 mg 1 cp au petit-déjeuner’

Note that all these ways of writing the dosage differ by the use of abbreviations (cp or cpr instead of comprimé (pill)) and by many different prepositions or articles. However, the overall structure of these sentences is similar, and it is reasonable to assume that many dosage texts will be written using a finite set of terms. This observation led us to use a rule-based system to extract and structure posology entities into the FHIR format, as a strong baseline that will then be complemented by a deep learning approach.

Rule-based system

A rule-based algorithm is a basic kind of artificial intelligence algorithm which uses a set of predefined rules to make decisions. Given a new situation, the system will search for the rules that apply in order to solve the problem. In the domain of Natural Language Processing, rules can be regular expressions or more complex pattern matching rules. In the context of posology structuring, these rules will be used to extract posology information from the input string.

For example, a basic rule to detect a dosage could be the regular expression r"\d+\s(cp|gelule|comprime)s?". This expression looks for a number followed by a drug form and an optional "s" (the mark of the plural in French). However, such a regular expression will not be enough to capture all dosages, since many units are missing from it, such as the international units. Furthermore, more complicated dosages like 2 à 3 comprimés (2 to 3 pills) will be incorrectly detected, since the minimal dose will be skipped. A system more expressive than regular expressions is needed to detect a wider range of dosages while maintaining readability.
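A quick sketch of that baseline in Python, showing both the match and the failure mode described above:

```python
import re

# Naive regex baseline: a number, whitespace, then a drug form with an
# optional plural "s". Accents are assumed to be stripped beforehand.
DOSE_RE = re.compile(r"\d+\s(cp|gelule|comprime)s?")

print(DOSE_RE.search("1 cp au petit-dejeuner").group())  # -> "1 cp"
# The minimal dose of a range is skipped: only "3 comprimes" is matched.
print(DOSE_RE.search("2 a 3 comprimes").group())         # -> "3 comprimes"
```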

To address those issues, we chose to use spaCy's rule-based matching engine. It consists of a Matcher object that performs token-based matching. A match can refer to the text of the token, its tag, or its type (such as numeric). A pattern is then defined as a list of rules that each token of the text should match. For example, the following pattern will match all dosages expressed as dose_min [to dose_max] unit.
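The original pattern is not reproduced here, so the following is a minimal reconstruction based on the explanation below; the real list of drug forms is much longer.

```python
# Reconstructed dosage pattern: dose_min [à dose_max] unit.
DOSE_PATTERN = [
    {"LIKE_NUM": True},                  # dose_min
    {"LOWER": "à", "OP": "?"},           # optional "à" (to)
    {"LIKE_NUM": True, "OP": "?"},       # optional dose_max
    {"LOWER": {"IN": ["cp", "cpr", "comprimé", "comprimés",
                      "gélule", "gélules"]}},  # drug form (abridged list)
]
```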

Let us briefly explain the rules of this pattern. The LIKE_NUM operator matches every token that resembles a number: digits, decimals, and also spelled-out numbers (so the text three is matched as well as the digit 3). This is much simpler than writing a regular expression, which is a first advantage of using a pattern matching system. The {"LOWER": "term_to_match"} operator matches every token whose lowercase form is term_to_match. Instead of term_to_match, one can also provide a list of terms to match, as in the last rule. This is very convenient, since the list of drug forms is really large. Finally, one can add an operator to each rule in the OP field. Using the ? operator makes the rule optional, as in regular expressions. In the previous example, this allows us to capture within the same pattern dosages such as une gélule (one capsule) and 10 à 12 comprimés (10 to 12 pills). As expected, using a pattern matching tool gives us readability as well as broad coverage.

Now that we have covered how to write a pattern to detect a dosage, we can write additional patterns to detect frequencies, durations, and every simple medical entity that occurs in a posology. Rule-based systems are known as “knowledge-based systems” because they require human expertise to build the rules; however, many of the rules are really simple and do not require sophisticated medical knowledge. All the rules can be combined and applied to an input string with the following snippet.
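A minimal sketch of that snippet, reusing the DOSE_PATTERN above and adding a hypothetical frequency pattern:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("fr")  # we only need the French tokenizer here
matcher = Matcher(nlp.vocab)

# The first argument of matcher.add is the string_id reported for each match.
matcher.add("DOSE", [DOSE_PATTERN])

# Hypothetical frequency pattern, e.g. "3 fois par jour" (3 times a day).
FREQUENCY_PATTERN = [
    {"LIKE_NUM": True},
    {"LOWER": "fois"},
    {"LOWER": "par"},
    {"LOWER": {"IN": ["jour", "semaine", "mois"]}},
]
matcher.add("FREQUENCY", [FREQUENCY_PATTERN])

doc = nlp("1 cp 3 fois par jour")
for match_id, start, end in matcher(doc):
    string_id = nlp.vocab.strings[match_id]  # name of the triggered pattern group
    print(string_id, doc[start:end].text)
```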

The string_id of the detected spans corresponds to the first argument of the matcher.add function and gives the name of the group of patterns that has been triggered.

The output of the previous code snippet would be:
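For the sketch above, something along the lines of:

```
DOSE 1 cp
FREQUENCY 3 fois par jour
```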

Dosage structuring

The pattern matching engine constructed above works well: it detects most posology strings and is even capable of classifying them according to the pattern that was used. This is a great first step towards a tool to structure a posology! However, the key part of the structuring process is still missing. Structuring a posology entity consists of breaking its raw text representation into different fields such as frequency, period, period_unit, dose_min, dose_max, dose_unit, etc. For instance, 10 à 12 comprimés (10 to 12 pills) corresponds to the following dictionary, which will then be converted into the FHIR format:
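A sketch of that dictionary, using the field names listed above (the exact representation of the unit depends on the terminology used):

```python
{
    "dose_min": 10,
    "dose_max": 12,
    "dose_unit": "comprimé",
}
```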

The key idea to perform dosage structuring with the spaCy pattern matching engine is to add patterns one by one, using a different pattern id for each pattern. This way, each entity detected by our engine is linked to the precise set of rules that matched it. Furthermore, we extend each rule with an extra field_name argument corresponding to the name of the field in the structured dictionary we aim to obtain. For example, the pattern detecting dose_min to dose_max unit will be rewritten as:
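The rewritten pattern is not reproduced in the source; based on the description, it might look like this, with field_name being our custom, non-spaCy argument:

```python
# Operator-free pattern for "dose_min à dose_max unit"; each rule carries the
# name of the structured field its token should fill.
DOSE_RANGE_PATTERN = [
    {"LIKE_NUM": True, "field_name": "dose_min"},
    {"LOWER": "à"},
    {"LIKE_NUM": True, "field_name": "dose_max"},
    {"LOWER": {"IN": ["cp", "cpr", "comprimé", "comprimés",
                      "gélule", "gélules"]}, "field_name": "dose_unit"},
]
```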

We can thus match each token of the detected spans to its corresponding field. There is still one difficulty to overcome. You might have noticed that in the newer version of the above pattern, the ? operators have disappeared. This is necessary because the length of a pattern must exactly match the number of tokens of the matched string, so optional rules are incompatible with our structuring method. However, since these optional operators are very convenient when writing patterns, we kept them in the pattern definitions and wrote a simple algorithm that generates all the operator-free rules from the original patterns. This algorithm also removes the field_name argument, which would not be understood by spaCy's Matcher object.
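A minimal sketch of such an expansion, assuming only the ? operator is used (in practice, the field_name of each kept rule is recorded alongside each variant):

```python
from itertools import product

def expand_pattern(pattern):
    """Yield all operator-free variants of a pattern using "?" operators."""
    choices = []
    for rule in pattern:
        # Strip OP and our custom field_name argument before spaCy sees the rule.
        cleaned = {k: v for k, v in rule.items() if k not in ("OP", "field_name")}
        if rule.get("OP") == "?":
            choices.append(([], [cleaned]))  # either drop the rule or keep it
        else:
            choices.append(([cleaned],))
    for combo in product(*choices):
        yield [rule for group in combo for rule in group]

# Usage: each variant gets its own pattern id, e.g.
# for i, variant in enumerate(expand_pattern(DOSE_PATTERN)):
#     matcher.add(f"DOSE_{i}", [variant])
```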

Finally, we are able to write simple rules using the very convenient syntax of spaCy's rule-based engine and to use those rules to structure all the detected posology entities.

Complementing the rule engine with Deep Learning

Rule-based systems are valuable for creating a functional model with high precision. However, their rules require frequent updates, and their recall is limited. To complement them, we propose a method that relies on a Named Entity Recognition model and Nearest Neighbor Search to identify and structure posology entities. Once identified, these entities are merged together to generate a FHIR-compatible object.

Named Entity Recognition

Named entity recognition is a technique within natural language processing (NLP) that helps to identify entities in text, such as people, places, and organizations. In the context of prescription texts, NER can be used to recognize medication names, dosages, frequencies, and other related information.

For example, consider the following posology text: FORTZAAR 100/25mg 1 cp le matin (one pill each morning). Using an NER algorithm, we can identify the following entities:
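Based on the entity classes used later in this article (DRUG, STRENGTH, DOSE, WHEN), the extracted entities would be:

  • DRUG: FORTZAAR
  • STRENGTH: 100/25mg
  • DOSE: 1 cp
  • WHEN: le matin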

To define our NER model, we used the transformers library by HuggingFace, which provides many pre-trained models for different NLP tasks. Specifically, we used the xlm-roberta-base model as our base model and fine-tuned it on our in-house dataset. We loaded the dataset using the load_dataset function of the HuggingFace datasets module.

The model architecture relies on tokenizing the input string. Then, for each token, a representation is computed by the base model and finally classified into a set of classes.

The following code snippet demonstrates how to load the base model and tokenizer from the Hugging Face hub using the AutoModelForTokenClassification and AutoTokenizer classes:
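A minimal version of that snippet; the label set shown here is hypothetical and stands in for our in-house entity classes:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Hypothetical BIO label set for illustration.
LABELS = ["O", "B-DRUG", "I-DRUG", "B-STRENGTH", "I-STRENGTH",
          "B-DOSE", "I-DOSE", "B-WHEN", "I-WHEN"]

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(LABELS)
)
```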

We chose XLM-RoBERTa because it offers improved performance, particularly for less frequent classes. This is reflected in the macro F1 score, which is visible in the following figure.

F1 is the harmonic mean of precision and recall:
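$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$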

where precision measures the ratio of true positives to all predicted positives, and recall (also called sensitivity) measures the ratio of true positives to all samples that should be positive.

Micro F1 then refers to the F1 measured over the whole dataset, regardless of class. Macro F1 refers to the F1 measured for each class (such as DRUG, FORM, etc.), then averaged over the set of classes. Macro F1 thus gives equal importance to each class, whereas micro F1 gives more importance to overall performance.

Evolution of F1 scores with the number of training samples for 4 Transformer models: CamemBERT, Multilingual BERT (cased and uncased), and XLM-RoBERTa

Data Cleaning

Since the annotation process was done in-house, it was prone to errors. To ensure optimal model performance, it was important to make the annotations as accurate as possible. To achieve this, we utilized the cleanlab library, which flags samples where the model assigns a label other than the annotated one with a probability higher than the mean probability of correctly-labeled samples of that class. This method doesn't flag every incorrect annotation, but we found that it had high precision and enabled us to improve performance.
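A minimal sketch of this check with cleanlab's generic find_label_issues, treating each token as an independent sample; the arrays below are hypothetical, and in practice pred_probs must be out-of-sample predictions:

```python
import numpy as np
from cleanlab.filter import find_label_issues

# pred_probs: (n_tokens, n_classes) model probabilities; labels: annotated classes.
pred_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.80, 0.15, 0.05],
    [0.10, 0.80, 0.10],
    [0.15, 0.80, 0.05],  # annotated as class 0, but the model is confident it's 1
    [0.10, 0.10, 0.80],
    [0.20, 0.10, 0.70],
])
labels = np.array([0, 0, 1, 0, 2, 2])

issue_indices = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",
)
print(issue_indices)  # token indices to review, most suspicious first
```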

Data Augmentation

To enhance performance, we implemented two data augmentation methods. The first involves adding, for each sample, an identical sample in lower case. This was motivated by the observation that drugs are tokenized differently in lower and upper case by the XLM-RoBERTa tokenizer. For example, AMLODIPINE is tokenized as _A, ML, ODI, P, INE, while amlodipine is tokenized as _am, lo, di, pine.
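This is easy to check with the tokenizer (the pieces shown are those reported above); the augmentation itself then just duplicates each training sample in lower case:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
print(tokenizer.tokenize("AMLODIPINE"))  # ['▁A', 'ML', 'ODI', 'P', 'INE']
print(tokenizer.tokenize("amlodipine"))  # ['▁am', 'lo', 'di', 'pine']

# Lower-case augmentation: one extra, identical sample per original sample.
samples = ["AMLODIPINE 5mg 1 cp le matin"]
samples += [s.lower() for s in samples]
```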

The second method for data augmentation is one that we have employed in numerous projects. It involves creating query templates that are defined manually through an exploration of the training data. This approach was inspired by Ribeiro et al.'s 2020 paper, Beyond Accuracy: Behavioral Testing of NLP Models with CheckList, but we use it for data augmentation during the training phase rather than for testing only.

For instance, using the example above FORTZAAR 100/25mg 1 cp le matin (1 pill every morning), we create a sample of the form DRUG STRENGTH DOSE WHEN. We can generate multiple examples by replacing each entity class with a sample from that class, such as DOLIPRANE 50mg 2 gélules matin et soir (2 capsules in the morning and evening).
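A minimal sketch of this template filling, with hypothetical pools of annotated surface forms per entity class:

```python
import random

POOLS = {
    "DRUG": ["DOLIPRANE", "FORTZAAR", "XEPLION"],
    "STRENGTH": ["50mg", "100/25mg", "300mg"],
    "DOSE": ["1 cp", "2 gélules", "1 suppo"],
    "WHEN": ["le matin", "matin et soir", "au coucher"],
}

def fill_template(template: str) -> str:
    """Replace each entity-class placeholder with a random surface form."""
    return " ".join(random.choice(POOLS[slot]) for slot in template.split())

print(fill_template("DRUG STRENGTH DOSE WHEN"))
# e.g. "DOLIPRANE 50mg 2 gélules matin et soir"
```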

Although the generated examples may not be medically or semantically accurate, they still serve the purpose of rebalancing the dataset. Moreover, they make it easy to incorporate feedback by adding new templates. However, these templates were only added to the training dataset to ensure that performance on the original data distribution remained high. As a result, the macro F1 score on the test set (computed first for each class, then averaged) increased from 83.1% to 84.4%, while the micro F1 score remained stable (85.1% → 85.3%).

Automatic structuring with Nearest Neighbor Search

After detecting entities, we must extract the corresponding structure to output an entity that is compatible with the FHIR Dosage structure. The way each entity is processed depends on its class, which can be divided into two categories:

  • (value + unit) entities
  • entities requiring entity linking

For the first type, we identify the numerical value or values in the extracted entity, and then match the unit or units with a list of possible units for the corresponding field.
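A minimal sketch of this step, with a hypothetical, abridged unit list:

```python
import re

# Maps unit spellings (abbreviated or not) to a canonical form; abridged.
DOSE_UNITS = {"cp": "comprimé", "cpr": "comprimé", "comprimé": "comprimé",
              "gélule": "gélule", "suppo": "suppositoire"}

def parse_value_unit(text: str):
    """Split a (value + unit) entity such as '2 gélules' into its parts."""
    match = re.match(r"(\d+(?:[.,]\d+)?)\s*(\w+?)s?$", text.strip())
    if not match:
        return None
    value, unit = match.groups()
    return float(value.replace(",", ".")), DOSE_UNITS.get(unit, unit)

print(parse_value_unit("2 gélules"))  # -> (2.0, 'gélule')
```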

The second type is solved using Nearest Neighbor Search. The entity representation is computed using word embeddings, and we then find the concept in the corresponding terminology with the highest cosine similarity.
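A minimal sketch of the lookup, assuming precomputed embeddings for the mention and for each terminology concept:

```python
import numpy as np

def link_entity(mention_vec, concept_vecs, concept_names):
    """Return the concept whose embedding is most cosine-similar to the mention."""
    mention = mention_vec / np.linalg.norm(mention_vec)  # unit-normalize
    concepts = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    return concept_names[int(np.argmax(concepts @ mention))]
```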

Post-processing and results

Using the two methods detailed above, we are now capable of extracting a comprehensive set of tokens that represent the various components of a posology. We have also devised multiple approaches to transform these tokens into structured data, including dose, drug strength, frequency, and other relevant parameters, which we won't describe here. It is worth noting that the two models may produce conflicting results. Since the rule-based engine has higher precision but lower recall, we prioritize the results obtained from this engine.

Finally, in order to show the effectiveness of both models, we compared the whole system with the two baselines consisting of the rule engine alone and the NER model alone.

Dataset and metrics

Our test dataset consists of 252 queries extracted from real-world prescriptions that were not used to train our NER model. Each prescription was passed through the OCR system, and the OCR output was then annotated with the expected structured posology data.

We chose two primary metrics. The first is the error ratio, which measures the number of errors across all posology fields divided by the total number of posology fields. The second is the ratio of well-structured queries to the total number of queries. The error ratio provides a broader sense of the model's capabilities, since a single mistake on a query is preferable to multiple mistakes, a distinction the second metric does not capture. The second metric, however, gives a sense of how those errors are spread through the dataset: a high error ratio relative to the ratio of well-structured queries indicates that a few queries concentrate a significant number of errors.

Impact of model hybridization on prescription structuring

Error ratio and well-structured queries ratio for each model on our dataset of 252 prescription queries.

First, we notice that the NER has a much higher error ratio than the rule-based system. This makes sense, since the automatic structuring from the extracted tokens is not yet able to handle the most complicated cases. Indeed, the rule-based system was implemented first and has been expanded over the months with sometimes complicated structuring rules, which cannot be directly translated to the extracted entities of the NER. This is supported by the fact that both micro and macro F1 for the NER models are greater than 80% on a similar test set.

Second, we observed that utilizing both methods leads to a significant improvement in the ratio of well-structured queries. This suggests that they have complementary functions: the NER complements the high precision of the rule engine by detecting a greater number of entities, including those with spelling errors or abbreviations, especially dose units and drug forms. For example, in this query from our dataset, DOLIPRANE 300mg: 1 suppo 2 à 3 fois par jour (1 suppository 2 to 3 times a day), the NER helps detect the correct dose unit suppo (suppository).

Conclusion

In conclusion, we have shown how the combination of a rule-based system and a Named Entity Recognition (NER) model can automatically structure posology texts into the Fast Healthcare Interoperability Resources (FHIR) data model, despite the variability, ambiguity, and complexity of natural language. This eliminates the need for manual data entry into the hospital information system and enables rapid analysis of potential interactions and contraindications.

By implementing these methods, physicians can spend less time on non-care tasks and devote more time to patients. Automated structuring of posology texts has numerous benefits and can improve the efficiency and accuracy of prescription management. Through the use of a rule-based system and an NER model, we have presented a solution that can help healthcare professionals better manage prescription data and ultimately improve patient care.

Furthermore, our comparison of the rule-based system and NER model demonstrates that they have complementary functions and that the combination of both methods leads to a significant improvement in the ratio of well-structured queries.

References

Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh. Beyond Accuracy: Behavioral Testing of NLP Models with CheckList. ACL 2020.

François Plesse and Xavier Fontaine
Data Scientists at Posos
