Why generic AI fails pharma and how purpose-built AI is revolutionizing drug development
Artificial intelligence (AI) is transforming many industries, but in pharma, generic AI models often fall short when applied to complex clinical data.
Challenges such as medical jargon, regulatory compliance, and the semi-structured nature of healthcare records demand AI solutions built specifically for the pharmaceutical sector. In this exclusive Q&A, Tim O’Connell, co-founder and CEO of emtelligent, shares expert insights on why purpose-built AI outperforms generic models in healthcare.
From real-world examples of AI missteps to the future of AI-powered drug discovery and pharmacovigilance, discover how specialized AI is poised to revolutionize pharma operations.
On the limitations of generic AI in pharma
Why do most general-purpose AI models fall short when applied to clinical or pharmaceutical data?
Generic, general-purpose AI models either aren’t equipped to handle the complexities of medical language or they fall short on accuracy and relevance to the use case. Trained on broad, non-clinical datasets, they often misinterpret domain-specific terminology, abbreviations, or context – leading to hallucinations or irrelevant outputs. Consequently, they fail to address the specific needs of healthcare.
To make matters worse, many of these solutions are developed by companies that lack firsthand knowledge about the unique challenges of implementing AI in a healthcare organization. This results in a mismatch between the promises of AI and the capabilities required to extract actionable insights and drive meaningful change.
Real-world examples of generic AI failures
What are some real-world examples where generic AI has failed to deliver reliable or actionable outputs in healthcare settings?
The most common problem with generic AI is its tendency to produce unreliable and inaccurate outputs, a phenomenon known as hallucination. Here’s a good real-world example: we asked one of the AI models about a patient’s level of physical activity based on the information in that individual’s chart. The model’s response was two glasses of wine per week. While some of us may enjoy a glass of wine now and then, I think most would agree it’s a stretch to consider wine drinking a physical activity. But the model misinterpreted the data.
Another phenomenon we see with generic AI models is what we call the “missing middle” problem. In some cases you need to feed a model a large amount of text to answer a question because it needs context. Unfortunately, models frequently lose track of what’s in the middle: they are good at remembering what’s at the start and end of the text, but they gloss over or forget details in the middle. It’s actually very similar to how humans process information.
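For the curious, here’s a minimal way one might probe that effect (Python; the filler note text, the planted fact, and the commented-out `ask_model` call are invented placeholders, not a real benchmark):

```python
# Toy probe for the "missing middle" effect: plant one fact at the start,
# middle, or end of a long context, then ask the model about it. The filler
# text, fact, and ask_model call are all invented for illustration.
FILLER = "The patient attended a routine follow-up visit. " * 400
FACT = "The patient reports jogging three times per week."

def build_context(position: str) -> str:
    if position == "start":
        return FACT + " " + FILLER
    if position == "middle":
        half = len(FILLER) // 2
        return FILLER[:half] + FACT + " " + FILLER[half:]
    return FILLER + FACT  # "end"

for position in ("start", "middle", "end"):
    prompt = build_context(position) + "\nWhat physical activity does the patient report?"
    # answer = ask_model(prompt)  # placeholder for whichever LLM API you use
    print(position, len(prompt))
```

In published “lost in the middle” evaluations of this kind, recall typically dips when the fact sits mid-context.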
Challenges posed by unstructured clinical data
How does the unstructured nature of clinical notes, lab reports, and real-world data make this problem even more complex?
It does so in a couple of ways. One is that the language domain in the clinical world is vastly more complex than the conversational language these models have been trained on from Reddit, Wikipedia, and general websites. Clinicians might write “AS” in their notes to indicate aortic stenosis, but in almost every other use case, capitalized or not, “AS” is simply the preposition “as.” There are many similar examples of clinical shorthand that can be misinterpreted by generic AI.
Another complication is that these models have been trained on large amounts of prose from sources like The New York Times and books. But a lot of what is in medical documents is semi-structured data, such as an implicit table structure. A model that isn’t used to reading medical documents won’t be able to understand that. And by the time a medical document reaches the person who’s going to use it, it may have had all of this formatting stripped away, which makes it even harder to understand. That’s the challenge of a specialized language domain: there’s an inherently semi-structured nature to medical data.
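A quick illustration of that implicit-table problem (the lab panel and values below are invented for the example):

```python
# The panel below reads as a table to a human, but once the whitespace
# formatting is stripped (as often happens in processing pipelines), the
# alignment that paired each test with its value and reference range is gone.
panel = """
Test        Result   Reference
WBC         14.5     4.0-11.0
Hgb         13.2     12.0-16.0
Platelets   210      150-400
"""

flattened = " ".join(panel.split())
print(flattened)
# -> Test Result Reference WBC 14.5 4.0-11.0 Hgb 13.2 12.0-16.0 Platelets 210 150-400
# A model never trained on such layouts has to guess which number belongs to
# which test once the visual structure is lost.
```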
On the case for domain-specific AI
What does it mean for an AI model to be “purpose-built” for medical use, and what sets a purpose-built model apart from a more generic large language model like GPT-4?
There are two answers to this question. The first is that a purpose-built model usually is trained on data very similar to the data you’re going to be using it on. GPT-4 probably was trained on copyrighted books like Moby-Dick, but no language from Herman Melville’s classic 1851 novel appears anywhere in medical text. Medical AI instead uses purpose-built models trained on specific data to perform very narrow tasks, such as extracting measurements from medical text. Such a model is looking for a test name and a measurement value – say, a white blood cell count above 14,000, which could indicate leukocytosis. If you fed Moby-Dick to one of these task-oriented models, it wouldn’t return anything useful. You’d get garbage output.
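To make that contrast concrete, here’s a minimal sketch of such a task-oriented extractor (Python; the regex, threshold, and function name are illustrative assumptions, not emtelligent’s actual model):

```python
import re

# Minimal sketch of a narrow extractor: find a test name and its measurement
# value in clinical text. The pattern is illustrative, not a real product.
WBC_PATTERN = re.compile(
    r"\b(?:WBC|white blood cell count)\b\D{0,20}?([\d,]+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_wbc(text: str) -> float | None:
    """Return a white blood cell count found in `text`, or None."""
    match = WBC_PATTERN.search(text)
    if match is None:
        return None  # feed it Moby-Dick and you simply get nothing back
    return float(match.group(1).replace(",", ""))

note = "CBC today: WBC 14,500/uL, Hgb 13.2 g/dL."
count = extract_wbc(note)
if count is not None and count > 14_000:  # the threshold cited above
    print(f"WBC {count:,.0f} -- possible leukocytosis")
```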
The other way purpose-built and generic AI models differ is in the volume of data they are trained on. A large language model is typically trained on terabytes of data and can cost millions of dollars to train, whereas a purpose-built model can be trained on a much smaller amount of data and costs far less.
Handling medical terminology and context
How do clinically trained models handle medical terminology, abbreviations, and context more effectively?
They can handle clinical content more effectively because they are trained by medical professionals to recognize both the meaning and context of the data under analysis. We know acronyms are used differently across specialties, and that medical terminology is highly specific.
We train the models to deal with different use cases. For example, one model is trained to disambiguate terms. During training, we specify, “In this context, Pt means it’s a blood test. In this other context, Pt means physiotherapy. And in this other context, Pt means patient.”
We teach the model to interpret meaning based on document type and clinical context – just as clinicians do instinctively. If I see “Pt” in a lab report, I think “blood test”; if it’s a rehab report, I read it as physiotherapy. Humans do this unconsciously – but machines need to be taught how.
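Here’s a toy illustration of that document-type rule; the mapping below is a hypothetical hard-coded lookup, whereas a production model would learn these cues from labeled clinical data:

```python
# Hypothetical lookup from document type to the sense of "Pt". A real system
# learns these associations rather than hard-coding them.
PT_SENSES = {
    "lab_report": "prothrombin time (blood test)",
    "rehab_note": "physiotherapy",
    "progress_note": "patient",
}

def disambiguate_pt(doc_type: str) -> str:
    """Resolve the abbreviation 'Pt' from the document type alone."""
    return PT_SENSES.get(doc_type, "patient")  # default to the commonest sense

print(disambiguate_pt("lab_report"))  # prothrombin time (blood test)
print(disambiguate_pt("rehab_note"))  # physiotherapy
```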
Pharma-specific use cases seeing AI benefits
What kinds of pharma-specific use cases are already seeing measurable benefits from purpose-built AI – such as pharmacovigilance, safety signal detection, or real-world data (RWD) analysis?
All of those use cases are starting to see measurable benefits. A great example is RWD analysis, which often involves a human combing through chart data and then entering it into some kind of registry. We have tools, like our clinical workbench platform for AI-assisted human review, that help a person find that needle in a haystack much faster.
Another use case is where people in pharma are trying to build their own models. In order to build them, they need very high-quality data extraction. So we’re doing a lot of data extraction work on really large-scale data sets – billions of clinical notes – to give pharma researchers the large amounts of data they need to build better models.
Trust, compliance, and oversight
Accuracy and traceability are key in regulated environments – how does your team ensure transparency and explainability in AI outputs?
To gain trust in AI’s ability to manage clinical data, transparency is imperative. Patients and clinicians have a right to know when data and communications have been generated by AI. Clinicians, coders, and reviewers also must have confidence in the quality of data being generated by AI.
Our platform provides users with outputs that include links back to the source material. This allows clinicians at the point of care to verify the accuracy and relevance of AI-generated content.
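As a sketch of what such source-linked output can look like in practice (the class and field names here are hypothetical, not emtelligent’s schema):

```python
from dataclasses import dataclass

# Every AI-generated fact carries enough provenance for a clinician to verify
# it against the original document. Names are illustrative only.
@dataclass
class ExtractedFact:
    value: str          # the AI-generated answer, e.g. "WBC 14,500"
    source_doc_id: str  # which document the answer came from
    char_start: int     # character offsets of the supporting span
    char_end: int

    def cite(self, documents: dict[str, str]) -> str:
        """Return the exact source text backing this fact."""
        return documents[self.source_doc_id][self.char_start:self.char_end]
```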
Maintaining human-in-the-loop accountability
What safeguards or oversight mechanisms are necessary to maintain human-in-the-loop accountability in clinical AI workflows?
Number one, very careful contracting needs to be in place. We’re starting to see tools like AI scribes being used in hospital environments, and there are going to be mistakes in these documents. A clinician who’s seeing 40 patients a day is going to breeze through these documents, and mistakes will end up in the documentation when the AI model misunderstands something.
This is no different from someone using speech recognition software and their report containing an error. The question is, who owns that error? Is it the AI vendor? Is it the foundation model company? Is it the hospital? Is it the doctor, who may be a contractor to the hospital? So who owns accountability for these things needs to be clearly explained to both the purchaser – the hospital – and the end user, the clinician.
The other thing we may need to explore is what degree of regulation is required in certain workflows to ensure organizations don’t try to replace humans. I’m sure people are working on vision models for radiology image analysis. But is it OK if patients have X-rays and the images are interpreted solely by a computer, without radiologists also looking at them?
AI in regulatory submissions and pharmacovigilance
How does AI support regulatory submissions or pharmacovigilance while still meeting compliance requirements like FDA or EMA standards?
The answer is to keep a human in the loop. AI can support this by changing the human’s role from search and data entry to review and approval. Often that human is searching a patient chart to find data and collate it into a spreadsheet, which then gets submitted.
Instead of the human spending 10 minutes looking for a piece of information, AI can find it for them quickly and say, “Here’s the answer, and here’s the document it came from and the sentence it came from. If you agree, click yes, and I’ll put it into the spreadsheet for you.”
So by keeping that human in the loop but shifting their role away from search and data entry – both of which are highly prone to mistakes – AI puts the human into an approver role, eliminating data entry errors and helping them find the data faster. That’s how AI can help the process while maintaining compliance.
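A minimal sketch of that approver loop might look like this (the function, dictionary fields, and console prompt are illustrative only, not a real API):

```python
# The AI proposes an answer together with its provenance, and a human confirms
# before anything is written to the submission spreadsheet.
def review_and_record(candidate: dict, spreadsheet: list[dict]) -> None:
    print(f"Answer: {candidate['answer']}")
    print(f"Source: {candidate['document']} -- \"{candidate['sentence']}\"")
    if input("If you agree, type yes: ").strip().lower() == "yes":
        spreadsheet.append({"value": candidate["answer"],
                            "source": candidate["document"]})

rows: list[dict] = []
review_and_record(
    {"answer": "WBC 14,500",
     "document": "note_2024_03_12.pdf",
     "sentence": "CBC today: WBC 14,500/uL."},
    rows,
)
```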
Barriers to AI adoption in pharma
What are the biggest barriers to adoption for pharma companies looking to implement AI more broadly across their operations?
One major barrier is the noise being made by companies whose solutions haven’t been proven in the real world. Another is concern about data security and about inappropriate or unethical use of patient data by AI model vendors. If you send data to OpenAI, you can be sure they’re going to use it in their model. In healthcare, you have to be very careful about that.
Scale and cost also are barriers. A lot of people are using large language models to do large-scale tasks, which isn’t cost-effective right now. And as is often the case, internal resistance to change is a formidable barrier to AI adoption.
Future opportunities for AI in pharma
Looking ahead, where do you see the biggest opportunities for AI to make a transformative impact in the pharmaceutical value chain – from discovery to post-market surveillance?
For discovery, we are now at the stage where AI tools can truly enable big data analysis on billions of clinical records. This allows us to do a better job of discovering drug interactions, adverse events, and opportunities for new drugs, and of identifying subpopulations in which certain drugs work well or don’t work well.
Until recently we didn’t have access to data at the scale we need to do this well. If one person takes an antidepressant and says, “That didn’t work for me,” and another person takes the same drug and says it works really well, our traditional approach has been to kind of shrug and say, “OK, it works well in some people but not others.”
But the question is, why? Is it a genetic difference? Is it a social difference? Was the patient taking the drug incorrectly? Why does it work in one person and not another? There has to be an answer. Now we can answer those questions and provide more tailored drug therapy for people.
On post-market surveillance, as new drugs become more complex and more expensive, we’re seeing more demand for real-world evidence-gathering to justify the continued use of these drugs. But we don’t have enough trained humans to go through all this RWD to determine whether a drug is still working in the million people taking it. AI solutions can help by quickly finding the data reviewers are looking for and answering those questions, still with human oversight. That enables a human to deal with 100 charts a day instead of 10.