Bristol-Myers, PathAI Show Artificial Intelligence Is on Par With Manual PD-L1 Scoring

NEW YORK – An artificial intelligence-powered algorithm appears to be comparable to human pathologists for scoring PD-L1 expression on tumor and immune cells for predicting immunotherapy response, a new analysis has shown. 

The data, which researchers from Bristol-Myers Squibb and PathAI, a provider of AI-powered technology for use in pathology services, presented at a recent immunotherapy conference, are of interest to drugmakers looking for scalable technology solutions to accurately and reproducibly assess PD-L1. 

BMS partnered with PathAI to train and validate that firm's AI-based algorithm for scoring PD-L1 expression in around 300 samples from urothelial cancer patients and performed similar analysis to train, validate, and gauge the algorithm's performance using around 1,000 samples from non-small cell lung cancer patients. The samples used for the retrospective analysis came from patients enrolled in trials for nivolumab (BMS' Opdivo). 

Although the AI-powered algorithm hasn't been launched for commercial use yet, researchers from PathAI and BMS presented the data to demonstrate its potential to improve prediction of which patients will benefit from nivolumab. In the NSCLC analysis, which the companies claim is the largest evaluation of an AI algorithm in PD-L1 scoring to date, the association between patients' progression-free survival and PD-L1 expression on tumor cells was similar, whether using PathAI's platform or manual assessment. 

In lung cancer, "there is tremendous interest in understanding the association of response, progression-free survival, or overall survival with different degrees of PD-L1 expression," said Saurabh Saha, senior VP and global head of translational medicine at BMS. "There's an incredible need for technologies like [PathAI's] because we need more robust assessment of PD-L1, … and the manual scoring that's employed by pathologists today is just not scalable, it's not high throughput, and it's not as precise as we'd like it to be."

Not all patients derive benefit from immunotherapy, and PD-L1 expression has been shown to be a helpful biomarker for determining whether a lung cancer patient will respond to checkpoint inhibitors, such as nivolumab, pembrolizumab (Merck's Keytruda), and atezolizumab (Genentech's Tecentriq). However, PD-L1 expression status doesn't always yield a straightforward 'yes' or 'no' answer as to whether to prescribe immunotherapy.

The US Food and Drug Administration has approved a number of immunohistochemistry assays for gauging PD-L1 expression and aiding immunotherapy decisions in NSCLC patients, including Dako's PD-L1 IHC 28-8 pharmDx as a complementary diagnostic to guide second-line nivolumab therapy; Dako's PD-L1 IHC 22C3 pharmDx as a companion diagnostic to predict response to first-line pembrolizumab; and Ventana's PD-L1 (SP142) assay as a complementary diagnostic to aid in the administration of atezolizumab. All these assays, with their different intended uses, antibodies, and cutoffs, are confusing for oncologists and have raised concerns that they may not be identifying all the patients who could respond to treatment. 

Manual annotation of PD-L1 expression by pathologists from whole-slide images can add to the variability. For example, a comparison of four PD-L1 IHC assays showed that three out of the four tests yielded concordant results, and pathologists generally agreed in their annotations of PD-L1 expression on tumor cells. However, pathologists showed poor concordance in scoring PD-L1 expression on immune cells. 

If current diagnostic tools are not identifying the entire eligible patient population for a treatment, then the drugmaker is also losing out on market share. It's no surprise then that in the profitable and competitive immunotherapy space, PathAI's platform has attracted the interest of top pharma players. For example, the company recently announced that it had secured an additional $15 million from BMS and Merck Global Health Innovation and raised a total of $75 million in Series B financing.

"It's important that we not have any false negatives, so patients who are PD-L1 positive are actually scored as PD-L1 positive and have immunotherapy available to them," said Saha. "We also anticipate that the overall prevalence of a PD-L1-positive population will be larger using [PathAI's] technology." 

PathAI CEO Andy Beck explained that his company's technology enables counting of individual cells with PD-L1 expression in an image. In an image of 100,000 cells, for example, the technology can provide a score based on the number of cells that are PD-L1 positive or negative. "That type of data will allow us to very sensitively link precise counting of PD-L1 expression with patient response to therapy," Beck said. "For that reason, we anticipate that we can do an even better job than is currently being done for matching patients to therapy."
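As a rough illustration of the kind of count-based score Beck describes, the sketch below turns per-cell calls into a percentage. It is not PathAI's method: the function name `percent_positive`, the choice of a simple percentage, and the example counts are all assumptions made for illustration.

```python
def percent_positive(positive_cells: int, negative_cells: int) -> float:
    """Return the share of counted cells called PD-L1 positive, as a percentage.

    Hypothetical helper: the article does not say how the score is computed,
    so a plain percentage of positive cells is assumed here.
    """
    if positive_cells < 0 or negative_cells < 0:
        raise ValueError("cell counts cannot be negative")
    total = positive_cells + negative_cells
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * positive_cells / total


# Illustrative numbers only: an image of 100,000 cells, 12,500 called positive.
score = percent_positive(positive_cells=12_500, negative_cells=87_500)
print(f"{score:.1f}% of cells are PD-L1 positive")
```

Because the score comes from counts rather than a visual estimate, the same image would always yield the same number, which is the reproducibility argument made in the article.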

To develop its algorithms, PathAI collected whole-slide images of PD-L1-stained cells from cancer patients, generated thousands of frames of sections of these slides, and brought them into its cloud-based software platform. Pathologists from PathAI's network of board-certified pathologists around the country then annotated the different features of the cells in each frame, such as PD-L1-positive and negative cells, areas of invasive cancer and necrosis, and normal tissue. These frames were then analyzed using deep learning algorithms to create models, which can be used to score unlabeled slide images and can be compared against pathologists' manual annotations. 

PathAI recognized early on that in order to build its algorithms, it would need to draw on the expertise of pathologists. But the company also knew that asking pathologists to annotate whole-slide images would not yield the type of detailed single-cell labeling that would be needed. "As any pathologist in clinical practice knows, you don't hand-label individual cells. You estimate and summarize over tens of thousands of cells in an image," said Beck, who is board certified in anatomic pathology and molecular genetic pathology, and is also a bioinformatician. 

Therefore, PathAI asked pathologists to hand-label features in smaller frames of larger whole-slide images, making the task more manageable for them and improving the quality of the data on which the AI algorithms were trained and against which they were compared. "The first insight we've had about pathologists is that they're actually very good at identifying objects when they have the ability to stare at a small number of cells and are asked to hand-label individual cells," Beck said. "Instead of overwhelming a pathologist with a whole-slide image, we provide them with a relatively small portion of that image and the job is to hand-label individual cells."  

In more than two years, the company has collected over 5 million annotations of whole-slide image frames from more than 300 pathologists in its network. For the recently presented analysis using NSCLC samples, the algorithm was trained on more than 250,000 pathologist annotations and the analytical performance was gauged against the consensus labels of five pathologists for each frame. 
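The article does not say how the five pathologists' labels were combined into a consensus. One common approach is a majority vote, sketched below as an assumption; the function name `consensus_label` and the label strings are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(labels: Sequence[str]) -> Optional[str]:
    """Return the most frequent label, or None if the top two labels tie.

    Majority voting is an assumed consensus rule, not one stated in the article.
    """
    if not labels:
        raise ValueError("at least one label is required")
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    # A tie for first place means there is no clear consensus.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


# Five hypothetical pathologist calls for a single cell.
calls = ["positive", "positive", "negative", "positive", "negative"]
print(consensus_label(calls))
```

With an odd number of annotators and two possible labels a tie cannot occur, but the `None` branch covers cases with more than two label types.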

Researchers generally found a strong correlation between the AI-powered algorithm's assessment of PD-L1 expression levels across tumor cells, macrophages, lymphocytes, and total immune cells and the median PD-L1 score provided by five pathologists who manually annotated images. "In these analyses, we show as a first step that at least we can be as good as manual scoring," Saha said.  
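To make the comparison concrete, the sketch below takes the median of five pathologist scores per sample and correlates it with an algorithm's scores. The data are invented for illustration, and the use of Pearson's coefficient is an assumption, since the article does not name the correlation measure that was used.

```python
import math
from statistics import median
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Made-up PD-L1 scores (percent positive) from five pathologists per sample.
pathologist_scores = [
    [1, 2, 0, 5, 1],
    [10, 15, 10, 20, 5],
    [50, 60, 45, 55, 70],
    [90, 80, 85, 95, 75],
]
# Made-up algorithm scores for the same four samples.
ai_scores = [2.0, 12.0, 52.0, 88.0]

median_scores = [median(scores) for scores in pathologist_scores]
r = pearson(ai_scores, median_scores)
print(f"median pathologist scores: {median_scores}")
print(f"Pearson r between AI and median scores: {r:.3f}")
```

Using the median rather than the mean keeps a single outlying pathologist from shifting the reference score much.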

Yielding results that are in line with the consensus of five pathologists is also notable, Beck said, because "it's essentially like having a committee of pathologists hand-labeling cells, with the advantages of an AI system that is fully automated and scalable." 

While the AI-powered and consensus manual scoring aligned particularly well for tumor cells and lymphocytes, the data also captured the inherent challenges associated with macrophages. 

"Certain cell types are easier for both humans and AI to classify in a reproducible way," Beck said, citing cancer epithelial cells and lymphocytes as examples. Other cell types, like macrophages, are morphologically less distinct and therefore more challenging. "There is true biological ambiguity, [and] that's reflected in the consistency of pathologists and in the correlation between the consensus of the pathologists and the AI system," he said. 

Still, the data suggest that the AI algorithm could reduce the variability seen with manual PD-L1 scoring of immune cells. In the urothelial cancer samples, the AI algorithm's PD-L1 assessments on lymphocytes and macrophages correlated more strongly with the consensus scores from five pathologists than an individual pathologist's scores did. 

"It's well known that inter-observer variability is a significant problem in PD-L1 scoring, particularly on immune cells," Beck said. "Having a system to make it very reproducible will become increasingly important to our understanding of how the tumor and the tumor microenvironment are interacting to predict which patients will respond to therapies." 

Saha noted that BMS is looking to optimize every part of the PD-L1 testing process, from collecting quality samples to scoring them, and the data on PathAI's platform suggest opportunities to refine the latter. "It's important we build on where we've had success and what's established in the field, which is PD-L1 expression testing using the antibodies that are validated," Saha said. "But we're always trying to innovate, and this is a great example of innovating on a part of the process where we feel there is a huge need for improvement." 

When it comes to improving predictions of which patients will respond to nivolumab, however, PD-L1 expression is just one aspect. "There are other ways to determine whether a patient will respond better to a combination of drugs, whether it's genomic analysis, genetic analysis, and others, on top of histopathology analysis," Saha said. "Our next steps are integrating a number of these and determining whether a composite biomarker will give us the best capture of patients we can treat and the highest effect size ... PathAI is a major component of that." 

Although the latest data are from studies involving lung and bladder cancer patients, PathAI is developing algorithms for a number of different settings, and not just for scoring PD-L1 expression but for assessing other tissue-based biomarkers gauged by IHC. "These approaches are very generalizable," Beck said. "We are working on many different biomarkers and anticipate that the companion diagnostics of the future may incorporate multiple tissue-based biomarkers, and in that setting, reliable AI-driven scoring becomes even more important."