Machine Learning Enables Evaluation of Ultrasound HCC Surveillance

By Lynn Antonopoulos

Researchers at Stanford University developed a model employing machine learning (ML) techniques to assess the efficacy of ultrasound (US) surveillance of hepatocellular carcinoma (HCC) in high-risk patients.

Long-term, longitudinal data from the study may help validate and improve care recommendations and assess the clinical outcomes of HCC surveillance programs.

"The development of robust AI natural language processing techniques, and the introduction of structured reporting with the American College of Radiology (ACR)'s ultrasound Liver Imaging Reporting and Data System (LI-RADS) in recent years, presented an opportunity for us to review our own clinical experience with US screening for HCC on a large scale," said Hailey Choi, MD, PhD, now an assistant professor of Clinical Radiology at the University of California, San Francisco (UCSF).

Dr. Choi and her team assessed the free text in a selection of 13,860 US screening and surveillance exams from 4,830 subjects performed between 2007 and 2017, prior to the release of US LI-RADS specifications.

Then, using 1,744 more recent reports containing US LI-RADS specifications, they applied a scalable, ensemble ML approach to build a model that inferred US LI-RADS categories from neural word embedding analysis of the body text. Word embedding represents words mathematically so that the relationships between them can be gauged.
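To make the idea of word embeddings concrete, here is a minimal Python sketch. It is not the researchers' model: the three-dimensional vectors, the vocabulary and the helper names are all made up for illustration, and real embeddings are learned from large text collections and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (assumed values, not taken from the study).
toy_embeddings = {
    "lesion": [0.9, 0.1, 0.2],
    "nodule": [0.8, 0.2, 0.3],
    "normal": [0.1, 0.9, 0.1],
}

def report_vector(tokens, embeddings):
    """Average the vectors of the known words in a report into one feature vector."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]

similar = cosine_similarity(toy_embeddings["lesion"], toy_embeddings["nodule"])
dissimilar = cosine_similarity(toy_embeddings["lesion"], toy_embeddings["normal"])
```

With these toy values, "lesion" sits much closer to "nodule" than to "normal". Averaging word vectors, as report_vector does, is one simple way to turn a whole report into input for a classifier. The article does not say how the team combined embeddings with their ensemble approach.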

"We created a lexicon of key terms used in ultrasound liver imaging to provide a framework for analysis and machine learning algorithms on the report text," Dr. Choi said, adding, "We also labeled a subset of the unstructured reports for further training of the model."

Model Exposes Gaps in Surveillance Adherence

The model rapidly assessed the free-text reports. Based on a validation set of 215 reports retrospectively categorized by two readers, it scored, on average, 0.74 for precision, 0.64 for recall and 0.66 for F1 score (a measure that combines precision and recall) when applied to the free-text reports.
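For readers unfamiliar with these metrics, the following Python sketch shows how precision, recall and F1 can be computed per category and then averaged. It assumes macro-averaging (an unweighted mean across categories), which the article does not specify. The labels and predictions are invented toy data, not study results.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall for this label.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Invented example: reader-assigned categories vs. model predictions.
y_true = ["US-1", "US-1", "US-1", "US-1", "US-2", "US-2", "US-3", "US-3"]
y_pred = ["US-1", "US-1", "US-1", "US-2", "US-2", "US-1", "US-3", "US-2"]

precision, recall, f1 = macro_scores(y_true, y_pred)
```

When scores are averaged across categories in this way, the averaged F1 need not equal the harmonic mean of the averaged precision and recall. That may be one reason the reported 0.66 differs from the roughly 0.69 that 0.74 and 0.64 would give on their own.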

According to the model's predictions, 84 percent of subjects remained in the same LI-RADS category over time. Of the remaining subjects, three percent progressed to and remained in the US-3 category, developing lesions that warranted further work-up.

About half of the subjects, 2,270, received at least two serial surveillance exams. These subjects had an average of five exams, with a mean screening interval of 13 months and a mean follow-up duration of 43 months, and were assessed for LI-RADS changes over time.
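Follow-up figures like these can be derived from exam dates alone. The Python sketch below shows one way to do it, under stated assumptions: the function name and the toy dates are hypothetical, a month is approximated as 30.44 days, and intervals are pooled across all subjects before averaging. The article does not say how the team computed its means.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumed average month length

def follow_up_summary(exam_dates_by_subject):
    """Summarize adherence for subjects with at least two exams."""
    followed = {s: sorted(d) for s, d in exam_dates_by_subject.items()
                if len(d) >= 2}
    intervals, durations = [], []
    for dates in followed.values():
        # Gap between each pair of consecutive exams, in months.
        for earlier, later in zip(dates, dates[1:]):
            intervals.append((later - earlier).days / DAYS_PER_MONTH)
        # Time from first to last exam, in months.
        durations.append((dates[-1] - dates[0]).days / DAYS_PER_MONTH)
    n = len(followed)
    return {
        "subjects_total": len(exam_dates_by_subject),
        "subjects_followed": n,
        "mean_interval_months": sum(intervals) / len(intervals) if intervals else None,
        "mean_follow_up_months": sum(durations) / n if n else None,
    }

# Invented example data: three subjects, one with a single exam.
toy_exams = {
    "A": [date(2010, 1, 1), date(2011, 1, 1), date(2012, 1, 1)],
    "B": [date(2010, 6, 1)],
    "C": [date(2015, 1, 1), date(2016, 1, 1)],
}

summary = follow_up_summary(toy_exams)
```

Applied to the study's own counts, the share of subjects with at least two exams is 2,270 of 4,830, or about 47 percent.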

Dr. Choi noted the gap between the initial number of subjects and those who received additional screenings. "The limited number of follow-up exams in our population of high-risk subjects reflects limited adherence to HCC surveillance recommendations in the real world," she said. "Although 4,830 subjects received a surveillance ultrasound in our 10-year interval, only 2,270 had at least two exams."

"Our study enabled us to get an estimate of the effectiveness of our ultrasound HCC screening program on a large scale and identify 'hits' and 'misses' in our screening and surveillance population," Dr. Choi said. "Although limited by its retrospective nature, the study provides a glimpse of US LI-RADS performance in the real world."

She indicated that future research efforts will include a multi-institutional analysis as well as investigation of subthreshold (US-2) exams and stratification by underlying liver disease.