Natural language processing of histopathology reports

Histopathology is a medical specialty in which cellular and tissue specimens from patients are examined to identify and characterise disease.
 
Histopathological findings are described in histopathology reports. These contain a wealth of clinically and scientifically valuable information; however, because the reports are unstructured, systematic automated analysis at scale is generally not possible.
 
This project seeks to establish a dataset of histopathology reports from multiple NHS Trusts. Natural language processing (NLP), a computational methodology for processing, analysing and extracting information from natural language text, will be used to enable high-quality, reliable and accurate data extraction at scale.
 
This study aims to provide novel pathological, epidemiological and clinical insights. This is a retrospective cohort study and requires no active engagement by participants.
 
Data will be considered for inclusion from all individuals at the participating hospital NHS Trusts who have had a histopathology specimen processed since computerised pathology records began. Individuals will be excluded if they have declined data sharing through the NHS opt-out programme or have communicated a wish to dissent from either general or specific use of their data for research at the participating NHS Trust.
 
Histopathology reports will be de-identified, and the data will be stored and analysed within a secure computing environment. The study will last five years in the first instance.
 
 
Last updated 07 May 2024