In the healthcare domain, comparing unstructured free-text clinical reports makes it possible to search for similar or relevant clinical cases. In data mining and text analysis, texts are usually compared by computing the standard document-vector cosine similarity between the two vectors that represent the report pair under analysis. This paper proposes a novel system based on text pre-processing techniques and modelled medical knowledge, using an improved radiological ontology. Organizing medical terms in a hierarchical tree makes it possible to assess semantic similarity relationships between the concepts found in unstructured reports. The proposed retrieval system has been tested on a dataset of 126 unstructured mammographic reports written in Italian, randomly extracted from the reports available in the Radiological Information System of the University of Palermo Policlinico Hospital. The ontology comprises 731 concepts and has been developed and enhanced in collaboration with expert breast imaging radiologists. The proposed system computes the cosine similarity on semantic vectors, built by adding the "is-a" and "equivalent-to" relationships of the enhanced ontology. Compared against a classical syntactic method, it shows a marked improvement, with a Sensitivity rise of +45.27%.
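The difference between the syntactic and the semantic comparison can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the English terms, the `decay` weighting of ancestor concepts, and all function names below are assumptions chosen for illustration only. The sketch maps each term to a canonical concept through a hypothetical "equivalent-to" table, then adds its "is-a" ancestors to the vector before computing the cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy ontology (NOT the 731-concept ontology from the paper).
# IS_A maps a concept to its parent; EQUIVALENT_TO maps a synonym to a canonical concept.
IS_A = {
    "microcalcification": "calcification",
    "macrocalcification": "calcification",
    "calcification": "finding",
    "nodule": "mass",
    "mass": "finding",
}
EQUIVALENT_TO = {"lump": "nodule"}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def syntactic_vector(tokens):
    """Plain term-frequency vector: only identical tokens can match."""
    return Counter(tokens)


def semantic_vector(tokens, decay=0.5):
    """Term-frequency vector expanded with ontology relationships.

    Each token is replaced by its canonical concept ("equivalent-to"),
    then every "is-a" ancestor is added with a weight that shrinks by
    `decay` per level (an assumed weighting scheme).
    """
    vec = Counter()
    for token in tokens:
        concept = EQUIVALENT_TO.get(token, token)
        weight = 1.0
        while concept is not None:
            vec[concept] += weight
            concept = IS_A.get(concept)  # None once the root is reached
            weight *= decay
    return vec


# Two hypothetical pre-processed reports with no token in common.
report_a = ["microcalcification", "lump"]
report_b = ["macrocalcification", "nodule"]

syntactic_score = cosine(syntactic_vector(report_a), syntactic_vector(report_b))
semantic_score = cosine(semantic_vector(report_a), semantic_vector(report_b))
print(syntactic_score, semantic_score)
```

In this toy case the two reports share no surface term, so the syntactic score is zero, while the semantic vectors overlap on the shared canonical concept and on common ancestors. How the actual system weights ancestors and equivalents is not stated in the abstract.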
| Number of pages | 6 |
| Publication status | Published - 2015 |
All Science Journal Classification (ASJC) codes
- Signal Processing
- Computer Science Applications
- Computer Networks and Communications