Chinese Medical Concept Normalization by Using Text and Comorbidity Network Embedding

Abstract

Chinese medical concept normalization, which maps non-standard medical concepts to standard expressions, is an NLP task with wide-ranging applications in medical big data research and clinical statistics. Many previous works apply supervised methods, which require a large amount of annotated data. However, they cannot address the challenge posed by the high cost of medical data annotation, which requires substantial professional knowledge and experience. Meanwhile, existing unsupervised methods perform poorly when faced with the varied non-standard expressions found in different data sources. In this paper, we propose DUNE (Disease Unsupervised Normalization by Embedding), an unsupervised Chinese medical concept normalization framework that applies a denoising auto-encoder (DAE) and network embedding. We formulate the task as finding mention-entity pairs with high text and comorbidity similarity. To handle the noise in text, we design a multi-view attention-based denoising auto-encoder (MADAE) to capture text information from multiple views, reduce the influence of noise, and transform text into denoised vectors. To introduce comorbidity information, we construct a comorbidity network from medical records, with both standard and non-standard disease names as nodes. Because of the diversity of non-standard expressions, one disease may correspond to several different nodes, which introduces noise into the comorbidity network. To handle this structural noise, we propose a denoising network embedding framework, which reduces the structural noise with the help of text information and embeds the nodes as vectors for measuring comorbidity similarity. Experimental results show that our method outperforms existing unsupervised baselines and approaches the performance of a classical supervised machine learning model on this task.
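
The matching step described in the abstract (choosing the standard entity whose text and comorbidity vectors are most similar to those of a mention) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the use of cosine similarity, the weighted sum controlled by the hypothetical parameter `alpha`, the function names, and the toy vectors. In DUNE the vectors would come from MADAE and the denoising network embedding.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_entities(mention, entities, alpha=0.5):
    """Rank standard entities for one non-standard mention.

    mention:  dict with "text" and "comorbidity" vectors.
    entities: dict mapping an entity name to a dict with the same two keys.
    alpha:    hypothetical weight on text similarity (assumption, not from the paper).
    Returns a list of (entity_name, score) pairs, best match first.
    """
    scored = []
    for name, vecs in entities.items():
        text_sim = cosine(mention["text"], vecs["text"])
        net_sim = cosine(mention["comorbidity"], vecs["comorbidity"])
        scored.append((name, alpha * text_sim + (1 - alpha) * net_sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy vectors, invented purely for illustration.
mention = {"text": [1.0, 0.0], "comorbidity": [0.0, 1.0]}
entities = {
    "entity_a": {"text": [1.0, 0.0], "comorbidity": [0.0, 1.0]},
    "entity_b": {"text": [0.0, 1.0], "comorbidity": [1.0, 0.0]},
}
ranking = rank_entities(mention, entities)
```

With these toy vectors, `entity_a` matches the mention in both views and `entity_b` in neither, so `entity_a` is ranked first. How the two similarities are actually combined in DUNE is not stated in the abstract.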

Publication
In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM)

Yizhou Zhang
Ph.D. Candidate in Computer Science

My research interests include machine learning and its applications to social media.