Chinese Medical Documents Dataset

We collected more than 200 Chinese medical laboratory test documents and selected 119 of them to build the dataset. A laboratory report usually has a clear structure. The hospital name and related information appear at the top of the document. The area below the hospital name holds the patient's private information and details about the medical examination. To protect privacy, we have erased all of the information mentioned above and left only some labels. The test items, results, units, reference ranges and other details are listed in a table in the middle of the document, which is the area we are concerned with. The doctor and the examiner are supposed to sign at the bottom of the document.

This dataset contains 357 images divided into three groups. All images are stored in color JPEG format. The images in the first group have a resolution of 2500×3490 pixels; those in the other two groups have a resolution of 2448×3264 pixels. Each image file is named in the form “xx_xx_xx”. The first field identifies the control variable: scan, illumination or rotation. The second field indicates whether more than 10 items are reported. The last field is the sequence number within the first two fields.
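The naming scheme can be read programmatically. The following is a minimal sketch in Python, assuming each file name consists of exactly three underscore-separated fields followed by a JPEG extension. The helper `parse_image_name` and the `ImageName` type are hypothetical names introduced here for illustration, and the fields are kept as raw strings because the concrete codes used in each field are not specified above.

```python
from pathlib import Path
from typing import NamedTuple


class ImageName(NamedTuple):
    """Raw fields of a file name of the form xx_xx_xx (hypothetical helper type)."""
    condition: str   # first field: control variable (scan, illumination or rotation)
    item_count: str  # second field: whether more than 10 items are reported
    sequence: str    # third field: sequence number within the first two fields


def parse_image_name(filename: str) -> ImageName:
    """Split a file name such as 'xx_xx_xx.jpg' into its three fields.

    The fields are returned uninterpreted, since the actual code values
    are an open detail of the dataset.
    """
    stem = Path(filename).stem  # drop the directory and the extension
    fields = stem.split("_")
    if len(fields) != 3:
        raise ValueError(f"expected three '_'-separated fields, got: {filename!r}")
    return ImageName(*fields)
```

With such a helper, the images could be grouped by their first field to compare the scan, illumination and rotation conditions, once the codes used for each condition are known.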

Download notice:

The details of this dataset are still being established.

Download the database ( zip file ), and please send an email to Qingyong Li (