Error when converting DICOM tags to Excel with Python

Posted 2024-09-30 05:29:10


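I am trying to read DICOM tags from a .dcm file and list them in an Excel sheet (using pydicom), but some tags are not converted correctly (Patient's Name, Patient ID, etc.).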

Some tags show "None" in the Excel file even though they contain data in the DICOM file (SOP Class UID, SOP Instance UID, etc.). How can I fix this?

import xlsxwriter 
import sys 
import pydicom 
import os.path
from pydicom.valuerep import PersonName
keywords = ("Patient's Name",
            "Patient ID",
            "Patient's Birth Date",
            "Patient's Sex",
            "SOP Class UID",
            "SOP Instance UID",
            "Group Length",
            "Manufacturer",
            "Referring Physician's Name",
            "Study ID",
            "Patient Orientation",
            "Series Number",
            "Pixel Data",
            "Group Length",
            "Rows",
            "Columns",
           )

# ...
            
dcm_files = [r"C:\Users\akhil\Downloads\Sample_Dataset\Sample_Dataset\PRASANNA_KUMARI\21_12_2013_11_13_46_AM\IMG-0001-00001.dcm"]   

for dcm_file in dcm_files:
    ds = pydicom.filereader.dcmread(dcm_file)
    workbook = xlsxwriter.Workbook(os.path.basename(dcm_file) + '.xlsx')
    worksheet = workbook.add_worksheet()

    row = 0
    col = 0

    for keyword in keywords:
        value = ds.get(keyword, "None")
        if isinstance(value, list):
            value = ", ".join([str(x) for x in value])
        elif isinstance(value, PersonName):
            value = str(value)
        worksheet.write(row, col, keyword)
        worksheet.write(row + 1, col, value)
        col += 1

    # Close each workbook inside the loop so every file gets written
    workbook.close()

Some of the tags in the DICOM file:

(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
(0008, 0016) SOP Class UID                       UI: Secondary Capture Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.2.300.0.7230010.3.1.4.3397350519.8248.1599586949.14
(0008, 0020) Study Date                          DA: '20200908'
(0008, 0021) Series Date                         DA: '20200908'
(0008, 0022) Acquisition Date                    DA: '20200908'
(0008, 0023) Content Date                        DA: '20200908'
(0008, 0030) Study Time                          TM: '155900'
(0008, 0031) Series Time                         TM: '155900'
(0008, 0032) Acquisition Time                    TM: '155900'
(0008, 0033) Content Time                        TM: '155900'
(0008, 0050) Accession Number                    SH: ''
(0008, 0060) Modality                            CS: 'OT'
(0008, 0064) Conversion Type                     CS: ''
(0008, 0070) Manufacturer                        LO: 'SANTESOFT'
(0008, 0090) Referring Physician's Name          PN: ''
(0010, 0000) Group Length                        UL: 48
(0010, 0010) Patient's Name                      PN: 'NO^NAME'
(0010, 0020) Patient ID                          LO: '00000001'
(0010, 0030) Patient's Birth Date                DA: ''
(0010, 0040) Patient's Sex                       CS: ''
(0018, 0000) Group Length                        UL: 14
(0018, 1063) Frame Time                          DS: "33.0"

1 Answer

Answer 1 · posted 2024-09-30 05:29:10

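The keywords you are using here are not correct. First, DICOM keywords do not have the 's part: it is "Patient Name", not "Patient's Name" (this was changed in the DICOM standard about 15 years ago).
Second, keywords contain no spaces, so if you want to keep the names with spaces for readability, you have to remove the spaces at lookup time, for example: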

keywords = ("Patient Name",
            "Patient ID",
            "Patient Birth Date",
            "Patient Sex",
            "SOP Class UID",
            "SOP Instance UID",
            "Group Length",
            "Manufacturer",
            "Referring Physician Name",
            "Study ID",
            "Patient Orientation",
            "Series Number",
            "Group Length",
            "Rows",
            "Columns",
            )

...

for dcm_file in dcm_files:
    ds = pydicom.filereader.dcmread(dcm_file)
    ...
    for keyword in keywords:
        dcm_keyword = keyword.replace(' ', '')  # remove the spaces for the lookup
        value = ds.get(dcm_keyword, "None")

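Note that I have removed all apostrophes from the tag names and have also removed Pixel Data: converting binary data to a string will not work properly, and you certainly do not want pixel data in an Excel sheet.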
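
The two points above (normalizing the names and keeping binary data out of the sheet) can be sketched as a pair of small helpers. This is only an illustration, not part of the original answer: `to_keyword` and `to_cell` are hypothetical names, and the code assumes that multi-valued elements behave like Python sequences and that anything else can be written as text.

```python
from collections.abc import Sequence


def to_keyword(name):
    """Turn a readable name such as "Patient's Name" into "PatientName".

    Hypothetical helper: drops the "'s" part and all spaces.
    """
    return name.replace("'s", "").replace(" ", "")


def to_cell(value):
    """Convert a looked-up value into something a spreadsheet cell can hold.

    Hypothetical helper; the handling below is an assumption, not library API.
    """
    if value is None:
        # Tag not present in the dataset: leave the cell empty.
        return ""
    if isinstance(value, (bytes, bytearray)):
        # Binary data (for example pixel data) is replaced by a short placeholder.
        return f"<{len(value)} bytes>"
    if isinstance(value, (int, float)):
        # Numbers are passed through unchanged.
        return value
    if isinstance(value, str):
        return str(value)
    if isinstance(value, Sequence):
        # Multi-valued elements: join the items into one string.
        return ", ".join(str(item) for item in value)
    # Anything else (for example a person name object) is written as text.
    return str(value)
```

Inside the loop, the lookup could then read `value = to_cell(ds.get(to_keyword(keyword)))`, which lets the `keywords` tuple keep either spelling of each name.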
