

This script will help you understand and organize your dataset of medical images

This article is a follow-up to my previous introduction to DICOM files. Thanks to Gian Marco Conte for helping write this.

As a brief recap, DICOM files are the primary format for storing medical images. All clinical algorithms must be able to read and write DICOM. But these files can be challenging to organize. DICOM files have information associated with the image saved in a header, which can be extensive. In this tutorial, I’ll share some Python code that reads a set of DICOM files, extracts the header information, and copies the files to a tiered folder structure that can be easily loaded for data science tasks.

There are many great resources available for parsing DICOM using Python or other languages. DicomSort has a flexible GUI which can organize files based on any field in the header (DicomSort is also available as a Python package with “pip install dicomsort”). I also want to credit this repo for getting me started with code for reading a DICOM pixel dataset. Finally, this great paper includes a section on image compression which I briefly mention here.

Ultimately I decided to write my own utility because I like knowing exactly what my code is doing, and it also provides an introduction to the DICOM header, which is essential knowledge for any data scientist who works on medical imaging projects.

I’ve verified this code for both CT and MRI exams; it should work for any modality, since Patient, Study, and Series information is reported for all DICOM files.

This code uses the Python package PyDicom for reading and writing DICOM files.

I want to briefly mention the GDCM package. DICOM files may have image compression performed on them either during storage or during transfer via the DICOM receiver. For example, at our institution, all DICOMs have JPEG2000 compression. GDCM is a C++-based package that allows PyDicom to read these compressed files. It’s available as a conda package (“conda install gdcm”) or can be built from source using cmake. I snuck a few lines into my code below which decompress the pixel data using GDCM, so I don’t have to worry about it in the future.

Update: since writing this article, I’ve started using the pylibjpeg package, which is a bit easier to install than GDCM. I added more information at the end of the article.
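To make the idea of a tiered folder structure concrete, here is a minimal sketch of the workflow this article describes: read each file's header, then copy the file into a Patient/Study/Series folder. It is a stripped-down illustration, not the utility itself. The helper names `clean`, `target_dir`, and `organize` are hypothetical, and the choice of `PatientID`, `StudyInstanceUID`, and `SeriesInstanceUID` as folder keys is an assumption made for illustration. It relies on PyDicom's `dcmread` with `stop_before_pixels=True`, which reads the header without touching the (possibly compressed) pixel data.

```python
import re
import shutil
from pathlib import Path


def clean(value, default="UNKNOWN"):
    """Turn a header value into a string that is safe to use as a folder name."""
    text = str(value).strip() if value is not None else ""
    # Replace anything that is not a letter, digit, dot, underscore, or dash.
    text = re.sub(r"[^A-Za-z0-9._-]+", "_", text).strip("_")
    return text or default


def target_dir(root, patient_id, study_uid, series_uid):
    """Build the Patient/Study/Series folder for one file (assumed layout)."""
    return Path(root) / clean(patient_id) / clean(study_uid) / clean(series_uid)


def organize(src_root, dst_root):
    """Copy every readable DICOM file under src_root into the tiered layout."""
    # Third-party imports live here so the helpers above work without pydicom.
    import pydicom
    from pydicom.errors import InvalidDicomError

    for path in Path(src_root).rglob("*"):
        if not path.is_file():
            continue
        try:
            # Header only: skip the pixel data, so no decompression is needed.
            ds = pydicom.dcmread(str(path), stop_before_pixels=True)
        except InvalidDicomError:
            continue  # not a DICOM file
        folder = target_dir(
            dst_root,
            ds.get("PatientID"),
            ds.get("StudyInstanceUID"),
            ds.get("SeriesInstanceUID"),
        )
        folder.mkdir(parents=True, exist_ok=True)
        # Files with the same name in the same series would overwrite each other.
        shutil.copy2(path, folder / path.name)
```

Calling `organize("raw_dicoms", "sorted_dicoms")` (both folder names are placeholders) would leave the source files untouched and build the sorted copy next to them. Missing header fields fall back to an `UNKNOWN` folder rather than stopping the run.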
