After returning from spring break, I got to work looking more closely at the data. Because of scheduling difficulties, I couldn't meet with my mentor this week, which gave me plenty of time to familiarize myself with Python and come up with a game plan. While I was gone last week, the room I work in, which also serves as a conference room for the physicists, was renovated. There are now three computers, and since I'm the only person in the room, I have more options. The renovators also finished installing all the hardware, so the computers now have Nvidia GPUs, which give them considerably more processing power.
I spent the first part of the week getting permission for and installing all the software. Since I moved to a new computer, I remapped the physics drive, where I'm storing all the radiologist reports, and downloaded PyCharm. I also installed the Python text mining package textmining 1.0, which provides many useful functions for text mining and analysis. Installation should have been quick, but it turns out that textmining 1.0 is written for Python 2, which has many incompatibilities with Python 3. Because of this, whenever I tried to import textmining and run my scripts, Python kept raising a File Not Found error. To fix the problem, I converted the textmining package to Python 3 with a shell command at the handy command prompt.
Also, it's been a while since I last used Python, so there's still a lot I need to learn, and it will take some time to get used to the syntax again. While progress will be slow for a while, I expect things to pick up, especially since I will be able to spend a lot of time working at home. As for which data to mine from the radiologist reports, I decided to focus first on the image series referenced in them. Whenever the radiologist makes an observation about the prostate, bone, or lymph nodes, he or she cites the series and the specific image that shows the suspicious or unusual feature. I'm curious about which image series radiologists find most useful when studying specific regions, since this information could help reduce the reading time of the MRI technicians. There are plenty of other things to text mine, but this will probably be the easiest place to start. My first priority now is to learn more Python so that I can work at a faster pace.
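To make the plan a little more concrete, here is a minimal sketch of how series references could be pulled out of report text with Python's built-in `re` module and tallied by region. It assumes, hypothetically, that references are written like "series 5, image 12" or "se 5 im 12"; the regular expression, the region keywords, the function names `series_by_region` and `tally`, and the sample sentences are all illustrative assumptions, not the actual format of the reports. It also naively credits every series cited in a sentence to every region named in that same sentence.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict

# ASSUMPTION: references look like "series 5, image 12" or "se 5 im 12".
# The real reports may phrase them differently; adjust the pattern to match.
SERIES_IMAGE = re.compile(
    r"\b(?:series|se)\s*:?\s*(\d+)\s*,?\s*(?:image|im)\s*:?\s*(\d+)",
    re.IGNORECASE,
)

# Hypothetical region keywords, matched as lowercase substrings
# ("lymph node" also matches "lymph nodes").
REGIONS = ("prostate", "bone", "lymph node")


def series_by_region(report: str) -> dict[str, Counter]:
    """Count series numbers cited in sentences that mention each region."""
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    # Naive sentence split: after '.' or ';' followed by whitespace, or on newlines.
    for sentence in re.split(r"(?<=[.;])\s+|\n+", report):
        refs = SERIES_IMAGE.findall(sentence)  # list of (series, image) strings
        lowered = sentence.lower()
        for region in REGIONS:
            if region in lowered:
                for series, _image in refs:
                    counts[region][int(series)] += 1
    return dict(counts)


def tally(reports: list[str]) -> dict[str, Counter]:
    """Sum the per-report counts over a collection of reports."""
    totals: defaultdict[str, Counter] = defaultdict(Counter)
    for report in reports:
        for region, counter in series_by_region(report).items():
            totals[region].update(counter)
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample text, not taken from a real report.
    sample = (
        "Focal lesion in the right peripheral zone of the prostate (series 5, image 12). "
        "No suspicious bone lesions. "
        "Enlarged left obturator lymph node, series 7, image 30."
    )
    for region, counter in tally([sample]).items():
        print(region, counter.most_common())
```

Counting sentence by sentence keeps each reference tied to the region mentioned alongside it, and `most_common()` then ranks the series for each region. Real reports will likely need a more forgiving pattern and better sentence splitting than this.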