Week 8: Getting Somewhere

Whoa! Why does everything look so strange now? It's probably because I redesigned my blog! I noticed the new template options and decided to play around with them. Eventually, I settled on this awesome new template, "Emporio". I think the new design makes it easier to read the content on my blog. I also really like the new font, Ubuntu, because it makes everything look cleaner. It's refreshing after spending 7 weeks on the same boring design.

It turns out that I probably won't get to the machine learning part of my project, at least not until May. This is mostly due to slow progress, although we weren't originally planning on getting to the machine learning and image analysis part in the first place (I thought I could rush through to it, but that hasn't gone very well). So for now at least, my main focus is on doing text mining on the radiologist reports.

I met with Dr. Panda this week, and we agreed that I should focus first on getting the image series from each report, so we can know which types of images were used for each case. Information on the image series can tell us which settings radiologists rely on most when reading prostate MRI scans. It also sheds light on the location of specific abnormalities that were detected in the MRI exams. I completed this task and ended up with a huge list of the image series used for each case. Right away, it was obvious that some series were referenced far more frequently than others, which tells me that those are the most popular ones. I'll come up with a list of the most popular series next week. It might be useful for telling radiologists which series to look at first in order to cut down on reading times, considering that there are dozens of series to look through for each prostate MRI exam.
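Here is a minimal sketch of what this kind of extraction and counting could look like in Python. It assumes the reports cite images in the form "series N image M", as in the excerpt further down this post. The function names, the regular expression, and the second sample report are made up for illustration and are not the code actually used in the project.

```python
import re
from collections import Counter

# Assumed citation format, based on the report excerpt: "series 10 image 13"
SERIES_PATTERN = re.compile(r"series\s+(\d+)\s+image\s+(\d+)", re.IGNORECASE)


def extract_series(report_text):
    """Return the series numbers cited in one report, in order of appearance."""
    return [int(series) for series, _image in SERIES_PATTERN.findall(report_text)]


def count_series(reports):
    """Count how many reports cite each series (hypothetical helper)."""
    counts = Counter()
    for text in reports.values():
        # set() so a series cited twice in one report is only counted once
        counts.update(set(extract_series(text)))
    return counts


reports = {
    # Sentence taken from the report excerpt in this post
    "case_a": (
        "Tumor again involves both seminal vesicles centrally "
        "(series 10 image 13; series 31201 image 14) and likely extends to "
        "the posterior bladder base (series 5 image 7; series 16 image 155)."
    ),
    # Made-up second report, only here so the counts differ
    "case_b": "Placeholder text citing series 10 image 4 and series 5 image 9.",
}

series_counts = count_series(reports)
print(series_counts.most_common(3))
```

Counting each series once per report, rather than once per mention, keeps a single report that cites the same series many times from dominating the ranking.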

To make my analysis more specific, I also categorized the image series by the regions they actually referred to. I went through each radiologist report and grouped the series by three anatomical locations: prostate, bone, and lymph node. I also created separate categories for some of the other fields, such as the impression (basically the overall opinion) and local staging. I'll go into more detail about this next week.
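As a rough illustration of the grouping step, the sketch below splits a report into sections. It assumes every section starts with an uppercase header followed by a colon, like the "PROSTATE:" header in the excerpt later in this post. Real reports may not be that regular, and the sample report text here is invented purely to exercise the code.

```python
import re

# Assumption: a section starts at the beginning of a line with an
# uppercase header followed by a colon, e.g. "PROSTATE:" or "LYMPH NODE:"
HEADER_PATTERN = re.compile(r"^([A-Z][A-Z ]+):\s*", re.MULTILINE)


def split_sections(report_text):
    """Map each section header (lowercased) to the text that follows it."""
    sections = {}
    matches = list(HEADER_PATTERN.finditer(report_text))
    for i, match in enumerate(matches):
        # A section runs until the next header, or to the end of the report
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report_text)
        header = match.group(1).strip().lower()
        sections[header] = report_text[match.end():end].strip()
    return sections


# Invented sample report, for illustration only
sample_report = (
    "PROSTATE: Tumor involves both seminal vesicles (series 10 image 13).\n"
    "BONE: No suspicious lesion.\n"
    "LYMPH NODE: No enlarged nodes (series 5 image 7).\n"
    "IMPRESSION: Placeholder summary text.\n"
)

sections = split_sections(sample_report)
for header, body in sections.items():
    print(f"{header}: {body}")
```

Once a report is split this way, the series cited inside each section can be attributed to that anatomical region.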

Thus far, I've mainly been writing my own library of functions to do my text mining work, and to write the output .csv files, but I realize it would be easier, especially later on, to use the text mining package that I installed. The textmining package is most commonly used to create term-document matrices, which are then used for analysis with a statistical package. This is useful because if I want to look at the description of the prostate across the radiologist reports of different cases, I can create a matrix of the frequency of certain terms in each description. Below, I've added a description of the prostate from one of the cases that was particularly interesting to me:

PROSTATE: Compared to previous exam the prostate is stable in size measuring 3.6 x 3.0 x 3.8 cm (25 cc) compared to 3.3 x 2.9 x 3.7 cm. Again seen is advanced prostate cancer involving the majority of the prostate base and left mid to lower gland.

Tumor again involves both seminal vesicles centrally (series 10 image 13; series 31201 image 14) and likely extends to the posterior bladder base (series 5 image 7; series 16 image 155).

Persistent extracapsular extension and neurovascular bundle invasion from the base of the prostate to the lower gland, more pronounced in the left. Left apical extraprostatic tumor extension runs adjacent to the membranous urethra. No evidence of rectal involvement.

Suppose I wanted to compare the last paragraph of this description with a similar case. I would create a term-document matrix that tells me whether both cases mention an "extracapsular extension" or an "invasion", whether that invasion involved a "neurovascular bundle", and so on. I'll create an example to show next week.
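In the meantime, here is a bare-bones sketch of the term-document matrix idea using only the Python standard library rather than the textmining package, so none of it should be read as that package's API. The helper names are hypothetical, and the second document is a made-up sentence standing in for a similar case.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocabulary, rows): one row of term counts per document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    rows = {name: [c[term] for term in vocab] for name, c in counts.items()}
    return vocab, rows


docs = {
    # Sentence taken from the report excerpt above
    "case_a": (
        "Persistent extracapsular extension and neurovascular bundle invasion "
        "from the base of the prostate to the lower gland, more pronounced in the left."
    ),
    # Made-up sentence standing in for a similar case
    "case_b": "Extracapsular extension is seen at the left base.",
}

vocab, rows = term_document_matrix(docs)

# Compare a few terms of interest across the two cases
for term in ("extracapsular", "neurovascular", "invasion"):
    column = vocab.index(term)
    print(term, {name: row[column] for name, row in rows.items()})
```

This version counts single words only, so a phrase like "extracapsular extension" shows up as two separate columns. Matching whole phrases would need two-word terms (bigrams) as well.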

I also promised some pictures last week, so here they are:

Working with Dr. Kawashima in the reading room

Some image series


  1. Hello Anthony, I believe I asked a question of a similar sort earlier, but you haven't answered so I'll ask again. Text-mining only works if there are terms that all doctors will use in all cases of an issue. But you pointed out in a previous post that the radiologist reports often contained different information (in other words: they were not standardized). How do you navigate that?

  2. Hey Anthony! Your project looks like it's going well. What makes you find the prostate so interesting?

  3. Hey Anthony,
    Your project seems to be progressing. The pictures look so cool. It would be great to see more. I can't wait to hear more about the machine you will be using. Good luck.

  4. hello Anthony, I like the new format! I enjoy your progress and the MRI scans. I thought your descriptions and your explanations helped a lot in helping understand! Thank You!

  5. Hey Anthony.
    Love the new design, it looks great! I am sorry to hear that you won't be able to make it to the machine learning part of your project when you wanted to. The specific example of the prostate case you gave is really interesting. The pictures also helped me understand better. Thank you. I look forward to reading more!
    Zafeerah Sheikh

  6. Hey Anthony! This new setup for your blog is sick (imo the last one was boring and this is awesome.) Anyway, what I get from your blog is that your image mining is to take commonalities in prostate cancer. Will this mean you are able to take into account differentiation in prostate cancer? Will doctors still be able to analyze things properly in these scenarios (sorry this is more of a question of not understanding procedures.)
    Keep up the great work!
    P.S. are 4 monitors cool?

  7. Hi Anthony! I'm glad you were able to meet Dr. Panda this week, it seemed to help your project a lot. By the way, I appreciate the pictures at the end.

