Erik Gösche

Master's Student @ University of California, San Francisco

Thesis Topic: Attention-based networks for brain segmentation in k-space

Supervisors: Andreas Rauschecker (University of California, San Francisco), Florian Knoll

Description:

Attention-based networks [1] show state-of-the-art performance in medical image segmentation due to their ability to capture long-range spatial relationships. However, because they lack the local receptive field of convolutional neural networks (CNNs), they struggle to model local features. Images in the frequency domain might be more suitable for the attention mechanism: after the transform, local image features are represented globally. Moreover, in light of the convolution theorem, the attention operation can intuitively be interpreted as a convolution. Since MRI data are natively acquired in the frequency domain (k-space), these images are particularly well suited to this approach.

The goal of this work is to investigate how the choice of image domain (pixel domain or frequency domain) affects the segmentation results of deep learning models, with a particular focus on attention-based networks. Furthermore, it examines whether additional positional encoding is necessary when an attention-based network receives input images in the frequency domain. These research questions are evaluated on a skull-stripping task and a brain tissue segmentation task. The attention-based models used in this work are the Perceiver IO [2] and a Transformer encoder. As non-attention-based baselines, an MLP and the ResMLP [3] are additionally trained and tested, and all results are compared with those of the nnU-Net [4] as a reference.

Experiments show that the choice of input and label domain has a significant effect on the segmentation results. Additional positional encoding does not appear to be beneficial for attention-based networks when the input is in the frequency domain. Although none of the models reached the performance of the nnU-Net, the comparatively simple models showed promising results.
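To make the frequency-domain setup concrete, the sketch below (not the thesis code) shows one possible way to turn a 2D image into k-space patch tokens and pass them through a small Transformer encoder without positional encoding. It assumes PyTorch; the patch size, embedding width, and two-class per-token output are illustrative placeholders, not values from the thesis.

```python
# Minimal sketch, assuming PyTorch: k-space patch tokens -> Transformer encoder.
# Hypothetical helper names and hyperparameters, chosen only for illustration.
import torch
import torch.nn as nn

def image_to_kspace_tokens(img, patch=16):
    """2D image (H, W) -> sequence of k-space patch tokens (N, patch*patch*2)."""
    k = torch.fft.fftshift(torch.fft.fft2(img))                   # complex k-space
    k = torch.stack([k.real, k.imag], dim=0)                      # (2, H, W)
    # split into non-overlapping patches and flatten each into one token
    patches = k.unfold(1, patch, patch).unfold(2, patch, patch)   # (2, H/p, W/p, p, p)
    tokens = patches.permute(1, 2, 0, 3, 4).reshape(-1, 2 * patch * patch)
    return tokens

class KSpaceSegmenter(nn.Module):
    """Tokens -> Transformer encoder -> per-token class logits (no positional encoding)."""
    def __init__(self, token_dim, d_model=128, n_classes=2):
        super().__init__()
        self.embed = nn.Linear(token_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):                                    # tokens: (B, N, token_dim)
        return self.head(self.encoder(self.embed(tokens)))

img = torch.randn(256, 256)                                       # stand-in for an MRI slice
tokens = image_to_kspace_tokens(img).unsqueeze(0)                 # (1, 256, 512)
logits = KSpaceSegmenter(token_dim=tokens.shape[-1])(tokens)
print(logits.shape)                                               # torch.Size([1, 256, 2])
```

Because no positional encoding is added, the encoder sees the k-space tokens as an unordered set; whether that hurts or helps segmentation quality is exactly the kind of question the thesis investigates.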

[1] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need.

[2] Jaegle, A., Borgeaud, S., Alayrac, J.-B., Doersch, C., Ionescu, C., Ding, D., Koppula, S., Zoran, D., Brock, A., Shelhamer, E., Hénaff, O., Botvinick, M. M., Zisserman, A., Vinyals, O., & Carreira, J. (2021). Perceiver IO: A General Architecture for Structured Inputs & Outputs.

[3] Touvron, H., Bojanowski, P., Caron, M., Cord, M., El-Nouby, A., Grave, E., Izacard, G., Joulin, A., Synnaeve, G., Verbeek, J., & Jégou, H. (2021). ResMLP: Feedforward Networks for Image Classification with Data-Efficient Training.

[4] Isensee, F., Petersen, J., Klein, A., Zimmerer, D., Jaeger, P. F., Kohl, S., Wasserthal, J., Koehler, G., Norajitra, T., Wirkert, S., & Maier-Hein, K. H. (2018). nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation.