Combined paired-unpaired GAN training for underwater image enhancement (UIE)
Sonia Cromp, University of Wisconsin-Madison Computer Sciences
See here for the final presentation, here for the proposal and here for the midterm report.
Introduction
Underwater image enhancement (UIE) is the problem of improving the visual quality of underwater images, from their color to their clarity and sharpness. UIE has a broad range of applications, from archaeological and biological research to sunken-ship recovery. However, UIE also faces several challenges. It is difficult to obtain the large datasets required to train typical state-of-the-art machine learning models, and a dataset dedicated to one location (such as the Mariana Trench) may not generalize well to other locations (such as a shallow lake).
One line of prior work focuses on applying unpaired learning techniques to UIE. In a typical paired UIE learning regime, one trains a model to translate from an unclear image to a clear image using a dataset consisting of n unclear images and their corresponding n clear images that have been manually enhanced. In unpaired learning, techniques are designed to learn from a dataset of n unclear images and a separate dataset of m clear images. In this way, the model learns the distinguishing characteristics of the source/unclear and target/clear domains without any examples of one scene in both domains.
In the present work, I propose UW PUP GAN (the UnderWater / University of Wisconsin Paired-or-UnPaired Generative Adversarial Network), which fuses paired and unpaired learning techniques. The anticipated use case is a situation where relatively large quantities of unpaired unclear and clear examples are available, along with a smaller paired dataset. For instance, perhaps a research group has limited time or budget to manually enhance a small number of unclear images and wishes to supplement learning on this small paired dataset with unpaired images. To my knowledge, this is the first UIE work to combine paired and unpaired learning.
UW PUP GAN is inspired by a combination of the paired and unpaired versions of FUnIE-GAN (Islam et al. 2019), which in turn draws inspiration from CycleGAN (Zhu et al. 2017). In the process of this project, I created a PyTorch implementation of FUnIE-GAN that is equivalent to training UW PUP GAN on unpaired data only.
Approach
Overview of the paired training regime:
Paired training consists of four components (a code sketch follows this list):
- Gc, which learns to generate a clear image when given an unclear input image,
- Gu, which learns to generate an unclear image when given a clear input image,
- DPc, which takes as input a (clear, unclear) image pair and learns to distinguish whether the clear image was generated by Gc or manually enhanced and
- DPu, which takes as input an (unclear, clear) image pair and learns to distinguish whether the unclear image was generated by Gu or pulled from the real dataset.
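Below is a minimal PyTorch sketch of how the paired discriminators might look. The pix2pix-style design (concatenating the image pair along the channel dimension) and all layer choices here are illustrative assumptions, not the exact FUnIE-GAN architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Downsampling block: stride-2 conv, as in typical PatchGAN discriminators.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
    )

class PairedDiscriminator(nn.Module):
    """DPc / DPu: scores a (condition, candidate) image pair.

    The two 3-channel images are concatenated into a 6-channel input,
    so the discriminator judges the candidate *given* its counterpart.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(6, 64),
            conv_block(64, 128),
            conv_block(128, 256),
            nn.Conv2d(256, 1, kernel_size=4, padding=1),  # patch-level real/fake scores
        )

    def forward(self, condition, candidate):
        return self.net(torch.cat([condition, candidate], dim=1))

# Gc and Gu would be encoder-decoder (U-Net-style) generators; DPc judges
# clear images conditioned on their unclear counterparts, DPu the reverse.
```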
Overview of the unpaired training regime:
Unpaired training replaces DPc and DPu with DUPc and DUPu. DUPc takes as input a single clear image and learns to distinguish whether the image is drawn from the clear dataset or generated by Gc. DUPu performs an analogous task for unclear images.
Three types of loss functions are involved in the unpaired regime (see the sketch after this list):
- GAN loss is the traditional loss, where the discriminator is rewarded for guessing image source correctly and the generator is rewarded for fooling the discriminator,
- ID loss inputs a clear image to Gc and an unclear image to Gu. The generated images should ideally be identical to the input images, and
- Cycle loss is inspired by CycleGAN. A clear image is given to Gu, generating a fake unclear image, which is given to Gc to create a doubly fake clear image. The difference between the real and doubly fake clear images is minimized. An analogous procedure is performed for the unclear images.
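A minimal sketch of the three generator-side loss terms, assuming LSGAN-style adversarial targets and L1 for the identity and cycle terms; the function name and the weights lambda_id and lambda_cyc are illustrative, not the exact FUnIE-GAN settings.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # LSGAN-style adversarial objective
l1 = nn.L1Loss()    # used for the identity and cycle terms

def unpaired_generator_loss(Gc, Gu, DUPc, DUPu, real_clear, real_unclear,
                            lambda_id=5.0, lambda_cyc=10.0):
    fake_clear = Gc(real_unclear)   # unclear -> clear
    fake_unclear = Gu(real_clear)   # clear -> unclear

    # GAN loss: the generators are rewarded for fooling the unpaired
    # discriminators into scoring their outputs as real.
    pred_clear = DUPc(fake_clear)
    pred_unclear = DUPu(fake_unclear)
    loss_gan = (mse(pred_clear, torch.ones_like(pred_clear))
                + mse(pred_unclear, torch.ones_like(pred_unclear)))

    # ID loss: a generator fed an image already in its target domain
    # should return it unchanged.
    loss_id = l1(Gc(real_clear), real_clear) + l1(Gu(real_unclear), real_unclear)

    # Cycle loss: translating into the other domain and back should
    # reconstruct the original image.
    loss_cyc = l1(Gc(fake_unclear), real_clear) + l1(Gu(fake_clear), real_unclear)

    return loss_gan + lambda_id * loss_id + lambda_cyc * loss_cyc
```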
Combining paired and unpaired:
The paired and unpaired regimes are combined to create a model with three datasets (one paired, one of unpaired clear images and one of unpaired unclear images) and six model components (Gc, Gu, DPc, DPu, DUPc and DUPu). Any given epoch may be specified as conducting either paired or unpaired training.
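One way to realize this combination is to pick a regime at the start of each epoch and dispatch to the corresponding training step. In the sketch below, paired_step and unpaired_step are hypothetical wrappers around the loss computations above.

```python
def train(model, paired_loader, unclear_loader, clear_loader,
          num_epochs, choose_regime):
    """choose_regime(epoch, num_epochs) returns "paired" or "unpaired";
    the alternating and evolving schedules from the Experiments section
    are just different implementations of this function."""
    for epoch in range(num_epochs):
        if choose_regime(epoch, num_epochs) == "paired":
            # Corresponding scenes: train Gc, Gu, DPc and DPu.
            for unclear, clear in paired_loader:
                paired_step(model, unclear, clear)
        else:
            # Unrelated scenes: train Gc, Gu, DUPc and DUPu.
            for unclear, clear in zip(unclear_loader, clear_loader):
                unpaired_step(model, unclear, clear)
```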
Experiments
I experiment with training under a solely paired or solely unpaired regime, with alternating between the two regimes from epoch to epoch, and with a paired/unpaired learning scheme that evolves over time. Specifically, in the evolving scheme, training begins fully paired. Gradually, the probability of an unpaired epoch increases, reaching a 50% chance of unpaired learning halfway through the total number of epochs (150) and a 100% chance at the last epoch. For the i-th epoch out of T epochs total, I draw one sample from a binomial distribution (a single Bernoulli trial) with success probability i/T.
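A minimal sketch of this evolving schedule as a choose_regime implementation (one draw from a binomial with a single trial is simply a Bernoulli trial):

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for the sketch

def evolving_regime(epoch, num_epochs):
    # For the i-th epoch of T, train unpaired with probability i/T:
    # near 0 at the start, 50% halfway through, 100% at the final epoch.
    p_unpaired = (epoch + 1) / num_epochs  # epoch is 0-indexed
    return "unpaired" if rng.binomial(n=1, p=p_unpaired) == 1 else "paired"
```

The alternating scheme uses the same interface with a deterministic rule, e.g. paired on even epochs and unpaired on odd ones.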
I also evaluate the paired, alternating and evolving schemes with a paired dataset containing 1000, 5000 or the full 11,000 examples. The employed dataset is EUVP (Islam et al. 2019).
The training losses develop as follows:
In the paired regime, I always encountered an issue where the paired clear discriminator DPc reached near-zero loss. I was unsure how to rectify this issue, but a solution may provide more stable training and better results. The loss evolution of the other components appears relatively typical of adversarial training, where improvements in one component pose challenges for its adversary.
Results
(In each figure below, the top row shows the unclear inputs and the bottom row the generated clear images.)
CycleGAN, full unpaired dataset
CycleGAN requires approximately two days to train for 200 epochs, while FUnIE-GAN (and by extension UW PUP GAN) is designed to be lighter weight and trains for 150 epochs in about six hours on a Euler Cluster GPU.
Paired, full dataset
Unpaired, full dataset
1000 paired and full unpaired, alternating scheme
1000 paired and full unpaired, evolving scheme
5000 paired and full unpaired, alternating scheme
5000 paired and full unpaired, evolving scheme
Future work
While there are some metrics for quantitatively assessing image quality in paired learning regimes, none to date appear to be designed for the unpaired regime. As such, the development of a metric that works for models trained with paired and/or unpaired data may aid progress in this line of research.
Acknowledgement: Thank you to the Wisconsin Applied Computing Center’s Euler Cluster and its administrator, Colin Vanden Heuvel, for compute resources!
Sources
Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. URL: https://www.tensorflow.org/.
Saeed Anwar and Chongyi Li. "Diving deeper into underwater image enhancement: A survey". In: Signal Processing: Image Communication 89 (2020), p. 115978.
Mu Cai et al. "Frequency domain image translation: More photo-realistic, better identity-preserving". In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021, pp. 13930–13940.
Keming Cao, Yan-Tsung Peng, and Pamela C Cosman. "Underwater image restoration using deep networks to estimate background light and scene depth". In: 2018 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI). IEEE. 2018, pp. 1–4.
John Y Chiang and Ying-Ching Chen. "Underwater image enhancement by wavelength compensation and dehazing". In: IEEE Transactions on Image Processing 21.4 (2011), pp. 1756–1769.
Ian Goodfellow et al. "Generative adversarial nets". In: Advances in Neural Information Processing Systems 27 (2014).
Vu Cong Duy Hoang et al. "Iterative back-translation for neural machine translation". In: Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. 2018, pp. 18–24.
Kaiming He, Jian Sun, and Xiaoou Tang. "Single image haze removal using dark channel prior". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 33.12 (2010), pp. 2341–2353.
Md Jahidul Islam, Youya Xia, and Junaed Sattar. Fast Underwater Image Enhancement for Improved Visual Perception. 2019. DOI: 10.48550/ARXIV.1903.09766. URL: https://arxiv.org/abs/1903.09766.
Phillip Isola et al. "Image-to-image translation with conditional adversarial networks". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 1125–1134.
Muwei Jian et al. "Underwater image processing and analysis: A review". In: Signal Processing: Image Communication 91 (2021), p. 116088.
Chau Yi Li and Andrea Cavallaro. "Background light estimation for depth-dependent underwater image restoration". In: 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE. 2018, pp. 1528–1532.
Chongyi Li, Jichang Guo, and Chunle Guo. "Emerging from water: Underwater image color correction based on weakly supervised color transfer". In: IEEE Signal Processing Letters 25.3 (2018), pp. 323–327.
Jingyu Lu et al. "Multi-scale adversarial network for underwater image restoration". In: Optics & Laser Technology 110 (2019), pp. 105–113.
Karen Panetta, Chen Gao, and Sos Agaian. "Human-visual-system-inspired underwater image quality measures". In: IEEE Journal of Oceanic Engineering 41.3 (2015), pp. 541–551.
Adam Paszke et al. "PyTorch: An imperative style, high-performance deep learning library". In: Advances in Neural Information Processing Systems 32 (2019).
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation". In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer. 2015, pp. 234–241.
Xin Sun et al. "Deep pixel-to-pixel network for underwater image enhancement and restoration". In: IET Image Processing 13.3 (2019), pp. 469–474.
Jun-Yan Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. 2017. DOI: 10.48550/ARXIV.1703.10593. URL: https://arxiv.org/abs/1703.10593.
Miao Yang and Arcot Sowmya. "An underwater color image quality evaluation metric". In: IEEE Transactions on Image Processing 24.12 (2015), pp. 6062–6071.