BronchoGAN: Anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy

1Medical Informatics, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany
2Faculty of Electrical Engineering and Computer Science, University of Technology Lübeck, Mönkhofer Weg 239, 23909 Lübeck, Germany
IJCARS: CARS 2025
Contributing authors: marian.himstedt@th-luebeck.de
BronchoGAN Teaser

BronchoGAN architecture: RGB input images from virtual bronchoscopy and phantom datasets are processed by Depth Anything, which generates a depth image as an intermediate representation. A conditional GAN (a hierarchical pix2pixHD) is trained on these depth images to synthesize realistic bronchoscopy images. The synthesized output is translated back to a depth image, and bronchial orifices are segmented from both the input and output depth images using our training-free pipeline, enforcing anatomical consistency.
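The depth-based intermediate representation can be illustrated with a short, hedged sketch. The checkpoint name, file name and preprocessing below are assumptions; the paper only specifies that Depth Anything produces the depth images that condition the pix2pixHD generator.

```python
# Hedged sketch (not the authors' code): RGB frame -> depth intermediate representation.
import numpy as np
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline(
    "depth-estimation",
    model="LiheYoung/depth-anything-small-hf",  # assumed Depth Anything checkpoint
)

def rgb_to_depth(rgb: Image.Image) -> np.ndarray:
    """Map an RGB bronchoscopy frame (virtual, phantom or real) to a
    normalized single-channel depth image in [0, 1]."""
    depth = np.asarray(depth_estimator(rgb)["depth"], dtype=np.float32)
    depth -= depth.min()
    depth /= max(depth.max(), 1e-8)
    return depth

# The resulting depth map would condition the pix2pixHD generator; the generator
# output would in turn be mapped back to depth for the orifice consistency check
# described above. The file name is purely illustrative.
frame = Image.open("virtual_bronchoscopy_frame.png").convert("RGB")
depth_input = rgb_to_depth(frame)
```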

Abstract

The limited availability of bronchoscopy images makes image synthesis particularly interesting for training deep learning models. Robust image translation across different domains -- virtual bronchoscopy, phantom as well as in-vivo and ex-vivo image data -- is pivotal for clinical applications. This paper proposes BronchoGAN, which introduces anatomical constraints for image-to-image translation integrated into a conditional GAN. In particular, we force bronchial orifices to match across input and output images. We further propose using foundation model-generated depth images as an intermediate representation, ensuring robustness across a variety of input domains and yielding models with substantially less reliance on individual training datasets. Moreover, our intermediate depth image representation makes it easy to construct paired image data for training. Our experiments showed that input images from different domains (e.g. virtual bronchoscopy, phantoms) can be successfully translated into images mimicking realistic human airway appearance. We demonstrated that anatomical structures (i.e. bronchial orifices) are robustly preserved by our approach, shown both qualitatively and quantitatively through improved FID, SSIM and Dice coefficient scores. Our anatomical constraints enabled an improvement in the Dice coefficient of up to 0.43 for synthetic images. By combining foundation models for intermediate depth representations with bronchial orifice segmentation integrated as anatomical constraints into conditional GANs, we are able to robustly translate images from different bronchoscopy input domains. BronchoGAN allows public CT scan data (virtual bronchoscopy) to be incorporated in order to generate large-scale bronchoscopy image datasets with realistic appearance, bridging the gap left by missing public bronchoscopy images.
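As an illustration only (not the authors' code), the orifice-matching constraint described above can be expressed as a Dice-style consistency term between orifice masks of the input depth image and of the depth image re-estimated from the synthesized output. Here `segment_orifices` is a hypothetical stand-in for the training-free segmentation, and the exact formulation and weighting used in BronchoGAN are assumptions.

```python
# Hedged sketch of an anatomical (orifice-consistency) constraint for the cGAN.
import torch

def dice_score(mask_a: torch.Tensor, mask_b: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice overlap between two (soft) masks of shape (B, 1, H, W)."""
    inter = (mask_a * mask_b).sum(dim=(1, 2, 3))
    total = mask_a.sum(dim=(1, 2, 3)) + mask_b.sum(dim=(1, 2, 3))
    return (2.0 * inter + eps) / (total + eps)

def orifice_consistency_loss(depth_in, depth_out, segment_orifices):
    """1 - Dice between orifice masks segmented from the input depth image
    and from the depth image re-estimated on the synthesized output."""
    mask_in = segment_orifices(depth_in)    # hypothetical training-free segmenter
    mask_out = segment_orifices(depth_out)
    return (1.0 - dice_score(mask_in, mask_out)).mean()
```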

Results

Quantitative Results

Table 1: Quantitative results obtained for 2271 VB test images of the Harvard image dataset. Dice coefficients were estimated between bronchial orifice segmentations of the input and synthesized images, obtained with our training-free pipeline.
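For reference, a minimal sketch of how per-image Dice and SSIM scores of the kind reported in Table 1 could be computed from paired orifice masks and depth images; this is not the authors' evaluation script, and the data layout is assumed.

```python
# Hedged evaluation sketch: average Dice over orifice mask pairs, SSIM over depth pairs.
import numpy as np
from skimage.metrics import structural_similarity

def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap of two boolean masks (returns 1.0 if both are empty)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

def evaluate(pairs):
    """`pairs`: iterable of (mask_in, mask_out, depth_in, depth_out) arrays,
    with depth images normalized to [0, 1]."""
    dices, ssims = [], []
    for mask_in, mask_out, depth_in, depth_out in pairs:
        dices.append(dice_coefficient(mask_in, mask_out))
        ssims.append(structural_similarity(depth_in, depth_out, data_range=1.0))
    return float(np.mean(dices)), float(np.mean(ssims))
```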

Qualitative Results

Poster

BibTeX

@article{soliman2025bronchogan,
  title={BronchoGAN: Anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy},
  author={Soliman, Ahmad and Keuth, Ron and Himstedt, Marian},
  journal={International Journal of Computer Assisted Radiology and Surgery},
  year={2025},
  publisher={Springer},
  doi={10.1007/s11548-025-03450-w}
}