Sep 2020

High-content image generation for drug discovery using generative adversarial networks

Shaista Hussain (a), Ayesha Anees (a), Ankit Das (a), Binh P. Nguyen (b), Mardiana Marzuki (c), Shuping Lin (d), Graham Wright (d), Amit Singhal (c)

The large amounts of high-content image data generated in drug discovery screening require computationally driven, automated analysis. The emergence of advanced machine learning algorithms, such as deep learning models, has transformed the interpretation and analysis of imaging data. Deep learning methods, however, generally require a large number of high-quality data samples, which may be limited during preclinical investigations.

To address this issue, the authors propose a computational framework based on generative modelling to synthesize images that can be used for phenotypic profiling of perturbations induced by drug compounds. Three variants of the Generative Adversarial Network (GAN) are investigated for use in the proposed framework: a basic Vanilla GAN, a Deep Convolutional GAN (DCGAN) and a Progressive GAN (ProGAN). DCGAN was found to be the most efficient at generating realistic synthetic images. A pre-trained convolutional neural network (CNN) was used to extract features from both real and synthetic images, followed by a classification model trained on real and synthetic images. The quality of the synthesized images was evaluated by comparing their feature distributions with those of real images. The DCGAN-based framework was applied to high-content image data from a drug screen to synthesize high-quality cellular images, which were used to augment the real image data. The augmented dataset was shown to yield better classification performance than the real images alone.
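For readers unfamiliar with GANs, the adversarial training idea can be written as the standard minimax objective from the original GAN formulation. This is included as background only; the summary above does not state which loss each of the three variants used, so it should not be read as the authors' exact training objective. Here G is the generator, D the discriminator, x a real image and z a random noise vector.

```latex
\[
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
\]
```

The discriminator is trained to assign high probability to real images and low probability to generated ones, while the generator is trained to produce images that the discriminator cannot tell apart from real data.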

The authors also demonstrated the application of the proposed method to the generation of bacterial images, and computed feature distributions for bacterial images specific to different drug treatments.
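The comparison of feature distributions between real and synthetic images can be illustrated with a minimal sketch. The summary does not say which distance or statistic was used, so the per-feature 1-D Wasserstein distance below is an assumed stand-in, and the function names `wasserstein_1d` and `per_feature_distance` are hypothetical. The toy numbers are made up; in practice each row would be a feature vector extracted from one image by a pre-trained CNN.

```python
from statistics import mean


def wasserstein_1d(real, synthetic):
    """Empirical 1-D Wasserstein-1 distance between two equal-sized samples."""
    if len(real) != len(synthetic) or not real:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal-sized samples, the optimal coupling pairs the sorted values.
    return mean(abs(r - s) for r, s in zip(sorted(real), sorted(synthetic)))


def per_feature_distance(real_features, synthetic_features):
    """Compare two sets of feature vectors one feature dimension at a time.

    Each argument is a list of feature vectors (one per image), all of the
    same length. Returns one distance per feature dimension.
    """
    real_columns = list(zip(*real_features))
    synthetic_columns = list(zip(*synthetic_features))
    return [wasserstein_1d(r, s) for r, s in zip(real_columns, synthetic_columns)]


# Made-up toy features (3 images x 2 features), purely illustrative.
real = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
synthetic = [[2.0, 3.5], [0.0, 1.5], [1.0, 2.5]]

distances = per_feature_distance(real, synthetic)
print(distances)  # prints [0.0, 0.5]
```

A distance near zero for a feature suggests that the synthetic images reproduce the real distribution of that feature. The same comparison could be run separately for each drug treatment to obtain treatment-specific summaries.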

Fig. 1. High-content real images. (a) Cell membrane and bacteria shown in green and red respectively, (b) Cell channel image, (c) Bacteria channel image. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. Framework for generation of synthetic images (HCS) and evaluation of synthesized data.

 

Fig. 3. Generative Adversarial Network. 

Fig. 4. Architecture of the Generator Network.

In summary, the article presents results showing that the proposed DCGAN-based framework can be used to generate realistic synthetic high-content images, thus enabling the study of drug-induced effects on cells and bacteria. This work is a novel application of a GAN-based approach to generating images of cellular and bacterial structures in a high-content fluorescence microscopy experiment. This generative-model-based framework for cellular image analysis can have context-specific capabilities, such as providing novel insights into subcellular structures and enabling the identification of new cell types. Synthesizing realistic cellular or bacterial images could enable the generation of large volumes of augmented image data, which in turn can facilitate the use of advanced computational methods such as deep learning algorithms.

 

The full article can be accessed here.

 

(a) Institute of High Performance Computing, A*STAR, 138673, Singapore. (b) School of Mathematics and Statistics, VUW, 6140, New Zealand. (c) Singapore Immunology Network, A*STAR, 138648, Singapore. (d) Skin Research Institute of Singapore, A*STAR, 138648, Singapore.