Crop and Segment images using OpenAI’s CLIP

Vishnu Nandakumar
3 min read · Jun 14, 2022

Ever wondered whether you could crop out the part of an image you care about, or segment it, using nothing more than a word query? There is a way to do it. By combining OpenAI’s CLIP with a computer vision model such as Facebook’s DETR (any other detector or segmentation model can also be used), we can crop and segment sections of images from simple text queries. Cropping works by pairing an object detection model with CLIP: the detector proposes regions, and CLIP scores how likely each detected region is to match the text query. To get segments instead, we simply replace the object detection model with a panoptic segmentation model.
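To make the cropping idea concrete, here is a minimal pure-Python sketch of the selection step. It assumes the detector has already returned bounding boxes. The names crop_by_query and toy_score and the (xmin, ymin, xmax, ymax) box format are hypothetical. toy_score is only a stand-in for CLIP’s image-text similarity and is not the clipcrop implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (xmin, ymin, xmax, ymax); xmax/ymax exclusive
Image = List[List[int]]           # toy grayscale image as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Return the part of the image inside the box."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in image[ymin:ymax]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw similarity scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def crop_by_query(image: Image, boxes: List[Box], query: str,
                  score_fn: Callable[[Image, str], float]) -> Tuple[Image, List[float]]:
    """Crop every detected box, score each crop against the query, return the best crop."""
    crops = [crop(image, box) for box in boxes]
    probs = softmax([score_fn(c, query) for c in crops])
    best = max(range(len(crops)), key=lambda i: probs[i])
    return crops[best], probs


def toy_score(crop_img: Image, query: str) -> float:
    """Hypothetical stand-in for CLIP: 'bright' favours crops with a high mean pixel value."""
    pixels = [p for row in crop_img for p in row]
    mean = sum(pixels) / len(pixels)
    return 10 * mean / 255 if query == "bright" else -10 * mean / 255


image = [[0, 0, 255, 255] for _ in range(4)]   # dark left half, bright right half
boxes = [(0, 0, 2, 4), (2, 0, 4, 4)]           # pretend detector output
best_crop, probs = crop_by_query(image, boxes, "bright", toy_score)
print(best_crop)  # [[255, 255], [255, 255], [255, 255], [255, 255]]
```

With real CLIP similarities in place of toy_score, the softmax over crops is what turns raw scores into per-section probabilities for the queried text.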

What is CLIP?

Briefly, CLIP (Contrastive Language-Image Pre-training) is a model from OpenAI that can be applied zero-shot to multi-modal problems. During pre-training, an image encoder and a text encoder embed a batch of N image-text pairs, and an N × N matrix of similarity scores is computed between every image and every text in the batch. Training pushes the scores of the matching pairs up and all the others down. To perform zero-shot image classification, we encode the image and a set of candidate texts into embeddings with the pre-trained encoders and sort the similarity scores: the highest-scoring text is the most relevant one. The same works in reverse when we want the best-matching image for a text from a set of images (image neural search).
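The zero-shot step can be sketched with made-up embeddings standing in for the encoder outputs. This illustrates the similarity-matrix idea and is not OpenAI’s code. The embeddings are toy 3-dimensional vectors, and the logit_scale of 100 is an assumption meant to mimic CLIP’s learned temperature.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(image_embs, text_embs, logit_scale: float = 100.0):
    """N x M matrix: entry [i][j] is the scaled cosine similarity of image i and text j."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(t) for t in text_embs]
    return [[logit_scale * sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def softmax(row):
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, text_embs, labels):
    """Rank candidate labels for one image by a softmax over its row of the matrix."""
    probs = softmax(similarity_matrix([image_emb], text_embs)[0])
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def best_image_for_text(image_embs, text_emb):
    """Reverse direction (image search): pick the image with the highest column entry."""
    column = [row[0] for row in similarity_matrix(image_embs, [text_emb])]
    return max(range(len(column)), key=column.__getitem__)


# Made-up 3-d embeddings; real CLIP embeddings have hundreds of dimensions.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
dog_image = [0.9, 0.1, 0.0]
car_image = [0.0, 0.2, 0.8]
print(zero_shot_classify(dog_image, text_embs, labels)[0][0])  # a photo of a dog
print(best_image_for_text([dog_image, car_image], text_embs[2]))  # 1
```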

In our scenario, we extend the same principle to crop or segment sections of an image. To make this easy, I have created a simple Python library, clipcrop (still under development). For the object detection and image segmentation tasks I have used Facebook’s DETR. It matches the accuracy of well-established detectors such as Faster R-CNN while dropping hand-designed components like anchor boxes and non-maximum suppression. What’s more, you can also use HuggingFace pipelines to accomplish the same. Examples of both features are below.
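One plausible way to connect a detector to CLIP is to keep only the confident boxes and ignore the detector’s own class labels, because CLIP decides relevance against the free-form query. The sketch below assumes detections shaped like the list of dicts (score, label, and a box with xmin/ymin/xmax/ymax) returned by HuggingFace’s object-detection pipeline. The sample values, the min_score threshold, and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def detections_to_boxes(detections: List[Dict], min_score: float = 0.9) -> List[Box]:
    """Keep confident detections and convert their box dicts to tuples.

    The detector's 'label' is deliberately ignored: CLIP ranks the boxes
    against the text query afterwards. min_score is a hypothetical threshold.
    """
    boxes = []
    for det in detections:
        if det["score"] >= min_score:
            b = det["box"]
            boxes.append((b["xmin"], b["ymin"], b["xmax"], b["ymax"]))
    return boxes


# Made-up detections, shaped like HuggingFace object-detection pipeline output.
sample = [
    {"score": 0.98, "label": "cat", "box": {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 140}},
    {"score": 0.42, "label": "remote", "box": {"xmin": 5, "ymin": 5, "xmax": 30, "ymax": 15}},
]
print(detections_to_boxes(sample))  # [(10, 20, 110, 140)]
```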

Implementation

  • Install the library using pip install clipcrop
  • Crop sections of objects from the image using
  • Extract segments from the image using
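The segment feature likely follows the same pattern as cropping, with a panoptic model’s per-pixel segment map in place of bounding boxes. The following is a minimal sketch under that assumption and not clipcrop’s actual code. The segment map, the function names, and toy_score (a stand-in for CLIP) are all hypothetical.

```python
import math
from typing import Dict, List

Image = List[List[int]]   # toy grayscale image
SegMap = List[List[int]]  # per-pixel segment id, like a panoptic segmentation output


def segment_images(image: Image, seg_map: SegMap, background: int = 0) -> Dict[int, Image]:
    """One masked copy of the image per segment; pixels outside it become background."""
    ids = sorted({sid for row in seg_map for sid in row})
    return {
        sid: [[px if s == sid else background for px, s in zip(img_row, seg_row)]
              for img_row, seg_row in zip(image, seg_map)]
        for sid in ids
    }


def segment_by_query(image, seg_map, query, score_fn):
    """Score each masked segment against the query and return the most probable one."""
    segments = segment_images(image, seg_map)
    ids = list(segments)
    scores = [score_fn(segments[sid], query) for sid in ids]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = {sid: e / total for sid, e in zip(ids, exps)}
    best = max(ids, key=probs.__getitem__)
    return best, segments[best], probs


def toy_score(masked: Image, query: str) -> float:
    """Hypothetical stand-in for CLIP: 'bright' favours segments containing bright pixels."""
    brightest = max(p for row in masked for p in row)
    return 10 * brightest / 255 if query == "bright" else -10 * brightest / 255


image = [[50, 50, 200], [50, 200, 200]]
seg_map = [[1, 1, 2], [1, 2, 2]]
best, mask, probs = segment_by_query(image, seg_map, "bright", toy_score)
print(best, mask)  # 2 [[0, 0, 200], [0, 200, 200]]
```

Compared with cropping, the only change is that each candidate is a masked copy of the whole image rather than a rectangular crop. This mirrors replacing the detector with a panoptic segmentation model.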

As shown above, we can crop out either sections or segments by combining OpenAI’s CLIP with Facebook’s DETR models. I will keep upgrading the library with more options and functionality. For further information, see the repository mentioned below.

Sources:

That’s a wrap. Thank you for the immense support you have shown me; let’s continue this journey, growing and learning together. Until next time, bye and take care.

Vishnu Nandakumar

Machine Learning Engineer, Cloud Computing (AWS), Arsenal Fan. Have a look at my page: https://bit.ly/m/vishnunandakumar