[Python] Handling transparent images with OpenCV - Making sprites dance -

Introduction

Let's handle transparent PNG images with OpenCV and do something like the sprites of old hobby computers. If you seriously want to make games with sprites, you should use pygame; here I just want to study OpenCV.

Basic RGBA elements

Here I use Irasutoya's "Characters of a squadron striking a fixed pose (group)" (reduced version). Its background is transparent.

sentai.png

To load an RGBA image, pass *flags=cv2.IMREAD_UNCHANGED* to cv2.imread(). Actually, you don't have to remember that incantation: just pass -1 as the second argument. If the argument is 1 or omitted, the image is loaded as a 3-channel BGR image without the alpha channel.

python


import cv2
filename = "sentai.png "
img4 = cv2.imread(filename, -1)
img3 = cv2.imread(filename)
print(img4.shape)  # result: (248, 300, 4)
print(img3.shape)  # result: (248, 300, 3)

When this image is displayed with cv2.imshow(), the background appears black. When you draw on a transparent canvas in drawing software, the transparent pixels carry no color information, so they are treated as black: all BGR components are zero = (0, 0, 0) = black.
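As a numpy-only sketch (with a hypothetical 2x2 canvas, not the actual sentai.png), dropping the alpha channel of a mostly transparent image leaves exactly those black pixels:

```python
import numpy as np

# Hypothetical 2x2 BGRA canvas: one opaque red pixel, the rest fully transparent.
img4 = np.zeros((2, 2, 4), np.uint8)
img4[0, 0] = (0, 0, 255, 255)  # (B, G, R, A) = opaque red

img3 = img4[:, :, :3]  # what cv2.imshow() effectively renders
# the transparent pixels have no color information, so they come out black
print(img3[1, 1])  # → [0 0 0]
```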

Take out each element of RGBA

As I have written many times, OpenCV images are stored as numpy.ndarray, so slicing lets you not only crop an image but also extract each color channel.

python


import cv2
filename = "sentai.png "
img4 = cv2.imread(filename, -1)
b = img4[:, :, 0]
g = img4[:, :, 1]
r = img4[:, :, 2]
a = img4[:, :, 3]

cv2.imshow("img4", img4)
cv2.imshow("b", b)
cv2.imshow("g", g)
cv2.imshow("r", r)
cv2.imshow("a", a)
cv2.waitKey(0)
cv2.destroyAllWindows()
The original image: sentai.png / A element: imgA.png
B element: imgB.png / G element: imgG.png / R element: imgR.png

Each channel has ndim = 2, that is, shape (height, width), so when displayed as an image it appears grayscale. For example, the Aka Ranger (red) is (B, G, R) = (21, 11, 213): its red channel is quite bright while its blue and green channels are dark. The Ki Ranger (yellow), on the other hand, is (B, G, R) = (0, 171, 247): its red channel is even brighter than the Aka Ranger's, and its green channel is bright as well. The alpha value is 0 for transparent and 255 for opaque, so it is better to think of it as opacity rather than transparency.
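The same slicing can be checked without the image file; here is a minimal sketch with a hypothetical 1x2 BGRA array (the pixel values are the ones quoted above):

```python
import numpy as np

img4 = np.zeros((1, 2, 4), np.uint8)
img4[0, 0] = (21, 11, 213, 255)  # opaque Aka Ranger red, (B, G, R, A)
img4[0, 1] = (0, 0, 0, 0)        # fully transparent pixel

b, g, r, a = (img4[:, :, i] for i in range(4))
# each channel is a 2-D (height, width) array; alpha reads as opacity
print(a[0, 0], a[0, 1])  # → 255 0
```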

Now, to display each channel in its own color, create a BGR image with the other channels set to 0.

Method 1


import cv2
import numpy as np
filename = "sentai.png "
img3 = cv2.imread(filename)

b = img3[:, :, 0]
g = img3[:, :, 1]
r = img3[:, :, 2]
z = np.full(img3.shape[:2], 0, np.uint8)
imgB = cv2.merge((b,z,z))
imgG = cv2.merge((z,g,z))
imgR = cv2.merge((z,z,r))

cv2.imshow("imgB", imgB)
cv2.imshow("imgG", imgG)
cv2.imshow("imgR", imgR)
cv2.waitKey(0)
cv2.destroyAllWindows()

Alternatively, you can zero out the unwanted channels in a copy of the original BGR image.

Method 2


import cv2
import numpy as np
filename = "sentai.png "
img3 = cv2.imread(filename)

imgB = img3.copy()
imgB[:, :, (1,2)] = 0  # set channels 1 (G) and 2 (R) of BGR to 0
imgG = img3.copy()
imgG[:, :, (0,2)] = 0  # set channels 0 (B) and 2 (R) of BGR to 0
imgR = img3.copy()
imgR[:, :, (0,1)] = 0  # set channels 0 (B) and 1 (G) of BGR to 0

cv2.imshow("imgB", imgB)
cv2.imshow("imgG", imgG)
cv2.imshow("imgR", imgR)
cv2.waitKey(0)
cv2.destroyAllWindows()
B element in blue: imgB.png / G element in green: imgG.png / R element in red: imgR.png

Combine images

From here on is the main event. The background image is "Illustration of the universe (background material)", a JPEG with no alpha channel. The image overlaid on it is "Illustration of an astronaut" (reduced version).

space.jpg
uchuhikoushi.png
4.png

Advance preparation

Create an RGB image and a mask image from the RGBA image. The "A element" shown earlier is a single channel, so it cannot be combined with the background image directly. Instead, create a 3-channel mask image, in the same way we made the B element blue and the R element red.

python


import cv2

filename = "uchuhikoushi.png "
img4 = cv2.imread(filename, -1)

img3 = img4[:, :, :3]  # the first three channels of BGRA, i.e. BGR
mask1 = img4[:, :, 3]  # channel 3 counting from 0, i.e. the alpha channel
mask3 = cv2.merge((mask1, mask1, mask1))  # 3-channel mask image
Original image (RGBA): 4.png / RGB image: 3.png / Mask image: mask3.png

Also, cut a region of the same size as the foreground out of the background.
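Cutting that region is ordinary numpy slicing; a minimal sketch with made-up sizes and position (back, x, y, fw, fh are all hypothetical):

```python
import numpy as np

back = np.zeros((100, 200, 3), np.uint8)  # hypothetical background
x, y = 30, 20                             # hypothetical paste position
fh, fw = 40, 50                           # hypothetical foreground size

roi = back[y:y+fh, x:x+fw]  # region of the background the foreground will cover
print(roi.shape)  # → (40, 50, 3)
```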

Method 1 Set the transparent color

If you can declare about your foreground image that "this color is not part of the subject, it is only used as the background", you can use numpy.where() to treat that color as transparent, much like chroma keying. The minimal version looks like this.

python


import numpy as np
# back and front must have the same shape
transparence = (0, 0, 0)
result = np.where(front == transparence, back, front)
back: back.jpg / front: 3.png / result: result.jpg

It's easy, but with this Irasutoya image, for example, you must confirm beforehand that (0, 0, 0) never occurs in the astronaut's black hair. If it does, you end up with something like a see-through Gachapin.
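Here is a self-contained toy version of that snippet, using tiny hypothetical arrays instead of the real images:

```python
import numpy as np

back = np.full((2, 2, 3), 100, np.uint8)  # plain gray background
front = np.zeros((2, 2, 3), np.uint8)     # black = "transparent" foreground
front[0, 0] = (10, 20, 30)                # the only real foreground pixel

# where the foreground is the transparent color, keep the background
result = np.where(front == (0, 0, 0), back, front)
print(result[0, 0], result[1, 1])  # → [10 20 30] [100 100 100]
```

Note that the comparison is per channel, so a foreground pixel that is zero in only some channels would be mixed channel by channel — another reason the transparent color must truly never occur in the subject.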

Method 2 Mask processing

Looking at other sites would give the answer immediately, but let's experiment for the sake of study. The logical identities

x AND 1 = x, x AND 0 = 0, x OR 1 = 1, x OR 0 = x

hold bitwise for arbitrary values, not just the Boolean values 0 and 1. Since luminance is expressed in 8 bits, roughly speaking:

x (any color) AND 255 (white) = x (any color)
x (any color) AND 0 (black) = 0 (black)
x (any color) OR 255 (white) = 255 (white)
x (any color) OR 0 (black) = x (any color)

The table below was made by hand, so apologies if there are any mistakes.
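These identities can be checked directly with numpy's bitwise operators, which is what cv2.bitwise_and() and cv2.bitwise_or() apply per pixel (a minimal check with an arbitrary luminance value):

```python
import numpy as np

x = np.uint8(123)  # arbitrary 8-bit luminance
assert x & 255 == x      # AND white keeps the color
assert x & 0 == 0        # AND black gives black
assert x | 255 == 255    # OR white gives white
assert x | 0 == x        # OR black keeps the color
```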

No back Calculation mask tmp
1 back.jpg OR mask3.png result1.jpg
2 back.jpg AND mask3.png result2.jpg
3 back.jpg OR mask3_inv.png result3.jpg
4 back.jpg AND mask3_inv.png result4.jpg

No. 1 and No. 4 look usable at this stage. Let's combine the foreground image with each of them.

No tmp Calculation front result Evaluation
1-1 result1.jpg OR 3.png result1.jpg ×
1-2 result1.jpg AND 3.png 3.png ×
1-3 result1.jpg OR front3_white.png result5.jpg ×
1-4 result1.jpg AND front3_white.png result.jpg
4-1 result4.jpg OR 3.png result.jpg
4-2 result4.jpg AND 3.png result6.jpg ×
4-3 result4.jpg OR front3_white.png front3_white.png ×
4-4 result4.jpg AND front3_white.png result4.jpg ×

So the correct answers were 1-4 and 4-1. I had prepared a "foreground image on a black background" and a "mask image with a black background and white foreground", but those alone could not be combined: 1-4 required a "foreground image on a white background", and 4-1 required a "mask image with a white background and black foreground". Things never work out with just what you prepared.
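The winning 4-1 recipe (AND with the inverted mask to punch a hole, then OR with the black-background foreground) can be sketched with tiny hypothetical arrays; `~` is numpy's bitwise NOT, equivalent to cv2.bitwise_not():

```python
import numpy as np

back = np.full((2, 2, 3), 100, np.uint8)   # gray background
front = np.zeros((2, 2, 3), np.uint8)      # foreground on black
front[0, 0] = (10, 20, 30)
mask = np.zeros((2, 2, 3), np.uint8)
mask[0, 0] = (255, 255, 255)               # white where the foreground is

tmp = back & ~mask        # step 4: black hole where the sprite goes
result = tmp | front      # step 4-1: drop the sprite into the hole
print(result[0, 0], result[1, 1])  # → [10 20 30] [100 100 100]
```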

Correspondence to depiction outside the image

Anything calling itself a sprite must be able to be drawn partly outside the background image. I was going to write an explanation, but I already covered it in the previous article ["Making the function of drawing Japanese fonts with OpenCV general purpose"](https://qiita.com/mo256man/items/b6e17b5a66d1ea13b5e3#%E7%94%BB%E5%83%8F%E5%A4%96%E3%81%B8%E3%81%AE%E6%8F%8F%E5%86%99%E3%81%B8%E3%81%AE%E5%AF%BE%E5%BF%9C), so it is omitted here.

Method 3 Use PIL

Even though this is supposed to be OpenCV study, I can't resist reaching for PIL.

Pasting an image onto an image: Image.paste(im, box=None, mask=None)

- im: the image to paste.
- box: the upper-left coordinates as (x, y). Thankfully, they may lie outside the original image. The default None means the upper left. A 4-element tuple form also exists, but is omitted here.
- mask: the mask image. The default is None. Besides black-and-white and grayscale images, an RGBA image can be given, in which case its alpha channel is kindly treated as the mask.

To composite a transparent image onto another, just call `Image.paste(im, box, im)` with the same image as the first and third arguments. That's PIL for you; it almost makes you ask what all the OpenCV work so far was for.
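A minimal runnable sketch of Image.paste() with the RGBA image doubling as its own mask (tiny hypothetical images, not the article's files):

```python
from PIL import Image

back = Image.new("RGB", (4, 4), (0, 0, 0))   # black 4x4 background
front = Image.new("RGBA", (2, 2))            # fully transparent 2x2 canvas
front.putpixel((0, 0), (255, 0, 0, 255))     # one opaque red pixel

# same image as first and third argument: its alpha channel acts as the mask
back.paste(front, (1, 1), front)
print(back.getpixel((1, 1)), back.getpixel((2, 2)))  # → (255, 0, 0) (0, 0, 0)
```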

Execution speed comparison

Three methods have been shown so far. In addition, from the previous article I learned to convert only the necessary region to PIL instead of the whole image. So let's compare execution speed with the following four home-made functions.

- putSprite_npwhere: set a transparent color and combine with np.where. Handles drawing outside the background image.
- putSprite_mask: combine using a mask image. Handles drawing outside the background image.
- putSprite_pil_all: combine with PIL, converting the entire background image. No handling for the outside of the background image is needed.
- putSprite_pil_roi: combine with PIL, converting only the region needed. Drawing outside the background image is handled when setting the ROI.

What is executed looks like this. Frames are dropped from the animated GIF to reduce its size. The source is long, so it is at the bottom. uchuhikoushi.gif

result

Average value when each is executed 10 times.

putSprite_npwhere : 0.265903830528259 sec
putSprite_mask    : 0.213901996612548 sec
putSprite_pil_all : 0.973412466049193 sec
putSprite_pil_roi : 0.344804096221923 sec

Setting aside that the program is not optimized, that my machine is slow, and that Python is slow in the first place, the slowness of PIL stands out. It also turns out that even with PIL, speed improves significantly if only the minimal ROI is converted. I am grateful for the comments on the previous article.

Masking and np.where() are faster than PIL. np.where() is fast enough, but slightly slower than masking because it performs comparisons internally. Mask processing requires preparing a mask image, but it is probably the lightest in processing terms because it only overwrites pixels.

Notes

Masking and np.where() cannot be used with translucent images.

Mask processing relies on the identities above, which hold only for 0 and 255. With a translucent pixel, that is, an alpha value that is neither 0 nor 255, the computation becomes: x (background color) AND a (half-way mask) = tmp (a strange color), then tmp OR y (foreground color) = z (an unexpected color). Naturally the result is wrong.
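A quick numeric sketch of the breakdown, with made-up 8-bit values (170 for the background, 200 for the foreground, alpha halfway at 128):

```python
import numpy as np

back, front, a = np.uint8(170), np.uint8(200), np.uint8(128)

bitwise = (back & ~a) | front            # what the mask method computes: 234
blended = (int(back) + int(front)) // 2  # a true 50% alpha blend: 185
print(bitwise, blended)  # → 234 185
```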

I tried various things, thinking np.where() might handle translucency with some ingenuity, but at least I could not manage it.

Next time preview

Next, I would like to take on translucency and rotation support.

Source

I only learned about `eval()`, which evaluates a string as a Python expression, just before posting this article. Reference: Python - Call a function dynamically from a string
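A minimal sketch of that trick (the function name greet is just an example); looking the name up in globals() is usually considered safer than eval():

```python
def greet():
    return "hi"

func = "greet"
result = eval(func)()         # evaluate the string, then call the function
result2 = globals()[func]()   # safer alternative without eval
print(result, result2)  # → hi hi
```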

python


import cv2
import numpy as np
from PIL import Image
import time
import math

# Composite with numpy.where
# An RGBA image is loaded for consistency with the other functions,
# but only the BGR channels are used; the alpha channel is ignored.
# The transparent color is hard-coded as (0,0,0), which is not ideal.
def putSprite_npwhere(back, front4, pos):
    x, y = pos
    fh, fw = front4.shape[:2]
    bh, bw = back.shape[:2]
    x1, y1 = max(x, 0), max(y, 0)
    x2, y2 = min(x+fw, bw), min(y+fh, bh)
    if not ((-fw < x < bw) and (-fh < y < bh)) :
        return back
    front3 = front4[:, :, :3]
    front_roi = front3[y1-y:y2-y, x1-x:x2-x]
    roi = back[y1:y2, x1:x2]
    tmp = np.where(front_roi==(0,0,0), roi, front_roi)
    back[y1:y2, x1:x2] = tmp
    return back

# Composite using a mask
# The mask image is created from the RGBA image inside the function every call.
# Creating the mask image in advance would be faster,
# but this is easier to use. I think.
def putSprite_mask(back, front4, pos):
    x, y = pos
    fh, fw = front4.shape[:2]
    bh, bw = back.shape[:2]
    x1, y1 = max(x, 0), max(y, 0)
    x2, y2 = min(x+fw, bw), min(y+fh, bh)
    if not ((-fw < x < bw) and (-fh < y < bh)) :
        return back
    front3 = front4[:, :, :3]
    mask1 = front4[:, :, 3]
    mask3 = 255 - cv2.merge((mask1, mask1, mask1))
    mask_roi = mask3[y1-y:y2-y, x1-x:x2-x]
    front_roi = front3[y1-y:y2-y, x1-x:x2-x]
    roi = back[y1:y2, x1:x2]
    tmp = cv2.bitwise_and(roi, mask_roi)
    tmp = cv2.bitwise_or(tmp, front_roi)
    back[y1:y2, x1:x2] = tmp
    return back

# Convert the whole background image to PIL for compositing
def putSprite_pil_all(back, front4, pos):
    back_pil = Image.fromarray(back)
    front_pil = Image.fromarray(front4)
    back_pil.paste(front_pil, pos, front_pil)
    return np.array(back_pil, dtype = np.uint8)

# Convert only the region of the background image needed for compositing to PIL
def putSprite_pil_roi(back, front4, pos):
    x, y = pos
    fh, fw = front4.shape[:2]
    bh, bw = back.shape[:2]
    x1, y1 = max(x, 0), max(y, 0)
    x2, y2 = min(x+fw, bw), min(y+fh, bh)
    if not ((-fw < x < bw) and (-fh < y < bh)) :
        return back
    back_roi_pil = Image.fromarray(back[y1:y2, x1:x2])
    front_pil = Image.fromarray(front4[y1-y:y2-y, x1-x:x2-x])
    back_roi_pil.paste(front_pil, (0,0), front_pil)
    back_roi = np.array(back_roi_pil, dtype = np.uint8)
    back[y1:y2, x1:x2] = back_roi
    return back

def main(func):
    filename_back = "space.jpg "
    filename_front = "uchuhikoushi.png "
    img_back = cv2.imread(filename_back)
    img_front = cv2.imread(filename_front, -1)
    bh, bw = img_back.shape[:2]
    xc, yc = bw*0.5, bh*0.5
    rx, ry = bw*0.3, bh*1.2
    cv2.putText(img_back, func, (20,bh-20), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255))

    ### measure time from here
    start_time = time.time()

    for angle in range(-180, 180):
        back = img_back.copy()
        x = int(xc + rx * math.cos(math.radians(angle)))
        y = int(yc + ry * math.sin(math.radians(angle)))
        img = eval(func)(back, img_front, (x,y))
        
        #This can be enabled or disabled as needed
        #cv2.imshow(func, img)
        #cv2.waitKey(1)

    elapsed_time = time.time() - start_time
    ### measure up to here

    print(f"{func} : {elapsed_time} sec")
    cv2.destroyAllWindows()
                
if __name__ == "__main__":
    funcs = ["putSprite_npwhere",
             "putSprite_mask",
             "putSprite_pil_all" ,
             "putSprite_pil_roi" ]
    for func in funcs:
        for i in range(10):
            main(func)
