Everyone is a Super Saiyan with ARKit + Metal

While digging into Apple's People Occlusion sample, I tried to create an effect that turns me into a Super Saiyan.

Finished result: demo.png / demo.gif

Referenced articles:
・Apply effects only to the human body with ARKit3 and Metal
・Lumin effect with Swift + Metal
・Mastering GPU with MetalKit

How to make an aura

If you search for images of Super Saiyans, you'll find many variations of the aura. For this article I used the aura from right after Goku first becomes a Super Saiyan on Planet Namek as my reference, because it looked achievable without jagged edges: a yellow aura that gets brighter the closer it is to the body.

The starting point is Apple's sample code here.

① Outline the human body: dilate the person matte and erase the original body region, leaving only an edge.
② Blur ①: making the yellow blur large and the white blur small makes the aura brighter the closer it is to the body.
③ Composite ② with the camera-captured image.

① Outline the human body

To get a thick outline, I dilate the person matte and then erase the original body region. This is done in a compute shader.

kernel void matteConvert(texture2d<half, access::read> inTexture [[ texture(0) ]],
                         texture2d<half, access::write> outWhiteTexture [[ texture(1) ]],
                         texture2d<half, access::write> outYellowTexture [[ texture(2) ]],
                         uint2 gid [[thread_position_in_grid]]) {
    
    uint2 textureIndex(gid);
    if (inTexture.read(textureIndex).r > 0.1) {
        //No color on the human body
        outWhiteTexture.write(half4(0.0), gid);
        outYellowTexture.write(half4(0.0), gid);
        return;
    }
    
    //Dilation: mark this pixel if any body pixel lies in the 15x15 neighborhood
    constexpr int scale = 15;
    constexpr int radius = scale / 2;
    half color = 0.0;
    
    for (int i=0; i<scale; i++) {
        for (int j=0; j<scale; j++) {
            //Clamp to the texture bounds to avoid out-of-bounds reads at the edges
            uint2 textureIndex(clamp(int(gid.x) + (i - radius), 0, int(inTexture.get_width()) - 1),
                               clamp(int(gid.y) + (j - radius), 0, int(inTexture.get_height()) - 1));
            half alpha = inTexture.read(textureIndex).r;
            if (alpha > 0.1) {
                color = 1.0;
                break;
            }
        }
        if (color > 0.0) {
            break;
        }
    }

    outWhiteTexture.write(half4(color, color, color, 1.0), gid);
    outYellowTexture.write(half4(color, color, 0.0, 1.0), gid);
}

A pixel is treated as "human body" if any body pixel lies within 15px of it vertically or horizontally, which dilates the matte. (This costs up to 225 (= 15x15) texture reads per pixel, so the load is very high. Since everything gets blurred afterwards anyway, it would be better to reduce the resolution before this pass, but I haven't gotten around to it. Someday.) Pixels that were originally body are left uncolored (black), so the result is an outline image. The output is written to two textures, one for white and one for yellow.

Outline result (inspected with Xcode 12's Capture GPU Frame): edge.png
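The resolution reduction mentioned above could look something like this (a sketch, not in the original code; MPSImageBilinearScale, the halved size, and the smaller radius are my assumptions):

//Sketch: downscale the person matte before dilation to reduce the 15x15 neighborhood cost
//Assumes device, commandBuffer, and a non-optional alphaTexture as in Renderer.update()
let scaledDesc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: alphaTexture.pixelFormat,
                                                          width: alphaTexture.width / 2,
                                                          height: alphaTexture.height / 2,
                                                          mipmapped: false)
scaledDesc.usage = [.shaderRead, .shaderWrite]
let scaledMatte = device.makeTexture(descriptor: scaledDesc)!

let scaler = MPSImageBilinearScale(device: device)
scaler.encode(commandBuffer: commandBuffer,
              sourceTexture: alphaTexture,
              destinationTexture: scaledMatte)
//Dispatch matteConvert on scaledMatte instead; at half resolution a radius of about 7
//gives a similar outline thickness with roughly a quarter of the texture reads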

② Blur ①

For the blur I use Metal Performance Shaders (image filters). MPS kernels can easily be combined with your own shaders.

//Change the size of the blur with time
time += 1
//Blur (white)
let whiteIntensity = Int((sin(Float(time)/3) + 2) * 30) | 0x01  //MPSImageTent requires an odd kernel size, hence the | 0x01
let kernel1 = MPSImageTent(device: device, kernelWidth: whiteIntensity, kernelHeight: whiteIntensity)
kernel1.encode(commandBuffer: commandBuffer,
              inPlaceTexture: &whiteBlurTexture!, fallbackCopyAllocator: nil)
//Blur (yellow)
let yellowIntensity = Int((sin(Float(time)/3) + 2) * 100) | 0x01
let kernel2 = MPSImageTent(device: device, kernelWidth: yellowIntensity, kernelHeight: yellowIntensity)
kernel2.encode(commandBuffer: commandBuffer,
              inPlaceTexture: &yellowBlurTexture!, fallbackCopyAllocator: nil)

Metal Performance Shaders offers a variety of filters; after trying a few, MPSImageTent looked best, so I adopted it. The white blur is kept small and the yellow blur large, and both sizes oscillate over time.
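Other MPS filters drop in the same way if you want to experiment yourself. For example, a Gaussian blur (the sigma here is just an illustration, not a value used in this article):

//MPSImageGaussianBlur takes a Float sigma instead of an odd kernel width, so the | 0x01 trick is not needed
let gaussian = MPSImageGaussianBlur(device: device, sigma: (sin(Float(time) / 3) + 2) * 15)
gaussian.encode(commandBuffer: commandBuffer,
                inPlaceTexture: &whiteBlurTexture!, fallbackCopyAllocator: nil)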

③ Composite ② with the camera-captured image

Here the camera-captured image and the blurred edge images (white and yellow) are simply added together.

fragment half4 compositeImageFragmentShader(CompositeColorInOut in [[ stage_in ]],
                                            texture2d<float, access::sample> capturedImageTextureY [[ texture(0) ]],
                                            texture2d<float, access::sample> capturedImageTextureCbCr [[ texture(1) ]],
                                            texture2d<float, access::sample> whiteColorTexture [[ texture(2) ]],
                                            texture2d<float, access::sample> yellowColorTexture [[ texture(3) ]],
                                            texture2d<float, access::sample> alphaTexture [[ texture(4) ]])
{
    constexpr sampler s(address::clamp_to_edge, filter::linear);

    float2 cameraTexCoord = in.texCoordCamera;

    // Sample Y and CbCr textures to get the YCbCr color at the given texture coordinate.
    float4 rgb = ycbcrToRGBTransform(capturedImageTextureY.sample(s, cameraTexCoord), capturedImageTextureCbCr.sample(s, cameraTexCoord));

    half4 cameraColor = half4(rgb);
    half4 whiteColor = half4(whiteColorTexture.sample(s, cameraTexCoord));
    half4 yellowColor = half4(yellowColorTexture.sample(s, cameraTexCoord)) * 2.0;
        
    return cameraColor + whiteColor + yellowColor;
}

The sum can exceed 1.0, but it is clamped automatically when written to the unorm render target, so no explicit saturation is needed.
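As an aside, if you wanted to merge the two blur textures on the GPU before this pass, MPSImageAdd could do the weighted addition (a sketch; auraTexture would be a .bgra8Unorm texture with .shaderWrite usage that you allocate yourself, not part of the original code):

//Sketch: pre-merge white + 2.0 * yellow into a single aura texture
let add = MPSImageAdd(device: device)
add.secondaryScale = 2.0  //same weighting as yellowColor * 2.0 in the fragment shader
add.encode(commandBuffer: commandBuffer,
           primaryTexture: whiteBlurTexture,
           secondaryTexture: yellowBlurTexture,
           destinationTexture: auraTexture)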

Full source code

・ Swift

ViewController.swift


class ViewController: UIViewController, MTKViewDelegate {
    
    var session = ARSession()
    var renderer: Renderer!
    
    override func viewDidLoad() {
        super.viewDidLoad()
        
        if let view = self.view as? MTKView {
            view.device = MTLCreateSystemDefaultDevice()
            view.backgroundColor = UIColor.clear
            view.delegate = self
            renderer = Renderer(session: session, metalDevice: view.device!, mtkView: view)
            renderer.drawRectResized(size: view.bounds.size)
        }
    }
    
    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let configuration = ARWorldTrackingConfiguration()
        configuration.frameSemantics = .personSegmentation
        session.run(configuration)
    }
    
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
        renderer.drawRectResized(size: size)
    }
    
    func draw(in view: MTKView) {
        renderer.update()
    }
}
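One caveat the code above does not handle: personSegmentation requires an A12 chip or later. A guard like the following (my addition, not part of the original) fails gracefully on unsupported devices:

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    //personSegmentation requires an A12 chip or later (iPhone XS and newer)
    guard ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentation) else {
        print("Person segmentation is not supported on this device")
        return
    }
    let configuration = ARWorldTrackingConfiguration()
    configuration.frameSemantics = .personSegmentation
    session.run(configuration)
}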

Renderer.swift


let kMaxBuffersInFlight: Int = 3

let kImagePlaneVertexData: [Float] = [
    -1.0, -1.0, 0.0, 1.0,
    1.0, -1.0, 1.0, 1.0,
    -1.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 0.0
]

class Renderer {
    let session: ARSession
    let matteGenerator: ARMatteGenerator
    let device: MTLDevice
    let inFlightSemaphore = DispatchSemaphore(value: kMaxBuffersInFlight)
    var mtkView: MTKView
    
    var commandQueue: MTLCommandQueue!
    var imagePlaneVertexBuffer: MTLBuffer!
    //Pipeline State for final image composition
    var compositePipelineState: MTLRenderPipelineState!
    //Pipeline State for enlarging human body images
    var computeState: MTLComputePipelineState!
    //Captured image texture
    var capturedImageTextureY: CVMetalTexture?
    var capturedImageTextureCbCr: CVMetalTexture?
    var capturedImageTextureCache: CVMetalTextureCache!
    //Human body image texture
    var alphaTexture: MTLTexture?       //Person matte
    var whiteBlurTexture: MTLTexture!   //Dilated and blurred body image (white)
    var yellowBlurTexture: MTLTexture!  //Dilated and blurred body image (yellow)
    //Screen size
    var viewportSize: CGSize = CGSize()
    var viewportSizeDidChange: Bool = false
    //Thread group size of compute shader when processing human body images
    var threadgroupSize = MTLSizeMake(32, 32, 1)
    //Animation count
    var time = 0
    
    init(session: ARSession, metalDevice device: MTLDevice, mtkView: MTKView) {
        self.session = session
        self.device = device
        self.mtkView = mtkView
        matteGenerator = ARMatteGenerator(device: device, matteResolution: .half)
        loadMetal()
    }
    
    func drawRectResized(size: CGSize) {
        viewportSize = size
        viewportSizeDidChange = true
    }
    
    func update() {
        _ = inFlightSemaphore.wait(timeout: DispatchTime.distantFuture)
        
        let commandBuffer = commandQueue.makeCommandBuffer()!
        //Keep camera-captured textures from being released during rendering
        var textures = [capturedImageTextureY, capturedImageTextureCbCr]
        commandBuffer.addCompletedHandler { [weak self] commandBuffer in
            if let strongSelf = self {
                strongSelf.inFlightSemaphore.signal()
            }
            textures.removeAll()
        }
        //Camera capture texture acquisition (Y and CbCr planes)
        //Signal the semaphore on early exit: the command buffer is never committed on these paths
        guard let currentFrame = session.currentFrame else { inFlightSemaphore.signal(); return }
        let pixelBuffer = currentFrame.capturedImage
        if CVPixelBufferGetPlaneCount(pixelBuffer) < 2 { inFlightSemaphore.signal(); return }
        capturedImageTextureY = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat: .r8Unorm, planeIndex: 0)
        capturedImageTextureCbCr = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat: .rg8Unorm, planeIndex: 1)
        //Set uv coordinates according to screen size
        if viewportSizeDidChange {
            viewportSizeDidChange = false
            // Update the texture coordinates of our image plane to aspect fill the viewport
            let displayToCameraTransform = currentFrame.displayTransform(for: .portrait, viewportSize: viewportSize).inverted()
            
            let vertexData = imagePlaneVertexBuffer.contents().assumingMemoryBound(to: Float.self)
            for index in 0...3 {
                let textureCoordIndex = 4 * index + 2   //Each entry of kImagePlaneVertexData is vertex coordinates (x, y) followed by uv coordinates (u, v), so +2 indexes the uv part
                let textureCoord = CGPoint(x: CGFloat(kImagePlaneVertexData[textureCoordIndex]), y: CGFloat(kImagePlaneVertexData[textureCoordIndex + 1]))
                let transformedCoord = textureCoord.applying(displayToCameraTransform)
                //Captured image
                vertexData[textureCoordIndex] = Float(transformedCoord.x)
                vertexData[textureCoordIndex + 1] = Float(transformedCoord.y)
            }
        }
        //Human body image acquisition
        alphaTexture = matteGenerator.generateMatte(from: currentFrame, commandBuffer: commandBuffer)
        //To make the blur effect extend beyond the human body, the body matte is dilated. Textures for the two colors, white and yellow, are also generated here.
        if let width = alphaTexture?.width, let height = alphaTexture?.height {
            let colorDesc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                                     width: width, height: height, mipmapped: false)
            colorDesc.usage = [.shaderRead, .shaderWrite]
            whiteBlurTexture = device.makeTexture(descriptor: colorDesc)
            yellowBlurTexture = device.makeTexture(descriptor: colorDesc)

            let threadCountW = (width + self.threadgroupSize.width - 1) / self.threadgroupSize.width
            let threadCountH = (height + self.threadgroupSize.height - 1) / self.threadgroupSize.height
            let threadgroupCount = MTLSizeMake(threadCountW, threadCountH, 1)
            
            let computeEncoder = commandBuffer.makeComputeCommandEncoder()!
            
            computeEncoder.setComputePipelineState(computeState)
            computeEncoder.setTexture(alphaTexture, index: 0)
            computeEncoder.setTexture(whiteBlurTexture, index: 1)
            computeEncoder.setTexture(yellowBlurTexture, index: 2)
            computeEncoder.dispatchThreadgroups(threadgroupCount, threadsPerThreadgroup: threadgroupSize)
            computeEncoder.endEncoding()
        }
        //Change the size of the blur with time
        time += 1
        //Blur (white)
        let whiteIntensity = Int((sin(Float(time)/3) + 2) * 30) | 0x01  //MPSImageTent requires an odd kernel size, hence the | 0x01
        let kernel1 = MPSImageTent(device: device, kernelWidth: whiteIntensity, kernelHeight: whiteIntensity)
        kernel1.encode(commandBuffer: commandBuffer,
                      inPlaceTexture: &whiteBlurTexture!, fallbackCopyAllocator: nil)
        //Blur (yellow)
        let yellowIntensity = Int((sin(Float(time)/3) + 2) * 100) | 0x01
        let kernel2 = MPSImageTent(device: device, kernelWidth: yellowIntensity, kernelHeight: yellowIntensity)
        kernel2.encode(commandBuffer: commandBuffer,
                      inPlaceTexture: &yellowBlurTexture!, fallbackCopyAllocator: nil)
        //Captured image + blur (white / yellow) composite
        guard let renderPassDescriptor = mtkView.currentRenderPassDescriptor,
              let currentDrawable = mtkView.currentDrawable else { inFlightSemaphore.signal(); return }
        let compositeRenderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
        compositeImagesWithEncoder(renderEncoder: compositeRenderEncoder)
        compositeRenderEncoder.endEncoding()
        
        commandBuffer.present(currentDrawable)
        commandBuffer.commit()
    }
    
    func loadMetal() {
        commandQueue = device.makeCommandQueue()

        let imagePlaneVertexDataCount = kImagePlaneVertexData.count * MemoryLayout<Float>.size
        imagePlaneVertexBuffer = device.makeBuffer(bytes: kImagePlaneVertexData, length: imagePlaneVertexDataCount, options: [])
        //Camera capture image cache
        var textureCache: CVMetalTextureCache?
        CVMetalTextureCacheCreate(nil, nil, device, nil, &textureCache)
        capturedImageTextureCache = textureCache
        //Composite pipeline of camera capture image + human body image
        let defaultLibrary = device.makeDefaultLibrary()!

        let compositePipelineStateDescriptor = MTLRenderPipelineDescriptor()
        compositePipelineStateDescriptor.sampleCount = 1
        compositePipelineStateDescriptor.vertexFunction = defaultLibrary.makeFunction(name: "compositeImageVertexTransform")!
        compositePipelineStateDescriptor.fragmentFunction = defaultLibrary.makeFunction(name: "compositeImageFragmentShader")!
        compositePipelineStateDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
        try! compositePipelineState = device.makeRenderPipelineState(descriptor: compositePipelineStateDescriptor)
        
        //Compute shader for outlining the human body
        let edgeShader = defaultLibrary.makeFunction(name: "matteConvert")!
        computeState = try! self.device.makeComputePipelineState(function: edgeShader)
    }

    //Generate MTL Texture from captured image
    func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> CVMetalTexture? {
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)
        
        var texture: CVMetalTexture? = nil
        let status = CVMetalTextureCacheCreateTextureFromImage(nil, capturedImageTextureCache, pixelBuffer, nil, pixelFormat,
                                                               width, height, planeIndex, &texture)
        if status != kCVReturnSuccess {
            texture = nil
        }
        return texture
    }

    func compositeImagesWithEncoder(renderEncoder: MTLRenderCommandEncoder) {
        guard let textureY = capturedImageTextureY, let textureCbCr = capturedImageTextureCbCr else { return }
        renderEncoder.setCullMode(.none)
        renderEncoder.setRenderPipelineState(compositePipelineState)

        renderEncoder.setVertexBuffer(imagePlaneVertexBuffer, offset: 0, index: 0)

        renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureY), index: 0)
        renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureCbCr), index: 1)
        renderEncoder.setFragmentTexture(whiteBlurTexture, index: 2)
        renderEncoder.setFragmentTexture(yellowBlurTexture, index: 3)
        renderEncoder.setFragmentTexture(alphaTexture, index: 4)
        renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
    }
}
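One possible refinement (my suggestion, not in the original): update() recreates whiteBlurTexture and yellowBlurTexture with makeTexture every frame. Since the matte size rarely changes, they could be allocated once and reused, for example:

//Sketch: allocate the blur textures only when the matte size changes
//(with kMaxBuffersInFlight frames in flight, one set per in-flight frame would be safer)
private func makeBlurTexturesIfNeeded(width: Int, height: Int) {
    if whiteBlurTexture?.width == width, whiteBlurTexture?.height == height { return }
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                        width: width, height: height,
                                                        mipmapped: false)
    desc.usage = [.shaderRead, .shaderWrite]
    whiteBlurTexture = device.makeTexture(descriptor: desc)
    yellowBlurTexture = device.makeTexture(descriptor: desc)
}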

・ Shader

Shaders.metal


typedef struct {
    float2 position [[attribute(kVertexAttributePosition)]];
    float2 texCoord [[attribute(kVertexAttributeTexcoord)]];
} ImageVertex;


typedef struct {
    float4 position [[position]];
    float2 texCoord;
} ImageColorInOut;

vertex ImageColorInOut capturedImageVertexTransform(ImageVertex in [[stage_in]]) {
    ImageColorInOut out;
    out.position = float4(in.position, 0.0, 1.0);
    out.texCoord = in.texCoord;
    return out;
}

// Convert from YCbCr to rgb
float4 ycbcrToRGBTransform(float4 y, float4 CbCr) {
    const float4x4 ycbcrToRGBTransform = float4x4(
      float4(+1.0000f, +1.0000f, +1.0000f, +0.0000f),
      float4(+0.0000f, -0.3441f, +1.7720f, +0.0000f),
      float4(+1.4020f, -0.7141f, +0.0000f, +0.0000f),
      float4(-0.7010f, +0.5291f, -0.8860f, +1.0000f)
    );

    float4 ycbcr = float4(y.r, CbCr.rg, 1.0);
    return ycbcrToRGBTransform * ycbcr;
}

// This defines the captured image fragment function.
fragment float4 capturedImageFragmentShader(ImageColorInOut in [[stage_in]],
                                            texture2d<float, access::sample> capturedImageTextureY [[ texture(kTextureIndexY) ]],
                                            texture2d<float, access::sample> capturedImageTextureCbCr [[ texture(kTextureIndexCbCr) ]]) {
    
    constexpr sampler colorSampler(mip_filter::linear,
                                   mag_filter::linear,
                                   min_filter::linear);
    
    // Sample Y and CbCr textures to get the YCbCr color at the given texture coordinate.
    return ycbcrToRGBTransform(capturedImageTextureY.sample(colorSampler, in.texCoord),
                               capturedImageTextureCbCr.sample(colorSampler, in.texCoord));
}

typedef struct {
    float2 position;
    float2 texCoord;
} CompositeVertex;

typedef struct {
    float4 position [[position]];
    float2 texCoordCamera;
} CompositeColorInOut;

// Composite the image vertex function.
vertex CompositeColorInOut compositeImageVertexTransform(const device CompositeVertex* cameraVertices [[ buffer(0) ]],
                                                         unsigned int vid [[ vertex_id ]]) {
    CompositeColorInOut out;

    const device CompositeVertex& cv = cameraVertices[vid];

    out.position = float4(cv.position, 0.0, 1.0);
    out.texCoordCamera = cv.texCoord;

    return out;
}

// Composite the image fragment function.
fragment half4 compositeImageFragmentShader(CompositeColorInOut in [[ stage_in ]],
                                            texture2d<float, access::sample> capturedImageTextureY [[ texture(0) ]],
                                            texture2d<float, access::sample> capturedImageTextureCbCr [[ texture(1) ]],
                                            texture2d<float, access::sample> whiteColorTexture [[ texture(2) ]],
                                            texture2d<float, access::sample> yellowColorTexture [[ texture(3) ]],
                                            texture2d<float, access::sample> alphaTexture [[ texture(4) ]])
{
    constexpr sampler s(address::clamp_to_edge, filter::linear);

    float2 cameraTexCoord = in.texCoordCamera;

    // Sample Y and CbCr textures to get the YCbCr color at the given texture coordinate.
    float4 rgb = ycbcrToRGBTransform(capturedImageTextureY.sample(s, cameraTexCoord), capturedImageTextureCbCr.sample(s, cameraTexCoord));

    half4 cameraColor = half4(rgb);
    half4 whiteColor = half4(whiteColorTexture.sample(s, cameraTexCoord));
    half4 yellowColor = half4(yellowColorTexture.sample(s, cameraTexCoord)) * 2.0;
        
    return cameraColor + whiteColor + yellowColor;
}

//Create the body outline as (dilated body matte - original body matte) and output it as white and yellow textures
kernel void matteConvert(texture2d<half, access::read> inTexture [[ texture(0) ]],
                         texture2d<half, access::write> outWhiteTexture [[ texture(1) ]],
                         texture2d<half, access::write> outYellowTexture [[ texture(2) ]],
                         uint2 gid [[thread_position_in_grid]]) {
    
    uint2 textureIndex(gid);
    if (inTexture.read(textureIndex).r > 0.1) {
        //No color on the human body
        outWhiteTexture.write(half4(0.0), gid);
        outYellowTexture.write(half4(0.0), gid);
        return;
    }
    
    //Dilation: mark this pixel if any body pixel lies in the 15x15 neighborhood
    constexpr int scale = 15;
    constexpr int radius = scale / 2;
    half color = 0.0;
    
    for (int i=0; i<scale; i++) {
        for (int j=0; j<scale; j++) {
            //Clamp to the texture bounds to avoid out-of-bounds reads at the edges
            uint2 textureIndex(clamp(int(gid.x) + (i - radius), 0, int(inTexture.get_width()) - 1),
                               clamp(int(gid.y) + (j - radius), 0, int(inTexture.get_height()) - 1));
            half alpha = inTexture.read(textureIndex).r;
            if (alpha > 0.1) {
                color = 1.0;
                break;
            }
        }
        if (color > 0.0) {
            break;
        }
    }

    outWhiteTexture.write(half4(color, color, color, 1.0), gid);
    outYellowTexture.write(half4(color, color, 0.0, 1.0), gid);
}
