my blog my blog

Category: Computer Vision
Groundtruth data for Computer Vision with Blender

In the video below you can see the sequence of a car driving in a city scene and braking. The layers I rendered out for groundtruth data are the rendered image with the boundingbox of the car (top left), the emission layer ( shows the brakelights when they start to emit light, top right ), the optical flow (lower left), and the depth of each pixel in the world scene ( lower right).


Render-time was about 10h on a Nvidia GeForce GTX 680, tilesize 256×256, total image-size: 960×720.

Setting up groundtruth rendering, saving and reading it again

After you have composed your scene, switch to the Cycles rendering engine, and enable the render-passes in the properties-view (under render-layers) you plan on saving as ground-truth information. In this case, I selected the Z-layer, which represents the depth of the scene:

Now open the Node-editor-view, display the Compositing-Nodetree and put a checkmark at „Use Nodes“, it should look like this:

Hit „T“, and in the „output“-tab, select „File Output“. Connect it to the Z-output of the render-layers-node. Hit „N“ while the File-Output-node is selected, enter an appropriate File-subpath as name for the data (I chose „Depth“ here), and depending on your usecase, change the output format. As I am expecting float-values that are not constrained to 0-255, I will save the data in the OpenEXR-format:

After rendering ([F12]), we have a file called /tmp/Depth0001.exr. It can be read and displayed using python:

 

#!/usr/bin/python                                                                                                             

'''
author: Tobias Weis
'''

import OpenEXR
import Imath
import array
import numpy as np
import csv
import time
import datetime
import h5py
import matplotlib.pyplot as plt

def exr2numpy(exr, maxvalue=1.,normalize=True):
    """ converts 1-channel exr-data to 2D numpy arrays """                                                                    
    file = OpenEXR.InputFile(exr)

    # Compute the size
    dw = file.header()['dataWindow']
    sz = (dw.max.x - dw.min.x + 1, dw.max.y - dw.min.y + 1)

    # Read the three color channels as 32-bit floats
    FLOAT = Imath.PixelType(Imath.PixelType.FLOAT)
    (R) = [array.array('f', file.channel(Chan, FLOAT)).tolist() for Chan in ("R") ]

    # create numpy 2D-array
    img = np.zeros((sz[1],sz[0],3), np.float64)

    # normalize
    data = np.array(R)
    data[data > maxvalue] = maxvalue

    if normalize:
        data /= np.max(data)

    img = np.array(data).reshape(img.shape[0],-1)

    return img

depth_data = exr2numpy("Depth0001.exr", maxvalue=15, normalize=False)

fig = plt.figure()
plt.imshow(depth_data)
plt.colorbar()
plt.show()

The output of this script on the default-scene looks like this:

Further layers to generate groundtruth data

Besides depth information, we can also generate groundtruth data for other modalities:

  • Object/Material index: Can be used to generate pixel-accurate annotations of objects for semantic segmentation tasks, or simply computing bounding boxes for IOU measures
  • Normal: Can be used to generate pixel-accurate groundtruth data of surface normals
  • Vector: Will show up as „Speed“ in the node editor, contains information for optical flow between two frames in a sequence
  • Emit: Pixel-accurate information of emssion-properties.
Displacement priors

What is the target of all this ? Driving in an automotive scenario with a given speed and turnrate at any moment, we want to predict the displacement of a 2D-projection (pixel) between two frames:
p(\vec{uv}_{x,y} | speed, turnrate, camera-matrix, world-geometry)

By using the camera-calibration, I can create artificial curves and walls as 3D point-sets and project them back to 2D. Using discretized values for speed, turnrate, streetwidth and wall-height, I can then simulate the displacement of these 3D-Points when they are projected to 2D (our image).
(Note for me: this is the backprojection-code, main-file: main_displacements.py)

2_flows

Symmetry detection

This will probably become one of our modalities in the future: symmetry !

Thanks to the guys at hs-niederrhein, there is symmetry-detection-code that can already be used
for some first estimates:

This software implements the gradient product transform for symmetry
detection that is described in the paper

C. Dalitz, R. Pohle-Froehlich, F. Schmitt, M. Jeltsch:
„The gradient product transform for symmetry detection
and blood vesselm extraction.“ International Conference on
Computer Vision Theory and Applications (VISAPP), pp. 177-184,
2015

And the first results look quite promising:

Lane detection

Today I will try to detect some lanes..

Assumptions:
– We know the lane-width (plus minus)
– We are inside the middle of a lane
– We know the camera geometry
– Based on the turnrate of the IMU we can estimate the curvature of the street
– A line in pixels can be detected by a upward flank and a downward flank

Here are some exemplary results:

1) Of course, the best one first 😉
lanedet_00004871