I am really interested in all kinds of automotive and ADAS (advanced driver assistance systems) applications, and I run several experiments on detecting objects (cars, traffic signs, etc.) and events (brake-light transitions). Over the last few years I have developed several recording systems, the most recent being the Android Multisensor-Grabber.
To efficiently manage, share and annotate the recorded sequences, I created this tool. The frontend is a website powered by Python’s Flask; the backend is regular Python code that talks to a database through SQLAlchemy.
The tool is best described by looking at some typical workflows that I use daily:
In the case of the Android Multisensor-Grabber, images are saved as binary-encoded YUV files to save processing resources on the mobile device, and the annotations are stored as XML. The first step in uploading a sequence is to compress the sequence folder on the device and upload it to the server via a web page. Once the upload has finished, the server unzips the archive, converts all images to PNG and moves them to a sequence folder. Afterwards, the annotations are saved to the database. From then on, the sequence is available for viewing and annotation.
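The server-side part of this workflow could be sketched roughly as follows. This is only an illustration using the Python standard library: the function names, the ".yuv" and ".xml" file extensions and the layout of the annotation XML (an "annotation" element with attributes) are assumptions, not the tool's actual format. The YUV-to-PNG conversion is left out because it depends on the exact YUV layout written by the recorder.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def unpack_sequence(archive_path, target_dir):
    """Extract an uploaded sequence archive and return all extracted files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return sorted(p for p in target.rglob("*") if p.is_file())


def split_by_type(paths):
    """Separate raw images from annotation files (extensions are assumed)."""
    images = [p for p in paths if p.suffix.lower() == ".yuv"]
    annotations = [p for p in paths if p.suffix.lower() == ".xml"]
    return images, annotations


def read_annotations(xml_path):
    """Parse a hypothetical annotation file into a list of plain dicts."""
    root = ET.parse(str(xml_path)).getroot()
    return [dict(node.attrib) for node in root.iter("annotation")]
```

The dictionaries returned by read_annotations would then be written to the database, and the raw images handed to whatever converter matches the recorder's YUV format.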
Sequence display and meta-data
To quickly identify sequences and get an overview of the relevant information, sequences are visualized as shown below:
The top row shows general information such as the sequence ID, the sensor it was captured with, the time of the recording, the number of frames, its location on disk, the exposure time, and how many annotations exist for this sequence.
Ten sample images are drawn at random from the database.
In the backend, the GPS coordinates are plotted on top of a map tile fetched from a map service, and a plot is created that shows the timestamps of the saved frames over time.
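Drawing the ten preview images mentioned above is a small sampling step. A minimal sketch, assuming the filenames of a sequence have already been loaded from the database into a list (the function name and the optional seed parameter are made up for illustration):

```python
import random


def pick_preview_frames(filenames, count=10, seed=None):
    """Return up to `count` distinct filenames, chosen at random.

    The result keeps the original (chronological) order of the input.
    """
    names = list(filenames)
    if len(names) <= count:
        return names
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(names)), count))
    return [name for index, name in enumerate(names) if index in chosen]
```

Keeping the sampled frames in recording order is a design choice of this sketch, so that the preview still reads like a short summary of the drive.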
The annotation framework is a little more intricate. Different projects or tasks require different types of annotations, so users should be able to add annotation types and the desired attributes. Furthermore, we sometimes have to restrict certain attributes to enforce consistency when several users annotate.
Let's consider the "Car" example. Car is the class name of the object we want to annotate, but we might also be interested in certain attributes, for example the brake-light state or the color. Those are attribute classes. If several users annotate this class, they might come up with different values for these attributes: User1 might annotate a car and put "On" for the brake-light state, while User2 might put in "1". This would make it hard to group the annotations afterwards, so in the current system it is possible to restrict the set of permitted values to just [0, 1].
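The restriction described above boils down to checking each submitted value against a per-attribute whitelist. A minimal sketch of that idea, where the dictionary layout, the attribute name "brakelight-state" and the function name are illustrative assumptions rather than the tool's actual code:

```python
# (annotation class, attribute class) -> permitted values.
# The names and the dictionary layout are illustrative only.
ALLOWED_VALUES = {
    ("Car", "brakelight-state"): {"0", "1"},
}


def validate_attribute(annotation_class, attribute, value):
    """Return the value as a string if it is permitted, else raise ValueError.

    Attributes without an entry in ALLOWED_VALUES are treated as unrestricted.
    """
    allowed = ALLOWED_VALUES.get((annotation_class, attribute))
    value = str(value)
    if allowed is not None and value not in allowed:
        raise ValueError(
            f"{attribute!r} of {annotation_class!r} must be one of "
            f"{sorted(allowed)}, got {value!r}"
        )
    return value
```

With this in place, validate_attribute("Car", "brakelight-state", "On") is rejected, while "0" and "1" pass, so all annotators end up with values that can be grouped later.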
The above functionality is provided on a separate page where the user can set up the relations and semantics of annotations. Once this is done, we can start annotating in the browser:
Export and usage in algorithms
The most important part: using the annotated data to train and evaluate algorithms! Currently I use annotated patches to train convolutional neural networks (CNNs), so the only exporter implemented so far extracts the annotated image patches from all sequences, sorts them into a folder structure, compresses all folders and hands the archive to the user. For the example "Car", this results in the following folder structure:
Annotations
|
+-- Car
    |
    +- BL1
    +- TL1
    +- BL0_TL0
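An exporter producing this kind of layout could look roughly like the sketch below. It takes the folder label (such as BL1 or BL0_TL0) as given, because how the tool derives these labels from attribute values is not described here; the function names and the tuple format are assumptions made for illustration.

```python
import zipfile
from pathlib import PurePosixPath


def patch_archive_path(class_name, label, filename, root="Annotations"):
    """Build the in-archive path of one exported patch."""
    return str(PurePosixPath(root) / class_name / label / filename)


def write_export(zip_path, patches):
    """Write (class_name, label, filename, png_bytes) tuples into one archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for class_name, label, filename, data in patches:
            archive.writestr(patch_archive_path(class_name, label, filename), data)
```

A training script can then unpack the archive and treat each subfolder of a class as one label.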
And some samples of these extracted patches (scaled and cropped to obtain square patches):
Images are saved as PNG files on disk; only their filenames are stored in the database. This, together with the above considerations about annotation classes, attributes and their values, is reflected in the database schema (please excuse the messy ordering, this is all that SchemaCrawler gives me):
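In text form, the relations described so far could be sketched roughly as below. This is a simplified stand-in written with the standard library's sqlite3 module instead of SQLAlchemy so that it runs without extra dependencies; all table and column names, as well as the rectangular bounding-box columns, are assumptions and not the tool's real schema.

```python
import sqlite3

# Hypothetical, simplified schema; not the tool's actual tables.
SCHEMA = """
CREATE TABLE sequence (
    id INTEGER PRIMARY KEY,
    sensor TEXT,
    recorded_at TEXT
);
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    sequence_id INTEGER NOT NULL REFERENCES sequence(id),
    filename TEXT NOT NULL          -- only the filename is stored, not the pixels
);
CREATE TABLE annotation_class (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE       -- e.g. 'Car'
);
CREATE TABLE attribute_class (
    id INTEGER PRIMARY KEY,
    annotation_class_id INTEGER NOT NULL REFERENCES annotation_class(id),
    name TEXT NOT NULL              -- e.g. 'brakelight-state'
);
CREATE TABLE allowed_value (
    attribute_class_id INTEGER NOT NULL REFERENCES attribute_class(id),
    value TEXT NOT NULL,            -- e.g. '0' or '1'
    PRIMARY KEY (attribute_class_id, value)
);
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY,
    image_id INTEGER NOT NULL REFERENCES image(id),
    annotation_class_id INTEGER NOT NULL REFERENCES annotation_class(id),
    x INTEGER, y INTEGER, width INTEGER, height INTEGER
);
CREATE TABLE annotation_attribute (
    annotation_id INTEGER NOT NULL REFERENCES annotation(id),
    attribute_class_id INTEGER NOT NULL REFERENCES attribute_class(id),
    value TEXT NOT NULL,
    PRIMARY KEY (annotation_id, attribute_class_id)
);
"""


def create_database(path=":memory:"):
    """Create the sketch schema and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The allowed_value table is where a restriction such as [0, 1] for the brake-light state would live in this sketch.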
Open issues and TODOs
As of now, there are several open issues that need to be addressed:
To be used in a community-driven fashion where anyone can annotate and contribute, the tool needs user management and authentication mechanisms.
Some tasks require pixel-accurate annotations, polygons or other geometric shapes.
Currently there is no way to annotate time series (objects across several frames).
Only one object of a class can be annotated per frame at the moment.