Mappotino: A Robot for Exploration, Mapping, and Object Recognition

How can we let a robot create a map of an unknown indoor environment using three cameras that simultaneously provide depth information and color images? We investigated this question in a four-week practical course at our university. This article sheds some light on the challenges involved.

Disasters are scenarios where a robot could be sent into an unknown environment for a search-and-rescue mission. Although we did not set out to create a robot that is deployable in a real disaster zone, we were keen on solving some of the challenges posed by such a setting.

The Mappotino robot and a room in the basement.

In our case, the mission of the robot (we named it Mappotino) was to autonomously navigate in the basement of our university building, a place with obstacles and narrow passageways. While driving, it needs to create a 2D map of the basement. The data collected by the three combined depth and color cameras is then processed offline. The observations are used to compute a 3D model of the basement. Furthermore, the color images serve to recognize and locate targets.

On the hardware side, we equipped a standard Robotino platform by Festo Didactic with three Kinect cameras mounted on top of the robot, two computers, and a switch.

We removed the movable bases from the Kinect cameras so that the cameras could be rigidly attached to the robot platform. No two Kinect cameras can be connected to the same USB bus, which limited our choice of computers for reading and processing the Kinect sensor data. The two computers were equipped with standard hard disks (no SSDs) and were less powerful than a common notebook, although one of them had to be replaced by a notebook due to problems with the power supply. All computers, including the one that is part of the standard Robotino platform, were connected to each other through a switch.

Four weeks with such a variety of tasks leave little time for implementing every software component from scratch. Our choice therefore fell upon the Robot Operating System (ROS), which not only provides an interface for controlling the actuators and reading the sensors on the Robotino, but also software components for mapping, exploration, navigation, object recognition, and visualization. Unfortunately, the existing packages often proved insufficient in features or reliability, or were not tailored to our needs; hence we left none of them unchanged and even ended up implementing our own object detector for textured planar objects.

Most importantly, the robot needs to know about the obstacles in its vicinity. To achieve this, we wrote a fake laser scanner that takes the 3D points acquired by all three Kinect cameras, keeps those within the height of the robot, and projects them onto the floor. The projections are then, in principle, inserted into the 2D map as obstacles (see image above). Since the odometry cannot be fully trusted, the series of such observations is merged into a grid-based map over time using the gmapping SLAM package from OpenSLAM.org.
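
The idea behind the fake laser scanner can be illustrated with a minimal Python sketch. This is not our actual ROS node: the function name, the height limits, the number of beams, and the maximum range are hypothetical, and the points are assumed to be already transformed into the robot frame (x forward, y left, z up, in meters).

```python
import math

def fake_laser_scan(points, min_z=0.05, max_z=0.5, num_beams=360, max_range=5.0):
    """Collapse 3D points (x, y, z) into a 2D range scan around the robot.

    All parameter defaults are illustrative assumptions, not measured values.
    Returns one range per beam; beams that saw nothing stay at infinity.
    """
    ranges = [math.inf] * num_beams
    beam_width = 2.0 * math.pi / num_beams
    for x, y, z in points:
        # Keep only points the robot could actually collide with.
        if not (min_z <= z <= max_z):
            continue
        # Projecting onto the floor means dropping z.
        r = math.hypot(x, y)
        if r == 0.0 or r > max_range:
            continue
        angle = math.atan2(y, x) % (2.0 * math.pi)
        beam = int(angle / beam_width) % num_beams
        # A laser reports the closest obstacle along each beam.
        ranges[beam] = min(ranges[beam], r)
    return ranges
```

Keeping only the nearest point per beam mimics what a real laser scanner would report, which is what lets a laser-based SLAM package consume the Kinect data.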

A generated 2D map, fake laser-scanner data, and obstacles.

For exploration, we tweaked an existing frontier-based ROS package (see: explore) for our purposes. Basically, given the current 2D map of the environment, the robot tries to find boundaries between unknown areas and known areas that are not obstructed by a wall.
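
As a rough illustration of what "frontier" means here, the following sketch marks free cells that touch unknown cells in a small occupancy grid. It is a simplified stand-in, not the code of the explore package; the cell values (-1 unknown, 0 free, 100 occupied) are assumed to follow the usual ROS occupancy grid convention, and the function name is hypothetical.

```python
UNKNOWN, FREE, OCCUPIED = -1, 0, 100  # assumed cell encoding

def find_frontier_cells(grid):
    """Return (row, col) of every free cell with a 4-connected unknown neighbor."""
    rows, cols = len(grid), len(grid[0])
    frontier = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontier.append((r, c))
                    break  # one unknown neighbor is enough
    return frontier
```

Because only free cells qualify, an unknown area hidden behind a wall produces no frontier: the occupied cells in between never count as candidates.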

For navigation, we adapted an existing ROS package (see: robotino_navigation). In theory, planning by finding shortest paths is straightforward. Due to noise and uncertainty, this task turned out to be much harder in reality, especially given the narrow passageways our robot was to face in the basement.
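
To show the "in theory" part, here is a minimal sketch of shortest-path planning on a grid map using breadth-first search. It is only an idealized illustration with a hypothetical function name: it treats the robot as a single cell and the map as perfectly known, which are exactly the assumptions that break down in narrow passageways, and it is not how robotino_navigation plans.

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search on a 4-connected grid.

    grid[r][c] is truthy for blocked cells. Returns the list of cells from
    start to goal (inclusive), or None if the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and not grid[nr][nc] and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None
```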

The three Kinect cameras produce both depth images and color images, thereby accumulating massive amounts of data in the range of several gigabytes. Improving the efficiency of existing packages just to be able to write the data to disk already required significant effort. The post-processing of the data is done offline after the robot has finished exploration and mapping.

The Mappotino robot in action, visualized in RVIZ.

The collected data is used to compute 3D models of the environment and also to recognize and locate objects. Again, OpenSLAM.org came to the rescue with the RGBDSLAM package which merges 3D point clouds from multiple views into a common frame. Keypoints tracked in the images are used in order to compute the rigid transformation between two camera frames. This often fails when walls do not exhibit enough distinguishable features; in this case, RGBDSLAM will fall back to ICP, an iterative 3D point matching algorithm. We found the results to be good in rooms with many good features to track.

A 3D point cloud of the basement, stitched together with RGBDSLAM.

The Mappotino robot knows how to recognize objects and locate them in the environment. As none of the object detectors we were aware of was sufficiently usable for our purpose, we ended up implementing our own detector for textured planar objects. We found many promising features in existing object detectors, namely RoboEarth, BLORT, and the detector shipped with ECTO, but at the time of our practical course, these partly cutting-edge software components could not be integrated easily into a project that is bound to a certain software and hardware platform.

Our textured planar object detector (source) is based on matching local features learnt from a model image with local features from a scene image. The 3D data is completely ignored during recognition. Possible homographies between the model images and the scene images are suggested by RANSAC, and discarded depending on the number of inliers and the nature of the homography. The 3D location in the map is approximated by computing the 3D centroid; this requires the depth information from the Kinect camera.
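
The last step, locating a detected object in 3D, can be sketched as follows. This is a simplified illustration rather than our detector's code: the function names are hypothetical, the pinhole intrinsics are rough, commonly quoted Kinect values and not a calibration of our cameras, and the result is expressed in the camera frame (the transformation into the map frame is left out).

```python
import math

def backproject(u, v, depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Turn pixel (u, v) with depth in meters into a 3D point (pinhole model).

    The default intrinsics are assumed placeholder values.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def object_centroid(pixels_with_depth, **intrinsics):
    """Average the 3D points of all object pixels that have a valid depth.

    pixels_with_depth: iterable of (u, v, depth); a depth of zero or NaN
    marks a missing measurement and is skipped.
    """
    points = [
        backproject(u, v, d, **intrinsics)
        for u, v, d in pixels_with_depth
        if not math.isnan(d) and d > 0.0
    ]
    if not points:
        return None  # no depth available for this detection
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

In practice the pixels would be the inlier keypoints, or the region enclosed by the projected model outline, of an accepted homography.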

The five target objects that need to be located.

All in all, this project taught us that ROS provides a lot of packages and very helpful tools for visualization. None of the packages could be used out of the box; they needed adaptation to our platform, bug fixes, installation tweaks, and much coding to meet our performance requirements. OpenCV has become powerful enough that an object detector can be written on top of it almost without any other dependencies. See the robot in action.


a blog by Julius Adorf
