Simultaneous Localization and Mapping Based on Seamless Fusion of Heterogeneous Multi-Modal Features

Published by Nankai University, 2021

Abstract

Simultaneous localization and mapping (SLAM) consists of the estimation of the robot pose as well as the map of the environment by processing the measurement data captured by on-board sensors. Nowadays, with an increasing demand for the application in domestic services, autonomous driving and so on, the SLAM techniques have gained rapid development. It is of great importance in a SLAM system to estimate the poses of the sensor and build a model of the scene using the features extracted from the environment. Different types of features have their own strengths and weaknesses in various scenes. Specifically, there exist multimodal features such as points, lines and planes in the structured environments. Hence, to increase the effectiveness of the robot system in unknown scenes, the fusion of multiple types of features has become a hot topic in SLAM field.

This thesis focuses on the simultaneous localization and map building based on the seamless fusion of heterogeneous multi-modal features. First, a multi-modal feature association algorithm is proposed based on a multi-hypothesis framework. Then, the constraints provided by different types of features on the robot motion estimation are quantitatively analyzed. Based on the analysis results, a seamless fusion method is proposed to calculate the poses of the robot fusing heterogeneous multi-modal features. In this thesis，the major contributions are summarized in the following.

(1) The problem of multi-modal feature association is addressed. A multihypothesis framework-based feature matching algorithm is proposed to simultaneously associate different types of features (planes and lines) and further estimate the pose of the sensor. The multi-hypothesis framework is achieved by constructing an interpretation tree (IT) structure. Specifically, an inter-node consistency is proposed for generation of hypotheses and a consistent transformation model (CTM) for each hypothesis is explicitly expressed and incrementally updated. When the IT is constructed, a closed-form solution to the feature association and the pose estimation can be obtained. Then, a multi-modal feature joint optimization method is introduced to further refine the pose estimate and parameters of features. During the optimization, different types of geometric features are appropriately parameterized and the uncertainties arising from feature extraction are derived and used to balance the contributions of multiple types of features in the cost function. Extensive experiments are executed on public datasets and the results demonstrate that the proposed method can achieve higher accuracy and stronger robustness.

(2) A plane-line-based RGB-D visual odometry (PLVO) is proposed to address the correspondences between the degenerate cases in the pose estimation and the spatial configurations of features. First, the plane-line hybrid association graph (PLHAG) is proposed to describe the spatial configurations of different features as well as the geometric relationships between them. Then, the pose of the sensor is estimated based on the adaptive fusion of planes and lines. Specifically, an adaptive weighting algorithm is proposed based on correspondences between the degenerate cases and the spatial configurations of features, considering the geometric relationships between plane and line features. For the degrees of freedom (DoFs) of the pose that cannot be constrained by planes, the line features are supplementarily used to obtain the full 6DoF pose estimation of the sensor and solve the degenerate problem. Various experiments on public benchmarks as well as in real-world environments demonstrate the efficiency of the proposed method.

(3) A SLAM system is achieved based on a seamless fusion of heterogeneous multi-modal features (plane-line-point). First, a probabilistic fitting algorithm for the geometric feature is proposed to compute the parameters of planes and lines. By exploiting the error model of the depth sensor, the proposed probabilistic fitting is adaptive to various measurement noises corresponding to different depth measurements. As a result, the estimated parameters are more accurate and robust to the points with large uncertainties. Then, a constraint analysis is performed to quantitatively measure the constraints provided by different types of features on the pose estimation of a sensor. Using the results of the constraint analysis, a plane-line-point fusion-based SLAM method is proposed. Through the fusion of multi-modal features, both the structure and texture information is fully exploited and the problem of pose estimation remains well-posed in all circumstances. In addition, the comparison results of extensive experiments on public datasets demonstrate that the SLAM system seamlessly fusing multi-modal features can achieve high accuracy and robustness.

Key Words: Mobile robot; simultaneous localization and mapping; high-level feature; multi-modal feature fusion; multi-hypothesis framework; adaptive weighting

摘要

同时定位与建图(SimultaneousLocalizationandMapping,SLAM)是通过对传感器信息的处理，在未知环境中对移动机器人进行定位，并建立环境地图的过程。近年来，随着移动机器人在家庭服务、自动驾驶等领域的应用，SLAM技术也随之得到广泛的研究和发展。基于环境特征的机器人位姿求解与环境建模，是SLAM技术的主要实现途径。面向复杂多样的作业场景，不同种类的特征有着各自的优势和不足。特别是在结构化环境下，存在点、线、面等多模态特征，如何有效地组织、利用和融合这些特征，提升移动机器人对未知场景的适应性，逐渐成为SLAM领域的研究热点。

本文针对结构化环境下多模态特征无缝融合的移动机器人同时定位与建图问题进行研究，首先提出基于多假设框架的多模态特征混合关联算法，并针对多种模态特征对机器人位姿求解的约束进行分析，在此基础上，提出基于多模态特征无缝融合的机器人位姿估计算法。本文具体研究工作如下：

(1) 对SLAM中的多模态特征联合关联问题进行研究，提出基于多假设框架的面-线多模态特征混合关联方法，实现平面和直线两类高层几何特征的联合关联，基于此完成传感器位姿变换的解算。具体而言，设计基于多假设框架的假设树(InterpretationTree,IT)结构及其构建方法，根据结点间一致性原则生成假设路径，对于每一条假设路径，增量式维护并更新一致变换模型(ConsistentTransformationModel,CTM)。当IT结构构建结束时，即可同时获得特征关联问题以及位姿解算问题的封闭解。接着，提出多模态特征联合优化方法，对传感器位姿的求解结果以及特征参数进行进一步优化，并针对不同种类的几何特征，使用有针对性的参数化方法完成特征参数的更新。在联合优化的过程中，考虑特征提取及拟合过程中的参数不确定性，对各类特征在目标函数中的贡献进行自适应设计，有效降低参数不确定性的影响。公开数据集上的大量实验结果证明了所提方法的准确性和鲁棒性。

(2) 对传感器位姿求解问题的退化情况与特征空间分布的对应关系进行研究，并提出结构化环境下基于面-线多模态特征自适应融合的视觉里程计(Plane-Line-basedVisualOdometry,PLVO)。首先，提出平面-直线混合关联图(Plane-LineHybridAssociationGraph,PLHAG)结构，充分考虑平面和平面、平面和直线之间的几何关系，全面刻画多模态几何特征的空间分布特性。然后，提出基于平面和直线主辅相济、自适应融合的传感器位姿估计方法。具体而言，基于平面特征空间分布与传感器位姿求解退化之间的对应关系，并根据直线特征的空间分布以及直线与平面特征之间的几何关系，提出直线特征自适应加权方法，利用直线特征对平面无法约束的位姿自由度进行补充，从而实现两类特征的融合，解决了传感器位姿求解时的退化问题。最后，通过公开数据集上的定量实验以及真实室内环境下的机器人本体实验，验证了所提方法的有效性。

(3) 针对基于面-线-点多模态特征无缝融合的SLAM问题进行研究。首先，提出平面以及直线特征参数的概率拟合算法，通过对传感器观测误差的建模与传递，降低了数据点观测噪声的影响，使得拟合的平面和直线特征参数对观测噪声具有更强的鲁棒性。接着，针对多模态特征对传感器位姿求解问题的约束情况进行严格地定量分析，在此基础上，提出基于面-线-点多模态特征无缝融合的SLAM方法，在充分利用环境结构信息的基础上，进一步增强了对视觉纹理信息的开发和利用，与此同时，提高了位姿求解问题的适定性(well-posedness)。通过公开数据集上的定量实验对比，证明了多模态特征无缝融合方法的准确性和有效性。

关键词：移动机器人；同时定位与建图；高层特征；多模态特征融合；多假设框架；自适应加权

Recommended citation: Q. Sun. Simultaneous Localization and Mapping Based on Seamless Fusion of Heterogeneous Multi-Modal Features. Ph.D. dissertation, Nankai University, Tianjin, China, 2021. http://sunqinxuan.github.io/files/publications-2021-06-07-PhD.Dissertation.pdf

孙沁璇

Abstract

摘要