A PHD FILTER BASED RELATIVE LOCALIZATION FILTER FOR ROBOTIC SWARMS

In this thesis, we present a Probability Hypothesis Density (PHD) filter based relative localization system for robotic swarms. The system is designed to use only local information, collected by onboard lidar and camera sensors, to identify and track other swarm members in proximity. The multi-sensor setup compensates for the inability of any single sensor to provide enough information for the simultaneous identification of teammates and estimation of their positions. However, it also requires sensor fusion techniques that do not rely on complex computer vision or recognition algorithms, due to the robots' limited computational capabilities. The choice of the PHD filter is motivated by its inherently multi-sensor formulation. Moreover, it aligns well with the overall goal of a localization system and swarm setup that do not require associating a unique identifier to each team member. The system was tested on a team of four robots. The content of this thesis was accepted at the DARS-SWARM 2021 conference [1].


Introduction
In recent years, robotic swarms have been receiving increasing attention thanks to their many potential applications [4]. Tasks such as target search and tracking [5], search and rescue [6], exploration [7], information gathering, clean-up of toxic spills [8,9], and construction [10] have been proposed throughout the years.
A robotic swarm is defined as a group of low-cost, relatively simple robots intended to perform tasks in unknown or unexplored environments. Each individual member performs its own task, and collectively the swarm works towards the main goal.
Robotic swarms depend heavily on the locations of their members. A centralized common positioning system is not feasible in the above-mentioned environments; hence each member must know the positions of the other members in its attached frame of reference. This knowledge is gained by using its own sensors, without depending on an external system.
Most works focus on control algorithms; however, many control laws and collaborative swarm behaviors require the ability to identify other robots in the environment, and compute an estimate of their position.
To retrieve this information, many localization algorithms for different operative conditions have been proposed for multi-robot systems. In cooperative localization (e.g., [11]), the robots exchange odometry and relative measurements to compute the location of each team member in a common frame of reference, usually through an online Bayesian filter (e.g., [12,13]) or estimator (e.g., [14]). However, the assumption of a common frame of reference constitutes a form of centralization and should be avoided in a robotic swarm setting.

Localization methods in a Swarm
In relative localization algorithms, the assumption of a common frame of reference is relaxed, and the goal of each robot is to estimate the pose of the other robots in its attached frame of reference. This has been addressed through Bayesian filters [15], geometrical arguments [16], or a combination of both [17]. Usually, both relative and cooperative localization algorithms require not only position, bearing or distance measurements, but also that each measurement comes with the unique identifier of the measured robot. However, unique identification of each robot may be difficult or undesirable. Typical approaches include visually tagging each robot and extracting the tag through cameras [18], using dedicated infrared systems [19], or RFIDs [20]. Tagging and ID exchange are in many cases not viable solutions: they can be technically unfeasible, particularly with large numbers of robots or with sensors, such as laser scanners, that do not offer unique identification capabilities. They also constitute a form of centralization, since all robots need to know the same set of IDs. Last but not least, making the identity of each robot explicit may jeopardize the task, if the swarm is for example on an escorting or disguising mission.
In a number of papers, the problem of computing an estimate of other robots' locations with untagged measurements has been referred to as localization with anonymous measurements [17,21], or with unknown data association. In [21], using odometry and untagged relative measurements communicated from other robots, the robots were able to produce tagged (i.e., associated with a unique identifier) relative pose estimates.
However, associating IDs to each robot is not a mandatory condition to perform cooperative tasks such as formation control [22], encircling [23], and connectivity maintenance [24], as long as each robot is able to recognize that some entities in the environment are generically teammates, and to compute an estimate of their relative positions. Moreover, in a robotic swarm setup, robots could have limited or no communication capabilities, and should rely only on locally self-gathered measurements to perform their tasks.
In this situation, the choice of the sensor equipment endowed to the robots becomes even more crucial. On the one hand, the sensors should provide enough information to allow (non-unique) identification of other robots and quantitatively accurate estimates of their positions; on the other hand, the required processing must fit within the robots' limited computational capabilities [25,26]. RGB-D sensors offer the best of both worlds, but usually have limited fields of view, while the robots should be aware of teammates and obstacles in the entirety of their surroundings.

PHD filter approach
Given the multi-target, multi-sensor tracking nature of the proposed problem, a natural choice is the employment of a Probability Hypothesis Density (PHD) filter. The PHD filter was first proposed in [27] as a recursive filter for multi-target multi-sensor tracking. In its theoretical form, the filter would require infinite computational power; however, Gaussian mixture [28] and particle-based [29] implementations, among others, have been proposed.
PHD filters have already been employed in multi-robot localization. In [30], the authors presented a PHD filter to incorporate absolute poses exchanged by the robots and local sensory measurements to maintain the robots' formation when communication fails. In [31], a team of mobile sensors was employed to cooperatively localize an unknown number of targets via a PHD filter. However, these two works assume a common frame of reference, which is not compatible with our setup. In [32], the authors implemented a PHD filter to fuse ground robot (UGV) odometry and aerial camera measurements to estimate the location and identity of the UGVs; however, only the aerial robot, and not every team member, computes the position of the other robots. In [33], the authors used two different visual features to describe the target of interest, enhancing PHD-based tracking; however, this is an example of video tracking, and the metric pose of the targets is not estimated. Therefore, none of the setups discussed in the literature is compatible with the needs of a robotic swarm and with our settings.

Statement of the problem
In this thesis we propose a multi-modal approach in which we employ multiple sensors, fisheye cameras and laser scanners, to combine the recognition capability of the former with the accuracy of the latter. This approach, however, requires non-trivial data fusion techniques. Hence, we propose a novel robo-centric implementation of the PHD filter for the fusion of lidar and camera measurements in a swarm setup.
The proposed filter runs independently on each robot to compute estimates of the other teammates' positions in its attached frame of reference. The measurements for the filter are obtained using the onboard lidar and camera sensors, without depending on any external sensor data. The filter computations are performed on the onboard main processing unit, which computes the estimates in real time.
The filter is first tested in a simulated robotic swarm in which each swarm member is equipped with the above-mentioned sensors. We then develop a UGV (Unmanned Ground Vehicle) to test the filter in a real robotic swarm; each UGV is equipped with said sensors and a processing unit, as required by the filter. The issues that arose during the experiments are investigated and addressed by tuning the filter parameters.

CHAPTER 2
Problem Setting and Background

The system we consider (Figure 1) consists of n UGVs {R_1, R_2, ..., R_n} in a 2D space, with n unknown and time-varying. The generic robot R_j is modeled as a rigid body moving in 2D space and is equipped with an attached reference frame F_j = {O_j, X_j, Y_j} whose origin coincides with a representative point of the robot.
Let q_h^j ∈ R² and ψ_h^j ∈ SO(1) be, respectively, the position and orientation of R_h in F_j, and let o_h^j be the position of O_h in F_j. In the following, we indicate with R(φ) the elementary 2D rotation matrix of an angle φ:

R(φ) = [ cos φ  −sin φ
         sin φ   cos φ ]

Robot R_j is equipped with multiple sensors. First, the odometry module of R_j provides, at each time k, a measurement U_k^j = [Δx_k^j Δy_k^j Δψ_k^j]^T ∈ R² × SO(1) of the robot's linear and angular displacement between two consecutive sampling instants k−1 and k on the XY plane.
R_j is also equipped with a lidar sensor. Lidar measurements are processed with a feature extraction algorithm that identifies all objects in the scan (including robots) whose size is comparable with that of the robots. In general, we assume that there is an unknown number of objects in the environment that will be detected in the lidar scan as possible robots. Therefore, at each time step k the algorithm provides a set of l_k relative position measurements L_k = {l_k^1, ..., l_k^{l_k}} in F_j, representing the positions of robots or obstacles in the field of view of the sensor. The sensor is affected by false positives (some measurements may not refer to actual objects) and false negatives (some objects or robots may not be detected), due to obstructions and errors of the feature extraction algorithm.

Lastly, two fisheye cameras are mounted on R_j, one oriented towards the front of the robot and one towards the back. This setup allows the identification of robots in a 360° field of view. The images from the cameras are processed with a feature extraction algorithm that identifies a generic robot based on color; the algorithm does not uniquely identify and label each robot. At time step k, the cameras provide a set of c_k bearing measurements C_k = {c_k^1, ..., c_k^{c_k}} in F_j. Also in this case, there may be false positive (non-robots identified as robots) and false negative (missed robot detections) measurements. In the following, the camera and lidar measurements collected at time k are denoted together as Z_k = L_k ∪ C_k. Note that camera and lidar have different rates and are in general not synchronized; therefore, without loss of generality, for some k it may be either L_k = ∅, or C_k = ∅, or both. A representation of all sensor readings is provided in Figure 1.
The objective of R j is to compute at each time step k an estimate of the number n(k) and positions of all robots in the environment.

Multi-sensor PHD filter
This section provides the necessary background on the PHD filter and is mostly based on [27,28,29]. Assuming that there are n (with n unknown and variable over time) targets living in a space X, the goal of the standard PHD filter is to compute an estimate of the PHD of the targets in X. The PHD f_k(x) at time k is defined as the function such that its integral over any subset S ⊆ X is the expected number of targets in S. The PHD filter is a recursive estimator composed of two main steps: a time update and a measurement update. The time update produces a prediction f_{k|k−1}(x) of the PHD at time step k given the estimate f_{k−1|k−1}(x) at time k−1, through the time update equation:

f_{k|k−1}(x) = b_k(x) + ∫ [ P_s(x′) f_{k|k−1}(x|x′) + b_{k|k−1}(x|x′) ] f_{k−1|k−1}(x′) dx′    (2)

where b_k(x) is the probability that a new target appears in x between times k−1 and k, P_s(x′) is the probability that a target in x′ at time k−1 will survive into step k, f_{k|k−1}(x|x′) is the probability density that a target in x′ moves to x, and b_{k|k−1}(x|x′) is the probability that a new target spawns in x at time k from a target in x′ at time k−1.
Note that both f_{k−1|k−1}(x) and f_{k|k−1}(x) are computed considering only the measurements up to time k−1. The measurements Z_k collected at time k are incorporated in the estimate through the measurement update, which computes the posterior PHD:

f_{k|k}(x) = [1 − P_D(x)] f_{k|k−1}(x) + Σ_{z ∈ Z_k} ( P_D(x) g(z|x) f_{k|k−1}(x) ) / ( λ c(z) + ∫ P_D(x′) g(z|x′) f_{k|k−1}(x′) dx′ )    (3)

where P_D(x) is the probability that an observation is collected from a target with state x, g(z|x) is the sensor likelihood function, and λ c(z) expresses the probability that a given measurement z is a false positive.
Although elegant, equations (2) and (3) do not admit a closed-form solution in the general case. The Gaussian Mixture PHD (GM-PHD) filter [28] addresses this issue by representing the PHD as a weighted sum of Gaussian components:

f_{k−1|k−1}(x) = Σ_i w^i_{k−1|k−1} N(x; m^i_{k−1|k−1}, P^i_{k−1|k−1})    (4)

Therefore, the PHD prediction will have a component for each component in the PHD posterior f_{k−1|k−1}(x). Moreover, the integral term is the same as the prediction step of the standard Kalman filter, so every component of f_{k|k−1}(x) will be a Gaussian function, and the PHD prediction can be computed by simply applying, component-wise, the time update of a Kalman filter.
Introducing the GM representation (4) into equation (3), assuming that the probability of detection can be approximated as a constant P_D(x) ≈ P^i_D for each component, and c(z) = 0, the GM-PHD filter measurement update equation becomes:

f_{k|k}(x) = Σ_i (1 − P^i_D) f^i_{k|k−1}(x) + Σ_{z ∈ Z_k} Σ_i ( P^i_D g(z|x) f^i_{k|k−1}(x) ) / ( Σ_l ∫ P^l_D g(z|x′) f^l_{k|k−1}(x′) dx′ )    (6)

showing that, if Z_k contains m measurements, each component f^i_{k|k−1}(x) generates m + 1 components in f_{k|k}(x). Moreover, if g(z|x) is a Gaussian function, the last term is a sum of Gaussian functions, each being the result of a single-component Kalman filter measurement update step.
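As an illustration of the component-wise structure of this update, the following Python sketch generates, for each prior component, one missed-detection copy and one Kalman-updated copy per measurement, under the zero-clutter assumption c(z) = 0 used above. This is an illustrative sketch, not the thesis implementation; the function name and parameters are assumptions.

```python
import numpy as np

def gm_phd_update(components, measurements, H, R, p_d):
    """One GM-PHD measurement update (illustrative sketch).

    components: list of (w, m, P) tuples -- weights, 2D means, covariances.
    measurements: list of 2D position measurements.
    Returns the posterior mixture: one missed-detection copy per prior
    component plus one updated copy per (component, measurement) pair.
    """
    posterior = []
    # Missed-detection terms: weight scaled by (1 - P_D).
    for w, m, P in components:
        posterior.append(((1.0 - p_d) * w, m, P))
    # Detection terms: standard Kalman update per component/measurement.
    for z in measurements:
        updated = []
        for w, m, P in components:
            S = H @ P @ H.T + R                 # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
            m_new = m + K @ (z - H @ m)
            P_new = (np.eye(len(m)) - K @ H) @ P
            # Gaussian likelihood of z under the predicted component.
            nu = z - H @ m
            lik = np.exp(-0.5 * nu @ np.linalg.inv(S) @ nu) / \
                  (2 * np.pi * np.sqrt(np.linalg.det(S)))
            updated.append((p_d * w * lik, m_new, P_new))
        # Normalize detection weights per measurement (no clutter term,
        # consistent with the c(z) = 0 assumption above).
        total = sum(w for w, _, _ in updated)
        if total > 0:
            updated = [(w / total, m, P) for w, m, P in updated]
        posterior.extend(updated)
    return posterior
```

With one prior component and one measurement, the posterior contains the expected m + 1 = 2 components.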
An additional pruning step is needed to limit the number of components in the PHD. In fact, if all components were kept at each time step, their number would grow exponentially with the number of measurements. Therefore, all components whose weight is below a given threshold at the end of the measurement update are eliminated.
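A minimal sketch of such a pruning step follows; the weight threshold and the cap on the number of components are assumed design values, not the ones used in the thesis.

```python
def prune(components, weight_threshold=1e-3, max_components=50):
    """Discard low-weight components and cap the mixture size (sketch).

    components: list of (weight, mean, covariance) tuples.
    Keeps only components whose weight reaches the threshold, then
    retains at most max_components of them, highest weights first.
    """
    kept = [c for c in components if c[0] >= weight_threshold]
    kept.sort(key=lambda c: c[0], reverse=True)
    return kept[:max_components]
```

In practice GM-PHD implementations often also merge components that are close to each other; only the thresholding described in the text is shown here.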
It is clear from its formulation that the PHD filter is inherently multi-sensor: when multiple sensors are present, multiple measurement updates can be applied consecutively, each one as a component-wise Kalman filter update step.
In our robo-centric implementation, each component of the PHD is a 2D Gaussian with mean m^{i*}_{k|k} ∈ R², where the * expresses the concept that the i-th component may refer to any of the tracked target robots. The covariance of m^i_{k|k} is p^i_{k|k} ∈ R^{2×2}.

Time Update
During the time update, the odometry U^j_k = [Δx^j_k Δy^j_k Δψ^j_k]^T of the owner robot R_j is used to update the mean and covariance of all components of the PHD. The k-th time update for the generic i-th component is given by:

m^i_{k|k−1} = R(−Δψ^j_k) ( m^i_{k−1|k−1} − [Δx^j_k Δy^j_k]^T )
p^i_{k|k−1} = R(−Δψ^j_k) p^i_{k−1|k−1} R(−Δψ^j_k)^T + Q_{k−1}
w^i_{k|k−1} = P^i_s w^i_{k−1|k−1}

where P^i_s is the survival probability of the i-th component f^i_{k−1|k−1} from time step k−1 to time step k, and Q_{k−1} is the system noise. Ideally, the survival probability P^i_s depends on the real probability that a target disappears; in a robotic swarm context, this probability would be extremely low in the whole domain. Therefore, we use it as a design parameter to meet the objectives of the localization module. Coherently with the task and motivation of this thesis, only local information is required and available to each robot: through P^i_s, we let components that are too far away fade. To this aim, we use an inverse sigmoid function of the distance ||m^i_{k|k−1}|| (Figure 3) to compute P^i_s. This creates a circular area around R_j within which it tracks the other robots. To account for targets that enter this area from outside, a birth component b_{k|k−1}(x) is added at each time update; its mean, covariance and weight are design parameters. The assigned weight is very low, so that unless at least one measurement has no good correspondence with the components of the PHD prior, b_{k|k−1}(x) will be pruned immediately. The choice of limiting the area in which each robot tracks its teammates is also beneficial for the scalability of the method: even if the swarm comprised hundreds of agents, each robot would only track the ones closest to it, therefore linking the computational complexity of the filter to the density of the swarm rather than to the total number of robots.
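The inverse-sigmoid survival probability can be sketched as follows; the radius and steepness below are illustrative design parameters, not the values used in the thesis.

```python
import math

def survival_probability(mean_xy, r0=1.5, steepness=8.0):
    """Inverse-sigmoid survival probability P_s^i (sketch).

    Close to 1 for components near the robot, fading towards 0 for
    components beyond the tracking radius r0 (assumed value, in meters).
    mean_xy: component mean in the robot's attached frame.
    """
    d = math.hypot(mean_xy[0], mean_xy[1])
    return 1.0 / (1.0 + math.exp(steepness * (d - r0)))
```

Components that drift beyond r0 receive a near-zero survival probability and are eliminated by the subsequent pruning step.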

Lidar Measurement Update
After the time update, the lidar measurement update is performed, whenever new measurements are available. We assume that the lidar at time k collects l_k position measurements l^h_k ∈ L_k, h = 1, ..., l_k, in F_j. Following equation (6), each component f^i_{k|k−1} first generates a missed-detection component, whose weight is scaled by (1 − ^L P^i_d), where ^L P^i_d is the probability that a target corresponding to component f^i_{k|k−1} is detected by the lidar. The other l_k components are created using the measurement update equations of the Kalman filter, where ^L H_k is the lidar observation matrix, K_k is the associated Kalman gain, and ^L R_k is the covariance of the noise on the lidar measurements, determined experimentally.

The probability of detection ^L P^i_d is a key parameter for the success of the filter. For each component f^i_{k|k−1}, ^L P^i_d is computed considering four factors that limit the ability of the lidar sensor to detect objects: distance, blind spots caused by the camera holders, obstruction of a robot by another robot, and interference caused by other lidar sensors. The final ^L P^i_d is the product of all these factors.

The first factor, ^L P^i_{d|dis}, is related to the lidar sensor range: if a component is located beyond the range of the lidar (limited to 2 m in our case), it will not be detected. A sigmoid function of the distance of each component f^i_{k|k−1} is used to compute ^L P^i_{d|dis}. The second factor is due to the two pillars that support the fisheye cameras on R_j, which create four blind spots in the field of view (FOV) of the lidar, whose centers α_i and angular widths β_i, i = 1, ..., 4, were determined experimentally.
Denoting with θ^i_{k|k−1} the bearing angle of the mean of the i-th component, a sum of Gaussian functions is used to compute ^L P^i_{d|cs}.

Figure 4: Probability of detection for blind spots created by camera structure

The third factor ^L P^i_{d|b} models the situation in which robots block each other from the FOV of the lidar, which is therefore unable to collect a measurement for the robot that is behind; a similar situation is shown in Figure 5. Hence, the probability of detection of each component is reduced through a zero-mean Gaussian function ^L P^i_{d|b} based on i) the angle difference θ_diff between pairs of components; and ii) their Euclidean distance ||m^i_{k+1|k}|| from R_j. When θ_diff becomes close to zero for some pair, the robot with the shortest Euclidean distance from R_j blocks the other; for the generic component f^i_{k+1|k}(x), ^L P^i_{d|b} is computed using all the components.

The last factor ^L P^i_{d|in} models the interference caused by the lidar sensors mounted on the other robots, which we noticed during the testing phase. Whenever two lidar sensors point at each other, their readings record null (invalid) measurements in correspondence to the other robot. Given the rotational nature of the internal mechanical structure of the lidars, this interference manifests itself as the loss of a measurement, associated with the appearance of null measurements with a pseudo-periodic pattern. To model this interference, we exploit the fact that, along with the measurements, the lidar provides the intensity of the returning laser beam: when interference occurs, the recorded intensity l^ϑ_int ∈ L_k, ϑ = 0, 0.5, ..., 360, drops to zero. The intensity readings are therefore used to compute ^L P^i_{d|in} for f^i_{k+1|k}(x), modeling each zero-intensity reading l^ϑ_int = 0 with a Gaussian of covariance c.
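To illustrate the multiplicative structure of ^L P^i_d, the sketch below combines a distance sigmoid with Gaussian blind-spot notches; the occlusion and interference factors would multiply in the same way. All numeric parameters (sigmoid steepness, blind-spot geometry) are assumed for illustration, not the thesis values.

```python
import math

def lidar_detection_probability(mean_xy, blind_spots, lidar_range=2.0):
    """Sketch of the lidar probability of detection as a product of factors.

    Only the distance and blind-spot factors are shown here.
    mean_xy: component mean in the robot frame.
    blind_spots: list of (center_rad, width_rad) pairs (assumed values).
    """
    d = math.hypot(*mean_xy)
    theta = math.atan2(mean_xy[1], mean_xy[0])
    # Distance factor: sigmoid dropping near the sensor range.
    p_dis = 1.0 / (1.0 + math.exp(10.0 * (d - lidar_range)))
    # Blind-spot factor: Gaussian notches at the pillar bearings.
    p_cs = 1.0
    for alpha, beta in blind_spots:
        # Wrapped angular difference between the component and the pillar.
        diff = math.atan2(math.sin(theta - alpha), math.cos(theta - alpha))
        p_cs *= 1.0 - math.exp(-0.5 * (diff / beta) ** 2)
    return p_dis * p_cs
```

A component located exactly behind a pillar receives probability zero, while a nearby unobstructed component keeps a probability close to one.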
Camera Measurement Update

The camera measurement update has the same structure as the lidar one: each component f^i_{k|k−1} generates a missed-detection component whose weight is scaled by (1 − ^C P^i_d), where ^C P^i_d is the camera probability of detection. The other c_k components are computed using the Kalman filter equations, where H_k is the observation matrix obtained by linearizing the bearing measurement model around m^i_{k|k−1} = [x_i y_i]^T, K_k is the Kalman gain, and ^C R_k = 25 deg is the covariance of the noise of the camera measurements.
The probability of detection ^C P^i_d is computed as the product of two factors: the distance from R_j, and the obstruction of a robot by another robot. The first factor is computed with an inverse sigmoid function of the distance, while for the second factor ^C P^i_{d|b} = ^L P^i_{d|b}.

CHAPTER 4
Simulation

Figure 6: Gazebo 20 UGVs simulation

The filter was first tested in simulation before the real robot experiments.
A swarm of UGVs was simulated in Gazebo/ROS. The UGV model was equipped with a simulated lidar sensor. The implementation of a fisheye camera was difficult, due to the unavailability of its manufacturer parameters and the computational load of simultaneously simulating 10-20 camera sensors; hence, the camera measurements were generated directly from the UGV locations, with zero-mean Gaussian noise added to make them realistic.
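The simulation shortcut described above, bearings generated from ground-truth poses plus Gaussian noise, can be sketched as follows; the noise level is an assumed value.

```python
import math
import random

def simulated_bearing(own_pose, other_xy, noise_std_deg=3.0):
    """Generate a camera-like bearing measurement from ground-truth poses
    (sketch of the simulation shortcut; noise_std_deg is an assumed value).

    own_pose: (x, y, psi) of the observing UGV in the world frame.
    other_xy: (x, y) of the observed UGV in the world frame.
    Returns the noisy bearing in the observer's frame, wrapped to (-pi, pi].
    """
    x, y, psi = own_pose
    dx, dy = other_xy[0] - x, other_xy[1] - y
    # Bearing in the observer's frame, plus zero-mean Gaussian noise.
    bearing = math.atan2(dy, dx) - psi
    bearing += math.radians(random.gauss(0.0, noise_std_deg))
    # Wrap to (-pi, pi].
    return math.atan2(math.sin(bearing), math.cos(bearing))
```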

Simulation results
Three plots were generated from the simulation results to evaluate the performance of the filter and illustrate the benefits of the proposed multi-sensor approach. The filter was run both with (Lidar+Camera, LC) and without camera measurements (Lidar only, LO).
The bar chart in Figure 7 summarizes the comparison between the two methods, while the plot in Figure 9 shows the path of each UGV during the simulation.

CHAPTER 5

UGV Design

The design consists of a four-wheeled base equipped with two omni-directional cameras, one lidar sensor, a main processing unit, a lower-level processing unit, wheel encoders, and a 12 V battery pack to provide power. Additionally, each UGV was retrofitted with a red color strip on the camera structure to improve its detection in the camera measurements.

UGV Platform
The UGVs were constructed using a commercially available four-wheeled differential drive robot platform, the DFRobot Cherokey (22.5 cm x 17.5 cm). Each UGV is equipped with wheel encoders and a Romeo V2, an Arduino robot board (Arduino Leonardo) with an integrated motor driver. The Romeo V2 processes and executes the low-level control to follow the desired velocity commands.

Figure 11: Robot Platform - DFRobot Cherokey [2]

The DFRobot Cherokey has two levels in its platform. The lower level is embedded with the motor controllers and the power distribution for the motors.
The space between the two levels is used to accommodate the battery pack, the main processing unit, the voltage divider circuits and the Arduino robot board. The top level is used to attach the three sensors. The angular displacement θ_t is computed from the encoder readings, where P^L_t, P^R_t are the pulses of the left-side and right-side motors, r_w is the wheel radius, and w_ax is the UGV axle width.

Main Processing unit

The main processing unit is an Odroid single-board computer, which runs the filter computations in real time.
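A standard differential-drive odometry update consistent with these quantities might look as follows; the wheel radius, axle width and encoder resolution below are assumed values, not the Cherokey's actual specifications.

```python
import math

def odometry_update(pL, pR, pose, r_w=0.021, w_ax=0.14, ticks_per_rev=960):
    """Differential-drive odometry from encoder pulses (sketch).

    pL, pR: encoder pulses of the left and right motors since last update.
    pose: current (x, y, theta) estimate.
    r_w, w_ax, ticks_per_rev: assumed wheel radius (m), axle width (m),
    and encoder resolution, for illustration only.
    """
    # Distance travelled by each wheel since the last update.
    dL = 2 * math.pi * r_w * pL / ticks_per_rev
    dR = 2 * math.pi * r_w * pR / ticks_per_rev
    d_theta = (dR - dL) / w_ax          # angular displacement theta_t
    d_center = (dR + dL) / 2.0          # linear displacement of the center
    x, y, theta = pose
    # Midpoint integration of the planar motion.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    return (x, y, theta + d_theta)
```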

UGV Sensors
Each UGV is equipped with a lidar and two omni-directional cameras, integrated to provide a 360-degree field of view. All three sensors are connected to the Odroid through standard USB ports. Both the lidar and the cameras are fixed at the origin O_j of F_j and aligned with the X axis, eliminating rotational and translational offsets during the image and scan processing.

Omni-directional Cameras
The two cameras are standard USB web cameras equipped with 180-degree fisheye lenses. Each camera provides two-megapixel (1920x1080) images at 30 fps.

Figure 17: Camera Sensor
A camera holder was designed and 3D printed to hold the cameras above the lidar, at a height of 130 mm from the UGV and at a 45-degree downward angle. This height and angle were chosen to maximize the horizontal FOV of the combined camera images.
All the UGVs have been equipped with a red strip around the camera holder in order to allow camera tracking via color extraction.

Figure 18: Front and rear camera images

The testing area
The testing area is a 3 m x 3 m square space with raised walls, to avoid disturbances from the external environment and to keep the robots in proximity of each other. It is, however, larger than the FOV of the lidar, so that robots can exit and re-enter each other's tracking radius. The ground-truth position of the robots is provided at each time by an OptiTrack 6-DoF motion tracking system.

Figure 19: Testing area

Lidar Measurements
An algorithm was developed to process each lidar scan published on the ROS topic and search for other UGVs. During a typical lidar scan, the laser beam is reflected when it hits another UGV's lidar structure, which is circular in shape.
As seen in Figure 20, each UGV is represented by a small set of points that forms an arc when inspected closely. The arc length of each such cluster of points was computed and compared with the diameter of the lidar structure to distinguish UGVs from larger obstacles. The measured diameter of a typical lidar structure is 65 mm, so clusters with arc length below 65 mm were extracted as possible measurements of other UGVs. The range and bearing of these candidates were converted into Cartesian coordinates, rotated into the UGV frame of reference, and provided as lidar measurements to the filter.
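The arc-length test described above can be sketched as follows; the scan resolution and the clustering gap are assumed values, while the 65 mm threshold is the measured lidar-structure diameter.

```python
import math

def extract_ugv_candidates(ranges, angle_inc_deg=0.5, max_arc=0.065,
                           max_gap=0.05):
    """Cluster consecutive lidar returns and keep clusters whose arc length
    is at most the lidar-structure diameter (65 mm), as described above.

    ranges: list of range readings (None or inf for invalid returns).
    Returns candidate (x, y) positions in the sensor frame.
    angle_inc_deg and max_gap are illustrative assumed parameters.
    """
    clusters, current = [], []
    for i, r in enumerate(ranges):
        if r is None or math.isinf(r):          # invalid return ends a cluster
            if current:
                clusters.append(current)
                current = []
            continue
        if current and abs(r - current[-1][1]) > max_gap:
            clusters.append(current)            # range jump: new cluster
            current = []
        current.append((i, r))
    if current:
        clusters.append(current)

    candidates = []
    for c in clusters:
        mean_r = sum(r for _, r in c) / len(c)
        span = math.radians(angle_inc_deg) * (len(c) - 1)
        if 0 < span * mean_r <= max_arc:        # arc length vs 65 mm diameter
            mid_idx = c[len(c) // 2][0]
            ang = math.radians(mid_idx * angle_inc_deg)
            candidates.append((mean_r * math.cos(ang), mean_r * math.sin(ang)))
    return candidates
```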

Camera Measurements
The camera measurements are generated using a color extraction method. The color extraction algorithm first defines boundaries in the RGB (Red, Green, Blue) color space to isolate the red color. When an image is received from a UGV camera, it is passed through the color filter to eliminate all colors except red. The remaining red blobs are extracted, and their pixel coordinates are used to compute the bearing measurements.
A spherical coordinate system was used to compute the angular measurements from the pixel coordinates. Given the pixel coordinates x_p, y_p of an identified blob and the pixel distance D_s between the sensor and the fisheye lens, the angular measurement in the camera frame of reference, c_θ1, is computed and then rotated into the UGV frame, where R^{R_j}_c is the rotation matrix from the camera frame of reference to the UGV frame of reference, and y_{R_j}, x_{R_j} are the coordinates in the UGV frame of reference. Figure 22 shows the geometric argument for the sensor and the lens.
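One plausible reading of this geometry, sketched below, computes the bearing of a blob from its pixel offset with respect to the image center and the sensor-to-lens pixel distance D_s. The image-center inputs are assumptions, and the thesis' exact projection model may differ.

```python
import math

def pixel_to_bearing(x_p, y_p, cx, cy, D_s):
    """Sketch of a spherical-coordinate bearing computation.

    x_p, y_p: pixel coordinates of the detected blob.
    cx, cy: image center in pixels (assumed inputs).
    D_s: sensor-to-lens distance expressed in pixels (as in the text).
    Returns (azimuth, elevation) in the camera frame, in radians.
    """
    u, v = x_p - cx, y_p - cy
    azimuth = math.atan2(u, D_s)        # horizontal bearing in camera frame
    elevation = math.atan2(v, D_s)      # vertical angle
    return azimuth, elevation
```

The azimuth would then be rotated into the UGV frame through R^{R_j}_c, as described above.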

CHAPTER 6
Experimental Results

The experimental data was analysed in Matlab to highlight the benefit of the proposed multi-sensor approach, with (Lidar+Camera, LC) and without (Lidar only, LO) the camera measurements. Figure 23 shows the distance error of each robot from the closest component whose weight is greater than 0.1. Overall, the LC method performs well, except for some instants near times 150 s and 190 s, in which the robot performing the estimate was consistently in a corner of the arena, hence with a limited field of view, effectively leading to UGV 07's position not being measured for several tens of seconds. The plots also show that the LC method outperforms the LO method, being able to keep the error bounded for most of the time when measurements are available.
A numerical comparison between the LC and LO methods is provided in Figure 24. To quantify the better performance of the LC filter, we computed the percentage of time for which each robot's distance error is greater than 30 cm. The values, reported in the table in Figure 24 (left), show how the use of camera measurements in addition to the lidar reduces the error time by a factor of 2 to 5. Finally, in Figure 24 (right) we report the total sum of the weights of all components over time during the whole experiment.
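The 30 cm error-time metric used in this comparison can be computed as follows, assuming uniformly sampled error traces.

```python
def error_time_percentage(errors, threshold=0.30):
    """Percentage of samples for which the distance error exceeds the
    threshold (30 cm, as in the comparison above).

    errors: per-step distance errors in meters, uniformly sampled,
    so the sample fraction equals the time fraction.
    """
    return 100.0 * sum(1 for e in errors if e > threshold) / len(errors)
```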
From this plot, it is possible to establish that the LC method is more effective in correctly estimating the number of robots, and therefore in eliminating estimates that refer to objects in the environment that are not robots.

CHAPTER 7

Conclusions

In this thesis, we presented a relative localization system for robotic swarms based on the PHD filter. Our system has been tested with real robot experiments, and evaluated against a single-sensor method based on the same principle. The results show that the multi-sensor approach performs better than the single-sensor method.
In the future, on the one hand, we plan to improve the relative localization and include negative information measurements to simultaneously track robots and obstacles. On the other hand, we plan to pair the localization system with decentralized formation control methods to perform real-world tasks such as exploration, SLAM, patrolling and human-swarm interaction.