Learned Hallway Passing Behavior and UUV Diver Interaction

Learned Hallway Passing Behavior

The goal of the learned hallway passing behavior is to provide insight into lifelong learning in human-robot interaction. The passing behavior takes place when a robot and a human bystander approach each other head-on in a hallway, and both must deflect in order to avoid contact. While the direction the robot uses to pass can be programmed, the bystander's preferred passing direction is an unknown preference, one that can change slowly over time and is subject to noise. As with any robot designed for lifelong human-robot interaction, certain requirements must be met for the continued interactions to succeed: the robot's behavior must remain within predictable bounds, the robot must be able to identify and correct errors without a human expert supplying data, and the robot must meet the user's expectations of its interactive ability. Because the learned hallway passing behavior is already bounded and its errors are detectable, the last requirement, meeting the bystander's expectations, is the important one. Meeting the bystander's expectations means that the bystander does not attempt to pass along the same wall as the robot, and that the robot's behavior is predictable to a human, or, put another way, robust to noise. It is also why current solutions, in which the robot simply stops when it sees a bystander in the hallway, are not adequate: stopping suggests to the bystander that the robot is capable of more meaningful interaction than it actually is.

To learn the human's unknown preference robustly, so that the robot can keep moving during the interaction, an avoidance direction algorithm was developed, based on, but more robust than, the weighted majority algorithm. Like the weighted majority algorithm, the avoidance direction algorithm has error bounded by the amount of noise present in the system. Its robustness, measured by how often it switches its preferred turning direction, is determined by the frequency of the noise and is adaptive: the more frequently noise appears in the system, the more robust the algorithm becomes at ignoring it.

For an initial test of the algorithm, a MATLAB simulation was created. It showed that the observed error stayed within the theoretical error bound and that the algorithm could adapt to the level of noise in the system: the robot learned how much noise to expect and ignored all noise below that level. When the noise exceeded that level, representing a change in the human's preference, the simulation adapted to the new preferred direction.
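Since the avoidance direction algorithm builds on the weighted majority algorithm, the standard weighted majority update gives a feel for how the turning preference is tracked. Below is a minimal sketch in Python (the original work was simulated in MATLAB); the penalty factor BETA and all names are illustrative assumptions, and the noise-adaptive robustness described above is not reproduced here.

    BETA = 0.7  # penalty for a wrong prediction; illustrative value, not from the original work

    class PassingDirectionLearner:
        def __init__(self):
            # One weight per candidate passing direction.
            self.weights = {'left': 1.0, 'right': 1.0}

        def predict(self):
            # Deflect toward the direction with the larger weight.
            return max(self.weights, key=self.weights.get)

        def update(self, observed):
            # Penalize any direction that disagreed with the bystander's actual pass.
            for direction in self.weights:
                if direction != observed:
                    self.weights[direction] *= BETA

    learner = PassingDirectionLearner()
    for observed in ['left', 'left', 'right', 'left']:  # noisy observation stream
        robot_choice = learner.predict()
        learner.update(observed)

In the classical scheme, the number of mistakes is bounded relative to the best fixed direction, which matches the noise-bounded error described above; the avoidance direction algorithm additionally adjusts how readily the preference can flip based on how often noisy observations occur.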

Learned UUV Diver Interaction

The goal of learned UUV (Unmanned Underwater Vehicle) diver interaction was to create a learning algorithm that finds a safe, intuitive, and dynamical way for the UUV to communicate its intentions to a diver. In the setup of this problem, safe means that the UUV never engages in behavior that would put the diver at risk, such as getting too close, moving too fast, or losing sight of the diver; intuitive means that a diver can understand the commands without having to memorize the UUV's vocabulary beforehand; and dynamical means that the UUV can communicate only through motion.

A two-dimensional simulation environment was created in MATLAB. The simulated UUV had a set of 'words', basic moves it could use to maneuver around the diver. Basic diver dynamics were also modeled, in which the diver could either approach the UUV directly or move away from it. The diver dynamics were driven by an 'attention' function, determined by the UUV's position within the diver's cone of attention and personal space, the direction the UUV was facing, and the change in the UUV's acceleration; one plausible form of this function is sketched below. These parameters were chosen based on previous HRI research.

From here, a temporal difference Q-learning algorithm was developed for the UUV to learn how to lead a diver toward a goal location. The states were all defined relative to the diver and included the separation between the diver and the UUV, the relative UUV orientation, and the UUV velocity. The actions the UUV could perform at each step were a set of words consisting of moving toward or away from the diver, circling the diver, waving, or not moving. The reward was based on the diver's movement toward a goal point and on not breaking any of the safety constraints. The program was run for 5,000,000 trials, and the result was sets of words linked together into 'sentences' that appeared across various relative starting positions of the goal and the UUV (see summary).
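The attention function is described above only in terms of its inputs. The following Python sketch shows one plausible way those inputs could combine into a scalar attention score; the weightings, thresholds, defaults, and signature are all assumptions for illustration, not values from the original simulation.

    import math

    def attention(dist, bearing, facing, jerk,
                  cone_half_angle=math.radians(60), personal_space=1.5):
        # Illustrative diver-attention score; all weights and thresholds
        # are assumptions, not values from the original MATLAB model.
        score = 0.0
        if abs(bearing) < cone_half_angle:  # UUV inside the diver's cone of attention
            score += 1.0
        if dist < personal_space:           # UUV intruding on the diver's personal space
            score += 1.0
        score += math.cos(facing)           # facing the diver directly raises attention
        score += min(abs(jerk), 1.0)        # sudden acceleration changes draw attention
        return score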
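The learning component described above is a standard tabular temporal difference Q-learning loop. A minimal Python sketch follows; the hyperparameters, action names, and state discretization are chosen for illustration and are not taken from the original work.

    import random
    from collections import defaultdict

    ACTIONS = ['approach', 'retreat', 'circle', 'wave', 'hold']  # the UUV's 'words'
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # illustrative hyperparameters

    Q = defaultdict(float)  # Q[(state, action)] -> estimated value

    def choose_action(state):
        # Epsilon-greedy selection over the word set.
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def td_update(state, action, reward, next_state):
        # One-step Q-learning backup toward the greedy successor value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # States would be discretized diver-relative tuples, e.g.
    # (separation bin, relative orientation bin, UUV velocity bin);
    # the reward would combine diver progress toward the goal with
    # penalties for violating the safety constraints.
    s = (2, 1, 0)
    a = choose_action(s)
    td_update(s, a, reward=1.0, next_state=(1, 1, 0))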

Figure 1.

Figure 2.

Figure 3.