Object Rearrangement in Clutter for Mobile Manipulator Using Hybrid Soft Actor-Critic Method
- Authors
- Lee, Jin hwi; Kang, Seunghyun; Kim, Chang Hwan
- Issue Date
- 2023-07-24
- Publisher
- ECCOMAS Thematic Conference on Multibody Dynamics
- Citation
- 11th ECCOMAS Thematic Conference on Multibody Dynamics
- Abstract
- 1 Introduction
Bringing an object to people is an essential service in robot manipulation. A robot is often requested to plan an obstacle rearrangement task and execute it to grasp a target object when objects are placed in narrow spaces, such as cupboards, refrigerators,
and shelves. Figure 3 shows an example environment with objects on a shelf and a mobile manipulator. The robot is a
multibody system combining a multi-DoF manipulator, a mobile base, and a vision system. In such environments, objects
can only be grasped from the side or front, because the shelf boards prevent the manipulator from grasping from the top.
In particular, when obstacles surround a target object, the robot must move them aside before it can grasp the target.
In this kind of dense environment, a task and motion planner must determine which objects to move and where to relocate
them before grasping the target.
2 Method
We employ an actor-critic method because it handles the continuous action space by learning the policy directly, rather
than estimating a state- or Q-value function. In contrast, a value-based method such as DQN evaluates the state or Q-value
function via the Bellman equation, which is hard to compute over a continuous space. For our problem, we apply the hybrid
Soft Actor-Critic (hybrid SAC) of [1] to simultaneously handle the two characteristics of the object rearrangement task and
motion planning (OR-TAMP) problem. As shown in Fig. 1, the actor returns the mean and standard deviation from a state s,
as in the standard SAC algorithm in [2]. Unlike standard SAC, the actor additionally returns a discrete value through a
shared hidden layer.
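The hybrid action head described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a shared hidden layer feeds (a) a softmax over discrete choices (which obstacle to move) and (b) the mean and log standard deviation of a squashed Gaussian over the continuous relocation position. All layer sizes, weight initializations, and names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class HybridActorHead:
    """Sketch of a hybrid SAC actor: shared trunk, discrete + continuous heads."""

    def __init__(self, state_dim, n_discrete, cont_dim, hidden=64):
        # Randomly initialized weights; in practice these are trained by SAC.
        self.W1 = rng.normal(0, 0.1, (state_dim, hidden))
        self.W_disc = rng.normal(0, 0.1, (hidden, n_discrete))
        self.W_mu = rng.normal(0, 0.1, (hidden, cont_dim))
        self.W_logstd = rng.normal(0, 0.1, (hidden, cont_dim))

    def forward(self, s):
        h = np.tanh(s @ self.W1)                      # shared hidden layer
        logits = h @ self.W_disc                      # which obstacle to move
        mu = h @ self.W_mu                            # mean of placement position
        log_std = np.clip(h @ self.W_logstd, -20, 2)  # clamp, as common in SAC
        return logits, mu, log_std

    def sample(self, s):
        logits, mu, log_std = self.forward(s)
        p = np.exp(logits - logits.max())
        p /= p.sum()                                  # softmax over discrete choices
        k = rng.choice(len(p), p=p)                   # discrete: obstacle index
        # continuous: reparameterized Gaussian sample squashed by tanh
        xy = np.tanh(mu + np.exp(log_std) * rng.normal(size=mu.shape))
        return k, xy
```

In a full implementation, both heads would be trained jointly with the SAC critic and entropy objective; the sketch only shows how one network can parameterize the mixed discrete-continuous action space.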
3 Experiments
We compare the simulation results of the proposed method with those of the method in [3] as a baseline. Both methods are
run on the same object configurations, and both output rearrangement actions (i.e., which obstacles to move and where to
relocate them). We run 50 random instances for each of N = 5, 10, 15, where N is the number of objects. The results are
shown in Fig. 2. The proposed method relocates fewer obstacles and achieves higher success rates than the baseline.
Especially in very dense environments with many objects in a fixed workspace, the baseline often fails to find feasible
relocation positions and thus ultimately cannot grasp the target object.
4 Conclusions
We propose a reinforcement learning model based on hybrid SAC that addresses the two characteristics of the OR-TAMP
problem: determining which obstacles to move among multiple objects, formulated in a discrete space, and determining where
to relocate them in the continuous workspace. The method plans faster than the baseline and produces more feasible actions.
As expected, it attempts more rearrangement actions in denser object configurations. The planned actions are executed on a
real robot within acceptable runtime.
- URI
- https://pubs.kist.re.kr/handle/201004/76409
- Appears in Collections:
- KIST Conference Paper > 2023