Lee, Jin hwi; Kang, Seunghyun; Kim, Chang Hwan 2024-01-12T02:45:37Z 2024-01-12T02:45:37Z 2023-11-30 2023-07-24 https://pubs.kist.re.kr/handle/201004/76409

1 Introduction
Bringing an object to a person is an essential service in robot manipulation. When objects are placed in narrow spaces such as cupboards, refrigerators, and shelves, a robot is often required to plan and execute an obstacle rearrangement task before it can grasp a target object. Figure 3 shows an example environment with objects on a shelf and a mobile manipulator. The robot is a multibody system that combines a manipulator with multiple DoF, a mobile device, and a vision system. In such environments, objects have to be grasped from the side or the front because the shelf boards prevent the manipulator from grasping from the top. In particular, when other objects stand around the target object, the robot must move these obstacles aside before grasping the target. In this kind of dense environment, a task and motion planner should determine which objects to move and where to relocate them before grasping the target.

2 Method
We employ an actor-critic method because it handles a continuous action space by learning the policy directly rather than deriving it from a state-value or Q-value function. In contrast, a value-based method such as DQN evaluates the state-value or Q-value function using the Bellman equation, which is hard to compute in a continuous space. We apply the hybrid Soft Actor-Critic (hybrid SAC) of [1] to handle the two characteristics of the object rearrangement task and motion planning (OR-TAMP) problem simultaneously. As shown in Fig. 1, the actor takes a state s and returns a mean and a standard deviation, as in the standard SAC algorithm of [2]. Unlike standard SAC, the actor also returns a discrete value from a head that shares the hidden layer.
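The actor structure described in the Method section can be illustrated with a minimal sketch: one shared hidden layer feeding a continuous head (mean and standard deviation) and a discrete head. Everything specific here is an assumption for illustration, not the authors' implementation: the class name `HybridActor`, the layer sizes, the 2-D relocation position, the tanh squashing, the reading of the discrete output as the index of the obstacle to move, and the random, untrained weights. The critic and the SAC training loop are omitted.

```python
import numpy as np


class HybridActor:
    """Toy hybrid actor: one shared hidden layer, continuous and discrete heads.

    Hypothetical illustration only; sizes and weights are arbitrary.
    """

    def __init__(self, state_dim, n_objects, action_dim=2, hidden=64, seed=0):
        self.rng = np.random.default_rng(seed)
        # Shared hidden layer used by every output head.
        self.w_h = self.rng.normal(0.0, 0.1, (state_dim, hidden))
        self.b_h = np.zeros(hidden)
        # Continuous head: mean and log standard deviation (as in standard SAC).
        self.w_mu = self.rng.normal(0.0, 0.1, (hidden, action_dim))
        self.w_log_std = self.rng.normal(0.0, 0.1, (hidden, action_dim))
        # Discrete head: one logit per object (assumed meaning: which one to move).
        self.w_logits = self.rng.normal(0.0, 0.1, (hidden, n_objects))

    def forward(self, state):
        h = np.tanh(state @ self.w_h + self.b_h)
        mean = h @ self.w_mu
        # Clipping the log-std keeps the standard deviation in a sane range.
        std = np.exp(np.clip(h @ self.w_log_std, -5.0, 2.0))
        logits = h @ self.w_logits
        # Numerically stable softmax over the discrete choices.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return mean, std, probs

    def sample(self, state):
        mean, std, probs = self.forward(state)
        # Discrete part of the action: an object index.
        obj = int(self.rng.choice(len(probs), p=probs))
        # Continuous part: Gaussian sample squashed into [-1, 1].
        place = np.tanh(mean + std * self.rng.standard_normal(mean.shape))
        return obj, place


actor = HybridActor(state_dim=10, n_objects=5)
obj, place = actor.sample(np.ones(10))
print("object index:", obj, "normalized relocation position:", place)
```

In this sketch the squashed continuous output would still need to be scaled to the actual workspace bounds, which the source does not specify.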
3 Experiments
We compare the simulation results of the proposed method with those of the method suggested in [3], which serves as the baseline. Both methods are given the same object configurations, and both output rearrangement actions (i.e., which obstacles to move and where to relocate them). We run 50 random instances for each of N = 5, 10, 15, where N is the number of objects. The results are shown in Fig. 2. The proposed method relocates fewer obstacles and achieves higher success rates than the baseline. In very dense environments in particular, where many objects share a fixed workspace, the baseline can often fail to find feasible relocation positions and therefore cannot grasp the target object.

4 Conclusions
We propose a reinforcement learning model based on hybrid SAC that handles the two characteristics of the OR-TAMP problem: determining which obstacles to move among multiple objects, which is formulated in a discrete space, and determining where to relocate those obstacles in the continuous workspace. The method runs faster than the baseline and plans more feasible actions. As expected, it attempts more rearrangement actions in denser object configurations. The actions planned by the method are executed on a real robot within acceptable runtime.

English ECCOMAS Thematic Conference on Multibody Dynamics Object Rearrangement in Clutter for Mobile Manipulator Using Hybrid Soft Actor-Critic Method Conference 1 11th ECCOMAS Thematic Conference on Multibody Dynamics PO Lisbon, Portugal 2023-07-24 Proceedings of ECCOMAS Thematic Conference on Multibody Dynamics