Closed-Loop Dynamic Control of a Soft Manipulator Using Deep Reinforcement Learning

Centurelli, A.; Arleo, L.; Rizzo, A.; Tolu, S.; Laschi, C.; Falotico, E.

doi:10.1109/LRA.2022.3146903

Research in soft robotics has largely focused on developing innovative materials, while the design of control strategies applicable to these robotic platforms remains an open challenge. This is due to their highly nonlinear dynamics, which are difficult to model, and to the degree of stochasticity they often incorporate. Data-driven controllers based on neural networks have recently been explored as a viable solution for these manipulators. This letter presents a neural network-based closed-loop controller, trained by a deep reinforcement learning algorithm called Trust Region Policy Optimization (TRPO). The training takes place in simulation, using an approximation of the robot forward dynamic model obtained with a Long Short-Term Memory (LSTM) network. The trained controller can follow different paths, executed at different velocities, in the workspace of the robot. The results demonstrate that the controller is effective both in normal working conditions and with a payload attached to the end-effector of the manipulator.
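As a rough illustration of the training setup described in the abstract, the sketch below runs one closed-loop tracking episode against a surrogate forward model instead of the physical robot. It is only a minimal sketch under stated assumptions: `surrogate_step` is a hypothetical first-order placeholder for the LSTM forward dynamic model, `policy` is a hypothetical proportional law standing in for the TRPO-trained network, and the reward (negative tracking error), the gain, the time step, and the circular path are illustrative choices that do not come from the letter.

```python
import numpy as np


def surrogate_step(position, action, dt=0.05):
    """Hypothetical stand-in for the learned forward dynamic model.

    The letter uses an LSTM here; this first-order map is only a placeholder.
    """
    return position + dt * (action - position)


def policy(position, target, gain=4.0):
    """Hypothetical stand-in for the trained policy network.

    The action depends on the fed-back position, which makes the loop closed.
    """
    return np.clip(position + gain * (target - position), -1.0, 1.0)


def rollout(path, start):
    """Track `path` point by point; return visited positions and rewards."""
    position = np.asarray(start, dtype=float)
    positions, rewards = [], []
    for target in path:
        action = policy(position, target)            # controller sees the current state
        position = surrogate_step(position, action)  # the model replaces the robot
        positions.append(position.copy())
        rewards.append(-np.linalg.norm(target - position))  # assumed reward
    return np.array(positions), np.array(rewards)


# A circular path in a normalized 2-D workspace (illustrative values only).
angles = np.linspace(0.0, 2.0 * np.pi, 200)
path = 0.5 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
positions, rewards = rollout(path, start=path[0])
```

In an actual reinforcement learning setup, episodes of this kind would be collected repeatedly and used to update the policy parameters. Sampling the path points more sparsely is one simple way to emulate a faster execution velocity in such a loop.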

2022-01-01
