2018-06-26 Reinforcement Learning
- Deep Drone Racing: Learning Agile Flight in Dynamic Environments [1]
Deep drone racing: learning agile flight in dynamic environments (method)
Paper link
In this paper, we consider these challenges in the context of autonomous, vision-based drone racing in dynamic environments. Our approach combines a convolutional neural network (CNN) with a state-of-the-art path-planning and control system. The CNN directly maps raw images into a robust representation in the form of a waypoint and desired speed. This information is then used by the planner to generate a short, minimum-jerk trajectory segment and corresponding motor commands to reach the desired goal.
In this paper, the authors study the challenges of vision-based autonomous drone racing in dynamic environments. Their approach combines a CNN with a path-planning and control system: the network takes raw images as input and outputs a desired waypoint and speed, which the planner then uses to generate a short minimum-jerk trajectory segment and the corresponding motor commands that take the vehicle to the goal.
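As a rough illustration of the planning step (a minimal sketch, not the authors' planner), the code below turns a predicted waypoint and desired speed into a sampled minimum-jerk position profile. It assumes zero velocity and acceleration at both ends of the segment and picks the segment duration from the desired average speed; a real system would additionally enforce continuity with the previous segment and dynamic feasibility.

```python
# Minimal sketch: map (current position, predicted waypoint, desired speed)
# to a sampled minimum-jerk trajectory segment. Not the paper's implementation.
import numpy as np

def min_jerk_segment(p0, waypoint, desired_speed, n_samples=50):
    """Sample positions along a minimum-jerk segment from p0 to waypoint."""
    p0, waypoint = np.asarray(p0, float), np.asarray(waypoint, float)
    distance = np.linalg.norm(waypoint - p0)
    duration = distance / max(desired_speed, 1e-6)   # crude duration heuristic
    tau = np.linspace(0.0, 1.0, n_samples)           # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5       # classic minimum-jerk profile
    positions = p0 + s[:, None] * (waypoint - p0)
    return tau * duration, positions

# Example: fly from hover toward a waypoint predicted by the CNN.
t, pos = min_jerk_segment(p0=[0, 0, 1.0], waypoint=[2.0, 1.0, 1.5], desired_speed=3.0)
print(pos[0], pos[-1])   # starts at p0, ends at the waypoint
```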
- Real-time gait planner for human walking using a lower limb exoskeleton and its implementation on Exoped robot [1]
Real-time gait planning for human walking with a lower-limb exoskeleton and its implementation on the Exoped robot
Paper link
In this paper, we present a real-time walking pattern generation method which enables changing the walking parameters during the stride. For this purpose, two feedback-controlled third-order systems are proposed as optimal trajectory planners for generating the trajectories of the x and y components of each joint's position. The boundary conditions of the trajectories are obtained according to some pre-considered walking constraints. In addition, a cost function is defined for each trajectory planner in order to increase the trajectories' smoothness.
In this paper, a real-time gait generation method is presented that allows the walking parameters to be changed during the stride. To this end, two feedback-controlled third-order systems serve as optimal trajectory planners that generate the x and y trajectories of each joint; the boundary conditions of these trajectories come from pre-specified walking constraints, and each planner is given a cost function to keep the trajectories smooth.
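For intuition about why a third-order (jerk-level) formulation gives smooth joint trajectories: minimizing the integral of squared jerk with position, velocity, and acceleration fixed at both ends of a step yields a quintic polynomial. The sketch below solves for such a quintic from generic boundary conditions; it is a simplified stand-in for illustration, not the Exoped planner or its feedback-controlled systems.

```python
# Minimal sketch: a quintic joint-trajectory segment matching given boundary
# conditions on position, velocity and acceleration (the minimum-jerk solution).
# Generic illustration only; not the Exoped gait planner.
import numpy as np

def quintic_coeffs(x0, v0, a0, xT, vT, aT, T):
    """Coefficients of x(t) = c0 + c1*t + ... + c5*t**5 for the given boundary values."""
    A = np.array([
        [1, 0, 0,    0,      0,       0],        # x(0)
        [0, 1, 0,    0,      0,       0],        # x'(0)
        [0, 0, 2,    0,      0,       0],        # x''(0)
        [1, T, T**2, T**3,   T**4,    T**5],     # x(T)
        [0, 1, 2*T,  3*T**2, 4*T**3,  5*T**4],   # x'(T)
        [0, 0, 2,    6*T,    12*T**2, 20*T**3],  # x''(T)
    ], dtype=float)
    b = np.array([x0, v0, a0, xT, vT, aT], dtype=float)
    return np.linalg.solve(A, b)

# Example: move a joint coordinate 10 cm in 0.8 s, starting and ending at rest.
c = quintic_coeffs(x0=0.0, v0=0.0, a0=0.0, xT=0.10, vT=0.0, aT=0.0, T=0.8)
print([float(np.polyval(c[::-1], t)) for t in (0.0, 0.4, 0.8)])  # ~[0.0, 0.05, 0.10]
```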
- Human-Interactive Subgoal Supervision for Efficient Inverse Reinforcement Learning [1]
Human-interactive subgoal supervision for efficient inverse reinforcement learning (method)
Paper link
This paper analyzes the benefit of incorporating a notion of subgoals into Inverse Reinforcement Learning (IRL) with a Human-In-The-Loop (HITL) framework. The learning process is interactive, with a human expert first providing input in the form of full demonstrations along with some subgoal states.
This paper analyzes the benefit of introducing the notion of subgoals into inverse reinforcement learning within a human-in-the-loop framework. The learning process is interactive: a human expert first provides full demonstrations together with a number of subgoal states.
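A minimal, hypothetical sketch of how such subgoal annotations could be used (our illustration, not the paper's algorithm): split each full demonstration at the human-marked subgoal states so that every segment becomes a shorter problem for a standard IRL solver.

```python
# Hypothetical illustration: cut a demonstrated state sequence at the
# human-provided subgoal states, giving shorter segments for segment-wise IRL.
def split_at_subgoals(demonstration, subgoal_states):
    """demonstration: list of states; subgoal_states: states marking subgoals."""
    segments, current = [], []
    for state in demonstration:
        current.append(state)
        if state in subgoal_states and len(current) > 1:
            segments.append(current)
            current = [state]          # next segment starts at the subgoal
    if len(current) > 1:
        segments.append(current)
    return segments

demo = ["s0", "s1", "s2", "s3", "s4", "s5"]
print(split_at_subgoals(demo, subgoal_states={"s2", "s4"}))
# [['s0', 's1', 's2'], ['s2', 's3', 's4'], ['s4', 's5']]
```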