
AI Notes Week 15: Planning under Uncertainty

2018-12-02  我的名字叫清阳

Link to the videos
Link to the video transcripts

Introduction

This lecture focuses on marrying planning and uncertainty: driving robots in actual physical worlds and finding good plans for them to execute.

Planning Under Uncertainty MDP

Planning methods are categorized by two characteristics of the world: observability and certainty. Deterministic, fully observable problems are handled by conventional planning (e.g., A*); stochastic, fully observable problems by MDPs; and stochastic, partially observable problems by POMDPs.

MDP

Robot Tour Guide Examples

All these robots need to deal with uncertainty and partial observability to do their jobs (tour guide or mine explorer).

MDP Grid World

[Figure: the MDP grid world]

Absorbing states: the search ends once the agent reaches an absorbing state.
A policy assigns an action to every state the agent can be in.
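
To make the setup concrete, here is a minimal Python sketch of a grid world in the spirit of the lecture's example. The 3×4 layout, the +100/−100 absorbing states at a4/b4, the blocked cell at b2, and the −3 per-step reward are assumptions for illustration:

```python
# Minimal grid-world MDP setup (layout and numbers are assumptions
# for illustration, in the spirit of the lecture's example).
GRID_ROWS, GRID_COLS = 3, 4            # rows a-c, columns 1-4
WALLS = {(1, 1)}                       # b2 is blocked
ABSORBING = {(0, 3): +100.0,           # a4: +100 absorbing state
             (1, 3): -100.0}           # b4: -100 absorbing state
STEP_COST = -3.0                       # reward for every non-terminal move
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

STATES = [(r, c) for r in range(GRID_ROWS) for c in range(GRID_COLS)
          if (r, c) not in WALLS]

# A policy maps every non-absorbing state to one of the four actions.
policy = {s: "N" for s in STATES if s not in ABSORBING}
```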

Problems With Conventional Planning 1

Policy Question


Question: what is the best action to take when the agent is in state a1, c1, c4, or b3?

MDP And Costs

The reason the agent should avoid the b4 state is its cost: entering it incurs a large negative reward.


Value Iteration

Intuition of the VI value function
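
The intuition is captured by the standard Bellman update from AIMA Chapter 17: the value of a state is its immediate reward plus the discounted value of the best available action, while absorbing states keep their fixed reward.

$$V(s) \leftarrow R(s) + \gamma \max_a \sum_{s'} P(s' \mid s, a)\, V(s')$$

Repeating this update over all states until the values stop changing is value iteration.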

Quiz

Deterministic question:


Quiz: calculate the value for each state the agent could be in.

Stochastic question:


The calculation is more complicated here because each action's expected reward must be evaluated over all of its possible outcomes.
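
For example, under the lecture's stochastic motion model (80% the intended direction, 10% slipping to each side), one backup of the cell next to the +100 state looks like the sketch below. The −3 step cost and γ = 1 mirror the lecture's quiz; the neighbor values are one possible snapshot, assumed for illustration:

```python
# One value backup under the 80/10/10 motion model (assumed numbers,
# mirroring the lecture's quiz): moving east from the cell next to +100.
gamma = 1.0                 # no discounting
step_cost = -3.0            # per-step reward
p_intended, p_slip = 0.8, 0.1

v_east = 100.0              # intended successor: the +100 absorbing state
v_north = 0.0               # slip north: bounces off the wall, value still 0
v_south = 0.0               # slip south: a neighbor whose value is still 0

q_east = step_cost + gamma * (p_intended * v_east
                              + p_slip * v_north
                              + p_slip * v_south)
print(q_east)               # -3 + 0.8 * 100 = 77.0
```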

Value Iteration And Policy

Once the value of each cell has been calculated, the value function defines the policy: in every state, choose the action that leads to the highest expected value.
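
Putting the pieces together, here is a self-contained value-iteration sketch with policy extraction. The grid layout, rewards, 80/10/10 motion model, and γ = 1 are assumptions mirroring the lecture's example, not a definitive implementation:

```python
# Value iteration with policy extraction on an assumed 3x4 grid world
# (layout, rewards, and motion model mirror the lecture's example).
GRID_ROWS, GRID_COLS = 3, 4
WALLS = {(1, 1)}                              # b2 is blocked
TERMINALS = {(0, 3): 100.0, (1, 3): -100.0}   # a4: +100, b4: -100
STEP_COST, GAMMA = -3.0, 1.0
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SLIPS = {"N": ("E", "W"), "S": ("E", "W"),    # perpendicular slip directions
         "E": ("N", "S"), "W": ("N", "S")}

def move(state, action):
    """Deterministic move; bumping a wall or the edge leaves you in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < GRID_ROWS and 0 <= nc < GRID_COLS and (nr, nc) not in WALLS:
        return (nr, nc)
    return state

def q_value(V, state, action):
    """Expected value of an action: 0.8 intended, 0.1 each perpendicular slip."""
    expected = 0.8 * V[move(state, action)]
    expected += sum(0.1 * V[move(state, s)] for s in SLIPS[action])
    return STEP_COST + GAMMA * expected

states = [(r, c) for r in range(GRID_ROWS) for c in range(GRID_COLS)
          if (r, c) not in WALLS]
V = {s: TERMINALS.get(s, 0.0) for s in states}

for _ in range(100):                          # enough sweeps to converge here
    V = {s: V[s] if s in TERMINALS
         else max(q_value(V, s, a) for a in ACTIONS)
         for s in states}

# Policy extraction: in each state, pick the action with the highest Q-value.
policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a))
          for s in states if s not in TERMINALS}
print(policy[(0, 2)])                         # the cell next to +100 goes "E"
```

Rerunning the sketch with a positive STEP_COST, or a far more negative one, reproduces the two pathologies described next.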

If the per-step reward of each state is positive, the policy will encourage the agent to stay in the current state and accumulate reward rather than terminate.


If the per-step reward is too low (i.e., the cost of moving is too high), the value of each state becomes so negative that the agent tries to end the search as soon as possible instead of looking for an optimal route.

MDP Conclusion

Conclusion

POMDP Vs MDP

POMDP

Conventional planning assumes a deterministic, fully observable world; MDPs handle stochastic, fully observable worlds; POMDPs handle stochastic, partially observable worlds.

Why plain MDP planning would not work when there are two worlds
So here is a solution that does not work: the agent might be in two different worlds and does not know which. Solving the MDP for both cases separately and then averaging the two solutions fails, because the averaged result will never send the agent south to the sign to gather information.

POMDP will work

A POMDP over belief states will work. If the agent goes south and reaches the sign, with 50% probability it transitions to the right-side belief state, where MDP value iteration leads it to the +100 state; the same happens for the left-side belief state (the other 50%).
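
For reference, the piece that makes this work is the belief update: after taking action a and observing o, the new belief over states is obtained with a Bayes filter, and value iteration is then run over belief states instead of world states. This equation is standard (AIMA Ch. 17) rather than spelled out in the notes above:

$$b'(s') \;\propto\; P(o \mid s') \sum_{s} P(s' \mid s, a)\, b(s)$$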

Readings on Planning under Uncertainty

AIMA: Chapter 17

Further Study

Charles Isbell and Michael Littman’s ML course:

Peter Norvig and Sebastian Thrun’s AI course:

2018-12-01 First draft