Jack's Car Rental

2019-05-30  本文已影响0人  捡个七

Jack's Car Rental 是一个经典的应用马尔可夫决策过程(MDP)的问题。下图来自 Reinforcement Learning: An Introduction 这本书的 98 页。

C 语言编程课的 Term Project 就是解决这个问题,在这里记录一下相关内容。

Jack 租车问题概述如下:

从上面提供的信息中,可以分析出以下内容:

描绘成 MDP 如下所示:

MDP JackCarRenral
    states cars[2]
    actions moveCars
    reward r
    discount rate 0.9
    vaild states { 0 <= cars <= 20 } 
    vaild actions {-5 <= moveCars <= 5}
    environment
        random requests [poisson(3); poisson(4)]
        random dropoffs [poisson(3); poisson(2)]
        nextMorn = min(cars + [-moveCars, moveCars], 20)
        satisfied_req = min(requests, nextMorn)
        new_cars = nextMorn + dropoff - satisfied_req
        new_cars = max(new_cars, 0)
        new_cars = min(new_cars, 20)
        r = satisfied_req * 10 - 2 * abs(moveCars)

初始化

一旦指定了 λ 的值,我们就可以计算出两地的 prob 和 reward。

/* Initialization */
void init_probs_rewards(double probs[next_morning][ncar_states], double rewards[next_morning], 
double l_reqsts, double l_drpffs){
    double req_prob;
    double drp_prob;
    int satisfied_req; // the number of satisfied requests
    int new_n;
    for(int req = 0; (req_prob = poisson(req, l_reqsts)) > theta; req++){
        for(int n = 0; n < next_morning; n++){
            satisfied_req = min(req, n); // at most, all the cars avaliable
            rewards[n] += RENTAL_INCOME * req_prob * satisfied_req; 
        }
        for (int drp = 0; (drp_prob = poisson(drp, l_drpffs)) > theta; drp++){
            for (int  m = 0; m < next_morning; m++){
                satisfied_req = min(req, m);
                new_n = m + drp - satisfied_req; 
                new_n = max(new_n, 0); // 0 at least
                new_n = min(20, new_n); // 20 at most
                probs[m][new_n] += req_prob * drp_prob; // add up the joint probability
            }
        }

策略迭代

Jack 租车的问题,已经完全知道了 MDP,现在就是利用这些已知环境信息的情况下找出最优的策略。

找到最优策略的方法大致可以表述为:

这部分的笔记具体可以查看这篇文章的前半部分(动态规划):Reinforcement Learning笔记(2)--动态规划与蒙特卡洛方法

结果

策略迭代
最优策略的值函数

参考

[1]. http://lumiere.ens.fr/~dmarti01/software/jack.pdf
[2]. http://incompleteideas.net/book/code/jacks.lisp
[3]. https://www.youtube.com/watch?v=Nd1-UUMVfz4&t=2179s
[4]. http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/DP.pdf

上一篇 下一篇

猜你喜欢

热点阅读