[Tech] Identifying Individual Home and Work Locations from Metro Smart Card Data

2019-11-10  pugkingup
0 Introduction

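Large-volume, full-sample, multi-period metro smart card data are an important source for studying urban transport and travel behavior. Before any such study, each cardholder's work and home locations have to be identified from the raw card records. I reviewed the extraction methods used in the literature (Long, 2015; Zhou, 2014; Gao, 2018; Lee, 2014) and distilled them. Taking one day (1 d) of card data as an example, this post gives the extraction logic and the corresponding MATLAB code.
If you have any questions, feel free to contact the author. If you use the code in this post, you do not need to notify the author, but please cite it properly.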

1 Methods
1.1 Data format

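Smart card data are hard to obtain, but their formats are much alike. The data I use contain the five most basic fields: card ID, boarding station, boarding time, alighting station, and alighting time. In MATLAB I store them as categorical, double, datetime, double, and datetime, respectively. Note that the code in Section 1.3 addresses columns by position and treats columns 2 and 4 as times and columns 3 and 5 as station IDs, so the table it runs on must be ordered as card ID, boarding time, boarding station, alighting time, alighting station.
The figure below is a screenshot of a small part of the data. My data cannot be shared, so please do not contact me to request them. The screenshot is only there to make the code easier to follow.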

[Figure: sample data]
1.2 Identification rules
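Since the real data cannot be shared, a toy table can stand in for it when reading the code. The following is only a sketch: the variable names `BoardTime`, `BoardStation`, `AlightTime`, and `AlightStation`, the station IDs, and the date are all made up for illustration. The column order follows what the identification code expects when it indexes by position.

```matlab
% Hypothetical toy data: names, station IDs, and the date are assumptions.
CardID        = categorical({'A001';'A001';'B002';'B002';'C003'});
BoardTime     = datetime(2015,1,5, [8;18;9;12;10], [0;5;0;0;0], 0);
BoardStation  = [11; 27; 30; 41; 50];
AlightTime    = datetime(2015,1,5, [8;18;9;12;10], [40;45;30;20;25], 0);
AlightStation = [27; 11; 41; 35; 60];

% Column order: card ID, boarding time, boarding station,
% alighting time, alighting station.
CardData = table(CardID, BoardTime, BoardStation, AlightTime, AlightStation);

% The identification code compares consecutive rows of one rider,
% so the records should be in time order within each card.
CardData = sortrows(CardData, {'CardID','BoardTime'});

% Stay between the first two trips of rider A001 (alight trip 1 -> board trip 2)
stayA = CardData{2,2} - CardData{1,4};
disp(stayA)
```

With such a table in the workspace, the positional indexing used later (for example `TravelRecord{2,2}-TravelRecord{1,4}`) can be followed by hand.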

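[Figure: daily home-work identification]

As implemented in the code in Section 1.3, the rules are: a rider needs at least two trips in the day; the candidate work station is the station where one trip ends and the next trip begins, taking the one with the longest stay between the two trips, and it counts as the work station only if that stay exceeds 6 hours; the home station is the boarding station of the rider's first trip of the day.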

1.3 Implementation

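The code follows the identification rules closely, but it relies on a number of tricks whose main purpose is to make the program run faster. In particular, extracting home and work locations is a complex, time-consuming computation, so the set of candidates should be narrowed step by step, and individuals who cannot be identified (for example, those with only one trip) should be removed early.
Below is the daily home-work identification code. As a rough indication of its performance: with about 3 million card records for one day, on an ordinary desktop PC (4th-generation Intel Core i5, 16 GB RAM, SSD storage), it extracts the work and home stations of about 80,000 individuals in roughly 15 hours. Extraction over longer time spans therefore calls for optimizing the code; I will describe specific optimizations in a later technical post.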
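The narrowing idea can be illustrated on a toy table. This is a minimal sketch, not part of the original code: the table `Toy` and its contents are hypothetical, and dropping the rows of single-trip riders from the record table itself is my own assumption about one way to shrink the data that later steps have to scan.

```matlab
% Hypothetical toy data: rider B has only one trip.
CardID  = categorical({'A';'A';'B';'C';'C';'C'});
Station = [1; 2; 3; 4; 5; 6];
Toy = table(CardID, Station);

% Trips per rider: countcats returns one count per category,
% in the same order as categories().
ids    = categories(Toy.CardID);
counts = countcats(Toy.CardID);

% Riders with at least 2 trips are the only ones that can form a travel chain.
keepIds = ids(counts >= 2);

% Remove the records of all other riders before any further processing.
ToyKept = Toy(ismember(Toy.CardID, keepIds), :);
disp(ToyKept)
```

Every later lookup then runs on a smaller table, which matters when the lookup is repeated once per rider.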

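%% Extract the unique individuals (the full set of candidates for identification)
% Assumes the calling script provides CardData (records in time order within
% each card), the year/day indices x and y, and has already called tic.
% Extract all unique users and their metro ride/rides.
CardData.CardID = categorical(CardData.CardID); % Transform the ID into categorical format for use with the countcats function.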
TbUser_All = table(unique(CardData.CardID)); % extract a table containing all metro riders in a day
TbUser_All.Properties.VariableNames{1} = 'CardID';
TbUser_All.Count = countcats(CardData.CardID);
TbUser = TbUser_All(TbUser_All.Count>= 2,:);% a metro-rider should take at least 2 rides in a day to extract a travel-chain.

% Display some results
clear Text
Text = ['##The day is day ',num2str(y),' of year ',num2str(x+2014),'. There are ',num2str(height(TbUser_All)),' unique users. ','Users with at least 2 rides number ',num2str(height(TbUser)),'.##'];
disp(Text) % print the day, the number of unique users, and the number of users with at least 2 rides.
clear TbUser_All
clear Text

%% Generate the job station of all detectable individuals and drop the undetectable ones (result: TbUser_J)
% First, calculate every unique rider's possible job station and stay duration (TbUser then has 4 columns).
clear i;
clear ID;
for i = 1:height(TbUser) % loop over every candidate rider
clear TravelRecord
    ID = TbUser{i,1};
    if TbUser{i,2} == 2 % users with only 2 records (who account for more than 90%) can be handled more simply.
        TravelRecord = CardData(CardData.CardID == ID,:); % extract a table containing all riders records
        
        clear T_Work; % the possible duration Time of Work
        clear S_Work; % the possible Station of Work
        T_Work = duration(00,00,00); %set the duration to 00:00:00
        if TravelRecord{1,5} == TravelRecord{2,3} % alighting station of trip 1 == boarding station of trip 2
            T_Work = TravelRecord{2,2}-TravelRecord{1,4}; % stay = boarding time of trip 2 - alighting time of trip 1
            S_Work = TravelRecord{1,5};
        else
            S_Work = 0;% for station ID == 0, it represents no detectable results.
        end
        TbUser{i,3} = S_Work;
        TbUser{i,4} = T_Work;
        clear TravelRecord;
        clear T_Work;
        clear S_Work;
        
    elseif TbUser{i,2} >= 3 % when TbUser.Count > 2
        TravelRecord = CardData(CardData.CardID == ID,:); % extract a table containing all riders records
        
        clear T_Work;
        clear T_Work0;
        clear S_Work;
        clear m;
        T_Work = duration(00,00,00);
        S_Work = 0;
        
        for m = 1:(TbUser{i,2}-1) % This loop is to find the longest staying station and calculate the duration
            if TravelRecord{m,5} == TravelRecord{(m+1),3}
                T_Work0 = TravelRecord{(m+1),2}-TravelRecord{m,4};%the staying time
                
                if T_Work0 >= T_Work
                    T_Work = T_Work0;
                    S_Work = TravelRecord{m,5};
                end
                clear T_Work0
            end
        end
        TbUser{i,3} = S_Work;
        TbUser{i,4} = T_Work;
        clear TravelRecord;
        clear T_Work;
        clear S_Work;
    end
end
clear i;
clear ID;
clear m;
TbUser.Properties.VariableNames{3} = 'S_Job';
TbUser.Properties.VariableNames{4} = 'T_Job';

% At this point TbUser holds the possible work station and stay duration of every candidate user.
T = duration(06,00,00);
TbUser_J = TbUser(TbUser.T_Job>T,:); % keep the users whose stay at the candidate job station exceeds 6 hours.
clear T;

% Print the results
clear Text
Text = ['##The day is day ',num2str(y),' of year ',num2str(x+2014),'. There are ',num2str(height(TbUser_J)),' job-detectable users.##'];
disp(Text) % print the day and the number of job-detectable users.
clear TbUser
clear Text

%% Generate the home station of all job-detectable individuals (result: TbUser_JH)
clear i;
clear ID;
clear S_Home
for i = 1:height(TbUser_J)
    ID = TbUser_J{i,1};
    clear TravelRecord;
    TravelRecord = CardData(CardData.CardID == ID,:); % extract a table containing all riders records
    S_Home = TravelRecord{1,3}; % home = boarding station of the first trip of the day
    
    TbUser_J{i,5} = S_Home;
end
clear i;
clear ID;
clear S_Home;
clear TravelRecord;

TbUser_J.Properties.VariableNames{5} = 'S_Home';
TbUser_JH = TbUser_J;


clear Text
Text = ['##The day is day ',num2str(y),' of year ',num2str(x+2014),'. The job-home locations have been extracted successfully!##'];
disp(Text)
clear Text

clear TbUser_J;
clear x;
clear y;
clear CardData;

toc % end of code; tic/toc only measure the run time and can be removed if not needed
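The loop above rescans the whole `CardData` table once per rider, which is likely a large part of the run time. The following is one possible vectorized alternative, offered only as a sketch and not as the optimization the author has in mind: it sorts the records once and compares each row with the next one. The toy data and the names `T`, `Stays`, `BoardTime`, `BoardStation`, `AlightTime`, and `AlightStation` are hypothetical. One behavioral difference: when two stays of a rider tie in length, the loop keeps the later one, while this sketch does not guarantee which one is kept.

```matlab
% Hypothetical toy data: A and B are detectable, C is not.
CardID        = categorical({'A';'A';'B';'B';'B';'C';'C'});
BoardTime     = datetime(2015,1,5, [8;18;9;12;20;7;9], 0, 0);
BoardStation  = [11; 27; 30; 41; 35; 50; 70];
AlightTime    = datetime(2015,1,5, [8;18;9;12;20;7;9], 30, 0);
AlightStation = [27; 11; 41; 35; 30; 60; 50];
T = table(CardID, BoardTime, BoardStation, AlightTime, AlightStation);
T = sortrows(T, {'CardID','BoardTime'});
n = height(T);

% Compare every trip (row k) with the following row (k+1).
sameRider   = T.CardID(1:n-1) == T.CardID(2:n);
sameStation = T.AlightStation(1:n-1) == T.BoardStation(2:n);
stay        = T.BoardTime(2:n) - T.AlightTime(1:n-1);
valid       = sameRider & sameStation & stay > hours(6);

% Home station of each row = boarding station of that rider's first trip.
isFirst   = [true; ~sameRider];
firstStn  = T.BoardStation(isFirst);
homeOfRow = firstStn(cumsum(isFirst));

% One row per qualifying stay, then keep the longest stay per rider.
k = find(valid);
Stays = table(T.CardID(k), T.AlightStation(k), stay(k), homeOfRow(k), ...
    'VariableNames', {'CardID','S_Job','T_Job','S_Home'});
Stays = sortrows(Stays, 'T_Job', 'descend');
[~, firstIdx] = unique(Stays.CardID); % index of first (longest) stay per rider
Stays = Stays(firstIdx, :);
disp(Stays)
```

Filtering stays to those above 6 hours before taking the per-rider maximum gives the same riders as taking the maximum first and filtering afterwards, because a rider's longest stay exceeds 6 hours exactly when at least one of their stays does.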
References
> Long, Y., & Thill, J. C. (2015). Combining smart card data and household travel survey to analyze jobs-housing relationships in Beijing. Computers, Environment and Urban Systems, 53, 19–35.
https://doi.org/10.1016/j.compenvurbsys.2015.02.005
*This paper describes in detail the identification rules used in this post and gives a worked example with Beijing data.*
> Zhou, J., & Long, Y. (2014). Jobs-housing balance of bus commuters in Beijing: Exploration with large-scale synthesized smart card data. Transportation Research Record: Journal of the Transportation Research Board, 2418, 1–10.
https://doi.org/10.3141/2418-01
*The rules and the data are the same as in Long (2015).*
> Gao, Q.-L., Li, Q.-Q., Yue, Y., Zhuang, Y., Chen, Z.-P., & Kong, H. (2018). Exploring changes in the spatial distribution of the low-to-moderate income group using transit smart card data. Computers, Environment and Urban Systems, 72, 68–77.
https://doi.org/10.1016/j.compenvurbsys.2018.02.006
*The identification rules in this paper are rather unusual; I did not adopt them.*
> Lee, S. G., & Hickman, M. (2014). Trip purpose inference using automated fare collection data. Public Transport, 6(1–2), 1–20.
https://doi.org/10.1007/s12469-013-0077-5