preprocess_ngsim.m: Preprocessing the NGSIM Dataset
Workflow and Hyperparameters
Workflow
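1. Load the raw NGSIM data
2. Select the columns we need
3. Preprocess the data
4. Label lateral and longitudinal maneuvers
5. Record neighboring vehicle ids
6. Split into training, validation and test sets
7. Remove edge samples
8. Finish processing and save the mat files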
Hyperparameters
history traj: 30 frames (3 s at 10 Hz)
future traj: 50 frames (5 s)
lateral behavior detection: 40 frames (4 s) into the past and the future
grid is split into 25x5 cells (8x7 feet per cell) or 25x7 cells (8x5 feet per cell)
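To make the units concrete, here is a minimal sketch that converts these hyperparameters into seconds and feet. It assumes the NGSIM sampling rate of 10 frames per second. The script uses `grid_cells` and `grid_cent_location` later on, but their definitions are not shown in this walkthrough, so the two formulas below are my assumptions; `fps` and the `*_sec` / `extent_*` names are purely illustrative.

```matlab
fps = 10;                                   % assumed NGSIM sampling rate (frames per second)
hist_frames = 30; fut_frames = 50; lat_frames = 40;
hist_sec = hist_frames / fps;               % history window in seconds
fut_sec  = fut_frames  / fps;               % future window in seconds
lat_sec  = lat_frames  / fps;               % lateral look-back / look-ahead in seconds

grid_length = 25; grid_width = 5;           % number of cells along / across the road
cell_length = 8;  cell_width = 7;           % cell size in feet
grid_cells = grid_length * grid_width;      % assumed definition
grid_cent_location = ceil(grid_cells / 2);  % assumed definition (the ego vehicle's own cell)
extent_long_ft = grid_length * cell_length; % total grid length in feet
extent_lat_ft  = grid_width  * cell_width;  % total grid width in feet

fprintf('history %g s, future %g s, lateral window %g s\n', hist_sec, fut_sec, lat_sec);
fprintf('%d cells covering %d x %d feet, center cell %d\n', ...
        grid_cells, extent_long_ft, extent_lat_ft, grid_cent_location);
```

With the 25x5 setting this gives 125 cells covering 200 x 35 feet around the ego vehicle.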
%% WORKFLOW
%{
1) Reading csv files
2) Parse fields including spatial grid and maneuver labels
3) Using unique vehicle ids to split train(70%)/validation(10%)/test(20%)
4) Only reserve those data sample with at least 3s history and 5s future
5) Save the dataset with a fixed 8Veh targets
Optional: filter on-ramp and off-ramp part or not. (Our result is obtained without filtering lane)
%}
%% Hyperparameters:
%{
30 frames (3 s) for history traj
50 frames (5 s) for future traj
past and future 40 frames (4 s) for lateral behavior detection.
grid is split with size 25*5 (8x7 feet for each grid)
or 25*7 (8x5 feet for each grid)
%}
lane_filter = false;
grid_length=25; grid_width=5; cell_length=8; cell_width=7;
1. Load the raw NGSIM data
Read the txt files for three time periods each on the US-101 and I-80 highways, which gives six datasets in total.
%% 0.Inputs: Locations of raw_ngsim input files:
dataset_to_use = 6;
us101_1 = './raw_ngsim/us101-0750am-0805am.txt';
us101_2 = './raw_ngsim/us101-0805am-0820am.txt';
us101_3 = './raw_ngsim/us101-0820am-0835am.txt';
i80_1 = './raw_ngsim/i80-0400-0415.txt';
i80_2 = './raw_ngsim/i80-0500-0515.txt';
i80_3 = './raw_ngsim/i80-0515-0530.txt';
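2. Select the columns we need
Store the six datasets from step 1 as six traj arrays, then keep only the columns we need from each one. The author keeps the following: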
% Loading 1:dataset id, 2:Vehicle id, 3:Frame index,
% 6:Local X, 7:Local Y, 15:Lane id.
% 10:v_length, 11:v_Width, 12:v_Class
% 13:Velocity (feet/s), 14:Acceleration (feet/s2).
This gives us:
%% 1.Load data
disp('Loading data...')
% Add dataset id at the 1st column
traj{1} = load(us101_1);
traj{1} = single([ones(size(traj{1},1),1),traj{1}]);
traj{2} = load(us101_2);
traj{2} = single([2*ones(size(traj{2},1),1),traj{2}]);
traj{3} = load(us101_3);
traj{3} = single([3*ones(size(traj{3},1),1),traj{3}]);
traj{4} = load(i80_1);
traj{4} = single([4*ones(size(traj{4},1),1),traj{4}]);
traj{5} = load(i80_2);
traj{5} = single([5*ones(size(traj{5},1),1),traj{5}]);
traj{6} = load(i80_3);
traj{6} = single([6*ones(size(traj{6},1),1),traj{6}]);
% traj is the base for adding more information later.
for k = 1:dataset_to_use
traj{k} = traj{k}(:,[1,2,3,6,7,15,10,11,12,13,14]);
Note that the first column of the raw NGSIM data is the vehicle id, so all of the loading code above does nothing more than prepend a dataset id column.
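The index shift caused by the prepended column is easy to get wrong, so here is a small sketch that traces it. It assumes the raw file has the usual 18 NGSIM columns (Vehicle_ID first, Lane_ID fourteenth); the toy matrix `raw` and the dataset id 7 are made up so that every value reveals which raw column it came from.

```matlab
raw = repmat(1:18, 2, 1);                 % toy data: each value equals its raw column index
aug = [7*ones(size(raw,1),1), raw];       % prepend a (hypothetical) dataset id, as in the script
sel = aug(:, [1,2,3,6,7,15,10,11,12,13,14]);

% After selection, each entry shows the raw column it was taken from:
% col 1 = dataset id, col 2 = raw 1 (vehicle id), col 3 = raw 2 (frame),
% col 4 = raw 5 (Local X), col 5 = raw 6 (Local Y), col 6 = raw 14 (lane id),
% col 7..11 = raw 9..13 (length, width, class, velocity, acceleration)
disp(sel(1,:));
```

So "column 15" in the script's comment is the lane id only because of the extra dataset id column; in the raw file it is column 14.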
3. Preprocess the data
If lane_filter is true, all vehicles on the on-ramp and off-ramp sections are filtered out.
% @@ Filter all vehicles in the parts of on-ramp and off-ramp
if lane_filter
fprintf( 'Dataset-%d #data: %d ==>> ', k, size(traj{k}, 1));
traj{k} = traj{k}(traj{k}(:, 6) < 7, :);
fprintf( '%d after filtering lane>6 \n', size(traj{k}, 1));
else
% Prev: US101 make all lane id >= 6 to 6.
if k <=3
traj{k}( traj{k}(:,6)>=6,6 ) = 6;
end
end
Leave space for the maneuver labels (2 columns) and the grid (25x5 = 125 grid_cells columns).
% Leave space for maneuver labels (2 columns) and grid (grid_cells columns)
traj{k} = [ traj{k}(:,1:6), zeros(size(traj{k},1),2), traj{k}(:,7:11), zeros(size(traj{k},1),grid_cells) ];
Use the vehicle's center as its location.
% Use the vehicle's center as its location
offset = zeros(1,dataset_to_use);
for k = 1:dataset_to_use
traj{k}(:,5) = traj{k}(:,5) - 0.5*traj{k}(:,9);
offset(k) = min(traj{k}(:,5));
if offset(k) < 0
% To make the Y location > 0
traj{k}(:,5) = traj{k}(:,5) - offset(k);
end
end
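This part is fairly easy to follow: half of the vehicle length is subtracted from the Y coordinate, which moves the reference point from the front of the vehicle to its center.
When I first read it, though, I had two questions:
Why is only the Y coordinate adjusted, while the X coordinate is left alone?
In the code, traj{k}(:,5) is Local Y, but 0.5*traj{k}(:,9) looks like half of v_Class (the vehicle type). Why subtract those two? Shouldn't it be traj{k}(:,8) (v_Width) or traj{k}(:,7) (v_length)?
Both questions resolve once the surrounding code is read carefully. NGSIM reports Local X and Local Y at the front center of the vehicle, so X is already the lateral center and only Y has to be moved back by half a vehicle length (X is still needed later for the neighbor grid). As for traj{k}(:,9): the previous step inserted two label columns between columns 6 and 7, so the current traj{k}(:,9) is the former traj{k}(:,7), i.e. v_length. The line therefore computes Local Y - 0.5 * vehicle length, which is the Y coordinate of the vehicle center.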
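The column shift can be checked with a toy row. This is only a sketch: the row holds its own column indices instead of real data, `grid_cells = 125` assumes the 25x5 grid, and the `frontY` / `vehLength` values are hypothetical.

```matlab
grid_cells = 125;                          % assumes the 25x5 grid
row = 1:11;                                % toy row: each value is its column index before padding
padded = [row(:,1:6), zeros(1,2), row(:,7:11), zeros(1,grid_cells)];

fprintf('column 9 now holds former column %d\n', padded(9));
fprintf('row width after padding: %d\n', numel(padded));

% Centering: move the reference point from the front of the vehicle to its middle
frontY = 100; vehLength = 15;              % hypothetical values in feet
centerY = frontY - 0.5*vehLength;
fprintf('center Y: %g\n', centerY);
```

Columns 7 and 8 are the empty maneuver slots, columns 9 to 13 are the shifted length, width, class, velocity and acceleration, and the grid starts at column 14.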
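4. Label lateral and longitudinal maneuvers
For each row, take the vehicle identified by vehId, extract its full trajectory vehtraj, and read the lane id of the lane it occupies in each frame.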
poolobj = parpool(dataset_to_use);
parfor ii = 1:dataset_to_use % Loop on each dataset.
% for ii = 1:dataset_to_use
tic;
disp(['Now process dataset ', num2str(ii)])
% Loop on each row.
for k = 1:length(traj{ii}(:,1))
% Refresh the process every 1 mins
if toc > 60
fprintf( 'Dataset-%d: Complete %.3f%% \n', ii, k/length(traj{ii}(:,1))*100 );
tic;
end
dsId = ii;
vehId = traj{ii}(k,2);
time = traj{ii}(k,3);
% Get all rows about this vehId
vehtraj = traj{ii}(traj{ii}(:,2)==vehId, : );
% Get the row index of traj at this frame.
ind = find(vehtraj(:,3)==time);
ind = ind(1);
lane = traj{ii}(k,6);
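Next, compare the lane the vehicle occupies in the future, at present and in the past, i.e. the lane id in column 6, to decide whether it is changing lanes to the left or to the right, and write the label into the reserved column 7.
ind is the row index of the current frame within vehtraj, ub is the index 40 frames later (the future) and lb is the index 40 frames earlier (the past); both are clamped to the ends of the trajectory.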
% Lateral maneuver in Column 7:
ub = min(size(vehtraj,1),ind+40); %Upper boundary (+40 frame)
lb = max(1, ind-40); %Lower boundary (-40 frame)
if vehtraj(ub,6)>vehtraj(ind,6) || vehtraj(ind,6)>vehtraj(lb,6) %(prepare to turn or stabilize after turn)
traj{ii}(k,7) = 3; % Turn Right==>3.
elseif vehtraj(ub,6)<vehtraj(ind,6) || vehtraj(ind,6)<vehtraj(lb,6)
traj{ii}(k,7) = 2; % Turn Left==>2.
else
traj{ii}(k,7) = 1; % Keep lane==>1.
end
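To see what the lateral rule does in practice, here is a small sketch that applies the same three-way comparison to a made-up lane sequence: 60 frames in lane 2, then 60 frames in lane 3. The `lanes` vector and the `label` array are hypothetical; only the comparison logic mirrors the code above.

```matlab
lanes = [2*ones(1,60), 3*ones(1,60)];   % hypothetical vehicle: lane 2, then lane 3
n = numel(lanes);
label = zeros(1, n);
for ind = 1:n
    ub = min(n, ind + 40);              % 40 frames ahead, clamped to the last frame
    lb = max(1, ind - 40);              % 40 frames back, clamped to the first frame
    if lanes(ub) > lanes(ind) || lanes(ind) > lanes(lb)
        label(ind) = 3;                 % lane id increases: turn right
    elseif lanes(ub) < lanes(ind) || lanes(ind) < lanes(lb)
        label(ind) = 2;                 % lane id decreases: turn left
    else
        label(ind) = 1;                 % keep lane
    end
end
fprintf('frames labelled as right turn: %d to %d\n', ...
        find(label == 3, 1, 'first'), find(label == 3, 1, 'last'));
```

In this toy case frames 21 to 100 are labelled 3: the label switches on 40 frames before the lane id changes and stays on for 40 frames afterwards, which is the "prepare to turn or stabilize after turn" window.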
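Next, compare the average speed over the past and over the future, i.e. delta Y (from the Y coordinate in column 5) divided by delta t, to decide whether the vehicle is driving normally or braking, and write the label into the reserved column 8.
Here ind is the index of the current frame, ub is the index 50 frames later (the future) and lb is the index 30 frames earlier (the past).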
% Longitudinal maneuver in Column 8:
ub = min(size(vehtraj,1),ind+50); % Future boundary (+50 frame)
lb = max(1, ind-30); % History boundary (-30 frame)
if ub==ind || lb ==ind
traj{ii}(k,8) = 1; % Normal==>1
else
vHist = (vehtraj(ind,5)-vehtraj(lb,5))/(ind-lb);
vFut = (vehtraj(ub,5)-vehtraj(ind,5))/(ub-ind);
if vFut/vHist <0.8
traj{ii}(k,8) = 2; % Brake==> 2
else
traj{ii}(k,8) = 1; % Normal==>1
end
end
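The braking rule can be tried on a synthetic trajectory in the same way. In this sketch the vehicle advances 5 feet per frame for 60 frames and then 2 feet per frame for another 60; the `y` vector is made up, and only the speed-ratio test comes from the code above.

```matlab
y = cumsum([5*ones(1,60), 2*ones(1,60)]);   % hypothetical Local Y in feet, one value per frame
n = numel(y);
label = ones(1, n);                         % 1 = normal by default
for ind = 1:n
    ub = min(n, ind + 50);                  % future boundary (+50 frames)
    lb = max(1, ind - 30);                  % history boundary (-30 frames)
    if ub ~= ind && lb ~= ind
        vHist = (y(ind) - y(lb)) / (ind - lb);   % average feet per frame over the history
        vFut  = (y(ub) - y(ind)) / (ub - ind);   % average feet per frame over the future
        if vFut / vHist < 0.8
            label(ind) = 2;                 % brake
        end
    end
end
fprintf('label at frame 10: %d, frame 30: %d, frame 100: %d\n', ...
        label(10), label(30), label(100));
```

A frame is marked as braking when the average future speed drops below 80% of the average history speed, so the label appears some frames before the actual slowdown and disappears once the history window has also slowed down.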
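5. Record neighboring vehicle ids
Select all vehicles present in the current frame. Centered on the ego vehicle, draw a rectangle that extends 0.5*grid_length*cell_length ahead and behind and 0.5*grid_width*cell_width to each side; any other vehicle inside it is a neighbor. Compute each neighbor's grid location, flatten the 2-D index into a 1-D index called exactGridLocation, and record the neighbor's id in the reserved columns starting at column 14, at position 13+exactGridLocation.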
% Get grid locations in Column 14~13+grid_length*grid_width (grid_cells, each with cell_length*cell_width):
centVehX = traj{ii}(k,4);
centVehY = traj{ii}(k,5);
gridMinX = centVehX - 0.5*grid_width*cell_width;
gridMinY = centVehY - 0.5*grid_length*cell_length;
otherVehsAtTime = traj{ii}( traj{ii}(:,3)==time , [2,4,5]); % Only keep the (vehId, localX, localY)
otherVehsInSizeRnage = otherVehsAtTime( abs(otherVehsAtTime(:,3)-centVehY)<(0.5*grid_length*cell_length) ...
& abs(otherVehsAtTime(:,2)-centVehX)<(0.5*grid_width*cell_width) , :);
if ~isempty(otherVehsInSizeRnage)
% Lateral and longitudinal grid location. Finally the exact location is saved in the 3rd column;
otherVehsInSizeRnage(:,2) = ceil((otherVehsInSizeRnage(:,2) - gridMinX) / cell_width);
otherVehsInSizeRnage(:,3) = ceil((otherVehsInSizeRnage(:,3) - gridMinY) / cell_length);
otherVehsInSizeRnage(:,3) = otherVehsInSizeRnage(:,3) + (otherVehsInSizeRnage(:,2)-1) * grid_length;
for l = 1:size(otherVehsInSizeRnage, 1)
exactGridLocation = otherVehsInSizeRnage(l,3);
if exactGridLocation ~= grid_cent_location % The center grid location is kept to NONE
traj{ii}(k,13+exactGridLocation) = otherVehsInSizeRnage(l,1);
end
end
end
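A worked example makes the index arithmetic easier to follow. The sketch below places a hypothetical neighbor 12 feet to the right of and 40 feet ahead of a hypothetical ego vehicle, using the 25x5 grid with 8x7 foot cells; the positions and the names `ego`, `nbr`, `latIdx`, `lonIdx` and `egoLocation` are made up for illustration.

```matlab
grid_length = 25; grid_width = 5; cell_length = 8; cell_width = 7;
ego = [30, 500];                          % hypothetical (Local X, Local Y) of the ego vehicle
nbr = [42, 540];                          % hypothetical neighbor position

gridMinX = ego(1) - 0.5*grid_width*cell_width;     % left edge of the grid
gridMinY = ego(2) - 0.5*grid_length*cell_length;   % rear edge of the grid
inRange = abs(nbr(2)-ego(2)) < 0.5*grid_length*cell_length ...
        & abs(nbr(1)-ego(1)) < 0.5*grid_width*cell_width;

latIdx = ceil((nbr(1) - gridMinX) / cell_width);   % lateral cell, 1..grid_width
lonIdx = ceil((nbr(2) - gridMinY) / cell_length);  % longitudinal cell, 1..grid_length
exactGridLocation = lonIdx + (latIdx - 1) * grid_length;

% The ego vehicle itself falls into the middle cell
egoLocation = ceil((ego(2)-gridMinY)/cell_length) ...
            + (ceil((ego(1)-gridMinX)/cell_width) - 1) * grid_length;

fprintf('neighbor cell (%d, %d) -> flat index %d -> traj column %d\n', ...
        latIdx, lonIdx, exactGridLocation, 13 + exactGridLocation);
fprintf('ego cell index: %d\n', egoLocation);
```

The flat index runs down each lateral column of 25 cells before moving to the next one, so the neighbor in lateral cell 5 and longitudinal cell 18 lands at index 118, i.e. column 131 of traj.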
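6. Split into training, validation and test sets
All traj rows are split into training, validation and test sets at a ratio of 0.7:0.1:0.2, by vehicle id within each dataset. The script also builds tracks, which holds each vehicle's traj rows with the first two columns (datasetID, carID) dropped.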
%% 3.Merge and Split train, validation, test
disp('Splitting into train, validation and test sets...')
% Merge all datasets together.
trajAll = [];
for i = 1:dataset_to_use
trajAll = [trajAll; traj{i}];
fprintf( 'Now merge %d rows of data from traj{%d} \n', size(traj{i},1), i);
end
clear traj;
% Training, Validation and Test dataset (Everything together)
trajTr = [];
trajVal = [];
trajTs = [];
for k = 1:dataset_to_use % Split the vehicle's trajectory in all dataset_to_use
% Cutting point: 0.7*max vehicleId (70% Training set)
ul1 = round(0.7* max( trajAll(trajAll(:,1)==k,2) ));
% Cutting point: 0.8*max vehicleId (20% Test set)
ul2 = round(0.8* max( trajAll(trajAll(:,1)==k,2) ));
% Extract according to the vehicle ID
trajTr = [trajTr; trajAll(trajAll(:,1)==k & trajAll(:,2)<=ul1, :) ];
trajVal = [trajVal; trajAll(trajAll(:,1)==k & trajAll(:,2)>ul1 & trajAll(:,2)<=ul2, :) ];
trajTs = [trajTs; trajAll(trajAll(:,1)==k & trajAll(:,2)>ul2, :) ];
end
% Merging all info together in tracks
% The neighbour existence problem is addressed
tracks = {};
for k = 1:dataset_to_use
trajSet = trajAll(trajAll(:,1)==k,:);
carIds = unique(trajSet(:,2)); % Unique Vehicle ID, then get a cell for each car.
for l = 1:length(carIds)
% The cell in {datasetID, carID} is placed with (11+grid_cells)*TotalFram.
tracks{k,carIds(l)} = trajSet( trajSet(:,2)==carIds(l), 3:end )';
end
end
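Note that the split above cuts on the value of the vehicle id rather than sampling rows at random. A minimal sketch with ten hypothetical vehicle ids shows where the two cut points fall; `vehIds` and the `*Ids` outputs are illustrative names only.

```matlab
vehIds = (1:10)';                         % hypothetical vehicle ids in one dataset
maxId = max(vehIds);
ul1 = round(0.7 * maxId);                 % ids up to here go to training
ul2 = round(0.8 * maxId);                 % ids above ul1 and up to here go to validation

trainIds = vehIds(vehIds <= ul1)';
valIds   = vehIds(vehIds > ul1 & vehIds <= ul2)';
testIds  = vehIds(vehIds > ul2)';

fprintf('train: %s\n', mat2str(trainIds));
fprintf('val:   %s\n', mat2str(valIds));
fprintf('test:  %s\n', mat2str(testIds));
```

Because the thresholds are fractions of the largest id, the 70/10/20 proportions only hold approximately, and only if ids are spread fairly evenly between 1 and the maximum. The benefit is that every frame of a given vehicle ends up in the same set.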
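7. Remove edge samples
Because prediction uses 3 s of history and 5 s of future, edge samples must be removed: the current frame must be at or after the vehicle's 31st recorded frame, and the vehicle's last recorded frame must be at least 50 frames later.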
%% 4.Filter edge cases:
disp('Filtering edge cases...')
% Flag for whether to keep this row of data
indsTr = zeros(size(trajTr,1),1);
indsVal = zeros(size(trajVal,1),1);
indsTs = zeros(size(trajTs,1),1);
% Since the model uses 3 sec of trajectory history for prediction, and 5s
% future for planning, therefore the reserve condition for each row of data:
% 1) this frame t should be larger than the 31st id, and
% 2) has at least 5s future.
for k = 1: size(trajTr,1) % Loop on each row of traj.
t = trajTr(k,3);
if tracks{trajTr(k,1),trajTr(k,2)}(1,31) <= t && tracks{trajTr(k,1),trajTr(k,2)}(1,end)>= t+50
indsTr(k) = 1;
end
end
trajTr = trajTr(find(indsTr),:);
for k = 1: size(trajVal,1)
t = trajVal(k,3);
if tracks{trajVal(k,1),trajVal(k,2)}(1,31) <= t && tracks{trajVal(k,1),trajVal(k,2)}(1,end)>= t+50
indsVal(k) = 1;
end
end
trajVal = trajVal(find(indsVal),:);
for k = 1: size(trajTs,1)
t = trajTs(k,3);
if tracks{trajTs(k,1),trajTs(k,2)}(1,31) <= t && tracks{trajTs(k,1),trajTs(k,2)}(1,end)>= t+50
indsTs(k) = 1;
end
end
trajTs = trajTs(find(indsTs),:);
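The same condition can be tested on a single hypothetical track. In this sketch the vehicle is visible in frames 100 to 250 with no gaps; `frames` stands in for the first row of a `tracks{...}` cell, and `keep` / `kept` are illustrative names.

```matlab
frames = 100:250;                         % hypothetical frame ids of one vehicle (contiguous)
keep = false(size(frames));
for k = 1:numel(frames)
    t = frames(k);
    % Same test as in the script: 31st frame id <= t, and last frame id >= t + 50
    if frames(31) <= t && frames(end) >= t + 50
        keep(k) = true;
    end
end
kept = frames(keep);
fprintf('kept frames %d to %d (%d of %d)\n', kept(1), kept(end), numel(kept), numel(frames));
```

Of the 151 frames, only frames 130 to 200 survive: the first 30 lack a full history and the last 50 lack a full future. The check compares frame ids, so it matches "30 frames of history and 50 frames of future" exactly only when a vehicle's frame ids have no gaps.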
8. Finish processing and save the mat files
%% Save mat files:
% traj : n*(13+grid_cells), n is the data number.
% tracks: 6*maxVehicleId, each cell is specified for (datasetId, vehicleId), with size (11+grid_cells)*totalFramNum.
disp('Saving mat files...')
% Save raw data
save(strcat(raw_folder,'gridTrainAround'), 'trajTr', 'tracks','-v7.3');
save(strcat(raw_folder,'gridValAround'), 'trajVal','tracks','-v7.3');
save(strcat(raw_folder,'gridTestAround'), 'trajTs', 'tracks','-v7.3');
% Save post-processed data
fprintf( '### Train data: \n');
traj = nbhCheckerFunc(trajTr, tracks);
save(strcat(post_folder,'gridTrainAround'),'traj','tracks','-v7.3');
traj = targSpecFunc(traj, tracks);
save(strcat(fix_tar_folder,'gridTrainAround'),'traj','tracks','-v7.3');
fprintf( '### Validation data: \n');
traj = nbhCheckerFunc(trajVal, tracks);
save(strcat(post_folder,'gridValAround'),'traj','tracks','-v7.3');
traj = targSpecFunc(traj, tracks);
save(strcat(fix_tar_folder,'gridValAround'),'traj','tracks','-v7.3');
fprintf( '### Test data: \n');
traj = nbhCheckerFunc(trajTs, tracks);
save(strcat(post_folder,'gridTestAround'),'traj','tracks','-v7.3');
traj = targSpecFunc(traj, tracks);
save(strcat(fix_tar_folder,'gridTestAround'),'traj','tracks','-v7.3');
fprintf('Complete');