BigQuery中拆解字符串并平展为多行
2021-02-25 本文已影响0人
Houtasu
我在介绍bigquery的半结构化数据中介绍过我们总表的结构,
但是有些命令的参数个数是不固定的,比如玩家在一次战斗中击杀的敌人种类,
有可能有一种,也有可能有两种三种...
这种在表中是直接用#号拼接在一起的,如下所示:

当需要统计海盗的数量时,这种拼接的就没法统计。
所以需要先把这个拼接的字符串变成两行数据,
这里要用到bigquery的平展数组。可以参考这篇文档。
WITH sequences AS
(SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS some_numbers
UNION ALL SELECT 3 AS id, [5, 10] AS some_numbers)
SELECT id, flattened_numbers
FROM sequences, sequences.some_numbers AS flattened_numbers;
+------+-------------------+
| id | flattened_numbers |
+------+-------------------+
| 1 | 0 |
| 1 | 1 |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 5 |
| 2 | 2 |
| 2 | 4 |
| 2 | 8 |
| 2 | 16 |
| 2 | 32 |
| 3 | 5 |
| 3 | 10 |
+------+-------------------+
上面这个例子中,本来是3行数据,some_numbers是一个数组,
然后直接把some_number接到from的后面,就可以把这个数组展开,并且和前面的id是绑定的。
参照这个例子,我们只需要使用split()函数把#号拆成数组,在接到from后面就可以变成两行了。
WITH
t1 AS (
SELECT
uid,
SPLIT(p.str,'#') AS pirates
FROM
`命令表`,
UNNEST(params) AS p
WHERE
cmd='PVE Win'
AND p.vn = 'v1')
SELECT
uid,
pirates
FROM
t1,
t1.pirates AS pirates