PowerBi排序的思考
2020-03-20 本文已影响0人
BI罗
Rank函数只能对单独一张表进行排序,假如一张fact表,多张维表,按照维表的组合进行排序就做不到了
表的关系如下

Rank可以针对单张维表进行排序
rankx2 = RANKX(ALL(pd_code[PD_code]),[sumof])
rankx1 = RANKX(ALL(salse_region[Sales_Region]),[sumof])

维表组合的算法如下
rank2_pd_region_1 =
var s=[sumof]
var x1= CALCULATETABLE(SUMMARIZE('One_M_TY',
salse_region[Sales_Region],pd_code[PD_code],"sum",SUM('One_M_TY'[value_integer_decimal])),
ALL(salse_region[Sales_Region]),ALL(pd_code[PD_code]))
var x2=DISTINCT(SELECTCOLUMNS(x1,"sum",[sum]))
var x3=ADDCOLUMNS(x2,"rank",RANKX(x2,[sum]))
var t=SELECTCOLUMNS(FILTER(x3,[sum]=s),"a",[sum],"x",[rank])
return SUMX(t,[x])
一开始遇到的业务问题是对一张有一亿数据的表以及三张维表组成扩展表进行排序,rankx函数耗费了大量的计算力,测试都难以进行,我就在想是否可以单独拉出来最后的sum结果进行排序,然后再返回排序结果,得到两个结论:
1.这种方法是普通的rankx函数的计算时间的两倍,不过10万以内的非重复维表的排序的时间还是可以接受的
2.意外之喜,居然可以将维表组合排序
下面两个算法是等价的,思考也从这两个算法开始
普通的rankx
rank_pd = RANKX(ALL('One_M_TY'[PD_code]),[sumof])
rank2_pd_new =
var s=[sumof]
var x1= SUMMARIZE(ALL(pd_code[PD_code]),pd_code[PD_code],"sum",SUM('One_M_TY'[value_integer_decimal]))
var x2=DISTINCT(SELECTCOLUMNS(x1,"sum",[sum]))
var x3=ADDCOLUMNS(x2,"rank",RANKX(x2,[sum]))
var t=SELECTCOLUMNS(FILTER(x3,[sum]=s),"a",[sum],"x",[rank])
return SUMX(t,[x])
3.更好玩的返回前n名
rank2_pd_region_x =
var s=[sumof]
var x1= CALCULATETABLE(SUMMARIZE('One_M_TY',salse_region[Sales_Region],pd_code[PD_code],"sum",SUM('One_M_TY'[value_integer_decimal])),ALL(salse_region[Sales_Region]),ALL(pd_code[PD_code]))
var x2=DISTINCT(SELECTCOLUMNS(x1,"sum",[sum],"pdcode",[PD_code]))
var x3=ADDCOLUMNS(x2,"rank",RANKX(x2,[sum]))
var t=SELECTCOLUMNS(FILTER(x3,[sum]=s&&[rank]<=3),"a",[sum],"x",[rank],"p",[pdcode])
return SUMX(t,[x])

直接返回排名前三的文本,这个做销售分析的时候可以返回某个类别买的最好的3个
rank2_pd_region_p =
var s=[sumof]
var x1= CALCULATETABLE(SUMMARIZE('One_M_TY',salse_region[Sales_Region],pd_code[PD_code],"sum",SUM('One_M_TY'[value_integer_decimal])),ALL(salse_region[Sales_Region]),ALL(pd_code[PD_code]))
var x2=DISTINCT(SELECTCOLUMNS(x1,"sum",[sum],"pdcode",[PD_code]))
var x3=ADDCOLUMNS(x2,"rank",RANKX(x2,[sum]))
var t=SELECTCOLUMNS(FILTER(x3,[sum]=s&&[rank]<=3),"a",[sum],"x",[rank],"p",[pdcode])
return MAXX(t,[p])

maxx也可以代替sumx函数
rank2_pd_region_p_maxx =
var s=[sumof]
var x1= CALCULATETABLE(SUMMARIZE('One_M_TY',salse_region[Sales_Region],pd_code[PD_code],"sum",SUM('One_M_TY'[value_integer_decimal])),ALL(salse_region[Sales_Region]),ALL(pd_code[PD_code]))
var x2=SELECTCOLUMNS(x1,"sum",[sum],"pdcode",[PD_code])
var x3=ADDCOLUMNS(x2,"rank",RANKX(x2,[sum]))
var t=SELECTCOLUMNS(FILTER(x3,[sum]=s&&[rank]<=3),"a",[sum],"x",[rank],"p",[pdcode])
return MAXX(t,[p])

================================================
SUBSTITUTEWITHINDEX求法:

rank2_pd_region_sub =
VAR s = [sumof]
VAR x1 =
CALCULATETABLE (
SUMMARIZE (
'One_M_TY',
sales_region[Sales_Region],
pd_code[PD_code],
"sum", SUM ( 'One_M_TY'[value_integer_decimal] )
),
ALL ( sales_region[Sales_Region] ),
ALL ( pd_code[PD_code] )
)
VAR x2 =
DISTINCT ( SELECTCOLUMNS ( x1, "sum", [sum] ) )
VAR x3 =
SELECTCOLUMNS ( x1, "sum", [sum] )
var x4=ADDCOLUMNS(x3,"sum2",[sum])
VAR t =
SUBSTITUTEWITHINDEX(x4,"index",x2,-[sum],ASC)
var t2=SELECTCOLUMNS(FILTER(t,[sum2]=s),"sum",[sum2],"index",[index]+1)
return SUMX(t2,[index])
=============================================================================
既然maxx和sumx函数可以这样做,但是又为什么x系列函数可以呢
一个有用的特性还是一个奇怪的特性,这是成神的前奏--------------马克大佬说的
原来
1.x系列函数是可以迭代函数,意思是其可以创造行上下文,逐步计算每一行
2.上面的公式我本想直接用sum,或者calculate+max返回结果,实际上是不行的,因为构造表函数计算出来的表是没有行上下文