Batch-exporting Python results to Excel (Part 3)

2020-03-28 Hobbit的理查德

I previously used putexcel in Stata 16 to batch-export results to Excel. Because Python has clear advantages for office automation, I tried doing the same batch export with Python.

Exporting Stata results to Excel: analysis of variance

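Batch-exporting Python results to Excel (Part 1): covers batch export of frequency analysis, cross-tabulation, and multiple-response results;

Batch-exporting Python results to Excel (Part 2): covers batch export of chi-square tests and descriptive statistics;

This part covers batch export of between-group comparisons (F test, t test, and post hoc comparisons) to Excel.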

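I. Analyses covered

F test;
t test;
Post hoc comparisons;

Goal: by changing only the column names, batch-export the between-group comparison results (F test, t test, and post hoc comparisons) to Excel.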

II. Code

1. Import third-party libraries and recode values

import pandas as pd
import numpy as np
import scipy.stats as stats
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from scipy.stats import ttest_ind, levene

# Recode values: text answers to numbers
def replace(df,colls):
    mapping={'非常不符合':1,'比较不符合':2,'一般':3,'比较符合':4,'非常符合':5}
    dfreplace=df[colls].replace(mapping) # returns a new DataFrame; df itself is left unchanged
    return dfreplace
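
The recoding step can be tried on a small made-up table before running it on real survey data. The sketch below is an alternative illustration, not the function above: it uses Series.map instead of replace, so any label missing from the mapping becomes NaN rather than staying as text. The names LIKERT, likert_to_numeric, and the toy columns q1 and q2 are hypothetical.

```python
import pandas as pd

# Hypothetical mapping from the five Likert labels to scores 1-5
LIKERT = {'非常不符合': 1, '比较不符合': 2, '一般': 3, '比较符合': 4, '非常符合': 5}

def likert_to_numeric(df, cols):
    """Return a new DataFrame with the Likert labels in `cols` mapped to 1-5.

    Labels that are not in LIKERT (including missing answers) become NaN.
    """
    return df[cols].apply(lambda s: s.map(LIKERT))

# Toy data, made up for illustration
toy = pd.DataFrame({
    'q1': ['一般', '非常符合', '非常不符合'],
    'q2': ['比较符合', '比较不符合', None],
})
scores = likert_to_numeric(toy, ['q1', 'q2'])
print(scores)
```

Mapping to NaN makes unexpected labels visible in the later statistics instead of silently leaving strings in a numeric column.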

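2. Build a function for group means of several dependent variables

# Group means for several dependent variables
def groupby_mean(df,row,colls):
    des=df.groupby(row)[colls].mean().reset_index() # select the columns first so non-numeric columns are never averaged
    des[colls]=des[colls].apply(lambda s:s.map(lambda x:format(x,'.2f'))) # keep 2 decimal places (stored as strings)
    # des=pd.DataFrame(des.values.T,index=des.columns,columns=des.index)
    return des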
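
To see what the group-mean table looks like, here is a minimal sketch on made-up data. The column names gender, q1, q2, and note are assumptions for illustration; the text column note is included only to show that selecting the numeric columns before calling mean() keeps it out of the calculation.

```python
import pandas as pd

# Toy data, made up for illustration
toy = pd.DataFrame({
    'gender': ['F', 'F', 'M', 'M'],
    'q1': [1, 2, 4, 5],
    'q2': [3, 3, 2, 4],
    'note': ['a', 'b', 'c', 'd'],  # non-numeric column that must not be averaged
})

value_cols = ['q1', 'q2']
# Select the value columns first, then average within each group
means = toy.groupby('gender')[value_cols].mean().reset_index()

# Format the means as strings with two decimals, as the export does
formatted = means.copy()
formatted[value_cols] = means[value_cols].apply(lambda s: s.map('{:.2f}'.format))
print(formatted)
```

The formatted values are strings, which is convenient for a report but means they should not be reused for further arithmetic.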

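3. Build the ANOVA function

This function runs a one-way ANOVA and returns the result as a string of the form "F = ... p = ...". The presentation can of course be adapted to your own needs.

# ANOVA result (F value and p-value); to see what else the fitted model exposes, run print(dir(model))
def anova_para(df,row,col):
    # Q("...") lets the formula refer to column names that are not valid Python identifiers
    model=ols(f'Q("{col}") ~ C(Q("{row}"))',data=df).fit()
    F=model.fvalue
    p=model.f_pvalue
    Fresult='F = '+format(F,'.2f')+' p = '+format(p,'.3f')
    return F,p,Fresult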
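
With a single categorical predictor, the overall F statistic of the fitted OLS model is the one-way ANOVA F. The sketch below checks this on made-up data by comparing it with scipy.stats.f_oneway; the column names group and score are assumptions for illustration.

```python
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.formula.api import ols

# Toy data, made up for illustration: three groups of four observations
toy = pd.DataFrame({
    'group': ['a'] * 4 + ['b'] * 4 + ['c'] * 4,
    'score': [1, 2, 3, 4, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Route 1: OLS with one categorical predictor, as in the function above
model = ols('score ~ C(group)', data=toy).fit()

# Route 2: classic one-way ANOVA on the per-group samples
samples = [g['score'] for _, g in toy.groupby('group')]
F_scipy, p_scipy = f_oneway(*samples)

print('ols:     ', format(model.fvalue, '.2f'), format(model.f_pvalue, '.3f'))
print('f_oneway:', format(F_scipy, '.2f'), format(p_scipy, '.3f'))
```

For this toy data the between-group sum of squares is 56 (2 degrees of freedom) and the within-group sum of squares is 15 (9 degrees of freedom), so both routes give F = 16.8.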

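4. Build the independent-samples t-test function

This function runs an independent-samples t test and returns the result as a string of the form "t = ... p = ...". If Levene's test indicates unequal variances, the adjusted (Welch) t value is returned instead. The presentation can be adapted to your own needs.

# t-test result (t value and p-value)
def T_indtest(df,row,col):
    groups=df[row].dropna().unique() # the two group labels, ignoring missing values
    x=df[df[row]==groups[0]][col].dropna()
    y=df[df[row]==groups[1]][col].dropna()
    lev_s,lev_p=levene(x,y)
    if np.isnan(lev_p): # Levene's test could not be computed
        return 'NULL','NULL','levene检验有误'
    equal_var=lev_p>=0.05 # True: equal variances (pooled t test); False: unequal variances (Welch's t test)
    T_s,T_p=ttest_ind(x,y,nan_policy='omit',equal_var=equal_var)
    Tresult='t = '+format(T_s,'.2f')+' p = '+format(T_p,'.3f')
    return T_s,T_p,Tresult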
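
The decision rule in this section (Levene's test first, then either the pooled or the Welch t test) can be sketched on its own. The helper name two_group_test, the alpha parameter, and the simulated samples are assumptions for illustration.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

def two_group_test(x, y, alpha=0.05):
    """Run Levene's test, then a pooled or Welch t test depending on its outcome."""
    lev_stat, lev_p = levene(x, y)
    equal_var = bool(lev_p >= alpha)  # no evidence of unequal variances -> pooled t test
    t_stat, t_p = ttest_ind(x, y, equal_var=equal_var)
    return {'equal_var': equal_var, 't': float(t_stat), 'p': float(t_p)}

# Two samples with the same spread: Levene's test is not significant
same_spread = two_group_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])

# Two simulated samples with very different spreads: Welch's t test is used
rng = np.random.default_rng(0)
narrow = rng.normal(0, 1, 200)
wide = rng.normal(0.5, 5, 200)
different_spread = two_group_test(narrow, wide)

print(same_spread)
print(different_spread)
```

Note that scipy's levene centers on the median by default, which is the Brown-Forsythe variant of the test.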

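5. Build the export function for between-group differences

This exports the group means together with the matching test result: the F test when a grouping variable has more than 2 groups, and the t test when it has exactly 2.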

# Between-group differences (F test or t test), all written to one sheet
def anova_one_sheet(df,rowls,colls,sheetname,writer):
    start_row=0
    for r in rowls:
        n_groups=df[r].nunique() # number of groups, ignoring missing values
        if n_groups>2:
            label,test='F检验',anova_para
        elif n_groups==2:
            label,test='T检验',T_indtest
        else:
            continue # fewer than two groups: nothing to compare
        gmean=groupby_mean(df,r,colls)
        result_ls=[label]+[test(df,r,c)[2] for c in colls]
        Fresult=pd.DataFrame([result_ls],columns=gmean.columns) # one row with the same columns as the means
        F_with_mean=pd.concat([gmean,Fresult],axis=0)
        F_with_mean=F_with_mean.T # transpose: one row per variable, one column per group
        F_with_mean.to_excel(writer,index=True,header=False,sheet_name=sheetname,startrow=start_row)
        start_row=start_row+F_with_mean.shape[0]+2 # leave two blank rows before the next block
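    # The writer is saved and closed once by the caller (see main), so later sheets can still be added.

6. Build the post hoc test function

# Post hoc test (Tukey HSD)
def post_check(df,row,col):
    df=df[[row,col]].dropna() # drop rows with missing values in these two columns only
    m_comp=pairwise_tukeyhsd(endog=df[col], groups=df[row], alpha=0.05)
    return m_comp

7. Build the post hoc export function

When the F test is significant (p<0.05), the pairwise comparisons for each dependent variable are exported to their own sheet, named "两两比较_" plus the first 5 characters of the dependent variable's column name. Results for different grouping variables are stacked in the same sheet.

# When the F test is significant, export the pairwise comparisons for each dependent variable to Excel
# (dependent-variable names are long, so only the first 5 characters go into the sheet name)
def post_check_in_sheets(df,rowls,colls,writer):
    next_row={} # next free row per sheet, so several grouping variables can share one sheet
    for r in rowls:
        if df[r].nunique()<=2: # post hoc comparisons need more than two groups
            continue
        for c in colls:
            s,p,result=anova_para(df,r,c)
            if p<0.05: # F test significant: export the pairwise comparisons
                table=post_check(df,r,c).summary().data
                post_comp=pd.DataFrame(data=table[1:], columns=table[0])
                post_comp.insert(0,'group_var',r) # record which grouping variable was compared
                sheet='两两比较_'+c[:5]
                start=next_row.get(sheet,0)
                post_comp.to_excel(writer,index=False,sheet_name=sheet,startrow=start)
                next_row[sheet]=start+post_comp.shape[0]+3 # header row + data rows + 2 blank rows

8. Main function and call

This example uses 11 independent (grouping) variables and 13 dependent variables.

def main():
    df=pd.read_excel('data.xlsx') # data source
    colnamels=list(df.columns.values)
    # Uncomment to print each column's index and name for easy lookup
    # for i,c in enumerate(colnamels):
    #     print(i,c)

    rowls=colnamels[140:151] # the 11 grouping variables
    # The original post never defines colls (the 13 dependent-variable columns);
    # fill in your own column names before running.
    colls=[]

    dfreplace=replace(df,colls) # recode text answers as numbers
    newdf=pd.concat([df[rowls],dfreplace],axis=1) # grouping columns plus recoded dependent columns

    file_dir='result.xlsx' # output Excel file
    with pd.ExcelWriter(file_dir) as writer: # the file is saved and closed once, when the block exits
        anova_one_sheet(newdf,rowls,colls,sheetname='组间差异',writer=writer)
        post_check_in_sheets(newdf,rowls,colls,writer=writer)

main()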
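
The export functions above rely on two ExcelWriter behaviours: several tables can be stacked in one sheet by advancing startrow, and further sheets can be added through the same writer before the file is saved. A minimal sketch of that pattern follows, using pd.ExcelWriter as a context manager so the file is saved once on exit. The file name demo_result.xlsx and the two small tables are made up for illustration, and an Excel engine such as openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Two small result tables, made up for illustration
block_a = pd.DataFrame({'group': ['F', 'M'], 'q1': ['1.50', '4.50']})
block_b = pd.DataFrame({'grade': ['g1', 'g2', 'g3'], 'q1': ['2.00', '3.00', '4.00']})

path = os.path.join(tempfile.mkdtemp(), 'demo_result.xlsx')  # hypothetical output file

with pd.ExcelWriter(path) as writer:  # saved and closed when the block exits
    start_row = 0
    for block in (block_a, block_b):
        # Stack the tables in one sheet, one below the other
        block.to_excel(writer, sheet_name='组间差异', index=False, startrow=start_row)
        start_row += block.shape[0] + 1 + 2  # data rows + header row + 2 blank rows
    # A second sheet written through the same writer
    block_a.to_excel(writer, sheet_name='两两比较_q1', index=False)

# Read every sheet back to check what was written
sheets = pd.read_excel(path, sheet_name=None, header=None)
print(list(sheets))
```

Keeping the writer open until every sheet has been written avoids saving the workbook repeatedly and avoids writing to a writer that has already been closed.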

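III. Results

Put together, the code above takes about 13 s to export all of these results to Excel.

1. Between-group differences

Group means are exported together with the F-test or t-test results.

组间差异.gif

2. Post hoc pairwise comparisons

For each dependent variable, the pairwise comparisons are exported to the corresponding sheet whenever the F test across groups is significant.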

事后两两比较.gif
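
The pairwise-comparison sheets come from turning a Tukey HSD result into a table. A minimal sketch on made-up data is shown here; it reads the rows from the public summary() table rather than a private attribute, and the column names group and score are assumptions for illustration.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Toy data, made up for illustration: three groups of four observations
toy = pd.DataFrame({
    'group': ['a'] * 4 + ['b'] * 4 + ['c'] * 4,
    'score': [1, 2, 3, 4, 2, 3, 4, 5, 6, 7, 8, 9],
})

res = pairwise_tukeyhsd(endog=toy['score'], groups=toy['group'], alpha=0.05)

# summary() returns a table whose first row holds the column headers
rows = res.summary().data
table = pd.DataFrame(rows[1:], columns=rows[0])
print(table)
```

Each row compares one pair of groups, and the reject column marks the pairs whose means differ at the chosen alpha. In this toy data group c differs from both a and b, while a and b do not differ from each other.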

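Batch-exporting Python results to Excel (Part 1): frequency analysis, cross-tabulation, and multiple responses

Batch-exporting Python results to Excel (Part 2): chi-square tests and descriptive statistics

Batch-exporting Python results to Excel (Part 3): between-group comparisons (F test, t test, and post hoc comparisons)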