Factor Variables in Stata
2022-11-13
冬之心
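Definition
A factor variable expands an existing variable into a set of variables. It is most often used to create indicator (dummy) variables from a categorical variable. Note that a categorical variable used with a factor-variable operator must take nonnegative integer values: 0 is allowed, but negative or noninteger values are not.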
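As a rough illustration of this expansion, the sketch below uses Stata's bundled `auto` dataset and its categorical variable `rep78`. Both are assumptions chosen for the example and do not come from the original post. `fvexpand` lists the virtual indicator variables that `i.rep78` stands for.

```stata
* Sketch using the bundled auto dataset (an assumption, not from the article)
sysuse auto, clear

* rep78 is a categorical variable taking the values 1-5
tabulate rep78

* List the virtual indicator variables that i.rep78 expands to
fvexpand i.rep78
display "`r(varlist)'"

* Factor variables can be used directly, without generating dummies first
summarize i.rep78
```

No new variables are created in the dataset. The indicators exist only for the duration of the command.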
Factor-variable operators
Operator | Description | Note |
---|---|---|
i. | unary operator to specify indicators | creates an indicator for each category of a categorical variable |
c. | unary operator to treat as continuous | treats the variable as continuous |
o. | unary operator to omit a variable or indicator | omits a variable or a category |
# | binary operator to specify interactions | interaction |
## | binary operator to specify full-factorial interactions | full-factorial interaction |
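A minimal sketch of these operators in a regression follows, again assuming the bundled `auto` dataset with `price`, `mpg`, and `foreign`. These names are illustrative and do not appear in the original post.

```stata
* Sketch using the bundled auto dataset (an assumption, not from the article)
sysuse auto, clear

* i. : indicators for each level of foreign
regress price i.foreign

* c. and # : interaction between a categorical and a continuous variable
regress price i.foreign c.mpg i.foreign#c.mpg

* ## : full-factorial shorthand for the previous specification
regress price i.foreign##c.mpg
```

The last two commands should fit the same model. The `##` form only saves typing out the main effects.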
Examples
Factor specification | Result |
---|---|
i.group | indicators for levels of group |
i.group#i.sex | indicators for each combination of levels of group and sex, a two-way interaction |
group#sex | same as i.group#i.sex |
group#sex#arm | indicators for each combination of levels of group, sex, and arm, a three-way interaction |
group##sex | same as i.group i.sex group#sex |
group##sex##arm | same as i.group i.sex i.arm group#sex group#arm sex#arm group#sex#arm |
sex#c.age | two variables—age for males and 0 elsewhere, and age for females and 0 elsewhere; if age is also in the model, one of the two virtual variables will be treated as a base |
sex##c.age | same as i.sex age sex#c.age |
c.age | same as age |
c.age#c.age | age squared |
c.age#c.age#c.age | age cubed |
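To make two of these rows concrete, here is a short sketch on the bundled `auto` dataset (an assumption; the variables `mpg`, `foreign`, and `rep78` stand in for `age`, `sex`, and `group`). It compares `c.mpg#c.mpg` with a manually generated squared term, then expands a two-way full-factorial specification. The variable `mpg2` is a hypothetical helper created only for the comparison.

```stata
* Sketch using the bundled auto dataset (an assumption, not from the article)
sysuse auto, clear

* c.mpg#c.mpg adds the squared term without generating a new variable
regress price c.mpg c.mpg#c.mpg

* Equivalent manual construction, for comparison (mpg2 is a hypothetical helper)
generate double mpg2 = mpg^2
regress price mpg mpg2

* Expand a two-way full-factorial specification to see its terms
fvexpand i.foreign##i.rep78
display "`r(varlist)'"
```

One practical reason to prefer the factor-variable form is that postestimation commands such as `margins` can then recognize that the squared term depends on `mpg`.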
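Base category
By default, the base category is the level with the smallest value (for example, group 1). To choose a different base category, use the ib. operator.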
Base operator | Description | Note |
---|---|---|
ib#. | use # as base, # = value of variable | base given by value |
ib(##). | use the #th ordered value as base | base given by rank among the ordered values, e.g. ib(#2). for the second value |
ib(first). | use smallest value as base (default) | smallest value, i.e. the first group |
ib(last). | use largest value as base | largest value, i.e. the last group |
ib(freq). | use most frequent value as base | the most frequent value |
ibn. | no base level | no base category |
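The base operators can be tried side by side. The sketch below assumes the bundled `auto` dataset and uses `rep78` (values 1-5) as the categorical variable; neither comes from the original post.

```stata
* Sketch using the bundled auto dataset (an assumption, not from the article)
sysuse auto, clear

regress price i.rep78          // default: smallest value (1) is the base
regress price ib3.rep78        // value 3 as the base
regress price ib(#2).rep78     // second ordered value as the base
regress price ib(last).rep78   // largest value (5) as the base
regress price ib(freq).rep78   // most frequent value as the base
```

Changing the base only changes which category the other coefficients are compared against. The fitted model itself stays the same.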
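A special use of the ibn. operator
With i.varlist, each coefficient measures the difference between a category and the base category.
With ibn.varlist combined with the noconstant option, the coefficients become each category's own intercept rather than a difference from the base.
Compare the results of the following commands.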
reg y i.group age
reg y ibn.group age, noconstant
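A runnable version of this comparison might look like the following. It assumes the bundled `auto` dataset, with `price`, `rep78`, and `mpg` standing in for `y`, `group`, and `age`. The scalars `b_cons` and `b_diff2` are hypothetical names introduced only for this sketch.

```stata
* Sketch using the bundled auto dataset (an assumption, not from the article)
sysuse auto, clear

* Model 1: coefficients are differences from the base category (rep78 == 1)
regress price i.rep78 mpg
scalar b_cons  = _b[_cons]
scalar b_diff2 = _b[2.rep78]

* Model 2: no base level and no constant, so each level gets its own intercept
regress price ibn.rep78 mpg, noconstant

* Level 1's intercept should match model 1's constant;
* level 2's intercept should match the constant plus the level-2 difference
display _b[1.rep78] " vs " b_cons
display _b[2.rep78] " vs " b_cons + b_diff2
```

The two models span the same set of regressors, so the slope on `mpg` and the fitted values should agree. Only the parameterization of the category effects differs.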
References
Stata Reference Manual, [U] User's Guide
- 11 Language syntax
- 11.4 varname and varlists
- 11.4.3 Factor variables