Stata Programming | Basic Commands
0. Handy Everyday Commands
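0.1 levelsof
levelsof displays a sorted list of the distinct values of varname
In other words, levelsof tells us which values a variable takes. Missing values are excluded unless the missing option is specified.
*Syntax
levelsof varname [if] [in] [, options]
Suppose we want to know which values the variable rep78 in auto.dta takes: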
cls
sysuse auto, clear
levelsof rep78
1 2 3 4 5
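levelsof also stores the list in r(levels), and its local() option copies the list into a local macro you can loop over later. A minimal sketch (the macro names reps and reps_all are arbitrary choices); the missing option adds missing values to the list:

```stata
sysuse auto, clear
* store the distinct values of rep78 in the local macro reps
levelsof rep78, local(reps)
display "`reps'"
* r(levels) holds the same list
display "`r(levels)'"
* include missing values as well (rep78 has missing observations)
levelsof rep78, missing local(reps_all)
display "`reps_all'"
```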
0.2 fs
fs lists the names of files in compact form.
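fs lists the files of a given type in a given directory and stores the list in the returned macro r(files). Note that fs is an external (user-written) command from SSC, so it must be installed before its first use.
*Syntax
fs [filespec [filespec [ ... ]]]
Suppose we want to know which .dta files a folder contains: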
cls
ssc install fs
cd D:\software\stata16\Stata16MP\ado\base\a
fs *.dta
auto.dta auto2.dta autornd.dta
return list
macros:
r(files) : ""auto.dta" "auto2.dta" "autornd.dta" "
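If installing fs is not an option, Stata's built-in extended macro function dir returns a similar list. The sketch below first saves a copy of auto.dta under a hypothetical name, demo_auto.dta, in the current directory so that the listing is not empty:

```stata
sysuse auto, clear
* hypothetical file, written only so the listing below is non-empty
save demo_auto, replace
* built-in alternative to fs: all .dta files in the current directory
local dtafiles : dir . files "*.dta"
display `"`dtafiles'"'
* loop over the file names one by one
foreach f of local dtafiles {
    display "`f'"
}
```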
0.3 rmsg
set rmsg determines whether the return message is to be displayed at the completion of each command. The initial setting is off. The return message shows how long the command took to execute and what time it completed execution.
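rmsg tells us how long each command takes to run, which helps when optimizing code.
*Syntax
set rmsg [on | off] [, permanently]
Suppose we want to know how long it takes to loop from 1 to 100000, displaying each value: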
cls
set rmsg on
forvalues k = 1(1)100000{
display `k'
}
...
99997
99998
99999
100000
r; t=0.87 0:39:24
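set rmsg reports a time for every command. To time only one part of a do-file, Stata's built-in timer command is an alternative. A sketch using timer slot 1 (the slot number and the loop body are arbitrary choices); the loop stores a value instead of displaying it, so the timing is not dominated by screen output:

```stata
* use timer slot 1 to time only the loop
timer clear 1
timer on 1
forvalues k = 1/100000 {
    local y = `k' * 2
}
timer off 1
* timer list reports the elapsed seconds and stores them in r(t1)
timer list 1
display "elapsed seconds: " r(t1)
```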
0.4 log
log saves the commands you type, together with their output, to a file.
*Report the status of log files
log
log query [logname | _all]
*Create and open a log file
log using filename [, append replace [text|smcl] name(logname) nomsg]
*Close a log
log close [logname | _all]
*Temporarily suspend or resume logging
log off [logname]
log on [logname]
A simple example using the auto dataset:
log using log_auto, replace name(log1)
sysuse auto, clear
drop if rep78 ==.
save test3, replace
log close log1
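The example above writes a SMCL log. A sketch of a plain-text log that also uses log off and log on (the file name log_demo.txt and the log name demo are hypothetical); commands run while the log is off are not written to the file:

```stata
* write a plain-text log instead of SMCL
log using log_demo.txt, text replace name(demo)
sysuse auto, clear
count if rep78 == .
* suspend logging: the next command is not recorded
log off demo
summarize price
* resume logging
log on demo
log query demo
log close demo
```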
Macros are a large topic in their own right and will be covered in a separate post.
1. Loops
1.1 while
while evaluates exp and, if it is true (nonzero), executes the stata commands enclosed in the braces. It then repeats the process until exp evaluates to false (zero).
while loops according to whether an expression is true or false; forvalues and foreach, covered below, can be thought of as specialized variants of while.
*Syntax
while exp {
stata_commands
}
Below are simple examples of a single loop and a nested loop.
*Single loop
local j = 1
while `j' < 10{
display `j'
local j = `j' + 1
}
*Nested loop
local i = 1
while `i' <= 5{
local j = 1
while `j' < `i'{
display "`j' is less than `i'"
local j = `j' + 1
}
local i = `i' + 1
}
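A while loop can also accumulate a result across iterations. A minimal sketch that adds up 1, 2, ..., 100 (the macro names total and k are arbitrary); the result should equal 100 × 101 / 2 = 5050:

```stata
* add up 1, 2, ..., 100 with a while loop
local total = 0
local k = 1
while `k' <= 100 {
    local total = `total' + `k'
    local k = `k' + 1
}
display `total'
```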
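Here is one more example: solving the equation x^2 - 4x + 4 = 0 with while by increasing x from 0 in steps of 0.001 until the left-hand side is within 0.0000001 of zero.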
local x_est = 0
while abs(`x_est' ^ 2 - 4 * `x_est' + 4 - 0) > 0.0000001{
local x_est = `x_est' + 0.001
}
display in red `x_est'
2
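Stepping by 0.001 takes about 2,000 iterations to reach x = 2. As an illustration of a different technique (Newton's method, not the approach used above), the same while structure can update x to x - f(x)/f'(x) with f(x) = x^2 - 4x + 4 and f'(x) = 2x - 4. The macro names x and iter are arbitrary:

```stata
* Newton's method for f(x) = x^2 - 4x + 4, with f'(x) = 2x - 4
local x = 0
local iter = 0
while abs(`x'^2 - 4*`x' + 4) > 0.0000001 {
    local x = `x' - (`x'^2 - 4*`x' + 4) / (2*`x' - 4)
    local iter = `iter' + 1
}
display "x = `x' after `iter' iterations"
```

Because x = 2 is a double root, Newton's method converges only linearly here (the distance to 2 halves at each step), but it still needs far fewer iterations than the fixed grid.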
1.2 forvalues
Loop over consecutive values
forvalues can only loop over numeric values.
*Syntax
forvalues lname = range {
Stata commands referring to `lname'
}
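The range after the equals sign can be written in several forms. A sketch of three of them (the macro name seq is arbitrary); the step may be negative, and the end value is included when the sequence hits it exactly:

```stata
* #1/#2 : steps of 1
forvalues i = 1/3 {
    display `i'
}
* #1(#d)#2 : explicit step, which may be negative
local seq ""
forvalues i = 10(-5)0 {
    local seq `seq' `i'
}
display "`seq'"
* #1 #t to #2 : the step is #t - #1, here 2
forvalues i = 2 4 to 10 {
    display `i'
}
```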
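The equation from the while example can also be solved with forvalues. Suppose we have guessed that the root lies between 1 and 3; we then scan that range in steps of 0.0001 and break out of the loop at the first value that meets the tolerance: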
forvalues x_est = 1(0.0001)3{
if abs(`x_est' ^ 2 - 4 * `x_est' + 4 - 0) < 0.000000001{
display in red `x_est'
continue, break
}
}
2
1.3 foreach
foreach repeatedly sets local macro lname to each element of the list and executes the commands enclosed in braces. The loop is executed zero or more times; it is executed zero times if the list is null or empty.
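The list that follows foreach can be a macro, a list of variable names, file names and more, so foreach is more widely applicable than forvalues.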
foreach lname {in|of listtype} list {
        commands referring to `lname'
}
Allowed are
foreach lname in any list {
foreach lname of local lmacname {
foreach lname of global gmacname {
foreach lname of varlist varlist {
foreach lname of newlist newvarlist {
foreach lname of numlist numlist {
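The list type matters. A sketch contrasting them on auto.dta (the macro name vlist and the new variables z1 and z2 are hypothetical): of varlist expands the wildcard t* to the matching variables, in takes the text literally, and of newlist checks that the names are valid names for new variables:

```stata
sysuse auto, clear
* of varlist: t* expands to the variables whose names start with t
local vlist ""
foreach v of varlist t* {
    local vlist `vlist' `v'
}
display "`vlist'"
* in: the list is taken literally, so this runs once with v = t*
foreach v in t* {
    display "`v'"
}
* of newlist: names are checked before the loop runs
foreach v of newlist z1 z2 {
    generate `v' = runiform()
}
```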
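Suppose we want to display the values of the variable make in auto.dta one by one. The two approaches below are equivalent, but the of form is recommended: it does not have to expand the macro into the command line and is generally faster (set rmsg on lets you compare the timings).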
cls
sysuse auto, clear
levelsof make, local(make_info)
set rmsg on
foreach x of local make_info{
display "`x'"
}
cls
sysuse auto, clear
levelsof make, local(make_info)
set rmsg on
foreach x in `make_info'{
display "`x'"
}
set rmsg off
By combining foreach with other commands, we can let the computer do repetitive work for us.
*Create Excel files named 2010 through 2018 in the specified folder
cd C:\Users\Van\Desktop\test1
foreach file_name of numlist 2010/2018{
putexcel set `file_name'.xlsx, replace
putexcel A1 = "Year"
putexcel B1 = "Variable"
putexcel C1 = "Value"
}
*forvalues would make the loop above simpler and faster
*Convert the Excel files created above to .dta format
local xlsx_list: dir . files "*.xlsx"
foreach excel_file of local xlsx_list{
display "`excel_file'"
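import excel using "`excel_file'", firstrow clear
*Strip .xlsx so that 2010.xlsx is saved as 2010.dta rather than 2010.xlsx.dta
local stem = subinstr("`excel_file'", ".xlsx", "", 1)
save "`stem'.dta", replace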
}
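The comment above notes that forvalues is the simpler choice for the Excel loop, since the file names are just consecutive years. A sketch of that version, writing into whatever folder the current working directory points to:

```stata
* run this after cd to the folder where the files should be created
forvalues y = 2010/2018 {
    putexcel set "`y'.xlsx", replace
    putexcel A1 = "Year"
    putexcel B1 = "Variable"
    putexcel C1 = "Value"
}
```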
*Compute and display the means of several variables in one go
cls
sysuse auto, clear
foreach v of varlist price mpg weight length{
quietly summarize `v'
display "mean of variable `v' is:" `r(mean)'
}
1.4 continue
The continue command within a foreach, forvalues, or while loop breaks execution of the current loop iteration and skips the remaining commands within the loop. Execution resumes at the top of the loop unless the break option is specified, in which case execution resumes with the command following the looping command.
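Sometimes a loop should skip ahead or stop early depending on some condition; this is what continue is for.
*Syntax
continue [, break]
- continue: skips the remaining commands in the current iteration and returns to the top of the loop to begin the next iteration
- continue, break: skips the remaining commands and exits the loop altogether; execution resumes with the command that follows the loop (in nested loops, only the innermost enclosing loop is exited)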
*continue
forvalues i = 1(1)5 {
disp `i'
if `i' >2{
continue
}
disp "`i':Hello World"
}
1
1:Hello World
2
2:Hello World
3
4
5
*continue, break
forvalues i = 1(1)5 {
disp `i'
if `i' >2{
continue, break
}
disp "`i':Hello World"
}
1
1:Hello World
2
2:Hello World
3
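In nested loops, continue, break leaves only the loop that contains it, and the outer loop carries on. A sketch (the macro name pairs is arbitrary) in which the inner loop stops at j = 2 while the outer loop still runs for i = 2, so pairs ends up as 11 21:

```stata
local pairs ""
forvalues i = 1/2 {
    forvalues j = 1/3 {
        * leave the inner loop only; the outer loop continues
        if `j' == 2 {
            continue, break
        }
        local pairs `pairs' `i'`j'
    }
}
display "`pairs'"
```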
2. Conditional Statements
The if command evaluates exp. If the result is true (nonzero), the commands inside the braces are executed. If the result is false (zero), those statements are ignored, and the statement (or statements if enclosed in braces) following the else is executed.
*Syntax
if exp {
        multiple_commands
}
else {
        multiple_commands
}
*or, with a single command
if exp single_command
else single_command
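The if command is easy to confuse with the if qualifier that follows commands such as generate. A sketch on auto.dta (the variable heavy and the 3000 lbs cutoff are hypothetical choices): the qualifier is applied observation by observation, whereas the if command is evaluated once, and a variable name in its expression refers to the first observation:

```stata
sysuse auto, clear
* if qualifier: evaluated for each observation
generate byte heavy = 1 if weight > 3000
count if heavy == 1
* if command: evaluated once, using weight in the first observation
if weight > 3000 {
    display "the first car weighs more than 3000 lbs"
}
else {
    display "the first car weighs 3000 lbs or less"
}
```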
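Suppose we want to write a small calculator program that takes x as its first argument (`1') and n as its second (`2'): when n > 0 it displays x^n; when n = 0 it displays log(x); when n < 0 it displays -x^n.
*Drop any earlier definition so the program can be redefined when rerun
capture program drop power
program power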
if `2' > 0 {
display in red `1'^`2'
}
else if `2' == 0 {
display in red log(`1')
}
else {
display in red -(`1'^(`2'))
}
end
power 16 2
256
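A variant of the same program that names its arguments with args and returns the result in r() instead of only displaying it, so later code can reuse the value. A sketch; the program name power2 and the result name r(value) are hypothetical:

```stata
capture program drop power2
program power2, rclass
    * args names the positional arguments: x is `1', n is `2'
    args x n
    if `n' > 0 {
        return scalar value = `x'^`n'
    }
    else if `n' == 0 {
        return scalar value = log(`x')
    }
    else {
        return scalar value = -(`x'^(`n'))
    }
end
power2 16 2
display r(value)
power2 2 -1
display r(value)
```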
3. Preserving and Restoring Data
Preserve and restore data
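Mistakes are unavoidable when processing data, and a mistake can overwrite the data in memory, forcing us to start over. preserve helps here: it saves a copy of the current data, and restore brings that copy back.
*Syntax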
preserve [, changed]
restore [, not preserve]
Suppose we randomly generate a dataset made up of year and value, and we want to save each year's observations to a separate file. The obvious first attempt is:
keep if year == 2015
save 2015.dta, replace
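After running these commands, however, the data in memory have changed: every observation outside 2015 has been deleted, so we would have to reload the data before each keep if. preserve and restore solve this problem: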
clear
set seed 12345
set obs 10
gen year = _n + 2009
expand 3
gen value = runiform()
save test2, replace
use test2, clear
forvalues i = 2010/2019{
preserve
keep if year == `i'
save `i', replace
restore
}
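restore normally discards the snapshot, so each preserve can be restored only once. The preserve option of restore brings the data back while keeping the snapshot for another restore. A sketch on auto.dta (74 observations, of which foreign == 1 marks the foreign cars):

```stata
sysuse auto, clear
preserve
keep if foreign == 1
count
* restore the full data but keep the snapshot
restore, preserve
count
drop if foreign == 1
count
* restore again: back to all 74 observations
restore
count
```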
4. Error Handling
capture executes command, suppressing all its output (including error messages, if any) and issues a return code of zero. The actual return code generated by command is stored in the built-in scalar _rc.
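In everyday data processing we often want a program to skip over errors automatically instead of stopping; prefixing the command with capture does exactly that.
*Syntax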
capture [:] command
capture {
stata_commands
}
Comparing the output of the two snippets below shows what capture does (no data are loaded, so hist fails):
hist 杭州
dis "杭州"
no variables defined
capture hist 杭州
dis "杭州"
杭州
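Going further, adding noisily makes capture display the error message while still preventing the error from stopping execution: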
capture noisily hist 杭州
dis "杭州"
no variables defined
杭州
display _rc shows the return code of the error:
capture noisily hist 杭州
dis "杭州"
display _rc
111
If the command runs without error, _rc is 0:
capture noisily display "杭州"
display _rc
0
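A common pattern is to test _rc right after capture and branch on it. A sketch using confirm variable on auto.dta; not_a_var is a deliberately nonexistent, hypothetical variable name, for which confirm returns code 111:

```stata
sysuse auto, clear
capture confirm variable price
if _rc == 0 {
    display "price exists"
}
* not_a_var does not exist, so _rc is nonzero
capture confirm variable not_a_var
if _rc != 0 {
    display "not_a_var not found, _rc = " _rc
}
```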
capture can also wrap a whole block of commands:
capture noisily {
di "杭州"
error
di "上海"
}
杭州
invalid syntax
r(197);
Note that 上海 is not displayed: as soon as an error occurs, capture jumps out of the braces and skips the remaining commands in the block.
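To keep going after an error, capture each command separately instead of the whole block. A sketch combining capture with continue in a loop (not_a_var is a hypothetical, nonexistent variable and ok is an arbitrary macro name); the loop skips the failing name and ends with ok holding price mpg:

```stata
sysuse auto, clear
local ok ""
foreach v in price not_a_var mpg {
    * capture each command on its own so one failure does not end the loop
    capture noisily summarize `v'
    if _rc {
        display as error "skipping `v' (_rc = " _rc ")"
        continue
    }
    local ok `ok' `v'
}
display "`ok'"
```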