
Stata #12: Structural Equation Modeling Study Notes (1)

2019-05-13  847963901d13

Structural equation modeling is a way of thinking, a way of writing, and a way of estimating.

Description

SEM stands for structural equation model. Structural equation modeling is

  1. A notation for specifying SEMs.
  2. A way of thinking about SEMs.
  3. Methods for estimating the parameters of SEMs.

Stata’s sem and gsem commands fit these models: sem fits standard linear SEMs, and gsem fits generalized SEMs.

In sem, responses are continuous and the models are linear regressions.
In gsem, responses may be continuous, binary, ordinal, count, or multinomial.

sem fits models to single-level data.
gsem fits models to single-level or multilevel data.

There is obvious overlap between the capabilities of sem and gsem, and there are also differences.

Learning the language: Path diagrams and command language

A particular SEM is usually described using a path diagram.

Using path diagrams to specify standard linear SEMs

A path diagram is composed of the following:

  1. Boxes (observed variables) and circles (unobserved, or latent, variables).
  2. Arrows, called paths, that connect some of the boxes and circles.
  3. Other elements that indicate variances and correlations between variables.

sem (x1<-X) (x2<-X) (x3<-X) (x4<-X)
The model above is a linear single-level model, so it can be fit by sem or by gsem. sem is preferred for such models because it offers additional useful features.
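
As a rough illustration of that overlap, the do-file sketch below simulates four continuous indicators and fits the same measurement model with both commands. The simulated data, the seed, the sample size, and the helper variable f are assumptions made only so that the example can run; they do not come from these notes. estat gof is shown as one example of a postestimation command available after sem.

```stata
* Hypothetical data: four continuous indicators driven by one factor f
clear
set seed 12345
set obs 500
generate double f = rnormal()
forvalues i = 1/4 {
    generate double x`i' = `i' + 0.8*f + rnormal()
}
* Drop f so that only x1-x4 are observed; X below is a latent variable
drop f

* The same linear single-level measurement model, fit two ways
sem  (x1<-X) (x2<-X) (x3<-X) (x4<-X)
estat gof, stats(all)
gsem (x1<-X) (x2<-X) (x3<-X) (x4<-X)
```

Because X starts with a capital letter and no variable named X exists in the data, both commands treat it as latent.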

It's a measurement model, a term loaded with meaning for some researchers.
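
Written as equations, the measurement model above can be sketched as follows. The symbols for the intercepts and path coefficients are notation assumed here for illustration; X is the latent variable and e.x1, ..., e.x4 are the error variables.

```latex
\begin{aligned}
x_1 &= \alpha_1 + \beta_1 X + e.x_1 \\
x_2 &= \alpha_2 + \beta_2 X + e.x_2 \\
x_3 &= \alpha_3 + \beta_3 X + e.x_3 \\
x_4 &= \alpha_4 + \beta_4 X + e.x_4
\end{aligned}
```

Each observed variable is a linear function of the same latent variable plus its own error.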

Specifying correlation

One of the key features of SEM is the ease with which you can allow correlation between latent variables to adjust for the reality of the situation.

A curved path states that there is a correlation to be estimated between the variables it connects. By not drawing a curved path, we assert that the corresponding covariance is 0.

A curved path from a variable to itself indicates a variance.

When we draw diagrams, however, we will assume variance paths and omit drawing them, and we will similarly assume but omit drawing covariances between observed exogenous variables. We will not assume correlations between latent variables unless they are shown.

In sem’s (gsem’s) command-language notation, curved paths between variables are indicated via an option:
(x1<-X) (x2<-X) (x3<-X) (x4<-X), cov(e.x2*e.x3)
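
A minimal sketch of the cov() option in use follows. The data are simulated under an assumed setup in which x2 and x3 share an extra disturbance s, so their errors really are correlated; f, s, the seed, and all coefficients are hypothetical.

```stata
* Hypothetical data: x2 and x3 share the disturbance s,
* so e.x2 and e.x3 are correlated beyond what X explains
clear
set seed 101
set obs 1000
generate double f = rnormal()
generate double s = rnormal()
generate double x1 = 0.8*f + rnormal()
generate double x2 = 0.8*f + 0.6*s + rnormal()
generate double x3 = 0.8*f + 0.6*s + rnormal()
generate double x4 = 0.8*f + rnormal()
drop f s

* Estimate the covariance between the errors of x2 and x3
sem (x1<-X) (x2<-X) (x3<-X) (x4<-X), cov(e.x2*e.x3)
```

Omitting the cov() option would fit the same model with that covariance constrained to 0.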

Using the command language to specify standard linear SEMs

Q: Should I use sem’s command language or path diagrams?
A: The command language produces only standard computer output, but typing a model is usually quicker than drawing it in the Builder. With path diagrams, you can see the results of your estimation either as path diagrams or as standard computer output. The suggestion is to type models in the command language and store them in do-files, because that makes it easier to correct the errors you make.

  1. Path diagrams have squares and circles to distinguish observed from latent variables.
    In the command language, variables are assumed to be observed if they are typed in lowercase and are assumed to be latent if the first letter is capitalized.
    If the variable names in your dataset contain uppercase letters, type rename *, lower to convert them to lowercase.
  2. Remember the /// line-continuation indicator when a command spans more than one line in a do-file.
  3. So long as your arrow points the correct way, it does not matter which variable comes first. The following mean the same thing:
    (x1 <- X)
    (X -> x1)
  4. You may type multiple variables on either side of the arrow:
    (X -> x1 x2 x3 x4)
    This means the same thing as
    (X -> x1) (X -> x2) (X -> x3) (X -> x4)
  5. In path diagrams, you are required to show the error variables. In the command language, you may omit the error variables. sem knows that each endogenous variable needs an error variable. You can type
    (x1 <- X) (x2 <- X) (x3 <- X) (x4 <- X)
    and that means the same thing as
    (x1 <- X e.x1) ///
    (x2 <- X e.x2) ///
    (x3 <- X e.x3) ///
    (x4 <- X e.x4)
    To constrain the path coefficients from the error variables to 1, as path diagrams do, you type
    (x1 <- X e.x1@1) ///
    (x2 <- X e.x2@1) ///
    (x3 <- X e.x3@1) ///
    (x4 <- X e.x4@1)
    If you wanted to constrain the path coefficient x2<-X to be 2, you could type
    (x1 <- X) (x2 <- X@2) (x3 <- X) (x4 <- X)
    If you wanted to constrain the path coefficients x2<-X and x3<-X to be equal, you could type
    (x1 <- X) (x2 <- X@b) (x3 <- X@b) (x4 <- X)
    where b is an arbitrary symbolic name shared by the two paths.
  6. Curved paths are specified with the cov() option after you have specified your model:
    (x1 x2 x3 x4 <- X), cov(e.x2*e.x3)
  7. Nearly all the above applies equally to gsem.
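
The points in the list above can be tried out with a short do-file sketch like the following. The data are simulated with assumed loadings of 1, 2, 2, and 1.5 so that the constraints are at least plausible; f, the seed, and those values are hypothetical.

```stata
* Hypothetical data with loadings 1, 2, 2, and 1.5 on one factor f
clear
set seed 7
set obs 800
generate double f = rnormal()
generate double x1 = 1.0*f + rnormal()
generate double x2 = 2.0*f + rnormal()
generate double x3 = 2.0*f + rnormal()
generate double x4 = 1.5*f + rnormal()
drop f

* Item 4: several variables on one side of the arrow
sem (X -> x1 x2 x3 x4)

* Item 5: constrain the path coefficient x2<-X to 2
sem (x1 <- X) (x2 <- X@2) (x3 <- X) (x4 <- X)

* Items 2 and 5: equality constraint via the symbolic name b,
* with /// continuing the command across lines
sem (x1 <- X)   ///
    (x2 <- X@b) ///
    (x3 <- X@b) ///
    (x4 <- X)
```

Keeping the three variants in one do-file makes it easy to rerun them after correcting a typo, which is the workflow the notes recommend.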

Specifying generalized SEMs: Family and link

If x1, x2, x3, and x4 were instead binary outcomes, we would have to use gsem rather than sem. Here we use a probit model.

The path diagram for the measurement model with binary outcomes

In the diagram, the boxes for x1, ..., x4 show “Bernoulli” at the top and “probit” at the bottom, meaning that each variable is from the Bernoulli family and uses the probit link. The error terms e.x1, ..., e.x4 also disappear: in the generalized linear framework, only the Gaussian family has error terms.



In gsem’s command language, we write this model as
(x1 x2 x3 x4<-X, family(bernoulli) link(probit))
or, equivalently, as
(x1 x2 x3 x4<-X, probit)

The response variables do not all have to be from the same family and link. Perhaps x1, x2, and x3 are pass/fail variables but x4 is a continuous variable. In the diagram for that model, the words “Gaussian” and “identity” appear for variable x4, and the error term e.x4 is back.
This model can be written as
(x1 x2 x3<-X, family(bernoulli) link(probit))
(x4 <-X, family(gaussian) link(identity))
or as
(x1 x2 x3<-X, probit) (x4<-X, regress)
regress is a synonym for family(gaussian) link(identity). Because family(gaussian) link(identity) is the default, we can omit the option altogether:
(x1 x2 x3<-X, probit) (x4<-X)
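
A minimal sketch of the mixed-family model follows, assuming simulated data in which x1, x2, and x3 are pass/fail indicators and x4 is continuous. The helper variable f, the seed, and the coefficients are hypothetical and serve only to make the example runnable.

```stata
* Hypothetical data: three binary indicators and one continuous one
clear
set seed 2019
set obs 1000
generate double f = rnormal()
forvalues i = 1/3 {
    generate byte x`i' = (0.9*f + rnormal() > 0)
}
generate double x4 = 1 + 0.7*f + rnormal()
drop f

* Bernoulli/probit for x1-x3; the default Gaussian/identity for x4
gsem (x1 x2 x3 <- X, probit) (x4 <- X)
```

Only x4 gets an error variance in the output, which matches the disappearance of the error terms for the Bernoulli responses.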

Specifying generalized SEMs: Family and link, multinomial logistic regression

Consider a multinomial logistic model in which y takes on one of four possible outcomes and is determined by x1 and x2. Such a model could be fit by Stata’s mlogit command:
mlogit y x1 x2

The path diagram has three boxes for y, which takes on one of four possible outcomes and is determined by x1 and x2.

When specifying a multinomial logistic regression model in which the dependent variable can take one of k different values, you draw k-1 boxes. Names like 1.y, 2.y, ... mean y = 1, y = 2, and so on.
The outcome that you omit is known as the base outcome. Your model might be easier to interpret if you choose a meaningful base outcome, such as the most frequent outcome (which is what Stata’s mlogit command does by default).
The command syntax for our simple example is
(2.y 3.y 4.y<-x1 x2), mlogit
2.y, 3.y, and 4.y are examples of Stata’s factor-variable syntax. The factor-variable syntax has other features that can save typing in command syntax. i.y, for instance, means 1b.y, 2.y, 3.y, and 4.y, where 1b.y marks y = 1 as the base outcome. This is especially useful when y has more levels, say, 10. To fit the model we diagrammed, we could type
(i.y<-x1 x2), mlogit
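
To see the two routes side by side, the sketch below simulates a four-outcome y and fits the model with mlogit and with gsem. The data-generating process (latent utilities v1-v4 with Gumbel noise), the seed, and the coefficients are assumptions for illustration only. baseoutcome(1) is set in mlogit so that both commands use y = 1 as the base outcome.

```stata
* Hypothetical data: y is the alternative with the largest utility
clear
set seed 42
set obs 2000
generate double x1 = rnormal()
generate double x2 = rnormal()
generate double v1 = -ln(-ln(runiform()))
generate double v2 = 0.5*x1 - ln(-ln(runiform()))
generate double v3 = -0.5*x2 - ln(-ln(runiform()))
generate double v4 = 0.3*x1 + 0.3*x2 - ln(-ln(runiform()))
generate byte y = 1
generate double best = v1
forvalues k = 2/4 {
    replace y = `k' if v`k' > best
    replace best = v`k' if v`k' > best
}
drop v1 v2 v3 v4 best

* Same multinomial logit, with y = 1 as the base outcome in both
mlogit y x1 x2, baseoutcome(1)
gsem (2.y 3.y 4.y <- x1 x2), mlogit
```

Replacing the gsem path with (i.y <- x1 x2) specifies the same model with less typing.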
