Econ78010: Econometrics for Economic Analysis, Fall 2023
Homework #3
Due date: Dec. 4th, 2023; 1pm.
Do not copy and paste answers from your classmates. Two identical homework submissions will be treated as
cheating. Do not copy and paste the entire output of your statistical package; report only the relevant part
of the output. Please also submit your R script for the empirical part. Please put all your work in a single
file and upload it via Moodle.
Part I Multiple Choice (30 points in total, 3 points each)
Please choose the answer that you think is appropriate.
1.1 A nonlinear function
a. makes little sense, because variables in the real world are related linearly.
b. can be adequately described by a straight line between the dependent variable and one of the explanatory
variables.
c. is a concept that only applies to the case of a single or two explanatory variables since you cannot draw
a line in four dimensions.
d. is a function with a slope that is not constant.
1.2 To test whether or not the population regression function is linear rather than a polynomial of order r,
a. check whether the regression R2 for the polynomial regression is higher than that of the linear regression.
b. compare the TSS from both regressions.
c. look at the pattern of the coefficients: if they change from positive to negative to positive, etc., then the
polynomial regression should be used.
d. use the test of (r-1) restrictions using the F-statistic.
1.3 In the regression model Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui, where X is a continuous variable
and D is a binary variable, β3
a. indicates the slope of the regression when D = 1
b. has a standard error that is not normally distributed even in large samples since D is not a normally
distributed variable.
c. indicates the difference in the slopes of the two regressions.
d. has no meaning since (Xi × Di) = 0 when Di = 0.
1.4 The interpretation of the slope coefficient in the model ln(Yi) = β0 + β1Xi + ui is as follows:
a. 1% change in X is associated with a β1% change in Y.
b. 1% change in X is associated with a change in Y of 0.01β1 .
c. change in X by one unit is associated with a 100β1% change in Y.
d. change in X by one unit is associated with a β1 change in Y.
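As a numerical illustration of the log-linear specification in 1.4, the Python sketch below compares the usual approximation 100 × β1 with the exact percentage change 100 × (exp(β1) − 1) implied by a one-unit change in X. The value β1 = 0.05 and the function name are hypothetical, chosen only for illustration. They do not come from the homework.

```python
import math

def pct_change_in_y(beta1, delta_x=1.0):
    """Approximate and exact percentage change in Y when X changes by delta_x
    in the model ln(Y) = beta0 + beta1 * X + u."""
    approx = 100.0 * beta1 * delta_x
    exact = 100.0 * (math.exp(beta1 * delta_x) - 1.0)
    return approx, exact

# beta1 = 0.05 is a hypothetical value, not an estimate from the homework.
approx, exact = pct_change_in_y(0.05)
print(f"approximate: {approx:.2f}%  exact: {exact:.2f}%")
```

The two numbers are close when β1 is small and drift apart as β1 grows.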
1.5 The major flaw of the linear probability model is that
a. the actuals can only be 0 and 1, but the predicted are almost always different from that.
b. the regression R2 cannot be used as a measure of fit.
c. people do not always make clear-cut decisions.
d. the predicted values can lie above 1 and below 0.
1.6 In the expression Pr(Y = 1|X) = Φ(β0 + β1X),
a. (β0 + β1X) plays the role of z in the cumulative standard normal distribution function.
b. β1 cannot be negative since probabilities have to lie between 0 and 1.
c. β0 cannot be negative since probabilities have to lie between 0 and 1.
d. min(β0 + β1X) > 0 since probabilities have to lie between 0 and 1.
1.7 In the expression Pr(deny = 1 | P/I ratio, black) = Φ(2.26 + 2.74 P/I ratio + 0.71 black), the effect of
increasing the P/I ratio from 0.3 to 0.4 for a white person
a. is 0.274 percentage points.
b. is 6.1 percentage points.
c. should not be interpreted without knowledge of the regression R2 .
d. is 2.74 percentage points.
1.8 E(Y |X1, ..., Xk) = Pr(Y = 1|X1, ..., Xk) means that:
a. for a binary variable model, the predicted value from the population regression is the probability that
Y = 1, given X.
b. dividing Y by the X's is the same as the probability of Y being the inverse of the sum of the X's.
c. the exponential of Y is the same as the probability of Y happening.
d. you are pretty certain that Y takes on a value of 1 given the X's.
1.9 For the measure of fit in your probit regression model, you can meaningfully use the:
a. regression R2.
b. size of the regression coefficients.
c. pseudo R2.
d. standard error of the regression.
1.10 Your textbook plots the estimated regression function produced by the probit regression of deny on
P/I ratio. The estimated probit regression function has a stretched S shape given that the coefficient on the
P/I ratio is positive. Consider a probit regression function with a negative coefficient. The shape would
a. resemble an inverted S shape (for low values of X, the predicted probability of Y would approach 1)
b. not exist since probabilities cannot be negative
c. remain the S shape as with a positive slope coefficient
d. have to be estimated with a logit function
Part II Short Questions (32 points in total)
(10 points) 2.1 Dr. Qin would like to analyze the return to education and the gender gap. The equation
below shows the regression result using the 2005 Current Population Survey. LnEarnings refers to the logarithm of monthly earnings; educ refers to years of education; DFemme is a dummy variable equal to 1 if the
individual is female; exper is working experience, measured in years; Midwest, South, and West are
dummy variables indicating the region of residence, with Northeast as the omitted region. Interpret the major
results (discuss the estimates for all variables, and also address the question that Dr. Qin wants to analyze).
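$$
\begin{aligned}
\widehat{\text{LnEarnings}} ={}& \underset{(0.018)}{1.215} + \underset{(0.0011)}{0.0899} \times \text{educ} - \underset{(0.022)}{0.521} \times \text{DFemme} + \underset{(0.0016)}{0.0180} \times (\text{DFemme} \times \text{educ}) \\
&+ \underset{(0.0008)}{0.0232} \times \text{exper} - \underset{(0.000018)}{0.000368} \times \text{exper}^2 - \underset{(0.006)}{0.058} \times \text{Midwest} - \underset{(0.006)}{0.0078} \times \text{South} - \underset{(0.006)}{0.030} \times \text{West} \\
& n = 57{,}863, \qquad \bar{R}^2 = 0.242
\end{aligned}
$$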
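The interaction and quadratic terms in the estimated equation for 2.1 can be unpacked with a little arithmetic. The Python sketch below uses only the coefficients reported in that equation. The function names and the choice of 12 and 16 years of schooling as evaluation points are assumptions made for illustration.

```python
# Coefficients copied from the estimated equation in 2.1.
B_EDUC = 0.0899          # educ
B_FEMALE = -0.521        # DFemme
B_FEMALE_EDUC = 0.0180   # DFemme x educ
B_EXPER = 0.0232         # exper
B_EXPER2 = -0.000368     # exper^2

def return_to_education(female):
    """Change in ln(earnings) per extra year of education (female = 0 or 1)."""
    return B_EDUC + B_FEMALE_EDUC * female

def gender_gap(educ):
    """Female-minus-male difference in ln(earnings) at a given education level,
    holding the other regressors fixed."""
    return B_FEMALE + B_FEMALE_EDUC * educ

def experience_turning_point():
    """Years of experience at which the quadratic experience profile peaks."""
    return -B_EXPER / (2.0 * B_EXPER2)

print(return_to_education(0), return_to_education(1))
print(gender_gap(12), gender_gap(16))  # 12 and 16 years are illustrative choices
print(experience_turning_point())
```

Because the dependent variable is in logs, each of these quantities is read as an approximate proportional effect on earnings.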
(14 points) 2.2 Sports economics typically looks at winning percentages of sports teams as one of various
outputs, and estimates production functions by analyzing the relationship between the winning percentage
and inputs. In Major League Baseball (MLB), the determinants of winning are quality pitching and batting.
The data cover all 30 MLB teams for the 1999 season. Pitching quality is approximated by Team Earned Run Average
(teamera), and hitting quality by On Base Plus Slugging Percentage (ops). Your regression output is:
$$
\text{Winpct} = \underset{(0.08)}{-0.19} - \underset{(0.008)}{0.099} \times \text{teamera} + \underset{(0.126)}{1.49} \times \text{ops}, \qquad R^2 = 0.92
$$
(a) (3 points) Interpret the regression. Are the results statistically significant and important?
(b) (8 points) There are two leagues in MLB, the American League (AL) and the National League (NL). One
major difference is that the pitcher in the AL does not have to bat. Instead there is a designated hitter in
the hitting line-up. You are concerned that, as a result, there is a different effect of pitching and hitting in
the AL from the NL. To test this hypothesis, you allow the AL regression to have a different intercept and
different slopes from the NL regression. You therefore create a binary variable for the American League
(DAL) and estimate the following specification:
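$$
\begin{aligned}
\text{Winpct} ={}& \underset{(0.12)}{-0.29} + \underset{(0.24)}{0.10} \times \text{DAL} - \underset{(0.008)}{0.100} \times \text{teamera} + \underset{(0.018)}{0.008} \times (\text{DAL} \times \text{teamera}) \\
&+ \underset{(0.163)}{1.622} \times \text{ops} - \underset{(0.160)}{0.187} \times (\text{DAL} \times \text{ops}), \qquad R^2 = 0.92
\end{aligned}
$$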
How should you interpret the winning percentage for the AL and the NL? Can you identify a different effect of
pitching and hitting between the AL and the NL? If so, how large is it?
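With a full set of interactions, each AL coefficient is the NL coefficient plus the coefficient on the matching DAL term, and each DAL term can be checked individually with its t-ratio. The Python sketch below does this arithmetic with the estimates and standard errors from the specification in (b). The dictionary layout and function names are assumptions for illustration.

```python
# (estimate, standard error) pairs copied from the specification in 2.2(b).
NL = {"intercept": (-0.29, 0.12), "teamera": (-0.100, 0.008), "ops": (1.622, 0.163)}
AL_SHIFT = {"intercept": (0.10, 0.24), "teamera": (0.008, 0.018), "ops": (-0.187, 0.160)}

def al_coefficients():
    """Implied AL coefficient = NL coefficient + coefficient on the DAL term."""
    return {name: NL[name][0] + AL_SHIFT[name][0] for name in NL}

def t_ratios():
    """t-ratio (estimate / standard error) for each term involving DAL."""
    return {name: est / se for name, (est, se) in AL_SHIFT.items()}

print(al_coefficients())
print(t_ratios())
```

The individual t-ratios address one coefficient at a time. They do not replace a joint test of all three DAL terms.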
(3 points) (c) You remember that sequentially testing the significance of slope coefficients is not the same as
testing for their significance simultaneously. Hence you ask your regression package to calculate the F-statistic
for the hypothesis that all three coefficients involving the binary variable for the AL are zero. Your regression package gives a
value of 0.35. Looking at the critical value from the F-table, can you reject the null hypothesis at the 1%
level? Should you worry about the small sample size?
(8 points) 2.3 Four hundred driver's license applicants were randomly selected and asked whether they
passed their driving test (Passi = 1) or failed their test (Passi = 0); data were also collected on their gender
(Malei = 1 if male and = 0 if female) and their years of driving experience (Experiencei, in years). Using these
data, a probit model is estimated, with the following result.
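$$
\widehat{\Pr}(\text{Pass} = 1) = \Phi\big(\underset{(0.200)}{0.806} + \underset{(0.156)}{0.041}\,\text{Experience} - \underset{(0.259)}{0.174}\,\text{Male} - \underset{(0.019)}{0.015}\,\text{Male} \times \text{Experience}\big)
$$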
The cumulative standard normal distribution table is appended.
(2 points) (a) Alpha is a man with 12 years of driving experience. What is the probability that he will
pass the test?
(2 points) (b) Belta is a woman with 5 years of driving experience. What is the probability that she will
pass the test?
(4 points) (c) Does the effect of experience on test performance depend on gender? Explain.
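Predicted probabilities from the probit model in 2.3 are obtained by evaluating the standard normal CDF at the fitted index. As an alternative to reading the appended table, the Python sketch below uses `statistics.NormalDist` from the standard library. The coefficients are copied from the estimated model, and the function names are assumptions for illustration.

```python
from statistics import NormalDist

# Coefficients copied from the estimated probit model in 2.3.
B0, B_EXP, B_MALE, B_MALE_EXP = 0.806, 0.041, -0.174, -0.015

def probit_index(experience, male):
    """Linear index inside Phi(.) for a given experience level and gender (male = 0 or 1)."""
    return B0 + B_EXP * experience + B_MALE * male + B_MALE_EXP * male * experience

def pass_probability(experience, male):
    """Predicted probability of passing: standard normal CDF of the index."""
    return NormalDist().cdf(probit_index(experience, male))

print(round(pass_probability(12, 1), 3))  # a man with 12 years of experience
print(round(pass_probability(5, 0), 3))   # a woman with 5 years of experience
```

The same function can be evaluated at other experience levels to see how the male and female profiles differ.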
Part III Empirical Exercise (38 points in total)
For all regressions, please report the heteroskedasticity-robust standard errors.
(16 points) 3.1 Please use vote2023.dta to answer the following questions. The following models can be used
to study whether campaign expenditures affect election outcomes:
voteA = β0 + β1log(expendA) + β2log(expendB) + u (1)
voteA = β0 + β1log(expendA) + β2log(expendB) + β3prtystrA + u (2)
where voteA is the percentage of the vote received by Candidate A, expendA and expendB are campaign
expenditures (in 1000 dollars) by Candidates A and B, and prtystrA is a measure of party strength for
Candidate A (the percentage of the most recent presidential vote that went to A's party).
(4 points) (i) Please run regression (1) and report your result in a table. Does A's expenditure affect the
outcome, and how? What about B's expenditure? (Hint: you need to first create the variables ln(expendA)
and ln(expendB).)
(8 points) (ii) Please run regression (2) and report your result in the same table. Does A's expenditure
affect the outcome, and how? What about B's expenditure? Compare the results from (i) and (ii), and explain whether
we should include prtystrA in the regression. If we exclude it, in which direction would the coefficient of
interest tend to be biased?
(4 points) (iii) Can you tell whether a 1% increase in A's expenditures is offset by a 1% increase in B's
expenditures? How? Please suggest a regression or test, and then answer the question according to your result.
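One common way to set up such a test, sketched here as an assumption rather than as the required method, is to write the null hypothesis as β1 + β2 = 0 and reparameterize the model so that this sum appears as a single coefficient θ. The derivation is shown for model (2); the same substitution applies to model (1) by dropping prtystrA.

```latex
% H0: \beta_1 + \beta_2 = 0. Define \theta = \beta_1 + \beta_2, so that \beta_1 = \theta - \beta_2.
\begin{aligned}
voteA &= \beta_0 + (\theta - \beta_2)\log(expendA) + \beta_2\log(expendB) + \beta_3\,prtystrA + u \\
      &= \beta_0 + \theta\,\log(expendA) + \beta_2\,[\log(expendB) - \log(expendA)] + \beta_3\,prtystrA + u
\end{aligned}
```

Regressing voteA on log(expendA), the difference log(expendB) − log(expendA), and prtystrA then gives an estimate of θ together with its heteroskedasticity-robust standard error, so the null can be checked with an ordinary t-test on that coefficient.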
(22 points) 3.2 Use the data set insurance.dta to answer the following questions. Please read the description file to understand the meanings of the variables.
For the following questions, please use only observations from individuals who report their health status as
healthy.
(4 points) (a) Generate a new variable age2 = age ∗ age. Estimate a linear probability model with insured
as the dependent variable and the following regressors: selfemp age age2 deg_ged deg_hs deg_ba deg_ma
deg_phd deg_oth race_wht race_ot reg_ne reg_so reg_we male married. Please report the regression
outcome in a table. How does health insurance status vary with age? Is there a nonlinear relationship between
the probability of being insured and age?
(4 points) (b) Estimate a probit model using the same regressors as in (a). Please report the regression
outcome in the same table as in (a). How does insurance status vary with age in this model?
(6 points) (c) Please drop the variable age2 and estimate the probit model with the remaining regressors.
Please report the regression outcome in the same table as in (a). Does dropping age2 affect the fit of the
model? How does insurance status vary with age in this model? Are the self-employed less likely to have
health insurance than wage earners? How does self-employment status affect insurance purchase for
individuals aged 30? For individuals aged 40?
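In a probit model the effect of a binary regressor on the predicted probability depends on where the rest of the index sits, which is why (c) asks about two different ages. The Python sketch below illustrates the calculation as a difference of two standard normal CDF values. All coefficient values and the function name are hypothetical placeholders, not estimates from insurance.dta, and the other regressors are set to zero purely to keep the toy example short.

```python
from statistics import NormalDist

def selfemp_effect(index_if_wage_earner, selfemp_coef):
    """Change in predicted probability when selfemp switches from 0 to 1,
    holding the rest of the probit index fixed."""
    cdf = NormalDist().cdf
    return cdf(index_if_wage_earner + selfemp_coef) - cdf(index_if_wage_earner)

# Hypothetical coefficients for illustration only; replace with your own estimates.
INTERCEPT, B_AGE, B_SELFEMP = 0.3, 0.02, -0.6

for age in (30, 40):
    index = INTERCEPT + B_AGE * age  # other regressors held at zero in this toy example
    print(age, round(selfemp_effect(index, B_SELFEMP), 3))
```

With actual estimates, the index would also include the remaining regressors evaluated at chosen values, for example their sample means.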
(4 points) (d) Estimate a logit model using the same regressors as in (c). Please report the regression
outcome in the same table. Is the effect of self-employment on insurance different for married workers than
for unmarried workers?
(4 points) (e) Use a linear probability model to answer the question: Is the effect of self-employment on
insurance different for married workers than for unmarried workers? Is your answer consistent with the answer
in (d)?