Date: 2019-08-25 08:06

ARE 106 Summer Session II

Homework 3

This homework will be due on August 29th at 2pm

SSID: 916184515

Please put your name and SSID in the corresponding cells above.

The homework is worth 13.5 points.

For each of the following questions, show as much of your steps as you can (without going overboard). If you end up getting the wrong answer, but we spot where you made a mistake in the algebra, partial credit will be more readily given. If you only put the final answer, you will be marked either right or wrong.

Answer questions in the correct cell. For problems where you have to input math, make sure the cell is a markdown cell (it won't have an In [ ]: prompt on the left) and make sure you run the cell by either pressing Ctrl + Enter or going to Cell -> Run Cell. Alternatively, write all your answers and then go to Cell -> Run All Cells after you're done.

Please ignore cells that read \pagebreak. These are so your document converts to PDF in a way that will make it possible to grade your homework. Ignore them and only write your answers where it is specified.

When you are finished, export your homework to a PDF by going to File -> Download as -> PDF.

Exercise 1: Single Regression

Please don't forget to comment your code. Failure to do so will result in a loss of points.

Also remember that all code that is required here (unless otherwise stated) can be found in the lecture Jupyter Notebooks or the coding notebooks from class.

Here are three models for the median starting salary of law school graduates in 1985.

Each observation represents a school.

\pagebreak

2019/8/24 HW3

localhost:8888/notebooks/HW3.ipynb 2/8

The variables in the dataset are:

| | Variable | Description |
|---------|---------------|---------------|
| 1. | rank | law school ranking |
| 2. | salary | median starting salary |
| 3. | cost | law school cost |
| 4. | LSAT | median LSAT score |
| 5. | GPA | median college GPA |
| 6. | libvol | no. of volumes in library, 1000s |
| 7. | faculty | no. of faculty |
| 8. | age | age of law school, years |
| 9. | clsize | size of entering class |
| 10. | north | =1 if law school in north |
| 11. | south | =1 if law school in south |
| 12. | east | =1 if law school in east |
| 13. | west | =1 if law school in west |
| 14. | studfac | student-faculty ratio |
| 15. | top10 | =1 if ranked in top 10 |
| 16. | r11_25 | =1 if ranked 11-25 |
| 17. | r26_40 | =1 if ranked 26-40 |
| 18. | r41_60 | =1 if ranked 41-60 |

a. In the code cell below, write the appropriate imports you will need for this question (we will need pandas, numpy and statsmodels.formula.api). You can do an abbreviated import if you wish (but the standard for pandas is pd, statsmodels.formula.api is smf, and numpy is np). Afterwards, load in the data from here:

https://raw.githubusercontent.com/lordflaron/ARE106data/master/lawsch85.csv

This can be done using the read_csv() function. Name this dataset raw_df. After loading in the data, show the first 10 observations in the output.
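As a rough sketch of the loading step, the snippet below reads a small inline CSV instead of the real file so that it runs offline. The rows are made up and only three of the real column names are used; in the homework, the URL above would be passed to read_csv() in place of the buffer.

```python
import io

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for lawsch85.csv: 12 made-up rows, 3 of the real column names.
toy_csv = io.StringIO(
    "rank,salary,LSAT\n"
    + "\n".join(f"{i},{30000 + 1000 * i},{150 + i}" for i in range(1, 13))
)

# read_csv() accepts a URL, a file path, or a file-like object such as this buffer.
raw_df = pd.read_csv(toy_csv)

# head(10) returns the first 10 observations; in a notebook, leaving it as the
# last expression of the cell displays the table.
first_ten = raw_df.head(10)
print(first_ten)
```

numpy and statsmodels.formula.api are imported here only to show the conventional aliases; they are not used until later parts.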

In [1]:

b. Use the describe() method on raw_df to show a table of summary statistics for each variable in the dataset. How many observations does have? Write this in a print statement. (Hint: This is in the "count" row of the summary table).
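A small illustration of how the "count" row behaves, using made-up numbers rather than the law school data. The names toy_df, summary and n_salary are placeholders for this sketch.

```python
import numpy as np
import pandas as pd

# Made-up data; np.nan marks a missing salary for the third school.
toy_df = pd.DataFrame(
    {"salary": [31000.0, 45000.0, np.nan, 52000.0], "LSAT": [155, 160, 158, 165]}
)

summary = toy_df.describe()  # count, mean, std, min, quartiles, max per column
print(summary)

# "count" is the number of non-missing values, so it can differ across columns.
n_salary = int(summary.loc["count", "salary"])
print(f"salary has {n_salary} non-missing observations")
```

Because describe() counts only non-missing values, a column with gaps reports fewer observations than the number of rows in the DataFrame.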

c. Since we'll need a log-transformed version of salary for all our models, use assign() to create a new variable which is the log of salary. Name this new variable log_salary.

Hints:

Remember that assign is not an inplace operation!

Remember to use a lambda function in this case. To log a variable, you can use np.log()

Remember the syntax for assign() :

\pagebreak

## a. Put your answer in this cell.

\pagebreak

## b. Put your answer in this cell.

\pagebreak


my_df.assign(new_variable = expression)

After this we now need to also drop any observations that are missing. This isn't actually how econometricians

deal with missing data, but this is good enough for us for now.

You can do this by chaining the dropna() method after the assign() method.

Warning: Do not do dropna BEFORE assign

The end result should look something like this:

df = raw_df.assign(log_salary= expression).dropna()
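The chained pattern can be tried on a toy DataFrame first. This is a sketch with made-up values; toy_raw and toy_df are placeholder names, and the lambda receives the DataFrame that assign() is called on.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing salary.
toy_raw = pd.DataFrame({"salary": [30000.0, np.nan, 50000.0], "LSAT": [155, 158, 162]})

# assign() returns a new DataFrame; toy_raw itself is left unchanged.
# dropna() then removes every row that still contains a missing value.
toy_df = toy_raw.assign(log_salary=lambda d: np.log(d["salary"])).dropna()

print(toy_df)
print("log_salary" in toy_raw.columns)  # assign is not an in-place operation
```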

In [3]:

d. Before estimating the model, explain how to interpret $\beta_1$ in Model 1.

Please write your answer for d here. If you need to use more than one line, you may do so.

e. Before estimating the model, explain how to interpret $\beta_1$ in Model 2.

Please write your answer for e here. If you need to use more than one line, you may do so.

f. Before estimating the model, do you expect $\beta_1$ and $\beta_2$ to be positive or negative in Model 2? Explain. (Hint: I'm not asking for any rigorous mathematical way to answer this question. Just use your economic intuition and reasoning skills to write an argument).

Please write your answer for f here. If you need to use more than one line, you may do so.

g. Estimate Model 1. Show the regression output.
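A minimal sketch of fitting a single regression with the formula interface, on synthetic data. It assumes that Model 1 regresses log_salary on LSAT, which is inferred from part (h) rather than stated in this copy; the data-generating numbers are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data, not the law school data: log_salary = 3 + 0.05 * LSAT + noise.
rng = np.random.default_rng(0)
lsat = rng.uniform(150, 175, size=200)
toy_df = pd.DataFrame(
    {"LSAT": lsat, "log_salary": 3 + 0.05 * lsat + rng.normal(0, 0.1, size=200)}
)

# Left of "~" is the dependent variable, right of it the regressor(s);
# an intercept is included automatically.
results = smf.ols("log_salary ~ LSAT", data=toy_df).fit()
print(results.summary())  # the regression output table
```

Individual numbers can also be pulled from the fitted object, for example results.params["LSAT"] for the slope and results.rsquared for the R-squared.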

In [4]:

h. What is the effect of a one unit increase in LSAT score on the log of median salary?

\pagebreak

## c. Put your answer in this cell.

\pagebreak

\pagebreak

## g. Put your answer in this cell.

\pagebreak


Please write your answer for h here. If you need to use more than one line, you may do so.

i. What does the $R^2$ measure in the regression? What is the $R^2$ in this case? (Not the adjusted $R^2$).

Please write your answer for i here. If you need to use more than one line, you may do so.

Exercise 2: Multiple Regression

This is a continuation of what we were doing in Exercise 1.

For this exercise, observe the expression for $\hat{\beta}_1$ when there are two regressors in the equation:

Hint: Notice that each of these terms in the equation look similar to either covariances or variances (in fact if

you multiply the denominator and numerator by then they are in fact variances and covariances without

changing the value of the coefficient (since is 1).

Also notice that the covariance is like an un-normalized correlation coefficient. So if you calculate the

correlation between two variables, you won't know the covariance between the two, but you'll know the direction

and strength of their relationship.
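For reference, a standard textbook form of the OLS slope on the first regressor in a two-regressor model is sketched below. The notation is an assumption: lowercase $y_i$, $x_{1i}$, $x_{2i}$ denote deviations from sample means, and the course's own expression may be arranged differently. The second line divides numerator and denominator by $(n-1)^2$, which turns each sum into a sample variance or covariance without changing the value.

```latex
\hat{\beta}_1
  = \frac{\sum_i x_{1i} y_i \,\sum_i x_{2i}^2 \;-\; \sum_i x_{2i} y_i \,\sum_i x_{1i} x_{2i}}
         {\sum_i x_{1i}^2 \,\sum_i x_{2i}^2 \;-\; \left(\sum_i x_{1i} x_{2i}\right)^2}
  = \frac{\widehat{\mathrm{Cov}}(x_1, y)\,\widehat{\mathrm{Var}}(x_2)
          - \widehat{\mathrm{Cov}}(x_2, y)\,\widehat{\mathrm{Cov}}(x_1, x_2)}
         {\widehat{\mathrm{Var}}(x_1)\,\widehat{\mathrm{Var}}(x_2)
          - \widehat{\mathrm{Cov}}(x_1, x_2)^2}
```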

a. Estimate Model 2. Show the regression output.

In [5]:

b. Calculate the correlations between log_salary, LSAT, and GPA.

Use the slicing notation to first make a subset of the data with only log_salary, LSAT and GPA.

Then use the corr() method to get the correlation for those variables, i.e. it will look something like this:

df[['log_salary', 'GPA', 'LSAT']].corr()

This will give a matrix where you can see correlation between variables. (Note: correlation of a variable with

itself is always 1).
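To see the link between covariance and correlation in numbers, here is a short sketch on made-up values. It checks that dividing the covariance by the two standard deviations reproduces the entry in the corr() matrix.

```python
import numpy as np
import pandas as pd

# Made-up numbers purely for illustration.
toy_df = pd.DataFrame(
    {"LSAT": [150.0, 155.0, 160.0, 165.0, 170.0], "GPA": [3.0, 3.4, 3.1, 3.6, 3.8]}
)

# The covariance depends on the units of both variables.
cov = toy_df["LSAT"].cov(toy_df["GPA"])

# Normalizing by both standard deviations gives the unit-free correlation.
corr_from_cov = cov / (toy_df["LSAT"].std() * toy_df["GPA"].std())

corr_matrix = toy_df.corr()  # pairwise correlations, with 1s on the diagonal
print(corr_matrix)
print(corr_from_cov)
```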

\pagebreak

\pagebreak

## a. Put your answer in this cell.

\pagebreak


In [7]:

c. Using your answer from (b) and the expression for $\hat{\beta}_1$ above, answer this question:

Why is $\hat{\beta}_1$ in Model 2 different from $\hat{\beta}_1$ in Model 1?

Please write your answer for c here. If you need to use more than one line, you may do so.

d. Why is the $R^2$ in Model 2 higher than in Model 1? (Not the adjusted $R^2$).

Please write your answer for d here. If you need to use more than one line, you may do so.

e. Estimate Model 3. Show the regression output.

Hint: One of the extra regressors in Model 3 is log-transformed. Instead of doing another assign() call, run

this regression by explicitly logging the variable in the patsy formula. Use np.log() to do this.
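The sketch below shows the mechanics of logging a regressor inside the formula, on synthetic data. Which regressor Model 3 actually logs is not assumed here; libvol is used only as a placeholder, and the coefficients in the data-generating line are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data; libvol is a placeholder regressor for this illustration.
rng = np.random.default_rng(1)
libvol = rng.uniform(50, 500, size=150)
toy_df = pd.DataFrame(
    {
        "libvol": libvol,
        "log_salary": 10 + 0.2 * np.log(libvol) + rng.normal(0, 0.05, size=150),
    }
)

# patsy evaluates np.log() while building the design matrix, so no new column is needed.
in_formula = smf.ols("log_salary ~ np.log(libvol)", data=toy_df).fit()

# Equivalent route for comparison: create the logged column first with assign().
pre_logged = smf.ols(
    "log_salary ~ log_libvol",
    data=toy_df.assign(log_libvol=lambda d: np.log(d["libvol"])),
).fit()

print(in_formula.params)
```

Both fits give the same slope; the only visible difference is the coefficient label, which is "np.log(libvol)" in the first fit and "log_libvol" in the second.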

In [8]:

f. Suppose School A and School B have the same values for all the variables on the right hand side in Model 3, except School A is ranked 10 places higher than School B. What is the predicted difference in log median salary between the two schools?

This question can be answered by simply printing out the math you did in a print statement using an f-string.
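A sketch of the f-string pattern with a hypothetical coefficient. The value of beta_rank is invented and should be replaced by the estimate from your own output; the sign of rank_change assumes that a better-ranked school has a smaller rank number.

```python
# Hypothetical coefficient on rank; replace it with the estimate from your own output.
beta_rank = -0.003

# Assumption: a better rank is a smaller number, so "10 places higher" is a change of -10.
rank_change = -10

# With all other regressors held equal, the predicted difference is coefficient times change.
predicted_diff = beta_rank * rank_change
print(f"Predicted difference in log median salary (A minus B): {predicted_diff:.3f}")
```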

In [9]:

Exercise 3: Multicollinearity

a. Re-estimate Model 1, except add north, south, east, and west as the additional right hand side variables.

## b. Put your answer in this cell.

\pagebreak

\pagebreak

## e. Put your answer in this cell.

\pagebreak

## f. Put your answer in this cell.

\pagebreak


In [12]:

b. What is wrong with this regression? What happens when you estimate it? How could you fix this problem?

Hint: Look at the warnings underneath the regression.
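One thing worth checking alongside the warnings is the column rank of the design matrix. The sketch below uses made-up region assignments and assumes each school belongs to exactly one of the four regions, which may or may not hold in the real data.

```python
import numpy as np

# Made-up region assignments for 8 schools: each school is in exactly one region.
region = np.array([0, 1, 2, 3, 0, 1, 2, 3])
dummies = np.eye(4)[region]  # columns play the role of north, south, east, west
intercept = np.ones((8, 1))

X_all = np.hstack([intercept, dummies])          # intercept + all four dummies
X_drop = np.hstack([intercept, dummies[:, 1:]])  # intercept + three dummies

# In this toy setup the four dummies sum to the intercept column,
# so X_all has fewer independent columns than it has columns.
print(np.linalg.matrix_rank(X_all), "of", X_all.shape[1])
print(np.linalg.matrix_rank(X_drop), "of", X_drop.shape[1])
```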

Please write your answer for b here. If you need to use more than one line, you may do so.

Exercise 4: Auxiliary Regression

Consider the following two regressions:

a. Estimate . This is a two-step process. First, you need to estimate the first regression model and save the

errors. Then, you regress on those errors ( ). Compare your estimate of to the estimate you

found from Model 2. Explain the similarity or difference.

To do this, you need to save the errors (also called residuals) after you run the first stage. After fitting the first stage, the results variable will have an attribute resid, so to get the residuals all you need to do is type this: results.resid.

You can then run the second stage in one of two ways:

1. assign a new variable to your data, called "residuals", and run a regression with it like any other variable, or

2. Directly call results.resid in your second stage's patsy formula, i.e., 'log_salary ~ results.resid'
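Both options can be sketched on synthetic data. The form of the first stage (LSAT regressed on GPA) is an assumption made for this illustration, since the two regressions are not shown in this copy; all numbers are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with correlated regressors; not the law school data.
rng = np.random.default_rng(2)
gpa = rng.uniform(2.5, 4.0, size=300)
lsat = 140 + 6 * gpa + rng.normal(0, 3, size=300)
toy_df = pd.DataFrame(
    {
        "GPA": gpa,
        "LSAT": lsat,
        "log_salary": 5 + 0.03 * lsat + 0.2 * gpa + rng.normal(0, 0.1, size=300),
    }
)

# First stage (assumed form): regress one regressor on the other and keep the residuals.
first = smf.ols("LSAT ~ GPA", data=toy_df).fit()

# Option 1: store the residuals as a column, then use it like any other variable.
second = smf.ols(
    "log_salary ~ residuals", data=toy_df.assign(residuals=first.resid)
).fit()

# Option 2: refer to the residuals directly inside the formula.
second_direct = smf.ols("log_salary ~ first.resid", data=toy_df).fit()

# Two-regressor fit, for comparing its LSAT coefficient with the second-stage slope.
both = smf.ols("log_salary ~ LSAT + GPA", data=toy_df).fit()
print(second.params["residuals"], both.params["LSAT"])
```

Option 2 works because patsy looks up names in the calling scope, so the fitted first-stage object has to be defined before the second formula is evaluated.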

b. What do you notice from the coefficient on this regression, versus the one in Model 1?

## a. Put your answer in this cell.

\pagebreak

\pagebreak

\pagebreak

## a. Put your answer in this cell.

\pagebreak


Please write your answer for b here. If you need to use more than one line, you may do so.

Exercise 5: Back to

Suppose that we have an estimated regression model , where are estimated OLS

coefficients. Let , so that:

Let's look at the next step of solving this problem in order to finally get at solving a mystery we've had during the

class.

If we wanted to solve for the , we would use the fact that a way to understand the variability in is to look at

its variance. And we already know that:

Up until now, we've just assumed it to be true that was 0 and it allowed us to finish the proof. But all

along, we've been implicitly assuming a Gauss-Markov assumption in order to make that claim.

Which of the Gauss-Markov assumptions do we need in order to say that ?

Hint: Don't forget that you can express the covariance in terms of expectations.

Hint: Try plugging in into this expression and seeing what you end up with.

Hint: Don't forget that

Please write your answer for exercise 5 here. If you need to use more than one line, you may do so.

Exercise 6: Data Types

Let's say that we have a population model:

The subscripts for the variables have been purposely omitted. For each part, rewrite the model so that it

corresponds to each data type and explain why you wrote it that way.

a. Cross-section

b. Time Series

c. Panel

\pagebreak

\pagebreak

\pagebreak


Please write your answer for exercise 6 here. If you need to use more than one line, you may do so.

\pagebreak

