Transport Sociology and Psychology (LV 240834759)

Take-Home Exam for Part ‘Transport Sociology’

Department of Civil, Geo and Environmental Engineering at the Technical University Munich

Release Date: 9 February 2021

Due Date: 17 March 2021, end of day (CET)

Introduction

The following tasks shall be answered in a written report. The recommended software
package to calculate answers in Task 1 and Task 2 is ‘The R Project for Statistical
Computing’. You may, however, use other software packages that you might be more
familiar with (such as Matlab or Biogeme), or write your own code in any language
of your choice. While there is no word limit, answers should be rather short, with one to
two paragraphs per question.

Submit your report as a PDF document. Please add your name and matriculation
number (03xxxxxx) at the beginning of your report and specify which software was
used to answer Tasks 1 and 2. Do not provide the script or code you wrote. Once you
are done, upload your PDF report to Moodle.

You are allowed to work in teams to solve these tasks. Note, however, that each
student needs to submit an individual solution. The final estimation results are likely
to be different for every student, as there are thousands of right answers for many
tasks. Provide your own solution. Also, all text needs to be written in your own words;
using copy-and-paste will result in failing the exam and a report to the examination
board. To acknowledge these rules at TUM, you were asked to sign the “Pledge
against Plagiarism” that you find on Moodle. Some of you have handed in a signed
copy already. If you have not done so (or if you are unsure whether you
did), please sign this document and upload it with your take-home exam report.

Task 1 [35 points]

You have been provided with a household travel survey that provides information on
household characteristics and the number of trips reported. The Excel file
(householdTravelSurvey.xlsx) provides a description of the available variables. The
CSV file contains the same data and was provided to be read in R. In this task, you
shall identify the most important socio-demographic attributes that explain the number
of auto trips.

a) Read the data with R (or your language of choice). To understand the range of
   data, provide min, max and mean values for each variable of this dataset in your
   report. [2 points]

b) Create a histogram for number of auto trips and a boxplot for income. Copy the
   two graphics into your report. Describe the two graphics in two to three
   sentences in your report. [2 points]

c) Estimate a multiple regression, where you try to explain the number of trips by
   car with all other socio-demographic attributes available in this survey. Provide
   the estimation results in the report*. [6 points]

   Describe the estimation results in the report:

   × Which independent variables are statistically significant with a
     confidence level of at least 90%?
   × Are estimated coefficients (called ‘Estimate’ in R) reasonable? Or did you
     find coefficients that do not make sense to you? Name coefficients that
     seem unlikely and explain why you think they don’t seem right from a
     theoretical point of view.

d) A possible reason for unreasonable coefficients is multicollinearity. Use R (or
   your preferred software) to plot the correlation between all variables. Add this
   plot to your report and identify the three pairs of independent variables that are
   most correlated. [5 points]

e) Create another multiple regression with auto trips as the dependent variable.
   This time, select independent variables that lead to an estimate where:

   × all coefficients are statistically significant (here defined as 90% confidence or more),
   × all coefficients have signs (+ or –) that make sense to you, and
   × no two independent variables have a correlation stronger than |R| = 0.6.

   This will require some trial and error. Provide and briefly describe the final
   estimate in your report*. Explain for each independent variable in your final
   estimation why it makes sense to you (i.e., explain why every + and – sign is
   reasonable). [10 points]

f) Attempt to improve the estimation result further by removing the intercept [-1],
   by using a quadratic transformation [I(variable^2)], by using a logarithmic
   transformation [log(variable)] and by testing interactions [variableA*variableB]
   for selected independent variables. This may require dropping additional
   independent variables to ensure that all estimated coefficients are statistically
   significant. The same three rules listed under the bullet points of subtask (e)
   shall apply. In your report, provide the final estimate* that provides the best
   model fit that you can find. [10 points]
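The correlation rule from subtask (e), which also applies in subtask (f), can be checked mechanically. The following is a minimal sketch in Python using only the standard library; it is not a required or reference solution. The sample rows and the column names (hhSize, workers, income, autos) are invented placeholders and do not come from the survey file; the helper names pearson, read_columns and pairs_above are likewise hypothetical.

```python
import csv
import io
import math
from itertools import combinations

# Made-up sample rows; the column names are placeholders, not the survey's variables.
SAMPLE_CSV = """hhSize,workers,income,autos
1,1,30,1
2,1,45,1
3,2,60,2
4,2,80,2
5,3,70,3
2,0,20,0
"""

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def read_columns(text):
    """Read CSV text into a dict that maps column name to a list of floats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {name: [float(row[name]) for row in rows] for name in rows[0]}

def pairs_above(columns, threshold=0.6):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    # Strongest correlations first
    return sorted(flagged, key=lambda item: -abs(item[2]))

columns = read_columns(SAMPLE_CSV)
for a, b, r in pairs_above(columns):
    print(f"{a} ~ {b}: r = {r:.2f}")
```

Any pair printed by this sketch would violate the third rule if both variables were kept in the same regression, so one of the two would have to be dropped.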

Task 2 [20 points]

You were provided with another dataset on mode choice for long-distance travel (file
modeChoiceData.csv, see xlsx file for definition of variables). The survey data was
collected for long-distance trips between Sydney, Canberra and Melbourne in Australia.
Travelers had the choice between auto, bus, train and air.

[Data Source: Greene, W.H. and D. Hensher: Multinomial logit and discrete choice models. In Greene,
W. H. (1997) LIMDEP version 7.0 user’s manual revised. Plainview, New York. Note that data were
modified for this exam.]

a) Read the data in R (or the software of your choice). To understand the range of
   data, provide min, max and mean values of in-vehicle travel times for each mode
   in your report. [2 points]

b) Estimate a multinomial logit model, where mode is the dependent variable
   and all other variables serve as independent variables. Provide the estimation
   result in your report† and briefly describe whether these estimates make sense
   to you (refer to statistical significance and describe whether + and – signs are
   reasonable). [8 points]

c) The estimation under (2b) provides one coefficient each for WaitTime, InVehCosts,
   InVehTime and GenCosts across all modes. Modify your estimation to
   provide mode-specific InVehTime (i.e., estimate a different coefficient for
   InVehTime for every mode). Provide the estimation result in your report† and
   briefly assess how using coefficients by mode has improved this estimation
   (provide two reasons why estimation (2c) is better than estimation (2b)).
   [4 points]

[Figure: diagram with the labels Trip, Auto, Bus, Train and Air]

d) Try to further improve the estimation result from task (2b)

   × by removing the intercept [-1] or
   × by raising a variable to the power of two [I(variable^2)] for selected
     independent variables or
   × by using a logarithmic transformation [log(variable)] for selected
     independent variables or
   × by estimating mode-specific coefficients for InVehCosts, InVehTime or
     GenCosts.

   To ensure that all variables are statistically significant, you may have to drop
   some independent variables. In your report, provide the final estimate† that
   provides the best model fit that you can find. Make sure that your best model
   estimation only includes independent variables that (i) have the expected sign
   [+ or –] and (ii) have a 90% significance level or more. This will require some
   trial and error. It is ok to include constants that do not reach this significance
   level. [6 points]
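The difference between one shared coefficient and mode-specific coefficients can be illustrated with the multinomial logit probability formula itself. The sketch below is plain Python and estimates nothing: all travel times, constants and coefficients are invented numbers chosen for illustration, and the function name mnl_probabilities is hypothetical.

```python
import math

def mnl_probabilities(utilities):
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    # Subtract the maximum utility for numerical stability; probabilities are unchanged.
    v_max = max(utilities.values())
    exp_v = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(exp_v.values())
    return {mode: e / total for mode, e in exp_v.items()}

# Invented in-vehicle times (minutes) and constants, for illustration only.
in_veh_time = {"auto": 180.0, "bus": 240.0, "train": 200.0, "air": 60.0}
constants = {"auto": 0.0, "bus": -1.0, "train": -0.5, "air": -1.5}

# Generic specification: one time coefficient shared by all modes.
beta_generic = -0.01
v_generic = {m: constants[m] + beta_generic * in_veh_time[m] for m in in_veh_time}

# Mode-specific specification: one time coefficient per mode.
beta_by_mode = {"auto": -0.010, "bus": -0.015, "train": -0.008, "air": -0.020}
v_specific = {m: constants[m] + beta_by_mode[m] * in_veh_time[m] for m in in_veh_time}

for label, utilities in (("generic", v_generic), ("mode-specific", v_specific)):
    shares = mnl_probabilities(utilities)
    print(label, {m: round(p, 3) for m, p in shares.items()})
```

In an actual estimation the coefficients are fitted to the survey data rather than set by hand; the sketch only shows where a generic and a mode-specific coefficient enter the utility function.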

Task 3 [15 points]

In task 2, you were asked to estimate a multinomial logit model. Here, we explore a
nested logit model instead.

a) Describe the reasons why nested mode choice models sometimes work better than
   multinomial logit models. There is no need to estimate a model. A written
   description of the potential benefits of nested logit models is sufficient. [7 points]

b) Create a nesting structure for the modes conventional car, autonomous car, tolled
   road, non-tolled road, walk, bike, e-bike, e-scooter, bus, tram and commuter rail.
   Use as many nesting layers as make sense to you. Draw a nesting diagram (as shown
   in the diagram above), label the boxes with modes and provide it in your report.
   Explain in one paragraph your chosen nesting structure. There are many different
   solutions that are plausible. While your nesting structure will not be evaluated,
   your reasoning for your chosen nesting structure will be evaluated. [8 points]
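For readers who want to see the mechanics of a nested logit model, the following Python sketch computes choice probabilities for a two-level structure using the standard inclusive-value (logsum) formulation. It is an illustration under stated assumptions, not a model answer: the utilities are invented, the grouping into a "ground" and an "air" nest is only an example built from the Task 2 modes, and the function name nested_logit_probabilities is hypothetical.

```python
import math

def nested_logit_probabilities(utilities, nests, scales):
    """Two-level nested logit.

    utilities: mode -> systematic utility V_i
    nests:     nest name -> list of modes in that nest
    scales:    nest name -> nesting parameter lambda_k (0 < lambda_k <= 1)
    """
    # Inclusive value (logsum) of each nest: I_k = ln(sum_j exp(V_j / lambda_k))
    logsum = {
        k: math.log(sum(math.exp(utilities[m] / scales[k]) for m in modes))
        for k, modes in nests.items()
    }
    # Upper level: choice among nests, P(k) proportional to exp(lambda_k * I_k)
    denom = sum(math.exp(scales[k] * logsum[k]) for k in nests)
    probabilities = {}
    for k, modes in nests.items():
        p_nest = math.exp(scales[k] * logsum[k]) / denom
        for m in modes:
            # Lower level: choice within the nest, P(i | k)
            p_within = math.exp(utilities[m] / scales[k] - logsum[k])
            probabilities[m] = p_nest * p_within
    return probabilities

# Invented utilities; the grouping into "ground" and "air" is only an example.
utilities = {"auto": 0.0, "bus": -0.5, "train": -0.3, "air": -0.2}
nests = {"ground": ["auto", "bus", "train"], "air": ["air"]}

for lam in (1.0, 0.5):
    shares = nested_logit_probabilities(utilities, nests, {"ground": lam, "air": 1.0})
    print(f"lambda = {lam}:", {m: round(p, 3) for m, p in shares.items()})
```

With every nesting parameter set to 1, the function returns the same probabilities as a multinomial logit model over the same utilities.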

Task 4 [20 points]

Task 1 explored multiple regression, and Tasks 2 and 3 explored discrete choice models.
In this Task 4, we look at the differences between the two.

a) We apply multiple regression and discrete choice models for different problem
   sets. Explain when to use which one. [4 points]

b) Could you have solved Task 1 with a discrete choice model? Why? [8 points]

c) Could you have solved Task 2 with a multiple regression? Why? [8 points]


Task 5 [10 points]

To explore travel behavior, both household travel surveys (e.g., MiD in Germany) and
panel surveys (e.g., MOP in Germany) have been conducted.

a) Explain the difference between a household travel survey and a panel survey in
   terms of selection of participants and common sample sizes. [4 points]

b) For each of the following questions, select a survey (MiD or MOP) that is likely
   to be most useful. Explain your choices in two or three sentences. [6 points]

   × Explain mode choice behavior for shopping trips of high-income
     households with 0 workers and 0 cars.
   × Explore if people who travel less on weekdays travel more on weekends.
   × Explain how household relocation to the suburbs has affected the
     likelihood to buy a car.

I appreciate any feedback you would like to give on the clarity, length and difficulty of
this exam. Also, it would be helpful if you could give an estimate of the number of hours
it took you to complete this exam. Your answer is optional and will not affect your grade.
Thanks!

* Please provide your estimation results including at least: Variable names, estimated
coefficients, statistical significance of each variable and R² of the estimate.

† Please provide your estimation results including at least: Variable names, estimated
coefficients, statistical significance of each variable, as well as log-likelihood and R² of
the estimate.
