STA 138 Winter 2019
Homework 5 - Due Friday, Feb 22nd

Book Portion (does not require R)
Note: This may be handwritten or typed. Answers should be clearly marked. Please put your name in the upper right corner.

1. A study is trying to predict whether someone will get the flu shot, using the following dataset:
Column 1: shot (Y): whether the subject got a flu shot (y = 1) or not (y = 0).
Column 2: age (X1): the age of the subject in years.
Column 3: aware (X2): the health awareness score, where a higher score indicates a higher level of awareness.
Column 4: gender (X3): M or F.
The estimated regression function is:
$\text{logit}(\hat{\pi}) = -1.1772 + 0.0728X_1 - 0.0990X_2 + 0.4340X_{3,M}$
(a) Interpret the exponential of the β associated with awareness score.
(b) Interpret the exponential of the β associated with gender.
(c) Estimate the probability that a male subject aged 50 with awareness score 60 would not get a flu shot.
(d) Estimate the odds that a female subject aged 30 with awareness score 50 would get the flu shot.

2. Continue with problem 1. The estimated standard errors for the β coefficients follow:
             Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)   -1.1772       2.9824   -0.3947     0.6931
age            0.0728       0.0304    2.3959     0.0166
aware         -0.0990       0.0335   -2.9567     0.0031
genderM        0.4340       0.5218    0.8317     0.4056
(a) Based on the above, with α = 0.05, does it appear that gender was a significant predictor of the probability of getting the flu shot? Explain your answer.
(b) Which coefficient appears to be the most useful in predicting whether a subject gets the flu shot? Explain your answer.
(c) Find the 95% corrected confidence interval for the β associated with age, assuming you are making g = 3 confidence intervals.
(d) What does your interval from (c) suggest about retaining or removing the age variable from the model? Explain your answer.

3. Continue with problem 1.
The error matrix for this model is (where the cutoff used was 0.50):
                 Predicted: y = 0   Predicted: y = 1
Truth: y = 0           130                  5
Truth: y = 1            18                  6
(a) Estimate the sensitivity, specificity, and overall error rate.
(b) The 95% confidence interval for AUC is (0.7308, 0.9139). Do you believe that the model is predicting Y = 1 well? Explain your answer.
(c) Explain why you might be interested in AUC over the error matrix.
(d) Explain why a standardized residual with a value over 3 may be concerning.

4. A study was performed to examine what affects the probability of using birth control in women.
Column 1: con (Y): whether the subject uses birth control, where Y = 1 indicates they do and Y = 0 indicates they do not.
Column 2: age (X1): the age of the subject in years.
Column 3: edu (X2): the level of education of the subject, with A (advanced), G (graduate or above), M (high school), L (below high school).
Column 4: working (X3): N (they are not working) or Y (they are working).
The purpose of the study was to examine contraceptive use in married women.
The estimated coefficients (β's) and their standard errors are:
             Estimate   Std. Error
(Intercept)    0.3392       0.5364
X1            -0.0095       0.0151
X2,G           0.8300       0.2964
X2,L          -0.7679       0.4669
X2,M          -0.1119       0.3370
X3,Y          -0.0320       0.2888
(a) Write down the model for each of the categories corresponding to X2. This should give four models.
(b) Estimate the probability that a subject with an advanced degree who is not working and is age 30 uses birth control.
(c) Interpret the value exp(0.8300) in terms of the problem.
(d) The log-likelihood for the model that includes all X variables is -195.3582, and the log-likelihood for the model that includes only X1 and X2 is -195.3644. Use these to test whether X3 can be dropped from the model. State the null, alternative, test statistic, p-value, and conclusion.

5. Continue with problem 4.
The estimated, corrected 95% confidence intervals for the model with X1 and X2 in it follow:
                        Lower     Upper
β for age (X1)        -0.0474    0.0280
β for edu G (X2,G)     0.0938    1.5791
β for edu L (X2,L)    -1.9989    0.3666
β for edu M (X2,M)    -0.9586    0.7315
(a) Does this suggest a significant difference in the odds of success for education level L vs. A? Explain your answer.
(b) Does this suggest a significant difference in the odds of success for education level G vs. A? Explain your answer.
(c) What would adding an interaction term between age and education level do? In other words, what would be the practical effect?
(d) What would your recommendation for the final model for this data be? Explain your answer.

6. Continue with problem 4. Assume we are using the model with both X1 and X2 in it.
(a) The five-number summary for the standardized residuals is below:
Min       First Quartile   Median    Third Quartile   Max
-2.0220   -0.9455          -0.0008   0.9874           2.1746
Does this suggest there may be outliers in the data? Explain why or why not.
(b) The error matrix with the cutoff of 0.50 follows:
                 Predicted: Y = 0   Predicted: Y = 1
Truth: Y = 0            63                 68
Truth: Y = 1            50                119
Estimate the sensitivity, specificity, and overall error rate.
(c) The error matrix with the cutoff of 0.70 follows:
                 Predicted: Y = 0   Predicted: Y = 1
Truth: Y = 0           108                 23
Truth: Y = 1           130                 39
Estimate the sensitivity, specificity, and overall error rate.
(d) Which cutoff would you suggest using, and why?

7. Answer the following questions as True or False:
(a) In logistic regression, the larger the value of DFbeta, the more influential the corresponding row of your data was.
(b) In logistic regression, the intercept does not always have a practical interpretation.
(c) In logistic regression, the larger the absolute value of $\beta_i$, the more the corresponding X affects $\hat{\pi}$.

R Portion (requires some use of R)
Note: You do not have to use R Markdown to turn in the homework, but the homework must be turned in in a reasonable format.
The answers to the questions should be in the body of the homework, and the code used to obtain those answers should be in an appendix. There should be no code in the body of the homework. You can accomplish this in R, Word, LaTeX, Google Docs, etc. This portion should be printed out and turned in with the handwritten portion.

I. Online under “Files” you will find the dataset internet.csv, which has the following columns:
Column 1. Newbie: 1 if the subject identified themselves as “new to the Internet”, 0 otherwise.
Column 2. Age: the age of the subject.
Column 3. Gender: 1 indicates the subject was male, 0 indicates female.
Column 4. Educational.Attainment: with levels “High School”, “College”, “Masters”, and “Doctoral”.
Column 5. score: the corresponding score for the Educational.Attainment column, where 1 = High School, 2 = College, 3 = Masters, and 4 = Doctoral.
The goal is to predict whether someone considers themselves “new to the Internet”.
(a) Fit a logistic regression model with Newbie as your response variable, and Age, Gender, and score as your explanatory variables. Write down the estimated logistic regression function.
(b) Interpret the value of exp(β) associated with Age in terms of the problem.
(c) Interpret the value of exp(β) associated with Gender in terms of the problem.
(d) Interpret the value of exp(β) associated with score in terms of the problem.

II. Continue with problem I.
(a) Find and report the 99% profile likelihood confidence intervals for all values of β.
(b) Using (a), which of your explanatory variables do you believe significantly affect whether someone identifies themselves as “new to the Internet”? Explain.
(c) Predict the probability that a female, aged 28, with a doctoral degree identifies herself as “new to the Internet”.
(d) Are there any unusual observations in your dataset? Explain.

III. Online under “Resources” you will find the dataset work.csv, which has the following columns:
Column 1. obese: 1 if the subject was obese, 0 otherwise.
Column 2.
gender: with levels male, female.
Column 3. age: the age of the subject.
Column 4. marriage: with levels married, widowed, divorced, never married.
Column 5. min: minutes of sedentary activity per week.
The goal is to predict whether a subject is obese or not.
(a) Fit and report the estimated logistic regression model with coefficients for gender, age, and the categories for the marriage variable.
(b) Write down the estimated logistic regression model for people who have never been married.
(c) Write down the estimated logistic regression model for people who are divorced.

IV. Continue with problem III.
(a) Display the Wald test statistics and p-values for testing whether each coefficient is zero.
(b) Based on the above, which variables would you retain in your model, and why? Assume α = 0.10.
(c) Fit the estimated logistic regression model with only the variables you chose in (b).
(d) Interpret the coefficients of the estimated regression model you chose in (c).

V. Continue with problems III and IV.
(a) Predict the probability that a married woman aged 28 who has 400 sedentary minutes per week is obese using the full model (all first-order predictors, no interactions).
(b) Predict the probability that a married woman aged 28 who has 400 sedentary minutes per week is obese using the model suggested in IV(c).
(c) Using the likelihood ratio (LR) test, test whether you can drop the coefficient for gender from the model. Assume the “full model” is $\text{logit}(\pi) = \alpha + \beta_1 x_{gender} + \beta_2 x_{min}$, and use a significance level of 0.05. Report the test statistic, the conclusion of the test, and the p-value.
(d) Using the likelihood ratio (LR) test, test whether you can drop the coefficient for min from the model. Assume the “full model” is $\text{logit}(\pi) = \alpha + \beta_1 x_{gender} + \beta_2 x_{min}$, and use a significance level of 0.05. Report the test statistic, the conclusion of the test, and the p-value.

VI.
Continue with problem IV, and use the “best model” suggested.
(a) Find the value of AUC and the 95% confidence interval for AUC, and plot the ROC curve.
(b) Does this value of AUC suggest that the model fits the data well? Explain your answer.
(c) Fit the full model (including all predictors) and repeat (a) for the full model.
(d) What does (c) suggest about AUC and adding predictors, if anything?
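The Book Portion does not require R, but its arithmetic (a probability and odds from the fitted logit in problem 1, the corrected interval in problem 2(c), the error-matrix rates in problems 3 and 6, and the likelihood ratio test in problem 4(d)) reduces to a few formulas that are easy to check by machine. The following is a minimal Python sketch using only the standard library, not the course's R workflow. The helper names (`inv_logit`, `error_matrix_rates`, `bonferroni_wald_ci`, `lr_test_df1`) are hypothetical, and the sketch assumes that "corrected" means a Bonferroni correction and that each error matrix has truth in rows and predictions in columns, as printed above.

```python
import math
from statistics import NormalDist

# Hypothetical helper names; a sketch for checking arithmetic, not the course's R code.


def inv_logit(eta):
    """Turn a linear predictor (estimated log-odds) into an estimated probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def error_matrix_rates(tn, fp, fn, tp):
    """Sensitivity, specificity, and overall error rate from a 2x2 error matrix.

    Assumes truth in rows and predictions in columns, as in problems 3 and 6:
    row y = 0 holds (tn, fp) and row y = 1 holds (fn, tp).
    """
    sensitivity = tp / (tp + fn)                   # P(predict 1 | truth 1)
    specificity = tn / (tn + fp)                   # P(predict 0 | truth 0)
    error_rate = (fp + fn) / (tn + fp + fn + tp)   # share of subjects misclassified
    return sensitivity, specificity, error_rate


def bonferroni_wald_ci(estimate, std_error, alpha=0.05, g=1):
    """Wald interval estimate +/- z * SE, with z = z_{1 - alpha/(2g)}.

    Assumption: the correction for g simultaneous intervals is Bonferroni.
    """
    z = NormalDist().inv_cdf(1 - alpha / (2 * g))
    return estimate - z * std_error, estimate + z * std_error


def lr_test_df1(loglik_full, loglik_reduced):
    """Likelihood ratio test for dropping one coefficient.

    G = -2 (LL_reduced - LL_full); under H0, G follows a chi-square with 1 df,
    whose upper-tail probability is erfc(sqrt(G / 2)), so SciPy is not needed.
    """
    g_stat = max(-2.0 * (loglik_reduced - loglik_full), 0.0)  # clamp rounding noise
    p_value = math.erfc(math.sqrt(g_stat / 2.0))
    return g_stat, p_value


if __name__ == "__main__":
    # Problem 1(c): male (X3,M = 1), age 50, awareness 60; P(no shot) = 1 - pi-hat.
    eta = -1.1772 + 0.0728 * 50 - 0.0990 * 60 + 0.4340 * 1
    print("1(c) P(no flu shot):", 1 - inv_logit(eta))

    # Problem 1(d): female (X3,M = 0), age 30, awareness 50; odds = exp(eta).
    print("1(d) odds of flu shot:", math.exp(-1.1772 + 0.0728 * 30 - 0.0990 * 50))

    # Problem 2(c): age coefficient, g = 3 simultaneous intervals.
    print("2(c) corrected CI for age:", bonferroni_wald_ci(0.0728, 0.0304, 0.05, g=3))

    # Problem 3(a): counts read row by row from the error matrix.
    print("3(a) sens, spec, error:", error_matrix_rates(130, 5, 18, 6))

    # Problem 4(d): full model (X1, X2, X3) vs. reduced model (X1, X2).
    print("4(d) G, p-value:", lr_test_df1(-195.3582, -195.3644))
```

The same `error_matrix_rates` call covers problems 6(b) and 6(c) with the counts (63, 68, 50, 119) and (108, 23, 130, 39), read row by row.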