ECON4016 - FINAL EXAM

The final exam consists of 4 small projects. You can choose 2 of them to finish and send me your report. For each of the small projects you choose, you should perform data analysis using the data I provide to you and the techniques we discussed in class. For each project, you should tell me in your report what questions you were trying to answer with the analysis, what you did, and what you found. Please send your report and R script to my email (colonct@gmail.com) by 9AM 18 May, 2019. NO late submission will be considered.

1. Please download the subsample data of the Hong Kong census (2001-2016) from the following link:
   https://drive.google.com/file/d/1Md6c5J0VcV0_g_veL48i9upJNOKoLc-V/view?usp=sharing
   The zipped file contains four data files: hkcensus2001025.dta, hkcensus2006025.dta, hkcensus2011025.dta and hkcensus2016025.dta, which are the subsample data for the HK census in 2001, 2006, 2011 and 2016 respectively. You could use the following code to read these dta files into R:

       library(foreign)
       # e.g. the 2001 file; repeat for the other three files
       mydata <- read.dta("hkcensus2001025.dta")

   Select two or three variables that interest you from these datasets and demonstrate the relationship between them using visualisation. You can also extend your analysis by showing the temporal changes and/or spatial distribution of these variables. Examples of research questions are:
   1) How does gender inequality in employment change with the rising education level of women?
   2) How are poor households distributed spatially across the districts of Hong Kong, and how does this spatial pattern change over time?
   These are only examples; feel free to choose other research questions that interest you.

2. Please download the data for textual data analysis from the following link:
   https://drive.google.com/file/d/1vmqN5wsUYvAq0yzdpHud32jbB4EKndTD/view?usp=sharing
   It contains two datasets, both in .csv format:
   1) Historical news headlines from the Reddit WorldNews channel, which collects the top 25 headlines on each date based on Reddit users' votes. (RedditNews.csv contains two columns: the first column is the "date" and the second column is the "news headlines". Headlines are ranked from top to bottom by how hot they are.)
   2) The Dow Jones Industrial Average (DJIA), roughly between 2009 and 2016.
   Please use the first dataset to generate some useful indices or variables that summarise the information in the texts, and see whether these indices or variables have some predictive power for the stock price in the second dataset. (Hint: you can use either simple regression or more complex machine learning methods to test for the relationship.)

3. Please download the U.S. patent dataset for network analysis from the following link:
   https://drive.google.com/file/d/1qytpbWCkyZNYG4GGHdo-P7OxZjtTYYBV/view?usp=sharing
   It contains two datasets, both in .txt format:
   1) acite75_99.txt: all US patent citations for utility patents granted between 1975 and 1999 (the edge file)
   2) apat63_99.txt: all utility patents information (the node file)
   You can find the data documentation files Cite75_99.txt and pat63_99.txt, which contain a detailed description of all variables, inside.
   Please use these datasets to create a citation network for the U.S. patents. Try to visualise and describe the characteristics of this network, and try to extract some useful information from the analysis (e.g. which were the key innovations in this patent dataset).

4. Please download the data of real estate transactions for building a predictive model from the following link:
   https://drive.google.com/file/d/1T6e6-iy15A9OQZyjsbzWDNkiTiOlHHrW/view?usp=sharing
   The link leads to a guangzhou2017.dta file, which contains all the real estate transactions in Guangzhou in 2017. You can use the same code as in the first small project to read this file into R. Please use the apartment characteristics in this dataset to build a model for predicting house price using a tree-based or neural network method.
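
As a rough illustration of the tree-based option in project 4, the sketch below fits a regression tree with the rpart package and compares its test error with a mean-only baseline. It runs on simulated data: the column names (area, rooms, floor, price) and all values are assumptions made for illustration only and are not taken from guangzhou2017.dta. With the real file, check the available variables with names(mydata) first and adapt the formula.

```r
library(rpart)

set.seed(1)
n <- 500

# Simulated stand-in for the transaction data.
# Column names and values are hypothetical, NOT from guangzhou2017.dta.
# With the real file one would start from something like:
#   mydata <- foreign::read.dta("guangzhou2017.dta")
sim <- data.frame(
  area  = runif(n, 40, 160),
  rooms = sample(1:4, n, replace = TRUE),
  floor = sample(1:30, n, replace = TRUE)
)
sim$price <- 2 * sim$area + 15 * sim$rooms + rnorm(n, sd = 20)

# Hold out 20% of the rows to evaluate out-of-sample predictions
train_idx <- sample(n, size = 0.8 * n)
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Regression tree (method = "anova" for a continuous outcome)
fit <- rpart(price ~ area + rooms + floor, data = train, method = "anova")

# Root mean squared error of the tree on the held-out rows
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$price - pred)^2))

# Baseline: always predict the training-set mean price
baseline_rmse <- sqrt(mean((test$price - mean(train$price))^2))

cat("tree RMSE:", rmse, " baseline RMSE:", baseline_rmse, "\n")
```

Comparing against the mean-only baseline is one simple way to show in the report that the apartment characteristics carry predictive information; printing fit or calling plot(fit); text(fit) shows which splits the tree chose.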