Map-Reduce assignment writing service, Programming assignments for international students

2019-03-28  yoyaodu

COMP SCI 3306, COMP SCI 7306 Mining Big Data, Semester 1, 2019

Assignment 1: Basics and Map-Reduce

Formative, Weight (10%), Learning objectives (1, 2, 3), Abstraction (4), Design (4), Communication (4), Data (5), Programming (5)

Due date: 17:59, 30 March 2019. Weight: 15% of the course.

1 Overview

Assignments should be done in groups consisting of TWO students. If you have problems finding a group partner, use the forum to search for group partners or contact the lecturer.

2 Assignment

Exercise 1 Suspected Pairs (10 points)

Using the information from the first lecture (or Section 1.2.3 in the textbook), what would be the number of suspected pairs if the following changes were made to the data (all changes should be applied at once)?

1. The number of days of observation was raised to 5000.
2. The number of people observed was raised to 5 billion (and there were therefore 500,000 hotels).
3. We only reported a pair as suspect if they were at the same hotel at the same time on four different days.

Exercise 2 Hadoop (10+10 points)

For this exercise, you have to set up and configure your system to use Hadoop.

- Follow the instructions in the Stanford document at http://snap.stanford.edu/class/cs246-2017/homeworks/hw0/tutorialv3.pdf and set up the virtual machine as described in Section 1. Run the example program of Section 2 and carry out the different steps given in that section.
- Run your job on the file http://www.gutenberg.org/files/100/100-0.txt in standalone mode and pseudo-distributed mode and record the output.

Exercise 3 Friend Recommendation System (Stanford) (35 points)

Write a MapReduce program in Hadoop that implements a simple "People You Might Know" social network friendship recommendation algorithm. The key idea is that if two people have a lot of mutual friends, then the system should recommend that they connect with each other.
You have to run the program on the system set up in Exercise 2 in order to receive points for this exercise.

Input: Download the input file from the link: http://snap.stanford.edu/class/cs246-data/hw1q1.zip. The input file contains the adjacency list and has multiple lines in the following format:

<User><TAB><Friends>

Here, <User> is a unique integer ID corresponding to a unique user and <Friends> is a comma-separated list of unique IDs corresponding to the friends of the user with the unique ID <User>. Note that the friendships are mutual (i.e., edges are undirected): if A is a friend of B, then B is also a friend of A.

Algorithm: Let us use a simple algorithm such that, for each user U, the algorithm recommends N = 10 users who are not already friends with U but have the largest number of mutual friends in common with U.

Output: The output should contain one line per user in the following format:

<User><TAB><Recommendations>

where <User> is a unique ID corresponding to a user and <Recommendations> is a comma-separated list of unique IDs corresponding to the algorithm's recommendation of people that <User> might know, ordered by decreasing number of mutual friends. Even if a user has fewer than 10 second-degree friends, output all of them in decreasing order of the number of mutual friends. If there are recommended users with the same number of mutual friends, then output those user IDs in numerically ascending order.

Also, please provide a description of how you are going to use MapReduce jobs to solve this problem. Do not write more than 3 to 4 sentences for this: we only want a very high-level description of your strategy to tackle this problem. Note: It is possible to solve this question with a single MapReduce job. But if your solution requires multiple MapReduce jobs, then that is fine too.

For your submission:

- Include your source code.
- Include in your writeup a short paragraph describing your algorithm to tackle this problem.
- Include in your writeup the recommendations for the users with the following user IDs: 924, 8941, 8942, 9019, 9020, 9021, 9022, 9990, 9992, 9993.

Exercise 4 MapReduce (15 points)

This exercise has 4 parts. In this exercise, you will be writing and implementing two MapReduce programs. Both are a bit challenging, but they will help you gain a better understanding of the MapReduce implementation. After you write the programs, you will need to answer some questions about them. Remember that neither problem is case sensitive, so transform words to lowercase or uppercase. Also remember to use the StringTokenizer to find the correct answers.

Part 1: Write a program that processes the FirstInputFile http://www.gutenberg.org/cache/epub/100/pg100.txt and the SecondInputFile http://www.gutenberg.org/files/3399/3399.txt. This program should count the number of words with a specific number of letters in these files, for example the number of words with 4 letters, 5 letters and so on. If one word is repeated 20 times in the text, count it individually 20 times.

Part 2: Answer Questions 1-6.

Q1: How many words are there with length 10 in FirstInputFile?
Q2: How many words are there with length 4 in FirstInputFile?
Q3: What is the longest word length and what is its frequency in FirstInputFile?
Q4: How many words are there with length 2 in SecondInputFile?
Q5: How many words are there with length 5 in SecondInputFile?
Q6: What is the most frequent length and what is its frequency in SecondInputFile?

Part 3: Write a second program that again processes the FirstInputFile http://www.gutenberg.org/cache/epub/100/pg100.txt and the SecondInputFile http://www.gutenberg.org/files/3399/3399.txt. However, in addition to counting the number of words with a specific number of letters, if one word is repeated several times, count it only once. So, your output should be the frequency of words with the same length, but count a repeated word only once.
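The two counting rules in Parts 1 and 3 can be illustrated with a small in-memory sketch in plain Python. This is not a Hadoop program and not a reference solution: `run_job` is a hypothetical stand-in for a single MapReduce job, all function names are made up for illustration, and `str.split()` is assumed to be a close enough analogue of Java's `StringTokenizer` with default delimiters (both split on whitespace only, so punctuation stays attached to words).

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


def length_mapper(text):
    # Lowercase, then split on whitespace only; punctuation stays attached.
    for word in text.lower().split():
        yield len(word), 1


def word_mapper(text):
    # Emit each lowercased word as the key so duplicates meet in one reducer call.
    for word in text.lower().split():
        yield word, 1


def sum_reducer(key, values):
    return key, sum(values)


def first_reducer(key, values):
    # Collapse all occurrences of a word into a single output record.
    return key


def count_by_length(lines):
    """Part 1 rule: every occurrence of a word counts."""
    return dict(run_job(lines, length_mapper, sum_reducer))


def count_distinct_by_length(lines):
    """Part 3 rule: each distinct word counts once, using two chained jobs."""
    distinct_words = run_job(lines, word_mapper, first_reducer)
    return dict(run_job(distinct_words, length_mapper, sum_reducer))


sample = ["To be or not to be", "That is the question"]
print(count_by_length(sample))           # {2: 6, 3: 2, 4: 1, 8: 1}
print(count_distinct_by_length(sample))  # {2: 4, 3: 2, 4: 1, 8: 1}
```

In the sample, "to" and "be" each occur twice, which is why the count for length 2 drops from 6 to 4 once repeated words are collapsed.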
Note: You may need to use 2 MapReduce jobs.

Part 4: Answer Questions 7-12.

Q7: How many words are there with length 10 in FirstInputFile?
Q8: How many words are there with length 4 in FirstInputFile?
Q9: What is the most frequent length and what is its frequency in FirstInputFile?
Q10: How many words are there with length 5 in SecondInputFile?
Q11: How many words are there with length 2 in SecondInputFile?
Q12: What is the second-most frequent length and what is its frequency in SecondInputFile?

Exercise 5 Summary of 2.4 and 2.5 (10+10 points) (Postgraduate Students (COMP SCI 7306) only)

For this exercise you have to read Sections 2.3.9-2.3.11, 2.4, and 2.5 in Leskovec, Rajaraman, Ullman (second edition, 2014).

- Summarize the content of 2.4 in your own words (600 words).
- Summarize the content of 2.5 in your own words (600 words).

3 Procedure for handing in the assignment

Work should be handed in using Canvas. The submission should include:

- A PDF file of your solutions for theoretical assignments.
- All source files.
- Descriptions as required in the statement of the exercises.
- Hadoop outputs for the exercises.
- A README.txt file containing instructions to run the code and the two group members' names, student numbers, and email addresses.
- Only one submission per group.
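As an illustration of the recommendation rule stated in Exercise 3 (at most N = 10 non-friends per user, ordered by decreasing number of mutual friends, ties broken by ascending ID), here is a minimal single-pass sketch in plain Python. It is an assumption-laden toy, not a Hadoop job and not a reference solution: the function names, the `FRIEND`/`MUTUAL` markers, and the in-memory grouping are all hypothetical, and it assumes each input line is a user ID, a tab, then a comma-separated friend list.

```python
from collections import defaultdict
from itertools import combinations

FRIEND, MUTUAL = 0, 1  # hypothetical markers carried in the mapped values


def mapper(line):
    user_field, _, friends_field = line.rstrip("\n").partition("\t")
    user = int(user_field)
    friends = [int(f) for f in friends_field.split(",") if f]
    # Mark the user themselves so users without friends still get an output line.
    yield user, (user, FRIEND)
    for friend in friends:
        # Existing friendships must never be recommended.
        yield user, (friend, FRIEND)
    for a, b in combinations(friends, 2):
        # a and b share `user` as a mutual friend.
        yield a, (b, MUTUAL)
        yield b, (a, MUTUAL)


def reducer(user, values, n=10):
    mutual = defaultdict(int)
    excluded = set()
    for other, marker in values:
        if marker == FRIEND:
            excluded.add(other)
        else:
            mutual[other] += 1
    candidates = [(c, k) for c, k in mutual.items() if c not in excluded]
    # Decreasing mutual-friend count, then ascending user ID.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [c for c, _ in candidates[:n]]


def recommend(lines, n=10):
    groups = defaultdict(list)  # in-memory stand-in for the shuffle phase
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return [
        f"{user}\t{','.join(map(str, reducer(user, groups[user], n)))}"
        for user in sorted(groups)
    ]


toy_graph = ["1\t2,3", "2\t1,4,5", "3\t1,5", "4\t2", "5\t2,3"]
for row in recommend(toy_graph):
    print(row)
```

In the toy graph, user 1 shares two friends with user 5 and one with user 4, so the first output line lists 5 before 4 even though 4 is the smaller ID.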
