Hadoop - Pig

2017-12-31  Xiangyuan_Ren

Ambari

Log in over SSH, switch to root, then reset the Ambari admin password:

ssh yourname@127.0.0.1 -p 2222
su root
ambari-admin-password-reset

Pig Concept

Pig can be run in three ways:
  1. Grunt (the interactive shell)
  2. Script (a saved Pig Latin file)
  3. Ambari (from the web UI)

Example: find the oldest 5-star movie

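-- The ratings file is tab-delimited, which is PigStorage's default.
ratings = LOAD 'ml-100k/u.data' AS (userID:int, movieID:int, rating:int, ratingTime:int);

-- The movie metadata is pipe-delimited; only the first five fields are named.
metadata = LOAD 'ml-100k/u.item' USING PigStorage('|')
    AS (movieID:int, movieTitle:chararray, releaseDate:chararray, videoRelease:chararray, imdbLink:chararray);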

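-- Keep movieID so this relation can be joined on it later; u.item dates look like 01-Jan-1995.
nameLookup = FOREACH metadata GENERATE movieID, movieTitle, ToUnixTime(ToDate(releaseDate, 'dd-MMM-yyyy')) AS releaseTime;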

ratingsByMovie = GROUP ratings BY movieID;
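
To see why the next step refers to `group` and `ratings.rating`, it can help to inspect the grouped relation. A minimal sketch, assuming the statements above have already run in the same Grunt session or script:

```pig
-- Print the schema of the grouped relation.
-- GROUP produces one row per movieID: a field named `group` holding the key
-- and a bag named `ratings` holding that movie's original rating tuples.
DESCRIBE ratingsByMovie;
```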

Compute and return the result:

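-- Average rating per movie.
avgRatings = FOREACH ratingsByMovie GENERATE group AS movieID, AVG(ratings.rating) AS avgRating;
-- "5-star" here means an average rating above 4.0.
fiveStarMovies = FILTER avgRatings BY avgRating > 4.0;
-- Attach title and release time to each qualifying movie.
fiveStarsWithData = JOIN fiveStarMovies BY movieID, nameLookup BY movieID;
-- Sort ascending by release time, so the oldest movie comes first.
oldestFiveStarMovie = ORDER fiveStarsWithData BY nameLookup::releaseTime;

DUMP oldestFiveStarMovie;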
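
The DUMP above prints every qualifying movie, oldest first. If only the single oldest one is wanted, a sketch like the following could be appended; `oldestOnly` is a hypothetical alias that is not part of the original script:

```pig
-- Keep just the first row of the already-sorted relation.
-- Assumes oldestFiveStarMovie was ordered by releaseTime ascending, as above.
oldestOnly = LIMIT oldestFiveStarMovie 1;
DUMP oldestOnly;
```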
