AWS
klory · 2017-09-20
IaaS (Infrastructure as a Service, e.g. AWS, Azure, Google Cloud) is aimed at large companies whose systems involve many different kinds of servers.
This is unlike DigitalOcean and Linode (VPS, virtual private server), which are better suited to hosting WordPress or other small websites that run on a single server.
Services
- CDN (CloudFront): content delivery network, lets users access the website from the closest edge location.
- Glacier: store data that is not accessed frequently.
- Storage: store data that is accessed frequently.
- Virtual Server
- Lambda: pure compute, without worrying about the server.
- Database
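To show how one of these services is used from code, here is a minimal sketch that uploads a file to the Storage service (S3) using the AWS SDK for Java v1; the bucket name, object key, region, and file path are placeholder assumptions, not values from this article.

```java
import java.io.File;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3UploadSketch {
    public static void main(String[] args) {
        // Build a client using credentials from the default provider chain
        // (environment variables, ~/.aws/credentials, or an IAM role).
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)   // assumed region
                .build();

        // Upload a local file; "my-example-bucket" and "notes/input.txt"
        // are hypothetical names used only for illustration.
        s3.putObject("my-example-bucket", "notes/input.txt", new File("input.txt"));
    }
}
```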
Benefits
- Scalable(just spend more money)
- Total Cost of Ownership is low: running your own hardware means hiring people to deal with the servers and the supporting facilities, like power, cooling, etc.
- Highly reliable for the price point
- Centralized Billing and Management
Problems
- Lock-in
- Learning curve
- Cost adds up
Pricing
- compute
- storage
- bandwidth
- interaction
Normal File system
- Linux default disk block size = 4 KB; if a file is smaller than a block, the rest of the block is wasted (e.g. a 1 KB file still occupies a full 4 KB block, wasting 3 KB)
- GFS <-> HDFS
- MapReduce <-> Hadoop
HDFS
- Specially designed FS for storing big data with a streaming access pattern (write once, read as many times as you want)
- default block size = 64 MB; if a file is smaller than a block, the rest of the block is NOT wasted
Hadoop
daemons
- master daemons: name node, secondary name node, job tracker
- slave daemons: data node, task tracker
example - theory
- we (the client) have 200 MB of data, so we need ceil(200 MB / 64 MB) = 4 blocks (the last block holds only 8 MB)
- we need 1 name node (nn) and several data nodes (dn), e.g. 8 data nodes.
- nn creates the metadata and the daemons.
- nn passes the metadata back to the client. The client then distributes the blocks to the data nodes and makes replicas based on the info from the name node.
- the data nodes send heartbeats back to the nn to signal that they are alive.
- the client sends the code (the job) to the data nodes
- the job tracker tells the task trackers to do their jobs
- after the map tasks are finished, the job tracker will assign a reducer.
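The client/name-node/data-node interaction above is hidden behind the HDFS client API. Below is a minimal sketch, assuming Hadoop's Java FileSystem API, of reading a file: opening it asks the name node for block metadata, and the stream then pulls the blocks from the data nodes. The path is a placeholder.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        // fs.defaultFS in the cluster configuration points at the name node.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Opening a file asks the name node for the block metadata;
        // the returned stream then reads the blocks from the data nodes.
        Path path = new Path("/user/class/input.txt");   // placeholder path
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```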
example - real world
- split the data (documents) into input splits and pass them to RecordReaders, which send them to the mappers (the default for text jobs is to split a document into lines and send the lines to the mappers).
- then shuffle the data so that pairs with the same key end up together; the default shuffle (sort) in Hadoop is alphabetical.
- then reduce (each reducer reduces one key)
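To make the map/shuffle/reduce flow concrete, here is a minimal word-count mapper and reducer sketch using the standard Hadoop MapReduce API; the class names are illustrative assumptions, chosen to match the WordCount job run in the instructions below.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: the RecordReader hands each line to map(); we emit (word, 1) pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // one pair per word occurrence
                }
            }
        }
    }

    // Reducer: after the shuffle, all counts for one word arrive together.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);         // (word, total count)
        }
    }
}
```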
HDFS instructions
- step 1: basic commands
hdfs dfs -ls /
hdfs dfs -mkdir
hdfs dfs -put
hdfs dfs -get
- step 2: move the file to HDFS
hdfs dfs -put input.txt /user/class/
- step 3: compile
javac -cp $HADOOP_core.jar *.java
- step 4: package the classes into a jar
jar cvf test.jar *.class
- step 5: run the job
hadoop jar wordcount.jar ...WordCount
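Step 5 runs a driver class from the jar. A minimal sketch of such a driver is below, assuming the TokenizerMapper and IntSumReducer classes sketched earlier; the driver class name and the input/output paths in the comment are placeholders, not the exact values elided in step 5.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // Mapper and reducer sketched in the earlier example.
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // HDFS paths passed on the command line, e.g. (hypothetical):
        //   hadoop jar wordcount.jar WordCountDriver /user/class/input.txt /user/class/output
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```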
Setup
Set up your AWS account by following the steps below:
- Go to AWS (https://aws.amazon.com/) and create an account. You need to enter your credit card info.
- You can find your AWS account number in your AWS profile. Use that account number to apply for AWS Educate credits at https://aws.amazon.com/education/awseducate/apply/. It will take a few hours before you receive an email confirming that your credits are active.
If you have not received your AWS Educate credits and are not using free-tier services, your credit card will be charged for usage, and you will be responsible for any costs incurred.