Big Data

2016-02-22  虎耳

[TOC]

- [BigData](#bigdata)
    - [Glossary](#glossary)
    - [Hadoop Ecosystem](#hadoop-ecosystem)

BigData

Glossary

HDFS: the storage layer. A distributed, scalable, Java-based file system for large volumes of unstructured data.

MapReduce: the compute layer. A software framework in which each job is expressed as a Map function and a Reduce function.
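As a rough illustration of how a job splits into a Map function and a Reduce function, here is a minimal single-process word count. It is only a sketch of the programming model, not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the in-memory grouping stands in for the shuffle that a real cluster performs between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map function: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce function: combine all values emitted for a single key."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)  # shuffle step: collect values per key
    return dict(reduce_fn(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(run_job(["big data on Hadoop", "Hadoop stores big files"]))
```

On a real cluster the map calls run in parallel over blocks of input, and the framework handles the grouping and the distribution of keys to reducers.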

Hive: a Hadoop-based data warehousing framework. HiveQL is its SQL-like language;
queries are converted to MapReduce jobs that run against data in Hadoop.

Pig: a Hadoop-based language for building data pipelines.

HBase: a non-relational, column-oriented database; an open source implementation of Google's BigTable.
It supports lookups in Hadoop and adds transaction capability on top of Hadoop.

Flume: a framework for populating Hadoop with data.
It can be used to collect logs: agent (file, syslog), then collector, then storage (file, HDFS).
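The agent, collector, storage flow above can be pictured with a toy pipeline. This is a sketch under stated assumptions, not the real tool's API or configuration: `agent`, `collector`, and `store` are hypothetical names, and a plain Python list stands in for the file or HDFS sink.

```python
from typing import Dict, Iterable, Iterator, List


def agent(source_lines: Iterable[str], host: str) -> Iterator[Dict[str, str]]:
    """Agent: turn each raw line from a source (file, syslog) into an event."""
    for line in source_lines:
        yield {"host": host, "body": line.rstrip("\n")}


def collector(events: Iterable[Dict[str, str]],
              batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Collector: gather events coming from agents into fixed-size batches."""
    batch: List[Dict[str, str]] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


def store(batches: Iterable[List[Dict[str, str]]], sink: List[str]) -> None:
    """Storage: append every event to the sink (stand-in for a file or HDFS)."""
    for batch in batches:
        for event in batch:
            sink.append(f"{event['host']} {event['body']}")


if __name__ == "__main__":
    sink: List[str] = []
    store(collector(agent(["GET /\n", "GET /about\n"], "web01"), batch_size=2), sink)
    print(sink)
```

Batching in the collector reflects the usual reason for the middle tier: many agents produce small events, and the storage layer is better served by fewer, larger writes.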

Oozie: a workflow processing system that supports multiple languages. Similar to Aether.

Ambari: a web-based tool to deploy, manage, and monitor Hadoop clusters.

Avro: an RPC and data serialization framework.
There is no need to run code generation when the schema changes.
Similar to Thrift and Protocol Buffers.

Mahout: a data mining library that implements modelling algorithms using the MapReduce model.

Sqoop: a connectivity tool that moves data from non-Hadoop data stores into Hadoop.

ZooKeeper: provides a distributed configuration service, a synchronization service, and
a naming registry.

Mesos: abstracts compute resources (CPU, memory, storage) away from machines (physical or virtual).

Hadoop Ecosystem

[Figure: Hadoop Ecosystem]