Google Cloud Data Engineer Exam - Bigtable Review Notes

2018-07-21  塞小娜

Bigtable Summary

What is it?
-> more expensive because you pay for the number of nodes that you are using
-> with 10 nodes, roughly 100,000 queries per second at about 6 ms latency
-> low latency
-> high throughput -> fast
-> structured data
-> NOT transactional
-> NOT SQL
-> global availability
-> durable, replicated, and accessible


Serverless?
No

Benefits

What is it good for?
Storing time-series data in Cloud Bigtable is a natural fit

How to use?

cbt

HBase shell

Indexing
-> rows are indexed only by row key; no other column can be indexed
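
To illustrate what row-key-only indexing means in practice, here is a minimal pure-Python sketch (it is not the Bigtable client API). It models a table as rows kept sorted by key: a lookup by key prefix can jump straight to the right place, while a filter on any other column has to examine every row. The table contents and the names prefix_scan and filter_by_column are made up for illustration.

```python
from bisect import bisect_left

# Toy table: (row_key, columns) pairs kept sorted by row key,
# mimicking how rows are stored in lexicographic key order.
ROWS = sorted([
    ("sensor-a#001", {"temp": 21}),
    ("sensor-a#002", {"temp": 35}),
    ("sensor-b#001", {"temp": 35}),
    ("sensor-c#001", {"temp": 19}),
])
KEYS = [key for key, _ in ROWS]


def prefix_scan(prefix):
    """Find rows by key prefix; returns (matching keys, rows read)."""
    i = bisect_left(KEYS, prefix)  # jump directly to the first candidate
    matches = []
    while i < len(KEYS) and KEYS[i].startswith(prefix):
        matches.append(KEYS[i])
        i += 1
    return matches, len(matches)


def filter_by_column(column, value):
    """Find rows by a non-key column; every row has to be examined."""
    matches = []
    examined = 0
    for key, columns in ROWS:
        examined += 1
        if columns.get(column) == value:
            matches.append(key)
    return matches, examined


by_key = prefix_scan("sensor-a#")
by_column = filter_by_column("temp", 35)
```

Both queries return two rows here, but the key lookup reads only those two rows while the column filter reads the whole table.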

Design
As a summary:

Get a balance between:
Distribute the read load across tablets (you don't want all reads to hit a single tablet)
AND
Distribute the write load across tablets (you don't want all writes to hit a single tablet)
AND
Design a row key to allow common queries to return consecutive rows
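
The last point, making common queries return consecutive rows, might be sketched as follows. This is a pure-Python illustration under assumptions of my own: the device IDs, the "#" separator, the zero-padded epoch timestamp, and the helper names make_row_key and range_scan are all hypothetical, not something Bigtable prescribes.

```python
from bisect import bisect_left


def make_row_key(device_id, epoch_seconds):
    # Zero-pad the timestamp so that string order matches numeric order.
    return f"{device_id}#{epoch_seconds:010d}"


# Rows sorted by key, as they would be stored.
keys = sorted(
    make_row_key(device, ts)
    for device in ("dev-1", "dev-2", "dev-3")
    for ts in (1530000000, 1530000060, 1530000120)
)


def range_scan(sorted_keys, start_key, end_key):
    """Return the keys in [start_key, end_key) as one contiguous slice."""
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, end_key)
    return sorted_keys[lo:hi]


# "Readings for dev-2 from 1530000000 up to, not including, 1530000120"
result = range_scan(
    keys,
    make_row_key("dev-2", 1530000000),
    make_row_key("dev-2", 1530000120),
)
```

Because the device ID comes first and the timestamp second, all readings for one device sit next to each other, so the query is a single contiguous range rather than a scatter of individual lookups.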

First, check whether what you want to query is in the row key

Then, check whether the key contains any of the following, to avoid hotspotting

Avoid using a row key that is a domain or starts with a domain (a domain can still be part of the key, though)

-> because certain domains are far more active than others

-> the tablets corresponding to those busy customers will cause hotspotting

Avoid using user ID as the row key if user IDs are assigned sequentially

-> it is OK if user IDs are assigned randomly, e.g. by a hash code

-> because in many applications, newer users are more active than users that were created 6-7 years ago

-> so if user IDs are assigned in sequential order, the tablets that correspond to new users will tend to be more active -> hotspotting
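
One way to get the "randomly assigned" effect from sequential IDs is to put a hash prefix in front of them. The sketch below is an assumption-laden illustration: the choice of MD5, the 8-character prefix, the "#" separator, and the name hashed_user_key are all mine, not a documented Bigtable convention.

```python
import hashlib


def hashed_user_key(user_id):
    """Prefix a sequential user ID with part of its hash to scatter keys."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    # Keep the original ID in the key so the row is still identifiable.
    return f"{digest[:8]}#{user_id}"


# The 100 newest users: sequential IDs all share the prefix "10".
newest_users = range(1000, 1100)
sequential_keys = [str(user_id) for user_id in newest_users]
hashed_keys = [hashed_user_key(user_id) for user_id in newest_users]

# Count distinct leading characters as a rough measure of key spread.
sequential_spread = len({key[0] for key in sequential_keys})
hashed_spread = len({key[0] for key in hashed_keys})
```

The trade-off is that hashed keys no longer sort by user ID, so a range scan over consecutive user IDs stops being possible; single-user lookups still work because the key can be recomputed from the ID.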

Avoid using a static identifier as the row key, especially one that keeps getting updated

-> if the row key is something like memory usage, CPU usage, or disk usage and you keep updating those same rows over and over again, the nodes that serve this constantly updated data will get overworked

Avoid starting the row key with a date or timestamp: most writes carry the latest date, so they land on the same tablets -> hotspotting
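
The effect of a leading date can be shown with a toy model. In this sketch a "tablet" is just a fixed key range delimited by hand-picked split points, which is a simplification made for illustration; the device names, the split points, and the helper tablet_for are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def tablet_for(key, split_points):
    """Index of the key range (toy 'tablet') that a row key falls into."""
    return bisect_right(split_points, key)


devices = [f"dev-{n:02d}" for n in range(1, 10)]
today = "2018-07-21"

# Date first: every write made today sorts after the last split point.
date_first = Counter(
    tablet_for(f"{today}#{device}", ["2018-07-08", "2018-07-15"])
    for device in devices
)

# Device first: today's writes are spread over all three key ranges.
device_first = Counter(
    tablet_for(f"{device}#{today}", ["dev-04", "dev-07"])
    for device in devices
)
```

With the date in front, all nine of today's writes go to the last range; with the device ID in front, each of the three ranges receives three writes.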
