Python happybase: using the Thrift API and Fi

2019-05-13  EAST4021

1 Background

HappyBase is a developer-friendly Python library for interacting with Apache HBase. It gives application developers a Pythonic API for working with HBase. These APIs include:

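See the HappyBase documentation for details.
The happybase scan API also exposes the HBase Thrift filter interface for queries, but it does not document the filter syntax in detail, and I could not find a thorough description anywhere else online.
So I went through the HBase documentation and translated its Thrift API and Filter Language section. Section 2.1 below introduces the happybase scan interface, and section 2.2 contains the translated HBase material.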

2 Running complex HBase queries with filters

2.1 The happybase scan interface

scan(row_start=None, row_stop=None, row_prefix=None, columns=None, filter=None, timestamp=None, include_timestamp=False, batch_size=1000, scan_batching=None, limit=None, sorted_columns=False, reverse=False)
The filter parameter carries the HBase filter query. Here is a simple example:

import happybase

hbase_host = ''  # set this to your HBase Thrift server host
hbase_port = 9090
# Connect to the HBase Thrift server
conn = happybase.Connection(host=hbase_host, port=hbase_port)
table = conn.table('test')
# Filter: test the value of info:item_delivery_status against '1'
scan_filter = "SingleColumnValueFilter('info', 'item_delivery_status', =, 'binary:1', true, true)"
# Run the scan
result = table.scan(filter=scan_filter)
# Print the results
for row_key, item in result:
    print(row_key)
    print(item)

2.2 Filter syntax

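The text in this section is translated from the HBase documentation (Thrift API and Filter Language). The code is my own; before running it, replace the host, table name, column names, and so on with your own values.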

2.2.1 Basic query syntax

"FilterName (argument, argument,... , argument)"

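Syntax guidelines:

  1. Specify the filter name, followed by a comma-separated argument list in parentheses.
  2. An argument that represents a string must be enclosed in single quotes (').
  3. An argument that represents a boolean, an integer, or a comparison operator (such as <, >, or !=) must not be quoted.
  4. The filter name must be a single word. Any ASCII character is allowed except whitespace, single quotes, and parentheses.
  5. Filter arguments can contain any ASCII character. A single quote inside an argument must be escaped with an additional preceding single quote.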
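The quoting rules of the filter language can be applied mechanically. The sketch below is a hypothetical helper, not part of happybase: the names Raw, format_arg, and build_filter are invented here. It assumes that string arguments are wrapped in single quotes with embedded single quotes doubled, and that booleans, integers, and comparison operators are written without quotes.

```python
class Raw(str):
    """Marks an argument that must be emitted unquoted, e.g. a comparison operator."""


def format_arg(arg):
    """Render one Python value as a filter-language argument."""
    if isinstance(arg, Raw):
        return str(arg)
    if isinstance(arg, bool):  # check bool before int: bool is a subclass of int
        return "true" if arg else "false"
    if isinstance(arg, int):
        return str(arg)
    if isinstance(arg, str):
        # Escape an embedded single quote by doubling it.
        return "'" + arg.replace("'", "''") + "'"
    raise TypeError("unsupported filter argument: {!r}".format(arg))


def build_filter(name, *args):
    """Build a string of the form FilterName(argument, argument, ...)."""
    return "{}({})".format(name, ", ".join(format_arg(a) for a in args))


scan_filter = build_filter(
    "SingleColumnValueFilter",
    "info", "item_delivery_status", Raw("="), "binary:1", True, True,
)
print(scan_filter)
```

The resulting string can be passed as the filter argument of table.scan(). Wrapping operators in Raw keeps them distinguishable from ordinary strings, which would otherwise be quoted.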

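2.2.2 Compound filters and logical operators

Binary operators

  1. AND: the key-value must satisfy both filters.
  2. OR: the key-value must satisfy at least one of the filters.

Unary operators

  1. SKIP: for a given row, if any key-value fails the filter condition, the entire row is skipped.
  2. WHILE: for a given row, key-values are emitted until one is reached that fails the filter condition.

Example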

(Filter1 AND Filter2) OR (Filter3 AND Filter4)

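Operator precedence

Parentheses have the highest precedence. The unary operators SKIP and WHILE come next and share the same precedence. The binary operators follow, with AND binding tighter than OR.

Example 1

Filter1 AND Filter2 OR Filter3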
is evaluated as
(Filter1 AND Filter2) OR Filter3

Example 2

Filter1 AND SKIP Filter2 OR Filter3
is evaluated as
(Filter1 AND (SKIP Filter2)) OR Filter3
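When filter strings are assembled in Python, one way to avoid depending on operator precedence is to parenthesize every group explicitly. The helpers below are a minimal sketch; all_of and any_of are hypothetical names, not a happybase API. The sketch assumes that the HBase filter parser accepts redundant parentheses around a whole expression.

```python
def _combine(operator, filters):
    """Join filter strings with a binary operator, wrapped in parentheses."""
    if not filters:
        raise ValueError("at least one filter is required")
    if len(filters) == 1:
        return filters[0]
    return "(" + " {} ".format(operator).join(filters) + ")"


def all_of(*filters):
    """Every filter must match (AND)."""
    return _combine("AND", filters)


def any_of(*filters):
    """At least one filter must match (OR)."""
    return _combine("OR", filters)


combined = any_of(
    all_of("PrefixFilter('0047a')", "KeyOnlyFilter()"),
    "PageFilter(5)",
)
print(combined)
```

Because every group carries its own parentheses, the grouping in the string is the grouping written in the Python code, whatever the precedence rules are.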

2.2.3 Comparison operators

Use the following symbols as comparison operators: <, <=, =, !=, >, >=

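2.2.4 Comparators

The comparator syntax is: ComparatorType:ComparatorValue

The ComparatorType values map to comparators as follows:

  1. BinaryComparator: binary
  2. BinaryPrefixComparator: binaryprefix
  3. RegexStringComparator: regexstring
  4. SubStringComparator: substring

Examples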

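  1. binary:abc compares values lexicographically against abc; with the > operator, for example, it matches everything lexicographically greater than abc.
  2. binaryprefix:abc matches everything whose first three characters are lexicographically equal to abc.
  3. regexstring:ab*yz matches values against the regular expression ab*yz. The HBase documentation describes this as everything that does not begin with ab and ends with yz, but read as a regular expression the pattern means an a, then zero or more b, then yz.
  4. substring:abc123 matches everything that contains the substring abc123.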
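To get a feel for the four comparator types, the following is a local Python model of what each one treats as a match under the = operator. It is only a sketch and does not talk to HBase: comparator_matches is a hypothetical name, it works on Python strings rather than bytes, and it assumes (as I understand HBase's behavior) that regexstring matches anywhere inside the value and that substring ignores case.

```python
import re


def comparator_matches(comparator, value):
    """Local model: does value match 'type:expected' under the = operator?"""
    kind, _, expected = comparator.partition(":")
    if kind == "binary":
        return value == expected
    if kind == "binaryprefix":
        # Only the first len(expected) characters are compared.
        return value.startswith(expected)
    if kind == "regexstring":
        # Assumption: search-style matching, not a full-string match.
        return re.search(expected, value) is not None
    if kind == "substring":
        # Assumption: case-insensitive containment test.
        return expected.lower() in value.lower()
    raise ValueError("unknown comparator type: {!r}".format(kind))


for comparator, value in [
    ("binary:abc", "abc"),
    ("binaryprefix:abc", "abcdef"),
    ("regexstring:ab*yz", "abbbyz"),
    ("substring:abc123", "xxABC123yy"),
]:
    print(comparator, value, comparator_matches(comparator, value))
```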

2.2.5 Filters

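KeyOnlyFilter

From the HBase documentation: This filter doesn't take any arguments. It returns only the key component of each key-value.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "KeyOnlyFilter()"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple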

FirstKeyOnlyFilter

From the HBase documentation: This filter doesn't take any arguments. It returns only the first key-value from each row.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "FirstKeyOnlyFilter()"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

PrefixFilter

From the HBase documentation: This filter takes one argument – a prefix of a row key. It returns only those key-values present in a row that starts with the specified row prefix.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "PrefixFilter('0047a')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

ColumnPrefixFilter

From the HBase documentation: This filter takes one argument – a column prefix. It returns only those key-values present in a column that starts with the specified column prefix. The column prefix must be of the form: "qualifier".

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "ColumnPrefixFilter('box')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

MultipleColumnPrefixFilter

From the HBase documentation: This filter takes a list of column prefixes. It returns key-values that are present in a column that starts with any of the specified column prefixes. Each of the column prefixes must be of the form: "qualifier".

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "MultipleColumnPrefixFilter('box', 'create')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

ColumnCountGetFilter

From the HBase documentation: This filter takes one argument – a limit. It returns the first limit number of columns in the table.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "ColumnCountGetFilter(6)"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

PageFilter

From the HBase documentation: This filter takes one argument – a page size. It returns page size number of rows from the table.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "PageFilter(5)"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple
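PageFilter on its own returns a single page. A common way to walk a whole table page by page is to restart the scan just after the last row key seen. The generator below is a minimal sketch of that idea; iter_pages is a hypothetical name, not a happybase API. It assumes row keys come back as bytes, and it also passes limit because, as I understand the HBase documentation, PageFilter is applied separately on each region server and may therefore return more rows than the page size.

```python
def iter_pages(table, page_size):
    """Yield lists of (row_key, data) tuples with at most page_size rows each."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    start = None
    while True:
        rows = list(table.scan(
            row_start=start,
            filter="PageFilter({})".format(page_size),
            limit=page_size,  # client-side cap on top of the server-side filter
        ))
        if not rows:
            return
        yield rows
        if len(rows) < page_size:
            return  # a short page means the end of the table was reached
        # Smallest possible key strictly greater than the last key returned.
        start = rows[-1][0] + b"\x00"
```

With a real connection this could be used as: for page in iter_pages(conn.table('test'), 5): print(len(page)).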

ColumnPaginationFilter

From the HBase documentation: This filter takes two arguments – a limit and offset. It returns limit number of columns after offset number of columns. It does this for all the rows.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "ColumnPaginationFilter(3, 7)"  # limit 3, offset 7
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

InclusiveStopFilter

From the HBase documentation: This filter takes one argument – a row key on which to stop scanning. It returns all key-values present in rows up to and including the specified row.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "InclusiveStopFilter('005c2_4530489164_10599261608')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

TimestampsFilter

From the HBase documentation: This filter takes a list of timestamps. It returns those key-values whose timestamps matches any of the specified timestamps.
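The filter that takes a list of timestamps is TimestampsFilter in the HBase filter language. The sketch below shows how it might be used from happybase. The helper names timestamps_filter and scan_by_timestamps, the host and table arguments, and the example timestamp values are all assumptions made for illustration. happybase is imported inside the function so that the string-building part can be tried without an HBase server.

```python
def timestamps_filter(*timestamps):
    """Build a TimestampsFilter string from one or more integer timestamps."""
    if not timestamps:
        raise ValueError("at least one timestamp is required")
    return "TimestampsFilter({})".format(
        ", ".join(str(int(ts)) for ts in timestamps)
    )


def scan_by_timestamps(host, table_name, *timestamps):
    """Scan table_name and return cells written at exactly these timestamps."""
    import happybase  # imported lazily; only needed when actually scanning

    conn = happybase.Connection(host=host)
    try:
        table = conn.table(table_name)
        # include_timestamp=True makes each value a (value, timestamp) pair.
        return list(table.scan(filter=timestamps_filter(*timestamps),
                               include_timestamp=True))
    finally:
        conn.close()


# Hypothetical millisecond timestamps, for illustration only.
print(timestamps_filter(1557705600000, 1557792000000))
```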

RowFilter

From the HBase documentation: This filter takes a compare operator and a comparator. It compares each row key with the comparator using the compare operator and if the comparison returns true, it returns all the key-values in that row.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "RowFilter(=, 'binary:0047a_4530641731_102627717')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

FamilyFilter

From the HBase documentation: This filter takes a compare operator and a comparator. It compares each column family name with the comparator using the compare operator and if the comparison returns true, it returns all the Cells in that column family.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "FamilyFilter(=, 'binary:info')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

QualifierFilter

From the HBase documentation: This filter takes a compare operator and a comparator. It compares each qualifier name with the comparator using the compare operator and if the comparison returns true, it returns all the key-values in that column.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "QualifierFilter(=, 'binary:item_delivery_status')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

ValueFilter

From the HBase documentation: This filter takes a compare operator and a comparator. It compares each value with the comparator using the compare operator and if the comparison returns true, it returns that key-value.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "ValueFilter(=, 'binary:2')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

DependentColumnFilter

From the HBase documentation: This filter takes two arguments – a family and a qualifier. It tries to locate this column in each row and returns all key-values in that row that have the same timestamp. If the row doesn't contain the specified column – none of the key-values in that row will be returned.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
scan_filter = "DependentColumnFilter('info', 'store_code')"
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

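SingleColumnValueFilter

From the HBase documentation: This filter takes a column family, a qualifier, a compare operator and a comparator. If the specified column is not found – all the columns of that row will be emitted. If the column is found and the comparison with the comparator returns true, all the columns of the row will be emitted. If the condition fails, the row will not be emitted.

Note ⚠️: this filter actually accepts two more arguments, <filterIfColumnMissing_boolean> and <latest_version_boolean>. The first controls whether rows that lack the column are filtered out, and the second controls whether only the latest version is checked.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
# Adjacent string literals are joined, so the filter stays one string.
scan_filter = (
    "SingleColumnValueFilter("
    "'info', 'item_delivery_status', =, 'binary:2', true, true)"
)
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple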
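The interaction between a missing column and the filterIfColumnMissing flag is easy to get wrong, so here is a small local model of the decision SingleColumnValueFilter makes per row. It is only a sketch under simplifying assumptions: row_is_emitted is a hypothetical name, rows are plain dicts, only the = operator with a binary comparator is modeled, and versions are ignored.

```python
def row_is_emitted(row, column, expected, filter_if_missing=False):
    """Local model of SingleColumnValueFilter with '=' and a binary comparator."""
    if column not in row:
        # Column missing: the row is emitted unless filter_if_missing is set.
        return not filter_if_missing
    # Column present: the row is emitted only if the comparison succeeds.
    return row[column] == expected


column = b"info:item_delivery_status"
rows = {
    b"row1": {column: b"2"},
    b"row2": {column: b"1"},
    b"row3": {b"info:store_code": b"A01"},  # tested column is missing
}
for key, data in sorted(rows.items()):
    print(key,
          row_is_emitted(data, column, b"2"),
          row_is_emitted(data, column, b"2", filter_if_missing=True))
```

In this model, row3 is emitted with the default flag and dropped once filter_if_missing is true, which is why the scan examples in this article pass true as the fifth argument.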

SingleColumnValueExcludeFilter

From the HBase documentation: This filter takes the same arguments and behaves same as SingleColumnValueFilter – however, if the column is found and the condition passes, all the columns of the row will be emitted except for the tested column value.

import happybase

TEST_HBASE_HOST = ''  # placeholder: set to your HBase Thrift server host
conn = happybase.Connection(host=TEST_HBASE_HOST)
table = conn.table('openapi:openapi_suning_purchase_order')
# Adjacent string literals are joined, so the filter stays one string.
scan_filter = (
    "SingleColumnValueExcludeFilter("
    "'info', 'item_delivery_status', =, 'binary:2')"
)
result = table.scan(filter=scan_filter)
for item in result:
    print(item)  # each item is a (row_key, data) tuple

ColumnRangeFilter

From the HBase documentation: This filter is used for selecting only those keys with columns that are between minColumn and maxColumn. It also takes two boolean variables to indicate whether to include the minColumn and maxColumn or not.
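The filter described here is ColumnRangeFilter, and its four arguments are the minimum column, whether the minimum is inclusive, the maximum column, and whether the maximum is inclusive. The sketch below builds such a filter string; column_range_filter is a hypothetical helper name, and the column names box and create are reused from the prefix examples purely for illustration.

```python
def column_range_filter(min_column, min_inclusive, max_column, max_inclusive):
    """Build a ColumnRangeFilter string for qualifiers between two bounds."""
    def quote(text):
        # String arguments go in single quotes; embedded quotes are doubled.
        return "'" + text.replace("'", "''") + "'"

    def boolean(flag):
        # Booleans are written unquoted and in lowercase.
        return "true" if flag else "false"

    return "ColumnRangeFilter({}, {}, {}, {})".format(
        quote(min_column), boolean(min_inclusive),
        quote(max_column), boolean(max_inclusive),
    )


scan_filter = column_range_filter("box", True, "create", False)
print(scan_filter)
# With an open happybase table: result = table.scan(filter=scan_filter)
```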

3 References

1. happybase documentation
2. HBase documentation: Thrift API and Filter Language
