Hive and HBase Integration
Here we start using HBase for real-time queries, but analysis jobs are still handled by Hive, and the results Hive computes are loaded into HBase.
Hive ships with a few jars (the HBase storage handler) that let us:
- create tables shared with HBase, where both the table and its data exist on both sides
- map existing HBase tables into Hive (a sketch of this follows the list)
- load Hive query results directly into HBase
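As a minimal sketch of the second point, an existing HBase table can be exposed to Hive with CREATE EXTERNAL TABLE; the table name existing_hbase_table and the column family cf below are hypothetical, used only to show the pattern:
-- map an existing HBase table into Hive without taking ownership of it
-- (table name 'existing_hbase_table' and family 'cf' are hypothetical)
CREATE EXTERNAL TABLE hbase_existing(key string, val string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "existing_hbase_table");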
Starting Hive
Start Hive with the following command; the key points are pointing --auxpath at the handler jars and passing the ZooKeeper quorum used by HBase:
bin/hive --auxpath /opt/CDH/hive/lib/hive-hbase-handler-0.10.0-cdh4.3.2.jar,/opt/CDH/hive/lib/hbase-0.94.6-cdh4.3.2.jar,/opt/CDH/hive/lib/zookeeper-3.4.5-cdh4.3.2.jar,/opt/CDH/hive/lib/guava-11.0.2.jar -hiveconf hbase.zookeeper.quorum=192.168.253.119,192.168.253.130
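If you would rather not pass these flags on every start, the same settings can also live in hive-site.xml. This is only a sketch that mirrors the command above; the file:// prefixes and paths are assumptions based on the paths used there:
<!-- sketch of equivalent hive-site.xml entries; paths mirror the command above -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/CDH/hive/lib/hive-hbase-handler-0.10.0-cdh4.3.2.jar,file:///opt/CDH/hive/lib/hbase-0.94.6-cdh4.3.2.jar,file:///opt/CDH/hive/lib/zookeeper-3.4.5-cdh4.3.2.jar,file:///opt/CDH/hive/lib/guava-11.0.2.jar</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>192.168.253.119,192.168.253.130</value>
</property>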
Test table
First, create a test table in Hive:
-- create a temporary Hive table
CREATE TABLE pokes (foo INT, bar STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
-- test.txt data format (tab-separated): 1 hello
-- load the data file into the Hive table
LOAD DATA INPATH '/user/mapred/test.txt' OVERWRITE INTO TABLE pokes;
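A quick sanity check that the load worked (this query is not from the original run, just a minimal verification):
-- confirm the row is visible in Hive before copying it to HBase
SELECT * FROM pokes WHERE foo = 1;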
Creating the Hive-HBase table
When creating the table in Hive, specify the mapping to the corresponding HBase table; by default the HBase table gets the same name as the Hive table.
-- create a table shared with HBase
hive> CREATE TABLE hbase_hive_table(key int, value string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val");
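The hbase.columns.mapping string pairs Hive columns with HBase columns in declaration order: :key binds the first Hive column to the HBase row key, and cf1:val binds the second to qualifier val in column family cf1. A wider mapping follows the same pattern; the sketch below uses hypothetical column names:
-- sketch: three Hive columns mapped to the row key plus two qualifiers in cf1
-- (uid, name, age are hypothetical names, not part of the original example)
CREATE TABLE hbase_user(uid int, name string, age int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name,cf1:age");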
Switch to the hbase shell and check that the table exists:
hbase(main):007:0> describe 'hbase_hive_table'
DESCRIPTION ENABLED
{NAME => 'hbase_hive_table', FAMILIES => [{NAME => 'cf1', DATA_BL true
OCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE
=> '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '
0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE
=> '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOC
KCACHE => 'true'}]}
1 row(s) in 0.0800 seconds
Write test
-- insert test
hive> INSERT OVERWRITE TABLE hbase_hive_table SELECT * FROM pokes WHERE foo=1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201407241659_0007, Tracking URL = https://centos149:50030/jobdetails.jsp?jobid=job_201407241659_0007
Kill Command = /opt/CDH/hadoop/share/hadoop/mapreduce1/bin/hadoop job -kill job_201407241659_0007
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-08-07 16:15:14,505 Stage-0 map = 0%, reduce = 0%
2014-08-07 16:15:20,010 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.46 sec
2014-08-07 16:15:21,087 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.46 sec
2014-08-07 16:15:22,190 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.46 sec
2014-08-07 16:15:23,200 Stage-0 map = 100%, reduce = 100%, Cumulative CPU 2.46 sec
MapReduce Total cumulative CPU time: 2 seconds 460 msec
Ended Job = job_201407241659_0007
1 Rows loaded to hbase_hive_table
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.46 sec HDFS Read: 196 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 460 msec
OK
Time taken: 34.594 seconds
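The shared table can also be read back from the Hive side; a simple check (not part of the original run) would be:
-- the row written above should come back through the storage handler
hive> SELECT * FROM hbase_hive_table;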
Switch back to the hbase shell and check that the row has been written:
hbase(main):005:0> scan 'hbase_hive_table'
ROW COLUMN+CELL
1 column=cf1:val, timestamp=1407399353262, value=hello
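Because the table is shared, rows written directly through the hbase shell show up in Hive as well. For example, a hypothetical second row could be added like this and would then appear in a Hive SELECT on hbase_hive_table:
hbase(main):006:0> put 'hbase_hive_table', '2', 'cf1:val', 'world'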
If you want to speed up writes into the HBase table, you can add the following setting to turn off the WAL (write-ahead log):
-- HBase writes can be slow because of the WAL, so turn it off
set hive.hbase.wal.enabled=false;
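Note that with the WAL disabled, rows written by the job can be lost if a region server fails before the data is flushed, so this is best reserved for jobs that can simply be rerun.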
Reference
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration