这篇文章主要讲解了“Sqoop2从Mysql导入Hdfs的过程”,文中的讲解内容简单清晰,易于学习与理解,下面请大家跟着小编的思路慢慢深入,一起来研究和学习“Sqoop2从Mysql导入Hdfs的过程”吧!
一、环境搭建
1.Hadoop
http://my.oschina.net/u/204498/blog/519789
2.Sqoop2.x
http://my.oschina.net/u/204498/blog/518941
3. mysql
二、从mysql导入hdfs
1.创建mysql数据库、表、以及测试数据
xxxxxxxx$ mysql -uroot -p
Enter password:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
4 rows in set (0.00 sec)
test => 是新建的数据库
mysql> use test;
mysql> show tables;
+----------------------+
| Tables_in_test |
+----------------------+ |
| test |
+----------------------+
1 rows in set (0.00 sec)
test => 是新增的表
mysql> desc test;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(45) | YES | | NULL | |
| age | int(11) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> select * from test;
+----+------+------+
| id | name | age |
+----+------+------+
| 7 | a | 1 |
| 8 | b | 2 |
| 9 | c | 3 |
+----+------+------+
3 rows in set (0.00 sec)
2. 为各个用户授权
注意:sqoop提交job后,各个节点在map阶段会访问数据库,所以需事先授权
mysql> grant [all | select | ...] on {db}.{table} to {user}@{host} identified by {passwd};
mysql> flush privileges;
#我给特定的hostname授权 username:root passwd:root 访问db:test 中任意table,权限是all
mysql> grant all on test.* to 'root'@{host} identified by 'root';
3.启动sqoop2-server
[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ pwd
/home/hadoop/sqoop-1.99.6-bin-hadoop200
[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ ./bin/sqoop2-server start
...
...
webui可以访问校验,也可以查看log
4.启动sqoop2-shell
[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ pwd
/home/hadoop/sqoop-1.99.6-bin-hadoop200
[hadoop@hftclclw0001 sqoop-1.99.6-bin-hadoop200]$ ./bin/sqoop2-shell
...
...
sqoop:000> show version
...
...
sqoop:000> show connector
+----+------------------------+---------+------------------------------------------------------+----------------------+
| Id | Name | Version | Class | Supported Directions |
+----+------------------------+---------+------------------------------------------------------+----------------------+
| 1 | generic-jdbc-connector | 1.99.6 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
| 2 | kite-connector | 1.99.6 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| 3 | hdfs-connector | 1.99.6 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| 4 | kafka-connector | 1.99.6 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
+----+------------------------+---------+------------------------------------------------------+----------------------+
根据你的connector创建connector
sqoop:000> create link -c 1 => 先创建jdbc
会填写name、jdbc-driver、url、username、passwd等等
sqoop:000> create link -c 3 => 创建hdfs
会填写name、hdfs url、等等
sqoop:000> show link
+----+-------------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+-------------+--------------+------------------------+---------+
| 3 | 10-21_jdbc1 | 1 | generic-jdbc-connector | true |
| 4 | 10-21_hdfs1 | 3 | hdfs-connector | true |
+----+-------------+--------------+------------------------+---------+
创建job -f=> from -t to 即从哪些导入到哪里
sqoop:000> create job -f 3 -t 4
会填写,相应的table信息。还有hdfs信息
sqoop:000> show job
+----+---------------+----------------+--------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+---------------+----------------+--------------+---------+
| 1 | 10-20_sqoopy2 | 1 | 3 | true |
+----+---------------+----------------+--------------+---------+
#启动job
sqoop:000> start job -j 2
...
...
...
可以再webui上访问到,查看进度,也可以使用
sqoop:000> status job -j 2
sqoop的guide
http://sqoop.apache.org/
5.troubleshooting
多看日志,慢慢的排查
感谢各位的阅读,以上就是“Sqoop2从Mysql导入Hdfs的过程”的内容了,经过本文的学习后,相信大家对Sqoop2从Mysql导入Hdfs的过程这一问题有了更深刻的体会,具体使用情况还需要大家实践验证。这里是天达云,小编将为大家推送更多相关知识点的文章,欢迎关注!