I. Building the Hadoop cluster
1. Edit the network configuration file
vi /etc/sysconfig/network-scripts/ifcfg-ens33
# Configure a static IP
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=72948409-8d3a-40fb-abc3-c1ebc1199d50
DEVICE=ens33
ONBOOT=yes
DNS1=8.8.8.8
IPADDR=192.168.2.2
NETMASK=255.255.255.0
GATEWAY=192.168.2.1
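A misspelled key (such as GATWAY instead of GATEWAY) silently breaks static-IP setup. The sketch below sanity-checks an ifcfg file for the keys this guide sets; `IFCFG` is an illustrative variable, and the heredoc only writes a demo file so the check can be rehearsed anywhere — on a real node, point `IFCFG` at the actual file and skip the heredoc.

```shell
# Point IFCFG at the real file on a node; the demo path is just for rehearsal.
IFCFG="${IFCFG:-/tmp/ifcfg-ens33-demo}"

# Demo file only (skip this block on a real machine).
cat > "$IFCFG" <<'EOF'
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.2.2
NETMASK=255.255.255.0
GATEWAY=192.168.2.1
DNS1=8.8.8.8
EOF

# Collect any required key that is absent.
missing=""
for key in BOOTPROTO ONBOOT IPADDR NETMASK GATEWAY DNS1; do
    grep -q "^${key}=" "$IFCFG" || missing="$missing $key"
done
echo "missing keys:${missing:- none}"
```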
2. Restart the network service
service network restart
3. Change the hostname
vi /etc/hostname
master
4. Map hostnames to IP addresses
vi /etc/hosts
192.168.2.2 master
192.168.2.3 slave1
192.168.2.4 slave2
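The same three mappings must exist on every node, and a duplicated or missing entry is a common source of "unknown host" failures later. A minimal sketch of a check, run against a demo copy so it is safe anywhere (`HOSTS_FILE` is an illustrative variable — set it to /etc/hosts on a real node and drop the heredoc):

```shell
HOSTS_FILE="${HOSTS_FILE:-/tmp/hosts-demo}"

# Demo copy of the mappings above (skip on a real machine).
cat > "$HOSTS_FILE" <<'EOF'
192.168.2.2 master
192.168.2.3 slave1
192.168.2.4 slave2
EOF

# Each cluster hostname should appear exactly once.
for host in master slave1 slave2; do
    echo "$host: $(grep -cw "$host" "$HOSTS_FILE") entry(ies)"
done
```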
5. Add a hadoop user
adduser hadoop
Copy the JDK, Hadoop, and other installation packages to /home/hadoop and extract them there:
tar -xzvf xxx
Give the hadoop user sudo privileges:
vi /etc/sudoers
Find the line root ALL=(ALL) ALL and add the following line below it:
hadoop ALL=(ALL) ALL
6. Set the Java and Hadoop environment variables
vi /etc/profile
export JAVA_HOME=/home/hadoop/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop-2.7.0
export SPARK_HOME=/home/hadoop/spark-2.3.1-bin-hadoop2.7
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.0.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.0.jar
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PATH
7. Apply the configuration
source /etc/profile
8. Verify the installation
java -version
hadoop version
9. Edit the Hadoop configuration files
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:50020</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:50075</value>
</property>
<!-- Directories where the NameNode stores its persistent metadata; a copy of the metadata is kept in each listed directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/data/namenode</value>
</property>
<!-- Directories where the DataNode stores data blocks; each block is stored in one of the listed directories -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/data/datanode</value>
</property>
<!-- Directories where the secondary NameNode stores checkpoints; a copy of the checkpoint files is kept in each listed directory -->
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/home/hadoop/data/secnamenode</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:9001</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Edit hadoop-env.sh and set the JAVA_HOME and HADOOP_HOME environment variables (JAVA_HOME must be set here explicitly, even though it is already set in /etc/profile).
Edit slaves: delete the existing contents and add the hostnames of the other two nodes.
10. Next, configure MapReduce
Edit the yarn-site.xml and mapred-site.xml files.
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8040</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
Edit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
11. Configure SSH
As the root user, run ssh-keygen -t rsa and press Enter at every prompt.
ssh-keygen -t rsa
This generates the private key id_rsa and the public key id_rsa.pub. Append all the nodes' public keys to authorized_keys, then distribute it to every machine.
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 authorized_keys
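The key steps above can be rehearsed against a throwaway directory, which makes the snippet safe to run anywhere; on a real node you would use ~/.ssh instead of the temporary `SSH_DIR`, and copy each node's id_rsa.pub into every other node's authorized_keys (ssh-copy-id automates that part).

```shell
# Throwaway directory standing in for ~/.ssh.
SSH_DIR=$(mktemp -d)

# Generate a key pair non-interactively (empty passphrase, as in the guide).
ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa" -q

# Append the public key to authorized_keys and tighten permissions;
# sshd refuses authorized_keys that are group/world writable.
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
ls -l "$SSH_DIR/authorized_keys"
```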
12. Disable the firewall
systemctl status firewalld
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl mask firewalld.service
vi /etc/selinux/config
SELINUX=disabled
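Instead of editing /etc/selinux/config interactively, the change can be scripted. A minimal sketch — the demo edits a temp copy so it can be run anywhere; on a real node, point the illustrative `CONFIG` variable at /etc/selinux/config (and reboot for the change to take full effect):

```shell
CONFIG="${CONFIG:-/tmp/selinux-config-demo}"

# Demo copy of the stock file (skip on a real machine).
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$CONFIG"

# Flip SELINUX to disabled; SELINUXTYPE is left untouched because the
# pattern anchors on "SELINUX=" exactly.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$CONFIG"
grep '^SELINUX=' "$CONFIG"
```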
13. Start the HDFS service
Format the NameNode:
hadoop namenode -format
./start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/hadoop-2.7.0/logs/hadoop-root-namenode-master.out
slave2: starting datanode, logging to /home/hadoop/hadoop-2.7.0/logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /home/hadoop/hadoop-2.7.0/logs/hadoop-root-datanode-slave1.out
Starting secondary namenodes [slave1]
slave1: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.0/logs/hadoop-root-secondarynamenode-slave1.out
jps
master:
2615 NameNode
2846 Jps
slave1:
1411 SecondaryNameNode
1589 Jps
1359 DataNode
slave2:
1323 DataNode
1455 Jps
If jps shows no NameNode process:
1. Stop the cluster
2. On every node, clear the configured Hadoop tmp directory, name directory, data directory, and the Hadoop logs directory
3. Reformat the NameNode
$ hadoop namenode -format
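The cleanup in step 2 can be sketched as a loop. The demo below works on a throwaway tree so it is safe to rehearse; on a real node, `BASE` (an illustrative variable) would be /home/hadoop with the tmp/, data/, and logs/ layout used in this guide.

```shell
BASE=$(mktemp -d)
mkdir -p "$BASE/tmp" "$BASE/data/namenode" "$BASE/data/datanode" "$BASE/logs"
touch "$BASE/tmp/junk" "$BASE/data/namenode/current"   # stand-ins for stale state

for dir in tmp data/namenode data/datanode logs; do
    rm -rf "$BASE/$dir"/*    # empty the directory but keep it in place
    echo "cleared $BASE/$dir"
done
```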
14. Start the YARN cluster
./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.7.0/logs/yarn-root-resourcemanager-master.out
slave2: starting nodemanager, logging to /home/hadoop/hadoop-2.7.0/logs/yarn-root-nodemanager-slave2.out
slave1: starting nodemanager, logging to /home/hadoop/hadoop-2.7.0/logs/yarn-root-nodemanager-slave1.out
jps
2898 ResourceManager
3158 Jps
2615 NameNode
15. Web UI access
1. YARN web UI
CentOS 7.0 uses firewalld as its default firewall
Stop firewalld:
systemctl stop firewalld.service
http://192.168.2.2:8088
2. Hadoop web UI
http://192.168.2.2:50070
3. Edit the Windows hosts file
C:\Windows\System32\drivers\etc\hosts
192.168.2.2 master
192.168.2.3 slave1
192.168.2.4 slave2
16. Detailed Hadoop configuration reference
http://note.youdao.com/noteshare?id=24e30e29f89e438a566b67840aad4797&sub=2F6870F904FC4494AB84CB88BC74F638
17. Add a node
Add the new node's hostname to the slaves configuration file and restart the cluster.
II. Installing Spark
1. Edit spark-env.sh
export JAVA_HOME=/home/hadoop/jdk1.8.0_171
export SPARK_MASTER_IP=192.168.2.2
export SPARK_MASTER_PORT=7077
2. Edit the slaves file
In spark/conf, add the worker hostnames to slaves. Here master is also included as a worker, so master does both management and computation.
master
slave1
slave2
3. Start the Hadoop cluster, then start the Spark cluster
./sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark-2.3.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
slave2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-2.3.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave2.out
master: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-2.3.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
slave1: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark-2.3.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave1.out
4. View the Spark web UI
http://192.168.2.2:8080/
III. Building the ZooKeeper cluster
1. zoo.cfg
On master, create the configuration file: cp zoo_sample.cfg zoo.cfg
# The number of milliseconds of each tick
# (the basic time unit, in ms, for heartbeats between servers and clients)
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
# (how long followers may take to connect to and sync with the leader)
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
# (between the leader and the followers)
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# (paths for ZooKeeper data and logs)
dataDir=/home/hadoop/apache-zookeeper-3.5.5-bin/data
dataLogDir=/home/hadoop/apache-zookeeper-3.5.5-bin/logs
# the port at which the clients will connect
# (the port clients use to talk to ZooKeeper)
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# ZooKeeper cluster IPs and ports
# server.A=B:C:D where A is a number identifying the server; B is the server's
# IP address; C is the port the server uses to exchange information with the
# cluster leader; and D is the port the servers use to communicate with each
# other during leader election after the leader fails.
server.1=192.168.2.2:2888:3888
server.2=192.168.2.3:2888:3888
server.3=192.168.2.4:2888:3888
# Change the AdminServer port
admin.serverPort=8888
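Since the server.N block must be identical on every node, it can be generated from an ordered host list rather than typed three times. A small sketch using this guide's IPs and ports (`CFG` is just a temp file for the demo — on a real node, append to zoo.cfg instead):

```shell
CFG=$(mktemp)
hosts="192.168.2.2 192.168.2.3 192.168.2.4"

# Number the hosts in order: server.1, server.2, server.3.
n=1
for h in $hosts; do
    echo "server.$n=$h:2888:3888" >> "$CFG"
    n=$((n + 1))
done
cat "$CFG"
```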
2. myid file (in the $ZOOKEEPER_HOME/data directory)
master: 1
slave1: 2
slave2: 3
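Each myid must match that node's server.N number in zoo.cfg. The sketch below writes the three files into per-node temp directories so it can be rehearsed locally; on each real node you would write only that node's id into $ZOOKEEPER_HOME/data/myid (`ROOT` is an illustrative variable for the demo tree).

```shell
ROOT=$(mktemp -d)

# Assign ids in the same order as the server.N lines in zoo.cfg.
i=1
for node in master slave1 slave2; do
    mkdir -p "$ROOT/$node/data"
    echo "$i" > "$ROOT/$node/data/myid"
    i=$((i + 1))
done
cat "$ROOT/slave1/data/myid"
```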
3. Copy ZooKeeper to the other nodes
Here ZooKeeper lives under /home/hadoop; copy it to /home/hadoop/ on the other nodes. Pay close attention to file permissions — if you are running as root, this can be ignored.
scp -r /home/hadoop/apache-zookeeper-3.5.5-bin root@node2:/home/hadoop/
scp -r /home/hadoop/apache-zookeeper-3.5.5-bin root@node3:/home/hadoop/
chown -R hadoop:hadoop /home/hadoop/apache-zookeeper-3.5.5-bin   # change the owner and group
chmod -R 777 /home/hadoop/apache-zookeeper-3.5.5-bin             # change access permissions
4. Disable the firewall (temporarily disabling it on the VMs is enough)
systemctl status firewalld
systemctl stop firewalld.service
5. Start ZooKeeper on each machine in the cluster in turn
bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.5.5-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Error contacting service. It is probably not running.
Start the ZooKeeper service on the other two machines first:
slave1:
[root@slave1 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.5.5-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave1 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.5.5-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
slave2:
[root@slave2 bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.5.5-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@slave2 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.5.5-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
Then restart master:
master:
[root@master bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.5.5-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@master bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/apache-zookeeper-3.5.5-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
Check the processes with jps — QuorumPeerMain is the ZooKeeper process.
6. A convenience script for starting ZooKeeper:
#!/bin/bash
echo "start zookeeper server..."
# hosts lists the hostnames where ZooKeeper is installed
hosts="node1 node2 node3"
# loop over the hosts and run zkServer.sh start on each
for host in $hosts
do
    echo "$host"
    ssh $host "source /etc/profile; /home/hadoop/zookeeper-3.4.10/bin/zkServer.sh start"
done
IV. Building the HBase cluster
1. Edit the hbase-env.sh file
export JAVA_HOME=/home/hadoop/jdk1.8.0_171
Use an external ZooKeeper instead of HBase's built-in one. For a single-node pseudo-distributed cluster, using the built-in ZooKeeper saves effort (recommended in that case).
export HBASE_MANAGES_ZK=false
2.配置hbase-site.xml
<configuration>
<!-- HBase root directory -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<!-- whether HBase runs in distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- nodes where ZooKeeper runs -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2</value>
</property>
<!-- where HBase's ZooKeeper data is stored -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/hbase-1.3.0/tmp/zk/data</value>
</property>
</configuration>
3. Edit regionservers
This file lists the slave nodes: every hostname written here becomes an HBase RegionServer node.
slave1
slave2
4. Copy HBase to the other nodes
5. Start HBase on the HBase master node (the Hadoop and ZooKeeper clusters must be running first)
start-hbase.sh
6. Check the processes with jps
master:
[root@master bin]# jps
4992 Worker
5300 Jps
4057 DataNode
4922 Master
1947 QuorumPeerMain
3963 NameNode
5195 HMaster
4461 ResourceManager
slave1:
[root@slave1 ~]# jps
3312 NodeManager
3616 Jps
3108 DataNode
3476 Worker
1243 QuorumPeerMain
3180 SecondaryNameNode
3583 HRegionServer
slave2:
[root@slave2 ~]# jps
1152 QuorumPeerMain
1297 DataNode
1587 Worker
1428 NodeManager
1814 Jps
1690 HRegionServer
7. Check HBase status in the web UI
http://master:16010
V. Hive setup
Hive embedded-mode installation
Embedded mode (metadata is stored in the embedded Derby database; only one session connection is allowed — attempting multiple concurrent sessions raises an error). A brief introduction to the Derby database:
https://blog.csdn.net/kai_wei_zhang/article/details/7749568
Add the environment variables
export JAVA_HOME=/home/hadoop/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop-2.7.0
export SPARK_HOME=/home/hadoop/spark-2.3.1-bin-hadoop2.7
export ZOOKEEPER_HOME=/home/hadoop/apache-zookeeper-3.5.5-bin
export HIVE_HOME=/home/hadoop/apache-hive-3.1.2-bin
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin:$PATH
Start Hadoop first, then edit hive-env.sh.
If HADOOP_HOME and HIVE_HOME are already configured, just run: cp hive-env.sh.template hive-env.sh
Then start Hive with the hive command:
[root@master conf]# hive
which: no hbase in (/home/hadoop/jdk1.8.0_171/bin:/home/hadoop/hadoop-2.7.0/sbin:/home/hadoop/hadoop-2.7.0/bin:/home/hadoop/spark-2.3.1-bin-hadoop2.7/bin:/home/hadoop/apache-zookeeper-3.5.5-bin/bin:/home/hadoop/apache-hive-3.1.2-bin/bin:/home/hadoop/jdk1.8.0_171/bin:/home/hadoop/hadoop-2.7.0/sbin:/home/hadoop/hadoop-2.7.0/bin:/home/hadoop/spark-2.3.1-bin-hadoop2.7/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.7.0/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = caae22a0-672b-465f-9ab0-b0b598b57810
Logging initialized using configuration in jar:file:/home/hadoop/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
You can then see that the corresponding directories have been created on HDFS.
1. Edit /home/hadoop/.bash_profile
export JAVA_HOME=/home/hadoop/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop-2.7.0
export SPARK_HOME=/home/hadoop/spark-2.3.1-bin-hadoop2.7
export ZOOKEEPER_HOME=/home/hadoop/apache-zookeeper-3.5.5-bin
export HIVE_HOME=/home/hadoop/apache-hive-3.1.2-bin
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$ZOOKEEPER_HOME/bin:$HIVE_HOME/bin:$PATH
2. Install and configure MySQL. First check for and remove the MySQL bundled with CentOS (not present on this VM, so the step can be skipped)
[hadoop@master app]$ rpm -qa |grep mysql
mysql-libs-5.1.71-1.el6.x86_64
[hadoop@master app]$ rpm -e mysql-libs-5.1.71-1.el6.x86_64 --nodeps
3. Prepare the MySQL installation package, upload it to the Linux server, and extract it
tar -xvf MySQL-5.6.26-1.linux_glibc2.5.x86_64.rpm-bundle.tar
4. Install the server
rpm -ivh MySQL-server-5.6.26-1.linux_glibc2.5.x86_64.rpm
5. Install the client
rpm -ivh MySQL-client-5.6.26-1.linux_glibc2.5.x86_64.rpm
6. Start the MySQL service
service mysql start
7. Edit hive-env.sh
export JAVA_HOME=/home/hadoop/jdk1.8.0_171
export HADOOP_HOME=/home/hadoop/hadoop-2.7.0
export SPARK_HOME=/home/hadoop/spark-2.3.1-bin-hadoop2.7
export ZOOKEEPER_HOME=/home/hadoop/apache-zookeeper-3.5.5-bin
export HIVE_HOME=/home/hadoop/apache-hive-3.1.2-bin
export HIVE_CONF_DIR=$HIVE_HOME/conf
#export HIVE_AUX_JARS_PATH=$SPARK_HOME/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$HADOOP_HOME/lib:$HIVE_HOME/lib
#export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS"
8. Edit hive-site.xml
This article is reposted from 程序猿小P.