Software Installation

Windows/Linux

Installing Hadoop

The JDK path must not contain spaces; otherwise startup fails with: Error JAVA_HOME is incorrectly set.

Recommended: Hadoop 2.8.3, so that a matching winutils build can be used. winutils: https://github.com/DNGiveU/winutils

  1. Configure HADOOP_HOME
  2. core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
  3. hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <!-- Create the distributed directory under the root of the current drive -->
    <property>
       <name>dfs.namenode.name.dir</name>
       <value>/distributed/hadoop/data/namenode</value>
    </property>
    <property>
       <name>dfs.datanode.data.dir</name>
       <value>/distributed/hadoop/data/datanode</value>
    </property>
</configuration>
  4. yarn-site.xml
<configuration>
    <!-- The NodeManager requires at least 1024 MB of memory -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>1</value>
    </property>
</configuration>
  5. hadoop-env.cmd

Set JAVA_HOME.
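
A minimal sketch of the line to add in hadoop-env.cmd, assuming a JDK installed under a path without spaces (the path below reuses the one from the hbase-env.cmd example later in this guide; adjust it to your installation):

@rem point Hadoop at the JDK; the path must contain no spaces
set JAVA_HOME=D:\software\jdk\jdk1.8.0_151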

  6. Format the HDFS filesystem

bin/hdfs namenode -format

  7. Start

sbin/start-all.cmd

  8. Verify

In a browser, open http://localhost:8088 or http://localhost:50070
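
Optionally, a quick command-line check (run from %HADOOP_HOME%) to confirm HDFS is answering; the /tmp directory here is just an example:

bin\hdfs dfs -mkdir /tmp
bin\hdfs dfs -ls /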

Installing Spark

  1. jdk 1.8+
  2. scala (latest)
  3. spark-x-bin-hadoopx.tgz
  4. hadoopx+
  5. winutils to replace the Hadoop bin directory (see the sketch below)

Hadoop 2.8.5 is recommended, because installing HBase 2.2.2 later requires Hadoop 2.8.5.

winutils: https://github.com/DNGiveU/winutils
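
A sketch of the replacement step, assuming the winutils repository has been cloned into the current directory and contains a folder matching your Hadoop version (the hadoop-2.8.5 path below is an assumption):

rem overwrite the Hadoop bin directory with the matching winutils binaries
copy /Y winutils\hadoop-2.8.5\bin\* %HADOOP_HOME%\bin\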

Installation:

  1. Configure SPARK_HOME and HADOOP_HOME

  2. Verify

Run spark-shell, then open localhost:4040
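
A quick sanity check inside spark-shell (any small job will do; the numbers below are arbitrary):

scala> spark.version
scala> sc.parallelize(1 to 100).sum()   // should print 5050.0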

For a cluster installation, change each machine's hostname so the nodes are easy to identify.

Installing HBase

  1. jdk 1.8+
  2. HBase 2.2.2
  3. hadoop 2.8.5

Hadoop versions required by each HBase release: http://hbase.apache.org/book.html#hadoop

Installation
  1. Configure HBASE_HOME and HADOOP_HOME

  2. Edit hbase-env.cmd

@rem JDK location, or use %JAVA_HOME%
set JAVA_HOME=D:\software\jdk\jdk1.8.0_151
@rem HBase installation directory
set HBASE_CLASSPATH=D:/software/hbase/hbase-2.2.2-bin/hbase-2.2.2-bin/hbase-2.2.2
set HBASE_MANAGES_ZK=true
  3. Edit hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<!-- HBase data directory, e.g. table data -->
	<property>
		<name>hbase.rootdir</name>
		<value>file:///D:/software/hbase/root</value>
	</property>

	<!-- ZooKeeper data, e.g. table schema data -->
	<property>
		<name>hbase.zookeeper.property.dataDir</name>
		<value>D:/software/hbase/zoo</value>
	</property>

	<property>
		<name>hbase.unsafe.stream.capability.enforce</name>
		<value>false</value>
	</property>
	
	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>127.0.0.1</value>
	</property>
	
	<!-- standalone mode -->
	<property>
		<name>hbase.cluster.distributed</name>
		<value>false</value>
	</property>

	<!-- enable aggregation (coprocessor) support -->
	<property> 
		<name>hbase.coprocessor.user.region.classes</name>
		<value>org.apache.hadoop.hbase.coprocessor.AggregateImplementation</value>
	</property>
</configuration>
  4. Start

Run bin/start-hbase.cmd (if startup fails, start Hadoop first)

  5. Verify

Run hbase shell, or open http://localhost:16010/master-status in a browser
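
A short smoke test inside hbase shell; the table name demo and column family cf are placeholders:

create 'demo', 'cf'
put 'demo', 'row1', 'cf:a', 'value1'
scan 'demo'
status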

Installing HBase Indexer

Source: https://github.com/DNGiveU/hbase-indexer

Installation guide: https://github.com/NGDATA/hbase-indexer/wiki/Installation

Installation
  1. Build HBase Indexer from source

mvn clean package -Pdist -DskipTests

  2. hbase-indexer-site.xml
<configuration>
   <property>
      <name>hbaseindexer.zookeeper.connectstring</name>
      <value>127.0.0.1:2181</value>
   </property>
   <property>
      <name>hbase.zookeeper.quorum</name>
      <value>127.0.0.1</value>
   </property>
</configuration>
  3. HBase configuration: hbase-site.xml
<configuration>
  <!-- SEP is basically replication, so enable it -->
  <property>
    <name>hbase.replication</name>
    <value>true</value>
  </property>
  <!-- Source ratio of 100% makes sure that each SEP consumer is actually
       used (otherwise, some can sit idle, especially with small clusters) -->
  <property>
    <name>replication.source.ratio</name>
    <value>1.0</value>
  </property>
  <!-- Maximum number of hlog entries to replicate in one go. If this is
       large, and a consumer takes a while to process the events, the
       HBase rpc call will time out. -->
  <property>
    <name>replication.source.nb.capacity</name>
    <value>1000</value>
  </property>
  <!-- A custom replication source that fixes a few things and adds
       some functionality (doesn't interfere with normal replication
       usage). -->
  <property>
    <name>replication.replicationsource.implementation</name>
    <value>com.ngdata.sep.impl.SepReplicationSource</value>
  </property>
</configuration>
  4. Copy the HBase Indexer jars

Copy the HBase Indexer SEP jars into HBase's lib directory: cp lib/hbase-sep-* $HBASE_HOME/lib

  5. Start Solr

%SOLR_HOME%/bin/solr start, then open http://localhost:8983/solr/ (use the port shown in the output)

or

In the $SOLR_HOME/bin directory:

./solr -c -z localhost:2181 -e cloud

In the $SOLR_HOME/server directory:

java -Dbootstrap_confdir=./solr/demo/conf -Dcollection.configName=demo -DzkHost=localhost:2181/solr -jar start.jar

  6. Start HBase Indexer

%HBASE_INDEXER_HOME%/bin/hbase-indexer server
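
Once the server is running, an indexer definition still has to be registered with the add-indexer command. A sketch based on the hbase-indexer tutorial; the indexer name, configuration file, and Solr collection below are placeholders:

bin/hbase-indexer add-indexer -n myindexer -c myindexer.xml -cp solr.zk=localhost:2181/solr -cp solr.collection=demo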

Installing Hive

  1. jdk 1.8+
  2. hadoop
  3. hive 3.1.1+
  4. mysql

Docs: https://cwiki.apache.org/confluence/display/Hive/GettingStarted

Hive depends on HDFS.

Steps

  1. Configure Hadoop (see the Hadoop installation above)
  2. Start HDFS

If connections to port 22 are refused, check that openssh-server is installed and that hadoop-env.sh sets HADOOP_SSH_OPTS="-p 22".
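
For example, on a Debian/Ubuntu host (an assumption; adjust for your distribution):

sudo apt-get install openssh-server
# then, in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export HADOOP_SSH_OPTS="-p 22"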

  3. Start MySQL
  4. Configure hive-env.sh

Create it from the template: cp hive-env.sh.template hive-env.sh

HADOOP_HOME=$HADOOP_HOME
export HIVE_CONF_DIR=/home/google/software/apache-hive-3.1.1-bin/conf
export HIVE_AUX_JARS_PATH=/home/google/software/apache-hive-3.1.1-bin/lib
  5. Add the default configuration

cp hive-default.xml.template hive-default.xml

  6. Configure hive-site.xml

Create it: vi hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- For cluster mode, set the value to yarn, create the /tmp and /user/hive/warehouse directories in HDFS (dfs -mkdir /tmp), and grant permissions: dfs -chmod g+w /tmp -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop/mapred/local</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/hive_meta?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>admin</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
</configuration>

  7. Add the MySQL connector jar to $HIVE_HOME/lib
  8. Start Hive

./bin/hive

If startup fails with SQL creation syntax errors, drop hive_meta and create the metastore schema manually:

drop database hive_meta;
create database hive_meta;
use hive_meta;
source $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql
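
Once hive starts cleanly, a quick smoke test from the hive prompt (the table and column names are placeholders):

show databases;
create table t_demo (id int, name string);
show tables;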

Integrating Hive with HBase

Copy hive-hbase-handler-*.jar from Hive into HBase's lib directory:

cp $HIVE_HOME/lib/hive-hbase-handler-*.jar $HBASE_HOME/lib
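
With the handler jar in place, an HBase-backed table can be created from the hive CLI. A sketch following the Hive HBaseIntegration documentation; the table, column family, and column names are placeholders:

CREATE TABLE hbase_demo (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_demo");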

Integrating Spark with Hive

Docs: http://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html

  1. Add hive-site.xml to $SPARK_HOME/conf
  2. Copy $HADOOP_HOME/etc/hadoop/core-site.xml to $SPARK_HOME/conf
  3. Copy $HADOOP_HOME/etc/hadoop/hdfs-site.xml to $SPARK_HOME/conf
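
After the three files are in $SPARK_HOME/conf, spark-shell should be able to query Hive tables directly; the table name below assumes the earlier Hive smoke test:

scala> spark.sql("show databases").show()
scala> spark.sql("select * from t_demo").show()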

Downloads

Scala: https://www.scala-lang.org/download/
Hadoop: https://archive.apache.org/dist/hadoop/common/
Spark: https://spark.apache.org/downloads.html
HBase: http://hbase.apache.org/downloads.html
Hive: https://hive.apache.org/downloads.html
