The Hadoop Distributed File System (HDFS) is one of the core components of the Hadoop ecosystem. Running HDFS in Docker provides an isolated, portable environment for testing and development. Below are several common ways to create an HDFS file system in Docker.
Method 1: use a prebuilt single-node image. Pull the image (the community sequenceiq image; no longer maintained, but convenient for quick tests):
docker pull sequenceiq/hadoop-docker:2.7.1
Run the container:
docker run -it sequenceiq/hadoop-docker:2.7.1 /etc/bootstrap.sh -bash
Verify HDFS:
hdfs dfs -ls /
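Beyond listing the root directory, a quick smoke test inside the container confirms that reads and writes actually go through the NameNode and DataNode (the file and directory names here are illustrative, and `HADOOP_PREFIX` is assumed to be set by the image's bootstrap script):

```shell
# Run inside the Hadoop container
cd $HADOOP_PREFIX                             # Hadoop install dir, set by the image
echo "hello hdfs" > /tmp/test.txt             # illustrative local file
bin/hdfs dfs -mkdir -p /user/demo             # create a directory in HDFS
bin/hdfs dfs -put /tmp/test.txt /user/demo/   # upload the local file
bin/hdfs dfs -cat /user/demo/test.txt         # read it back from HDFS
```

If the `cat` prints the file contents, both metadata operations and block I/O are working.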
Method 2: run a multi-container cluster with Docker Compose. Create a docker-compose.yml:
version: '3'
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
    container_name: namenode
    ports:
      - "50070:50070"
    environment:
      - CLUSTER_NAME=test
      # both services need fs.defaultFS so the NameNode serves RPC on 8020
      - CORE_CONF_fs_defaultFS=hdfs://namenode:8020
    volumes:
      - namenode:/hadoop/dfs/name
  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
    container_name: datanode
    depends_on:
      - namenode
    environment:
      - CORE_CONF_fs_defaultFS=hdfs://namenode:8020
    volumes:
      - datanode:/hadoop/dfs/data
volumes:
  namenode:
  datanode:
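The bde2020 images render environment variables of the form `SECTION_CONF_property` into the matching Hadoop XML file (underscores stand in for dots), which is how `CORE_CONF_fs_defaultFS` above ends up in core-site.xml. Assuming that convention, hdfs-site.xml values can be set the same way — for example, lowering the replication factor for a single-DataNode setup:

```yaml
# Hypothetical extra entries for the environment section of each service;
# HDFS_CONF_* keys map to hdfs-site.xml in the bde2020 images
environment:
  - CORE_CONF_fs_defaultFS=hdfs://namenode:8020
  - HDFS_CONF_dfs_replication=1         # one DataNode, so one replica
  - HDFS_CONF_dfs_webhdfs_enabled=true  # expose the WebHDFS REST API
```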
Start the cluster:
docker-compose up -d
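Once the services are up, the cluster can be checked from the host (the container name `namenode` comes from the compose file above):

```shell
# Check that the NameNode sees the DataNode
docker exec namenode hdfs dfsadmin -report
# Exercise the filesystem from inside the namenode container
docker exec namenode hdfs dfs -mkdir -p /tmp/smoke
docker exec namenode hdfs dfs -ls /tmp
# The NameNode web UI is published on the host via the ports mapping
curl -s http://localhost:50070/ >/dev/null && echo "web UI reachable"
```

The `dfsadmin -report` output should list one live DataNode; zero live DataNodes usually points at the `fs.defaultFS` setting discussed in the troubleshooting notes below.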
Method 3: build a custom image. Create a Dockerfile:
FROM ubuntu:18.04
RUN apt-get update && apt-get install -y \
openjdk-8-jdk \
wget \
ssh \
rsync
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV HADOOP_VERSION=3.2.1
ENV HADOOP_URL=https://archive.apache.org/dist/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz
RUN wget $HADOOP_URL && \
tar -xzvf hadoop-$HADOOP_VERSION.tar.gz && \
mv hadoop-$HADOOP_VERSION /usr/local/hadoop && \
rm hadoop-$HADOOP_VERSION.tar.gz
ENV PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
# Configure passwordless SSH login
RUN ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa && \
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
chmod 0600 ~/.ssh/authorized_keys
COPY core-site.xml /usr/local/hadoop/etc/hadoop/
COPY hdfs-site.xml /usr/local/hadoop/etc/hadoop/
CMD ["/bin/bash"]
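The two files copied into the image are not shown above; a minimal sketch follows. The values are illustrative — the filesystem URI, data directories, and replication factor should match your own layout:

```xml
<!-- core-site.xml: default filesystem URI -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: replication and storage directories -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/data/data</value>
  </property>
</configuration>
```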
Build and run:
docker build -t custom-hadoop .
docker run -it custom-hadoop
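The image drops into a shell; before the filesystem is usable, the NameNode must be formatted and the daemons started. A hedged sketch — note that Hadoop 3.x refuses to launch daemons as root unless the `HDFS_*_USER` variables are set:

```shell
# One-time: format the NameNode metadata directory
hdfs namenode -format -force
# Required by Hadoop 3.x when running the daemons as root
export HDFS_NAMENODE_USER=root HDFS_DATANODE_USER=root HDFS_SECONDARYNAMENODE_USER=root
service ssh start   # start-dfs.sh connects to localhost over SSH
start-dfs.sh        # launches NameNode, DataNode, SecondaryNameNode
hdfs dfs -ls /      # confirm the filesystem responds
```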
Key configuration files:
- core-site.xml: sets the default filesystem URI for HDFS (fs.defaultFS).
- hdfs-site.xml: sets HDFS-specific parameters such as the replication factor and data directories.

Common issues:
- Network configuration: containers must be able to reach one another; on a Compose network, use service names (e.g. hdfs://namenode:8020) rather than localhost.
- Data persistence: mount the metadata and data directories to host paths or named volumes, e.g. -v ./hdfs/namenode:/hadoop/dfs/name
- Permission problems: in a throwaway development environment, permissions can be relaxed with hdfs dfs -chmod -R 777 /
- Web UI not reachable: make sure the NameNode web port is published to the host (50070 for Hadoop 2.x, 9870 for Hadoop 3.x).
- DataNode cannot connect to the NameNode: check the fs.defaultFS setting in core-site.xml.

With the methods above, you can quickly stand up an HDFS file system in a Docker environment, giving you a convenient setup for big data development and testing.