Dockerize Apache HAWQ

Posted by Big Data Community


Docker, a lightweight Linux container engine, has become increasingly popular alongside the continued growth of big data and cloud computing, largely because of the convenience it brings to building, shipping, and running applications. This article introduces Docker and walks through the steps to dockerize Apache HAWQ.


Docker Introduction

The high-level architecture of Docker is shown in Figure 1. Linux containers are a lightweight, operating-system-level virtualization method for running multiple isolated containers on a single host. They rely on namespaces for process isolation and on cgroups for resource isolation. A layered file system combines a stack of layers into a single Docker image, which may contain user data and applications; the process resembles applying patches one by one on top of a base image to form a new one. A Docker container is then started from a Docker image. Note that a Docker container is not a VM. The most important difference is that each VM needs its own guest OS on the host machine, which makes it slower to start and more resource-hungry than a container. Besides images and containers, the Dockerfile is the third basic element of Docker. It contains the instructions used to assemble a Docker image, so that builds are automated and can be kept under version control.
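On a Linux host, the two kernel features mentioned above can be inspected through /proc. The snippet below is only an illustrative sketch and is not part of the original setup; it assumes a Linux system with /proc mounted and simply lists the namespaces and cgroups that the inspecting process belongs to.

```shell
# Each symlink under /proc/self/ns names one namespace (pid, net, mnt, ...)
# that the calling process belongs to; the link target carries its identifier.
ns_list=$(ls -l /proc/self/ns)
echo "namespaces:"
echo "$ns_list"

# /proc/self/cgroup lists the control groups the calling process is assigned to.
cgroup_info=$(cat /proc/self/cgroup)
echo "cgroups:"
echo "$cgroup_info"
```

Running the same commands inside a container (for example through docker exec) and comparing the namespace identifiers with those on the host is a quick way to see the isolation at work.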


Figure 1: Docker high-level architecture


Docker Installation

We install Docker on CentOS Linux release 7.0.1406 (Core) with kernel version 3.10.0-123.el7.x86_64. First, we run yum -y update to upgrade the kernel and the installed software where possible. Next, we run curl -sSL https://get.docker.com/ | sh to install the Docker engine. Ideally, service docker start would then start Docker, but in our case it reported an error. After installing device-mapper-libs and device-mapper-event-libs with yum, Docker started successfully. By default the Docker daemon binds to a Unix socket owned by root, so docker commands have to be run with sudo. To avoid this, run sudo usermod -aG docker $USER and log in again so that the new group membership takes effect. Docker is now ready to go.
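The installation steps can be collected into a small script. This is a sketch under the assumptions of this section (CentOS 7, root privileges, network access); the function name install_docker and its user argument are hypothetical. Nothing is executed until the function is called explicitly.

```shell
# Sketch of the installation steps above, wrapped in a function so that
# nothing runs until install_docker is called explicitly (as root).
install_docker() {
    # The user who should be able to run docker without sudo.
    if [ -z "$1" ]; then
        echo "usage: install_docker <user>" >&2
        return 2
    fi

    # Upgrade the kernel and installed packages where possible.
    yum -y update || return 1

    # Install the Docker engine with the convenience script.
    curl -sSL https://get.docker.com/ | sh || return 1

    # Libraries that were missing when the daemon first failed to start.
    yum -y install device-mapper-libs device-mapper-event-libs || return 1

    # Start the daemon.
    service docker start || return 1

    # Add the user to the docker group; takes effect at the next login.
    usermod -aG docker "$1"
}
```

As root, it would be invoked as install_docker followed by the login name of the user who will work with Docker.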


Docker Image Build

To build the Apache HAWQ Docker image, we use a Dockerfile. The following keywords are widely used in Dockerfiles:

  • FROM: to specify the base image to build from

  • RUN: to execute a shell command at build time

  • ENV: to set an environment variable in the Docker container

  • ADD/COPY: to import files, usually from the host

  • EXPOSE: to declare the port a service in the container listens on (publishing it to the host still requires docker run -p)

  • CMD/ENTRYPOINT: to specify the command executed when the container starts
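As an illustration of these keywords, the following sketch writes a tiny build context to a temporary directory. It is not the HAWQ Dockerfile: the script hello.sh, the variable GREETING_TARGET, the image name keyword-demo, and the port number are all made up for this example.

```shell
# Create a scratch build context containing a helper script and a Dockerfile.
ctx=$(mktemp -d)

cat > "$ctx/hello.sh" <<'EOF'
#!/bin/sh
echo "Hello from $GREETING_TARGET"
EOF
chmod +x "$ctx/hello.sh"

cat > "$ctx/Dockerfile" <<'EOF'
# Base image to build from
FROM centos:7
# Shell command executed at build time
RUN mkdir -p /opt/demo
# Environment variable visible inside the container
ENV GREETING_TARGET=HAWQ
# Import a file from the build context on the host
COPY hello.sh /usr/local/bin/hello.sh
# Declare the port a service would listen on (5432 is only an example)
EXPOSE 5432
# Default command when the container starts
CMD ["/usr/local/bin/hello.sh"]
EOF

echo "Build context written to $ctx"
# To build and run it (requires a working Docker installation):
#   docker build -t keyword-demo "$ctx"
#   docker run --rm keyword-demo
```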

Though both ENTRYPOINT and CMD allow you to specify the startup command for an image, there are subtle differences between them:

  1. CMD is overridden by the arguments given after the image name when starting the container, while ENTRYPOINT can only be overridden with the --entrypoint flag.

  2. When ENTRYPOINT and CMD are combined, the CMD strings are appended as arguments to ENTRYPOINT.

  3. When using ENTRYPOINT and CMD together, always use the exec form, such as ENTRYPOINT ["/bin/ping", "localhost"], not the shell form ENTRYPOINT /bin/ping localhost. The shell form wraps the command in /bin/sh -c, so the CMD arguments are not passed on to it.
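The three points above can be tried out with a minimal image. The sketch below only writes the Dockerfile; the image name entrypoint-demo is hypothetical, /bin/echo stands in for a real service, and the docker commands are left as comments because they need a working Docker installation.

```shell
# Write a minimal Dockerfile that combines ENTRYPOINT and CMD in exec form.
demo=$(mktemp -d)
cat > "$demo/Dockerfile" <<'EOF'
FROM centos:7
# Exec form: the executable and its fixed arguments
ENTRYPOINT ["/bin/echo", "Hello"]
# Default argument, appended to ENTRYPOINT when none is given at run time
CMD ["world"]
EOF

# With Docker available:
#   docker build -t entrypoint-demo "$demo"
#   docker run --rm entrypoint-demo          # runs /bin/echo Hello world
#   docker run --rm entrypoint-demo HAWQ     # CMD replaced: /bin/echo Hello HAWQ
#   docker run --rm --entrypoint /bin/ls entrypoint-demo /
#                                            # ENTRYPOINT replaced: /bin/ls /
```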


With this basic knowledge of Dockerfiles, we can build the image for Apache HAWQ. We choose centos:7 from DockerHub as the base image. First, we use yum to install the software whose packaged versions are compatible with Apache HAWQ, such as jdk1.7, krb5, libxml2, libcurl, and snappy. Libraries whose packaged versions are not compatible, or which are not available in the yum repositories, are installed from specific sources, such as json-c 0.9, flex 2.5.35, libhdfs3, and libyarn. Once all these dependencies are installed, the Apache HAWQ development environment is ready, which is enough for devel mode. For production mode, we still need to add an entrypoint that contains the logic for building and running Apache HAWQ. To build the image, we run docker build -t hawq:devel <path to the directory containing the Dockerfile>. A pre-built Apache HAWQ Docker image has been pushed to DockerHub; see https://hub.docker.com/r/mayjojo/hawq-devel/.
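A heavily reduced sketch of such a Dockerfile and the build step might look as follows. It is not the Dockerfile behind the published image: the yum package names are assumptions chosen to match the dependencies listed above, the source-built libraries are omitted, and the helper build_hawq_image is hypothetical.

```shell
# Write a reduced Dockerfile for the development image into a scratch context.
hawq_ctx=$(mktemp -d)
cat > "$hawq_ctx/Dockerfile" <<'EOF'
FROM centos:7
# Dependencies installed from yum (package names are assumptions)
RUN yum -y install java-1.7.0-openjdk-devel krb5-devel libxml2-devel \
        libcurl-devel snappy-devel && \
    yum clean all
# Dependencies built from source (json-c 0.9, flex 2.5.35, libhdfs3, libyarn)
# would follow here as further RUN steps; they are omitted from this sketch.
EOF

# Build the image and tag it hawq:devel. The argument is the build context,
# i.e. the directory that contains the Dockerfile (default: current directory).
build_hawq_image() {
    docker build -t hawq:devel "${1:-.}"
}
```

With Docker installed, build_hawq_image "$hawq_ctx" would start the build.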


Docker Image Run

Run docker images to check that the hawq:devel image has been built locally. We still need a Hadoop image, which we can find on DockerHub with docker search hadoop and then fetch with docker pull <image>. To start the HAWQ container, we use docker run -d --name=hawq hawq:devel tail -f /dev/null. Run docker ps to check that a container named hawq is running in the background. To log in to the environment, run docker exec -it hawq /bin/bash. Now you can build your HAWQ code, run HAWQ, and do whatever you want. If you happen to break the environment, just docker kill hawq, docker rm hawq, and start a new one. To persist data or share it between containers, you can either mount a data volume from the host with docker run -v, or create a data container with docker create -v /data --name=data <image> and run the HAWQ/Hadoop containers with docker run --volumes-from data. We recommend the latter.
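The workflow above can be wrapped in a few helper functions. This is a sketch, not part of the original setup: the function names are hypothetical, and centos:7 with /bin/true is just one possible choice for the data container, which never needs to run.

```shell
# Start a long-running HAWQ development container in the background.
start_hawq() {
    docker run -d --name=hawq hawq:devel tail -f /dev/null
}

# Open an interactive shell inside the running container.
enter_hawq() {
    docker exec -it hawq /bin/bash
}

# Throw away a broken environment and start a fresh one.
reset_hawq() {
    docker rm -f hawq >/dev/null 2>&1 || true
    start_hawq
}

# Data-container pattern: a container that only owns the /data volume.
create_data_container() {
    docker create -v /data --name=data centos:7 /bin/true
}

# Start HAWQ with the volumes of the data container attached.
start_hawq_with_data() {
    docker run -d --name=hawq --volumes-from data hawq:devel tail -f /dev/null
}
```

A typical session would call create_data_container once, then start_hawq_with_data and enter_hawq; a Hadoop container can attach the same volume with --volumes-from data.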


Docker is still evolving quickly. More exciting features that could be applied to Apache HAWQ are waiting for us to explore.


