E-commerce Data Warehouse: Superset

Posted 今夜月色很美

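1 Getting Started with Superset

1.1 Superset Overview

Apache Superset is an open-source, modern, lightweight BI tool. It connects to many data sources, offers a rich set of chart types, supports custom dashboards, and has a friendly, easy-to-use interface.

1.2 Superset Use Cases

Because Superset integrates with common big data analytics tools such as Hive, Kylin, and Druid, and supports custom dashboards, it works well as the visualization layer of a data warehouse.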

2 Installing and Using Superset

Official Superset website: http://superset.apache.org/

2.1 Setting Up the Python Environment

Superset is a web application written in Python and requires a Python 3.7 environment.

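2.1.1 Installing Miniconda

conda is an open-source package and environment manager. It can install packages and their dependencies for different Python versions on the same machine and switch between Python environments. Anaconda bundles conda, Python, and a large set of preinstalled packages such as numpy and pandas; Miniconda bundles only conda and Python.

We do not need that many packages here, so we choose Miniconda.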

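1) Download Miniconda (Python 3 version)

Download URL: https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

2) Install Miniconda

(1) Run the following command and follow the prompts until installation completes.

bash Miniconda3-latest-Linux-x86_64.sh

(2) When the installer prompts for the install location, you can specify a custom installation path.

(3) When the installer reports that installation has finished, Miniconda is installed.

3) Reload the shell configuration file so that the environment variables take effect

source ~/.bashrc

4) Disable automatic activation of the base environment

After Miniconda is installed, its default base environment is activated every time a terminal is opened. The following command disables that behavior.

conda config --set auto_activate_base false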

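2.1.2 Creating the Python 3.7 Environment

1) Configure a conda mirror in China

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda config --set show_channel_urls yes

2) Create the Python 3.7 environment

conda create --name superset python=3.7

Note: use Python 3.6 instead (python=3.6 in the command above). With Python 3.7 and 3.8 I ran into various problems at the Superset installation step, and switching to Python 3.6 resolved them.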

Note: commonly used conda environment commands

Create an environment: conda create -n env_name

List all environments: conda info --envs

Remove an environment: conda remove -n env_name --all

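3) Activate the superset environment

conda activate superset

After activation, the shell prompt is prefixed with the environment name, (superset).

Note: to leave the current environment

conda deactivate

4) Run the python command (for example, python --version) to check the Python version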
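The version check in step 4 can also be done from inside the interpreter. The following is a minimal sketch; the helper names version_label and is_expected are made up for illustration, and the expected version is a parameter because this article asks for 3.7 while noting that 3.6 is what worked in practice.

```python
import sys


def version_label(info=sys.version_info):
    """Return the interpreter version as 'major.minor', e.g. '3.7'."""
    return "{}.{}".format(info[0], info[1])


def is_expected(info=sys.version_info, expected=(3, 7)):
    """True when the interpreter's major.minor matches the expected pair."""
    return tuple(info[:2]) == tuple(expected)


if __name__ == "__main__":
    # sys.executable shows which environment's python is actually running
    print("Python", version_label(), "at", sys.executable)
    print("matches 3.7:", is_expected())
```

If the printed path does not point into the superset environment, the environment was probably not activated in the current shell.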

2.2 Deploying Superset

2.2.1 Installing Dependencies

Before installing Superset, install the following dependencies.

sudo yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel python-setuptools openssl-devel cyrus-sasl-devel openldap-devel

2.2.2 Installing Superset

1) Install (or upgrade) setuptools and pip

pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/

Note: pip is Python's package manager, comparable to yum on CentOS.

2) Install Superset

pip install apache-superset -i https://pypi.douban.com/simple/

Note: -i specifies the package index; a mirror in China is used here.

Note: if a network error prevents the download, try a different mirror.

pip install apache-superset --trusted-host repo.huaweicloud.com -i https://repo.huaweicloud.com/repository/pypi/simple

3) Initialize the Superset database

superset db upgrade

Initializing the database fails with:

Traceback (most recent call last):
  File "/opt/module/miniconda3/envs/superset/bin/superset", line 5, in <module>
    from superset.cli.main import superset
  File "/opt/module/miniconda3/envs/superset/lib/python3.8/site-packages/superset/__init__.py", line 18, in <module>
    from flask import current_app, Flask
  File "/opt/module/miniconda3/envs/superset/lib/python3.8/site-packages/flask/__init__.py", line 14, in <module>
    from jinja2 import escape
  File "/opt/module/miniconda3/envs/superset/lib/python3.8/site-packages/jinja2/__init__.py", line 12, in <module>
    from .environment import Environment
  File "/opt/module/miniconda3/envs/superset/lib/python3.8/site-packages/jinja2/environment.py", line 25, in <module>
    from .defaults import BLOCK_END_STRING
  File "/opt/module/miniconda3/envs/superset/lib/python3.8/site-packages/jinja2/defaults.py", line 3, in <module>
    from .filters import FILTERS as DEFAULT_FILTERS  # noqa: F401
  File "/opt/module/miniconda3/envs/superset/lib/python3.8/site-packages/jinja2/filters.py", line 13, in <module>
    from markupsafe import soft_unicode
ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/opt/module/miniconda3/envs/superset/lib/python3.8/site-packages/markupsafe/__init__.py)

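Fix: MarkupSafe 2.1 removed soft_unicode, which the installed Jinja2 still imports, so pin MarkupSafe to 2.0.1: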

python -m pip install markupsafe==2.0.1

Re-running the initialization fails with:

Usage: superset [OPTIONS] COMMAND [ARGS]...

Error: Could not locate a Flask application. You did not provide the "FLASK_APP" environment variable, and a "wsgi.py" or "app.py" module was not found in the current directory.

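As the message says, this error means the FLASK_APP environment variable is not set; step 4 below sets it with export FLASK_APP=superset.

I tried both Python 3.7 and Python 3.8, kept hitting errors I could not resolve, and switched to Python 3.6, which fails with: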

Traceback (most recent call last):
  File "/opt/module/miniconda3/envs/superset/bin/superset", line 5, in <module>
    from superset.cli import superset
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/__init__.py", line 21, in <module>
    from superset.app import create_app
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/app.py", line 24, in <module>
    from flask_appbuilder import expose, IndexView
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/flask_appbuilder/__init__.py", line 5, in <module>
    from .api import ModelRestApi  # noqa: F401
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/flask_appbuilder/api/__init__.py", line 21, in <module>
    from .convert import Model2SchemaConverter
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/flask_appbuilder/api/convert.py", line 4, in <module>
    from flask_appbuilder.models.sqla.interface import SQLAInterface
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/flask_appbuilder/models/sqla/interface.py", line 40, in <module>
    from sqlalchemy_utils.types.uuid import UUIDType
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/sqlalchemy_utils/__init__.py", line 1, in <module>
    from .aggregates import aggregated  # noqa
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/sqlalchemy_utils/aggregates.py", line 372, in <module>
    from .functions.orm import get_column_key
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/sqlalchemy_utils/functions/__init__.py", line 1, in <module>
    from .database import (  # noqa
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/sqlalchemy_utils/functions/database.py", line 11, in <module>
    from .orm import quote
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/sqlalchemy_utils/functions/orm.py", line 14, in <module>
    from sqlalchemy.orm.query import _ColumnEntity
ImportError: cannot import name '_ColumnEntity'

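This is a SQLAlchemy version problem: the installed sqlalchemy_utils imports _ColumnEntity, which the installed SQLAlchemy does not provide. Pin SQLAlchemy with the following command: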

pip install sqlalchemy==1.3.24

Initializing again fails with:

/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/sqlalchemy_utils/types/encrypted/encrypted_type.py:18: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
  import cryptography
Traceback (most recent call last):
  File "/opt/module/miniconda3/envs/superset/bin/superset", line 5, in <module>
    from superset.cli import superset
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/__init__.py", line 21, in <module>
    from superset.app import create_app
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/app.py", line 45, in <module>
    from superset.security import SupersetSecurityManager
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/security/__init__.py", line 17, in <module>
    from superset.security.manager import SupersetSecurityManager  # noqa: F401
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/security/manager.py", line 44, in <module>
    from superset import sql_parse
  File "/opt/module/miniconda3/envs/superset/lib/python3.6/site-packages/superset/sql_parse.py", line 18, in <module>
    from dataclasses import dataclass
ModuleNotFoundError: No module named 'dataclasses'

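Install the dataclasses dependency. The dataclasses module joined the standard library in Python 3.7, so on Python 3.6 it has to be installed separately:

pip install dataclasses

Initializing once more finally succeeds. I could have cried.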
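Since the fixes above come down to two version pins (markupsafe==2.0.1 and sqlalchemy==1.3.24), it can help to confirm what is actually installed before retrying. The sketch below parses text in the Name==version format that pip freeze prints and reports any pin that is not met. The function names and the sample text are illustrative assumptions, not part of Superset or pip.

```python
# Pins taken from the fixes described in this section
PINS = {"markupsafe": "2.0.1", "sqlalchemy": "1.3.24"}


def parse_freeze(text):
    """Parse 'Name==version' lines (pip freeze format) into {lowercased name: version}."""
    installed = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" in line and not line.startswith("#"):
            name, version = line.split("==", 1)
            installed[name.strip().lower()] = version.strip()
    return installed


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for every pin that is missing or different."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


# Made-up sample input: MarkupSafe is too new, SQLAlchemy matches the pin
sample = "MarkupSafe==2.1.1\nSQLAlchemy==1.3.24\n"
print(find_mismatches(PINS, parse_freeze(sample)))
# {'markupsafe': ('2.0.1', '2.1.1')}
```

In practice the sample string would be replaced by the real output of pip freeze run inside the superset environment.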

4) Create an admin user

export FLASK_APP=superset
superset fab create-admin

Note: Flask is a Python web framework, and Superset is built on it.

5) Initialize Superset

superset init
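Steps 3 to 5 always run in the same order with FLASK_APP set, so they can be wrapped in a small driver. This is only a sketch under that assumption: build_env, run_init, and the dry_run flag are hypothetical names, and with dry_run=True (the default) nothing is executed, the commands are just printed.

```python
import os
import subprocess

# The three initialization commands from this section, in order
INIT_COMMANDS = [
    ["superset", "db", "upgrade"],
    ["superset", "fab", "create-admin"],
    ["superset", "init"],
]


def build_env(base_env=None):
    """Copy the environment and set FLASK_APP=superset, as in step 4."""
    env = dict(os.environ if base_env is None else base_env)
    env["FLASK_APP"] = "superset"
    return env


def run_init(dry_run=True):
    """Print each command; run it only when dry_run is False."""
    env = build_env()
    for cmd in INIT_COMMANDS:
        print("+", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command
            subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    run_init(dry_run=True)
```

Because superset fab create-admin asks for the account details interactively, a real run (dry_run=False) should be started from a terminal inside the superset environment.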

2.2.3 Starting Superset

1) Install gunicorn

pip install gunicorn -i https://pypi.douban.com/simple/

Note: gunicorn is a Python web server, comparable to Tomcat in the Java world.

2) Start Superset

(1) Make sure the current conda environment is superset

(2) Start it

gunicorn --workers 5 --timeout 120 --bind h102:8787 "superset.app:create_app()" --daemon

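Notes:

--workers: number of worker processes
--timeout: worker timeout in seconds; a worker that stays silent longer than this is killed and restarted
--bind: address and port to bind on this machine, which is also the address used to access Superset
--daemon: run in the background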
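The same four options can also live in a gunicorn configuration file, which is itself a Python module. The sketch below mirrors the command line above; the file name gunicorn.conf.py and its location are assumptions, while bind, workers, timeout, and daemon are gunicorn's setting names for the corresponding flags.

```python
# gunicorn.conf.py (file name assumed; pass it to gunicorn with -c)

# Address and port to listen on; also the address used to reach Superset
bind = "h102:8787"

# Number of worker processes
workers = 5

# Seconds a worker may stay silent before it is killed and restarted
timeout = 120

# Detach from the terminal and run in the background
daemon = True
```

With this file in the current directory, the start command would become: gunicorn -c gunicorn.conf.py "superset.app:create_app()"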

(3) Log in to Superset

Visit http://h102:8787 and log in with the admin account created in step 4 of section 2.2.2.

3) Stop Superset

Kill the gunicorn processes

ps -ef | awk '/superset/ && !/awk/{print $2}' | xargs kill -9

Leave the superset environment

conda deactivate
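To make the stop pipeline above easier to follow: the awk filter keeps lines of ps -ef output that mention superset, drops the line for awk itself, and prints the second column (the PID). The following sketch reproduces that filtering in Python on a made-up ps listing; matching_pids is a hypothetical helper and nothing here kills any process.

```python
import re


def matching_pids(ps_output, pattern, exclude="awk"):
    """Return the PID column of lines that match pattern but not exclude."""
    pids = []
    for line in ps_output.splitlines():
        if re.search(pattern, line) and not re.search(exclude, line):
            fields = line.split()
            if len(fields) >= 2:
                pids.append(fields[1])  # second column of ps -ef is the PID
    return pids


# Made-up ps -ef output for illustration
sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
user      2301     1  0 10:00 ?        00:00:01 gunicorn --workers 5 superset.app:create_app()
user      2399  2200  0 10:05 pts/0    00:00:00 awk /superset/ && !/awk/{print $2}
root       900     1  0 09:00 ?        00:00:00 sshd
"""

print(matching_pids(sample, "superset"))
# ['2301']
```

Note that the pattern matches any command line containing superset, so an unrelated process such as an editor with superset.sh open would be selected as well.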

2.2.4 Superset Start/Stop Script

1) Create the superset.sh file

vim superset.sh

with the following content

#!/bin/bash

# Returns 0 when no gunicorn process is found (Superset not running), 1 otherwise
superset_status() {
    result=`ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | wc -l`
    if [[ $result -eq 0 ]]; then
        return 0
    else
        return 1
    fi
}

superset_start() {
    # Load the shell configuration so that "conda activate" is available in this script
    source ~/.bashrc
    superset_status >/dev/null 2>&1
    if [[ $? -eq 0 ]]; then
        conda activate superset ; gunicorn --workers 5 --timeout 120 --bind h102:8787 --daemon 'superset.app:create_app()'
    else
        echo "superset is running"
    fi
}

superset_stop() {
    superset_status >/dev/null 2>&1
    if [[ $? -eq 0 ]]; then
        echo "superset is not running"
    else
        # Force-kill every gunicorn process
        ps -ef | awk '/gunicorn/ && !/awk/{print $2}' | xargs kill -9
    fi
}

case $1 in
    start )
        echo "Starting Superset"
        superset_start
    ;;
    stop )
        echo "Stopping Superset"
        superset_stop
    ;;
    restart )
        echo "Restarting Superset"
        superset_stop
        superset_start
    ;;
    status )
        superset_status >/dev/null 2>&1
        if [[ $? -eq 0 ]]; then
            echo "superset is not running"
        else
            echo "superset is running"
        fi
    ;;
esac

2) Make the script executable

chmod +x superset.sh

3) Test

Start Superset

superset.sh start

Stop Superset

superset.sh stop

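3 Using Superset

3.1 Connecting a MySQL Data Source

3.1.1 Installing Dependencies

conda install mysqlclient

Note: each type of data source needs its own driver. The official documentation lists them:

https://superset.apache.org/docs/databases/installing-database-drivers

3.1.2 Restarting Superset

superset.sh restart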

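3.1.3 Configuring the Data Source

1) Database configuration

Step1: Click Data/Databases

Step2: Click +DATABASE

Step3: Fill in Database and SQL Alchemy URI

Note: SQL Alchemy URI format: mysql://username:password@hostname:port/database_name

Here we enter:

mysql://root:123456@h103:3310/gmall-report?charset=utf8

Step4: Click Test Connection; the message "Seems OK!" means the connection succeeded

Step5: Click SAVE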
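The URI format from Step3 is easy to get wrong when the password contains characters such as @ or /, which need to be URL-encoded. The sketch below assembles the URI from its parts; build_mysql_uri is a hypothetical helper written for this article, and the values are the ones used above.

```python
from urllib.parse import quote_plus


def build_mysql_uri(user, password, host, port, database, charset=None):
    """Assemble mysql://user:password@host:port/database[?charset=...]."""
    uri = "mysql://{}:{}@{}:{}/{}".format(
        quote_plus(user),      # escape special characters in the user name
        quote_plus(password),  # escape special characters in the password
        host,
        port,
        database,
    )
    if charset:
        uri += "?charset={}".format(charset)
    return uri


print(build_mysql_uri("root", "123456", "h103", 3310, "gmall-report", charset="utf8"))
# mysql://root:123456@h103:3310/gmall-report?charset=utf8
```

A password such as p@ss would be emitted as p%40ss, so the @ that separates the credentials from the host stays unambiguous.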

2) Table configuration

Step1: Click Data/Datasets

Step2: Click +Datasets

Step3: Configure the Table

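3.2 Building a Dashboard

3.2.1 Creating a Blank Dashboard

1) Click Dashboards/+DASHBOARDS

3.2.2 Creating a Chart

1) Click Charts/+CHART

2) Select an appropriate chart type

3) Create the chart

4) Configure the chart as instructed

5) Click "Run Query"

6) Save it to the dashboard

7) The chart can now be viewed and edited in the dashboard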

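4 Superset in Practice

4.1 Building a Sankey Diagram

4.1.1 Configuring the Table

4.1.2 Configuring the Chart

4.2 Building a Map

5 Superset Layout

First add Tabs, rows, and columns, then drag charts into the tabs. To resize a chart, hold the Ctrl key and drag.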
