数据分析师/体验课/市委书记养成记.ipynb (Data Analyst / Trial Course / The Making of a Municipal Party Secretary)


{
  "cells": [
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Import packages"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:43.784481Z",
          "start_time": "2018-12-22T08:28:43.112139Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "import pandas as pd  # data analysis toolkit\nimport numpy as np  # scientific computing toolkit\nimport matplotlib.pyplot as plt  # plotting toolkit\nimport seaborn as sns  # statistical visualization toolkit",
      "execution_count": 1,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Read the data"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:43.814002Z",
          "start_time": "2018-12-22T08:28:43.786085Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "df = pd.read_csv(\"地市级党委书记数据库(2000-10).csv\", encoding=\"gbk\")",
      "execution_count": 2,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Show the first 5 rows"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:43.848844Z",
          "start_time": "2018-12-22T08:28:43.815554Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.head()",
      "execution_count": 3,
      "outputs": [
        {
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>省级政区代码</th>\n      <th>省级政区名称</th>\n      <th>地市级政区代码</th>\n      <th>地市级政区名称</th>\n      <th>年份</th>\n      <th>党委书记姓名</th>\n      <th>出生年份</th>\n      <th>出生月份</th>\n      <th>籍贯省份代码</th>\n      <th>籍贯省份名称</th>\n      <th>...</th>\n      <th>民族</th>\n      <th>教育</th>\n      <th>是否是党校教育(是=1,否=0)</th>\n      <th>专业:人文</th>\n      <th>专业:社科</th>\n      <th>专业:理工</th>\n      <th>专业:农科</th>\n      <th>专业:医科</th>\n      <th>入党年份</th>\n      <th>工作年份</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130100</td>\n      <td>石家庄市</td>\n      <td>2000</td>\n      <td>陈来立</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>硕士</td>\n      <td>1.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130100</td>\n      <td>石家庄市</td>\n      <td>2001</td>\n      <td>吴振华</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130100</td>\n      <td>石家庄市</td>\n      <td>2002</td>\n      <td>吴振华</td>\n      <td>NaN</td>\n      
<td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130100</td>\n      <td>石家庄市</td>\n      <td>2003</td>\n      <td>吴振华</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130100</td>\n      <td>石家庄市</td>\n      <td>2004</td>\n      <td>吴振华</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n  </tbody>\n</table>\n<p>5 rows × 23 columns</p>\n</div>",
            "text/plain": "   省级政区代码 省级政区名称  地市级政区代码 地市级政区名称    年份 党委书记姓名  出生年份  出生月份  籍贯省份代码 籍贯省份名称  \\\n0  130000    河北省   130100    石家庄市  2000    陈来立   NaN   NaN     NaN    NaN   \n1  130000    河北省   130100    石家庄市  2001    吴振华   NaN   NaN     NaN    NaN   \n2  130000    河北省   130100    石家庄市  2002    吴振华   NaN   NaN     NaN    NaN   \n3  130000    河北省   130100    石家庄市  2003    吴振华   NaN   NaN     NaN    NaN   \n4  130000    河北省   130100    石家庄市  2004    吴振华   NaN   NaN     NaN    NaN   \n\n   ...    民族  教育 是否是党校教育(是=1,否=0) 专业:人文 专业:社科  专业:理工  专业:农科  专业:医科  入党年份  工作年份  \n0  ...   NaN  硕士              1.0   NaN   NaN    NaN    NaN    NaN   NaN   NaN  \n1  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   NaN  \n2  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   NaN  \n3  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   NaN  \n4  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   NaN  \n\n[5 rows x 23 columns]"
          },
          "execution_count": 3,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Select specific rows"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:43.880141Z",
          "start_time": "2018-12-22T08:28:43.850338Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "df[10:20]",
      "execution_count": 4,
      "outputs": [
        {
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>省级政区代码</th>\n      <th>省级政区名称</th>\n      <th>地市级政区代码</th>\n      <th>地市级政区名称</th>\n      <th>年份</th>\n      <th>党委书记姓名</th>\n      <th>出生年份</th>\n      <th>出生月份</th>\n      <th>籍贯省份代码</th>\n      <th>籍贯省份名称</th>\n      <th>...</th>\n      <th>民族</th>\n      <th>教育</th>\n      <th>是否是党校教育(是=1,否=0)</th>\n      <th>专业:人文</th>\n      <th>专业:社科</th>\n      <th>专业:理工</th>\n      <th>专业:农科</th>\n      <th>专业:医科</th>\n      <th>入党年份</th>\n      <th>工作年份</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>10</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130100</td>\n      <td>石家庄市</td>\n      <td>2010</td>\n      <td>孙瑞彬</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>硕士</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>11</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2000</td>\n      <td>白润璋</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>12</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2001</td>\n      <td>白润璋</td>\n      <td>NaN</td>\n      
<td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>13</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2002</td>\n      <td>白润璋</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>14</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2003</td>\n      <td>张和</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>15</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2004</td>\n      <td>张和</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>16</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2005</td>\n      <td>张和</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      
<td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>17</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2006</td>\n      <td>张和</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>本科</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>18</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2007</td>\n      <td>赵勇</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>博士</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n    <tr>\n      <th>19</th>\n      <td>130000</td>\n      <td>河北省</td>\n      <td>130200</td>\n      <td>唐山市</td>\n      <td>2008</td>\n      <td>赵勇</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>NaN</td>\n      <td>...</td>\n      <td>NaN</td>\n      <td>博士</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>1.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>0.0</td>\n      <td>NaN</td>\n      <td>NaN</td>\n    </tr>\n  </tbody>\n</table>\n<p>10 rows × 23 columns</p>\n</div>",
            "text/plain": "    省级政区代码 省级政区名称  地市级政区代码 地市级政区名称    年份 党委书记姓名  出生年份  出生月份  籍贯省份代码 籍贯省份名称  \\\n10  130000    河北省   130100    石家庄市  2010    孙瑞彬   NaN   NaN     NaN    NaN   \n11  130000    河北省   130200     唐山市  2000    白润璋   NaN   NaN     NaN    NaN   \n12  130000    河北省   130200     唐山市  2001    白润璋   NaN   NaN     NaN    NaN   \n13  130000    河北省   130200     唐山市  2002    白润璋   NaN   NaN     NaN    NaN   \n14  130000    河北省   130200     唐山市  2003     张和   NaN   NaN     NaN    NaN   \n15  130000    河北省   130200     唐山市  2004     张和   NaN   NaN     NaN    NaN   \n16  130000    河北省   130200     唐山市  2005     张和   NaN   NaN     NaN    NaN   \n17  130000    河北省   130200     唐山市  2006     张和   NaN   NaN     NaN    NaN   \n18  130000    河北省   130200     唐山市  2007     赵勇   NaN   NaN     NaN    NaN   \n19  130000    河北省   130200     唐山市  2008     赵勇   NaN   NaN     NaN    NaN   \n\n    ...    民族  教育 是否是党校教育(是=1,否=0) 专业:人文 专业:社科  专业:理工  专业:农科  专业:医科  入党年份  \\\n10  ...   NaN  硕士              1.0   0.0   1.0    0.0    0.0    0.0   NaN   \n11  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   \n12  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   \n13  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   \n14  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   \n15  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   \n16  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   \n17  ...   NaN  本科              0.0   0.0   0.0    1.0    0.0    0.0   NaN   \n18  ...   NaN  博士              0.0   0.0   1.0    0.0    0.0    0.0   NaN   \n19  ...   NaN  博士              0.0   0.0   1.0    0.0    0.0    0.0   NaN   \n\n    工作年份  \n10   NaN  \n11   NaN  \n12   NaN  \n13   NaN  \n14   NaN  \n15   NaN  \n16   NaN  \n17   NaN  \n18   NaN  \n19   NaN  \n\n[10 rows x 23 columns]"
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### List all column names"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:43.893834Z",
          "start_time": "2018-12-22T08:28:43.882316Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.columns.tolist()",
      "execution_count": 5,
      "outputs": [
        {
          "data": {
            "text/plain": "['省级政区代码',\n '省级政区名称',\n '地市级政区代码',\n '地市级政区名称',\n '年份',\n '党委书记姓名',\n '出生年份',\n '出生月份',\n '籍贯省份代码',\n '籍贯省份名称',\n '籍贯地市代码',\n '籍贯地市名称',\n '性别',\n '民族',\n '教育',\n '是否是党校教育(是=1,否=0)',\n '专业:人文',\n '专业:社科',\n '专业:理工',\n '专业:农科',\n '专业:医科',\n '入党年份',\n '工作年份']"
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Show the first 10 values of a given column"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:43.904829Z",
          "start_time": "2018-12-22T08:28:43.895282Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "df[\"党委书记姓名\"].head(10)",
      "execution_count": 6,
      "outputs": [
        {
          "data": {
            "text/plain": "0    陈来立\n1    吴振华\n2    吴振华\n3    吴振华\n4    吴振华\n5    吴振华\n6    吴振华\n7    吴显国\n8    吴显国\n9     车俊\nName: 党委书记姓名, dtype: object"
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
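      "source": "### Basic summary statistics\n- `include=[np.number]`: summarize numeric columns only: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum\n- `include=[object]`: summarize string (object) columns only: count, number of unique values, most frequent value, and that value's frequency. `np.object` was an alias for the built-in `object` and has been removed from recent NumPy releases."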
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:43.965838Z",
          "start_time": "2018-12-22T08:28:43.906196Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.describe(include=[np.number])",
      "execution_count": 7,
      "outputs": [
        {
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>省级政区代码</th>\n      <th>地市级政区代码</th>\n      <th>年份</th>\n      <th>出生年份</th>\n      <th>出生月份</th>\n      <th>籍贯省份代码</th>\n      <th>籍贯地市代码</th>\n      <th>是否是党校教育(是=1,否=0)</th>\n      <th>专业:人文</th>\n      <th>专业:社科</th>\n      <th>专业:理工</th>\n      <th>专业:农科</th>\n      <th>专业:医科</th>\n      <th>入党年份</th>\n      <th>工作年份</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>count</th>\n      <td>3663.000000</td>\n      <td>3663.000000</td>\n      <td>3663.000000</td>\n      <td>2676.000000</td>\n      <td>2645.000000</td>\n      <td>2624.000000</td>\n      <td>2615.000000</td>\n      <td>2493.000000</td>\n      <td>2370.000000</td>\n      <td>2376.000000</td>\n      <td>2371.000000</td>\n      <td>2369.000000</td>\n      <td>2370.000000</td>\n      <td>2384.000000</td>\n      <td>2568.000000</td>\n    </tr>\n    <tr>\n      <th>mean</th>\n      <td>403393.393393</td>\n      <td>404456.756757</td>\n      <td>2005.000000</td>\n      <td>1953.622571</td>\n      <td>6.790548</td>\n      <td>364428.353659</td>\n      <td>365742.332696</td>\n      <td>0.430405</td>\n      <td>0.275527</td>\n      <td>0.627525</td>\n      <td>0.256854</td>\n      <td>0.067539</td>\n      <td>0.009705</td>\n      <td>1976.906879</td>\n      <td>1973.129673</td>\n    </tr>\n    <tr>\n      <th>std</th>\n      <td>148176.721620</td>\n      <td>148485.810327</td>\n      <td>3.162709</td>\n      <td>4.416316</td>\n      <td>3.614664</td>\n      <td>126267.485520</td>\n      <td>125961.993399</td>\n      <td>0.576136</td>\n      <td>0.446874</td>\n      <td>0.483566</td>\n      <td>0.436990</td>\n      
<td>0.251006</td>\n      <td>0.098054</td>\n      <td>5.310080</td>\n      <td>4.856564</td>\n    </tr>\n    <tr>\n      <th>min</th>\n      <td>130000.000000</td>\n      <td>130100.000000</td>\n      <td>2000.000000</td>\n      <td>1941.000000</td>\n      <td>1.000000</td>\n      <td>110000.000000</td>\n      <td>120000.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>1961.000000</td>\n      <td>1958.000000</td>\n    </tr>\n    <tr>\n      <th>25%</th>\n      <td>330000.000000</td>\n      <td>330100.000000</td>\n      <td>2002.000000</td>\n      <td>1951.000000</td>\n      <td>3.000000</td>\n      <td>320000.000000</td>\n      <td>320700.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>1973.000000</td>\n      <td>1970.000000</td>\n    </tr>\n    <tr>\n      <th>50%</th>\n      <td>420000.000000</td>\n      <td>420200.000000</td>\n      <td>2005.000000</td>\n      <td>1954.000000</td>\n      <td>7.000000</td>\n      <td>370000.000000</td>\n      <td>370700.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>1.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>1976.000000</td>\n      <td>1972.500000</td>\n    </tr>\n    <tr>\n      <th>75%</th>\n      <td>510000.000000</td>\n      <td>513400.000000</td>\n      <td>2008.000000</td>\n      <td>1956.000000</td>\n      <td>10.000000</td>\n      <td>430000.000000</td>\n      <td>431300.000000</td>\n      <td>1.000000</td>\n      <td>1.000000</td>\n      <td>1.000000</td>\n      <td>1.000000</td>\n      <td>0.000000</td>\n      <td>0.000000</td>\n      <td>1981.000000</td>\n      <td>1976.000000</td>\n    </tr>\n    <tr>\n      <th>max</th>\n      <td>650000.000000</td>\n      <td>654300.000000</td>\n      
<td>2010.000000</td>\n      <td>1966.000000</td>\n      <td>14.000000</td>\n      <td>640000.000000</td>\n      <td>640500.000000</td>\n      <td>9.000000</td>\n      <td>1.000000</td>\n      <td>1.000000</td>\n      <td>1.000000</td>\n      <td>1.000000</td>\n      <td>1.000000</td>\n      <td>1994.000000</td>\n      <td>1990.000000</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "              省级政区代码        地市级政区代码           年份         出生年份         出生月份  \\\ncount    3663.000000    3663.000000  3663.000000  2676.000000  2645.000000   \nmean   403393.393393  404456.756757  2005.000000  1953.622571     6.790548   \nstd    148176.721620  148485.810327     3.162709     4.416316     3.614664   \nmin    130000.000000  130100.000000  2000.000000  1941.000000     1.000000   \n25%    330000.000000  330100.000000  2002.000000  1951.000000     3.000000   \n50%    420000.000000  420200.000000  2005.000000  1954.000000     7.000000   \n75%    510000.000000  513400.000000  2008.000000  1956.000000    10.000000   \nmax    650000.000000  654300.000000  2010.000000  1966.000000    14.000000   \n\n              籍贯省份代码         籍贯地市代码  是否是党校教育(是=1,否=0)        专业:人文  \\\ncount    2624.000000    2615.000000       2493.000000  2370.000000   \nmean   364428.353659  365742.332696          0.430405     0.275527   \nstd    126267.485520  125961.993399          0.576136     0.446874   \nmin    110000.000000  120000.000000          0.000000     0.000000   \n25%    320000.000000  320700.000000          0.000000     0.000000   \n50%    370000.000000  370700.000000          0.000000     0.000000   \n75%    430000.000000  431300.000000          1.000000     1.000000   \nmax    640000.000000  640500.000000          9.000000     1.000000   \n\n             专业:社科        专业:理工        专业:农科        专业:医科         入党年份  \\\ncount  2376.000000  2371.000000  2369.000000  2370.000000  2384.000000   \nmean      0.627525     0.256854     0.067539     0.009705  1976.906879   \nstd       0.483566     0.436990     0.251006     0.098054     5.310080   \nmin       0.000000     0.000000     0.000000     0.000000  1961.000000   \n25%       0.000000     0.000000     0.000000     0.000000  1973.000000   \n50%       1.000000     0.000000     0.000000     0.000000  1976.000000   \n75%       1.000000     1.000000     0.000000     0.000000  1981.000000   \nmax       
1.000000     1.000000     1.000000     1.000000  1994.000000   \n\n              工作年份  \ncount  2568.000000  \nmean   1973.129673  \nstd       4.856564  \nmin    1958.000000  \n25%    1970.000000  \n50%    1972.500000  \n75%    1976.000000  \nmax    1990.000000  "
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:44.011786Z",
          "start_time": "2018-12-22T08:28:43.967523Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "df.describe(include=[object])",
      "execution_count": 8,
      "outputs": [
        {
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>省级政区名称</th>\n      <th>地市级政区名称</th>\n      <th>党委书记姓名</th>\n      <th>籍贯省份名称</th>\n      <th>籍贯地市名称</th>\n      <th>性别</th>\n      <th>民族</th>\n      <th>教育</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>count</th>\n      <td>3663</td>\n      <td>3663</td>\n      <td>3021</td>\n      <td>2624</td>\n      <td>2615</td>\n      <td>2708</td>\n      <td>2517</td>\n      <td>2550</td>\n    </tr>\n    <tr>\n      <th>unique</th>\n      <td>27</td>\n      <td>333</td>\n      <td>901</td>\n      <td>29</td>\n      <td>240</td>\n      <td>2</td>\n      <td>2</td>\n      <td>7</td>\n    </tr>\n    <tr>\n      <th>top</th>\n      <td>四川省</td>\n      <td>张掖市</td>\n      <td>焉荣竹</td>\n      <td>山东省</td>\n      <td>威海市</td>\n      <td>男</td>\n      <td>汉族</td>\n      <td>硕士</td>\n    </tr>\n    <tr>\n      <th>freq</th>\n      <td>231</td>\n      <td>11</td>\n      <td>11</td>\n      <td>313</td>\n      <td>58</td>\n      <td>2633</td>\n      <td>2351</td>\n      <td>1381</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "       省级政区名称 地市级政区名称 党委书记姓名 籍贯省份名称 籍贯地市名称    性别    民族    教育\ncount    3663    3663   3021   2624   2615  2708  2517  2550\nunique     27     333    901     29    240     2     2     7\ntop       四川省     张掖市    焉荣竹    山东省    威海市     男    汉族    硕士\nfreq      231      11     11    313     58  2633  2351  1381"
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Share by gender"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:44.028176Z",
          "start_time": "2018-12-22T08:28:44.013857Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "gender = df['性别']  # 性别 = gender column (男 = male, 女 = female)\ngd = gender[gender.notnull()]  # .notnull() keeps only the non-null entries\ncount = gd.count()\nrate_m = \"{:.2f}\".format(gd[gd == \"男\"].count() * 100 / count)\nrate_w = \"{:.2f}\".format(gd[gd == \"女\"].count() * 100 / count)\nprint(gd.head(), \"\\n\\n\", gd.unique(), \"\\n\\n\",\n      rate_m, rate_w)  # .unique() returns the distinct values",
      "execution_count": 9,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "121    男\n122    男\n123    男\n124    男\n125    男\nName: 性别, dtype: object \n\n ['男' '女'] \n\n 97.23 2.77\n"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Share by province and gender"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:44.055894Z",
          "start_time": "2018-12-22T08:28:44.029829Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "prov_gender = df[[\"省级政区名称\", \"性别\"]].dropna()\n# pd.crosstab(index, columns) cross-tabulates categorical data (similar to an Excel pivot table)\npg = pd.crosstab(prov_gender[\"省级政区名称\"], prov_gender[\"性别\"])\npg.head()",
      "execution_count": 10,
      "outputs": [
        {
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th>性别</th>\n      <th>女</th>\n      <th>男</th>\n    </tr>\n    <tr>\n      <th>省级政区名称</th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>云南省</th>\n      <td>2</td>\n      <td>73</td>\n    </tr>\n    <tr>\n      <th>内蒙古自治区</th>\n      <td>0</td>\n      <td>86</td>\n    </tr>\n    <tr>\n      <th>吉林省</th>\n      <td>4</td>\n      <td>72</td>\n    </tr>\n    <tr>\n      <th>四川省</th>\n      <td>8</td>\n      <td>155</td>\n    </tr>\n    <tr>\n      <th>宁夏回族自治区</th>\n      <td>0</td>\n      <td>49</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "性别       女    男\n省级政区名称         \n云南省      2   73\n内蒙古自治区   0   86\n吉林省      4   72\n四川省      8  155\n宁夏回族自治区  0   49"
          },
          "execution_count": 10,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:44.076762Z",
          "start_time": "2018-12-22T08:28:44.057435Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "pg[\"女性占比\"] = pg[\"女\"] / (pg[\"男\"] + pg[\"女\"])  # 女性占比 = share of women\n# .sort_values() sorts the rows; ascending=False means descending order\npg = pg.sort_values(by=[\"女性占比\"], ascending=False)\npg.head()",
      "execution_count": 11,
      "outputs": [
        {
          "data": {
            "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th>性别</th>\n      <th>女</th>\n      <th>男</th>\n      <th>女性占比</th>\n    </tr>\n    <tr>\n      <th>省级政区名称</th>\n      <th></th>\n      <th></th>\n      <th></th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>辽宁省</th>\n      <td>13</td>\n      <td>121</td>\n      <td>0.097015</td>\n    </tr>\n    <tr>\n      <th>陕西省</th>\n      <td>9</td>\n      <td>93</td>\n      <td>0.088235</td>\n    </tr>\n    <tr>\n      <th>吉林省</th>\n      <td>4</td>\n      <td>72</td>\n      <td>0.052632</td>\n    </tr>\n    <tr>\n      <th>山西省</th>\n      <td>6</td>\n      <td>112</td>\n      <td>0.050847</td>\n    </tr>\n    <tr>\n      <th>四川省</th>\n      <td>8</td>\n      <td>155</td>\n      <td>0.049080</td>\n    </tr>\n  </tbody>\n</table>\n</div>",
            "text/plain": "性别       女    男      女性占比\n省级政区名称                   \n辽宁省     13  121  0.097015\n陕西省      9   93  0.088235\n吉林省      4   72  0.052632\n山西省      6  112  0.050847\n四川省      8  155  0.049080"
          },
          "execution_count": 11,
          "metadata": {},
          "output_type": "execute_result"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## Charts"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Share of female municipal Party secretaries by province"
    },
    {
      "metadata": {
        "ExecuteTime": {
          "end_time": "2018-12-22T08:28:44.094962Z",
          "start_time": "2018-12-22T08:28:44.078329Z"
        },
        "trusted": true
      },
      "cell_type": "code",
      "source": "fig_1 = plt.figure(figsize=(8, 4))\nfig_1.show()",
      "execution_count": 12,
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": "/root/anaconda3/lib/python3.7/site-packages/matplotlib/figure.py:457: UserWarning: matplotlib is currently using a non-GUI backend, so cannot show the figure\n  \"matplotlib is currently using a non-GUI backend, \"\n"
        },
        {
          "data": {
            "text/plain": "<Figure size 576x288 with 0 Axes>"
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ]
    },
    {
      "metadata": {
        "trusted": true
      },
      "cell_type": "code",
      "source": "",
      "execution_count": null,
      "outputs": []
    }
  ],
  "metadata": {
    "gist": {
      "id": "773548b4bc39feb1d8100edf8e85e366",
      "data": {
        "description": "数据分析师/体验课/市委书记养成记.ipynb",
        "public": true
      }
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    },
    "language_info": {
      "name": "python",
      "version": "3.7.0",
      "mimetype": "text/x-python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "pygments_lexer": "ipython3",
      "nbconvert_exporter": "python",
      "file_extension": ".py"
    },
    "toc": {
      "nav_menu": {},
      "number_sections": true,
      "sideBar": true,
      "skip_h1_title": false,
      "base_numbering": 1,
      "title_cell": "Table of Contents",
      "title_sidebar": "Contents",
      "toc_cell": false,
      "toc_position": {},
      "toc_section_display": true,
      "toc_window_display": true
    },
    "varInspector": {
      "window_display": true,
      "cols": {
        "lenName": 16,
        "lenType": 16,
        "lenVar": 40
      },
      "kernels_config": {
        "python": {
          "library": "var_list.py",
          "delete_cmd_prefix": "del ",
          "delete_cmd_postfix": "",
          "varRefreshCmd": "print(var_dic_list())"
        },
        "r": {
          "library": "var_list.r",
          "delete_cmd_prefix": "rm(",
          "delete_cmd_postfix": ") ",
          "varRefreshCmd": "cat(var_dic_list()) "
        }
      },
      "types_to_exclude": [
        "module",
        "function",
        "builtin_function_or_method",
        "instance",
        "_Feature"
      ],
      "position": {
        "height": "836px",
        "left": "1565px",
        "right": "20px",
        "top": "144px",
        "width": "537px"
      }
    },
    "_draft": {
      "nbviewer_url": "https://gist.github.com/773548b4bc39feb1d8100edf8e85e366"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
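
The province-by-gender step in the notebook (cross-tabulate, compute the share of women, sort in descending order) can be sketched in plain Python to show what `pd.crosstab` and `sort_values` are doing underneath. This is only an illustration under stated assumptions: the `records` list is a tiny made-up sample, not rows from the real CSV, and the helper name `female_share_by_province` is hypothetical.

```python
from collections import Counter

# Hypothetical (province, gender) records; NOT taken from the real dataset.
# 男 = male, 女 = female, None = missing value.
records = [
    ("辽宁省", "女"), ("辽宁省", "男"), ("辽宁省", "男"),
    ("吉林省", "男"), ("吉林省", "男"),
    ("四川省", "女"), ("四川省", "男"), ("四川省", "男"), ("四川省", "男"),
    ("吉林省", None),  # missing gender, skipped like .dropna() would do
]


def female_share_by_province(rows):
    """Return (province, share of women) pairs sorted by share, descending."""
    # Cross-tabulation: count each (province, gender) pair, ignoring missing values.
    counts = Counter(
        (prov, g) for prov, g in rows if prov is not None and g is not None
    )
    provinces = {prov for prov, _ in counts}

    shares = {}
    for prov in provinces:
        women = counts[(prov, "女")]  # Counter returns 0 for absent keys
        men = counts[(prov, "男")]
        shares[prov] = women / (women + men)

    # Same ordering as sort_values(by=..., ascending=False).
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


result = female_share_by_province(records)
for prov, share in result:
    print(prov, f"{share:.4f}")
```

The pandas version in the notebook is shorter and handles many categories at once; the sketch only makes the counting and the division explicit. A province whose records contain neither 男 nor 女 would cause a division by zero here, which the sample avoids.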
