每日分享!一个不错的数据分析实战案例数据分析可视化MySQL+Python

Posted 报告,今天也有好好学习

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了每日分享!一个不错的数据分析实战案例数据分析可视化MySQL+Python相关的知识,希望对你有一定的参考价值。


今日份分享!

项目来源

https://www.kaggle.com/hugomathien/soccer

项目简介

利用SQL和pandas对11支球队在7个赛季中的25979场比赛数据,分析各球队在每个赛季的主客场得分情况以及联赛积分情况。

PS: 项目本身提供的是sqlite文件,为了更贴近现实中的工作场景,我将其中的表都输出成csv文件,再利用Navicat for mysql 导入到MySQL当中。(文末附资源链接)
本次项目同样在jupyter上运行。

导入模块

import pymysql
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei'] # 设置汉字字体,优先使用黑体
plt.rcParams['font.size'] = 12  # 设置字体大小
plt.rcParams['axes.unicode_minus'] = False   # 设置正常显示负号

数据库中的表格:

conn = pymysql.connect(
    host = 'localhost',
    user = 'root',
    password = '',
    db = 'data',
    port = 3306
)
df = pd.read_sql("show tables",conn)
df


国家名单:

countries = pd.read_sql("""SELECT *
                        FROM Country;""", conn)
countries

将country表和league表连接起来

leagues = pd.read_sql("""SELECT *
                        FROM League
                        JOIN Country ON Country.id = League.country_id;""", conn)
leagues

按队名排序中的前十名

teams = pd.read_sql("""SELECT *
                        FROM Team
                        ORDER BY team_long_name
                        LIMIT 10;""", conn)
teams

输出spain主客队比赛的信息

detailed_matches = pd.read_sql("""SELECT Match.id, 
                                        Country.name AS country_name, 
                                        League.name AS league_name, 
                                        season, 
                                        stage, 
                                        date,
                                        HT.team_long_name AS  home_team,
                                        AT.team_long_name AS away_team,
                                        home_team_goal, 
                                        away_team_goal                                        
                                FROM `match`
                                JOIN Country on Country.id = Match.country_id
                                JOIN League on League.id = Match.league_id
                                LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id
                                LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id
                                WHERE country.name = 'Spain'
                                ORDER by date
                                LIMIT 10;""", conn)
detailed_matches

统计各个国家的各个联赛的各个赛季中stage大于10的球队主客队平均得分,主客队平均分之和与差,以及总和

leages_by_season = pd.read_sql("""SELECT Country.name AS country_name, 
                                        League.name AS league_name, 
                                        season,
                                        count(distinct stage) AS number_of_stages,
                                        count(distinct HT.team_long_name) AS number_of_teams,
                                        avg(home_team_goal) AS avg_home_team_scors, 
                                        avg(away_team_goal) AS avg_away_team_goals, 
                                        avg(home_team_goal-away_team_goal) AS avg_goal_dif, 
                                        avg(home_team_goal+away_team_goal) AS avg_goals, 
                                        sum(home_team_goal+away_team_goal) AS total_goals                                       
                                FROM `match`
                                JOIN Country on Country.id = Match.country_id
                                JOIN League on League.id = Match.league_id
                                LEFT JOIN Team AS HT on HT.team_api_id = Match.home_team_api_id
                                LEFT JOIN Team AS AT on AT.team_api_id = Match.away_team_api_id
                                WHERE country.name in ('Spain', 'Germany', 'France', 'Italy', 'England')
                                GROUP BY Country.name, League.name, season
                                HAVING count(distinct stage) > 10
                                ORDER BY Country.name, League.name, season DESC
                                ;""", conn)
leages_by_season.head(10)

列出几支球队的各赛季平均得分趋势图

df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()), columns=leages_by_season['country_name'].unique())

df.loc[:,'Germany'] = list(leages_by_season.loc[leages_by_season['country_name']=='Germany','avg_goals'])
df.loc[:,'Spain']   = list(leages_by_season.loc[leages_by_season['country_name']=='Spain','avg_goals'])
df.loc[:,'France']   = list(leages_by_season.loc[leages_by_season['country_name']=='France','avg_goals'])
df.loc[:,'Italy']   = list(leages_by_season.loc[leages_by_season['country_name']=='Italy','avg_goals'])
df.loc[:,'England']   = list(leages_by_season.loc[leages_by_season['country_name']=='England','avg_goals'])

df.plot(figsize=(12,5),title='各赛季平均得分趋势图')

列出几支球队的各赛季平均主客队分差趋势图

df = pd.DataFrame(index=np.sort(leages_by_season['season'].unique()), columns=leages_by_season['country_name'].unique())

df.loc[:,'Germany'] = list(leages_by_season.loc[leages_by_season['country_name']=='Germany','avg_goal_dif'])
df.loc[:,'Spain']   = list(leages_by_season.loc[leages_by_season['country_name']=='Spain','avg_goal_dif'])
df.loc[:,'France']   = list(leages_by_season.loc[leages_by_season['country_name']=='France','avg_goal_dif'])
df.loc[:,'Italy']   = list(leages_by_season.loc[leages_by_season['country_name']=='Italy','avg_goal_dif'])
df.loc[:,'England']   = list(leages_by_season.loc[leages_by_season['country_name']=='England','avg_goal_dif'])

df.plot(figsize=(12,5),title='各赛季平均主客队分差趋势图')

求各赛季各球队的积分(赢球得3分,平局得1分,输球不得分)

team_season_score = pd.read_sql("""SELECT 
		m.season,
    t.team_long_name,
		t.team_short_name,	
        t.team_api_id,
    COUNT(*) AS 'matches_played', 
    SUM(
        CASE 
            WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal>m.away_team_goal) OR
                (t.team_api_id=m.away_team_api_id AND m.home_team_goal<m.away_team_goal) THEN 3
            WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal=m.away_team_goal) OR
                (t.team_api_id=m.away_team_api_id AND m.home_team_goal=m.away_team_goal) THEN 1
            WHEN (t.team_api_id=m.home_team_api_id AND m.home_team_goal<m.away_team_goal) OR
                (t.team_api_id=m.away_team_api_id AND m.home_team_goal>m.away_team_goal) THEN 0
        END
    ) AS 'points', 
    SUM(IF(t.team_api_id=m.home_team_api_id, m.home_team_goal, m.away_team_goal)) AS 'goal_for', 
    SUM(IF(t.team_api_id=m.home_team_api_id, m.away_team_goal, m.home_team_goal)) AS 'goal_against', 
    SUM(IF(t.team_api_id=m.home_team_api_id, m.home_team_goal, m.away_team_goal))-SUM(IF(t.team_api_id=m.home_team_api_id, m.away_team_goal, m.home_team_goal)) AS 'goal_diff'
FROM Team t JOIN `match` m ON t.team_api_id=m.home_team_api_id OR t.team_api_id=m.away_team_api_id
WHERE t.team_api_id in (10260, 8634, 8650, 9904, 8636)
GROUP BY team_api_id, season
ORDER BY season, points DESC, goal_diff DESC, team_long_name;""", conn)
team_season_score.head(10)

列出几支球队的各赛季积分趋势图

df = pd.DataFrame(index=np.sort(team_season_score['season'].unique()), columns=team_season_score['team_short_name'].unique())

df.loc[:,'MUN'] = list(team_season_score.loc[team_season_score['team_short_name']=='MUN','points'])
df.loc[:,'BAR'] = list(team_season_score.loc[team_season_score['team_short_name']=='BAR','points'])
df.loc[:,'LIV'] = list(team_season_score.loc[team_season_score['team_short_name']=='LIV','points'])
df.loc[:,'HAN'] = list(team_season_score.loc[team_season_score['team_short_name']=='HAN','points'])
df.loc[:,'INT'] = list(team_season_score.loc[team_season_score['team_short_name']=='INT','points'])

df.plot(figsize=(12,5),title='各赛季积分趋势图')

结束语

此次项目的数据提供了不少数据,就连球员的EA SPORT游戏中的数据都有,不过这个项目我主要也是拿来练手,并没有刻意去将全部的数据都拿来分析,大概感兴趣的话也可以自己尝试一下。
我把最初kaggle提供的sqlite文件,后面我自己输出的csv文件和sql文件以及本次完整代码都放到了网盘上,需要的朋友自行下载:
链接:https://pan.baidu.com/s/1CbrdX0q18W9v-PgNyZfRPg
提取码:1024


推荐关注的专栏

👨‍👩‍👦‍👦 机器学习:分享机器学习实战项目和常用模型讲解
👨‍👩‍👦‍👦 数据分析:分享数据分析实战项目和常用技能整理


CSDN@报告,今天也有好好学习

以上是关于每日分享!一个不错的数据分析实战案例数据分析可视化MySQL+Python的主要内容,如果未能解决你的问题,请参考以下文章

真香啊,5个小而美的 Pandas 实战案例

案例分享——健康管理APP(全套解决方案:蓝牙数采SQLite数据存储数据分析与服务器HTTP通信等可视化图形界面数据展示)

案例分享——健康管理APP(全套解决方案:蓝牙数采SQLite数据存储数据分析与服务器HTTP通信等可视化图形界面数据展示)

数据可视化之powerBI技巧基于SQL思维的PowerBI DAX实战

如何创造出优秀的数据可视化作品?这些“实战经验”请收好

Python库Pandas数据可视化实战案例