Python Applications: Data Visualization (Summary)
Posted by 一计之长
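Preface
Over seven articles, we walked through how to build a data visualization project. The project comes from Python Crash Course (《Python编程从入门到实践》) by Eric Matthes. I presented it in my own way, following the order in which I studied it, from setting up the environment through each step of the implementation. As the project progressed the amount of code kept growing, so to keep the articles readable, each one showed only the code for the feature it covered. This article gives the complete code for the project so that you can see the whole thing in one place. I still strongly recommend reading the series from the beginning, which should be far more rewarding. Simply pasting in all the code from this article will not help, and you may not even understand how the project is organized.
One point to note: because this project is about data visualization, it differs from the previous project, Alien Invasion. There, each module implemented one small part of the project and depended on the rest of the code in order to run; for example, the ship code introduced earlier cannot run on its own. The data visualization project is different: its modules are largely independent of each other and only loosely coupled. The focus is on obtaining data, using APIs, analyzing the data we already have, and producing good-looking charts.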
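Project Overview
Data visualization means exploring data through visual representations. It is closely related to data mining; more precisely, it is one step within data mining and artificial intelligence work, while data mining means using code to explore the patterns and relationships in a data set. A data set can be a small list of numbers that fits in one line of code, or something more directly visual, such as an image.
Presenting data well is not just about attractive pictures. Presenting data in a concise, eye-catching way lets viewers clearly and intuitively grasp the meaning behind the data and so better understand the patterns in it. The project starts by solving the data problem, since visualization presupposes that we have data. Three articles cover generating data, that is, producing data to analyze when none is available; two articles cover downloading data; and the final two articles cover using and analyzing APIs.
This series only introduces a few basic techniques, such as how to draw bar charts and line charts. Readers who want to learn more will find plenty of tutorials online; the material is fairly simple, widely applicable, and well worth the time. Below is all the code from the project, so that you can refer to it as a whole.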
Complete Code
1. Implementation of dice_visual.py
import pygal
from die import Die
# Create two D6 dice.
die_1 = Die()
die_2 = Die()
# Make some rolls, and store results in a list.
results = []
for roll_num in range(1000):
    result = die_1.roll() + die_2.roll()
    results.append(result)
# Analyze the results.
frequencies = []
max_result = die_1.num_sides + die_2.num_sides
for value in range(2, max_result+1):
    frequency = results.count(value)
    frequencies.append(frequency)
# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'
hist.title = "Results of rolling two D6 dice 1000 times."
hist.x_labels = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
hist.x_title = "Result"
hist.y_title = "Frequency of Result"
hist.add('D6 + D6', frequencies)
hist.render_to_file('dice_visual.svg')
2. Implementation of die.py
from random import randint
class Die():
    """A class representing a single die."""
    def __init__(self, num_sides=6):
        """Assume a six-sided die."""
        self.num_sides = num_sides
    def roll(self):
        """Return a random value between 1 and number of sides."""
        return randint(1, self.num_sides)
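The frequency loops in the dice scripts call results.count(value) once per possible value, which rescans the whole results list each time. As a minimal sketch of an alternative, collections.Counter can tally all results in a single pass. The Die class is repeated here only so the block runs on its own; the helper name roll_frequencies and the seed value are illustrative and not part of the original project.

```python
from collections import Counter
from random import randint, seed


class Die:
    """A class representing a single die (same behavior as die.py)."""

    def __init__(self, num_sides=6):
        self.num_sides = num_sides

    def roll(self):
        """Return a random value between 1 and the number of sides."""
        return randint(1, self.num_sides)


def roll_frequencies(dice, num_rolls):
    """Hypothetical helper: roll all dice num_rolls times and tally the sums.

    Returns a list of counts, one per possible sum, ordered from the
    smallest possible sum (one pip per die) to the largest.
    """
    counts = Counter(sum(die.roll() for die in dice) for _ in range(num_rolls))
    min_result = len(dice)
    max_result = sum(die.num_sides for die in dice)
    # Counter returns 0 for sums that never occurred.
    return [counts[value] for value in range(min_result, max_result + 1)]


seed(42)  # fixed seed only so repeated runs give the same tallies
frequencies = roll_frequencies([Die(), Die()], 1000)
print(frequencies)
```

The resulting list has the same shape as the frequencies list built in dice_visual.py, so it could be passed to hist.add in the same way.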
3. Implementation of die_visual.py
import pygal
from die import Die
# Create a D6.
die = Die()
# Make some rolls, and store results in a list.
results = []
for roll_num in range(1000):
    result = die.roll()
    results.append(result)
# Analyze the results.
frequencies = []
for value in range(1, die.num_sides+1):
    frequency = results.count(value)
    frequencies.append(frequency)
# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'
hist.title = "Results of rolling one D6 1000 times."
hist.x_labels = [1, 2, 3, 4, 5, 6]
hist.x_title = "Result"
hist.y_title = "Frequency of Result"
hist.add('D6', frequencies)
hist.render_to_file('die_visual.svg')
4. Implementation of different_dice.py
from die import Die
import pygal
# Create a D6 and a D10.
die_1 = Die()
die_2 = Die(10)
# Make some rolls, and store results in a list.
results = []
for roll_num in range(50000):
    result = die_1.roll() + die_2.roll()
    results.append(result)
# Analyze the results.
frequencies = []
max_result = die_1.num_sides + die_2.num_sides
for value in range(2, max_result+1):
    frequency = results.count(value)
    frequencies.append(frequency)
# Visualize the results.
hist = pygal.Bar()
hist.force_uri_protocol = 'http'
hist.title = "Results of rolling a D6 and a D10 50,000 times."
hist.x_labels = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
    13, 14, 15, 16]
hist.x_title = "Result"
hist.y_title = "Frequency of Result"
# The dice are a D6 and a D10, so the series is labeled accordingly.
hist.add('D6 + D10', frequencies)
hist.render_to_file('dice_visual.svg')
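The x_labels lists in the dice scripts are typed out by hand, which makes them easy to get out of step with the dice actually in use. As a sketch (the function names sum_labels and ways_per_sum are assumptions made for this example), the labels can be derived from the numbers of sides, and the exact number of face combinations behind each sum can be computed for comparison with the simulated bars.

```python
from itertools import product


def sum_labels(*sides):
    """Return every possible sum for dice with the given numbers of sides."""
    return list(range(len(sides), sum(sides) + 1))


def ways_per_sum(*sides):
    """Count how many face combinations produce each possible sum."""
    ways = dict.fromkeys(sum_labels(*sides), 0)
    # Enumerate every combination of faces, one face per die.
    for faces in product(*(range(1, n + 1) for n in sides)):
        ways[sum(faces)] += 1
    return ways


labels = sum_labels(6, 10)
ways = ways_per_sum(6, 10)
print(labels)
print(ways)
```

With a helper like this, hist.x_labels could be set to sum_labels(6, 10) instead of a hand-written list, and the shape of the simulated histogram can be checked against the exact counts.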
5. Implementation of mpl_squares.py
import matplotlib.pyplot as plt
input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
plt.plot(input_values, squares, linewidth=5)
# Set chart title and label axes.
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
plt.tick_params(axis='both', labelsize=14)
plt.show()
6. Implementation of random_walk.py
from random import choice
class RandomWalk():
    """A class to generate random walks."""
    def __init__(self, num_points=5000):
        """Initialize attributes of a walk."""
        self.num_points = num_points
        # All walks start at (0, 0).
        self.x_values = [0]
        self.y_values = [0]
    def fill_walk(self):
        """Calculate all the points in the walk."""
        # Keep taking steps until the walk reaches the desired length.
        while len(self.x_values) < self.num_points:
            # Decide which direction to go, and how far to go in that direction.
            x_direction = choice([1, -1])
            x_distance = choice([0, 1, 2, 3, 4])
            x_step = x_direction * x_distance
            y_direction = choice([1, -1])
            y_distance = choice([0, 1, 2, 3, 4])
            y_step = y_direction * y_distance
            # Reject moves that go nowhere.
            if x_step == 0 and y_step == 0:
                continue
            # Calculate the next x and y values.
            next_x = self.x_values[-1] + x_step
            next_y = self.y_values[-1] + y_step
            self.x_values.append(next_x)
            self.y_values.append(next_y)
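fill_walk computes the x step and the y step with two identical blocks of code. One possible refactoring, sketched here on the assumption that the behavior should stay the same, moves the shared logic into a helper method. The method name get_step and the fixed seed are choices made for this example only.

```python
from random import choice, seed


class RandomWalk:
    """A class to generate random walks, with the step logic factored out."""

    def __init__(self, num_points=5000):
        self.num_points = num_points
        # All walks start at (0, 0).
        self.x_values = [0]
        self.y_values = [0]

    def get_step(self):
        """Return a signed step: direction (+1 or -1) times distance (0 to 4)."""
        direction = choice([1, -1])
        distance = choice([0, 1, 2, 3, 4])
        return direction * distance

    def fill_walk(self):
        """Calculate all the points in the walk."""
        while len(self.x_values) < self.num_points:
            x_step = self.get_step()
            y_step = self.get_step()
            # Reject moves that go nowhere.
            if x_step == 0 and y_step == 0:
                continue
            self.x_values.append(self.x_values[-1] + x_step)
            self.y_values.append(self.y_values[-1] + y_step)


seed(1)  # fixed seed only to make the example repeatable
rw = RandomWalk(1000)
rw.fill_walk()
print(len(rw.x_values), rw.x_values[-1], rw.y_values[-1])
```

Because the public attributes are unchanged, a script such as rw_visual.py should be able to use this version without modification.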
7. Implementation of rw_visual.py
import matplotlib.pyplot as plt
from random_walk import RandomWalk
# Keep making new walks, as long as the program is active.
while True:
    # Make a random walk, and plot the points.
    rw = RandomWalk(50000)
    rw.fill_walk()
    # Set the size of the plotting window.
    plt.figure(dpi=128, figsize=(10, 6))
    point_numbers = list(range(rw.num_points))
    plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Blues,
        edgecolor='none', s=1)
    # Emphasize the first and last points.
    plt.scatter(0, 0, c='green', edgecolors='none', s=100)
    plt.scatter(rw.x_values[-1], rw.y_values[-1], c='red', edgecolors='none',
        s=100)
    # Remove the axes. plt.gca() returns the current axes; a bare plt.axes()
    # call creates a new set of axes in recent Matplotlib releases.
    plt.gca().get_xaxis().set_visible(False)
    plt.gca().get_yaxis().set_visible(False)
    plt.show()
    keep_running = input("Make another walk? (y/n): ")
    if keep_running == 'n':
        break
8. Implementation of scatter_squares.py
import matplotlib.pyplot as plt
x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, c=(0, 0, 0.8), edgecolor='none', s=40)
# Set chart title, and label axes.
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
plt.tick_params(axis='both', which='major', labelsize=14)
# Set the range for each axis.
plt.axis([0, 1100, 0, 1100000])
plt.show()
9. Implementation of highs_lows.py
import csv
from datetime import datetime
from matplotlib import pyplot as plt
# Get dates, high, and low temperatures from file.
filename = 'death_valley_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            print(current_date, 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
# Plot data.
fig = plt.figure(dpi=128, figsize=(10, 6))
plt.plot(dates, highs, c='red', alpha=0.5)
plt.plot(dates, lows, c='blue', alpha=0.5)
plt.fill_between(dates, highs, lows, facecolor='blue', alpha=0.1)
# Format plot.
title = "Daily high and low temperatures - 2014\nDeath Valley, CA"
plt.title(title, fontsize=20)
plt.xlabel('', fontsize=16)
fig.autofmt_xdate()
plt.ylabel("Temperature (F)", fontsize=16)
plt.tick_params(axis='both', which='major', labelsize=16)
plt.show()
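The try/except/else block in highs_lows.py skips rows whose temperature fields are empty. The sketch below isolates that step so it can be tried without the CSV file or matplotlib. The function name read_highs_lows, the header names, and the sample rows are invented for illustration (they are not real Death Valley readings); only the column positions 0, 1, and 3 follow the script. One small difference: the sketch records the raw row when parsing fails, which remains available even if the date itself could not be parsed.

```python
import csv
import io
from datetime import datetime


def read_highs_lows(lines):
    """Parse (date, high, low) from CSV lines, skipping incomplete rows.

    Returns the parsed dates, highs, and lows plus the list of skipped rows.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    dates, highs, lows, skipped = [], [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # An empty field or malformed date lands here.
            skipped.append(row)
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)
    return dates, highs, lows, skipped


# Made-up sample data; column 2 is a placeholder that the parser ignores.
sample = io.StringIO(
    "date,high,placeholder,low\n"
    "2014-01-01,60,50,40\n"
    "2014-01-02,,,\n"
    "2014-01-03,65,53,41\n"
)
dates, highs, lows, skipped = read_highs_lows(sample)
print(highs, lows, len(skipped))
```

Since the function accepts any iterable of lines, an open file object could be passed in place of the StringIO sample.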
10. Implementation of btc_close_2017.py
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)
try:
    # Python 2.x
    from urllib2 import urlopen
except ImportError:
    # Python 3.x
    from urllib.request import urlopen  # 1
import json
import requests
import pygal
import math
from itertools import groupby
json_url = 'https://raw.githubusercontent.com/muxuezi/btc/master/btc_close_2017.json'
response = urlopen(json_url)  # 2
# Read the data.
req = response.read()
# Write the data to a file.
with open('btc_close_2017_urllib.json', 'wb') as f:  # 3
    f.write(req)
# Parse the JSON.
file_urllib = json.loads(req.decode('utf8'))  # 4
print(file_urllib)
json_url = 'https://raw.githubusercontent.com/muxuezi/btc/master/btc_close_2017.json'
req = requests.get(json_url)  # 1
# Write the data to a file.
with open('btc_close_2017_request.json', 'w') as f:
    f.write(req.text)  # 2
file_requests = req.json()  # 3
print(file_urllib == file_requests)
# Load the data into a list.
# This assumes a copy of the data saved as btc_close_2017.json in the working directory.
filename = 'btc_close_2017.json'
with open(filename) as f:
    btc_data = json.load(f)  # 1
# Print the information for each day.
for btc_dict in btc_data:
    date = btc_dict['date']
    month = int(btc_dict['month'])
    week = int(btc_dict['week'])
    weekday = btc_dict['weekday']
    close = int(float(btc_dict['close']))  # 1
    print("{} is month {} week {}, {}, the close price is {} RMB".format(
        date, month, week, weekday, close))
# Create five lists to hold the dates and the closing prices.
dates = []
months = []
weeks = []
weekdays = []
close = []
# Collect the information for each day.
for btc_dict in btc_data:
    dates.append(btc_dict['date'])
    months.append(int(btc_dict['month']))
    weeks.append(int(btc_dict['week']))
    weekdays.append(btc_dict['weekday'])
    close.append(int(float(btc_dict['close'])))
# Closing price line chart. The Chinese titles and file names are kept as in the original.
line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)  # ①
line_chart.title = '收盘价(¥)'
line_chart.x_labels = dates
N = 20  # show an x-axis label every 20 days
line_chart.x_labels_major = dates[::N]  # ②
line_chart.add('收盘价', close)
line_chart.render_to_file('收盘价折线图(¥).svg')
# Line chart of the log-transformed closing price.
line_chart = pygal.Line(x_label_rotation=20, show_minor_x_labels=False)
line_chart.title = '收盘价对数变换(¥)'
line_chart.x_labels = dates
N = 20  # show an x-axis label every 20 days
line_chart.x_labels_major = dates[::N]
close_log = [math.log10(_) for _ in close]  # ①
line_chart.add('log收盘价', close_log)
line_chart.render_to_file('收盘价对数变换折线图(¥).svg')
line_chart
def draw_line(x_data, y_data, title, y_legend):
    xy_map = []
    for x, y in groupby(sorted(zip(x_data, y_data)), key=lambda _: _[0]):  # 2
        y_list = [v for _, v in y]
        xy_map.append([x, sum(y_list) / len(y_list)])  # 3
    x_unique, y_mean = [*zip(*xy_map)]  # 4
    line_chart = pygal.Line()
    line_chart.title = title
    line_chart.x_labels = x_unique
    line_chart.add(y_legend, y_mean)
    line_chart.render_to_file(title + '.svg')
    return line_chart
# Daily mean of the closing price per month.
idx_month = dates.index('2017-12-01')
line_chart_month = draw_line(
    months[:idx_month], close[:idx_month], '收盘价月日均值(¥)', '月日均值')
line_chart_month
# Daily mean of the closing price per week.
idx_week = dates.index('2017-12-11')
line_chart_week = draw_line(
    weeks[1:idx_week], close[1:idx_week], '收盘价周日均值(¥)', '周日均值')
line_chart_week
# Mean closing price for each day of the week.
idx_week = dates.index('2017-12-11')
wd = ['Monday', 'Tuesday', 'Wednesday',
      'Thursday', 'Friday', 'Saturday', 'Sunday']
weekdays_int = [wd.index(w) + 1 for w in weekdays[1:idx_week]]
line_chart_weekday = draw_line(
    weekdays_int, close[1:idx_week], '收盘价星期均值(¥)', '星期均值')
line_chart_weekday.x_labels = ['周一', '周二', '周三', '周四', '周五', '周六', '周日']
line_chart_weekday.render_to_file('收盘价星期均值(¥).svg')
line_chart_weekday
# Combine the five SVG charts into a single HTML dashboard.
with open('收盘价Dashboard.html', 'w', encoding='utf8') as html_file:
    html_file.write(
        '<html><head><title>收盘价Dashboard</title><meta charset="utf-8"></head><body>\n')
    for svg in [
            '收盘价折线图(¥).svg', '收盘价对数变换折线图(¥).svg', '收盘价月日均值(¥).svg',
            '收盘价周日均值(¥).svg', '收盘价星期均值(¥).svg'
    ]:
        html_file.write(
            '<object type="image/svg+xml" data="{0}" height=500></object>\n'.format(svg))  # 1
    html_file.write('</body></html>')
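draw_line relies on itertools.groupby to average the closing prices that share the same x value (month, week, or weekday). Because groupby only merges adjacent items with equal keys, the pairs are sorted first. Below is a small sketch of just that grouping step, using made-up numbers and a hypothetical function name, group_mean; it needs neither pygal nor the downloaded data.

```python
from itertools import groupby


def group_mean(x_data, y_data):
    """Average the y values for each distinct x, in ascending x order."""
    x_unique, y_mean = [], []
    # groupby merges only neighbouring items, so sort by x before grouping.
    pairs = sorted(zip(x_data, y_data))
    for x, group in groupby(pairs, key=lambda pair: pair[0]):
        y_list = [y for _, y in group]
        x_unique.append(x)
        y_mean.append(sum(y_list) / len(y_list))
    return x_unique, y_mean


# Illustrative values only: three "months" with a few prices each.
months = [1, 1, 2, 2, 2, 3]
prices = [100, 200, 300, 400, 500, 600]
print(group_mean(months, prices))
```

Building the two output lists directly also avoids the zip(*xy_map) unpacking step, which would fail on empty input.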
11. Implementation of bar_descriptions.py
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS
my_style = LS('#333366', base_style=LCS)
chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
chart.title = 'Python Projects'
chart.x_labels = ['httpie', 'django', 'flask']
chart.force_uri_protocol = 'http'
plot_dicts = [
    {'value': 16101, 'label': 'Description of httpie.'},
    {'value': 15028, 'label': 'Description of django.'},
    {'value': 14798, 'label': 'Description of flask.'},
    ]
chart.add('', plot_dicts)
chart.render_to_file('bar_descriptions.svg')
12. Implementation of python_repos.py
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS
# Make an API call, and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)
# Store API response in a variable.
response_dict = r.json()
print("Total repositories:", response_dict['total_count'])
# Explore information about the repositories.
repo_dicts = response_dict['items']
names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dict = {
        'value': repo_dict['stargazers_count'],
        'label': repo_dict['description'],
        'xlink': repo_dict['html_url'],
        }
    plot_dicts.append(plot_dict)
# Make visualization.
my_style = LS('#333366', base_style=LCS)
my_config = pygal.Config()
my_config.force_uri_protocol = 'http'
my_config.x_label_rotation = 45
my_config.show_legend = False
my_config.title_font_size = 24
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15
my_config.show_y_guides = False
my_config.width = 1000
chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names
chart.add('', plot_dicts)
chart.render_to_file('python_repos.svg')
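The loop in python_repos.py copies repo_dict['description'] straight into the chart label. A repository without a description can come back from the API with a null value, which JSON decoding turns into None, so it may be worth substituting a placeholder before the value reaches the chart. This is a minimal sketch with a hypothetical helper name, build_plot_dicts, and a hand-written sample standing in for a live API response.

```python
def build_plot_dicts(repo_dicts, default_label="No description provided."):
    """Return (names, plot_dicts) in the shape used for the bar chart."""
    names, plot_dicts = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict["name"])
        plot_dicts.append({
            "value": repo_dict["stargazers_count"],
            # Fall back to a placeholder when the description is None or empty.
            "label": repo_dict["description"] or default_label,
            "xlink": repo_dict["html_url"],
        })
    return names, plot_dicts


# Hand-written sample items; names, counts, and URLs are placeholders.
sample_items = [
    {"name": "alpha", "stargazers_count": 10, "description": "First repo",
     "html_url": "https://example.com/alpha"},
    {"name": "beta", "stargazers_count": 5, "description": None,
     "html_url": "https://example.com/beta"},
]
names, plot_dicts = build_plot_dicts(sample_items)
print(names)
print(plot_dicts[1]["label"])
```

In the script, the same call could take response_dict['items'] as its argument, and the two returned lists would feed chart.x_labels and chart.add as before.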
13. Implementation of hn_submissions.py
import requests
from operator import itemgetter
# Make an API call, and store the response.
url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
r = requests.get(url)
print("Status code:", r.status_code)
# Process information about each submission.
submission_ids = r.json()
submission_dicts = []
for submission_id in submission_ids[:30]:
    # Make a separate API call for each submission.
    url = ('https://hacker-news.firebaseio.com/v0/item/' +
            str(submission_id) + '.json')
    submission_r = requests.get(url)
    print(submission_r.status_code)
    response_dict = submission_r.json()
    submission_dict = {
        'title': response_dict['title'],
        'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
        'comments': response_dict.get('descendants', 0)
        }
    submission_dicts.append(submission_dict)
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                            reverse=True)
for submission_dict in submission_dicts:
    print("\nTitle:", submission_dict['title'])
    print("Discussion link:", submission_dict['link'])
    print("Comments:", submission_dict['comments'])
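The final step of hn_submissions.py sorts the collected dictionaries by comment count with operator.itemgetter. The sketch below shows that step in isolation, on invented sample responses rather than live Hacker News data. The helper name summarize is an assumption; it uses .get for the comment count in the same way as the script, so an item without a 'descendants' key counts as zero comments.

```python
from operator import itemgetter


def summarize(submission_id, response_dict):
    """Build the per-submission dictionary in the shape used by the script."""
    return {
        "title": response_dict["title"],
        "link": "http://news.ycombinator.com/item?id=" + str(submission_id),
        # The 'descendants' key may be missing; treat that as 0 comments.
        "comments": response_dict.get("descendants", 0),
    }


# Invented sample responses keyed by made-up submission ids.
sample = {
    101: {"title": "First story", "descendants": 12},
    102: {"title": "Second story"},
    103: {"title": "Third story", "descendants": 40},
}
submission_dicts = [summarize(sid, data) for sid, data in sample.items()]
# Sort by the 'comments' value, most-discussed first.
submission_dicts = sorted(submission_dicts, key=itemgetter("comments"),
                          reverse=True)
for submission_dict in submission_dicts:
    print(submission_dict["title"], submission_dict["comments"])
```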
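That is the complete code for this article. I hope that after working through it, readers come away with a clear picture of the project and a deeper understanding of how to apply their basic Python knowledge.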
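Implementation of Each Module
1. Generating Data
[1]. Generating Data (Part 1)
[2]. Generating Data (Part 2)
[3]. Generating Data (Part 3)
2. Downloading Data
[1]. Downloading Data (Part 1)
[2]. Downloading Data (Part 2)
3. Using APIs
[1]. Using APIs (Part 1)
[2]. Using APIs (Part 2)
These are the detailed write-ups for each module of the project. Read them carefully; they should be useful in your future data analysis work.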
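Summary
This article summarized the Data Visualization project, from requirements analysis to code structure, and gave the project's complete code. Finally, we listed the earlier articles covering each feature to make them easy to find. Python is a language that rewards hands-on practice, and it is one of the simplest programming languages and one of the easiest to get started with. Once you have learned it, picking up Java, Go, or C becomes comparatively easy. Python is also a popular language that is widely used in artificial intelligence work, so it is well worth the time to learn. Keep striving, work hard every day, study well, and keep improving your skills, and you will be rewarded for the effort. Keep going!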