Structured Data Classification in Keras
Posted by 卓晴
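§01 Classification in Keras
This classification problem comes from the Keras example "Structured data classification from scratch", which walks through, from the ground up, using Keras to predict whether each of 303 patients has heart disease.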
I. Dataset
1. Download URL
2. Basic contents
- Dataset:
  - Format: CSV
  - Size: 303 rows
  - Purpose: patient features, used to predict whether heart disease is present
[Table 1-1-2: Description of the data columns]
Column | Description | Feature Type |
---|---|---|
Age | Age in years | Numerical |
Sex | (1 = male; 0 = female) | Categorical |
CP | Chest pain type (0, 1, 2, 3, 4) | Categorical |
Trestbps | Resting blood pressure (in mm Hg on admission) | Numerical |
Chol | Serum cholesterol in mg/dl | Numerical |
FBS | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical |
RestECG | Resting electrocardiogram results (0, 1, 2) | Categorical |
Thalach | Maximum heart rate achieved | Numerical |
Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical |
Oldpeak | ST depression induced by exercise relative to rest | Numerical |
Slope | Slope of the peak exercise ST segment | Numerical |
CA | Number of major vessels (0-3) colored by fluoroscopy | Both numerical & categorical |
Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical |
Target | Diagnosis of heart disease (1 = true; 0 = false) | Target |
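The feature types in the table decide how each column is preprocessed later on. As a minimal sketch, the grouping below restates the table in plain Python. The list names and the `feature_kind` helper are illustrative and not from the original; the column names are lower-cased on the assumption that they match the CSV header; and placing `ca` with the integer categoricals and `thal` with the string categoricals is an assumption based on the "Both numerical & categorical" note and on the string values (`fixed`, `normal`, `reversible`) that appear in the loaded data.

```python
# Column groups restated from the table (names lower-cased to match the CSV header).
NUMERICAL = ["age", "trestbps", "chol", "thalach", "oldpeak", "slope"]
# Integer-coded categories; "ca" is placed here by assumption (see the note above).
CATEGORICAL_INT = ["sex", "cp", "fbs", "restecg", "exang", "ca"]
# "thal" is stored as strings such as "normal" in the CSV.
CATEGORICAL_STR = ["thal"]
TARGET = "target"

FEATURES = NUMERICAL + CATEGORICAL_INT + CATEGORICAL_STR


def feature_kind(name):
    """Return the preprocessing group a feature column belongs to."""
    if name in NUMERICAL:
        return "numerical"
    if name in CATEGORICAL_INT:
        return "categorical_int"
    if name in CATEGORICAL_STR:
        return "categorical_str"
    raise KeyError(f"unknown feature column: {name}")


for column in FEATURES:
    print(column, "->", feature_kind(column))
```

Together with the target, the 13 feature columns account for the 14 columns of the CSV.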
3. Loading the data
(1) Code
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers

# Download the heart-disease CSV and load it into a DataFrame
file_url = "http://storage.googleapis.com/download.tensorflow.org/data/heart.csv"
dataframe = pd.read_csv(file_url)

print(dataframe)         # whole table (pandas truncates long output)
print(dataframe.shape)   # (rows, columns)
print(dataframe.head())  # first five rows
(2) Output
i. Data dimensions
[303 rows x 14 columns]
(303, 14)
ii. First few rows
age sex cp trestbps chol ... oldpeak slope ca thal target
0 63 1 1 145 233 ... 2.3 3 0 fixed 0
1 67 1 4 160 286 ... 1.5 2 3 normal 1
2 67 1 4 120 229 ... 2.6 2 2 reversible 0
3 37 1 3 130 250 ... 3.5 3 0 normal 0
4 41 0 2 130 204 ... 1.4 1 0 normal 0
[5 rows x 14 columns]
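4. Splitting into training and validation sets
# Hold out a random 20% of the rows for validation; the rest is used for training
val_dataframe = dataframe.sample(frac=0.2, random_state=1337)
train_dataframe = dataframe.drop(val_dataframe.index)
print(len(train_dataframe), len(val_dataframe))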
- Output:
  - Training: 242
  - Validation: 61
242 61
First rows of train_dataframe:
age sex cp trestbps chol fbs ... exang oldpeak slope ca thal target
0 63 1 1 145 233 1 ... 0 2.3 3 0 fixed 0
1 67 1 4 160 286 0 ... 1 1.5 2 3 normal 1
3 37 1 3 130 250 0 ... 0 3.5 3 0 normal 0
4 41 0 2 130 204 0 ... 0 1.4 1 0 normal 0
5 56 1 2 120 236 0 ... 0 0.8 1 0 normal 0
[5 rows x 14 columns]
First rows of val_dataframe:
age sex cp trestbps chol ... oldpeak slope ca thal target
96 41 1 3 112 250 ... 0.0 1 0 normal 0
142 67 1 4 100 299 ... 0.9 2 2 normal 1
80 51 1 3 94 227 ... 0.0 1 1 reversible 0
67 41 1 2 135 203 ... 0.0 2 0 fixed 0
188 41 0 2 126 306 ... 0.0 1 0 normal 0
[5 rows x 14 columns]
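The 242/61 split follows from the arithmetic: 20% of 303 rows is 60.6, which rounds to 61 validation rows and leaves 242 for training. The sketch below reproduces the sample-then-drop logic with the standard library only. `split_indices` is a hypothetical helper, not part of pandas, and because it uses a different random number generator, the indices it picks will not match those chosen by `dataframe.sample`.

```python
import random


def split_indices(n_rows, frac, seed):
    """Pick round(n_rows * frac) row indices for validation; the rest is training."""
    rng = random.Random(seed)
    n_val = round(n_rows * frac)
    # Sampling without replacement, so no row is picked twice.
    val_idx = rng.sample(range(n_rows), n_val)
    val_set = set(val_idx)
    # "Dropping" the validation rows keeps the remaining rows in their original order.
    train_idx = [i for i in range(n_rows) if i not in val_set]
    return train_idx, val_idx


train_idx, val_idx = split_indices(303, 0.2, seed=1337)
print(len(train_idx), len(val_idx))  # 242 61
```

As with the DataFrame version, the training indices stay sorted while the validation indices come out in random order, which is the pattern visible in the two row listings above.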
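5. Setting the batch size
Here train_ds and val_ds are tf.data.Dataset objects built from train_dataframe and val_dataframe, each element being a (features, label) pair; the step that builds them is not shown in this post. Both datasets are grouped into batches of 32:
train_ds = train_ds.batch(32)
val_ds = val_ds.batch(32)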
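To make the effect of a batch size of 32 concrete, the following plain-Python sketch groups 242 training samples and 61 validation samples the same way. The `batch` helper is hypothetical and only mimics the grouping; it assumes the last, smaller batch is kept rather than dropped, which is the default behaviour of `tf.data.Dataset.batch`.

```python
def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


train_batches = batch(list(range(242)), 32)
val_batches = batch(list(range(61)), 32)

print([len(b) for b in train_batches])  # [32, 32, 32, 32, 32, 32, 32, 18]
print([len(b) for b in val_batches])    # [32, 29]
```

One pass over the training data therefore takes 8 steps, and one pass over the validation data takes 2.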
II. Feature processing
from tensorflow.keras.layers import IntegerLookup
from tensorflow.keras.layers import Normalization
from tensorflow.keras.layers import StringLookup


def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = Normalization()

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the statistics of the data
    normalizer.adapt(feature_ds)

    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature


def encode_categorical_feature(feature, name, dataset, is_string):
    lookup_class = StringLookup if is_string else IntegerLookup
    # Create a lookup layer that encodes each category as a multi-hot vector.
    # Note: newer Keras releases call this mode "multi_hot"; "binary" is the older name.
    lookup = lookup_class(output_mode="binary")

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the set of possible values and assign each one a fixed integer index
    lookup.adapt(feature_ds)

    # Encode the input feature using the learned vocabulary
    encoded_feature = lookup(feature)
    return encoded_feature
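The two encoders above follow the same pattern: `adapt` learns something from the training data, and the layer then applies it to each input. As a rough illustration of what is learned and applied, the sketch below does the same with the standard library only. All function names are hypothetical; the sample values are the first five `age` and `thal` entries from the data shown earlier. It assumes that normalization means subtracting the mean and dividing by the population standard deviation, and that index 0 of the multi-hot vector is reserved for values not seen during fitting. The vocabulary order here is alphabetical for simplicity and may differ from the order the Keras lookup layers choose.

```python
import math
import statistics

EPSILON = 1e-7  # assumed small floor that avoids dividing by zero for constant columns


def fit_normalizer(values):
    """Learn the mean and population standard deviation (the role of adapt)."""
    mean = statistics.fmean(values)
    std = math.sqrt(statistics.pvariance(values, mu=mean))
    return mean, max(std, EPSILON)


def normalize(value, mean, std):
    """Shift and scale one value to zero mean and unit variance."""
    return (value - mean) / std


def fit_lookup(values):
    """Map each distinct value to an index, keeping index 0 for unseen values."""
    return {v: i + 1 for i, v in enumerate(sorted(set(values)))}


def multi_hot(value, vocab):
    """Encode one value as a 0/1 vector with a single slot switched on."""
    vec = [0] * (len(vocab) + 1)
    vec[vocab.get(value, 0)] = 1
    return vec


ages = [63, 67, 67, 37, 41]
mean, std = fit_normalizer(ages)
scaled = [normalize(a, mean, std) for a in ages]
print([round(s, 3) for s in scaled])

thal = ["fixed", "normal", "reversible", "normal", "normal"]
vocab = fit_lookup(thal)
print(multi_hot("reversible", vocab))  # [0, 0, 0, 1]
print(multi_hot("unknown", vocab))     # [1, 0, 0, 0]
```

Fitting only on the training data and reusing the learned statistics and vocabulary for validation data is the reason both encoders take the dataset as a separate argument.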