Structured Data Classification with Keras

Posted by 卓晴


§01 Classification with Keras


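This classification problem comes from "Structured data classification from scratch", which shows from the ground up how to use Keras to predict, for each of 303 patients, whether heart disease is present.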

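I. Dataset

1. Download URL

2. Basic content

Dataset:
Format: CSV
Size: 303 rows
Purpose: patient features, used to predict whether the patient has heart disease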
[Table 1-1-2: Description of the data columns]
Column   | Description                                            | Feature Type
Age      | Age in years                                           | Numerical
Sex      | (1 = male; 0 = female)                                 | Categorical
CP       | Chest pain type (0, 1, 2, 3, 4)                        | Categorical
Trestbps | Resting blood pressure (in mm Hg on admission)         | Numerical
Chol     | Serum cholesterol in mg/dl                             | Numerical
FBS      | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)  | Categorical
RestECG  | Resting electrocardiogram results (0, 1, 2)            | Categorical
Thalach  | Maximum heart rate achieved                            | Numerical
Exang    | Exercise induced angina (1 = yes; 0 = no)              | Categorical
Oldpeak  | ST depression induced by exercise relative to rest     | Numerical
Slope    | Slope of the peak exercise ST segment                  | Numerical
CA       | Number of major vessels (0-3) colored by fluoroscopy   | Both numerical & categorical
Thal     | 3 = normal; 6 = fixed defect; 7 = reversible defect    | Categorical
Target   | Diagnosis of heart disease (1 = true; 0 = false)       | Target
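The "Feature Type" column of the table decides how each column is encoded later on. As a rough sketch, the grouping can be written down as plain Python lists. The lowercase column names are assumed to match the CSV header, and placing `ca` in a group of its own is a choice made for this sketch because the table lists it as both numerical and categorical.

```python
# Grouping of the 14 columns by the "Feature Type" column of the table.
# Assumption: the CSV header uses the lowercase form of each table entry.
NUMERICAL = ["age", "trestbps", "chol", "thalach", "oldpeak", "slope"]
CATEGORICAL = ["sex", "cp", "fbs", "restecg", "exang", "thal"]
# "ca" is listed as both numerical and categorical, so it is kept apart here;
# which encoder it should use is left open in this sketch.
EITHER = ["ca"]
TARGET = "target"

ALL_COLUMNS = NUMERICAL + CATEGORICAL + EITHER + [TARGET]


def feature_type(column: str) -> str:
    """Return the feature type that the table assigns to a column name."""
    if column in NUMERICAL:
        return "numerical"
    if column in CATEGORICAL:
        return "categorical"
    if column in EITHER:
        return "numerical or categorical"
    if column == TARGET:
        return "target"
    raise KeyError(f"unknown column: {column}")


for name in ALL_COLUMNS:
    print(f"{name:10s} {feature_type(name)}")
```

Note that the table describes Thal with the codes 3, 6 and 7, while the printed rows of the CSV show it as the strings fixed, normal and reversible, so it needs a string lookup rather than an integer lookup.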

3. Loading the data

(1) Code

import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers

# The original post called printf() from the author's own helper module
# `headm`; the built-in print() is used here so the script runs without it.
file_url = "http://storage.googleapis.com/download.tensorflow.org/data/heart.csv"
dataframe = pd.read_csv(file_url)

print(dataframe)
print(dataframe.shape)
print(dataframe.head())

(2) Output

Ⅰ. Data dimensions
[303 rows x 14 columns]
(303, 14)
Ⅱ. First few rows
   age  sex  cp  trestbps  chol  ...  oldpeak  slope  ca        thal  target
0   63    1   1       145   233  ...      2.3      3   0       fixed       0
1   67    1   4       160   286  ...      1.5      2   3      normal       1
2   67    1   4       120   229  ...      2.6      2   2  reversible       0
3   37    1   3       130   250  ...      3.5      3   0      normal       0
4   41    0   2       130   204  ...      1.4      1   0      normal       0

[5 rows x 14 columns]

4. Splitting into training and validation sets

val_dataframe = dataframe.sample(frac=0.2, random_state=1337)
train_dataframe = dataframe.drop(val_dataframe.index)

print(len(train_dataframe))
print(len(val_dataframe))
print(train_dataframe.head())
print(val_dataframe.head())

Output:
Training: 242
Validation: 61
242
61
   age  sex  cp  trestbps  chol  fbs  ...  exang  oldpeak  slope  ca    thal  target
0   63    1   1       145   233    1  ...      0      2.3      3   0   fixed       0
1   67    1   4       160   286    0  ...      1      1.5      2   3  normal       1
3   37    1   3       130   250    0  ...      0      3.5      3   0  normal       0
4   41    0   2       130   204    0  ...      0      1.4      1   0  normal       0
5   56    1   2       120   236    0  ...      0      0.8      1   0  normal       0

[5 rows x 14 columns]
     age  sex  cp  trestbps  chol  ...  oldpeak  slope  ca        thal  target
96    41    1   3       112   250  ...      0.0      1   0      normal       0
142   67    1   4       100   299  ...      0.9      2   2      normal       1
80    51    1   3        94   227  ...      0.0      1   1  reversible       0
67    41    1   2       135   203  ...      0.0      2   0       fixed       0
188   41    0   2       126   306  ...      0.0      1   0      normal       0

[5 rows x 14 columns]
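The 242/61 split follows from how `sample(frac=0.2)` sizes its result, and `drop(val_dataframe.index)` guarantees the two parts do not share rows. A small sketch of that reasoning on a stand-in frame; the `row_id` column is hypothetical and only the row count (303) is taken from the dataset.

```python
import pandas as pd

# Stand-in frame with the same number of rows as the heart dataset.
# The "row_id" column is made up for this illustration.
toy = pd.DataFrame({"row_id": range(303)})

# Same calls as in the split above.
val = toy.sample(frac=0.2, random_state=1337)
train = toy.drop(val.index)

# 0.2 * 303 = 60.6, which sample() rounds to 61 validation rows;
# the remaining 242 rows form the training set.
print(len(train), len(val))

# drop() removes exactly the sampled index labels, so no row is in both parts.
print(train.index.intersection(val.index).empty)
```

Fixing `random_state` makes the split reproducible from run to run.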

5. Setting the batch size

train_ds = train_ds.batch(32)
val_ds = val_ds.batch(32)
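The two `batch(32)` calls above use `train_ds` and `val_ds`, but the post never shows how they are created from the dataframes. A minimal sketch of that missing step is given here, modeled on the usual pattern of popping the `target` column and building a `tf.data.Dataset` from the rest. The helper name `dataframe_to_dataset` and the toy values are assumptions for illustration, not something this post defines.

```python
import pandas as pd
import tensorflow as tf


def dataframe_to_dataset(dataframe: pd.DataFrame) -> tf.data.Dataset:
    """Turn a dataframe into a dataset of (feature dict, label) pairs."""
    dataframe = dataframe.copy()
    # Separate the label column from the feature columns.
    labels = dataframe.pop("target")
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    # Shuffle over the whole (small) dataset.
    return ds.shuffle(buffer_size=len(dataframe))


# Tiny stand-in for train_dataframe; the values are made up for illustration.
toy_dataframe = pd.DataFrame(
    {
        "age": [63, 67, 37, 41],
        "sex": [1, 1, 1, 0],
        "oldpeak": [2.3, 1.5, 3.5, 1.4],
        "target": [0, 1, 0, 0],
    }
)

# A batch size of 2 is used here only because the toy frame has 4 rows.
toy_ds = dataframe_to_dataset(toy_dataframe).batch(2)

for features, labels in toy_ds.take(1):
    print(sorted(features.keys()), tuple(labels.shape))
```

With the real data, the same helper would be applied to `train_dataframe` and `val_dataframe` before the `batch(32)` calls.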

II. Feature processing

from tensorflow.keras.layers import IntegerLookup
from tensorflow.keras.layers import Normalization
from tensorflow.keras.layers import StringLookup

def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = Normalization()

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the statistics of the data
    normalizer.adapt(feature_ds)

    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature

def encode_categorical_feature(feature, name, dataset, is_string):
    lookup_class = StringLookup if is_string else IntegerLookup
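    # Create a lookup layer that maps each category to an integer index and,
    # with this output mode, emits a 0/1 vector per sample.
    # Note: TensorFlow 2.6 renamed the "binary" output mode to "multi_hot";
    # use that name if "binary" is rejected by your version.
    lookup = lookup_class(output_mode="binary")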

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the set of possible string values and assign them a fixed integer index
    lookup.adapt(feature_ds)

    # Turn the string input into integer indices
    encoded_feature = lookup(feature)
    return encoded_feature
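To make concrete what the two encoders compute, here is a NumPy-only sketch of the underlying arithmetic: standardizing a numerical column with statistics learned beforehand, and turning a categorical value into a 0/1 vector through a learned vocabulary. It is a simplification, not the Keras implementation. The function names are hypothetical, the vocabulary is ordered alphabetically (Keras orders it by frequency), and slot 0 is reserved for values not seen while fitting, mirroring the default out-of-vocabulary slot of the lookup layers.

```python
import numpy as np


def fit_normalizer(values):
    """Learn mean and standard deviation, the role adapt() plays for Normalization."""
    values = np.asarray(values, dtype=np.float64)
    return values.mean(), values.std()


def normalize(values, mean, std):
    """Standardize values with previously learned statistics."""
    # Assumes std > 0; a constant column would need a small epsilon here.
    return (np.asarray(values, dtype=np.float64) - mean) / std


def fit_vocabulary(values):
    """Map each distinct value to an index; index 0 is kept for unseen values."""
    return {value: i + 1 for i, value in enumerate(sorted(set(values)))}


def one_hot(values, vocabulary):
    """Encode each value as a 0/1 vector of length len(vocabulary) + 1."""
    encoded = np.zeros((len(values), len(vocabulary) + 1), dtype=np.float32)
    for row, value in enumerate(values):
        encoded[row, vocabulary.get(value, 0)] = 1.0
    return encoded


# "age" and "thal" values of the first five rows printed earlier.
ages = [63, 67, 67, 37, 41]
mean, std = fit_normalizer(ages)
normalized_ages = normalize(ages, mean, std)
print(normalized_ages)

thal = ["fixed", "normal", "reversible", "normal", "normal"]
vocab = fit_vocabulary(thal)
encoded_thal = one_hot(["normal", "unknown"], vocab)
print(encoded_thal)
```

The statistics and the vocabulary are learned once from the training data and then reused unchanged on validation data, which is why both encoders above take the training dataset as an argument.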
