A Chat About Rough Sets

Posted by gedanke


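This section continues our introduction to rough set concepts.

The previous section covered how knowledge granularity is measured; this section presents its matrix representation.

We start with a brief review of the relevant matrix concepts.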

Matrices

We begin with the sum and difference of matrices.

Matrix sum:
Let \(A=(a_{ij})_{m \times n}\) and \(B=(b_{ij})_{m \times n}\) be two \(m \times n\) matrices. Their sum is the matrix \(C=(c_{ij})_{m \times n}\):
\[ C = A+B \quad \Longrightarrow \quad c_{ij}=a_{ij}+b_{ij} \]

\[ C = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{bmatrix} \]

\[ = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{bmatrix} \]

Similarly, the difference of two matrices is:
\[ C = A-B \quad \Longrightarrow \quad c_{ij}=a_{ij}-b_{ij} \]

\[ C = \begin{bmatrix} a_{11}-b_{11} & a_{12}-b_{12} & \cdots & a_{1n}-b_{1n} \\ a_{21}-b_{21} & a_{22}-b_{22} & \cdots & a_{2n}-b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}-b_{m1} & a_{m2}-b_{m2} & \cdots & a_{mn}-b_{mn} \end{bmatrix} \]
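The two element-wise definitions above can be sketched in plain Python. This is only an illustration: the names `mat_add` and `mat_sub` and the list-of-rows representation are assumptions made for this example, not something the article prescribes.

```python
from typing import List

Matrix = List[List[float]]  # assumed representation: a list of equal-length rows


def _check_same_shape(a: Matrix, b: Matrix) -> None:
    """Raise ValueError unless a and b have the same number of rows and columns."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum: c_ij = a_ij + b_ij."""
    _check_same_shape(a, b)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise difference: c_ij = a_ij - b_ij."""
    _check_same_shape(a, b)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))  # [[6, 8], [10, 12]]
print(mat_sub(A, B))  # [[-4, -4], [-4, -4]]
```

Both helpers reject operands of different shapes, matching the requirement that \(A\) and \(B\) are both \(m \times n\).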
Matrix transpose:
Let
\[ A= \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \]

Then the transpose \(A^T\) of \(A\) is:
\[ A^T= \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{nn} \end{bmatrix} \]
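A short sketch of the transpose in plain Python follows. The function name `transpose` and the list-of-rows representation are assumptions for illustration. The text above uses a square matrix, but the same code also handles rectangular input.

```python
from typing import List

Matrix = List[List[float]]  # assumed representation: a list of equal-length rows


def transpose(a: Matrix) -> Matrix:
    """Return the matrix whose (i, j) entry is a[j][i]."""
    # zip(*a) walks the columns of a; each column becomes a row of the result.
    return [list(column) for column in zip(*a)]


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(transpose(A))  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

Transposing twice returns the original matrix, which is a quick sanity check for any implementation.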
Finally, consider the matrix product:
Let \(A=(a_{ij})_{m \times n}\) and \(B=(b_{ij})_{n \times p}\) be two matrices.
Their product \(A \times B = C = (c_{ij})_{m \times p}\) is:
\[ C = A \times B \quad \Longrightarrow \quad (c_{ij})_{m \times p}=\left(\sum_{k=1}^{n} a_{ik} \cdot b_{kj}\right)_{m \times p} \]

\[ C = \begin{bmatrix} \sum_{k=1}^{n} a_{1k}b_{k1} & \sum_{k=1}^{n} a_{1k}b_{k2} & \cdots & \sum_{k=1}^{n} a_{1k}b_{kp} \\ \sum_{k=1}^{n} a_{2k}b_{k1} & \sum_{k=1}^{n} a_{2k}b_{k2} & \cdots & \sum_{k=1}^{n} a_{2k}b_{kp} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{k=1}^{n} a_{mk}b_{k1} & \sum_{k=1}^{n} a_{mk}b_{k2} & \cdots & \sum_{k=1}^{n} a_{mk}b_{kp} \end{bmatrix} \]
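The product formula \(c_{ij}=\sum_{k} a_{ik} b_{kj}\) translates almost literally into code. The sketch below uses plain Python lists; the name `mat_mul` is an assumption for illustration, and real workloads would normally use a numerical library instead.

```python
from typing import List

Matrix = List[List[float]]  # assumed representation: a list of equal-length rows


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Return the m x p product of an m x n matrix a and an n x p matrix b."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of a must equal rows of b")
    p = len(b[0]) if b else 0
    # c_ij is the sum over k of a_ik * b_kj, exactly as in the definition.
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
        for i in range(len(a))
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
```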

Matrix representation of knowledge granularity

We continue to use the following decision table:

\(U\) \(a\) \(b\) \(c\) \(e\) \(f\) \(d\)
1 0 1 1 1 0 1
2 1 1 0 1 0 1
3 1 0 0 0 1 0
4 1 1 0 1 0 1
5 1 0 0 0 1 0
6 0 1 1 1 1 0
7 0 1 1 1 1 0
8 1 0 0 1 0 1
9 1 0 0 1 0 0

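The equivalence relation matrix is defined as follows:
Let \(S=(U,A=C \cup D,V,f)\) be a decision information system with universe \(U=\{u_{1},u_{2},\ldots,u_{n}\}\), where \(n\) is the number of objects in the universe. Let \(U/C=\{X_{1},X_{2},\ldots,X_{m}\}\) be the partition of \(U\) induced by \(C\), and let \(R_{C}\) be the corresponding equivalence relation on \(U\). The equivalence relation matrix \(M_{U}^{R_{C}} = (m_{ij})_{n \times n}\) is defined by:
\[ m_{ij} = \begin{cases} 1 & (u_{i},u_{j}) \in R_{C} \\ 0 & (u_{i},u_{j}) \notin R_{C} \end{cases} \]

where \(1 \leq i,j \leq n\).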
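To make the definition concrete, here is a minimal sketch that builds the equivalence relation matrix for the decision table above. The names `ATTRS`, `ROWS`, `CONDITION`, and `relation_matrix` are illustrative assumptions, and two objects are treated as related exactly when they agree on every attribute in the chosen set.

```python
from typing import List, Sequence

# Decision table from the article: columns a, b, c, e, f (condition) and d (decision).
ATTRS = ("a", "b", "c", "e", "f", "d")
ROWS = [
    (0, 1, 1, 1, 0, 1),  # object 1
    (1, 1, 0, 1, 0, 1),  # object 2
    (1, 0, 0, 0, 1, 0),  # object 3
    (1, 1, 0, 1, 0, 1),  # object 4
    (1, 0, 0, 0, 1, 0),  # object 5
    (0, 1, 1, 1, 1, 0),  # object 6
    (0, 1, 1, 1, 1, 0),  # object 7
    (1, 0, 0, 1, 0, 1),  # object 8
    (1, 0, 0, 1, 0, 0),  # object 9
]
CONDITION = ("a", "b", "c", "e", "f")


def relation_matrix(rows: Sequence[Sequence[int]], attrs: Sequence[str]) -> List[List[int]]:
    """m_ij = 1 if objects i and j agree on every attribute in attrs, else 0."""
    cols = [ATTRS.index(name) for name in attrs]
    keys = [tuple(row[c] for c in cols) for row in rows]
    return [[1 if ki == kj else 0 for kj in keys] for ki in keys]


for line in relation_matrix(ROWS, CONDITION):
    print(line)
```

The result is symmetric with ones on the diagonal, as expected of an equivalence relation. Objects 2 and 4, 3 and 5, 6 and 7, and 8 and 9 form the non-trivial classes.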

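The matrix-based knowledge granularity is defined as follows:
Let \(S=(U,A=C \cup D,V,f)\) be a decision information system and let \(M_{U}^{R_{C}} = (m_{ij})_{n \times n}\) be the equivalence relation matrix. The matrix-based knowledge granularity of the condition attribute set \(C\) is defined as:
\[ GP_{U}(C)=\frac{\operatorname{sum}\left(M_{U}^{R_{C}}\right)}{|U|^{2}}=\overline{M_{U}^{R_{C}}} \]
where \(\operatorname{sum}\left(M_{U}^{R_{C}}\right)\) is the number of entries equal to \(1\) in the equivalence relation matrix, and \(\overline{M_{U}^{R_{C}}}\) is the mean of all entries of the matrix.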

Using the table above, we can compute \(GP_{U}(C)\):
\[ GP_{U}(C)=\overline{M_{U}^{R_{C}}}=\frac{1}{81} \times \operatorname{sum}\left(\left[\begin{array}{ccccccccc}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{array}\right]\right)=\frac{17}{81} \]

This agrees with the result we obtained in the previous section.
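The value \(17/81\) can be checked with a few lines of Python. This is a sketch under the same assumptions as the definition: the granularity is the mean of the entries of the equivalence relation matrix. The names `ATTRS`, `ROWS`, `relation_matrix`, and `granularity` are made up for this example, and `fractions.Fraction` keeps the result exact.

```python
from fractions import Fraction
from typing import List, Sequence

ATTRS = ("a", "b", "c", "e", "f", "d")
ROWS = [
    (0, 1, 1, 1, 0, 1), (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0),
    (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0), (0, 1, 1, 1, 1, 0),
    (0, 1, 1, 1, 1, 0), (1, 0, 0, 1, 0, 1), (1, 0, 0, 1, 0, 0),
]  # objects 1..9 of the decision table
CONDITION = ("a", "b", "c", "e", "f")


def relation_matrix(rows: Sequence[Sequence[int]], attrs: Sequence[str]) -> List[List[int]]:
    """m_ij = 1 if objects i and j agree on every attribute in attrs, else 0."""
    cols = [ATTRS.index(name) for name in attrs]
    keys = [tuple(row[c] for c in cols) for row in rows]
    return [[1 if ki == kj else 0 for kj in keys] for ki in keys]


def granularity(rows: Sequence[Sequence[int]], attrs: Sequence[str]) -> Fraction:
    """GP_U(attrs): number of ones in the relation matrix divided by |U|^2."""
    matrix = relation_matrix(rows, attrs)
    n = len(rows)
    return Fraction(sum(map(sum, matrix)), n * n)


print(granularity(ROWS, CONDITION))  # 17/81
```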

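Similarly, the relative knowledge granularity is defined as follows:
Let \(S=(U,A=C \cup D,V,f)\) be a decision information system and let \(M_{U}^{R_{C}}\) and \(M_{U}^{R_{C \cup D}}\) be equivalence relation matrices. The matrix-based relative knowledge granularity of the decision attribute set \(D\) with respect to the condition attribute set \(C\) is defined as:
\[ GP_{U}(D \mid C)=\overline{M_{U}^{R_{C}}}-\overline{M_{U}^{R_{C \cup D}}} \]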

Using the table above, we can compute \(GP_{U}(D \mid C)\):
\[ GP_{U}(D \mid C)=\overline{M_{U}^{R_{C}}}-\overline{M_{U}^{R_{C \cup D}}} \]

\[ =\frac{1}{81} \times \operatorname{sum}\left(\left[\begin{array}{ccccccccc}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{array}\right] - \left[\begin{array}{ccccccccc}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]\right) =\frac{2}{81} \]
This agrees with the result we computed earlier.
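The same check works for the relative granularity. The sketch below assumes the definition \(GP_{U}(D \mid C)=\overline{M_{U}^{R_{C}}}-\overline{M_{U}^{R_{C \cup D}}}\) given above; all function and variable names are illustrative choices, not taken from the article.

```python
from fractions import Fraction
from typing import Sequence

ATTRS = ("a", "b", "c", "e", "f", "d")
ROWS = [
    (0, 1, 1, 1, 0, 1), (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0),
    (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0), (0, 1, 1, 1, 1, 0),
    (0, 1, 1, 1, 1, 0), (1, 0, 0, 1, 0, 1), (1, 0, 0, 1, 0, 0),
]  # objects 1..9 of the decision table
CONDITION = ("a", "b", "c", "e", "f")
DECISION = ("d",)


def granularity(rows: Sequence[Sequence[int]], attrs: Sequence[str]) -> Fraction:
    """Mean of the equivalence relation matrix induced by attrs."""
    cols = [ATTRS.index(name) for name in attrs]
    keys = [tuple(row[c] for c in cols) for row in rows]
    ones = sum(1 for ki in keys for kj in keys if ki == kj)  # entries m_ij = 1
    return Fraction(ones, len(rows) ** 2)


def relative_granularity(rows, cond: Sequence[str], dec: Sequence[str]) -> Fraction:
    """GP_U(dec | cond) = GP_U(cond) - GP_U(cond ∪ dec)."""
    return granularity(rows, cond) - granularity(rows, tuple(cond) + tuple(dec))


print(relative_granularity(ROWS, CONDITION, DECISION))  # 2/81
```

Only objects 8 and 9 agree on all condition attributes but differ on \(d\), which is why exactly two entries differ between the two matrices.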

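Similarly, the matrix-based inner and outer attribute significance are defined as follows.
Inner attribute significance:
Let \(S=(U,A=C \cup D,V,f)\) be a decision information system, let \(B \subseteq C\), and let \(M_{U}^{R_{B}}\), \(M_{U}^{R_{B-\{a\}}}\), \(M_{U}^{R_{B \cup D}}\), and \(M_{U}^{R_{(B-\{a\}) \cup D}}\) be equivalence relation matrices. For any \(a \in B\), the matrix-based inner significance of attribute \(a\) in the condition attribute set \(B\) with respect to the decision attribute set \(D\) is defined as:
\[ \operatorname{Sig}_{U}^{inner}(a, B, D)=GP_{U}(D \mid B-\{a\})-GP_{U}(D \mid B) \]

\[ =\{ GP_{U}(B-\{a\})-GP_{U}((B-\{a\}) \cup D) \}-\{ GP_{U}(B)-GP_{U}(B \cup D) \} \]

\[ =\overline{M_{U}^{R_{B-\{a\}}}}-\overline{M_{U}^{R_{(B-\{a\}) \cup D}}}-\overline{M_{U}^{R_{B}}}+\overline{M_{U}^{R_{B \cup D}}} \]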
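As a rough illustration of the inner significance, the sketch below evaluates it for attributes of the example table with \(B=C\). The helper names are assumptions for this example, and the granularity is computed as the mean of the equivalence relation matrix, as defined earlier.

```python
from fractions import Fraction
from typing import Sequence

ATTRS = ("a", "b", "c", "e", "f", "d")
ROWS = [
    (0, 1, 1, 1, 0, 1), (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0),
    (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0), (0, 1, 1, 1, 1, 0),
    (0, 1, 1, 1, 1, 0), (1, 0, 0, 1, 0, 1), (1, 0, 0, 1, 0, 0),
]  # objects 1..9 of the decision table
CONDITION = ("a", "b", "c", "e", "f")
DECISION = ("d",)


def granularity(rows: Sequence[Sequence[int]], attrs: Sequence[str]) -> Fraction:
    """Mean of the equivalence relation matrix induced by attrs."""
    cols = [ATTRS.index(name) for name in attrs]
    keys = [tuple(row[c] for c in cols) for row in rows]
    ones = sum(1 for ki in keys for kj in keys if ki == kj)
    return Fraction(ones, len(rows) ** 2)


def relative_granularity(rows, cond: Sequence[str], dec: Sequence[str]) -> Fraction:
    """GP_U(dec | cond) = GP_U(cond) - GP_U(cond ∪ dec)."""
    return granularity(rows, cond) - granularity(rows, tuple(cond) + tuple(dec))


def inner_significance(rows, a: str, b: Sequence[str], dec: Sequence[str]) -> Fraction:
    """Sig_inner(a, B, D) = GP_U(D | B - {a}) - GP_U(D | B), for a in B."""
    if a not in b:
        raise ValueError("a must belong to B")
    reduced = tuple(x for x in b if x != a)
    return relative_granularity(rows, reduced, dec) - relative_granularity(rows, b, dec)


print(inner_significance(ROWS, "b", CONDITION, DECISION))  # 4/81
print(inner_significance(ROWS, "a", CONDITION, DECISION))  # 0
```

A value of zero means that dropping the attribute from \(B\) leaves the relative granularity unchanged on this table.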

Outer attribute significance:
Let \(S=(U,A=C \cup D,V,f)\) be a decision information system, let \(B \subseteq C\), and let \(M_{U}^{R_{B}}\), \(M_{U}^{R_{B \cup D}}\), \(M_{U}^{R_{B \cup \{a\}}}\), and \(M_{U}^{R_{(B \cup \{a\}) \cup D}}\) be equivalence relation matrices. For any \(a \in (C-B)\), the matrix-based outer significance of attribute \(a\) relative to the condition attribute set \(B\) with respect to the decision attribute set \(D\) is defined as:
\[ \operatorname{Sig}_{U}^{outer}(a, B, D)=GP_{U}(D \mid B)-GP_{U}(D \mid B \cup \{a\}) \]

\[ =\{ GP_{U}(B)-GP_{U}(B \cup D) \} - \{ GP_{U}(B \cup \{a\})-GP_{U}((B \cup \{a\}) \cup D) \} \]

\[ =\overline{M_{U}^{R_{B}}}-\overline{M_{U}^{R_{B \cup D}}}-\overline{M_{U}^{R_{B \cup \{a\}}}}+\overline{M_{U}^{R_{(B \cup \{a\}) \cup D}}} \]
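The outer significance can be sketched the same way. The example below starts from the hypothetical subset \(B=\{b\}\), chosen here purely for illustration, and measures how much adding one more attribute reduces the relative granularity. All names are assumptions for this example.

```python
from fractions import Fraction
from typing import Sequence

ATTRS = ("a", "b", "c", "e", "f", "d")
ROWS = [
    (0, 1, 1, 1, 0, 1), (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0),
    (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0), (0, 1, 1, 1, 1, 0),
    (0, 1, 1, 1, 1, 0), (1, 0, 0, 1, 0, 1), (1, 0, 0, 1, 0, 0),
]  # objects 1..9 of the decision table
DECISION = ("d",)


def granularity(rows: Sequence[Sequence[int]], attrs: Sequence[str]) -> Fraction:
    """Mean of the equivalence relation matrix induced by attrs."""
    cols = [ATTRS.index(name) for name in attrs]
    keys = [tuple(row[c] for c in cols) for row in rows]
    ones = sum(1 for ki in keys for kj in keys if ki == kj)
    return Fraction(ones, len(rows) ** 2)


def relative_granularity(rows, cond: Sequence[str], dec: Sequence[str]) -> Fraction:
    """GP_U(dec | cond) = GP_U(cond) - GP_U(cond ∪ dec)."""
    return granularity(rows, cond) - granularity(rows, tuple(cond) + tuple(dec))


def outer_significance(rows, a: str, b: Sequence[str], dec: Sequence[str]) -> Fraction:
    """Sig_outer(a, B, D) = GP_U(D | B) - GP_U(D | B ∪ {a}), for a not in B."""
    if a in b:
        raise ValueError("a must lie outside B")
    extended = tuple(b) + (a,)
    return relative_granularity(rows, b, dec) - relative_granularity(rows, extended, dec)


B = ("b",)  # hypothetical starting subset, not taken from the article
print(outer_significance(ROWS, "f", B, DECISION))  # 16/81
print(outer_significance(ROWS, "a", B, DECISION))  # 8/81
```

On this table, adding \(f\) to \(\{b\}\) reduces the relative granularity more than adding \(a\) does, so \(f\) has the larger outer significance with respect to \(\{b\}\).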

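For the example from the previous section, the matrix representation produces the same results. However, because the equivalence relation matrix has \(|U|^{2}\) entries, the matrix-based approach may not be a good choice for large data sets.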
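To illustrate the scaling concern, the sketch below computes the same granularity without building the \(n \times n\) matrix. It relies on the fact that an equivalence class \(X_{i}\) contributes exactly \(|X_{i}|^{2}\) ones to the matrix, so \(GP_{U}(C)=\sum_{i}|X_{i}|^{2}/|U|^{2}\). The function name `granularity_by_partition` is an assumption for this example, and this is one possible alternative rather than the method of the cited thesis.

```python
from collections import Counter
from fractions import Fraction
from typing import Sequence

ATTRS = ("a", "b", "c", "e", "f", "d")
ROWS = [
    (0, 1, 1, 1, 0, 1), (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0),
    (1, 1, 0, 1, 0, 1), (1, 0, 0, 0, 1, 0), (0, 1, 1, 1, 1, 0),
    (0, 1, 1, 1, 1, 0), (1, 0, 0, 1, 0, 1), (1, 0, 0, 1, 0, 0),
]  # objects 1..9 of the decision table
CONDITION = ("a", "b", "c", "e", "f")
DECISION = ("d",)


def granularity_by_partition(rows: Sequence[Sequence[int]], attrs: Sequence[str]) -> Fraction:
    """Sum of squared equivalence class sizes divided by |U|^2."""
    cols = [ATTRS.index(name) for name in attrs]
    # One counter entry per equivalence class; no n x n matrix is stored.
    class_sizes = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return Fraction(sum(size * size for size in class_sizes.values()), n * n)


gp_c = granularity_by_partition(ROWS, CONDITION)
gp_cd = granularity_by_partition(ROWS, CONDITION + DECISION)
print(gp_c)          # 17/81
print(gp_c - gp_cd)  # 2/81
```

This version makes a single pass over the objects and stores one counter entry per equivalence class, whereas the matrix form needs \(|U|^{2}\) entries.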


This article draws on:

  • 景运革 (Jing Yunge). 基于知识粒度的动态属性约简算法研究 [D]. 西南交通大学 (Southwest Jiaotong University), 2017.
