Preparing a Data Set for the SOFM Algorithm

Posted 卓晴


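Abstract: To prepare a data set for the SOFM exercises in the artificial neural networks course, six sets of 5×7 dot-matrix character images were collected from the web. A program converts them into 0-1 codes for use in the course assignment.

Keywords: ASCII characters, dot matrix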

 

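§01 SOFM Data Set


This data set is for one of the problems in the second assignment (on competitive networks) of the 2021 artificial neural networks course. Last year's assignment used three characters from the lecture slides as the SOM data set. This year the plan is to switch to a different data set.

▲ Figure 1.1 Data set used in the 2020 assignment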

  Self-Organizing Maps and Applications

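I. Data Sets

1. Originally Planned Data Set

The original plan was to use 7×9 dot-matrix characters found on the web and select G, H, I, N, O, Q, U, Z, the eight letters that make up ZHUOQING, in two different fonts. Two more groups of characters would then be generated from these two fonts by perturbing each character at a Hamming distance of 2, and all groups would be used for clustering.

GHINOQUZ would serve as the training samples. Among them, H-N and O-Q are the hardest pairs to tell apart, because the Hamming distance within each pair is small.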
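The idea of deriving extra samples at Hamming distance 2 can be sketched as follows. This is only an illustration, assuming the perturbation flips two randomly chosen dots (the text does not say how the positions would be picked). The names `hamming` and `flip_bits` and the 63-bit `template` pattern are hypothetical and not taken from any real font.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length 0-1 strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def flip_bits(code, k, rng):
    """Return a copy of `code` with exactly k distinct positions flipped."""
    positions = rng.sample(range(len(code)), k)
    bits = list(code)
    for p in positions:
        bits[p] = '1' if bits[p] == '0' else '0'
    return ''.join(bits)

# Illustrative 7x9 pattern (9 rows of 7 dots), made up for this example.
template = "0111110" + "1000001" * 7 + "0111110"

rng = random.Random(0)            # fixed seed so the run is repeatable
noisy = flip_bits(template, 2, rng)
print(hamming(template, noisy))   # distance between original and perturbed copy
```

Because `rng.sample` draws distinct positions, the perturbed copy always differs from the original in exactly two dots.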

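A web search, however, showed that 5×7 dot-matrix character sets are far more common online.

2. 5×7 Dot-Matrix Character Sets

Six sets of ASCII dot-matrix characters were collected, as shown below.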

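▲ Figure 1.1.1 5×7 dot-matrix font

▲ Figure 1.1.2 5×7 dot-matrix font

▲ Figure 1.1.3 5×7 dot-matrix font

▲ Figure 1.1.4 5×7 dot-matrix font

▲ Figure 1.1.5 5×7 dot-matrix font

▲ Figure 1.1.6 5×7 dot-matrix font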

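II. Converting the Image Data

The dot-matrix templates obtained above are all images, so they have to be converted into 0-1 strings scanned row by row. Each character becomes a 0-1 string of length 35 (7 rows × 5 columns).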
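To make the row-scan layout concrete, here is a small sketch that splits a 35-character code back into 7 rows of 5 dots and prints it. The helper names `to_grid` and `render` are made up for this example. The sample string is the first code listed for font 1 in the results, assumed here to be the letter A.

```python
ROWS, COLS = 7, 5

def to_grid(code):
    """Split a row-scanned 0-1 string into a list of ROWS strings of COLS dots."""
    if len(code) != ROWS * COLS:
        raise ValueError("expected a string of length %d" % (ROWS * COLS))
    return [code[r * COLS:(r + 1) * COLS] for r in range(ROWS)]

def render(code):
    """Draw the character with '#' for 1 and '.' for 0, one row per line."""
    return '\n'.join(row.replace('0', '.').replace('1', '#')
                     for row in to_grid(code))

letter_a = "01110100011000110001111111000110001"
print(render(letter_a))
# .###.
# #...#
# #...#
# #...#
# #####
# #...#
# #...#
```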

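1. Image Enhancement and Inversion

First, use an image editor to make every image dark foreground on a light background. If the original image has the opposite polarity, invert its colors.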
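The article does this step by hand in an image editor. The same inversion could also be scripted. The sketch below works on a plain 2-D list of 8-bit gray values and assumes that the background covers most of the image, so a low mean brightness indicates a dark background. The function name `ensure_dark_on_light` is hypothetical.

```python
def ensure_dark_on_light(pixels):
    """Return a copy of an 8-bit grayscale image with a light background.

    `pixels` is a list of rows, each a list of gray values in 0..255.
    The image is inverted only if its mean brightness is below mid-gray.
    """
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    if total / count < 128:
        # Mostly dark image: treat it as light-on-dark and invert it.
        return [[255 - v for v in row] for row in pixels]
    return [row[:] for row in pixels]

# A single light dot on a dark background gets inverted.
light_on_dark = [[0, 0, 0],
                 [0, 255, 0],
                 [0, 0, 0]]
print(ensure_dark_on_light(light_on_dark))
```

When working with Pillow images directly, `PIL.ImageOps.invert` performs the same per-pixel inversion on grayscale and RGB images.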

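▲ Figure 1.2.1 Image converted to a dark foreground on a light background

2. Marking Character Boundaries

In the TEASOFT software, mark the boundary of each character's dot-matrix image, one character at a time.

▲ Figure 1.2.2 Marking the character boundaries in order

3. Conversion Program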

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# ASCIIDOT.PY                  -- by Dr. ZhuoQing 2021-10-31
#
# Note:
#============================================================
from headm import *
from PIL import Image
boxid = [2, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
         23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]
picfile = tspgetdopfile(boxid[0])
picrange = tspgetrange(boxid[0])
#printf(picrange)
printf(picfile)
#------------------------------------------------------------
IMAGE_ROW = 7
IMAGE_COL = 5
PIXEL_THRESHOLD = 230
def image2Density(size, imagePixels):
    global boxid
    imageSize = size
    imageWidth = imageSize[0]
    imageHeight = imageSize[1]
    picwidth = picrange[2] - picrange[0]
    picheight = picrange[3] - picrange[1]
    widthRatio = imageWidth / picwidth
    heightRatio = imageHeight / picheight
    asciidim = []
    for box in boxid[1:]:
        boxrange = tspgetrange(box)
        boxpos = [boxrange[0] - picrange[0],
                  boxrange[1] - picrange[1],
                  boxrange[2] - picrange[0],
                  boxrange[3] - picrange[1]]
        boxheight = boxrange[3] - boxrange[1]
        boxwidth = boxrange[2] - boxrange[0]
        asciistr = ''
        for i in range(IMAGE_ROW):
            startRow = int((i * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)
            endRow = int(((i+1) * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)
            col = []
            for j in range(IMAGE_COL):  # column index; no longer reuses the row index i
                startCol = int((j * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)
                endCol = int(((j+1) * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)
                # Count every pixel that the loops below actually sum,
                # so that pixelSigma is a true mean gray level.
                pixelNum = (endRow - startRow) * (endCol - startCol)
                pixelSigma = 0
                for ii in range(startRow, endRow):
                    for jj in range(startCol, endCol):
                        pixelSigma += sum(imagePixels[ii, jj])
                pixelSigma = int(pixelSigma / (pixelNum))
                #printf(pixelSigma)
                if pixelSigma > PIXEL_THRESHOLD:
                    col.append('0')
                else: col.append('1')
            str01 = ''.join(col)
            printf(str01.replace('0','.').replace('1','#'))
            asciistr = asciistr + str01
        printf('  ')
        asciidim.append(asciistr)
    return asciidim
#------------------------------------------------------------
img = Image.open(picfile)
r,g,b = img.split()
img = Image.merge("RGB", (r,g,b)).getdata()
#plt.imshow(img)
#plt.show()
size = img.size
print(size)
img = array(img).sum(axis=1)/3
imgdata = img.reshape(size[1], size[0])
#imgaverage = imgdata.sum(axis=0)
#printf(shape(imgaverage))
#plt.plot(imgaverage)
#plt.xlabel("x")
#plt.ylabel("y")
#plt.grid(True)
#plt.tight_layout()
#plt.show()
printf(shape(imgdata))
result = image2Density(size, imgdata)
for s in result:
    printf(s)
#------------------------------------------------------------
#        END OF FILE : ASCIIDOT.PY
#============================================================
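The listing above depends on the author's `headm` module and on TEASOFT helper functions, so it cannot be run outside that environment. The core step (split a character box into 7×5 cells, average each cell, threshold the mean) can be sketched in plain Python as below. The function `box_to_code`, the helper `synth`, and the `(left, top, right, bottom)` box convention are assumptions for this sketch, not the author's API.

```python
IMAGE_ROW = 7
IMAGE_COL = 5
PIXEL_THRESHOLD = 230

def box_to_code(pixels, box, threshold=PIXEL_THRESHOLD):
    """Convert one character box of a grayscale image into a 35-char 0-1 string.

    pixels: list of rows of gray values (0..255), dark foreground on light.
    box:    (left, top, right, bottom) in pixels, right/bottom exclusive.
    """
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    if width < IMAGE_COL or height < IMAGE_ROW:
        raise ValueError("box is too small for a 5x7 grid")
    bits = []
    for i in range(IMAGE_ROW):
        r0 = top + i * height // IMAGE_ROW
        r1 = top + (i + 1) * height // IMAGE_ROW
        for j in range(IMAGE_COL):
            c0 = left + j * width // IMAGE_COL
            c1 = left + (j + 1) * width // IMAGE_COL
            cell = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(cell) / len(cell)
            # Light cell -> background (0); dark cell -> dot (1).
            bits.append('0' if mean > threshold else '1')
    return ''.join(bits)

def synth(code, scale=4):
    """Build a synthetic test image: each dot becomes a scale x scale block."""
    rows = []
    for r in range(IMAGE_ROW):
        line = []
        for c in range(IMAGE_COL):
            value = 0 if code[r * IMAGE_COL + c] == '1' else 255
            line.extend([value] * scale)
        rows.extend([line[:] for _ in range(scale)])
    return rows

code = "01110100011000110001111111000110001"
image = synth(code)                              # 20 x 28 pixels
print(box_to_code(image, (0, 0, 20, 28)) == code)  # True
```

The round trip on a synthetic image only checks the cell arithmetic. Real scans have anti-aliased edges, which is where the threshold value matters.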

4. Conversion Results

(1) Font 1

▲ Figure 1.2.3 Font 1

01110100011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000101111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00001000010000100001100011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110001100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000110001111101000010000
01110100011000110001101011001101111
11110100011000110001111101000110001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011101110001
10001100010101000100010101000110001
10001100011000101010001000010000100
11111000010001000100010001000011111

(2) Font 2

▲ Figure 1.2.4 Font 2

00111010011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
01111000010000100001000011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001100010111000001
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100010101000100010101000110001
10001100011000101111000011000101110
11111000010001000100010001000011111

(3) Font 3

▲ Figure 1.2.5 Font 3

01110010101101110001111111000110001
11110010010100101110010010100111110
01110100011000010000100001000101110
11110010100101101001010110101011110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010111100011000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000110001000010110101101001110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111111110101100011000110001
10001110011110110101101111001110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001001101
11110100011000111110100101001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000111011010100111000100
10001100011010110101111111101111011
10001110110101000100010101101110001
10001110110101001110001000010000100
11110001100010000100010000100011110

(4) Font 4

▲ Figure 1.2.6 Font 4

00111010011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000010000100001000010100100110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001100010111000001
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100010101000100010101000110001
10001100011000101111000011100101110
11111000010001000100010001000011111

(5) Font 5

▲ Figure 1.2.7 Font 5

00100010101000110001111111000110001
11110010010100101110010010100111110
01110100011000010000100001000101110
11110010010100101001010010100111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010011100011000101111
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000100001000010000101001001100
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001001101
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110101101011010101010
10001100010101000100010101000110001
10001100011000101010001000010000100
11111000010001000100010001000011111

(6) Font 6

▲ Figure 1.2.8 Font 6

01110100011000111111100011000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
01111000010000100001000011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110001100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001101111
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100011000101110100011000110001
10001100011000101110001000010000100
11111000010001000100010001000011111

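5. Analysis of the Character Sets

Among the six fonts converted above, some characters have similar codes in every font, such as C, V, and K. Others differ considerably from font to font, such as A, B, and Y.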
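One way to quantify this observation is to compute pairwise Hamming distances between the codes of the same letter in different fonts. The sketch below does this for A and C, using the codes listed above for fonts 1, 2, and 3. It assumes that each list is in A to Z order, so the first string is A and the third is C. The `hamming` helper and the `samples` dictionary are illustrative names.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length 0-1 strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

# Codes copied from the results for fonts 1, 2 and 3 (assumed A-Z order).
samples = {
    "A": ["01110100011000110001111111000110001",
          "00111010011000110001111111000110001",
          "01110010101101110001111111000110001"],
    "C": ["01110100011000010000100001000101110",
          "01110100011000010000100001000101110",
          "01110100011000010000100001000101110"],
}

for letter, codes in samples.items():
    dists = [hamming(a, b) for a, b in combinations(codes, 2)]
    print(letter, dists)
# A [4, 6, 6]
# C [0, 0, 0]
```

For these three fonts, C is identical everywhere, while the versions of A differ in 4 to 6 of the 35 dots.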

▲ Figure 2.1 The six fonts

 

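※ Summary ※


To prepare a data set for the SOFM exercises in the artificial neural networks course, six sets of 5×7 dot-matrix character images were collected from the web. A program converts them into 0-1 codes for use in the course assignment.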


■ Related literature links:

● Related figure and chart links:

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# ASCIIDOT.PY                  -- by Dr. ZhuoQing 2021-10-31
#
# Note:
#============================================================
from headm import *
from PIL import Image
#boxid = [3, 10, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
#boxid = [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91]
#boxid = [6, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]
#boxid = [120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 147, 148]
boxid = [149, 150, 151, 152, 153, 154, 155, 156, 157, 173, 172, 171, 170, 174, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 175]
picfile = tspgetdopfile(boxid[0])
picrange = tspgetrange(boxid[0])
#printf(picrange)
printf(picfile)
#------------------------------------------------------------
IMAGE_ROW = 7
IMAGE_COL = 5
PIXEL_THRESHOLD = 230
def image2Density(size, imagePixels):
    global boxid
    imageSize = size
    imageWidth = imageSize[0]
    imageHeight = imageSize[1]
    picwidth = picrange[2] - picrange[0]
    picheight = picrange[3] - picrange[1]
    widthRatio = imageWidth / picwidth
    heightRatio = imageHeight / picheight
    asciidim = []
    for box in boxid[1:]:
        boxrange = tspgetrange(box)
        boxpos = [boxrange[0] - picrange[0],
                  boxrange[1] - picrange[1],
                  boxrange[2] - picrange[0],
                  boxrange[3] - picrange[1]]
        boxheight = boxrange[3] - boxrange[1]
        boxwidth = boxrange[2] - boxrange[0]
        asciistr = ''
        for i in range(IMAGE_ROW):
            startRow = int((i * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)
            endRow = int(((i+1) * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)
            col = []
            for j in range(IMAGE_COL):  # column index; no longer reuses the row index i
                startCol = int((j * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)
                endCol = int(((j+1) * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)
                # Count every pixel that the loops below actually sum,
                # so that pixelSigma is a true mean gray level.
                pixelNum = (endRow - startRow) * (endCol - startCol)
                pixelSigma = 0
                for ii in range(startRow, endRow):
                    for jj in range(startCol, endCol):
                        pixelSigma += sum(imagePixels[ii, jj])
                pixelSigma = int(pixelSigma / (pixelNum))
                #printf(pixelSigma)
                if pixelSigma > PIXEL_THRESHOLD:
                    col.append('0')
                else: col.append('1')
            str01 = ''.join(col)
            printf(str01.replace('0','.').replace('1','#'))
            asciistr = asciistr + str01
        printf('  ')
        asciidim.append(asciistr)
    return asciidim
#------------------------------------------------------------
img = Image.open(picfile)
r,g,b = img.split()
img = Image.merge("RGB", (r,g,b)).getdata()
#plt.imshow(img)
#plt.show()
size = img.size
print(size)
img = array(img).sum(axis=1)/3
imgdata = img.reshape(size[1], size[0])
#imgaverage = imgdata.sum(axis=0)
#printf(shape(imgaverage))
#plt.plot(imgaverage)
#plt.xlabel("x")
#plt.ylabel("y")
#plt.grid(True)
#plt.tight_layout()
#plt.show()
printf(shape(imgdata))
result = image2Density(size, imgdata)
for s in result:
    printf(s)
#------------------------------------------------------------
#        END OF FILE : ASCIIDOT.PY
#============================================================
