Preparing a Data Set for the SOFM Algorithm
Posted by Zhuo Qing (卓晴)
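Abstract: To prepare a data set for the SOFM exercise in the artificial neural network course, six sets of 5×7 dot-matrix character pictures were collected from the web. A program converts them into 0-1 codes for use in the course assignment.
Keywords: ASCII, character dot matrix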
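§01 SOFM Data Set
This data set is for a problem in the second assignment (on competitive networks) of the 2021 artificial neural network course. Last year's assignment used three characters from the lecture slides as the SOM data set. This year the plan is to switch to a different data set.
▲ Figure 1.1 Data set used in the 2020 assignment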
Self-Organizing Maps and Applications
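I. Data Set
1. Originally planned data set
The original plan was to take a 7×9 dot-matrix font from the web and select G, H, I, N, O, Q, U and Z, the eight distinct characters of ZHUOQING, in two different fonts. From these two fonts, two further groups of characters would be generated at a Hamming distance of 2 from the originals, and all four groups would be used for clustering.
GHINOQUZ would serve as the training samples. Among them, H-N and O-Q are the hardest pairs to separate, because the Hamming distance within each pair is small.
A web search showed, however, that 5×7 dot-matrix character sets are far more common online.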
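The planned step of deriving extra samples at a Hamming distance of 2 can be sketched as follows. This is only an illustration, assuming the step means flipping two randomly chosen bits of each template; the function names `hamming` and `perturb` and the 63-bit example pattern are hypothetical and do not come from the original program.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length 0-1 strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def perturb(code: str, distance: int = 2, rng=None) -> str:
    """Return a copy of `code` with `distance` distinct random bits flipped."""
    rng = rng or random.Random()
    flip = set(rng.sample(range(len(code)), distance))
    return "".join(
        ("1" if bit == "0" else "0") if i in flip else bit
        for i, bit in enumerate(code)
    )


# Hypothetical 7x9 pattern (63 bits), used only for illustration.
template = "0011100" * 9
noisy = perturb(template, distance=2, rng=random.Random(0))
print(hamming(template, noisy))  # 2
```

Because the flipped positions are distinct, the result always lies at exactly the requested distance from the template.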
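2. 5×7 dot-matrix character sets
Six ASCII dot-matrix fonts were collected; they are shown below.
▲ Figure 1.1.1 5×7 dot-matrix font
▲ Figure 1.1.2 5×7 dot-matrix font
▲ Figure 1.1.3 5×7 dot-matrix font
▲ Figure 1.1.4 5×7 dot-matrix font
▲ Figure 1.1.5 5×7 dot-matrix font
▲ Figure 1.1.6 5×7 dot-matrix font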
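II. Converting the Picture Data
The dot-matrix templates collected above are all pictures, so they have to be converted into 0-1 strings scanned row by row. Each character becomes a 0-1 string of length 35 (7 rows × 5 columns).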
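To make the row-scan convention concrete, here is a small sketch that splits a 35-character string back into a 7×5 grid and prints it. The helper names `to_grid` and `to_code` are hypothetical; the example string is the first code listed for font 1 in the conversion results.

```python
ROWS, COLS = 7, 5


def to_grid(code: str):
    """Split a row-scanned 35-character 0-1 string into 7 rows of 5 ints."""
    if len(code) != ROWS * COLS:
        raise ValueError(f"expected {ROWS * COLS} characters, got {len(code)}")
    return [[int(c) for c in code[r * COLS:(r + 1) * COLS]] for r in range(ROWS)]


def to_code(grid) -> str:
    """Inverse of to_grid: flatten the grid row by row into a 0-1 string."""
    return "".join(str(v) for row in grid for v in row)


letter_a = "01110100011000110001111111000110001"
grid = to_grid(letter_a)
for row in grid:
    print("".join("#" if v else "." for v in row))
# .###.
# #...#
# #...#
# #...#
# #####
# #...#
# #...#
```

The flattened list of 35 integers is also a convenient input vector for a competitive network.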
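1. Image enhancement and inversion
First, use an image editor to give every picture a dark foreground on a light background. If the original picture is the other way round, invert its colors.
▲ Figure 1.2.1 Picture converted to a dark foreground on a light background
2. Marking the character boundaries
In the TEASOFT software, mark the boundary of each character's dot-matrix image, following the order of the characters.
▲ Figure 1.2.2 Character boundaries marked in order
3. Conversion program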
#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# ASCIIDOT.PY                    -- by Dr. ZhuoQing 2021-10-31
#
# Note: headm is the author's helper module and is not shown here.
#       printf, array, shape, sum, tspgetdopfile and tspgetrange
#       are assumed to come from it (the tsp* calls talk to TEASOFT).
#============================================================

from headm import *
from PIL import Image

# boxid[0] is the picture; boxid[1:] are the 26 character boxes.
boxid = [2, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
         23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]

picfile = tspgetdopfile(boxid[0])
picrange = tspgetrange(boxid[0])

#printf(picrange)
printf(picfile)

#------------------------------------------------------------
IMAGE_ROW = 7
IMAGE_COL = 5
PIXEL_THRESHOLD = 230

def image2Density(size, imagePixels):
    global boxid

    imageSize = size
    imageWidth = imageSize[0]
    imageHeight = imageSize[1]

    # Scale factors from on-screen box coordinates to image pixels.
    picwidth = picrange[2] - picrange[0]
    picheight = picrange[3] - picrange[1]
    widthRatio = imageWidth / picwidth
    heightRatio = imageHeight / picheight

    asciidim = []

    for box in boxid[1:]:
        boxrange = tspgetrange(box)
        # Box position relative to the top-left corner of the picture.
        boxpos = [boxrange[0] - picrange[0],
                  boxrange[1] - picrange[1],
                  boxrange[2] - picrange[0],
                  boxrange[3] - picrange[1]]

        boxheight = boxrange[3] - boxrange[1]
        boxwidth = boxrange[2] - boxrange[0]

        asciistr = ''

        for i in range(IMAGE_ROW):
            startRow = int((i * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)
            endRow = int(((i+1) * boxheight / IMAGE_ROW + boxpos[1]) * heightRatio)

            col = []
            for j in range(IMAGE_COL):
                startCol = int((j * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)
                endCol = int(((j+1) * boxwidth / IMAGE_COL + boxpos[0]) * widthRatio)

                # Note: this divisor is slightly smaller than the number of
                # pixels summed below, so the "average" is biased upward.
                pixelNum = (endRow - startRow - 1) * (endCol - startCol - 1)
                pixelSigma = 0
                for ii in range(startRow, endRow):
                    for jj in range(startCol, endCol):
                        pixelSigma += sum(imagePixels[ii, jj])

                pixelSigma = int(pixelSigma / (pixelNum))
                #printf(pixelSigma)

                # Light cell -> '0' (background), dark cell -> '1' (ink).
                if pixelSigma > PIXEL_THRESHOLD:
                    col.append('0')
                else:
                    col.append('1')

            str01 = ''.join(col)
            printf(str01.replace('0','.').replace('1','#'))
            asciistr = asciistr + str01

        printf(' ')
        asciidim.append(asciistr)

    return asciidim

#------------------------------------------------------------
img = Image.open(picfile)
r,g,b = img.split()
img = Image.merge("RGB", (r,g,b)).getdata()

#plt.imshow(img)
#plt.show()

size = img.size
print(size)

# Average the three channels, then reshape to (height, width).
img = array(img).sum(axis=1)/3
imgdata = img.reshape(size[1], size[0])

#imgaverage = imgdata.sum(axis=0)
#printf(shape(imgaverage))
#plt.plot(imgaverage)
#plt.xlabel("x")
#plt.ylabel("y")
#plt.grid(True)
#plt.tight_layout()
#plt.show()

printf(shape(imgdata))

result = image2Density(size, imgdata)

for s in result:
    printf(s)

#------------------------------------------------------------
#        END OF FILE : ASCIIDOT.PY
#============================================================
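The program above depends on TEASOFT and on the author's headm module, so it cannot be run on its own. The core idea, averaging the gray level inside each of the 7×5 cells and thresholding it, can be tried with a standalone sketch such as the one below. It is not the author's code: the function name `cells_to_code` and the synthetic test image are hypothetical, the image is a plain nested list of gray values rather than a picture file, and the mean is taken over the actual number of pixels in each cell.

```python
ROWS, COLS = 7, 5
THRESHOLD = 230  # same value as PIXEL_THRESHOLD in the listing


def cells_to_code(pixels, rows=ROWS, cols=COLS, threshold=THRESHOLD):
    """Convert one grayscale character image into a row-scanned 0-1 string.

    `pixels` is a list of rows of gray values (0-255) and must be at least
    rows x cols pixels in size. A cell whose mean gray level is above the
    threshold counts as background ('0'); otherwise it counts as ink ('1').
    """
    height, width = len(pixels), len(pixels[0])
    bits = []
    for r in range(rows):
        r0, r1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            c0, c1 = c * width // cols, (c + 1) * width // cols
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            mean = sum(block) / len(block)
            bits.append("0" if mean > threshold else "1")
    return "".join(bits)


# Synthetic test image: a letter "T" drawn with 4x4-pixel cells,
# dark ink (0) on a light background (255).
pattern = ["#####", "..#..", "..#..", "..#..", "..#..", "..#..", "..#.."]
SCALE = 4
image = [
    [0 if ch == "#" else 255 for ch in row for _ in range(SCALE)]
    for row in pattern
    for _ in range(SCALE)
]

code = cells_to_code(image)
print(code)  # 11111001000010000100001000010000100
```

A threshold as high as 230 means that a cell only counts as background when it is almost entirely light, which suits pictures prepared with dark ink on a light background.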
4. Conversion results
(1) Font 1
▲ Figure 1.2.3 Font 1
01110100011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000101111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00001000010000100001100011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110001100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000110001111101000010000
01110100011000110001101011001101111
11110100011000110001111101000110001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011101110001
10001100010101000100010101000110001
10001100011000101010001000010000100
11111000010001000100010001000011111
(2) Font 2
▲ Figure 1.2.4 Font 2
00111010011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
01111000010000100001000011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001100010111000001
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100010101000100010101000110001
10001100011000101111000011000101110
11111000010001000100010001000011111
(3) Font 3
▲ Figure 1.2.5 Font 3
01110010101101110001111111000110001
11110010010100101110010010100111110
01110100011000010000100001000101110
11110010100101101001010110101011110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010111100011000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000110001000010110101101001110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111111110101100011000110001
10001110011110110101101111001110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001001101
11110100011000111110100101001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000111011010100111000100
10001100011010110101111111101111011
10001110110101000100010101101110001
10001110110101001110001000010000100
11110001100010000100010000100011110
(4) Font 4
▲ Figure 1.2.6 Font 4
00111010011000110001111111000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000010000100001000010100100110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001100010111000001
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100010101000100010101000110001
10001100011000101111000011100101110
11111000010001000100010001000011111
(5) Font 5
▲ Figure 1.2.7 Font 5
00100010101000110001111111000110001
11110010010100101110010010100111110
01110100011000010000100001000101110
11110010010100101001010010100111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010011100011000101111
10001100011000111111100011000110001
01110001000010000100001000010001110
00111000100001000010000101001001100
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110101100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001001101
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110101101011010101010
10001100010101000100010101000110001
10001100011000101010001000010000100
11111000010001000100010001000011111
(6) Font 6
▲ Figure 1.2.8 Font 6
01110100011000111111100011000110001
11110100011000111110100011000111110
01110100011000010000100001000101110
11110100011000110001100011000111110
11111100001000011110100001000011111
11111100001000011110100001000010000
01110100011000010000100111000101110
10001100011000111111100011000110001
01110001000010000100001000010001110
01111000010000100001000011000101110
10001100101010011000101001001010001
10000100001000010000100001000011111
10001110111010110001100011000110001
10001100011100110101100111000110001
01110100011000110001100011000101110
11110100011000111110100001000010000
01110100011000110001101011001101111
11110100011000111110101001001010001
01110100011000001110000011000101110
11111001000010000100001000010000100
10001100011000110001100011000101110
10001100011000110001100010101000100
10001100011000110001101011010101010
10001100011000101110100011000110001
10001100011000101110001000010000100
11111000010001000100010001000011111
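5. Analysis of the character set
Across the six converted fonts, some characters have nearly the same code in every font, for example C, V and K. Others differ considerably from font to font, for example A, B and Y.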
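One way to quantify this observation is to compute, for each letter, the largest pairwise Hamming distance between its codes in different fonts. The sketch below does this for A, C and K using the codes of fonts 1 to 3 copied from the results above; restricting it to three letters and three fonts is only to keep the example short, and the names `hamming`, `codes` and `spread` are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length 0-1 strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Codes of fonts 1, 2 and 3 (in that order) for three letters.
codes = {
    "A": [
        "01110100011000110001111111000110001",
        "00111010011000110001111111000110001",
        "01110010101101110001111111000110001",
    ],
    "C": [
        "01110100011000010000100001000101110",
        "01110100011000010000100001000101110",
        "01110100011000010000100001000101110",
    ],
    "K": [
        "10001100101010011000101001001010001",
        "10001100101010011000101001001010001",
        "10001100101010011000101001001010001",
    ],
}

# Largest distance between any two fonts, per letter.
spread = {
    letter: max(hamming(a, b) for a, b in combinations(variants, 2))
    for letter, variants in codes.items()
}

for letter, distance in spread.items():
    print(letter, distance)
```

A spread of 0 means the letter is encoded identically in all the fonts compared, while a larger spread indicates that the fonts draw the letter differently.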
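▲ Figure 2.1 The six fonts
※ Summary ※
To prepare a data set for the SOFM exercise in the artificial neural network course, six sets of 5×7 dot-matrix character pictures were collected from the web. A program converts them into 0-1 codes for use in the course assignment.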
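■ Related literature links:
● Related figure links:
- Figure 1.1 Data set used in the 2020 assignment
- Figure 1.1.1 5×7 dot-matrix font
- Figure 1.1.2 5×7 dot-matrix font
- Figure 1.1.3 5×7 dot-matrix font
- Figure 1.1.4 5×7 dot-matrix font
- Figure 1.1.5 5×7 dot-matrix font
- Figure 1.1.6 5×7 dot-matrix font
- Figure 1.2.1 Picture converted to a dark foreground on a light background
- Figure 1.2.2 Character boundaries marked in order
- Figure 1.2.3 Font 1
- Figure 1.2.4 Font 2
- Figure 1.2.5 Font 3
- Figure 1.2.6 Font 4
- Figure 1.2.7 Font 5
- Figure 1.2.8 Font 6
- Figure 2.1 The six fonts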
Appendix: box IDs for the other fonts
The final version of ASCIIDOT.PY is identical to the listing in section 3 (Conversion program) except for the boxid list. In each list the first entry identifies the picture and the remaining 26 entries identify the character boxes. The lists recorded in the script are:
#boxid = [3, 10, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]
#boxid = [65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91]
#boxid = [6, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 103, 104, 105, 106, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119]
#boxid = [120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 147, 148]
boxid = [149, 150, 151, 152, 153, 154, 155, 156, 157, 173, 172, 171, 170, 174, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 175]