如何自动识别双峰直方图?
Posted
技术标签:
【中文标题】如何自动识别双峰直方图?【英文标题】:How to identify Bimodal Histograms automatically? 【发布时间】:2015-08-26 10:37:26 【问题描述】:我有一个问题,我从一些数据库图像中收到了很多直方图。这些直方图表示为向量 (0...255),我必须识别并使用双峰直方图。
是否有一个公式可以自动识别哪些直方图是双峰的,哪些不是?由于它们是数字向量,因此我可以使用编程语言 (Java/C#) 来处理它。
文献上是否有软件识别双峰直方图的标准?
以下是我正在使用的 3 个直方图和格式输入示例。每个直方图都是一个具有 256 (0...255) 个位置的向量。
Histogram 1
8029, 41, 82, 177, 135, 255, 315, 591, 949, 456, 499, 688, 446, 733, 712, 1595, 2633, 3945, 6134, 9755, 9236, 11911, 11888, 9450, 13119, 8819, 5991, 4399, 6745, 2017, 3747, 1777, 2946, 1623, 2151, 454, 3015, 3176, 2211, 1080, 391, 580, 750, 473, 10424, 334, 559, 621, 340, 2794, 1094, 5274, 2822, 204, 389, 728, 268, 15, 1060, 58, 113, 2728, 52, 3166, 11, 103, 522, 107, 351, 97, 66, 565, 315, 444, 3305, 245, 647, 306, 147, 112, 103, 672, 69, 317, 61, 224, 71, 52, 479, 62, 106, 166, 215, 132, 137, 321, 998, 427, 846, 787, 542, 1054, 1429, 615, 697, 580, 642, 768, 1244, 462, 4107, 1701, 2394, 4954, 4869, 1841, 1807, 1032, 3075, 331, 488, 627, 1281, 233, 1010, 1178, 727, 830, 1619, 728, 1428, 1849, 4826, 351, 745, 320, 888, 335, 741, 1151, 734, 689, 2143, 1130, 2482, 3609, 4779, 5678, 4186, 2654, 1668, 1290, 702, 1093, 476, 438, 445, 271, 98, 368, 226, 90, 75, 26, 33, 62, 16, 824, 21, 37, 34, 24, 54, 42, 101, 112, 18, 24, 17, 15, 3, 50, 7, 6, 54, 3, 58, 9, 10, 66, 12, 11, 10, 6, 25, 11, 7, 172, 13, 18, 21, 9, 8, 9, 42, 16, 15, 6, 12, 17, 7, 591, 6, 7, 14, 24, 7, 7, 19, 87, 18, 8, 9, 9, 35, 55, 4, 17, 10, 18, 22, 46, 8, 852, 15, 14, 12, 11, 9, 3, 50, 163, 12, 4, 18, 129, 6, 35, 47, 14, 18, 150, 21, 46, 24, 0
Histogram 2
8082, 4857, 1494, 2530, 1604, 1636, 1651, 1681, 1630, 1667, 1636, 1649, 1934, 1775, 1701, 1691, 1478, 1649, 1449, 1449, 1503, 1475, 1497, 1398, 1509, 1747, 1301, 1539, 1575, 1496, 1754, 1432, 1759, 1786, 1679, 1816, 2435, 1174, 1780, 1344, 1749, 2026, 1779, 1742, 1722, 1835, 2306, 1662, 1965, 1885, 2212, 2139, 1930, 2306, 2707, 2289, 2307, 2082, 2360, 2216, 2480, 2243, 2222, 1824, 4555, 1918, 2116, 2275, 2615, 2240, 2703, 2481, 2626, 2708, 3008, 2696, 2561, 2906, 3625, 2419, 3137, 2793, 2747, 2861, 2774, 4124, 3155, 3243, 3523, 3432, 3277, 3456, 2984, 2902, 2819, 2778, 3158, 2997, 2591, 2717, 2553, 2464, 3657, 2296, 2352, 2046, 2124, 1965, 2014, 2096, 1664, 1373, 1607, 1322, 1272, 1113, 1156, 1055, 924, 881, 1019, 669, 929, 636, 590, 463, 524, 177, 1267, 378, 409, 413, 415, 435, 385, 379, 267, 413, 266, 282, 499, 194, 360, 199, 337, 92, 986, 183, 160, 230, 124, 213, 188, 334, 164, 159, 130, 143, 135, 331, 25, 118, 114, 98, 74, 301, 92, 119, 94, 72, 192, 38, 64, 100, 138, 30, 98, 65, 226, 23, 46, 78, 78, 61, 55, 234, 26, 36, 95, 31, 49, 214, 25, 34, 58, 37, 101, 20, 41, 34, 150, 16, 50, 25, 53, 18, 30, 67, 27, 36, 42, 23, 60, 12, 21, 36, 12, 45, 21, 58, 53, 18, 51, 16, 25, 9, 24, 15, 18, 30, 33, 20, 19, 12, 23, 16, 14, 21, 14, 10, 20, 13, 12, 9, 6, 9, 7, 10, 7, 2, 0, 0, 0, 0, 0, 2087
Histogram 3
50, 226, 857, 2018, 1810, 1795, 1840, 1929, 1942, 1693, 1699, 1547, 1564, 1556, 1451, 1439, 1448, 1357, 1428, 1419, 1383, 1705, 1670, 1777, 1826, 1865, 1897, 1924, 2003, 1973, 1813, 1801, 1827, 1696, 1717, 1654, 1678, 1705, 1621, 1523, 1494, 1559, 1434, 1370, 1358, 1385, 1348, 1380, 1368, 1367, 1389, 1445, 1514, 1471, 1465, 1461, 1475, 1484, 1390, 1403, 1324, 1339, 1426, 1432, 1487, 1460, 1469, 1460, 1546, 1504, 1425, 1373, 1391, 1391, 1382, 1311, 1368, 1354, 1325, 1323, 1263, 1325, 1363, 1357, 1325, 1322, 1429, 1419, 1412, 1371, 1266, 1179, 1166, 1076, 1100, 1083, 1103, 1053, 1116, 1080, 1071, 1025, 1088, 1060, 1011, 984, 958, 959, 954, 937, 982, 950, 1001, 963, 965, 875, 1010, 954, 990, 894, 959, 972, 963, 1101, 971, 1042, 1064, 1075, 1029, 1088, 1090, 1068, 1073, 1058, 1102, 1105, 1009, 1062, 1005, 1048, 973, 998, 1034, 1013, 961, 1006, 983, 948, 1031, 972, 952, 1013, 954, 964, 970, 881, 887, 967, 941, 928, 994, 1019, 1106, 1056, 1113, 1071, 1158, 1108, 1178, 1071, 1080, 1074, 1050, 1076, 1106, 1048, 973, 1042, 997, 1034, 934, 863, 935, 845, 839, 803, 764, 782, 787, 771, 766, 751, 745, 804, 789, 765, 681, 658, 690, 672, 650, 635, 695, 619, 572, 499, 535, 565, 564, 520, 516, 568, 530, 479, 507, 424, 446, 455, 380, 395, 371, 360, 391, 373, 351, 388, 426, 349, 417, 421, 400, 443, 470, 485, 456, 495, 452, 484, 457, 518, 519, 631, 652, 693, 762, 771, 807, 906, 991, 1138, 1433, 1545, 2467, 4907, 6743, 1921
【问题讨论】:
见this on CrossValidated。完全一样的想法。 平滑直方图(以消除小的局部最小值最大值)然后找到所有局部最小值最大值并测试您是否有 2 个不同的大峰值 ... @Spektre 感谢您的回答。看起来你的评论很简单。但是我看过一些关于它的文章,它看起来并不简单。你有一个应用程序的例子吗?再次感谢! 不,我不使用第三方应用程序进行 DIP。是的,实现看起来微不足道,但是当您开始添加诸如自适应平滑或微不足道峰值的自适应阈值等内容时,代码变得更加复杂。 (在通用直方图上发现局部峰值一点也不简单)肯定有很多关于这方面的研究论文,但每一篇都可能适合特定类型的图像或任务。在 DIP/CV 中很难(几乎不可能)为任何具有通用/任意输入的任务提供可靠的算法。所以更具体地说,您需要向我们展示您的输入数据集 @Spektre 我对问题进行了编辑,并放置了我的直方图模型样本。由于直方图是 occrency index,它们可以很容易地用向量表示。谢谢 【参考方案1】:我认为没有简单直接的解决方案。
如果您将直方图中的每个峰值视为一个集群,您可以尝试实现某种聚类算法,该算法能够自动检测直方图中的峰值数量。
你可以从这里开始看:
Determining the number of clusters in a data set
Kmeans without knowing the number of clusters?
AUTOMATIC DETERMINATION OF THE NUMBER OF CLUSTERS USING SPECTRAL ALGORITHMS
【讨论】:
【参考方案2】:平滑直方图
这会过滤掉小的局部最小值最大值和噪声。使用对称平滑以避免向一侧移动。我从左边平滑然后从右边平滑,这降低了很多。
查找/计算局部最大峰值
只计算足够大的峰(按某个阈值)。如果峰值计数不是2
,则它不是双峰直方图,除非您对双峰有不同的定义,例如:
这取决于直方图的用途
这是我为此而破解的一些 C++ 代码:
void histograms(Graphics::TBitmap *bmp,int xs,int ys,int **pyx)
// clear buffer
bmp->Canvas->Brush->Color=clBlack;
bmp->Canvas->FillRect(TRect(0,0,xs,ys));
bmp->Canvas->Font->Color=clAqua;
bmp->Canvas->TextOutA( 5,5,"Raw histogram");
bmp->Canvas->TextOutA(285,5,"Smoothed histogram");
int his1[256]= 8029, 41, 82, 177, 135, 255, 315, 591, 949, 456, 499, 688, 446, 733, 712, 1595, 2633, 3945, 6134, 9755, 9236, 11911, 11888, 9450, 13119, 8819, 5991, 4399, 6745, 2017, 3747, 1777, 2946, 1623, 2151, 454, 3015, 3176, 2211, 1080, 391, 580, 750, 473, 10424, 334, 559, 621, 340, 2794, 1094, 5274, 2822, 204, 389, 728, 268, 15, 1060, 58, 113, 2728, 52, 3166, 11, 103, 522, 107, 351, 97, 66, 565, 315, 444, 3305, 245, 647, 306, 147, 112, 103, 672, 69, 317, 61, 224, 71, 52, 479, 62, 106, 166, 215, 132, 137, 321, 998, 427, 846, 787, 542, 1054, 1429, 615, 697, 580, 642, 768, 1244, 462, 4107, 1701, 2394, 4954, 4869, 1841, 1807, 1032, 3075, 331, 488, 627, 1281, 233, 1010, 1178, 727, 830, 1619, 728, 1428, 1849, 4826, 351, 745, 320, 888, 335, 741, 1151, 734, 689, 2143, 1130, 2482, 3609, 4779, 5678, 4186, 2654, 1668, 1290, 702, 1093, 476, 438, 445, 271, 98, 368, 226, 90, 75, 26, 33, 62, 16, 824, 21, 37, 34, 24, 54, 42, 101, 112, 18, 24, 17, 15, 3, 50, 7, 6, 54, 3, 58, 9, 10, 66, 12, 11, 10, 6, 25, 11, 7, 172, 13, 18, 21, 9, 8, 9, 42, 16, 15, 6, 12, 17, 7, 591, 6, 7, 14, 24, 7, 7, 19, 87, 18, 8, 9, 9, 35, 55, 4, 17, 10, 18, 22, 46, 8, 852, 15, 14, 12, 11, 9, 3, 50, 163, 12, 4, 18, 129, 6, 35, 47, 14, 18, 150, 21, 46, 24, 0 ;
int his2[256]= 8082, 4857, 1494, 2530, 1604, 1636, 1651, 1681, 1630, 1667, 1636, 1649, 1934, 1775, 1701, 1691, 1478, 1649, 1449, 1449, 1503, 1475, 1497, 1398, 1509, 1747, 1301, 1539, 1575, 1496, 1754, 1432, 1759, 1786, 1679, 1816, 2435, 1174, 1780, 1344, 1749, 2026, 1779, 1742, 1722, 1835, 2306, 1662, 1965, 1885, 2212, 2139, 1930, 2306, 2707, 2289, 2307, 2082, 2360, 2216, 2480, 2243, 2222, 1824, 4555, 1918, 2116, 2275, 2615, 2240, 2703, 2481, 2626, 2708, 3008, 2696, 2561, 2906, 3625, 2419, 3137, 2793, 2747, 2861, 2774, 4124, 3155, 3243, 3523, 3432, 3277, 3456, 2984, 2902, 2819, 2778, 3158, 2997, 2591, 2717, 2553, 2464, 3657, 2296, 2352, 2046, 2124, 1965, 2014, 2096, 1664, 1373, 1607, 1322, 1272, 1113, 1156, 1055, 924, 881, 1019, 669, 929, 636, 590, 463, 524, 177, 1267, 378, 409, 413, 415, 435, 385, 379, 267, 413, 266, 282, 499, 194, 360, 199, 337, 92, 986, 183, 160, 230, 124, 213, 188, 334, 164, 159, 130, 143, 135, 331, 25, 118, 114, 98, 74, 301, 92, 119, 94, 72, 192, 38, 64, 100, 138, 30, 98, 65, 226, 23, 46, 78, 78, 61, 55, 234, 26, 36, 95, 31, 49, 214, 25, 34, 58, 37, 101, 20, 41, 34, 150, 16, 50, 25, 53, 18, 30, 67, 27, 36, 42, 23, 60, 12, 21, 36, 12, 45, 21, 58, 53, 18, 51, 16, 25, 9, 24, 15, 18, 30, 33, 20, 19, 12, 23, 16, 14, 21, 14, 10, 20, 13, 12, 9, 6, 9, 7, 10, 7, 2, 0, 0, 0, 0, 0, 2087 ;
int his3[256]= 50, 226, 857, 2018, 1810, 1795, 1840, 1929, 1942, 1693, 1699, 1547, 1564, 1556, 1451, 1439, 1448, 1357, 1428, 1419, 1383, 1705, 1670, 1777, 1826, 1865, 1897, 1924, 2003, 1973, 1813, 1801, 1827, 1696, 1717, 1654, 1678, 1705, 1621, 1523, 1494, 1559, 1434, 1370, 1358, 1385, 1348, 1380, 1368, 1367, 1389, 1445, 1514, 1471, 1465, 1461, 1475, 1484, 1390, 1403, 1324, 1339, 1426, 1432, 1487, 1460, 1469, 1460, 1546, 1504, 1425, 1373, 1391, 1391, 1382, 1311, 1368, 1354, 1325, 1323, 1263, 1325, 1363, 1357, 1325, 1322, 1429, 1419, 1412, 1371, 1266, 1179, 1166, 1076, 1100, 1083, 1103, 1053, 1116, 1080, 1071, 1025, 1088, 1060, 1011, 984, 958, 959, 954, 937, 982, 950, 1001, 963, 965, 875, 1010, 954, 990, 894, 959, 972, 963, 1101, 971, 1042, 1064, 1075, 1029, 1088, 1090, 1068, 1073, 1058, 1102, 1105, 1009, 1062, 1005, 1048, 973, 998, 1034, 1013, 961, 1006, 983, 948, 1031, 972, 952, 1013, 954, 964, 970, 881, 887, 967, 941, 928, 994, 1019, 1106, 1056, 1113, 1071, 1158, 1108, 1178, 1071, 1080, 1074, 1050, 1076, 1106, 1048, 973, 1042, 997, 1034, 934, 863, 935, 845, 839, 803, 764, 782, 787, 771, 766, 751, 745, 804, 789, 765, 681, 658, 690, 672, 650, 635, 695, 619, 572, 499, 535, 565, 564, 520, 516, 568, 530, 479, 507, 424, 446, 455, 380, 395, 371, 360, 391, 373, 351, 388, 426, 349, 417, 421, 400, 443, 470, 485, 456, 495, 452, 484, 457, 518, 519, 631, 652, 693, 762, 771, 807, 906, 991, 1138, 1433, 1545, 2467, 4907, 6743, 1921 ;
int *his,tmp[256],a,x0,y0,x,y,h,tr=12,sm=10,peak[256],peaks;
// loop through histograms
for (y0=20,h=0;;h++)
x0=5;if (h==0) his=his1;
else if (h==1) his=his2;
else if (h==2) his=his2;
else break;
// rescale his <0,?> to tmp <0-100>
for (y=his[0],x=0;x<256;x++) if (y<his[x]) y=his[x]; // y=max
for ( x=0;x<256;x++) tmp[x]=(his[x]*100)/y;
// draw tmp
for (x=0;x<256;x++) for (pyx[y0+100][x0+x]=0x00004040,y=0;y<tmp[x];y++) pyx[y0+100-y][x0+x]=(40+x)*0x00010101;
x0+=280;
// smooth tmp few times
for (y=0;y<sm;y++)
// from both directions to avoid shifting to one side
for (x=0;x<255;x++) tmp[x]=((90*tmp[x])+(10*tmp[x+1]))/100;
for (x=255;x>0;x--) tmp[x]=((90*tmp[x])+(10*tmp[x-1]))/100;
// find (count) peaks
for (peaks=0,a=0,y=0,x=0;x<255;x++)
if (tmp[x]<tmp[x+1]) if ((y< 0)&&(a-tmp[x]>tr)) a=tmp[x]; y=+1;
else if (tmp[x]>tmp[x+1]) if ((y>=0)&&(tmp[x]-a>tr)) a=tmp[x]; peak[peaks]=x; peaks++; y=-1;
// draw tmp
for (x=0;x<256;x++) for (pyx[y0+100][x0+x]=0x00004040,y=0;y<tmp[x];y++) pyx[y0+100-y][x0+x]=(40+x)*0x00010101;
// draw peaks
bmp->Canvas->Pen->Color=clAqua;
bmp->Canvas->Brush->Color=clAqua;
for (a=0;a<peaks;a++)
x=x0+peak[a];
y=y0+100-tmp[peak[a]];
bmp->Canvas->Ellipse(x-5,y-5,x+5,y+5);
// draw cross for not bimodal histograms
if (peaks!=2)
bmp->Canvas->Pen->Color=clRed;
bmp->Canvas->MoveTo(x0 ,y0 );
bmp->Canvas->LineTo(x0+256,y0+100);
bmp->Canvas->MoveTo(x0+256,y0 );
bmp->Canvas->LineTo(x0 ,y0+100);
y0+=128;
你可以忽略它只是渲染的 pyx[y][x]
和 bmp->
东西
pyx[y][x]
是位图bmp
的直接32位像素访问
和bmp->Canvas
是VCL封装的Windows GDI位图接口bmp
xs,ys
是位图分辨率
设置阈值tr
和平滑sm
以最好地满足您的需求
如果您有太多不同类型的直方图,那么您需要应用动态阈值或不同的方法来寻找峰值,这就是您的直方图的样子:
Histogram 1
是最上面的。如果不评论我,希望代码足够清晰......如果您重新调整为2
而不是100
的幂,那么您可以将乘法和除法更改为位移以加快速度。我选择100
是为了更清楚地选择阈值和平滑系数...
【讨论】:
哇,太棒了!我会检查一下并尝试在我的 Java 代码上实现,只是一个问题,你确定直方图 2 和 3 是双峰的吗?我的意思是,我明白了你的逻辑,但它们看起来不像我见过的另一个双峰直方图样本那样双峰。无论如何,这只是个人看法。 @MarcelJames 正如我所提到的,这取决于输入图像是什么以及使用的直方图是什么。例如,过度曝光的图像通常在峰之间没有任何间隙。如果您没有自然图像,则峰值通常只是单行(像图像一样的噪声,您需要保留噪声)。例如,如果你想对图像进行二值化,通常间隙比峰值更重要(这种情况下你可以有很多峰值,但如果直方图之间只有一个间隙被认为是双峰的)所以你总是需要记住你有什么和你想要什么... @MarcelJames 因此,当您知道要为您的任务考虑双峰直方图时,请对其进行调整以满足您的需求。该程序足够简单,我希望您可以自己进行更改 是的,我正在使用这些直方图对数字图像进行二值化。这些直方图代表数字图像中的 HSV(只是另一种空间颜色,如 RGB...)。将使用来自single object segmentation database 的图像,并使用彩色图像。分割将基于全局阈值。那么对于这个问题,你认为像 2 或 3 这样的直方图可以被认为是双峰的吗? @MarcelJames 用于二值化图像,您可以考虑所有 3 个双峰。第一个肯定是的,其他两个可能是起始峰值,可能是有用的信息或背景颜色,也可能只是不规则,有时结果很好,有时不取决于图像组成。请参阅2D HSV histogram in C++,这对您来说可能很有趣您也可以尝试另一种方法,例如fast color quantisation in C++,因为通过灰度化可能会丢失大量数据以上是关于如何自动识别双峰直方图?的主要内容,如果未能解决你的问题,请参考以下文章