C. Equal Frequencies
Posted onlyblues
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了C. Equal Frequencies相关的知识,希望对你有一定的参考价值。
C. Equal Frequencies
Let\'s call a string balanced if all characters that are present in it appear the same number of times. For example, "coder", "appall", and "ttttttt" are balanced, while "wowwow" and "codeforces" are not.
You are given a string $s$ of length $n$ consisting of lowercase English letters. Find a balanced string $t$ of the same length $n$ consisting of lowercase English letters that is different from the string $s$ in as few positions as possible. In other words, the number of indices $i$ such that $s_i \\ne t_i$ should be as small as possible.
Input
Each test contains multiple test cases. The first line contains the number of test cases $t$ ($1 \\le t \\le 10^4$). The description of the test cases follows.
Each test case consists of two lines. The first line contains a single integer $n$ ($1 \\le n \\le 10^5$) — the length of the string $s$.
The second line contains the string $s$ of length $n$ consisting of lowercase English letters.
It is guaranteed that the sum of $n$ over all test cases does not exceed $10^5$.
Output
For each test case, print the smallest number of positions where string $s$ and a balanced string $t$ can differ, followed by such a string $t$.
If there are multiple solutions, print any. It can be shown that at least one balanced string always exists.
Example
input
4 5 hello 10 codeforces 5 eevee 6 appall
output
1 helno 2 codefofced 1 eeeee 0 appall
Note
In the first test case, the given string "hello" is not balanced: letters \'h\', \'e\', and \'o\' appear in it once, while letter \'l\' appears twice. On the other hand, string "helno" is balanced: five distinct letters are present in it, and each of them appears exactly once. Strings "hello" and "helno" differ in just one position: the fourth character. Other solutions are possible too.
In the second test case, string "codefofced" is balanced since only letters \'c\', \'o\', \'d\', \'e\', and \'f\' are present in it, and each of them appears exactly twice.
In the third test case, string "eeeee" is balanced since only letter \'e\' is present in it.
In the fourth test case, the given string "appall" is already balanced.
解题思路
写这题分为两部,先找出最小改变次数,再对字符串进行修改。
可以发现,由于最后字符串中每种字符的出现次数都相同,因此还是很容易想到枚举最后有多少种字符来求最小改变次数。假设最后有$i$种字符,那么$i$要满足$i \\mid n$,同时每种字符都有$s = \\fracni$个。统计字符串中每个字符的出现次数,并按照个数从大到小排序。为了得到最小的改变次数,贪心的思路是保留前$i$种字符,然后把从第$i+1$个字符开始的剩余字符都删掉(如果有的话),当然如果不同种类的字符不足$i$个,那么意味需要额外添加其他种类的字符。这里的删除是指把该字符换成前$i$个原本存在的字符种类,添加是指把该字符换成原本不存在的字符种类。假设改变次数为$t$,枚举前$i$个字符如果该字符的出现的次数$c \\ne s$,那么$t$就要加上$|s-c|$;然后再枚举剩下字符,这些字符都是要删掉的,因此$t$要加上这些字符出现的次数。枚举完所有的$i$取最小的$t$就是答案,最后答案还有除以$2$,因为一次修改被算了两次(删除和添加)。
关于上面贪心的正确性说明,这里是选择保留出现次数多的字符而删除出现次数少的字符,上面求$t$的公式为$t = \\sum\\limits_k = 0^i - 1 |s - c_k| + \\sum\\limits_k = i^25 c_k$。假设有字符$x$和$y$,个数分别为$c_x$和$c_y$,如果保留$y$而删除$x$,变化后数值就变成$t - \\left( |s - c_x| + c_y \\right) + |s - c_y| + c_x$,如果贪心是正确的话,那么变化后的$t$应该是变大的。因此我们要证明出$|s - c_y| + c_x \\geq |s - c_x| + c_y$,才能保证贪心的正确性,然后我试着证明了一下,可以参考下面的附录部分。
最后是修改字符了,当时比赛时太急了没想出来,补题时发现确实还是蛮复杂的。
根据上面的求最小出现次数的过程我们就已经得到了最后要保留的字符种类个数,这里记为$\\textsum$,以及每种字符保留的个数$s$。然后我们要做的是根据前面统计出来的字符出现次数来替换字符串中的字符,当枚举到$\\textstr[i]$,如果字符$\\textstr[i]$的出现次数大于$s$,那么我们就要把$\\textstr[i]$换成出现次数小于$s$的字符种类;如果字符$\\textstr[i]$的出现次数小于$s$就不用管了,因为后面一定会存在大于$s$的字符。
实现的方法是先算出每个字符需要添加或删除的数量,具体做法是根据前面统计出来的每种字符出现次数,记为数组$\\textcnt$,把前$\\textsum$个字符都减去$s$,如果得到的结果大于$0$则说明该需要将该种类的字符替换成结果小于$0$的字符。第$\\textsum+1$个开始之后的字符就不用管了,如果存在则对应的$\\textcnt$必定为正数,说明需要进行替换且出现的次数就是要被替换的次数。在修改字符串的过程中开一个指针$j$指向第$\\textsum$个字符,每次枚举到下一个字符前都检查一下$j$所指向的字符在$\\textcnt$中是不是小于$0$,不是的话就往前移。当发现$\\textcnt[\\textstr[i]] > 0$,则将$\\textstr[i]$替换成$j$指向的字符,同时$\\textcnt$进行相应的修改。
AC代码如下:
#include <bits/stdc++.h> using namespace std; const int N = 1e5 + 10; char str[N]; int cnt[26], p[26]; void solve() int n; scanf("%d %s", &n, &str); memset(cnt, 0, sizeof(cnt)); for (int i = 0; i < n; i++) cnt[str[i] - \'a\']++; // 统计原字符串每个字符出现的次数 for (int i = 0; i < 26; i++) p[i] = i; sort(p, p + 26, [&](int i, int j) // 字符按照出现次数从大到小排序 return cnt[i] > cnt[j]; ); int ret = n, sum; // ret记录修改的次数,sum记录最后字符串中字符的种类个数 for (int i = 1; i <= 26; i++) // 枚举字符的种类,最多有26种 if (n % i == 0) int s = n / i, t = 0; for (int j = 0; j < i; j++) // 前i个字符要保留 t += abs(cnt[p[j]] - s); for (int j = i; j < 26; j++) // 剩余的字符都要删除 t += cnt[p[j]]; if (ret > t / 2) ret = t / 2, sum = i; // 答案记得除以2 for (int i = 0; i < sum; i++) cnt[p[i]] -= n / sum; // cnt[p[i]]>0,表示需要把字符p[i]需要替换成<0的字符 for (int i = 0, j = sum - 1; i < n; i++) while (j >= 0 && cnt[p[j]] >= 0) // j指向cnt[p[j]]<0的字符,表示要替换成p[j] j--; if (cnt[str[i] - \'a\'] > 0) // 修改str[i] cnt[str[i] - \'a\']--, cnt[p[j]]++; // 消除 str[i] = p[j] + \'a\'; // 替换成p[j] printf("%d\\n%s\\n", ret, str); int main() int t; scanf("%d", &t); while (t--) solve(); return 0;
附录
已知 $c_x \\geq c_y$ ,
求证 $|s - c_y| + c_x \\geq |s - c_x| + c_y$ 。
证明:
$$
\\beginalign*
|s - c_x| + c_y - \\left( |s - c_y| + c_x \\right) &= c_y - c_x + |s - c_x| - |s - c_y| \\\\
&\\leq c_y - c_x + |c_x - c_y| \\\\
&\\leq 0
\\endalign*
$$
即得
$$
|s - c_x| + c_y \\leq |s - c_y| + c_x
$$
得证。
主要是我忘了不等式
$$
|a|-|b| \\leq |a \\pm b| \\leq |a|+|b|
$$
一直记的是
$$
|a|-|b| \\leq |a + b| \\leq |a|+|b|
$$
不过如果我反过来证貌似得不到结果:
$$
\\beginalign*
|s - c_y| + c_x - \\left( |s - c_x| + c_y \\right) &= c_x - c_y + |s - c_y| - |s - c_x| \\\\
&\\leq c_x - c_y + |c_x - c_y| \\\\
&\\leq 2 \\cdot (c_x - c_y)
\\endalign*
$$
写到这里的时候我想到 $s$ 可以分类讨论:
- 当 $c_y \\leq s \\leq c_x$ :$$\\beginalign*|s - c_y| + c_x - \\left( |s - c_x| + c_y \\right) &= c_x - c_y + |s - c_y| - |s - c_x| \\\\&= c_x - c_y + (s - c_y) - (c_x - s) \\\\&= c_x - c_y + 2 \\cdot s - c_y - c_x \\\\&= 2 \\cdot (s - c_y) \\geq 0\\endalign*$$
- 当 $s < c_y$ :$$\\beginalign*|s - c_y| + c_x - \\left( |s - c_x| + c_y \\right) &= c_x - c_y + |s - c_y| - |s - c_x| \\\\&= c_x - c_y + (c_y - s) - (c_x - s) \\\\&= c_x - c_y + c_y - c_x \\\\&= 0\\endalign*$$
- 当 $s > c_x$ :$$\\beginalign*|s - c_y| + c_x - \\left( |s - c_x| + c_y \\right) &= c_x - c_y + |s - c_y| - |s - c_x| \\\\&= c_x - c_y + (s - c_y) - (s - c_x) \\\\&= c_x - c_y + c_x - c_y \\\\&= 2 \\cdot (c_x - c_y) \\geq 0\\endalign*$$
综上所述
$$
|s - c_y| + c_x - \\left( |s - c_x| + c_y \\right) \\geq 0
$$
同时得到不等式的上下界:
$$
0 \\leq |s - c_y| + c_x - \\left( |s - c_x| + c_y \\right) \\leq 2 \\cdot (c_x - c_y)
$$
得证。
参考资料
Codeforces Round #844 (Div. 1 + Div. 2, based on VK Cup 2022 - Elimination Round) A-D:https://www.cnblogs.com/BlankYang/p/17056807.html
以上是关于C. Equal Frequencies的主要内容,如果未能解决你的问题,请参考以下文章
C. Almost Equal ( Codeforces Round #580 (Div. 2) )