How can I find non-ASCII characters in file content in a batch script?

Posted: 2022-01-10 08:22:42

Question:

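In a batch script, I want to inspect the content of a.txt. The file holds several records. How can I check whether a record contains non-ASCII characters and, if so, write it to b.txt? I have some code that extracts a substring from the middle of a string, but it fails as well.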

@echo off


setlocal enableDelayedExpansion
SETLOCAL 
set _char= "123456789~abcdef0"
SET /A _startchar=1
SET /A _length=1


for /L %%a in (32,1,125) do (

  cmd /c exit %%a
  
  
  echo !=exitcodeAscii!
  if "!=exitcodeAscii!" EQU "%_char%" echo -- %%a
  CALL SET _substring=!!_char:!_startchar!,2!!
  ECHO !_substring! --- !_startchar!
  SET /A _startchar=!_startchar! + 1
   
)
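
A minimal sketch of the character walk this loop appears to be attempting, assuming the goal is to read _char one character at a time. It swaps the CALL SET approach for a for /L counter used inside a delayed-expansion substring, and it assumes the value is stored without the leading space and quotes. The upper bound 16 is simply the last 0-based offset of this 17-character sample string.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Quoting the whole assignment keeps the space and the quotes out of the value.
set "_char=123456789~abcdef0"

rem Offsets are 0-based; %%i is substituted before delayed expansion runs,
rem so !_char:~%%i,1! yields the single character at offset %%i.
for /L %%i in (0,1,16) do (
  set "_substring=!_char:~%%i,1!"
  echo %%i --- !_substring!
)

endlocal
```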

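Comments:

The ASCII exit code will never equal the variable _char. What are you trying to accomplish with that line of code? The following line of code is also incorrect: CALL SET _substring=!!_char:!_startchar!,2!!. It should use doubled percent signs so that the variable expands to its value, and you are missing the tilde: CALL SET _substring=%%_char:~!_startchar!,2%%

Hi @Squashman, thanks, but with for /L %%a in (32,1,125) do ( cmd /c exit %%a echo !=exitcodeAscii! if "!=exitcodeAscii!" EQU "%_char%" echo -- %%a CALL SET _substring=%%_char:~!_startchar!,2%% ECHO !_substring! --- !_startchar! SET /A _startchar=!_startchar! + 1 ) it still fails: when I echo the result it is a space. CALL SET _substring=%%_char:~!_startchar!,2%% ECHO !_substring! --- !_startchar!

Code updates belong in your question. Please edit your question to include your new code. In any case, I was not trying to solve your problem; I was only pointing out some code errors I noticed. The code I gave you does fix the syntax problem you had. Once the startchar variable is larger than the length of the string you are parsing, the code will certainly echo a space. In this case the _char variable is only 20 characters long, so past that point the substring comes out as a space. This is very basic logic that you can work out for yourself.

Answer 1: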

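The script below defines a variable holding the valid ASCII characters (excluding ", which is handled by substitution) for character-by-character comparison.

Edit: changes made to improve performance and to ensure any possible ASCII input is handled correctly.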

@Echo off

 For /f "tokens=4 delims=: " %%G in ('CHCP')Do Set "Restore_Codepage=CHCP %%G > nul"
 Set "Return[Len]=" & Set "Return[String]=" & Set "input="

 Setlocal DISABLEDelayedExpansion
REM the label marker ":#" is used within this script to delimit help output.

:# 
:# ========================= ASCII string filter v3.1 by T3RRY ======================
 Rem - This script iterates over an input string character by character and tests
 Rem each character against a whitelist of printable ASCII characters, with
 Rem successful matches used to build a new string containing only printable
 Rem ASCII characters.
 Rem - Switch /R modifies this script into a testing tool
 Rem   to check if a string contains any NonASCII or nonprintable ASCII characters.
 Rem   - Errorlevel 0 indicates the string contains only printable ASCII characters
 Rem   - A Positive errorlevel is returned containing the 1 indexed position of the
 Rem     first NonASCII or nonprintable ASCII character found.
 Rem - Execution time increases as string length increases. Each character in the
 Rem   string is tested against a whitelist containing 95 printable ASCII characters.
:# 
:# Usage: Filepath <"String"> [ /P | /R ] | [ -? | /? | -help ]
:#
:# Rem to use from another batch file:
:# For /f delims^= %%G in ('FilePath "string"')Do Echo(%%G
:# 
:# Accepts input String via doublequoted argument - reads %* and trims " /P" or " /R"
:# switches if present
:# - No escaping of characters in the argument is required
:# - If unbalanced doublequotes exist in the string all doublequotes will be Removed.
:# 
:# Use Switch /P to preserve original spaces
:#  - Default behaviour is to Remove all double spaces from the string.
:# 
:# Use Switch /R to reject input containing NonASCII characters
:#  - If non ASCII character encountered, returns a positive errorlevel
:#   ( the 1 indexed position of first non ASCII character encountered )
:# 
 Rem Version changes 09/Dec/2021 :
 Rem - Changed input method to handle cases where quoted args contain
 Rem   standard delims within quotes IE: "string "substring=text""
 Rem Version changes 08/Dec/2021 :
 Rem - Added Help Switches -? /? and -help
 Rem - Added switch: /R 
 Rem   - Reject strings containing non ASCII characters. Default: Strip NonASCII
 Rem     characters from the string.
 Rem     Note: this switch does not define Return[Len] or Return[String]
 Rem Version changes 07/Dec/2021 :
 Rem - Rewritten for much faster performance - NOTE:
 Rem   - Added Switch: /P
 Rem    - Preserve all whitespace. Default: multiple spaces truncated to single.
 Rem - Renamed variable for returning String : Return[String]
 Rem - Added variable Return[Len] to return 0 indexed string length.
 Rem - Corrected handling of completely non ASCII strings to return empty / 0 Len
 Rem ** Utilize alternate data stream to store variable containing printable ASCII
 Rem    characters so the variable only needs to be generated on first execution.
 Rem     ** Requires this batch file to be run from an NTFS drive.
:# =================================================================================

 Set "ASCII= !"
 2> nul (
  more < "%~f0:ASCII.dat" > nul || (
   Setlocal EnableDelayedExpansion
   For /l %%i in (34 1 126) Do (
    Cmd /c Exit %%i
    Set "ASCII=!ASCII!!=ExitCodeAscii!"
   )
   >"%~f0:ASCII.dat" (Echo(Set ^^"ASCII=!ASCII!")
   ENDLOCAL
 ))

 Set "ASCII="
 For /f "delims=" %%G in ('More ^< "%~f0:ASCII.dat"')Do %%G
 If not Defined ASCII (
  2> nul (
   Powershell.exe -c "Remove-item -path '%~nx0' -Stream '*'"
  )
  1>&2 Echo(An error has occurred. Ensure "%~nx0" is located on an NTFS drive.
  Pause
  ENDLOCAL
  Exit /b 1
 )

 Rem Maximum stringlength to support. Modify here to propagate to RemoveChar loop and Return[Len]
 REM maximum 1015 chars due to input reading method.
 Set "SupportLength=1015"
 Set "input="

::====================================================================================================
rem :: input capture method by Dave Benham : https://www.dostips.com/forum/viewtopic.php?t=4288#p23980
setlocal enableDelayedExpansion
>"%temp%\getArg.txt" <"%temp%\getArg.txt" (
  setlocal disableExtensions
  set prompt=#
  echo on
  for %%a in (%%a) do rem . %*.
  echo off
  endlocal
  set /p "args="
  set /p "args="
  set "input=!args:~7,-2!"
  set "count=!args:~7,-2!"
)

del "%temp%\getArg.txt"

::====================================================================================================

Rem the below line can be used to Remove the alternate data stream this file creates.
 Rem Powershell -c "Remove-item -path '%~nx0' -Stream '*'"

 CHCP 65001 > nul
 If not defined input (
  Echo(Demo:
Rem escaped for definition in DelayedExpansion environment
  Set "input=this is a demo) * ^! & ☺ ^= ¶ | ^! <. ~ ^^ & %% ▒ ╔ § ♣ This"
  Set input
 )

REM handle help switches

 Set input | %SystemRoot%\System32\Findstr.exe /Xli "input=\/? input=-? input=-help" > nul && (
  Setlocal EnableDelayedExpansion
  For /f "tokens=2* delims=#" %%G in ('%SystemRoot%\System32\Findstr.exe /blic:":# " "%~f0"')Do (
   Set "Usage=%%G"
   Echo(!Usage:Filepath=%~f0!
  )
  ENDLOCAL & ENDLOCAL
  Exit /b 0
 )

 Set Div="is=#", "1/(is<<9)"

 Set "DQ=1"
 Set ^"count=!count:"=DQ!"

 2> nul Set "null=%count:DQ=" & Set /A DQ+=1& set "null=%"

 Set /A !Div:#=%DQ% %% 2! 2> nul || Set ^"input=!input:"=!"

REM handle nonhelp switches

 Set "ASCIISwitch[R]="
 Set "ASCIISwitch[P]="
 If defined input (
  Set input | %SystemRoot%\System32\findstr.exe /Elic:" /P" > nul && (
   Set "input=!input:~0,-3!"
   Set "ASCIISwitch[P]=true"
  )
  Set input | %SystemRoot%\System32\findstr.exe /Elic:" /R" > nul && (
   Set "input=!input:~0,-3!"
   Set "ASCIISwitch[R]=true"
 ))

Rem Remove outer doublequotes from input argument if not already removed due to unbalanced quoting.

 If .^%input:~0,1%^%input:~-1%. == ."". Set "input=!input:~1,-1!"

Rem RemoveChar loop - iterate over input character by character; Compare against each character in whitelist
Rem Appends ASCII Whitelist characters to New string unless /R switch used, in which case NonASCII characters
Rem  trigger an exit of the script with a positive errorlevel indicating the string is not ASCII.
Rem  the return value is the 1 indexed position of the first non ascii character encountered.

 Set "end=" & Set "New="
 For /l %%i in (0 1 %SupportLength%)Do If not "!input:~%%i,1!"=="" (
  Set "Char=!input:~%%i,1!"
  Set "ISAscii="
  For /l %%c in (0 1 94)Do If not "!ASCII:~%%c,1!" == "" (
   Set "C_Char=!ASCII:~%%c,1!"
   if "!Char!"=="!C_Char!" (
    Set "New=!New!!Char!"
    Set "ISAscii=true"
  ))
  If Defined ASCIISwitch[R] (
   If Not Defined ISAscii (
    Endlocal & Endlocal &  %Restore_Codepage%
    For /f "delims=" %%G in ('Set /A %%i+1')Do Exit /b %%G     
 )))

 Set "Input=!New!"
 If not Defined ASCIISwitch[P] (
  For /l %%i in (0 1 9)Do if defined Input Set "Input=!Input:  = !"
 )

 If defined input (
  Echo(!input!
  For /l %%i in (0 1 %SupportLength%)Do If not defined Return[Len] If "!input:~%%i,1!"=="" Set "Return[Len]=%%i"
 ) Else (
  Set "Return[Len]=0"
  Set "Return[String]="
 )

 ENDLOCAL & ENDLOCAL & Set "Return[Len]=%Return[Len]%" & Set "Return[string]=%input%"
 %Restore_Codepage%

Exit /B 0
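
A short usage sketch for the /R switch described in the script's help text. The file name AsciiFilter.bat is hypothetical (the answer does not name the script), and the sketch assumes it sits next to the calling batch file on an NTFS drive.

```batch
@echo off
rem Assumption: the script above is saved as AsciiFilter.bat (hypothetical
rem name) in the same folder as this caller, on an NTFS drive.
set "sample=plain text only"

rem With /R the script returns errorlevel 0 when the string holds only
rem printable ASCII, or the 1-indexed position of the first character
rem that fails the whitelist test.
call "%~dp0AsciiFilter.bat" "%sample%" /R

rem The whole IF statement is parsed after CALL has finished, so
rem %errorlevel% here already holds the value returned by the script.
if errorlevel 1 (
  echo First non-ASCII or nonprintable character at position %errorlevel%
) else (
  echo Only printable ASCII characters found
)
```

Inside a FOR loop or another parenthesized block, %errorlevel% would be expanded too early; there the value would have to be read as !errorlevel! with delayed expansion enabled.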

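Comments:

Hi @T3RR0R, I tried it and it worked, but I now have a new problem: when non-ASCII characters are found, I want to write to a log file. My problem is that I cannot reuse the parameter (the position of the non-ASCII character) in a function, because I need to read every .txt file in a folder, and in each file I need to read all the records (10 records). If a record contains non-ASCII characters, it should be written to the log file, and the non-ASCII characters should be replaced with spaces if possible. Can you help me? Thanks.

Based on your code Call :IsASCII Demo Call :IsASCII Ascii I added my own code, but it does not work: ` if %%i not equal 0 echo %Demo% >> log.txt` Can you help me? Thank you very much.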
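
One possible way to approach this follow-up request, sketched under assumptions rather than taken from the answer: instead of calling the filter script once per record, the batch file hands the scan to PowerShell's Select-String, which is a different technique from the character-by-character whitelist above. It assumes the .txt files sit in the current directory, that log.txt (the name used in the comment) should list each offending record with its file name and line number, and that any character outside the 7-bit range counts as non-ASCII. How high bytes are decoded depends on each file's encoding, so treat the result as a starting point.

```batch
@echo off
rem Sketch only: scan every .txt file in the current folder and write each
rem record that contains a character outside \x00-\x7F to log.txt.
rem log.txt itself is skipped so that an earlier log is not scanned again.
powershell -NoProfile -Command "Get-ChildItem -Path '.' -Filter '*.txt' | Where-Object { $_.Name -ne 'log.txt' } | Select-String -Pattern '[^\x00-\x7F]' | ForEach-Object { '{0}:{1}: {2}' -f $_.Filename, $_.LineNumber, $_.Line } | Set-Content -Path 'log.txt' -Encoding UTF8"
```

This sketch only logs the records. Replacing the non-ASCII characters with spaces would be a separate step.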
