A question about encoding and decoding Chinese characters in a URL

Posted 2023-05-12

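I wrote the following code in a front-end JSP page:
<%String a="自动化";
String result=URLEncoder.encode(a,"UTF-8");//encode "自动化" as UTF-8
String hrefs=request.getContextPath()+"/searchOfficeNum.do?textfield="+result+"&pageType=1";%>
<a href="<%=hrefs%>">Test</a>
This produces a link that passes parameters to the back end. The back-end Java code looks like this (key lines only):
request.setCharacterEncoding("UTF-8");//the result is the same whether I use GBK or UTF-8 here
String teststr=request.getParameter("textfield").trim();
But teststr always comes out as "????". How should the back end read the UTF-8-encoded value sent from the front end?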

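URLDecoder.decode(url, "UTF-8");
Here url is the address string you received. Decode it and then read the value; that should work. Whatever you encode, remember to decode.
import java.net.URLDecoder; // for decoding
import java.net.URLEncoder; // for encoding
If the value still cannot be decoded, replace "%" with "!" on the page:
URLEncoder.encode(input,"UTF-8").replaceAll("%","!")
and convert it back on the server with String's replaceAll("!","%") before decoding.

Reference answer A:
new String(teststr.getBytes("ISO-8859-1"),"UTF-8");
So that's it.

Reference answer B:
The page encoding is Simplified Chinese (GB2312); when I changed it to UTF-8, the whole page turned into garbled text.
The two encodings have to match; that is what I found when debugging my own pages.

Reference answer C:
Check whether the character encoding selected in your browser is UTF-8 or something else (View > Character Encoding).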
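
The garbled result in the question is consistent with the percent-encoded UTF-8 bytes being decoded on the server with a different charset. The Python sketch below reproduces that mechanism and the ISO-8859-1 to UTF-8 re-decoding trick suggested in one of the answers above. It only illustrates the idea; it is not the servlet container's actual code path, and the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

original = "自动化"

# What the JSP page does: percent-encode the UTF-8 bytes of the text.
encoded = quote(original, encoding="utf-8")

# What a misconfigured server effectively does: decode those bytes as ISO-8859-1.
misdecoded = unquote(encoded, encoding="iso-8859-1")

# The repair: turn the wrong text back into the raw bytes, then decode as UTF-8.
repaired = misdecoded.encode("iso-8859-1").decode("utf-8")

print(encoded)
print(repaired == original)
```

The repair works because ISO-8859-1 maps every byte to exactly one character, so no information is lost in the wrong decoding step. If the server had replaced undecodable bytes with "?", the original text could no longer be recovered this way.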

URL encoding and decoding

1. RFC 1738

2.1. The main parts of URLs

   A full BNF description of the URL syntax is given in Section 5.

   In general, URLs are written as follows:

       <scheme>:<scheme-specific-part>

   A URL contains the name of the scheme being used (<scheme>) followed
   by a colon and then a string (the <scheme-specific-part>) whose
   interpretation depends on the scheme.

   Scheme names consist of a sequence of characters. The lower case
   letters "a"--"z", digits, and the characters plus ("+"), period
   ("."), and hyphen ("-") are allowed. For resiliency, programs
   interpreting URLs should treat upper case letters as equivalent to
   lower case in scheme names (e.g., allow "HTTP" as well as "http").

Note that letters in the scheme name are case-insensitive.

2. Python 2
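
As a small illustration of the rule quoted above, Python 3's urllib.parse.urlsplit normalizes the scheme to lower case, so "HTTP" and "http" compare equal after parsing. The example.com URLs are placeholders chosen for this sketch.

```python
from urllib.parse import urlsplit

upper = urlsplit("HTTP://example.com/index.html")
lower = urlsplit("http://example.com/index.html")

# The parser lower-cases the scheme, so both spellings yield the same value.
print(upper.scheme, lower.scheme)
```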

2.1

>>> import urllib
>>> url = 'http://web page.com'
>>> url_en = urllib.quote(url)    # space is encoded as "%20"
>>> url_plus = urllib.quote_plus(url)    # space is encoded as "+"
>>> url_en_twice = urllib.quote(url_en)
>>> url
'http://web page.com'
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> url_en_twice
'http%253A//web%2520page.com'    # "%25" in the result means the string was encoded twice
>>> # the corresponding decoding
>>> urllib.unquote(url_en)
'http://web page.com'
>>> urllib.unquote_plus(url_plus)
'http://web page.com'

2.2 URLs containing Chinese characters

>>> import urllib
>>> url_zh = u'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.quote(url_zh.encode('utf-8'))    # the argument must be a byte string (str), not unicode
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> print urllib.unquote(url_zh_en).decode('utf-8')
http://movie.douban.com/tag/美国

3. Python 3

3.1

>>> import urllib.parse
>>> url = 'http://web page.com'
>>> url_en = urllib.parse.quote(url)    # note: in Python 3 the function is urllib.parse.quote
>>> url_plus = urllib.parse.quote_plus(url)
>>> url_en
'http%3A//web%20page.com'
>>> url_plus
'http%3A%2F%2Fweb+page.com'
>>> urllib.parse.unquote(url_en)
'http://web page.com'
>>> urllib.parse.unquote_plus(url_plus)
'http://web page.com'
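
In the session above, quoting the whole URL also escapes the ":" after the scheme, which is usually not what you want when the result has to remain a usable link. Two common ways around that are sketched below: widening the safe parameter of quote, or quoting only the path component. The second URL (example.com with a query string) is a made-up example, not taken from the original post.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Option 1: tell quote() to leave ":" and "/" untouched.
url = "http://web page.com"
kept_scheme = quote(url, safe=":/")

# Option 2: split the URL and percent-encode only the path.
full = "http://example.com/my docs/a b.html?q=1"
parts = urlsplit(full)
rebuilt = urlunsplit(parts._replace(path=quote(parts.path)))

print(kept_scheme)
print(rebuilt)
```

Option 2 is usually the safer choice for real links, because each component of a URL has its own set of reserved characters.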

3.2 URLs containing Chinese characters

>>> import urllib.parse
>>> url_zh = 'http://movie.douban.com/tag/美国'
>>> url_zh_en = urllib.parse.quote(url_zh)
>>> url_zh_en
'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
>>> urllib.parse.unquote(url_zh_en)
'http://movie.douban.com/tag/美国'
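
Python 3's quote and unquote default to UTF-8, which is why no explicit encode or decode call is needed above. Both accept an encoding argument, and the same text produces different percent-escapes under a different charset. The sketch below uses GBK purely as an illustration of why the encoder and decoder must agree.

```python
from urllib.parse import quote, unquote

text = "美国"

utf8_escaped = quote(text)                  # UTF-8 is the default
gbk_escaped = quote(text, encoding="gbk")   # same text, different bytes

# Decoding only round-trips when the same charset is used on both sides.
back_from_gbk = unquote(gbk_escaped, encoding="gbk")

# With the wrong charset the invalid bytes are replaced, and the text is lost.
wrong = unquote(gbk_escaped)  # decoded as UTF-8 with errors="replace"

print(utf8_escaped, gbk_escaped, back_from_gbk)
```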

4. Miscellaneous

>>> help(urllib.urlencode)
Help on function urlencode in module urllib:

urlencode(query, doseq=0)
    Encode a sequence of two-element tuples or dictionary into a URL query string.

    If any values in the query arg are sequences and doseq is true, each
    sequence element is converted to a separate parameter.

    If the query arg is a sequence of two-element tuples, the order of the
    parameters in the output will match the order of parameters in the
    input.

>>>
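
The help text above is from Python 2. In Python 3 the function lives in urllib.parse, and a minimal sketch of its use looks like the following. The parameter names textfield and pageType are borrowed from the question at the top of this page, and the tag values are made up for the doseq illustration.

```python
from urllib.parse import urlencode, parse_qs

# Build a query string; values are percent-encoded as UTF-8 by default.
query = urlencode({"textfield": "自动化", "pageType": 1})
link = "/searchOfficeNum.do?" + query

# With doseq=True each element of a sequence becomes its own parameter.
multi = urlencode({"tag": ["美国", "法国"]}, doseq=True)

# parse_qs reverses the process; every value comes back as a list of strings.
decoded = parse_qs(query)

print(link)
print(multi)
print(decoded)
```

Building the query with urlencode avoids hand-concatenating "&" and "=" and ensures every value is escaped consistently.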
