Parsing JSON response using text summarization API, encoding error in response

Posted: 2020-11-27 14:47:11

I am using the service at https://www.meaningcloud.com/products/automatic-summarization for text summarization. I am using .NET 5.

For example, I want to shorten this news article: https://e.vnexpress.net/news/business/economy/vn-index-rises-for-third-straight-session-4141865.html

string content = "..."; // long content of the news post.
var client = new RestClient("https://api.meaningcloud.com/summarization-1.0");
client.Timeout = -1;
var request = new RestRequest(Method.POST);
request.AddParameter("key", "25870359b682ec3c93f9becd850eb459");  // fake token because this content is public.        
request.AddParameter("sentences", 4);
request.AddParameter("txt", JsonEncodedText.Encode(content));

IRestResponse response = client.Execute(request);
System.Threading.Thread.Sleep(3000);
var res = JObject.Parse(response.Content);
// Need to convert \r\n and \r\n\r\n to a space.
string short_content = res["summary"].ToString();
// SysUtil.StringEncodingConvert(short_content, "ISO-8859-1", "UTF-8");            
string result = raw_string.Replace(" [...] ", " ");
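The cleanup described in the comments above (turning \r\n and \r\n\r\n into a space and dropping the " [...] " markers) could be sketched as follows. SummaryCleaner and Clean are hypothetical names, not part of RestSharp or the MeaningCloud API, and the sketch assumes the summary has already been read into a string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SummaryCleaner
{
    // Hypothetical helper: tidies a summary string for single-line display.
    public static string Clean(string summary)
    {
        if (summary == null) throw new ArgumentNullException(nameof(summary));

        // Drop the " [...] " markers, as the question's code does.
        string withoutMarkers = summary.Replace(" [...] ", " ");

        // Collapse any run of \r\n (or bare \n) into a single space.
        return Regex.Replace(withoutMarkers, @"(\r?\n)+", " ").Trim();
    }
}
```

Calling SummaryCleaner.Clean(res["summary"].ToString()) would then stand in for the final Replace call.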

Input

The benchmark VN-Index saw steady growth throughout the day, gradually gaining a total of 10.23 points by the end of the session. The Ho Chi Minh Stock Exchange (HoSE), on which the index is based, saw 300 stocks gain and 78 lose. Total trading volume improved 48 percent over the previous session, reaching VND6.2 trillion ($269 million). The VN30-Index, a basket of HoSE’s 30 largest capped stocks, rose 1.63 percent, with 27 gaining and 2 losing. Its top gainers were SAB of Vietnam’s largest brewer Sabeco, up 4.8 percent, followed by VJC of budget airline Vietjet, up 2.8 percent, and MWG of electronics retailer Mobile World, up 2.2 percent. Of Vietnam’s biggest state-owned lenders by assets, BID of BIDV climbed 0.85 percent, VCB of Vietcombank 0.8 percent, and CTG of VietinBank 0.6 percent. HDB of HDBank and TCB of Techcombank led gains of private banks at 0.85 percent and 0.6 percent respectively. Other gainers included PNJ of Phu Nhuan Jewelry with 1.4 percent, HPG of steel producer Hoa Phat, 1.1 percent, and MSN of conglomerate Masan, 1 percent. The only two VN30 tickers that ended in the red were VIC of conglomerate Vingroup, down 1 percent, and PLX of fuel distributor Petrolimex, down 0.05 percent. The HNX-Index for stocks on the Hanoi Stock Exchange, home to mid and small caps, rose 1.35 percent, and the UPCoM-Index for stocks on the Unlisted Public Companies Market added 0.3 percent. Foreign investors turned net buyers to the tune of VND15.7 billion ($681,600), with buying pressure focused mainly on HPG and VHM of real estate giant Vinhomes.

Output after text summarization (4 sentences)

The benchmark VN-Index saw steady growth throughout the day, gradually gaining a total of 10.23 points by the end of the session. The VN30-Index, a basket of HoSE\u2019s 30 largest capped stocks, rose 63 percent, with 27 gaining and 2 losing. Of Vietnam\u2019s biggest state-owned lenders by assets, BID of BIDV climbed 0.85 percent, VCB of Vietcombank 0.8 percent, and CTG of VietinBank 0.6 percent. The HNX-Index for stocks on the Hanoi Stock Exchange, home to mid and small caps, rose 1.35 percent, and the UPCoM-Index for stocks on the Unlisted Public Companies Market added 0.3 percent.

I also tried using a utility class:

using System;

namespace myproj.Controllers
{
    public class SysUtil
    {
        public static String StringEncodingConvert(String strText, String strSrcEncoding, String strDestEncoding)
        {
            System.Text.Encoding srcEnc = System.Text.Encoding.GetEncoding(strSrcEncoding);
            System.Text.Encoding destEnc = System.Text.Encoding.GetEncoding(strDestEncoding);
            byte[] bData = srcEnc.GetBytes(strText);
            byte[] bResult = System.Text.Encoding.Convert(srcEnc, destEnc, bData);
            return destEnc.GetString(bResult);
        }
    }
}

But it did not work.
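A possible explanation, offered as an assumption rather than a confirmed diagnosis: if the summary holds the six literal characters \u2019 (all plain ASCII) rather than the single character U+2019, an ISO-8859-1 to UTF-8 round trip has nothing to change, because ASCII bytes are identical in both encodings. The sketch below repeats the steps of SysUtil.StringEncodingConvert with fixed encodings; EncodingRoundTrip and Latin1ToUtf8 are hypothetical names.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Same steps as SysUtil.StringEncodingConvert, with the encodings fixed
    // to ISO-8859-1 (source) and UTF-8 (destination).
    public static string Latin1ToUtf8(string text)
    {
        Encoding src = Encoding.GetEncoding("ISO-8859-1");
        Encoding dest = Encoding.UTF8;
        byte[] converted = Encoding.Convert(src, dest, src.GetBytes(text));
        return dest.GetString(converted);
    }
}
```

Passing the verbatim string @"HoSE\u2019s" through Latin1ToUtf8 gives back the same text, so the escape sequence survives the conversion.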

Even a direct replacement did not work:

string result2 = result.Replace("\u2019s", "'s");

The problem I found:

\u2019s --> I need 's. How can I achieve this?

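Comments:

The issue may be related to the application's default encoding (or the operating system's default encoding). I tested the following code: string oriString = "a basket of HoSE\u2019s 30 largest capped stocks"; var result = oriString.Replace("\u2019", "'"); and it displays 's directly rather than \u2019. Try checking this on your side. Also, check your code string short_content = res["summary"].ToString(); string result = raw_string.Replace(" [...] ", " "); and make sure you are using the correct variable names (short_content and raw_string).

Answer 1: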

\u2019 is the Unicode character for a smart quote (the right single quotation mark). Just replace it:

result2 = result.Replace('\u2019', '\'');
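One caveat, stated as an assumption the thread does not confirm: in C# source, '\u2019' and "\u2019" denote the single character U+2019, while the summary shown in the question appears to contain the six literal characters backslash, u, 2, 0, 1, 9. That can happen when text is escaped before it is sent, since JsonEncodedText.Encode escapes non-ASCII characters by default. A minimal sketch that handles both forms follows; QuoteFixer and NormalizeApostrophes are hypothetical names.

```csharp
using System;

public static class QuoteFixer
{
    // Hypothetical helper: maps both the literal text \u2019 and the real
    // U+2019 character to a plain ASCII apostrophe.
    public static string NormalizeApostrophes(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        return text
            .Replace(@"\u2019", "'")   // verbatim string: six literal characters
            .Replace('\u2019', '\'');  // char literal: the single character U+2019
    }
}
```

With this helper, QuoteFixer.NormalizeApostrophes(short_content) covers the case where the escape arrives as literal text as well as the case the answer addresses.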
