Fixing Image Links in Web Pages with Jsoup

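While implementing sharing to Facebook and WhatsApp, we ran into two problems with the Google short links we shared. Facebook did not pick up the large image, so our posts looked different from a competitor's, and the short links shared to WhatsApp showed neither an image nor a description.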

 

WhatsApp:

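  I analyzed a competitor, UCNews, and found that the pages their WhatsApp links point to carry the relevant properties in meta tags. After adding the same tags to our own site, the problem was solved.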

<meta property="og:image" content="http://img.masala-sg.goldenmob.com/img/f6cb4bd725a7ab15dac6579f769a4c5f/i_0_mla_1480059457-608.jpg">
<meta property="twitter:image" content="http://img.masala-sg.goldenmob.com/img/f6cb4bd725a7ab15dac6579f769a4c5f/i_0_mla_1480059457-608.jpg">
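
If the tags are rendered on the server, they can be generated from the image URL. The following is a minimal sketch using only the JDK; the class name `ShareMetaTags` and both method names are hypothetical and are not part of the original project. It escapes the URL before placing it in the attribute, which matters if the URL ever carries a query string.

```java
final class ShareMetaTags {
    // Escape the characters that are unsafe inside a double-quoted HTML attribute.
    static String escapeAttr(String value) {
        return value.replace("&", "&amp;")
                    .replace("\"", "&quot;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Build the two image tags shown above for one image URL.
    static String imageMetaTags(String imageUrl) {
        String url = escapeAttr(imageUrl);
        return "<meta property=\"og:image\" content=\"" + url + "\">\n"
             + "<meta property=\"twitter:image\" content=\"" + url + "\">";
    }

    public static void main(String[] args) {
        System.out.println(imageMetaTags(
            "http://img.masala-sg.goldenmob.com/img/f6cb4bd725a7ab15dac6579f769a4c5f/i_0_mla_1480059457-608.jpg"));
    }
}
```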

 

Facebook:

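  After repeated testing, I found that Facebook chooses how to present a shared link based on the size of the images in the linked page. I had been using the 480 medium-size images, so every image link in the page had to be changed to the 608 large size. We store images on Alibaba Cloud OSS, and all images are processed when the crawler stores them, so the different sizes of one image differ only in the link suffix. The task therefore comes down to rewriting the image addresses across the whole page.

  The whole Content is one large JSON object. After fetching it, I need to parse the HTML in its tcontent field and replace the image links there.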

{"image":[{"url":"http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/i_0_95db3c9b23b5de19685ca86fd1707c5b-208","w":480,"h":216,"s":23092},{"url":"http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/i_1_honeyjar-208.jpg","w":362,"h":331,"s":76833},{"url":"http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/i_2_718fdc7d637fe88f31c77eb90d8e22a0-208","w":480,"h":274,"s":24234},{"url":"http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/i_3_muchgarlicpowderequalsoneclovegarlic_5c49a0162-208","w":700,"h":394,"s":31888}],"tcontent":"<p> <img src=\\"http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/i_0_95db3c9b23b5de19685ca86fd1707c5b-480\\"></p><p></p> <p></p>\\n<p>????? ?? ??? ?? ???? ?? ?????? ??? ??, ???? ???? ???? ????? ?? ??? ???? ?? ??? ???? ???? ???</p>\\n<p>??? ?? ?? ????? ????? ???? ??? ?? ???? ?? ??? ?? ???? ?? ???? ??? ??? ???? ???? ??, ???? ??? ???? ?? ?? ???? ??????? ??????? ????? ?? ??? ??? ??? ??????? ??????? ????? ?? ???? ?? ?? ????? ?? ?? ??? ?? ????????? ??? ???? ???? ?? ????? ?? ????? ??? ?? ????? ?? ??? ?? ?? ??? ???? ?? ???? ?? ?? ??????????? ?? ??? ???? ???? ?? ?? ?????? ?? ???? ??? ???</p>\\n<p><img src=\\"http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/i_1_honeyjar-480.jpg\\"></p>\\n<p>??? ????? ?? ???? ???? 2-3 ???? ????? ?? ??? ?? ?????? ?? ??? ?? ??? ?????? ?? ??? ????? ????? ?????? ??? ???????? ??? ??? ??? ?? ???? ??? ?? ???? ??????, ????? ????? ??? ???? ??? ??? ???? ??? ??? ???? ???? ??? 7 ????? ?? ????? ?? ??? ?????? ????? ????? ?????? ?? ????? ??? ?? ?? ?????? ???? ???????? ?? ???????????? ?? ?? ???? ?? ??? ???? ??? ??? ?? ??? ???? ?? ??? ?? ?? ???? ???</p>\\n<p> <img src=\\"http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/i_2_718fdc7d637fe88f31c77eb90d8e22a0-480\\"></p><p></p> <p></p>\\n<p>????? ?? ??? ?? ??? ?? ?? ??? ?? ?????? ?? ???? ?? ?? ??? ?? ??????? ??????? ?? ????? ?? ???? ??? ??????? 
??????? ????? ???? ?? ???? ???? ?? ??? ?? ??? ???? ?? ?? ??? ??? ?????? ???? ????? ?? ?????? ?? ???? ?? ???? ?? ???? ???? ??????? ??? ??? ??? ???? ???? ??, ????? ??? ?? ?????? ??? ?????? ?? ???? ?? ????? ???? ??? ???? ???? ?? ??????? ???? ???</p>\\n<p>?? ?????? ?? ???? ?? ??? ?? ??????? ??? ???? ?? ???????? ????? ????-??????????? ??? ???? ?? ??? ?? ???? ?? ???? ?? ?? ???? ??? ??? ???? ?? ??????? ?? ??? ?? ??, ??? ???? ?????? ??????? ???? ???? ???? ????? ???????? ?? ????? ?? ??? ?? ??????? ?? ??????? ???? ???? ?? ?????-????? ?? ??? ????? ?? ????? ?? ???? ?? ?? ???? ???</p>\\n<p><img src=\\"http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/i_3_muchgarlicpowderequalsoneclovegarlic_5c49a0162-480\\"></p>\\n<p>?? ?????? ???? ?? ????? ?????? ?? ?? ????????? ?? ??? ???? ??? ???? ?????????, ???? ?? ?? ????? ?? ???? ???? ???, ????? ??????????????? ????? ?? ??? ?? ?????? ??????????? ?? ??? ?? ?? ???? ?? ????? ??? ?? ?? ????????? ???????? ?????? ??, ???? ???? ?? ???? ?? ????? ?? ????? ?????? ???? ?????? ???</p>"}
Content

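  Initial approach:

    I used String's replaceAll method directly to change 480 to 608 and to append .webp to image links that had no extension. Nothing went wrong during testing, but after the project went live, OSS frequently logged errors like the ones below. Analysis showed that for tags such as <img src="xxx-480">, where the image has no extension, the replacement could only change 480 to 608 and failed to add the extension. When the page was then loaded, the browser's request for the image carried the HTML that follows the link along to OSS. This mostly happened in Hindi, Marathi, and Tamil articles. The content also contains some Unicode-escaped text. I tried transcoding and regular expressions, but neither worked reliably.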

/img/7bf6da255a03b8b93ffa985957da34a1/i_2_718fdc7d637fe88f31c77eb90d8e22a0-608.%3E%3C/p%3E%3Cp%3E%3C/p%3E%20%3Cp%3E%3C/p%3E%3Cp%3E%E0%A4%B2%E0%A4%B9%E0%A4%B8%E0%A5%81%E0%A4%A8%20%E0%A4%94%E0%A4%B0%20%E0%A4%B6%E0%A4%B9%E0%A4%A6%20%E0%A4%95%E0%A5%87%20%E0%A4%AE%E0%A5%87%E0%A4%B2%20%E0%A4%B8%E0%A5%87%20%E0%A4%87%E0%A4%B8%20%E0%A4%98%E0%A5%8B%E0%A4%B2%20%E0%A4%95%E0%A5%80%20%E0%A4%B6%E0%A4%95%E0%A5%8D%E2%80%8D%E0%A4%A4%E0%A4%BF%20%E0%A4%AC%E0%A4%A2%20%E0%A4%9C%E0%A4%BE%E0%A4%A4%E0%A5%80%20%E0%A4%B9%E0%A5%88%20%E0%A4%94%E0%A4%B0%20%E0%A4%AB%E0%A4%BF%E0%A4%B0%20%E0%A4%AF%E0%A4%B9%20%E0%A4%87%E0%A4%AE%E0%A5%8D%E2%80%8D%E0%A4%AF%E0%A5%82%E0%A4%A8%20%E0%A4%B8%E0%A4%BF%E0%A4%B8%E0%A5%8D%E2%80%8D%E0%A4%9F%E0%A4%AE%20%E0%A4%95%E0%A5%8B%20%E0%A4%AE%E0%A4%9C%E0%A4%AC%E0%A5%82%E0%A4%A4%20%E0%A4%95%E0%A4%B0%20%E0%A4%A6%E0%A5%87%E0%A4%A4%E0%A4%BE%20%E0%A4%B9%E0%A5%88%E0%A5%A4%20%E0%A4%87%E0%A4%AE%E0%A5%8D%E2%80%8D%E0%A4%AF%E0%A5%82%E0%A4%A8%20%E0%A4%B8%E0%A4%BF%E0%A4%B8%E0%A5%8D%E2%80%8D%E0%A4%9F%E0%A4%AE%20%E0%A4%AE%E0%A4%9C%E0%A4%AC%E0%A5%82%E0%A4%A4%20%E0%A4%B9%E0%A5%8B%E0%A4%A8%E0%A5%87%20%E0%A4%B8%E0%A5%87%20%E0%A4%B6%E0%A4%B0%E0%A5%80%E0%A4%B0%20%E0%A4%AE%E0%A5%8C%E0%A4%B8%E0%A4%AE%20%E0%A4%95%E0%A5%80%20%E0%A4%AE%E0%A4%BE%E0%A4%B0%20%E0%A4%B8%E0%A5%87%20%E0%A4%AC%E0%A4%9A%E0%A4%BE%20%E0%A4%B0%E0%A4%B9%E0%A4%A4%E0%A4%BE%20%E0%A4%B9%E0%A5%88%20%E0%A4%94%E0%A4%B0%20%E0%A4%89%E0%A4%B8%E0%A5%87%20%E0%A4%95%E0%A5%8B%E0%A4%88%20%E0%A4%AC%E0%A5%80%E0%A4%AE%E0%A4%BE%E0%A4%B0%E0%A5%80%20%E0%A4%A8%E0%A4%B9%E0%A5%80%E0%A4%82%20%E0%A4%B9%E0%A5%8B%E0%A4%A4%E0%A5%80%E0%A5%A4%20%E0%A4%87%E0%A4%B8%20%E0%A4%AE%E0%A4%BF%E0%A4%B6%E0%A5%8D%E0%A4%B0%E0%A4%A3%20%E0%A4%95%E0%A5%8B%20%E0%A4%96%E0%A4%BE%E0%A4%A8%E0%A5%87%20%E0%A4%B8%E0%A5%87%20%E0%A4%B9%E0%A5%83%E0%A4%A6%E0%A4%AF%20%E0%A4%A4%E0%A4%95%20%E0%A4%9C%E0%A4%BE%E0%A4%A8%E0%A5%87%20%E0%A4%B5%E0%A4%BE%E0%A4%B2%E0%A5%80%20%E0%A4%A7%E0%A4%AE%E0%A4%A8%E0%A4%BF%E0%A4%AF%E0%A5%8B%E0%A4%82%20%E0%A4%AE%E0%A5%87%E0%A4%82%20%E0%A4%9C%E0%A4%AE%E0%A4%BE%
20%E0%A4%B5%E0%A4%B8%E0%A4%BE%20%E0%A4%A8%E0%A4%BF%E0%A4%95%E0%A4%B2%20%E0%A4%9C%E0%A4%BE%E0%A4%A4%E0%A4%BE%20%E0%A4%B9%E0%A5%88,%20%E0%A4%9C%E0%A4%BF%E0%A4%B8%E0%A4%B8%E0%A5%87%20%E0%A4%96%E0%A5%82%E0%A4%A8%20%E0%A4%95%E0%A4%BE%20%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A4%B5%E0%A4%BE%E0%A4%B9%20%E0%A4%A0%E0%A5%80%E0%A4%95%20%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A4%95%E0%A4%BE%E0%A4%B0%20%E0%A4%B8%E0%A5%87%20%E0%A4%B9%E0%A5%83%E0%A4%A6%E0%A4%AF%20%E0%A4%A4%E0%A4%95%20%E0%A4%AA%E0%A4%B9%E0%A5%81%E0%A4%82%E0%A4%9A%20%E0%A4%AA%E0%A4%BE%E0%A4%A4%E0%A4%BE%20%E0%A4%B9%E0%A5%88%E0%A5%A4%20%E0%A4%87%E0%A4%B8%E0%A4%B8%E0%A5%87%20%E0%A4%B9%E0%A5%83%E0%A4%A6%E0%A4%AF%20%E0%A4%95%E0%A5%80%20%E0%A4%B8%E0%A5%81%E0%A4%B0%E0%A4%95%E0%A5%8D%E0%A4%B7%E0%A4%BE%20%E0%A4%B9%E0%A5%8B%E0%A4%A4%E0%A5%80%20%E0%A4%B9%E0%A5%88%E0%A5%A4%3C/p%3E%3Cp%3E%E0%A4%87%E0%A4%B8%20%E0%A4%AE%E0%A4%BF%E0%A4%B6%E0%A5%8D%E0%A4%B0%E0%A4%A3%20%E0%A4%95%E0%A5%8B%20%E0%A4%B2%E0%A5%87%E0%A4%A8%E0%A5%87%20%E0%A4%B8%E0%A5%87%20%E0%A4%97%E0%A4%B2%E0%A5%87%20%E0%A4%95%E0%A4%BE%20%E0%A4%B8%E0%A4%82%E0%A4%95%E0%A5%8D%E0%A4%B0%E0%A4%AE%E0%A4%A3%20%E0%A4%A6%E0%A5%82%E0%A4%B0%20%E0%A4%B9%E0%A5%8B%E0%A4%A4%E0%A4%BE%20%E0%A4%B9%E0%A5%88%20%E0%A4%95%E0%A5%8D%E2%80%8D%E0%A4%AF%E0%A5%8B%E0%A4%82%E0%A4%95%E0%A4%BF%20%E0%A4%87%E0%A4%B8%E0%A4%AE%E0%A5%87%E0%A4%82%20%E0%A4%8F%E0%A4%82%E0%A4%9F%E0%A5%80-%E0%A4%87%E0%A4%82%E0%A4%AB%E0%A5%8D%E0%A4%B2%E0%A5%87%E0%A4%AE%E0%A5%87%E0%A4%9F%E0%A4%B0%E0%A5%80%20%E0%A4%97%E0%A5%81%E0%A4%A3%20%E0%A4%B9%E0%A5%88%E0%A4%82%E0%A5%A4%20%E0%A4%AF%E0%A4%B9%20%E0%A4%97%E0%A4%B2%E0%A5%87%20%E0%A4%95%E0%A5%80%20%E0%A4%96%E0%A4%B0%E0%A4%BE%E0%A4%B6%20%E0%A4%94%E0%A4%B0%20%E0%A4%B8%E0%A5%82%E0%A4%9C%E0%A4%A8%20%E0%A4%95%E0%A5%8B%20%E0%A4%95%E0%A4%AE%20%E0%A4%95%E0%A4%B0%E0%A4%A4%E0%A4%BE%20%E0%A4%B9%E0%A5%88%E0%A5%A4%20%E0%A4%85%E0%A4%97%E0%A4%B0%20%E0%A4%95%E0%A4%BF%E0%A4%B8%E0%A5%80%20%E0%A4%95%E0%A5%8B%20%E0%A4%A1%E0%A4%BE%E0%A4%AF%E0%A4%B0%E0%A4%BF%E0%A4%AF%E0%A4%BE%20%E0%A4%B9%E0%A5%8B%20%E0%A4%B0%E0%A4%B9%E0%A4%
BE%20%E0%A4%B9%E0%A5%8B%20%E0%A4%A4%E0%A5%8B,%20%E0%A4%89%E0%A4%B8%E0%A5%87%20%E0%A4%87%E0%A4%B8%E0%A4%95%E0%A4%BE%20%E0%A4%AE%E0%A4%BF%E0%A4%B6%E0%A5%8D%E0%A4%B0%E0%A4%A3%20%E0%A4%96%E0%A4%BF%E0%A4%B2%E0%A4%BE%E0%A4%8F%E0%A4%82%E0%A5%A4%20%E0%A4%87%E0%A4%B8%E0%A4%B8%E0%A5%87%20%E0%A4%89%E0%A4%B8%E0%A4%95%E0%A4%BE%20%E0%A4%AA%E0%A4%BE%E0%A4%9A%E0%A4%A8%20%E0%A4%A4%E0%A4%82%E0%A4%A4%E0%A5%8D%E0%A4%B0%20%E0%A4%A6%E0%A5%81%E0%A4%B0%E0%A5%81%E0%A4%B8%E0%A5%8D%E2%80%8D%E0%A4%A4%20%E0%A4%B9%E0%A5%8B%20%E0%A4%9C%E0%A4%BE%E0%A4%8F%E0%A4%97%E0%A4%BE%20%E0%A4%94%E0%A4%B0%20%E0%A4%AA%E0%A5%87%E0%A4%9F%20%E0%A4%95%E0%A5%87%20%E0%A4%B8%E0%A4%82%E0%A4%95%E0%A5%8D%E0%A4%B0%E0%A4%AE%E0%A4%A3%20%E0%A4%AE%E0%A4%B0%20%E0%A4%9C%E0%A4%BE%E0%A4%8F%E0%A4%82%E0%A4%97%E0%A5%87%E0%A5%A4%20%E0%A4%87%E0%A4%B8%E0%A4%95%E0%A5%8B%20%E0%A4%96%E0%A4%BE%E0%A4%A8%E0%A5%87%20%E0%A4%B8%E0%A5%87%20%E0%A4%B8%E0%A4%B0%E0%A5%8D%E0%A4%A6%E0%A5%80-%E0%A4%9C%E0%A5%81%E0%A4%96%E0%A4%BE%E0%A4%AE%20%E0%A4%95%E0%A5%87%20%E0%A4%B8%E0%A4%BE%E0%A4%A5%20%E0%A4%B8%E0%A4%BE%E0%A4%87%E0%A4%A8%E0%A4%B8%20%E0%A4%95%E0%A5%80%20%E0%A4%A4%E0%A4%95%E0%A4%B2%E0%A5%80%E0%A4%AB%20%E0%A4%AD%E0%A5%80%20%E0%A4%95%E0%A4%BE%E0%A4%AB%E0%A5%80%20%E0%A4%95%E0%A4%AE%20%E0%A4%B9%E0%A5%8B%20%E0%A4%9C%E0%A4%BE%E0%A4%A4%E0%A5%80%20%E0%A4%B9%E0%A5%88%E0%A5%A4%3C/p%3E%3Cp%3E%3Cimg%20src=
/img/7bf6da255a03b8b93ffa985957da34a1/i_0_95db3c9b23b5de19685ca86fd1707c5b-608.%3E%3C/p%3E%3Cp%3E%3C/p%3E%20%3Cp%3E%3C/p%3E%3Cp%3E%E0%A4%B2%E0%A4%B9%E0%A4%B8%E0%A5%81%E0%A4%A8%20%E0%A4%94%E0%A4%B0%20%E0%A4%B6%E0%A4%B9%E0%A4%A6%20%E0%A4%8F%E0%A4%95%20%E0%A4%AC%E0%A4%B9%E0%A5%81%E0%A4%A4%20%E0%A4%B9%E0%A5%80%20%E0%A4%AA%E0%A5%81%E0%A4%B0%E0%A4%BE%E0%A4%A8%E0%A5%80%20%E0%A4%A6%E0%A4%B5%E0%A4%BE%20%E0%A4%B9%E0%A5%88,%20%E0%A4%9C%E0%A4%BF%E0%A4%B8%E0%A5%87%20%E0%A4%AC%E0%A4%A1%E0%A5%87%E0%A4%BC%20%E0%A4%AC%E0%A4%A1%E0%A5%87%E0%A4%BC%20%E0%A4%B0%E0%A5%8B%E0%A4%97%E0%A5%8B%E0%A4%82%20%E0%A4%95%E0%A5%8B%20%E0%A4%A6%E0%A5%82%E0%A4%B0%20%E0%A4%95%E0%A4%B0%E0%A4%A8%E0%A5%87%20%E0%A4%95%E0%A5%87%20%E0%A4%B2%E0%A4%BF%E0%A4%8F%20%E0%A4%96%E0%A4%BE%E0%A4%AF%E0%A4%BE%20%E0%A4%9C%E0%A4%BE%E0%A4%A4%E0%A4%BE%20%E0%A4%A5%E0%A4%BE%E0%A5%A4%3C/p%3E%3Cp%3E%E0%A4%85%E0%A4%97%E0%A4%B0%20%E0%A4%86%E0%A4%AA%20%E0%A4%B9%E0%A4%B0%20%E0%A4%B5%E0%A4%95%E0%A5%8D%E2%80%8D%E0%A4%A4%20%E0%A4%AC%E0%A5%80%E0%A4%AE%E0%A4%BE%E0%A4%B0%20%E0%A4%B0%E0%A4%B9%E0%A4%A4%E0%A5%87%20%E0%A4%B9%E0%A5%88%E0%A4%82%20%E0%A4%94%E0%A4%B0%20%E0%A4%A5%E0%A4%95%E0%A4%BE%E0%A4%A8%20%E0%A4%95%E0%A5%80%20%E0%A4%B5%E0%A4%9C%E0%A4%B9%20%E0%A4%B8%E0%A5%87%20%E0%A4%86%E0%A4%AA%E0%A4%95%E0%A4%BE%20%E0%A4%AE%E0%A4%A8%20%E0%A4%95%E0%A4%BF%E0%A4%B8%E0%A5%80%20%E0%A4%95%E0%A4%BE%E0%A4%AE%20%E0%A4%AE%E0%A5%87%E0%A4%82%20%E0%A4%A8%E0%A4%B9%E0%A5%80%E0%A4%82%20%E0%A4%B2%E0%A4%97%E0%A4%A4%E0%A4%BE%20%E0%A4%A4%E0%A5%8B,%20%E0%A4%87%E0%A4%B8%E0%A4%95%E0%A4%BE%20%E0%A4%B8%E0%A4%BE%E0%A4%AB%20%E0%A4%AE%E0%A4%A4%E0%A4%B2%E0%A4%AC%20%E0%A4%B9%E0%A5%88%20%E0%A4%95%E0%A4%BF%20%E0%A4%86%E0%A4%AA%E0%A4%95%E0%A4%BE%20%E0%A4%87%E0%A4%AE%E0%A5%8D%E2%80%8D%E0%A4%AF%E0%A5%82%E0%A4%A8%20%E0%A4%B8%E0%A4%BF%E0%A4%B8%E0%A5%8D%E2%80%8D%E0%A4%9F%E0%A4%AE%20%E0%A4%95%E0%A4%AE%E0%A4%9C%E0%A5%8B%E0%A4%B0%20%E0%A4%B9%E0%A5%8B%20%E0%A4%97%E0%A4%AF%E0%A4%BE%20%E0%A4%B9%E0%A5%88%E0%A5%A4%20%E0%A4%85%E0%A4%97%E0%A4%B0%20%E0%A4%87%E0%A4%AE%E0%A5%8D%E2%80%8D%E0%A4%AF%E
0%A5%82%E0%A4%A8%20%E0%A4%B8%E0%A4%BF%E0%A4%B8%E0%A5%8D%E2%80%8D%E0%A4%9F%E0%A4%AE%20%E0%A4%95%E0%A4%AE%E0%A4%9C%E0%A5%8B%E0%A4%B0%20%E0%A4%B9%E0%A5%8B%20%E0%A4%9C%E0%A4%BE%E0%A4%A4%E0%A4%BE%20%E0%A4%B9%E0%A5%88%20%E0%A4%A4%E0%A5%8B%20%E0%A4%87%E0%A4%82%E0%A4%B8%E0%A4%BE%E0%A4%A8%20%E0%A4%95%E0%A5%8B%20%E0%A4%B8%E0%A5%8C%20%E0%A4%A4%E0%A4%B0%E0%A4%B9%20%E0%A4%95%E0%A5%80%20%E0%A4%AC%E0%A5%80%E0%A4%AE%E0%A4%BE%E0%A4%B0%E0%A4%BF%E0%A4%AF%E0%A4%BE%E0%A4%82%20%E0%A4%98%E0%A5%87%E0%A4%B0%20%E0%A4%B2%E0%A5%87%E0%A4%A4%E0%A5%80%20%E0%A4%B9%E0%A5%88%E0%A4%82%E0%A5%A4%20%E0%A4%AA%E0%A4%B0%20%E0%A4%95%E0%A5%8D%E2%80%8D%E0%A4%AF%E0%A4%BE%20%E0%A4%86%E0%A4%AA%20%E0%A4%9C%E0%A4%BE%E0%A4%A8%E0%A4%A4%E0%A5%87%20%E0%A4%B9%E0%A5%88%E0%A4%82%20%E0%A4%95%E0%A4%BF%20%E0%A4%B2%E0%A4%B9%E0%A4%B8%E0%A5%81%E0%A4%A8%20%E0%A4%94%E0%A4%B0%20%E0%A4%B6%E0%A4%B9%E0%A4%A6%20%E0%A4%95%E0%A5%8B%20%E0%A4%8F%E0%A4%95%20%E0%A4%B8%E0%A4%BE%E0%A4%A5%20%E0%A4%AE%E0%A4%BF%E0%A4%B2%E0%A4%BE%20%E0%A4%95%E0%A4%B0%20%E0%A4%96%E0%A4%BE%E0%A4%A8%E0%A5%87%20%E0%A4%B8%E0%A5%87%20%E0%A4%AF%E0%A5%87%20%E0%A4%8F%E0%A4%82%E0%A4%9F%E0%A5%80%E0%A4%AC%E0%A4%BE%E0%A4%AF%E0%A5%8B%E0%A4%9F%E0%A4%BF%E0%A4%95%20%E0%A4%95%E0%A4%BE%20%E0%A4%95%E0%A4%BE%E0%A4%AE%20%E0%A4%95%E0%A4%B0%E0%A4%A4%E0%A5%87%20%E0%A4%B9%E0%A5%88%E0%A4%82%E0%A5%A4%20%E0%A4%AF%E0%A4%B9%20%E0%A4%8F%E0%A4%95%20%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A4%95%E0%A4%BE%E0%A4%B0%20%E0%A4%95%E0%A4%BE%20%E0%A4%B8%E0%A5%82%E0%A4%AA%E0%A4%B0%20%E0%A4%AB%E0%A5%82%E0%A4%A1%20%E0%A4%B9%E0%A5%88%E0%A5%A4%3C/p%3E%3Cp%3E%3Cimg%20src=
OSS Error
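
Some of the fragility of blanket string replacement can be reproduced in isolation. The sketch below is not the original production code. It is a simplified stand-in (the class `NaiveReplaceDemo` and the sample URL are made up) showing two things that a plain `replaceAll("480", "608")` cannot get right by itself: it rewrites every "480" in the string, including one inside the hash directory, and it has no notion of where the URL ends, so it cannot add an extension.

```java
final class NaiveReplaceDemo {
    // Blanket replacement: every "480" in the HTML becomes "608".
    static String naive(String html) {
        return html.replaceAll("480", "608");
    }

    public static void main(String[] args) {
        // Hypothetical URL whose directory name happens to contain "480".
        String html = "<p><img src=\"http://img.example.com/img/ab480cd/i_0_pic-480\"></p>";
        // prints <p><img src="http://img.example.com/img/ab608cd/i_0_pic-608"></p>
        System.out.println(naive(html));
    }
}
```

The directory `ab480cd` has been turned into `ab608cd`, and the link still has no `.webp` extension. Any "480" in the article text itself would be rewritten as well.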

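  Improved approach (using jsoup):

    In the improved version I use jsoup to parse the whole HTML, read each image's address from the parsed element, and then replace the size and append the extension on that value alone. This solved the problem.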

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static String replaceImgOfContent(String content) {
    // Parse the fragment so that each src is read as an attribute value, not as raw text.
    Document doc = Jsoup.parseBodyFragment(content);
    Elements imgs = doc.getElementsByTag("img");
    for (Element img : imgs) {
        // Drop attributes the shared page does not need.
        img.removeAttr("data-src");
        img.removeAttr("data-lazy-src");
        img.removeAttr("alt");
        img.removeAttr("title");
        img.removeAttr("srcset");
        img.removeAttr("id");
        img.removeAttr("class");
        String src = img.attr("src");
        // Leave GIFs and images hosted elsewhere untouched.
        if (src.endsWith(".gif") || !src.contains("masala-sg")) {
            continue;
        }
        int dot = src.lastIndexOf('.');
        if (src.length() - dot < 10) {
            // The URL ends with a short extension, e.g. "...-480.jpg".
            if (dot >= 3 && src.substring(dot - 3, dot).equals("480")) {
                src = src.substring(0, dot - 3) + "608.webp";
            }
        } else if (src.endsWith("480")) {
            // No extension: change only the trailing size, then add one.
            src = src.substring(0, src.length() - 3) + "608.webp";
        }
        img.attr("src", src);
    }
    // Serialize the modified body back to an HTML string.
    return doc.body().html();
}
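
The URL rule inside the loop can also be pulled out into a pure function, which makes it testable without any HTML. This is a sketch under stated assumptions, not the code that ran in production: the class `ImageUrlRewriter` and the method `toLargeWebp` are hypothetical names, and the pattern assumes the size is always written as `-480` at the very end of the URL, optionally followed by a short extension, as in the sample Content above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageUrlRewriter {
    // "-480" at the end of the URL, optionally followed by an extension such as ".jpg".
    private static final Pattern SIZE_480 = Pattern.compile("-480(\\.[A-Za-z0-9]{1,5})?$");

    static String toLargeWebp(String src) {
        // Same exclusions as replaceImgOfContent: GIFs and other hosts stay as they are.
        if (src.endsWith(".gif") || !src.contains("masala-sg")) {
            return src;
        }
        Matcher m = SIZE_480.matcher(src);
        // Keep everything before the size marker and append the new size and extension.
        return m.find() ? src.substring(0, m.start()) + "-608.webp" : src;
    }

    public static void main(String[] args) {
        String base = "http://img.masala-sg.goldenmob.com/img/7bf6da255a03b8b93ffa985957da34a1/";
        System.out.println(toLargeWebp(base + "i_1_honeyjar-480.jpg"));
        System.out.println(toLargeWebp(base + "i_0_95db3c9b23b5de19685ca86fd1707c5b-480"));
    }
}
```

Because the pattern is anchored to the end of the string, a "480" that appears earlier in the path is left alone.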

 

jsoup Cookbook (Chinese edition): http://www.open-open.com/jsoup/

Maven dependency:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>

 

   




