Web Scraping Case Study: NetEase Youdao Translate (Advanced JS Reverse-Engineering Version)
Posted Dream-Z
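Reverse-Engineering NetEase Youdao Translate
- This case study reverse-engineers the request signing and the response encryption of NetEase Youdao Translate
- Knowledge used:
  - the requests module and its methods
  - MD5 hashing
  - filling in the missing environment for extracted JS code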
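【一】Analyzing the site
(1) The site's page (screenshot not reproduced here)
(2) Capture the traffic
(3) Analyze the captured requests
- Inspect the headers and the payload of each request one by one
- In the webtranslate request:
  - the request headers show that it is a POST request
  - the payload carries a large number of parameters
(4) Analyze the payload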
i: run
from: auto
to:
dictResult: true
keyid: webfanyi
sign: a9856197613117e6524edc4b5076bd55
client: fanyideskweb
product: webfanyi
appVersion: 1.0.0
vendor: web
pointParam: client,mysticTime,product
mysticTime: 1683545856511
keyfrom: fanyi.web
- The payload carries many parameters
- Send a second request to the site and compare the parameters carried by the same request
i: rain
from: auto
to:
domain: 0
dictResult: true
keyid: webfanyi
sign: 60d4d3ab8995ecacee1767824036d8a2
client: fanyideskweb
product: webfanyi
appVersion: 1.0.0
vendor: web
pointParam: client,mysticTime,product
mysticTime: 1683546118762
keyfrom: fanyi.web
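- Comparing the two payloads shows that, apart from the input text i (and a domain field that appears only in the second capture), two parameters change:
  - sign: presumably a hashed or encrypted value
  - mysticTime: generated from a timestamp (milliseconds since the Unix epoch)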
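The comparison above can also be done mechanically. The following is a minimal sketch, not part of the original workflow: the helper name changed_params is hypothetical, and only a few of the constant fields from the two captures are repeated to keep it short.

```python
def changed_params(first: dict, second: dict) -> dict:
    """Return {name: (old, new)} for every key whose value differs or is missing on one side."""
    diff = {}
    for name in sorted(first.keys() | second.keys()):
        old, new = first.get(name), second.get(name)
        if old != new:
            diff[name] = (old, new)
    return diff


# Values copied from the two captured payloads (subset of the fields).
first = {"i": "run", "from": "auto", "keyid": "webfanyi",
         "sign": "a9856197613117e6524edc4b5076bd55", "mysticTime": "1683545856511"}
second = {"i": "rain", "from": "auto", "domain": "0", "keyid": "webfanyi",
          "sign": "60d4d3ab8995ecacee1767824036d8a2", "mysticTime": "1683546118762"}

for name, (old, new) in changed_params(first, second).items():
    print(f"{name}: {old} -> {new}")
```

Fields that stay constant across captures can be copied verbatim into the replayed request; only the ones reported here need to be generated.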
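【二】Analyzing sign and reproducing it
(1) Where sign comes from
- Search all loaded files for "sign" (global search in the developer tools)
- The search results suggest which files it may live in
- It is most likely generated by code in a JS file, so CSS, HTML and similar files can be ruled out
- Open the first candidate file and analyze it
- The sign value is produced by a function call that takes arguments: t is the timestamp and e is a key parameter
- Set a breakpoint on this line and resend the request to see whether execution pauses there
- Execution clearly pauses at the breakpoint
(2) Filling in parameter t and regenerating sign
- Hover over parameter t to inspect it
- It is generated on the line just above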
const t = (new Date).getTime();
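- Keep reading upward through the code
- A few function definitions look relevant; copy them out and fill in what is missing
function g(e) {
    return r.a.createHash("md5").update(e).digest()
}
function v(e) {
    return r.a.createHash("md5").update(e.toString()).digest("hex")
}
function h(e, t) {
    return v(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`)
}
- Drop the first function, g(e): sign does not need it, because h only calls v(e), which performs the same MD5 update and returns the digest as a hex string
function v(e) {
    return r.a.createHash("md5").update(e.toString()).digest("hex")
}
function h(e, t) {
    return v(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`)
}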
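- Look at the second function, h, and fill in the variables it uses
- e and t are passed in as arguments; d and u still have to be supplied
- Scroll further up in the code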
const d = "fanyideskweb" , u = "webfanyi" , m = "client,mysticTime,product" , p = "1.0.0" , b = "web" , f = "fanyi.web";
- This line contains the fixed values d and u
- The code after filling them in
// declaration of the timestamp t
const t = (new Date).getTime();
function v(e) {
    return r.a.createHash("md5").update(e.toString()).digest("hex")
}
var d = "fanyideskweb"
var u = "webfanyi"
function h(e, t) {
    return v(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`)
}
h()
// ReferenceError: r is not defined
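- Running this raises an error saying that r is not defined
- The fix used here is to supply the missing environment with crypto-js
- First require the module
var Cry = require('crypto-js')
- Then modify this part of the code
// before
function v(e) {
    return r.a.createHash("md5").update(e.toString()).digest("hex")
}
// the original computes the MD5 hash of e and returns it as a hex string
// after
function v(e) {
    return Cry.MD5(e).toString()
}
// the replacement uses crypto-js (a JavaScript crypto library) to compute the same MD5 hex digest of e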
- The code after this change
var Cry = require('crypto-js')
// declaration of the timestamp t
const t = (new Date).getTime();
// function v(e) {
//     return r.a.createHash("md5").update(e.toString()).digest("hex")
// }
function v(e) {
    return Cry.MD5(e).toString()
}
var d = "fanyideskweb"
var u = "webfanyi"
function h(e, t) {
    return v(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`)
}
h()
// no error this time
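(3) Filling in parameter e and regenerating sign
- Inspection shows that e is a fixed value
e = "fsdsogkndfokasodnaso"
- The completed code
var Cry = require('crypto-js')
// declaration of the timestamp t
const t = (new Date).getTime();
// function v(e) {
//     return r.a.createHash("md5").update(e.toString()).digest("hex")
// }
function v(e) {
    return Cry.MD5(e).toString()
}
var d = "fanyideskweb"
var u = "webfanyi"
function h(e, t) {
    return v(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`)
}
function sign() {
    var e = "fsdsogkndfokasodnaso"
    // h's first parameter fills mysticTime and its second fills key,
    // so the timestamp goes first and the fixed key second
    return h(t, e)
}
console.log(sign())
// call sign() and print the generated value: a 32-character hex digest that changes with the timestamp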
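(4) How to check that the rewrite is correct
- Hard-code t to a captured mysticTime and check whether the generated value equals the sign captured in the same request
var Cry = require('crypto-js')
// declaration of the timestamp t
// const t = (new Date).getTime();
var t = '1683546118762'
// function v(e) {
//     return r.a.createHash("md5").update(e.toString()).digest("hex")
// }
function v(e) {
    return Cry.MD5(e).toString()
}
var d = "fanyideskweb"
var u = "webfanyi"
function h(e, t) {
    return v(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`)
}
function sign() {
    var e = "fsdsogkndfokasodnaso"
    // timestamp first, fixed key second
    return h(t, e)
}
console.log(sign())
// compare with the sign captured alongside this mysticTime (60d4d3ab8995ecacee1767824036d8a2); the author reports that the two are identical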
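Since the signature is just an MD5 hex digest over a fixed template, the same check can be sketched directly in Python with hashlib. This is a sketch under stated assumptions, not the site's code: the names make_sign and now_ms are hypothetical, and it assumes the template takes the timestamp in mysticTime and the fixed string in key, as the parameter names of h suggest.

```python
import hashlib
import time

CLIENT = "fanyideskweb"         # d in the JS
PRODUCT = "webfanyi"            # u in the JS
KEY = "fsdsogkndfokasodnaso"    # the fixed value e found above


def make_sign(mystic_time, key=KEY):
    """MD5 hex digest of the same template string that h() builds in the JS."""
    raw = f"client={CLIENT}&mysticTime={mystic_time}&product={PRODUCT}&key={key}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def now_ms():
    """Milliseconds since the Unix epoch, like (new Date).getTime()."""
    return int(time.time() * 1000)


t = now_ms()
print(t, make_sign(t))
# For verification, feed in a captured mysticTime and compare with the captured sign by eye.
print(make_sign("1683546118762"))
```

If the digest for a captured mysticTime equals the captured sign, the Node.js step is not strictly needed for signing and the whole signature can be produced in Python.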
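【三】Python code
(1) Define a function that obtains the sign value
- Imports
# requests: HTTP client
import requests
# fake_useragent: random User-Agent strings
from fake_useragent import UserAgent
# modules needed to run JS code
import subprocess
from functools import partial
# patch subprocess.Popen to use UTF-8 before importing execjs; otherwise an encoding error may be raised
subprocess.Popen = partial(subprocess.Popen, encoding='utf-8')
import execjs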
- Define get_sign()
def get_sign():
    # get the default JS runtime (Node.js here)
    node = execjs.get()
    # open the JS file as UTF-8 and read its source
    with open('01.js', encoding='utf-8') as f:
        # compile the JS source
        ctx = node.compile(f.read())
    # call run(), which returns [sign, t]
    sign = ctx.eval('run()')
    # sign[0]: the sign value, sign[1]: the timestamp t
    return sign[0], sign[1]
- 01.js also has to be rewritten so that it defines a run() function
var Cry = require('crypto-js')
// timestamp t, generated fresh for every request
const t = (new Date).getTime();
// var t = '1683546118762'   // fixed value, only for verifying against a capture
// function v(e) {
//     return r.a.createHash("md5").update(e.toString()).digest("hex")
// }
function v(e) {
    return Cry.MD5(e).toString()
}
var d = "fanyideskweb"
var u = "webfanyi"
function h(e, t) {
    return v(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`)
}
function sign() {
    var e = "fsdsogkndfokasodnaso"
    // return the sign value (timestamp first, fixed key second)
    return h(t, e)
}
function run() {
    // call sign() and return both the sign value and the t it was computed from
    return [sign(), t]
}
(2) Spoofing the request headers
- The request headers can be read from the webtranslate request (if you are not sure which ones are unnecessary, include them all)
Accept: application/json, text/plain, */*
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Cache-Control: no-cache
Connection: keep-alive
Content-Length: 247
Content-Type: application/x-www-form-urlencoded
Cookie: OUTFOX_SEARCH_USER_ID=-2102182500@10.110.96.154; OUTFOX_SEARCH_USER_ID_NCOO=1723343714.3489342
Host: dict.youdao.com
Origin: https://fanyi.youdao.com
Pragma: no-cache
Referer: https://fanyi.youdao.com/
sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36
- The "User-Agent" entry in headers is drawn at random with fake_useragent
- The cookies are extracted into a separate dict
def spider(eng):
    # request headers (Content-Length is left out: requests computes it from the encoded body)
    headers = {
        'Accept': 'application/json, text/plain, */*',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Host': 'dict.youdao.com',
        'Origin': 'https://fanyi.youdao.com',
        'Pragma': 'no-cache',
        'Referer': 'https://fanyi.youdao.com/',
        'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "Windows",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site",
        "User-Agent": UserAgent().random,
    }
    # cookies taken from the request headers
    cookies = {
        'OUTFOX_SEARCH_USER_ID': '-2102182500@10.110.96.154',
        'OUTFOX_SEARCH_USER_ID_NCOO': '1723343714.3489342',
    }
    # URL to request
    page_url = 'https://dict.youdao.com/webtranslate'
    # The form parameters travel in the body of a POST request.
    # sign and mysticTime change on every request; get_sign() returns both.
    sign, mysticTime = get_sign()
    # add sign and mysticTime to the form parameters
    data = {
        "i": str(eng),
        "from": "auto",
        "to": "",
        "dictResult": "true",
        "keyid": "webfanyi",
        "sign": sign,
        "client": "fanyideskweb",
        "product": "webfanyi",
        "appVersion": "1.0.0",
        "vendor": "web",
        "pointParam": "client,mysticTime,product",
        "mysticTime": mysticTime,
        "keyfrom": "fanyi.web",
    }
    # send the request
    response = requests.post(url=page_url, headers=headers, data=data, cookies=cookies)
    # return the response body (still encrypted at this point)
    return response.text
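The Content-Length header depends on the encoded form body, which changes with the text being translated, so a value copied from one capture will generally not match another request. The sketch below uses only the standard library to show how the body length varies; form_body is a hypothetical helper name, and only three of the form fields are included for brevity.

```python
from urllib.parse import urlencode


def form_body(params: dict) -> bytes:
    """Encode params the way an application/x-www-form-urlencoded body is encoded."""
    return urlencode(params).encode("utf-8")


short = form_body({"i": "run", "from": "auto", "to": ""})
longer = form_body({"i": "rainfall", "from": "auto", "to": ""})

# The length of the body is what Content-Length must report.
print(len(short), short)
print(len(longer), longer)
```

requests derives Content-Length from the body it sends, so the header does not need to be set by hand.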
(3) Define the main entry point
if __name__ == '__main__':
    res = spider('rain')
    print(res)
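【四】Decrypting the response ciphertext
(1) Find where the sign value goes
- Step through the code from the earlier breakpoint until you reach the encrypted data, i.e. the response body of webtranslate
  - A typical telltale is a call to JSON.parse()
- This line of code contains that method
- Just above it is the data that was received, o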
o = "Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6x7-sEgf3E8xxPy8fNgUR0PtLyLVXDnp0W_hhc-8PxqHNoVtmgMHMW8tBEi7Xh66zR8dDM3Ga-WTkqp4bVDIJHqVq7sbwGnCYrLIE-UQQNHZPv0XjhUoKPsJcfHXQzoNiGzoWtcYwV_z2Pwc9VS_vUfwlpfboIjBuCpnQ7QaUUL3cvi2pHSzznsXHIwr3B9RQCkiGZ5hW1_etmLOma7LtBnWLzWm9vj9qHHoAXTW6ZSXKvSaKWpI4owfVpffytuYADNG7n7H0Vg5P93mfAMNOMRKN4dNm0HkeWIKHFPLNLgH0SbZWk3kVnuJHw0Xsa32GLb3kcmjui6srexVN1PI3KpyCIL29fSPyLM9p5IHw8pZ5rSOAe6Z-pHwrcnpdnLhf8d-xRcxy2k6Aq_OLh4pw-DpoS53nPPBW6xIlFXH4itoLTlDjXq3Q64mujS7UTjuy8XME5aegFhrkDL86D18K0TeOJQtlZYtQNJMvqRcyVakDBEjph-_8r6jBhfLVr2aSM5BEs_cS_rh4tLXp48vte1P2YkuDF2_GQlh7fkJpSuSko9cObi8xvxULk1heIKOxpkRlwK187vD-E1VNMrulR5YmZXtQcS9E1g40f8qLHByyUULfY41SCepWQgvrwI3n4KAd6Pui5INE3iK-n9_unJ4L9H0HUugLoEmJA9F5ylal-pvhafyHlunzfv3os2lzccP-e04GrL_MZdqnGa_c019lEFqZKist2PxIIdkM3QpGedOXsx-guqfAjXaycFpQJn7DH32VDTVEbRGdcNaMZvh0lS4-ExynCiPLvYSX8Xvxpl4lffiknNZ7gYf56S8uKMAsGP3gS040ZV7mLo90wtlXe3cZekKNEpTR5OzqPeL6_CUTvjaE75o7TWE6BjmtGvA10fcuTZuZep73PgnvkU-HP447QGc_SqxuD98ZoITYkf11HsX9WeEgIZYkN-CCAQ1DcNiyzcJNIB3rdrYj9_5hFAwrKIPU1kYPnBQVsqb3p2YVXARZK7LqNBQfDHIZ5k3_boqWP2ZbRfr_Sdy5Yw=="
- The data we are after is visible there as well
{"code":0,"dictResult":{"ec":{"exam_type":["初中","高中","CET4","CET6","考研"],"word":{"usphone":"reɪn","ukphone":"reɪn","ukspeech":"rain&type=1","trs":[{"pos":"n.","tran":"雨,雨水;(热带地区的)雨季(the rains);(降雨般的)一阵,(大量的)降落物"},{"pos":"v.","tran":"下雨;(使)大量降落,雨点般落下;(使)如雨般地猛击"},{"tran":"【名】 (Rain)(法)兰,(英)雷恩,(罗、捷)赖恩(人名)"}],"wfs":[{"wf":{"name":"复数","value":"rains"}},{"wf":{"name":"第三人称单数","value":"rains"}},{"wf":{"name":"现在分词","value":"raining"}},{"wf":{"name":"过去式","value":"rained"}},{"wf":{"name":"过去分词","value":"rained"}}],"return-phrase":"rain","usspeech":"rain&type=2"}}},"translateResult":[[{"tgt":"雨","src":"rain","tgtPronounce":"yŭ"}]],"type":"en2zh-CHS"}
(2) Reproducing the decryption code
- First, lift out this piece of code
const n = an["a"].decodeData(o, sn["a"].state.text.decodeKey, sn["a"].state.text.decodeIv)
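- This shows that two fixed values are needed: decodeKey (the key) and decodeIv (the initialization vector)
- Hovering over them in the debugger reveals the two values
decodeKey (decryption key)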
"ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl"
decodeIv (initialization vector)
"ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4"
- We also need to know the decryption algorithm
- Hover over an["a"].decodeData to view its JS source
- It turns out to be an arrow function
S = (t, o, n) => {
    if (!t)
        return null;
    const a = e.alloc(16, g(o))
      , c = e.alloc(16, g(n))
      , i = r.a.createDecipheriv("aes-128-cbc", a, c);
    let s = i.update(t, "base64", "utf-8");
    return s += i.final("utf-8"),
    s
}
- Rewritten as a named function
function decode(t, o, n) {
    if (!t)
        return null;
    const a = e.alloc(16, g(o))
      , c = e.alloc(16, g(n))
      , i = r.a.createDecipheriv("aes-128-cbc", a, c);
    let s = i.update(t, "base64", "utf-8");
    return s += i.final("utf-8"),
    s
}
- From this, two things still have to be resolved:
  - what e.alloc is
  - what r.a. is
(3) Solution
- e.alloc
  - Replace e with Node's built-in Buffer
const a = Buffer.alloc(16, g(o))
  , c = Buffer.alloc(16, g(n))
- r.a.
  - Replace it with Node's built-in crypto module, which provides both the MD5 hash and the AES decipher
// import the crypto module
const crypto = require('crypto')
// MD5 helper: returns the raw 16-byte digest
function g(e) {
    return crypto.createHash("md5").update(e).digest()
}
function decode(t, o, n) {
    if (!t)
        return null;
    const a = Buffer.alloc(16, g(o))
      , c = Buffer.alloc(16, g(n))
      , i = crypto.createDecipheriv("aes-128-cbc", a, c);
    let s = i.update(t, "base64", "utf-8");
    return s += i.final("utf-8"),
    s
}
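AES-128-CBC needs a 16-byte key and a 16-byte initialization vector, and the JS above obtains both by taking the raw MD5 digest of the two long ydsecret strings. The sketch below reproduces only that preparation step with the Python standard library; the function names are hypothetical. The standard library has no AES implementation, so the decryption itself would need a third-party package and is not shown.

```python
import base64
import hashlib

DECODE_KEY = "ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl"
DECODE_IV = "ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4"


def md5_bytes(text: str) -> bytes:
    """Raw 16-byte MD5 digest, the counterpart of g(e) in the JS."""
    return hashlib.md5(text.encode("utf-8")).digest()


def decode_ciphertext(b64_text: str) -> bytes:
    """Decode URL-safe Base64 (the '-' and '_' alphabet), adding padding if it is missing."""
    padded = b64_text + "=" * (-len(b64_text) % 4)
    return base64.urlsafe_b64decode(padded)


key = md5_bytes(DECODE_KEY)   # plays the role of Buffer.alloc(16, g(o))
iv = md5_bytes(DECODE_IV)     # plays the role of Buffer.alloc(16, g(n))
print(len(key), len(iv))

# The captured response uses '-' and '_', so the URL-safe decoder is the appropriate one.
sample = decode_ciphertext("Z21kD9ZK1ke6ugku2ccWu-Me")
print(len(sample))
```

Because an MD5 digest is exactly 16 bytes, filling a 16-byte buffer with it yields the digest itself, which is why the Python side needs no separate allocation step.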
(4) Putting it together
// import the crypto module
const crypto = require('crypto')
// MD5 helper: returns the raw 16-byte digest
function g(e) {
    return crypto.createHash("md5").update(e).digest()
}
function decode(t, o, n) {
    if (!t)
        return null;
    const a = Buffer.alloc(16, g(o))
      , c = Buffer.alloc(16, g(n))
      , i = crypto.createDecipheriv("aes-128-cbc", a, c);
    let s = i.update(t, "base64", "utf-8");
    return s += i.final("utf-8"),
    s
}
// decryption key and initialization vector
var k = "ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl"
var iv = "ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4"
// the encrypted response data
var n = o = "Z21kD9ZK1ke6ugku2ccWu-MeDWh3z252xRTQv-wZ6jddVo3tJLe7gIXz4PyxGl73nSfLAADyElSjjvrYdCvEP4pfohVVEX1DxoI0yhm36ytQNvu-WLU94qULZQ72aml6x7-sEgf3E8xxPy8fNgUR0PtLyLVXDnp0W_hhc-8PxqHNoVtmgMHMW8tBEi7Xh66zR8dDM3Ga-WTkqp4bVDIJHqVq7sbwGnCYrLIE-UQQNHZPv0XjhUoKPsJcfHXQzoNiGzoWtcYwV_z2Pwc9VS_vUfwlpfboIjBuCpnQ7QaUUL3cvi2pHSzznsXHIwr3B9RQCkiGZ5hW1_etmLOma7LtBnWLzWm9vj9qHHoAXTW6ZSXKvSaKWpI4owfVpffytuYADNG7n7H0Vg5P93mfAMNOMRKN4dNm0HkeWIKHFPLNLgH0SbZWk3kVnuJHw0Xsa32GLb3kcmjui6srexVN1PI3KpyCIL29fSPyLM9p5IHw8pZ5rSOAe6Z-pHwrcnpdnLhf8d-xRcxy2k6Aq_OLh4pw-DpoS53nPPBW6xIlFXH4itoLTlDjXq3Q64mujS7UTjuy8XME5aegFhrkDL86D18K0TeOJQtlZYtQNJMvqRcyVakDBEjph-_8r6jBhfLVr2aSM5BEs_cS_rh4tLXp48vte1P2YkuDF2_GQlh7fkJpSuSko9cObi8xvxULk1heIKOxpkRlwK187vD-E1VNMrulR5YmZXtQcS9E1g40f8qLHByyUULfY41SCepWQgvrwI3n4KAd6Pui5INE3iK-n9_unJ4L9H0HUugLoEmJA9F5ylal-pvhafyHlunzfv3os2lzccP-e04GrL_MZdqnGa_c019lEFqZKist2PxIIdkM3QpGedOXsx-guqfAjXaycFpQJn7DH32VDTVEbRGdcNaMZvh0lS4-ExynCiPLvYSX8Xvxpl4lffiknNZ7gYf56S8uKMAsGP3gS040ZV7mLo90wtlXe3cZekKNEpTR5OzqPeL6_CUTvjaE75o7TWE6BjmtGvA10fcuTZuZep73PgnvkU-HP447QGc_SqxuD98ZoITYkf11HsX9WeEgIZYkN-CCAQ1DcNiyzcJNIB3rdrYj9_5hFAwrKIPU1kYPnBQVsqb3p2YVXARZK7LqNBQfDHIZ5k3_boqWP2ZbRfr_Sdy5Yw=="
// decrypt it with the decode function
console.log(decode(n, k, iv))
/* Output:
{"code":0,"dictResult":{"ec":{"exam_type":["初中","高中","CET4","CET6","考研"],"word":{"usphone":"reɪn","ukphone":"reɪn","ukspeech":"rain&type=1","trs":[{"pos":"n.","tran":"雨,雨水;(热带地区的)雨季(the rains);(降雨般的)一阵,(大量的)降落物"},{"pos":"v.","tran":"下雨;(使)大量降落,雨点般落下;(使)如雨般地猛击"},{"tran":"【名】 (Rain)(法)兰,(英)雷恩,(罗、捷)赖恩(人名)"}],"wfs":[{"wf":{"name":"复数","value":"rains"}},{"wf":{"name":"第三人称单数","value":"rains"}},{"wf":{"name":"现在分词","value":"raining"}},{"wf":{"name":"过去式","value":"rained"}},{"wf":{"name":"过去分词","value":"rained"}}],"return-phrase":"rain","usspeech":"rain&type=2"}}},"translateResult":[[{"tgt":"雨","src":"rain","tgtPronounce":"yŭ"}]],"type":"en2zh-CHS"}
*/
- The printed result shows that the decryption function works correctly
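【五】Python code
(1) Define the tran_data() function
- Its argument is the encrypted response data
def tran_data(data):
    # get the default JS runtime (Node.js here)
    node = execjs.get()
    # open the JS file as UTF-8 and read its source
    with open('02.js', encoding='utf-8') as f:
        # compile the JS source
        ctx = node.compile(f.read())
    # call run() with the ciphertext as its argument and get the decrypted text back
    t = ctx.call('run', data)
    return t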
(2) Define the entry function run() in the JS file
// import the crypto module
const crypto = require('crypto')
// MD5 helper: returns the raw 16-byte digest
function g(e) {
    return crypto.createHash("md5").update(e).digest()
}
function decode(t, o, n) {
    if (!t)
        return null;
    const a = Buffer.alloc(16, g(o))
      , c = Buffer.alloc(16, g(n))
      , i = crypto.createDecipheriv("aes-128-cbc", a, c);
    let s = i.update(t, "base64", "utf-8");
    return s += i.final("utf-8"),
    s
}
// decryption key and initialization vector
var k = "ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl"
var iv = "ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4"
// the argument is the encrypted response data
function run(encode_data) {
    return (decode(encode_data, k, iv))
}
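【六】Putting all the code together
(1) Python code
# requests: HTTP client
import requests
# fake_useragent: random User-Agent strings
from fake_useragent import UserAgent
# json module
import json
# modules needed to run JS code
import subprocess
from functools import partial
# patch subprocess.Popen to use UTF-8 before importing execjs; otherwise an encoding error may be raised
subprocess.Popen = partial(subprocess.Popen, encoding='utf-8')
import execjs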
def get_sign():
    node = execjs.get()
    with open('sign.js', encoding='utf-8') as f:
        ctx = node.compile(f.read())
    # run() returns [sign, t]
    sign = ctx.eval('run()')
    return sign[0], sign[1]


def tran_data(data):
    # get the default JS runtime (Node.js here)
    node = execjs.get()
    # open the JS file as UTF-8 and read its source
    with open('decode.js', encoding='utf-8') as f:
        # compile the JS source
        ctx = node.compile(f.read())
    # call run() with the ciphertext as its argument and get the decrypted text back
    t = ctx.call('run', data)
    return t
def spider(eng):
    # Content-Length is left out: requests computes it from the encoded body
    headers = {
        'Accept': 'application/json, text/plain, */*',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Host': 'dict.youdao.com',
        'Origin': 'https://fanyi.youdao.com',
        'Pragma': 'no-cache',
        'Referer': 'https://fanyi.youdao.com/',
        'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "Windows",
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-site",
        "User-Agent": UserAgent().random,
    }
    cookies = {
        'OUTFOX_SEARCH_USER_ID': '-2102182500@10.110.96.154',
        'OUTFOX_SEARCH_USER_ID_NCOO': '1723343714.3489342',
    }
    page_url = 'https://dict.youdao.com/webtranslate'
    sign, my_time = get_sign()
    data = {
        "i": str(eng),
        "from": "auto",
        "to": "",
        "dictResult": "true",
        "keyid": "webfanyi",
        "sign": sign,
        "client": "fanyideskweb",
        "product": "webfanyi",
        "appVersion": "1.0.0",
        "vendor": "web",
        "pointParam": "client,mysticTime,product",
        "mysticTime": my_time,
        "keyfrom": "fanyi.web",
    }
    response = requests.post(url=page_url, headers=headers, data=data, cookies=cookies)
    # print(response.text)  # uncomment to inspect the raw ciphertext
    return response.text
if __name__ == '__main__':
    while True:
        eng = input("Enter an English word: ")
        encode_text = spider(eng)
        # print(encode_text)
        res = json.loads(tran_data(encode_text))["dictResult"]["ec"]["word"]["trs"]
        for i in res:
            print(i)
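The chained indexing in the main loop raises KeyError whenever the decrypted JSON has no dictionary entry. A more forgiving extraction could be sketched as follows. It assumes the structure of the decrypted sample shown in section 【四】, and additionally assumes (not confirmed by the source) that some responses carry only translateResult; extract_translations is a hypothetical name and the sample data is shortened.

```python
import json


def extract_translations(decrypted: str) -> list:
    """Return the dictionary senses if present, else fall back to the plain translation."""
    data = json.loads(decrypted)
    word = ((data.get("dictResult") or {}).get("ec") or {}).get("word") or {}
    trs = word.get("trs") or []
    if trs:
        # some entries have no "pos" (part of speech), so default to an empty string
        return [f'{item.get("pos", "")} {item.get("tran", "")}'.strip() for item in trs]
    return [seg.get("tgt", "") for line in data.get("translateResult", []) for seg in line]


# Shortened stand-ins for decrypted responses.
with_dict = json.dumps({
    "code": 0,
    "dictResult": {"ec": {"word": {"trs": [{"pos": "n.", "tran": "雨,雨水"},
                                           {"pos": "v.", "tran": "下雨"}]}}},
    "translateResult": [[{"tgt": "雨", "src": "rain"}]],
})
without_dict = json.dumps({"code": 0, "translateResult": [[{"tgt": "雨", "src": "rain"}]]})

print(extract_translations(with_dict))
print(extract_translations(without_dict))
```

With a helper like this, the loop can print whatever is available instead of stopping on the first input that has no dictionary entry.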
(2) sign code (sign.js)
var Cry = require('crypto-js')
// timestamp t, generated fresh for every request
const t = (new Date).getTime();
// var t = '1683546118762'   // fixed value, only for verifying against a capture
// function v(e) {
//     return r.a.createHash("md5").update(e.toString()).digest("hex")
// }
function v(e) {
    return Cry.MD5(e).toString()
}
var d = "fanyideskweb"
var u = "webfanyi"
function h(e, t) {
    return v(`client=${d}&mysticTime=${e}&product=${u}&key=${t}`)
}
function sign() {
    var e = "fsdsogkndfokasodnaso"
    // timestamp first, fixed key second
    return h(t, e)
}
function run() {
    return [sign(), t]
}
(3) decode code (decode.js)
// import the crypto module
const crypto = require('crypto')
// MD5 helper: returns the raw 16-byte digest
function g(e) {
    return crypto.createHash("md5").update(e).digest()
}
function decode(t, o, n) {
    if (!t)
        return null;
    const a = Buffer.alloc(16, g(o))
      , c = Buffer.alloc(16, g(n))
      , i = crypto.createDecipheriv("aes-128-cbc", a, c);
    let s = i.update(t, "base64", "utf-8");
    return s += i.final("utf-8"),
    s
}
// decryption key and initialization vector
var k = "ydsecret://query/key/B*RGygVywfNBwpmBaZg*WT7SIOUP2T0C9WHMZN39j^DAdaZhAnxvGcCY6VYFwnHl"
var iv = "ydsecret://query/iv/C@lZe2YzHtZ2CYgaXKSVfsb7Y4QWHjITPPZ0nQp87fBeJ!Iv6v^6fvi2WN@bYpJ4"
// the argument is the encrypted response data
function run(encode_data) {
    return (decode(encode_data, k, iv))
}
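Anti-Anti-Scraping in Practice: NetEase Youdao Translate (free, instant, multilingual online translation)
Preface
While developing in Python I keep running into words I don't know. Youdao Translate works well, and little by little I started getting ideas about it (* ̄︶ ̄)
1. Analyzing the page
Translation link: NetEase Youdao Translate (https://fanyi.youdao.com/)
- First press F12 to open the developer tools, switch to the Network tab, and capture the traffic.
- Copy a sentence and paste it into the query box; it is translated automatically. Watch which requests are sent.
- Open the first response returned by the server to find the translation result.
- I copied the details of that POST request and replayed it, but the server returned an error.
- The request parameters are probably signed, or some of them change on every request.
- Send two requests with different input and compare how the parameters change.
The comparison shows that the parameters salt, sign and lts really do change.
- A global search for these parameters shows that they are produced by JS code:
Something is clearly going on here.
Skimming the JS code shows that it contains the request parameters we are after.
The key statement to analyze is this one: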
r = v.generateSaltSign(n);
Trace upward to the statement that produces r:
var n = e("./jquery-1.7");
e("./utils");
e("./md5");
var r = function(e) {
    var t = n.md5(navigator.appVersion)
      , r = "" + (new Date).getTime()
      , i = r + parseInt(10 * Math.random(), 10);
    return {
        ts: r,
        bv: t,
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "Y2FYu%TNSbMCxc3t2u^XT")
    }
};
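This looks very much like our parameters.
This is the code that generates salt, sign and lts (lts is sent with the value of ts).
2. Reproducing the JS in Python to build the Form Data
Some parameters are fixed, so they can simply be copied.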
def generate_formdata(self):
    """
    ts: r = "" + (new Date).getTime(),
    salt: ts + parseInt(10 * Math.random(), 10);,
    sign: n.md5("fanyideskweb" + e + i + "Y2FYu%TNSbMCxc3t2u^XT")
    """
    # millisecond timestamp, like (new Date).getTime()
    ts = str(int(time.time() * 1000))
    # timestamp followed by one random digit 0-9
    salt = ts + str(random.randint(0, 9))
    tempstr = "fanyideskweb" + self.word + salt + "Y2FYu%TNSbMCxc3t2u^XT"
    md5 = hashlib.md5()
    md5.update(tempstr.encode())
    sign = md5.hexdigest()
    self.formdata = {
        'i': self.word,
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        'salt': salt,
        'sign': sign,
        'lts': ts,
        'bv': '7596b16d0589d68d2b53a8de445f5852',
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_REALTlME'
    }
3. Sending the request and parsing the data
def get_data(self):
    response = requests.post(self.url, data=self.formdata, headers=self.headers)
    return response.content

def parse_data(self, origin_data):
    # Sample raw response:
    # b'{"translateResult":[[{"tgt":"Life is short, carpe diem","src":"\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe5\x8f\x8a\xe6\x97\xb6\xe8\xa1\x8c\xe4\xb9\x90"}]],"errorCode":0,"type":"zh-CHS2en"}'
    # Indexing the raw str would raise "TypeError: string indices must be integers", so load it as JSON first
    # print(type(origin_data))  # <class 'str'>
    data = json.loads(origin_data)
    # print(type(data))  # <class 'dict'>
    return 'Youdao translation of "{}": {}'.format(data['translateResult'][0][0]['src'], data['translateResult'][0][0]['tgt'])
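The parsing step can be tried offline against the sample response quoted in the comment above. This is a minimal sketch: parse_translation is a hypothetical standalone variant of parse_data, and treating a non-zero errorCode as a failure is an assumption based on the sample, where a successful reply carries errorCode 0.

```python
import json


def parse_translation(payload):
    """Return (src, tgt) from a translate_o style JSON payload given as bytes or str."""
    data = json.loads(payload)  # json.loads accepts UTF-8 bytes as well as str
    if data.get("errorCode") != 0:
        # assumption: anything other than 0 means the request was rejected
        raise ValueError(f"request failed, errorCode={data.get('errorCode')}")
    first = data["translateResult"][0][0]
    return first["src"], first["tgt"]


raw = (b'{"translateResult":[[{"tgt":"Life is short, carpe diem","src":"'
       b'\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c'
       b'\xe5\x8f\x8a\xe6\x97\xb6\xe8\xa1\x8c\xe4\xb9\x90"}]],'
       b'"errorCode":0,"type":"zh-CHS2en"}')

src, tgt = parse_translation(raw)
print(src, "->", tgt)
```

Checking errorCode before indexing turns a rejected request into a readable error instead of a KeyError on translateResult.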
4. Complete code
import requests
import hashlib
import time
import random
import json


class Youdao(object):
    def __init__(self, word):
        self.url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4542.2 Safari/537.36',
            'Cookie': 'OUTFOX_SEARCH_USER_ID=-1872735395@10.108.160.105; OUTFOX_SEARCH_USER_ID_NCOO=390856101.01363164; JSESSIONID=aaaWggQObiC9694v6tDTx; ___rl__test__cookies=1629358130539',
            'Referer': 'https://fanyi.youdao.com/',
        }
        self.formdata = None
        self.word = word
    def generate_formdata(self):
        """
        ts: r = "" + (new Date).getTime(),
        salt: ts + parseInt(10 * Math.random(), 10);,
        sign: n.md5("fanyideskweb" + e + i + "Y2FYu%TNSbMCxc3t2u^XT")
        """
        # millisecond timestamp, like (new Date).getTime()
        ts = str(int(time.time() * 1000))
        # timestamp followed by one random digit 0-9
        salt = ts + str(random.randint(0, 9))
        tempstr = "fanyideskweb" + self.word + salt + "Y2FYu%TNSbMCxc3t2u^XT"
        md5 = hashlib.md5()
        md5.update(tempstr.encode())
        sign = md5.hexdigest()
        self.formdata = {
            'i': self.word,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            'salt': salt,
            'sign': sign,
            'lts': ts,
            'bv': '7596b16d0589d68d2b53a8de445f5852',
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_REALTlME'
        }
    def get_data(self):
        response = requests.post(self.url, data=self.formdata, headers=self.headers)
        return response.content

    def parse_data(self, origin_data):
        # Sample raw response:
        # b'{"translateResult":[[{"tgt":"Life is short, carpe diem","src":"\xe4\xba\xba\xe7\x94\x9f\xe8\x8b\xa6\xe7\x9f\xad\xef\xbc\x8c\xe5\x8f\x8a\xe6\x97\xb6\xe8\xa1\x8c\xe4\xb9\x90"}]],"errorCode":0,"type":"zh-CHS2en"}'
        # Indexing the raw str would raise "TypeError: string indices must be integers", so load it as JSON first
        # print(type(origin_data))  # <class 'str'>
        data = json.loads(origin_data)
        # print(type(data))  # <class 'dict'>
        return 'Youdao translation of "{}": {}'.format(data['translateResult'][0][0]['src'], data['translateResult'][0][0]['tgt'])
    def run(self):
        # url, headers and formdata
        self.generate_formdata()
        # send the request and get the response
        origin_data = self.get_data().decode()  # decode the bytes as UTF-8
        # parse the data
        data = self.parse_data(origin_data)
        print(data)


if __name__ == '__main__':
    # read the text to translate
    word = input('Enter the text to translate (language is detected automatically): ')
    # create the Youdao translator object
    youdao = Youdao(word)
    # run the translation
    youdao.run()
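Note: to get past the anti-scraping checks, send a Cookie and a Referer along with the User-Agent.
You can wrap the call in while True so that any unfamiliar word can be looked up right away; every language is handled through AUTO detection.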
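One way to structure such a loop is sketched below. The function lookup_loop and its parameters are hypothetical; the translator is passed in as a callable, so the sketch runs here with a stand-in instead of making network requests.

```python
def lookup_loop(translate, read=input, write=print):
    """Read words until an empty line or EOF; report failures instead of crashing."""
    count = 0
    while True:
        try:
            word = read("word> ").strip()
        except EOFError:
            break
        if not word:
            break
        try:
            write(translate(word))
        except Exception as exc:  # keep the loop alive on network or parsing errors
            write(f"lookup failed for {word!r}: {exc}")
        count += 1
    return count


# Demo with a stand-in translator; a real one would call the scraper instead.
words = iter(["rain", "run", ""])
outputs = []
n = lookup_loop(lambda w: w.upper(), read=lambda prompt: next(words), write=outputs.append)
print(n, outputs)
```

Ending the loop on an empty line gives a clean way out, which a bare while True does not.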
Keep coding, keep looking up words, keep enjoying it!
Keep going!
Thanks!
Keep working hard!