Proj. CHW Paper Reading: Characterizing Cryptocurrency Exchange Scams
Posted 雪溯
Ch1 Introduction
Measures the blockchain community takes against scam attacks:
- Open-source databases of malicious domains, e.g., CryptoScamDB and EtherScamDB
- These are mostly collected with a crowd-sourcing based approach, e.g., victim reports
This paper investigates:
- the extent to which scams exist in the ecosystem
- who the attackers are
- what the impacts are
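Paper outline
Section 4:
- An automated approach to collecting scams
Section 5:
- Collected 1,595 scam domains, over 60% of which had not been publicly reported before
- Over 300 fake exchange apps
- Clustered the domains and apps
Section 6:
- Analyzed 94 scam domain families and 30 fake app families
- Analyzed the distribution channels
- Analyzed the real-world impact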
Ch3
3.1 Target Cryptocurrency Exchange
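To obtain a list of cryptocurrency exchanges despite fluctuating exchange rankings, the paper collects several rankings through Google search and merges them. This yields 70 exchanges.
If an exchange has multiple domains or apps, all of them are collected.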
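The notes only say the rankings are "merged". Here is a minimal sketch that assumes each ranking is a plain list of exchange names and that merging means a de-duplicated union. The `normalize` rule and the toy rankings are hypothetical and are not the paper's data:

```python
def normalize(name: str) -> str:
    """Case-fold and keep only letters/digits, so 'Huobi Global' matches 'huobi-global'."""
    return "".join(ch for ch in name.casefold() if ch.isalnum())


def merge_rankings(rankings: list[list[str]]) -> list[str]:
    """De-duplicated union of several ranking lists, keeping first-seen order and spelling."""
    seen: set[str] = set()
    merged: list[str] = []
    for ranking in rankings:
        for name in ranking:
            key = normalize(name)
            if key and key not in seen:
                seen.add(key)
                merged.append(name)
    return merged


if __name__ == "__main__":
    # Toy rankings for illustration only (hypothetical, not the paper's sources).
    ranking_a = ["Binance", "Coinbase", "Huobi Global"]
    ranking_b = ["coinbase", "Poloniex", "huobi-global", "Bithumb"]
    print(merge_rankings([ranking_a, ranking_b]))
    # ['Binance', 'Coinbase', 'Huobi Global', 'Poloniex', 'Bithumb']
```

A real merge would also need to handle exchanges that appear under unrelated brand names. Name normalization alone cannot catch those.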
3.2 Research Questions
- Are scam attacks prevalent in cryptocurrency exchanges?
  - What is the presence and trend of scam domains?
  - What is the presence and trend of scam apps?
- Who are the attackers behind them?
- What is the impact of the scams?
Ch4 Measurement of the scams
First collect the known scams, then identify more of them automatically.
4.1 Detecting the Scam Domains
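- Collecting Scam Domains from Existing Corpus: crawl known scam domains from well-known sites such as etherscamdb.info and cryptoscamdb.org, and keep the exchange-related ones.
- Generating the Squatting Domains: use dnstwist to generate typosquatting variants of the exchange domains, and keep the ones that resolve to an IP address.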
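This does not reproduce dnstwist's own permutation logic. As a rough illustration of what "typosquatting variants" means, the sketch below generates four common permutation types (omission, repetition, transposition, and adjacent-key replacement) for the first label of a `label.tld` domain. The partial keyboard map, the domain split, and the helper names are assumptions. `resolves` approximates the "has an IP" filter with a plain DNS lookup:

```python
import socket

# Partial QWERTY neighbour map; an assumption kept small for illustration.
ADJACENT = {
    "a": "qwsz", "b": "vghn", "c": "xdfv", "e": "wsdr",
    "i": "ujko", "n": "bhjm", "o": "iklp", "s": "awedxz",
}


def squat_variants(domain: str) -> set[str]:
    """Simple typosquatting variants of the first label of `label.tld`."""
    label, _, tld = domain.partition(".")
    variants: set[str] = set()
    for i, ch in enumerate(label):
        # Omission: drop one character (binance -> binace).
        variants.add(label[:i] + label[i + 1:])
        # Repetition: double one character (binance -> binnance).
        variants.add(label[:i] + ch + label[i:])
        # Transposition: swap with the next character (binance -> ibnance).
        if i + 1 < len(label):
            variants.add(label[:i] + label[i + 1] + ch + label[i + 2:])
        # Replacement by a neighbouring key (binance -> vinance).
        for key in ADJACENT.get(ch, ""):
            variants.add(label[:i] + key + label[i + 1:])
    variants.discard(label)  # swapping two equal letters reproduces the original
    return {f"{v}.{tld}" for v in variants if v}


def resolves(domain: str) -> bool:
    """True if the name currently has a DNS answer (needs network access)."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


if __name__ == "__main__":
    candidates = squat_variants("binance.com")
    print(len(candidates), sorted(candidates)[:3])
    # live = [d for d in candidates if resolves(d)]  # the "has an IP" filter
```

Real tools add many more permutation types, such as homoglyphs, hyphenation, and TLD swaps.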
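- Labeling the Domains: gather further information to decide whether each domain is malicious:
  - Collect WHOIS records, DNS records, autonomous system numbers (ASNs), and VirusTotal anti-virus engine scan results
  - Exclude domains that serve only a blank page
  - Analyze the landing URL, source code, and screenshots, and compare them with known benign pages such as parking services
  - Use OCR to measure how similar each site is to the corresponding official site, in order to identify phishing sites
  - Analyze VirusTotal-flagged domains by hand and classify them as phishing or trading scams
  - Collect the image content and use the Google Cloud Natural Language API and Vision API to check for adult or gambling content
  - Manually analyze the domains that remain
- Final categories:
  - C0 False Positive
  - C1 Registered: resolves to an IP but serves only a blank page, or could not be accessed by the authors
  - C2 Parked: usually shows content advertising or selling the domain
  - C3 Phishing: a phishing site imitating the official one
  - C4 Trading Scam: lures users into believing they can profit (e.g., offers of "help" or high-interest investments) so that they hand over their cryptocurrency themselves
  - C5 Referral Fraud: pushes users to sign up under the attacker's referral account
  - C6 Adult and Gambling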
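The notes list the labeling signals but not how they are combined. The sketch below is a hypothetical rule cascade. It applies the automatic checks in the order listed and sends everything else to manual review, including VirusTotal-flagged domains, which are split into phishing or trading scam by hand.

Several pieces are assumptions:
- The word-set Jaccard similarity over OCR text stands in for the unspecified OCR similarity measure.
- The 0.6 threshold is chosen for illustration.
- The field names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    FALSE_POSITIVE = "C0"
    REGISTERED = "C1"
    PARKED = "C2"
    PHISHING = "C3"
    TRADING_SCAM = "C4"
    REFERRAL_FRAUD = "C5"
    ADULT_GAMBLING = "C6"


@dataclass
class DomainEvidence:
    """Hypothetical per-domain summary of the signals collected above."""
    reachable: bool
    page_text: str             # OCR text of the domain's screenshot
    official_text: str         # OCR text of the matching official exchange site
    matches_parking_page: bool
    adult_or_gambling: bool    # e.g. from image/text content classification


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased word sets of two OCR outputs."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def auto_label(ev: DomainEvidence, phish_threshold: float = 0.6) -> Optional[Label]:
    """Apply the automatic checks in order; None means 'send to manual review'.

    In this sketch C0, C4 and C5 are only assigned during manual review.
    """
    if not ev.reachable or not ev.page_text.strip():
        return Label.REGISTERED        # blank page or unreachable
    if ev.matches_parking_page:
        return Label.PARKED
    if token_jaccard(ev.page_text, ev.official_text) >= phish_threshold:
        return Label.PHISHING          # looks like a copy of the official site
    if ev.adult_or_gambling:
        return Label.ADULT_GAMBLING
    return None  # e.g. VirusTotal hits: phishing vs trading scam decided by hand
```

Returning `None` rather than guessing a label keeps the manual step explicit, which matches the last step in the list.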
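4.2 Detecting the Fake Apps
- Identifying fake apps
  - Extract the certificate signatures of the official apps from each exchange's official website
  - Search app markets for all possible counterfeit apps. Koodous is used because its app corpus is more complete than Google Play's, which regularly removes malicious and fake apps
  - Compare the developer signatures; an app whose signature does not match the official one is treated as a fake app
- Overall results
- Classification of fake apps
  - For apps flagged by VirusTotal, use Euphony to derive the malware type and malware family
  - For apps that were not flagged, install and analyze them manually, or reverse-engineer them and analyze them statically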
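Here is a minimal sketch of the signature comparison, under a few assumptions:
- The signing certificates have already been extracted, for example with Android's apksigner tool.
- The candidate apps have already been matched to an exchange by name.
- Signatures are compared as SHA-256 certificate fingerprints.

The class and function names are hypothetical:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class CandidateApp:
    package: str
    cert_der: bytes  # signing certificate (DER bytes), extracted beforehand


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 certificate fingerprint as colon-separated upper-case hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def find_fakes(candidates: list[CandidateApp], official_fps: set[str]) -> list[CandidateApp]:
    """Candidates whose signer fingerprint is not among the exchange's official ones."""
    return [app for app in candidates if fingerprint(app.cert_der) not in official_fps]


if __name__ == "__main__":
    # Placeholder bytes stand in for real certificates (hypothetical).
    official = {fingerprint(b"official-cert")}
    apps = [
        CandidateApp("com.example.exchange", b"official-cert"),
        CandidateApp("com.example.exchange.pro", b"other-cert"),
    ]
    print([a.package for a in find_fakes(apps, official)])
    # ['com.example.exchange.pro']
```

Keeping a set of official fingerprints per exchange allows for exchanges that ship several legitimately signed apps.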
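Malware type and malware family distribution. As shown in Figure 10, VirusTotal labels about 50% of the 170 flagged fake apps as grayware, over 40% as Trojans, and about 4% as adware. This result suggests that these fake apps may pose a serious security threat to users. We use Euphony to generate a malware family label for each app; Figure 11 shows the malware family distribution. Unsurprisingly, the family fakeapp ranks first, with 23 apps labeled. The labels of the remaining apps vary widely.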
Figure 12 shows examples of the fake apps we identified.
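Figure 12(1) shows a phishing app targeting Poloniex[10]. It presents a fake login screen and tricks users into entering their Poloniex account credentials. It then shows a forged 2FA verification screen and asks for full access to the user's email, so that it can steal the email account as well. If this succeeds, the attacker gains full access to the user's Poloniex account and quietly steals their funds.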
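Figure 12(2) is an example of Coinbase adware[11]. It is repackaged from the official Coinbase app with aggressive ad libraries embedded. It prompts users to install recommended apps, most of which are considered malicious. The app also pushes mobile ads while running in the background, which can lead to unintended ad clicks.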
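Figure 12(3) is a referral app targeting Binance[12]. It simply implements a webview that loads the referral link https://www.binance.com/?ref=20270961. The attacker profits from users who register through this link. The revenue is usually a share of the commission, depending on each exchange's referral rules[13].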
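Referral apps like this one can be spotted because the attacker's referral code is hard-coded in the URL that the webview loads. The sketch below extracts such codes from decompiled source text. It assumes the parameter is called `ref`, as in the Binance link above; other exchanges may name it differently. The heuristic and the helper names are illustrative and are not taken from the paper:

```python
import re
from urllib.parse import parse_qs, urlparse

# Rough URL matcher for string literals in decompiled code.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def referral_codes(source: str, param: str = "ref") -> dict[str, list[str]]:
    """Map each hard-coded URL that carries `param` to its referral code(s)."""
    found: dict[str, list[str]] = {}
    for url in URL_RE.findall(source):
        codes = parse_qs(urlparse(url).query).get(param)
        if codes:
            found[url] = codes
    return found


if __name__ == "__main__":
    # Hypothetical decompiled line modelled on the Binance example above.
    snippet = 'webView.loadUrl("https://www.binance.com/?ref=20270961");'
    print(referral_codes(snippet))
    # {'https://www.binance.com/?ref=20270961': ['20270961']}
```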
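Figure 12(4) is a code snippet from a Bithumb Trojan app[14]. As highlighted in the decompiled code, it secretly collects the user's SMS messages, contacts, and call logs, then uploads them to the attacker's private server http://bithumbinback.pro/. It also monitors incoming calls and messages on the device in the background.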
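The paper identifies this behaviour from decompiled code. The sketch below shows a coarser, manifest-level check, which is an assumption rather than the paper's method. It lists which of the permissions needed for the described behaviour an app requests: reading SMS, contacts, and call logs, and watching calls and messages.

It expects a manifest already decoded to plain XML, e.g. by apktool. Requesting a permission alone does not prove misuse:

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Permissions tied to the behaviour described above; the selection is an assumption.
SENSITIVE = {
    "android.permission.READ_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.READ_CALL_LOG",
    "android.permission.READ_PHONE_STATE",
}


def sensitive_permissions(manifest_xml: str) -> set[str]:
    """Sensitive permissions requested in a decoded (plain-text) AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    requested = {el.get(ANDROID_NS + "name") for el in root.iter("uses-permission")}
    return requested & SENSITIVE


if __name__ == "__main__":
    manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.fakeexchange">
      <uses-permission android:name="android.permission.INTERNET"/>
      <uses-permission android:name="android.permission.READ_SMS"/>
      <uses-permission android:name="android.permission.READ_CALL_LOG"/>
    </manifest>"""
    print(sorted(sensitive_permissions(manifest)))
    # ['android.permission.READ_CALL_LOG', 'android.permission.READ_SMS']
```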
Ch5 Understanding the attackers
5.1 Relations among the scam domains
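We obtained 94 clusters containing 699 domains in total (43.8% of all scam domains); the other 896 domains are standalone.
This result indicates two things. (1) Some attackers tend to create a large number of scam domains; for example, the largest family in our dataset created 254 scam domains.
(2) Attackers tend to create their scam domains with the same method, i.e., the scam category stays the same within most clusters.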
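Here is a quick consistency check of the figures above. It assumes the 896 standalone domains are simply those left outside the 94 clusters. On that basis they add up to the 1,595 scam domains reported earlier, and the clustered share, average cluster size, and the weight of the largest family follow directly:

$$
699 + 896 = 1595, \qquad \frac{699}{1595} \approx 43.8\%, \qquad \frac{699}{94} \approx 7.4, \qquad \frac{254}{699} \approx 36.3\%
$$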
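5.2 Relations among the fake apps
- Methods
  - Clustering by developer signature
  - Clustering by code similarity: SimiDroid
  - Manual clustering
    - Sample apps from each cluster and inspect them manually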
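The notes name the clustering signals (shared developer signature and SimiDroid code similarity) but not how they are combined. The sketch below uses union-find to merge apps whenever they share a signer or their sets of method hashes overlap strongly.

Jaccard similarity over method hashes is a stand-in for SimiDroid's comparison. The 0.8 threshold, the input format, and the names are assumptions. Manual sampling of each cluster is not shown.

```python
from itertools import combinations


def cluster_apps(signer: dict[str, str], methods: dict[str, set[str]],
                 threshold: float = 0.8) -> list[set[str]]:
    """Union-find over apps: merge on a shared signer or method-set Jaccard >= threshold."""
    parent = {app: app for app in signer}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(sorted(signer), 2):
        ma, mb = methods.get(a, set()), methods.get(b, set())
        union_size = len(ma | mb)
        similar = union_size > 0 and len(ma & mb) / union_size >= threshold
        if signer[a] == signer[b] or similar:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for app in signer:
        clusters.setdefault(find(app), set()).add(app)
    return list(clusters.values())


if __name__ == "__main__":
    # Toy input: app -> signer fingerprint, app -> hashes of its methods (hypothetical).
    signer = {"a": "K1", "b": "K1", "c": "K2", "d": "K3", "e": "K4"}
    methods = {"a": {"m1"}, "b": {"m9"}, "c": {"x1", "x2"}, "d": {"x1", "x2"}, "e": {"y1"}}
    print(sorted(sorted(c) for c in cluster_apps(signer, methods)))
    # [['a', 'b'], ['c', 'd'], ['e']]
```

The pairwise loop is quadratic. That is fine for a few hundred apps, such as the 300-odd fake apps here, but a larger corpus would need blocking, e.g., comparing only apps that target the same exchange.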
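Findings:
- Fake apps signed with the same certificate usually have high code similarity
- Quite a few attackers like to use visual programming platforms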
5.3
There is no clear evidence linking the scam domains to the fake apps.
Reading Comprehension: Must-Read Papers Roundup
Reposted from thunlp/RCPapers; this blog keeps adding updates on top of it.
Must-read papers on Machine Reading Comprehension.
Contributed by Yankai Lin, Deming Ye and Haozhe Ji.
Model Architecture
- Memory Networks. Jason Weston, Sumit Chopra, and Antoine Bordes. arXiv preprint arXiv:1410.3916 (2014). paper ★
- Teaching Machines to Read and Comprehend. Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. NIPS 2015. paper
- Text Understanding with the Attention Sum Reader Network. Rudolf Kadlec, Martin Schmid, Ondrej Bajgar, and Jan Kleindienst. ACL 2016. paper
- A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. Danqi Chen, Jason Bolton, and Christopher D. Manning. ACL 2016. paper
- Long Short-Term Memory-Networks for Machine Reading. Jianpeng Cheng, Li Dong, and Mirella Lapata. EMNLP 2016. paper
- Key-value Memory Networks for Directly Reading Documents. Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. EMNLP 2016. paper
- Modeling Human Reading with Neural Attention. Michael Hahn and Frank Keller. EMNLP 2016. paper
- Learning Recurrent Span Representations for Extractive Question Answering. Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, and Jonathan Berant. arXiv preprint arXiv:1611.01436 (2016). paper
- Multi-Perspective Context Matching for Machine Comprehension. Zhiguo Wang, Haitao Mi, Wael Hamza, and Radu Florian. arXiv preprint arXiv:1612.04211 (2016). paper
- Natural Language Comprehension with the Epireader. Adam Trischler, Zheng Ye, Xingdi Yuan, and Kaheer Suleman. EMNLP 2016. paper
- Iterative Alternating Neural Attention for Machine Reading. Alessandro Sordoni, Philip Bachman, Adam Trischler, and Yoshua Bengio. arXiv preprint arXiv:1606.02245 (2016). paper
- Bidirectional Attention Flow for Machine Comprehension. Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. ICLR 2017. paper Reading Over
- Machine Comprehension Using Match-LSTM and Answer Pointer. Shuohang Wang and Jing Jiang. arXiv preprint arXiv:1608.07905 (2016). paper Reading Over
- Gated Self-matching Networks for Reading Comprehension and Question Answering. Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. ACL 2017. paper
- Attention-over-attention Neural Networks for Reading Comprehension. Yiming Cui, Zhipeng Chen, Si Wei, Shijin Wang, Ting Liu, and Guoping Hu. ACL 2017. paper ★
- Gated-attention Readers for Text Comprehension. Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, and Ruslan Salakhutdinov. ACL 2017. paper
- A Constituent-Centric Neural Architecture for Reading Comprehension. Pengtao Xie and Eric Xing. ACL 2017. paper
- Structural Embedding of Syntactic Trees for Machine Comprehension. Rui Liu, Junjie Hu, Wei Wei, Zi Yang, and Eric Nyberg. EMNLP 2017. paper
- Accurate Supervised and Semi-Supervised Machine Reading for Long Documents. Izzeddin Gur, Daniel Hewlett, Alexandre Lacoste, and Llion Jones. EMNLP 2017. paper
- MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension. Boyuan Pan, Hao Li, Zhou Zhao, Bin Cao, Deng Cai, and Xiaofei He. arXiv preprint arXiv:1707.09098 (2017). paper
- Dynamic Coattention Networks For Question Answering. Caiming Xiong, Victor Zhong, and Richard Socher. ICLR 2017. paper ★
- R-NET: Machine Reading Comprehension with Self-matching Networks. Natural Language Computing Group, Microsoft Research Asia. paper Reading Over
- ReasoNet: Learning to Stop Reading in Machine Comprehension. Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen. KDD 2017. paper
- FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension. Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, and Weizhu Chen. ICLR 2018. paper
- Making Neural QA as Simple as Possible but not Simpler. Dirk Weissenborn, Georg Wiese, and Laura Seiffe. CoNLL 2017. paper
- Efficient and Robust Question Answering from Minimal Context over Documents. Sewon Min, Victor Zhong, Richard Socher, and Caiming Xiong. ACL 2018. paper
- Simple and Effective Multi-Paragraph Reading Comprehension. Christopher Clark and Matt Gardner. ACL 2018. paper
- Neural Speed Reading via Skim-RNN. Minjoon Seo, Sewon Min, Ali Farhadi, and Hannaneh Hajishirzi. ICLR 2018. paper
- Hierarchical Attention Flow for Multiple-Choice Reading Comprehension. Haichao Zhu, Furu Wei, Bing Qin, and Ting Liu. AAAI 2018. paper
- Towards Reading Comprehension for Long Documents. Yuanxing Zhang, Yangbin Zhang, Kaigui Bian, and Xiaoming Li. IJCAI 2018. paper
- Joint Training of Candidate Extraction and Answer Selection for Reading Comprehension. Zhen Wang, Jiachen Liu, Xinyan Xiao, Yajuan Lyu, and Tian Wu. ACL 2018. paper
- Multi-Passage Machine Reading Comprehension with Cross-Passage Answer Verification. Yizhong Wang, Kai Liu, Jing Liu, Wei He, Yajuan Lyu, Hua Wu, Sujian Li, and Haifeng Wang. ACL 2018. paper
- Reinforced Mnemonic Reader for Machine Reading Comprehension. Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, and Ming Zhou. IJCAI 2018. paper ★
- Stochastic Answer Networks for Machine Reading Comprehension. Xiaodong Liu, Yelong Shen, Kevin Duh, and Jianfeng Gao. ACL 2018. paper ★
- Multi-Granularity Hierarchical Attention Fusion Networks for Reading Comprehension and Question Answering. Wei Wang, Ming Yan, and Chen Wu. ACL 2018. paper
- A Multi-Stage Memory Augmented Neural Network for Machine Reading Comprehension. Seunghak Yu, Sathish Indurthi, Seohyun Back, and Haejun Lee. ACL 2018 workshop. paper
- S-NET: From Answer Extraction to Answer Generation for Machine Reading Comprehension. Chuanqi Tan, Furu Wei, Nan Yang, Bowen Du, Weifeng Lv, and Ming Zhou. AAAI 2018. paper Reading Over
- Ask the Right Questions: Active Question Reformulation with Reinforcement Learning. Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, and Wei Wang. ICLR 2018. paper
- QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension. Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. ICLR 2018. paper Reading Over
- Read + Verify: Machine Reading Comprehension with Unanswerable Questions. Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, and Ming Zhou. AAAI 2019. paper ★
- Attention-Guided Answer Distillation for Machine Reading Comprehension. Minghao Hu, Yuxing Peng, Furu Wei, Zhen Huang, Dongsheng Li, Nan Yang, and Ming Zhou. paper (newly added) ★
- Zero-shot relation extraction via reading comprehension.
- A nil-aware answer extraction framework for question answering.
- Simple and effective multiparagraph reading comprehension.
- The Natural Language Decathlon: Multitask Learning as Question Answering
Utilizing External Knowledge
- Leveraging Knowledge Bases in LSTMs for Improving Machine Reading. Bishan Yang and Tom Mitchell. ACL 2017. paper
- Learned in Translation: Contextualized Word Vectors. Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. arXiv preprint arXiv:1708.00107 (2017). paper
- Knowledgeable Reader: Enhancing Cloze-Style Reading Comprehension with External Commonsense Knowledge. Todor Mihaylov and Anette Frank. ACL 2018. paper
- A Comparative Study of Word Embeddings for Reading Comprehension. Bhuwan Dhingra, Hanxiao Liu, Ruslan Salakhutdinov, and William W. Cohen. arXiv preprint arXiv:1703.00993 (2017). paper
- Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. NAACL 2018. paper
- Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. OpenAI. paper
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. arXiv preprint arXiv:1810.04805 (2018). paper
Exploration
- Adversarial Examples for Evaluating Reading Comprehension Systems. Robin Jia and Percy Liang. EMNLP 2017. paper
- Did the Model Understand the Question? Pramod Kaushik Mudrakarta, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. ACL 2018. paper
Open Domain Question Answering
- Reading Wikipedia to Answer Open-Domain Questions. Danqi Chen, Adam Fisch, Jason Weston, and Antoine Bordes. ACL 2017. paper
- R^3: Reinforced Reader-Ranker for Open-Domain Question Answering. Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, and Jing Jiang. AAAI 2018. paper
- Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering. Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, and Murray Campbell. ICLR 2018. paper
- Denoising Distantly Supervised Open-Domain Question Answering. Yankai Lin, Haozhe Ji, Zhiyuan Liu, and Maosong Sun. ACL 2018. paper
Datasets
- (SQuAD 1.0) SQuAD: 100,000+ Questions for Machine Comprehension of Text. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. EMNLP 2016. paper
- (SQuAD 2.0) Know What You Don’t Know: Unanswerable Questions for SQuAD. Pranav Rajpurkar, Robin Jia, and Percy Liang. ACL 2018. paper
- (MS MARCO) MS MARCO: A Human Generated MAchine Reading COmprehension Dataset. Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. arXiv preprint arXiv:1611.09268 (2016). paper
- (Quasar) Quasar: Datasets for Question Answering by Search and Reading. Bhuwan Dhingra, Kathryn Mazaitis, and William W. Cohen. arXiv preprint arXiv:1707.03904 (2017). paper
- (TriviaQA) TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. arXiv preprint arXiv:1705.03551 (2017). paper
- (SearchQA) SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, and Kyunghyun Cho. arXiv preprint arXiv:1704.05179 (2017). paper
- (QuAC) QuAC: Question Answering in Context. Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. arXiv preprint arXiv:1808.07036 (2018). paper
- (CoQA) CoQA: A Conversational Question Answering Challenge. Siva Reddy, Danqi Chen, and Christopher D. Manning. arXiv preprint arXiv:1808.07042 (2018). paper
- (MCTest) MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text. Matthew Richardson, Christopher J.C. Burges, and Erin Renshaw. EMNLP 2013. paper
- (CNN/Daily Mail) Teaching Machines to Read and Comprehend. Hermann, Karl Moritz, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. NIPS 2015. paper
- (CBT) The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations. Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston. arXiv preprint arXiv:1511.02301 (2015). paper
- (bAbi) Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, and Tomas Mikolov. arXiv preprint arXiv:1502.05698 (2015). paper
- (LAMBADA) The LAMBADA Dataset: Word Prediction Requiring a Broad Discourse Context. Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, and Raquel Fernández. ACL 2016. paper
- (SCT) LSDSem 2017 Shared Task: The Story Cloze Test. Nasrin Mostafazadeh, Michael Roth, Annie Louis, Nathanael Chambers, and James F. Allen. ACL 2017 workshop. paper
- (Who did What) Who did What: A Large-Scale Person-Centered Cloze Dataset. Takeshi Onishi, Hai Wang, Mohit Bansal, Kevin Gimpel, and David McAllester. EMNLP 2016. paper
- (NewsQA) NewsQA: A Machine Comprehension Dataset. Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. arXiv preprint arXiv:1611.09830 (2016). paper
- (RACE) RACE: Large-scale ReAding Comprehension Dataset From Examinations. Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, and Eduard Hovy. EMNLP 2017. paper
- (ARC) Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. arXiv preprint arXiv:1803.05457 (2018). paper
- (MCScript) MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge. Simon Ostermann, Ashutosh Modi, Michael Roth, Stefan Thater, and Manfred Pinkal. arXiv preprint arXiv:1803.05223 (2018). paper
- (NarrativeQA) The NarrativeQA Reading Comprehension Challenge. Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. TACL 2018. paper
- (DuoRC) DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension. Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, and Karthik Sankaranarayanan. ACL 2018. paper
- (CLOTH) Large-scale Cloze Test Dataset Created by Teachers. Qizhe Xie, Guokun Lai, Zihang Dai, and Eduard Hovy. EMNLP 2018. paper
- (DuReader) DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications. Wei He, Kai Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, and Haifeng Wang. ACL 2018 Workshop. paper Reading Over
- (CliCR) CliCR: a Dataset of Clinical Case Reports for Machine Reading Comprehension. Simon Suster and Walter Daelemans. NAACL 2018. paper
Other Must-Reads
- Semi-supervised Sequence Learning. Andrew M. Dai and Quoc V. Le. @google. NIPS 2015. Semi-supervised pre-training
- Sequence to Sequence Learning with Neural Networks. Ilya Sutskever. @google. NIPS 2014. The frequently cited seq2seq paper
- [A Neural Probabilistic Language Model] Neural language model
- [Neural Reading Comprehension and Beyond] Survey of reading comprehension by Danqi Chen (Stanford)
- MRQA 2018: Machine Reading for Question Answering
- https://mrqa2018.github.io/slides/phil.pdf