Reading Notes: Blockchain management and ML adaptation for IoT environment in 5G and beyond ...

Posted by 文三路张同学

[Reading Notes] Blockchain management and machine learning adaptation for IoT environment in 5G and beyond networks: A systematic review

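This is a CCF rank C paper. The authors are from the Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India.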

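-1. Q&A

  1. What is the difference between 5G and B5G?
    A: 5G mainly addresses familiar needs such as high-definition video and higher transmission rates, whereas B5G (Beyond 5G) is about maturing application scenarios and technologies, for example industry deployments in telemedicine, smart transportation, and Industry 4.0.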

0. Background of the paper

Big data analytics + IoT applications that need security and privacy protection = the integration of machine learning and blockchain

Keeping in view of the constraints and challenges with respect to big data analytics along with security and privacy preservation for 5G and B5G applications, the integration of machine learning and blockchain, two of the most promising technologies of the modern era is inevitable.

Introduction to IoT devices

  • What are IoT devices?
    Over the last decade, Internet of Things (IoT) has revolutionized the whole world leading to various technological trends starting from Industry 1.0 to Industry 5.0, AR/VR/MR, smart factories, tactile Internet, smart transportation, smart plants, etc. It is an interconnection of various devices monitored and controlled using the Internet in order to provide ubiquitous computing services to the end-users.

  • Problems with IoT devices
    Because of the constraints such as heterogeneity of devices, resource constraints, power storage, security, and data management, constant revolutions are foreseen in IoT over the years. Among these, the security and privacy are most crucial keeping in view of the data access restrictions at various levels in different applications [1].

  • Rapid growth in the number of IoT devices
    Moreover, with an increase in the number of IoT devices, the data generated by these devices is increasing exponentially in recent years. As per the report [2], the number of IoT devices connected to the Internet at the end of Nov. 2019 was 26.6 billion and is expected to reach 75 billion by the year 2025.

  • Privacy problems in IoT devices

    Moreover, all IoT applications are having sensitive information, for which security and privacy preservation are of utmost important. Also, devices are reluctant to transfer their data for training purposes in an open environment such as the Internet because of privacy concerns [3].

  • Why use machine learning in IoT systems?

    1. Also, IoT system needs to be autonomous (self-operating) so that it can learn from the gathered data and make context-based decisions [4]. In such an environment, machine learning (ML) can be an effective tool in understanding the patterns, analyzing, processing, and making intelligent decisions.

    2. The ever-growing market for IoT demands the usage of ML-based models for accuracy and precision in the decision-making process. Implementing ML in IoT applications can significantly improve data analytics and real-time decision-making. Applications of ML in various IoT use-cases (e.g., smart transportation, smart grid, etc.) include network optimization, resource allocation, congestion avoidance [6].

Growth of the machine learning market

Fig. 1(c) shows the global ML market share from the year 2017 to 2024 [5]. Technology advancements in ML and deep learning (DL) have changed the way a computer can process information automatically.

Applications of AI in IoT (reading this part, I noticed that the authors cite too few references; several claims that call for supporting examples from other work have no citation)

  • For example, autonomous controllers based upon Artificial Intelligence (AI) can be used to optimize energy usage [7].

  • Predictive models for energy consumption including Markov’s decision process and NN’s can be incorporated with IoT-enabled devices [8].

  • To explain why I think the authors cite too little: the long passage below carries no citations at all.

    support vector machine (SVM) provides effective data classification for blockchain peers and other transactional entities. Moreover, the supervised ML algorithms such as — random forest, gradient boost, etc. are used to reduce anonymity in the blockchain network. Recently, NN’s are also exploited to predict the price of cryptocurrency. With various computing models, ML can ease data verification, validation process and helps in identification of anomalies and malicious attacks in the blockchain network. Resource management, classification of transactional entities, and managing offloading tasks are some other applications of ML for blockchain.

Motivating blockchain technology

With the centralized authority, threats of privacy preservation, false authentication, data tampering prevails. Also, the reliability of data is very important for ML algorithms in order to obtain accurate results. Even a small security loophole in the ML algorithm can generate high false rate for certain events. Moreover, the computations of ML models are dependent on the trusted third party (TTP) (e.g., a cloud service provider) for many security applications which may raise serious privacy concerns. Hence, there is a demand for decentralized framework based ML.

Growth of blockchain companies and market trends for blockchain combined with IoT

Fig. 1(b) represents the percentage of startups in different industries focusing on blockchain in the year 2021 [9]. As per a report in [10], IoT blockchain based spending is expected to reach $573M by 2023 as compared to $174M in the year 2018 (Refer Fig. 1(a)).

Use cases of blockchain in IoT

Also, blockchain technology can provide many benefits to 5G IoT networks including secure authentication, secure communication, secure network coding, and resource configuration framework [11,12].

What does blockchain do for machine learning?

Moreover, blockchain can improve the performance of ML algorithms as it provides digitally signed data from reliable, trusted, and secure sources. The distributed computing powers can be utilized for developing a better and secure prediction model.

The adoption of ML in blockchain helps to analyze the existing issues in blockchain technology, enabling to enhance the security and privacy of the whole network.

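The figure above summarizes the paper's contributions. In my view: 1) if I were drawing this figure, I would add the cited references to it; 2) ML and blockchain are probably not in an upper-layer/lower-layer relationship and should be drawn separately.

When I draw figures like this in the future, I could also collect base-station information and images of this kind. I think they look quite polished.

What does blockchain do for 5G and B5G?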

With blockchain, 5G and B5G services can be more scalable as they support efficient solutions for spectrum sharing and resource management [14].

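1. Contributions of the paper

The paper gives a comprehensive taxonomy of the integration of blockchain and machine learning in IoT environments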

Then, we presented a comprehensive taxonomy for integration of blockchain and machine learning in an IoT environment.

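The paper explores the use of federated learning, reinforcement learning, and deep learning algorithms in blockchain-based applications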

We also explored federated learning, reinforcement learning, deep learning algorithms usage in blockchain based applications.

Finally, it recommends future use cases of these technologies in 5G and B5G

Finally, we provide recommendations for future use cases of these emerging technologies in 5G and B5G technologies.

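2. How to write a survey? (How this paper is written)

  1. Writing method
  2. How to organize each surveyed paper
  3. Organization of this paper

Section 1.1 presents the survey methodology; Section 2 discusses other surveys on ML and BC; Section 3 discusses ML and BC individually; Section 4 discusses ML + BC and classifies the work into ML for blockchain and blockchain for ML; Section 5 presents the challenges; Section 6 gives the conclusion.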

3. Other related survey articles

Most existing surveys treat ML and BC separately:

Existing literature work reveals that blockchain and ML are surveyed mostly in isolation or with their applications in several vertical domains.

Other related surveys

  1. ML

    Specifically, the survey of ML models for big data analysis can be found in [16–18].

  2. BC

    Meanwhile, multiple notable works such as [19–21] provide the concepts, advantages, challenges, and future research directions of blockchain technology.

  3. BC + IoT

    The more recent survey articles in the context of blockchain applications for IoT have been presented in [22–25]

  4. ML + IoT

    whereas authors of [26–28] discuss the applications of ML models in various fields of IoT

  5. BC + ML + IoT

    • Several studies were put forward addressing the integration of Artificial Intelligence (AI) and blockchain.

      • For example, the authors of [29] presented a review article on the integration of AI and blockchain by discussing applications of blockchain for AI as well as AI for blockchain.
      • Likewise, Salah et al. [30] present the review on the literature and summarize the existing blockchain applications and protocols facilitating AI domain. Along with this, open research challenges of implementing blockchain for AI are also discussed by the authors.
    • However, only a few research efforts have been made on the integration of ML and blockchain, in order to provide decision-making service in an intelligent way while assuring security and privacy.

      • For example, Vyas et al. [31] discussed the role of blockchain in improving the accuracy of ML results for healthcare applications. However, authors presented a short survey article and in-depth knowledge cannot be gained with this article.
      • In the same way, Acheampong [32] presented an overview of the basic concepts of blockchain and ML by discussing the impact of blockchain in ML community.
      • More recently, authors in [33] conducted an intensive survey that focuses on a specific application of ML for blockchain, i.e., anomaly detection. Also, this article reviews the application of blockchain for privacy preservation in learning process.
    • Applying ML to BC

      • In contrast, authors of [15] presented a review to discuss the applications of ML in blockchain technology. Specifically, authors have reviewed ML for blockchain applications such as — transaction entity classification, Bitcoin price prediction, computing power allocations, cryptocurrency price prediction, and portfolio management
      • In another work, Nguyen et al. [34] presented a small section that discusses the efficiency of ML in improving blockchain cloud of things (BCOT) framework.
      • Very recently, Rane et al. [35] presented in-depth survey on available ML algorithms for predicting Bitcoin prices and concluded that existing schemes only achieve accuracy of 60%–70%.
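      • Recently, Liu et al. [36] present a survey article that discusses overview, benefits, applications, open issues, and challenges while combining blockchain and ML. (By the authors' own description, this paper belongs with the works on combining ML and BC, so it is odd that it is listed in this paragraph.)

The authors summarize the literature found above in a table, which I think is a good way to present it.

4. Preliminaries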

4.1 Blockchain

Types of blockchain

  • private
  • public
  • consortium

Role and development of smart contracts in blockchain

The applications of smart contract are not only limited to cryptocurrency but can be extended to many applications including voting systems, inventory management, automation of payments, automation of claims and blind auctions, etc.

  • Solidity: Solidity [42] is the most popular high-level programming language used for implementing smart contracts on the Ethereum platform. This language is influenced by C++, Python, and JavaScript.
  • Serpent: Serpent [43] is inspired from the Python language which focuses on delivering high productivity and automating tasks.
  • Vyper: After Solidity, Vyper [44] is the next most popular language for Ethereum virtual machine (EVM) having syntax inspired from Python.
  • LLL: LLL (Lisp like language) is the first low-level language developed after the assembler for EVM and it is a tiny wrapper over coding around the assembler itself. LLL provides direct access to memory in an execution environment and can be easily optimized for speed.

Why must IoT use decentralization?

Moreover, with an ever-increasing deployment of IoT objects, security is of prime concern. Cloud computing has been widely used to support IoT for management, processing, and storage.

  • However, its centralized nature raises security questions. Centralized servers managing sensitive IoT data can be shared with anybody without the user’s consent, thus leading to privacy breaches [45].

  • Also, the intermediaries decrease the efficiency of interactions among system components. Also, with an increase in the number of IoT devices, current centralized devices providing security services including authentication and authorization will turn into a bottleneck.

  • Moreover, the security vulnerability because of centralization is an easy target for Distributed denial-of-service (DDoS) attacks.

  • Additionally, to ensure data integrity presence of publicly verifiable audits without involving a TTP is desirable. In this context, blockchain can mitigate security and privacy risks with its capabilities such as transparency, immutability, anonymity, decentralization, and operational resilience [4].

How can the computing-resource and storage problems in IoT scenarios be addressed?

  • However, to support resource-constrained nature of IoT devices blockchain provides the concept of Simplified payment verification, in which nodes need not to store complete blockchain data rather only block headers. In this context, Le and Mutka [46] proposed a lightweight method to validate blockchain data using bloom filter (probabilistic data structure).
  • Similarly, authors in [47] presented a proposal that integrates blockchain with constrained IoT devices. The evaluation of the proposal is carried out in terms of memory, processing time, and power consumption.
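
To make the "probabilistic data structure" mentioned above concrete, here is a minimal Bloom filter sketch in Python. It is a generic illustration only, not the method of Le and Mutka [46]; the class name `BloomFilter`, the bit and hash counts, and the transaction IDs are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# A constrained node registers only the (hypothetical) transaction IDs it cares about.
wallet_filter = BloomFilter()
wallet_filter.add(b"tx-sensor-42")
```

A device holding only this small bit array can cheaply discard most irrelevant transactions and fetch full data only for possible matches, which is the kind of saving a resource-constrained IoT node needs.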

Problems to solve when applying blockchain in IoT scenarios (the authors' summary)

However, high computation, storage costs, high energy demands, communication hurdles, mobility of devices, and latency are some of the challenges faced while integrating blockchain with IoT. In an IoT network, devices generate gigabytes of data in real-time. Due to lack of storage blockchain might appear unsuitable for IoT networks. The limited resource IoT devices are also unsuitable for highly computational PoW consensus algorithm. Hence, the scalability issue of integrating blockchain and IoT needs an immediate effective solution. Also, different characteristics of IoT network such as heterogeneity, wireless communication and mobility complicates the security challenge. Moreover, the transparency supported by IoT can affect the privacy of data. Last but not the least, lack of regulations and standards can influence the future of blockchain and IoT.

4.2 Machine Learning

An introduction to machine learning; worth a skim, though it is the usual material.

ML is a branch of AI that makes programming machines to perform particular tasks by learning. With time, ML models have been able to exceed humans in various problems. Particularly, previous experience is used to execute assigned tasks. ML algorithms have proved their significance in various areas such as transportation, image processing, marketing, etc. ML includes various models to solve different types of problems. The most commonly used ML models involve SVM, Artificial Neural Networks (ANN), decision trees, etc. to name a few.

Building a new ML model involves two steps, i.e., training and testing in order to perform tasks of prediction, classification, clustering, etc. on new dataset. Indeed, data is an important source in ML. The data is required in preprocessing and training any ML model. First, the ML model is trained with a training dataset. With the increase in size of training data, the efficiency of ML classifier also increases [48]. Next, after the training phase, the accuracy of the prediction is evaluated with a new dataset. In case of acceptable accuracy, the ML model is deployed otherwise it is trained again.

Recently, a popular subcategory of ML named deep learning (DL) has emerged to imitate the human thinking process. The fundamentals of DL have been originated from cognitive theories that are used to create NN structure. Popular applications of DL include object detection, face recognition, and traffic flow prediction to name a few [49].

Supervised learning, unsupervised learning, and reinforcement learning (RL) are three categorizations of learning styles in ML algorithms. In supervised learning, the machine is trained with well labeled data, i.e., the data is already mapped with the correct answer. Next, the machine is fed with a completely new set of data to generate correct results from analyzing the labeled data from training phase. Furthermore, supervised learning is divided into two categories that include classification and regression. SVM, decision trees, nearest neighbor, etc. are popular algorithms under this category.

In contrast, unsupervised learning is training the machine with input data that is not labelled or classified. Specifically, the aim is to group unsorted data as per similarity and difference such as pattern detection and descriptive modeling. Clustering and association are two categories of unsupervised learning [50]. K-means clustering and Principal Component Analysis (PCA) are popular algorithms under this category.

In RL, an agent is employed to interact with the environment in order to find best outcome by continuously learning from the environment. RL uses trial-and-error method to train itself when exposed to a certain environment. Markov’s decision process is a popular example of RL. Notably, there are vulnerabilities in ML models system with respect to privacy and security.

Security attacks on ML, according to the authors

Security attack in ML mainly includes evasion and poisoning attacks. Evasion attacks disrupt the entire classification process using adversarial examples whereas the poisoning attack destroys the data while training phase, which can decrease model accuracy [51]. On the other hand, the privacy attack on ML model comes from service providers and third-party entities. Clearly, the development of ML models empowers to launch new AI services including facial recognition and words suggestion. Nevertheless, the dataset provided to support these applications often includes sensitive and private information.
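
The effect of a poisoning attack on accuracy can be seen in a toy setting. The sketch below uses a one-dimensional nearest-centroid classifier and made-up data; the classifier, the data, and the injected points are all assumptions chosen for illustration, not an attack described in the paper.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature, label)


def train_centroids(data: List[Sample]) -> Tuple[float, float]:
    """Return the mean feature value of class 0 and of class 1."""
    zeros = [x for x, label in data if label == 0]
    ones = [x for x, label in data if label == 1]
    return sum(zeros) / len(zeros), sum(ones) / len(ones)


def predict(centroids: Tuple[float, float], x: float) -> int:
    c0, c1 = centroids
    return 0 if abs(x - c0) <= abs(x - c1) else 1


def accuracy(centroids: Tuple[float, float], test: List[Sample]) -> float:
    return sum(predict(centroids, x) == label for x, label in test) / len(test)


clean_train = [(0.0, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (1.0, 1)]
test_set = [(0.1, 0), (0.3, 0), (0.45, 0), (0.7, 1), (0.9, 1)]

# Poisoning: the attacker injects mislabeled points into the training set,
# dragging the class-0 centroid far away from the genuine class-0 samples.
poisoned_train = clean_train + [(2.0, 0), (2.0, 0), (2.0, 0)]

clean_acc = accuracy(train_centroids(clean_train), test_set)
poisoned_acc = accuracy(train_centroids(poisoned_train), test_set)
```

Only the training data changes between the two runs; the test set and the learning rule stay the same, which is why tamper-evident training data matters for trustworthy ML results.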

5. BC + ML + IoT

This section follows the structure of the mind map below.

5.1 Blockchain for machine learning

Blockchain for ML can solve the problem of data acquisition

  • With blockchain, the ML algorithm can be fed with highly reliable data and thus accurate and trusted results can be achieved. Also, training ML models with real-data will enhance the accuracy and efficiency of ML algorithms. The built-in consensus mechanism and fundamentals of blockchain ensure secure and tamper-proof sharing of IoT data.
  • Moreover, the existing client-master type ML models rely on trusted central servers and consider only privacy issues in linear sharing and ignore privacy in non-linear learning models. In the client-master model, an enormous amount of data generated by IoT devices is collected and stored at one central location whereas, in the distributed multi-party model, data is generated by various parties and stored in a distributed manner. However, the decentralized model incurs high communication costs and raises security and privacy issues. The transparency feature supported by blockchain, assures ML users confidentiality and privacy of data. As discussed, more amount of data available for training improves the overall throughput and produces a more effective and reliable system. Clearly, blockchain in ML can result in much safer data and better ML models.
  1. Why do the authors say that existing models consider "only privacy issues in linear sharing and ignore privacy in non-linear learning models"? I cannot answer this myself. On the difference between linear and non-linear models, see https://zhuanlan.zhihu.com/p/37866896
  2. The transparency feature supported by blockchain, assures ML users confidentiality and privacy of data.

As I understand it, blockchain transparency means that operations on the chain are public and visible, which prevents data tampering; what I do not understand is why this would guarantee confidentiality and data privacy for ML users.

Applications of blockchain for machine learning

5.1.1 Trustless machine learning contracts

Smart contracts on the blockchain are used to build an incentive mechanism for machine learning, making full use of blockchain's trustless nature.

The proposal introduced by [56] implements the concept of trustless ML contract and it is defined in 3 phases. In the first phase, a dataset, an evaluation function, amount of reward, and request for best ML model is submitted by the reward giver/buyer. In the second phase, the provided dataset is downloaded by ML model providers/practitioners and each provider works independently in order to train the ML model. After training, the providers submit their model. In the last phase, the winner is selected. Moreover, such a proposal can be utilized for raising funds transparently for IoT applications such as medical research. In addition, it can achieve automated self-improvement for AI agents. Unfortunately, this proposal [56] does not require identity and reputation validation for creating a new transaction and hence raises security concerns. Also, this proposal works only for Ethereum blockchain. Fig. 8 represents an illustration of trustless ML contracts.
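
The three phases can be sketched as a plain Python simulation. This is not the contract from [56] and not Solidity code; `MLContract`, `mean_squared_error`, the provider names, and the data are hypothetical, and for simplicity the sketch scores submissions on the same dataset the buyer published.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], float]
Dataset = List[Tuple[float, float]]


def mean_squared_error(model: Model, data: Dataset) -> float:
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


@dataclass
class MLContract:
    # Phase 1: the buyer fixes the dataset, the evaluation function, and the reward.
    dataset: Dataset
    evaluate: Callable[[Model, Dataset], float]
    reward: int
    submissions: Dict[str, Model] = field(default_factory=dict)

    def submit(self, provider: str, model: Model) -> None:
        # Phase 2: each provider trains independently and submits a model.
        self.submissions[provider] = model

    def settle(self) -> Tuple[str, int]:
        # Phase 3: the submission with the lowest error wins the reward.
        winner = min(
            self.submissions,
            key=lambda p: self.evaluate(self.submissions[p], self.dataset),
        )
        return winner, self.reward


contract = MLContract(
    dataset=[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    evaluate=mean_squared_error,
    reward=100,
)
contract.submit("alice", lambda x: 2.0 * x)
contract.submit("bob", lambda x: x + 1.0)
winner, payout = contract.settle()
```

Because the evaluation function is fixed before any model is submitted, neither side has to trust the other to judge fairly; note that, as the text points out, nothing here validates a submitter's identity or reputation.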

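5.1.2 Distributed trust in ML computations

This section mainly stresses that blockchain can solve the centralization problem of traditional distributed machine learning; the emphasis is on blockchain's decentralization.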

Another matter to be considered in the context of ML is that these algorithms lack trustability and automation.

  • Notably, it is difficult to trust results from trained ML models having open source code and open data in an IoT environment.

  • In fact, multi agent socio-technical systems (which work collaboratively on some tasks, share models and data for local computations) due to the involvement of independent agents face trust issues in computations from other agents.

Centralized systems face the threat of data tampering

As ML algorithm relies on data that is mutable, so it is difficult to trust the results from these algorithms. The system administrator can manipulate the data source that in return changes the result.

Most current ML models are operated by humans and lack automation. So how can a trusted, transparent collaborative computing platform be built? With cryptographic techniques!

Also, existing ML models are mostly controlled by human beings so it is difficult to automate the ML algorithms. Hence, there is a need for developing an environment having trust and transparency in computations for collaborative operations. To solve this problem, zero-knowledge proof, Elliptic-curve cryptography (ECC), etc. are some cryptographic techniques that are effective in the verification and validation of computations [73,74].

  • In this context, Raman et al. [57] proposed a model for verification and validation of computations in a permissioned blockchain network for multi-agent socio-technical system. Authors have demonstrated the usage of blockchain in developing trust for recording and validating audit at each step of computations.

    • However, due to lack of scalability large scale computations for a multi agent network prove expensive.

      For this, the authors have used a lossy compression technique that reduces the communication and storage cost of the blockchain network. (This is a paper related to model compression; worth reading later.)

  • Similarly, authors of [62] established a link between ML and blockchain technology in order to solve trustability and automation issues of ML by using association rule mining.
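
The idea of recording and validating an audit trail at each computation step can be illustrated with a hash-chained log. This is a simplified stand-in for a permissioned blockchain, not the model of Raman et al. [57]; the function names, step descriptions, and values are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash: str, step: str, result: float) -> str:
    payload = json.dumps({"prev": prev_hash, "step": step, "result": result}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def append_step(log: list, step: str, result: float) -> None:
    """Record one computation step, linked to the previous entry by its hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev_hash, "step": step, "result": result,
                "hash": _digest(prev_hash, step, result)})


def verify(log: list) -> bool:
    """Recompute every hash; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["step"], entry["result"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list = []
append_step(audit_log, "agent-1: local computation", 0.42)
append_step(audit_log, "agent-2: aggregation", 0.40)
```

If an administrator later changes a recorded result, `verify` returns `False` for the altered log, which is the property that lets independent agents trust each other's reported computations.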

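5.1.3 Verifiable open repository of ML models

Using ML as the blockchain mining process (that is, "verifiable" at the mining nodes)

For example, the training process replaces the blockchain consensus algorithm. To me, though, this does not feel like something blockchain does for ML; it looks more like something ML does for blockchain. Or do the authors mean using blockchain to build a chain of ML models? Is this passage about using blockchain to build an ML framework?

This section also describes the difficulties encountered when using blockchain (smart contracts) to do work for ML, along with the literature that addresses them.

Drawbacks of the PoW consensus algorithm?

Among all research work on consensus algorithms, Proof-of-Work (PoW) is the widely accepted technical consensus algorithm used to settle among all participating nodes. However, the PoW consensus algorithm proves costly and environmentally unfriendly due to the high computations involved in it. After PoW many other consensus algorithms such as Proof-of-Stake (PoS), Proof-of-Activity (PoA) were introduced in order to reduce computations while mining blocks.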

  • In this context, the authors of [58] introduced a cryptocurrency named ‘‘WekaCoin’’ that is based on Proof-of-Learning (PoL) consensus algorithm. PoL is inspired by open-source ML competitions (e.g. Kaggle and CodaLab). Among all network nodes, some nodes called trainers upload ML models on blockchain network for tasks that were submitted by other nodes called suppliers. (The model initiator may upload their model on a Interplanetary file system (IPFS) system and in return receives checksum hash.) The uploaded models are then tested for data that was not considered by trainers while training. The validator nodes which are selected randomly are then supposed to rank these models and add the information to the block. The trainer nodes having the best model are rewarded with WekaCoins by supplier nodes. This way blockchain can be used for generating verifiable ML models. The flowchart for the understanding of PoL algorithm is presented in Fig. 9. The main advantage of this protocol is that the computations involved in the validation process solve useful tasks as well as creates a validated open repository for ML models and datasets. However, the authors have not discussed the prevention of collusion among suppliers, trainers, and validators.
  • In contrast to the permissionless blockchain, authors of [69] developed privacy preserving distributed ML model based on permissioned blockchain network. This is, however, a first attempt to propose a distributed ML model for a permissioned blockchain network. Decentralized ML allows machines to perform intelligent decision-making on data securely stored on the blockchain network without involving any TTP. The decentralized ML technique allows algorithms or ML models to run directly on connected mobile devices. This distributed technology is smart contract based marketplace that connects developers, clients, and data owners by facilitating all stakeholders in a way to create a middle-man free ML infrastructure. The authors demonstrated that the impact of proposed error based aggregation rule supports high resilience and mitigates collusion attack.
  • However, latency and bandwidth are the major drawback of distributed ML [75]. To improve network condition, 5G technology can be adopted as it enables high availability. In this direction, to ensure byzantine resilience for distributive learning in 5G networks, authors in [70] have proposed a blockchain based secure computing framework. By using a sharding based blockchain, authors have prevented arbitrary attacks on learning convergence.
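
The validation step of the Proof-of-Learning description above (randomly chosen validators rank the submitted models on data the trainers did not see, then record the ranking in a block) can be sketched as follows. This is a toy illustration, not the WekaCoin protocol of [58]; the node names, models, data, and block layout are assumptions.

```python
import hashlib
import json
import random
from typing import Callable, Dict, List, Tuple

Classifier = Callable[[float], int]


def accuracy(model: Classifier, held_out: List[Tuple[float, int]]) -> float:
    return sum(model(x) == y for x, y in held_out) / len(held_out)


def rank_models(models: Dict[str, Classifier], held_out: List[Tuple[float, int]]) -> List[str]:
    """Order trainer names from best to worst accuracy on unseen data."""
    return sorted(models, key=lambda name: accuracy(models[name], held_out), reverse=True)


def build_block(prev_hash: str, ranking: List[str], validators: List[str]) -> dict:
    body = {"prev": prev_hash, "ranking": ranking, "validators": validators}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}


# Data withheld from the trainers, supplied with the task.
held_out = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
models: Dict[str, Classifier] = {
    "trainer_a": lambda x: int(x > 0.5),
    "trainer_b": lambda x: int(x > 0.8),
    "trainer_c": lambda x: 1,
}

# Validators are picked at random among the nodes (seeded here for repeatability).
validators = random.Random(7).sample(["node1", "node2", "node3", "node4"], 2)
ranking = rank_models(models, held_out)
block = build_block("0" * 64, ranking, validators)
winner = ranking[0]  # this trainer would receive the supplier's reward
```

The work spent on validation is model evaluation rather than hash puzzles, so the chain doubles as a record of which model performed best on which task. As the text notes, nothing in this flow prevents suppliers, trainers, and validators from colluding.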

Problems with smart contracts?

  • Smart contracts cannot execute computationally heavy tasks

    However, authors of [76] pointed out that ML programs cannot be stored with blockchain because of the certain limitations of smart contracts. The authors pointed out that smart contracts cannot process high computational tasks.

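The idea in this passage is that the cost of computation affects how mining plays out (this matches the idea in my own survey article, so I think I can cite it)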

With the blockchain mining process, when output corresponding to any input is expected to be recorded via smart contracts, honest miners then execute the program to verify the correctness of results. In case of a computationally high process, adversarial nodes can skip and carry forward to verify the new block. This way adversarial nodes can get a chance of adding new blocks as honest participants are busy with the execution of smart contracts.

  • In addition, smart contracts cannot run randomized computations.

    Moreover, the smart contract cannot carry randomized computations as with randomization honest nodes can have inconsistent output. Besides, as ML computations are costly and randomized, so ML tasks are difficult to execute with blockchain. To address this challenge, the authors of [76] have used a game theory approach that empowers randomized computations on the top of blockchain. Here, a simple incentive mechanism is designed in order to execute the program with crowdsourcing in a blockchain environment.
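
Why randomization leads to inconsistent output across honest nodes can be shown in a few lines. The sketch below only illustrates the problem; sharing a seed is a simple assumption used here to make replicas agree and is not the game-theoretic mechanism of [76]. The function name and data are hypothetical.

```python
import random
from typing import List, Optional


def sample_minibatch_mean(data: List[float], batch_size: int, seed: Optional[int] = None) -> float:
    """Average a random mini-batch, as a stand-in for one randomized ML step."""
    rng = random.Random(seed)  # seed=None draws entropy from the OS
    batch = rng.sample(data, batch_size)
    return sum(batch) / batch_size


readings = [float(v) for v in range(100)]

# Two honest nodes replaying the same step with a shared seed agree exactly.
node_a = sample_minibatch_mean(readings, 10, seed=2024)
node_b = sample_minibatch_mean(readings, 10, seed=2024)

# Without a shared seed, each node draws its own batch and the results
# will generally differ, so the nodes cannot verify each other's output.
node_c = sample_minibatch_mean(readings, 10)
```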

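5.1.4 Privacy preservation

Blockchain is used to solve the privacy problems encountered in ML. The emphasis here seems to be on blockchain's cryptographic techniques, immutability, and so on, though I am not sure.

For example: protecting privacy at upload time, or using blockchain to secure federated learning (though I feel this one is work that ML does for blockchain. Blockchain can also protect federated learning, but is what is shown here really privacy preservation?)

Why does ML run into privacy problems?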

Another matter to be considered in the context of ML is the privacy preservation of data. For example, ML healthcare predictive modeling has proved beneficial in national healthcare research and biomedical discoveries. However, data disclosure of patients to these third-party cloud services leads to privacy attacks. The available distributed privacy preserving predictive models are dependent on the central server to execute the modeling process [77].

The following sentences probably do not suffice to support the point of this paragraph.

Institutional policies, single point of failure, mutable disseminate data, and trust issues are some associated risks with the existing client–server architecture. Moreover, any participating node cannot leave or join the network for a short period of time in order to avoid any recovery issues.

The authors' conclusion: The state-of-art research has adopted blockchain technology in order to deal with the above-mentioned risks. The characteristics of blockchain technology make it suitable to deal with centralized privacy preservation models.

Personally, though, I do not think the above shows that blockchain can protect data privacy.

Single point of failure and other problems in ML?

Institutional policies, single point of failure, mutable disseminate data, and trust issues are some associated risks with the existing client–server architecture. Moreover, any participating node cannot leave or join the network for a short period of time in order to avoid any recovery issues.

A: Blockchain avoids a single point of failure, Byzantine General, and Sybil attack problem and preserves privacy while predicting the modeling process.

Examples given in the paper of blockchain protecting ML privacy

  • In this context, Kou et al. [59] have presented Modelchain, a private blockchain that enabled privacy preserving predictive modeling for the healthcare industry. Instead of relying on only PoW protocol, the authors have designed a new algorithm on the top of PoW named proof of information to increase the efficiency and accuracy of ML model. Unfortunately, the proof of information algorithm proves inefficient to deal with the scalability of the network. The result section demonstrated that Modelchain provides a secure and privacy preserving interoperability framework. Unfortunately, privacy preservation is provided but the authors of [59] did not consider the basic requirements for differential privacy as differential privacy based ML has to consider the fact that how many times a ML model can be trained without any privacy breach.

    I did not really understand the problem with that paper as the authors explain it.

  • Subsequently, Chen et al. [65] proposed another decentralized ML system called ‘‘Learningchain’’ that takes both linear and non-linear learning models in account without relying on the central server. Here, differential privacy based methods are also designed to preserve the privacy of data. Differential privacy or cryptographic solutions have proved to be efficient for preserving user’s data privacy [78–80]. This model is implemented on the Ethereum platform and a stochastic gradient descent algorithm is used to design a predictive model over blockchain.

    The proposal works in 3 phases. In the first phase, a P2P network is initialized. In the second phase, data holders calculate their local gradients as per predefined common loss function and predictive model using differential privacy methods. Next, computed gradients are broadcasted in the network using differential privacy scheme for learning models. After reaching a consensus, local gradients are aggregated by the authority holder using Learningchain. Three different datasets were used for training and testing purposes, i.e., synthetic dataset, Wisconsin breast cancer dataset, and Modified National Institute of Standards and Technology database (MNIST) dataset. It is concluded in results that there is a trade-off between privacy and accuracy as lowering the privacy budget increases test errors.
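
The second and third phases (local gradients perturbed for differential privacy, then aggregated) can be sketched with a one-parameter linear model. This is a generic Laplace-mechanism illustration, not the Learningchain algorithm of [65]; the clipping bound, the per-step `epsilon`, the data, and all function names are assumptions, and the blockchain consensus step is omitted.

```python
import random
from typing import List, Tuple

Data = List[Tuple[float, float]]


def local_gradient(w: float, data: Data) -> float:
    """Gradient of the mean squared error of y ≈ w * x on one holder's data."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_gradient(w: float, data: Data, clip_bound: float, epsilon: float,
                     rng: random.Random) -> float:
    # Clip to [-clip_bound, clip_bound], so the sensitivity is 2 * clip_bound,
    # then add Laplace noise with scale sensitivity / epsilon.
    g = max(-clip_bound, min(clip_bound, local_gradient(w, data)))
    return g + laplace_noise(2 * clip_bound / epsilon, rng)


def train(holders: List[Data], epsilon: float, steps: int = 200, lr: float = 0.05,
          clip_bound: float = 5.0, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        # Each holder shares only a noisy gradient, never its raw data.
        grads = [private_gradient(w, d, clip_bound, epsilon, rng) for d in holders]
        w -= lr * sum(grads) / len(grads)  # aggregation step
    return w


# Three data holders whose private data all follow y = 2x.
holders = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(0.5, 1.0), (1.5, 3.0)]]
w_weak_privacy = train(holders, epsilon=1e9)    # almost no noise
w_strong_privacy = train(holders, epsilon=0.5)  # much noisier updates
```

A smaller `epsilon` means a larger noise scale on every shared gradient, so the learned weight wanders further from the true value of 2; this is the privacy/accuracy trade-off the results describe. Every additional training step also releases another noisy gradient, so the total privacy cost grows with the number of steps.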

Protecting privacy when uploading data

With the growing trend of DL models, many DL models are designed to be run on client devices such as — IoT devices or smart devices. Although this technique demands enough memory and disk space to run the models in real-time. Also, because of privacy concerns, it is not recommended to upload client data on a centralized machine for processing and executing ML algorithms.

Along the same line of thought, to preserve privacy while uploading client ML data, authors of [61] proposed another work. Singla et al. [61] proposed a blockchain-based system that stores client device profiles in a shared household to predict user activity. Here, the main aim is to enable automatic customization of each client using blockchain decentralized security and privacy. The personalization feature of each device is computed using rule mining. However, this proposal is based on the assumption that client preferences are not changing.

Solving the collaborative data sharing problem

Similarly, to solve the challenge in collaborative data sharing among multiple parties in IoT applications, Lu et al. [66] proposed a privacy preserving data sharing model using differential privacy methods.

Introducing federated learning

However, rather than sharing raw data directly, the federated learning algorithm is utilized into permissioned blockchain network through which only data model is shared over decentralized multiple parties. In a centralized ML model, participants upload their data on central cloud server. The server performs all computational tasks for training on the data as shown in Fig. 10(a).

吹一波联邦学习

This model involves high risks of privacy attacks. Also, communication overhead is created between participants and the cloud server. In contrast, federated learning enables ML models to be computed on distributed mobile devices. This technique helps ML models to be trained on the devices where data is produced. This way the privacy of data is ensured as data of a particular device does not leave its data production place. This technique is disrupting the centralized way of data training.

The federated learning process

In federated learning, each device has its local training dataset that is never seen by the server and each device generates an update to the existing global model located at the server. Next, the server combines these models by aggregating them and the whole process is repeated until global model training is completed. The primary benefit of federated learning is the decoupling of the training phase from the requirement of direct access to raw training data. The process of federated learning based model is represented in Fig. 10(b).
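The loop described above can be sketched in a few lines. This is a minimal illustration in the style of federated averaging, assuming a one-feature linear model and noise-free toy data; the function names, learning rate, and round count are hypothetical and not taken from the paper.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """One device: gradient descent on its own (x, y) pairs for y ~ w*x + b."""
    w, b = weights
    for _ in range(epochs):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * gw, b - lr * gb
    return (w, b)

def aggregate(updates, sizes):
    """Server: average the device models, weighted by local sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def federated_train(device_data, rounds=300):
    """Repeat: send the global model out, train locally, aggregate."""
    global_model = (0.0, 0.0)
    for _ in range(rounds):
        # The server only ever receives model parameters, never raw data.
        updates = [local_update(global_model, d) for d in device_data]
        global_model = aggregate(updates, [len(d) for d in device_data])
    return global_model

# Hypothetical local datasets, all drawn from y = 2x + 1.
DEVICES = [
    [(0.0, 1.0), (0.1, 1.2)],
    [(0.3, 1.6), (0.5, 2.0), (0.6, 2.2)],
    [(0.7, 2.4), (0.8, 2.6), (0.9, 2.8), (1.0, 3.0)],
]

w, b = federated_train(DEVICES)
print(f"global model: w={w:.3f}, b={b:.3f}")
```

Note that `aggregate` only sees `(w, b)` pairs and sample counts, which is the decoupling of training from raw data access that the excerpt highlights.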

Why federated learning should be combined with blockchain

Therefore, it minimizes training and privacy risk. However, the usage of a single central server is vulnerable to a single point of failure. Moreover, there is no reward service for distributed devices. Notably, the devices with more data samples should be rewarded, as they contribute more to global training. With blockchain, verified local updates and exchanges can be enabled, along with corresponding rewards proportional to the training sample size. The illustration of blockchain based federated learning is represented in Fig. 11.
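The reward rule mentioned here, rewards proportional to training sample size, is simple enough to write out. The sketch below assumes the sample counts have already been verified on chain; the function name, the device names, and the budget are hypothetical.

```python
def proportional_rewards(sample_counts, budget):
    """Split `budget` among devices in proportion to their sample counts."""
    total = sum(sample_counts.values())
    if total == 0:
        return {device: 0.0 for device in sample_counts}
    return {device: budget * n / total for device, n in sample_counts.items()}

# Hypothetical round: three devices, 50 reward units to distribute.
rewards = proportional_rewards({"dev_a": 100, "dev_b": 300, "dev_c": 600}, 50.0)
print(rewards)
```

A rule like this pays only for quantity, so a real design would also need a way to check that the claimed samples exist and are useful.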

The "pretending to hold data" attack on blockchain-based federated learning, and example solutions

Unfortunately, the federated learning technique fails to provide security in the presence of Byzantine nodes. If an attacker pretends to be a real data holder and breaks the security of the system, such an attacker is called a Byzantine attacker.

  • In another work, Zhu et al. [67] also presented a blockchain based privacy preserving method for securing updates and achieving consensus in federated learning. Here, blockchain technology is adopted to deal with Byzantine devices in the network. In particular, only updates are added to blockchain transaction records. Along with the digital signature of a node, other information such as hyper-parameters, differences in weights, and public IDs is also broadcast. The other participants of the network validate the broadcast transactions against their local datasets. If a majority of the participants approve that the performance score of the updated model is greater than that of the existing model, the updates are added to the model.
  • Similarly, Doku et al. [63] also integrated blockchain technology and federated learning to improve the quality of data. Here, the hash of mobile device data is stored on blockchain whereas data still remains on the user’s device, and only the locally analyzed results will be shared with ML practitioners via a secure network. In addition to this, incentives will be provided to data owners.
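The majority-approval step described for [67] can be sketched as follows. This is only an illustration of the voting idea: models are represented as plain Python callables, accuracy is used as the performance score, and the signature and transaction handling are left out. All names and data are hypothetical.

```python
def accuracy(model, dataset):
    """Fraction of (x, label) pairs that `model` classifies correctly."""
    return sum(model(x) == label for x, label in dataset) / len(dataset)

def validator_approves(candidate, current, local_data):
    """A validator approves only if the candidate beats the current model locally."""
    return accuracy(candidate, local_data) > accuracy(current, local_data)

def accept_update(candidate, current, validators_data):
    """Accept the update when a strict majority of validators approve it."""
    approvals = sum(
        validator_approves(candidate, current, data) for data in validators_data
    )
    return approvals > len(validators_data) / 2

# Toy task: the true label is x > 0.5.
def current_model(x):
    return x > 0.8        # existing global model, threshold too high

def honest_update(x):
    return x > 0.5        # improved model from an honest device

def byzantine_update(x):
    return x < 0.5        # poisoned model from a Byzantine device

VALIDATORS = [
    [(0.1, False), (0.6, True), (0.9, True)],
    [(0.2, False), (0.7, True), (0.4, False)],
    [(0.55, True), (0.3, False), (0.95, True)],
]

print(accept_update(honest_update, current_model, VALIDATORS))
print(accept_update(byzantine_update, current_model, VALIDATORS))
```

Because every validator scores the candidate on data the submitter never sees, a device that only pretends to hold data has no easy way to craft an update that wins the vote.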

Using blockchain to strengthen the security of federated learning

  • Additionally, in order to enhance the security of federated learning, the authors of [71] proposed a blockchain based framework to verify and exchange local learning models. This scheme aims to enable on-device ML without involving any centralized server. A reward mechanism is also proposed for user and miner node participation. The authors have also evaluated the end-to-end average learning completion latency.
  • In a closely related work, authors of [72] proposed federated learning with multi-access edge computing and blockchain technology. Here, edge devices are employed to provide resources to mobile devices and also to act as blockchain nodes. Here, a separate channel is dedicated for learning of every global model in the blockchain network.
    • Unfortunately, in this proposal user devices are dependent on the integrity of corresponding edge nodes for sending transactions to blockchain networks. Additionally, no reward mechanism for user and miner nodes is designed by authors.

Securing data during the ML process

  • Also, authors of [81] leveraged a suite of ML techniques to support data exchange on the blockchain via smart contracts for a distributed data vending architecture. In particular, data embedding and distance metric learning approaches from ML research are used to enable retrieval of smart contracts without affecting the integrity of private data. Here, the signature of a data entry is generated using a data embedding procedure with privacy preservation, and the signatures are then used to measure similarity among data entries.

  • In an alternative work, the authors of [64] also proposed a blockchain based model named “secureSVM” for privacy preserved sharing of data while training ML algorithms. Here, the IoT data generator encrypts data on the local device with its private key, and this encrypted data is stored on the blockchain. The experimental result proves that incorporating blockchain with an SVM classifier improves the accuracy of the system model.

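    I did not understand the two cases above!!!

5.1.5 Cryptographic security on ML data

Using blockchain to secure access to the data that ML uses.

But why do I get the feeling that this section's content was already covered in the previous section? Hmm.

  • Using a blockchain-based access control manager to securely access, in real time, data stored in different places

    Can this be understood as secure access control for the data that ML uses???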

    Classification of IoT data with the black-box concept raises questions about the type of data being collected. Hence, the system needs to attain confidentiality, integrity, anonymity, and secure access to data. Authors of [60] have used blockchain in retraining a stacked denoising autoencoder (SDA) algorithm for arrhythmia classification. Retraining is used to handle the non-stationary nature of ECG data because it enables the deep network to learn any new distribution at specific time intervals, whereas SDA has the feature of extracting different relevant features from data samples. Here, patient data stored on external storage that is collected by the retraining SDA algorithm is securely accessed using a blockchain based access control manager in real time. A scenario of blockchain based secure access control on ML data is represented in Fig. 12.

  • Is this studying the role of blockchain inside the CNN architecture? I do not quite follow.

    More recently, Goel et al. [68] experimentally investigated the role of blockchain in providing authenticity to each block of a Convolutional Neural Network (CNN) model. In a CNN, each convolution layer is referred to as a block, and the authors pinpointed the accountability of each block for correct output. To this end, the blocks of the CNN are kept in random order and neighbor blocks hold the information regarding the next legitimate block. Indeed, hiding the architecture of the network from the attacker mitigates the threat of white-box adversarial attacks. Also, this scheme enhances transparency between blocks and the entire network. Unfortunately, the complexity of the system is quite high.

  • Using blockchain to provide anonymity for ML

    Another potential application of blockchain for ML is in providing anonymity. As discussed earlier, if the data is stored anonymously, it is hard to link it to the true identity of the person. Authors of [82] pointed out that the pseudo-anonymity provided by blockchain can encourage the use of ML on anonymous datasets. Researchers can now use massive datasets for their research in order to improve the prediction results of healthcare systems. However, along with anonymity, encrypting data could enhance the security of the system. To address this challenge, homomorphic encryption was introduced, which has the ability to execute ML operations on encrypted data [83].
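To show what "ML operations on encrypted data" can look like, here is a toy sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme. The excerpt does not say which scheme [83] uses, so Paillier is a stand-in chosen for illustration, and the tiny primes are insecure by design. It evaluates a linear score (plaintext weights times encrypted features) without decrypting the features.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy key generation from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """Encrypt integer m (0 <= m < n) as (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Ciphertext of m1 + m2: multiply the two ciphertexts."""
    return c1 * c2 % (n * n)

def scale_encrypted(n, c, k):
    """Ciphertext of k * m: raise the ciphertext to the plaintext power k."""
    return pow(c, k, n * n)

# Encrypted features, plaintext weights: score = 3*5 + 2*7.
n, priv = keygen()
features = [encrypt(n, 5), encrypt(n, 7)]
weights = [3, 2]
score = scale_encrypted(n, features[0], weights[0])
score = add_encrypted(n, score, scale_encrypted(n, features[1], weights[1]))
print(decrypt(n, priv, score))
```

Paillier supports only addition and multiplication by a plaintext constant, so it covers linear models; arbitrary computation on ciphertexts needs a fully homomorphic scheme.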

Summary of this section

Summary and insights: Section 4.1 focuses on various applications of blockchain technology targeting ML areas for the IoT environment. Incorporating blockchain technology in ML provides reliable sharing of data for different ML tasks including prediction, forecasting, and voice and speech recognition, to name a few. However, we have made several observations after reviewing and tabulating the literature. Clearly, with trustless ML contracts, trustless rewards can be provided to the best ML model. However, there are some risks in the proposal that need to be dealt with. For example, the organizer may refuse to reveal the testing dataset, which may deter submitters from submitting their work, as no evaluation function would then be available. Moreover, based on the selection criteria, the reward money can be claimed by the first submitter fulfilling the evaluation criteria. Hence, the reward mechanism can be evenly distributed in order to incentivize more participation. Also, it has been observed that nodes are still reluctant to host data on IPFS blockchain data storage. So, future work should consider these problems before designing revised trustless ML contracts.

Additionally, it has been observed that most of the proposals have leveraged public blockchain which makes data generation speed slow. Hence, a fast data stream situation in blockchain is another important topic of research. Moreover, it has been observed that researchers have not considered the confidentiality of ML data in the blockchain network.

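I have a question: if the confidentiality of ML data in the blockchain network has not been protected, then what were the data privacy protection methods introduced above? I do not get it. Also, I feel the authors' summary is not very good. Hmm.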

5.2 ML for blockchain

For blockchain, ML can solve issues of uncertain and complex features.

  • The authors' view: in a blockchain environment, IoT sensors generate data, and this data can be analyzed with ML algorithms.

    In a blockchain environment, the data gathered from IoT sensors can be analyzed and monitored at multiple points by ML models for efficient decision-making [98].

  • One work proposes “blockchain thinking”: the main aim is to utilize the framework of blockchain for initiating thinking machines.

    Following an emerging trend rendered by the adoption of ML in blockchain, Swan [99] introduced a new term called “blockchain thinking” that enables accommodating thinking on a blockchain network. The main aim is to utilize the framework of blockchain for initiating thinking machines. In such a framework, the input involves sensor data. Further, the input data is processed at a specific location to generate output that includes storing information to memory or taking a specific action. This process involves “personal thinking chains” that signify a backup of full human mind files.

    • To implement blockchain thinking, IPFS can be brought in (but why is this placed in the first paragraph?)

      To implement the blockchain thinking process, IPFS could be relevant, as it eases P2P file serving [100]. Notably, ML research is entirely data-driven. This data can be shared via a central resource or a distributed file system. Using a central repository becomes inefficient as the number of users increases. On the other hand, IPFS is a distributed file system that stores data files in a decentralized manner. Also, each file in IPFS is assigned a unique fingerprint called a cryptographic hash. IPFS will disseminate data files to a list of trusted nodes, and the data will be available to other users through content identifiers.
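The idea of addressing a file by its cryptographic hash can be illustrated with a small in-memory store. This is a simplification: real IPFS content identifiers use a multihash-based encoding and large files are split into blocks, whereas the sketch below just uses a hex SHA-256 digest. The `ContentStore` class is hypothetical.

```python
import hashlib

class ContentStore:
    """In-memory stand-in for a content-addressed store (not the IPFS API)."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        """Store data under the hash of its content and return that identifier."""
        cid = hashlib.sha256(data).hexdigest()
        self._blocks[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        """Fetch data and verify that it still matches its identifier."""
        data = self._blocks[cid]
        if hashlib.sha256(data).hexdigest() != cid:
            raise ValueError("content does not match its identifier")
        return data

store = ContentStore()
cid = store.add(b"sensor readings, batch 1")
print(cid, store.get(cid))
```

Because the identifier is derived from the content, any node can serve the file and the receiver can still check that it was not altered, which is what makes sharing datasets across untrusted peers workable.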

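Tables 6, 7, 8, and 9 below compare the related research on applying ML to blockchain

Comparison of work on deep reinforcement learning and blockchain in resource management and computational offloading (transaction-based)

Comparison of related ML models (price prediction)

Comparison of price prediction based on blockchain and federated learning (transaction-based)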

5.2.1 Resource management and computational offloading

Main background of this section: IoT systems run into problems such as wasted resources, and to address this challenge, some flexible resource management frameworks combine blockchain with ML.

Resource management is the process of scheduling and allocating resources in order to maximize the efficiency of the IoT system. Energy consumption, transparency, operational expenditure, request scheduling, latency, content caching, and security are some of the issues involved in realizing the resource management process [101]. To address this challenge, a few secure and flexible resource management frameworks have been developed in the literature by integrating blockchain and ML.

Introducing deep reinforcement learning

A blockchain based platform possesses the capability to store all records of transactions related to resource management in a distributed and transparent data structure. However, to increase the efficiency of the network, ML models can be combined with blockchain. In particular, deep reinforcement learning (DRL) has been extensively used with blockchain to achieve resource management tasks. The DRL technique has the capability to handle the dynamic and high-dimensional features of IoT. The main concept behind DRL is that, similar to a biological agent, an artificial agent may learn from interaction with its surroundings to take further decisions. By interacting with the environment, the agent gathers experience to optimize objectives served in the form of cumulative rewards.
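The "learn from interaction, optimize cumulative reward" idea can be shown with a tiny tabular Q-learning agent. Note the substitution: this is plain tabular Q-learning, not deep RL, since a DRL agent would replace the table with a neural network. The environment (two load levels, a local-versus-offload decision) and every reward value are hypothetical and not taken from any cited paper.

```python
import random

LOCAL, OFFLOAD = 0, 1   # actions: compute on the device or offload to the edge
LOW, HIGH = 0, 1        # states: current load level of the edge server

def reward(state, action):
    """Hypothetical negative cost: offloading is cheap only when load is low."""
    if action == LOCAL:
        return -5.0
    return -1.0 if state == LOW else -8.0

def train(steps=5000, alpha=0.05, gamma=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]   # q[state][action]
    state = rng.choice((LOW, HIGH))
    for _ in range(steps):
        # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice((LOCAL, OFFLOAD))
        else:
            action = max((LOCAL, OFFLOAD), key=lambda a: q[state][a])
        r = reward(state, action)
        next_state = rng.choice((LOW, HIGH))   # load changes at random
        # Q-learning update toward reward plus discounted best future value.
        target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

def greedy_policy(q):
    """Best known action for each state, in the order (LOW, HIGH)."""
    return [max((LOCAL, OFFLOAD), key=lambda a: q[s][a]) for s in (LOW, HIGH)]

q_table = train()
print(greedy_policy(q_table))
```

With only two states a table is enough; the papers discussed next turn to DRL because block size, block interval, and channel conditions make the state space far too large to enumerate.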

  • For example, authors in [86] have used a DRL method for maximizing the transactional throughput of the blockchain network. In particular, DRL selects block producers, block size, and block interval to adapt to the dynamic features of the Internet of Vehicles (IoV) scenario.

  • Also, in order to achieve resource management for tasks such as content caching, computation offloading, and spectrum sharing, the authors in [85] have utilized DRL. Specifically, this scheme uses DRL for a Device-to-Device (D2D) caching scheme that matches caching supply and demand pairs to maximize the network utility of a consortium blockchain enabled framework. Notably, the DRL based caching scheme optimizes bandwidth between the caching requester and provider. The results demonstrate that the cumulative average system utility is improved. However, this proposal has not discussed the mining procedure.

  • Meanwhile, when embedded with smart contracts, ML helps to minimize the energy expense in cloud data centers (DCs), as discussed by the authors of [84]. Here, the smart contract facility of blockchain migrates the requests and virtual machines to the cloud DCs with minimum load, and RL based request migration is used for energy cost minimization, as this method does not require any prior knowledge. Fig. 14 represents the blockchain and ML empowered resource management scenario for smart grid networks. Here, all computation intensive tasks including caching, billing, demand-response management, etc. are implemented at the edge layer of the network due to resource constraints. Notably, learning capable ML agents deployed on edge devices are responsible for implementing effective caching, computational offloading, scheduling, and real-time decisions on the edge devices. Moreover, mobile base stations used to transfer data to edge devices also have ML models running on them for scheduling computational or storage requests.

Another application of ML to blockchain is offloading in mobile blockchain networks (the point being that mobile devices have limited computing power)

Another prospective application of ML for blockchain is in offloading approaches for mobile blockchain networks. With the introduction of mobile technology, the blockchain network can now be easily used with mobile devices so that more flexible blockchain applications for IoT can be developed. However, with mobile systems, resource-constrained IoT devices face difficulty while mining blocks. In this context, mobile edge computing facilitates high computational tasks for mobile devices. However, there is a challenge of effectively allocating available edge computing resources to miners. Mobile devices can offload their high computational tasks to the assigned mobile edge/cloud server. With a motive to enhance the performance of the system, the literature contains multiple offloading approaches.

  • For example, convex optimization models and game theory approaches have been used by the authors of [119–123] to minimize task execution latency. Nevertheless, these methods fail for highly complex online models, and they also demand prior knowledge about the system. To solve this issue, RL can be used, where a learning agent is employed to derive an optimal solution for computational offloading via a trial-and-error method. Moreover, this solution does not require prior knowledge of system statistics.

  • However, for high dimensional computational offloading challenges, the RL solution also fails due to the high dimensions of the state and action space, as pointed out by the work in [124,125]. To deal with high dimensional data, the use of DRL is beneficial, and some work in the literature has demonstrated the scalability and offloading efficiency of DRL in blockchain based edge computing applications. DRL can achieve an optimal offloading strategy based on past experiences of offloading. Both of the proposals in [87,88] were designed to preserve users’ privacy and to achieve security as an optimization problem. By using the DRL method, performance metrics including computational latency, energy consumed, and privacy level were analyzed, proving the feasibility of the proposed scheme with reduced offloading latency and minimum energy consumption.

  • The examples above only cover computational offloading for the mining process

    The above-discussed offloading approaches are designed only for mining tasks whereas data processing tasks are ignored. In contrast, the work in [89] has discussed computational offloading for both mining and data processing tasks combining DRL and genetic algorithms. Additionally, Markov decision process has been used to handle the dynamic environment.

  • However, to implement the DRL method for offloading decisions, the major challenge is to achieve convergence and accuracy of the deep NN. Also, there is a need to develop effective resource allocation on mobile blockchain. To address this challenge, authors of [102] designed a multilayer NN supported auction mechanism for resource
