Setting up a file upload stream scan using ClamAV in a Django back-end

Posted 2023-02-16


Originally asked 2018-05-24 00:14:40

Question:

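I'm working on a React/Django app. Users upload files through the React front-end, and those files end up in the Django/DRF back-end. We run antivirus (AV) on the server continuously, but we want to add a stream scan before files are written to disk.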

Figuring out how to set this up has me stumped. Here are some of the sources I've been looking at.

How do you virus scan a file being uploaded to your java webapp as it streams?

Although the accepted top answer says it's "...quite easy" to set up, I'm struggling.

According to that post and the corresponding documentation, I apparently need to pipe the file into clamscan (cat testfile | clamscan -):

How do you virus scan a file being uploaded to your java webapp as it streams?

My back-end currently looks like this:

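import datetime

from django.contrib.auth import get_user_model
from django.db.models import Max
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
from rest_framework.status import HTTP_200_OK
from rest_framework.views import APIView

# app-specific modules (import paths assumed)
from .models import Uploads
from .serializers import SaveDocumentSerializer

User = get_user_model()  # the project's user model, which has a `co` field


class SaveDocumentAPIView(APIView):
    permission_classes = [IsAuthenticated]

    def post(self, request, *args, **kwargs):

        # this is for handling the files we do want
        # it writes the files to disk and writes them to the database
        for f in request.FILES.getlist('file'):
            # next id = current highest id + 1 (or 1 if the table is empty)
            max_id = Uploads.objects.all().aggregate(Max('id'))
            if max_id['id__max'] is None:
                max_id = 1
            else:
                max_id = max_id['id__max'] + 1
            data = {
                'user_id': request.user.id,
                'sur_id': kwargs.get('sur_id'),
                'co': User.objects.get(id=request.user.id).co,
                'date_uploaded': datetime.datetime.now(),
                'size': f.size,
            }
            filename = str(data['co']) + '_' + \
                    str(data['sur_id']) + '_' + \
                    str(max_id) + '_' + \
                    f.name
            data['doc_path'] = filename
            self.save_file(f, filename)
            serializer = SaveDocumentSerializer(data=data)
            if serializer.is_valid(raise_exception=True):
                serializer.save()
        return Response(status=HTTP_200_OK)

    # Handling the document: copy the upload to disk chunk by chunk
    def save_file(self, file, filename):
        with open('fileupload/' + filename, 'wb+') as destination:
            for chunk in file.chunks():
                destination.write(chunk)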

I think I need to add something to the save_file method, such as:

for chunk in file.chunks():
    # run bash command from python
    cat chunk | clamscan -
    if passes_clamscan:
        destination.write(chunk)
        return HttpResponse('It passed')
    else:
        return HttpResponse('Virus detected')

So my questions are:

1) How do I run Bash from Python?

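2) How do I get the result back from the scan, so that I can send it to the user and act on it in the back-end? (For example, logic that emails the user and an admin to say their file contains a virus.)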

I've been experimenting with this, without much luck.

Running Bash commands in Python
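For questions 1 and 2, here is a minimal sketch, assuming the clamscan command-line scanner is installed on the server. Python's subprocess.run can pipe the complete upload into clamscan on stdin (the "-" argument), with no shell involved, and clamscan's exit code reports the result: 0 means no virus was found, 1 means a virus was found, and 2 means an error occurred. The scan_bytes helper, its cmd parameter and the status strings are illustrative names, not part of any library.

```python
import subprocess
from typing import Sequence, Tuple

# clamscan's documented exit codes: 0 = no virus found, 1 = virus found, 2 = error
CLAMSCAN_STATUS = {0: "OK", 1: "FOUND", 2: "ERROR"}


def scan_bytes(data: bytes, cmd: Sequence[str] = ("clamscan", "-")) -> Tuple[str, str]:
    """Pipe the complete file contents to clamscan on stdin and report the result.

    `cmd` is a hypothetical parameter so the scanner command can be swapped out;
    "-" tells clamscan to read the data to scan from stdin.
    Returns (status, clamscan's stdout) so the caller can build a user message
    or notify an admin.
    """
    # Equivalent of `cat testfile | clamscan -`, but without going through a shell
    result = subprocess.run(list(cmd), input=data, capture_output=True)
    # Any unexpected exit code is treated as an error rather than as "clean"
    status = CLAMSCAN_STATUS.get(result.returncode, "ERROR")
    return status, result.stdout.decode(errors="replace")
```

In the view, something like status, report = scan_bytes(b"".join(f.chunks())) would scan the whole upload at once rather than chunk by chunk. This keeps the entire file in memory, and clamscan reloads its signature database on every run, which makes it slow per file; a long-running clamd daemon avoids that cost.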

There are also GitHub repositories that claim to integrate ClamAV with Django, but they have either not been updated in years or have very poor documentation. See:

https://github.com/vstoykov/django-clamd

https://github.com/musashiXXX/django-clamav-upload

https://github.com/QueraTeam/django-clamav

Comments:

A single chunk of a file is unlikely to be detected as a virus. The scanner probably needs the whole file.

Answer 1:

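OK, this is possible with clamd. I changed SaveDocumentAPIView to the following. It scans files before they are written to disk and prevents infected files from being written. Clean files are still let through, so users don't have to re-upload them.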

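import datetime

import clamd
from django.contrib.auth import get_user_model
from django.core.mail import send_mail
from django.db.models import Max
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
from rest_framework.status import HTTP_200_OK
from rest_framework.views import APIView

# app-specific modules (import paths assumed)
from .models import Uploads
from .serializers import SaveDocumentSerializer

User = get_user_model()  # the project's user model, which has a `co` field


class SaveDocumentAPIView(APIView):
    permission_classes = [IsAuthenticated]

    def post(self, request, *args, **kwargs):
        user_obj = User.objects.get(id=request.user.id)

        # names of uploaded files that clamd flagged as infected
        infected_files = []

        # connect to the local clamd daemon over its Unix socket
        cd = clamd.ClamdUnixSocket()

        # this is for handling the files we do want
        # it writes the files to disk and writes them to the database
        for f in request.FILES.getlist('file'):
            # stream the upload to clamd; the result looks like
            # {'stream': ('OK', None)} or {'stream': ('FOUND', '<signature>')}
            scan_results = cd.instream(f)
            status = scan_results['stream'][0]

            if status == 'OK':
                # next id = current highest id + 1 (or 1 if the table is empty)
                max_id = Uploads.objects.all().aggregate(Max('id'))
                if max_id['id__max'] is None:
                    max_id = 1
                else:
                    max_id = max_id['id__max'] + 1
                data = {
                    'user_id': user_obj.id,
                    'sur_id': kwargs.get('sur_id'),
                    'co': user_obj.co,
                    'date_uploaded': datetime.datetime.now(),
                    'size': f.size,
                }
                filename = str(data['co']) + '_' + \
                        str(data['sur_id']) + '_' + \
                        str(max_id) + '_' + \
                        f.name
                data['doc_path'] = filename
                f.seek(0)  # instream() read the file to the end; rewind before saving
                self.save_file(f, filename)
                serializer = SaveDocumentSerializer(data=data)
                if serializer.is_valid(raise_exception=True):
                    serializer.save()

            elif status == 'FOUND':
                # notify an admin so they can follow up with the user
                send_mail(
                    'Virus Found in Submitted File',
                    'The user %s %s with email %s has submitted the following file '
                    'flagged as containing a virus: \n\n %s' % (
                        user_obj.first_name,
                        user_obj.last_name,
                        user_obj.email,
                        f.name,
                    ),
                    'The Company <no-reply@company.com>',
                    ['admin@company.com'],
                )
                infected_files.append(f.name)

            # any other status (e.g. 'ERROR') is skipped: the file is neither saved nor reported

        return Response({'filename': infected_files}, status=HTTP_200_OK)

    # Handling the document: copy the upload to disk chunk by chunk
    def save_file(self, file, filename):
        with open('fileupload/' + filename, 'wb+') as destination:
            for chunk in file.chunks():
                destination.write(chunk)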
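For reference, the stream scan relies on clamd's INSTREAM command. The client sends zINSTREAM, then each chunk prefixed by its length as a 4-byte big-endian unsigned integer, then a zero-length chunk to mark the end. clamd replies with "stream: OK" or "stream: <signature> FOUND", terminated by a NUL byte. The sketch below speaks that protocol with only the standard library, so Django's f.chunks() can be forwarded to clamd as they are read. The socket path /var/run/clamav/clamd.ctl is an assumption (it varies between distributions; check LocalSocket in clamd.conf), and frame_instream, parse_reply and scan_chunks are hypothetical helper names. python-clamd's instream() performs essentially this exchange.

```python
import socket
import struct
from typing import Iterable, Iterator, Optional, Tuple


def frame_instream(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the byte pieces of a clamd INSTREAM request."""
    yield b"zINSTREAM\0"
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the stream early
            # length prefix: 4-byte unsigned int in network (big-endian) order
            yield struct.pack("!L", len(chunk)) + chunk
    yield struct.pack("!L", 0)  # zero-length chunk terminates the stream


def parse_reply(reply: bytes) -> Tuple[str, Optional[str]]:
    """Map clamd's reply to (status, detail): ("OK", None), ("FOUND", signature) or ("ERROR", text)."""
    text = reply.rstrip(b"\0").decode(errors="replace")
    prefix = "stream: "
    result = text[len(prefix):] if text.startswith(prefix) else text
    if result == "OK":
        return "OK", None
    if result.endswith(" FOUND"):
        return "FOUND", result[: -len(" FOUND")]
    return "ERROR", result


def scan_chunks(chunks: Iterable[bytes],
                socket_path: str = "/var/run/clamav/clamd.ctl") -> Tuple[str, Optional[str]]:
    """Send chunks to a local clamd as they arrive and return the parsed verdict.

    The default socket_path is an assumption; use the LocalSocket value from clamd.conf.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        for piece in frame_instream(chunks):
            sock.sendall(piece)
        # read until clamd's NUL-terminated reply is complete or the socket closes
        reply = b""
        while not reply.endswith(b"\0"):
            data = sock.recv(4096)
            if not data:
                break
            reply += data
    return parse_reply(reply)
```

In the view, status, signature = scan_chunks(f.chunks()) could stand in for cd.instream(f). clamd only answers after the terminating zero-length chunk, so it still judges the complete file, and it rejects streams larger than its StreamMaxLength setting. Also note that with Django's default upload handlers, uploads larger than FILE_UPLOAD_MAX_MEMORY_SIZE (2.5 MB by default) are already spooled to a temporary file before the view runs.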

Comments:

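How has this implementation worked for you so far? Would you recommend this approach for scanning images/PDFs?

It has worked great, without any problems! We've only had a few infected files that people tried to upload. It emails us when that happens so we can contact them.

Glad to hear it. I'm only dealing with PDFs right now, so I'm using PDFiD (but I'm considering adding ClamAV). Why do you think there isn't a more highly rated or more active Python package for integrating ClamAV? It seems like something many webapps should have; I'm surprised I had to dig so hard to find this thread, the only discussion of it I've found from the past 3 years.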
