How to keep a real-time track of how long two objects intersect in a video?

Posted 2023-03-16

【Title】: How to keep a real-time track of how long two objects intersect in a video? 【Asked】: 2021-05-22 02:41:11 【Question】:

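I am building a toothbrushing app in which I want to track how long a person brushes their teeth. For this I have built a toothbrush tracker and a mouth tracker, and developed an algorithm that determines whether the toothbrush is inside the mouth (which I take to mean that the person is brushing).

Now I just need to add a timer that counts, in real time, how long the person has been brushing and displays it on the video feed. If the user pauses brushing, the timer should pause too; if the user resumes brushing, the timer should resume. How can I do this?

My logic for detecting the brushing time is as follows: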

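1. Extract the vertices of the toothbrush's bounding box.
2. Extract the left-most, top-most, right-most and bottom-most points of the mouth, together with their coordinates.
3. Find out whether these two rectangles overlap (I am using a library called shapely for this; if you have a more primitive solution that does not need a library, that would be great).
4. If they overlap, start the timer; if not, pause the timer. This is the step I don't know how to do and need help with.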

This is roughly what my code looks like:

from shapely.geometry import Polygon
import cv2, face_recognition, time

brush_time = 0
video_capture = cv2.VideoCapture(0)

while True:
    ret, frame = video_capture.read()

    '''
    Code for extracting the toothbrush's coordinates (using Yolo),
    and the mouth's coordinates (using the face_recognition library)
    ...
    ...
    ...
    '''

    p1 = Polygon([(mouth_left_x, mouth_left_y), (mouth_top_x, mouth_top_y), (mouth_right_x, mouth_right_y), (mouth_bottom_x, mouth_bottom_y)])
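    # Vertices are listed in ring order (clockwise); any other order gives a self-intersecting polygon.
    p2 = Polygon([(brush_top_left_x, brush_top_left_y), (brush_top_right_x, brush_top_right_y), (brush_bottom_right_x, brush_bottom_right_y), (brush_bottom_left_x, brush_bottom_left_y)])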

    if p1.intersects(p2):
        
        ##############################################
        # This is where I want the timer logic to be #
        # start_time = time.time()                   #
        # end_time = ????                            #
        ##############################################

        brush_time = end_time - start_time
        text = 'Brushed teeth for {} seconds'.format(str(brush_time))

【Comments】:

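Is the problem the timer or the detection?

OK, you detect the brush, but how do you know the brush is actually inside? Do you just count time? The brush could also simply stay inside. A complicated task, but worth an upvote.

Yes, it is indeed complicated. That is why, to keep things simple and get a first draft out, I made some assumptions. Whenever the bounding boxes of the two objects (toothbrush and mouth) intersect, I assume the person is brushing. With that assumption, I am now trying to build a timer that tracks how long the person has been brushing.

So you want to start the timer when the intersection begins and stop it when it ends? That seems easy.

I want the timer to keep tracking the brushing activity: it should pause when the user pauses brushing for a few seconds and resume when the user resumes. The timer should stop and output the total brushing time only when one of two conditions is met: (1) a given maximum brushing time is reached (e.g., 3 minutes), or (2) the user has not brushed for some continuous threshold duration (e.g., 15 seconds). That is what I have in mind.

【Answer 1】: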

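This answer also includes some clean-code suggestions that you may ignore in your final implementation, but I want to show you my preferred way. You will need 2 helper classes. The responsibility of the first class is to give us a boolean that tells whether the person is brushing. Try to encapsulate the overlapping-rectangle logic and the like in this class only. For simplicity, I will hard-code some arbitrary brush-in and brush-out intervals.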

class BrushDetector:
    def is_brushing(self, curr_frame_idx: int, frame) -> bool:
        """
        You provide the implementation here to detect whether the individual is holding brush inside his/her mouth.
        For sake of simplicity, I will simply hard-code the time when this shall return true in the attached video.
        """
        if 50 < curr_frame_idx < 100:
            return True
        if 150 < curr_frame_idx < 200:
            return True
        if 250 < curr_frame_idx < 300:
            return True
        return False
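
The question also asks for an overlap test that does not depend on shapely. The following is a minimal sketch of one, assuming both the mouth and the toothbrush are described by axis-aligned bounding boxes in pixel coordinates. `Box` and `boxes_overlap` are hypothetical names introduced here for illustration; they are not part of the original answer. Unlike shapely's `intersects`, this version treats boxes that only touch along an edge as not overlapping.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if the two boxes share any area."""
    # Two axis-aligned boxes are disjoint exactly when one lies entirely
    # to the left/right of, or above/below, the other.
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


# Example values, made up for illustration only.
mouth = Box(left=100, top=200, right=180, bottom=240)
brush = Box(left=170, top=210, right=300, bottom=230)
print(boxes_overlap(mouth, brush))
```

Inside `is_brushing`, the hard-coded frame ranges could then be replaced by a call such as `boxes_overlap(mouth, brush)` once the two boxes have been extracted from the frame.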

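Now we build another class that records the brush-in and brush-out instants. The responsibility of this class is to give us the total brushing time. Its logic can be built as: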

import datetime

class BrushRecorder:
    def __init__(self):
        self._start_time_ms = None
        self._total_time_ms = datetime.timedelta(0, 0, 0, 0, 0, 0, 0)

    def register_brush_in(self):
        if self._start_time_ms is None:
            self._start_time_ms = datetime.datetime.now()

    def register_brush_out(self):
        if self._start_time_ms is None:
            return

        delta = datetime.datetime.now() - self._start_time_ms
        self._total_time_ms += delta
        self._start_time_ms = None

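    def get_brushing_time(self):
        """Return the total brushing time in milliseconds."""
        total = self._total_time_ms
        if self._start_time_ms is not None:
            # Include the brushing interval that is still running.
            total += datetime.datetime.now() - self._start_time_ms
        # total_seconds() covers the whole duration. The .microseconds attribute holds
        # only the sub-second part, so dividing it by 1000 wraps around every second.
        return total.total_seconds() * 1000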
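
The recorder above pauses and resumes, but the comments under the question also ask for two stop conditions: a maximum total brushing time (e.g., 3 minutes) and a maximum idle time (e.g., 15 seconds). The following is a minimal sketch of how both could be layered on the same idea. `BrushSession`, its parameters and the injectable `clock` are assumptions made for illustration and are not part of the original answer. In this sketch the idle countdown also runs before the first brushing interval.

```python
import time
from typing import Callable, Optional


class BrushSession:
    """Accumulates brushing time and decides when the session is over (hypothetical)."""

    def __init__(self, max_total_s: float = 180.0, max_idle_s: float = 15.0,
                 clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._max_total_s = max_total_s
        self._max_idle_s = max_idle_s
        self._total_s = 0.0                        # sum of finished brushing intervals
        self._started_at: Optional[float] = None   # start of the running interval
        self._idle_since: Optional[float] = None   # moment brushing last stopped

    def update(self, is_brushing: bool) -> None:
        """Call once per frame with the detector's result."""
        now = self._clock()
        if is_brushing:
            self._idle_since = None
            if self._started_at is None:
                self._started_at = now             # start or resume the timer
        else:
            if self._started_at is not None:
                self._total_s += now - self._started_at  # pause the timer
                self._started_at = None
            if self._idle_since is None:
                self._idle_since = now

    def total_seconds(self) -> float:
        """Total brushing time so far, including a running interval."""
        running = 0.0 if self._started_at is None else self._clock() - self._started_at
        return self._total_s + running

    def is_finished(self) -> bool:
        """True once the maximum brushing time or the idle limit is reached."""
        if self.total_seconds() >= self._max_total_s:
            return True
        return (self._idle_since is not None
                and self._clock() - self._idle_since >= self._max_idle_s)


# Demonstration with a fake clock so the result does not depend on real time.
fake_now = [0.0]
session = BrushSession(clock=lambda: fake_now[0])
for t, brushing in [(0, False), (1, True), (4, True), (5, False), (7, True), (9, False)]:
    fake_now[0] = float(t)
    session.update(brushing)
print(session.total_seconds())  # 6.0 (brushing from 1 to 5 and from 7 to 9)
```

The clock is injectable because wall-clock time and video time differ when frames come from a recorded file. One option in that case is a clock that returns `frame_idx / fps`, where `fps` could be read with `video_stream.get(cv2.CAP_PROP_FPS)`.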

We have now built the two core classes. Let's move on to the frame-reading part and put all the pieces together.

import cv2

if __name__ == "__main__":
    video_stream = cv2.VideoCapture("path/to/stream/brush.mp4")
    recorder = BrushRecorder()
    detector = BrushDetector()
    frame_idx = 0
    while video_stream.isOpened():
        ret, frame = video_stream.read()
        if not ret:
            break

        if detector.is_brushing(frame_idx, frame):
            recorder.register_brush_in()
        else:
            recorder.register_brush_out()
        brushing_time = recorder.get_brushing_time()
        print(frame_idx, brushing_time)
        frame_idx += 1
    video_stream.release()

In the snippet above I simply log the brushing time to the console, but you can also use cv2.putText on each frame for GUI purposes, or use this brushing_time value however you need.
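
As a rough illustration of the cv2.putText suggestion, the sketch below formats a millisecond value as a label and draws it onto a frame. `format_brushing_time` and `draw_brushing_time` are hypothetical helper names, and the position, font, scale and colour are arbitrary choices rather than anything prescribed by the answer.

```python
def format_brushing_time(brushing_time_ms: float) -> str:
    """Turn a millisecond count into an on-screen label with minutes and seconds."""
    total_seconds = int(brushing_time_ms // 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"Brushed teeth for {minutes:02d}:{seconds:02d}"


def draw_brushing_time(frame, brushing_time_ms: float):
    """Draw the label near the top-left corner of a BGR frame (modifies it in place)."""
    import cv2  # imported here so the formatter above also works without OpenCV

    cv2.putText(frame, format_brushing_time(brushing_time_ms), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    return frame


print(format_brushing_time(65000))  # Brushed teeth for 01:05
```

In the main loop, `draw_brushing_time(frame, brushing_time)` could be called right after `get_brushing_time()`, followed by `cv2.imshow` to display the annotated frame.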

【Discussion】:

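I see, interesting answer. I have three questions. First, is it necessary to pass curr_frame_idx to class BrushDetector, especially since the current frame ID is not used anywhere else? My logic for is_brushing is simply whether p1.intersects(p2). Second, I think the factor that get_brushing_time() divides by should be 1000000 rather than 1000, right? Third, I copied your class BrushRecorder and tried to compute brushing_time the way you do in main, but it shows seemingly random numbers for the brushing time instead of a 1s, 2s, 3s, ... sequence.

Answer 1: No. I only used it to generate arbitrary brushing moments. Answer 2: Yes. Sorry for the slip there. Answer 3: Are you sure your BrushDetector works correctly? If it produces random timestamps, I can revisit my logic.

Regarding point 3: yes, I just copied your class BrushRecorder and the main code into my code. I also tried your whole code, including class BrushDetector with its arbitrary brushing moments, and it also showed random durations.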
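
The seemingly random values described in the third point are consistent with how `datetime.timedelta` works: its `microseconds` attribute holds only the sub-second component of the duration, while `total_seconds()` returns the whole duration. A small check with a made-up duration of 2.5 seconds:

```python
import datetime

elapsed = datetime.timedelta(seconds=2, milliseconds=500)

# .microseconds is only the sub-second part, so the 2 whole seconds are lost.
print(elapsed.microseconds / 1000)     # 500.0
# total_seconds() covers the whole duration.
print(elapsed.total_seconds() * 1000)  # 2500.0
```

A value derived from `.microseconds` therefore wraps around every second regardless of the divisor, which would look like random output when printed frame by frame.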
