Linux Interrupt Management: Workqueues

Posted 2020-11-09 Arnold Lu @ Nanjing

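"Linux Interrupt Management"

"Linux Interrupt Management (1): The Linux Interrupt Management Mechanism"

"Linux Interrupt Management (2): Softirqs and Tasklets"

"Linux Interrupt Management (3): Workqueues"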

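A workqueue hands a work (a function whose execution needs to be deferred) to a kernel thread, so the work always runs in process context.

The advantage of workqueues is that they run interrupt bottom halves in process context. Work is therefore allowed to reschedule and sleep, it executes asynchronously, and it avoids the loss of real-time responsiveness that softirqs and tasklets cause when they run for too long.

When a driver or kernel subsystem has a task to run asynchronously in process context, it describes the task with a work item, which includes the task's callback function, and adds the work item to a queue. A kernel thread then executes the callback.

Here the work item is called a work, the queue is called a workqueue, and the kernel thread is called a worker.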

CMWQ(Concurrency Managed Workqueues)

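The thread that executes work items is called a worker, or worker thread. A worker thread executes the work items queued to it one after another. When the queue holds no work, the worker thread goes idle.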
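The serial behaviour described above can be sketched in user space. This is only a model: the names (model_work, model_queue_work(), model_worker_run()) are made up for illustration and are not kernel APIs, and there is no locking or sleeping.

```c
#include <stddef.h>

/* User-space model only: these names are illustrative, not kernel APIs. */
struct model_work {
    void (*func)(struct model_work *w);  /* callback, like work->func */
    struct model_work *next;             /* FIFO link, like work->entry */
    int id;
};

struct model_queue {
    struct model_work *head, *tail;
};

static int model_log[16];
static int model_log_len;

/* Example callback: record the order in which work items ran. */
static void model_record(struct model_work *w)
{
    if (model_log_len < 16)
        model_log[model_log_len++] = w->id;
}

/* Append a work item to the tail of the queue. */
static void model_queue_work(struct model_queue *q, struct model_work *w)
{
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* One worker drains the queue serially and returns how many items ran.
 * A real worker would go idle on an empty queue; the model just returns. */
static int model_worker_run(struct model_queue *q)
{
    int ran = 0;

    while (q->head) {
        struct model_work *w = q->head;

        q->head = w->next;
        if (!q->head)
            q->tail = NULL;
        w->func(w);
        ran++;
    }
    return ran;
}
```

Queueing three items and calling model_worker_run() once runs them in FIFO order; a second call finds nothing to do, which is the point where a real worker would become idle.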

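To manage the many worker threads, CMWQ introduces the concept of a worker pool (worker-pool). There are two kinds of worker-pool:

The bound kind can be thought of as per-CPU: every CPU has its own worker-pools;

The unbound kind is not tied to any particular CPU.

Both kinds provide two pools, one for normal-priority work and one for high-priority work.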

1. Initializing workqueues

1.1 Data structures: work, workqueue, worker pool and worker

The smallest unit the workqueue mechanism schedules is the work_struct, i.e. a work.

struct work_struct {
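    atomic_long_t data;---------------The low bits hold the work's flags; the remaining bits usually hold either the ID of the worker_pool that last ran the work or a pointer to its pool_workqueue. The WORK_STRUCT_PWQ flag decides which of the two is stored.
    struct list_head entry;-----------Links the work into a list.
    work_func_t func;-----------------The work's handler function.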
#ifdef CONFIG_LOCKDEP
    struct lockdep_map lockdep_map;
#endif
};
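The way data shares one word between flags and either a pool ID or a pool_workqueue pointer can be illustrated with a small user-space model. The bit positions below (MODEL_FLAG_BITS, MODEL_STRUCT_PWQ) are assumptions chosen for the sketch; the real layout is defined in include/linux/workqueue.h and depends on the kernel version and configuration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative layout only; not the kernel's actual bit positions. */
#define MODEL_FLAG_BITS   8u                                   /* assumed flag field width */
#define MODEL_FLAG_MASK   (((uintptr_t)1 << MODEL_FLAG_BITS) - 1)
#define MODEL_STRUCT_PWQ  ((uintptr_t)1 << 2)                  /* assumed PWQ flag bit */

/* Stand-in for a pool_workqueue, aligned so its low flag bits are zero. */
static _Alignas(256) unsigned char model_fake_pwq[256];

/* Off-queue case: the upper bits carry the last pool ID, PWQ flag clear. */
static uintptr_t model_pack_pool_id(uintptr_t pool_id)
{
    return pool_id << MODEL_FLAG_BITS;
}

/* Queued case: the upper bits carry the pwq pointer, PWQ flag set. */
static uintptr_t model_pack_pwq(const void *pwq)
{
    return (uintptr_t)pwq | MODEL_STRUCT_PWQ;
}

static bool model_data_is_pwq(uintptr_t data)
{
    return (data & MODEL_STRUCT_PWQ) != 0;
}

static uintptr_t model_data_pool_id(uintptr_t data)
{
    return data >> MODEL_FLAG_BITS;
}

static void *model_data_pwq(uintptr_t data)
{
    return (void *)(data & ~MODEL_FLAG_MASK);
}
```

The pointer variant only works because the pointed-to object is aligned to at least 1 << MODEL_FLAG_BITS bytes, which leaves the low bits free for flags.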

A workqueue is described by struct workqueue_struct:

struct workqueue_struct {
    struct list_head    pwqs;        /* WR: all pwqs of this wq */--------------------List of all pool_workqueues belonging to this workqueue.
    struct list_head    list;        /* PL: list of all workqueues */-----------------Global list of all workqueue_structs in the system.

    struct mutex        mutex;        /* protects this wq */
    int            work_color;    /* WQ: current work color */
    int            flush_color;    /* WQ: current flush color */
    atomic_t        nr_pwqs_to_flush; /* flush in progress */
    struct wq_flusher    *first_flusher;    /* WQ: first flusher */
    struct list_head    flusher_queue;    /* WQ: flush waiters */
    struct list_head    flusher_overflow; /* WQ: flush overflow list */

    struct list_head    maydays;    /* MD: pwqs requesting rescue */-------------------List of all pool_workqueues that are requesting rescue.
    struct worker        *rescuer;    /* I: rescue worker */---------------------------The rescuer kernel thread. Creating a new worker thread may fail when memory is tight; if the workqueue was created with WQ_MEM_RECLAIM, the rescuer thread takes over in that situation.

    int            nr_drainers;    /* WQ: drain in progress */
    int            saved_max_active; /* WQ: saved pwq max_active */

    struct workqueue_attrs    *unbound_attrs;    /* WQ: only for unbound wqs */---------Attributes of an unbound workqueue.
    struct pool_workqueue    *dfl_pwq;    /* WQ: only for unbound wqs */----------------Default pool_workqueue of an unbound workqueue.

#ifdef CONFIG_SYSFS
    struct wq_device    *wq_dev;    /* I: for sysfs interface */
#endif
#ifdef CONFIG_LOCKDEP
    struct lockdep_map    lockdep_map;
#endif
    char            name[WQ_NAME_LEN]; /* I: workqueue name */--------------------------Name of this workqueue.

    /* hot fields used during command issue, aligned to cacheline */
    unsigned int        flags ____cacheline_aligned; /* WQ: WQ_* flags */---------------Frequently accessed by different CPUs, so it is aligned to a cache line.
    struct pool_workqueue __percpu *cpu_pwqs; /* I: per-cpu pwqs */---------------------Points to the per-CPU pool_workqueues.
    struct pool_workqueue __rcu *numa_pwq_tbl[]; /* FR: unbound pwqs indexed by node */
};

The kernel thread that runs work_structs is called a worker, i.e. a worker thread.

/*
 * The poor guys doing the actual heavy lifting.  All on-duty workers are
 * either serving the manager role, on idle list or on busy hash.  For
 * details on the locking annotation (L, I, X...), refer to workqueue.c.
 *
 * Only to be used in workqueue and async.
 */
struct worker {
    /* on idle list while idle, on busy hash table while busy */
    union {
        struct list_head    entry;    /* L: while idle */
        struct hlist_node    hentry;    /* L: while busy */
    };

    struct work_struct    *current_work;    /* L: work being processed */----The work currently being processed.
    work_func_t        current_func;    /* L: current_work's fn */-----------Callback of the work currently being executed.
    struct pool_workqueue    *current_pwq; /* L: current_work's pwq */-------The pool_workqueue the current work belongs to.
    bool            desc_valid;    /* ->desc is valid */
    struct list_head    scheduled;    /* L: scheduled works */---------------All work_structs that have been scheduled and are about to run are linked here.

    /* 64 bytes boundary on 64bit, 32 on 32bit */

    struct task_struct    *task;        /* I: worker task */-----------------The task_struct of this worker thread.
    struct worker_pool    *pool;        /* I: the associated pool */---------The worker_pool this worker thread belongs to.
                        /* L: for rescuers */
    struct list_head    node;        /* A: anchored at pool->workers */------Links this worker into the worker_pool->workers list.
                        /* A: runs through worker->node */

    unsigned long        last_active;    /* L: last active timestamp */
    unsigned int        flags;        /* X: flags */
    int            id;        /* I: worker id */

    /*
     * Opaque string set with work_set_desc().  Printed out with task
     * dump for debugging - WARN, BUG, panic or sysrq.
     */
    char            desc[WORKER_DESC_LEN];

    /* used only by rescuers to point to the target workqueue */
    struct workqueue_struct    *rescue_wq;    /* I: the workqueue to rescue */
};

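CMWQ introduces the worker pool concept; struct worker_pool describes a worker pool.

For bound pools, worker_pool is a per-CPU variable: every CPU has its own worker_pools, two of them.

One serves normal-priority worker threads and the other serves high-priority worker threads.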

struct worker_pool {
    spinlock_t        lock;        /* the pool lock */-----------------------Spinlock protecting the worker_pool.
    int            cpu;        /* I: the associated cpu */--------------------1 for an unbound pool; for a bound pool, the ID of the CPU it is bound to.
    int            node;        /* I: the associated node ID */
    int            id;        /* I: pool ID */-------------------------------ID of this worker_pool.
    unsigned int        flags;        /* X: flags */

    struct list_head    worklist;    /* L: list of pending works */----------List of pending work_structs.
    int            nr_workers;    /* L: total number of workers */-----------Number of worker threads.

    /* nr_idle includes the ones off idle_list for rebinding */
    int            nr_idle;    /* L: currently idle ones */------------------Number of idle worker threads.

    struct list_head    idle_list;    /* X: list of idle workers */----------List of idle worker threads.
    struct timer_list    idle_timer;    /* L: worker idle timeout */
    struct timer_list    mayday_timer;    /* L: SOS timer for workers */

    /* a workers is either on busy_hash or idle_list, or the manager */
    DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
                        /* L: hash of busy workers */

    /* see manage_workers() for details on the two manager mutexes */
    struct mutex        manager_arb;    /* manager arbitration */
    struct mutex        attach_mutex;    /* attach/detach exclusion */
    struct list_head    workers;    /* A: attached workers */---------------List of worker threads managed by this worker_pool.
    struct completion    *detach_completion; /* all workers detached */

    struct ida        worker_ida;    /* worker IDs for task name */

    struct workqueue_attrs    *attrs;        /* I: worker attributes */-----Worker thread attributes.
    struct hlist_node    hash_node;    /* PL: unbound_pool_hash node */
    int            refcnt;        /* PL: refcnt for unbound pools */

    /*
     * The current concurrency level.  As it's likely to be accessed
     * from other CPUs during try_to_wake_up(), put it in a separate
     * cacheline.
     */
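    atomic_t        nr_running ____cacheline_aligned_in_smp;----------------Counter used when deciding to create or destroy workers; it is the number of running workers. Several CPUs may access it at once, so it gets a cache line of its own to avoid cache-line bouncing between cores.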

    /*
     * Destruction of pool is sched-RCU protected to allow dereferences
     * from get_work_pool().
     */
    struct rcu_head        rcu;---------------------------------------------RCU head used to defer freeing the pool (not a lock).
};

struct pool_workqueue links a workqueue to a worker_pool.

struct pool_workqueue {
    struct worker_pool    *pool;        /* I: the associated pool */-----------Points to the worker_pool.
    struct workqueue_struct *wq;        /* I: the owning workqueue */----------Points to the workqueue_struct.
    int            work_color;    /* L: current color */
    int            flush_color;    /* L: flushing color */
    int            refcnt;        /* L: reference count */
    int            nr_in_flight[WORK_NR_COLORS];
                        /* L: nr of in_flight works */
    int            nr_active;    /* L: nr of active works */------------------Number of active work_structs.
    int            max_active;    /* L: max active works */-------------------Maximum number of active work_structs.
    struct list_head    delayed_works;    /* L: delayed works */--------------List of work_structs whose execution is delayed.
    struct list_head    pwqs_node;    /* WR: node on wq->pwqs */
    struct list_head    mayday_node;    /* MD: node on wq->maydays */

    /*
     * Release of unbound pwq is punted to system_wq.  See put_pwq()
     * and pwq_unbound_release_workfn() for details.  pool_workqueue
     * itself is also sched-RCU protected so that the first pwq can be
     * determined without grabbing wq->mutex.
     */
    struct work_struct    unbound_release_work;
    struct rcu_head        rcu;------------------------------------------------RCU head used to defer freeing the pwq (not a lock).
};

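How do these data structures relate to one another? A workqueue_struct reaches its pool_workqueues through pwqs (and cpu_pwqs or numa_pwq_tbl[]); each pool_workqueue points to one worker_pool and back to the workqueue_struct that owns it; a worker_pool keeps its worker threads on the workers list, and each worker points back to its pool.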
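The links between the structures can be made concrete with a heavily simplified user-space model. All struct and function names here are invented for the sketch, each pool holds a single embedded worker, and only bound (per-CPU) pools are modelled.

```c
#include <stddef.h>

#define MODEL_NR_CPUS      4
#define MODEL_NR_STD_POOLS 2   /* index 0: normal priority, 1: high priority */

struct model_pool;
struct model_wq;

struct model_worker { struct model_pool *pool; };             /* worker -> pool */
struct model_pool   { int cpu; int id; struct model_worker first_worker; };
struct model_pwq    { struct model_pool *pool; struct model_wq *wq; };
struct model_wq     { const char *name; struct model_pwq cpu_pwqs[MODEL_NR_CPUS]; };

static struct model_pool model_pools[MODEL_NR_CPUS][MODEL_NR_STD_POOLS];

/* Give every per-CPU pool an ID and one worker that points back at it. */
static void model_init_pools(void)
{
    int id = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        for (int i = 0; i < MODEL_NR_STD_POOLS; i++) {
            struct model_pool *p = &model_pools[cpu][i];

            p->cpu = cpu;
            p->id = id++;
            p->first_worker.pool = p;
        }
    }
}

/* Link a workqueue to one pool per CPU through its pool_workqueues. */
static void model_link_wq(struct model_wq *wq, const char *name, int highpri)
{
    wq->name = name;
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        wq->cpu_pwqs[cpu].pool = &model_pools[cpu][highpri];
        wq->cpu_pwqs[cpu].wq = wq;
    }
}

/* Follow wq -> pwq -> pool for a given CPU. */
static struct model_pool *model_pool_for(struct model_wq *wq, int cpu)
{
    return wq->cpu_pwqs[cpu].pool;
}
```

In this model two workqueues of the same priority end up sharing the same per-CPU pools, which reflects the idea that workqueues are front ends while pools and their workers are the shared back end.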

1.2 Initializing workqueues

First, look at the flags that strongly influence how a workqueue is created.

/*
 * Workqueue flags and constants.  For details, please refer to
 * Documentation/workqueue.txt.
 */
enum {
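    WQ_UNBOUND        = 1 << 1, /* not bound to any cpu */-----------------Not bound to any particular CPU.
    WQ_FREEZABLE        = 1 << 2, /* freeze during suspend */--------------When processes are frozen during suspend, the worker threads must finish all work currently in progress before freezing completes, and no new work is started until the processes are thawed.
    WQ_MEM_RECLAIM        = 1 << 3, /* may be used for memory reclaim */---When memory pressure makes creating a new worker fail, the rescuer kernel thread takes over.
    WQ_HIGHPRI        = 1 << 4, /* high priority */------------------------Served by the high-priority worker_pool.
    WQ_CPU_INTENSIVE    = 1 << 5, /* cpu intensive workqueue */------------Work that consumes a lot of CPU. Such work does not count toward the pool's concurrency level and is left to the system scheduler once running, so it does not prevent other work in the same pool from starting; runnable non-CPU-intensive work can still delay the start of CPU-intensive work.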
    WQ_SYSFS        = 1 << 6, /* visible in sysfs, see wq_sysfs_register() */

    WQ_POWER_EFFICIENT    = 1 << 7,-----------------wq_power_efficient decides whether such a workqueue is bound or unbound. A bound workqueue may wake an idle CPU, whereas an unbound one does not necessarily do so.

    __WQ_DRAINING        = 1 << 16, /* internal: workqueue is draining */
    __WQ_ORDERED        = 1 << 17, /* internal: workqueue is ordered */----At most one work item executes at any time.
    __WQ_ORDERED_EXPLICIT    = 1 << 19, /* internal: alloc_ordered_workqueue() */

    WQ_MAX_ACTIVE        = 512,      /* I like 512, better ideas? */
    WQ_MAX_UNBOUND_PER_CPU    = 4,      /* 4 * #cpus for unbound wq */
    WQ_DFL_ACTIVE        = WQ_MAX_ACTIVE / 2,
};
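How a few of these flags steer workqueue creation can be sketched in user space. The flag values are copied from the enum above; the helper names are invented, the max_active clamp shown is the one for bound workqueues only, and the WQ_POWER_EFFICIENT handling is an assumption based on how alloc_workqueue() behaves in kernels of this era.

```c
/* Flag values copied from the enum above; helper names are hypothetical. */
enum {
    MODEL_WQ_UNBOUND         = 1 << 1,
    MODEL_WQ_HIGHPRI         = 1 << 4,
    MODEL_WQ_POWER_EFFICIENT = 1 << 7,

    MODEL_WQ_MAX_ACTIVE      = 512,
    MODEL_WQ_DFL_ACTIVE      = MODEL_WQ_MAX_ACTIVE / 2,
};

/* A power-efficient workqueue becomes unbound when the power_efficient
 * switch (wq_power_efficient in the kernel) is on; otherwise it stays bound. */
static unsigned int model_effective_flags(unsigned int flags, int power_efficient)
{
    if ((flags & MODEL_WQ_POWER_EFFICIENT) && power_efficient)
        flags |= MODEL_WQ_UNBOUND;
    return flags;
}

/* Index of the standard pool: 0 = normal priority, 1 = high priority. */
static int model_std_pool_index(unsigned int flags)
{
    return (flags & MODEL_WQ_HIGHPRI) ? 1 : 0;
}

/* max_active == 0 selects the default; bound workqueues clamp to [1, 512]. */
static int model_bound_max_active(int requested)
{
    int v = requested ? requested : MODEL_WQ_DFL_ACTIVE;

    if (v < 1)
        v = 1;
    if (v > MODEL_WQ_MAX_ACTIVE)
        v = MODEL_WQ_MAX_ACTIVE;
    return v;
}
```

Passing 0 as max_active, as every alloc_workqueue() call for the bound system workqueues does, therefore yields WQ_DFL_ACTIVE, i.e. 256.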

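At boot, the kernel calls init_workqueues() to create worker threads and a set of commonly used workqueues.

init_workqueues() is registered with early_initcall(init_workqueues) and therefore runs in the early initcall stage.

1.2.1 Which worker threads are created, and by whom?

On a 4-core SMP system, the worker threads that are always created are kworker/x:0 and kworker/x:0H for each CPU, plus the unbound kworker/u8:0.

init_workqueues() creates the CPU0 and unbound worker threads

kworker/0:0, kworker/0:0H and kworker/u8:0 are all created from init_workqueues(). The call paths are as follows.

kworker/0:0, kworker/0:0H: kernel_init()->kernel_init_freeable()->do_one_initcall()->init_workqueues()->create_worker()

kworker/u8:0: kernel_init()->kernel_init_freeable()->do_one_initcall()->init_workqueues()->__alloc_workqueue_key()->apply_workqueue_attrs()->alloc_unbound_pwq()->create_worker()

The unbound worker thread is created because init_workqueues() allocates a series of workqueues, and the call chain alloc_workqueue()->__alloc_workqueue_key()->alloc_and_link_pwqs()->apply_workqueue_attrs()->alloc_unbound_pwq() ends up creating it.

Why doesn't init_workqueues() create the worker threads for CPU1~3 at the same time?

Although init_workqueues() runs from do_one_initcall(), this particular do_one_initcall() invocation is special: it is made from do_pre_smp_initcalls(), before the other CPUs are brought up.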

static noinline void __init kernel_init_freeable(void)
{
...
    smp_prepare_cpus(setup_max_cpus);

    do_pre_smp_initcalls();-------------------------------------The initcalls invoked here are the functions between __initcall_start and __initcall0_start, i.e. the early_initcall() functions. init_workqueues() therefore runs before smp_init().
    lockup_detector_init();

    smp_init();-------------------------------------------------Brings up the remaining CPUs, CPU1~3.
    sched_init_smp();

    do_basic_setup();-------------------------------------------Runs the initcalls from __initcall0_start onward.
...
}

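Pools are initialized for every possible CPU, whereas worker threads are created only for online CPUs.

When init_workqueues() runs, CPU1~3 are not yet online, so only three worker threads are created at first: kworker/0:0, kworker/0:0H and kworker/u8:0.

This also explains why the unbound worker thread's pool->id is 8: the eight per-CPU pools of the four cores have already been given IDs 0~7.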
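The arithmetic behind the pool IDs and the kworker names can be sketched as follows. The helpers are hypothetical; they assume that pool IDs are handed out sequentially in the order the per-CPU pools are initialized, and that worker names follow the kworker/<cpu>:<id>[H] and kworker/u<pool id>:<id> patterns seen in the thread names above.

```c
#include <stdio.h>
#include <string.h>

#define MODEL_NR_STD_POOLS 2   /* normal and high priority pool per CPU */

/* ID of a per-CPU pool, assuming sequential assignment in loop order. */
static int model_bound_pool_id(int cpu, int highpri)
{
    return cpu * MODEL_NR_STD_POOLS + highpri;
}

/* First ID left over for an unbound pool once all per-CPU pools have one. */
static int model_first_unbound_pool_id(int nr_possible_cpus)
{
    return nr_possible_cpus * MODEL_NR_STD_POOLS;
}

/* Build a kworker thread name; cpu < 0 stands for an unbound pool. */
static void model_kworker_name(char *buf, size_t len, int cpu, int pool_id,
                               int worker_id, int nice)
{
    if (cpu >= 0)
        snprintf(buf, len, "kworker/%d:%d%s", cpu, worker_id,
                 nice < 0 ? "H" : "");
    else
        snprintf(buf, len, "kworker/u%d:%d", pool_id, worker_id);
}
```

With four possible CPUs, the last per-CPU pool (CPU3, high priority) gets ID 7 and the first unbound pool gets ID 8, which is where the "u8" in kworker/u8:0 comes from.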

workqueue_cpu_up_callback() creates the worker threads of the other CPUs

kernel_init()->kernel_init_freeable()->smp_init()->cpu_up()->_cpu_up()->__raw_notifier_call_chain()->workqueue_cpu_up_callback()->create_worker()

init_workqueues() registers the CPU_PRI_WORKQUEUE_UP handler at its very beginning, so as smp_init()->cpu_up() brings each CPU up, two worker threads are created for that CPU.
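The split between "initialize pools for possible CPUs" and "create workers for online CPUs" can be modelled in user space. The names are invented for the sketch; model_boot_init() stands in for init_workqueues() and model_cpu_up_prepare() for the CPU_UP_PREPARE branch of workqueue_cpu_up_callback().

```c
#define MODEL_NR_CPUS      4
#define MODEL_NR_STD_POOLS 2

struct model_cpu_pool {
    int cpu;
    int initialized;
    int nr_workers;
};

static struct model_cpu_pool model_cpu_pools[MODEL_NR_CPUS][MODEL_NR_STD_POOLS];

/* Initialize pools for every possible CPU, but create one worker per pool
 * only on CPUs that are already online. Returns the workers created. */
static int model_boot_init(const int *online)
{
    int created = 0;

    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        for (int i = 0; i < MODEL_NR_STD_POOLS; i++) {
            model_cpu_pools[cpu][i].cpu = cpu;
            model_cpu_pools[cpu][i].initialized = 1;
            model_cpu_pools[cpu][i].nr_workers = 0;
        }
    }
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!online[cpu])
            continue;
        for (int i = 0; i < MODEL_NR_STD_POOLS; i++) {
            model_cpu_pools[cpu][i].nr_workers++;
            created++;
        }
    }
    return created;
}

/* When a CPU is about to come up, create a worker for each of its pools
 * that does not have one yet. Returns the workers created. */
static int model_cpu_up_prepare(int cpu)
{
    int created = 0;

    for (int i = 0; i < MODEL_NR_STD_POOLS; i++) {
        if (model_cpu_pools[cpu][i].nr_workers)
            continue;
        model_cpu_pools[cpu][i].nr_workers++;
        created++;
    }
    return created;
}
```

With only the boot CPU online, model_boot_init() creates two bound workers; each later model_cpu_up_prepare() call adds two more for its CPU and is a no-op if repeated.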

1.2.2 init_workqueues() initializes worker_pool, worker and workqueue

static int __init init_workqueues(void)
{
    int std_nice[NR_STD_WORKER_POOLS] = { 0, HIGHPRI_NICE_LEVEL };---------------HIGHPRI_NICE_LEVEL is -20, which corresponds to prio 100, the highest priority among normal (non-real-time) tasks.
    int i, cpu;

    WARN_ON(__alignof__(struct pool_workqueue) < __alignof__(long long));

    pwq_cache = KMEM_CACHE(pool_workqueue, SLAB_PANIC);

    cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);--------------Hooks that create worker threads dynamically as CPUs go up or down.
    hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);

    wq_numa_init();

    /* initialize CPU pools */
    for_each_possible_cpu(cpu) {------------------------------------------------Iterate over every possible CPU.
        struct worker_pool *pool;

        i = 0;
        for_each_cpu_worker_pool(pool, cpu) {-----------------------------------Each CPU has two worker_pools, the per-CPU variables cpu_worker_pools[0] with nice 0 and cpu_worker_pools[1] with nice -20.
            BUG_ON(init_worker_pool(pool));-------------------------------------Initialize the worker_pool.
            pool->cpu = cpu;
            cpumask_copy(pool->attrs->cpumask, cpumask_of(cpu));
            pool->attrs->nice = std_nice[i++];----------------------------------Set the nice value.
            pool->node = cpu_to_node(cpu);

            /* alloc pool ID */
            mutex_lock(&wq_pool_mutex);
            BUG_ON(worker_pool_assign_id(pool));
            mutex_unlock(&wq_pool_mutex);
        }
    }

    /* create the initial worker */
    for_each_online_cpu(cpu) {--------------------------------------------------Iterate over all online CPUs. On a multi-core SMP system only the boot CPU gets worker threads here; those of the other CPUs are created later during cpu_up.
        struct worker_pool *pool;

        for_each_cpu_worker_pool(pool, cpu) {-----------------------------------Use create_worker() to create one kernel thread for each of the CPU's two worker_pools, cpu_worker_pools[0] and cpu_worker_pools[1].
            pool->flags &= ~POOL_DISASSOCIATED;
            BUG_ON(!create_worker(pool));
        }
    }

    /* create default unbound and ordered wq attrs */
    for (i = 0; i < NR_STD_WORKER_POOLS; i++) {
        struct workqueue_attrs *attrs;

        BUG_ON(!(attrs = alloc_workqueue_attrs(GFP_KERNEL)));
        attrs->nice = std_nice[i];
        unbound_std_wq_attrs[i] = attrs;---------------------------------------Attributes for unbound workqueues.

        /*
         * An ordered wq should have only one pwq as ordering is
         * guaranteed by max_active which is enforced by pwqs.
         * Turn off NUMA so that dfl_pwq is used for all nodes.
         */
        BUG_ON(!(attrs = alloc_workqueue_attrs(GFP_KERNEL)));
        attrs->nice = std_nice[i];
        attrs->no_numa = true;
        ordered_wq_attrs[i] = attrs;-------------------------------------------Attributes for ordered workqueues; an ordered workqueue runs at most one work item at a time.
    }

    system_wq = alloc_workqueue("events", 0, 0);-------------------------------Normal-priority bound workqueue system_wq.
    system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);------High-priority bound workqueue system_highpri_wq.
    system_long_wq = alloc_workqueue("events_long", 0, 0);
    system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,----------Normal-priority unbound workqueue system_unbound_wq.
                        WQ_UNBOUND_MAX_ACTIVE);
    system_freezable_wq = alloc_workqueue("events_freezable",------------------Freezable workqueue system_freezable_wq.
                          WQ_FREEZABLE, 0);
    system_power_efficient_wq = alloc_workqueue("events_power_efficient",------Power-efficient workqueue system_power_efficient_wq.
                          WQ_POWER_EFFICIENT, 0);
    system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
                          WQ_FREEZABLE | WQ_POWER_EFFICIENT,-------------------Freezable and power-efficient workqueue system_freezable_power_efficient_wq.
                          0);
    BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
           !system_unbound_wq || !system_freezable_wq ||
           !system_power_efficient_wq ||
           !system_freezable_power_efficient_wq);
    return 0;
}
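The claim that nice -20 corresponds to prio 100 follows from the scheduler's usual nice-to-priority mapping. A minimal user-space sketch, with constants named after the kernel's MAX_RT_PRIO, NICE_WIDTH and DEFAULT_PRIO but prefixed MODEL_ to mark them as local copies:

```c
/* Local copies of the scheduler constants, for illustration only. */
#define MODEL_MAX_RT_PRIO  100   /* priorities 0..99 are real-time */
#define MODEL_NICE_WIDTH   40    /* nice values span -20..19 */
#define MODEL_DEFAULT_PRIO (MODEL_MAX_RT_PRIO + MODEL_NICE_WIDTH / 2)

/* Map a nice value to a static priority; a lower number is a higher priority. */
static int model_nice_to_prio(int nice)
{
    return nice + MODEL_DEFAULT_PRIO;
}
```

Nice 0 maps to 120 and nice -20 to 100, the first priority above the real-time range, so the high-priority pool's workers outrank every other normal task without becoming real-time tasks.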

static int workqueue_cpu_up_callback(struct notifier_block *nfb,
                           unsigned long action,
                           void *hcpu)
{
    int cpu = (unsigned long)hcpu;
    struct worker_pool *pool;
    struct workqueue_struct *wq;
    int pi;

    switch (action & ~CPU_TASKS_FROZEN) {
    case CPU_UP_PREPARE:
        for_each_cpu_worker_pool(pool, cpu) {
            if (pool->nr_workers)
                continue;
            if (!create_worker(pool))
                return NOTIFY_BAD;
        }
        break;

    case CPU_DOWN_FAILED:
    case CPU_ONLINE:
        mutex_lock(&wq_pool_mutex);

        for_each_pool(pool, pi) {
            mutex_lock(&pool->attach_mutex);

            if (pool->cpu == cpu)
                rebind_workers(pool);
            else if (pool->cpu < 0)
                restore_unbound_workers_cpumask(pool, cpu);

            mutex_unlock(&pool->attach_mutex);
        }

        /* update NUMA affinity of unbound workqueues */
        list_for_each_entry(wq, &workqueues, list)
            wq_update_unbound_numa(wq, cpu, true);

        mutex_unlock(&wq_pool_mutex);
        break;
    }
    return NOTIFY_OK;
}

static int workqueue_cpu_down_callback(struct notifier_block *nfb,
                         unsigned long action,
                         void *hcpu)
{
    int cpu = (unsigned long)hcpu;
    struct work_struct unbind_work;
    struct workqueue_struct *wq;

    switch (action & ~CPU_TASKS_FROZEN) {
    case CPU_DOWN_PREPARE:
        /* unbinding per-cpu workers should happen on the local CPU */
        INIT_WORK_ONSTACK(&unbind_work, wq_unbind_fn);
        queue_work_on(cpu, system_highpri_wq, &unbind_work);

        /* update NUMA affinity of unbound workqueues */
        mutex_lock(&wq_pool_mutex);
        list_for_each_entry(wq, &workqueues, list)
            wq_update_unbound_numa(wq, cpu, false);
        mutex_unlock(&wq_pool_mutex);

        /* wait for per-cpu unbinding to finish */
        flush_work(&unbind_work);
        destroy_work_on_stack(&unbind_work);
        break;
    }
    return NOTIFY_OK;
}

init_worker_pool() initializes a single worker_pool.

static int init_worker_pool(struct worker_pool *pool)
{
    spin_lock_init(&pool->lock);
    pool->id = -1;
    pool->cpu = -1;----------------------------------------------The initial value -1 marks the worker_pool as unbound.
    pool->node = NUMA_NO_NODE;
    pool->flags |= POOL_DISASSOCIATED;
    INIT_LIST_HEAD(&pool->worklist);
    INIT_LIST_HEAD(&pool->idle_list);
    hash_init(pool->busy_hash);

    init_timer_deferrable(&pool->idle_timer);
    pool->idle_timer.function = idle_worker_timeout;-------------Destroys surplus workers; the timer fires after IDLE_WORKER_TIMEOUT (300 seconds).
    pool->idle_timer.data = (unsigned long)pool;

    setup_timer(&pool->mayday_timer, pool_mayday_timeout,
            (unsigned long)pool);--------------------------------Set up mayday_timer with a period of MAYDAY_INTERVAL, which is HZ/10, i.e. 100 ms. If the worker pool is found to be stuck, the rescuer worker steps in.

    mutex_init(&pool->manager_arb);
    mutex_init(&pool->attach_mutex);
    INIT_LIST_HEAD(&pool->workers);

    ida_init(&pool->worker_ida);
    INIT_HLIST_NODE(&pool->hash_node);
    pool->refcnt = 1;

    /* shouldn't fail above this point */
    pool->attrs = alloc_workqueue_attrs(GFP_KERNEL);
    if (!pool->attrs)
        return -ENOMEM;
    return 0;
}

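When is a new kworker created, and when is a kworker destroyed?

1.2.2.1 Destroying a kworker

After a worker is created it first enters worker_enter_idle(), which starts pool->idle_timer with a timeout of IDLE_WORKER_TIMEOUT, i.e. 300 * HZ jiffies (300 seconds).

If a worker stays idle for longer than 300 * HZ jiffies, idle_worker_timeout() runs.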
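Both timeouts are expressed in jiffies, so their wall-clock length does not depend on the configured HZ. A small user-space sketch of the conversion, using hypothetical helper names and taking HZ as a parameter:

```c
/* IDLE_WORKER_TIMEOUT is 300 * HZ jiffies. */
static unsigned long model_idle_worker_timeout(unsigned long hz)
{
    return 300 * hz;
}

/* MAYDAY_INTERVAL is HZ / 10 jiffies. */
static unsigned long model_mayday_interval(unsigned long hz)
{
    return hz / 10;
}

/* Convert jiffies to milliseconds for a given HZ. */
static unsigned long model_jiffies_to_msecs(unsigned long j, unsigned long hz)
{
    return j * 1000 / hz;
}
```

For HZ=100 the idle timeout is 30000 jiffies and for HZ=1000 it is 300000 jiffies, but both are 300 seconds; likewise the mayday interval is always 100 ms.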

static void idle_worker_timeout(unsigned long __pool)
{
    struct worker_pool *pool = (void *)__pool;

    spin_lock_irq(&pool->lock);

    while (too_many_workers(pool)) {------------------Check whether the worker pool has surplus workers; if so, pick one and destroy it, and repeat until no surplus remains.
        struct worker *worker;
        unsigned long expires;

        /* idle_list is kept in LIFO order, check the last one */
        worker = list_entry(pool->idle_list.prev, struct worker, entry);
        expires = worker->last_active + IDLE_WORKER_TIMEOUT;

        if (time_before(jiffies, expires)) {
            mod_timer(&pool->idle_timer, expires);
            break;
        }

        destroy_worker(worker);-----------------------Destroy the selected worker.
    }

    spin_unlock_irq(&pool->lock);
}
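The expiry check in idle_worker_timeout() relies on time_before(), which stays correct when jiffies wraps around. A user-space sketch of that comparison and of the destroy-or-rearm decision follows; the function names are invented, and the cast from unsigned long to long assumes the usual two's-complement behaviour.

```c
#include <limits.h>

/* Wrap-safe "a is after b": the signed difference b - a is negative. */
static int model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Wrap-safe "a is before b". */
static int model_time_before(unsigned long a, unsigned long b)
{
    return model_time_after(b, a);
}

/* Decision taken for the longest-idle worker: destroy it once its idle
 * period has expired, otherwise the timer would be re-armed for later. */
static int model_should_destroy(unsigned long now, unsigned long last_active,
                                unsigned long timeout)
{
    unsigned long expires = last_active + timeout;   /* may wrap around */

    return !model_time_before(now, expires);
}
```

A plain `now < expires` comparison would give the wrong answer when last_active is close to ULONG_MAX and expires has wrapped to a small number; the signed-difference form does not.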

/* Do we have too many workers and should some go away? */
static bool too_many_workers(struct worker_pool *pool)
{
    bool managing = mutex_is_locked(&pool->manager_arb);
    int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
    int nr_busy = pool->nr_workers - nr_idle;

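    return nr_idle > 2 && (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy;-----Two conditions must hold: there are more than 2 idle workers, and the idle workers beyond those 2, multiplied by MAX_IDLE_WORKERS_RATIO (4), are at least as many as the busy workers, i.e. the surplus idle workers amount to at least 1/4 of the busy ones. Up to two idle workers are therefore always kept. For example, in a worker pool with 4 workers (3 idle + 1 busy), 3 > 2 and (3 - 2) * 4 >= 1, so one idle worker is selected for destruction.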
}
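The condition is easy to experiment with outside the kernel. The sketch below copies the arithmetic of too_many_workers() into a user-space function that takes plain numbers instead of a struct worker_pool; the parameter list is an adaptation made for the example.

```c
#include <stdbool.h>

#define MODEL_MAX_IDLE_WORKERS_RATIO 4   /* same value as the kernel constant */

/* Same arithmetic as too_many_workers(), with the pool fields passed in. */
static bool model_too_many_workers(int nr_workers, int nr_idle, bool managing)
{
    int idle = nr_idle + managing;   /* the manager counts as idle */
    int busy = nr_workers - idle;

    return idle > 2 && (idle - 2) * MODEL_MAX_IDLE_WORKERS_RATIO >= busy;
}
```

A pool with 3 idle and 1 busy worker has a surplus, and so does one with 3 idle and 4 busy; with 3 idle and 5 busy the single extra idle worker is tolerated, and with only 2 idle workers nothing is ever destroyed.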

static void destroy_worker(struct worker *worker)
{
    struct worker_pool *pool = worker->pool;

    lockdep_assert_held(&pool->lock);

    /* sanity check frenzy */
    if (WARN_ON(worker->current_work) ||
        WARN_ON(!list_empty(&worker->scheduled)) ||
        WARN_ON(!(worker->flags & WORKER_IDLE)))
        return;

    pool->nr_workers--;---------------------------------------------------------Update nr_workers and nr_idle of the worker pool and remove the worker from the pool's idle list.
    pool->nr_idle--;

    list_del_init(&worker->entry);
    worker->flags |= WORKER_DIE;------------------------------------------------worker_thread() checks worker->flags and exits the thread if WORKER_DIE is set.
    wake_up_process(worker->task);
}
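destroy_worker() does not kill the thread itself: it updates the counters, sets a flag and wakes the worker, which then exits on its own. A user-space model of that handshake, with invented struct names and illustrative flag values:

```c
#include <stdbool.h>

/* Illustrative flag values; only their distinctness matters here. */
enum {
    MODEL_WORKER_DIE  = 1 << 1,
    MODEL_WORKER_IDLE = 1 << 2,
};

struct model_dpool {
    int nr_workers;
    int nr_idle;
};

struct model_dworker {
    unsigned int flags;
    bool has_current_work;
    struct model_dpool *pool;
};

/* Mark an idle worker for death and update the pool counters.
 * Returns false if the sanity checks reject the worker. */
static bool model_destroy_worker(struct model_dworker *w)
{
    if (w->has_current_work || !(w->flags & MODEL_WORKER_IDLE))
        return false;

    w->pool->nr_workers--;
    w->pool->nr_idle--;
    w->flags |= MODEL_WORKER_DIE;
    return true;
}

/* What the worker's main loop would check right after being woken up. */
static bool model_worker_should_exit(const struct model_dworker *w)
{
    return (w->flags & MODEL_WORKER_DIE) != 0;
}
```

Only idle workers with no current work can be marked; a busy worker is left alone and the counters stay untouched.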

1.2.2.2 Rescuer worker

Every 100 ms the mayday timer checks whether the worker pool has run into an allocation deadlock; if so, it asks the rescuer worker to handle the pending work.

static void pool_mayday_timeout(unsigned long __pool)
{
    struct worker_pool *pool = (void *)__pool;
    struct work_struct *work;

    spin_lock_irq(&pool->lock);
    spin_lock(&wq_mayday_lock);        /* for wq->maydays */

    if (need_to_create_worker(pool)) {
        /*
         * We've been trying to create a new worker but
         * haven't been successful.  We might be hitting an
         * allocation deadlock.  Send distress signals to
         * rescuers.
         */
        list_for_each_entry(work, &pool->worklist, entry)
            send_mayday(work);
    }

    spin_unlock(&wq_mayday_lock);
    spin_unlock_irq(&pool->lock);

    mod_timer(&pool->mayday_timer, jiffies + MAYDAY_INTERVAL);
}
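The decision inside pool_mayday_timeout() hinges on need_to_create_worker(). The user-space sketch below assumes the definition used by kernels of this era: more workers are needed when work is pending and nothing is running, and a new worker must be created when, in addition, no idle worker is available. The struct and function names are invented for the model.

```c
#include <stdbool.h>

struct model_pool_state {
    int nr_pending;   /* entries on pool->worklist */
    int nr_running;   /* pool->nr_running */
    int nr_idle;      /* pool->nr_idle */
};

/* Work is waiting and no worker is currently running. */
static bool model_need_more_worker(const struct model_pool_state *p)
{
    return p->nr_pending > 0 && p->nr_running == 0;
}

/* An idle worker exists that could be woken to take the work. */
static bool model_may_start_working(const struct model_pool_state *p)
{
    return p->nr_idle > 0;
}

static bool model_need_to_create_worker(const struct model_pool_state *p)
{
    return model_need_more_worker(p) && !model_may_start_working(p);
}

/* Number of mayday requests one timer tick would send: one per pending work. */
static int model_mayday_requests(const struct model_pool_state *p)
{
    return model_need_to_create_worker(p) ? p->nr_pending : 0;
}
```

If the timer still finds the pool in this state, worker creation has been failing for at least one interval, which is why the pending work is handed to the rescuers.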
