Linux 2.4 kernel scheduling


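Process scheduling has to serve three kinds of processes: interactive, batch, and real-time. Designing a scheduler means answering three concrete questions.
(1) When does scheduling happen?
Answer: A process can give up the CPU voluntarily, either from user space through a call such as pause(), or inside the kernel by being put to sleep. Scheduling can also be forced: just before a system call returns, and just before an interrupt or exception handler returns to user space. The "to user space" part matters: only an interrupt or exception that arrived while the CPU was running in user mode can lead to a reschedule on return. One that arrives while the CPU is running in kernel mode does not.

Drawback: if an interrupt arrives while the CPU is in the kernel and the kernel-mode work takes a long time, no reschedule can happen until the CPU heads back to user space. For a real-time process this can postpone scheduling long enough for the user to notice the delay. In addition, returning from the kernel to user space does not always reschedule. It does so only when need_resched in the process control block (task_struct) is set to 1. The flag is set when the current process gives up the CPU voluntarily, when the kernel wakes up a process, and when the timer interrupt handler finds that the current process has run for too long.
(2) Scheduling policy: by what criterion is the next process chosen?
Answer: The kernel computes a weight for every runnable process and runs the one with the highest weight. While a process runs, its weight decreases over time. Once the weights of all runnable processes have dropped to 0, they are recalculated in one pass (2.6 changed this to a per-process recalculation instead of all at once).
To support different needs there are three policies: SCHED_FIFO and SCHED_RR for real-time processes (first-in-first-out and round-robin with a time slice, respectively), and SCHED_OTHER for everything else.
(3) Scheduling mode: preemptive or non-preemptive?
Answer: User-mode execution is preemptible, and the switch takes place on the way back from kernel mode to user mode. The kernel itself is not preemptible (2.6 changed this).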
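The need_resched rule from question (1) can be modelled in user space. The following is only a sketch under stated assumptions: `toy_task`, `toy_schedule`, `toy_timer_tick` and `toy_return_path` are hypothetical names invented for illustration, not kernel functions, and the real return path lives in architecture-specific assembly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the kernel's task_struct. */
struct toy_task {
    bool need_resched;    /* set by the timer tick, a wakeup, or a voluntary yield */
    int  times_scheduled; /* how often toy_schedule() ran on behalf of this task */
};

/* Stand-in for schedule(): it clears the flag, as the real function does. */
static void toy_schedule(struct toy_task *t)
{
    t->need_resched = false;
    t->times_scheduled++;
}

/* Timer tick: request a reschedule once the time slice is used up. */
static void toy_timer_tick(struct toy_task *t, int *counter)
{
    assert(*counter >= 0); /* a time slice is never negative */
    if (*counter > 0 && --(*counter) == 0)
        t->need_resched = true;
}

/*
 * Return path from a system call, interrupt or exception.
 * The scheduler is entered only when the CPU is going back to user mode
 * AND need_resched is set. A return into kernel code never reschedules.
 */
static void toy_return_path(struct toy_task *t, bool returning_to_user)
{
    if (returning_to_user && t->need_resched)
        toy_schedule(t);
}
```

In this model a tick that exhausts the slice only raises the flag. The switch itself waits until the next return to user mode, which is exactly where the latency described in the drawback comes from.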
/*
 *  'schedule()' is the scheduler function. It's a very simple and nice
 * scheduler: it's not perfect, but certainly works for most things.
 *
 * The goto is "interesting".
 *
 *   NOTE!!  Task 0 is the 'idle' task, which gets called when no other
 * tasks can run. It can not be killed, and it cannot sleep. The 'state'
 * information in task[0] is never used.
 */
asmlinkage void schedule(void)
{
    struct schedule_data * sched_data;
    struct task_struct *prev, *next, *p;
    struct list_head *tmp;
    int this_cpu, c;
    if (!current->active_mm) BUG();//active_mm must never be NULL here; a kernel thread borrows the address space of the task that ran before it
need_resched_back:
    prev = current;//the task_struct (PCB) of the current task
    this_cpu = prev->processor;
    if (in_interrupt())//calling schedule() from interrupt context is a bug and ends in BUG()
        goto scheduling_in_interrupt;
    release_kernel_lock(prev, this_cpu);//a no-op on uniprocessor builds
    /* Check for pending softirqs. Do "administrative" work here while we don't hold any locks */
    if (softirq_active(this_cpu) & softirq_mask(this_cpu))
        goto handle_softirq;//jump to handle_softirq below to service them
handle_softirq_back:
    /* sched_data holds the per-CPU information needed by the next scheduling pass.
     * 'sched_data' is protected by the fact that we can run
     * only one process per CPU.
     */
    sched_data = & aligned_data[this_cpu].schedule_data;
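    spin_lock_irq(&runqueue_lock);//lock the run queue
    /* move an exhausted RR process to be last.. */
    if (prev->policy == SCHED_RR)//round-robin real-time tasks get special treatment
        goto move_rr_last;//if the time slice is used up, move prev to the tail of the run queue and refill its slice, then come back to move_rr_back
move_rr_back://back from the SCHED_RR special case
    switch (prev->state) {
        case TASK_INTERRUPTIBLE:
            if (signal_pending(prev)) {//a signal is waiting for prev, so keep it runnable
                prev->state = TASK_RUNNING;
                break;
            }
            //no signal pending: fall through and take prev off the run queue
        default:
            del_from_runqueue(prev);//remove prev from the run queue
        case TASK_RUNNING:;//still runnable: prev stays on the run queue
    }
    prev->need_resched = 0;//clear the request; the reschedule it asked for is happening now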
    /*
     * this is the scheduler proper:
     */
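repeat_schedule://now pick the task to run next
    /*
     * Default process to select..
     */
    next = idle_task(this_cpu);//best candidate so far; starts out as the idle task
    c = -1000;//lowest possible weight, so that any runnable task beats it in the loop below
    if (prev->state == TASK_RUNNING)//prev is still runnable
        goto still_running;//start from prev's own weight, so that prev wins against other tasks of equal weight
still_running_back:
    list_for_each(tmp, &runqueue_head) {//walk every task on the run queue
        p = list_entry(tmp, struct task_struct, run_list);
        if (can_schedule(p, this_cpu)) {//skip tasks that may not run on this CPU right now
            int weight = goodness(p, this_cpu, prev->active_mm);//compute p's current weight with goodness()
            if (weight > c)
                c = weight, next = p;
        }
    }
    /* Do we need to re-calculate counters? */
    if (!c)//the best weight is 0: no real-time task is ready and every runnable SCHED_OTHER task has used up its slice, so the counters must be recalculated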
        goto recalculate;
    /*
     * from this point on nothing can prevent us from
     * switching to the next task, save this fact in
     * sched_data.
     */
    sched_data->curr = next;
#ifdef CONFIG_SMP
     next->has_cpu = 1;
    next->processor = this_cpu;
#endif
    spin_unlock_irq(&runqueue_lock);
    if (prev == next)//the chosen task is the current one: nothing to switch, return directly
        goto same_process;
#ifdef CONFIG_SMP
    /*
     * maintain the per-process 'last schedule' value.
     * (this has to be recalculated even if we reschedule to
     * the same process) Currently this is only used on SMP,
     * and it's approximate, so we do not have to maintain
     * it while holding the runqueue spinlock.
     */
    sched_data->last_schedule = get_cycles();
    /*
     * We drop the scheduler lock early (it's a global spinlock),
     * thus we have to lock the previous process from getting
     * rescheduled during switch_to().
     */
#endif /* CONFIG_SMP */
    kstat.context_swtch++;
    /*
     * there are 3 processes which are affected by a context switch:
     *
     * prev == .... ==> (last => next)
     *
     * It's the 'much more previous' 'prev' that is on next's stack,
     * but prev is set to (the just run) 'last' process by switch_to().
     * This might sound slightly confusing but makes tons of sense.
     */
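    prepare_to_switch();//get ready for the switch
    {
        struct mm_struct *mm = next->mm;//address space of the next task
        struct mm_struct *oldmm = prev->active_mm;//address space the CPU is using now
        if (!mm) {//next is a kernel thread: it has no user address space of its own
            if (next->active_mm) BUG();//a kernel thread that is not running must not hold an active_mm
            next->active_mm = oldmm;//borrow the previous task's address space
            atomic_inc(&oldmm->mm_count);//take a reference on it
            enter_lazy_tlb(oldmm, next, this_cpu);
        } else {//next is an ordinary process with its own mm
            if (next->active_mm != mm) BUG();
            switch_mm(oldmm, mm, next, this_cpu);//switch address spaces
        }
        if (!prev->mm) {//prev was a kernel thread
            prev->active_mm = NULL;//it no longer uses the borrowed address space
            mmdrop(oldmm);//drop the reference taken when the address space was borrowed
        }
    }
    /*
     * This just switches the register state and the
     * stack.
     */
    switch_to(prev, next, prev);//the actual context switch
    __schedule_tail(prev);//a task that was switched out earlier resumes here; a newly forked task instead starts at ret_from_fork, which calls schedule_tail() and then goes to ret_from_sys_call to return to user space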
same_process:
    reacquire_kernel_lock(current);//a no-op on uniprocessor builds
    if (current->need_resched)//cleared above; if it is set again, an interrupt changed things in the meantime
        goto need_resched_back;//so schedule once more
    return;
recalculate:
    {
        struct task_struct *p;
        spin_unlock_irq(&runqueue_lock);
        read_lock(&tasklist_lock);
        for_each_task(p)//for every task in the system: halve the remaining time slice, then add the ticks that its nice value is worth
            p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
        read_unlock(&tasklist_lock);
        spin_lock_irq(&runqueue_lock);
    }
    goto repeat_schedule;
still_running:
    c = goodness(prev, this_cpu, prev->active_mm);
    next = prev;
    goto still_running_back;
handle_softirq:
    do_softirq();
    goto handle_softirq_back;
move_rr_last:
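    if (!prev->counter) {//counter == 0 means the time slice is used up, so prev moves from its current position to the tail of the run queue
        prev->counter = NICE_TO_TICKS(prev->nice);//refill the time slice; the nice value is converted into a number of ticks
        move_last_runqueue(prev);
    }
    goto move_rr_back;
scheduling_in_interrupt://a bug: schedule() was called from an interrupt handler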
    printk("Scheduling in interrupt\n");
    BUG();
    return;
}
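The `recalculate:` step can be worked through with concrete numbers. The sketch below assumes the `TICK_SCALE` and `NICE_TO_TICKS` definitions that 2.4 uses when HZ is below 200 (the listing above does not show them, so treat them as an assumption and check your own kernel tree). `recalc_counter` and `counter_after_rounds` are hypothetical helper names.

```c
#include <assert.h>

/* Assumed to mirror the 2.4 macros for HZ < 200; not shown in the listing above. */
#define TICK_SCALE(x)        ((x) >> 2)
#define NICE_TO_TICKS(nice)  (TICK_SCALE(20 - (nice)) + 1)

/* One pass of the recalculate step for a single task. */
static int recalc_counter(int counter, int nice)
{
    assert(nice >= -20 && nice <= 19);
    return (counter >> 1) + NICE_TO_TICKS(nice);
}

/* Apply the recalculation `rounds` times to a task that never runs in between. */
static int counter_after_rounds(int counter, int nice, int rounds)
{
    while (rounds-- > 0)
        counter = recalc_counter(counter, nice);
    return counter;
}
```

Under these assumptions a nice 0 task that has used up its slice gets 6 ticks back. A task that keeps sleeping through several recalculations keeps half of what it had each time, so its counter climbs 6, 9, 10, 11 and then stays at 11, just under twice the base slice. That retained half is what gives a process that sleeps a lot, typically an interactive one, a higher weight when it wakes up.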

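The goodness() function

For a non-real-time process, goodness() returns 0 if the time slice is used up. Otherwise the weight is counter + (20 - nice), plus 1 if the process is a kernel thread or shares the current address space, plus PROC_CHANGE_PENALTY on SMP if it last ran on this CPU.

nice plays no part in the weight of a real-time process, but it does determine the time slice of a SCHED_RR process.

For a real-time process the weight is 1000 + p->rt_priority, so rt_priority is what ranks real-time processes against each other.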
static inline int goodness(struct task_struct * p, int this_cpu, struct mm_struct *this_mm)
{
    int weight;
    /*
     * select the current process after every other
     * runnable process, but before the idle thread.
     * Also, dont trigger a counter recalculation.
     */
    weight = -1;
    if (p->policy & SCHED_YIELD)//the task asked to yield: the weight stays -1 and we return at once
        goto out;
    /*
     * Non-RT process - normal case first.
     */
    if (p->policy == SCHED_OTHER) {//a task with no real-time requirement
        /*
         * Give the process a first-approximation goodness value
         * according to the number of clock-ticks it has left.
         *
         * Don't do any other calculations if the time slice is
         * over..
         */
        weight = p->counter;//start from the remaining time slice
        if (!weight)//slice used up: the weight is 0, return at once
            goto out;

#ifdef CONFIG_SMP
        /* Give a largish advantage to the same processor...   */
        /* (this is equivalent to penalizing other processors) */
        if (p->processor == this_cpu)
            weight += PROC_CHANGE_PENALTY;
#endif
        /* .. and a slight advantage to the current MM */
        if (p->mm == this_mm || !p->mm)//kernel thread, or same user address space as the current task: no address-space switch is needed, so +1
            weight += 1;
        weight += 20 - p->nice;//the smaller nice is, the higher the priority; nice ranges from -20 to 19
        goto out;
    }
    /*
     * Realtime process, select the first one on the
     * runqueue (taking priorities within processes
     * into account).
     */
    //nice does not affect the weight of a real-time task, only the time slice of a SCHED_RR task;
    //while a real-time task is ready, non-real-time tasks get no chance to run
    weight = 1000 + p->rt_priority;//rt_priority ranks real-time tasks among themselves; the 1000 lifts them above every SCHED_OTHER task
out:
    return weight;
}
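To see what goodness() returns for concrete tasks, here is a user-space sketch of the uniprocessor case. It is an illustration under assumptions, not the kernel code: `gd_task` and `toy_goodness` are hypothetical names, the SMP bonus is left out, and the `TOY_SCHED_*` values are believed to match the 2.4 headers but should be treated as assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed policy values; the TOY_ prefix marks them as local to this sketch. */
enum {
    TOY_SCHED_OTHER = 0,
    TOY_SCHED_FIFO  = 1,
    TOY_SCHED_RR    = 2,
    TOY_SCHED_YIELD = 0x10
};

/* Hypothetical cut-down task descriptor with only the fields goodness() reads. */
struct gd_task {
    int policy;
    int counter;      /* remaining time slice in ticks */
    int nice;         /* -20 (highest priority) .. 19 (lowest) */
    int rt_priority;  /* only meaningful for real-time policies */
    const void *mm;   /* NULL stands for a kernel thread */
};

/* Uniprocessor version of the weight calculation described above. */
static int toy_goodness(const struct gd_task *p, const void *this_mm)
{
    assert(p != NULL);
    if (p->policy & TOY_SCHED_YIELD)
        return -1;                        /* yielding: behind every other runnable task */
    if (p->policy == TOY_SCHED_OTHER) {
        int weight = p->counter;
        if (weight == 0)
            return 0;                     /* slice used up: no bonuses at all */
        if (p->mm == this_mm || p->mm == NULL)
            weight += 1;                  /* no address-space switch needed */
        return weight + 20 - p->nice;
    }
    return 1000 + p->rt_priority;         /* SCHED_FIFO or SCHED_RR */
}
```

With 6 ticks left and nice 0, a task sharing the current address space scores 6 + 1 + 20 = 27, while the same task with a different address space scores 26. A real-time task with rt_priority 50 scores 1050 no matter what its counter or nice value is.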

 

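Overall flow of schedule():
Preparation:
If schedule() was called from interrupt context, jump straight to BUG().
If the current process is SCHED_RR (round-robin real-time) and its time slice is used up, move it to the tail of the run queue and refill its time slice.
If the current process is in interruptible sleep and has a signal pending, set it back to the runnable state. If it is in any other state except running,
remove it from the run queue.
Selection:
If the current process is still runnable, start the comparison from its weight, so that it wins against other processes of equal weight.
Walk every process on the run queue and compute its weight with goodness(). For a non-real-time process with time left, weight = counter + (20 - nice), plus 1 for a kernel thread or a process sharing the current address space.
nice does not affect the weight of a real-time process, only the time slice of a SCHED_RR process. For a real-time process, weight = 1000 + p->rt_priority.
Run the process with the highest weight. If the highest weight is 0, there is no real-time process on the run queue and every runnable SCHED_OTHER process has used up its time slice.
This only happens after the system has gone without a ready real-time process for a while, otherwise the SCHED_OTHER processes would never get to run their counters down to 0.
In that case the time slices of all processes, sleeping ones included, are recalculated and the selection is repeated.
Switch:
Switch the address space if needed, then switch to the next process or kernel thread.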
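The selection step of the flow above can be sketched in user space as follows. This is a simplified model under stated assumptions: `rq_task`, `rq_weight`, `rq_base_ticks` and `rq_pick_next` are hypothetical names, the address-space bonus, the tie-break in favour of the current process, and all SMP handling are left out, and the base slice formula assumes the 2.4 definition for HZ below 200.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down task descriptor for this sketch. */
struct rq_task {
    int realtime;     /* nonzero for SCHED_FIFO / SCHED_RR */
    int rt_priority;
    int counter;      /* remaining time slice in ticks */
    int nice;         /* -20 .. 19 */
    int on_runqueue;  /* nonzero if the task is runnable */
};

/* Simplified weight: no address-space bonus, no SMP bonus. */
static int rq_weight(const struct rq_task *p)
{
    if (p->realtime)
        return 1000 + p->rt_priority;
    if (p->counter == 0)
        return 0;
    return p->counter + 20 - p->nice;
}

/* Assumed base slice, mirroring NICE_TO_TICKS for HZ < 200. */
static int rq_base_ticks(int nice)
{
    return ((20 - nice) >> 2) + 1;
}

/* Returns the index of the task to run next, or -1 to run the idle task. */
static int rq_pick_next(struct rq_task *tasks, size_t n)
{
    assert(tasks != NULL || n == 0);
    for (;;) {
        int best = -1;   /* -1 plays the role of the idle task */
        int c = -1000;   /* lower than any real weight */
        for (size_t i = 0; i < n; i++) {
            if (!tasks[i].on_runqueue)
                continue;
            int w = rq_weight(&tasks[i]);
            if (w > c) {
                c = w;
                best = (int)i;
            }
        }
        if (c != 0)
            return best;
        /* Best weight is 0: refill every task, sleeping ones included, and retry.
         * Each refill yields at least 1 tick, so the retry cannot see 0 again. */
        for (size_t i = 0; i < n; i++)
            tasks[i].counter = (tasks[i].counter >> 1) + rq_base_ticks(tasks[i].nice);
    }
}
```

Note that the refill loop runs over all tasks while the selection loop only looks at runnable ones, matching the difference between `for_each_task` and the run-queue walk in schedule(). As soon as any runnable task is real-time, its weight of 1000 or more means the refill branch is never reached.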
 


