Introduction To The Linux Kernel

  • Uploaded by: bharath_mv7-1
  • 0
  • 0
  • June 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Introduction To The Linux Kernel as PDF for free.

More details

  • Words: 2,005
  • Pages: 51
Introduction to the Linux Kernel 國立中正大學 資訊工程研究所 羅習五 老師

Kernel mode and kernel space • The kernel includes a protected memory space and full access to the hardware. • This system state and memory space is collect vely referred to as kernel-space • A user process executes in – User-space – kernel-space

kernel-space • The kernel typically resides in an elevated syst em state compared to normal user applicaton s. – a protected memory space – full access to the hardware

• This system state and memory space is collec tvely referred to as kernel-space

user-space

They see a subset of the machine’s available res ources and can perform certain system functons

Summary: Executng contexts The modes a CPU executes in • In kernel-space, in process context, executng on behalf of a specific process • In kernel-space, in interrupt context, not assoc iated with a process, handling an interrupt • In user-space, executng user code in a process

The modes a CPU executes in

Kernel space

ISR CTX

Top halve

User space

Process context

(including bottom half)

A preemptve kernel

• A preemptve OS (non preemptve kernel) ini tate a context switch when the OS switch to user mode. • A preemptve kernel can also initate a conte xt switch in kernel mode to satsfy the sched uling policy.

Different Natures • No libc support – There are multple reasons for this, including some chicken-and-the-egg situatons, but the primary re ason is speed and size (upcall?). – Many of the usual libc functons have been impl emented inside the kernel. • For example

Inline Functons An inline functon is declared when the keyword s statc and inline are used as part of the funct on definiton. For example: static inline void wolf(unsigned long tail_size)

– Because they are marked statc , an exported func ton is not created.

Inline Assembly

unsigned int low, high; asm volatile("rdtsc" : "=a" (low), "=d" (high)) ;

/* low and high now contain the lower and uppe r 32-bits of the 64-bit tsc */

Branch Annotaton /* we predict 'success' is nearly a lways nonzero ... */ if (likely(success)) { /* ... */ }

No (Easy) Use of Floatng Point • When a user-space process uses floatng-point instructons, the kernel manages the transiton from integer to floatng point mode. • Unlike user-space, the kernel does not have th e luxury of seamless support for floatng point because it cannot easily trap itself.

Small, Fixed-Size Stack

• The user-space stack can dynamically grow • The kernel stack is small and fixed in size – Historically, the kernel stack is two pages. – Recently, the size of kernel stack is one page only!! !

Process control block (task_struct, thread_info)

• The task_struct structure is allocated via t he slab allocator. • To support “4k task_struct“, thread_inf o was created that again lives at the bottom of the stack.

thread_info

task_struct

The process descriptor (task_struct) conta ins all the informaton about a specific proces s.

task_struct • State and executon informaton – such as pending signals, binary format, pid, pointers to parents and other related processes, priorites, and CPU tme

• Informaton on allocated virtual memory. • Process credentals such as user and group ID, capabilites. • Files used: filesystem informaton on all files handled by t he process must be saved. • Thread informaton, which records the CPU-specific • Informaton on interprocess communicaton (IPC) • Signal handlers used by the process.

task_struct and thread_info

Identfying the current process

8KB

• current is calculated by masking out the 13 (de pends on the stack size) least significant bits of the stack pointer to obtain the thread_info structure.

Process State

The process becomes runnable if it receives a signal.

the task does not respond to signals in this state

do_fork long do_fork(unsigned long clone_flags, unsigned long stack_start, struct pt_regs *regs, unsigned long stack_size, int __user *parent_tidptr, int __user *child_tidptr)

• clone_flags: A flag set to specify duplication properties. • stack_start: The start address of the user mode stack to be used. • regs: holding all registers in the order in which they are saved on the parent’s kernel stack when a system call is executed • parent_tdptr and child_tdptr: two pointers to addresses in use rspace that hold the TIDs of the parent and child processes.

Code flow diagram for do_fork

copy_process

/* Perform scheduler related setup. Assign this task to a CPU. */

flags (CLONE_XXX & copy_XXX)

flags (CLONE_XXX & copy_XXX)

Kernel Threads • kernel threads do not have an address space (in fact, their mm pointer is NULL). • Kernel threads are, however, schedulable and p reemptable as normal processes. – Example: pdflush task and the ksoftirqd task.

• A kernel thread can be created only by another kernel thread. – int kernel_thread(int (*fn)(void *), void * arg, unsig ned long flags)

the kernel threads • run the command ps –ef. You should see some thing like this… • Kernel process names are surrounded by squa re brackets ([]). UID root root root root root root root root

PID 1 2 3 5 6 7 9 10

PPID 0 0 2 2 2 2 2 2

C 0 0 0 0 0 0 0 0

STIME Jan04 Jan04 Jan04 Jan04 Jan04 Jan04 Jan04 Jan04

TTY ? ? ? ? ? ? ? ?

TIME 00:00:00 00:00:00 00:00:02 00:00:00 00:00:00 00:00:00 00:00:00 00:00:02

CMD /sbin/init [kthreadd] [ksoftirqd/0] [kworker/u:0] [migration/0] [migration/1] [ksoftirqd/1] [kworker/0:1]

do_execve int do_execve(char * filename, char __user *__user *argv, char __user *__user *envp, struct pt_regs * regs)

• filename: the executable file • argv & envp: The argument vector and environment • regs: holding all registers in the order in which they are s aved on the kernel stack when the system call is executed

Code flow diagram for do_execve

Process terminaton 5. calls __exit_files(), __exit_fs(), exit_namespac e(), and exit_sighand() to decrement the usag e count of objects 6. sets the task's exit code, stored in the exit_co de member of the task_struct 7. calls exit_notfy() to send signals to the task's parent 8. calls schedule()

Removal of the Process Descriptor After the parent has obtained informaton on its terminated child (by invoking wait4), or signified to the kernel that it does not care, the child's tas k_struct is deallocated.

UNIX Scheduling Policy • Scheduling policy determines what runs when – fast process response tme (low latency) – maximal system utlizaton (high throughput)

• Processes classificaton: – I/O-bound processes: spends much of its tme submitting and waitng on I/O requests – Processor-bound processes: spend much of their tme ex ecutng code

• Unix variants tends to favor I/O-bound processes, th us providing good process response tme 32

Linux scheduler – Process Priority • Linux’s priority-based scheduling – Rank processes based on their worth and need for processor tme. – processes with a higher priority also receive a longer tmeslice. – Both the user and the system may set a process's priority to influe nce the scheduling behavior of the system.

• Dynamic priority-based scheduling – Begins with an inital base priority – Then enables the scheduler to increase or decrease the priority dy namically to fulfill scheduling objectves. – E.g., a process that is spending more tme waitng on I/O will recei ve an elevated dynamic priority. 33

Timeslice • The tmeslice is the numeric value that represen ts how long a task can run untl it is pre-empted. – too short => large overhead of switching process – too long => poor interactve response

• Linux’s CFS scheduler does not directly assign t meslices to processes. – CFS assigns processes a proportion of the processor. – the amount of processor tme that a process receive s is a functon of the load of the system 34

2.4 scheduler - SMP busy

run queue

busy

35

2.4 scheduler • Non-preemptble kernel – Set p->need_resched if schedule() should be invok ed at the ‘next opportunity‘ (kernel => user mode) .

• Round-robin – task_struct->counter: number of clock tcks left to run in this scheduling slice, decremented by a tme r. 36

2.4 scheduler 1. Check if schedule() was invoked from interrupt handler (due to a bug) and panic if so. 2. Use spin_lock_irq() to lock ‘runqueue_lock’ 3. Check if a task is ‘runnable’ – in TASK_RUNNING state – in TASK_INTERRUPTIBLE state and a signal is pending

4. Examine the ‘goodness’ of each process 5. Context switch 37

2.4 scheduler – ‘goodness’ • ‘goodness’: identfying the best candidate amo ng all processes in the runqueue list. – ‘goodness’ = 0: the entty has exhausted its quantu m. – 0 < ‘goodness’ < 1000: the entty is a conventonal process/thread that has not exhausted its quantu m; a higher value denotes a higher level of goodne ss. 38

2.4 scheduler – ‘goodness’

(to improve multthreading performance) if (p->mm == prev->mm) return p->counter + p->priority + 1; else return p->counter + p->priority;

• A small bonus is given to the task p if it share s the address space with the previous task. 39

2.4 scheduler - SMP

Examine the processor field of the processes an d gives a consistent bonus (that is PROC_CHANG E_PENALTY, usually 15) to the process that was l ast executed on the ‘this_cpu’ CPU.

40

Scheduling policy tme_quantumnew = bonusI/O + tmestatc tme_quantumnew = tme_quantumold/2 + tme_q uantum_table[statc_priority]

& dynamic_priority ≈ tme_quantumnew 41

2.6 scheduler run queue task migraton (put + pull)

run queue

42

2.6 scheduler – User Preempton

• User preempton can occur – When returning to user-space from a system call – When returning to user-space from an interrupt h andler

43

2.6 scheduler – Kernel Preempton • The Linux kernel is a fully preemptve kernel. – It is possible to preempt a task at any point, so long as the kernel is in a sta te in which it is safe to reschedule. – “safe to reschedule”: kernel does not hold a lock

• The Linux design: – adding of a preempton counter, preempt_count, to each process's thread _info – This count increments once for each lock that is acquired and decrements once for each lock that is released

• Kernel preempton can also occur explicitly, when a task in the kern el blocks or explicitly calls schedule(). – no additional logic is required to ensure that the kernel is in a state that is safe to preempt! 44



runqueue



Most tasks have dynamic priorities that are based on their “nice” value (static priority) plus or minus 5 Interactivity of a task ≈ 1/sleep_time dynPrio = staticPrio + bonus bonus = -5 ~ +5 bonus ≈ 1/sleep_time

tsk1

tsk3 tsk2

tsk3 I/O bound

active

expired

45



runqueue

When all tasks have exhausted their time slices, the two priority arrays are exchanged!

tsk1

tsk3 tsk2

active

expired

46

2.6 scheduler – CFS • Classical schedulers compute tme slices for ea ch process in the system and allow them to ru n untl their tme slice/quantum is used up. – After that, all process need to be recalculated.

• CFS considers only the wait tme of a process – The task with the most need for CPU tme is sched uled.

47

2.6 scheduler – CFS

48

vruntme • The vruntime variable stores the virtual runtm e of a process, which is the actual runtme norm alized by the number of runnable processes. • The virtual runtme’s units are nanoseconds and therefore vruntme is decoupled from the tmer tck. • The virtual runtme is used to help us approxim ate the “ideal multtasking processor” that CFS is modeling. 49

Adding Processes to the Tree This would occur when a process becomes runn able (wakes up) or is first created via fork(). se->vruntime += cfs_rq->min_vruntime; update_curr(cfs_rq); account_entity_enqueue(cfs_rq, se); update_stats_enqueue(cfs_rq, se); __enqueue_entty(cfs_rq, se); 請參考課本 50

homework (繳交報告) • 比較 2.4 、 O(1) 、 CFS 的差異,至少必須 包含: – 如何分配 tme slice – 如何決定“下一個 task” – 如何藉由一個比較好的 scheduling algorithm 提 高整個系統的 throughput ( hint : i/o-bound & cpu-bound )

• 你或許可以添加下列內容 – 找出一個例子,完整的說明這三種 scheduling al gorithm 的優缺點

51

Related Documents