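Kernel Modules vs. Applications
• A module runs in kernel space; an application runs in user space.
• An application performs a single task from start to finish; a module registers itself to serve future requests.
• A module is linked only to the kernel: the only functions it can call are those exported by the kernel, and there are no libraries to link to. Kernel headers live under /usr/src/linux2.4.x/include/linux and /usr/src/linux2.4.x/include/asm.
• Application libraries use the headers in /usr/include (e.g. for printf()).
• A kernel fault is fatal, at least for the current process if not for the whole system; an application fault (e.g. a segmentation fault) is harmless to the rest of the system.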
Namespace Pollution
• Declare everything static unless it must be visible elsewhere, to avoid namespace pollution.
• Give the remaining global symbols a lowercase prefix that is unique to the module.
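Usage Count
The system keeps a usage count for every module to determine whether the module can be removed safely. Three macros work with it:
• MOD_INC_USE_COUNT: increments the count for the current module.
• MOD_DEC_USE_COUNT: decrements the count.
• MOD_IN_USE: evaluates to true if the count is not zero.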
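The effect of the usage count can be sketched with a small user-space model. This is not kernel code: the `model_*` names are hypothetical stand-ins, and the only behaviour modelled is that a module with a non-zero count refuses to unload.

```c
#include <errno.h>

/* User-space model of a module usage count (hypothetical names). */
static int model_use_count;

/* Analogue of MOD_INC_USE_COUNT, typically done when the device is opened. */
void model_open(void)
{
    model_use_count++;
}

/* Analogue of MOD_DEC_USE_COUNT, typically done when the device is released. */
void model_release(void)
{
    if (model_use_count > 0)
        model_use_count--;
}

/* Analogue of MOD_IN_USE. */
int model_in_use(void)
{
    return model_use_count != 0;
}

/* Removal is refused with -EBUSY while the module is still in use. */
int model_try_unload(void)
{
    return model_in_use() ? -EBUSY : 0;
}
```

A driver would normally increment in its open method and decrement in release, so that the count tracks the number of open file handles.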
Security Issues
• Security has two faces: deliberate and incidental.
  Deliberate: the damage a user can cause through the misuse of existing programs.
  Incidental: the damage caused by incidentally exploiting bugs.
• Any security check in the system is enforced by kernel code.
• Driver writers should avoid encoding security policy in their code. Security is a policy issue that is often best handled at higher levels within the kernel, under the control of the system administrator.
Security Issues
Device access needs care when it involves:
1. Device operations that affect global resources (such as setting an interrupt line).
2. Operations that could affect other users (such as setting a default block size on a tape drive).
3. Buffer overrun errors, which can be exploited to compromise the system.
Precautions
1. Treat any input received from user processes with great suspicion.
2. Be careful with uninitialized memory.
3. Restrict to privileged users any specific operations that could affect the system (e.g., reloading the firmware on an adapter board or formatting a disk).
4. Be careful when receiving software from third parties, especially when the kernel is concerned. A maliciously modified kernel could allow anyone to load a module, thus opening an unexpected back door via create_module.
Precautions
5. The Linux kernel can be compiled with no module support whatsoever, thus closing any related security holes. All needed drivers must then be built directly into the kernel itself.
The Kernel Symbol Table
• insmod, which loads a module into the kernel, resolves undefined symbols against the table of public kernel symbols.
• The table contains the addresses of global kernel items: functions and variables.
• The symbol table can be read from /proc/ksyms or with the ksyms command.
• When a module is loaded, any symbol exported by the module becomes part of the kernel symbol table.
The Kernel Symbol Table
• New modules can be stacked on top of other modules. Module stacking is useful in complex projects; for example, the video-for-linux set of drivers exports symbols used by lower-level device drivers for specific hardware.
• When using stacked modules, use the modprobe utility, which loads any other modules that your module requires.
• For a module that exports no symbols, use the macro EXPORT_NO_SYMBOLS.
• For a module that exports a subset of its symbols, define the macro EXPORT_SYMTAB before including module.h.
Kernel Symbols
Define EXPORT_SYMTAB before including <module.h>. The export macros are:
• EXPORT_NO_SYMBOLS: export nothing
• EXPORT_SYMBOL(name): export name
• EXPORT_SYMBOL_NOVERS(name): export name with no versioning information
I/O Ports & I/O Memory
Driver programmers may need to allocate I/O ports, I/O memory, and interrupt lines explicitly. System memory is anonymous and may be allocated from anywhere, whereas I/O memory, ports, and interrupts have very specific roles. A driver needs to be able to allocate the exact ports it needs, not just some ports.
I/O Ports & I/O Memory
● The job of a typical driver is, for the most part, writing and reading I/O ports and I/O memory.
● Access to I/O ports and I/O memory (collectively, I/O regions) happens both at initialization time and during normal operations.
● A device driver should be guaranteed exclusive access to its I/O regions to prevent interference from other drivers.
● The Linux developers have implemented a request/free mechanism for I/O regions, which is a software abstraction.
● Information about registered resources is available in /proc/ioports and /proc/iomem.
Access to I/O regions
The programming interface used to access the I/O registry is made up of three functions:
    1. int check_region(unsigned long start, unsigned long len);
    2. struct resource *request_region(unsigned long start, unsigned long len, char *name);
    3. void release_region(unsigned long start, unsigned long len);
I/O Ports
1. check_region(): may be called to see whether a range of ports is available for allocation. It returns a negative error code (-EBUSY or -EINVAL) if the answer is no.
2. request_region(): actually allocates the port range, returning a non-NULL pointer if the allocation succeeds.
3. release_region(): releases the ports.
The three functions are actually macros, and they are declared in .
I/O Ports
Sequence for registering ports:#include #include
    static int skull_detect(unsigned int port, unsigned int range)
    {
        int err;

        if ((err = check_region(port, range)) < 0)
            return err;                        /* busy */
        if (skull_probe_hw(port, range) != 0)
            return -ENODEV;                    /* not found */
        request_region(port, range, "skull");  /* "Can't fail" */
        return 0;
    }
I/O Ports
Any I/O ports allocated by the driver must eventually be released; skull does it from within cleanup_module:
    static void skull_release(unsigned int port, unsigned int range)
    {
        release_region(port, range);
    }
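The check/request/release bookkeeping can be illustrated with a toy user-space registry. This is only a sketch of the idea, not the kernel's implementation: the `model_*` names, the fixed-size table, and the overlap test are all assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Toy user-space model of the I/O region registry (hypothetical names). */
#define MODEL_MAX_REGIONS 16

struct model_region {
    unsigned long start, len;
    const char *name;   /* owner, as it would appear in a /proc/ioports-style listing */
    int used;
};

static struct model_region model_table[MODEL_MAX_REGIONS];

/* Return 0 if [start, start+len) is free, -EINVAL or -EBUSY otherwise. */
int model_check_region(unsigned long start, unsigned long len)
{
    if (len == 0 || start + len < start)
        return -EINVAL;
    for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
        const struct model_region *r = &model_table[i];
        if (r->used && start < r->start + r->len && r->start < start + len)
            return -EBUSY;              /* ranges overlap */
    }
    return 0;
}

/* Claim the range; returns the table entry, or NULL on conflict or full table. */
struct model_region *model_request_region(unsigned long start,
                                          unsigned long len, const char *name)
{
    if (model_check_region(start, len) != 0)
        return NULL;
    for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
        if (!model_table[i].used) {
            model_table[i] = (struct model_region){ start, len, name, 1 };
            return &model_table[i];
        }
    }
    return NULL;
}

/* Release a range previously claimed with the same start and len. */
void model_release_region(unsigned long start, unsigned long len)
{
    for (int i = 0; i < MODEL_MAX_REGIONS; i++)
        if (model_table[i].used &&
            model_table[i].start == start && model_table[i].len == len)
            model_table[i].used = 0;
}
```

Once one owner holds ports 0x280 to 0x287, a request for 0x284 to 0x28b fails, while the adjacent range starting at 0x288 is still free.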
I/O Memory
I/O memory information is available in the /proc/iomem file. To obtain access to a certain I/O memory region, the driver should use the following calls:
    1. int check_mem_region(unsigned long start, unsigned long len);
    2. int request_mem_region(unsigned long start, unsigned long len, char *name);
    3. int release_mem_region(unsigned long start, unsigned long len);
A typical driver uses them like this:
    if (check_mem_region(mem_addr, mem_size)) {
        printk("drivername: memory already in use\n");
        return -EBUSY;
    }
    request_mem_region(mem_addr, mem_size, "drivername");
The device_struct structure
When a character device is registered with the kernel, its file_operations structure and name are added to the global chrdevs array of device_struct structures, which is indexed by major number. This array is called the character device switch table.
    struct device_struct {
        const char *name;
        struct file_operations *fops;
    };
So by looking up chrdevs[YOUR_MAJOR].fops, the kernel knows how to talk to the device and which entry points it supports.
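The lookup described above can be sketched in user space. The following is a simplified model, not the kernel's code: the `model_*` and `demo_*` names are hypothetical, the operations structure is cut down to a single `open` method, and dynamic major allocation is left out.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_CHRDEV 256

/* Cut-down stand-in for struct file_operations: one entry point only. */
struct model_fops {
    int (*open)(int minor);
};

struct model_device_struct {
    const char *name;
    const struct model_fops *fops;
};

/* The switch table: one slot per major number. */
static struct model_device_struct model_chrdevs[MODEL_MAX_CHRDEV];

/* Register a driver under a major number; -EBUSY if the slot is taken. */
int model_register_chrdev(unsigned int major, const char *name,
                          const struct model_fops *fops)
{
    if (major == 0 || major >= MODEL_MAX_CHRDEV)
        return -EINVAL;
    if (model_chrdevs[major].fops)
        return -EBUSY;
    model_chrdevs[major].name = name;
    model_chrdevs[major].fops = fops;
    return 0;
}

/* Dispatch an open: index the table by major, then call the driver's method. */
int model_chrdev_open(unsigned int major, unsigned int minor)
{
    if (major >= MODEL_MAX_CHRDEV || !model_chrdevs[major].fops)
        return -ENODEV;                 /* no driver registered */
    if (!model_chrdevs[major].fops->open)
        return 0;                       /* missing open: always succeeds */
    return model_chrdevs[major].fops->open((int)minor);
}

/* Hypothetical driver that accepts minors 0..3 only. */
static int demo_open(int minor)
{
    return (minor >= 0 && minor < 4) ? 0 : -ENODEV;
}

static const struct model_fops demo_fops = { .open = demo_open };
```

The major number selects the driver and the minor number is passed through untouched, leaving its interpretation to the driver.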
Major & Minor Nos
ls -l /dev
    crw-rw-rw-  1 root   root      1,   3 Feb 23  1999 null
    crw-------  1 root   root     10,   1 Feb 23  1999 psaux
    crw-------  1 rubini tty       4,   1 Aug 16 22:22 tty1
    crw-rw-rw-  1 root   dialout   4,  64 Jun 30 11:19 ttyS0
    crw-rw-rw-  1 root   dialout   4,  65 Aug 16 00:00 ttyS1
    crw-------  1 root   sys       7,   1 Feb 23  1999 vcs1
    crw-------  1 root   sys       7, 129 Feb 23  1999 vcsa1
    crw-rw-rw-  1 root   root      1,   5 Feb 23  1999 zero
Major & Minor Nos
• The major number identifies the driver associated with the device. For example, /dev/null and /dev/zero are both managed by driver 1, whereas virtual consoles and serial terminals are managed by driver 4.
• The kernel uses the major number at open time to dispatch execution to the appropriate driver.
• The minor number is used only by the driver specified by the major number; it identifies an instance of the device.
• For example, char major 7 is the official number for the virtual console capture devices (vcs), and the IDE subsystem (block major 22 for the secondary controller) identifies partitions on the master and slave devices by minor number.
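As an illustration, the major/minor pair can be packed into and extracted from a single device number. The sketch below assumes the 2.4-era layout of 8 bits for the major and 8 bits for the minor number, which the slides do not state; the `MODEL_*` macro names are hypothetical stand-ins for the kernel's own macros.

```c
/* Assumed layout: upper 8 bits major, lower 8 bits minor (hypothetical names). */
#define MODEL_MINORBITS 8
#define MODEL_MINORMASK ((1U << MODEL_MINORBITS) - 1)

/* Build a device number from a major/minor pair. */
#define MODEL_MKDEV(ma, mi) \
    (((unsigned)(ma) << MODEL_MINORBITS) | ((unsigned)(mi) & MODEL_MINORMASK))

/* Extract the major number: selects the driver. */
#define MODEL_MAJOR(dev) ((unsigned)(dev) >> MODEL_MINORBITS)

/* Extract the minor number: interpreted only by that driver. */
#define MODEL_MINOR(dev) ((unsigned)(dev) & MODEL_MINORMASK)
```

With this layout, /dev/ttyS0 (major 4, minor 64) from the listing would be stored as 0x440.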
Major & Minor Nos
Syntax for mknod:
    mknod name type major minor
Example:
    # mknod /dev/lp0 c 6 0
File Operations
    struct file_operations {
        loff_t (*llseek)(struct file *, loff_t, int);
        ssize_t (*read)(struct file *, char *, size_t, loff_t *);
        ssize_t (*write)(struct file *, const char *, size_t, loff_t *);
        unsigned int (*poll)(struct file *, struct poll_table_struct *);
        int (*ioctl)(struct inode *, struct file *, unsigned int, unsigned long);
        int (*open)(struct inode *, struct file *);
        int (*release)(struct inode *, struct file *);
    };
File Operations
• loff_t (*llseek)(struct file *file, loff_t offset, int mode);
  The llseek method is used to change the current read/write position in a file; the new position is returned as a positive value. loff_t is a "long offset".
• ssize_t (*read)(struct file *file, char *buf, size_t count, loff_t *offset);
  Used to retrieve data from the device. A null pointer in this position causes the read system call to fail with -EINVAL. On success, it returns the number of bytes successfully read.
File Operations
• ssize_t (*write)(struct file *file, const char *buf, size_t count, loff_t *offset);
  Sends data to the device. If the method is missing, -EINVAL is returned to the caller; otherwise a non-negative return value is the number of bytes successfully written.
• unsigned int (*poll)(struct file *, struct poll_table_struct *);
  poll and select are both used to inquire whether a device is readable, writable, or in some special state. Either system call can block until a device becomes readable or writable.
File Operations
• int (*ioctl)(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg);
  Offers a way to issue device-specific commands (like formatting a track of a floppy disk, which is neither reading nor writing). If the driver provides no ioctl method, the system call returns -ENOTTY for any request that is not predefined.
File Operations
• int (*open)(struct inode *inode, struct file *file);
  This is always the first operation performed on the device file, but the driver is not required to declare a corresponding method. If this entry is NULL, opening the device always succeeds, but your driver isn't notified.
File Operations
• int (*release)(struct inode *inode, struct file *file);
  This operation is invoked when the file structure is being released. Like open, release can be missing.
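The defaults for missing methods described on these slides can be summarized in a small user-space model. It is a sketch only: the `model_*` names are hypothetical, the structure is reduced to four methods with simplified signatures, and the real kernel dispatch code does considerably more.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct model_file;   /* opaque stand-in for struct file */

/* Reduced operations table with simplified signatures (hypothetical). */
struct model_file_operations {
    ssize_t (*read)(struct model_file *, char *, size_t);
    ssize_t (*write)(struct model_file *, const char *, size_t);
    int (*ioctl)(struct model_file *, unsigned int, unsigned long);
    int (*open)(struct model_file *);
};

/* A missing read method makes the call fail with -EINVAL. */
ssize_t model_sys_read(const struct model_file_operations *f,
                       struct model_file *filp, char *buf, size_t n)
{
    return f->read ? f->read(filp, buf, n) : -EINVAL;
}

/* A missing write method makes the call fail with -EINVAL. */
ssize_t model_sys_write(const struct model_file_operations *f,
                        struct model_file *filp, const char *buf, size_t n)
{
    return f->write ? f->write(filp, buf, n) : -EINVAL;
}

/* A missing ioctl method yields -ENOTTY. */
int model_sys_ioctl(const struct model_file_operations *f,
                    struct model_file *filp, unsigned int cmd, unsigned long arg)
{
    return f->ioctl ? f->ioctl(filp, cmd, arg) : -ENOTTY;
}

/* A missing open method means opening always succeeds. */
int model_sys_open(const struct model_file_operations *f, struct model_file *filp)
{
    return f->open ? f->open(filp) : 0;
}
```

A driver therefore only needs to fill in the methods it actually supports and can leave the rest NULL.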
File Operations
● The file structure:
    struct file {
        mode_t       f_mode;
        loff_t       f_pos;
        unsigned int f_flags;
    };
Debugging Techniques
● Debugging by Printing
● Debugging by Querying
● Debugging by Watching
Debugging Techniques
Debugging by Printing: printk lets you classify messages by loglevel, or priority. The loglevel is indicated with a macro placed before the format string:
    printk(KERN_DEBUG "Here I am: %s:%i\n", __FILE__, __LINE__);
    printk(KERN_CRIT "I'm trashed; giving up on %p\n", ptr);
Debugging Techniques
There are eight possible loglevel strings, defined in the header :
• KERN_EMERG: used for emergency messages, usually those that precede a crash.
• KERN_ALERT: a situation requiring immediate action.
• KERN_CRIT: critical conditions, often related to serious hardware or software failures.
• KERN_ERR: used to report error conditions; device drivers will often use KERN_ERR to report hardware difficulties.
• KERN_WARNING: warnings about problematic situations that do not, in themselves, create serious problems with the system.
• KERN_NOTICE: situations that are normal, but still worthy of note. A number of security-related conditions are reported at this level.
• KERN_INFO: informational messages. Many drivers print information about the hardware they find at startup time at this level.
• KERN_DEBUG: used for debugging messages.
Each string in the macro expansion represents an integer in angle brackets, ranging from 0 to 7. A printk with no priority defaults to DEFAULT_MESSAGE_LOGLEVEL, specified in kernel/printk.c as an integer.
Debugging Techniques
• Based on the loglevel, the kernel may print the message to the current console, a serial line, or a parallel printer.
• If the priority is less than the integer variable console_loglevel, the message is displayed.
• If both klogd and syslogd are running, kernel messages are appended to /var/log/messages or otherwise treated according to the syslogd configuration.
• klogd is a system daemon that intercepts and logs Linux kernel messages.
• sysklogd provides two system utilities that support system logging and kernel message trapping.
Debugging Techniques
• If klogd is not running, the message won't reach user space unless you read /proc/kmsg.
• See /proc/sys/kernel/printk: the first integer is the current console loglevel and the second is the default level for messages.
• We can also write a program to set console_loglevel.
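The console filtering rule can be sketched as a small user-space model. It is illustrative only: the `model_*` names are hypothetical, and the default level of 4 (the value of KERN_WARNING) is an assumption about DEFAULT_MESSAGE_LOGLEVEL rather than something stated in the slides.

```c
/* Assumed value of DEFAULT_MESSAGE_LOGLEVEL (KERN_WARNING); hypothetical name. */
#define MODEL_DEFAULT_MESSAGE_LOGLEVEL 4

/*
 * Parse a leading "<N>" prefix (N in 0..7). Returns the level and advances
 * *msg past the prefix; messages without a prefix get the default level.
 */
int model_parse_loglevel(const char **msg)
{
    const char *m = *msg;

    if (m[0] == '<' && m[1] >= '0' && m[1] <= '7' && m[2] == '>') {
        *msg = m + 3;
        return m[1] - '0';
    }
    return MODEL_DEFAULT_MESSAGE_LOGLEVEL;
}

/* A message reaches the console only if its level is below console_loglevel. */
int model_goes_to_console(const char *msg, int console_loglevel)
{
    return model_parse_loglevel(&msg) < console_loglevel;
}
```

Lower numbers mean higher priority, so raising console_loglevel to 8 lets every message through, including KERN_DEBUG ones.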
Debugging by Printk How Messages Get Logged:-The printk function writes messages into a circular buffer that is LOG_BUF_LEN (defined in kernel/printk.c) bytes long. It then wakes any process that is waiting for messages, that is, any process that is sleeping in the syslog system call or that is reading /proc/kmsg. If the circular buffer fills up, printk wraps around and starts adding new data to the beginning of the buffer, overwriting the oldest data.
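The wrap-around behaviour can be sketched with a tiny user-space ring buffer. This is a model of the idea, not the code in kernel/printk.c: the `model_*` names are hypothetical, the buffer is deliberately only 8 bytes long, and waking up sleeping readers is left out.

```c
#include <stddef.h>

#define MODEL_LOG_BUF_LEN 8   /* tiny on purpose; must be a power of two */

static char model_log_buf[MODEL_LOG_BUF_LEN];
static unsigned long model_log_start;  /* index of the oldest unread byte */
static unsigned long model_log_end;    /* index one past the newest byte */

/* Append a message; when the buffer is full the oldest bytes are overwritten. */
void model_log_write(const char *s)
{
    for (; *s; s++) {
        model_log_buf[model_log_end & (MODEL_LOG_BUF_LEN - 1)] = *s;
        model_log_end++;
        if (model_log_end - model_log_start > MODEL_LOG_BUF_LEN)
            model_log_start = model_log_end - MODEL_LOG_BUF_LEN;
    }
}

/* Copy up to n unread bytes into out, oldest first; returns the byte count. */
size_t model_log_read(char *out, size_t n)
{
    size_t i = 0;

    while (i < n && model_log_start != model_log_end) {
        out[i++] = model_log_buf[model_log_start & (MODEL_LOG_BUF_LEN - 1)];
        model_log_start++;
    }
    return i;
}
```

Writing 11 bytes into this 8-byte buffer before any read leaves only the last 8 available, which mirrors how slow readers lose the oldest kernel messages.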
Debugging by Printk
Turning the Messages On and Off:
• Each print statement can be enabled or disabled by removing or adding a single letter to the macro's name.
• All the messages can be disabled at once by changing the value of the CFLAGS variable before compiling.
• The same print statement can be used in kernel code and user-level code, so that the driver and test programs can be managed in the same way with regard to extra messages.
Debugging by Printk
Implementation of these features can be defined in scull.h:
    #undef PDEBUG             /* undef it, just in case */
    #ifdef SCULL_DEBUG
    #  ifdef __KERNEL__
         /* This one if debugging is on, and kernel space */
    #    define PDEBUG(fmt, args...) printk( KERN_DEBUG "scull: " fmt, ## args)
    #  else
         /* This one for user space */
    #    define PDEBUG(fmt, args...) fprintf(stderr, fmt, ## args)
    #  endif
    #else
    #  define PDEBUG(fmt, args...) /* not debugging: nothing */
    #endif

    #undef PDEBUGG
    #define PDEBUGG(fmt, args...) /* nothing: it's a placeholder */
Debugging by Printk
To simplify the process further, add the following lines to your makefile:
    # Comment/uncomment the following line to disable/enable debugging
    DEBUG = y

    # Add your debugging flag (or not) to CFLAGS
    ifeq ($(DEBUG),y)
      DEBFLAGS = -O -g -DSCULL_DEBUG # "-O" is needed to expand inlines
    else
      DEBFLAGS = -O2
    endif

    CFLAGS += $(DEBFLAGS)
Debugging by Querying
Debugging by printk has disadvantages: heavy use slows the system down, and messages can be lost if the system crashes. Debugging by querying avoids this by letting us obtain relevant information from the system only when we need it. Two main techniques are available to driver developers for querying the system:
1. Creating a file in the /proc filesystem.
2. Using the ioctl driver method.
Debugging by Querying
Using the /proc Filesystem:
• The /proc filesystem is a special, software-created filesystem that is used by the kernel to export information to the world.
• Each file under /proc is tied to a kernel function that generates the file's "contents" on the fly when the file is read. Examples: /proc/modules; ps, top, and uptime also get their information from /proc.
• The /proc filesystem is dynamic.
• All modules that work with /proc should include to define the proper functions.
Debugging by Querying
To create a read-only /proc file, your driver must implement a function that produces the data when the file is read. Two interfaces are available:
    int (*read_proc)(char *page, char **start, off_t offset, int count, int *eof, void *data);
    int (*get_info)(char *page, char **start, off_t offset, int count);
• page: the buffer where you write your data.
• start: tells the caller where the interesting data has been written in page.
• eof: set by the driver to indicate that there is no more data to return.
• data: a driver-specific data pointer.
Debugging by Querying
When a process reads the file (using the read system call), the request reaches your module through the entry you have registered in the /proc hierarchy. With kernels 2.2 and 2.4, using sysdep.h, the entry can be made simply by calling create_proc_read_entry.
Debugging by Querying
The call used by scull to make its /proc function available as /proc/scullmem:
    create_proc_read_entry("scullmem",
                           0    /* default mode */,
                           NULL /* parent dir */,
                           scull_read_procmem,
                           NULL /* client data */);
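What a read_proc-style function does with its arguments can be sketched in user space. The example below is a model, not scull's actual scull_read_procmem: the `model_dev` structure, its fields, and the output format are hypothetical, and `count` is treated simply as the size of the buffer behind `page`.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical per-device state handed in through the data pointer. */
struct model_dev {
    int quantum;
    int qset;
    long size;
};

/* User-space model of a read_proc function: fill page, set eof, return length. */
int model_read_procmem(char *page, char **start, off_t offset,
                       int count, int *eof, void *data)
{
    const struct model_dev *dev = data;
    int len;

    (void)start;     /* unused: the whole output fits in one page */
    (void)offset;

    if (count <= 0)
        return 0;
    len = snprintf(page, (size_t)count, "quantum %d, qset %d, size %ld\n",
                   dev->quantum, dev->qset, dev->size);
    if (len < 0)
        len = 0;             /* formatting error: report nothing */
    if (len >= count)
        len = count - 1;     /* output was truncated to fit */
    *eof = 1;                /* no more data after this call */
    return len;
}
```

Because the function only formats driver state into a buffer, it can be exercised in a test program before being wired into the module.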
Debugging by Querying
The ioctl Method: We can implement a few ioctl commands tailored for debugging. These commands can copy relevant data structures from the driver to user space, where they can be examined. The only extra requirement is another program that issues the ioctl and displays the results.
Debugging by Watching
Sometimes minor problems can be tracked down just by watching the behaviour of an application in user space. Ways to watch a user-space program:
1. Run a debugger on it to step through its functions.
2. Add print statements.
3. Run the program under strace.
strace is a very powerful tool that shows all the system calls issued by a user-space program, together with the arguments to the calls and their return values in symbolic form.
Debugging by Watching
• If a system call fails, both the symbolic value of the error (e.g., ENOMEM) and the corresponding string are displayed.
• strace receives its information from the kernel itself, so a program can be traced regardless of whether or not it was compiled with debugging support.
• Tracing can be attached to a running process.
• Trace information is often used to support bug reports.
Example command:
    strace ls /dev > /dev/scull0