Uzbekistan, Bukhara, Bukhara Institute of High Technologies, 2013
Custom kernels
Analyzing kernel crash dumps
When the kernel panics and you have dumping enabled, you'll usually see something like this on the console:
Fatal trap 9: general protection fault while in kernel mode
instruction pointer = 0x8:0xc01c434b
stack pointer = 0x10:0xc99f8d0c
frame pointer = 0x10:0xc99f8d28
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2638 (find)
interrupt mask = net tty bio cam
trap number = 9
panic: general protection fault
syncing disks... 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7
giving up on 6 buffers
Uptime: 17h53m13s
dumping to dev #ad/1, offset 786560
dump ata0: resetting devices .. done
You don't need to write this information down: it is saved in the dump.
When you reboot, the system startup scripts find that you have a dump in the designated dump device (see above) and copy it and the current kernel to /var/crash, assuming the directory exists and there's enough space for the dump. You'll see something like this in the directory:
# cd /var/crash
# ls -l
-rw-r--r-- 1 root wheel         3 Dec 29 10:09 bounds
-rw-r--r-- 1 root wheel   4333000 Dec 29 10:10 kernel.22
-rw-r--r-- 1 root wheel         5 Sep 17 1999 minfree
-rw------- 1 root wheel 268369920 Dec 29 10:09 vmcore.22
The important files here are kernel.22, which contains a copy of the kernel running when the crash occurred, and vmcore.22, which contains the contents of memory. The number 22 is the sequence number of the dump. It's possible to have multiple dumps in /var/crash, but note that you can waste a lot of space like that.
The file bounds contains the number of the next dump (23 in this case), and minfree specifies the minimum amount of free space (in kilobytes) to leave on the file system after you've copied the dump. If this can't be guaranteed, savecore doesn't save the dump.
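The arithmetic behind the minfree rule can be sketched as follows. This only illustrates the description above and is not savecore's actual code: the function names are made up for the example, the minfree value of 2048 is an assumed figure, and the sketch counts only the memory image, not the kernel copy.

```python
def next_dump_number(bounds_text):
    """Return the sequence number stored in the bounds file."""
    return int(bounds_text.strip())


def dump_would_be_saved(free_kb, dump_bytes, minfree_kb):
    """Approximate the minfree rule: at least minfree_kb must stay free after the copy."""
    dump_kb = -(-dump_bytes // 1024)  # round up to whole kilobytes
    return free_kb - dump_kb >= minfree_kb


if __name__ == "__main__":
    # vmcore.22 in the listing above is 268369920 bytes.
    # The free-space figures and the minfree value are assumptions for illustration.
    print(next_dump_number("23\n"))
    print(dump_would_be_saved(300000, 268369920, 2048))
    print(dump_would_be_saved(263000, 268369920, 2048))
```

With these assumed numbers, the first check succeeds and the second fails, because the second file system would be left with less than the required minimum after the copy.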
savecore copies the kernel from which you booted. As we've seen, it typically isn't a debug kernel. In the example above, we installed /usr/src/sys/i386/conf/FREEBIE/kernel, but the debug version was /usr/src/sys/i386/conf/FREEBIE/kernel.debug. This is the one you need. The easiest way to access it is to use a symbolic link:
# ln -s /usr/src/sys/i386/conf/FREEBIE/kernel.debug .
# ls -lL
-rw-r--r-- 1 root wheel         3 Dec 29 10:09 bounds
-rwxr-xr-x 1 grog lemis  16796546 Dec 18 14:21 kernel.debug
-rw-r--r-- 1 root wheel   4333000 Dec 29 10:10 kernel.22
-rw-r--r-- 1 root wheel         5 Sep 17 1999 minfree
-rw------- 1 root wheel 268369920 Dec 29 10:09 vmcore.22
As you can see, it's much larger.
Next, run gdb against the kernel and the dump:
# gdb -k kernel.debug vmcore.22
The first thing you see is a political message from the Free Software Foundation, followed by a repeat of the crash messages, a listing of the current instruction (always the same) and a prompt:
#0 dumpsys () at ../../kern/kern_shutdown.c:473
473 if (dumping++) {
(kgdb)
Due to the way C, gdb and FreeBSD work, the real information you're looking for is further down the stack. The first thing you need to do is to find out exactly where the panic happened. Do that with the backtrace command:
(kgdb) bt
#0 dumpsys () at ../../kern/kern_shutdown.c:473
#1 0xc01c88bf in boot (howto=256) at ../../kern/kern_shutdown.c:313
#2 0xc01c8ca5 in panic (fmt=0xc03a8cac "%s") at ../../kern/kern_shutdown.c:581
#3 0xc033ab03 in trap_fatal (frame=0xc99f8ccc, eva=0) at ../../i386/i386/trap.c:956
#4 0xc033a4ba in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16, tf_edi = -1069794208, tf_esi = -1069630360, tf_ebp = -912290520, tf_isp = -912290568, tf_ebx = -1069794208, tf_edx = 10, tf_ecx = 10, tf_eax = -1, tf_trapno = 9, tf_err = 0, tf_eip = -1071889589, tf_cs = 8, tf_eflags = 66182, tf_esp = 1024, tf_ss = 6864992}) at ../../i386/i386/trap.c:618
#5 0xc01c434b in malloc (size=1024, type=0xc03c3c60, flags=0) at ../../kern/kern_malloc.c:233
#6 0xc01f015c in allocbuf (bp=0xc3a6f7cc, size=1024) at ../../kern/vfs_bio.c:2380
#7 0xc01effa6 in getblk (vp=0xc9642f00, blkno=0, size=1024, slpflag=0, slptimeo=0) at ../../kern/vfs_bio.c:2271
#8 0xc01eded2 in bread (vp=0xc9642f00, blkno=0, size=1024, cred=0x0, bpp=0xc99f8e3c) at ../../kern/vfs_bio.c:504
#9 0xc02d0634 in ffs_read (ap=0xc99f8ea0) at ../../ufs/ufs/ufs_readwrite.c:273
#10 0xc02d734e in ufs_readdir (ap=0xc99f8ef0) at vnode_if.h:334
#11 0xc02d7cd1 in ufs_vnoperate (ap=0xc99f8ef0) at ../../ufs/ufs/ufs_vnops.c:2382
#12 0xc01fbc3b in getdirentries (p=0xc9a53ac0, uap=0xc99f8f80) at vnode_if.h:769
#13 0xc033adb5 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134567680, tf_esi = 134554336, tf_ebp = -1077937404, tf_isp = -912289836, tf_ebx = 672064612, tf_edx = 134554336, tf_ecx = 672137600, tf_eax = 196, tf_trapno = 7, tf_err = 2, tf_eip = 671767876, tf_cs = 31, tf_eflags = 582, tf_esp = -1077937448, tf_ss = 47}) at ../../i386/i386/trap.c:1155
#14 0xc032b825 in Xint0x80_syscall ()
#15 0x280a1eee in ?? ()
#16 0x280a173a in ?? ()
#17 0x804969e in ?? ()
#18 0x804b550 in ?? ()
#19 0x804935d in ?? ()
(kgdb)
The rest of this chapter is only of interest to programmers with a good understanding of C. If you're not a programmer, this is about as far as you can go. Save this information and supply it to whomever you ask for help. It's usually not enough to solve the problem, but it's a good start, and your helper will be able to tell you what to do next.
Climbing through the stack
The backtrace outputs information about stack frames, which are built when a function is called. They're numbered starting from the most recent frame, #0, which is seldom the one that interests us. In general, when we've had a panic, the most important frame is that of the function that calls panic:
#3 0xc033ab03 in trap_fatal (frame=0xc99f8ccc, eva=0) at ../../i386/i386/trap.c:956
The information here is:
- #3 is the frame number. This is a number allocated by gdb. You can use it to reference the frame in a number of commands.
- 0xc033ab03 is the return address from the call to the next function up the stack (panic in this case).
- trap_fatal is the name of the function.
- (frame=0xc99f8ccc, eva=0) are the parameter values supplied to trap_fatal.
- ../../i386/i386/trap.c:956 gives the name of the source file and the line number in the file. The path names are relative to the kernel build directory, so they usually start with ../../.
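As an illustration of these fields, the following sketch splits one backtrace line into its parts. The pattern and the function name parse_frame are assumptions made for this example, not part of gdb. It handles only lines that carry both an address and a source location, so it returns None for lines such as frame #0 or the user process frames.

```python
import re

# One backtrace line with an address and a source location, for example:
# #3 0xc033ab03 in trap_fatal (frame=0xc99f8ccc, eva=0) at ../../i386/i386/trap.c:956
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+"               # frame number allocated by gdb
    r"(?P<addr>0x[0-9a-f]+) in "      # return address
    r"(?P<func>\w+) "                 # function name
    r"\((?P<args>.*)\) at "           # parameter values
    r"(?P<file>\S+):(?P<line>\d+)$"   # source file and line number
)


def parse_frame(text):
    """Return the fields of one backtrace line, or None if it doesn't match."""
    m = FRAME_RE.match(text.strip())
    if m is None:
        return None
    return {
        "number": int(m.group("num")),
        "return_address": int(m.group("addr"), 16),
        "function": m.group("func"),
        "arguments": m.group("args"),
        "file": m.group("file"),
        "line": int(m.group("line")),
    }


if __name__ == "__main__":
    frame = parse_frame(
        "#3 0xc033ab03 in trap_fatal (frame=0xc99f8ccc, eva=0) "
        "at ../../i386/i386/trap.c:956"
    )
    print(frame["function"], frame["file"], frame["line"])
```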
In this example, the panic comes from a user process. Starting at the bottom, depending on the processor platform, you may see the user process stack. You can recognize its frames on an Intel platform by their addresses, which are below the kernel base address 0xc0000000. On other platforms, the address might be different. In general, you won't get any symbolic information for these frames, since the kernel symbol table doesn't include user symbols.
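The address comparison can be written down in a few lines. This is a minimal sketch that assumes the Intel kernel base address quoted above; the names KERNEL_BASE and is_user_address are chosen for the example.

```python
KERNEL_BASE = 0xC0000000  # kernel base address on the Intel platform, as given in the text


def is_user_address(addr, kernel_base=KERNEL_BASE):
    """True if the address lies below the kernel base, that is, in user space."""
    return addr < kernel_base


if __name__ == "__main__":
    # Return addresses of frames 13 to 17 from the backtrace above.
    for addr in (0xc033adb5, 0xc032b825, 0x280a1eee, 0x280a173a, 0x804969e):
        kind = "user" if is_user_address(addr) else "kernel"
        print(hex(addr), kind)
```

On another platform you would pass a different kernel_base.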
Climbing up the stack, you'll find the system call stack frame, in this example at frames 14 and 13. This is where the process invoked the kernel. The stack frame above (frame 12) generally shows the name of the system call, in this case getdirentries. To perform its function, getdirentries indirectly calls ffs_read, the function that reads from a UFS file. ffs_read calls bread, which reads into the buffer cache. To do so, it allocates a buffer with getblk and allocbuf, which calls malloc to allocate memory for the buffer cache. The next thing we see is a stack frame for trap: something has gone wrong inside malloc. trap determines that the trap is unrecoverable and calls trap_fatal, which in turn calls panic. The stack frames above show how the system prepares to dump and writes to disk. They're no longer of interest.
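gdb prints the fields of the trap frame in frame 4 as signed decimal numbers, which makes them hard to compare with the hexadecimal addresses elsewhere in the output. The following sketch, with a helper name chosen for this example, reinterprets such a value as an unsigned 32-bit address. It shows that tf_eip is the instruction pointer from the panic message and the address shown for malloc in frame 5, and that tf_ebp is the frame pointer from the panic message.

```python
def as_address(value, bits=32):
    """Reinterpret a signed integer as an unsigned address of the given width."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    print(hex(as_address(-1071889589)))  # tf_eip -> 0xc01c434b
    print(hex(as_address(-912290520)))   # tf_ebp -> 0xc99f8d28
```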