Debugging the Linux kernel with JTAG

Alexander Sirotkin

8/29/2010 1:31 PM EDT

As with many Linux-related topics, the issue of using debuggers to troubleshoot the Linux kernel is not only technical--it's political. Linux is being mostly developed on the x86 platform, which does not have JTAG debugging capabilities, and software-only kernel debuggers are complex and unreliable. Because of this and other reasons, Linus Torvalds objected for a long time to inclusion of the KGDB (Linux kernel debugger) patch in the main Linux tree until Ingo Molnar managed to get a rather slimmed down KGDB variant into the 2.6.26 version. Putting politics aside, I believe that most developers would agree with me, that at least in the embedded world, a kernel debugger is a must-have tool for many tasks, BSP (board support package) development being probably the most obvious example.

Fortunately, compared with the world of x86, embedded platforms introduce not only additional challenges but also a good tool to help us tackle the problem--a JTAG debugger. It's easy to use, reliable, provides some nice features that are not available in software-only debuggers and is free from any controversy that is always associated with invasive Linux kernel patches, such as the original KGDB.

I assume that the topic of JTAG debugging is not new to you--as an experienced embedded systems developer, you have probably used something like Wind River's On-Chip Debugging in the past. I will go very briefly over general JTAG debugging capabilities and show you the peculiarities of Linux kernel debugging using JTAG.

JTAG is your friend
JTAG (Joint Test Action Group) was initially developed as a way to test circuit boards after manufacture, however today it's more commonly used for debugging embedded systems. A JTAG adaptor, sometimes referred to as in-circuit emulator (ICE), is used to access on-chip debug modules inside the target CPU.

It allows, among other things, to halt the CPU, inspect its registers and memory, single step through the code, and define breakpoints. In addition to the hard, you will also need a software debugger that supports it. Some JTAG adapter vendors provide software tools while others rely on open source packages.

Even though most JTAG debuggers nowadays support Linux, if you're shopping for a low-cost device, I suggest you ask whether it supports Linux, which boils down to memory management units (MMU), Linux binary formats, and loadable modules support. If it supports remote GDB (GNU debugger) protocol as well, you can be sure that it's going to work.

Cast of characters
I will use my setup of a FemtoLinux project throughout this article to demonstrate how to debug the Linux kernel on ARM, so it's beneficial to describe the setup first. The aim of this project is to reduce the Linux system's call latency and overhead in order to make it possible to port VxWorks 5.5 monolithic applications to Linux without redesign. As you would imagine, this kind of low-level Linux kernel hacking would be nearly impossible without JTAG.

(繼續閱讀...)

horace papa 發表在痞客邦留言(0) 人氣()

個人分類：openjtag

▲top

Mar 30 Wed 2011 17:39
DebuggingTheLinuxKernelUsingGdb

eLinux.org - Embedded Linux Wiki

DebuggingTheLinuxKernelUsingGdb

From eLinux.org

Jump to: navigation, search

[hide]

1 Debugging the linux kernel using gdb

Debugging the linux kernel using gdb

The majority of day to day kernel debugging is done by adding print statements to code by using the famous printk function. This technique is well described in Kernel Debugging Tips. Using printk is a relatively simple, effective and cheap way to find problems. There are many other Linux grown techniques that take the debugging and profiling approach to a higher level. On this page we will discuss using the GNU debugger (GDB) to do kernel debugging. The GDB page describes some basic gdb command and also gives good links to documentation. Overall starting using gdb to do kernel debugging is relatively easy.

Most of the examples here will work in two (open source) situations. When using JTAG and when using QEMU system emulation. As the second option does not require any hardware you could go on and try it right away!

The open source JTAG debugging world is not that big. One project stands out in terms of debugging capabilities is OpenOCD and this is the tool used in this documentation. OpenOCD is pretty usable on the targets we tested ARM11 and ARM9.

Requirements

GDB:

You need to get yourself a GDB that is capable of understanding you target architecture. Often this come with you cross-compiler but if you have do compile it yourself you need to understand the difference between --target and --host configure options. GDB will be running on host(read x86) and will be able to understand target (read armv6). with that you might also want to have the gdbserver that can serve as stub for you user land debugging.

OpenOCD:

TODO...

A JTAG Dongle:

TODO...

The basics

To start debugging are kernel you will need to configure the kernel to have debug symbols. Once this is done you can do your normal kernel development. When needed you can "hook-up" your debugger. Start debugging a running kernel.

- start openocd

vmlinuz v.s zImage

When you want to debug the kernel you need a little understanding of how the kernel is composed. Most important is the difference between your vmlinux and the zImage. What you need to understand at this point is that the zImage is a container. This container gets loaded by a boot loader and that execution is handed over to the zImage. This zImage unpacks the kernel to the same memory location and starts executing the kernel. (explain that vmlinux does not have to be the real kernel as it is possible to debug a "stripped" kernel using a non stripped vmlinux). overall if we look at a compiled kernel we will see that vmlinux is located at the root of the kernel tree whiles the zImage is located under arch/arm/boot

vmlinux
arch/arm/boot
`-- zImage

vmlinux is what we will be using during debugging of the Linux kernel.

(繼續閱讀...)

horace papa 發表在痞客邦留言(0) 人氣()

個人分類：openjtag

▲top

Mar 23 Wed 2011 15:51
TCP connections

TCP connections

In this section and the upcoming ones, we will take a closer look at the states and how they are handled for each of the three basic protocols TCP, UDP and ICMP. Also, we will take a closer look at how connections are handled per default, if they can not be classified as either of these three protocols. We have chosen to start out with the TCP protocol since it is a stateful protocol in itself, and has a lot of interesting details with regard to the state machine in iptables.

A TCP connection is always initiated with the 3-way handshake, which establishes and negotiates the actual connection over which data will be sent. The whole session is begun with a SYN packet, then a SYN/ACK packet and finally an ACK packet to acknowledge the whole session establishment. At this point the connection is established and able to start sending data. The big problem is, how does connection tracking hook up into this? Quite simply really.

As far as the user is concerned, connection tracking works basically the same for all connection types. Have a look at the picture below to see exactly what state the stream enters during the different stages of the connection. As you can see, the connection tracking code does not really follow the flow of the TCP connection, from the users viewpoint. Once it has seen one packet(the SYN), it considers the connection as NEW. Once it sees the return packet(SYN/ACK), it considers the connection as ESTABLISHED. If you think about this a second, you will understand why. With this particular implementation, you can allow NEW and ESTABLISHED packets to leave your local network, only allow ESTABLISHED connections back, and that will work perfectly. Conversely, if the connection tracking machine were to consider the whole connection establishment as NEW, we would never really be able to stop outside connections to our local network, since we would have to allow NEW packets back in again. To make things more complicated, there is a number of other internal states that are used for TCP connections inside the kernel, but which are not available for us in User-land. Roughly, they follow the state standards specified within RFC 793 - Transmission Control Protocol at page 21-23. We will consider these in more detail further along in this section.

As you can see, it is really quite simple, seen from the user's point of view. However, looking at the whole construction from the kernel's point of view, it's a little more difficult. Let's look at an example. Consider exactly how the connection states change in the /proc/net/ip_conntrack table. The first state is reported upon receipt of the first SYN packet in a connection.

tcp      6 117 SYN_SENT src=192.168.1.5 dst=192.168.1.35 sport=1031 \
     dport=23 [UNREPLIED] src=192.168.1.35 dst=192.168.1.5 sport=23 \
     dport=1031 use=1

As you can see from the above entry, we have a precise state in which a SYN packet has been sent, (the SYN_SENT flag is set), and to which as yet no reply has been sent (witness the [UNREPLIED] flag). The next internal state will be reached when we see another packet in the other direction.

tcp      6 57 SYN_RECV src=192.168.1.5 dst=192.168.1.35 sport=1031 \
     dport=23 src=192.168.1.35 dst=192.168.1.5 sport=23 dport=1031 \
     use=1

Now we have received a corresponding SYN/ACK in return. As soon as this packet has been received, the state changes once again, this time to SYN_RECV. SYN_RECV tells us that the original SYN was delivered correctly and that the SYN/ACK return packet also got through the firewall properly. Moreover, this connection tracking entry has now seen traffic in both directions and is hence considered as having been replied to. This is not explicit, but rather assumed, as was the [UNREPLIED] flag above. The final step will be reached once we have seen the final ACK in the 3-way handshake.

tcp      6 431999 ESTABLISHED src=192.168.1.5 dst=192.168.1.35 \
     sport=1031 dport=23 src=192.168.1.35 dst=192.168.1.5 \
     sport=23 dport=1031 use=1

In the last example, we have gotten the final ACK in the 3-way handshake and the connection has entered the ESTABLISHED state, as far as the internal mechanisms of iptables are aware. After a few more packets, the connection will also become [ASSURED], as shown in the introduction section of this chapter.

When a TCP connection is closed down, it is done in the following way and takes the following states.

As you can see, the connection is never really closed until the last ACK is sent. Do note that this picture only describes how it is closed down under normal circumstances. A connection may also, for example, be closed by sending a RST(reset), if the connection were to be refused. In this case, the connection would be closed down after a predetermined time.

(繼續閱讀...)

horace papa 發表在痞客邦留言(0) 人氣()

個人分類：networking

▲top

Mar 23 Wed 2011 15:44
the conntrack entries

The conntrack entries

Let's take a brief look at a conntrack entry and how to read them in /proc/net/ip_conntrack. This gives a list of all the current entries in your conntrack database. If you have the ip_conntrack module loaded, a cat of /proc/net/ip_conntrack might look like:

tcp      6 117 SYN_SENT src=192.168.1.6 dst=192.168.1.9 sport=32775 \
     dport=22 [UNREPLIED] src=192.168.1.9 dst=192.168.1.6 sport=22 \
     dport=32775 use=2

This example contains all the information that the conntrack module maintains to know which state a specific connection is in. First of all, we have a protocol, which in this case is tcp. Next, the same value in normal decimal coding. After this, we see how long this conntrack entry has to live. This value is set to 117 seconds right now and is decremented regularly until we see more traffic. This value is then reset to the default value for the specific state that it is in at that relevant point of time. Next comes the actual state that this entry is in at the present point of time. In the above mentioned case we are looking at a packet that is in the SYN_SENT state. The internal value of a connection is slightly different from the ones used externally with iptables. The value SYN_SENT tells us that we are looking at a connection that has only seen a TCP SYN packet in one direction. Next, we see the source IP address, destination IP address, source port and destination port. At this point we see a specific keyword that tells us that we have seen no return traffic for this connection. Lastly, we see what we expect of return packets. The information details the source IP address and destination IP address (which are both inverted, since the packet is to be directed back to us). The same thing goes for the source port and destination port of the connection. These are the values that should be of any interest to us.

The connection tracking entries may take on a series of different values, all specified in the conntrack headers available in linux/include/netfilter-ipv4/ip_conntrack*.h files. These values are dependent on which sub-protocol of IP we use. TCP, UDP or ICMP protocols take specific default values as specified in linux/include/netfilter-ipv4/ip_conntrack.h. We will look closer at this when we look at each of the protocols; however, we will not use them extensively through this chapter, since they are not used outside of the conntrack internals. Also, depending on how this state changes, the default value of the time until the connection is destroyed will also change.

Recently there was a new patch made available in iptables patch-o-matic, called tcp-window-tracking. This patch adds, among other things, all of the above timeouts to special sysctl variables, which means that they can be changed on the fly, while the system is still running. Hence, this makes it unnecessary to recompile the kernel every time you want to change the timeouts.

These can be altered via using specific system calls available in the /proc/sys/net/ipv4/netfilter directory. You should in particular look at the /proc/sys/net/ipv4/netfilter/ip_ct_* variables.

When a connection has seen traffic in both directions, the conntrack entry will erase the [UNREPLIED] flag, and then reset it. The entry tells us that the connection has not seen any traffic in both directions, will be replaced by the [ASSURED] flag, to be found close to the end of the entry. The [ASSURED] flag tells us that this connection is assured and that it will not be erased if we reach the maximum possible tracked connections. Thus, connections marked as [ASSURED] will not be erased, contrary to the non assured connections (those not marked as [ASSURED]). How many connections that the connection tracking table can hold depends upon a variable that can be set through the ip-sysctl functions in recent kernels. The default value held by this entry varies heavily depending on how much memory you have. On 128 MB of RAM you will get 8192 possible entries, and at 256 MB of RAM, you will get 16376 entries. You can read and set your settings through the /proc/sys/net/ipv4/ip_conntrack_max setting

horace papa 發表在痞客邦留言(0) 人氣()

個人分類：Linux kernel

▲top

Mar 23 Wed 2011 15:20
manuplate the iptabes and example

iptables 封包過瀘規則

需求

Kernel 2.4.x

參考文件

pinfo iptables
netfilter
Linux 伺服器安全防護 (O'REILLY, ISBN: 986-7794-18-4)
Linux iptables Pocket reference (O'REILLY, ISBN: 986-7794-39-7)
Linux Firewalls (New Riders, ISBN: 0735710996)

Linux 在做封包過瀘時，是由 kernel 的 netfilter 在實際做封包的過瀘，並非由 iptables 來做，而 iptables 最主要的功能，是用來設定 netfilter 的規則，iptables 可以用來設計防火牆或封包傳送的規則，也可以顯示目前核心 (kernel) 的 netfilter 過瀘狀態。

netfilter 攔截網路封包，分別有五個地方，這五個地方分別為 PREROUTING、POSTROUTING、INPUT、OUTPUT及 FORWRD。這五個 chains 分別為網路封包旅程時會依其狀態而經過。

以下為這五個 chains 的攔截點圖：

上圖可以很清楚看到這五個 chains 的位置，而封包在經過網路介面時，會判別該封包是會往那裡去，然而也是我們在做 iptables 設置時要搞清楚的地方。

PREROUTING
封包進入網路卡介面的時候

POSTROUTING
封包即將離開網路介面的時候

FORWARD
封包在轉送的時候，如（從 A 到 B 網段）

INPUT
到達本機的封包

OUTPUT
離開本機的封包

iptables 三種過瀘規則

filter
這是預設的規則，如果都不指定類別 (table)，那麼就會使用 filter 來當做預設的規則，filter 用來過瀘封包的來源 (埠)、目的 (埠) 和其它的類別，filter 可以使用處理 INPUT, OUTPUT, FORWARD 等 chains.

(繼續閱讀...)

horace papa 發表在痞客邦留言(0) 人氣()

▲top

Mar 22 Tue 2011 22:40
轉貼netfilter hook範例

2009/1/7

LINUX内核中Netfilter Hook的使用

LINUX内核中Netfilter Hook的使用作者：JuKevin

Hook是Linux Netfilter中重要技术，使用hook可以轻松开发内核下的多种网络处理程序。下面简单介绍一下hook及其使用。

1. hook相关数据结构

struct nf_hook_ops

{

struct list_head list;

/* User fills in from here down. */

nf_hookfn *hook;

struct module *owner;

int pf;

int hooknum;

/* Hooks are ordered in ascending priority. */

(繼續閱讀...)

horace papa 發表在痞客邦留言(0) 人氣()

個人分類：Linux kernel

▲top

Mar 22 Tue 2011 13:19
linux kernel list sample written by Adrian Huang

/*
* =============================================================================
*
*       Filename: list_head_ex.h
*
*    Description: Write a doubly linked list by using list_head structure
*
*        Version: 1.0
*        Created: Fri Oct 19 14:17:58 GMT 2007
*       Revision: none
*       Compiler: gcc
*
*         Author: Adrian Huang
*       Web Site: http://adrianhuang.blogspot.com/
*
* =============================================================================
*/

#ifndef LIST_HEAD_EX_INC

(繼續閱讀...)

horace papa 發表在痞客邦留言(0) 人氣()

個人分類：Linux kernel

▲top

Nov 05 Fri 2010 00:20
no symbol version for xxx

解决方法：
1、重新编译内核，关闭CONFIG_MODVERSIONS选项
2、重新编译内核开启MODULE_FORCE_LOAD选项，强制加载
3、拷贝Module.symversion到内核源码目录，然后在内核源码目录执行make prepare，然后再在编译Module的时候加上KERN_DIR=/usr/src/linux
4、修改Module代码，通过/proc/kallsyms来获得地址，并赋给函数指针来使用

参考资料：
http://www.4front-tech.com/forum/viewtopic.php?p=10907&sid=43dc39cc92cbe35d27a4f89ec1208eb0
http://groups.google.com/group/linux.debian.bugs.dist/browse_thread/thread/1c724e019da71903
http://ubuntuforums.org/showthread.php?p=6119045

horace papa 發表在痞客邦留言(0) 人氣()

個人分類：Linux kernel

▲top

Oct 20 Wed 2010 15:20
the email to describe how to analyse linux oops

From: Denis Vlasenko [email blocked]
To:  linux-kernel
Subject: [RFC] HOWTO find oops location
Date:   Sat, 14 Aug 2004 11:53:06 +0300

Hi folks,

Is this draft HOWTO useful? Comments?

--- cut here --- --- cut here --- --- cut here --- --- cut here --- 

Okay, so you've got an oops and want to find out what happened?

In this HOWTO, I presume you did not delete and did not
tamper with your kernel build tree. Also, I recommend you
to enable these options in the .config:

CONFIG_DEBUG_SLAB=y
CONFIG_FRAME_POINTER=y

First one makes use-after-free bug hunt easy, second gives
you much more reliable stacktraces.

Ok, let's take a look at example OOPS. ^^^^ marks are mine.

Unable to handle kernel NULL pointer dereference at virtual address 00000e14
 printing eip:
c0162887
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: eeprom snd_seq_oss snd_seq_midi_event..........
CPU:    0
EIP:    0060:[<c0162887>]    Not tainted
EFLAGS: 00010206   (2.6.7-nf2)
EIP is at prune_dcache+0x147/0x1c0
          ^^^^^^^^^^^^^^^^^^^^^^^^
eax: 00000e00   ebx: d1bde050   ecx: f1b3c050   edx: f1b3ac50
esi: f1b3ac40   edi: c1973000   ebp: 00000036   esp: c1973ef8
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 65, threadinfo=c1973000 task=c1986050)
Stack: d7721178 c1973ef8 0000007a 00000000 c1973000 f7ffea48 c0162d1f 0000007a
       c0139a2b 0000007a 000000d0 00025528 049dbb00 00000000 000001fa 00000000
       c0364564 00000001 0000000a c0364440 c013add1 00000080 000000d0 00000000
Call Trace:
 [<c0162d1f>] shrink_dcache_memory+0x1f/0x30
 [<c0139a2b>] shrink_slab+0x14b/0x190
 [<c013add1>] balance_pgdat+0x1b1/0x200
 [<c013aee7>] kswapd+0xc7/0xe0
 [<c0114270>] autoremove_wake_function+0x0/0x60
 [<c0103e9e>] ret_from_fork+0x6/0x14
 [<c0114270>] autoremove_wake_function+0x0/0x60
 [<c013ae20>] kswapd+0x0/0xe0
 [<c01021d1>] kernel_thread_helper+0x5/0x14
Code: 8b 50 14 85 d2 75 27 89 34 24 e8 4a 2b 00 00 8b 73 0c 89 1c

Let's try to find out where did that exactly happened.
Grep in your kernel tree for prune_dcache. Aha, it is defined in
fs/dcache.c! Ok, execute these two commands:

# objdump -d fs/dcache.o > fs/dcache.disasm
# make fs/cache.s

Now in fs/ you should have:

dcache.c - source code
dcache.o - compiled object file
dcache.s - assembler output of C compiler ('half-compiled' code)
dcache.disasm - disasembled object file

Open dcache.disasm and find "prune_dcache":

00000540 <prune_dcache>:
     540:       55                      push   %ebp

We need to find prune_dcache+0x147. Using shell,

# printf "0x%x\n" $((0x540+0x147))
0x687

and in dcache.disasm:

     683:       85 c0                   test   %eax,%eax
     685:       74 07                   je     68e <prune_dcache+0x14e>
     687:       8b 50 14                mov    0x14(%eax),%edx    <======== OOPS
     68a:       85 d2                   test   %edx,%edx
     68c:       75 27                   jne    6b5 <prune_dcache+0x175>
     68e:       89 34 24                mov    %esi,(%esp)
     691:       e8 fc ff ff ff          call   692 <prune_dcache+0x152>
     696:       8b 73 0c                mov    0xc(%ebx),%esi
     699:       89 1c 24                mov    %ebx,(%esp)
     69c:       e8 9f f9 ff ff          call   40 <d_free>

Comparing with "Code: 8b 50 14 85 d2 75 27 " - match!

We need to find matching line in dcache.s and, eventually, in dcache.c.
It's easy to find prune_dcache in dcache.s:

prune_dcache:
        pushl   %ebp

but even though it is not too hard to find matching instruction:

        movl    8(%edi), %eax
        decl    20(%edi)
        testb   $8, %al
        jne     .L593
.L517:
        movl    68(%ebx), %eax
        testl   %eax, %eax
        je      .L532
        movl    20(%eax), %edx  <========= OOPS
        testl   %edx, %edx
        jne     .L594
.L532:
        movl    %esi, (%esp)
        call    iput
.L565:
        movl    12(%ebx), %esi
        movl    %ebx, (%esp)
        call    d_free

it is unclear to which part of .c code it belongs:

static void prune_dcache(int count)
{
        spin_lock(&dcache_lock);
        for (; count ; count--) {
                struct dentry *dentry;
                struct list_head *tmp;
                tmp = dentry_unused.prev;
                if (tmp == &dentry_unused)
                        break;
                list_del_init(tmp);
                prefetch(dentry_unused.prev);
                dentry_stat.nr_unused--;
                dentry = list_entry(tmp, struct dentry, d_lru);
                spin_lock(&dentry->d_lock);
                /*
                 * We found an inuse dentry which was not removed from
                 * dentry_unused because of laziness during lookup.  Do not free
                 * it - just keep it off the dentry_unused list.
                 */
                if (atomic_read(&dentry->d_count)) {
                        spin_unlock(&dentry->d_lock);
                        continue;
                }
                /* If the dentry was recently referenced, don't free it. */
                if (dentry->d_flags & DCACHE_REFERENCED) {
                        dentry->d_flags &= ~DCACHE_REFERENCED;
                        list_add(&dentry->d_lru, &dentry_unused);
                        dentry_stat.nr_unused++;
                        spin_unlock(&dentry->d_lock);
                        continue;
                }
                prune_one_dentry(dentry);
        }
        spin_unlock(&dcache_lock);
}

What now?! Well, I have a silly method which helps to find
C code line corresponding to that asm one. Edit your
prune_dcache in dcache.c like this:

static void prune_dcache(int count)
{
        spin_lock(&dcache_lock);
        for (; count ; count--) {
                struct dentry *dentry;
                struct list_head *tmp;
asm("#1");
                tmp = dentry_unused.prev;
asm("#2");
                if (tmp == &dentry_unused)
                        break;
asm("#3");
                list_del_init(tmp);
asm("#4");
                prefetch(dentry_unused.prev);
asm("#5");
                dentry_stat.nr_unused--;
asm("#6");
...
...
asm("#e");
                prune_one_dentry(dentry);
        }
asm("#f");
        spin_unlock(&dcache_lock);
}

and do "make fs/dcache.s" again. Look into new dcache.s.
Nasty surprize:

APP
        #e
#NO_APP
        testb   $16, %al
        jne     .L495
        orl     $16, %eax
        leal    72(%ecx), %esi
        movl    %eax, 4(%ebx)
        movl    4(%esi), %edx
        movl    72(%ecx), %eax
        testl   %eax, %eax
        movl    %eax, (%edx)
        je      .L493
        movl    %edx, 4(%eax)
.L493:
        movl    $2097664, 4(%esi)
.L495:
        leal    40(%ebx), %ecx
        movl    40(%ebx), %eax
        movl    4(%ecx), %edx
        movl    %edx, 4(%eax)
        movl    %eax, (%edx)
        movl    $2097664, 4(%ecx)
        movl    $1048832, 40(%ebx)
        decl    dentry_stat
        movl    8(%ebx), %esi
        testl   %esi, %esi
        je      .L536
        leal    56(%ebx), %eax
        movl    $0, 8(%ebx)
        movl    56(%ebx), %edx
        movl    4(%eax), %ecx
        movl    %ecx, 4(%edx)
        movl    %edx, (%ecx)
        movl    %eax, 4(%eax)
        movl    %eax, 56(%ebx)
        movl    8(%edi), %eax
        decl    20(%edi)
        testb   $8, %al
        jne     .L592
.L518:
        movl    8(%edi), %eax
        decl    20(%edi)
        testb   $8, %al
        jne     .L593
.L517:
        movl    68(%ebx), %eax
        testl   %eax, %eax
        je      .L532
        movl    20(%eax), %edx    <======== OOPS
        testl   %edx, %edx
        jne     .L594
.L532:
        movl    %esi, (%esp)
        call    iput

How come one line of C code expanded in so much asm?!
Hmm... asm("#e") was directly before prune_one_dentry(dentry),
what's that?

static inline void prune_one_dentry(struct dentry * dentry)
{
        struct dentry * parent;
        __d_drop(dentry);
        list_del(&dentry->d_child);
        dentry_stat.nr_dentry--;        /* For d_free, below */
        dentry_iput(dentry);
        parent = dentry->d_parent;
        d_free(dentry);
        if (parent != dentry)
                dput(parent);
        spin_lock(&dcache_lock);
}

Argh! An inline function. Do asm trick to it too:

static inline void prune_one_dentry(struct dentry * dentry)
{
        struct dentry * parent;
asm("#A");
        __d_drop(dentry);
asm("#B");
        list_del(&dentry->d_child);
asm("#C");
        dentry_stat.nr_dentry--;        /* For d_free, below */
asm("#D");
        dentry_iput(dentry);
asm("#E");
...
...
}

"make fs/dcache.s", rinse, repeat. You will discover that OOPS
happened after #D mark, inside dentry_iput wich is an inline too.
Will this ever end? Lickily, yes. After yet another round of asm
insertion, we arrive at:

static inline void dentry_iput(struct dentry * dentry)
{
        struct inode *inode = dentry->d_inode;
        if (inode) {
asm("#K");
                dentry->d_inode = NULL;
asm("#L");
                list_del_init(&dentry->d_alias);
asm("#M");
                spin_unlock(&dentry->d_lock);
asm("#N");
                spin_unlock(&dcache_lock);
asm("#O");
                if (dentry->d_op && dentry->d_op->d_iput)
{
asm("#P");
                        dentry->d_op->d_iput(dentry, inode);
}
                else
...

Which corresponds to this part of new dcache.s:

.L517:
#APP
        #O
#NO_APP
        movl    68(%ebx), %eax
        testl   %eax, %eax
        je      .L532
        movl    20(%eax), %edx   <=== OOPS
        testl   %edx, %edx
        jne     .L594
.L532:
#APP
        #Q
#NO_APP

This is "if (dentry->d_op && dentry->d_op->d_iput)" condition
check, and it is oopsing trying to do second check. dentry->d_op
contains bogus pointer value 0x00000e00.
--
vda



From: Muli Ben-Yehuda [email blocked]
Subject: Re: [RFC] HOWTO find oops location
Date:   Sat, 14 Aug 2004 12:11:06 +0300

On Sat, Aug 14, 2004 at 11:53:06AM +0300, Denis Vlasenko wrote:
> Hi folks,
> 
> Is this draft HOWTO useful? Comments?

Looks very nice. One small niggle: 

> EIP is at prune_dcache+0x147/0x1c0
>           ^^^^^^^^^^^^^^^^^^^^^^^^

> Let's try to find out where did that exactly happened.
> Grep in your kernel tree for prune_dcache. Aha, it is defined in
> fs/dcache.c! Ok, execute these two commands:
> 
> # objdump -d fs/dcache.o > fs/dcache.disasm
> # make fs/cache.s

you mean 'make fs/dcache.s' here, I believe. 

Cheers, 
Muli
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/



From: Zwane Mwaikambo [email blocked]
Subject: Re: [RFC] HOWTO find oops location
Date:   Sat, 14 Aug 2004 09:41:10 -0400 (EDT)

There are a few very simple methods i use all the time;

compile with CONFIG_DEBUG_INFO (it's safe to select the option and
recompile after the oops even) and then;

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
c046a188
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU:    0
EIP:    0060:[<c046a188>]    Not tainted VLI
EFLAGS: 00010246   (2.6.6-mm3)
EIP is at serial_open+0x38/0x170
[...]

(gdb) list *serial_open+0x38
0xc046a188 is in serial_open (drivers/usb/serial/usb-serial.c:465).
460
461             /* get the serial object associated with this tty pointer */
462             serial = usb_serial_get_by_index(tty->index);
463
464             /* set up our port structure making the tty driver remember our port object, and us it */
465             portNumber = tty->index - serial->minor;
466             port = serial->port[portNumber];
467             tty->driver_data = port;
468
469             port->tty = tty;

And then for cases where you deadlock and the NMI watchdog triggers with
%eip in a lock section;

NMI Watchdog detected LOCKUP on CPU0,
eip c0119e5e, registers:
Modules linked in:
CPU:    0
EIP:    0060:[<c0119e5e>]    Tainted:
EFLAGS: 00000086   (2.6.7)
EIP is at .text.lock.sched+0x89/0x12b
[...]

(gdb) disassemble 0xc0119e5e
Dump of assembler code for function Letext:
[...]
0xc0119e59 <Letext+132>:        repz nop
0xc0119e5b <Letext+134>:        cmpb   $0x0,(%edi)
0xc0119e5e <Letext+137>:        jle    0xc0119e59 <Letext+132>
0xc0119e60 <Letext+139>:        jmp    0xc0118183 <scheduler_tick+487>

(gdb) list *scheduler_tick+487
0xc0118183 is in scheduler_tick (include/asm/spinlock.h:124).
119             if (unlikely(lock->magic != SPINLOCK_MAGIC)) {
120                     printk("eip: %p\n", &&here);
121                     BUG();
122             }
123     #endif
124             __asm__ __volatile__(
125                     spin_lock_string
126                     :"=m" (lock->lock) : : "memory");
127     }

But that's not much help since it's pointing to an inline function and not
the real lock location, so just subtract a few bytes;

(gdb) list *scheduler_tick+450
0xc011815e is in scheduler_tick (kernel/sched.c:2021).
2016            cpustat->system += sys_ticks;
2017
2018            /* Task might have expired already, but not scheduled off yet */
2019            if (p->array != rq->active) {
2020                    set_tsk_need_resched(p);
2021                    goto out;
2022            }
2023            spin_lock(&rq->lock);

So we have our lock location. Then there are cases where there is a "Bad
EIP" most common ones are when a bad function pointer is followed or if
some of the kernel text or a module got unloaded/unmapped (e.g. via
__init). You can normally determine which is which by noting that bad eip
for unloaded text normally looks like a valid virtual address.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
00000000
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<00000000>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00210246
[...]
Call Trace:
 [<c01dbbfb>] smb_readdir+0x4fb/0x6e0
 [<c0165560>] filldir64+0x0/0x130
 [<c016524a>] vfs_readdir+0x8a/0x90
 [<c0165560>] filldir64+0x0/0x130
 [<c01656fd>] sys_getdents64+0x6d/0xa6
 [<c0165560>] filldir64+0x0/0x130
 [<c010adff>] syscall_call+0x7/0xb
Code: Bad EIP value.

>From there you're best off examining the call trace to find the culprit.



From: Marcelo Tosatti [email blocked]
Subject: Re: [RFC] HOWTO find oops location
Date:   Sat, 14 Aug 2004 11:06:42 -0300

> What now?! Well, I have a silly method which helps to find

> C code line corresponding to that asm one. Edit your
> prune_dcache in dcache.c like this:
> 
> static void prune_dcache(int count)
> {
>         spin_lock(&dcache_lock);
>         for (; count ; count--) {
>                 struct dentry *dentry;
>                 struct list_head *tmp;
> asm("#1");
>                 tmp = dentry_unused.prev;
> asm("#2");
>                 if (tmp == &dentry_unused)
>                         break;
> asm("#3");
>                 list_del_init(tmp);
> asm("#4");
>                 prefetch(dentry_unused.prev);
> asm("#5");
>                 dentry_stat.nr_unused--;
> asm("#6");
> ...
> ...
> asm("#e");
>                 prune_one_dentry(dentry);
>         }
> asm("#f");
>         spin_unlock(&dcache_lock);
> }

Might be also worth mentioning "gcc -c file.c -g -Wa,-a,-ad  > file.s"
which makes gcc output C code mixed with asm output. 

Sometimes its not as effective as the comment method you describe, 
but it will be less work for sure :)

The document looks great, but could go deeper into things 
like like hardware-flaky bitflips, stack junk (explain why
the stack can be "unreliable"), etc. to be even more
useful. 

Hosting it somewhere would be nice also.

horace papa 發表在痞客邦留言(0) 人氣()

個人分類：Linux kernel

▲top

Aug 20 Fri 2010 11:25
sk_buff structure

http://blog.chinaunix.net/u3/115276/showart_2284947.html

一、sk_buff的结构图如下

Linux学习之网络报文sk_buff结构 - 锕扬 - 锕扬的博客

二.sk_buff结构基本操作

1、skb_headroom(), skb_tailroom()

    原型/描述

int skb_headroom(const struct sk_buff *skb);

   bytes at buffer head

int skb_tailroom(const struct sk_buff *skb);

bytes at buffer

     示图

2、skb_reserve()

原型

void skb_reserve(struct sk_buff *skb, unsigned int len);

描述

       adjust headroom

示图

3、skb_push()

原型

unsigned char *skb_push(struct sk_buff *skb, unsigned int len);