Friday, September 7, 2012

Learning C with gdb



Coming from a background in higher-level languages like Ruby, Scheme, or Haskell, learning C can be challenging. In addition to having to wrestle with C's lower-level features like manual memory management and pointers, you have to make do without a REPL. Once you get used to exploratory programming in a REPL, having to deal with the write-compile-run loop is a bit of a bummer.
It occurred to me recently that I could use gdb as a pseudo-REPL for C. I've been experimenting with using gdb as a tool for learning C, rather than merely debugging C, and it's a lot of fun.
My goal in this post is to show you that gdb is a great tool for learning C. I'll introduce you to a few of my favorite gdb commands, and then I'll demonstrate how you can use gdb to understand a notoriously tricky part of C: the difference between arrays and pointers.

An introduction to gdb

Start by creating the following little C program, minimal.c:
int main()
{
    int i = 1337;
    return 0;
}
Note that the program does nothing and has not a single printf statement. Behold the brave new world of learning C with gdb!
Compile it with the -g flag so that gdb has debug information to work with, and then feed it to gdb:
$ gcc -g minimal.c -o minimal
$ gdb minimal
You should now find yourself at a rather stark gdb prompt. I promised you a REPL, so here goes:
(gdb) print 1 + 2
$1 = 3
Amazing! print is a built-in gdb command that prints the evaluation of a C expression. If you're unsure of what a gdb command does, try running help name-of-the-command at the gdb prompt.
Here's a somewhat more interesting example:
(gdb) print (int) 2147483648
$2 = -2147483648
I'm going to skip over why casting 2147483648 to int yields -2147483648; the point is that even arithmetic can be tricky in C, and gdb understands C arithmetic.
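If you're curious, the wraparound can be reproduced outside gdb. What follows is only a minimal sketch and not part of the gdb session: reinterpret_as_int32 is a hypothetical helper name, and I'm assuming the cast in gdb keeps the low 32 bits, as it does on common platforms. It uses the fixed-width types from stdint.h because int32_t is guaranteed to be a 32-bit two's complement type, whereas the width of plain int depends on the platform.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the 32 bits of an unsigned value into a
   signed 32-bit integer without changing any of them. */
int32_t reinterpret_as_int32(uint32_t bits)
{
    int32_t out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```

Calling reinterpret_as_int32(2147483648u) gives INT32_MIN, that is -2147483648: the bit pattern 0x80000000 has only the sign bit set.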
Let's now set a breakpoint in the main function and start the program:
(gdb) break main
(gdb) run
The program is now paused on line 3, just before i gets initialized. Interestingly, even though i hasn't been initialized yet, we can still look at its value using the print command:
(gdb) print i
$3 = 32767
In C, the value of an uninitialized local variable is indeterminate, so gdb might print something different for you!
We can execute the current line with the next command:
(gdb) next
(gdb) print i
$4 = 1337

Examining memory with x

Variables in C label contiguous chunks of memory. A variable's chunk is characterized by two numbers:
  1. The numerical address of the first byte in the chunk.
  2. The size of the chunk, measured in bytes. The size of a variable's chunk is determined by the variable's type.
One of the distinctive features of C is that you have direct access to a variable's chunk of memory. The & operator computes a variable's address, and the sizeof operator computes a variable's size in memory.
You can play around with both concepts in gdb:
(gdb) print &i
$5 = (int *) 0x7fff5fbff584
(gdb) print sizeof(i)
$6 = 4
In words, this says that i's chunk of memory starts at address 0x7fff5fbff584 and takes up four bytes of memory.
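The same two numbers are available from inside a C program. Here's a small sketch, assuming a hypothetical helper called describe_int (the name is mine, not anything gdb or the standard library provides). It prints the address and size of whatever int it is given; the address will differ from run to run and from machine to machine.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print where an int lives and how many bytes it
   occupies, then return that size. */
size_t describe_int(const int *p)
{
    printf("address = %p, size = %zu bytes\n", (const void *)p, sizeof *p);
    return sizeof *p;
}
```

Passing &i to describe_int mirrors the print &i and print sizeof(i) commands above.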
I mentioned above that a variable's size in memory is determined by its type, and indeed, the sizeof operator can operate directly on types:
(gdb) print sizeof(int)
$7 = 4
(gdb) print sizeof(double)
$8 = 8
This means that, on my machine at least, int variables take up four bytes of space and double variables take up eight.
Gdb comes with a powerful tool for directly examining memory: the x command. The x command examines memory, starting at a particular address. It comes with a number of formatting commands that provide precise control over how many bytes you'd like to examine and how you'd like to print them; when in doubt, try running help x at the gdb prompt.
The & operator computes a variable's address, so that means we can feed &i to x and thereby take a look at the raw bytes underlying i's value:
(gdb) x/4xb &i
0x7fff5fbff584: 0x39    0x05    0x00    0x00
The flags indicate that I want to examine 4 values, formatted as hex numerals, one byte at a time. I've chosen to examine four bytes because i's size in memory is four bytes; the printout shows i's raw byte-by-byte representation in memory.
One subtlety to bear in mind with raw byte-by-byte examinations is that on Intel machines, bytes are stored in "little-endian" order: unlike human notation, the least significant bytes of a number come first in memory.
One way to clarify the issue would be to give i a more interesting value and then re-examine its chunk of memory:
(gdb) set var i = 0x12345678
(gdb) x/4xb &i
0x7fff5fbff584: 0x78    0x56    0x34    0x12
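You can get a similar byte-by-byte view from C itself by reading an object through an unsigned char pointer. This is a rough sketch rather than a replacement for x: dump_bytes and is_little_endian are hypothetical names I made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: print the bytes of any object in memory order,
   lowest address first, similar in spirit to x/Nxb. */
void dump_bytes(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    for (size_t k = 0; k < size; k++)
        printf("0x%02x ", (unsigned)p[k]);
    printf("\n");
}

/* Returns 1 when the least significant byte is stored first. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

On a little-endian machine, dumping a 32-bit value of 0x12345678 should list 0x78 first, matching the x/4xb output above.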

Examining types with ptype

The ptype command might be my favorite command. It tells you the type of a C expression:
(gdb) ptype i
type = int
(gdb) ptype &i
type = int *
(gdb) ptype main
type = int (void)
Types in C can get complex but ptype allows you to explore them interactively.
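C has no direct equivalent of ptype, but C11's _Generic selection gives a crude approximation. The macro below is a sketch under that assumption (a C11 compiler); TYPE_NAME is a hypothetical name, and it only recognizes the few types listed in it.

```c
/* Hypothetical macro, requires C11: map an expression to the name of its
   type, for the handful of types listed here. */
#define TYPE_NAME(expr) _Generic((expr), \
    int: "int",                          \
    int *: "int *",                      \
    double: "double",                    \
    default: "something else")
```

With int i declared, TYPE_NAME(i) selects "int" and TYPE_NAME(&i) selects "int *", the same answers ptype gave. Unlike ptype, the selection happens at compile time and can't describe types it wasn't told about.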

Pointers and arrays

Arrays are a surprisingly subtle concept in C. The plan for this section is to write a simple program and then poke it in gdb until arrays start to make sense.
Code up the following arrays.c program:
int main()
{
    int a[] = {1,2,3};
    return 0;
}
Compile it with the -g flag, run it in gdb, then next over the initialization line:
$ gcc -g arrays.c -o arrays
$ gdb arrays
(gdb) break main
(gdb) run
(gdb) next
At this point you should be able to print the contents of a and examine its type:
(gdb) print a
$1 = {1, 2, 3}
(gdb) ptype a
type = int [3]
Now that our program is set up correctly in gdb, the first thing we should do is use x to see what a looks like under the hood:
(gdb) x/12xb &a
0x7fff5fbff56c: 0x01  0x00  0x00  0x00  0x02  0x00  0x00  0x00
0x7fff5fbff574: 0x03  0x00  0x00  0x00
This means that a's chunk of memory starts at address 0x7fff5fbff56c. The first four bytes store a[0], the next four store a[1], and the final four store a[2]. Indeed, you can check that sizeof knows that a's size in memory is twelve bytes:
(gdb) print sizeof(a)
$2 = 12
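Because sizeof sees the whole array, dividing by the size of one element recovers the element count. Here's a sketch of that common idiom; ARRAY_LEN and bytes_for_ints are hypothetical names, and the macro only gives the right answer when applied to an array variable itself, not to a pointer.

```c
#include <stddef.h>

/* Hypothetical macro: number of elements in an array variable. */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical helper: bytes needed to store `count` ints back to back. */
size_t bytes_for_ints(size_t count)
{
    return count * sizeof(int);
}
```

For the array a in arrays.c, ARRAY_LEN(a) is 3 and sizeof(a) equals bytes_for_ints(3), which is 12 on a machine with four-byte ints.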
At this point, arrays seem to be quite array-like. They have their own array-like types and store their members in a contiguous chunk of memory. However, in certain situations, arrays act a lot like pointers! For instance, we can do pointer arithmetic on a:
(gdb) print a + 1
$3 = (int *) 0x7fff5fbff570
In words, this says that a + 1 is a pointer to an int and holds the address 0x7fff5fbff570. At this point you should be reflexively passing pointers to the x command, so let's see what happens:
(gdb) x/4xb a + 1
0x7fff5fbff570: 0x02  0x00  0x00  0x00
Note that 0x7fff5fbff570 is four more than 0x7fff5fbff56c, the address of a's first byte in memory. Given that int values take up four bytes, this means that a + 1 points to a[1].
In fact, array indexing in C is syntactic sugar for pointer arithmetic: a[i] is equivalent to *(a + i). You can try this in gdb:
(gdb) print a[0]
$4 = 1
(gdb) print *(a + 0)
$5 = 1
(gdb) print a[1]
$6 = 2
(gdb) print *(a + 1)
$7 = 2
(gdb) print a[2]
$8 = 3
(gdb) print *(a + 2)
$9 = 3
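The same equivalence holds in compiled code. As a small sketch, element_at below (a hypothetical helper name) reads an element using nothing but pointer arithmetic on a pointer to the first element.

```c
#include <stddef.h>

/* Hypothetical helper: read element i using explicit pointer arithmetic
   instead of the [] operator. */
int element_at(const int *first, size_t i)
{
    return *(first + i);
}
```

For the array in arrays.c, element_at(a, 1) and a[1] are both 2. A curious side effect of the equivalence is that 1[a] is also legal C and means the same thing, since *(1 + a) and *(a + 1) are the same expression.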
We've seen that in some situations a acts like an array and in others it acts like a pointer to its first element. What's going on?
The answer is that when an array name is used in a C expression, it "decays" to a pointer to the array's first element. There are only two exceptions to this rule: when the array name is passed to sizeof and when the array name is passed to the & operator.
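Passing an array to a function is one place where the decay becomes visible. In this sketch, size_seen_by_callee is a hypothetical helper: by the time the array arrives as an argument it has already decayed, so sizeof inside the function measures a pointer rather than the array.

```c
#include <stddef.h>

/* Hypothetical helper: the parameter is an ordinary pointer, so sizeof
   reports the size of a pointer, not of the caller's array. */
size_t size_seen_by_callee(const int *p)
{
    return sizeof p;
}
```

With int a[] = {1,2,3}, sizeof(a) in the caller is the size of three ints, while size_seen_by_callee(a) returns the size of a pointer. The exact numbers depend on the platform.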
The fact that a doesn't decay to a pointer when passed to the & operator brings up an interesting question: is there a difference between the pointer that a decays to and &a?
Numerically, they both represent the same address:
(gdb) x/4xb a
0x7fff5fbff56c: 0x01  0x00  0x00  0x00
(gdb) x/4xb &a
0x7fff5fbff56c: 0x01  0x00  0x00  0x00
However, their types are different. We've already seen that the decayed value of a is a pointer to a's first element; this must have type int *. As for the type of &a, we can ask gdb directly:
(gdb) ptype &a
type = int (*)[3]
In words, &a is a pointer to an array of three integers. This makes sense: a doesn't decay when passed to &, and a has type int [3].
You can observe the distinction between a's decayed value and &a by checking how they behave with respect to pointer arithmetic:
(gdb) print a + 1
$10 = (int *) 0x7fff5fbff570
(gdb) print &a + 1
$11 = (int (*)[3]) 0x7fff5fbff578
Note that adding 1 to a adds four to a's address, whereas adding 1 to &a adds twelve!
The pointer that a actually decays to is &a[0]:
(gdb) print &a[0]
$12 = (int *) 0x7fff5fbff56c
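The difference in step size can also be checked in code. This is a sketch with hypothetical helper names, element_step and array_step; each measures how many bytes adding 1 moves its kind of pointer.

```c
#include <stddef.h>

/* Hypothetical helper: bytes covered by adding 1 to a pointer to int. */
size_t element_step(int *p)
{
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Hypothetical helper: bytes covered by adding 1 to a pointer to an
   array of three ints. */
size_t array_step(int (*pa)[3])
{
    return (size_t)((char *)(pa + 1) - (char *)pa);
}
```

For the array in arrays.c, element_step(a) equals sizeof(int) and array_step(&a) equals sizeof(a), which is the four versus twelve seen in gdb on a machine with four-byte ints.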

Conclusion

Hopefully I've convinced you that gdb is a neat exploratory environment for learning C. You can print the evaluation of expressions, examine raw bytes in memory, and tinker with the type system using ptype.
If you'd like to experiment further with using gdb to learn C, I have a few suggestions:
  1. Use gdb to work through the Ksplice pointer challenge.
  2. Investigate how structs are stored in memory. How do they compare to arrays?
  3. Use gdb's disassemble command to learn assembly programming! A particularly fun exercise is to investigate how the function call stack works.
  4. Check out gdb's "tui" mode, which provides a graphical ncurses layer on top of regular gdb. On OS X, you'll likely need to install gdb from source.
