Coming from a background in higher-level languages like Ruby, Scheme, or Haskell, learning C can be challenging. In addition to having to wrestle with C's lower-level features like manual memory management and pointers, you have to make do without a REPL. Once you get used to exploratory programming in a REPL, having to deal with the write-compile-run loop is a bit of a bummer.
It occurred to me recently that I could use gdb as a pseudo-REPL for C. I've been experimenting with using gdb as a tool for learning C, rather than merely debugging C, and it's a lot of fun.
My goal in this post is to show you that gdb is a great tool for learning C. I'll introduce you to a few of my favorite gdb commands, and then I'll demonstrate how you can use gdb to understand a notoriously tricky part of C: the difference between arrays and pointers.
An introduction to gdb
Start by creating the following little C program, minimal.c:
int main()
{
    int i = 1337;
    return 0;
}
Note that the program does nothing and has not a single printf statement. Behold the brave new world of learning C with gdb!
Compile it with the -g flag so that gdb has debug information to work with, and then feed it to gdb:
$ gcc -g minimal.c -o minimal
$ gdb minimal
You should now find yourself at a rather stark gdb prompt. I promised you a REPL, so here goes:
(gdb) print 1 + 2
$1 = 3
Amazing! print is a built-in gdb command that prints the evaluation of a C expression. If you're unsure of what a gdb command does, try running help name-of-the-command at the gdb prompt.
Here's a somewhat more interesting example:
(gdb) print (int) 2147483648
$2 = -2147483648
I'm going to ignore why 2147483648 == -2147483648; the point is that even arithmetic can be tricky in C, and gdb understands C arithmetic.
Let's now set a breakpoint in the main function and start the program:
(gdb) break main
(gdb) run
The program is now paused on line 3, just before i gets initialized. Interestingly, even though i hasn't been initialized yet, we can still look at its value using the print command:
(gdb) print i
$3 = 32767
In C, the value of an uninitialized local variable is undefined, so gdb might print something different for you!
We can execute the current line with the next command:
(gdb) next
(gdb) print i
$4 = 1337
Examining memory with x
Variables in C label contiguous chunks of memory. A variable's chunk is characterized by two numbers:
The numerical address of the first byte in the chunk.
The size of the chunk, measured in bytes. The size of a variable's chunk is determined by the variable's type.
One of the distinctive features of C is that you have direct access to a variable's chunk of memory. The & operator computes a variable's address, and the sizeof operator computes a variable's size in memory.
You can play around with both concepts in gdb:
(gdb) print &i
$5 = (int *) 0x7fff5fbff584
(gdb) print sizeof(i)
$6 = 4
In words, this says that i's chunk of memory starts at address 0x7fff5fbff584 and takes up four bytes of memory.
I mentioned above that a variable's size in memory is determined by its type, and indeed, the sizeof operator can operate directly on types:
(gdb) print sizeof(int)
$7 = 4
(gdb) print sizeof(double)
$8 = 8
This means that, on my machine at least, int variables take up four bytes of space and double variables take up eight.
Gdb comes with a powerful tool for directly examining memory: the x command. The x command examines memory, starting at a particular address. It comes with a number of formatting commands that provide precise control over how many bytes you'd like to examine and how you'd like to print them; when in doubt, try running help x at the gdb prompt.
The & operator computes a variable's address, so that means we can feed &i to x and thereby take a look at the raw bytes underlying i's value:
(gdb) x/4xb &i
0x7fff5fbff584: 0x39 0x05 0x00 0x00
The flags indicate that I want to examine 4 values, formatted as hex numerals, one byte at a time. I've chosen to examine four bytes because i's size in memory is four bytes; the printout shows i's raw byte-by-byte representation in memory.
One subtlety to bear in mind with raw byte-by-byte examinations is that on Intel machines, bytes are stored in "little-endian" order: unlike human notation, the least significant bytes of a number come first in memory.
One way to clarify the issue would be to give i a more interesting value and then re-examine its chunk of memory:
(gdb) set var i = 0x12345678
(gdb) x/4xb &i
0x7fff5fbff584: 0x78 0x56 0x34 0x12
Examining types with ptype
The ptype command might be my favorite command. It tells you the type of a C expression:
(gdb) ptype i
type = int
(gdb) ptype &i
type = int *
(gdb) ptype main
type = int (void)
Types in C can get complex, but ptype allows you to explore them interactively.
Pointers and arrays
Arrays are a surprisingly subtle concept in C. The plan for this section is to write a simple program and then poke it in gdb until arrays start to make sense.
Code up the following arrays.c program:
int main()
{
    int a[] = {1,2,3};
    return 0;
}
Compile it with the -g flag, run it in gdb, then next over the initialization line:
$ gcc -g arrays.c -o arrays
$ gdb arrays
(gdb) break main
(gdb) run
(gdb) next
At this point you should be able to print the contents of a and examine its type:
(gdb) print a
$1 = {1, 2, 3}
(gdb) ptype a
type = int [3]
Now that our program is set up correctly in gdb, the first thing we should do is use x to see what a looks like under the hood:
(gdb) x/12xb &a
0x7fff5fbff56c: 0x01 0x00 0x00 0x00 0x02 0x00 0x00 0x00
0x7fff5fbff574: 0x03 0x00 0x00 0x00
This means that a's chunk of memory starts at address 0x7fff5fbff56c. The first four bytes store a[0], the next four store a[1], and the final four store a[2]. Indeed, you can check that sizeof knows that a's size in memory is twelve bytes:
(gdb) print sizeof(a)
$2 = 12
At this point, arrays seem to be quite array-like. They have their own array-like types and store their members in a contiguous chunk of memory. However, in certain situations, arrays act a lot like pointers! For instance, we can do pointer arithmetic on a:
(gdb) print a + 1
$3 = (int *) 0x7fff5fbff570
In words, this says that a + 1 is a pointer to an int and holds the address 0x7fff5fbff570. At this point you should be reflexively passing pointers to the x command, so let's see what happens:
(gdb) x/4xb a + 1
0x7fff5fbff570: 0x02 0x00 0x00 0x00
Note that 0x7fff5fbff570 is four more than 0x7fff5fbff56c, the address of a's first byte in memory. Given that int values take up four bytes, this means that a + 1 points to a[1].
In fact, array indexing in C is syntactic sugar for pointer arithmetic: a[i] is equivalent to *(a + i). You can try this in gdb:
(gdb) print a[0]
$4 = 1
(gdb) print *(a + 0)
$5 = 1
(gdb) print a[1]
$6 = 2
(gdb) print *(a + 1)
$7 = 2
(gdb) print a[2]
$8 = 3
(gdb) print *(a + 2)
$9 = 3
We've seen that in some situations a acts like an array and in others it acts like a pointer to its first element. What's going on?
The answer is that when an array name is used in a C expression, it "decays" to a pointer to the array's first element. There are only two exceptions to this rule: when the array name is passed to sizeof and when the array name is passed to the & operator.
The fact that a doesn't decay to a pointer when passed to the & operator brings up an interesting question: is there a difference between the pointer that a decays to and &a?
Numerically, they both represent the same address:
(gdb) x/4xb a
0x7fff5fbff56c: 0x01 0x00 0x00 0x00
(gdb) x/4xb &a
0x7fff5fbff56c: 0x01 0x00 0x00 0x00
However, their types are different. We've already seen that the decayed value of a is a pointer to a's first element; this must have type int *. As for the type of &a, we can ask gdb directly:
(gdb) ptype &a
type = int (*)[3]
In words, &a is a pointer to an array of three integers. This makes sense: a doesn't decay when passed to &, and a has type int [3].
You can observe the distinction between a's decayed value and &a by checking how they behave with respect to pointer arithmetic:
(gdb) print a + 1
$10 = (int *) 0x7fff5fbff570
(gdb) print &a + 1
$11 = (int (*)[3]) 0x7fff5fbff578
Note that adding 1 to a adds four to a's address, whereas adding 1 to &a adds twelve!
The pointer that a actually decays to is &a[0]:
(gdb) print &a[0]
$12 = (int *) 0x7fff5fbff56c
Conclusion
Hopefully I've convinced you that gdb is a neat exploratory environment for learning C. You can print the evaluation of expressions, examine raw bytes in memory, and tinker with the type system using ptype.
If you'd like to experiment further with using gdb to learn C, I have a few suggestions:
- Investigate how structs are stored in memory. How do they compare to arrays?
- Use gdb's disassemble command to learn assembly programming! A particularly fun exercise is to investigate how the function call stack works.
- Check out gdb's "tui" mode, which provides a graphical ncurses layer on top of regular gdb. On OS X, you'll likely need to install gdb from source.