Compiling a program turns a human-readable source file (set x = 3) into assembly language (load 3 into register) and finally into machine language (101010101…).
Debugging means stepping through a program to find the cause of errors. But how do you step through a program that is stored as low-level machine instructions? (Some talented souls can read and decipher raw machine instructions; I cannot.)
Enter the symbol table. This table maps instructions in the compiled binary program to their corresponding variable, function, or line in the source code. This mapping could be something like:
- Program instruction => item name, item type, original file, line number defined…
[Aside: I'm not sure exactly how symbol tables are implemented. They could add tags to the source code (a SymbolID), or store the address of the instruction and map that to a variable declaration, variable use, line number, etc.]
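As a rough illustration of the second possibility in the aside (store an address and map it to a source item), here is a minimal Python sketch. The addresses, names, files, and the `Symbol` and `lookup` helpers are all invented for illustration; real symbol formats are considerably more involved.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symbol:
    address: int      # start address of the item in the binary (made up here)
    name: str         # name used in the source code
    kind: str         # item type, e.g. "function" or "variable"
    source_file: str  # original file where the item is defined
    line: int         # line number of the definition


# A hypothetical table, kept sorted by address so it can be searched quickly.
SYMBOLS = [
    Symbol(0x1000, "main", "function", "app.c", 10),
    Symbol(0x1040, "parse_args", "function", "app.c", 42),
    Symbol(0x10A0, "compute_total", "function", "math.c", 7),
]
ADDRESSES = [symbol.address for symbol in SYMBOLS]


def lookup(address: int) -> Optional[Symbol]:
    """Return the symbol with the closest start address at or below `address`."""
    index = bisect_right(ADDRESSES, address) - 1
    return SYMBOLS[index] if index >= 0 else None


if __name__ == "__main__":
    hit = lookup(0x1044)  # an address a few bytes into parse_args
    if hit is not None:
        print(f"{hit.name} ({hit.kind}) defined at {hit.source_file}:{hit.line}")
```

This toy table stores only start addresses, so any address past the last entry is attributed to the last symbol; a more careful version would also record each item's size.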
Symbol tables may be embedded into the program, or stored as a separate file.
Symbol tables may not be created by default: the compiler must be told to create a “debug” version with a symbol table (the “-g” option for the GCC compiler). A program built without the symbol table is often called a “retail” (or “release”) build, and is more difficult to reverse-engineer because it has no information that maps the binary program to the original source code.
The symbol table does not include the source code, but can give clues about it by referring to the actual variable and function names. There are no variable names in compiled binary programs: operations work on numbered registers and memory addresses.
A “debugger” is an application that reads the symbol table and lets a programmer walk through the program being debugged. It can execute and step through the program, showing the line of source code. This is great for fixing bugs — if your program crashes, reproduce the behavior and see the exact line in the source code that caused the crash. Fix the bug, and try again.
Debuggers can also:
- Set breakpoints: pause the program when it reaches a certain line of code (useful for checking error conditions)
- Set variables: change internal program variables, and see how the software responds
- Set watches and conditions: pause (aka break) the program when a certain condition is met, such as a variable reaching a certain value
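To make breakpoints and watches concrete, here is a toy sketch built on Python's `sys.settrace` hook. It is not how a native debugger such as GDB works internally; it only shows the idea of pausing on a line or on a condition. The names `run_with_debugger`, `break_offsets`, and `sample` are made up for this example, and instead of actually pausing, the sketch records the local variables it would have shown you. Setting variables is not covered.

```python
import sys


def run_with_debugger(func, break_offsets=(), watch=None):
    """Run func() and record the moments a debugger would have paused.

    break_offsets: line numbers counted from the function's `def` line
                   (a convention invented for this sketch).
    watch:         optional (variable_name, value) pair; recorded once each
                   time the variable starts to equal the value.
    """
    events = []
    code = func.__code__
    first_line = code.co_firstlineno
    watch_was_true = False

    def tracer(frame, event, arg):
        nonlocal watch_was_true
        if frame.f_code is not code:
            return None  # ignore every function except the one under test
        if event == "line":  # fired just before a source line executes
            offset = frame.f_lineno - first_line
            local_vars = dict(frame.f_locals)
            if offset in break_offsets:
                events.append(("breakpoint", offset, local_vars))
            if watch is not None:
                name, value = watch
                is_true = name in local_vars and local_vars[name] == value
                if is_true and not watch_was_true:
                    events.append(("watch", offset, local_vars))
                watch_was_true = is_true
        return tracer

    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(None)  # always remove the hook again
    return events


def sample():
    total = 0
    for i in range(4):
        total += i
    return total


if __name__ == "__main__":
    # Offset 3 is the `total += i` line inside sample().
    for kind, offset, local_vars in run_with_debugger(
        sample, break_offsets={3}, watch=("total", 3)
    ):
        print(kind, "at offset", offset, local_vars)
```

The breakpoint on the `total += i` line is recorded once per loop iteration, while the watch is recorded only at the moment `total` first equals 3.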
We can infer a few facts about symbol tables:
- A symbol table works for a particular version of the program: if the program changes, a new table must be made.
- Debug builds are often larger and slower than retail (non-debug) builds: they contain the symbol table and other ancillary information, and are typically compiled with fewer optimizations.
- If you wish to debug a binary program you did not compile yourself at the source level, you need the matching symbol tables from whoever built it.
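The first point above (a table only fits the build it was made for) is usually enforced by stamping the program and its symbol file with a shared identifier, which the debugger compares before trusting the table. Here is a minimal sketch of that check. It assumes the identifier is simply a hash of the program's bytes; `compute_build_id`, `SymbolFile`, and `symbols_match` are hypothetical names, and real toolchains generate and store their identifiers differently.

```python
import hashlib
from dataclasses import dataclass, field


def compute_build_id(binary: bytes) -> str:
    """Derive a short identifier from the program's bytes (an assumption of this sketch)."""
    return hashlib.sha256(binary).hexdigest()[:16]


@dataclass
class SymbolFile:
    build_id: str                                # identifier of the build this table describes
    symbols: dict = field(default_factory=dict)  # address -> name, made-up content


def symbols_match(binary: bytes, symbol_file: SymbolFile) -> bool:
    """Return True only if the symbol file was made for exactly this binary."""
    return compute_build_id(binary) == symbol_file.build_id


if __name__ == "__main__":
    version_1 = bytes([0x55, 0x48, 0x89, 0xE5])  # stand-in bytes for a compiled program
    version_2 = version_1 + bytes([0x90])        # the "same" program after a change

    table = SymbolFile(compute_build_id(version_1), {0x0: "main"})
    print(symbols_match(version_1, table))
    print(symbols_match(version_2, table))
```

Even a one-byte change in the program produces a different identifier, so the old table is rejected and a new one has to be generated.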