Copyright © 2002 Paul Sheer.
Subsections
- 22.1 C Fundamentals
- 22.1.1 The simplest C program
- 22.1.2 Variables and types
- 22.1.3 Functions
- 22.1.4 for, while, if, and switch statements
- 22.1.5 Strings, arrays, and memory allocation
- 22.1.6 String operations
- 22.1.7 File operations
- 22.1.8 Reading command-line arguments inside C programs
- 22.1.9 A more complicated example
- 22.1.10 #include statements and prototypes
- 22.1.11 C comments
- 22.1.12 #define and #if -- C macros
- 22.2 Debugging with gdb and strace
- 22.3 C Libraries
- 22.4 C Projects -- Makefiles
22. Trivial Introduction to C
C was invented for the purpose of writing an operating system that could be recompiled (ported) to different hardware platforms (different CPUs). Because the operating system is written in C, this language is the first choice for writing any kind of application that has to communicate efficiently with the operating system.
Many people who don't program very well in C think of C as an arbitrary language out of many. This point should be made at once: C is the fundamental basis of all computing in the world today. UNIX, Microsoft Windows, office suites, web browsers and device drivers are all written in C. Ninety-nine percent of your time spent at a computer is probably spent using an application written in C. About 70% of all ``open source'' software is written in C, and the remaining 30% written in languages whose compilers or interpreters are written in C. [C++ is also quite popular. It is, however, not as fundamental to computing, although it is more suitable in many situations.]
Further, there is no replacement for C. Since it fulfills its purpose almost flawlessly, there will never be a need to replace it. Other languages may fulfill other purposes, but C fulfills its purpose most adequately. For instance, all future operating systems will probably be written in C for a long time to come.
It is for these reasons that your knowledge of UNIX will never be complete until you can program in C. On the other hand, just because you can program in C does not mean that you should. Good C programming is a fine art which many veteran C programmers never manage to master, even after many years. It is essential to join a Free software project to properly master an effective style of C development.
22.1 C Fundamentals
We start with a simple C program
and then add fundamental elements to it.
Before going too far, you may wish to review
bash
functions in Section 7.7.
22.1.1 The simplest C program
A simple C program is:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        printf ("Hello World!\n");
        return 3;
    }
Save this program in a file
hello.c. We will now
compile the program. [Compiling is the process of turning C
code into assembler instructions. Assembler instructions are
the program code that your 80?86/SPARC/RS6000 CPU
understands directly. The resulting binary executable is fast because
it is executed natively by
your processor--it is the very chip that
you see on your motherboard that does fetch
Hello byte for byte
from memory and executes each instruction. This is what is meant by
million instructions per second (MIPS). The
megahertz of the machine quoted by hardware vendors is
very roughly the number of MIPS. Interpreted languages (like
shell scripts) are much slower because the code itself is written in
something not understandable to the CPU. The
/bin/bash
program has to interpret the shell program.
/bin/bash
itself is written in C, but the overhead of interpretation makes
scripting languages many orders of magnitude slower than compiled
languages. Shell scripts do not need to be compiled.] Run the command
|
gcc -Wall -o hello hello.c |
The
-o hello option tells
gcc [GNU C Compiler.
cc on other UNIX
systems.] to produce the binary file
hello instead of the
default binary file named
a.out. [Called
a.out for historical reasons.] The
-Wall option means to report
all
Warnings during the compilation. This is not
strictly necessary but is most helpful
for correcting possible errors in your programs. More compiler options
are discussed elsewhere in this book.
Then, run the program with
|
./hello |
Previously you should have familiarized yourself with
bash functions. In
C all code is inside a function.
The first function to be called (by the operating system) is the
main function.
Type
echo $? to see the return
code of the program. You will
see it is
3, the return value of the
main
function.
Other things to note are the
" on either side of the string
to be printed. Quotes are required around string literals. Inside
a string literal, the
\n escape sequence
indicates a newline character.
ascii(7) shows some
other escape sequences. You can also see a proliferation of
; everywhere in a C program. Every statement in C is
terminated by a
; unlike statements in shell scripts where
a
; is optional.
Now try:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        printf ("number %d, number %d\n", 1 + 2, 10);
        exit (3);
    }
printf can be thought of as the command to send output
to the terminal. It is also what is known as a standard C
library function. In other words, it is specified that a C
implementation should always have the
printf function
and that it should behave in a certain way.
The
%d specifies that a
decimal should
go in at that point in the text. The number to be substituted
will be the first argument to the
printf function
after the string literal--that is, the
1 + 2. The
next
%d is substituted with the second argument--that is,
the
10. The
%d is known as a format
specifier. It essentially converts an integer number into
a decimal representation. See
printf(3) for more details.
22.1.2 Variables and types
With
bash, you could use a variable anywhere, anytime, and the
variable would just be blank if it had never been assigned a value. In C,
however, you have to explicitly tell the compiler what variables you are
going to need before each block of code. You do this with a variable declaration:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        int y;
        x = 10;
        y = 2;
        printf ("number %d, number %d\n", 1 + y, x);
        exit (3);
    }
The
int x is a variable declaration. It tells the
program to reserve space for one
integer
variable
that it will later refer to as
x.
int is the
type of the variable.
x = 10 assigns a value of 10
to the variable. There are types for each kind of number
you would like to work with, and format specifiers to convert them
for printing:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        char a;
        short b;
        int c;
        long d;
        float e;
        double f;
        long double g;
        a = 'A';
        b = 10;
        c = 10000000;
        d = 10000000;
        e = 3.14159;
        f = 10e300;
        g = 10e300;
        printf ("%c, %hd, %d, %ld, %f, %f, %Lf\n", a, b, c, d, e, f, g);
        exit (3);
    }
You will notice that
%f is used for both
floats and
doubles. The reason is that a
float is always converted
to a
double before an operation like this. Also try replacing
%f with
%e to print in
exponential notation--that is,
with fewer significant digits.
22.1.3 Functions
Functions are implemented as follows:
    #include <stdlib.h>
    #include <stdio.h>

    void mutiply_and_print (int x, int y)
    {
        printf ("%d * %d = %d\n", x, y, x * y);
    }

    int main (int argc, char *argv[])
    {
        mutiply_and_print (30, 5);
        mutiply_and_print (12, 3);
        exit (3);
    }
Here we have a non-main function called by the
main function. The function is first declared
with
|
void mutiply_and_print (int x, int y) |
This declaration states the return value of the function
(
void for no return value), the function name
(
mutiply_and_print), and then the
arguments that
are going to be passed to the function. The numbers passed to
the function are given their own names,
x and
y,
and are converted to the type of
x and
y before
being passed to the function--in this case,
int and
int. The actual C code that comprises the function
goes between curly braces
{ and
}.
In other words, the above function is equivalent to:
    void mutiply_and_print ()
    {
        int x;
        int y;
        x = <first-number-passed>;
        y = <second-number-passed>;
        printf ("%d * %d = %d\n", x, y, x * y);
    }
22.1.4 for, while, if, and switch statements
As with shell scripting, we have
the
for,
while, and
if statements:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;

        x = 10;

        if (x == 10) {
            printf ("x is exactly 10\n");
            x++;
        } else if (x == 20) {
            printf ("x is equal to 20\n");
        } else {
            printf ("No, x is not equal to 10 or 20\n");
        }

        if (x > 10) {
            printf ("Yes, x is more than 10\n");
        }

        while (x > 0) {
            printf ("x is %d\n", x);
            x = x - 1;
        }

        for (x = 0; x < 10; x++) {
            printf ("x is %d\n", x);
        }

        switch (x) {
        case 9:
            printf ("x is nine\n");
            break;
        case 10:
            printf ("x is ten\n");
            break;
        case 11:
            printf ("x is eleven\n");
            break;
        default:
            printf ("x is huh?\n");
            break;
        }
        return 0;
    }
It is easy to see the format that these statements take, although they
are vastly different from shell scripts. C code works in
statement blocks between
curly braces, in the same way that shell scripts have
do's and
done's.
Note that with most programming languages when we want to
add
1 to a variable we have to write, say,
x = x + 1.
In C, the abbreviation
x++ is used, meaning to
increment a variable by
1.
The
for loop takes three statements between
(
...
): a statement to start things off, a comparison, and
a statement to be executed on each completion of the statement block.
The statement block after the
for is repeatedly executed until
the comparison is untrue.
The
switch statement is like
case in shell scripts.
switch considers the argument inside its
( ...
) and decides which
case line to jump to. In this
example it will obviously be
printf ("x is ten\n"); because
x was 10 when the previous
for loop exited.
The
break tokens mean that we are through with the
switch
statement and that execution should continue from Line 46, the return statement that follows the closing brace of the switch.
Note that in C the comparison
== is used instead of
=.
The symbol
= means to assign a value to a variable, whereas
==
is an equality operator.
22.1.5 Strings, arrays, and memory allocation
You can define a list of numbers with:
|
int y[10]; |
This list is called an array:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        int y[10];
        for (x = 0; x < 10; x++) {
            y[x] = x * 2;
        }
        for (x = 0; x < 10; x++) {
            printf ("item %d is %d\n", x, y[x]);
        }
        return 0;
    }
If an array is of type
character,
then it is called a string:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        char y[11];
        for (x = 0; x < 10; x++) {
            y[x] = 65 + x * 2;
        }
        for (x = 0; x < 10; x++) {
            printf ("item %d is %d\n", x, y[x]);
        }
        y[10] = 0;
        printf ("string is %s\n", y);
        return 0;
    }
Note that a string has to be null-terminated. This means
that the last character must be a zero. The code
y[10] = 0
sets the 11th item in the array to zero. This also means that
strings need to be one
char longer than you would think.
(Note that the first item in the array is
y[0], not
y[1],
as with some other programming languages.)
In the preceding example, the line
char y[11] reserved 11
bytes for the string. But what if you want a string of 100,000
bytes? C allows you to request memory
from the kernel.
This is called allocating memory. Any non-trivial
program will allocate memory for itself and there is no other
way of getting large blocks of memory for your program
to use. Try:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        char *y;
        y = malloc (11);
        printf ("%p\n", (void *) y);
        for (x = 0; x < 10; x++) {
            y[x] = 65 + x * 2;
        }
        y[10] = 0;
        printf ("string is %s\n", y);
        free (y);
        return 0;
    }
The declaration
char *y means to
declare a variable (a number) called
y that points
to a memory location. The
*
(asterisk) in this context
means pointer. For example, if you have a machine with perhaps 256
megabytes of RAM + swap, then
y potentially has a range of
this much. The numerical value of
y is also printed
with
printf ("%p\n", (void *) y); (%p is the format specifier for a pointer), but is of no
interest to the programmer.
When you have finished using memory you must give it back to the operating
system by using
free. Programs that don't
free
all the memory they allocate are said to
leak memory.
Allocating memory often requires you to perform a calculation to
determine the amount of memory required. In the above case we
are allocating the space of 11
chars. Since each
char is really a single byte, this presents no problem. But
what if we were allocating 11
ints? An
int on a
PC is 32 bits--four bytes. To determine the size of a type,
we use the
sizeof keyword:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int a;
        int b;
        int c;
        int d;
        int e;
        int f;
        int g;
        a = sizeof (char);
        b = sizeof (short);
        c = sizeof (int);
        d = sizeof (long);
        e = sizeof (float);
        f = sizeof (double);
        g = sizeof (long double);
        printf ("%d, %d, %d, %d, %d, %d, %d\n", a, b, c, d, e, f, g);
        return 0;
    }
Here you can see the number of bytes required by all of these
types.
Now we can easily allocate arrays of things other than
char.
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        int *y;
        y = malloc (10 * sizeof (int));
        printf ("%p\n", (void *) y);
        for (x = 0; x < 10; x++) {
            y[x] = 65 + x * 2;
        }
        for (x = 0; x < 10; x++) {
            printf ("%d\n", y[x]);
        }
        free (y);
        return 0;
    }
On many machines an
int is four bytes (32 bits), but you
should never assume this. Always use the
sizeof
keyword to allocate memory.
22.1.6 String operations
C programs probably do more string manipulation than anything else. Here is a program that divides a sentence into words:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main (int argc, char *argv[])
    {
        int length_of_word;
        int i;
        int length_of_sentence;
        char p[256];
        char *q;

        strcpy (p, "hello there, my name is fred.");

        length_of_sentence = strlen (p);

        length_of_word = 0;

        for (i = 0; i <= length_of_sentence; i++) {
            if (p[i] == ' ' || i == length_of_sentence) {
                q = malloc (length_of_word + 1);
                if (q == 0) {
                    perror ("malloc failed");
                    abort ();
                }
                strncpy (q, p + i - length_of_word, length_of_word);
                q[length_of_word] = 0;
                printf ("word: %s\n", q);
                free (q);
                length_of_word = 0;
            } else {
                length_of_word = length_of_word + 1;
            }
        }
        return 0;
    }
Here we introduce three more
standard C library functions.
strcpy stands for
string copy. It copies
bytes from one place to another sequentially, until it reaches a zero
byte (i.e., the end of string). Line 13 of this program
copies text into the
character array
p, which
is called the target of the copy.
strlen stands for
string
length.
It determines the length of a string, which is just a count of
the number of
characters up to the null character.
We need to loop over the length of the sentence. The variable
i indicates the current position in the sentence.
Line 20 says that if
we find a character 32 (denoted by
' '), we know we have
reached a word boundary. We also know that the end of the
sentence is a word boundary even though there may not be a
space there. The token
||
means OR. At
this point we can allocate memory for the
current word and copy the word into that memory. The
strncpy function is useful for this. It copies
a string, but only up to a limit of
length_of_word
characters (the last argument). Like
strcpy, the first
argument is the target, and the second argument is the
place to copy from.
To calculate the position of the start of the last word, we use
p + i - length_of_word. This means that we are adding
i to the memory location
p and then going back
length_of_word counts thereby pointing
strncpy to the exact position.
Finally, we null-terminate the string on Line 27. We can then print
q,
free the used memory,
and begin with the next word.
For a complete list of string operations, see
string(3).
22.1.7 File operations
Under most programming languages, file operations involve
three steps: opening a file, reading or
writing to the file, and then closing the
file. You use the command
fopen to tell the operating
system that you are ready to begin working with a file.
The following program opens a file and spits it out on the terminal:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main (int argc, char *argv[])
    {
        int c;
        FILE *f;

        f = fopen ("mytest.c", "r");
        if (f == 0) {
            perror ("fopen");
            return 1;
        }
        for (;;) {
            c = fgetc (f);
            if (c == -1)
                break;
            printf ("%c", c);
        }
        fclose (f);
        return 0;
    }
A new type is presented here:
FILE *. It is a file
operations variable that must be initialized with
fopen before it can be used. The
fopen
function takes two arguments: the first is the name of the file, and
the second is a string explaining how we want to open the
file--in this case
"r" means
reading
from the start of the file. Other options are
"w" for
writing and several more described in
fopen(3).
If the return value of
fopen is zero, it means that
fopen has failed. The
perror function then prints a
textual error message (for example,
No such file or directory).
It is essential to check the return value of all
library calls in this way. These checks will
constitute about one third of your C program.
The command
fgetc gets a character from the file.
It retrieves consecutive bytes from the file until it reaches the
end of the file, when it returns a
-1 (the value of the EOF macro from stdio.h on LINUX systems). The
break
statement says to immediately terminate the
for loop,
whereupon execution will continue from line 21, the fclose call.
break
statements can appear inside
while loops as well.
You will notice that the
for statement is empty. This is
allowable C code and means to loop forever.
Some other file functions are
fread,
fwrite,
fputc,
fprintf, and
fseek. See
fwrite(3),
fputc(3),
fprintf(3), and
fseek(3).
22.1.8 Reading command-line arguments inside C programs
By now, you are probably wondering what the
(int argc, char *argv[]) are for. These are
the command-line arguments passed to the program by
the shell.
argc is the total number of command-line
arguments, and
argv is an array of strings of
each argument. Printing them out is easy:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main (int argc, char *argv[])
    {
        int i;
        for (i = 0; i < argc; i++) {
            printf ("argument %d is %s\n", i, argv[i]);
        }
        return 0;
    }
22.1.9 A more complicated example
Here we put this all together in a program that reads in lots of files and
dumps them as words. Here are some new notations you will encounter:
!= is the inverse of
== and tests if
not-equal-to;
realloc reallocates
memory--it resizes an old block of memory so that any
bytes of the old block are preserved;
\n,
\t mean the newline character, 10, or the
tab character, 9, respectively (see
ascii(7)).
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    void word_dump (char *filename)
    {
        int length_of_word;
        int amount_allocated;
        char *q;
        FILE *f;
        int c;

        c = 0;

        f = fopen (filename, "r");
        if (f == 0) {
            perror ("fopen failed");
            exit (1);
        }

        length_of_word = 0;

        amount_allocated = 256;
        q = malloc (amount_allocated);
        if (q == 0) {
            perror ("malloc failed");
            abort ();
        }

        while (c != -1) {
            if (length_of_word >= amount_allocated) {
                amount_allocated = amount_allocated * 2;
                q = realloc (q, amount_allocated);
                if (q == 0) {
                    perror ("realloc failed");
                    abort ();
                }
            }

            c = fgetc (f);
            q[length_of_word] = c;

            if (c == -1 || c == ' ' || c == '\n' || c == '\t') {
                if (length_of_word > 0) {
                    q[length_of_word] = 0;
                    printf ("%s\n", q);
                }
                amount_allocated = 256;
                q = realloc (q, amount_allocated);
                if (q == 0) {
                    perror ("realloc failed");
                    abort ();
                }
                length_of_word = 0;
            } else {
                length_of_word = length_of_word + 1;
            }
        }
        free (q);
        fclose (f);
    }

    int main (int argc, char *argv[])
    {
        int i;

        if (argc < 2) {
            printf ("Usage:\n\twordsplit <filename> ...\n");
            exit (1);
        }

        for (i = 1; i < argc; i++) {
            word_dump (argv[i]);
        }

        return 0;
    }
This program is more complicated than you might immediately expect. Reading in a file where we are sure that a word will never exceed 30 characters is simple. But what if we have a file that contains some words that are 100,000 characters long? GNU programs are expected to behave correctly under these circumstances.
To cope with normal as well as extreme circumstances, we
start off assuming that a word will never be more than 256
characters. If it appears that the word is growing over 256
characters, we
reallocate the memory space to
double its size (lines 32 and 33). When we start with a new word, we can free up
memory again, so we
realloc back to 256 again (lines 48 and 49). In this
way we are using the minimum amount of memory at each point in time.
We have hence created a program that can work efficiently with a 100-gigabyte file just as easily as with a 100-byte file. This is part of the art of C programming.
Experienced C programmers may actually scoff at the above listing because it really isn't as ``minimalistic'' as is absolutely possible. In fact, it is a truly excellent listing for the following reasons:
- The program is easy to understand.
- The program uses an efficient algorithm (albeit not optimal).
- The program contains no arbitrary limits that would cause unexpected behavior in extreme circumstances.
- The program uses no nonstandard C functions or notations that would prohibit it compiling successfully on other systems. It is therefore portable.
22.1.10 #include statements and prototypes
At the start of each program will be one or more
#include
statements. These tell the compiler to read in another C program.
Now, ``raw'' C does not have a whole lot in the way of protecting
against errors: for example, the
strcpy function could
just as well be used with one, three, or four arguments, and the
C program would still compile. It would, however, wreak havoc
with the internal memory and cause the program to crash. These
other
.h C programs are called header files.
They contain templates for how functions are meant to be called.
Every function you might like to use is contained in one or another
template file. The templates are called function
prototypes. [C++ has something called ``templates.'' This is
a special C++ term having nothing to do with the discussion here.]
A function prototype is written the same as the function itself,
but without the code. A function
prototype for
word_dump
would simply be:
|
void word_dump (char *filename); |
The trailing
; is essential and distinguishes a function
prototype from a function.
After a function prototype is defined, any attempt to use the function in a
way other than intended--say, passing it too few arguments or
arguments of the wrong type--will be met with fierce
opposition from
gcc.
You will notice that the
#include <string.h> appeared
when we started using
string operations. Recompiling
these programs without the
#include <string.h> line
gives the warning message
|
mytest.c:21: warning: implicit declaration of function `strncpy' |
which is quite to the point.
The function prototypes give a clear definition of how every function is to be used. Man pages will always first state the function prototype so that you are clear on what arguments are to be passed and what types they should have.
22.1.11 C comments
A C comment is denoted with
/* <comment lines> */ and can
span multiple lines.
Anything between the
/* and
*/ is ignored. Every function
should be commented, and all nonobvious code should be
commented. It is a good maxim that a program that needs
lots of comments to explain it is badly written. Also, never
comment the obvious, and explain why you do things rather
than what you are doing. It is advisable not to
make pretty graphics between each function, so rather:
    /* returns -1 on error, takes a positive integer */
    int sqr (int x)
    {
        <...>
than
    /***************************----SQR----******************************
     *   x = argument to make the square of                             *
     *   return value =                                                 *
     *                 -1 (on error)                                    *
     *                 square of x (on success)                         *
     ********************************************************************/
    int sqr (int x)
    {
        <...>
which is liable to cause nausea. In C++, the additional comment
// is allowed, whereby everything between the
// and the end of the line is ignored. It is accepted under
gcc,
but should not be used unless you really are programming in C++. In addition,
programmers often ``comment out'' lines by placing a
#if 0 ...
#endif around them, which really does exactly the same
thing as a comment (see Section 22.1.12) but allows
you to have comments within comments. For example
        int x;
        x = 10;
    #if 0
        printf ("debug: x is %d\n", x);  /* print debug information */
    #endif
        y = x + 10;
        <...>
comments out Line 4.
22.1.12 #define and #if -- C macros
Anything starting with a
# is not actually C, but a
C preprocessor directive. A C program is first
run through a preprocessor that removes all spurious
junk, like comments,
#include statements, and anything
else beginning with a
#. You can make C programs
much more readable by defining macros instead
of literal values. For instance,
|
#define START_BUFFER_SIZE 256 |
in our example program,
#defines the text
START_BUFFER_SIZE to be the text
256.
Thereafter, wherever in the C program we have a
START_BUFFER_SIZE, the text
256 will be seen by
the compiler, and we can use
START_BUFFER_SIZE
instead. This is a much cleaner way of programming
because, if, say, we would like to change the
256 to some
other value, we only need to change it in one place.
START_BUFFER_SIZE is also more meaningful than a
number, making the program more readable.
Whenever you have a literal constant like
256,
you should replace it with a macro defined near the top of
your program.
You can also check for the existence of macros with the
#ifdef and
#ifndef directives.
# directives are
really a programming language all on their own:
    /* Set START_BUFFER_SIZE to fine-tune performance before compiling: */
    #define START_BUFFER_SIZE 256
    /* #define START_BUFFER_SIZE 128 */
    /* #define START_BUFFER_SIZE 1024 */
    /* #define START_BUFFER_SIZE 16384 */

    #ifndef START_BUFFER_SIZE
    #error This code did not define START_BUFFER_SIZE. Please edit
    #endif

    #if START_BUFFER_SIZE <= 0
    #error Wooow! START_BUFFER_SIZE must be greater than zero
    #endif

    #if START_BUFFER_SIZE < 16
    #warning START_BUFFER_SIZE too small, program may be inefficient
    #elif START_BUFFER_SIZE > 65536
    #warning START_BUFFER_SIZE too large, program may be inefficient
    #else
    /* START_BUFFER_SIZE is ok, do not report */
    #endif

    void word_dump (char *filename)
    {
        <...>
        amount_allocated = START_BUFFER_SIZE;
        q = malloc (amount_allocated);
        <...>
22.2 Debugging with gdb and strace
Programming errors, or bugs, can be found by inspecting program execution. Some developers claim that the need for such inspection implies a sloppy development process. Nonetheless it is instructive to learn C by actually watching a program work.
22.2.1 gdb
The GNU debugger,
gdb,
is a replacement for the standard
UNIX debugger,
db. To debug a program means to step through its execution
line-by-line, in order to find programming errors as they happen. Use the
command
gcc -Wall -g -O0 -o wordsplit wordsplit.c to recompile your
program above.
The
-g option enables debugging support in the resulting executable
and the
-O0 option disables compiler optimization
(which sometimes causes confusing behavior).
For the following example, create a test file
readme.txt with some plain text
inside it. You can then run
gdb -q wordsplit. The
standard
gdb prompt will appear,
which indicates the start of a debugging session:
|
(gdb) |
At the prompt, many one-letter commands are available to
control program execution. The first of these is
run
which executes the program as though it had been started
from a regular shell:
    (gdb) r
    Starting program: /homes/src/wordsplit/wordsplit
    Usage:
            wordsplit <filename> ...

    Program exited with code 01.
Obviously, we will want to set some trial command-line arguments.
This is done with the special command,
set args:
    (gdb) set args readme.txt readme2.txt
The
break command
is used like
b [[<file>:]<line>|<function>],
and sets a break point at a function or line number:
    (gdb) b main
    Breakpoint 1 at 0x8048796: file wordsplit.c, line 67.
A break point will interrupt execution of the program. In this case
the program will stop when it enters the
main function (i.e., right at the start).
Now we can
run the program again:
    (gdb) r
    Starting program: /home/src/wordsplit/wordsplit readme.txt readme2.txt

    Breakpoint 1, main (argc=3, argv=0xbffff804) at wordsplit.c:67
    67          if (argc < 2) {
    (gdb)
As specified, the program stops at the beginning of the
main
function at line 67.
If you are interested in viewing the contents of
a variable, you can use the
print command:
    (gdb) p argc
    $1 = 3
    (gdb) p argv[1]
    $2 = 0xbffff988 "readme.txt"
which tells us the value of
argc and
argv[1].
The
list command
displays the lines about the current line:
    (gdb) l
    63      int main (int argc, char *argv[])
    64      {
    65          int i;
    66
    67          if (argc < 2) {
    68              printf ("Usage:\n\twordsplit <filename> ...\n");
    69              exit (1);
    70          }
The
list command can also take an optional file and line number
(or even a function name):
    (gdb) l wordsplit.c:1
    1       #include <stdlib.h>
    2       #include <stdio.h>
    3       #include <string.h>
    4
    5       void word_dump (char *filename)
    6       {
    7           int length_of_word;
    8           int amount_allocated;
Next, we can try setting a break point at an arbitrary line and then
using the
continue
command to proceed with program execution:
    (gdb) b wordsplit.c:48
    Breakpoint 2 at 0x804873e: file wordsplit.c, line 48.
    (gdb) c
    Continuing.
    Zaphod

    Breakpoint 2, word_dump (filename=0xbffff988 "readme.txt") at wordsplit.c:48
    48                  amount_allocated = 256;
Execution obediently stops at line 48. At this point it is useful to
run a backtrace (the bt command).
This prints out the current
stack which shows the functions that were called to get to the
current line. This output allows you to trace the history of
execution.
    (gdb) bt
    #0  word_dump (filename=0xbffff988 "readme.txt") at wordsplit.c:48
    #1  0x80487e0 in main (argc=3, argv=0xbffff814) at wordsplit.c:73
    #2  0x4003db65 in __libc_start_main (main=0x8048790 <main>, argc=3, ubp_av=0xbffff814, init=0x8048420 <_init>,
        fini=0x804883c <_fini>, rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffff80c) at ../sysdeps/generic/libc-start.c:111
The
clear command then deletes the break point at the
current line:
    (gdb) clear
    Deleted breakpoint 2
The most important commands for debugging are the
next
and
step commands. The
n command simply executes one line of
C code:
    (gdb) n
    49                  q = realloc (q, amount_allocated);
    (gdb) n
    50                  if (q == 0) {
    (gdb) n
    54                  length_of_word = 0;
This activity is called stepping through your program. The
s
command is identical to
n except that it dives into functions instead
of running them as a single line. To see the difference, step over line 73
first with
n, and then with
s, as follows:
    (gdb) set args readme.txt readme2.txt
    (gdb) b main
    Breakpoint 1 at 0x8048796: file wordsplit.c, line 67.
    (gdb) r
    Starting program: /home/src/wordsplit/wordsplit readme.txt readme2.txt

    Breakpoint 1, main (argc=3, argv=0xbffff814) at wordsplit.c:67
    67          if (argc < 2) {
    (gdb) n
    72          for (i = 1; i < argc; i++) {
    (gdb) n
    73              word_dump (argv[i]);
    (gdb) n
    Zaphod
    has
    two
    heads
    72          for (i = 1; i < argc; i++) {
    (gdb) n
    73              word_dump (argv[i]);
    (gdb) s
    word_dump (filename=0xbffff993 "readme2.txt") at wordsplit.c:13
    13          c = 0;
    (gdb) s
    15          f = fopen (filename, "r");
    (gdb)
An interesting feature of
gdb is its ability to
attach onto running programs. Try the following sequence of
commands:
    [root@cericon]#
    [root@cericon]# ps awx | grep lpd
    28157 ?        S      0:00 lpd Waiting
    28160 pts/6    S      0:00 grep lpd
    [root@cericon]# gdb -q /usr/sbin/lpd
    (no debugging symbols found)...
    (gdb) attach 28157
    Attaching to program: /usr/sbin/lpd, Pid 28157
    0x40178bfe in __select () from /lib/libc.so.6
    (gdb)
The
lpd daemon was not compiled with debugging support,
but the point is still made: you can halt and debug any running
process on the system. Try running a
bt for fun. Now release
the process with
    (gdb) detach
    Detaching from program: /usr/sbin/lpd, Pid 28157
The debugger provides copious amounts of online help. The help command can be run to explain further. The gdb info pages also elaborate on an enormous number of display features and tracing features not covered here.
22.2.2 Examining core files
If your program has a segmentation violation (``segfault''), then a core file will be written to the current directory. This is known as a core dump. A core dump is caused by a bug in the program: it is the program's response to a SIGSEGV signal, sent because the program tried to access an area of memory outside of its allowed range. These files can be examined using gdb to (usually) reveal where the problem occurred. Simply run gdb <executable> ./core and then type bt (or any gdb command) at the gdb prompt. Typing file ./core will reveal something like

    /root/core: ELF 32-bit LSB core file of '<executable>' (signal 11), Intel 80386, version 1
22.2.3 strace
The strace command prints every system call performed by a program. A system call is a function call made by a C library function to the LINUX kernel. Try

    strace ls
    strace ./wordsplit
If a program has not been compiled with debugging support, the only way to inspect its execution may be with the strace command. In any case, the command can provide valuable information about where a program is failing and is useful for diagnosing errors.
22.3 C Libraries
We made reference to the Standard C library. The C language on its own does almost nothing; everything useful is an external function. External functions are grouped into libraries. The Standard C library is the file /lib/libc.so.6. To list all the C library functions, run:

    nm /lib/libc.so.6
    nm /lib/libc.so.6 | grep ' T ' | cut -f3 -d' ' | grep -v '^_' | sort -u | less
Many of these have man pages, but some will have no documentation and require you to read the comments inside the header files (which are often most explanatory). It is better not to use functions unless you are sure that they are standard functions in the sense that they are common to other systems.
To create your own library is simple. Let's say we have two files that contain several functions that we would like to compile into a library. The files are simple_math_sqrt.c

    #include <stdlib.h>
    #include <stdio.h>

    static int abs_error (int a, int b)
    {
        if (a > b)
            return a - b;
        return b - a;
    }

    int simple_math_isqrt (int x)
    {
        int result;
        if (x < 0) {
            fprintf (stderr,
                     "simple_math_sqrt: taking the sqrt of a negative number\n");
            abort ();
        }
        result = 2;
        while (abs_error (result * result, x) > 1) {
            result = (x / result + result) / 2;
        }
        return result;
    }
and simple_math_pow.c

    #include <stdlib.h>
    #include <stdio.h>

    int simple_math_ipow (int x, int y)
    {
        int result;
        if (x == 1 || y == 0)
            return 1;
        if (x == 0 && y < 0) {
            fprintf (stderr,
                     "simple_math_pow: raising zero to a negative power\n");
            abort ();
        }
        if (y < 0)
            return 0;
        result = 1;
        while (y > 0) {
            result = result * x;
            y = y - 1;
        }
        return result;
    }
We would like to call the library simple_math. It is good practice to name all the functions in the library simple_math_??????. The function abs_error is not going to be used outside of the file simple_math_sqrt.c, and so we put the keyword static in front of it, meaning that it is a local function.
We can compile the code with:

    gcc -Wall -c simple_math_sqrt.c
    gcc -Wall -c simple_math_pow.c
The -c option means compile only. The code is not turned into an executable. The generated files are simple_math_sqrt.o and simple_math_pow.o. These are called object files.
We now need to archive these files into a library. We do this with the ar command (a predecessor of tar):

    ar rc libsimple_math.a simple_math_sqrt.o simple_math_pow.o
    ranlib libsimple_math.a
The ranlib command indexes the archive.
The library can now be used. Create a file mytest.c:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        printf ("%d\n", simple_math_ipow (4, 3));
        printf ("%d\n", simple_math_isqrt (50));
        return 0;
    }
and run

    gcc -Wall -c mytest.c
    gcc -o mytest mytest.o -L. -lsimple_math
The first command compiles the file mytest.c into mytest.o, and the second step is called linking the program, which assimilates mytest.o and the libraries into a single executable. The option -L. means to look in the current directory for any libraries (usually only /lib and /usr/lib are searched). The option -lsimple_math means to assimilate the library libsimple_math.a (lib and .a are added automatically). This operation is called static [Nothing to do with the ``static'' keyword.] linking because it happens before the program is run and includes all object files into the executable.
As an aside, note that it is often the case that many static libraries are linked into the same program. Here order is important: the library with the fewest dependencies should come last, or you will get so-called symbol referencing errors.
We can also create a header file simple_math.h for using the library.

    /* calculates the integer square root, aborts on error */
    int simple_math_isqrt (int x);

    /* calculates the integer power, aborts on error */
    int simple_math_ipow (int x, int y);
Add the line #include "simple_math.h" to the top of mytest.c:

    #include <stdlib.h>
    #include <stdio.h>
    #include "simple_math.h"
This addition gets rid of the implicit declaration of function warning messages. Usually #include <simple_math.h> would be used, but here the header file is our own and sits in the current directory, so we use "simple_math.h" instead of <simple_math.h>.
22.4 C Projects -- Makefiles
What if you make a small change to one of the files (as you are likely to do very often when developing)? You could script the process of compiling and linking, but the script would build everything, and not just the changed file. What we really need is a utility that only recompiles object files whose sources have changed: make is such a utility.

make is a program that looks inside a Makefile in the current directory and then does a lot of compiling and linking. Makefiles contain lists of rules and dependencies describing how to build a program.

Inside a Makefile you need to state a list of what-depends-on-what dependencies that make can work through, as well as the shell commands needed to achieve each goal.
22.4.1 Completing our example Makefile
Our first (last?) dependency in the process of completing the compilation is that mytest depends on both the library, libsimple_math.a, and the object file, mytest.o. In make terms we create a Makefile line that looks like:

    mytest: libsimple_math.a mytest.o
meaning simply that the files libsimple_math.a and mytest.o must exist and be up to date before mytest is built. mytest: is called a make target. Beneath this line, we also need to state how to build mytest:

    	gcc -Wall -o $@ mytest.o -L. -lsimple_math
The $@ means the name of the target itself, which is just substituted with mytest. Note that the space before the gcc is a tab character and not 8 space characters.
The next dependency is that libsimple_math.a depends on simple_math_sqrt.o simple_math_pow.o. Once again we have a dependency, along with a shell script to build the target. The full Makefile rule is:

    libsimple_math.a: simple_math_sqrt.o simple_math_pow.o
    	rm -f $@
    	ar rc $@ simple_math_sqrt.o simple_math_pow.o
    	ranlib $@
Note again that the left margin consists of a single tab character and not spaces.
The final dependency is that the files simple_math_sqrt.o and simple_math_pow.o depend on the files simple_math_sqrt.c and simple_math_pow.c. This requires two make target rules, but make has a short way of stating such a rule in the case of many C source files,

    .c.o:
    	gcc -Wall -c -o $*.o $<
which means that any .o files needed can be built from a .c file of a similar name by means of the command gcc -Wall -c -o $*.o $<, where $*.o means the name of the object file and $< means the name of the file that $*.o depends on, one at a time.
22.4.2 Putting it all together
Makefiles can, in fact, have their rules put in any order, so it's best to state the most obvious rules first for readability.

There is also a rule you should always state at the outset:

    all: libsimple_math.a mytest
The all: target is the rule that make tries to satisfy when make is run with no command-line arguments, because it is the first target in the Makefile. This just means that libsimple_math.a and mytest are the last two files to be built, that is, they are the top-level dependencies.
Makefiles also have their own form of environment variables, like shell scripts. You can see that we have used the text simple_math in three of our rules. It makes sense to define a macro for this so that we can easily change to a different library name.

    # Comments start with a # (hash) character like shell scripts.
    # Makefile to build libsimple_math.a and mytest program.
    # Paul Sheer <psheer@cranzgot.co.za> Sun Mar 19 15:56:08 2000

    OBJS = simple_math_sqrt.o simple_math_pow.o
    LIBNAME = simple_math
    CFLAGS = -Wall

    all: lib$(LIBNAME).a mytest

    mytest: lib$(LIBNAME).a mytest.o
    	gcc $(CFLAGS) -o $@ mytest.o -L. -l${LIBNAME}

    lib$(LIBNAME).a: $(OBJS)
    	rm -f $@
    	ar rc $@ $(OBJS)
    	ranlib $@

    .c.o:
    	gcc $(CFLAGS) -c -o $*.o $<

    clean:
    	rm -f *.o *.a mytest
We can now easily type

    make

in the current directory to cause everything to be built. You can see we have added an additional disconnected target clean:. Targets can be run explicitly on the command-line like this:

    make clean

which removes all built files.
Makefiles have far more uses than just building C programs. Anything that needs to be built from sources can employ a Makefile to make things easier.
