Copyright © 2002 Paul Sheer.
Subsections
- 22.1 C Fundamentals
- 22.1.1 The simplest C program
- 22.1.2 Variables and types
- 22.1.3 Functions
- 22.1.4 for, while, if, and switch statements
- 22.1.5 Strings, arrays, and memory allocation
- 22.1.6 String operations
- 22.1.7 File operations
- 22.1.8 Reading command-line arguments inside C programs
- 22.1.9 A more complicated example
- 22.1.10 #include statements and prototypes
- 22.1.11 C comments
- 22.1.12 #define and #if -- C macros
- 22.2 Debugging with gdb and strace
- 22.3 C Libraries
- 22.4 C Projects -- Makefiles
22. Trivial Introduction to C
C was invented for the purpose of writing an operating system that could be recompiled (ported) to different hardware platforms (different CPUs). Because the operating system is written in C, this language is the first choice for writing any kind of application that has to communicate efficiently with the operating system.
Many people who don't program very well in C think of C as an arbitrary language out of many. This point should be made at once: C is the fundamental basis of all computing in the world today. UNIX, Microsoft Windows, office suites, web browsers and device drivers are all written in C. Ninety-nine percent of your time spent at a computer is probably spent using an application written in C. About 70% of all ``open source'' software is written in C, and the remaining 30% written in languages whose compilers or interpreters are written in C. [C++ is also quite popular. It is, however, not as fundamental to computing, although it is more suitable in many situations.]
Further, there is no replacement for C. Since it fulfills its purpose almost flawlessly, there will never be a need to replace it. Other languages may fulfill other purposes, but C fulfills its purpose most adequately. For instance, all future operating systems will probably be written in C for a long time to come.
It is for these reasons that your knowledge of UNIX will never be complete until you can program in C. On the other hand, just because you can program in C does not mean that you should. Good C programming is a fine art which many veteran C programmers never manage to master, even after many years. It is essential to join a Free software project to properly master an effective style of C development.
22.1 C Fundamentals
We start with a simple C program and then add fundamental elements to it. Before going too far, you may wish to review bash functions in Section 7.7.
22.1.1 The simplest C program
A simple C program is:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        printf ("Hello World!\n");
        return 3;
    }

Save this program in a file hello.c. We will now compile the program. [Compiling is the process of turning C code into assembler instructions, which are then assembled into the machine code that your 80?86/SPARC/RS6000 CPU understands directly. The resulting binary executable is fast because it is executed natively by your processor--it is the very chip that you see on your motherboard that fetches hello byte for byte from memory and executes each instruction. This is what is meant by million instructions per second (MIPS). The megahertz of the machine quoted by hardware vendors is very roughly the number of MIPS. Interpreted languages (like shell scripts) are much slower because the code itself is written in something not understandable to the CPU. The /bin/bash program has to interpret the shell program. /bin/bash itself is written in C, but the overhead of interpretation makes scripting languages many orders of magnitude slower than compiled languages. Shell scripts do not need to be compiled.] Run the command

    gcc -Wall -o hello hello.c

The -o hello option tells gcc [GNU C Compiler. cc on other UNIX systems.] to produce the binary file hello instead of the default binary file named a.out. [Called a.out for historical reasons.] The -Wall option means to report all Warnings during the compilation. This is not strictly necessary but is most helpful for correcting possible errors in your programs. More compiler options are discussed elsewhere in this book.
Then, run the program with

    ./hello

Previously you should have familiarized yourself with bash functions. In C all code is inside a function. The first function to be called (by the operating system) is the main function.
Type echo $? to see the return code of the program. You will see it is 3, the return value of the main function.
Other things to note are the " on either side of the string to be printed. Quotes are required around string literals. Inside a string literal, the \n escape sequence indicates a newline character. ascii(7) shows some other escape sequences. You can also see a proliferation of ; everywhere in a C program. Every statement in C is terminated by a ; unlike statements in shell scripts where a ; is optional.
Now try:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        printf ("number %d, number %d\n", 1 + 2, 10);
        exit (3);
    }

printf can be thought of as the command to send output to the terminal. It is also what is known as a standard C library function. In other words, it is specified that a C implementation should always have the printf function and that it should behave in a certain way.
The %d specifies that a decimal should go in at that point in the text. The number to be substituted will be the first argument to the printf function after the string literal--that is, the 1 + 2. The next %d is substituted with the second argument--that is, the 10. The %d is known as a format specifier. It essentially converts an integer number into a decimal representation. See printf(3) for more details.
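As a small sketch of how format specifiers are filled in from left to right, the function below formats the same text into a character buffer with snprintf, the standard C library relative of printf that writes to memory instead of the terminal. The name format_numbers is hypothetical and is not part of the original text.

```c
#include <stdio.h>

/* format_numbers is a hypothetical helper for illustration only.
   It fills buf (which holds n bytes) with the same text that the
   printf call above would print, and returns the length of that text. */
int format_numbers (char *buf, size_t n, int first, int second)
{
    /* the first %d takes `first', the second %d takes `second' */
    return snprintf (buf, n, "number %d, number %d\n", first, second);
}
```

Calling format_numbers (buf, sizeof (buf), 1 + 2, 10) with a 64-byte buf leaves the text "number 3, number 10" followed by a newline in buf.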
22.1.2 Variables and types
With bash, you could use a variable anywhere, anytime, and the variable would just be blank if it had never been assigned a value. In C, however, you have to explicitly tell the compiler what variables you are going to need before each block of code. You do this with a variable declaration:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        int y;
        x = 10;
        y = 2;
        printf ("number %d, number %d\n", 1 + y, x);
        exit (3);
    }

The int x is a variable declaration. It tells the program to reserve space for one integer variable that it will later refer to as x. int is the type of the variable. x = 10 assigns a value of 10 to the variable. There are types for each kind of number you would like to work with, and format specifiers to convert them for printing:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        char a;
        short b;
        int c;
        long d;
        float e;
        double f;
        long double g;
        a = 'A';
        b = 10;
        c = 10000000;
        d = 10000000;
        e = 3.14159;
        f = 10e300;
        g = 10e300;
        printf ("%c, %hd, %d, %ld, %f, %f, %Lf\n", a, b, c, d, e, f, g);
        exit (3);
    }

You will notice that %f is used for both floats and doubles. The reason is that a float is always converted to a double when it is passed to a function like printf. Also try replacing %f with %e to print in exponential notation--that is, a few significant digits followed by a power-of-ten exponent.
22.1.3 Functions
Functions are implemented as follows:

    #include <stdlib.h>
    #include <stdio.h>

    void mutiply_and_print (int x, int y)
    {
        printf ("%d * %d = %d\n", x, y, x * y);
    }

    int main (int argc, char *argv[])
    {
        mutiply_and_print (30, 5);
        mutiply_and_print (12, 3);
        exit (3);
    }

Here we have a non-main function called by the main function. The function is first declared with

    void mutiply_and_print (int x, int y)

This declaration states the return value of the function (void for no return value), the function name (mutiply_and_print), and then the arguments that are going to be passed to the function. The numbers passed to the function are given their own names, x and y, and are converted to the type of x and y before being passed to the function--in this case, int and int. The actual C code that comprises the function goes between curly braces { and }.
In other words, the above function is equivalent to:

    void mutiply_and_print ()
    {
        int x;
        int y;
        x = <first-number-passed>
        y = <second-number-passed>
        printf ("%d * %d = %d\n", x, y, x * y);
    }

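The function above returns nothing (void). As a complementary sketch, a function can also hand a value back to its caller with return, just as main does. The names multiply and print_product below are hypothetical and only serve this illustration.

```c
#include <stdio.h>

/* multiply is a hypothetical companion to mutiply_and_print:
   it returns the product to the caller instead of printing it. */
int multiply (int x, int y)
{
    return x * y;
}

/* print_product shows the returned value being stored and used. */
void print_product (int x, int y)
{
    int product;

    product = multiply (x, y);
    printf ("%d * %d = %d\n", x, y, product);
}
```

Here the return type int replaces void, and the caller can assign the result to a variable of the same type.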
22.1.4 for, while, if, and switch statements
As with shell scripting, we have the for, while, and if statements:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;

        x = 10;

        if (x == 10) {
            printf ("x is exactly 10\n");
            x++;
        } else if (x == 20) {
            printf ("x is equal to 20\n");
        } else {
            printf ("No, x is not equal to 10 or 20\n");
        }

        if (x > 10) {
            printf ("Yes, x is more than 10\n");
        }

        while (x > 0) {
            printf ("x is %d\n", x);
            x = x - 1;
        }

        for (x = 0; x < 10; x++) {
            printf ("x is %d\n", x);
        }

        switch (x) {
            case 9:
                printf ("x is nine\n");
                break;
            case 10:
                printf ("x is ten\n");
                break;
            case 11:
                printf ("x is eleven\n");
                break;
            default:
                printf ("x is huh?\n");
                break;
        }

        return 0;
    }

It is easy to see the format that these statements take, although they are vastly different from shell scripts. C code works in statement blocks between curly braces, in the same way that shell scripts have do's and done's.
Note that with most programming languages when we want to add 1 to a variable we have to write, say, x = x + 1. In C, the abbreviation x++ is used, meaning to increment a variable by 1.
The for loop takes three statements between ( ... ): a statement to start things off, a comparison, and a statement to be executed on each completion of the statement block. The statement block after the for is repeatedly executed until the comparison is untrue.
The switch statement is like case in shell scripts. switch considers the argument inside its ( ... ) and decides which case line to jump to. In this example it will obviously be printf ("x is ten\n"); because x was 10 when the previous for loop exited. The break tokens mean that we are through with the switch statement and that execution should continue from Line 46, just past the closing brace of the switch.
Note that in C the comparison == is used instead of =. The symbol = means to assign a value to a variable, whereas == is an equality operator.
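The following minimal sketch puts a for loop and the == comparison into two small functions so that their results can be checked. The names sum_up_to and is_ten are hypothetical and do not appear in the original text.

```c
/* sum_up_to is a hypothetical example: it adds the numbers 1..n
   with a for loop and returns the total. */
int sum_up_to (int n)
{
    int i;
    int total;

    total = 0;
    for (i = 1; i <= n; i++) {
        total = total + i;
    }
    return total;
}

/* is_ten returns 1 only when x equals 10.  Writing (x = 10) here
   would instead assign 10 to x, and the test would always succeed. */
int is_ten (int x)
{
    if (x == 10) {
        return 1;
    }
    return 0;
}
```

For example, sum_up_to (4) adds 1 + 2 + 3 + 4 and returns 10.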
22.1.5 Strings, arrays, and memory allocation
You can define a list of numbers with:

    int y[10];

This list is called an array:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        int y[10];
        for (x = 0; x < 10; x++) {
            y[x] = x * 2;
        }
        for (x = 0; x < 10; x++) {
            printf ("item %d is %d\n", x, y[x]);
        }
        return 0;
    }

If an array is of type character, then it is called a string:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        char y[11];
        for (x = 0; x < 10; x++) {
            y[x] = 65 + x * 2;
        }
        for (x = 0; x < 10; x++) {
            printf ("item %d is %d\n", x, y[x]);
        }
        y[10] = 0;
        printf ("string is %s\n", y);
        return 0;
    }

Note that a string has to be null-terminated. This means that the last character must be a zero. The code y[10] = 0 sets the 11th item in the array to zero. This also means that strings need to be one char longer than you would think.
(Note that the first item in the array is y[0], not y[1], as with some other programming languages.)
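To make the role of the terminating zero concrete, here is a sketch of a function that measures a string by walking it until the zero byte is found. The name count_chars is hypothetical, and the const qualifier (not covered in this chapter) merely promises that the function will not modify the string.

```c
#include <stddef.h>

/* count_chars is a hypothetical helper for illustration only.
   It walks the array until it meets the terminating zero byte
   and returns the number of characters before it. */
size_t count_chars (const char *s)
{
    size_t n;

    n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Without the zero at the end, the loop would have no way of knowing where the string stops and would run on into whatever memory follows.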
In the preceding example, the line char y[11] reserved 11 bytes for the string. But what if you want a string of 100,000 bytes? C allows you to request memory from the kernel. This is called allocating memory. Any non-trivial program will allocate memory for itself and there is no other way of getting large blocks of memory for your program to use. Try:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        char *y;
        y = malloc (11);
        printf ("%p\n", (void *) y);
        for (x = 0; x < 10; x++) {
            y[x] = 65 + x * 2;
        }
        y[10] = 0;
        printf ("string is %s\n", y);
        free (y);
        return 0;
    }

The declaration char *y means to declare a variable (a number) called y that points to a memory location. The * (asterisk) in this context means pointer. For example, if you have a machine with perhaps 256 megabytes of RAM + swap, then y potentially has a range of this much. The numerical value of y is also printed with printf ("%p\n", (void *) y); (%p is the format specifier for pointers), but is of no interest to the programmer.
When you have finished using memory you must give it back to the operating system by using free. Programs that don't free all the memory they allocate are said to leak memory.
Allocating memory often requires you to perform a calculation to determine the amount of memory required. In the above case we are allocating the space of 11 chars. Since each char is really a single byte, this presents no problem. But what if we were allocating 11 ints? An int on a PC is 32 bits--four bytes. To determine the size of a type, we use the sizeof keyword:

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int a;
        int b;
        int c;
        int d;
        int e;
        int f;
        int g;
        a = sizeof (char);
        b = sizeof (short);
        c = sizeof (int);
        d = sizeof (long);
        e = sizeof (float);
        f = sizeof (double);
        g = sizeof (long double);
        printf ("%d, %d, %d, %d, %d, %d, %d\n", a, b, c, d, e, f, g);
        return 0;
    }

Here you can see the number of bytes required by all of these types. Now we can easily allocate arrays of things other than char.

    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        int x;
        int *y;
        y = malloc (10 * sizeof (int));
        printf ("%p\n", (void *) y);
        for (x = 0; x < 10; x++) {
            y[x] = 65 + x * 2;
        }
        for (x = 0; x < 10; x++) {
            printf ("%d\n", y[x]);
        }
        free (y);
        return 0;
    }

On many machines an int is four bytes (32 bits), but you should never assume this. Always use the sizeof keyword to allocate memory.
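The listings in this section do not check whether malloc succeeded. As a hedged sketch of the same allocation with that check added, the function below returns 0 when no memory could be obtained. The name make_doubles is hypothetical, and the sketch assumes count is greater than zero.

```c
#include <stdlib.h>

/* make_doubles is a hypothetical helper for illustration only.
   It allocates room for `count' ints (count must be positive) and
   sets item i to i * 2.  It returns 0 if the allocation fails;
   otherwise the caller must free() the result when done. */
int *make_doubles (int count)
{
    int i;
    int *y;

    y = malloc (count * sizeof (int));
    if (y == 0) {
        return 0;
    }
    for (i = 0; i < count; i++) {
        y[i] = i * 2;
    }
    return y;
}
```

Because the size is computed with sizeof (int), the function asks for the right number of bytes whatever the size of an int happens to be on the machine.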
22.1.6 String operations
C programs probably do more string manipulation than anything else. Here is a program that divides a sentence into words:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main (int argc, char *argv[])
    {
        int length_of_word;
        int i;
        int length_of_sentence;
        char p[256];
        char *q;

        strcpy (p, "hello there, my name is fred.");

        length_of_sentence = strlen (p);

        length_of_word = 0;

        for (i = 0; i <= length_of_sentence; i++) {
            if (p[i] == ' ' || i == length_of_sentence) {
                q = malloc (length_of_word + 1);
                if (q == 0) {
                    perror ("malloc failed");
                    abort ();
                }
                strncpy (q, p + i - length_of_word, length_of_word);
                q[length_of_word] = 0;
                printf ("word: %s\n", q);
                free (q);
                length_of_word = 0;
            } else {
                length_of_word = length_of_word + 1;
            }
        }
        return 0;
    }

Here we introduce three more standard C library functions. strcpy stands for string copy. It copies bytes from one place to another sequentially, until it reaches a zero byte (i.e., the end of string). Line 13 of this program (the strcpy call) copies text into the character array p, which is called the target of the copy.
strlen stands for string length. It determines the length of a string, which is just a count of the number of characters up to the null character.
We need to loop over the length of the sentence. The variable i indicates the current position in the sentence.
Line 20 (the if test inside the loop) says that if we find a character 32 (denoted by ' '), we know we have reached a word boundary. We also know that the end of the sentence is a word boundary even though there may not be a space there. The token || means OR. At this point we can allocate memory for the current word and copy the word into that memory. The strncpy function is useful for this. It copies a string, but only up to a limit of length_of_word characters (the last argument). Like strcpy, the first argument is the target, and the second argument is the place to copy from.
To calculate the position of the start of the last word, we use p + i - length_of_word. This means that we are adding i to the memory location p and then going back length_of_word counts, thereby pointing strncpy to the exact position.
Finally, we null-terminate the string on Line 27 (q[length_of_word] = 0). We can then print q, free the used memory, and begin with the next word.
For a complete list of string operations, see string(3).
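The allocate, copy, and terminate steps described above can be gathered into one function, sketched below. The name copy_word and its parameters are hypothetical, and the sketch assumes that start + length does not run past the end of the sentence.

```c
#include <stdlib.h>
#include <string.h>

/* copy_word is a hypothetical helper for illustration only.
   It returns a freshly allocated, null-terminated copy of the
   `length' characters of `sentence' that begin at position `start'.
   Returns 0 if malloc fails.  The caller must free() the result. */
char *copy_word (const char *sentence, int start, int length)
{
    char *q;

    q = malloc (length + 1);        /* one extra char for the zero */
    if (q == 0) {
        return 0;
    }
    strncpy (q, sentence + start, length);
    q[length] = 0;   /* strncpy adds no zero when it stops at the limit */
    return q;
}
```

With the sentence "hello there", copy_word (sentence, 6, 5) returns a new string holding "there".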
22.1.7 File operations
Under most programming languages, file operations involve three steps: opening a file, reading or writing to the file, and then closing the file. You use the command fopen to tell the operating system that you are ready to begin working with a file. The following program opens a file and spits it out on the terminal:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main (int argc, char *argv[])
    {
        int c;
        FILE *f;

        f = fopen ("mytest.c", "r");
        if (f == 0) {
            perror ("fopen");
            return 1;
        }
        for (;;) {
            c = fgetc (f);
            if (c == -1)
                break;
            printf ("%c", c);
        }
        fclose (f);
        return 0;
    }

A new type is presented here: FILE *. It is a file operations variable that must be initialized with fopen before it can be used. The fopen function takes two arguments: the first is the name of the file, and the second is a string explaining how we want to open the file--in this case "r" means reading from the start of the file. Other options are "w" for writing and several more described in fopen(3).
If the return value of fopen is zero, it means that fopen has failed. The perror function then prints a textual error message (for example, No such file or directory). It is essential to check the return value of all library calls in this way. These checks will constitute about one third of your C program.
The command fgetc gets a character from the file. It retrieves consecutive bytes from the file until it reaches the end of the file, when it returns a -1 (the value that stdio.h names EOF on Linux). The break statement says to immediately terminate the for loop, whereupon execution will continue from line 21 (the fclose call). break statements can appear inside while loops as well.
You will notice that the for statement is empty. This is allowable C code and means to loop forever.
Some other file functions are fread, fwrite, fputc, fprintf, and fseek. See fwrite(3), fputc(3), fprintf(3), and fseek(3).
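The same read-until-end loop can be written against the EOF macro from stdio.h rather than the literal -1. The sketch below counts the characters in a file that has already been opened. The name count_bytes is hypothetical and not part of the original text.

```c
#include <stdio.h>

/* count_bytes is a hypothetical helper for illustration only.
   It reads the already opened file f one character at a time
   and returns how many characters it read. */
long count_bytes (FILE *f)
{
    int c;
    long total;

    total = 0;
    for (;;) {
        c = fgetc (f);
        if (c == EOF)    /* stdio.h name for the end-of-file value */
            break;
        total = total + 1;
    }
    return total;
}
```

Note that c is declared as an int, not a char, so that the end-of-file value can be told apart from every real character.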
22.1.8 Reading command-line arguments inside C programs
By now you are probably wondering what the (int argc, char *argv[]) are for. These are the command-line arguments passed to the program by the shell. argc is the total number of command-line arguments, and argv is an array of strings of each argument. Printing them out is easy:

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    int main (int argc, char *argv[])
    {
        int i;
        for (i = 0; i < argc; i++) {
            printf ("argument %d is %s\n", i, argv[i]);
        }
        return 0;
    }

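Programs usually search their arguments rather than just print them. The sketch below looks for one particular argument using strcmp, a string(3) function that returns zero when two strings are equal. The name find_argument is hypothetical.

```c
#include <string.h>

/* find_argument is a hypothetical helper for illustration only.
   It returns the position of `wanted' in argv, or -1 if it is
   absent.  The search starts at 1 because argv[0] holds the name
   the program was run as. */
int find_argument (int argc, char *argv[], const char *wanted)
{
    int i;

    for (i = 1; i < argc; i++) {
        if (strcmp (argv[i], wanted) == 0) {
            return i;
        }
    }
    return -1;
}
```

A main function could call find_argument (argc, argv, "-v") and test whether the result is -1 to decide if the option was given.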
22.1.9 A more complicated example
Here we put this all together in a program that reads in lots of files and dumps them as words. Here are some new notations you will encounter: != is the inverse of == and tests if not-equal-to; realloc reallocates memory--it resizes an old block of memory so that any bytes of the old block are preserved; \n, \t mean the newline character, 10, or the tab character, 9, respectively (see ascii(7)).

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    void word_dump (char *filename)
    {
        int length_of_word;
        int amount_allocated;
        char *q;
        FILE *f;
        int c;

        c = 0;

        f = fopen (filename, "r");
        if (f == 0) {
            perror ("fopen failed");
            exit (1);
        }

        length_of_word = 0;

        amount_allocated = 256;
        q = malloc (amount_allocated);
        if (q == 0) {
            perror ("malloc failed");
            abort ();
        }

        while (c != -1) {
            if (length_of_word >= amount_allocated) {
                amount_allocated = amount_allocated * 2;
                q = realloc (q, amount_allocated);
                if (q == 0) {
                    perror ("realloc failed");
                    abort ();
                }
            }

            c = fgetc (f);
            q[length_of_word] = c;

            if (c == -1 || c == ' ' || c == '\n' || c == '\t') {
                if (length_of_word > 0) {
                    q[length_of_word] = 0;
                    printf ("%s\n", q);
                }
                amount_allocated = 256;
                q = realloc (q, amount_allocated);
                if (q == 0) {
                    perror ("realloc failed");
                    abort ();
                }
                length_of_word = 0;
            } else {
                length_of_word = length_of_word + 1;
            }
        }

        fclose (f);
    }

    int main (int argc, char *argv[])
    {
        int i;

        if (argc < 2) {
            printf ("Usage:\n\twordsplit <filename> ...\n");
            exit (1);
        }

        for (i = 1; i < argc; i++) {
            word_dump (argv[i]);
        }

        return 0;
    }

This program is more complicated than you might immediately expect. Reading in a file where we are sure that a word will never exceed 30 characters is simple. But what if we have a file that contains some words that are 100,000 characters long? GNU programs are expected to behave correctly under these circumstances.
To cope with normal as well as extreme circumstances, we start off assuming that a word will never be more than 256 characters. If it appears that the word is growing over 256 characters, we reallocate the memory space to double its size (lines 32 and 33). When we start with a new word, we can free up memory again, so we realloc back to 256 again (lines 48 and 49). In this way we are using the minimum amount of memory at each point in time.
We have hence created a program that can work efficiently with a 100-gigabyte file just as easily as with a 100-byte file. This is part of the art of C programming.
Experienced C programmers may actually scoff at the above listing because it really isn't as ``minimalistic'' as is absolutely possible. In fact, it is a truly excellent listing for the following reasons:
- The program is easy to understand.
- The program uses an efficient algorithm (albeit not optimal).
- The program contains no arbitrary limits that would cause unexpected behavior in extreme circumstances.
- The program uses no nonstandard C functions or notations that would prohibit it compiling successfully on other systems. It is therefore portable.
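The doubling strategy used in word_dump can be isolated into a single function, sketched below. It is not the author's code: the name append_char is hypothetical, and it uses pointers to the caller's variables (char **, int *), a notation this chapter has not introduced, so that the function can update them.

```c
#include <stdlib.h>

/* append_char is a hypothetical helper for illustration only.
   It stores c at position *length of the block *buffer, doubling
   the block with realloc when it is full.  *allocated must be at
   least 1 on entry.  Returns 0 on success and -1 if realloc fails,
   in which case the old block is left untouched. */
int append_char (char **buffer, int *length, int *allocated, char c)
{
    char *bigger;

    if (*length >= *allocated) {
        bigger = realloc (*buffer, *allocated * 2);
        if (bigger == 0) {
            return -1;
        }
        *buffer = bigger;
        *allocated = *allocated * 2;
    }
    (*buffer)[*length] = c;
    *length = *length + 1;
    return 0;
}
```

Keeping the result of realloc in a separate variable until it has been checked means that the original block is not lost if the request fails.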
22.1.10 #include statements and prototypes
At the start of each program will be one or more #include statements. These tell the compiler to read in another C program. Now, ``raw'' C does not have a whole lot in the way of protecting against errors: for example, the strcpy function could just as well be used with one, three, or four arguments, and the C program would still compile. It would, however, wreak havoc with the internal memory and cause the program to crash. These other .h C programs are called header files. They contain templates for how functions are meant to be called. Every function you might like to use is contained in one or another template file. The templates are called function prototypes. [C++ has something called ``templates.'' This is a special C++ term having nothing to do with the discussion here.]
A function prototype is written the same as the function itself, but without the code. A function prototype for word_dump would simply be:

    void word_dump (char *filename);

The trailing ; is essential and distinguishes a function prototype from a function.
After a function prototype is defined, any attempt to use the function in a way other than intended--say, passing it too few arguments or arguments of the wrong type--will be met with fierce opposition from gcc.
You will notice that the #include <string.h> appeared when we started using string operations. Recompiling these programs without the #include <string.h> line gives the warning message

    mytest.c:21: warning: implicit declaration of function `strncpy'

which is quite to the point.
The function prototypes give a clear definition of how every function is to be used. Man pages will always first state the function prototype so that you are clear on what arguments are to be passed and what types they should have.
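Prototypes are not only found in header files; you can write them in your own program. The sketch below declares a prototype so that a function can be called before its definition appears. The names square and sum_of_squares are hypothetical.

```c
/* Prototype: tells the compiler how square is to be called before
   its definition has been seen.  square and sum_of_squares are
   hypothetical names used only for this illustration. */
int square (int x);

int sum_of_squares (int a, int b)
{
    /* legal here because the prototype above is already known */
    return square (a) + square (b);
}

/* The definition repeats the prototype, with the code in place of the ; */
int square (int x)
{
    return x * x;
}
```

If the prototype line is removed, gcc -Wall reports an implicit declaration warning of the kind shown above for strncpy.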
22.1.11 C comments
A C comment is denoted with /* <comment lines> */ and can span multiple lines. Anything between the /* and */ is ignored. Every function should be commented, and all nonobvious code should be commented. It is a good maxim that a program that needs lots of comments to explain it is badly written. Also, never comment the obvious, and explain why you do things rather than what you are doing. It is advisable not to make pretty graphics between each function, so rather:

    /* returns -1 on error, takes a positive integer */
    int sqr (int x)
    {
    <...>

than

    /***************************----SQR----******************************
     *   x = argument to make the square of                            *
     *   return value =                                                *
     *       -1 (on error)                                             *
     *       square of x (on success)                                  *
     ********************************************************************/
    int sqr (int x)
    {
    <...>

which is liable to cause nausea. In C++, the additional comment // is allowed, whereby everything between the // and the end of the line is ignored. It is accepted under gcc, but should not be used unless you really are programming in C++. In addition, programmers often ``comment out'' lines by placing a #if 0 ... #endif around them, which really does exactly the same thing as a comment (see Section 22.1.12) but allows you to have comments within comments. For example

    int x;
    x = 10;
    #if 0
    printf ("debug: x is %d\n", x);  /* print debug information */
    #endif
    y = x + 10;
    <...>

comments out Line 4.
22.1.12 #define and #if -- C macros
Anything starting with a # is not actually C, but a C preprocessor directive. A C program is first run through a preprocessor that removes all spurious junk, like comments, #include statements, and anything else beginning with a #. You can make C programs much more readable by defining macros instead of literal values. For instance,

    #define START_BUFFER_SIZE 256

in our example program, #defines the text START_BUFFER_SIZE to be the text 256. Thereafter, wherever in the C program we have a START_BUFFER_SIZE, the text 256 will be seen by the compiler, and we can use START_BUFFER_SIZE instead. This is a much cleaner way of programming because, if, say, we would like to change the 256 to some other value, we only need to change it in one place. START_BUFFER_SIZE is also more meaningful than a number, making the program more readable.
Whenever you have a literal constant like 256, you should replace it with a macro defined near the top of your program.
You can also check for the existence of macros with the #ifdef and #ifndef directives. # directives are really a programming language all on their own:

    /* Set START_BUFFER_SIZE to fine-tune performance before compiling: */
    #define START_BUFFER_SIZE 256
    /* #define START_BUFFER_SIZE 128 */
    /* #define START_BUFFER_SIZE 1024 */
    /* #define START_BUFFER_SIZE 16384 */

    #ifndef START_BUFFER_SIZE
    #error This code did not define START_BUFFER_SIZE. Please edit
    #endif

    #if START_BUFFER_SIZE <= 0
    #error Wooow! START_BUFFER_SIZE must be greater than zero
    #endif

    #if START_BUFFER_SIZE < 16
    #warning START_BUFFER_SIZE too small, program may be inefficient
    #elif START_BUFFER_SIZE > 65536
    #warning START_BUFFER_SIZE too large, program may be inefficient
    #else
    /* START_BUFFER_SIZE is ok, do not report */
    #endif

    void word_dump (char *filename)
    {
    <...>
        amount_allocated = START_BUFFER_SIZE;
        q = malloc (amount_allocated);
    <...>

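Two further uses of the preprocessor are sketched below: giving a macro a default value only when none has been set already (for example with gcc's -D option), and a macro that takes an argument. DOUBLE_OF and next_buffer_size are hypothetical names; only START_BUFFER_SIZE comes from the text above.

```c
/* Use 256 unless START_BUFFER_SIZE was defined earlier,
   e.g. with gcc -DSTART_BUFFER_SIZE=1024 */
#ifndef START_BUFFER_SIZE
#define START_BUFFER_SIZE 256
#endif

/* A hypothetical macro with an argument.  The parentheses guard
   against surprises when an expression such as 1 + 2 is passed in. */
#define DOUBLE_OF(n) ((n) * 2)

/* next_buffer_size is a hypothetical helper: it returns the starting
   size for small values and double the current size otherwise. */
int next_buffer_size (int current)
{
    if (current < START_BUFFER_SIZE) {
        return START_BUFFER_SIZE;
    }
    return DOUBLE_OF (current);
}
```

The preprocessor substitutes text before the compiler runs, so DOUBLE_OF (1 + 2) becomes ((1 + 2) * 2); without the inner parentheses it would become (1 + 2 * 2).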
22.2 Debugging with gdb and strace
Programming errors, or bugs, can be found by inspecting program execution. Some developers claim that the need for such inspection implies a sloppy development process. Nonetheless it is instructive to learn C by actually watching a program work.
22.2.1 gdb
The GNU debugger, gdb, is a replacement for the standard UNIX debugger, db. To debug a program means to step through its execution line-by-line, in order to find programming errors as they happen. Use the command gcc -Wall -g -O0 -o wordsplit wordsplit.c to recompile your program above. The -g option enables debugging support in the resulting executable and the -O0 option disables compiler optimization (which sometimes causes confusing behavior).
For the following example, create a test file readme.txt with some plain text inside it. You can then run gdb -q wordsplit. The standard gdb prompt will appear, which indicates the start of a debugging session:

    (gdb)

At the prompt, many one letter commands are available to control program execution. The first of these is run which executes the program as though it had been started from a regular shell:

    (gdb) r
    Starting program: /homes/src/wordsplit/wordsplit
    Usage:
            wordsplit <filename> ...

    Program exited with code 01.

Obviously, we will want to set some trial command-line arguments. This is done with the special command, set args:

    (gdb) set args readme.txt readme2.txt

The break command is used like b [[<file>:]<line>|<function>], and sets a break point at a function or line number:

    (gdb) b main
    Breakpoint 1 at 0x8048796: file wordsplit.c, line 67.

A break point will interrupt execution of the program. In this case the program will stop when it enters the main function (i.e., right at the start). Now we can run the program again:

    (gdb) r
    Starting program: /home/src/wordsplit/wordsplit readme.txt readme2.txt

    Breakpoint 1, main (argc=3, argv=0xbffff804) at wordsplit.c:67
    67          if (argc < 2) {
    (gdb)

As specified, the program stops at the beginning of the main function at line 67.
If you are interested in viewing the contents of a variable, you can use the print command:

    (gdb) p argc
    $1 = 3
    (gdb) p argv[1]
    $2 = 0xbffff988 "readme.txt"

which tells us the value of argc and argv[1]. The list command displays the lines about the current line:

    (gdb) l
    63      int main (int argc, char *argv[])
    64      {
    65          int i;
    66
    67          if (argc < 2) {
    68              printf ("Usage:\n\twordsplit <filename> ...\n");
    69              exit (1);
    70          }

The list command can also take an optional file and line number (or even a function name):

    (gdb) l wordsplit.c:1
    1       #include <stdlib.h>
    2       #include <stdio.h>
    3       #include <string.h>
    4
    5       void word_dump (char *filename)
    6       {
    7           int length_of_word;
    8           int amount_allocated;

Next, we can try setting a break point at an arbitrary line and then using the continue command to proceed with program execution:

    (gdb) b wordsplit.c:48
    Breakpoint 2 at 0x804873e: file wordsplit.c, line 48.
    (gdb) c
    Continuing.
    Zaphod

    Breakpoint 2, word_dump (filename=0xbffff988 "readme.txt") at wordsplit.c:48
    48              amount_allocated = 256;

Execution obediently stops at line 48. At this point it is useful to run a backtrace. This prints out the current stack which shows the functions that were called to get to the current line. This output allows you to trace the history of execution.

    (gdb) bt
    #0  word_dump (filename=0xbffff988 "readme.txt") at wordsplit.c:48
    #1  0x80487e0 in main (argc=3, argv=0xbffff814) at wordsplit.c:73
    #2  0x4003db65 in __libc_start_main (main=0x8048790 <main>, argc=3, ubp_av=0xbffff814, init=0x8048420 <_init>,
        fini=0x804883c <_fini>, rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffff80c) at ../sysdeps/generic/libc-start.c:111

The clear command then deletes the break point at the current line:

    (gdb) clear
    Deleted breakpoint 2

The most important commands for debugging are the next and step commands. The n command simply executes one line of C code:

    (gdb) n
    49              q = realloc (q, amount_allocated);
    (gdb) n
    50              if (q == 0) {
    (gdb) n
    54              length_of_word = 0;

This activity is called stepping through your program. The s command is identical to n except that it dives into functions instead of running them as a single line. To see the difference, step over line 73 first with n, and then with s, as follows:

    (gdb) set args readme.txt readme2.txt
    (gdb) b main
    Breakpoint 1 at 0x8048796: file wordsplit.c, line 67.
    (gdb) r
    Starting program: /home/src/wordsplit/wordsplit readme.txt readme2.txt

    Breakpoint 1, main (argc=3, argv=0xbffff814) at wordsplit.c:67
    67          if (argc < 2) {
    (gdb) n
    72          for (i = 1; i < argc; i++) {
    (gdb) n
    73              word_dump (argv[i]);
    (gdb) n
    Zaphod
    has
    two
    heads
    72          for (i = 1; i < argc; i++) {
    (gdb) n
    73              word_dump (argv[i]);
    (gdb) s
    word_dump (filename=0xbffff993 "readme2.txt") at wordsplit.c:13
    13          c = 0;
    (gdb) s
    15          f = fopen (filename, "r");
    (gdb)

An interesting feature of
gdb
is its ability to
attach onto running programs. Try the following sequence of
commands:
5 10 |
[root@cericon]# [root@cericon]# 28157 ? S 0:00 lpd Waiting 28160 pts/6 S 0:00 grep lpd [root@cericon]# (no debugging symbols found)... (gdb) Attaching to program: /usr/sbin/lpd, Pid 28157 0x40178bfe in __select () from /lib/libc.so.6 (gdb) |
The lpd daemon was not compiled with debugging support, but the point is still made: you can halt and debug any running process on the system. Try running a bt for fun. Now release the process with
    (gdb)
    Detaching from program: /usr/sbin/lpd, Pid 28157
The debugger provides copious amounts of online help. The help command can be run to explain further. The gdb info pages also elaborate on an enormous number of display features and tracing features not covered here.
22.2.2 Examining core files
If your program has a segmentation violation (``segfault'') then a core file will be written to the current directory. This is known as a core dump. A core dump is caused by a bug in the program: it is the program's response to a SIGSEGV signal, sent because the program tried to access an area of memory outside of its allowed range. These files can be examined using gdb to (usually) reveal where the problem occurred. Simply run gdb <executable> ./core and then type bt (or any gdb command) at the gdb prompt. Typing file ./core will reveal something like
    /root/core: ELF 32-bit LSB core file of '<executable>' (signal 11), Intel 80386, version 1
22.2.3 strace
The strace command prints every system call performed by a program. A system call is a function call made by a C library function to the LINUX kernel. Try
    strace ls
    strace ./wordsplit
If a program has not been compiled with debugging support, the only way to inspect its execution may be with the strace command. In any case, the command can provide valuable information about where a program is failing and is useful for diagnosing errors.
22.3 C Libraries
We made reference to the Standard C library. The C language on its own does almost nothing; everything useful is an external function. External functions are grouped into libraries. The Standard C library is the file /lib/libc.so.6. To list all the C library functions, run:
    nm /lib/libc.so.6
    nm /lib/libc.so.6 | grep ' T ' | cut -f3 -d' ' | grep -v '^_' | sort -u | less
Many of these have man pages, but some will have no documentation and require you to read the comments inside the header files (which are often most explanatory). It is better not to use functions unless you are sure that they are standard functions in the sense that they are common to other systems.
Creating your own library is simple. Let's say we have two files that contain several functions that we would like to compile into a library. The files are simple_math_sqrt.c
    #include <stdlib.h>
    #include <stdio.h>

    static int abs_error (int a, int b)
    {
        if (a > b)
            return a - b;
        return b - a;
    }

    int simple_math_isqrt (int x)
    {
        int result;
        if (x < 0) {
            fprintf (stderr,
                "simple_math_sqrt: taking the sqrt of a negative number\n");
            abort ();
        }
        result = 2;
        while (abs_error (result * result, x) > 1) {
            int next = (x / result + result) / 2;
            if (next == result)
                break;      /* estimate has stopped changing, e.g., for x = 6 */
            result = next;
        }
        return result;
    }
and simple_math_pow.c
    #include <stdlib.h>
    #include <stdio.h>

    int simple_math_ipow (int x, int y)
    {
        int result;
        if (x == 1 || y == 0)
            return 1;
        if (x == 0 && y < 0) {
            fprintf (stderr,
                "simple_math_pow: raising zero to a negative power\n");
            abort ();
        }
        if (y < 0)
            return 0;
        result = 1;
        while (y > 0) {
            result = result * x;
            y = y - 1;
        }
        return result;
    }
We would like to call the library simple_math. It is good practice to name all the functions in the library simple_math_??????. The function abs_error is not going to be used outside of the file simple_math_sqrt.c and so we put the keyword static in front of it, meaning that it is a local function.
We can compile the code with:
    gcc -Wall -c simple_math_sqrt.c
    gcc -Wall -c simple_math_pow.c
The -c option means compile only. The code is not turned into an executable. The generated files are simple_math_sqrt.o and simple_math_pow.o. These are called object files.
We now need to archive these files into a library. We do this with the ar command (a predecessor of tar):
    ar rc libsimple_math.a simple_math_sqrt.o simple_math_pow.o
    ranlib libsimple_math.a
The ranlib command indexes the archive. The library can now be used. Create a file mytest.c:
    #include <stdlib.h>
    #include <stdio.h>

    int main (int argc, char *argv[])
    {
        printf ("%d\n", simple_math_ipow (4, 3));
        printf ("%d\n", simple_math_isqrt (50));
        return 0;
    }
and run
    gcc -Wall -c mytest.c
    gcc -o mytest mytest.o -L. -lsimple_math
The first command compiles the file mytest.c into mytest.o, and the second step is called linking the program, which assimilates mytest.o and the libraries into a single executable. The option -L. means to look in the current directory for any libraries (usually only /lib and /usr/lib are searched). The option -lsimple_math means to assimilate the library libsimple_math.a (lib and .a are added automatically). This operation is called static [Nothing to do with the ``static'' keyword.] linking because it happens before the program is run and includes all object files into the executable.
As an aside, note that many static libraries are often linked into the same program. Here order is important: the library with the fewest dependencies should come last, or you will get so-called symbol referencing errors.
We can also create a header file simple_math.h for using the library.
    /* calculates the integer square root, aborts on error */
    int simple_math_isqrt (int x);

    /* calculates the integer power, aborts on error */
    int simple_math_ipow (int x, int y);
Add the line #include "simple_math.h" to the top of mytest.c:
    #include <stdlib.h>
    #include <stdio.h>
    #include "simple_math.h"
This addition gets rid of the implicit declaration of function warning messages. Usually #include <simple_math.h> would be used, but here the header file is our own and lives in the current directory, so we use "simple_math.h" instead of <simple_math.h>.
22.4 C Projects -- Makefiles
What if you make a small change to one of the files (as you are likely to do very often when developing)? You could script the process of compiling and linking, but the script would build everything, and not just the changed file. What we really need is a utility that only recompiles object files whose sources have changed: make is such a utility.
make is a program that looks inside a Makefile in the current directory and then does a lot of compiling and linking. Makefiles contain lists of rules and dependencies describing how to build a program.
Inside a Makefile you need to state a list of what-depends-on-what dependencies that make can work through, as well as the shell commands needed to achieve each goal.
22.4.1 Completing our example Makefile
Our first (last?) dependency in the process of completing the compilation is that mytest depends on both the library, libsimple_math.a, and the object file, mytest.o. In make terms we create a Makefile line that looks like:
    mytest: libsimple_math.a mytest.o
meaning simply that the files libsimple_math.a and mytest.o must exist and be up to date before mytest is built. mytest: is called a make target. Beneath this line, we also need to state how to build mytest:
    	gcc -Wall -o $@ mytest.o -L. -lsimple_math
The $@ means the name of the target itself, which is just substituted with mytest. Note that the space before the gcc is a tab character and not 8 space characters.
The next dependency is that libsimple_math.a depends on simple_math_sqrt.o simple_math_pow.o. Once again we have a dependency, along with a shell script to build the target. The full Makefile rule is:
    libsimple_math.a: simple_math_sqrt.o simple_math_pow.o
    	rm -f $@
    	ar rc $@ simple_math_sqrt.o simple_math_pow.o
    	ranlib $@
Note again that the left margin consists of a single tab character and not spaces.
The final dependency is that the files simple_math_sqrt.o and simple_math_pow.o depend on the files simple_math_sqrt.c and simple_math_pow.c. This requires two make target rules, but make has a short way of stating such a rule in the case of many C source files,
    .c.o:
    	gcc -Wall -c -o $*.o $<
which means that any .o files needed can be built from a .c file of a similar name by means of the command gcc -Wall -c -o $*.o $<, where $*.o means the name of the object file and $< means the name of the file that $*.o depends on, one at a time.
22.4.2 Putting it all together
Makefiles can, in fact, have their rules put in any order, so it's best to state the most obvious rules first for readability.
There is also a rule you should always state at the outset:
    all: libsimple_math.a mytest
The all: target is the rule that make tries to satisfy when make is run with no command-line arguments, because it is the first target in the Makefile. This just means that libsimple_math.a and mytest are the last two files to be built, that is, they are the top-level dependencies.
Makefiles also have their own form of environment variables, like shell scripts. You can see that we have used the text simple_math in three of our rules. It makes sense to define a macro for this so that we can easily change to a different library name.
    # Comments start with a # (hash) character like shell scripts.
    # Makefile to build libsimple_math.a and mytest program.
    # Paul Sheer <psheer@cranzgot.co.za> Sun Mar 19 15:56:08 2000

    OBJS    = simple_math_sqrt.o simple_math_pow.o
    LIBNAME = simple_math
    CFLAGS  = -Wall

    all:    lib$(LIBNAME).a mytest

    mytest: lib$(LIBNAME).a mytest.o
    	gcc $(CFLAGS) -o $@ mytest.o -L. -l${LIBNAME}

    lib$(LIBNAME).a: $(OBJS)
    	rm -f $@
    	ar rc $@ $(OBJS)
    	ranlib $@

    .c.o:
    	gcc $(CFLAGS) -c -o $*.o $<

    clean:
    	rm -f *.o *.a mytest
We can now easily type
    make
in the current directory to cause everything to be built.
You can see we have added an additional disconnected target clean:. Targets can be run explicitly on the command-line like this:
    make clean
which removes all built files.
Makefiles have far more uses than just building C programs. Anything that needs to be built from sources can employ a Makefile to make things easier.