Copyright © 2002 Paul Sheer.
- 22.1 C Fundamentals
- 22.1.1 The simplest C program
- 22.1.2 Variables and types
- 22.1.3 Functions
- 22.1.4 for, while, if, and switch statements
- 22.1.5 Strings, arrays, and memory allocation
- 22.1.6 String operations
- 22.1.7 File operations
- 22.1.8 Reading command-line arguments inside C programs
- 22.1.9 A more complicated example
- 22.1.10 #include statements and prototypes
- 22.1.11 C comments
- 22.1.12 #define and #if -- C macros
- 22.2 Debugging with gdb and strace
- 22.3 C Libraries
- 22.4 C Projects -- Makefiles
C was invented for the purpose of writing an operating system that could be recompiled (ported) to different hardware platforms (different CPUs). Because the operating system is written in C, this language is the first choice for writing any kind of application that has to communicate efficiently with the operating system.
Many people who don't program very well in C think of C as an arbitrary language out of many. This point should be made at once: C is the fundamental basis of all computing in the world today. UNIX, Microsoft Windows, office suites, web browsers and device drivers are all written in C. Ninety-nine percent of your time spent at a computer is probably spent using an application written in C. About 70% of all ``open source'' software is written in C, and the remaining 30% written in languages whose compilers or interpreters are written in C. [C++ is also quite popular. It is, however, not as fundamental to computing, although it is more suitable in many situations.]
Further, there is no replacement for C. Since it fulfills its purpose almost flawlessly, there will never be a need to replace it. Other languages may fulfill other purposes, but C fulfills its purpose most adequately. For instance, all future operating systems will probably be written in C for a long time to come.
It is for these reasons that your knowledge of UNIX will never be complete until you can program in C. On the other hand, just because you can program in C does not mean that you should. Good C programming is a fine art which many veteran C programmers never manage to master, even after many years. It is essential to join a Free software project to properly master an effective style of C development.
We start with a simple C program and then add fundamental elements to it. Before going too far, you may wish to review bash functions in Section 7.7.
A simple C program is:
Save this program in a file hello.c. We will now compile the program. [Compiling is the process of turning C code into assembler instructions. Assembler instructions are the program code that your 80x86/SPARC/RS6000 CPU understands directly. The resulting binary executable is fast because it is executed natively by your processor: it is the very chip that you see on your motherboard that fetches hello byte for byte from memory and executes each instruction. This is what is meant by million instructions per second (MIPS). The megahertz of the machine quoted by hardware vendors is very roughly the number of MIPS. Interpreted languages (like shell scripts) are much slower because the code itself is written in something not understandable to the CPU. The /bin/bash program has to interpret the shell program. /bin/bash itself is written in C, but the overhead of interpretation makes scripting languages many orders of magnitude slower than compiled languages. Shell scripts do not need to be compiled.] Run the command gcc -Wall -o hello hello.c.
The -o hello option tells gcc [GNU C Compiler. cc on other UNIX systems.] to produce the binary file hello instead of the default binary file named a.out. [Called a.out for historical reasons.] The -Wall option means to report Warnings during the compilation (it enables most, though not literally all, of gcc's warnings). This is not strictly necessary but is most helpful for correcting possible errors in your programs. More compiler options are discussed elsewhere in this book.
Then, run the program with ./hello.
Previously you should have familiarized yourself with bash functions. In C all code is inside a function. The first function to be called (by the operating system) is the main function. Type echo $? to see the return code of the program. You will see it is 3, the return value of the main function.
Other things to note are the double quotes (") on either side of the string to be printed. Quotes are required around string literals. Inside a string literal, the \n escape sequence indicates a newline character. The ascii(7) man page shows some other escape sequences. You can also see a proliferation of ; everywhere in a C program. Every statement in C is terminated by a ;, unlike statements in shell scripts, where ; is optional.
printf can be thought of as the command to send output
to the terminal. It is also what is known as a standard C
library function. In other words, it is specified that a C
implementation should always have the printf function
and that it should behave in a certain way.
The %d specifies that an integer should go in at that point in the text. The number to be substituted will be the first argument to the printf function after the string literal, that is, the 1 + 2. The next %d is substituted with the second argument after the string literal. The %d is known as a format specifier. It essentially converts an integer number into a decimal representation. See printf(3) for more details.
With bash, you could use a variable anywhere, anytime, and the
variable would just be blank if it had never been assigned a value. In C,
however, you have to explicitly tell the compiler what variables you are
going to need before each block of code. You do this with a variable declaration:
The line int x; is a variable declaration. It tells the program to reserve space for one int that it will later refer to as x. int is the type of the variable. x = 10 assigns a value of 10 to the variable. There are types for each kind of number you would like to work with, and format specifiers to convert them for printing:
You will notice that %f is used for both floats and doubles. The reason is that a float is always converted to a double before being passed to a function such as printf that takes a variable number of arguments. Also try replacing %f with %e to print in exponential notation.
Functions are implemented as follows:
Here we have a non-main function called by the main function. The function's first line is its declaration. This declaration states the return value of the function (void for no return value), the function name (multiply_and_print), and then the arguments that are going to be passed to the function. The numbers passed to the function are given their own names and are converted to the declared type of each argument before being passed to the function, in this case, int. The actual C code that comprises the function goes between curly braces { and }. In other words, calling the function behaves as though its body had been written out in place, with the values passed in standing for the function's own argument names.
As with shell scripting, we have the for, while, if, and switch statements:
It is easy to see the format that these statements take, although they are vastly different from shell scripts. C code works in statement blocks between curly braces, in the same way that shell scripts have do ... done blocks.
Note that with most programming languages, when we want to add 1 to a variable we have to write, say, x = x + 1. In C, the abbreviation x++ is used, meaning to increment a variable by 1.
The for loop takes three statements between ( and
): a statement to start things off, a comparison, and
a statement to be executed on each completion of the statement block.
The statement block after the
for is repeatedly executed until
the comparison is untrue.
The switch statement is like case in shell scripts. switch considers the argument inside its ( and ) and decides which case line to jump to. In this example it will obviously be printf ("x is ten\n"); because x was 10 when the previous for loop exited. The break tokens mean that we are through with the switch statement and that execution should continue from the first statement after the switch block.
Note that in C the comparison == is used instead of =. The symbol = means to assign a value to a variable, whereas == is an equality operator.
You can define a list of numbers with a declaration such as int y[10];, which reserves room for ten ints. This list is called an array:
If an array is of type char,
then it is called a string:
Note that a string has to be null-terminated. This means
that the last character must be a zero. The code
y[10] = 0
sets the 11th item in the array to zero. This also means that
strings need to be one
char longer than you would think.
(Note that the first item in the array is y[0], not y[1] as in some other programming languages.)
In the preceding example, the line char y[11] reserved 11 bytes for the string. But what if you want a string of 100,000 bytes? C allows you to request memory from the kernel. This is called allocating memory. Any non-trivial program will allocate memory for itself, and there is no other practical way of getting large blocks of memory for your program to use. Try:
The declaration char *y means to declare a variable (a number) called y that points to a memory location. The * (asterisk) in this context means pointer. For example, if you have a machine with perhaps 256 megabytes of RAM + swap, then y can potentially hold any address in that range. The numerical value of y can also be printed with printf ("%p\n", (void *) y);, but is of no interest to the programmer.
When you have finished using memory you must give it back by using free, which returns it to the C library (and possibly to the operating system). Programs that don't free all the memory they allocate are said to leak memory.
Allocating memory often requires you to perform a calculation to
determine the amount of memory required. In the above case we
are allocating the space of 11
chars. Since each
char is really a single byte, this presents no problem. But
what if we were allocating 11 ints? An int on a PC is 32 bits, that is, four bytes. To determine the size of a type, we use the sizeof keyword:
Here you can see the number of bytes required by all of these types.
Now we can easily allocate arrays of things other than char:
On many machines an
int is four bytes (32 bits), but you
should never assume this. Always use the sizeof
keyword to allocate memory.
C programs probably do more string manipulation than anything else. Here is a program that divides a sentence into words:
Here we introduce three more standard C library functions. strcpy stands for string copy. It copies bytes from one place to another sequentially, up to and including a zero byte (i.e., the end of string). The strcpy line of this program copies text into the p variable, which is called the target of the copy. strlen stands for string length. It determines the length of a string, which is just a count of the number of characters up to the null character.
We need to loop over the length of the sentence. The variable
i indicates the current position in the sentence.
The if statement inside the loop says that if we find a character 32 (denoted by ' '), we know we have reached a word boundary. We also know that the end of the sentence is a word boundary even though there may not be a space there. The token || means OR. At this point we can allocate memory for the current word and copy the word into that memory. The strncpy function is useful for this. It copies a string, but only up to a limit of characters given by its last argument. Like strcpy, the first argument is the target, and the second argument is the place to copy from.
To calculate the position of the start of the last word, we use
p + i - length_of_word. This means that we are adding
i to the memory location
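p and then going back length_of_word counts, thereby pointing strncpy to the exact position. Finally, we null-terminate the new string, because strncpy adds no terminating zero when it stops at its limit before reaching the end of the source string. We can then print it, free the used memory, and begin with the next word.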
For a complete list of string operations, see string(3).
Under most programming languages, file operations involve
three steps: opening a file, reading or
writing to the file, and then closing the
file. You use the function
fopen to tell the operating
system that you are ready to begin working with a file.
The following program opens a file and spits it out on the terminal:
A new type is presented here: FILE *. It is a file operations variable that must be initialized with fopen before it can be used. The fopen function takes two arguments: the first is the name of the file, and the second is a string explaining how we want to open the file, in this case "r", meaning reading from the start of the file. Other options are "w" for writing and several more described in fopen(3). If the return value of fopen is zero (a NULL pointer), it means that fopen has failed. The perror function then prints a textual error message (for example, No such file or directory). It is essential to check the return value of all library calls in this way. These checks will constitute about one third of your C program.
The fgetc function gets a character from the file. It retrieves consecutive bytes from the file until it reaches the end of the file, when it returns the special value EOF (usually -1), which is why its result must be stored in an int rather than a char. The break statement says to immediately terminate the for loop, whereupon execution will continue from the first statement after the loop. break statements can appear inside while loops as well. You will notice that the for statement is empty, written for (;;). This is allowable C code and means to loop forever.
Some other file functions are fread, fwrite, fputc, fprintf, and fseek; each has its own man page.
Up until now, you are probably wondering what the (int argc, char *argv[]) are for. These are the command-line arguments passed to the program by the shell. argc is the total number of command-line arguments (counting the program name itself, which is argv[0]), and argv is an array of strings, one for each argument. Printing them out is easy:
Here we put this all together in a program that reads in lots of files and dumps them as words. Here are some new notations you will encounter:
- != is the inverse of == and tests if not-equal-to.
- realloc reallocates memory: it resizes an old block of memory so that any bytes of the old block are preserved.
- \n and \t mean the newline character, 10, or the tab character, 9, respectively (see ascii(7)).
This program is more complicated than you might immediately expect. Reading in a file where we are sure that a word will never exceed 30 characters is simple. But what if we have a file that contains some words that are 100,000 characters long? GNU programs are expected to behave correctly under these circumstances.
To cope with normal as well as extreme circumstances, we start off assuming that a word will never be more than 256 characters. If it appears that the word is growing over 256 characters, we reallocate the memory space to double its size. When we start with a new word, we can free up memory again, so we realloc back to 256. In this way we use only about as much memory as the current word needs at each point in time.
We have hence created a program that can work efficiently with a 100-gigabyte file just as easily as with a 100-byte file. This is part of the art of C programming.
Experienced C programmers may actually scoff at the above listing because it really isn't as ``minimalistic'' as is absolutely possible. In fact, it is a truly excellent listing for the following reasons:
- The program is easy to understand.
- The program uses an efficient algorithm (albeit not optimal).
- The program contains no arbitrary limits that would cause unexpected behavior in extreme circumstances.
- The program uses no nonstandard C functions or notations that would prohibit it compiling successfully on other systems. It is therefore portable.
At the start of each program will be one or more #include
statements. These tell the compiler to read in another C program.
Now, ``raw'' C does not have a whole lot in the way of protecting
against errors: for example, the
strcpy function could
just as well be used with one, three, or four arguments, and the
C program would still compile. It would, however, wreak havoc
with the internal memory and cause the program to crash. These
other .h C programs are called header files.
They contain templates for how functions are meant to be called.
Every function you might like to use is contained in one or another
template file. The templates are called function
prototypes. [C++ has something called ``templates.'' This is
a special C++ term having nothing to do with the discussion here.]
A function prototype is written the same as the function itself, but without the code: the body between curly braces is replaced by a single ;. This trailing ; is essential and distinguishes a function prototype from a function.
After a function prototype is defined, any attempt to use the function in a way other than intended (say, passing it too few arguments or arguments of the wrong type) will be met with fierce opposition from gcc.
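You will notice that the #include <string.h> appeared when we started using string operations. Recompiling these programs without the #include <string.h> line gives a message about an implicit declaration of function strcpy (or whichever string function is used first), which is quite to the point. Older versions of gcc report this as a warning; recent versions treat it as an error.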
The function prototypes give a clear definition of how every function is to be used. Man pages will always first state the function prototype so that you are clear on what arguments are to be passed and what types they should have.
A C comment is denoted with
/* <comment lines> */ and can
span multiple lines.
Anything between the /* and */ is ignored. Every function should be commented, and all nonobvious code should be commented. It is a good maxim that a program that needs lots of comments to explain it is badly written. Also, never comment the obvious, and explain why you do things rather than what you are doing. It is advisable not to make pretty graphics between each function: a plain comment above the function is better than a box drawn in asterisks, which is liable to cause nausea. In C++, the additional comment // is allowed, whereby everything between the // and the end of the line is ignored. It is accepted under gcc, and C99 adopted it into standard C, but code meant for older C compilers should avoid it. In addition, programmers often ``comment out'' lines by placing a #if 0 ... #endif around them, which really does exactly the same thing as a comment (see Section 22.1.12) but allows you to have comments within comments. For example, placing #if 0 and #endif around a line that already contains a /* ... */ comment comments out that whole line.
Anything starting with a # is not actually C, but a C preprocessor directive. A C program is first run through a preprocessor that strips out comments, replaces each #include statement with the contents of the named file, and acts on anything else beginning with a #. You can make C programs much more readable by defining macros instead of literal values. For instance, the line #define START_BUFFER_SIZE 256 in our example program #defines the text START_BUFFER_SIZE to be the text 256. Thereafter, wherever in the C program we have a START_BUFFER_SIZE, the text 256 will be seen by the compiler, and we can use START_BUFFER_SIZE instead of 256. This is a much cleaner way of programming because, if, say, we would like to change the 256 to some other value, we only need to change it in one place. START_BUFFER_SIZE is also more meaningful than a number, making the program more readable. Whenever you have a literal constant like 256, you should replace it with a macro defined near the top of your program. You can also check for the existence of macros with the #ifdef and #ifndef directives. # directives are really a programming language all on their own:
Programming errors, or bugs, can be found by inspecting program execution. Some developers claim that the need for such inspection implies a sloppy development process. Nonetheless it is instructive to learn C by actually watching a program work.
The GNU debugger, gdb, is a replacement for the standard debugger, db. To debug a program means to step through its execution line by line, in order to find programming errors as they happen. Use the command gcc -Wall -g -O0 -o wordsplit wordsplit.c to recompile your program. The -g option enables debugging support in the resulting executable, and the -O0 option disables compiler optimization (which sometimes causes confusing behavior).
For the following example, create a test file readme.txt with some plain text inside it. You can then run gdb -q wordsplit. The gdb prompt will appear, which indicates the start of a debugging session.
At the prompt, many one-letter commands are available to control program execution. The first of these is run, which executes the program as though it had been started from a regular shell. Obviously, we will want to set some trial command-line arguments. This is done with the special command set args (for example, set args readme.txt). The break command, which can be abbreviated to b, sets a break point at a function or line number.
A break point will interrupt execution of the program. In this case
the program will stop when it enters the
main function (i.e., right at the start).
Now we can run the program again. As specified, the program stops at the beginning of the main function at line 67.
If you are interested in viewing the contents of a variable, you can use the print command, which tells us the value of that variable. The list command displays the lines around the current line. The list command can also take an optional file and line number (or even a function name).
Next, we can try setting a break point at an arbitrary line and then using the continue command to proceed with program execution. Execution obediently stops at line 48. At this point it is useful to run a bt (backtrace). This prints out the current stack, which shows the functions that were called to get to the current line. This output allows you to trace the history of execution. The clear command then deletes the break point at the current line.
The most important commands for debugging are the next and step commands. The n (next) command simply executes one line of C code. This activity is called stepping through your program. The s (step) command is identical to n except that it dives into functions instead of running them as a single line. To see the difference, step over line 73 first with n, and then with s.
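An interesting feature of gdb is its ability to attach onto running programs. Try it on a running daemon such as lpd: start gdb on the daemon's executable and give the attach command followed by the daemon's process ID (which ps will show you). The lpd daemon was not compiled with debugging support, but the point is still made: you can halt and debug any running process on the system that you have permission to trace. Try running a bt for fun. Now release the process with detach.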
The debugger provides copious amounts of online help. The help command can be run to explain further. The gdb info pages also elaborate on an enormous number of display features and tracing features not covered here.
If your program has a segmentation violation (``segfault''), then a core file may be written to the current directory, provided the shell's core-file size limit (see ulimit -c) allows it. This is known as a core dump. A core dump is caused by a bug in the program: it is the program's response to a SIGSEGV signal sent to it because it tried to access an area of memory outside of its allowed range. These files can be examined using gdb to (usually) reveal where the problem occurred. Simply run gdb <executable> ./core and then type bt (or any gdb command) at the gdb prompt. Typing file ./core at the shell will reveal which program the core file came from.
The strace command prints every system call performed by a program. A system call is a function call made by a C library function to the LINUX kernel. Try, for example, strace ls. If a program has not been compiled with debugging support, the only way to inspect its execution may be with the strace command. In any case, the command can provide valuable information about where a program is failing and is useful for diagnosing errors.
We made reference to the Standard C library. The C
language on its own does almost nothing; everything useful
is an external function. External functions are grouped
into libraries. The Standard C library is the file
/lib/libc.so.6. You can list the functions it exports with the nm command, for example nm -D /lib/libc.so.6. Many of these have man pages, but some will have no documentation and require you to read the comments inside the header files (which are often most explanatory).
It is better not to use functions unless
you are sure that they are
standard functions in the
sense that they are common to other systems.
To create your own library is simple. Let's say we have two files that contain several functions that we would like to compile into a library, which we would like to call simple_math. It is good practice to name all the functions in the library simple_math_??????. A function that is not going to be used outside of its own file gets the keyword static in front of it, meaning that it is a local function.
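A sketch of one such library file, assuming the hypothetical file name simple_math_sqrt.c and the hypothetical functions square and simple_math_isqrt. It has no main because it is compiled with -c and archived rather than run directly:

```c
/* simple_math_sqrt.c: part of the simple_math library */

/* static: only code in this file can call square(), so it needs no
   simple_math_ prefix. long long avoids overflow for large x. */
static long long square(long long r)
{
    return r * r;
}

/* Returns the integer square root of x (the largest r with r * r <= x),
   or -1 if x is negative. */
int simple_math_isqrt(int x)
{
    int r = 0;

    if (x < 0)
        return -1;
    while (square(r + 1) <= x)
        r++;
    return r;
}
```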
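We can compile the code with gcc -Wall -c followed by the names of the source files. The -c option means compile only. The code is not turned into an executable. The generated files have the same names with a .o extension. These are called object files. We now need to archive these files into a library. We do this with the ar command (a predecessor of tar), for example ar rc libsimple_math.a followed by the names of the object files. The ranlib command, run as ranlib libsimple_math.a, indexes the archive.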
The library can now be used. Create a file mytest.c that calls the library's functions, then compile and link it with gcc -Wall -c mytest.c followed by gcc -o mytest mytest.o -L. -lsimple_math. The first command compiles the file mytest.c into mytest.o, and the second command is called linking the program, which assimilates mytest.o and the libraries into a single executable. The option -L. means to look in the current directory for any libraries (usually only /lib and /usr/lib are searched). The option -lsimple_math means to assimilate the library libsimple_math.a (the lib and .a are added automatically). This operation is called static [Nothing to do with the ``static'' keyword.] linking because it happens before the program is run and copies the needed object files from the library into the executable.
As an aside, note that it is often the case that many static libraries are linked into the same program. Here order is important: the library with the fewest dependencies should come last, or you will get so-called symbol referencing errors.
We can also create a header file simple_math.h, containing the prototypes of the library's functions, for using the library. Add the line #include "simple_math.h" to the top of mytest.c. This addition gets rid of the implicit declaration of function warnings. Usually #include <simple_math.h> would be used, but here, this is a header file in the current directory (our own header file), and this is where we use "simple_math.h" instead of <simple_math.h>.
What if you make
a small change to one of the files
(as you are likely to do very often when developing)?
You could script the process of compiling and linking,
but the script would build everything, and not just the
changed file. What we really need is a utility that
only recompiles object files whose sources have changed:
make is such a utility.
make is a program that looks inside a Makefile in the current directory and then does a lot of compiling and linking. Makefiles contain lists of rules and dependencies describing how to build a program. Inside a Makefile you need to state a list of what-depends-on-what dependencies that make can work through, as well as the shell commands needed to achieve each goal.
Our first (last?) dependency in the process of completing the compilation is that mytest depends on both the library, libsimple_math.a, and the object file, mytest.o. In make terms we create a Makefile line that looks like mytest: libsimple_math.a mytest.o, meaning simply that the files libsimple_math.a and mytest.o must exist and be updated before mytest. mytest: is called a make target. Beneath this line, we also need to state how to build mytest, with a command such as gcc -Wall -o $@ mytest.o -L. -lsimple_math. The $@ means the name of the target itself, which is just substituted with mytest. Note that the space before the gcc is a tab character and not 8 space characters.
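The next dependency is that libsimple_math.a depends on the library's object files. Once again we have a dependency, along with shell commands (the ar and ranlib commands from earlier) to build the target. Note again that the left margin of each command consists of a single tab character and not spaces.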
The final dependency is that the object files depend on the C source files of the same names. This requires two make target rules, but make has a short way of stating such a rule in the case of many C source files: a .c.o: target with the command gcc -Wall -c -o $*.o $<. This means that any .o files needed can be built from a .c file of a similar name by means of that command, where $*.o means the name of the object file and $< means the name of the file that $*.o depends on, one at a time.
Makefiles can, in fact, have their rules
put in any order, so it's best to state the most obvious
rules first for readability.
There is also a rule you should always state at the outset: all: libsimple_math.a mytest. When run with no command-line arguments, make tries to satisfy the first target in the Makefile, which is why the all: target is stated first. This just means that libsimple_math.a and mytest are the last two files to be built, that is, they are the top-level dependencies.
Makefiles also have their own form of environment
variables, like shell scripts. You can see that we have
used the text
simple_math in three of our rules.
It makes sense to define a macro for this so that
we can easily change to a different library name.
We can now easily type make in the current directory to cause everything to be built. You can see we have added an additional disconnected target, clean:. Targets can be run explicitly on the command line, as in make clean, which removes all built files.
Makefiles have far more uses than just building C
programs. Anything that needs to be built from
sources can employ a
Makefile to make things easier.