Library of the rus-linux.net site
14. Formatting Text
Methods and tools for changing the arrangement or presentation of text are often useful for preparing text for printing. This chapter discusses ways of changing the spacing of text and setting up pages, of underlining and sorting and reversing text, and of numbering lines of text.
14.1 Spacing Text: Change the spacing in text.
14.2 Paginating Text: Paginating text.
14.3 Underlining Text: Underlining text.
14.4 Sorting Text: Sorting text.
14.5 Numbering Lines of Text: Numbering text.
14.6 Reversing Text: Reversing text.
14.1 Spacing Text
These recipes are for changing the spacing of text--the whitespace that exists between words, lines, and paragraphs.
The filters described in this section send output to standard output by default; to save their output to a file, use shell redirection (see section Redirecting Output to a File).
14.1.1 Eliminating Extra Spaces in Text: Making the whitespace the same.
14.1.2 Single-Spacing Text: Single-spacing text.
14.1.3 Double-Spacing Text: Double-spacing text.
14.1.4 Triple-Spacing Text: Triple-spacing text.
14.1.5 Adding Line Breaks to Text: Putting line breaks in text.
14.1.6 Adding Margins to Text: Putting margins in text.
14.1.7 Swapping Tab and Space Characters: Swapping tab and space characters.
14.1.1 Eliminating Extra Spaces in Text
To eliminate extra whitespace within lines of text, use the fmt filter; to eliminate extra whitespace between lines of text, use cat.
Use fmt with the `-u' option to output text with "uniform spacing," where the space between words is reduced to one space character and the space between sentences is reduced to two space characters.
- To output the file `term-paper' with uniform spacing, type:
  $ fmt -u term-paper RET
Use cat with the `-s' option to "squeeze" multiple adjacent blank lines into one.
- To output the file `term-paper' with multiple blank lines output as only one blank line, type:
  $ cat -s term-paper RET
You can combine both of these commands to output text with multiple adjacent blank lines squeezed into one and with uniform spacing between words. The following example sends the output of the combined commands to less so that it can be perused on the screen.
- To peruse the text file `term-paper' with multiple blank lines squeezed into one and with uniform spacing between words, type:
  $ cat -s term-paper | fmt -u | less RET
Notice that in this example, both fmt and less worked on their standard input instead of on a file--the standard output of cat (the contents of `term-paper' with extra blank lines squeezed out) was passed to the standard input of fmt, and its standard output (the space-squeezed `term-paper', now with uniform spacing) was sent to the standard input of less, which displayed it on the screen.
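As a rough illustration of the pipeline just described, the following sketch runs cat -s and fmt -u on a few lines of made-up text instead of `term-paper'. The sample text and the variable names are assumptions chosen only for the demonstration.

```shell
# Made-up sample: uneven gaps between words and a run of three blank lines.
sample=$(printf 'one   two    three\n\n\n\nfour  five')

# cat -s squeezes the run of blank lines into one;
# fmt -u then reduces the gaps between words to single spaces.
cleaned=$(printf '%s\n' "$sample" | cat -s | fmt -u)

printf '%s\n' "$cleaned"
```

The two paragraphs come out separated by exactly one blank line, with single spaces between the words.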
14.1.2 Single-Spacing Text
There are many methods for single-spacing text. To remove all empty lines from text output, use grep with the regular expression `.', which matches any character, and therefore matches any line that isn't empty (see section Regular Expressions--Matching Text Patterns). You can then redirect this output to a file, or pipe it to other commands; the original file is not altered.
- To output all non-empty lines from the file `term-paper', type:
  $ grep . term-paper RET
This command outputs all lines that are not empty--so lines containing only non-printing characters, such as spaces and tabs, will still be output.
To remove from the output all empty lines and all lines that consist of only space characters, use `[^ ]' as the regexp to search for; it matches any line that contains at least one character that is not a space. This regexp will still output lines that contain only tab characters; to remove from the output all empty lines and lines that contain only a combination of tab or space characters, use `[^[:space:]]' as the regexp to search for. It uses the special predefined `[:space:]' regexp class, which matches any kind of space character at all, including tabs.
- To output only the lines from the file `term-paper' that contain more than just space characters, type:
  $ grep '[^ ]' term-paper RET
- To output only the lines from the file `term-paper' that contain more than just space or tab characters, type:
  $ grep '[^[:space:]]' term-paper RET
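To see how the three patterns differ, here is a small sketch that counts the lines each one keeps. The sample lines are made up, and the bracket expressions are used on their own (with no trailing `.'), so that a line whose only visible character comes last is still kept.

```shell
# Made-up sample: a text line, an empty line, a spaces-only line,
# a tab-only line, and a one-character line.
sample=$(printf 'alpha\n\n   \n\t\nz')

# Count the lines that each pattern keeps.
non_empty=$(printf '%s\n' "$sample" | grep -c .)
not_only_spaces=$(printf '%s\n' "$sample" | grep -c '[^ ]')
not_only_blanks=$(printf '%s\n' "$sample" | grep -c '[^[:space:]]')

echo "$non_empty $not_only_spaces $not_only_blanks"   # prints: 4 3 2
```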
If a file is already double-spaced, where all even lines are blank, you can remove those lines from the output by using sed with the `n;d' expression.
- To output only the odd lines from file `term-paper', type:
  $ sed 'n;d' term-paper RET
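A minimal sketch of the `n;d' expression at work, using made-up double-spaced input in place of `term-paper': for each odd line, `n' prints it and reads the following line, which `d' then deletes.

```shell
# Made-up double-spaced input: every even line is blank.
single=$(printf 'first\n\nsecond\n\nthird\n\n' | sed 'n;d')

printf '%s\n' "$single"
```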
14.1.3 Double-Spacing Text
To double-space text, where one blank line is inserted between each line in the original text, use the pr tool with the `-d' option. By default, pr paginates text and puts a header at the top of each page with the current date, time, and page number; give the `-t' option to omit this header.
- To double-space the file `term-paper' and write the output to the file `term-paper.print', type:
  $ pr -d -t term-paper > term-paper.print RET
To send the output directly to the printer for printing, you would pipe the output to lpr:
  $ pr -d -t term-paper | lpr RET
NOTE: The pr ("print") tool is a text pre-formatter, often used to paginate and otherwise prepare text files for printing; there is more discussion on the use of this tool in Paginating Text.
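The effect of `-d' together with `-t' can be checked on a few made-up lines piped to pr; this is only a sketch, and the sample text is an assumption.

```shell
# Double-space three made-up lines; -t suppresses the page header.
doubled=$(printf 'first\nsecond\nthird\n' | pr -d -t)

printf '%s\n' "$doubled"
```

Each input line is followed by one blank line in the output.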
14.1.4 Triple-Spacing Text
To triple-space text, where two blank lines are inserted between each line of the original text, use sed with the `G;G' expression.
- To triple-space the file `term-paper' and write the output to the file `term-paper.print', type:
  $ sed 'G;G' term-paper > term-paper.print RET
The `G' expression appends one blank line to each line of sed's output; by joining several `G' expressions with `;', you can append more than one blank line (but you must quote the expression, because the semicolon (`;') has meaning to the shell--see Passing Special Characters to Commands). You can use additional `G' expressions to output text with more than triple spacing.
- To quadruple-space the file `term-paper', and write the output to the file `term-paper.print', type:
  $ sed 'G;G;G' term-paper > term-paper.print RET
The usage of sed is described in Editing Streams of Text.
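The following sketch shows how each additional `G' adds one more blank line after every line of input; the two-line sample is made up.

```shell
# Two G commands append two blank lines after each line (triple spacing).
tripled=$(printf 'first\nsecond\n' | sed 'G;G')

# Three G commands append three blank lines (quadruple spacing).
quadrupled=$(printf 'first\nsecond\n' | sed 'G;G;G')

printf '%s\n' "$tripled"
printf '%s\n' "$quadrupled"
```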
14.1.5 Adding Line Breaks to Text
Sometimes a file will not have line breaks at the end of each line (this commonly happens during file conversions between operating systems). To add line breaks to a file that does not have them, use the text formatter fmt. It outputs text with lines arranged up to a specified width; if no width is specified, it formats text up to a width of 75 characters per line.
- To output the file `term-paper' with lines up to 75 characters long, type:
  $ fmt term-paper RET
Use the `-w' option to specify the maximum line width.
- To output the file `term-paper' with lines up to 80 characters long, type:
  $ fmt -w 80 term-paper RET
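As a small check of the `-w' option, this sketch wraps one long made-up line at 40 columns and reports the length of the longest resulting line. The sentence and the width of 40 are arbitrary choices for the example.

```shell
# One long made-up line with no line breaks.
long_line='The quick brown fox jumps over the lazy dog and then keeps running across the field until it reaches the river.'

# Break it into lines of at most 40 characters.
wrapped=$(printf '%s\n' "$long_line" | fmt -w 40)
printf '%s\n' "$wrapped"

# Find the length of the longest line in the result.
longest=$(printf '%s\n' "$wrapped" | awk '{ if (length($0) > max) max = length($0) } END { print max }')
echo "longest line: $longest characters"
```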
14.1.6 Adding Margins to Text
Giving text an extra left margin is especially good when you want to print a copy and punch holes in it for use with a three-ring binder.
To output a text file with a larger left margin, use pr with the file name as an argument; give the `-t' option (to disable headers and footers), and, as an argument to the `-o' option, give the number of spaces to offset the text. Add the number of spaces to the page width (whose default is 72) and specify this new width as an argument to the `-w' option.
- To output the file `owners-manual' with a five-space (or five-column) margin to a new file, `owners-manual.pr', type:
  $ pr -t -o 5 -w 77 owners-manual > owners-manual.pr RET
This command is almost always used for printing, so the output is usually just piped to lpr instead of saved to a file. Many text documents have a width of 80 and not 72 columns; if you are printing such a document and need to keep the 80 columns across the page, specify a new width of 85. If your printer can only print 80 columns of text, specify a width of 80; the text will be reformatted to 75 columns after the 5-column margin.
- To print the file `owners-manual' with a 5-column margin and 80 columns of text, type:
  $ pr -t -o 5 -w 85 owners-manual | lpr RET
- To print the file `owners-manual' with a 5-column margin and 75 columns of text, type:
  $ pr -t -o 5 -w 80 owners-manual | lpr RET
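To see the offset without printing anything, the sketch below pipes two made-up lines through pr with a five-space margin and captures the result instead of sending it to lpr.

```shell
# Offset two made-up lines by five spaces; -t omits headers and footers.
indented=$(printf 'chapter one\nchapter two\n' | pr -t -o 5 -w 77)

printf '%s\n' "$indented"
```

Each output line begins with five space characters.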
14.1.7 Swapping Tab and Space Characters
Use the expand and unexpand tools to swap tab characters for space characters, and to swap space characters with tabs, respectively.
Both tools take a file name as an argument and write changes to the standard output; if no files are specified, they work on the standard input.
To convert tab characters to spaces, use expand. To convert only the initial or leading tabs on each line, give the `-i' option; the default action is to convert all tabs.
- To convert all tab characters to spaces in file `list', and write the output to `list2', type:
  $ expand list > list2 RET
- To convert only initial tab characters to spaces in file `list', and write the output to the standard output, type:
  $ expand -i list RET
To convert multiple space characters to tabs, use unexpand. By default, it only converts leading spaces into tabs, counting eight space characters for each tab. Use the `-a' option to specify that all instances of eight space characters be converted to tabs.
- To convert every eight leading space characters to tabs in file `list2', and write the output to `list', type:
  $ unexpand list2 > list RET
- To convert all occurrences of eight space characters to tabs in file `list2', and write the output to the standard output, type:
  $ unexpand -a list2 RET
To specify the number of spaces to convert to a tab, give that number as an argument to the `-t' option.
- To convert every leading space character to a tab character in `list2', and write the output to the standard output, type:
  $ unexpand -t 1 list2 RET
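A short round-trip sketch with expand and unexpand on one made-up line that holds a leading tab and an interior tab; the variable names are chosen only for the illustration.

```shell
# A made-up line: leading tab, "x", interior tab, "y".
tabbed=$(printf '\tx\ty')

# Convert every tab to spaces (tab stops every eight columns).
all_spaces=$(printf '%s\n' "$tabbed" | expand)

# Convert only the leading tab; the interior tab is kept.
leading_only=$(printf '%s\n' "$tabbed" | expand -i)

# Convert all runs of spaces that reach a tab stop back into tabs.
back_to_tabs=$(printf '%s\n' "$all_spaces" | unexpand -a)

printf '%s\n' "$all_spaces" "$leading_only" "$back_to_tabs"
```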
14.2 Paginating Text
The formfeed character, ASCII C-l or octal code 014, is the delimiter used to paginate text. When you send text with a formfeed character to the printer, the current page being printed is ejected and a new page begins--thus, you can paginate a text file by inserting formfeed characters at a place where you want a page break to occur.
To insert formfeed characters in a text file, use the pr filter. Give the `-f' option to omit the footer and separate pages of output with the formfeed character, and use `-h ""' to output a blank header (otherwise, the current date and time, file name, and current page number are output at the top of each page).
- To paginate the file `listings' and write the output to a file called `listings.page', type:
  $ pr -f -h "" listings > listings.page RET
By default, pr outputs pages of 66 lines each. You can specify the page length as an argument to the `-l' option.
- To paginate the file `listings' with 43-line pages, and write the output to a file called `listings.page', type:
  $ pr -f -h "" -l 43 listings > listings.page RET
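One way to confirm that pr -f really inserts formfeed characters is to count them in its output. The sketch below paginates 100 numbered lines generated with seq (a stand-in for `listings') and counts the formfeeds; the exact count can depend on the version of pr, so only its presence is reported.

```shell
# Paginate 100 generated lines with formfeeds between pages and a blank header.
paged=$(seq 1 100 | pr -f -h "")

# Count the formfeed characters (octal 014) in the paginated output.
formfeeds=$(printf '%s' "$paged" | tr -cd '\f' | wc -c | tr -d ' ')

echo "formfeed characters found: $formfeeds"
```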
NOTE: If a page has more lines than a printer can fit on a physical sheet of paper, it will automatically break the text at that line as well as at the places in the text where there are formfeed characters.
You can paginate text in Emacs by manually inserting formfeed characters where you want them--see Inserting Special Characters in Emacs.
14.2.1 Placing Headers on Each Page: Putting headers on a page.
14.2.2 Placing Text in Columns: Putting text in columns.
14.2.3 Options Available When Paginating Text: More options for pagination.
14.2.1 Placing Headers on Each Page
The pr tool is a general-purpose page formatter and print-preparation utility. By default, pr outputs text in pages of 66 lines each, with headers at the top of each page containing the date and time, file name, and page number, and footers containing five blank lines.
- To print the file `duchess' with the default pr preparation, type:
  $ pr duchess | lpr RET
14.2.2 Placing Text in Columns
You can also use pr to put text in columns--give the number of columns to output as an argument. Use the `-t' option to omit the printing of the default headers and footers.
- To print the file `news.update' in four columns with no headers or footers, type:
  $ pr -4 -t news.update | lpr RET
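The sketch below arranges eight generated lines in four columns, first filling the columns downward (the default) and then across with the `-a' option. The awk step only collapses the padding between columns to single spaces so that the layout is easy to read; it is not part of the recipe.

```shell
# Fill four columns downward: each column holds two consecutive numbers.
down=$(seq 1 8 | pr -4 -t | awk '{ $1 = $1; print }')

# Fill four columns across: each row holds four consecutive numbers.
across=$(seq 1 8 | pr -4 -t -a | awk '{ $1 = $1; print }')

printf '%s\n' "$down"
printf '%s\n' "$across"
```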
14.2.3 Options Available When Paginating Text
The following table describes some of pr's options; see the pr info for a complete description of its capabilities (see section Using the GNU Info System).
OPTION | DESCRIPTION
+first:last | Specify the first and last page to process; the last page can be omitted, so +7 begins processing with the seventh page and continues until the end of the file is reached.
-column | Specify the number of columns to output text in, making all columns fit the page width.
-a | Print columns across instead of down.
-c | Output control characters in hat notation and print all other unprintable characters in "octal backslash" notation.
-d | Specify double-spaced output.
-f | Separate pages of output with a formfeed character instead of a footer of blank lines (63 lines of text per 66-line page instead of 56).
-h header | Specify the header to use instead of the default; specify -h "" for a blank header.
-l length | Specify the page length to be length lines (default 66). If page length is less than 11, headers and footers are omitted and existing form feeds are ignored.
-m | Use when specifying multiple files; this option merges and outputs them in parallel, one per column.
-o spaces | Set the number of spaces to use in the left margin (default 0).
-t | Omit the header and footer on each page, but retain existing formfeeds.
-T | Omit the header and footer on each page, as well as existing formfeeds.
-v | Output non-printing characters in "octal backslash" notation.
-w width | Specify the page width to use, in characters (default 72).
NOTE: It's also common to use pr to change the spacing of text (see section Spacing Text).
14.3 Underlining Text
In the days of typewriters, text that was meant to be set in an italicized font was denoted by underlining the text with underscore characters; now, it's common practice to denote an italicized word in plain text by typing an underscore character, `_', just before and after a word in a text file, like `_this_'.
Some text markup languages use different methods for denoting italics; for example, in TeX files, italicized text is often denoted with braces and the `\it' command, like `{\it this}'. (LaTeX files use the same format, but `\emph' is often used in place of `\it'.)
You can convert one form to the other by using the Emacs replace-regexp function and specifying the text to be replaced as a regexp (see section Regular Expressions--Matching Text Patterns).
- To replace plaintext-style italics with TeX `\it' commands, type:
  M-x replace-regexp RET _\([^_]+\)_ RET {\\it \1} RET
- To replace TeX-style italics with plaintext _underscores_, type:
  M-x replace-regexp RET {\\it \([^}]+\)} RET _\1_ RET
Both examples above used the special symbol `\1' in the replacement text, which stands for the text matched by the first `\( ... \)' construct in the preceding regexp. See Info file `emacs-e20.info', node `Regexps' for more information on regexp syntax in Emacs.
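Outside of Emacs, a similar conversion can be sketched with sed, which also uses `\( ... \)' groups and `\1'. This is only an illustrative equivalent, not the Emacs method itself; the sample sentence and variable names are made up.

```shell
# A made-up line using plaintext-style italics.
plain='an _important_ word'

# Wrap the text between underscores in a TeX {\it ...} group.
tex=$(printf '%s\n' "$plain" | sed 's/_\([^_][^_]*\)_/{\\it \1}/g')

# Convert the TeX form back to underscores.
round_trip=$(printf '%s\n' "$tex" | sed 's/{\\it \([^}][^}]*\)}/_\1_/g')

printf '%s\n' "$tex" "$round_trip"
```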
To put a literal underline under text, you need to use a text editor to insert a C-h character followed by an underscore (`_') immediately after each character you want to underline; you can insert the C-h in Emacs with the C-q function (see section Inserting Special Characters in Emacs).
When a text file contains these literal underlines, use the ul tool to output the file so that it is viewable by the terminal you are using; this is also useful for printing (pipe the output of ul to lpr).
- To output the file `term-paper' so that you can view underbars, type:
  $ ul term-paper RET
To output such text without the backspace character, C-h, in the output, use col with the `-b' option. Because col reads only from the standard input, redirect the file to it.
- To output the file `term-paper' with all backspace characters stripped out, type:
  $ col -b < term-paper RET
14.4 Sorting Text
You can sort a list in a text file with sort. By default, it outputs text in ascending alphabetical order; use the `-r' option to reverse the sort and output text in descending alphabetical order.
For example, suppose a file `provinces' contains the following:
  Shantung
  Honan
  Szechwan
  Hunan
  Kiangsu
  Kwangtung
  Fukien
- To sort the file `provinces' and output all lines in ascending order, type:
  $ sort provinces RET
  Fukien
  Honan
  Hunan
  Kiangsu
  Kwangtung
  Shantung
  Szechwan
  $
- To sort the file `provinces' and output all lines in descending order, type:
  $ sort -r provinces RET
  Szechwan
  Shantung
  Kwangtung
  Kiangsu
  Hunan
  Honan
  Fukien
  $
The following table describes some of sort's options.
OPTION | DESCRIPTION
-b | Ignore leading blanks on each line when sorting.
-d | Sort in "phone directory" order, considering only letters, digits, and blanks.
-f | When sorting, fold lowercase letters into their uppercase equivalent, so that differences in case are ignored.
-i | Ignore non-printing characters when sorting.
-n | Sort numerically instead of by character value.
-o file | Write output to file instead of standard output.
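The difference between the default character-by-character sort and the `-n' option shows up clearly with numbers of different lengths. The sketch below uses three made-up numbers and sets LC_ALL=C so that the character ordering does not depend on the current locale (an assumption added for reproducibility).

```shell
# Three made-up numbers of different lengths.
numbers=$(printf '10\n9\n100')

# Default: compared character by character, so "100" sorts before "9".
by_character=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n: compared by numeric value.
by_value=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

# -n -r: numeric order, reversed.
descending=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n -r)

printf '%s\n' "$by_character" "$by_value" "$descending"
```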
14.5 Numbering Lines of Text
There are several ways to number lines of text.
One way to do it is to use the nl ("number lines") tool. Its default action is to write its input (either the file names given as an argument, or the standard input) to the standard output, with an indentation and all non-empty lines preceded with line numbers.
- To peruse the file `report' with each line of the file preceded by line numbers, type:
  $ nl report | less RET
You can set the numbering style with the `-b' option followed by an argument. The following table lists the possible arguments and describes the numbering style they select.
ARGUMENT | NUMBERING STYLE
a | Number all lines.
t | Number only non-blank lines. This is the default.
n | Do not number lines.
pregexp | Number only the lines that match the regular expression regexp, which is written immediately after the `p' (see section Regular Expressions--Matching Text Patterns).
The default is for line numbers to start with one, and increment by one. Set the initial line number by giving an argument to the `-v' option, and set the increment by giving an argument to the `-i' option.
- To output the file `report' with each line of the file preceded by line numbers, starting with the number two and counting by fours, type:
  $ nl -v 2 -i 4 report RET
- To number only the lines of the file `cantos' that begin with a period (`.'), starting numbering at zero and using a numbering increment of five, and to write the output to `cantos.numbered', type:
  $ nl -i 5 -v 0 -b p'^\.' cantos > cantos.numbered RET
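Here is a brief sketch of the `-v', `-i', and `-b' options applied to made-up input; the awk steps merely pull out the numbers that nl printed so that they are easy to compare.

```shell
# Start at 2 and count by fours: three lines get the numbers 2, 6, and 10.
stepped=$(printf 'a\nb\nc\n' | nl -v 2 -i 4 | awk '{ print $1 }')

# Number only lines that begin with a period, starting at 0 and counting
# by fives; the unnumbered line has a single field, so awk skips it.
dotted=$(printf '.x\ny\n.z\n' | nl -i 5 -v 0 -b p'^\.' | awk 'NF == 2 { print $1 }')

printf '%s\n' "$stepped" "$dotted"
```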
Another way to number lines is to use cat with one of the following two options: the `-n' option numbers each line of its input text, while the `-b' option only numbers non-blank lines.
- To peruse the text file `report' with each line of the file numbered, type:
  $ cat -n report | less RET
- To peruse the text file `report' with each non-blank line of the file numbered, type:
  $ cat -b report | less RET
In the preceding examples, output from cat is piped to less for perusal; the original file is not altered.
To take an input file, number its lines, and then write the line-numbered version to a new file, send the standard output of the cat command to the new file to write.
- To write a line-numbered version of file `report' to file `report.lines', type:
  $ cat -n report > report.lines RET
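The difference between `-n' and `-b' only shows when the input contains blank lines. This sketch feeds cat three made-up lines, the middle one blank, and extracts the numbers it assigns.

```shell
# -n numbers every line, including the blank one.
all_numbered=$(printf 'a\n\nb\n' | cat -n | awk '{ print $1 }')

# -b skips the blank line, so only two numbers are assigned.
nonblank_numbered=$(printf 'a\n\nb\n' | cat -b | awk 'NF { print $1 }')

printf '%s\n' "$all_numbered" "$nonblank_numbered"
```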
14.6 Reversing Text
The tac command is similar to cat, but it outputs text in reverse order. There is another difference--tac works on records, sections of text with separator strings, instead of lines of text. Its default separator string is the linebreak character, so by default tac outputs files in line-for-line reverse order.
- To output the file `prizes' in line-for-line reverse order, type:
  $ tac prizes RET
Specify a different separator with the `-s' option. This is often useful when specifying non-printing characters such as formfeeds. To specify such a character, use the ANSI-C method of quoting (see section Passing Special Characters to Commands).
- To output `prizes' in page-for-page reverse order, type:
  $ tac -s $'\f' prizes RET
The preceding example uses the formfeed, or page break, character as the delimiter, and so it outputs the file `prizes' in page-for-page reverse order, with the last page output first.
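The sketch below reverses made-up input first line by line and then page by page. To stay portable to shells without ANSI-C quoting, it builds the formfeed character with printf instead of $'\f' (a substitution made for this example), and it turns formfeeds into `|' afterwards only so that the result is visible.

```shell
# Hold a literal formfeed character in a variable.
ff=$(printf '\f')

# Default separator (newline): lines come out last to first.
lines_reversed=$(printf 'one\ntwo\nthree\n' | tac)

# Formfeed separator: whole pages come out last to first.
pages_reversed=$(printf 'page 1\fpage 2\fpage 3\f' | tac -s "$ff" | tr '\f' '|')

printf '%s\n' "$lines_reversed" "$pages_reversed"
```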
Use the `-r' option to use a regular expression for the separator string (see section Regular Expressions--Matching Text Patterns). You can build regular expressions to output text in word-for-word and character-for-character reverse order:
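- To output `prizes' in word-for-word reverse order, type:
  $ tac -r -s '[^a-zA-Z0-9\-]' prizes RET
- To output `prizes' in character-for-character reverse order (the first RET below is typed inside the quotes, so that the separator regexp contains a literal linebreak), type:
  $ tac -r -s '.\| RET ' prizes RET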
To reverse the characters on each line, use rev.
- To output `prizes' with the characters on each line reversed, type:
  $ rev prizes RET
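A minimal sketch of rev on two made-up lines: the order of the lines is kept, and only the characters within each line are reversed.

```shell
# Reverse the characters of each line; the line order is unchanged.
reversed=$(printf 'abc\nhello world\n' | rev)

printf '%s\n' "$reversed"
```

Running the result through rev a second time restores the original text.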