14. Formatting Text

Methods and tools for changing the arrangement or presentation of text are often useful for preparing text for printing. This chapter discusses ways of changing the spacing of text and setting up pages, of underlining and sorting and reversing text, and of numbering lines of text.

14.1 Spacing Text    Change the spacing in text.

14.2 Paginating Text    Paginating text.

14.3 Underlining Text    Underlining text.

14.4 Sorting Text    Sorting text.

14.5 Numbering Lines of Text    Numbering text.

14.6 Reversing Text    Reversing text.

14.1 Spacing Text

These recipes are for changing the spacing of text--the whitespace that exists between words, lines, and paragraphs.

The filters described in this section send output to standard output by default; to save their output to a file, use shell redirection (see section Redirecting Output to a File).

14.1.1 Eliminating Extra Spaces in Text    Making the whitespace the same.

14.1.2 Single-Spacing Text    Single-spacing text.

14.1.3 Double-Spacing Text    Double-spacing text.

14.1.4 Triple-Spacing Text    Triple-spacing text.

14.1.5 Adding Line Breaks to Text    Putting line breaks in text.

14.1.6 Adding Margins to Text    Putting margins in text.

14.1.7 Swapping Tab and Space Characters    Swapping tab and space characters.

14.1.1 Eliminating Extra Spaces in Text

To eliminate extra whitespace within lines of text, use the fmt filter; to eliminate extra whitespace between lines of text, use cat.

Use fmt with the `-u' option to output text with "uniform spacing," where the space between words is reduced to one space character and the space between sentences is reduced to two space characters.

To output the file `term-paper' with uniform spacing, type:
$ fmt -u term-paper RET

Use cat with the `-s' option to "squeeze" multiple adjacent blank lines into one.

To output the file `term-paper' with multiple blank lines output as only one blank line, type:
$ cat -s term-paper RET

You can combine both of these commands to output text with multiple adjacent blank lines squeezed into one and with uniform spacing between words. The following example shows how the output of the combined commands is sent to less so that it can be perused on the screen.

To peruse the text file `term-paper' with multiple blank lines squeezed into one and with uniform spacing between words, type:
$ cat -s term-paper | fmt -u | less RET

Notice that in this example, both fmt and less worked on their standard input instead of on a file--the standard output of cat (the contents of `term-paper' with extra blank lines squeezed out) was passed to the standard input of fmt, and its standard output (the space-squeezed `term-paper', now with uniform spacing) was sent to the standard input of less, which displayed it on the screen.
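
As a rough sketch of the pipeline above, the following builds a small sample file and runs it through both filters. The file contents and the variable names are made up for illustration, and the GNU versions of cat and fmt are assumed.

```shell
# Hypothetical sample file: irregular word spacing and a run of blank lines.
sample=$(mktemp)
printf 'The   quick    brown fox.\n\n\n\nJumps   over the lazy   dog.\n' > "$sample"

# cat -s squeezes the three blank lines into one;
# fmt -u then reduces runs of spaces between words to a single space.
squeezed=$(cat -s "$sample" | fmt -u)
printf '%s\n' "$squeezed"

rm -f "$sample"
```

Because the result is held in a shell variable here, the original file is never altered; redirect the pipeline to a new file if you want to keep the output.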

14.1.2 Single-Spacing Text

There are many methods for single-spacing text. To remove all empty lines from text output, use grep with the regular expression `.', which matches any character, and therefore matches any line that isn't empty (see section Regular Expressions--Matching Text Patterns). You can then redirect this output to a file, or pipe it to other commands; the original file is not altered.

To output all non-empty lines from the file `term-paper', type:
$ grep . term-paper RET

This command outputs all lines that are not empty--so lines containing only non-printing characters, such as spaces and tabs, will still be output.

To remove from the output all empty lines, and all lines that consist of only space characters, use `[^ ]' as the regexp to search for; it matches any line containing at least one character that is not a space. But this regexp will still output lines that contain only tab characters; to remove from the output all empty lines and lines that contain only a combination of tab or space characters, use `[^[:space:]]' as the regexp to search for. It uses the special predefined `[:space:]' regexp class, which matches any kind of space character at all, including tabs.

To output only the lines from the file `term-paper' that contain more than just space characters, type:
$ grep '[^ ]' term-paper RET

To output only the lines from the file `term-paper' that contain more than just space or tab characters, type:
$ grep '[^[:space:]]' term-paper RET

If a file is already double-spaced, where all even lines are blank, you can remove those lines from the output by using sed with the `n;d' expression.

To output only the odd lines from file `term-paper', type:
$ sed 'n;d' term-paper RET
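
The difference between these filters is easiest to see on a tiny input. The sketch below uses a made-up sample containing an empty line, a spaces-only line, and a tab-only line; the pattern `[^[:space:]]' used here matches any line holding at least one non-whitespace character.

```shell
# Hypothetical sample: two text lines around an empty line, a spaces-only
# line, and a tab-only line.
sample=$(mktemp)
printf 'first\n\n   \n\t\nsecond\n' > "$sample"

# '.' keeps every line with at least one character, visible or not.
nonempty=$(grep -c . "$sample")

# '[^[:space:]]' keeps only lines with at least one non-whitespace character.
visible=$(grep '[^[:space:]]' "$sample")

# sed 'n;d' prints a line, reads the next one, and deletes it: odd lines only.
odd=$(printf 'a\n\nb\n\nc\n' | sed 'n;d')

printf '%s\n' "$nonempty" "$visible" "$odd"
rm -f "$sample"
```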

14.1.3 Double-Spacing Text

To double-space text, where one blank line is inserted between each line in the original text, use the pr tool with the `-d' option. By default, pr paginates text and puts a header at the top of each page with the current date, time, and page number; give the `-t' option to omit this header.

To double-space the file `term-paper' and write the output to the file `term-paper.print', type:
$ pr -d -t term-paper > term-paper.print RET

To send the output directly to the printer for printing, you would pipe the output to lpr:

$ pr -d -t term-paper | lpr RET

NOTE: The pr ("print") tool is a text pre-formatter, often used to paginate and otherwise prepare text files for printing; there is more discussion on the use of this tool in Paginating Text.

14.1.4 Triple-Spacing Text

To triple-space text, where two blank lines are inserted between each line of the original text, use sed with the `G;G' expression.

To triple-space the file `term-paper' and write the output to the file `term-paper.print', type:
$ sed 'G;G' term-paper > term-paper.print RET

The `G' expression appends one blank line to each line of sed's output; using `;' you can specify more than one blank line to append (but you must quote this command, because the semicolon (`;') has meaning to the shell--see Passing Special Characters to Commands). You can use multiple `G' characters to output text with more than double or triple spaces.

To quadruple-space the file `term-paper', and write the output to the file `term-paper.print', type:
$ sed 'G;G;G' term-paper > term-paper.print RET

The usage of sed is described in Editing Streams of Text.
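
Since each extra `G' adds one more blank line, the expression can be generated for any spacing. The following is only a sketch: `space_lines' is a hypothetical helper name, not a standard tool, and it assumes a sed that accepts an empty script (as GNU sed does).

```shell
# space_lines N: hypothetical helper that appends N-1 blank lines to every
# input line by building a sed script of N-1 'G' commands joined with ';'.
space_lines() {
    n=$1
    script=''
    i=1
    while [ "$i" -lt "$n" ]; do
        script="${script:+$script;}G"
        i=$((i + 1))
    done
    # With N=1 the script is empty and the text passes through unchanged.
    sed "$script"
}

# Two input lines, triple-spaced: each line is followed by two blank lines.
triple=$(printf 'a\nb\n' | space_lines 3 | wc -l | tr -d ' ')
echo "$triple"
```

The helper reads the standard input, so it can be placed anywhere in a pipeline, for example before lpr.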

14.1.5 Adding Line Breaks to Text

Sometimes a file will not have line breaks at the end of each line (this commonly happens during file conversions between operating systems). To add line breaks to a file that does not have them, use the text formatter fmt. It outputs text with lines arranged up to a specified width; if no width is specified, it formats text up to a width of 75 characters per line.

To output the file `term-paper' with lines up to 75 characters long, type:
$ fmt term-paper RET

Use the `-w' option to specify the maximum line width.

To output the file `term-paper' with lines up to 80 characters long, type:
$ fmt -w 80 term-paper RET
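
To check what fmt does to a line with no breaks in it, you can feed it one long generated line and measure the result. This is a sketch with made-up input; the width of 30 is an arbitrary choice for the demonstration.

```shell
# Build one long line of 24 short words with no line breaks inside it.
long_line=$(yes word | head -n 24 | tr '\n' ' ')

# Ask fmt for lines of at most 30 characters.
wrapped=$(printf '%s\n' "$long_line" | fmt -w 30)
printf '%s\n' "$wrapped"

# Measure the longest output line and count the lines produced.
longest=$(printf '%s\n' "$wrapped" | awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')
count=$(printf '%s\n' "$wrapped" | wc -l | tr -d ' ')
```

The lines fmt produces may be somewhat shorter than the maximum, since it breaks only between words.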

14.1.6 Adding Margins to Text

Giving text an extra left margin is especially good when you want to print a copy and punch holes in it for use with a three-ring binder.

To output a text file with a larger left margin, use pr with the file name as an argument; give the `-t' option (to disable headers and footers), and, as an argument to the `-o' option, give the number of spaces to offset the text. Add the number of spaces to the page width (whose default is 72) and specify this new width as an argument to the `-w' option.

To output the file `owners-manual' with a five-space (or five-column) margin to a new file, `owners-manual.pr', type:
$ pr -t -o 5 -w 77 owners-manual > owners-manual.pr RET

This command is almost always used for printing, so the output is usually just piped to lpr instead of saved to a file. Many text documents have a width of 80 and not 72 columns; if you are printing such a document and need to keep the 80 columns across the page, specify a new width of 85. If your printer can only print 80 columns of text, specify a width of 80, which leaves 75 columns for the text after the 5-column margin. (In current GNU versions of pr, `-w' affects only multi-column output; for single-column output, the `-W' option sets the width, and lines longer than that are truncated rather than reformatted.)

To print the file `owners-manual' with a 5-column margin and 80 columns of text, type:
$ pr -t -o 5 -w 85 owners-manual | lpr RET
To print the file `owners-manual' with a 5-column margin and 75 columns of text, type:
$ pr -t -o 5 -w 80 owners-manual | lpr RET
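
Before sending anything to the printer, you can preview the margin on a short sample. This sketch uses made-up text and assumes GNU pr, which fills the offset with space characters.

```shell
# Indent two sample lines by a five-column left margin, with no headers.
indented=$(printf 'Chapter 1\nGetting started\n' | pr -t -o 5)
printf '%s\n' "$indented"

# Keep the first line for inspection: it begins with five spaces.
first=$(printf '%s\n' "$indented" | head -n 1)
```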

14.1.7 Swapping Tab and Space Characters

Use the expand and unexpand tools to swap tab characters for space characters, and to swap space characters with tabs, respectively.

Both tools take a file name as an argument and write changes to the standard output; if no files are specified, they work on the standard input.

To convert tab characters to spaces, use expand. To convert only the initial or leading tabs on each line, give the `-i' option; the default action is to convert all tabs.

To convert all tab characters to spaces in file `list', and write the output to `list2', type:
$ expand list > list2 RET
To convert only initial tab characters to spaces in file `list', and write the output to the standard output, type:
$ expand -i list RET

To convert multiple space characters to tabs, use unexpand. By default, it only converts leading spaces into tabs, counting eight space characters for each tab. Use the `-a' option to specify that all instances of eight space characters be converted to tabs.

To convert every eight leading space characters to tabs in file `list2', and write the output to `list', type:
$ unexpand list2 > list RET
To convert all occurrences of eight space characters to tabs in file `list2', and write the output to the standard output, type:
$ unexpand -a list2 RET

To specify the number of spaces to convert to a tab, give that number as an argument to the `-t' option.

To convert every leading space character to a tab character in `list2', and write the output to the standard output, type:
$ unexpand -t 1 list2 RET
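
A quick round trip shows the two tools undoing each other. This is a sketch on a made-up one-line input; it also uses the `-t' option of expand, which sets the tab width in the same way as for unexpand.

```shell
# A line that starts with a single tab character.
tabbed=$(printf '\titem\n')

# expand turns the tab into eight spaces (the default tab width).
spaced=$(printf '%s\n' "$tabbed" | expand)

# unexpand turns the eight leading spaces back into one tab.
restored=$(printf '%s\n' "$spaced" | unexpand)

# With -t 4, expand uses tab stops every four columns instead.
narrow=$(printf '%s\n' "$tabbed" | expand -t 4)

printf '[%s]\n' "$spaced" "$narrow"
```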

14.2 Paginating Text

The formfeed character, ASCII C-l or octal code 014, is the delimiter used to paginate text. When you send text with a formfeed character to the printer, the current page being printed is ejected and a new page begins--thus, you can paginate a text file by inserting formfeed characters at a place where you want a page break to occur.

To insert formfeed characters in a text file, use the pr filter.

Give the `-f' option to omit the footer and separate pages of output with the formfeed character, and use `-h ""' to output a blank header (otherwise, the current date and time, file name, and current page number are output at the top of each page).

To paginate the file `listings' and write the output to a file called `listings.page', type:
$ pr -f -h "" listings > listings.page RET

By default, pr outputs pages of 66 lines each. You can specify the page length as an argument to the `-l' option.

To paginate the file `listings' with 43-line pages, and write the output to a file called `listings.page', type:
$ pr -f -h "" -l 43 listings > listings.page RET

NOTE: If a page has more lines than a printer can fit on a physical sheet of paper, the printer will automatically break the text at that line as well as at the places in the text where there are formfeed characters.

You can paginate text in Emacs by manually inserting formfeed characters where you want them--see Inserting Special Characters in Emacs.
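
Because the formfeed is an ordinary character, other tools can count it or split on it. The following sketch writes a made-up two-page document by hand and then inspects it; the file and variable names are assumptions for illustration.

```shell
# Hypothetical two-page document: a formfeed (\f, octal 014) separates pages.
doc=$(mktemp)
printf 'Page one, line 1\nPage one, line 2\n\fPage two, line 1\n' > "$doc"

# Count the formfeeds; the number of pages is one more than that.
breaks=$(tr -cd '\f' < "$doc" | wc -c | tr -d ' ')
pages=$((breaks + 1))
echo "$pages"

# Extract only the second page by using the formfeed as awk's record separator.
second=$(awk 'BEGIN { RS = "\f" } NR == 2 { printf "%s", $0 }' "$doc")
printf '%s\n' "$second"

rm -f "$doc"
```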

14.2.1 Placing Headers on Each Page    Putting headers on a page.

14.2.2 Placing Text in Columns    Putting text in columns.

14.2.3 Options Available When Paginating Text    More options for pagination.

14.2.1 Placing Headers on Each Page

The pr tool is a general-purpose page formatter and print-preparation utility. By default, pr outputs text in pages of 66 lines each, with headers at the top of each page containing the date and time, file name, and page number, and footers containing five blank lines.

To print the file `duchess' with the default pr preparation, type:
$ pr duchess | lpr RET

14.2.2 Placing Text in Columns

You can also use pr to put text in columns--give the number of columns to output as an option, preceded by a hyphen. Use the `-t' option to omit the printing of the default headers and footers.

To print the file `news.update' in four columns with no headers or footers, type:
$ pr -4 -t news.update | lpr RET
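
To see how the columns are filled, try pr on a short numbered list instead of a real file. This sketch assumes GNU pr and uses seq to generate the sample input; it also uses the `-a' option, which fills the columns across each row instead of down each column.

```shell
# Eight numbered lines arranged in four columns, filled down each column.
down=$(seq 1 8 | pr -4 -t)
printf '%s\n' "$down"

# The same input with -a: columns are filled across, row by row.
across=$(seq 1 8 | pr -4 -t -a)
printf '%s\n' "$across"

# The first row of the "across" layout holds the first four items.
first_row=$(printf '%s\n' "$across" | awk 'NR == 1 { print $1, $2, $3, $4 }')
down_lines=$(printf '%s\n' "$down" | wc -l | tr -d ' ')
```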

14.2.3 Options Available When Paginating Text

The following table describes some of pr's options; see the pr info for a complete description of its capabilities (see section Using the GNU Info System).

OPTION	DESCRIPTION
`+first:last`	Specify the first and last page to process; the last page can be omitted, so `+7` begins processing with the seventh page and continues until the end of the file is reached.
`-column`	Specify the number of columns to output text in, making all columns fit the page width.
`-a`	Print columns across instead of down.
`-c`	Output control characters in hat notation and print all other unprintable characters in "octal backslash" notation.
`-d`	Specify double-spaced output.
`-f`	Separate pages of output with a formfeed character instead of a footer of blank lines (63 lines of text per 66-line page instead of 56).
`-h header`	Specify the header to use instead of the default; specify `-h ""` for a blank header.
`-l length`	Specify the page length to be `length` lines (default 66). If page length is less than 11, headers and footers are omitted and existing form feeds are ignored.
`-m`	Use when specifying multiple files; this option merges and outputs them in parallel, one per column.
`-o spaces`	Set the number of spaces to use in the left margin (default 0).
`-t`	Omit the header and footer on each page, but retain existing formfeeds.
`-T`	Omit the header and footer on each page, as well as existing formfeeds.
`-v`	Output non-printing characters in "octal backslash" notation.
`-w width`	Specify the page width to use, in characters (default 72).

NOTE: It's also common to use pr to change the spacing of text (see section Spacing Text).

14.3 Underlining Text

In the days of typewriters, text that was meant to be set in an italicized font was denoted by underlining the text with underscore characters; now, it's common practice to denote an italicized word in plain text by typing an underscore character, `_', just before and after a word in a text file, like `_this_'.

Some text markup languages use different methods for denoting italics; for example, in TeX files, italicized text is often denoted with braces and the `\it' command, like `{\it this}'. (LaTeX files use the same format, but `\emph' is often used in place of `\it'.)

You can convert one form to the other by using the Emacs replace-regexp function and specifying the text to be replaced as a regexp (see section Regular Expressions--Matching Text Patterns).

To replace plaintext-style italics with TeX `\it' commands, type:

 M-x replace-regexp RET
_\([^_]+\)_ RET
{\\it \1} RET

To replace TeX-style italics with plaintext _underscores_, type:

 M-x replace-regexp RET
{\\it \([^}]+\)} RET
_\1_ RET

Both examples above used the special regexp symbol `\1', which stands for the text matched by the first `\( ... \)' construct in the previous regexp. See Info file `emacs-e20.info', node `Regexps' for more information on regexp syntax in Emacs.
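
The same kind of substitution can also be tried from the shell with sed. This is an alternative to the Emacs function, not the method described above; the sample sentence is made up, and the patterns are written as basic regular expressions.

```shell
# A sample line using plaintext-style italics.
plain='an _important_ word'

# _text_ -> {\it text}: \( \) captures the word, \1 reinserts it,
# and \\ in the replacement produces one literal backslash.
tex=$(printf '%s\n' "$plain" | sed 's/_\([^_][^_]*\)_/{\\it \1}/g')
printf '%s\n' "$tex"

# {\it text} -> _text_: the reverse conversion.
back=$(printf '%s\n' "$tex" | sed 's/{\\it \([^}][^}]*\)}/_\1_/g')
printf '%s\n' "$back"
```

As with the other filters in this chapter, sed writes to the standard output and leaves the input untouched.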

To put a literal underline under text, you need to use a text editor to insert a C-h character followed by an underscore (`_') immediately after each character you want to underline; you can insert the C-h in Emacs with the C-q function (see section Inserting Special Characters in Emacs).

When a text file contains these literal underlines, use the ul tool to output the file so that it is viewable by the terminal you are using; this is also useful for printing (pipe the output of ul to lpr).

To output the file `term-paper' so that you can view underbars, type:
$ ul term-paper RET

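To output such text without the backspace character, C-h, in the output, use col with the `-b' option. It keeps only the last character written to each column position, so a letter survives only where the underscore and C-h come before it; col reads only the standard input, so redirect the file to it.

To output the file `term-paper' with all backspace characters stripped out, type:
$ col -b < term-paper RET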

14.4 Sorting Text

You can sort a list in a text file with sort. By default, it outputs text in ascending alphabetical order; use the `-r' option to reverse the sort and output text in descending alphabetical order.

For example, suppose a file `provinces' contains the following:

Shantung
Honan
Szechwan
Hunan
Kiangsu
Kwangtung
Fukien

To sort the file `provinces' and output all lines in ascending order, type:

$ sort provinces RET
Fukien
Honan
Hunan
Kiangsu
Kwangtung
Shantung
Szechwan
$

To sort the file `provinces' and output all lines in descending order, type:

$ sort -r provinces RET
Szechwan
Shantung
Kwangtung
Kiangsu
Hunan
Honan
Fukien
$

The following table describes some of sort's options.

OPTION	DESCRIPTION
`-b`	Ignore leading blanks on each line when sorting.
`-d`	Sort in "phone directory" order, considering only letters, digits, and blanks when sorting.
`-f`	When sorting, fold lowercase letters into their uppercase equivalent, so that differences in case are ignored.
`-i`	Ignore non-printing characters when sorting.
`-n`	Sort numerically instead of by character value.
`-o file`	Write output to `file` instead of standard output.
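
The difference between sorting by character value and sorting numerically shows up as soon as the numbers have different lengths. The sketch below uses a made-up list and sets LC_ALL=C so that the character ordering does not depend on your locale.

```shell
# Three numbers of different lengths, one per line.
sizes=$(printf '10\n9\n100\n')

# Default sort compares character by character, so "100" sorts before "9".
by_char=$(printf '%s\n' "$sizes" | LC_ALL=C sort)

# -n compares the lines as numbers.
by_number=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)

# -n and -r together give descending numeric order.
descending=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n -r)

printf '%s\n' "$by_char" "$by_number" "$descending"
```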

14.5 Numbering Lines of Text

There are several ways to number lines of text.

One way to do it is to use the nl ("number lines") tool. Its default action is to write its input (either the file names given as an argument, or the standard input) to the standard output, with an indentation and all non-empty lines preceded with line numbers.

To peruse the file `report' with each line of the file preceded by line numbers, type:
$ nl report | less RET

You can set the numbering style with the `-b' option followed by an argument. The following table lists the possible arguments and describes the numbering style they select.

ARGUMENT	NUMBERING STYLE
`a`	Number all lines.
`t`	Number only non-blank lines. This is the default.
`n`	Do not number lines.
`pregexp`	Only number lines that contain the regular expression `regexp` (see section Regular Expressions--Matching Text Patterns).

The default is for line numbers to start with one, and increment by one. Set the initial line number by giving an argument to the `-v' option, and set the increment by giving an argument to the `-i' option.

To output the file `report' with each line of the file preceded by line numbers, starting with the number two and counting by fours, type:
$ nl -v 2 -i 4 report RET
To number only the lines of the file `cantos' that begin with a period (`.'), starting numbering at zero and using a numbering increment of five, and to write the output to `cantos.numbered', type:
$ nl -i 5 -v 0 -b p'^\.' cantos > cantos.numbered RET
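
The effect of these options can be checked on a three-line sample that contains one blank line. This is a sketch with made-up input; awk is used only to pull the numbers out of nl's output so they are easy to compare.

```shell
# Number all lines (-b a), starting at 2 (-v 2) and counting by fours (-i 4).
all=$(printf 'alpha\n\nbeta\n' | nl -b a -v 2 -i 4 | awk '{ print $1 }' | paste -sd ' ' -)
echo "$all"

# Default style: the blank line gets no number, so only two numbers appear.
default=$(printf 'alpha\n\nbeta\n' | nl | awk 'NF { print $1 }' | paste -sd ' ' -)
echo "$default"
```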

Another way to number lines is to use cat with one of the following two options: the `-n' option numbers each line of its input text, while the `-b' option only numbers non-blank lines.

To peruse the text file `report' with each line of the file numbered, type:
$ cat -n report | less RET
To peruse the text file `report' with each non-blank line of the file numbered, type:
$ cat -b report | less RET

In the preceding examples, output from cat is piped to less for perusal; the original file is not altered.

To take an input file, number its lines, and then write the line-numbered version to a new file, redirect the standard output of the cat command to the new file.

To write a line-numbered version of file `report' to file `report.lines', type:
$ cat -n report > report.lines RET

14.6 Reversing Text

The tac command is similar to cat, but it outputs text in reverse order. There is another difference--tac works on records, sections of text with separator strings, instead of lines of text. Its default separator string is the linebreak character, so by default tac outputs files in line-for-line reverse order.

To output the file `prizes' in line-for-line reverse order, type:
$ tac prizes RET

Specify a different separator with the `-s' option. This is often useful when specifying non-printing characters such as formfeeds. To specify such a character, use the ANSI-C method of quoting (see section Passing Special Characters to Commands).

To output `prizes' in page-for-page reverse order, type:
$ tac -s $'\f' prizes RET

The preceding example uses the formfeed, or page break, character as the delimiter, and so it outputs the file `prizes' in page-for-page reverse order, with the last page output first.

Use the `-r' option to use a regular expression for the separator string (see section Regular Expressions--Matching Text Patterns). You can build regular expressions to output text in word-for-word and character-for-character reverse order:

To output `prizes' in word-for-word reverse order, type:
$ tac -r -s '[^a-zA-Z0-9\-]' prizes RET
To output `prizes' in character-for-character reverse order, type:
$ tac -r -s '.\| RET ' prizes RET

To reverse the characters on each line, use rev.

To output `prizes' with the characters on each line reversed, type:
$ rev prizes RET
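
The two kinds of reversal are easy to confuse, so here is a side-by-side sketch on a made-up three-line list: tac changes the order of the lines, while rev changes the order of the characters within each line.

```shell
# A short sample list, one item per line.
lines=$(printf 'gold\nsilver\nbronze\n')

# tac: last line first; each line itself is unchanged.
reversed_lines=$(printf '%s\n' "$lines" | tac)

# rev: line order is unchanged; each line is spelled backwards.
reversed_chars=$(printf '%s\n' "$lines" | rev)

printf '%s\n' "$reversed_lines" "$reversed_chars"
```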
