rus-linux.net site library
B.3. Validation
B.3.1. Why Validate Your Document
The LDP uses a number of scripts to distribute your document. These scripts submit your document to the LDP's CVS (a free version control system) and then transform it into the other formats that users read. Your document will also be mirrored on a number of sites worldwide (by yet another set of scripts).
For these scripts to work correctly, your document must be both "well formed" and use "valid markup". Well formed means your document follows XML's grammar rules: for example, every tag that is opened is also closed, and elements are properly nested. Valid markup means you use only the elements (tags) that are allowed for your document type: the vocabulary rules defined in its DTD are applied.
If your document is not well formed or uses invalid markup, the scripts will not be able to process it. As a result, your revised document will not be distributed.
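The difference between the two checks can be sketched in Python. This is a minimal illustration, not how the LDP scripts work: well-formedness is tested with the standard library's xml.etree.ElementTree parser, while validity is approximated by a hypothetical ALLOWED set of element names standing in for a DTD. A real DTD also constrains nesting, order and attributes, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy vocabulary standing in for a DTD. A real DTD also
# restricts nesting, element order and attributes; this sketch does not.
ALLOWED = {"article", "articleinfo", "title", "para", "itemizedlist", "listitem"}


def check(text):
    """Return 'ok', a well-formedness error, or a vocabulary error."""
    try:
        root = ET.fromstring(text)  # grammar rules: tags closed and nested
    except ET.ParseError as err:
        line, column = err.position
        return f"not well formed (line {line}, column {column})"
    for element in root.iter():  # vocabulary rules: only known elements
        if element.tag not in ALLOWED:
            return f"invalid: <{element.tag}> is not in the vocabulary"
    return "ok"
```

For example, `<article><paragraph>Hi</paragraph></article>` is well formed but fails the vocabulary check, while `<article><para>Hi</article>` fails before the vocabulary is even consulted.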
The DocBook Section
There is more information about how to validate your document in the DocBook section. Check out Section B.3 for more help with validating your document.
B.3.2. Validation for the Faint of Heart
Your life is already hard enough without having to install a full set of tools just to check whether your document validates. You can upload your raw XML files to a web site, then go to http://validate.sf.net, enter the URL of your document, and validate it.
External entities
When this information was added to the Author Guide, external entities were not supported. Follow the instructions provided on the Validate site if you have trouble.
B.3.3. Validation for the Not So Faint Of Heart
B.3.3.1. Catalogs
XML and SGML files contain most of the information you need; however, they sometimes use entities that are defined by SGML in general rather than in the document itself. To match these entities to their actual values, you need a catalog. A catalog tells your system where to find the files it is looking for; you can think of it as a guide book (or a map) for your tools.
Most distributions (Red Hat/Fedora and Debian at least) have a common location for the main SGML catalog file: /etc/sgml/catalog. In times past, it could also be found in /usr/lib/sgml/catalog.
The structure of XML catalog files is not the same as SGML catalog files. The section on tailoring a catalog (see Section B.3.4) will give more details about what these files actually contain.
If your system cannot find the catalog file, or you are using custom catalog files, you may need to set the SGML_CATALOG_FILES and XML_CATALOG_FILES environment variables. Use echo $SGML_CATALOG_FILES to check whether it is currently set. If a blank line is returned, the variable has not been set. Use the same command to check whether XML_CATALOG_FILES is set as well. If the variables are not set, use the following example to set them now.
Example B-1. Setting the SGML_CATALOG_FILES and XML_CATALOG_FILES Environment Variables
To make this change permanent, you can add the following lines to your ~/.bashrc file.
If you installed your XML tools via a Red Hat or Debian package, you probably don't need to do this step. If you are using a custom XML catalog, you will definitely need to do this. There is more on custom catalogs in the next section. To ensure that my backup scripts grab this custom file, I keep mine in a subdirectory of my home directory named "docbook".
You can also change your .bashrc if you want to save these changes.
If you are adding the changes to your .bashrc, you will not see them until you open a new terminal window. To make the changes take effect immediately in the current terminal, "source" the configuration file.
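To double-check what your shell actually exports, a small Python sketch can list each catalog variable and whether the files it names exist. It assumes that XML_CATALOG_FILES is a space-separated list (the form libxml2 reads) and that SGML_CATALOG_FILES is separated by the platform path separator (":" on Linux); check your tools' documentation if your setup differs. The function name report_catalog_vars is hypothetical.

```python
import os

# Assumed separators: whitespace for XML_CATALOG_FILES (libxml2 style),
# the OS path separator (":" on Linux) for SGML_CATALOG_FILES.
SEPARATORS = {"XML_CATALOG_FILES": None, "SGML_CATALOG_FILES": os.pathsep}


def report_catalog_vars(env=os.environ):
    """Return {variable: [(path, exists), ...]}; an empty list means unset."""
    report = {}
    for var, sep in SEPARATORS.items():
        value = env.get(var, "").strip()
        paths = value.split(sep) if value else []  # split(None) = any whitespace
        report[var] = [(p, os.path.isfile(p)) for p in paths if p]
    return report
```

Calling report_catalog_vars() with no argument inspects the current environment; run it from a new terminal, or after sourcing ~/.bashrc, to confirm the changes took effect.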
B.3.4. Creating and modifying catalogs
In the previous section I mentioned that a catalog is like a guide book for your tools. Specifically, a catalog maps public identifiers (and system identifiers) to files on your system.
At the top of every DocBook file (and of many other XML files) there is a DOCTYPE declaration, which tells the processing tool what kind of document it is about to process. At a minimum, this declaration includes a public identifier, such as -//OASIS//DTD DocBook V4.2//EN. This public identifier has a number of fields, all separated by //. It contains the following information: the ISO standard, if any (- in this case, meaning there is no ISO standard), the author (OASIS), the type of document (DTD DocBook V4.2), and the language (English). Your DOCTYPE may also include a URL.
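The field breakdown above can be expressed as a short Python sketch. It assumes the common `-//Owner//Class Description//Language` form and deliberately rejects identifiers owned by an ISO standard (such as the ISO entity sets), which have a different number of fields. PublicId and parse_public_id are hypothetical names.

```python
from typing import NamedTuple


class PublicId(NamedTuple):
    registration: str  # "-" = not an ISO standard / unregistered, "+" = registered
    owner: str         # e.g. "OASIS"
    text_class: str    # e.g. "DTD"
    description: str   # e.g. "DocBook V4.2"
    language: str      # e.g. "EN"


def parse_public_id(public_id: str) -> PublicId:
    """Split a '-//Owner//Class Description//Language' public identifier."""
    fields = public_id.split("//")
    if len(fields) != 4 or fields[0] not in ("-", "+"):
        raise ValueError(f"unsupported public identifier: {public_id!r}")
    registration, owner, text, language = fields
    # The first word of the third field is the class (DTD, ENTITIES, ...).
    text_class, _, description = text.partition(" ")
    return PublicId(registration, owner, text_class, description, language)
```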
On its own, a public identifier is useless to a processing tool, because the tool needs to access the actual DTD. A URL is useless if the processing tool is offline. To help your processor deal with these problems, you can download all of the necessary files and then "map" them for your processing tools by using a catalog.
If you are using SGML processing tools (for instance Jade), you will need an SGML catalog. If you are using XML processing tools (such as an XSLT processor like xsltproc), you will need an XML catalog. Information on both is included below.
B.3.4.1. SGML Catalogs
Example B-2. Example of an SGML catalog
As in the example above, to associate an identifier with a file, follow this sequence:
Type the keyword PUBLIC
Type the public identifier text, in quotes
Give the path to the associated file
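The three-step sequence maps directly onto one catalog line. The Python sketch below builds such a line and looks an identifier up again. It is a simplified illustration that assumes one entry per line, no `--` comments, and that the first matching entry wins; the file path in the usage note is hypothetical.

```python
import shlex


def public_entry(public_id, path):
    """Build one PUBLIC line: keyword, quoted identifier, path to the file."""
    return f'PUBLIC "{public_id}" "{path}"'


def lookup_public(catalog_text, public_id):
    """Return the file mapped to public_id, or None.

    Simplified: one entry per line, no comments, only PUBLIC entries are
    examined, and the first matching entry is returned.
    """
    for line in catalog_text.splitlines():
        tokens = shlex.split(line)  # handles the quoted identifier
        if len(tokens) == 3 and tokens[0].upper() == "PUBLIC" and tokens[1] == public_id:
            return tokens[2]
    return None
```

For instance, public_entry("-//OASIS//DTD DocBook V4.2//EN", "/opt/dtd/docbook.dtd") yields `PUBLIC "-//OASIS//DTD DocBook V4.2//EN" "/opt/dtd/docbook.dtd"`, and lookup_public on a catalog containing that line returns the path.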
B.3.4.1.1. Useful commands for catalogs
The most common keywords used in catalogs are:
PUBLIC
The keyword PUBLIC maps public identifiers to files on the system.
SYSTEM
The keyword SYSTEM maps system identifiers to files on the system.
SGMLDECL
The keyword SGMLDECL designates the system identifier of the SGML declaration that should be used.
DTDDECL
Similar to SGMLDECL, the keyword DTDDECL identifies the SGML declaration that should be used, but it associates that declaration with the public identifier of a specific DTD. Unfortunately, this association isn't supported by the available open source tools; you can get much the same benefit by using multiple catalog files.
CATALOG
The keyword CATALOG allows one catalog to be included inside another. This lets you use several different catalogs without having to alter them.
OVERRIDE
The keyword OVERRIDE indicates whether a public identifier takes priority over a system identifier. The default on most systems is that the system identifier has priority over the public one.
DELEGATE
The keyword DELEGATE associates a catalog with public identifiers that begin with a specific prefix. DELEGATE is very similar to CATALOG, except that the delegated catalog is consulted only for identifiers matching that prefix.
DOCTYPE
If a document's DOCTYPE declaration names a document type but gives neither a public identifier nor a system identifier, the keyword DOCTYPE associates that document type with a specific DTD.
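The effect of OVERRIDE can be sketched as a small decision function. This is a simplification under stated assumptions: the catalog's PUBLIC entries are already loaded into a dictionary, SYSTEM, DELEGATE and CATALOG entries are ignored, and `override=False` models the default described above, in which a system identifier given in the document takes priority. The function name resolve is hypothetical.

```python
def resolve(public_map, public_id=None, system_id=None, override=False):
    """Pick the file a tool should load for a DOCTYPE (simplified model).

    override=False: the document's system identifier wins when present.
    override=True (OVERRIDE YES): a catalog PUBLIC entry wins instead.
    """
    if system_id and not override:
        return system_id                   # default: system identifier first
    if public_id in public_map:
        return public_map[public_id]       # mapped through a PUBLIC entry
    return system_id                       # None when nothing matches
```

With a catalog mapping "-//OASIS//DTD DocBook V4.2//EN" to a local file, a document that also names a remote URL gets the URL by default and the local file when override=True; this is why OVERRIDE YES is useful when working offline.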
B.3.4.2. XML Catalogs
The following sample catalog was provided by Martin A. Brown.
Example B-3. Sample XML Catalog file
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
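The body of the sample catalog is not reproduced here, so the sketch below assumes the standard OASIS XML Catalogs form. In that form, `<public publicId="..." uri="..."/>` elements inside a `<catalog>` root in the `urn:oasis:names:tc:entity:xmlns:xml:catalog` namespace map public identifiers to files. The sketch reads those entries with Python's xml.etree.ElementTree. Other entry types (system, rewriteSystem, delegatePublic, nextCatalog) and xml:base are ignored, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs (catalog.dtd V1.0, as in the DOCTYPE above).
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def xml_catalog_public_map(catalog_xml: str) -> dict:
    """Map publicId -> uri for every <public> entry in an XML catalog.

    Only <public> entries are read; xml:base and all other entry types
    are ignored in this sketch.
    """
    root = ET.fromstring(catalog_xml)
    mapping = {}
    for entry in root.iter(f"{{{CATALOG_NS}}}public"):
        # Keep the first entry if an identifier appears more than once.
        mapping.setdefault(entry.get("publicId"), entry.get("uri"))
    return mapping
```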
B.3.5. Validating XML
B.3.5.1. nsgmls
You can use nsgmls, which is part of the jade suite (on Debian, install the docbook-utils package with apt-get; see Section B.4.2), to validate SGML or XML documents.
If there are no issues, you'll just get your command prompt back. The -s option tells nsgmls to show only the errors.
Function not found
If you get errors about a function not being found, or something about an ISO character not having an authoritative source, you may need to point nsgmls to your xml.dcl file.
For more information on processing files with Jade/OpenJade please read DocBook XML/SGML Processing Using OpenJade.
B.3.5.2. onsgmls
This is an alternative to nsgmls. It ships with the OpenJade package. This program gives more options than nsgmls and allows you to quietly ignore a number of problems that arise while trying to validate an XML file (as opposed to an SGML file). This also means you don't have to type out the location of your xml.dcl file each time.
I was able to use the following command to validate a file and get only the error messages related to my markup errors.
According to Bob Stayton, you can also turn off specific error messages. The following example turns off XML-specific error messages.
B.3.5.3. xmllint
You can also use the xmllint command-line tool from the libxml2 package to validate your documents. By default, it performs a simple check that the document is well formed: that tags are complete and that every tag that is opened is also closed again. It then prints the resulting document tree. So if your document is printed all the way to the last line, you know there are no serious errors such as mismatched tags or missing opening and closing tags.
To prevent printing the entire document to your screen, add the --noout parameter.
If nothing is returned, your document contains no syntax errors. Otherwise, start with the first error that was reported. Fix that one error and run the tool on your document again. If it still returns output, again fix only the first error that you see; don't bother with the rest, since later errors are usually caused by the first one.
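The "fix the first error, then rerun" routine can be mimicked with Python's standard XML parser, which also stops at the first well-formedness error. This sketch is not xmllint and does not validate against a DTD. It only reports where parsing first failed and shows the offending line with a caret under the column the parser reports (zero-based). The function name first_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_error(text):
    """Return None if text is well formed, else a report of the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column 0-based
        lines = text.splitlines()
        source = lines[line - 1] if 0 < line <= len(lines) else ""
        return f"line {line}: {err}\n{source}\n{' ' * column}^"
    return None
```

To check a file, pass its contents, for example `first_error(pathlib.Path("mydoc.xml").read_text(encoding="utf-8"))`, where mydoc.xml is a placeholder name. Fix the reported problem and call it again until it returns None.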
If you would like to check your document for any errors that are specific to your Document Type Definition (in other words, to validate it rather than only check that it is well formed), add --valid.
The xmllint tool may also be used for checking errors in the XML catalogs, see the man pages for more info on how to set this behavior.
If you are a Mac OS X or Windows user, you may also want to check out tkxmllint, a GUI version of xmllint. More information is available from: http://tclxml.sourceforge.net/tkxmllint.html.
Example B-4. Debugging example using xmllint
The example below shows how you can use xmllint to check your documents. I've deliberately introduced some errors that I made a lot as a beginning XML writer. On the first run, the document doesn't get through, and errors are shown:
Now, as already mentioned, don't worry about anything except the first error. The first error reports an inconsistency between the tags on line 6 and line 22 of the file. Indeed, on line 6 we left out the "e" in "articleinfo". Fix the error and run xmllint again. The first complaint is now about line 37, where the closing tag for a list item has been forgotten. Fix that error and run the validation tool again, until all errors are gone. The most common errors are forgetting to open or close a paragraph tag, misspelling tag names, and messing up the section structure.
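The message about line 6 and line 22 pairs the line where a tag was opened with the line where the wrong closing tag appeared. A simplified stack-based matcher in Python shows where such pairs come from. It is an illustration only: it assumes a document without comments, CDATA sections, namespace prefixes, or `>` inside attribute values, so use xmllint for real checks. The function name first_mismatch is hypothetical.

```python
import re

# Matches <name ...>, </name> and <name .../>; groups: "/", name, "/".
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>")


def first_mismatch(text):
    """Return a message for the first unbalanced tag, or None if balanced."""
    stack = []  # open tags as (name, line)
    for m in TAG.finditer(text):
        closing, name, self_closing = m.groups()
        line = text.count("\n", 0, m.start()) + 1
        if self_closing:
            continue  # <tag/> opens and closes itself
        if not closing:
            stack.append((name, line))
        elif not stack:
            return f"line {line}: </{name}> has no matching opening tag"
        else:
            open_name, open_line = stack.pop()
            if open_name != name:
                return (f"line {open_line}: <{open_name}> does not match "
                        f"</{name}> on line {line}")
    if stack:
        name, line = stack[-1]
        return f"line {line}: <{name}> is never closed"
    return None
```

Like xmllint, the sketch stops at the first problem, because once one tag is unbalanced every later comparison is unreliable.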