
SAXON DTDGenerator

A tool to generate XML DTDs



Purpose

DTDGenerator is a program that takes an XML document as input and produces a Document Type Definition (DTD) as output.

The aim of the program is to give you a quick start in writing a DTD. The DTD is one of the many possible DTDs to which the input document conforms. Typically you will want to examine the DTD and edit it to describe your intended documents more precisely.

The program is issued as part of the SAXON product. See the SAXON home page for download instructions.

DTDGenerator runs as a SAXON application, though in fact it exploits very few SAXON features.


Usage

Web service

You can use DTDGenerator without installing the software by submitting an XML file to the online service provided by Paul Tchistopolskii at http://www.pault.com/Xmltube/dtdgen.html.

If you use this service, ensure that the XML file you upload contains no references to other local files such as a DTD or an external entity.

Installable software

First install SAXON and a SAX-compliant parser. Make sure that the parser you have installed is listed as the default parser in the ParserManager.properties file, and that SAXON, the SAX parser, and the directory containing the DTDGenerator class are all on the class path.

From the command line, enter:

java DTDGenerator inputfile >outputfile

The input file must be an XML document; typically it will have no DTD. If it does have a DTD, the DTD may be used by the parser but it will be ignored by the DTDGenerator utility.

The output file will be an XML external document type definition.

The input file is not modified; if you want to edit it to refer to the generated DTD, you must do this yourself.
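If you want to script that step instead of editing the file by hand, one option is to re-serialize the document through a standard JAXP identity transform with the OutputKeys.DOCTYPE_SYSTEM output property set. The serializer then emits a DOCTYPE declaration naming the document element. This is a minimal sketch, not part of DTDGenerator: the class AddDoctype, the method withDoctype and the file name book.dtd are hypothetical. Re-serializing can also change the document's formatting and will expand entity references, so adding the DOCTYPE line in a text editor is often the safer choice.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: re-serializes a document so that it refers to a generated DTD.
public class AddDoctype {

    /** Returns xml re-serialized with <!DOCTYPE root SYSTEM "dtdPath">. */
    public static String withDoctype(String xml, String dtdPath) {
        try {
            // Parse the input into a DOM tree (entity references are expanded by default).
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // An identity transformer copies the tree; DOCTYPE_SYSTEM adds the declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dtdPath);

            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not rewrite document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withDoctype("<book><title>XML</title></book>", "book.dtd"));
    }
}
```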


What it does

The program makes a list of all the elements and attributes that appear in your document, noting how they are nested, and noting which elements contain character data.
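The bookkeeping described above can be pictured as a standard SAX handler that builds a table keyed by element name. This is a sketch for illustration, not the DTDGenerator source: StructureCollector, ElementInfo and collect are hypothetical names, and the sketch records only which attributes, child elements and text occur for each element name, not their order or how often they repeat. Treating whitespace-only text as formatting rather than character data is also an assumption of the sketch.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects, per element name, attributes, children and text usage.
public class StructureCollector extends DefaultHandler {

    /** What has been observed about one element type. */
    public static class ElementInfo {
        public final TreeSet<String> attributes = new TreeSet<>();
        public final TreeSet<String> children = new TreeSet<>();
        public boolean hasCharacterData = false;
    }

    public final Map<String, ElementInfo> elements = new TreeMap<>();
    private final Deque<String> open = new ArrayDeque<>(); // stack of open elements

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        ElementInfo info = elements.computeIfAbsent(qName, k -> new ElementInfo());
        for (int i = 0; i < atts.getLength(); i++) {
            info.attributes.add(atts.getQName(i));
        }
        // Record the nesting: this element is a child of the innermost open element.
        if (!open.isEmpty()) {
            elements.get(open.peek()).children.add(qName);
        }
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Assumption: whitespace-only text is layout, not character data.
        if (!open.isEmpty() && !new String(ch, start, length).isBlank()) {
            elements.get(open.peek()).hasCharacterData = true;
        }
    }

    /** Parses xml and returns the collected structure. */
    public static StructureCollector collect(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            StructureCollector collector = new StructureCollector();
            parser.parse(new InputSource(new StringReader(xml)), collector);
            return collector;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book id='b1'><title>XML</title><author>Kay</author></book>";
        collect(xml).elements.forEach((name, info) ->
                System.out.println(name + " attrs=" + info.attributes
                        + " children=" + info.children + " text=" + info.hasCharacterData));
    }
}
```

For the sample in main, book is reported with the attribute id, the children author and title, and no character data, while title and author are reported as containing character data. Turning such a table into element and attribute declarations is the job of the generation rules, which this sketch does not attempt.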

When the document has been completely processed, the DTD is generated according to the following rules:

The resulting DTD will often contain rules that are either too restrictive or too liberal. The DTD may be too restrictive if it prohibits constructs that do not appear in this document, but might legitimately appear in others. It may be too liberal if it fails to detect patterns that are inherent to the structure: for example, the order of elements within a parent element. These limitations are inherent in any attempt to infer general rules from a particular example document.
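One practical way to find out whether a DTD is too restrictive is to validate further sample documents against it with a validating parser. The sketch below uses the standard JAXP SAXParserFactory with validation switched on and serves the candidate DTD from memory through an entity resolver. It assumes each sample already carries a DOCTYPE declaration (which, as noted above, DTDGenerator does not add); DtdCheck, validate and the file name generated.dtd are hypothetical, and the DTD in main is hand-written for illustration, not real DTDGenerator output.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks a sample document against a candidate DTD.
public class DtdCheck {

    /** Returns the validity errors reported for xml; an empty list means it conforms. */
    public static List<String> validate(String xml, String dtdText) {
        List<String> errors = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Serve the candidate DTD for any external entity (the sketch assumes
                // the DTD is the only one the sample refers to).
                return new InputSource(new StringReader(dtdText));
            }

            @Override
            public void error(SAXParseException e) {
                // Validity errors are reported here; collect them instead of stopping.
                errors.add(e.getMessage());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            errors.add(String.valueOf(e.getMessage())); // well-formedness or I/O failure
        }
        return errors;
    }

    public static void main(String[] args) {
        String dtd = "<!ELEMENT book (title, author)>\n"
                + "<!ELEMENT title (#PCDATA)>\n"
                + "<!ELEMENT author (#PCDATA)>";
        String twoAuthors = "<!DOCTYPE book SYSTEM \"generated.dtd\">"
                + "<book><title>XML</title><author>A</author><author>B</author></book>";
        System.out.println(validate(twoAuthors, dtd));
    }
}
```

In main, a DTD written from a single-author sample rejects a book with two authors: exactly the kind of legitimate construct that a DTD inferred from one document can prohibit.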

In general, therefore, you will need to iterate the process. You have a choice:

In a few unusual cases DTDGenerator will create a DTD which is invalid, or one to which the document does not conform. You will then have to edit the DTD before you can use it. The known cases are:


Michael H. Kay
9 January 2001