[Foomatic] DTDs: first try

Johannes Meixner jsmeix at suse.de
Thu Jan 15 08:18:41 PST 2004


Hello,

I added the following stand-alone DTDs to the CVS:

foomatic-db/db/source/driver/driver.dtd
foomatic-db/db/source/opt/opt.dtd
foomatic-db/db/source/printer/printer.dtd

These DTDs are meant only as a first try to have something
to start with.

At the moment I do not want to change all XML files to contain
a reference to the matching DTD, therefore the DTDs can be used
only as external DTDs.

When you are in the matching sub-directory you can use
a command like

dtd=driver.dtd    # or opt.dtd or printer.dtd, matching the sub-directory
for i in *.xml
do xmllint --dtdvalid "$dtd" --noout "$i"
done

to validate all XML files against the matching DTD.


I had to make some trivial syntactic changes to
several XML files; otherwise there would have been
too many weird exceptions in the DTDs.

driver:
  sharp.upp.xml

opt:
  jap-PageSize.xml  jap-Resolution.xml

printer:
  Canon-LBP-1760.xml          Epson-ActionLaser_1100.xml
  HP-Color_LaserJet_5000.xml  HP-DesignJet_230.xml
  HP-DesignJet_430.xml        HP-DesignJet_700.xml
  HP-LaserJet_3200.xml        HP-LaserJet_3200se.xml
  Minolta-PagePro_1100.xml    Minolta-PagePro_8L.xml
  NEC-SuperScript_1800.xml    NEC-SuperScript_750C.xml
  NEC-SuperScript_860.xml     NEC-SuperScript_870.xml
  Okidata-OL400.xml           Okidata-OL400e.xml
  Okidata-OL400ex.xml         Okidata-OL600e.xml
  Okidata-OL820.xml           Olivetti-JP470.xml
  Panasonic-KX-PS600.xml      Ricoh-Aficio_220.xml
  Samsung-ML-85.xml           Sharp-AJ-1800.xml
  Sharp-AJ-1805.xml           Sharp-AJ-2000.xml
  Sharp-AJ-2005.xml           Sharp-AJ-2100.xml
  Tektronix-Phaser_PX.xml     Xerox-DocuPrint_M750.xml

Of course there is no new syntax. I made only trivial changes
which use the existing syntax like
  <pcl level="..."></pcl>
instead of
  <pcl level="...">
  </pcl>
(the first is an EMPTY element; the latter contains whitespace
and therefore is not an EMPTY element).
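
To find further files that need this kind of change, a check along
the following lines might help. It is only a sketch in Python using
the standard xml.etree.ElementTree module; the function name
whitespace_only_leaves and the sample document are made up for
illustration and are not part of Foomatic. It lists leaf elements
whose only content is whitespace, which is the case that fails for
an element declared EMPTY.

```python
import xml.etree.ElementTree as ET


def whitespace_only_leaves(xml_text):
    """Return (tag, attributes) of leaf elements that hold only whitespace."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        has_children = len(elem) > 0
        text = elem.text
        # text is None for <a></a> and <a/>, but a newline for a tag pair
        # that is split over two lines.
        if not has_children and text is not None and text.strip() == "":
            found.append((elem.tag, dict(elem.attrib)))
    return found


# Hypothetical sample: the second pcl element contains a line break.
sample = '<root><pcl level="3"></pcl><pcl level="5">\n</pcl></root>'
print(whitespace_only_leaves(sample))
```

Only the second pcl element is reported, because the first one has
no content at all.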




While making the DTDs I found five "problems". I think that
a big benefit of making DTDs is that it reveals problematic
syntax in the XML files.




1.
Random ordering versus minimal/maximal number of elements:


At the moment the definitions in the DTDs are often loose
to allow random ordering of the elements.

For example in printer.dtd:

<!ELEMENT printer
  (make,model,
   (mechanism|url|lang|autodetect|functionality
    |driver|unverified|contrib_url|buyit|comments)+
  )>

means that in the printer element
- there must first be a make element,
- followed by a model element, and
- then there must be at least one of the elements
    mechanism url lang autodetect functionality driver
    unverified contrib_url buyit comments
  but there is no limit on how often each of those elements
  may occur, and no particular sequence is required.

For example

<printer id="printer/test">
  <make>Foo</make>
  <model>Bar</model>
  <functionality>A</functionality>
  <functionality>F</functionality>
</printer>

is valid according to the DTD.

The problem is:
As far as I know it is not possible (with reasonable effort)
to specify in the DTD that there can be several different elements
in arbitrary order but each element can exist at most once.

Example:
It is easy to specify the element person as follows:

<!ELEMENT person (title?,surname,forename+,nickname*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
  <!ELEMENT forename (#PCDATA)>
  <!ELEMENT nickname (#PCDATA)>

but this is a list (comma separated) and the list results in
a fixed ordering of the elements in the list - e.g.

<person>
<forename>Johannes</forename>
<surname>Meixner</surname>
</person>

is not valid according to the DTD.

To allow random ordering I use a choice (separated by "|")

<!ELEMENT person (title|surname|forename|nickname)+>

but now I don't know how to specify the minimal/maximal number
of each element, so that now

<person>
<surname>Johannes</surname>
<surname>Meixner</surname>
</person>

would be valid.
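
One possible workaround is to keep the loose choice in the DTD and
to check the counts in a separate step. The following is a minimal
sketch in Python (standard library only), not an existing Foomatic
tool. The table PERSON_LIMITS and the function count_violations are
hypothetical names; the limits simply restate
(title?,surname,forename+,nickname*) without the fixed ordering.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# (min, max) occurrences per child element; None means "no upper limit".
# These limits mirror (title?,surname,forename+,nickname*).
PERSON_LIMITS = {
    "title": (0, 1),
    "surname": (1, 1),
    "forename": (1, None),
    "nickname": (0, None),
}


def count_violations(elem, limits):
    """Return messages for children that occur too rarely or too often."""
    counts = Counter(child.tag for child in elem)
    problems = []
    for tag, (low, high) in limits.items():
        n = counts[tag]
        if n < low:
            problems.append(f"{tag}: {n} found, at least {low} required")
        if high is not None and n > high:
            problems.append(f"{tag}: {n} found, at most {high} allowed")
    return problems


person = ET.fromstring(
    "<person><surname>Johannes</surname><surname>Meixner</surname></person>"
)
for message in count_violations(person, PERSON_LIMITS):
    print(message)
```

The order of the children does not matter here, so a person with
forename before surname passes, while the example with two surname
elements is reported.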




2.
Multiple usage of the same element name with different meaning:


For example in printer.dtd:

<!ELEMENT general
 (((ieee1284|manufacturer|model|description|commandset)+)
 |((unit|top|bottom|left|right)+))>

was necessary because the element general is used with two
different meanings:

To specify margins:
In this case it is used with this syntax:
<!ELEMENT general (unit|top|bottom|left|right)+>

To specify autodetection defaults:
In this case it is used with a different syntax:
<!ELEMENT general (ieee1284|manufacturer|model|description|commandset)+>

The problem is:
As far as I know it is not possible to specify different
syntax for one element depending on the context.

Therefore I specified the element general as shown above but now

<printer id="printer/test2">
  <make>Foo</make>
  <model>Bar</model>
  <mechanism>
    <margins>
      <general>
        <ieee1284>MFG:Foo;MDL:Bar;</ieee1284>
        <manufacturer>Foo</manufacturer>
        <model>Foo Bar</model>
        <description>Foo Bar</description>
        <commandset>PCL</commandset>
      </general>
    </margins>
  </mechanism>
  <autodetect>
      <general>
        <unit>pt</unit>
        <left>10</left>
        <right>20</right>
        <top>30</top>
        <bottom>40</bottom>
      </general>
  </autodetect>
</printer>

is valid according to the DTD.

Note that the element model is used twice as well but this is
no problem because in printer.dtd "<!ELEMENT model (#PCDATA)>"
specifies the element model to hold any plain data.
Therefore it is perfectly o.k. that several model elements
may contain different data.
This is because the syntax of the model element is always the same.
In contrast the syntax of the element general is different
depending on the context.
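
The context-dependent syntax of general could be checked outside
the DTD. The following is only a sketch in Python (standard library
only); the table GENERAL_CHILDREN and the function name are made up
for illustration, and the allowed children are taken from the two
declarations of general shown above.

```python
import xml.etree.ElementTree as ET

# Allowed children of <general>, keyed by the tag of its parent element.
GENERAL_CHILDREN = {
    "margins": {"unit", "top", "bottom", "left", "right"},
    "autodetect": {"ieee1284", "manufacturer", "model",
                   "description", "commandset"},
}


def misplaced_general_children(root):
    """Return (parent tag, child tag) pairs that do not fit the context."""
    problems = []
    for parent in root.iter():
        allowed = GENERAL_CHILDREN.get(parent.tag)
        if allowed is None:
            continue
        for general in parent.findall("general"):
            for child in general:
                if child.tag not in allowed:
                    problems.append((parent.tag, child.tag))
    return problems


# Shortened variant of the swapped example above.
doc = ET.fromstring(
    "<printer><mechanism><margins><general>"
    "<manufacturer>Foo</manufacturer>"
    "</general></margins></mechanism>"
    "<autodetect><general><unit>pt</unit><left>10</left></general>"
    "</autodetect></printer>"
)
print(misplaced_general_children(doc))
```

Each reported pair names the context and the child that does not
belong there, so the swapped example is caught although it is valid
according to the DTD.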


The element printer, for example, has the same problem.

The syntax of the element printer is different depending on
whether it is used in an XML file in foomatic-db/db/source/printer
(see foomatic-db/db/source/printer/printer.dtd) or in an XML file in
foomatic-db/db/source/opt (see foomatic-db/db/source/opt/opt.dtd).

In fact all XML files in all subdirectories together form one single
XML document (which is split into many convenient XML files).

This matches exactly what Grant wrote: it is in fact
one single database.




3.
Do we really need so many elements and attributes?


For example in printer.dtd:

What is the difference between <url> and <contrib_url>?
What is the purpose of <buyit>?
Do we really need <consumables> and <partno>?
Couldn't this all be covered by simple comments?

The mechanism can be one of
  dotmatrix
  impact
  inkjet
  laser
  led
  sublimation
  thermal
  transfer
I think almost no user cares about such detailed differences.
I think dotmatrix, inkjet and laser would be sufficient because
when a user is looking for what he calls a "laser-printer"
then he doesn't want to distinguish between real "laser" and "led".
The user may wonder why the Okidata printers are not shown
when he selected "laser" printers.
Further technical details should be covered by comments.
For example there are several sub-types of inkjet printers like
inkjet (HP's method) and bubblejet (Canon's method)
and additionally there are several color-sub-types like
  K-only CMY-only K-or-CMY CMYK CMYK-or-CMYcm CMYcm CMYKcm
(not only for inkjets but also for color laser printers).
I think all such detailed information could be in comments.

There are many pcl levels
  ?
  3?
  3
  3+
  3c
  3e
  4?
  4
  4/5
  4.5
  5
  5/6
  5c
  5e
  5e/6
  5E,6
  5e (gray only)
  6
  6, 5e
  6/XL
  III+
Therefore I used
  <!ELEMENT pcl EMPTY>
  <!ATTLIST pcl level CDATA #IMPLIED>
(i.e. any value for the pcl level is allowed)
instead of a fixed list of allowed values as in
  <!ELEMENT postscript (ppd*)>
  <!ATTLIST postscript level (1|2|3) "2">




4.
Attributes instead of empty elements:


For example the XML files in foomatic-db/db/source/printer:

If the type of mechanism were set by attributes like
  <mechanism type="inkjet" color="K-CMY">
instead of using empty elements <laser/> and <color/>
then I could define exactly the possible values in the DTD like
  <!ELEMENT mechanism (resolution?,margins?,consumables?)>
  <!ATTLIST mechanism type (dotmatrix|inkjet|laser) #REQUIRED>
  <!ATTLIST mechanism color (K|CMY|K-CMY|CMYK) #REQUIRED>

If the functionality were set by an attribute for the
driver element like
  <driver functionality="A">hpijs</driver>
then I could define exactly the possible values in the DTD like
  <!ELEMENT driver (#PCDATA)>
  <!ATTLIST driver functionality (A|B|D|F) #IMPLIED>

Note that this does not require the driver element to
contain data, i.e. for a "paperweight" printer the entry would be
  <driver functionality="F"></driver>

Additionally we could have multiple driver elements
each with the appropriate functionality like
  <driver functionality="A">hpijs</driver>
  <driver functionality="D">deskjet</driver>
and the details are shown in comments like
  <comments>
    <en>
    To get maximum quality, use &quot;hpijs&quot; &lt;p&gt;
    </en>
    <de>
    Fuer beste Qualitaet: &quot;hpijs&quot; &lt;p&gt;
    </de>
    <en>
    To save system resources when printing only black and white
    you can use &quot;deskjet&quot; &lt;p&gt;
    </en>
    <de>
    Um bei reinem Schwarzweissdruck Systemresourcen zu schonen
    kann &quot;deskjet&quot; verwendet werden.&lt;p&gt;
    </de>
  </comments>
The overall functionality for a particular printer model
is the best of the existing functionality attributes
(i.e. "A" in the example above).
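
How the overall functionality could be derived is sketched below in
Python (standard library only). This is an illustration, not existing
Foomatic code. It assumes that the grades rank from best to worst in
the order A, B, D, F; the text above only states that "A" beats "D"
and that "F" stands for a paperweight. The names GRADES and
overall_functionality are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: grades are ordered from best to worst.
GRADES = ["A", "B", "D", "F"]


def overall_functionality(printer):
    """Return the best grade among the driver elements, or None."""
    grades = [d.get("functionality") for d in printer.findall("driver")]
    known = [g for g in grades if g in GRADES]
    if not known:
        return None
    # The grade with the smallest index in GRADES is the best one.
    return min(known, key=GRADES.index)


printer = ET.fromstring(
    '<printer id="printer/test">'
    '<driver functionality="A">hpijs</driver>'
    '<driver functionality="D">deskjet</driver>'
    "</printer>"
)
print(overall_functionality(printer))
```

A printer without any graded driver element yields None here; whether
that case should be treated as "F" or as unknown is left open.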


I do not recommend using too many attributes.
For example it would be possible to make the driver name
an attribute with a fixed list of values, to make sure that
only valid driver names can be entered.
But I do not recommend doing this because this way
each new driver must be added to the DTD file.
I.e. a user (e.g. a developer of a new driver) would not be able
to make an entry with his new driver unless we added his driver
name to our DTD.
On the other hand this way we could make sure that only free
drivers can be used but I don't think we should do it this way.




5.
Empty elements which should contain data.


For example the XML files in foomatic-db/db/source/printer:

61 files which have <resolution> and <dpi> but neither <x> nor <y>.
I think if there is <resolution> then there should be a value.

811 files which have <autodetect> but none of <general>
<parallel> <usb> <snmp>.
I think if there is <autodetect> then there should be at
least one of <general> <parallel> <usb> <snmp>.

8 files which have <parallel> but none of <ieee1284> <manufacturer>
<model> <description> <commandset>.
I think if there is <parallel> then there should be at least one of
<ieee1284> <manufacturer> <model> <description> <commandset>.

I made the DTDs so that such empty elements are allowed.
But I think it would be better to have either the element
with at least one associated sub-element or nothing at all.
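
Counts like the ones above can be reproduced with a small script.
The following is a minimal sketch in Python (standard library only)
and not the method that was actually used. The table EXPECTED and
the function empty_containers are hypothetical names, and it is
assumed that <x> and <y> are direct children of <dpi>.

```python
import xml.etree.ElementTree as ET

# Containers and the sub-elements of which at least one is expected.
# Assumption: <x> and <y> sit directly inside <dpi>.
EXPECTED = {
    "dpi": {"x", "y"},
    "autodetect": {"general", "parallel", "usb", "snmp"},
    "parallel": {"ieee1284", "manufacturer", "model",
                 "description", "commandset"},
}


def empty_containers(root):
    """Return tags of containers that have none of their expected children."""
    found = []
    for elem in root.iter():
        expected = EXPECTED.get(elem.tag)
        if expected is None:
            continue
        if not any(child.tag in expected for child in elem):
            found.append(elem.tag)
    return found


# Made-up test document with an empty <dpi> and an empty <parallel>.
doc = ET.fromstring(
    "<printer>"
    "<mechanism><resolution><dpi></dpi></resolution></mechanism>"
    "<autodetect><parallel></parallel></autodetect>"
    "</printer>"
)
print(empty_containers(doc))
```

Running this over every file in foomatic-db/db/source/printer and
collecting the file names with a non-empty result would give the
list of files that need attention.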



Regards
Johannes Meixner
-- 
SUSE LINUX AG, Maxfeldstrasse 5                 Mail: jsmeix at suse.de
90409 Nuernberg, Germany                    WWW: http://www.suse.de/




