[Foomatic] DTDs: first try

Grant Taylor gtaylor+foodev_ciebf011504 at picante.com
Thu Jan 15 14:47:35 PST 2004


Johannes Meixner <jsmeix at suse.de> writes:

> I added the following stand-alone DTDs to the CVS:

Nifty.  Interesting to see a schema definition for our goofy ad-hoc
xml ;)

> Of course there is no new syntax. I made only trivial changes
> which use the existing syntax like
>   <pcl level="..."></pcl>
> instead of
>   <pcl level="...">
>   </pcl>
> (the first is an EMPTY element, the latter is not).

Yeah, there is a lot of that.  It snuck in because one of the various
Perl modules used way back when wrote "tidy" xml out that way.  It was
otherwise tidy, so it lived on that way...
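For anyone wanting to clean these up in bulk, here is a rough sketch of
the idea in Python, using only the standard library.  The function name
and the sample data are made up for illustration; this is not part of
any foomatic tool.  It drops whitespace-only text from childless
elements, so that they no longer carry character data and could match
an EMPTY declaration.

```python
import xml.etree.ElementTree as ET


def collapse_whitespace_only(root):
    """Drop whitespace-only text from childless elements.

    Returns the number of elements that were changed.
    """
    changed = 0
    for elem in root.iter():
        # A childless element whose text is only spaces/newlines is the
        # '<pcl level="...">\n</pcl>' case described above.
        if len(elem) == 0 and elem.text is not None and not elem.text.strip():
            elem.text = None
            changed += 1
    return changed


# Hypothetical sample, not taken from the real database.
sample_root = ET.fromstring(
    '<printer><pcl level="5">\n  </pcl><make>Example</make></printer>'
)
fixed_count = collapse_whitespace_only(sample_root)
```

Elements with real text (like <make> in the sample) are left alone.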

>    (mechanism|url|lang|autodetect|functionality
>     |driver|unverified|contrib_url|buyit|comments)+

Curious; other parsers typically have such a mechanism.  Is there no
{foo|bar}+, for example?

> As far as I know it is not possible (with reasonable effort)
> to specify in the DTD that there can be several different elements
> in arbitrary order but each element can exist at most once.

I guess the DTD syntax is a holdover from when it was only a document
class description for SGML.  With XML, SGMLish text streams are now
used to represent arbitrary bundles of data, and the DTD syntax is not
such a nice fit.

Is there an XML-specific schema declaration language?  Perhaps it
doesn't have this problem.
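In the meantime, the "any order, but at most once" rule could be
checked outside the DTD.  A minimal sketch in Python follows; the
element names come from the content model quoted above, while the
function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Child element names from the quoted content model.
ONCE_ONLY = {
    "mechanism", "url", "lang", "autodetect", "functionality",
    "driver", "unverified", "contrib_url", "buyit", "comments",
}


def repeated_children(element, once_only=ONCE_ONLY):
    """Return the sorted names of once-only children that occur more than once."""
    counts = Counter(child.tag for child in element)
    return sorted(
        name for name, n in counts.items() if name in once_only and n > 1
    )


# Hypothetical sample with <url> given twice.
bad_doc = ET.fromstring(
    "<printer><url>a</url><driver>d</driver><url>b</url></printer>"
)
good_doc = ET.fromstring("<printer><driver>d</driver><url>a</url></printer>")
```

Here repeated_children(bad_doc) reports "url", while good_doc passes in
either child order.  The DTD would still do the structural checking;
this only covers the one constraint it cannot express cheaply.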

> Multiple usage of the same element name with different meaning:

That's probably fine.  Giving one "left" and another "left" different
names would let the parser distinguish between them, but that
distinction is not useful.  It's just a bit that applications will
throw away...

> Do we really need so many elements and attributes?

> What is the difference between <url> and <contrib_url>?
> What is the purpose of <buyit>?
> Do we really need <consumables> and <partno>?
> Couldn't this all be covered by simple comments?

Often, yes.  It's always hard to find the right mix of formally
organized data vs ad-hoc text details.

The buyit field is related to my affiliate links.  IIRC the current
implementation makes no use of in-xml data; there is an outboard extra
dataset of Amazon part numbers used to compute the links.

The consumables stuff is ostensibly to list ink by manufacturer part
number, which would give rise to some interesting or useful queries
like "what ink do I need?", or "what new printer will work with my
existing ink supply?", or "which suggested printers take the cheapest
ink per ml?".

The url is for a manufacturer's web page.  The contrib url is for any
sort of third-party web page.

> The user may wonder why the Okidata printers are not shown
> when he selected "laser" printers.

They are; the laser query explicitly includes LED ;)

(Actually, do we still have a laser query?  I think it went away).

> There are many pcl levels

Yes, there was a thread a while ago about it.  Really there are about
three or four vague flavours, and then a thousand variations inside
each.  No good solution.

> Attributes instead of empty elements:

> If the type of mechanism would be set by attributes like
>   <mechanism type="inkjet" color="K-CMY">
> instead of using empty elements <laser/> and <color/>

This was a programming choice derived from the mapping of the original
schema to XML.  In general every possible datum of interest exists at
a simple path like "/printer/driver" or "/printer/mechanism", instead
of sometimes at a path and sometimes at a path+attribute.  This
provided for great programming simplicity; there was a function that
took a path and returned a string value, no muss no fuss.
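To illustrate the idea (this is not the original Perl code, just a
sketch in Python with a made-up function name and made-up sample data),
a path-to-string lookup of that kind might look like this:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real printer entries carry more fields.
SAMPLE = """<printer>
  <make>Example</make>
  <mechanism>
    <inkjet/>
    <color/>
  </mechanism>
  <driver>exampledriver</driver>
</printer>"""


def get_value(root, path):
    """Return the text at an absolute path such as "/printer/driver".

    Returns None if the path does not exist and "" for an element that
    exists but is empty, so presence of <inkjet/> can be tested too.
    """
    parts = path.strip("/").split("/")
    if parts[0] != root.tag:
        return None
    node = root
    for name in parts[1:]:
        node = node.find(name)  # first direct child with that name
        if node is None:
            return None
    return (node.text or "").strip()


sample_root = ET.fromstring(SAMPLE)
```

With this, get_value(sample_root, "/printer/driver") gives the driver
name, "/printer/mechanism/inkjet" gives an empty string, and a missing
path such as "/printer/mechanism/laser" gives None.  No attribute
handling is needed anywhere, which is the simplicity being described.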

No doubt modern methods like XQL have pleasing ways to mask this
complexity, so perhaps it's less of an issue now.  However, we
shouldn't run around changing things just to make the DTD easier; for
all I know the world is full of third party foomatic-data reading
implementations.

> Empty elements which should contain data.

> 61 files which have <resolution> and <dpi> but neither <x> nor <y>.
> I think if there is <resolution> then there should be a value.

This falls out of the implementation, again.  Having
"/printer/resolution" alone is no problem for the implementation, as
the implementation looks for only "/printer/resolution/dpi".

But I don't see why we couldn't remove those if it's an issue for
something else.
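If someone wants to track those entries down, a check along these lines
would do.  This is a sketch in Python; the function name and the sample
snippets are hypothetical, and it only assumes that <dpi> sits directly
inside <resolution> as described above.

```python
import xml.etree.ElementTree as ET


def has_hollow_resolution(root):
    """True if some <resolution> has a <dpi> with neither <x> nor <y>."""
    for resolution in root.iter("resolution"):
        for dpi in resolution.findall("dpi"):
            if dpi.find("x") is None and dpi.find("y") is None:
                return True
    return False


# Hypothetical samples: one hollow entry, one with real values.
hollow = ET.fromstring(
    "<printer><resolution><dpi></dpi></resolution></printer>"
)
filled = ET.fromstring(
    "<printer><resolution><dpi><x>600</x><y>600</y></dpi></resolution></printer>"
)
```

Running this over each parsed printer file and collecting the ones that
return True should give the list of candidates to clean up.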

-- 
Grant Taylor - gtaylor<at>picante.com - http://www.picante.com/~gtaylor/
   Linux Printing Website and HOWTO:  http://www.linuxprinting.org/


