Original Form in XML

Paul & Eleanor goodhill@xmission.com
Fri, 12 Jan 2001 22:29:59 -0700

Garry Petrie wrote:
>  > John Halleck wrote:
>  >  Keeping a clean unmodified original text, and letting programs produce
>  >  the result of the parse, instead of trying to mark up the original line
>  >  with what it was identified with was a great help when it came to things
>  >  like proofreading back in my LBCC days.

> I can not imagine anything more absurd to do, transcribe your survey notes,
> run it through a filter to produce XML and then include the original source
> in the results. What would you do if you had more notes to transcribe? Is
> not your primitive text editor just a step of your "surveying software
> solution?"

Yes, the original text that is faithful to the original form of the data
is just a step.  But being able to send this original to a friend might
also be a step.
I'd see something like:
<CaveXML ...>
<original-data name="Neff's Canyon Cave" form="Utah-Archive-Format-V1.0">
Why? They have offered to proofread it against the original notes.

Meanwhile, I might also send some other friend the data in
<CaveXML ...>
<survey ... >
<name>AAA crawl</name>

</survey>
because he is not an archivist, but wants to pull it up in his favorite
3D viewer.

Thus everybody who knows the CaveXML format knows what is going on
IN BOTH CASES. Their software may not know my original format, but
it knows that I've sent them some stuff which they can store as provided
and even send back to me, even if they think I'm crazy and wouldn't
think of parsing it.

Raw and processed data don't both have to be in the same file, but I'm
sure John H. is smart enough to write a program that deals only with the
new raw data if it finds a mix of raw and processed in the same file.
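
A minimal sketch of such a program, in Python, might look like the
following. It assumes the element and attribute names used in the
examples above (original-data, name, form, survey); none of these are a
settled CaveXML schema, and the two shot lines inside the raw section
are made-up placeholders. It only picks out the raw sections and skips
everything else; deciding which raw sections are "new" is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed file: one raw section and one processed survey.
# Tag names follow the examples in this message, not a real schema.
# The lines inside the CDATA section are placeholder data.
MIXED = """<CaveXML>
  <original-data name="Neff's Canyon Cave" form="Utah-Archive-Format-V1.0"><![CDATA[
A1 A2 12.5 045 -10
A2 A3 8.0 060 -5
]]></original-data>
  <survey>
    <name>AAA crawl</name>
  </survey>
</CaveXML>"""


def raw_sections(xml_text):
    """Return (name, form, text) for each original-data element, ignoring the rest."""
    root = ET.fromstring(xml_text)
    return [
        (el.get("name"), el.get("form"), el.text or "")
        for el in root.iter("original-data")
    ]


sections = raw_sections(MIXED)
for name, form, text in sections:
    # The raw text is handed back untouched; parsing it is someone else's job.
    print(name, form, len(text.strip().splitlines()), "raw lines")
```

The processed survey element is never looked at, so software that has
no idea what Utah-Archive-Format-V1.0 means can still find, store, and
return the raw text.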

One thing to keep in mind is that not all tags are required. Garry P.
might never even keep the raw form, John H. would never think of
separating the raw form from fully marked CaveXML, and Paul H. might
send some of each depending on who he is sending to. They are all happy
if there is a CDATA section that COULD include raw data and be
appropriately marked.
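
As a rough illustration of wrapping raw text in a marked CDATA section,
here is a sketch using Python's standard xml.dom.minidom. The
original-data tag and its name/form attributes are the ones proposed in
this message, not an agreed format; the helper names (cdata_chunks,
wrap_raw, read_raw) and the sample raw text are invented for the
example. The one real XML wrinkle is that "]]>" may not appear inside a
CDATA section, so the sketch splits it across two sections.

```python
from xml.dom.minidom import Document, parseString


def cdata_chunks(text):
    """Split text so that no chunk contains the forbidden sequence ']]>'."""
    parts = text.split("]]>")
    chunks = []
    for i, part in enumerate(parts):
        chunk = part
        if i > 0:
            chunk = ">" + chunk          # second half of a split "]]>"
        if i < len(parts) - 1:
            chunk = chunk + "]]"         # first half of a split "]]>"
        chunks.append(chunk)
    return chunks


def wrap_raw(name, form, raw_text):
    """Wrap raw text, unmodified, in a hypothetical original-data element."""
    doc = Document()
    root = doc.createElement("CaveXML")
    doc.appendChild(root)
    original = doc.createElement("original-data")
    original.setAttribute("name", name)
    original.setAttribute("form", form)
    for chunk in cdata_chunks(raw_text):
        original.appendChild(doc.createCDATASection(chunk))
    root.appendChild(original)
    return doc.toxml()


def read_raw(xml_text):
    """Recover the raw text by joining every child node of original-data."""
    el = parseString(xml_text).getElementsByTagName("original-data")[0]
    return "".join(node.data for node in el.childNodes)


# Placeholder raw text, including characters that would need escaping elsewhere.
RAW = "A1 A2 12.5 045 -10\nnote: odd chars < & ]]> survive\n"
xml_out = wrap_raw("Neff's Canyon Cave", "Utah-Archive-Format-V1.0", RAW)
print(xml_out)
print(read_raw(xml_out) == RAW)
```

The point of the CDATA choice is that the sender never has to escape or
otherwise touch the original text, which is what makes it usable for
proofreading against the notes.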

> Back to XML. I gather that XPointers are a way to reference files. Our markup language
> needs a way to reference external files, e.g. images.

I think I agree with both John H.'s and Garry's comments.

A tag for optional images would be useful. This tag COULD be used by
someone IF THEY WANTED TO place the image in-line.  But there is
also nothing wrong with an external reference.
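
To make the two options concrete, here is a small Python sketch of an
image tag that can carry either an external reference or the image
in-line. Everything about the tag is an assumption made for
illustration: the element name image, the href and encoding attributes,
the use of base64 for in-line data, and the file name entrance.gif. The
form attribute is just reused from the raw-data idea.

```python
import base64
import xml.etree.ElementTree as ET


def image_element(form, href=None, data=None):
    """Build a hypothetical image element: external (href) or in-line (base64)."""
    if (href is None) == (data is None):
        raise ValueError("give exactly one of href or data")
    el = ET.Element("image", {"form": form})
    if href is not None:
        el.set("href", href)               # external reference only
    else:
        el.set("encoding", "base64")       # image bytes travel in the file
        el.text = base64.b64encode(data).decode("ascii")
    return el


def image_bytes(el):
    """Return the in-line bytes, or None if the element only points elsewhere."""
    if el.get("encoding") == "base64":
        return base64.b64decode(el.text or "")
    return None


external = image_element("GIF", href="entrance.gif")
inline = image_element("GIF", data=b"GIF89a")   # stand-in bytes, not a full image
print(ET.tostring(external, encoding="unicode"))
print(ET.tostring(inline, encoding="unicode"))
```

A reader that does not care about images can skip the element either
way, and one that does can tell from the attributes whether it has the
bytes in hand or needs to go fetch a file.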

One way to think of this is: If you wanted to send everything you had
down a wire to another person, is there a place to put everything?

Providing a raw-data section allows the really unusual to stay in its
original form.  This would give CaveXML an (un-friendly) place 
for those who want something really crazy in their data stream.

Note: as far as real syntax in the above examples goes, the only thing
in the above XML that really means anything to me is that the raw
section (assuming such a thing is allowed) would include a 'name'
attribute and a 'form' attribute. Name would be a typical ID. Form would
help someone identify the raw format.  Whatever the keyword is for the
form of an image (e.g. GIF) in HTML, I'd borrow that attribute keyword,
for lack of a reason to use any other word.