CDFT: what goes in it/ source data

John Halleck John.Halleck@utah.edu
Thu, 11 Jan 2001 13:28:25 -0700 (MST)


On Thu, 11 Jan 2001, Julian Todd wrote:

> [...]
> This text (in the CDATA[]), being a faithful representation of your notes 
> from the cave, will not change unless there has been a transcription 
> error or other blunder.  After you have run your Survex parser and 
> extracted the data into XML notation you could delete it and 
> carry on without it.  However, aside from trying to comply with 
> the dogma that one must Never Represent the Same Data Twice 
> Because it Might Clash, I would claim that nothing is really 
> gained by throwing it away.  It is in fact serving the purpose of those 
> little envelopes of dried out notes you staple into your neat survey 
> book.  In that form it is easy to compare and check for transcription 
> errors.  And everyone can keep to their own quirky notational 
> habits without losing anything.  
> [...]

  Having dealt with a large survey for many years, I agree wholeheartedly
  with the underlying idea here.
  Keeping a clean, unmodified original text and letting programs produce
  the result of the parse as separate output, instead of trying to mark up
  each original line with what it had been identified as, was a great help
  when it came to things like proofreading back in my LBCC days.  It also
  meant that errant programs were more likely to put their mangled markup
  in the output they generated than to mangle the lines of the original.
  (Of course, back then the idea of storing an image of the page was totally
   out of the question.)

  There seems to be no end of interesting ways that people can
  abuse^H^H^H^H^H^Hdesign notations.  Trying to mark the actual original
  lines up with what was what can be difficult at best.   An untouched
  original (: Marked as such, of course :), and heavily marked up generated
  stuff seems the best of both worlds to me.

  I'd second a vote for   Image of original,   transcribed original,
  generated marked up stuff.   (With the image being optional for us
  memory-challenged folk.)
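
  As a rough illustration of that three-part layout, here is a minimal
  sketch in Python.  Everything named in it is an assumption made up for
  the example, not part of any agreed CDFT format: the element names
  (survey_record, image, original, generated, leg), the attribute names,
  and the toy parser that treats a line of five whitespace-separated
  fields as "from to tape compass clino".  The point is only that the
  transcribed text goes into a CDATA section untouched, while all markup
  lands in a separate generated section that points back at the original
  line numbers.

```python
from xml.dom.minidom import Document

# Hypothetical field order for the toy parser; a real parser would
# honour whatever notation the surveyor actually used.
FIELDS = ("from", "to", "tape", "compass", "clino")


def build_record(original_lines, image_ref=None):
    """Build a record holding an optional image reference, the untouched
    transcription, and generated markup derived from it."""
    doc = Document()
    root = doc.createElement("survey_record")  # hypothetical element name
    doc.appendChild(root)

    # 1. Image of the original page (optional).
    if image_ref is not None:
        image = doc.createElement("image")
        image.setAttribute("href", image_ref)
        root.appendChild(image)

    # 2. Transcribed original, stored verbatim in a CDATA section.
    #    (CDATA cannot contain the sequence "]]>"; not handled here.)
    original = doc.createElement("original")
    original.appendChild(doc.createCDATASection("\n".join(original_lines)))
    root.appendChild(original)

    # 3. Generated markup, kept apart from the original lines.
    generated = doc.createElement("generated")
    root.appendChild(generated)
    for number, line in enumerate(original_lines, start=1):
        values = line.split()
        if len(values) != len(FIELDS):
            continue  # the toy parser skips anything it does not recognise
        leg = doc.createElement("leg")
        leg.setAttribute("source_line", str(number))
        for name, value in zip(FIELDS, values):
            leg.setAttribute(name, value)
        generated.appendChild(leg)

    return doc


notes = ["1 2 10.5 045 -3", "; a remark the toy parser does not understand"]
doc = build_record(notes)
print(doc.toxml())
```

  A buggy parser in this arrangement can only spoil the generated
  section; rerunning a fixed parser over the original section rebuilds
  it, and the source_line attribute makes it easy to proofread each leg
  against the line it came from.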