Survex (and therion) source formats

Olly Betts olly at survex.com
Tue Aug 23 00:01:20 BST 2016


Just a note on the background of this discussion - Wookey's taken a
private email I wrote and re-sent it to the list without first telling
me he was going to.  The idea of a "v2" source format was an
off-the-cuff example of a way to support "," as the decimal separator
without breaking existing data sets (Wookey failed to include the part
I'd quoted where Mateusz suggested just changing the default meaning of
"," over the course of the next 20 years).

Had I been writing up the idea for public comment I'd have taken more
time to think it through first, but I guess we're here now.  Just bear
in mind that this isn't any sort of active plan - it's not even a
proposed plan, just a random thought I came up with in two minutes to
show there was a less bad way to solve a problem.

On Mon, Aug 22, 2016 at 11:47:51AM +0100, Wookey wrote:
> On 2016-08-22 06:04 +0100, Olly Betts wrote:
> 
> [Re the default meaning of ',' in survex files]
>  
> > Hmm, I'm not keen on breaking existing data even if some self-selected
> > subset of users think it's OK.
> > 
> > I'd far rather find some way to address this that doesn't break existing
> > data and also which doesn't take 20 years to execute.
> > 
> > For example, we could release "Survex 2.0" which (as well as the current
> > source format) also supported a new "version 2" of the source format,
> > indicated by (say) "*version 2".  This could have a different default
> > meaning for ",", and could clean up other stuff: finally drop all the
> > long-deprecated stuff like "*prefix", actually enforce structure on
> > commands like "*team" which almost everyone just seems to treat as a
> > free-form text field (ignoring what the manual says should be in there)
> > meaning it's hard to process them automatically now.
> 
> If we are thinking about an updated source format I think it's worth
> thinking quite hard about therion compatibility. There was discussion
> at Eurospeleo about the annoyance of having two very similar, but
> incompatible, formats for what are arguably the two most popular
> surveying tools: survex and therion.

The differences are annoying, though bear in mind none of these
incompatibilities were introduced by Survex - where I've added new
features to Survex which therion already had an equivalent of, I've
aimed to make them as compatible as is feasible.

For example, *cs is named after therion's "cs" command and accepts the
same coordinate systems (even the oddball special cases which weren't
documented until I sent a patch for the "thbook" to add them), with the
single exception of "LAT-LONG".  That's only not included for technical
reasons: unlike all the other coordinate systems it makes the coordinate
order Y,X,Z instead of X,Y,Z, which would require special casing
everywhere the order of coordinates is relevant.  The alternative would
be to require PROJ 4.8 or newer (which supports +axis for permuting the
coordinates), though then the order of the standard deviations on *fix
wouldn't match the order of the coordinates.  Note that "LONG-LAT" works
in both.

> If survex added a *centreline then it would be structurally equivalent
> (and this is sensible to allow for future not-centreline data (walls,
> point clouds?)).

Survex's default is that data is centreline data, since that is what
it primarily works with.  Non-centreline data is already handled - e.g.
"*data passage".  Forcing users to specify that centreline data is
centreline data explicitly would mean more boilerplate to type, even for
those who care nothing about therion - so there's a definite downside to
doing so.

> Are you interested in at least having a discussion with the therion
> people and survex users about using a common 'v2' data format? The
> differences are so small it seems like something that should be
> attainable, and could significantly improve workflows and reduce the
> significant amount of time people spend munging data between
> formats. 

Perhaps, though are there actually any active developers of therion to
discuss this with?  None of the patches I've sent in the last couple of
years seem to have had any sort of response at all, and it seems Vlad
has had a similar experience with his patches.

If there's to be convergence, then realistically some of the movement
needs to be on the therion side, especially since the divergence was
entirely on that side, and without any prior discussion with us AFAICR.

But why are people "munging data between the formats" anyway?  The
workflow I use keeps the centreline data in .svx format, which I
process to .3d, and then tell therion to use that for coordinates, e.g.:

    import ../../../loser/all.3d -surveys use

There's no need to have the centreline data in the .th file at all.
Perhaps we just need to promote this feature more if people aren't aware
it exists.

For me the most annoying difference is actually that the syntax used
for station names is different - instead of "cave.passage.1", therion
uses "1@cave.passage" (thankfully you can mostly ignore this when using
"import" as above).  Using two different separators and making the order
non-hierarchical seems much less logical to me - I'd not be keen to
change that in Survex.
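
For simple cases the mapping between the two naming schemes is purely
mechanical.  Here's a rough Python sketch, assuming names shaped exactly
like the two examples above.  The function names are made up for
illustration, and it keeps the survey path in the order shown in the
example rather than implementing therion's actual rules for nested
surveys, so check the thbook before relying on it:

```python
def survex_to_therion(name):
    """Turn 'cave.passage.1' into '1@cave.passage' (shape taken from the example)."""
    survey, dot, station = name.rpartition(".")
    if not dot:
        # No survey prefix, so there is nothing to rearrange.
        return name
    return station + "@" + survey


def therion_to_survex(name):
    """Turn '1@cave.passage' back into 'cave.passage.1'."""
    station, at, survey = name.partition("@")
    if not at:
        # No '@', so treat it as a bare station name.
        return name
    return survey + "." + station
```

This doesn't round-trip a therion station name which itself contains a
".", which is one reason a real converter would need more care.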

> There will be some tricky issues about backwards compatibility and
> whether change is worth it, but I think that if we are having some
> change anyway, it's a great opportunity to try and re-merge the two
> formats (where much of the difference is pretty gratuitous).

The idea I sketched out for a "v2" format was fundamentally one that
was mostly upwardly compatible with common use of the current format.

"," gets a different meaning by default, but I suspect most people
aren't relying on "," being treated the same as a space (at least not
intentionally!); long-deprecated commands go away, but they've resulted
in a warning for years (probably decades in some cases).  Enforcing the
documented format of commands like "*team", "*instrument", etc might be
a bit less smooth.  While the expected format has been documented for
a long time, judging by the data points I have, most people currently
treat them as taking free-form text: most of the CUCC data set doesn't
match the documented format, and TopoDroid's .svx export just plonks
a single quoted string in for *team.
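
To illustrate the "," point, here's a small Python sketch of how the
same data line splits into fields under the current default (where ","
is treated the same as a space) and under a hypothetical "v2" default
(where "," is the decimal separator).  The function names and the
"blanks"/"decimal" parameters are mine, purely for illustration - this
isn't how Survex's parser is actually written:

```python
def split_fields(line, blanks=" \t,"):
    """Split a data line into fields, treating every character in `blanks` as a separator."""
    table = str.maketrans({ch: " " for ch in blanks})
    return line.translate(table).split()


def parse_reading(field, decimal="."):
    """Parse a numeric reading which uses `decimal` as its decimal separator."""
    return float(field.replace(decimal, "."))


line = "1 2 10,5 123 -5"

# Current default: "," separates fields, so "10,5" becomes two fields.
current = split_fields(line)

# Hypothetical v2 default: "," is no longer a blank, so "10,5" stays together
# and can be read as the number 10.5.
v2 = split_fields(line, blanks=" \t")
tape = parse_reading(v2[2], decimal=",")
```

Under the current default that line has six fields rather than five,
while data deliberately written as "1,2,10.5,123,-5" only splits into
five fields if "," is a blank - that's the existing data a changed
default would break.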

But at least you can change your "*team" and "*instrument" lines to
actually match the documented format and the data would still work with
current releases.
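
If you wanted to find out how much of a data set would need changing,
a check along these lines might do as a first pass.  This is only a
sketch: it assumes the documented shape of "*team" is a (possibly
quoted) name followed by one or more role keywords, and KNOWN_ROLES is
an illustrative set I've made up here, not the list from the manual:

```python
import shlex

# Illustrative only - substitute the roles the manual actually documents.
KNOWN_ROLES = {"compass", "clino", "tape", "notes", "pictures", "instruments"}


def team_line_is_structured(line, roles=KNOWN_ROLES):
    """Return True if `line` looks like: *team "Name" role [role ...]"""
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes (e.g. a stray apostrophe), so it's free-form text.
        return False
    if len(tokens) < 3 or tokens[0].lower() != "*team":
        return False
    # Everything after the name must be a recognised role keyword.
    return all(tok.lower() in roles for tok in tokens[2:])
```

A line like '*team "Nick Proctor" compass clino tape' passes, while
a single quoted string listing everyone on the trip doesn't.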

Bundling in changes which introduce significant incompatibilities
would make for a format change that was very different in nature to
that which I sketched out above.

Cheers,
    Olly


