survex issues on Chinese Windows

Olly Betts olly at survex.com
Fri Aug 19 03:10:15 BST 2016


On Thu, Aug 18, 2016 at 10:31:24AM +0100, Wookey wrote:
> The thing that has caused this to become a live issue is Topodroid.
> You enter a survey name in that, and that's the string it uses in
> the *begin. It's natural to enter the survey name in Chinese if you
> are Chinese, and the Topodroid and Therion part works fine, but Survex
> doesn't accept it.

As things stand, that's a bug in Topodroid's export feature, as it's
clearly documented what you can use by default in a survey station name.

> > Once we know the encoding we could do a full-Unicode "is
> > alphanumeric" (or just treat non-ASCII values as valid in names
> > perhaps).
> 
> Anything except the separator and control characters is arguably
> acceptable, but keeping smileys and weird 'lookalike' characters out
> of survey names is probably a good thing, so "is alphanumeric" seems
> like a good test (may not exclude the weird chars I suppose). Do all
> Unicode codepoints have a flag to this effect?

All codepoints are categorised, though the contents of the categories
vary by Unicode version - new characters get added with each new
Unicode version, and sometimes the category of a character is changed.
For example, U+19B7 (New Tai Lue Vowel Sign O) was added in 4.1 as
"Spacing Combining Mark" but Unicode 6.0 changed it to be "Other Letter".
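
To make that concrete, here is a minimal Python sketch (Python only
because its standard library exposes the Unicode Character Database;
the helper name category_report and the sample list are mine, purely
for illustration).  It prints the general category of a few codepoints
along with the Unicode version the interpreter was built against, so
running it under interpreters with different Unicode data may give
different answers for something like U+19B7:

```python
import unicodedata

# Code points mentioned in this thread, plus one CJK ideograph and one
# emoji as illustrative extras (the choice of samples is an assumption).
SAMPLES = ["\u19b7", "\u00a0", "\u6d1e", "\U0001F600"]


def category_report(chars):
    """Return (code point label, general category) pairs for chars."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in chars]


if __name__ == "__main__":
    # The answers below depend on this version of the Unicode data.
    print("Unicode database version:", unicodedata.unidata_version)
    for label, cat in category_report(SAMPLES):
        print(label, cat)
```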

It seems awkward for the acceptable characters in a survey name to
depend on the Unicode version being used by the build of Survex you
process it with (especially as it's not a simple upwardly compatible
addition of more characters with time).  That's mostly why I wondered
about just allowing anything outside of ASCII.  The main downside is
that some of those are whitespace (e.g. U+00A0 - aka &nbsp; in HTML).
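
A rough sketch of what "allow anything outside of ASCII" might look
like, in Python for brevity.  This is not Survex's actual code: the
function names are hypothetical, the ASCII rule (alphanumerics plus
"_" and "-") is my assumption about the default name characters, and
rejecting non-ASCII whitespace is just one possible way to handle the
downside mentioned above:

```python
import unicodedata

# Assumption: extra ASCII characters allowed in names by default.
ASCII_EXTRA = "_-"

# Unicode separator categories (space, line, paragraph).
SEPARATOR_CATEGORIES = ("Zs", "Zl", "Zp")


def is_name_char(ch):
    """Hypothetical per-character test for a survey name."""
    if ord(ch) < 128:
        # ASCII keeps the strict rule: letters, digits, and a few extras.
        return ch.isalnum() or ch in ASCII_EXTRA
    # Non-ASCII is accepted wholesale, except for whitespace/separators
    # such as U+00A0 (no-break space).
    if ch.isspace():
        return False
    return unicodedata.category(ch) not in SEPARATOR_CATEGORIES


def is_valid_name(name):
    """Accept a non-empty name made only of acceptable characters."""
    return bool(name) and all(is_name_char(ch) for ch in name)


if __name__ == "__main__":
    for candidate in ["cave_1", "\u6d1e\u7a74", "cave\u00a01", "a.b"]:
        print(repr(candidate), is_valid_name(candidate))
```

Note that the whitespace check still consults Unicode data, so this
only reduces the version dependence rather than removing it entirely.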

Cheers,
    Olly


