Spud - a broader definition of "survey data"?

Wookey wookey@aleph1.co.uk
Mon, 18 Dec 2000 12:57:24 +0000 (GMT)


On Sun 17 Dec, Olly Betts wrote:

> Now CUCC's Austria expedition isn't the only survey project with a website

> (And I'd be interested to hear about others people know of).

The biggest equivalent site I am aware of is Matienzo, which essentially has
a similar set of information-management problems:

http://www.lancs.ac.uk/staff/gyaaq/intro.htm

This expedition has used a home-grown database for many years, and I assume
this has been webified to some extent. I've cc:ed Juan so he can tell us what
he has learnt about doing this over the years. (Juan, we are discussing
whether and how the next-generation Survex software should expand beyond just
'survey data' to cover the whole job of producing a (potentially large) cave
website/database.)

> It's not clear to me what form this mechanism might take, but as a simple
> example perhaps you can attach arbitrary data (e.g. marked-up text, images,
> video clips, flags to signal "fully explored", "still going", etc) to
> stations, surveys, caves, etc, and then spud would collate all these and
> make them available to filters which produced the actual web pages (perhaps
> either on demand, or in one pass).

> I'm wondering if other people have thought about these issues, and perhaps
> even written software already.

I think the answer is 'yes, sort of'. Both SMAPS and Compass provide a simple
but effective mechanism to associate survey stations with database items. (It
was done in dBASE III last time I looked, but Compass may well have moved on
by now.) The principle is simple and generic and could work fine for attaching
arbitrary data to stations.
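
To make the principle concrete, here is a minimal sketch in Python of a
generic station-to-data association. It assumes stations are identified by
their name strings and that each attached item just carries a free-form kind
label; the `Attachment` and `StationData` names are hypothetical and are not
how SMAPS or Compass actually store things.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Attachment:
    """One arbitrary data item hung on a survey station (hypothetical shape)."""
    kind: str     # free-form label, e.g. "photo", "note", "status"
    payload: Any  # the item itself: marked-up text, a filename, a flag value


class StationData:
    """Generic station -> data items association, kept beside the survey data."""

    def __init__(self) -> None:
        self._by_station: dict[str, list[Attachment]] = defaultdict(list)

    def attach(self, station: str, kind: str, payload: Any) -> None:
        """Hang one more item on a station; a station can carry any number."""
        self._by_station[station].append(Attachment(kind, payload))

    def items_at(self, station: str, kind: Optional[str] = None) -> list[Attachment]:
        """All items on a station, optionally filtered to one kind."""
        return [a for a in self._by_station.get(station, [])
                if kind is None or a.kind == kind]
```

Because the association lives beside the survey rather than inside it, a
page generator could ask for, say, every "photo" item on a station without
touching the centreline data at all.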

The other useful resource is the UIS cave database spec. This is a set of
flags and labels for all speleological data, which can be used to build a
database that can be merged and exchanged between different countries and
languages. That matters if you want to combine your cave database with
someone else's. This is largely the work of Peter Matthews.

http://rubens.its.unimelb.edu.au/~pgm/uisic/
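
The point of shared codes is that data is stored against language-neutral
field identifiers, with labels supplied per language, so two databases using
the same codes can be merged field by field. The sketch below assumes that
model; the codes `CAVE_NAME` and `LENGTH_M`, the label tables, and the
"our values win" merge rule are invented for illustration and are not taken
from the UIS spec.

```python
# Hypothetical field codes and labels: NOT taken from the UIS specification.
LABELS = {
    "en": {"CAVE_NAME": "Cave name", "LENGTH_M": "Length (m)"},
    "fr": {"CAVE_NAME": "Nom de la grotte", "LENGTH_M": "Longueur (m)"},
}


def render(record: dict[str, object], lang: str) -> dict[str, object]:
    """Show a record stored under neutral codes with labels in `lang`.

    Codes with no label in that language are shown as the raw code.
    """
    labels = LABELS[lang]
    return {labels.get(code, code): value for code, value in record.items()}


def merge(ours: dict[str, object], theirs: dict[str, object]) -> dict[str, object]:
    """Merge two records keyed by the same codes; our values win on conflict."""
    return {**theirs, **ours}
```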

I have already considered this problem somewhat with respect to the CUCC
database, and got as far as downloading all the UIS guff to look at it (it's
incredibly dull :-) and realising that the problem is difficult to solve well.

As with all database projects, the devil is in the detailed database design.
Consider, for example, just generating a list of caves with their names,
identifiers, entrance co-ordinates, exploration status and survey status.
This seems easy at first sight, but as soon as you start putting real-world
data in, it starts to get complicated.

Firstly, we have caves with multiple entrances, so you need to record which
cave an entrance belongs to in order to list it in the right place (you can't
necessarily infer this from the name). In an area like CUCC's, an entrance can
be deemed to be an entrance to more than one cave. How do we describe this
connectivity in the database? The Austrians have a system where entrances are
'absorbed' by the lowest-numbered cave, so this is probably just a case of
alternative entrance names, but it requires some thought to get right. In
CUCC's case caves may have several designations, either because they were
rediscovered by different groups or because a temporary number was allocated
before a real one was available, so you need to index things so that it is
clear what is what. And where a 'proper number' has not been allocated, what
do you sort the list on?
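
One possible way to model these problems, sketched in Python under stated
assumptions: each cave carries every designation it has ever had plus an
optional 'proper' number, entrances are linked to caves many-to-many, and the
Austrian 'lowest-numbered cave absorbs the entrance' rule is applied only when
an entrance has to be listed under a single cave. All names (`Cave`,
`owning_cave`, `sort_key`, `find`) are hypothetical, and putting unnumbered
caves after numbered ones, ordered by their first designation, is just one
arbitrary answer to the sorting question.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cave:
    designations: list[str]        # every name/number ever used, temporary ones included
    number: Optional[int] = None   # the 'proper' number, once allocated
    entrances: set[str] = field(default_factory=set)  # an entrance may be in several caves


def owning_cave(entrance: str, caves: list[Cave]) -> Optional[Cave]:
    """Austrian-style rule: list an entrance under the lowest-numbered cave it opens into.

    Returns None if no numbered cave claims the entrance.
    """
    candidates = [c for c in caves if entrance in c.entrances and c.number is not None]
    return min(candidates, key=lambda c: c.number, default=None)


def sort_key(cave: Cave) -> tuple:
    """Numbered caves first in numeric order; unnumbered ones after, by first designation."""
    if cave.number is not None:
        return (0, cave.number, "")
    return (1, 0, cave.designations[0] if cave.designations else "")


def find(designation: str, caves: list[Cave]) -> Optional[Cave]:
    """Look a cave up by any of its designations, old or current."""
    return next((c for c in caves if designation in c.designations), None)
```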

These examples are quite CUCC-specific, and solutions could be found,
involving sound database design with robust indexing and CUCC-specific
reporting software which understands how to generate a sensible list from the
database (software to do sensible reporting already exists). However, I
assume that any project will have its own set of such issues, so designing a
generic solution could be a challenge. We have the advantage that we should be
able to make use of a great deal of pre-existing database/reporting software,
leaving a manageable amount for us to do.

The other problem is this: assuming we are going to connect cave data to
survey stations, how is data about the cave spatially distributed? And is it
linked both ways? A lot of cave data is not really spatial (cave name,
'surface route description exists'). Some is somewhat spatial (route
descriptions, photos), but a route description covers a range of stations, and
at junctions a station will be covered by more than one section of
description. A photo may be taken at a station looking towards another, or it
may not be anywhere near a station. Should all photos be associated with a
station, perhaps defaulting to the entrance point where there isn't a sensible
one? It is useful to query the database to find out whether a cave has been
surveyed or photographed. Are we going to do this by searching the structure
for any photos attached to stations, or survey data indexed for this cave, or
are we going to have separate fields that someone has to update? You need to
avoid lengthy searches of the whole survey structure for stations-with-photos
(and the like) every time you want to generate a list of caves-with-status.
This requires the station-photo info to be linked 'both ways' (database people
probably have a name for this) so you can look up photos and find their
stations, or stations and find their photos. You want this to work even when
there is no survey yet but there are some photos with nowhere to 'hang'.
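
A sketch of the 'both ways' linking, assuming photos, stations and caves are
identified by plain strings. It keeps a photo-to-station map and its inverse,
plus a per-cave photo set, so a 'has this cave been photographed?' status
comes from an index lookup rather than a search of the survey structure, and a
photo can sit in a cave with no station until there is a survey to hang it
on. The `PhotoIndex` class and its methods are hypothetical; in relational
terms this is roughly a nullable station column on the photo records, indexed
so that lookups work from either side.

```python
from collections import defaultdict
from typing import Optional


class PhotoIndex:
    """Links photos and stations in both directions, plus photos per cave."""

    def __init__(self) -> None:
        self.station_of: dict[str, Optional[str]] = {}               # photo -> station or None
        self.photos_at: dict[str, set[str]] = defaultdict(set)       # station -> photos
        self.photos_in_cave: dict[str, set[str]] = defaultdict(set)  # cave -> photos

    def add(self, photo: str, cave: str, station: Optional[str] = None) -> None:
        """Record a photo against a cave, with or without a station to hang it on."""
        self.station_of[photo] = station
        self.photos_in_cave[cave].add(photo)
        if station is not None:
            self.photos_at[station].add(photo)

    def place(self, photo: str, station: str) -> None:
        """Hang an already-added photo on a station, keeping both directions in step."""
        old = self.station_of[photo]  # KeyError if the photo was never added
        if old is not None:
            self.photos_at[old].discard(photo)
        self.station_of[photo] = station
        self.photos_at[station].add(photo)

    def is_photographed(self, cave: str) -> bool:
        """Answered from the per-cave index, without walking the survey structure."""
        return bool(self.photos_in_cave.get(cave))
```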

Overall I'm not certain that the data should hang from the survey at all.
I suspect it should be organised primarily by 'cave' or 'cave segment', with
surveys hanging from that, and survey stations often being associated with
data items to indicate position where appropriate. Big concepts like this
need to be decided at the beginning.

You can make a very simple database that sort of does the job, which
essentially has the current web pages 'as is' attached to a cave reference,
but that doesn't really solve the maintenance problem. The more you split up
the info, the more versatile it gets, but the more complicated the database
becomes, along with the reporting software needed to stick it back together.
The huge difference in scale between caves makes the problem harder, i.e.
splitting things up 'by cave' works fine for 100m-long caves, but for
20km-long caves it is useless and you need increasing levels of subdivision to
make the data manageable. I'm not quite sure how to express this in database
terms.
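
One way to express variable-depth subdivision, sketched under the assumption
that data and surveys hang from 'segments' rather than from the survey
itself: a cave is a segment, and a big cave is simply a segment with nested
child segments, as deep as needed. The `Segment` class and its fields are
hypothetical; in a relational database the same tree would usually be stored
as a segment table with a parent-segment column.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Segment:
    """A cave, or part of one; big caves are split into nested segments."""
    name: str
    data: list[str] = field(default_factory=list)     # descriptions, photo refs, ...
    surveys: list[str] = field(default_factory=list)  # survey file references
    children: list["Segment"] = field(default_factory=list)

    def add(self, child: "Segment") -> "Segment":
        """Attach a sub-segment and return it, so deeper levels can be chained."""
        self.children.append(child)
        return child

    def walk(self, path: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], "Segment"]]:
        """Yield every segment with its path from the top, depth first."""
        here = path + (self.name,)
        yield here, self
        for child in self.children:
            yield from child.walk(here)

    def all_surveys(self) -> list[str]:
        """Collect survey references from this segment and everything below it."""
        return [s for _, seg in self.walk() for s in seg.surveys]
```

A 100m cave is then a single segment with no children, while a 20km system
can be split as finely as its descriptions require, and the reporting side
walks the tree to stick the pieces back together.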

Anyway, that's enough for now. (I'm turning into Waddington!) The main
points are:
1) Yes, I think this is the way survey software must develop.
2) The UIS database is a good starting point for database definitions, as it
   provides commonality and should prevent some wheel-reinvention.
3) The new UIS XML commission (or whatever it called itself) is probably
   considering many related issues, and we should work with them.
   (I had tentatively agreed to research this subject as a 'UIS' thing, but
   had got nowhere at all with doing it. I don't know whether the XML work
   now supersedes this effort or not.)
4) My efforts so far strongly suggest to me that the only way to do this is
   to read the relevant guff, then try designing something around a known
   dataset like CUCC's, which will provide plenty of difficult cases and
   counterexamples and so produce a fairly robust design. Input from other
   groups like Matienzo on the problems will also help make something that
   works.

Wookey
-- 
Aleph One Ltd, Bottisham, CAMBRIDGE, CB5 9BA, UK  Tel (00 44) 1223 811679
work: http://www.aleph1.co.uk/     play: http://www.chaos.org.uk/~wookey/