<html>
<head>
<title>CUCC Expedition Handbook: Website History</title>
<link rel="stylesheet" type="text/css" href="../css/main2.css" />
</head>
<body>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>EXPO Data Management History</h1>
<div style="text-align:left">
<!-- Comment
Justified text is hard to read:
https://designshack.net/articles/mobile/the-importance-of-designing-for-readability/
-->
<h2>History in review</h2>
<p>
Over 32 years, CUCC has developed methods for handling such information. Refinements in data
management were made necessary by improved quantity and quality of survey; but refinements in
data management also helped to drive those improvements. The first CUCC Austria expedition, in
1976, produced only Grade 1 survey for the most part (see the <a href="http://expo.survex.com/years/1977/report.htm">
Cambridge Underground 1977 report</a>). In
the 1980s, the use of programmable calculators to calculate survey point positions from compass,
tape, and clinometer values helped convince expedition members to conduct precise surveys of
every cave encountered. Previously, such work required hours of slide rule or log table work. On
several expeditions, such processing was completed after the expedition by a FORTRAN program
running on shared mainframe time. BASIC programs running on personal computers took over with
the release of the BBC Micro and then the Acorn A4.
<p>In the 1990s, Olly Betts and Wookey began
work on "<a href="http://www.survex.com">Survex</a>", a
program in C for the calculation and 3-D visualization of centrelines, with
intelligent loop closure processing. Julian Todd's Java program "Tunnel" facilitated the
production of attractive, computerized passage sketches from Survex centreline data and scanned
hand-drawn notes.
<p>Along with centrelines and sketches, descriptions of caves were also affected by improvements
in data management. In a crucial breakthrough, Andrew Waddington introduced the use of the
nascent markup language HTML to create an interlinked, navigable system of descriptions. Links
in HTML documents could mimic the branched and often circular structure of the caves themselves.
For example, the reader could now follow a link out of the main passage into a side passage, and
then be linked back into the main passage description at the point where the side passage
rejoined the main passage. This elegant use of technology enabled and encouraged expedition
members to better document their exploration.
<p>To organize all other data, such as lists of caves and their explorers, expedition members
eventually wrote a number of scripts which took spreadsheets (or comma-separated value
files, .CSV) of information and produced webpages in HTML. Other scripts also used information
from Survex data files. Web pages for each cave, as well as the indexes which listed all of the
caves, were generated by one particularly powerful script, <em>make-indxal4.pl</em>. The same data was
used to generate a prospecting map as a JPEG image. The system of automatically generating
webpages from data files reduced the need for repetitive manual HTML coding. Centralized storage
of all caves in a large .CSV file with a cave on each row made the storage of new information
more straightforward.
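<p>The generation step can be pictured with a minimal sketch. The real script was a Perl program (<em>make-indxal4.pl</em>), and the column names below are invented for illustration; this only shows the general "spreadsheet in, HTML index out" idea:</p>

```python
import csv
import io

# Hypothetical miniature of the cave spreadsheet: one cave per row.
# (Column names are invented; the real table was cavetab2.csv.)
CAVETAB = """number,name,description_path
1623/40,Eishoehle,1623/40/index.html
1623/41,Stellerweghoehle,1623/41/index.html
"""

def make_index(csv_text):
    """Generate an HTML index listing every cave in the table."""
    rows = csv.DictReader(io.StringIO(csv_text))
    items = ['<li><a href="%s">%s %s</a></li>'
             % (r["description_path"], r["number"], r["name"])
             for r in rows]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(make_index(CAVETAB))
```

<p>Each run regenerates every index page from the single central table, which is why one edit to the spreadsheet could update (or break) the whole site at once.</p>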
<p>Another important element of this system was version control. The entire data structure was
stored initially in a Concurrent Versions System (CVS) repository, and later migrated to
Subversion. Any edits to the spreadsheets which caused the scripts to fail, breaking the
website, could be easily reversed.
<p>However, not all types of data could be stored in spreadsheets or survey files. In order to
display descriptions on the webpage for an individual cave, the entire description, written in
HTML, had to be typed into a spreadsheet cell. A spreadsheet cell makes for an extremely awkward
HTML editing environment. To work around this problem, descriptions for large caves were written
manually as a tree of HTML pages, and the main cave page contained only a link to them.
<p>A less obvious but more deeply rooted problem was the lack of relational information. One
table named <em>folk.csv</em> stored the names of all expedition members, the years in which they were
present, and a link to a biography page. This was great for displaying a table of members by
expedition year, but what if you wanted to display a list of people who wrote in the logbook
about a certain cave in a certain expedition year? In theory, all of the necessary
information to produce that list had been recorded in the logbook, but there was no way to access
it, because there was no connection between a person's name in <em>folk.csv</em> and the entries they wrote
in the logbook.
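<p>A sketch with invented data makes the gap concrete: the flat table can answer "who was on expo in a given year?", but the cave-and-year logbook query needs a second, linking table that the .CSV system never had. The rows and field names below are hypothetical:</p>

```python
# Hypothetical rows in the spirit of folk.csv: name, years present, biography page.
folk = [
    {"name": "A. Example", "years": {1995, 1996}, "bio": "folk/aexample.htm"},
    {"name": "B. Sample",  "years": {1996, 1997}, "bio": "folk/bsample.htm"},
]

# Answerable from the flat table alone: who was present in a given year.
def present_in(year):
    return sorted(p["name"] for p in folk if year in p["years"])

# Not answerable from folk.csv alone. A relational design would add a
# table linking people to logbook entries, for example:
logbook_entries = [
    {"author": "A. Example", "year": 1996, "cave": "161"},
]

def authors_for(cave, year):
    """The join the flat files could not express: people -> logbook entries."""
    return sorted({e["author"] for e in logbook_entries
                   if e["cave"] == cave and e["year"] == year})
```

<p>Without the linking table, <code>authors_for</code> has nothing to join against, which is exactly the situation the spreadsheet system was in.</p>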
<p>The only way that relational information was stored in our .CSV files was by putting
references to other files into spreadsheet cells. For example, there was a column in the main
cave spreadsheet, <em>cavetab2.csv</em>, which contained the path to the QM list for each cave. The
haphazard nature of the development of the "script and spreadsheet" method meant that every cave
had an individual system for storing QMs. Without a standard system, it was sometimes unclear
how to correctly enter data.
<p><em>From "<a href="../../troggle/docsEtc/troggle_paper.odt" download>
Troggle: a novel system for cave exploration information management</a>", by Aaron Curtis, CUCC.</em>
<hr />
<h2>History in summary</h2>
<p>The CUCC website, which publishes the cave data, was originally created by
Andy Waddington in the early 1990s and was hosted by Wookey.
The version control system was <a href="https://www.nongnu.org/cvs/">CVS</a>. The whole site was just static HTML, carefully
designed to be RISC OS-compatible (hence the short 10-character filenames),
as both Wadders and Wookey were <a href="https://en.wikipedia.org/wiki/RISC_OS">RISC OS</a> people at the time.
Wadders wrote up a huge amount of information: expo history, photos, cave data, etc.</p>
<p>Around 1999, Martin Green added the <em>survtab.csv</em> file to contain tabulated data for many caves, and a
script to generate the index pages from it. Dave Loeffler added scripts and programs to generate the
prospecting maps in 2004. The server moved to Mark Shinwell's machine in the early
2000s, and the version control system was updated to <a href="https://subversion.apache.org/">Subversion</a>.</p>
<p>In 2006 Aaron Curtis decided that a more modern set of generated, database-backed pages
made sense, and so wrote Troggle.
Troggle uses Django to generate pages:
it reads in all the logbooks and surveys and provides a convenient way to access them and to enter new data.
It was separate for a while, until Martin Green added code to merge the old static pages and the
new Troggle dynamic pages into the same site. Work on Troggle still continues sporadically.</p>
<p>After Expo 2009 the version control system was updated to hg (Mercurial),
because a distributed version control system makes a great deal of sense for expo
(where it goes offline for a month or two and nearly all the year's edits happen).</p>
<p>The site was moved to Julian Todd's seagrass server in 2010,
but the change from a 32-bit to a 64-bit machine broke the website autogeneration code,
which was only fixed in early 2011, allowing the move to complete. The
data was split into separate repositories: the website,
troggle, the survey data, and the tunnel data. Seagrass was turned off at
the end of 2013, and the site has been hosted by Sam Wenham at the
university since February 2014. In 2018 we have four repositories; see <a href="update.htm">the website manual</a>.</p>
<p>In spring 2018 Sam, Wookey and Paul Fox updated the Linux and Django versions to
something vaguely acceptable to the university computing service, and fixed the problems that were then observed.</p>
</div>
Return to<br>
<a href="update.html">Website update</a><br>
<a href="expodata.html">Website developer information</a><br>
<hr>
</body>
</html>