<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>CUCC Expedition Handbook: Website History</title>
<link rel="stylesheet" type="text/css" href="../css/main2.css" />
</head>
<body>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>EXPO Data Management History</h1>

<div style="text-align:left">
<!-- Comment 
Justified text is hard to read:
https://designshack.net/articles/mobile/the-importance-of-designing-for-readability/
https://designforhackers.com/blog/justify-text-html-css/
-->
<h2>Early history</h2>
<p>
Over 42 years, CUCC has developed methods for handling the survey data, cave descriptions and other records that its expeditions produce. Refinements in data 
management were made necessary by the improved quantity and quality of survey, but refinements in 
data management also helped to drive those improvements. The first CUCC Austria expedition, in 
1976, produced mostly Grade 1 survey (see the <a href="http://expo.survex.com/years/1977/report.htm">
Cambridge Underground 1977 report</a>). 
<p>In 
the 1980s, the use of programmable calculators to compute survey station positions from compass, 
tape, and clinometer readings helped convince expedition members to conduct precise surveys of 
every cave encountered. Previously, such work required hours of slide rule or log table work. On 
several expeditions, this processing was completed after the expedition by a FORTRAN program 
running on shared mainframe time. BASIC programs running on personal computers took over with 
the release of the BBC Micro and then the Acorn A4. A full history of this period is given in
<a href="c21bs.html">Taking Expo Bullshit into the 21st Century</a>, the story of the data management system up to spring 1996. [This was less than five years after Tim Berners-Lee published the world's very first web page on 6th August 1991, so the expo website is nearly as old as the web itself.]
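<p>The per-leg calculation that those calculators and programs automated can be sketched as below. This is a minimal Python illustration, not a reconstruction of any CUCC program: it assumes compass bearings in degrees clockwise from north, clinometer angles in degrees above the horizontal (negative for downward legs) and tape lengths in metres, and it ignores magnetic declination and instrument calibration. The function names <code>leg_delta</code> and <code>traverse</code> are hypothetical.</p>

```python
import math


def leg_delta(tape, compass_deg, clino_deg):
    """Return the (east, north, up) displacement for one survey leg.

    Assumes bearings in degrees clockwise from north and inclinations in
    degrees above the horizontal; no declination or calibration correction.
    """
    bearing = math.radians(compass_deg)
    inclination = math.radians(clino_deg)
    horizontal = tape * math.cos(inclination)  # plan length of the leg
    return (
        horizontal * math.sin(bearing),  # change in easting
        horizontal * math.cos(bearing),  # change in northing
        tape * math.sin(inclination),    # change in height
    )


def traverse(start, legs):
    """Accumulate station positions along a chain of (tape, compass, clino) legs."""
    x, y, z = start
    stations = [(x, y, z)]
    for tape, compass_deg, clino_deg in legs:
        dx, dy, dz = leg_delta(tape, compass_deg, clino_deg)
        x, y, z = x + dx, y + dy, z + dz
        stations.append((x, y, z))
    return stations


# Two 10 m legs: one due north on the level, then one due east dropping at 30 degrees.
for station in traverse((0.0, 0.0, 0.0), [(10.0, 0.0, 0.0), (10.0, 90.0, -30.0)]):
    print(tuple(round(c, 2) for c in station))
```

<p>A simple open traverse like this accumulates every reading error along the chain; handling loops, where two routes to the same station disagree, is the job that later software such as Survex took on.</p>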

<h3>Survex - cave surveying</h3>
<p>In the late 1980s, Olly Betts and Wookey began 
work on "<a href="computing/getsurvex.html">Survex</a>", a 
program in C for the calculation and 3-D visualization of centrelines, with 
intelligent loop closure processing. Julian Todd's Java program "Tunnel" made it possible to 
produce attractive, computerized passage sketches from Survex centreline data and scanned 
hand-drawn notes. 
A <a href="survexhistory96.htm">history of Survex</a> covering the period 1988-1996 was published in Cambridge Underground 1996. 

<h3>Initial cave data management</h3>
<p>Along with centrelines and sketches, cave descriptions also benefited from improvements 
in data management. In a crucial breakthrough, Andrew Waddington introduced the use of the 
nascent markup language HTML to create an interlinked, navigable system of descriptions (see <a href="c21bs.html">"Expo Bullshit"</a>). Links 
in HTML documents could mimic the branched and often circular structure of the caves themselves. 
For example, the reader could now follow a link out of the main passage into a side passage, and 
then be linked back into the main passage description at the point where the side passage 
rejoined the main passage. This elegant use of technology enabled and encouraged expedition 
members to better document their exploration.

<p>To organize all other data, such as lists of caves and their explorers, expedition members 
eventually wrote a number of scripts which took spreadsheets (saved as comma-separated value, 
or CSV, files) and produced web pages in HTML. Other scripts also used information 
from Survex data files. The web page for each cave, as well as the indexes listing all of the 
caves, were generated by one particularly powerful script, <em>make-indxal4.pl</em>. The same data was 
used to generate a prospecting map as a JPEG image. The system of automatically generating 
webpages from data files reduced the need for repetitive manual HTML coding. Centralized storage 
of all caves in a large .CSV file with a cave on each row made the storage of new information 
more straightforward.

<h3>Version control</h3>
<p>Another important element of this system was version control. The entire data structure was 
stored initially in a <a href="https://en.wikipedia.org/wiki/Concurrent_Versions_System">Concurrent Versions System</a> (CVS) repository, and later migrated to 
<a href="https://en.wikipedia.org/wiki/Apache_Subversion">Subversion</a> [<em>now using a <a href="computing/onlinesystems.html#mercurial">DVCS</a> in 2019</em>]. 
Any edits to the spreadsheets which caused the scripts to fail, breaking the 
website, could be easily reversed.

<h3>Other types of data</h3>
<p>However, not all types of data could be stored in spreadsheets or survey files. In order to 
display the description on the webpage for an individual cave, the entire description, written in 
HTML, had to be typed into a single spreadsheet cell, which makes for an extremely awkward 
HTML editing environment. To work around this problem, descriptions of large caves were written 
manually as a tree of HTML pages, and the main cave page only contained a link to them.


<p>A less obvious but more deeply rooted problem was the lack of relational information. One 
table, <em>folk.csv</em>, stored the names of all expedition members, the years in which they were 
present, and a link to a biography page. This was great for displaying a table of members by 
expedition year, but what if you wanted to display a list of people who wrote in the logbook 
about a certain cave in a certain expedition year? In principle, all of the information needed 
to produce that list had been recorded in the logbook, but there was no way to access 
it, because nothing connected a person's name in <em>folk.csv</em> to the entries that person wrote 
in the logbook.
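<p>The missing connection is exactly what a relational database provides. The sketch below, using Python's built-in <code>sqlite3</code> module, shows the query the paragraph above asks for once each logbook entry refers to a person by key rather than by a free-text name. The table and column names and the sample rows are hypothetical, chosen only for illustration; this is not Troggle's actual schema.</p>

```python
import sqlite3

# Hypothetical schema: "person" plus "attendance" play the role of folk.csv,
# and each logbook entry points at its author by key instead of by name.
SCHEMA = """
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE attendance (
    person_id INTEGER NOT NULL REFERENCES person(id),
    year INTEGER NOT NULL
);
CREATE TABLE logbook_entry (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES person(id),
    year INTEGER NOT NULL,
    cave TEXT NOT NULL,
    body TEXT
);
"""


def build_example_db():
    """Create an in-memory database holding made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        [(1, "Caver A"), (2, "Caver B"), (3, "Caver C")],
    )
    conn.executemany(
        "INSERT INTO attendance (person_id, year) VALUES (?, ?)",
        [(1, 2000), (2, 2000), (3, 2000), (1, 2001)],
    )
    conn.executemany(
        "INSERT INTO logbook_entry (author_id, year, cave, body) VALUES (?, ?, ?, ?)",
        [
            (1, 2000, "Cave X", "Pushed the north passage."),
            (2, 2000, "Cave Y", "Surface prospecting."),
            (3, 2000, "Cave X", "Surveyed the new series."),
            (1, 2001, "Cave X", "Derigged."),
        ],
    )
    return conn


def writers_about(conn, cave, year):
    """Names of people who wrote logbook entries about `cave` in `year`."""
    rows = conn.execute(
        """
        SELECT DISTINCT person.name
        FROM logbook_entry
        JOIN person ON person.id = logbook_entry.author_id
        WHERE logbook_entry.cave = ? AND logbook_entry.year = ?
        ORDER BY person.name
        """,
        (cave, year),
    )
    return [name for (name,) in rows]


conn = build_example_db()
print(writers_about(conn, "Cave X", 2000))
```

<p>The design point is the <code>author_id</code> column: because it refers to a row in <code>person</code> rather than repeating a name, the same person can be followed from the members list to everything they wrote, which a set of unconnected CSV files could not do.</p>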


<p>The only way that relational information was stored in our CSV files was by putting 
references to other files into spreadsheet cells. For example, there was a column in the main 
cave spreadsheet, <em>cavetab2.csv</em>, which contained the path to the QM (question mark, i.e. unexplored lead) list for each cave. The 
haphazard nature of the development of the "script and spreadsheet" method meant that every cave 
had its own system for storing QMs. Without a standard system, it was sometimes unclear 
how to enter data correctly. 

<p><em>From "<a href="../../troggle/docsEtc/troggle_paper.odt" download>
Troggle: a novel system for cave exploration information management</a>", by Aaron Curtis, CUCC [with some additions]</em>
<hr />

<h2>History in summary</h2>

<p>The CUCC website, which publishes the cave data, was originally created by 
Andy Waddington in the early 1990s and was hosted by Wookey. 

The version control system was <a href="https://www.nongnu.org/cvs/">CVS</a>. The whole site was just static HTML, carefully 
designed to be RISC OS-compatible (hence the short 10-character filenames), 
as both Wadders and Wookey were <a href="https://en.wikipedia.org/wiki/RISC_OS">RISC OS</a> people then (in the early 1990s). 
Wadders wrote up a huge amount of material, collecting expo history, photos, cave data etc.</p>

<p>Martin Green added the <em>survtab.csv</em> file to contain tabulated data for many caves around 1999, and a 
script to generate the index pages from it. Dave Loeffler added scripts and programs to generate the 
prospecting maps in 2004. The server moved to Mark Shinwell's machine in the early 
2000s, and the version control system was updated to <a href="https://subversion.apache.org/">Subversion</a>.</p>

<p>In 2006 Aaron Curtis decided that a more modern set of generated, database-backed pages 
made sense, and so wrote Troggle. 
Troggle uses Django to generate pages. 
It reads in all the logbooks and surveys, provides a convenient way to access them, and allows new data to be entered. 
It was separate for a while until Martin Green added code to merge the old static pages and 
new troggle dynamic pages into the same site. This is now the live system running everything (in 2019). Work on developing Troggle further continues (see <a href="troggle/trogintro.html">Troggle intro</a>).</p>

<p>After Expo 2009 the version control system was updated to a <a href="computing/onlinesystems.html#mercurial">DVCS</a> (Mercurial, aka 'hg'), 
because a distributed version control system makes a great deal of sense for expo: 
the expedition is offline for a month or two, which is when nearly all of the year's edits happen, and a DVCS lets people commit locally and merge later.</p>

<p>The site was moved to Julian Todd's seagrass server (in 2010), 
but the change from a 32-bit to 64-bit machine broke the website autogeneration code,
which was only fixed in early 2011, allowing the move to complete. The
data was split into separate repositories: the website,
troggle, the survey data and the Tunnel data. Seagrass was turned off at
the end of 2013, and the site has been hosted by Sam Wenham at the
university since Feb 2014. 


<h3>2018</h3>
<p>In 2018 we had four repositories, two in Mercurial and two in git:

<ul>
 <li><a href="/hgrepositories/home/expo/loser/graph/">loser</a> - the survex cave survey data (hg)</li>
 <li><a href="/repositories/drawings/.git/log">drawings</a> - the tunnel and therion cave data and drawings (git)</li>
 <li><a href="/hgrepositories/home/expo/expoweb/graph">expoweb</a> - the website pages, handbook, generation scripts (hg)</li>
 <li><a href="/repositories/troggle/.git/log">troggle</a> - the database/software part of the survey data management system - see <a href="troggle/trogintro.html">notes on troggle</a> for further explanation (git)</li>
</ul>

<p>In spring 2018 Sam, Wookey and Paul Fox updated the Linux version and the Django version used by troggle to 
something vaguely acceptable to the university computing service, and fixed all the problems that were then observed.

<h3>More recent</h3>
<p>
For the current situation see <a href="troggle/trogstatus.html">expo systems status</a>.
<hr />

Return to<br />
<a href="computing/onlinesystems.html">expo online systems overview</a><br />

<hr />
</div>
</body>
</html>