
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>CUCC Expedition Handbook: Programmers manual</title>
<link rel="stylesheet" type="text/css" href="../css/main2.css" />
</head>
<body>
<h2 id="tophead">CUCC Expedition Handbook - Online systems</h2>
<h1>Expo Data Maintenance Manual</h1>

<h2><a id="manual">Expo data management programmers' manual</a></h2>

<ul>
<li>This page is <i>not</i> for cavers wanting to know how to record their cave survey data.
<li>This page is <i>not</i> for cavers wanting to know how to type in logbooks or upload photographs.

<li>This page <i>is for programmers</i> who are helping cavers do their thing and <a href="#yourownlaptop">setting up their own laptop</a>.
</ul>

<p>Editing the expo data management system is an adventure.  Learning
it by trial and error is non-trivial. There are lots of things we
could improve about the system, and anyone with some computer nous is
very welcome to muck in. It is slowly getting better organised.</p>

<p>This manual is organized in a how-to sort of style. The categories,
rather than referring to specific elements of the data management system, refer to
processes that a maintainer would want to do.</p>
<p>Note that to display the survey data you will need a copy of the <a href="getsurvex.html">survex</a> software.

<p>Follow these links if you have reached this page by accident and this is what you want to know:
<ul>
<li><a href="uploading.html">How to upload photos</a></li>
<li><a href="logbooks.html">Typing in logbook entries</a></li>
<li><a href="gpxupload.html">Recording the GPS location of a cave</a></li>
<li><a href="survey/index.htm">How to do cave surveying</a></li>
<li><a href="index.htm">List of "How to" pages for everything else</a></li>
</ul>

<h3>Contents of this manual</h3>

<ol>
<li><a href="#usernamepassword">Getting a username, password and key</a></li>
<li><a href="#repositories">The repositories</a></li>
<li><a href="#howitworks">How the data management system works</a></li>
<li><a href="#yourownlaptop">Your own laptop</a></li>
<li><a href="#quickstart">Quick start</a></li>
<li><a href="#editingthedata management system">Modifying the data management system</a></li>
<li><a href="#expowebupdate">The expoweb-update script</a></li>
<li><a href="#cavepages">Updating cave pages</a></li>
<li><a href="#updatingyears">Updating expo year pages</a></li>
<li><a href="#surveystatus">Maintaining the survey status table</a></li>
<li><a href="#automation">Automation</a></li>

</ol>
Appendices:
<ul>
<li><a href="website-history.html">Website history</a> - a history of the data management system up to 2019</li>
<li><a href="c21bs.html">Taking Expo Bullshit into the 21st Century</a> - initial report from 1996</li>
</ul>

<h3><a id="usernamepassword">Getting a username, password and key</a></h3>

<p>You don't need a password to view most things, but you will need one to change them.</p>

<p>Use these credentials for access to the troggle site. The user is 'expo',
  with a cavey:beery password. Ask someone if this isn't enough clue for you.
  <b>This password is important for security</b>. The whole site <strong>will</strong> get hacked by spammers or worse if you are not careful with it. Use a secure method for passing it on to others who need to know (i.e. not unencrypted email), don't publish it anywhere, and don't check it in to the data management system by accident. A lot of people use it and changing it is a pain for everyone, so do take a bit of care.
</p>

<p>This password is all you need to log in to troggle and to use the troggle control panel (very few people need to do this). But if you want to update webpages (a much more common requirement) or to edit the software itself (very rare), then
you will also need to get a cryptographic key and register it with the server. See <a href="computing/keyexchange.html">key exchange</a> for details.

<p>Unfortunately, pushing cave data to the ::loser:: and ::drawings:: repositories also needs a key. So cavers entering their cave survey data
currently have to use a machine on which this is already set up. These machines are
the <i>expo laptop</i> and the laptop '<i>aziraphale</i>', which live in the potato hut during expo. If you want to use your own laptop then
see <a href="#yourownlaptop">below</a>.


<h3><a id="repositories">The repositories</a></h3>

<p>All the expo data is contained in 4 "repositories" at
expo.survex.com. This is currently hosted on a free virtual server we have blagged on a server farm.
We use a distributed version control system (DVCS) to manage these repositories because this allows simultaneous collaborative
editing and keeps track of all changes so we can roll back and have branches if needed.</p>

<p>The site has been split into four parts:</p>

<ul>
 <li><a href="/repositories/home/expo/loser/graph/">loser</a> - the survex cave survey data (hg)</li>
 <li><a href="/cgit/drawings/.git/log">drawings</a> - the tunnel and therion cave data and drawings (git)</li>
 <li><a href="/repositories/home/expo/expoweb/graph">expoweb</a> - the website pages, handbook, generation scripts (hg)</li>
 <li><a href="/cgit/troggle/.git/log">troggle</a> - the database/software part of the survey data management system - see <a href="computing/troggle-ish.html">notes on troggle</a> for further explanation (git)</li>
</ul>
<p>We have migrated two of these to git but the other two still use mercurial.

<h4>Mercurial Website Hack 2019</h4>
<p> Currently (December 2019) after committing and pushing your changes to expoweb to the mercurial server, you will need to
log in to expo.survex.com using ssh, cd to /expoweb/ and issue a "<a href="https://www.selenic.com/mercurial/hg.1.html">hg update</a>" command so that the webserver notices your changes. This problem will go away before Expo 2020 - we hope - when we finish migrating from mercurial to git.
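<p>The manual step above can be collapsed into a single remote command. The following is only a sketch: the function name is made up for illustration, it assumes your key is already registered with the server, and it assumes the served copy is the <tt>expoweb</tt> directory in the expo user's home directory. It prints the command rather than running it, so you can check it first.</p>

```shell
#!/bin/sh
# Hypothetical helper: print the ssh command that refreshes the served
# copy of expoweb after you have pushed. Nothing is executed here.
# Assumption: the working copy is "expoweb" under the expo user's home.
expoweb_refresh_cmd() {
    host="${1:-expo@expo.survex.com}"
    dir="${2:-expoweb}"
    printf "ssh %s 'cd %s && hg update'\n" "$host" "$dir"
}

expoweb_refresh_cmd
```

<p>Copy the printed line into a terminal once you are happy with it. Pass a different directory as the second argument if the working copy lives elsewhere on the server.</p>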

<p>All the scans, photos, presentations, fat documents and videos are
stored just as files (not in version control) in 'expofiles'. See
below for details on that.</p>

<h3><a id="howitworks">How the data management system works</a></h3>

<p>Part of the data management system is static HTML, but quite a lot is generated by scripts and troggle (a web framework built using Django).
<p>Examples of troggle-generated pages from data:
<ul>
<li><a href="http://expo.survex.com/caves">expo.survex.com/caves</a> - list of caves surveyed and links to guidebook descriptions
<li><a href="http://expo.survex.com/pubs.htm">expo.survex.com/pubs.htm</a> -  reports, accounts and logbooks
<li><a href="http://expo.survex.com/expedition/2018">expo.survex.com/expedition/2018</a> - Members on expo 2018. Scroll down for a list of all the data typed in from survey trips.
<li><a href="http://expo.survex.com/survexfile/caves/">expo.survex.com/survexfile/caves/</a> - List of caves with all the surveys done for each.
<li><a href="http://expo.survex.com/survexfile/caves-1623/115/cucc/futility.svx">expo.survex.com/survexfile/caves-1623/115/cucc/futility.svx</a> - CUCC cave survey data from 1983 in Schnellzughohle.
<li><a href="http://expo.survex.com/survey_scans/">expo.survex.com/survey_scans/</a> - List of all scanned original survey notes.
<li><a href="http://expo.survex.com/survey_scans/2018%252343/">expo.survex.com/survey_scans/2018%252343/</a> - list of links to scanned notes for wallet #43 during the 2018 expo.
</ul>
<p>Anything you check in which affects cave data or descriptions won't appear on the site until
the data management system update scripts are run.
This happens automatically every 30 mins, but you can also kick off a manual update.
See 'The expoweb-update script' below for details.</p>

<p>Also note that the ::expoweb:: web pages and cave data reports you see on the visible website
are not the same as the version-controlled  "master" expoweb repo.
So in order that your committed and pushed changes become visible on the website,
they have to be 'pulled' from the repo (on the server machine) onto the webserver (another place on the same server machine) before your changes are reflected.</p>

<h3><a id="yourownlaptop">Your own laptop</a></h3>
<p>Setting your own laptop so that it can do everything the <i>expo laptop</i> can do is quite a
complicated process. At a minimum you will be an experienced software nerd already and will have git, mercurial and a text editor installed and you will know how to use them.
You will have done the
<a href="computing/keyexchange.html">key exchange</a> process - which you can only do entirely on your own if
you have access to the <i>expo laptop</i>.
<p>See <a href="computing/yourlaptop.html">setting up your own laptop</a> for the full list of software we use and where to get it.
<p>Note that the instructions are primarily for people using Linux with some help for those using Windows. If you are a Mac user then you are on your own.


<h3><a id="editthispage">Using 'Edit This Page'</a></h3>
<p>This can be used to edit web pages without installing any software or doing any key exchange. It even works if your laptop is a Mac.
<p>This is the capability that you can see in the top-left-hand menu on any website page if you <a href="/accounts/login/">log in to troggle</a> using the <a href="#usernamepassword">cavey:beery password</a>.
<p>'Edit This Page' is a troggle capability that edits the file served by the webserver, but it does not update the copy of the file in the
repository (the inverse of the problem described above as 'Mercurial Website Hack'). To properly finish the job you need to
<ul>
<li>
ssh into expo@expo.survex.com (use putty on a Windows machine)
<li>cd to the directory containing the repo you want, i.e. "cd loser" for
cave data or "cd expoweb" for the handbook and visible data management system, which takes you to /home/expo/expoweb
<li>Then run "<a href="https://www.selenic.com/mercurial/hg.1.html">hg status</a>" (to check what
changes are pending),
<li>then "hg diff" to see the changes in detail
(or "hg diff|less" if you know how to use "less" or "more") and
<li>then DO NOT just run '<a href="https://www.selenic.com/mercurial/hg.1.html">hg commit</a>' unless you know how emacs works, as it will dump
you into an emacs editing window (C-x C-c is the way to exit emacs). Instead, do
'hg commit -m "found files left over - myName" '
which supplies the obligatory comment with the commit operation.
</ul>
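<p>The steps in the list above could be wrapped up roughly as follows. This is a sketch only: both function names are invented, and it assumes you are already logged in to the server and have changed into the repository directory. It uses only the <tt>hg status</tt>, <tt>hg diff</tt> and <tt>hg commit -m</tt> commands mentioned above, so you never land in the emacs window.</p>

```shell
#!/bin/sh
# Hypothetical helpers for tidying up after 'Edit This Page'.
# Run on the server, inside the repo directory (e.g. after "cd expoweb").

# Build the commit comment; $1 is your name so others know who tidied up.
commit_message() {
    printf 'found files left over - %s' "$1"
}

# Show what is pending, show the detail, then ask before committing.
tidy_after_edit() {
    hg status
    hg diff | less
    printf 'Commit these changes? [y/N] '
    read -r answer
    case "$answer" in
        y|Y) hg commit -m "$(commit_message "$1")" ;;
        *)   echo 'Nothing committed.' ;;
    esac
}
```

<p>Call it as <tt>tidy_after_edit myName</tt>. Anything other than "y" at the prompt leaves the repository untouched.</p>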
<p>Again, we hope that this issue will go away when we migrate the expoweb repo from mercurial to git before the 2020 Expo.

<h3><a id="quickstart">Quick start</a></h3>

<p>If you know what you are doing, here is the basic info on what's where:<br>
(if you don't know what you're doing, skip to <a href="#editingthedata management system">Editing the data management system</a> below.)

<p>This section used to be all about how to use mercurial. Since we are changing to git, that material has been
moved to <a href="computing/qstart-hg.html">a separate place</a>.


<dl>
    <dt>expofiles (all the big files and documents)</dt>

<p>Photos, scans (logbooks, drawn-up cave segments) (This was about
40GB of stuff in 2019 which you probably don't actually need locally).
<p>If you don't need an entire copy of all of it, then it is probably best to use Filezilla/ftp to
copy just a small part of the filesystem to your own machine and to upload the bits you add to or edit.
Instructions for installing and using Filezilla are found in the expo user instructions for
uploading photographs: <a href="uploading.html">uploading.html</a>.

<p> To sync all
the files from the server to your local expofiles directory on your laptop:</p>

<p><tt>rsync -nazv --delete-after --prune-empty-dirs expo@expo.survex.com:expofiles/ /home/expo/expofiles</tt></p>

<p>To sync the local expofiles directory back to the server after you have made updates (e.g. scanned some hand-drawn surveys into expofiles/surveyscans/), but only if your machine runs Linux:</p>

<p><tt>rsync -nazv /home/expo/expofiles/surveyscans/2019/ expo@expo.survex.com:expofiles/surveyscans/2019</tt></p>

then CHECK that the list of files it produces matches the ones you absolutely intend to delete forever! ONLY THEN do it without the "-n" option. "-n" is the same as "--dry-run" which shows you the overwriting changes but doesn't actually do them.

<p>Always
<ul>

<li>do a dry-run of rsync from the server to your laptop immediately before you do an upload to the server
<li>use --delete-after --prune-empty-dirs when downloading, but never when uploading
<li>work at the minimum scope of folders you need, e.g. within expofiles/photos/ or expofiles/surveyscans/ not for the whole of expofiles all at once.
<li>take exaggerated care with the placement of the final slash in directory parameters to the rsync. Get it wrong and you duplicate things instead of updating them and it takes ages to sort out.
</ul>
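<p>Some of the rules in the list above can be checked mechanically before an upload. The sketch below is an illustration and not something that exists in the repositories: the function names are invented, and the guard only looks for the delete and prune options. It cannot tell whether your trailing slashes are right.</p>

```shell
#!/bin/sh
# Hypothetical guard for uploads: returns non-zero if any argument is an
# option that should never be used when pushing files to the server.
upload_args_ok() {
    for arg in "$@"; do
        case "$arg" in
            --del|--delete*|--prune-empty-dirs)
                echo "refusing upload option: $arg" >&2
                return 1 ;;
        esac
    done
    return 0
}

# Print the dry-run form of an upload. Remove -n only after checking the
# file list that the dry run reports.
#   $1: local directory (a trailing slash means "the contents of")
#   $2: path on the server, relative to the expo home directory
# Note the colon between the host name and the remote path.
upload_dry_run_cmd() {
    printf 'rsync -nazv %s expo@expo.survex.com:%s\n' "$1" "$2"
}
```

<p>For example, <tt>upload_args_ok -azv --delete-after src/</tt> fails with a message, while <tt>upload_dry_run_cmd /home/expo/expofiles/surveyscans/2019/ expofiles/surveyscans/2019</tt> prints a command that is safe to run because it still contains "-n".</p>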

<p>(do be <b>incredibly</b> careful not to delete piles of stuff then rsync back, or to get the directory level of the command wrong - as it'll all get deleted on the server too, and we may not have backups!). It's <b>absolutely vital</b> to use rsync --dry-run --delete-after  first to check what would be deleted.

<p>If you are using rsync from a Windows machine you will <em>not</em> get all the files, as some filenames are incompatible with Windows. What will happen is that rsync will invisibly change the names as it downloads them from the Linux expo server to your Windows machine, but then it forgets what it has done and tries to re-upload all the renamed files to the server even if you have touched none of them. There won't be any problems with simple filenames using all lowercase letters and no funny characters, but we have nothing in place to stop anyone creating an incompatible filename somewhere in that 40GB, or to detect the problem at the time. So don't do it. If you have a Windows machine use Filezilla not rsync.
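<p>One way to look for troublesome names is to scan a directory tree on a Linux machine. The following is a rough sketch, not an existing expo script: the function names are invented, the character list is the usual set that Windows reserves in filenames, and it does not check for reserved device names such as CON or NUL.</p>

```shell
#!/bin/sh
# Hypothetical checks, to be run on Linux against a local copy.

# List paths under $1 whose final component Windows cannot store as-is:
# it contains an angle bracket, colon, double quote, backslash, pipe,
# question mark or asterisk, or it ends in a dot or a space.
windows_unsafe_names() {
    find "$1" \( -name '*[<>:"\\|?*]*' -o -name '?*.' -o -name '* ' \) -print
}

# List (in lowercase) paths under $1 that differ only by letter case,
# which would collide on a case-insensitive filesystem.
case_collisions() {
    find "$1" | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

<p>Run each function on a small part of the tree, for example a single year of surveyscans, rather than on the whole of expofiles at once. No output means nothing suspicious was found.</p>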

<p>(We may also have an issue with rsync not using the appropriate user:group  attributes for files pushed back to the server. This may not cause any problems, but watch out for it.)</p>
</dl>
<h3><a id="editingthedata management system">Editing the data management system</a></h3>

<p>To edit the data management system fully, you need to use the distributed version control system
(DVCS) software, which is currently mercurial/TortoiseHg.
Some (static text) pages can be edited directly online using the 'Edit This Page' link which you'll
see if you are logged into troggle. In general the dynamically-generated pages, such as those describing
caves which are generated from the cave survey data, cannot be edited in this way, but forms are provided
for some types of these, such as 'caves'.</p>

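<p>Mercurial records your name against each commit. Set it in your Mercurial configuration file (<tt>~/.hgrc</tt> on Linux):</p>

<p><tt>
[ui]<br/>username = Firstname Lastname &lt;myemail@example.com&gt;
</tt></p>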

<p>The commit has stored the changes in your local Mercurial DVCS, but it has not sent anything back to the server. To do that you need to:</p>

<p><tt>hg push</tt></p>

<p>Before pushing, you should do an <tt>hg pull</tt> to sync with upstream first. If someone else has edited the same files you may also need to do:</p>

<p><tt>hg merge</tt></p>

<p>(and sort out any conflicts if you've both edited the same file) before pushing again</p>

<p>Simple changes to static files will take effect immediately, but changes to dynamically-generated files (cave descriptions, QM lists etc.) will not take effect until the server runs the expoweb-update script.</p>


<h3><a id="expowebupdate">The expoweb-update script</a></h3>

<p>The script at the heart of the data management system update mechanism is a makefile that runs the various generation scripts. It is run every 15 minutes as a cron job (at 0,15,30 and 45 past the hour), but if you want to force an update more quickly you can run it he</p>

<p>The scripts are generally under the 'noinfo' section of the site, simply because that has (had) some access control. This will get changed to something more sensible at some point.</p>


<h3><a id="cavepages">Updating cave pages</a></h3>

<p>Cave description pages are automatically generated from a set of
cave files in noinfo/cave_data/ and noinfo/entrance_data/. These files
are named <tt>&lt;area&gt;-&lt;cavenumber&gt;.html</tt> (where area is 1623 or 1626). These
files are processed by troggle. Use <tt>python databaseReset.py
caves</tt> in /expofiles/troggle/ to update the site/database after
editing these files.</p>
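<p>The naming rule above can be checked before running the reset. This is a minimal sketch with an invented function name. It assumes only what is stated above, that the area is 1623 or 1626 and that something non-empty follows the hyphen, because the exact format of the cave number is not specified here.</p>

```shell
#!/bin/sh
# Hypothetical check: succeeds if $1 looks like AREA-CAVENUMBER.html
# where AREA is 1623 or 1626. The cave number is only required to be
# non-empty, since its exact format is an unknown here.
is_cave_filename() {
    case "$1" in
        1623-?*.html|1626-?*.html) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use from the expoweb directory: report files that do not fit.
list_misnamed_cave_files() {
    for f in noinfo/cave_data/*; do
        is_cave_filename "$(basename "$f")" || echo "unexpected name: $f"
    done
}
```

<p>Run <tt>list_misnamed_cave_files</tt> from the top of your expoweb working copy. Anything it reports is worth a second look before you update the database.</p>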

<p>Clicking on 'New cave' (at the bottom of the cave index) lets you enter a new cave. <a href="caveentry.html">Info on how to enter new caves has been split into its own page</a>.</p>

<p>(If you remember something about CAVETAB2.CSV for editing caves, that was
superseded in 2012).</p>
<p>This may be a useful reminder of what is in a survex file: <a href="survey/how_to_make_a_survex_file.pdf">how to create a survex file</a>.


<h3><a id="updatingyears">Updating expo year pages</a></h3>

<p>Each year's expo is recorded in the folder</p>

<tt>/expoweb/years/</tt>

<p>which contains a number of files used to manage and record that year's expo. Have a look at expoweb/years/2018/ for a recent well-documented expo (the weather was good). Files are added and edited using the version control system for the expoweb repository.</p>

<p>To create a new 'year' for next year's expo see <a href="computing/newyear.html">adding a new year</a>.


<h3><a id="surveystatus">Maintaining the survey status table</a></h3>
<p>See the <a href="survey/onlinewallet.html">documentation</a> on updating the online surveyscans folders using the lever-arch file of plastic wallets.

<p>The following is obsolete:
<ul>
<li>There was a table in the survey book which has a list of all the surveys and whether or not they have been drawn up, and some other info.

<li>This used to be generated by the script tablizebyname-csv.pl from the input file Surveys.csv
</ul>

<hr />
</body>
</html>