
<html>
<head>
<title>CUCC Expedition Handbook: Online system manual</title>
<link rel="stylesheet" type="text/css" href="../css/main2.css" />
</head>
<body>
<h2 id="tophead">CUCC Expedition Handbook - Online systems</h2>
<h1>Expo Online Systems Manual</h1>

<h2><a id="manual">Expo data management systems manual</a></h2>

<p>Editing the expo data management system is an adventure. Until 2007, there was no
guide which explained the whole thing as a functioning system. Learning
it by trial and error is non-trivial. There are lots of things we
could improve about the system, and anyone with some computer nous is
very welcome to muck in. It is slowly getting better organised.</p>

<p>This manual is organised in a how-to style. The categories refer to
processes that a maintainer would want to carry out, rather than to
specific elements of the data management system.</p>
<p>Note that to display the survey data you will need a copy of the survex software.</p>

<h3>Contents</h3>

<ol>
<li><a href="#usernamepassword">Getting a username and password</a></li>
<li><a href="#repositories">The repositories</a></li>
<li><a href="#howitworks">How the data management system works</a></li>
<li><a href="#quickstart">Quick start</a></li>
<li><a href="#editingthedata management system">Editing the data management system</a></li>
<li><a href="#Mercurialinwindows">Using version control software in Windows</a></li>
<li><a href="#expowebupdate">The expoweb-update script</a></li>
<li><a href="#cavepages">Updating cave pages</a></li>
<li><a href="#updatingyears">Updating expo year pages</a></li>
<li><a href="logbooks.html">Adding typed logbooks</a></li>
<li><a href="uploading.html">Uploading photos</a></li>
<li><a href="#tickingoff">Ticking off QMs</a></li>
<li><a href="#surveystatus">Maintaining the survey status table</a></li>
<li><a href="#automation">Automation</a></li>
<li><a href="#arch">Archived updates</a></li>
</ol>
Appendices:
<ul>
<li><a href="data management system-history.html">History of the data management system</a></li>
</ul>

<h3><a id="usernamepassword">Getting a username and password</a></h3>

<p>Use these credentials for access to the site. The user is 'expo',
  with a cavey:beery password. Ask someone if this isn't enough of a clue for you.
  <b>This password is important for security</b>. The whole site <strong>will</strong> get hacked by spammers or worse if you are not careful with it. Use a secure method for passing it on to others who need to know (i.e. not unencrypted email), don't publish it anywhere, and don't check it in to the data management system by accident. A lot of people use it and changing it is a pain for everyone, so do take a bit of care.
</p>

<p>Note that you don't need a password to view most things, but you will need one to change them.</p>

<h3><a id="repositories">The repositories</a></h3>

<p>All the expo data is contained in 4 "repositories" at
expo.survex.com. This is currently hosted on a server at the university. We use a distributed version control system (DVCS) to manage these repositories because this allows simultaneous collaborative editing and keeps track of all changes so we can roll back and have branches if needed.</p>

<p>The site has been split into four parts:</p>

<ul>
 <li><a href="http://expo.survex.com/repositories/home/expo/expoweb/graph">expoweb</a> - the data management system itself, including generation scripts</li>
 <li><a href="http://expo.survex.com/repositories/home/expo/troggle/graph/">troggle</a> - the database-driven part of the data management system - see <a href="troggle-ish.html">notes on troggle</a> for further explanation</li>
 <li><a href="http://expo.survex.com/repositories/home/expo/loser/graph/">loser</a> - the survex survey data</li>
 <li><a href="http://expo.survex.com/repositories/home/expo/tunneldata/graph/">tunneldata</a> - the tunnel (and therion) data and drawings</li>
</ul>


<p>All the scans, photos, presentations, fat documents and videos are
stored just as files (not in version control) in 'expofiles'. See
below for details on that.</p>

<h3><a id="howitworks">How the data management system works</a></h3>

<p>Part of the data management system is static HTML, but quite a lot is generated by scripts. So anything you check in which affects cave data or descriptions won't appear on the site until the data management system update scripts are run. This happens automatically as a scheduled job, but you can also kick off a manual update. See <a href="#expowebupdate">The expoweb-update script</a> below for details.</p>

<p>Also note that the data management system you see is its own Mercurial checkout (just like your local one) so that has to be 'pulled' from the server before your changes are reflected.</p>

<h3><a id="editthispage">Using 'Edit This Page'</a></h3>
<p>This edits the file served by the webserver on
the expo server in Cambridge, but it does not update the copy of the file in the
repository on expo.survex.com. To finish the job properly you need to:</p>
<ul>
<li>ssh into expo@expo.survex.com (use putty on a Windows machine).</li>
<li>cd to the directory containing the repo you want, i.e. <tt>cd loser</tt> for
cave data or <tt>cd expoweb</tt> for the handbook and visible data management system, which takes you to /home/expo/expoweb.</li>
<li>Run <tt>hg status</tt> to check what changes are pending.</li>
<li>Run <tt>hg diff</tt> to see the changes in detail
(or <tt>hg diff|less</tt> if you know how to use "less" or "more").</li>
<li>DO NOT just run <tt>hg commit</tt> unless you know how emacs works, as it will dump
you into an emacs editing window (C-x C-c is the way to exit emacs). Instead, run
<tt>hg commit -m "found files left over - myName"</tt>,
which supplies the obligatory comment with the commit operation.</li>
</ul>


<h3><a id="quickstart">Quick start</a></h3>

<p>If you know what you are doing, here is the basic info on what's where:<br>
(if you don't know what you're doing, skip to <a href="#editingthedata management system">Editing the data management system</a> below.)</p>


<dl>
    <dt>expoweb (The data management system)</dt>
    <dd>
      <tt>hg clone ssh://expo@expo.survex.com/expoweb</tt> (read/write)<br />
      <tt>hg clone http://expo.survex.com/repositories/home/expo/expoweb/</tt> (read-only checkout)
    </dd>

    <dt>troggle (The data management system backend)</dt>
    <dd>
      <tt>hg clone ssh://expo@expo.survex.com/troggle</tt> (read/write)<br />
      <tt>hg clone http://expo.survex.com/repositories/home/expo/troggle/</tt> (read-only checkout)
    </dd>

    <dt>loser (The survey data)</dt>
    <dd>
      <tt>hg clone ssh://expo@expo.survex.com/loser</tt> (read/write)<br />
      <tt>hg clone http://expo.survex.com/repositories/home/expo/loser/</tt> (read-only)
    </dd>

    <dt>tunneldata (The Tunnel drawings)</dt>
    <dd>
      <tt>hg clone ssh://expo@expo.survex.com/tunneldata</tt> (read/write)<br />
      <tt>hg clone http://expo.survex.com/repositories/home/expo/tunneldata/</tt> (read-only)
    </dd>
</dl>
<dl>
    <dt>expofiles (all the big files and documents)</dt>

<p>Photos and scans (logbooks, drawn-up cave segments). This was about
60GB of stuff in 2017, which you probably don't actually need locally.</p>
<p>If you don't need an entire copy of all 60GB, then it is probably best to use Filezilla to copy just a small part of the filesystem to your own machine and to upload the bits you add or edit.
Instructions for installing and using Filezilla are found in the expo user instructions for uploading photographs: <a href="uploading.html">uploading.html</a>.</p>

<p>To sync all
the files from the server to a local expofiles directory:</p>

<p><tt>rsync -av expo@expo.survex.com:expofiles /home/expo</tt></p>

<p>To sync the local expofiles directory back to the server (but only if your machine runs Linux), first do a dry run (the <tt>-v</tt> is needed to make rsync list what it would do):</p>
<p><tt>rsync --dry-run --delete-after -av /home/expo/expofiles expo@expo.survex.com:</tt></p>
<p>then CHECK that the list of files it produces matches the ones you absolutely intend to delete forever! ONLY THEN do:</p>
<p><tt>rsync -av /home/expo/expofiles expo@expo.survex.com:</tt></p>

<p>(Do be <b>incredibly</b> careful not to delete piles of stuff then rsync back, or to get the directory level of the command wrong, as it'll all get deleted on the server too, and we may not have backups! It is <b>absolutely vital</b> to use <tt>rsync --dry-run --delete-after -av</tt> first to check what would be deleted.)</p>

<p>If you are using rsync from a Windows machine you will <em>not</em> get all the files, as some filenames are incompatible with Windows. rsync will invisibly change the names as it downloads them from the Linux expo server to your Windows machine, but it then forgets what it has done and tries to re-upload all the renamed files to the server even if you have touched none of them. Simple filenames using all lowercase letters and no funny characters cause no problems, but we have nothing in place to stop anyone creating an incompatible filename somewhere in that 60GB, or to detect the problem when it happens. So don't do it. If you have a Windows machine, use Filezilla not rsync.</p>

<p>(We may also have an issue with rsync not using the appropriate user:group  attributes for files pushed back to the server. This may not cause any problems, but watch out for it.)</p>
</dl>
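<p>The Windows filename problem described above could in principle be checked for on the server before anyone hits it. The following is a minimal Python sketch, not an existing expo script: the names <tt>windows_problem</tt> and <tt>scan</tt> are made up for this example, and the rules it applies are the general Windows file naming restrictions (forbidden characters, reserved device names, trailing dots or spaces, and names that differ only by case).</p>

```python
import os
import re
from collections import defaultdict

# Characters that Windows does not allow in file names.
WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')
# Device names that Windows reserves, whatever the extension.
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)
}


def windows_problem(name):
    """Return a short reason if `name` is unsafe on Windows, else None."""
    if WINDOWS_FORBIDDEN.search(name):
        return "forbidden character"
    if any(ord(ch) < 32 for ch in name):
        return "control character"
    if name.endswith((" ", ".")):
        return "trailing space or dot"
    if name.split(".")[0].upper() in WINDOWS_RESERVED:
        return "reserved device name"
    return None


def scan(root):
    """Yield (path, reason) for every problem found below `root`."""
    for dirpath, dirnames, filenames in os.walk(root):
        by_lower = defaultdict(list)
        for name in dirnames + filenames:
            reason = windows_problem(name)
            if reason:
                yield os.path.join(dirpath, name), reason
            by_lower[name.lower()].append(name)
        # Windows treats names that differ only by case as the same file.
        for names in by_lower.values():
            if len(names) > 1:
                yield dirpath, "names differ only by case: " + ", ".join(sorted(names))
```

<p>Running <tt>for path, reason in scan("expofiles"): print(path, reason)</tt> on a Linux copy would list the candidates to rename; the directory name here is only an example.</p>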
<h3><a id="editingthedata management system">Editing the data management system</a></h3>

<p>To edit the data management system fully, you need to use the distributed version control system (DVCS) software, which is currently Mercurial/TortoiseHg. Some (static text) pages can be edited directly online using the 'edit this page' link, which you'll see if you are logged into troggle. In general the dynamically-generated pages, such as those describing caves which are generated from the cave survey data, cannot be edited in this way, but forms are provided for some types of these, like 'caves'.</p>

<p>What follows is for Linux. If you are running Windows then see <a href="#Mercurialinwindows">Using Mercurial/TortoiseHg in Windows</a> below.</p>

<p>Mercurial can be used from the command line, but if you prefer a GUI, TortoiseHg is highly recommended on all OSes.</p>

<p>Linux: Install mercurial and tortoisehg-nautilus from synaptic,
then restart nautilus with <tt>nautilus -q</tt>. If it works, you'll be able to see the TortoiseHg menus within your Nautilus windows.</p>

<p>Once you've downloaded and installed a client, the first step is to create what is called a checkout of the data management system. This creates a copy on your machine which you can edit to your heart's content. The command to initially check out ('clone') the entire expo data management system is:</p>

<p><tt>hg clone ssh://expo@expo.survex.com/expoweb</tt></p>

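<p>For subsequent updates,</p>

<p><tt>hg pull -u</tt></p>

<p>will generally do the trick: it fetches new changesets from the server and updates your working copy (<tt>hg update</tt> on its own only updates to what is already in your local repository).</p>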

<p>In TortoiseHg, merely right-click on a folder you want to check out to, choose "Mercurial checkout," and enter</p>

<p><tt>ssh://expo@expo.survex.com/expoweb</tt></p>

<p>After you've made a change, commit it to your local copy with:</p>

<p><tt>hg commit</tt>   (you can specify filenames to be specific)</p>

<p>or right clicking on the folder and going to commit in TortoiseHg. Mercurial can't always work out who you are. If you see a message like "abort: no username supplied" it was probably not set up to deduce that from your environment. It's easiest to give it the info in a config file at ~/.hgrc (create it if it doesn't exist, or add these lines if it already does) containing something like</p>

<p><tt>
[ui]<br/>username = Firstname Lastname &lt;myemail@example.com&gt;
</tt></p>

<p>The commit has stored the changes in your local Mercurial DVCS, but it has not sent anything back to the server. To do that you need to:</p>

<p><tt>hg push</tt></p>

<p>Before pushing, you should do an <tt>hg pull</tt> to sync with upstream first. If someone else has edited the same files you may also need to do:</p>

<p><tt>hg merge</tt></p>

<p>(and sort out any conflicts if you've both edited the same file) before pushing again</p>
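<p>The order of operations above (commit, pull, merge if needed, push) can be sketched as a small Python helper. This is only an illustration: the function names <tt>push_plan</tt> and <tt>run_plan</tt> are made up for this example, and it uses nothing beyond the standard <tt>hg</tt> subcommands. Note that after a merge Mercurial needs a second commit to record the merge before you can push.</p>

```python
import subprocess


def push_plan(message, need_merge=False):
    """Return the hg commands, in order, for sharing a local change."""
    steps = [
        ["hg", "commit", "-m", message],  # record the change locally
        ["hg", "pull"],                   # fetch other people's changesets
    ]
    if need_merge:
        # Someone else changed things too: combine the two heads,
        # then commit the result of the merge.
        steps.append(["hg", "merge"])
        steps.append(["hg", "commit", "-m", "merge"])
    steps.append(["hg", "push"])          # send everything to the server
    return steps


def run_plan(steps, repo_dir):
    """Run each command inside the checkout, stopping at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

<p>For example, <tt>run_plan(push_plan("fix typo in handbook"), "expoweb")</tt> would commit, pull and push from a checkout in a folder called expoweb (the folder name is an assumption). Conflicts reported by <tt>hg merge</tt> still have to be sorted out by hand.</p>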

<p>Simple changes to static files will take effect immediately, but changes to dynamically-generated files (cave descriptions, QM lists etc.) will not take effect until the server runs the expoweb-update script.</p>


<h3><a id="Mercurialinwindows">Using Mercurial/TortoiseHg in Windows</a></h3>

<p>Read the instructions for setting up TortoiseHG in <a href="tortoise/tortoise-win.htm">Tortoise-on-Windows</a>.
<p>In Windows: install Mercurial and TortoiseHg of the relevant flavour from <a href="https://tortoisehg.bitbucket.io/">https://tortoisehg.bitbucket.io/</a> (ignoring antivirus/Windows warnings). This will install a submenu in your Programs menu.</p>

<p>To start cloning a repository: first create the folders you need for the repositories you are going to use, e.g. D:\CUCC-Expo\loser and D:\CUCC-Expo\expoweb. Then start TortoiseHg Workbench from your Programs menu and click File -&gt; Clone repository; a dialogue box will appear. In the Source box type</p>

<p><tt>ssh://expo@expo.survex.com/expoweb</tt></p>

<p>for expoweb (or similar for the other repositories). In the Destination box type whatever destination you want your local copies to live in on your laptop e.g. D:\CUCC-Expo\expoweb. Hit Clone, and it should hopefully prompt you for the usual beery password.

<p>The first time you do this it will probably not work, as it does not recognise the server. Fix this by running putty (download it from <a href="https://www.chiark.greenend.org.uk/~sgtatham/putty/">https://www.chiark.greenend.org.uk/~sgtatham/putty/</a>) and connecting to the server 'expo@expo.survex.com' (on port 22). Confirm that this is the right server. If you succeed in getting a shell prompt then ssh connections are working and TortoiseHg should be able to clone the repo and send changes back.</p>


<h3><a id="expowebupdate">The expoweb-update script</a></h3>

<p>The script at the heart of the data management system update mechanism is a makefile that runs the various generation scripts. It is run every 15 minutes as a cron job (at 0,15,30 and 45 past the hour), but if you want to force an update more quickly you can run it he</p>

<p>The scripts are generally under the 'noinfo' section of the site, just because that has (had) some access control. This will get changed to something more sensible at some point.</p>


<h3><a id="cavepages">Updating cave pages</a></h3>

<p>Cave description pages are automatically generated from a set of
cave files in noinfo/cave_data/ and noinfo/entrance_data/. These files
are named <tt>&lt;area&gt;-&lt;cavenumber&gt;.html</tt> (where area is 1623 or 1626) and
are processed by troggle. Use <tt>python databaseReset.py
caves</tt> in /expofiles/troggle/ to update the site/database after
editing these files.</p>
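<p>As a quick sanity check before running the database reset, the naming convention for cave files (area, hyphen, cave number, then .html) can be tested with a regular expression. This is a hedged sketch rather than anything troggle itself does: the function name <tt>parse_cave_filename</tt> is made up, and it assumes that a cave number consists only of letters, digits and hyphens.</p>

```python
import re

# Area must be 1623 or 1626. Assumption: the cave number part contains
# only letters, digits and hyphens.
CAVE_FILE = re.compile(r"^(1623|1626)-([A-Za-z0-9-]+)\.html$")


def parse_cave_filename(filename):
    """Return (area, cavenumber) for a well-formed name, else None."""
    match = CAVE_FILE.match(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

<p>A name such as <tt>1623-204.html</tt> is accepted, whereas one with an unknown area or a <tt>.htm</tt> extension gives <tt>None</tt>.</p>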

<p>Clicking on 'New cave' (at the bottom of the cave index) lets you enter a new cave. <a href="caveentry.html">Info on how to enter new caves has been split into its own page</a>.</p>

<p>(If you remember something about CAVETAB2.CSV for editing caves, that was
superseded in 2012).</p>
<p>This may be a useful reminder of what is in a survex file: <a href="survey/how_to_make_a_survex_file.pdf">how to create a survex file</a>.</p>

<h3><a id="updatingyears">Updating expo year pages</a></h3>

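<p>Each year's expo has a documentation index which is in the folder</p>

<p><tt>/expoweb/years</tt></p>

<p>Mercurial cannot clone a single folder of a repository, so to work on the 2011 page, for example, clone the whole of expoweb with</p>

<p><tt>hg clone ssh://expo@expo.survex.com/expoweb</tt></p>

<p>and edit the files under <tt>years/2011</tt>.</p>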

<p>Once you have pushed your changes to the repository you need to update the server's local copies, by logging in to the server with ssh and running <tt>hg update</tt> in the expoweb folder.</p>

<h3>Adding a new year</h3>
<p>Edit noinfo/folk.csv, adding the new year to the end of the header
line as a new column. In each person's line the new column is just a comma (blank
cell) for people who weren't there, a 1 for people who were there, and
a -1 for people who were there but didn't go caving. Add new lines for
new people, with the right number of columns.</p>
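<p>Adding the column by hand is easy to get wrong, so here is a minimal Python sketch of the same edit. It is not an existing expo script: the function name <tt>add_year_column</tt> is made up, it assumes that the first column of each line identifies the person, and the delimiter is a parameter because the file may turn out to be tab-separated rather than comma-separated.</p>

```python
import csv
import io


def add_year_column(text, year, attendance, delimiter=","):
    """Return folk.csv-style `text` with a column for `year` appended.

    `attendance` maps the first-column value of a line to "1" (was there)
    or "-1" (there but did not go caving); anyone else gets a blank cell.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, people = rows[0], rows[1:]
    width = len(header)
    header.append(str(year))
    for row in people:
        # Refuse to guess if a line does not match the header width.
        if len(row) != width:
            raise ValueError(f"wrong number of columns in line for {row[:1]}")
        row.append(attendance.get(row[0], ""))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows([header] + people)
    return out.getvalue()
```

<p>The function works on text, so you would read the file, check the result (for example with <tt>hg diff</tt>), and only then write it back and commit.</p>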

<p>This process is tedious and error-prone, and ripe for improvement.
Adding a list of people, from the bier book, and their aliases would be
a lot better, but we would also need some way to make sure that names
match those used in previous years.</p>

<h3><a id="tickingoff">Ticking off QMs</a></h3>

<p>To be written.</p>


<h3><a id="surveystatus">Maintaining the survey status table</a></h3>

<p>There is a table in the survey book which has a list of all the surveys and whether or not they have been drawn up, and some other info.</p>

<p>This is generated by the script tablizebyname-csv.pl from the input file Surveys.csv.</p>


<h3 id="automation">Automation on expo.survex.com</h3>

<p>This section is entirely out of date (June 2014) and is awaiting removal.</p>

<p>The way things normally work, python or perl scripts turn CSV input into HTML for the data management system. Note that:</p>
<p>The CSV files are actually tab-separated, not comma-separated despite the extension.</p>
<p>The scripts can be very picky, and editing the CSVs with Microsoft Excel has broken them in the past; we are not sure whether this is still the case.</p>
<p>Overview of the automagical scripts on the expo data management system</p>
<p>[Clearly very out of date, as it assumes the version control is svn, whereas we changed to hg years ago.]</p>
<pre>
Script location: /svn/trunk/expoweb/noinfo/make-indxal4.pl
Input file:      /svn/trunk/expoweb/noinfo/CAVETAB2.CSV
Output file:     many
Purpose:         produces all cave description pages

Script location: /svn/trunk/expoweb/noinfo/make-folklist.py
Input file:      /svn/trunk/expoweb/noinfo/folk.csv
Output file:     http://expo.survex.com/folk/index.htm
Purpose:         Table of all expo members

Script location: /svn/trunk/surveys/tablize-csv.pl
                 /svn/trunk/surveys/tablizebyname-csv.pl
Input file:      /svn/trunk/surveys/Surveys.csv
Output file:     http://expo.survex.com/expo/surveys/surveytable.html
                 http://expo.survex.com/surveys/surtabnam.html
Purpose:         Survey status page: "wall of shame" to keep track of who still needs to draw which surveys
</pre>

<h3><a id="arch">Archived updates</a></h3>
<p>Since 2008 we have been keeping detailed records of all data management system updates in the version control system.
Before then we manually maintained <a href="../update.htm">a list of updates</a>, which is now only of historical interest.</p>

<h2>The data management system conventions bit</h2>
 <p>This is likely to change with structural change to the site, with style changes which we expect to implement and with the method by which the info is actually stored and served up.</p>
<p>... and it's not written yet, either :-)</p>
<ul>

<li>Structure</li>
<li>Info for each cave &ndash; automatically generated by <tt>make-indxal4.pl</tt></li>
<li>Contents lists &amp; relative links for multi-article publications like  journals. Complicated by expo articles being in a separate hierarchy from journals.</li>
<li>Translations</li>
<li>Other people's work - the noinfo hierarchy.</li>
<li>Style guide for writing cave descriptions: correct use of boldface (<em>once</em> for each passage name, at the primary definition thereof; other uses of the name should be links to this, and certainly should not be bold.) </li>
</ul>


<hr />
</body>
</html>