CUCC Expedition Handbook - Online systems

Expo Data Maintenance Manual

Expo data management programmers' manual

Editing the expo data management system is an adventure. Learning it by trial and error is non-trivial. There are lots of things we could improve about the system, and anyone with some computer nous is very welcome to muck in. It is slowly getting better organised.

This manual is organised in a how-to style. The categories refer to tasks that a maintainer would want to carry out, rather than to specific elements of the data management system.

Note that to display the survey data you will need a copy of the survex software.

Go elsewhere if this is what you want to know:

Contents of this manual

  1. Getting a username, password and key
  2. The repositories
  3. How the data management system works
  4. Your own laptop
  5. Quick start
  6. Modifying the data management system
  7. The expoweb-update script
  8. Updating cave pages
  9. Updating expo year pages
  10. Ticking off QMs
  11. Maintaining the survey status table
  12. Automation
Appendices:

Getting a username, password and key

You don't need a password to view most things, but you will need one to change them.

Use these credentials for access to the troggle site. The user is 'expo', with a cavey:beery password. Ask someone if this isn't enough of a clue for you. This password is important for security: the whole site will get hacked by spammers or worse if you are not careful with it. Use a secure method for passing it on to others who need to know (i.e. not unencrypted email), don't publish it anywhere, and don't check it in to the data management system by accident. A lot of people use it and changing it is a pain for everyone, so do take a bit of care.

This password is all you need to log in to troggle and to use the troggle control panel (very few people need to do this). But if you want to update webpages (a much more common requirement) or to edit the software itself (very rare), then you will also need to get a cryptographic key and register it with the server. See key exchange for details.

Unfortunately, pushing cave data to the ::loser:: and ::drawings:: repositories also needs a key. So cavers entering their cave survey data currently have to use a machine on which this is already set up. These machines are the expo laptop and the laptop 'aziraphale', which live in the potato hut during expo. If you want to use your own laptop then see below.

The repositories

All the expo data is contained in 4 "repositories" at expo.survex.com. This is currently hosted on a free virtual server we have blagged on a server farm. We use a distributed version control system (DVCS) to manage these repositories because this allows simultaneous collaborative editing and keeps track of all changes so we can roll back and have branches if needed.

The site has been split into four parts:

We have migrated two of these to git but the other two still use mercurial.

Mercurial Website Hack 2019

Currently (December 2019) after committing and pushing your changes to expoweb to the mercurial server, you will need to log in to expo.survex.com using ssh, cd to /expoweb/ and issue a "hg update" command to make your changes noticed by the webserver. This problem will go away before Expo 2020 - we hope - when we finish migrating from mercurial to git.

All the scans, photos, presentations, fat documents and videos are stored just as files (not in version control) in 'expofiles'. See below for details on that.

How the data management system works

Part of the data management system is static HTML, but quite a lot is generated by scripts and troggle (a web framework built using Django).

Examples of troggle-generated pages from data:

Anything you check in which affects cave data or descriptions won't appear on the site until the data management system update scripts are run. This happens automatically every 30 mins, but you can also kick off a manual update. See 'The expoweb-update script' below for details.

Also note that the ::expoweb:: web pages and cave data reports you see on the visible website are not the same as the version-controlled "master" expoweb repo. Your committed and pushed changes only become visible on the website once they have been 'pulled' from the repo onto the webserver.

Your own laptop

Setting your own laptop so that it can do everything the expo laptop can do is quite a complicated process. At a minimum you will be an experienced software nerd already and will have git, mercurial and a text editor installed and you will know how to use them. You will have done the key exchange process - which you can only do entirely on your own if you have access to the expo laptop.

See setting up your own laptop for the full list of software we use and where to get it.

Note that the instructions are primarily for people using Linux with some help for those using Windows. If you are a Mac user then you are on your own.

Using 'Edit This Page'

This can be used to edit web pages without installing any software or doing any key exchange. It even works if your laptop is a Mac.

This is the capability that you can see in the top-left-hand menu on any website page if you log in to troggle using the cavey:beery password.

'Edit This Page' is a troggle capability that edits the file served by the webserver, but it does not update the copy of the file in the repository (the inverse of the problem described above as 'Mercurial Website Hack'). To properly finish the job you need to

Again, we hope that this issue will go away when we migrate the expoweb repo from mercurial to git before the 2020 Expo.

Quick start

If you know what you are doing here is the basic info on what's where:
(if you don't know what you're doing, skip to Editing the data management system below.)

This section is all about how to use mercurial. Since we are changing to git it has been removed to a separate place.

expofiles (all the big files and documents)

Photos and scans (logbooks, drawn-up cave segments). This was about 40GB of stuff in 2019, which you probably don't actually need locally.

If you don't need an entire copy of all of it, then it is probably best to use Filezilla/ftp to copy just a small part of the filesystem to your own machine and to upload the bits you add to or edit. Instructions for installing and using Filezilla are found in the expo user instructions for uploading photographs: uploading.html.

To sync all the files from the server to your local expofiles directory on your laptop:

rsync -nazv --delete-after --prune-empty-dirs expo@expo.survex.com:expofiles/ /home/expo/expofiles

To sync the local expofiles directory back to the server after you have made updates (e.g. scanned some hand-drawn surveys into expofiles/surveyscans/), but only if your machine runs Linux:

rsync -nazv /home/expo/expofiles/surveyscans/2019/ expo@expo.survex.com:expofiles/surveyscans/2019

then CHECK that the list of files it produces matches the ones you absolutely intend to delete forever! ONLY THEN do it without the "-n" option. "-n" is the same as "--dry-run" which shows you the overwriting changes but doesn't actually do them.

Always do a dry run first. Be incredibly careful not to delete piles of stuff locally and then rsync back, or to get the directory level of the command wrong, as it will all get deleted on the server too, and we may not have backups! It is absolutely vital to use rsync --dry-run --delete-after first to check what would be deleted.
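The dry-run-first habit described above can be wrapped in a small helper. This is a minimal sketch, not part of the expo tooling: the function names build_rsync_command and sync_with_check are hypothetical, and it assumes rsync is installed and your key is already registered with the server.

```python
import subprocess


def build_rsync_command(source, dest, dry_run=True, delete=False):
    """Return the rsync argument list; dry_run=True means nothing is changed."""
    cmd = ["rsync", "-azv"]
    if dry_run:
        cmd.append("--dry-run")  # long form of -n
    if delete:
        cmd.append("--delete-after")
    cmd += [source, dest]
    return cmd


def sync_with_check(source, dest, delete=False):
    """Show a dry run, then ask for confirmation before the real transfer."""
    preview = subprocess.run(
        build_rsync_command(source, dest, dry_run=True, delete=delete),
        capture_output=True, text=True, check=True,
    )
    print(preview.stdout)
    if input("Really transfer these files? Type 'yes' to continue: ") != "yes":
        print("Nothing was changed.")
        return False
    subprocess.run(
        build_rsync_command(source, dest, dry_run=False, delete=delete),
        check=True,
    )
    return True
```

For example, sync_with_check("/home/expo/expofiles/surveyscans/2019/", "expo@expo.survex.com:expofiles/surveyscans/2019") would print the list of files first and only upload them after you type 'yes'.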

If you are using rsync from a Windows machine you will not get all the files, as some filenames are incompatible with Windows. What will happen is that rsync will invisibly change the names as it downloads them from the Linux expo server to your Windows machine, but then it forgets what it has done and tries to re-upload all the renamed files to the server even if you have touched none of them. There won't be any problems with simple filenames using all lowercase letters and no funny characters, but we have nothing in place to stop anyone creating an incompatible filename somewhere in that 40GB, or to detect the problem at the time. So don't do it. If you have a Windows machine use Filezilla, not rsync.
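One way to spot troublesome names would be to scan a directory tree for names that Windows cannot store. The sketch below is an illustration only and is not something currently in place: windows_problem and find_problem_names are hypothetical names, and the rules encoded are the usual Windows filename restrictions (reserved characters, reserved device names, trailing dots or spaces) plus a check for names in the same folder that differ only by case.

```python
import os

RESERVED_CHARS = set('<>:"/\\|?*')
RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL"} | {
    f"{dev}{n}" for dev in ("COM", "LPT") for n in range(1, 10)
}


def windows_problem(name):
    """Return a short reason if `name` is not a valid Windows filename, else None."""
    if any(ch in RESERVED_CHARS or ord(ch) < 32 for ch in name):
        return "reserved character"
    if name.endswith((" ", ".")):
        return "trailing space or dot"
    if name.split(".")[0].upper() in RESERVED_NAMES:
        return "reserved device name"
    return None


def find_problem_names(root):
    """Walk `root` and yield (path, reason) for each name Windows would reject."""
    for dirpath, dirnames, filenames in os.walk(root):
        seen = {}  # lowercased name -> first spelling seen in this folder
        for name in dirnames + filenames:
            reason = windows_problem(name)
            if reason is None and name.lower() in seen:
                reason = f"differs only by case from {seen[name.lower()]}"
            seen.setdefault(name.lower(), name)
            if reason:
                yield os.path.join(dirpath, name), reason
```

Running find_problem_names over a local copy of expofiles on a Linux machine would list the files that need renaming before a Windows user could safely sync them.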

(We may also have an issue with rsync not using the appropriate user:group attributes for files pushed back to the server. This may not cause any problems, but watch out for it.)

Editing the data management system

To edit the data management system fully, you need to use the distributed version control system (DVCS) software, which is currently mercurial/TortoiseHg. Some (static text) pages can be edited directly on-line using the 'Edit This Page' link which you'll see if you are logged into troggle. In general the dynamically-generated pages, such as those describing caves which are generated from the cave survey data, cannot be edited in this way, but forms are provided for some types of these, like 'caves'.

[ui]
username = Firstname Lastname <myemail@example.com>

The commit has stored the changes in your local Mercurial DVCS, but it has not sent anything back to the server. To do that you need to:

hg push

Before pushing, you should do an hg pull to sync with upstream first. If someone else has edited the same files you may also need to do:

hg merge

(and sort out any conflicts if you've both edited the same file) before pushing again.
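The pull, merge, push order above can be written down as an explicit sequence. This is only a sketch: hg_sync_steps and run_steps are hypothetical helpers, it assumes hg is on your PATH and that you have already committed your own changes, and the merge commit message is a placeholder.

```python
import subprocess


def hg_sync_steps(need_merge=False, merge_message="Merge with upstream"):
    """Return the hg commands, in order, for publishing already-committed changes."""
    steps = [["hg", "pull"]]  # fetch other people's changes first
    if need_merge:
        # Someone else committed too: combine both lines of work, then
        # record the result as a merge commit before pushing.
        steps.append(["hg", "merge"])
        steps.append(["hg", "commit", "-m", merge_message])
    steps.append(["hg", "push"])
    return steps


def run_steps(steps, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for cmd in steps:
        print("running:", " ".join(cmd))
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Because run_steps stops at the first command that fails, a merge with unresolved conflicts halts the sequence before anything is pushed, leaving you to sort out the conflicts by hand.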

Simple changes to static files will take effect immediately, but changes to dynamically-generated files (cave descriptions, QM lists etc.) will not take effect until the server runs the expoweb-update script.

The expoweb-update script

The script at the heart of the data management system update mechanism is a makefile that runs the various generation scripts. It is run every 15 minutes as a cron job (at 0,15,30 and 45 past the hour), but if you want to force an update more quickly you can run it he

The scripts are generally under the 'noinfo' section of the site just because that has (had) some access control. This will get changed to something more sensible at some point.
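To judge whether it is worth waiting for the automatic run, the quarter-hour schedule can be turned into a small calculation. A minimal sketch, assuming the cron job really does fire at 0, 15, 30 and 45 minutes past each hour; next_update is a hypothetical name.

```python
from datetime import datetime, timedelta

UPDATE_MINUTES = (0, 15, 30, 45)  # assumed cron schedule, minutes past the hour


def next_update(now):
    """Return the first scheduled run strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    for minute in UPDATE_MINUTES:
        candidate = base.replace(minute=minute)
        if candidate > now:
            return candidate
    # Past :45, so the next run is on the following hour.
    return base.replace(minute=0) + timedelta(hours=1)
```

Calling next_update(datetime.now()) gives the time of the next scheduled run on a machine whose clock matches the server's.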

Updating cave pages

Cave description pages are automatically generated from a set of cave files in noinfo/cave_data/ and noinfo/entrance_data/. These files are named <area>-<identifier>.html (where <area> is 1623 or 1626) and are processed by troggle. After editing these files, use python databaseReset.py caves in /expofiles/troggle/ to update the site/database.

Clicking on 'New cave' (at the bottom of the cave index) lets you enter a new cave. Info on how to enter new caves has been split into its own page.

(If you remember something about CAVETAB2.CSV for editing caves, that was superseded in 2012).

This may be a useful reminder of what is in a survex file: how to create a survex file.

Updating expo year pages

Each year's expo has a documentation index which is in the folder

/expoweb/years

so, to check out the 2011 page, for example, you would use

hg clone ssh://expo@expo.survex.com/expoweb/years/2011

Once you have pushed your changes to the repository you need to update the server's local copies, by logging in to the server with ssh and running hg update in the expoweb folder.

Adding a new year

Edit folk/folk.csv, adding the new year to the end of the header line, a new column, with just a comma (blank cell) for people who weren't there, a 1 for people who were there, and a -1 for people who were there but didn't go caving. Add new lines for new people, with the right number of columns.

This process is tedious and error-prone and ripe for improvement. Adding a list of people from the bier book, and their aliases, would be a lot better, but some way to make sure that names match with previous years would be good.
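The column-adding step described above could be partly automated along these lines. This is a sketch under stated assumptions, not the actual expo tooling: it assumes the first row of folk.csv is the header and that each person occupies one row with their name in the first column. The function names add_year and rows_to_csv are hypothetical. The cell values (blank, 1, -1) are the ones described above.

```python
import csv
import io


def add_year(rows, year, attended=(), non_caving=()):
    """Append a column for `year` to rows parsed from folk.csv.

    rows       -- list of lists, header first (assumed layout)
    attended   -- names of people who came and went caving (cell "1")
    non_caving -- names of people who came but did not cave (cell "-1")
    Everyone else gets a blank cell. Returns a new list of rows.
    """
    header, *people = rows
    out = [header + [str(year)]]
    for row in people:
        name = row[0]
        if name in non_caving:
            cell = "-1"
        elif name in attended:
            cell = "1"
        else:
            cell = ""
        out.append(row + [cell])
    return out


def rows_to_csv(rows):
    """Serialise rows back to CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Matching on exact names is the weak point: a person spelled differently in a previous year would silently get a blank cell, which is the name-matching problem mentioned above. New people still need new rows with the right number of columns.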

Ticking off QMs

To be written.

Maintaining the survey status table

There is a table in the survey book which has a list of all the surveys and whether or not they have been drawn up, and some other info.

This is generated by the script tablizebyname-csv.pl from the input file Surveys.csv