CUCC Expedition Handbook

Updating the website - HOWTO

Rationale for having a system at all:

For years, the website has been built by hand-editing HTML pages, "traditionally" by one or two people, in constant email contact. Obviously those people were thoroughly familiar with the locations of all the pages, the conventions used for naming files, the style used throughout the site and a few conventions for adding things which made the site easier to maintain. There are also a few odd tools written to hack from one format (such as a scanned journal page) to another which those users were happy with (but which may not be useful for anyone else - typically written in BBC Basic on an Acorn machine :-). The survey dataset was maintained in much the same way, though a lot more of that work was done actually on expedition, and therefore by a lot more people.

This work becomes quite onerous when neither of the two usual maintainers went on expo this year, and in any case, it is usually far better for the people who actually explored the cave to integrate their stuff into the existing cave descriptions. To make this easier (indeed, possible !), both the web pages and the survey data are now held in a CVS repository, which means lots of people can edit different parts (even different parts of the same file) at the same time without causing chaos, and a record is kept of who changed what, and why.

This how-to guide is split into two parts - one for those who have never used CVS before, and one for everyone, outlining the structure of the site and the few conventions used within it. It is still possible for you to contribute to the updating of the site without learning CVS, by sending changes or fault reports (as plain text) to one of the regular maintainers - it is not one of the aims of this system to exclude non-computer-nerds, just to reduce the computer-nerds' workload :-)

The CVS bit

This bit should not change significantly with structural change to the website (such as serving it all via PHP from a database). I hope it is all clear - it was written in parallel with my own learning how to use CVS. It may be verbose, but I hope that writing it this way means that it is low on omissions.

CVS Rationale:

The problem with having more than one or two people do the work is that it generates increasing amounts of email to ensure that people don't update old versions of pages, and throw away someone else's recent work. The solution to all this is to put the entire site into a system called "Concurrent Versions System" under which lots of people can make changes, all of which are safely merged together without losing any changes. Just occasionally, the system will detect that two people have changed the same bit of text in conflicting ways, and it is then up to them to sort it out, but this is a great improvement over what has gone before.

How does it work ?

There is a central CVS repository, from which someone needing to do updates can "check out" a copy of the page they want to change. They then make the changes or additions required, and "commit" the new page back into the system. The system keeps a record of all such changes, with a log message in which you should say why the change was made (the system knows who you are and when you committed the change, so you don't need to tell it that).

In order to join in the work, the central CVS repository needs to know about you, so it can allow you to commit changes into the system (and thus to stop random hackers replacing cave surveys with redirects to a porn site, or whatever).

Becoming a website updater

Software requirements

You need a system which has a CVS client and supports SSH, so that you can log in without sending a password in clear text over the Internet. You need an editor with which you are happy to edit web pages. Ideally this will NOT be one of the commercial WYSIWYG web editors which add whole loads of guff to your webpage in a manner that you don't see (and which, incidentally, makes the pages vastly harder to maintain for the next person who comes along with a basic HTML editor, not to mention making the pages load more slowly and typically work in fewer browsers). Most of us use basic text editors with extensions that make editing HTML easier. The easiest way to get all this is to have a Linux machine, since most distributions have all the tools you need ready built in. The rest of this page assumes that you are doing all this on a recent Linux system. There are a few useful links for those using Mac, RISC OS or Windows machines along with the links to more detailed documentation at the end of this page. The CVS machine itself is a Linux box, and some of the commands you need to use involve typing at the command line on that machine, so some familiarity with Unix/Linux will make you feel more at home.

The CVS repository is on a machine called cvs.cucc.survex.com, so you need to get (from Mark Shinwell, at present) a username and password. If you are told these via email, the first thing you will need to do is change the password, using ssh:

ssh username@cvs.cucc.survex.com passwd

(where username is the username given to you; for CU students this will usually be your CRSID). This command is in three parts: "ssh" says to open an encrypted connection; "username@cvs.cucc.survex.com" says which user on what machine to connect to; and "passwd" is the command you are going to execute on that machine, as that user (in this case, a command to change your password).

This will ask for the existing password to open the connection to the cvs machine (this allows the "ssh username@cvs.cucc.survex.com" to happen), then ask for it again (to validate the passwd command). Then you need to type your new password twice (the second time to confirm that you really typed what you thought you typed the first time), and it should then tell you that the password was updated OK. You will need a password made up of mixed uppercase and lowercase letters, digits and punctuation, and not apparently based on any dictionary word. This will make it a real pain to remember and type, but we get round that in the next step:

Generate a public/private keypair to do authentication automatically. With recent versions of openssh, you need to type

ssh-keygen -t dsa

while with older versions, you may find that "-t" is not a valid option, in which case

ssh-keygen -d

does the same thing. Either way, it will suggest a place to store the keypair (~/.ssh/id_dsa) which you should accept. This needs to be kept secure (so don't generate the keypair and keep it on a publicly accessible machine, for example).

We want to force ssh to use protocol 2, but in a typical distribution, it tries protocol 1 first - this will oblige you to type your password every time, which is a pain. You can change this globally, for everyone, by altering /etc/ssh/ssh_config but it is probably best to alter it for just the user you will be when doing the CVS commands. Create a file ~/.ssh/config containing the lines

# Make sure we use protocol 2 to avoid tedious password typing:
Host cvs.cucc.survex.com
Protocol 2

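Now copy the public key to the server. One thing that might trip you up is that the directory ~/.ssh may not exist on the remote machine. To create it and copy the key (the paths are given relative to your home directory on the remote machine, since an unquoted ~ would be expanded by your local shell instead):

ssh username@cvs.cucc.survex.com mkdir .ssh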
scp ~/.ssh/id_dsa.pub username@cvs.cucc.survex.com:.ssh/authorized_keys2

(Note the nasty American spelling of "authorized" here - easy to mistype if you're English.) Those commands will ask for your password again, but that should be the last time you'll need to enter it on that machine. Having done all that, you should now be able to do

ssh cvs.cucc.survex.com

without being asked for a password. That would get you a command line to do things on the cvs machine, but for most jobs, you only need to do CVS commands on your own machine, so get out of that command line with

exit

To use the CVS commands on your local machine (for checking out pages to edit and committing them back) you need to tell cvs where the archive is. You can include a "-d username@cvs.cucc.survex.com:/export/cvs" with cvs commands (useful if you use cvs on more than one repository), but it is usually easier to add

export CVSROOT=username@cvs.cucc.survex.com:/export/cvs

to some script that will be executed before you want to use cvs. Easiest would usually be ~/.bashrc (assuming your default shell is bash). Also add

export CVS_RSH=ssh
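
A quick way to confirm that both settings have taken effect in a new shell is sketched below. The function name check_cvs_env is made up for this example and is not part of cvs; it only tests that the two variables described above are set, and does not try to contact the server.

```shell
# check_cvs_env: hypothetical helper, not a cvs command.
# Succeeds only if CVSROOT is set and CVS_RSH is "ssh".
check_cvs_env() {
    if [ -z "${CVSROOT:-}" ]; then
        echo "CVSROOT is not set" >&2
        return 1
    fi
    if [ "${CVS_RSH:-}" != "ssh" ]; then
        echo "CVS_RSH is not set to ssh" >&2
        return 1
    fi
    echo "cvs will use $CVSROOT via ssh"
}
```

If it complains, the most likely cause is that the script you edited (~/.bashrc, say) has not been re-read by the shell you are typing in.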

You are now ready to get a copy of the page(s) you want to edit. If you don't have a copy of the site on CD, it may be easiest to download the whole site - having the other pages for context makes life much easier if you are maintaining links between pages. Move into a local directory where you will edit the pages. I use a tree in my own home directory which, for historical reasons, is called chaos, but you can choose any directory where you will have the write-access needed to edit pages:

cd ~/chaos
cvs checkout expoweb

and then move into the directory tree to make your changes. Thus far, everything has been at the command line, but often doing the editing will be more convenient through a desktop interface. You might find that you want to set your file browser *not* to display an HTML view of the files, otherwise you will end up browsing the pages, rather than the file tree, which makes editing much harder :-(

When you have made your changes, you need to check that no-one else has changed things in a way which clashes. It's also a good idea to keep your own copies of the pages as up-to-date as possible, so at the top level of your copy (~/chaos in my case):

cvs update

If you are unlucky (most likely if you made changes a long time after you last ran update) this will tell you about conflicts which you'll need to resolve with the other person(s) who made changes. Make sure you do resolve these changes, since just committing your version throws away the other person's changes from the current version (CVS keeps a record of all the changes, so they can be recovered, but it is easier and much more polite to resolve the problem through dialogue). Once all is OK

cvs commit
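
When cvs update reports a conflict, it leaves both versions of the clashing text in the file, bracketed by lines starting with <<<<<<<, ======= and >>>>>>>. Before committing, it may be worth checking that none of these markers have been left behind. The following is only a sketch; find_conflicts is a name invented here, not a cvs command.

```shell
# find_conflicts: hypothetical helper, not part of cvs.
# Prints the names of files under the given directory (default: the
# current one) that still contain a line starting with "<<<<<<< ",
# which cvs writes at the top of each conflicting region.
find_conflicts() {
    grep -rl '^<<<<<<< ' "${1:-.}"
}
```

Run at the top level of your copy, it prints nothing if no markers remain.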

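If you are updating the whole tree like this, it is a good idea to make sure you get any new directories and remove any old ones (which doesn't happen by default). To do this, specify -d (create directories that are new in the repository) and -P (prune directories that have become empty):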

cvs update -Pd

You can just update one subdirectory (and everything under it) or an individual file by adding its name to the end of the command, such as

cvs update expo/smkridge/161
cvs commit expo/smkridge/161/france.htm

If you create a new page, let's say for a description of a new cave on the plateau, 1623/505, it would probably be called 505.htm in the plateau subdir. "cvs commit" will not work on files that cvs does not know about, so to let cvs know it is there use

cvs add expo/plateau/505.htm
cvs commit expo/plateau/505.htm

cvs works by maintaining DIFFerences between files as they are updated. This works on text files, and cvs can convert the line-ending conventions on different platforms. If you add a binary file, that sort of translation can be extremely bad news, so use "-kb" to tell cvs when adding a binary file:

cvs add -kb expo/plateau/others/505.png
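
Since it is easy to forget the "-kb", one option is a small wrapper that chooses it from the file name. This is a sketch only: add_file is a hypothetical name, and the list of extensions treated as binary is an assumption that you would want to extend to suit the files you actually add.

```shell
# add_file: hypothetical wrapper around "cvs add".
# Adds image files with -kb and everything else as text.
# The list of extensions is an assumption; extend it to suit.
add_file() {
    case "$1" in
        *.png|*.jpg|*.jpeg|*.gif) cvs add -kb "$1" ;;
        *)                        cvs add "$1" ;;
    esac
}
```

For example, "add_file expo/plateau/others/505.png" would run the same command as shown above.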

Sometimes, you may find that an unwieldy chunk of cave description needs to be split into two or more pages (this happened quite often with 161). It is usually clearer to everyone if none of the new files have the same name as the old one. So in addition to using "cvs add" to add the new pages, you need

rm expo/smkridge/161/nowsplit.htm
cvs remove expo/smkridge/161/nowsplit.htm
cvs commit expo/smkridge/161/nowsplit.htm
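
Putting the steps for a split together might look like the sketch below. split_page is a hypothetical helper name, and the log message it generates is only a suggestion. It assumes the new files already exist in your working copy.

```shell
# split_page OLD NEW...: hypothetical helper, not a cvs command.
# Registers the NEW files, deletes OLD locally, schedules OLD for removal,
# then commits the whole change with one log message.
split_page() {
    old=$1
    shift
    cvs add "$@" &&
    rm "$old" &&
    cvs remove "$old" &&
    cvs commit -m "split $old into $*" "$old" "$@"
}
```

Each step only runs if the previous one succeeded, so a failed "cvs add" leaves the old page untouched.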

Whenever you do a "cvs commit" you will be asked for a log message, which is just some text to help others know what sort of update you were doing. Something like "correcting typos", "added new passages off Puerile Humour" or "fixing broken links" is the minimum level of detail you need to add. (Martin, are you reading this?) It is a good idea to commit files back to the repository one or two at a time, so the comments can be relevant to each particular file. It is often worthwhile committing unrelated changes separately, even if they affect the same file. For instance, if you correct some typos in one page, and link a new photo to several pages, including that same one, it is better to commit each set of changes separately. This does take some discipline, however, as it is usually just whilst you are making one set of changes that you notice the typo, and if you don't change it then and there, it gets forgotten. Your call.

Of course, if you did some major overhaul to a lot of files (like changing lots of links after some sort of reorganisation) you'd want to commit them all together with a suitable log message. It really is a good idea to avoid doing this whilst other people might be editing some of the files, as you could spend ages resolving conflicts...
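
The log message can also be given on the command line with -m, which saves being dropped into an editor. As a sketch, committing two unrelated changes separately might look like this. example_commits is a made-up name, used only so that pasting the block does not commit anything by itself, and the second log message is invented for illustration.

```shell
# example_commits: hypothetical function wrapping two separate commits,
# each with a log message that describes only that change.
example_commits() {
    cvs commit -m "correcting typos" expo/smkridge/161/france.htm &&
    cvs commit -m "added description of 1623/505" expo/plateau/505.htm
}
```

For a big reorganisation, the same -m option followed by several file names (or none, to commit everything changed below the current directory) gives them all one shared message.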

Avoiding cocks-up

Running cvs update just before you start editing saves you making changes to out-of-date stuff, and committing changes soon after editing saves everyone else from working on out-of-date pages. Both habits will save you work resolving conflicts later on - but only if *everyone* remembers to do this.

If you had to leave some editing part way through, and came back to it later, it's easy to forget what you have finished and what you haven't. Running

cvs diff -u

will tell you what you changed *from the copy you checked out* (so you don't get confused by a list of things which other people have changed in the meantime). This helps to avoid leaving things like notes to yourself lying around in the file, and should help to avoid failing to update links, though that is harder, since you have to notice that a change you meant to make hasn't appeared in the list of changes.

cvs -n update 

doesn't actually update anything, but tells you what would have happened. This is useful at various times, such as for spotting conflicts early on whilst you are part way through doing a big update. Changing it to

cvs -nq update

suppresses some of the less useful output. Files which are marked with a "?" are ones which cvs doesn't know about - maybe you haven't "cvs add"ed them yet.
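
Each line of that output starts with a status letter. U and P mean the file would be brought up to date, A and R mean you have scheduled it for addition or removal, M means you have modified it locally, C means a conflict, and ? means cvs doesn't know the file. A small filter such as the following sketch (needs_attention is a made-up name, not a cvs command) can cut the list down to the lines that usually need action.

```shell
# needs_attention: hypothetical filter, not part of cvs.
# Reads "cvs -nq update" output on standard input and keeps only
# conflicts (C), local modifications (M) and unknown files (?).
needs_attention() {
    grep '^[CM?] '
}

# Typical use, from the top level of your copy:
#   cvs -nq update | needs_attention
```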

Updating the website

Having committed any changes to the cvs tree, connect to cvs.cucc.survex.com via ssh and run the command /opt/expo/bin/www-update. (You can do this all in one step by just typing ssh cvs.cucc.survex.com /opt/expo/bin/www-update.)

CVS documentation

Obviously, CVS has lots of bells and whistles that you don't need just to edit a few web pages. Here are a few links which you might care to look at. Many more are accessible via CVS' home page.

OK, that's how to use CVS. You might now like to read a bit about editing the web pages you checked out - there are a few conventions to help to maintain a consistent style (although we might change that style soon, as soon as we can agree about a new look). Just as in programming, there are also a lot of useful things you can do by adding comments (which the end reader of the pages won't see).

Further reading

The website conventions bit

This is likely to change with structural change to the site, with style changes which we expect to implement and with the method by which the info is actually stored and served up.

... and it's not written yet, either :-)