<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>CUCC Expedition Handbook: Updating the website</title>
<link rel="stylesheet" type="text/css" href="../css/main2.css" />
</head>
<body>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>Updating the website - HOWTO</h1>

<h2>Rationale for having a system at all: </h2>

<p>For years, the website has been built by hand-editing HTML pages,
"traditionally" by one or two people, in constant email contact. Obviously
those people were thoroughly familiar with the locations of all the pages,
the conventions used for naming files, the style used throughout the site and
a few conventions for adding things which made the site easier to maintain.
There are also a few odd tools written to hack from one format (such as a
scanned journal page) to another which those users were happy with (but which
may not be useful for anyone else - typically written in BBC Basic on an
Acorn machine :-). The survey dataset was maintained in much the same way,
though a lot more of that work was done actually on expedition, and therefore
by a lot more people.</p>

<p>This work becomes quite onerous when neither of the two usual maintainers
goes on expo (as happened this year), and in any case it is usually far better for the
people who actually explored the cave to integrate their stuff into the
existing cave descriptions. To make this easier (indeed, possible !), both
the web pages and the survey data are now held in a CVS repository, which
means lots of people can edit different parts (even different parts of the
same file) at the same time without causing chaos, and a record is kept of
who changed what, and why.</p>

<p>This how-to guide is split into two parts - one for those who have never used
CVS before, and one for everyone, outlining the structure of the site and the
few conventions used within it. It is still possible for you to contribute to
the updating of the site without learning CVS, by sending changes or fault
reports (as plain text) to one of the regular maintainers - it is not one of
the aims of this system to exclude non-computer-nerds, just to reduce the
computer-nerds' workload :-)</p>

<h2>The CVS bit</h2>

<!-- Written 2001-10-08 to 2001-10-12, Andy Waddington -->

<p>This bit should not change significantly with structural change to the
website (such as serving it all via PHP from a database). I hope it is all
clear - it was written in parallel with my own learning how to use CVS. It
may be verbose, but I hope that writing it this way means that it is low on
omissions.</p>

<h3>CVS Rationale:</h3>

<p>The problem with having more than one or two people do the work is that it
generates increasing amounts of email to ensure that people don't update old
versions of pages, and throw away someone else's recent work. The solution to
all this is to put the entire site into a system called "Concurrent Versions
System" under which lots of people can make changes, all of which are safely
merged together without losing any changes. Just occasionally, the system
will detect that two people have changed the same bit of text in conflicting
ways, and it is then up to them to sort it out, but this is a great
improvement over what has gone before.</p>

<h3>How does it work ?</h3>

<p>There is a central CVS repository, from which someone needing to do updates
can "check out" a copy of the page they want to change. They then make the
changes or additions required, and "commit" the new page back into the
system. The system keeps a record of all such changes, with a log message
in which you should say why the change was made (the system knows who you
are and when you committed the change, so you don't need to tell it that).</p>

<p>In order to join in the work, the central CVS repository needs to know
about you, so it can allow you to commit changes into the system (and thus
to stop random hackers replacing cave surveys with redirects to a porn
site, or whatever).</p>

<h2>Becoming a website updater</h2>

<h3>Software requirements</h3>

<p>You need a system which has a CVS client and supports SSH, so that you can
log in without sending a password in clear text over the Internet. You need
an editor with which you are happy to edit web pages. Ideally this will
NOT be one of the commercial WYSIWYG web editors which add whole loads of
guff to your webpage in a manner that you don't see (and which, incidentally,
makes the pages vastly harder to maintain for the next person who comes
along with a basic HTML editor, not to mention making the pages load more
slowly and typically work in fewer browsers). Most of us use basic
text editors with extensions that make editing HTML easier. The easiest
way to get all this is to have a Linux machine, since most distributions
have all the tools you need ready built in. The rest of this page assumes
that you are doing all this on a recent Linux system. There are a few
useful links for those using Mac, RISC OS or Windows machines along
with the links to more detailed documentation <a href="#morelinks">under Further
reading</a>, further down this page. The CVS machine itself is a Linux box, and some of the
commands you need to use involve typing at the command line on that machine,
so some familiarity with Unix/Linux will make you feel more at home.</p>

<p>The CVS repository is on a machine called cvs.cucc.survex.com, so you need to get
(from Mark Shinwell, at present) a username and password. If you are told
these via email, the first thing you will need to do is change the password,
using ssh:</p>

<pre>ssh <i>username</i>@cvs.cucc.survex.com passwd</pre>

<p>(where <tt><i>username</i></tt> is the username given to you; for CU
students this will usually be your CRSID). This command is in three parts:
"ssh" says to open an encrypted connection,
"<i>username</i>@cvs.cucc.survex.com" says which user on what machine to
connect to, "passwd" is the command you are going to execute on that machine,
as that user, in this case a command to change your password.</p>

<p>This will ask for the existing password to open the connection to the cvs
machine (this allows the "ssh <i>username</i>@cvs.cucc.survex.com" to happen), then
ask for it again (to validate the passwd command). Then you need to type
your new password twice (the second time to confirm that you really typed what
you thought the first time), and it should then tell you that the password was
updated OK. You will need a password made up of mixed uppercase and
lowercase letters, digits and punctuation, and not apparently based on any
dictionary word. This will make it a real pain to remember and type, but we
get round that in the next step:</p>

<p>Generate a public/private keypair to do authentication automatically.
With recent versions of openssh, you need to type</p>

<pre>ssh-keygen -t dsa</pre>

<p>while with older versions, you may find that "-t" is not a valid option, in
which case</p>

<pre>ssh-keygen -d</pre>

<p>does the same thing. Either way, it will suggest a place to store the
keypair (~/.ssh/id_dsa) which you should accept. The private key needs to be kept
secure (so don't generate the keypair and keep it on a publicly
accessible machine, for example).</p>
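<p>As a rough illustration of "kept secure", the sketch below checks that a
file grants no permissions at all to group or others. The function name
<tt>is_private</tt> is made up for this example, and the demonstration runs on
a scratch file rather than a real key.</p>

```shell
# Hypothetical helper: succeed only if the file grants no permissions
# at all to group or others (for example mode 600).
is_private() {
    # Characters 5-10 of the "ls -l" mode string are the group/other bits.
    perms=$(ls -ld "$1" | cut -c5-10)
    [ "$perms" = "------" ]
}

# Demonstration on a scratch file rather than a real key.
scratch=$(mktemp)
chmod 644 "$scratch"
if is_private "$scratch"; then echo "private"; else echo "readable by others"; fi
chmod 600 "$scratch"
if is_private "$scratch"; then echo "private"; else echo "readable by others"; fi
rm -f "$scratch"
```

<p>To check your own key, call <tt>is_private</tt> with the path of your
private key file.</p>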

<p>We want to force ssh to use protocol 2, but in a typical distribution,
it tries protocol 1 first - this will oblige you to type your password
every time, which is a pain. You can change this globally, for everyone,
by altering /etc/ssh/ssh_config but it is probably best to alter it for
just the user you will be when doing the CVS commands. Create a file
~/.ssh/config containing the lines</p>

<pre># Make sure we use protocol 2 to avoid tedious password typing:
Host cvs.cucc.survex.com
Protocol 2</pre>
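<p>If you would rather not edit the file by hand, something along these lines
should add the block only when no entry for the host exists yet, so running it
twice does no harm. This is only a sketch: the function name
<tt>add_protocol_block</tt> is invented for the example, and the demonstration
works on a scratch file rather than your real ~/.ssh/config.</p>

```shell
# Hypothetical helper: append the Protocol 2 block for the CVS host
# to the given ssh config file, unless that host is already listed.
add_protocol_block() {
    config_file=$1
    # Make sure the file exists so that grep has something to read.
    touch "$config_file"
    if grep -q '^Host cvs\.cucc\.survex\.com$' "$config_file"; then
        echo "entry already present in $config_file"
    else
        printf '%s\n' \
            '# Make sure we use protocol 2 to avoid tedious password typing:' \
            'Host cvs.cucc.survex.com' \
            'Protocol 2' >> "$config_file"
        echo "entry added to $config_file"
    fi
}

# Demonstration on a scratch file rather than the real ~/.ssh/config.
scratch=$(mktemp)
add_protocol_block "$scratch"
add_protocol_block "$scratch"
cat "$scratch"
rm -f "$scratch"
```

<p>Once you are happy with what it does, point it at ~/.ssh/config instead of
the scratch file.</p>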

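<p>Now copy the public key to the server. One thing that might trip you up is
that the directory ~/.ssh may not exist on the remote machine. To create it
and copy the key (the path in the first command is written without the "~",
because your local shell would otherwise expand it to your local home directory
before ssh ever sees it):</p>

<pre>ssh <i>username</i>@cvs.cucc.survex.com mkdir .ssh
scp ~/.ssh/id_dsa.pub <i>username</i>@cvs.cucc.survex.com:.ssh/authorized_keys2</pre>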

<p>(note the nasty American spelling here - easy to mistype if you're English)-:
Those commands will ask for your password again, but that should be the last
time you'll need to enter it on that machine. Having done all that, you
should now be able to do</p>

<pre>ssh cvs.cucc.survex.com</pre>

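<p>without being asked for a password (if your username on your own machine
differs from the one on the cvs machine, keep the <i>username</i>@ prefix as
before). That would get you a command line to do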
things on the cvs machine, but for most jobs, you only need to do CVS
commands on your own machine, so get out of that command line with</p>

<pre>exit</pre>

<p>To use the CVS commands on your local machine (for checking out pages
to edit and committing them back) you need to tell cvs where the archive
is. You can include a "-d <i>username</i>@cvs.cucc.survex.com:/export/cvs" with
cvs commands (useful if you use cvs on more than one repository), but
it is usually easier to add</p>

<pre>export CVSROOT=<i>username</i>@cvs.cucc.survex.com:/export/cvs</pre>

<p>to some script that will be executed before you want to use cvs. Easiest
would usually be ~/.bashrc (assuming your default shell is bash). Also
add</p>

<pre>export CVS_RSH=ssh</pre>
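<p>Put together, the two settings might look something like this in ~/.bashrc.
Treat it as a sketch: the helper name <tt>cvs_root_for</tt> and the username
<tt>abc123</tt> are placeholders invented for the example, so substitute the
username you were actually given.</p>

```shell
# Hypothetical helper: build the repository location for a given username.
cvs_root_for() {
    printf '%s@cvs.cucc.survex.com:/export/cvs\n' "$1"
}

# "abc123" is a placeholder; substitute the username you were given.
CVSROOT=$(cvs_root_for abc123)
CVS_RSH=ssh
export CVSROOT CVS_RSH

# Show what has been set, as a quick sanity check.
echo "CVSROOT=$CVSROOT"
echo "CVS_RSH=$CVS_RSH"
```

<p>The two echo lines are only there so you can see the values when the script
runs; drop them once it works.</p>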

<p>You are now ready to get a copy of the page(s) you want to edit. If you
don't have a copy of the site on CD, it may be easiest to download the
whole site - having the other pages for context makes life much easier
if you are maintaining links between pages. Move into a local directory
where you will edit the pages. I use a tree in my own home directory,
which, for historical reasons, is called chaos, but you can choose any
directory where you will have the write-access needed to edit pages:</p>

<pre>cd ~/chaos
cvs checkout expoweb</pre>

<p>and then move into the directory tree to make your changes. Thus far,
everything has been at the command line, but often doing the editing
will be more convenient through a desktop interface. You might find that
you want to set your file browser *not* to display an HTML view of the
files, otherwise you will end up browsing the pages, rather than the
file tree, which makes editing much harder :-(</p>

<p>When you have made your changes, you need to check that no-one else
has changed things in a way which clashes. It's also a good idea to
keep your own copies of the pages as up-to-date as possible, so at
the top level of your copy ( ~/chaos in my case ):</p>

<pre>cvs update</pre>

<p>If you are unlucky (most likely if you made changes a long time after you
last ran update) this will tell you about conflicts which you'll need to
resolve with the other person(s) who made changes. Make sure you do
resolve these conflicts, since just committing your version throws away
the other person's changes from the current version (CVS keeps a record
of all the changes, so they can be recovered, but it is easier and much
more polite to resolve the problem through dialogue). Once all is OK</p>

<pre>cvs commit</pre>

<p>If you are updating the whole tree like this, it is a good idea to make
sure you get any new directories and remove any old ones (which doesn't
happen by default). To do this specify</p>

<pre>cvs update -Pd</pre>

<p>You can just update one subdirectory (and everything under it) or an
individual file by adding its name to the end of the command, such as</p>

<pre>cvs update expo/smkridge/161
cvs commit expo/smkridge/161/france.htm</pre>

<p>If you create a new page, let's say for a description of a new cave on
the plateau, 1623/505, it would probably be called 505.htm in the
plateau subdir. "cvs commit" will not work on files that cvs does not
know about, so to let cvs know it is there use</p>

<pre>cvs add expo/plateau/505.htm
cvs commit expo/plateau/505.htm</pre>

<p>cvs works by maintaining DIFFerences between files as they are updated.
This works on text files, and cvs can convert the line-ending conventions
on different platforms. If you add a binary file, that sort of translation
can be extremely bad news, so use "-kb" to tell cvs when adding a binary
file:</p>

<pre>cvs add -kb expo/plateau/others/505.png</pre>
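<p>It is easy to forget the "-kb", so a small helper that picks the right form
of the command from the filename may be worth having. The sketch below only
prints the command rather than running it. The name <tt>cvs_add_command</tt>
and the list of extensions are my own assumptions (and the match is
case-sensitive), so adjust them to taste.</p>

```shell
# Hypothetical helper: print the "cvs add" command suited to a file,
# using -kb for a few common binary image extensions.
cvs_add_command() {
    case $1 in
        *.png|*.gif|*.jpg|*.jpeg)
            echo "cvs add -kb $1" ;;
        *)
            echo "cvs add $1" ;;
    esac
}

# Only prints the commands; run them yourself once they look right.
cvs_add_command expo/plateau/others/505.png
cvs_add_command expo/plateau/505.htm
```

<p>The first call should print the "-kb" form and the second the plain
form.</p>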

<!-- there is a way to tell the repository to know about some standard
filename extensions, like .png, .gif, .jpg, and always treat them as
binaries - we should do this !! -->

<p>Sometimes, you may find that an unwieldy chunk of cave description needs
to be split into two or more pages (this happened quite often with 161).
It is usually clearer to everyone if none of the new files have the same
name as the old one. So in addition to using "cvs add" to add the new
pages, you need</p>

<pre>rm expo/smkridge/161/nowsplit.htm
cvs remove expo/smkridge/161/nowsplit.htm
cvs commit expo/smkridge/161/nowsplit.htm</pre>

<p>Whenever you do a "cvs commit" you will be asked for a log message, which is
just some text to help others know what sort of update you were doing.  So
something like "correcting typos" or "added new passages off Puerile Humour",
"fixing broken links" are the minimum sort of level you need to add. (Martin,
are you reading this?) It is a good idea to commit files back to the repository
one or two at a time, so the comments can be relevant to each particular file.
It is often worthwhile committing unrelated changes separately, even if they
affect the same file. For instance, if you correct some typos in one page, and
link a new photo to several pages, including that same one, it is better to
commit each set of changes separately. This does take some discipline, however,
as it is usually just whilst you are making one set of changes that you notice
the typo, and if you don't change it then and there, it gets forgotten. Your
call.</p>
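<p>If you prefer to give the log message on the command line rather than in an
editor, "cvs commit" accepts it after "-m". The sketch below turns a list of
file and message pairs into one commit command per file, and only prints the
commands. The name <tt>commit_plan</tt> and the "file|message" input format are
invented for this example.</p>

```shell
# Hypothetical helper: read "file|message" pairs on standard input and
# print one commit command per pair. Messages containing double quotes
# are not handled by this simple sketch.
commit_plan() {
    while IFS='|' read -r file message; do
        printf 'cvs commit -m "%s" %s\n' "$message" "$file"
    done
}

# Only prints the commands; nothing is committed.
printf '%s\n' \
    'expo/smkridge/161/france.htm|correcting typos' \
    'expo/plateau/505.htm|added new passages off Puerile Humour' | commit_plan
```

<p>Keeping one message per file this way makes it harder to lump unrelated
changes under a single vague comment.</p>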

<p>Of course, if you did some major overhaul to a lot of files (like changing
lots of links after some sort of reorganisation) you'd want to commit them
all together with a suitable log message. It really is a good idea to
avoid doing this whilst other people might be editing some of the files,
as you could spend ages resolving conflicts...</p>

<h3>Avoiding cocks-up</h3>

<p>Running cvs update just before you start editing saves you making changes to
out-of-date stuff, and committing changes soon after editing saves everyone
else from working on out-of-date pages. Both of which will save you work
resolving conflicts later on - but only if *everyone* remembers to do this.</p>

<p>If you had to leave some editing part way through, and came back to it later,
it's easy to forget what you have finished and what you haven't. Running</p>

<pre>cvs diff -u</pre>

<p>will tell you what you changed *from the copy you checked out* (so you
don't get confused by a list of things which other people have changed in
the meantime). This helps to avoid leaving things like notes to yourself
lying around in the file, and should help to avoid failing to update
links, though that is harder, since you have to notice that a change you
meant to make hasn't appeared in the list of changes.</p>

<pre>cvs -n update</pre>

<p>doesn't actually update anything, but tells you what would have happened.
This is useful at various times, such as for spotting conflicts early on
whilst you are part way through doing a big update. Changing it to</p>

<pre>cvs -nq update</pre>

<p>suppresses some of the less useful output. Files which are marked with
a "?" are ones which cvs doesn't know about - maybe you haven't "cvs add"ed
them yet.</p>
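<p>On a big tree the output can still be long, so it may help to filter it by
the letter code that cvs prints at the start of each line. The sketch below
picks out conflicts (C), local modifications (M) and unknown files (?). The
name <tt>summarise_update</tt> is invented, and the sample lines are made up
for the demonstration rather than real output from the expo repository.</p>

```shell
# Hypothetical helper: read "cvs -nq update" output on standard input and
# list the conflicting (C), locally modified (M) and unknown (?) files.
summarise_update() {
    while read -r code file; do
        case $code in
            C) echo "conflict: $file" ;;
            M) echo "modified locally: $file" ;;
            '?') echo "not known to cvs: $file" ;;
        esac
    done
}

# Demonstration on made-up output. In real use you would run:
#   cvs -nq update | summarise_update
printf '%s\n' \
    'U expo/smkridge/161/france.htm' \
    'M expo/plateau/505.htm' \
    'C expo/smkridge/161/nowsplit.htm' \
    '? expo/plateau/others/505.png' | summarise_update
```

<p>Lines with any other code, such as the "U" for a file that would simply be
brought up to date, are silently skipped.</p>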

<h3>Updating the website</h3>
<p>Having committed any changes to the cvs tree, connect to cvs.cucc.survex.com
via ssh and run the command <tt>/home/cucc/bin/expoweb-update</tt>. (You can do this
all in one step by just typing <tt>ssh cvs.cucc.survex.com
/home/cucc/bin/expoweb-update</tt>.)</p>
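<p>The whole cycle of update, commit and publish could be wrapped up roughly as
follows. This is a sketch under my own assumptions: <tt>run</tt>,
<tt>publish</tt> and the <tt>DRY_RUN</tt> variable are invented names, and by
default the script only prints the commands. Set <tt>DRY_RUN=0</tt> to have it
really run them, in which case it stops at the first command that fails.</p>

```shell
# Print each command; run it as well only when DRY_RUN is set to 0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Hypothetical wrapper for the update, commit and publish steps.
publish() {
    run cvs update -Pd || return 1
    run cvs commit || return 1
    run ssh cvs.cucc.survex.com /home/cucc/bin/expoweb-update || return 1
}

# With DRY_RUN unset this just lists the three commands.
publish
```

<p>Run it from the top level of your checked-out copy, and do look at what
"cvs update" reports before letting the commit go ahead.</p>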

<h3>CVS documentation</h3>

<p>Obviously, CVS has lots of bells and whistles that you don't need just to
edit a few web pages. There are a few links which you might care to look at
under <a href="#morelinks">Further reading</a> below.
Many more are accessible via <a href="http://www.cvshome.org/">the CVS home
page</a>.</p>

<p>OK, that's how to use CVS. You might now like to read a bit about editing
the web pages you checked out - there are a few conventions to help to
maintain a consistent style (although we might change that style soon, as
soon as we can agree about a new look). Just as in programming, there are
also a lot of useful things you can do by adding comments (which the end
reader of the pages won't see).</p>

<h3><a id="morelinks">Further reading</a></h3>

<ul>
<li><a href="http://www.cvshome.org/new_users.html">CVS for new users</a></li>
<li><a href="http://www.cvshome.org/docs/manual">Version Management with
CVS</a> (the official manual)</li>
<li><a href="http://cvsbook.red-bean.com/">Open Source development with
CVS</a> (chapters from a book, aimed at programmers, but almost all
applicable to open source document authors too)</li>
<li><a href="http://www.cvshome.org/dev/addons.html">CVS Add-ons</a> page
includes graphical CVS clients for Mac, Windows and, of course, Linux.</li>
<li>And a <a href="http://gallery.uunet.be/John.Tytgat/cvs/">CVS client
for RISC OS</a> (but note that this didn't appear to support ssh when this
page was written (2001-10-12)).</li>
<li><a href="http://www.durak.org/cvswebsites/">CVS for websites</a>:
most manuals assume you are using CVS to develop software - this site is
specific to using CVS to maintain web pages.</li>
</ul>

<h2>The website conventions bit</h2>

<p>This is likely to change with structural change to the site, with style
changes which we expect to implement and with the method by which the
info is actually stored and served up.</p>

<p>... and it's not written yet, either :-)</p>

<ul>
<li>Structure</li>
<li>Info for each cave &ndash; automatically generated by <tt>make-indxal4.pl</tt></li>
<li>Contents lists &amp; relative links for multi-article publications like
 journals. Complicated by expo articles being in a separate hierarchy.</li>
<li>Translations</li>
<li>Other people's work - the noinfo hierarchy.</li>
<li>Style guide for writing cave descriptions: correct use of boldface
(<i>once</i> for each passage name, at the primary definition thereof; other
uses of the name should be links to this, and certainly should not be bold.)</li>
</ul>

<hr />

<ul id="links">
<li><a href="index.htm">Expedition Handbook</a>
<ul>
	<li><a href="survey/index.htm">Surveying guide</a> &ndash; Overview</li>
	<li><a href="look4.htm">Prospecting guide</a> &ndash; Overview</li>
	<li><a href="rescue.htm">Rescue guide</a></li>
	<li><a href="rigit.htm">Rigging guide</a></li>
	<li><a href="photo.htm">Photography guide</a></li>
</ul></li>
<li><a href="../infodx.htm">Index to info/topics pages</a></li>
<li><a href="../indxal.htm">Full Index to area 1623</a>
<ul>
	<li><a href="../areas.htm">Area/subarea descriptions</a></li>
</ul></li>
<li><a href="../index.htm">Back to Expedition Intro page</a></li>
<li><a href="../../index.htm">Back to CUCC Home page</a></li>
</ul>
</body>
</html>