It's a bit easy to get lost in this process and forget where you were, especially if you are interrupted. So @@ -45,7 +53,7 @@ it is handy to print out this page and tick off the steps as you do them.
The UK Caving Blog regularly upgrades its software which completely changes the hidden structure of the posts. They did this sometime between the 2017 and 2018 expos. When they do it again, the function -parser_blog(year, expedition, txt, sq="") in troggle/parsers/logbooks.py will need to be completely re-written. It is currently 70 lines long and uses several regular expression recognizers. +parser_blog(year, expedition, txt, sq="") in troggle/parsers/logbooks.py will need to be completely re-written. It is currently 102 lines long and uses several regular expression recognizers.
@@ -55,7 +63,11 @@ it is handy to print out this page and tick off the steps as you do them.
Now delete all the non-image files in the "ukcavingblog_files/" and "ukcavingblog2_files/" folders. -
Now use your favourite photo editor (e.g. Irfanview on Windows) or a command-line tool to resize all the photos. A maximum of 600 pixels wide or high, or 400 or 300 pixels wide if the image quality is poor. Keep the same filename then you don't have to try to edit the horrendously horrible HTML which was generated by the blog software. If there are any .png files, convert them to .jpg. +
Now use your favourite photo editor (e.g. Irfanview on Windows) or a command-line tool to resize all the photos. A maximum of 800 pixels wide or high, or 400 or 300 pixels wide if the image quality is poor. Keep the same filename then you don't have to try to edit the horrendously horrible HTML which was generated by the blog software. If there are any .png files, convert them to .jpg. +
There will be a lot. So install imagemagick and use the
+mogrify tool:
+
mogrify -resize '800x800>' *.jpg
+
will resize in-place, overwriting the files, and make the maximum dimension 800 pixels.
Look at all the photos in the file browser set to show thumbnails and delete all advertising logos etc., and delete the UK Caving header image which will be of random people not us.
It is noticeable that a single blog post may cover several trips, and that the blog post date may be several days after the trip(s). So you need to manually find out the exact date of the trip (from the other trip records and particularly from the Bier Book) and change the date on the entry. -
One blog post may also need to be split into several entries - in which case be careful with the 'id=' string as this needs to be unique for each entry. +
One blog post may also need to be split into several entries - don't worry about the 'id=' string as the parser now rewrites these uniquely. When you split a blog into different entries the quickest way to re-order everything in date-order is to export the logbook and re-import it. @@ -119,6 +131,7 @@ When you split a blog into different entries the quickest way to re-order everyt
Back to Logbooks Import for Nerds documentation.
+Back to Logbook internal format documentation.
Back to Logbooks for Cavers documentation.
All these scanned handwritten logbook entries are typed into a laptop (often the expo laptop) +which is then synchronised with the version control system. + +
Do whatever you like to try and represent the logbook in html but do keep it simple. Don't try any clever HTML stuff. See the "Edit this Page" instructions for how to insert images and figures. + +
Logbooks are typed up and kept in the [expoweb]/years/[nnnn]/ directory as 'logbook.html'.
+ +When writing logbook entries, just use relative URLs to the same folder as your text, e.g. href="mynicepic.jpg" and the image and the logbook HTML will, for a 2017 expo, be put into /years/2017/. + +
One special suggestion: do not use <P> paragraph tags. Well, you can if you like, but they will be stripped out and replaced by double-newlines when the file is parsed. This is because <P> paragraph tags cannot be nested - that is not allowed in HTML - and the fragment you are writing will be merged with other fragments and may be put inside a higher-level paragraph. [This is also true for Cave Description text in "Edit this Cave".] + +
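As a rough illustration of the paragraph-tag stripping described above, here is a minimal Python sketch. It is not the code troggle uses; the function name `strip_paragraph_tags` and the exact regular expressions are assumptions made for this example.

```python
import re


def strip_paragraph_tags(fragment: str) -> str:
    """Replace <p>...</p> markup with blank-line separation (illustrative only)."""
    # An opening tag, with or without attributes, becomes a double newline.
    text = re.sub(r"<p\b[^>]*>", "\n\n", fragment, flags=re.IGNORECASE)
    # A closing tag is simply dropped.
    text = re.sub(r"</p\s*>", "", text, flags=re.IGNORECASE)
    # Trim the leading break left by the first opening tag.
    return text.strip()
```

The `\b` after `<p` keeps the pattern from touching other tags that merely start with "p", such as `<pre>`.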
When you use the online form to create a new logbook entry or to edit an old one, when you click the button the changes are made immediately to the online database on the server and you can see the results immediately (except for the list of logbook entry titles in the Expo webpage). Also, when you click the button the entire database of logbook entries is written out to disc, with your new entry in the right place by date, and this file 'logbook.html' is registered with the version control system (git add and git commit). +
So when you click on any of the links to see the whole logbook, your edited entry will be there for all to see. + +
+Implementation note: the logbook.html file is not, at that time, re-parsed and re-imported into the database. This is unnecessary and would also expose us to potential loss of data if two people were editing the logbook of the same year at the same time. So the software doesn't do that. + +
+The only rigid structure is the markup to allow troggle to parse the logbook files into 'trips':
+
+<hr />
+<div class="tripdate" id="2007-07-12b">2007-07-12</div>
+<div class="trippeople"><u>Jenny Black</u>, Olly Betts</div>
+<div class="triptitle">Top Camp - Setting up 76 bivi</div>
+...text of the logbook entry...
+<div class="timeug">T/U 0.2 hrs</div>
+<div class="editentry"><br /><a href="/logbookedit/2007-07-12b">Edit this entry</a><br /></div>
+When using the online form all this complexity is handled automatically: +
Note: the IDs must be unique, so are generated from the trip date plus a,b,c etc. +when there is more than one trip on a day (if more than 26 on one day, then it uses a cryptographic hash of the content as a suffix).
+Note: T/U stands for "Time Underground" in decimal hours, e.g. "0.2" for 12 minutes (approx.). We do not parse or collate this information currently. +
Note: the <hr /> is significant and used in parsing, it is not just prettiness. +
Note: follow this format exactly. No HTML comments or tabs or newlines. + +
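The ID scheme in the notes above could be sketched in Python roughly as follows. This is only an illustration: the function name `make_entry_id`, the choice of SHA-256, the 8-character truncation and the underscore separator are all assumptions, and so is giving the first trip of a day the suffix "a". The real code in troggle may differ on each point.

```python
import hashlib
import string


def make_entry_id(date: str, index: int, content: str) -> str:
    """Build an id for the index-th trip (0-based) on a 'YYYY-MM-DD' date."""
    letters = string.ascii_lowercase
    if index < len(letters):
        # Up to 26 trips on one day: date plus a, b, c, ...
        return date + letters[index]
    # More than 26 trips: fall back to a hash of the entry content (assumed form).
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return f"{date}_{digest[:8]}"
```

With these assumptions, `make_entry_id("2007-07-12", 1, "...")` gives the id used in the example entry above.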
Note this special format "Top Camp - " in the triptitle line:
+
+It denotes the cave or area the trip or activity happened in. It is a word or two separated from the rest of the triptitle with " - " (space-dash-space). Usual values
+for this are "Plateau", "Base camp", "264", "Balkon", "Tunnocks", "Travel" etc.
+
+<div class="triptitle">Top Camp - Setting up 76 bivi</div>
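To make the " - " convention concrete, here is a small Python sketch that separates the cave or area from the rest of the title. The function name `split_triptitle` is hypothetical, and returning an empty place when there is no separator is an assumption, not a description of what troggle does.

```python
from typing import Tuple


def split_triptitle(triptitle: str) -> Tuple[str, str]:
    """Split 'Place - Title' at the first space-dash-space."""
    place, separator, title = triptitle.partition(" - ")
    if not separator:
        # No " - " found: treat the whole string as the title (assumed behaviour).
        return "", triptitle.strip()
    return place.strip(), title.strip()
```

Only the first " - " is used, so a title that itself contains a dash is kept whole.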
Note this special format "<u>Jenny Black</u>" in the trip-people line:
+
+It is necessary that one (and only one) of the people on the trip is set in <u></u> underline format. This is interpreted to mean that this is the author of the logbook entry. If there is no author set, then this is an error and the entry is ignored.
+
+<div class="trippeople"><u>Jenny Black</u>, Olly Betts</div>
+
If you like, you can put non-expo people in the trip-people line: "*Ol's Mum" with a * prefix and they will be totally ignored by troggle:
+
+<div class="trippeople"><u>Jenny Black</u>, Olly Betts, *Ol's Mum</div>
+
+or
+
+<div class="trippeople"><u>Jenny Black</u>, Olly Betts, *4 Hungarian Cavers</div>
+
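The trip-people rules above can be sketched in Python as below. This is a hedged illustration rather than troggle's parser: the function name `parse_trippeople` is hypothetical, it takes only the text inside the div, and returning `None` when there is not exactly one underlined name is an assumption standing in for "the entry is ignored".

```python
import re
from typing import List, Optional, Tuple


def parse_trippeople(inner_html: str) -> Optional[Tuple[str, List[str]]]:
    """Return (author, expo people), or None if there is not exactly one author."""
    authors = re.findall(r"<u>(.*?)</u>", inner_html, flags=re.IGNORECASE)
    if len(authors) != 1:
        return None
    author = authors[0].strip()
    # Remove the underline markup, then split the comma-separated names.
    plain = re.sub(r"</?u>", "", inner_html, flags=re.IGNORECASE)
    people = []
    for name in plain.split(","):
        name = name.strip()
        # Names with a * prefix are non-expo people and are skipped.
        if name and not name.startswith("*"):
            people.append(name)
    return author, people
```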
Very old logbooks were simply typed up text documents with no formatting. + +
Old logbooks (prior to 2007) were stored as logbook.txt with just a bit of consistent markup to allow troggle parsing.
+ +The formatting was largely freeform, with a bit of markup ('===' around header, bars separating date,
There were also several previous (different) styles of using HTML. The one we are using now is the 5th variant. These older variants were eventually all reformatted into the current HTML format so that now (Jan. 2023) we only need to maintain the code for one parser. +
However, we missed one. The logbook for 1979 needs to be hand-edited to use the new format. + + + + + + +
+Back to Logbooks for Cavers documentation.
+Go on to Importing logbooks into troggle.
+Go on to Importing the UK Caving Blog.
+
This is usually done after expo but it is an excellent idea to have a nerd do this a couple of times during expo to discover problems while the people are still around to ask.
The nerd needs to login to the expo server using their own userid, not the 'expo' userid. The nerd also needs to be in the group that is allowed to do 'sudo'. +
This is rather a grand word for the hacked about spaghetti of regexes in troggle/parsers/logbooks.py . It is not a proper parser, just a phrase recognizer, and is horribly, horribly fragile. On the bright side, we now only have one of these instead of 5. +
Ideally this would all be done on a stand-alone laptop to get the bugs in the logbook parsing sorted out before we upload the corrected file to the server. Unfortunately this requires a full troggle software development laptop as the parser is built into troggle. The expo laptop in the potato hut is set up to do this (2023) but requires more nous than is convenient to describe here.
However, the expo laptop (or any 'bulk update' laptop) is configured to allow an authorized user to log in to the server itself and to run the import process directly on the server. DON'T DO THIS. The slightest mistake in formatting will kill logbook functionality on the server for everyone. @@ -31,6 +29,8 @@ the trips are indexed and we can see who was doing what where. in another page. But read this page first.
With the new data entry form we should have far fewer problems with inventive hacks trying to do clever things with HTML, but it is entirely possible that the form can be used to input text which will then break the parser, most obviously by putting in a +<hr /> which is the separator between entries. This is not clever.
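A nerd checking entry text by hand before an import could use something like the following Python sketch to spot a stray separator. It is not part of troggle; the name `contains_separator` and the pattern are assumptions for illustration.

```python
import re

# Matches <hr>, <hr/>, <hr /> and variants with attributes, in any letter case.
HR_TAG = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)


def contains_separator(entry_text: str) -> bool:
    """Return True if the text of one entry contains an <hr> tag."""
    return HR_TAG.search(entry_text) is not None
```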
The nerd needs to do this:
This is documented on the logbook user-documentation page as even expoers who can do nothing else technical can at least write up their logbook entries. -
Older logbooks (prior to 2007) were stored as logbook.txt with just a bit of consistent markup to allow troggle parsing.
-The formatting was largely freeform, with a bit of markup ('===' around header, bars separating date,
There were also several previous (different) styles of using HTML. The one we are using now is the 5th variant. These older variants were eventually all reformatted into the current HTML format so that now (Jan. 2023) we only need to maintain the code for one parser. -
However, we missed one. The logbook for 1979 needs to be hand-edited to use the new format. - -
Back to Logbooks for Cavers documentation.
+Forward to Logbook internal format documentation.
Forward to Importing the UK Caving Blog.