Updated to use the new form

2026-02-21 11:45:18 +00:00 · 2023-09-02 23:55:00 +03:00
parent 0a1b236278
commit cb2d9c720e
4 changed files with 208 additions and 133 deletions
--- a/handbook/computing/logbooks-parsing.html
+++ b/handbook/computing/logbooks-parsing.html
@@ -10,16 +10,14 @@
 <h2 id="tophead">CUCC Expedition Handbook</h2>
 <h1>Logbooks Import</h1>

-    <!-- Yes we need some proper context-marking here, breadcrumb trails or something. 
-        Maybe a colour scheme for just this sequence of pages
-    -->
-
-
 <h3 id="import">Importing the logbook into troggle</a></h3>
 <p>This is usually done after expo but it is an excellent idea to have a nerd do this a couple of times during expo to discover problems while the people are still around to ask.

 <p>The nerd needs to login to the expo server using <em>their own userid</em>, not the 'expo' userid. The nerd also needs to be in the group that is allowed to do 'sudo'.

+<h4>The 'parser'</h4>
+<p>This is rather a grand word for the hacked-about spaghetti of regexes in troggle/parsers/logbooks.py. It is not a proper parser, just a phrase recognizer, and is horribly, horribly fragile. On the bright side, we now only have one of these instead of 5.
+
 <h4>Ideal situation</h4>
 <p>Ideally this would all be done on a stand-alone laptop to get the bugs in the logbook parsing sorted out before we upload the corrected file to the server. Unfortunately this requires a full troggle software development laptop as the parser is built into troggle. The <var>expo laptop</var> in the potato hut is  set up to do this (2023) but requires more nouse than is convenient to describe here.
 <p>However, the <var>expo laptop</var> (or any 'bulk update' laptop) is configured to allow an authorized user to log in to the server itself and to run the import process directly on the server. DON'T DO THIS. The slightest mistake in formatting will kill logbook functionality on the server for everyone.
@@ -31,6 +29,8 @@ the trips are indexed and we can see who was doing what where.
 <a href="log-blog-parsing.html">in another page</a>. But read this page first.

 <h4>Current situation</h4>
+<p>With the new data entry form we should have far fewer problems with inventive hacks trying to do clever things with HTML, but it is entirely possible that the form can be used to input text which will then break the parser, most obviously by putting in a 
+<var><span style="color:red">&lt;hr /&gt;</span></var> which is the separator between entries. This is not clever.
 <p>The nerd needs to do this:
 <ol>
 <li>Look at the list of pre-existing old import errors at  <a href="/dataissues">Data Issues</a> </br>
@@ -39,7 +39,7 @@ the trips are indexed and we can see who was doing what where.
 This is documented in the <a href="folkupdate.html">Folk Update</a> process.
 <li>Log in to the expo server and run the update script (see below for details)
 <li>Watch the error messages scroll by; they are more detailed than the messages archived in the old import errors list
-<li>Edit the logbook.html file to fix the errors. These are usually typos, non-unique tripdate ids or unrecognised people. Some unrecognised people will mean that you have to fix them using the  <a href="folkupdate.html">Folk Update</a> process first.
+<li>Edit the logbook.html file to fix the errors. These are usually typos, too-clever HTML or unrecognised people. Some unrecognised people will mean that you have to fix them using the  <a href="folkupdate.html">Folk Update</a> process first.
 <li>Re-run the import script until you have got rid of all the import errors.
 <li>Pat self on back. Future data managers and people trying to find missing surveys will worship you.
 </ol>
@@ -76,30 +76,12 @@ cd troggle
 python databaseReset.py reset
 </code></pre>
 which takes between 300s and 15 minutes on the server.
-<h3 id="history">The logbooks format</h3>
-<p>This is documented on the <a href="../logbooks.html#format">logbook user-documentation page</a> as even expoers who can do nothing else technical can at least write up their logbook entries.

-<h3 id="history">Historical logbooks format</h3>
-<p>Older logbooks (prior to 2007) were stored as logbook.txt with just a bit of consistent markup to allow troggle parsing.</p>

-<p>The formatting was largely freeform, with a bit of markup ('===' around header, bars separating date, <place> - <description>, and who) which allows the troggle import script to read it correctly. The underlines show who wrote the entry. </p>
-<p>There were also several previous (different) styles of using HTML. The one we are using now is the 5th variant. These older variants were eventually all reformatted into the current HTML format so that now (Jan. 2023) we only need to maintain the code for one parser.
-  <p>However, we missed one. The logbook for 1979 needs to be hand-edited to use the new format.
-
-<!--
-<p>So the format should be:</p>
-
-<code>
-===2009-07-21|204 - Rigging entrance series| Becka Lawson, Emma Wilson ===
-</br>
-&#123;Text of logbook entry&#125;
-</br>
-T/U: Jess 1 hr, Emma 0.5 hr
-</code>
-->
 <hr />
 <p>
 Back to <a href="../logbooks.html">Logbooks for Cavers</a> documentation.<br>
+Forward to <a href="logbooks-format.html">Logbook internal format</a> documentation.<br>
 Forward to <a href="log-blog-parsing.html">Importing the UK Caving Blog</a>. 
 <hr /></body>
 </html>
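
The failure mode described in the added 'Current situation' paragraph (an hr tag typed into an entry being read as the separator between entries) can be illustrated with a small sketch. This is not the code in troggle/parsers/logbooks.py: the regex, the function names and the sample text are all assumptions made for illustration, and the only fact taken from the page is that the hr tag separates entries.

```python
import re

# Assumption: entries are split on an <hr /> tag. The real troggle regexes
# may be stricter or looser than this hypothetical pattern.
SEPARATOR = re.compile(r"<hr\s*/?>", re.IGNORECASE)


def split_entries(logbook_html):
    """Split logbook text on the separator and drop empty chunks."""
    chunks = SEPARATOR.split(logbook_html)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def has_stray_separator(entry_text):
    """Return True if text meant to be ONE entry contains a separator tag."""
    return SEPARATOR.search(entry_text) is not None


# Two well-formed entries (hypothetical content).
good = "<p>Trip one</p><hr /><p>Trip two</p>"

# Same two trips, but someone typed an <hr /> inside the first entry.
bad = "<p>Trip one, part a<hr />part b</p><hr /><p>Trip two</p>"

print(len(split_entries(good)))
print(len(split_entries(bad)))
print(has_stray_separator("<p>Trip one, part a<hr />part b</p>"))
```

With the stray tag, the two trips are split into three chunks, and the first trip is cut in half, each half with unbalanced markup. A check along the lines of `has_stray_separator` could in principle be run on the text of a single entry before it is saved, but whether the data entry form does anything like this is not stated on the page.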