diff --git a/handbook/computing/logbooks-parsing.html b/handbook/computing/logbooks-parsing.html
index cfbf16417..94d5782db 100644
--- a/handbook/computing/logbooks-parsing.html
+++ b/handbook/computing/logbooks-parsing.html
@@ -8,15 +8,23 @@
[This now happens every time a logbook entry is edited using the online form (pre-2025 style), or every time a full logbook is requested as a single page (after November 2025).]
This is usually done after expo, but it is an excellent idea to have a nerd do this a couple of times during expo, so that problems are discovered while the people involved are still around to ask.
The nerd needs to log in to the expo server using their own userid, not the 'expo' userid. The nerd also needs to be in the group that is allowed to use 'sudo'.
This is rather a grand word for the hacked-about spaghetti of regexes in troggle/parsers/logbooks.py. It is not a proper parser, just a phrase recognizer, and it is horribly, horribly fragile. On the bright side, we now have only one of these instead of five. It takes the logbook.html file for each expedition year, parses it, and imports the data into the online database.
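To illustrate what "a phrase recognizer rather than a proper parser" means, here is a minimal sketch in Python. It is not the code in troggle/parsers/logbooks.py. The markup (the `tripdate` and `triptitle` classes), the sample entries, and the name `parse_logbook` are all assumptions made up for illustration. The idea is that the page is chopped into entries on a separator, and each fact is then fished out of each chunk by its own independent regex. Nothing checks that the HTML as a whole makes sense, which is why small changes in the input can break it.

```python
import re

# ASSUMPTION: simplified, invented logbook markup for illustration only;
# the real logbook.html format handled by troggle may differ.
SAMPLE = """
<div class="tripdate" id="2023-07-10a">2023-07-10</div>
<div class="trippeople"><u>Alice</u>, Bob</div>
<div class="triptitle">Plateau - prospecting</div>
<p>Walked up to the plateau.</p>
<hr />
<div class="tripdate" id="2023-07-11a">2023-07-11</div>
<div class="trippeople"><u>Bob</u></div>
<div class="triptitle">Base camp - rest day</div>
<p>Rained all day.</p>
<hr />
"""

# Entries are assumed to be separated by <hr>, <hr/> or <hr />.
ENTRY_SEPARATOR = re.compile(r"<hr\s*/?>", re.IGNORECASE)

# Each "phrase" is recognised independently by its own regex.
DATE_RE = re.compile(r'<div\s+class="tripdate"[^>]*>\s*(\d{4}-\d{2}-\d{2})\s*</div>')
TITLE_RE = re.compile(r'<div\s+class="triptitle">(.*?)</div>', re.DOTALL)


def parse_logbook(html: str) -> list[dict]:
    """Hypothetical helper: split the page into entries, then pull a date and title from each."""
    entries = []
    for chunk in ENTRY_SEPARATOR.split(html):
        if not chunk.strip():
            continue  # skip the whitespace left after the final separator
        date = DATE_RE.search(chunk)
        title = TITLE_RE.search(chunk)
        entries.append({
            "date": date.group(1) if date else None,
            "title": title.group(1).strip() if title else None,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_logbook(SAMPLE):
        print(entry)
```

If an entry's date div is misspelt or missing, this sketch quietly returns `None` for that field instead of reporting a syntax error. That is the kind of silent fragility a regex-based recognizer tends to have.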
Ideally this would all be done on a stand-alone laptop, so that the bugs in the logbook parsing are sorted out before we upload the corrected file to the server. Unfortunately this requires a full troggle software development laptop, as the parser is built into troggle. The expo laptop in the potato hut is set up to do this (2023), but using it requires more nous than is convenient to describe here.
@@ -29,7 +37,7 @@ the trips are indexed and we can see who was doing what where.
in another page. But read this page first.
With the new online data entry form we should have far fewer problems with inventive hacks trying to do clever things with HTML. However, the form can still be used to input text that will then break the parser, most obviously by putting in a <hr /> which is the separator between entries. This is not clever.
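The sketch below shows why a stray <hr /> is so damaging. It is a minimal Python illustration and does not reproduce troggle's own code. It assumes that entries are split on <hr>, <hr/> or <hr />. The functions `split_entries` and `contains_entry_separator` are hypothetical names invented here. One typed-in separator turns a single entry into two chunks, and the second chunk has no date, so the parser cannot make sense of it.

```python
import re

# ASSUMPTION: entries are separated by <hr>, <hr/> or <hr /> in any letter case.
ENTRY_SEPARATOR = re.compile(r"<hr\s*/?>", re.IGNORECASE)


def split_entries(html: str) -> list[str]:
    """Hypothetical helper: split a logbook page on the separator, dropping empty chunks."""
    return [chunk for chunk in ENTRY_SEPARATOR.split(html) if chunk.strip()]


def contains_entry_separator(text: str) -> bool:
    """Hypothetical pre-save check: True if user-typed text would add an entry separator."""
    return ENTRY_SEPARATOR.search(text) is not None


one_entry = '<div class="tripdate">2023-07-10</div><p>Went caving.</p>'
# Someone types an <hr /> into the body of the same entry via the form.
bad_entry = '<div class="tripdate">2023-07-10</div><p>Went caving.</p><hr /><p>More notes.</p>'

page_ok = one_entry + "<hr />"
page_bad = bad_entry + "<hr />"

print(len(split_entries(page_ok)))          # 1
print(len(split_entries(page_bad)))         # 2 (the second "entry" has no date)
print(contains_entry_separator(bad_entry))  # True
```

A check like `contains_entry_separator` could be run on submitted text before it is saved, but whether the online form does anything of the sort is not stated here.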
The nerd needs to do this: