Docs on QM code status, troggle redesign

2025-12-08 14:54:28 +00:00 · 2020-05-14 22:28:13 +01:00
parent 2cb287ca81
commit 9d75a09cf5
6 changed files with 107 additions and 25 deletions
--- a/handbook/computing/folkupdate.html
+++ b/handbook/computing/folkupdate.html
@@ -5,7 +5,7 @@
 <title>CUCC Expedition Handbook: People Update</title>
 <link rel="stylesheet" type="text/css" href="../../css/main2.css" />
 </head>
-<body>
+<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
 <h2 id="tophead">CUCC Expedition Handbook</h2>
 <h1>The list of people on expo</h1>
--- a/handbook/computing/logbooks-parsing.html
+++ b/handbook/computing/logbooks-parsing.html
@@ -5,7 +5,7 @@
 <title>CUCC Expedition Handbook: Logbook import</title>
 <link rel="stylesheet" type="text/css" href="../../css/main2.css" />
 </head>
-<body>
+<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
 <h2 id="tophead">CUCC Expedition Handbook</h2>
 <h1>Logbooks Import</h1>
@@ -59,15 +59,7 @@ Calculating GetPersonExpeditionNameLookup for 2017
 <p>Errors are usually misplaced or duplicated &lt;hr /&gt; tags, names which are not specific enough to be recognised by the parser (though it tries hard) such as "everyone" or "et al." or are simply missing, or a bit of description which has been put into the names section such as "Goulash Regurgitation".
 <h3 id="history">The logbooks format</h3>
-<p>This is documented on the <a href="..logbooks.html#format">logbook user-documentation page</a> as even expoers who can do nothing else technical can at least write up their logbook entries.
+<p>This is documented on the <a href="../logbooks.html#format">logbook user-documentation page</a> as even expoers who can do nothing else technical can at least write up their logbook entries.
 <p>[ Yes this format needs to be re-done using a proper structure:<br />
 <code><pre>
 &lt;div class="logentry"&gt;<br />
 <span style="text-decoration: line-through wavy red;">&nbsp;&nbsp;&nbsp;&nbsp;</span>
 &lt;/div&gt;</pre></code>
 it's on the to-do list...]
 <h3 id="history">Historical logbooks format</h3>
 <p>Older logbooks (prior to 2007) were stored as logbook.txt with just a bit of consistent markup to allow troggle parsing.</p>
--- a/handbook/logbooks.html
+++ b/handbook/logbooks.html
@@ -144,10 +144,18 @@ idea to type up <i>just your trip(s)</i> in a separate file, e.g. "logbook-mynew
 &lt;div class="timeug"&gt;T/U 10 mins&lt;/div&gt;</pre></code>
 <p>Note: the IDs must be unique, so they are generated from 't' plus the trip date plus a, b, c etc.
 when there is more than one trip on a day.</p>
-<p>Note: T/U stands for "Time Underground" in hours (6 minutes would be "0.1 hours").
+<p>Note: <var><span style="color:red">T/U</span></var> stands for "Time Underground" in hours (6 minutes would be "0.1 hours").
-<p>Note: the &lt;hr /&gt; is significant and used in parsing, it is not just prettiness.
+<p>Note: the <var><span style="color:red">&lt;hr /&gt;</span></var> is significant and used in parsing, it is not just prettiness.
 <p>Note this special format <var>"<span style="color:red">Top Camp - </span>"</var> in the triptitle line:
 <code><pre>&lt;div class="triptitle"&gt;<span style="color:red">Top Camp - </span>Setting up 76 bivi&lt;/div&gt;</pre></code>
 It denotes the <var>cave</var> or <var>area</var> the trip or activity happened in. It is a word or two separated from the rest of the triptitle with "<var> - </var>" (space-dash-space). Usual values
 for this are "Plateau", "Base camp", "264", "Balkon", "Tunnocks", "Travel" etc.
 <p>Note this special format <var>"<span style="color:red">&lt;u&gt;Jenny Black&lt;/u&gt;</span>"</var> in the trippeople line:
 <code><pre>&lt;div class="trippeople"&gt;<span style="color:red">&lt;u&gt;Jenny Black&lt;/u&gt;</span>, Olly Betts&lt;/div&gt;
 </pre></code>
 It is necessary that one (and only one) of the people on the trip is set in <span style="color:red">&lt;u&gt;&lt;/u&gt;</span> underline format. This is interpreted to mean that this is the author of the logbook entry. If there is no author set, then this is an error and the entry is ignored.
 <hr />
--- a/handbook/survey/qmentry.html
+++ b/handbook/survey/qmentry.html
@@ -12,7 +12,7 @@
 <h2>QM data and cave descriptions</h2>
 <p>
-This document describes how to include Qustion Marks (QMs) and cave descriptions in .svx files. 
+This document describes how to include Question Marks (QMs) and cave descriptions in .svx files. 
 <p>There
 are dedicated fields in the template.svx file for this purpose, but there has been laxness recently on filling them in.
@@ -68,6 +68,17 @@ Here is an example from the last bit of bipedalpassage.svx in 264. Note that eac
 ;QM6  C	bipedalpassage.31	-	Very good location where main phreatic passages and enlarges - but far side of chamber choked. One part of choke was not accessed as needs 2m climb up to poke nose in it. A good free climber could do this or needs one bolt to be sure no way on. Very strong draft in choke! Interesting southerly trend at margin of known system
 </code></pre>
 <p>
 The format for question mark lists is <br>
 <ul>
 <li>QM identifier, <li><a href="../../qm.html">Quality Grade</a>, <li>Area indicator, <li>description of QM. 
 </ul>
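As a rough illustration of the list format above, here is a minimal Python sketch that splits one `;QM` comment line from a .svx file into fields. It is based only on the bipedalpassage.svx sample line shown earlier in this file; the function name `parse_svx_qm_line` and the field names `station` and `resolved_at` are assumptions made for this example and are not taken from troggle or svx2qm.py.

```python
def parse_svx_qm_line(line):
    """Split one ';QM...' comment line from a .svx file into named fields.

    Returns None for any line that does not look like a QM entry.
    """
    text = line.strip()
    if not text.startswith(";"):
        return None
    # Fields are separated by runs of spaces or tabs; the description is
    # free text, so stop splitting after the fourth separator.
    parts = text[1:].split(None, 4)
    if len(parts) < 5 or not parts[0].upper().startswith("QM"):
        return None
    serial = parts[0][2:]
    if not serial.isdigit():
        return None
    return {
        "serial": int(serial),
        "grade": parts[1],
        "station": parts[2],      # assumed meaning: survey station near the QM
        "resolved_at": parts[3],  # assumed meaning: "-" in the sample line
        "description": parts[4],
    }


sample = ";QM6  C\tbipedalpassage.31\t-\tVery strong draft in choke!"
print(parse_svx_qm_line(sample))
```

Lines that are not QM comments return None, so the function can be mapped over every line of a survex file and the results filtered.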
 <p>The QM numbers themselves are in the format <br>
 <ul>
 <li><a href="../../qm.html">Discoverer identifier</a>, <li>Year of discovery, <li>Cave identifier, <li>serial number. 
 </ul>
 <p>This format is <a href="../../qm.html">documented in the QM conventions</a> page.
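The four-part QM number described above can be sketched as a regular expression. This is a guess at the grammar based on the single sample number C1999-204-09 that appears in the 204/qm.csv example in this commit; the authoritative definition is the QM conventions page, and the names `QM_NUMBER` and `parse_qm_number` are hypothetical.

```python
import re

# Assumed layout: discoverer letters, 4-digit year, cave identifier, serial,
# with hyphens between the last three parts (as in "C1999-204-09").
QM_NUMBER = re.compile(
    r"^(?P<discoverer>[A-Za-z]+)(?P<year>\d{4})-(?P<cave>[^-]+)-(?P<serial>\d+)$"
)


def parse_qm_number(text):
    """Return the parts of a QM number as a dict, or None if it does not match."""
    match = QM_NUMBER.match(text.strip())
    if match is None:
        return None
    return {
        "discoverer": match["discoverer"],
        "year": int(match["year"]),
        "cave": match["cave"],
        "serial": int(match["serial"]),
    }


print(parse_qm_number("C1999-204-09"))
```

The cave identifier is kept as a string because cave names are not always purely numeric.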
 <p>
 The example below demonstrates correct and effective use of the QM list referring back to earlier elements in the svx file:
 <p>
@@ -80,6 +91,10 @@ is a very inefficient use of time.
 <p>Also if the person reading it hasn’t been to the bit of cave (which is, like, <em>the whole point</em>), then the data has a higher chance of being incorrect. It is not always easy to interpret Tunnel or Therion 
 drawings correctly with this sort of thing. 
 <h3>Programming note</h3>
 <p>Better handling of historic QMs is a current, occasionally active, area of
 development in our online systems. The current status is <a href="../troggle/scriptsqms.html">documented here</a>.
 <h3>Conclusion</h3>
 <p>Survey data recorded in .svx files is incomplete if there is no QM List data and
 cave description data!
--- a/handbook/troggle/scriptsqms.html
+++ b/handbook/troggle/scriptsqms.html
@@ -34,7 +34,7 @@ tl;dr - use <em>svx2qm.py</em>. Look at the output at:<br>
 <p>There are four ways we have used to manage QMs:
 <ol>
-<li><strong>Perl script</strong> - Historically QMs were not in the survex file but typed up in a separate list <var>qms.csv</var> for each cave system. A perl script turned that into an HTML file for the website.
+<li><strong>Perl script</strong> - Historically QMs were not in the survex file but typed up in a separate list <var>qms.csv</var> for each cave system. A perl script turned that into an HTML file for the website. But there appear to be 3 different formats for this.
 <li><strong>Perl + troggle</strong> - One of troggle's input parsers "QM parser" is specifically designed to import the three HTML files produced from <var>qms.csv</var> but doesn't do anything with that data (yet).
 <li><strong>Python script</strong> - Phil Withnall's 2019 script <em>svx2qm.py</em> scans all the QMs in a single survex file. See below for how to run it on all survex files.
 <li><strong>New troggle</strong> - Sam's recent addition to troggle's "survex parser" makes it recognise and store QMs when it parses the survex files.
@@ -52,6 +52,24 @@ tl;dr - use <em>svx2qm.py</em>. Look at the output at:<br>
 <a href="../../1623/204/qm.html">/1623/204/qm.html</a><br />
 <p>Note that the <var>qms.csv</var> file used as input by this script is an <em>entirely different format and table structure</em> from the <var>qms.csv</var> file produced by <a href="#svx2qm">svx2qm.py</a>.
 <p>And in fact the formats of these 3 qm.csv files are <em>not the same</em> (these are the
 "older or artisanal QM formats" referred to by Phil Withnall at the bottom of this page):
 Fields in 204/qm.csv are:
 <code><pre><span style="font-size:small">Number, grade, area, description, page reference, nearest station, completion description, Comment
 e.g.
 C1999-204-09    C    Wolp    Hole in floor through dangerous boulders        veined.10    Filled with rocks
 </span></pre></code>
 Fields in 258/qm.csv are:
 <code><pre><span style="font-size:small">Cave, year, number, Grade, nearest station, description, completion description, found by, completed by
 e.g.
 258  2006  27        C      258.gknodel.4    Small passage to E in Germknödel          Sandeep Mavadia and Dave Loeffler
 </span></pre></code>
 Fields in 264/qm.csv are:
 <code><pre><span style="font-size:small">Year, number, Grade, Survey folder ref#, Surveyname, Nearest Station number, Area of the cave, Description, Y if marked on drawn-up survey,
 2014  7          C        2014#11      roomwithaview    4        Room With a View      Room With a View: "Probably chokes"  opposite stations 4 and 5      ALREADY EXPLORED PROBABLY
 </span></pre></code>
 <p>There are also three versions of the QM list for cave 161 (Kaninchenhohle) apparently produced by this method but hand-edited:<br />
 <a href="../../1623/161/qmaven.html">/1623/161/qmaven.html</a> 1996 version<br />
 <a href="../../1623/161/qmtodo.html">/1623/161/qmtodo.html</a> 1998 version<br />
@@ -60,6 +78,25 @@ tl;dr - use <em>svx2qm.py</em>. Look at the output at:<br>
 <p>In the /1623/204/ folder there is a script <em>qmreader.pl</em> which apparently does the inverse of
 <em>tablize-qms.pl</em>: it transforms a QM HTML file into a CSV file.
 <p>As Wookey says (Slack, 7 Jan. 2020): 
 "I'm not quite sure what the best format is. Some combination of the
 258 and 264 formats might be best. Including the cave number seems
 pointless. Including 'conclusion' info seems like a good idea. I'm not
 sure what the benefit of separating the 'surveyname' and
 'nearest station' fields is. Having an 'area of cave' field is somewhat useful
 for grouping, even though it is sort-of repeating the 'survey-station' info.
 If I was making a QM list I'd enter these fields:
 year, number, Grade, nearest station, folder reference, description, found by, completed (Year), completion description/cave description link, completed by
 with these details:
 <ul>
 <li>number is just the serial number, not the whole year-serial-grade
 <li>'nearest station' does not include the cave number
 <li>completed is blank (for not completed) or a year for when it was done
 <li>completion description should be a link to the relevant bit of cave description, but if that doesn't exist then a short description here is OK
 </ul>"
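To make Wookey's proposal concrete, here is a hedged Python sketch that maps one row in the 258/qm.csv layout onto the proposed field list. The key names, the function `from_258_row` and the reading of the sample row (in particular that the names belong in "found by") are assumptions for illustration; no such converter exists in the repository as far as this page shows.

```python
# Field order proposed in the quote above, as Python-friendly key names.
PROPOSED_FIELDS = [
    "year", "number", "grade", "nearest_station", "folder_reference",
    "description", "found_by", "completed_year",
    "completion_description", "completed_by",
]


def from_258_row(row):
    """Map a dict keyed by the 258/qm.csv field names onto the proposed fields."""
    station = row["nearest_station"]
    prefix = row["cave"] + "."
    # "'nearest station' does not include the cave number"
    if station.startswith(prefix):
        station = station[len(prefix):]
    # Fields the 258 format lacks (folder reference, completed year) stay blank.
    out = dict.fromkeys(PROPOSED_FIELDS, "")
    out.update(
        year=row["year"],
        number=row["number"],
        grade=row["grade"],
        nearest_station=station,
        description=row["description"],
        found_by=row["found_by"],
        completion_description=row["completion_description"],
        completed_by=row["completed_by"],
    )
    return out


sample_258 = {
    "cave": "258", "year": "2006", "number": "27", "grade": "C",
    "nearest_station": "258.gknodel.4",
    "description": "Small passage to E in Germknödel",
    "completion_description": "",
    "found_by": "Sandeep Mavadia and Dave Loeffler",
    "completed_by": "",
}
print(from_258_row(sample_258))
```

The cave number is dropped from the output entirely, following the remark that including it "seems pointless" when each cave has its own file.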
 <h4 id="qms.py">troggle/parsers/qms.py</h4>
 <p>The parser <em>troggle/parsers/qms.py</em> currently imports those same <var>qm.csv</var> files from the perl script into troggle using a mixture of csv and html parsers:
@@ -88,6 +125,11 @@ The 2019 copies are online in /expofiles/:
 This will work on all survex *.svx files even those which have not yet been run through the troggle import process. 
 <p>Phil says (13 April 2020): <em>"The generated files are not meant to be served by the webserver, it’s a tool for people to run locally. Someone could modify it to create HTML output (or post-process the CSV output to do the same), but that is work still to be done."</em>
 <h4>troggle/parsers/survex.py</h4>
 <p>The QMs inside the survex files are parsed by troggle along with all the other information
 inside survex files and stored in the database. But the webpages which display this data are rudimentary, e.g. <a href="/getQMs/1623-204">/getQMs/1623-204</a> or <a href="/cave/qms/1623-204">/cave/qms/1623-204</a>.
 Looking through urls.py and core/view_caves.py we see a lot of code for providing new QM numbers, producing lists of QMs for a given cave and for downloading QM.csv files generated by the database. But none of it appears to be working today (14 May 2020), see below.
 <h4 id="samqms">Sam's parser additions</h4>
 <p>Troggle <em>troggle/parsers/survex.py</em> currently parses and stores all the QMs it finds in survex files. The tables where the data is put are listed in <a href="datamodel.html">the current data model</a> including structure for ticking them off.
@@ -108,7 +150,7 @@ So someone was busy at one time.
 <h2>QMs - monitoring progress</h2>
 <h4 id="find-dead-qms">find-dead-qms.py</h4>
-<p>This finds references to <em>completed</em> qms in the qm.csv files in the cave folders (/1623/ etc.) in the :expoweb: <a href="../computing/repos.html">repository</a>. It looks to see which QMs have been completed but where there is not yet a matching text in the  cave description.
+<p>This stand-alone script finds references to <em>completed</em> qms in the qm.csv files in the cave folders (/1623/ etc.) in the :expoweb: <a href="../computing/repos.html">repository</a>. It looks to see which QMs have been completed but where there is not yet a matching text in the  cave description.
 <blockquote><em>Quick and dirty Python script to find references to completed qms in the 
 cave description pages. Run this to find which bits of description
 need updating.
@@ -153,7 +195,7 @@ I guess it all depends on what questions people are trying to answer using the Q
 as to how (and where) best to present it. I’m afraid I don’t have any suggestions there.
 :Rob Watson wrote some documentation about QMs
-:http://expo.survex.com/handbook/survey/qmentry.html 
+:<a href="../survey/qmentry.html">http://expo.survex.com/handbook/survey/qmentry.html</a> 
 :is there anything subtle missing  as to how they are used ?
 Nope, I think Rob’s page covers it all. That page also documents the correct QM format 
--- a/handbook/troggle/trogdesignx.html
+++ b/handbook/troggle/trogdesignx.html
@@ -50,6 +50,15 @@ Which is fun, but not useful. And not just because it is immature. None of
 this addresses <strong>our biggest problem: devising  something that can be
 maintained by fewer, less-expert people who can only devote short snippets
 of time and not long-duration immersion</strong>.
 <h3>Our biggest problem</h3>
 We need:
 <ul>
 <li>something that can be maintained by fewer, less-expert people
 <li>who can only devote short snippets of time
 <li>without requiring weeks of long-duration deep immersion
 </ul>
 <h3>Federation of independent scripts</h3>
 <p>
 I know Wookey has been thinking of a loose federation of independent scripts
 working on the same data, but the more I look at troggle and the tasks it
@@ -63,22 +72,38 @@ wallets.py does (originally by Martin Green) is in troggle already - but
 better. [There is a many:many relationship between svx files and wallet
 directories in reality, not 1:1]
 <p>
 <h3>troggle now</h3>
 Troggle is very nearly fully working (not with as many functions as
-originally envisaged admittedly) but very nearly. There are several
+originally envisaged admittedly) but very nearly. 
-import/parsers which are aborting without producing error messages, so most
+The QM data display needs writing; but other than that it's in pretty good
 of the survey blocks don't get loaded where they actually get displayed, and
 the surveyscan images only appear as filename strings which are not checked
 for referential integrity, so we are missing a consistency check there, and
 the QM data display needs writing; but other than that it's in pretty good
 shape. [Ah, yes, we should really add "drawings" as a core concept as well
 as "surveyscans". That will be a bit of work.]
 <p>
 <h3>Need for separate data-import checking scripts</h3>
 The one thing external scripts would be really useful for is syntax checking
 and reference checking prior to import.  I have found some weird and
 wonderful filename paths inside the tunnel and therion drawings, and in
 survex *ref paths.
-<p>
+
-<h3>Addendum</h3>
+<h3>Non-django troggle</h3>
 <p>Another possibility is ripping django out of troggle and leaving bare python
 plus a SQL database. This means that programmers would need to understand more SQL
 but would not need to understand "django". Arguably this 
 could mean that we could gain. 
 <p>Writing our own multi-user code would not be sensible, hence the database. 
 But we could move to a read-only system where the only writing happens on data-import.
 Then we could use python 'pickle' or 'json' read-only data structures, but we 
 would need to create all our own indexing and cross-referencing code.
 <p>There would be more lower-level code, but the
 different segments of the system could be in caving-sensible modules not
 django-meaningful modules. And we would not have all the extra 
 language-like constructs that django introduces e.g. <var>X.objects.all()</var>, which
 modern editors complain about because it is a django idiom and 
 not a function within the python codebase.
 (We could retain an HTML templating engine though.)
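To give a feel for what "our own indexing and cross-referencing code" might involve, here is a minimal Python sketch of the read-only idea: data is loaded once from JSON and plain dictionaries stand in for database indexes. The records are made-up sample values and `build_index` is a hypothetical helper, not anything that exists in troggle.

```python
import json
from collections import defaultdict


def build_index(records, field):
    """Group records by the value of one field, like a simple database index."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record)
    return dict(index)


# Made-up records standing in for the output of a data-import run.
SAMPLE_JSON = """
[
  {"cave": "204", "number": 9,  "grade": "C"},
  {"cave": "258", "number": 27, "grade": "C"},
  {"cave": "204", "number": 10, "grade": "B"}
]
"""

qms = json.loads(SAMPLE_JSON)
by_cave = build_index(qms, "cave")
by_grade = build_index(qms, "grade")

print(len(by_cave["204"]))
print([qm["number"] for qm in by_grade["C"]])
```

Every cross-reference that django currently derives from the model definitions would need an index like this built and kept consistent by hand, which is the extra lower-level code referred to above.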
 <h3><em>Addendum</em></h3>
 <p>There is a templating engine <a href="https://mozilla.github.io/nunjucks/">Nunjucks</a> 
 which is a port to JavaScript of the Django templating system we use 
 (via <a href="https://palletsprojects.com/p/jinja/">Jinja</a> - these are the same people who do Flask). This would be an obvious thing to use if we needed to go in that direction.