Docs on QM code status, troggle redesign

Philip Sargent 2020-05-14 22:28:13 +01:00
parent 2cb287ca81
commit 9d75a09cf5
6 changed files with 107 additions and 25 deletions


@ -5,7 +5,7 @@
<title>CUCC Expedition Handbook: People Update</title> <title>CUCC Expedition Handbook: People Update</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" /> <link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head> </head>
<body> <body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook</h2> <h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>The list of people on expo</h1> <h1>The list of people on expo</h1>


@ -5,7 +5,7 @@
<title>CUCC Expedition Handbook: Logbook import</title> <title>CUCC Expedition Handbook: Logbook import</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" /> <link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head> </head>
<body> <body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook</h2> <h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>Logbooks Import</h1> <h1>Logbooks Import</h1>
@ -59,15 +59,7 @@ Calculating GetPersonExpeditionNameLookup for 2017
<p>Errors are usually misplaced or duplicated &lt;hr /&gt; tags, names which are not specific enough to be recognised by the parser (though it tries hard) such as "everyone" or "et al." or are simply missing, or a bit of description which has been put into the names section such as "Goulash Regurgitation". <p>Errors are usually misplaced or duplicated &lt;hr /&gt; tags, names which are not specific enough to be recognised by the parser (though it tries hard) such as "everyone" or "et al." or are simply missing, or a bit of description which has been put into the names section such as "Goulash Regurgitation".
<h3 id="history">The logbooks format</h3> <h3 id="history">The logbooks format</h3>
<p>This is documented on the <a href="..logbooks.html#format">logbook user-documentation page</a> as even expoers who can do nothing else technical can at least write up their logbook entries. <p>This is documented on the <a href="../logbooks.html#format">logbook user-documentation page</a> as even expoers who can do nothing else technical can at least write up their logbook entries.
<p>[ Yes this format needs to be re-done using a proper structure:<br />
<code><pre>
&lt;div class="logentry"&gt;<br />
<span style="text-decoration: line-through wavy red;">&nbsp;&nbsp;&nbsp;&nbsp;</span>
&lt;/div&gt;</pre></code>
it's on the to-do list...]
<h3 id="history">Historical logbooks format</h3> <h3 id="history">Historical logbooks format</h3>
<p>Older logbooks (prior to 2007) were stored as logbook.txt with just a bit of consistent markup to allow troggle parsing.</p> <p>Older logbooks (prior to 2007) were stored as logbook.txt with just a bit of consistent markup to allow troggle parsing.</p>


@ -144,10 +144,18 @@ idea to type up <i>just your trip(s)</i> in a separate file, e.g. "logbook-mynew
&lt;div class="timeug"&gt;T/U 10 mins&lt;/div&gt;</pre></code> &lt;div class="timeug"&gt;T/U 10 mins&lt;/div&gt;</pre></code>
<p>Note: the IDs must be unique, so are generated from 't' plus the trip date plus a,b,c etc. <p>Note: the IDs must be unique, so are generated from 't' plus the trip date plus a,b,c etc.
when there is more than one trip on a day.</p> when there is more than one trip on a day.</p>
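<p>The ID rule in the note above can be sketched in a few lines of Python. This is only an illustration, not the code troggle uses: the function name <var>trip_ids</var> and the ISO date format are assumptions, and the sketch always appends a letter, even when there is only one trip on a day.</p>

```python
from collections import defaultdict
from string import ascii_lowercase


def trip_ids(trip_dates):
    """Return one ID per trip: 't' + date + a letter counting trips on that date.

    Hypothetical helper: the date strings are used as given (ISO format assumed).
    """
    seen = defaultdict(int)  # number of trips already seen for each date
    ids = []
    for date in trip_dates:
        letter = ascii_lowercase[seen[date]]  # assumes at most 26 trips per day
        ids.append(f"t{date}{letter}")
        seen[date] += 1
    return ids


example_ids = trip_ids(["2019-07-10", "2019-07-10", "2019-07-11"])
```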
<p>Note: T/U stands for "Time Underground" in hours (6 minutes would be "0.1 hours"). <p>Note: <var><span style="color:red">T/U</span></var> stands for "Time Underground" in hours (6 minutes would be "0.1 hours").
<p>Note: the &lt;hr /&gt; is significant and used in parsing, it is not just prettiness. <p>Note: the <var><span style="color:red">&lt;hr /&gt;</span></var> is significant and used in parsing, it is not just prettiness.
<p>Note this special format <var>"<span style="color:red">Top Camp - </span>"</var> in the triptitle line:
<code><pre>&lt;div class="triptitle"&gt;<span style="color:red">Top Camp - </span>Setting up 76 bivi&lt;/div&gt;</pre></code>
It denotes the <var>cave</var> or <var>area</var> the trip or activity happened in. It is a word or two separated from the rest of the triptitle with "<var> - </var>" (space-dash-space). Usual values
for this are "Plateau", "Base camp", "264", "Balkon", "Tunnocks", "Travel" etc.
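<p>As a rough sketch of how the space-dash-space convention can be read back, the Python below splits a triptitle at the first " - ". The function name <var>split_triptitle</var> is hypothetical, and the real logbook parser may treat titles without a separator differently.</p>

```python
def split_triptitle(triptitle):
    """Split 'Top Camp - Setting up 76 bivi' into (place, title).

    Returns (None, title) when there is no ' - ' separator.
    Hypothetical helper, not the troggle parser itself.
    """
    place, sep, rest = triptitle.partition(" - ")
    if not sep:
        return None, triptitle.strip()
    return place.strip(), rest.strip()


place, title = split_triptitle("Top Camp - Setting up 76 bivi")
```

<p>Splitting at the first separator only means a dash later in the title is kept as part of the title text.</p>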
<p>Note this special format <var>"<span style="color:red">&lt;u&gt;Jenny Black&lt;/u&gt;</span>"</var> in the trippeople line:
<code><pre>&lt;div class="trippeople"&gt;<span style="color:red">&lt;u&gt;Jenny Black&lt;/u&gt;</span>, Olly Betts&lt;/div&gt;
</pre></code>
It is necessary that one (and only one) of the people on the trip is set in <span style="color:red">&lt;u&gt;&lt;/u&gt;</span> underline format. This is interpreted to mean that this is the author of the logbook entry. If there is no author set, then this is an error and the entry is ignored.
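<p>The one-and-only-one author rule could be checked with something like the following Python sketch. The function name <var>parse_trippeople</var> is an assumption, and raising an exception is just one way of flagging the error; as described above, the real import ignores such entries.</p>

```python
import re

# Matches the underlined name(s) inside a trippeople line.
U_TAG = re.compile(r"<u>(.*?)</u>", re.IGNORECASE)


def parse_trippeople(trippeople):
    """Return (author, all_names) from the text inside a trippeople div.

    Hypothetical helper: raises ValueError unless exactly one name is underlined.
    """
    authors = [a.strip() for a in U_TAG.findall(trippeople)]
    if len(authors) != 1:
        raise ValueError(f"expected exactly one <u>author</u>, found {len(authors)}")
    # Remove the markup, then split the comma-separated list of names.
    plain = U_TAG.sub(r"\1", trippeople)
    names = [n.strip() for n in plain.split(",") if n.strip()]
    return authors[0], names


author, people = parse_trippeople("<u>Jenny Black</u>, Olly Betts")
```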
<hr /> <hr />


@ -12,7 +12,7 @@
<h2>QM data and cave descriptions</h2> <h2>QM data and cave descriptions</h2>
<p> <p>
This document describes how to include Qustion Marks (QMs) and cave descriptions in .svx files. This document describes how to include Question Marks (QMs) and cave descriptions in .svx files.
<p>There <p>There
are dedicated fields in the template.svx file for this purpose, but there has been laxness recently on filling them in. are dedicated fields in the template.svx file for this purpose, but there has been laxness recently on filling them in.
@ -68,6 +68,17 @@ Here is an example from the last bit of bipedalpassage.svx in 264. Note that eac
;QM6 C bipedalpassage.31 - Very good location where main phreatic passages and enlarges - but far side of chamber choked. One part of choke was not accessed as needs 2m climb up to poke nose in it. A good free climber could do this or needs one bolt to be sure no way on. Very strong draft in choke! Interesting southerly trend at margin of known system ;QM6 C bipedalpassage.31 - Very good location where main phreatic passages and enlarges - but far side of chamber choked. One part of choke was not accessed as needs 2m climb up to poke nose in it. A good free climber could do this or needs one bolt to be sure no way on. Very strong draft in choke! Interesting southerly trend at margin of known system
</code></pre> </code></pre>
<p>
The format for question mark lists is <br>
<ul>
<li>QM identifier, <li><a href="../../qm.html">Quality Grade</a>, <li>Area indicator, <li>description of QM.
</ul>
<p>The QM numbers themselves are in the format <br>
<ul>
<li><a href="../../qm.html">Discoverer identifier</a>, <li>Year of discovery, <li>Cave identifier, <li>serial number.
</ul>
<p>This format is <a href="../../qm.html">documented in the QM conventions</a> page.
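<p>To make the two formats concrete, here is a minimal Python sketch that splits a QM comment line from an .svx file and a full QM number into the parts listed above. The regular expressions and the names <var>parse_qm_line</var> and <var>parse_qm_number</var> are assumptions for illustration only; they are not taken from svx2qm.py or troggle, and the grade is accepted as any non-blank token.</p>

```python
import re
from typing import NamedTuple, Optional


class SvxQM(NamedTuple):
    identifier: str   # e.g. "QM6"
    grade: str        # quality grade, e.g. "C"
    area: str         # area indicator, e.g. "bipedalpassage.31"
    description: str


# ";QM6 C bipedalpassage.31 - free text description"
QM_LINE = re.compile(r"^;\s*(QM\d+)\s+(\S+)\s+(\S+)\s+-\s+(.*)$")

# "C1999-204-09": discoverer identifier, year, cave identifier, serial number
QM_NUMBER = re.compile(r"^([A-Za-z]+)(\d{4})-([^-]+)-(\d+)$")


def parse_qm_line(line: str) -> Optional[SvxQM]:
    """Return the four fields of a QM comment line, or None if it does not match."""
    m = QM_LINE.match(line.strip())
    return SvxQM(*m.groups()) if m else None


def parse_qm_number(number: str) -> Optional[tuple]:
    """Return (discoverer, year, cave, serial) as strings, or None if it does not match."""
    m = QM_NUMBER.match(number.strip())
    return m.groups() if m else None


qm = parse_qm_line(";QM6 C bipedalpassage.31 - Very good location")
```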
<p> <p>
The example below demonstrates correct and effective use of the QM list referring back to earlier elements in the svx file: The example below demonstrates correct and effective use of the QM list referring back to earlier elements in the svx file:
<p> <p>
@ -80,6 +91,10 @@ is a very inefficient use of time.
<p>Also if the person reading it hasn't been to the bit of cave (which is, like, <em>the whole point</em>), then the data has a higher chance of being incorrect. It is not always easy to interpret Tunnel or Therion <p>Also if the person reading it hasn't been to the bit of cave (which is, like, <em>the whole point</em>), then the data has a higher chance of being incorrect. It is not always easy to interpret Tunnel or Therion
drawings correctly with this sort of thing. drawings correctly with this sort of thing.
<h3>Programming note</h3>
<p>Better handling of historic QMs is a current, occasionally active, area of
development in our online systems. The current status is <a href="../troggle/scriptsqms.html">documented here</a>.
<h3>Conclusion</h3> <h3>Conclusion</h3>
<p>Survey data recorded in .svx files is incomplete if there is no QM List data and <p>Survey data recorded in .svx files is incomplete if there is no QM List data and
cave description data! cave description data!


@ -34,7 +34,7 @@ tl;dr - use <em>svx2qm.py</em>. Look at the output at:<br>
<p>There are four ways we have used to manage QMs: <p>There are four ways we have used to manage QMs:
<ol> <ol>
<li><strong>Perl script</strong> - Historically QMs were not in the survex file but typed up in a separate list <var>qms.csv</var> for each cave system. A perl script turned that into an HTML file for the website. <li><strong>Perl script</strong> - Historically QMs were not in the survex file but typed up in a separate list <var>qms.csv</var> for each cave system. A perl script turned that into an HTML file for the website. But there appear to be 3 different formats for this.
<li><strong>Perl + troggle</strong> - One of troggle's input parsers "QM parser" is specifically designed to import the three HTML files produced from <var>qms.csv</var> but doesn't do anything with that data (yet). <li><strong>Perl + troggle</strong> - One of troggle's input parsers "QM parser" is specifically designed to import the three HTML files produced from <var>qms.csv</var> but doesn't do anything with that data (yet).
<li><strong>Python script</strong> - Phil Withnall's 2019 script <em>svx2qm.py</em> scans all the QMs in a single survex file. See below for how to run it on all survex files. <li><strong>Python script</strong> - Phil Withnall's 2019 script <em>svx2qm.py</em> scans all the QMs in a single survex file. See below for how to run it on all survex files.
<li><strong>New troggle</strong> - Sam's recent addition to troggle's "survex parser" makes it recognise and store QMs when it parses the survex files. <li><strong>New troggle</strong> - Sam's recent addition to troggle's "survex parser" makes it recognise and store QMs when it parses the survex files.
@ -52,6 +52,24 @@ tl;dr - use <em>svx2qm.py</em>. Look at the output at:<br>
<a href="../../1623/204/qm.html">/1623/204/qm.html</a><br /> <a href="../../1623/204/qm.html">/1623/204/qm.html</a><br />
<p>Note that the <var>qms.csv</var> file used as input by this script is an <em>entirely different format and table structure</em> from the <var>qms.csv</var> file produced by <a href="#svx2qm">svx2qm.py</a>. <p>Note that the <var>qms.csv</var> file used as input by this script is an <em>entirely different format and table structure</em> from the <var>qms.csv</var> file produced by <a href="#svx2qm">svx2qm.py</a>.
<p>And in fact the formats of these 3 qm.csv files are <em>not the same</em>. (These are the
"older or artisanal QM formats" referred to by Phil Withnall at the bottom of this page.)
Fields in 204/qm.csv are:
<code><pre><span style="font-size:small">Number, grade, area, description, page reference, nearest station, completion description, Comment
e.g.
C1999-204-09 C Wolp Hole in floor through dangerous boulders veined.10 Filled with rocks
</span></pre></code>
Fields in 258/qm.csv are:
<code><pre><span style="font-size:small">Cave, year, number, Grade, nearest station, description, completion description, found by, completed by
e.g.
258 2006 27 C 258.gknodel.4 Small passage to E in Germknödel Sandeep Mavadia and Dave Loeffler
</span></pre></code>
Fields in 264/qm.csv are:
<code><pre><span style="font-size:small">Year, number, Grade, Survey folder ref#, Surveyname, Nearest Station number, Area of the cave, Description, Y if marked on drawn-up survey,
2014 7 C 2014#11 roomwithaview 4 Room With a View Room With a View: "Probably chokes" opposite stations 4 and 5 ALREADY EXPLORED PROBABLY
</span></pre></code>
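<p>One way to cope with three different column layouts is a per-cave list of field names, as in the Python sketch below. It is only an illustration: the field keys are paraphrased from the lists above, the function name <var>read_qms</var> is hypothetical, and it assumes the files are comma-separated with no header row, which has not been checked against the real files.</p>

```python
import csv
import io

# Column order for each cave's qm.csv, paraphrased from the field lists above.
FIELDS = {
    "204": ["number", "grade", "area", "description", "page_reference",
            "nearest_station", "completion_description", "comment"],
    "258": ["cave", "year", "number", "grade", "nearest_station", "description",
            "completion_description", "found_by", "completed_by"],
    "264": ["year", "number", "grade", "survey_folder_ref", "surveyname",
            "nearest_station", "area", "description", "marked_on_survey"],
}


def read_qms(cave, text):
    """Return a list of dicts, one per non-empty CSV row, keyed by that cave's fields.

    Assumes comma-separated data without a header row. Extra columns are
    dropped and missing trailing columns are simply absent from the dict.
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if row:
            rows.append(dict(zip(FIELDS[cave], row)))
    return rows


# Hypothetical row in the 264 layout, loosely based on the example above.
sample = '2014,7,C,2014#11,roomwithaview,4,Room With a View,"Probably chokes",\n'
qms_264 = read_qms("264", sample)
```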
<p>There are also three versions of the QM list for cave 161 (Kaninchenhohle) apparently produced by this method but hand-edited:<br /> <p>There are also three versions of the QM list for cave 161 (Kaninchenhohle) apparently produced by this method but hand-edited:<br />
<a href="../../1623/161/qmaven.html">/1623/161/qmaven.html</a> 1996 version<br /> <a href="../../1623/161/qmaven.html">/1623/161/qmaven.html</a> 1996 version<br />
<a href="../../1623/161/qmtodo.html">/1623/161/qmtodo.html</a> 1998 version<br /> <a href="../../1623/161/qmtodo.html">/1623/161/qmtodo.html</a> 1998 version<br />
@ -60,6 +78,25 @@ tl;dr - use <em>svx2qm.py</em>. Look at the output at:<br>
<p>In the /1623/204/ folder there is a script <em>qmreader.pl</em> which apparently does the inverse of <p>In the /1623/204/ folder there is a script <em>qmreader.pl</em> which apparently does the inverse of
<em>tablize-qms.pl</em>: it transforms a QMs' HTML file into a CSV file. <em>tablize-qms.pl</em>: it transforms a QMs' HTML file into a CSV file.
<p>As Wookey says (Slack, 7 Jan. 2020):
"I'm not quite sure what the best format is. Some combination of the
258 and 264 formats might be best. Including the cave number seems
pointless. Including 'conclusion' info seems like a good idea. I'm not
sure there what the benefit of separating the 'surveyname' and
'nearest station' fields is. Having an 'area of cave' field is somewhat useful
for grouping, even though it is sort-of repeating the 'survey-station' info.
If I was making a QM list I'd enter these fields:
year, number, Grade, nearest station, folder reference, description, found by, completed (Year), completion description/cave description link, completed by
with these details:
<ul>
<li>number is just the serial number, not the whole year-serial-grade
<li>'nearest station' does not include the cave number
<li>completed is blank (for not completed) or a year for when it was done
<li>completion description should be a link to the relevant bit of cave description, but if that doesn't exist then a short description here is OK."
</ul>
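<p>For illustration, the field list Wookey proposes could be written down as a Python dataclass like the one below. The class name <var>ProposedQM</var>, the types and the sample values are assumptions; nothing like this exists in troggle as far as this page records.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedQM:
    """One QM record using the fields suggested in the quote above (hypothetical)."""
    year: int
    number: int                      # just the serial number
    grade: str
    nearest_station: str             # without the cave number
    folder_reference: str
    description: str
    found_by: str
    completed: Optional[int] = None  # blank (None) if not completed, else the year
    completion_description: str = "" # ideally a link into the cave description
    completed_by: str = ""

    @property
    def is_open(self) -> bool:
        """True while the QM has not been completed."""
        return self.completed is None


# Sample values are made up for the example.
example_qm = ProposedQM(2014, 7, "C", "roomwithaview.4", "2014#11",
                        "Probably chokes", "A. Caver")
```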
<h4 id="qms.py">troggle/parsers/qms.py</h4> <h4 id="qms.py">troggle/parsers/qms.py</h4>
<p>The parser <em>troggle/parsers/qms.py</em> currently imports those same <var>qm.csv</var> files from the perl script into troggle using a mixture of csv and html parsers: <p>The parser <em>troggle/parsers/qms.py</em> currently imports those same <var>qm.csv</var> files from the perl script into troggle using a mixture of csv and html parsers:
@ -88,6 +125,11 @@ The 2019 copies are online in /expofiles/:
This will work on all survex *.svx files even those which have not yet been run through the troggle import process. This will work on all survex *.svx files even those which have not yet been run through the troggle import process.
<p>Phil says (13 April 2020): <em>"The generated files are not meant to be served by the webserver, its a tool for people to run locally. Someone could modify it to create HTML output (or post-process the CSV output to do the same), but that is work still to be done."</em> <p>Phil says (13 April 2020): <em>"The generated files are not meant to be served by the webserver, its a tool for people to run locally. Someone could modify it to create HTML output (or post-process the CSV output to do the same), but that is work still to be done."</em>
<h4>troggle/parsers/survex.py</h4>
<p>The QMs inside the survex files are parsed by troggle along with all the other information
inside survex files and stored in the database. But the webpages which display this data are rudimentary, e.g. <a href="/getQMs/1623-204">/getQMs/1623-204</a> or <a href="/cave/qms/1623-204">/cave/qms/1623-204</a>.
Looking through urls.py and core/view_caves.py we see a lot of code for providing new QM numbers, producing lists of QMs for a given cave and for downloading QM.csv files generated by the database. But none of it appears to be working today (14 May 2020), see below.
<h4 id="samqms">Sam's parser additions</h4> <h4 id="samqms">Sam's parser additions</h4>
<p>Troggle <em>troggle/parsers/survex.py</em> currently parses and stores all the QMs it finds in survex files. The tables where the data is put are listed in <a href="datamodel.html">the current data model</a> including structure for ticking them off. <p>Troggle <em>troggle/parsers/survex.py</em> currently parses and stores all the QMs it finds in survex files. The tables where the data is put are listed in <a href="datamodel.html">the current data model</a> including structure for ticking them off.
@ -108,7 +150,7 @@ So someone was busy at one time.
<h2>QMs - monitoring progress</h2> <h2>QMs - monitoring progress</h2>
<h4 id="find-dead-qms">find-dead-qms.py</h4> <h4 id="find-dead-qms">find-dead-qms.py</h4>
<p>This finds references to <em>completed</em> qms in the qm.csv files in the cave folders (/1623/ etc.) in the :expoweb: <a href="../computing/repos.html">repository</a>. It looks to see which QMs have been completed but where there is not yet a matching text in the cave description. <p>This stand-alone script finds references to <em>completed</em> qms in the qm.csv files in the cave folders (/1623/ etc.) in the :expoweb: <a href="../computing/repos.html">repository</a>. It looks to see which QMs have been completed but where there is not yet a matching text in the cave description.
<blockquote><em>Quick and dirty Python script to find references to completed qms in the <blockquote><em>Quick and dirty Python script to find references to completed qms in the
cave description pages. Run this to find which bits of description cave description pages. Run this to find which bits of description
need updating. need updating.
@ -153,7 +195,7 @@ I guess it all depends on what questions people are trying to answer using the Q
as to how (and where) best to present it. Im afraid I dont have any suggestions there. as to how (and where) best to present it. Im afraid I dont have any suggestions there.
:Rob Watson wrote some documentation about QMs :Rob Watson wrote some documentation about QMs
:http://expo.survex.com/handbook/survey/qmentry.html :<a href="../survey/qmentry.html">http://expo.survex.com/handbook/survey/qmentry.html</a>
:is there anything subtle missing as to how they are used ? :is there anything subtle missing as to how they are used ?
Nope, I think Robs page covers it all. That page also documents the correct QM format Nope, I think Robs page covers it all. That page also documents the correct QM format


@ -50,6 +50,15 @@ Which is fun, but not useful. And not just because it is immature. None of
this addresses <strong>our biggest problem: devising something that can be this addresses <strong>our biggest problem: devising something that can be
maintained by fewer, less-expert people who can only devote short snippets maintained by fewer, less-expert people who can only devote short snippets
of time and not long-duration immersion</strong>. of time and not long-duration immersion</strong>.
<h3>Our biggest problem</h3>
We need:
<ul>
<li>something that can be maintained by fewer, less-expert people
<li>who can only devote short snippets of time
<li>without requiring weeks of long-duration deep immersion
</ul>
<h3>Federation of independent scripts</h3>
<p> <p>
I know Wookey has been thinking of a loose federation of independent scripts I know Wookey has been thinking of a loose federation of independent scripts
working on the same data, but the more I look at troggle and the tasks it working on the same data, but the more I look at troggle and the tasks it
@ -63,22 +72,38 @@ wallets.py does (originally by Martin Green) is in troggle already - but
better. [There is a many:many relationship between svx files and wallet better. [There is a many:many relationship between svx files and wallet
directories in reality, not 1:1] directories in reality, not 1:1]
<p> <p>
<h3>troggle now</h3>
Troggle is very nearly fully working (not with as many functions as Troggle is very nearly fully working (not with as many functions as
originally envisaged admittedly) but very nearly. There are several originally envisaged admittedly) but very nearly.
import/parsers which are aborting without producing error messages, so most The QM data display needs writing; but other than that it's in pretty good
of the survey blocks don't get loaded where they actually get displayed, and
the surveyscan images only appear as filename strings which are not checked
for referential integrity, so we are missing a consistency check there, and
the QM data display needs writing; but other than that it's in pretty good
shape. [Ah, yes, we should really add "drawings" as a core concept as well shape. [Ah, yes, we should really add "drawings" as a core concept as well
as "surveyscans". That will be a bit of work.] as "surveyscans". That will be a bit of work.]
<p> <p>
<h3>Need for separate data-import checking scripts</h3>
The one thing external scripts would be really useful for is syntax checking The one thing external scripts would be really useful for is syntax checking
and reference checking prior to import. I have found some weird and and reference checking prior to import. I have found some weird and
wonderful filename paths inside the tunnel and therion drawings, and in wonderful filename paths inside the tunnel and therion drawings, and in
survex *ref paths. survex *ref paths.
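<p>A stand-alone pre-import check of this kind might look like the Python sketch below, which lists survex <var>*ref</var> values that do not look like a "year#number" wallet reference such as 2014#11. The expected pattern, the comment handling and the function name <var>suspicious_refs</var> are all assumptions made for the example, not a statement of what valid <var>*ref</var> values are.</p>

```python
import re

# A "*ref" command and whatever follows it on the line.
REF_LINE = re.compile(r"^\s*\*ref\s+(.+?)\s*$", re.IGNORECASE)
# Assumed shape of a well-formed reference, e.g. "2014#11".
WALLET_REF = re.compile(r"^\d{4}#\d+$")


def suspicious_refs(svx_text):
    """Return (line_number, value) for every *ref that is not like 'YYYY#NN'."""
    problems = []
    for lineno, line in enumerate(svx_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # drop survex ';' comments
        m = REF_LINE.match(code)
        if m and not WALLET_REF.match(m.group(1)):
            problems.append((lineno, m.group(1)))
    return problems


sample_svx = "*begin x\n*ref 2014#11 ; fine\n*ref ../../weird/path\n*end x\n"
found = suspicious_refs(sample_svx)
```

<p>Because it only reads text, a check like this can be run on the files before any troggle import is attempted.</p>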
<p>
<h3>Addendum</h3> <h3>Non-django troggle</h3>
<p>Another possibility is ripping django out of troggle and leaving bare python
plus a SQL database. This means that programmers would need to understand more SQL
but would not need to understand "django". Arguably this
could be a net gain.
<p>Writing our own multi-user code would not be sensible, hence the database.
But we could move to a read-only system where the only writing happens on data-import.
Then we could use Python <var>pickle</var> or <var>json</var> files as read-only data structures, but we
would need to create all our own indexing and cross-referencing code.
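<p>To give a feel for what "our own indexing" would mean, the Python sketch below loads a JSON list of QM records and builds a by-cave lookup with the standard <var>json</var> module. The record layout and the function name <var>load_qms</var> are made up for the example; a real system would need many such indexes and cross-references.</p>

```python
import json
from collections import defaultdict


def load_qms(json_text):
    """Load a JSON list of QM records and build a cave -> records index.

    Hypothetical layout: each record is a dict with at least a "cave" key.
    """
    qms = json.loads(json_text)
    by_cave = defaultdict(list)
    for qm in qms:
        by_cave[qm["cave"]].append(qm)
    return qms, dict(by_cave)


sample_json = json.dumps([
    {"cave": "204", "number": 9, "grade": "C"},
    {"cave": "264", "number": 7, "grade": "C"},
    {"cave": "264", "number": 8, "grade": "B"},
])
all_qms, qms_by_cave = load_qms(sample_json)
```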
<p>There would be more lower-level code, but the
different segments of the system could be in caving-sensible modules not
django-meaningful modules. And we would not have all the extra
language-like constructs that django introduces e.g. <var>X.objects.all()</var>, which
modern editors complain about because it is a django idiom and
not a function within the python codebase.
(We could retain an HTML templating engine though.)
<h3><em>Addendum</em></h3>
<p>There is a templating engine <a href="https://mozilla.github.io/nunjucks/">Nunjucks</a> <p>There is a templating engine <a href="https://mozilla.github.io/nunjucks/">Nunjucks</a>
which is a port to JavaScript of the Django templating system we use which is a port to JavaScript of the Django templating system we use
(via <a href="https://palletsprojects.com/p/jinja/">Jinja</a> - these are the same people who do Flask). This would be an obvious thing to use if we needed to go in that direction. (via <a href="https://palletsprojects.com/p/jinja/">Jinja</a> - these are the same people who do Flask). This would be an obvious thing to use if we needed to go in that direction.