<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Expo documentation - QMs scripts</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head>
<body>
<style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<style>
h4 {
margin-top: 1.5em;
margin-bottom: 0;
}
p {
margin-top: 0;
margin-bottom: 0.5em;
}
var { /* to match <code> but inline */
font-family: monospace;
font-size: 0.9em;
/* font-style: normal; */
background-color: #eee;
}
</style>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>QMs and leads</h1>
tl;dr - use the troggle reports for each cave, e.g. <br>
<a href="/cave/qms/1623-290">QMs for Fischgesicht</a>
<h2>QMs - the sixfold path</h2>
<img class="onright" src ="../i/qm-image.jpg" />
<p>You will be familiar with <a href="../survey/qmentry.html">documenting newly found QMs</a> in the survex file when you type it in. But
QMs are only useful if they can be easily scanned by people planning the next pushing trip. That's what we are discussing here.
<p>There are half a dozen ways we have used to manage QMs:
<ol>
<li><strong>troggle and QMs in survex files</strong> - Since Sam wrote a QM svx parser in 2020 we have had the recent QMs in troggle but the report
to display them was not written until July 2022. Note that this means some duplication for 1623-161 and a few others where the same QM is
in both the survex file and the CSV file - see below.
<li><strong>troggle + Perl-era CSV</strong> - One of troggle's input parsers imports the
three
<var>qm.csv</var> files and produces reports by cave and individually, e.g. see <a href="/cave/qms/1623-161">the 161 QMs</a>
(slow page), which is <em>old</em> compared with the hand-edited <a href="/1623/161/qmtodo.htm">1623-161</a> page which was derived from it.
<li><strong>Hand-edited lists of QMs</strong> - these only exist for 1623-161 <a href="/1623/161/qmtodo.htm">Kaninchenh&ouml;hle</a>.
<li><strong>Perl script</strong> - Historically QMs were not in the survex file but typed up in a separate list <var>qm.csv</var> for
each cave system. A Perl script turned that into an HTML file for the website.
But there are three different formats for this. The Perl script is no longer used, but the same three CSV files (caves 161, 204 and 234)
are imported into troggle during initial data load (see above).
<li><strong>Python script</strong> - Phil Withnall's 2019 script <em>svx2qm.py</em> scans all the QMs in a single survex file. See below for how to run it on all survex files.
<li><strong>The elderly Prospecting Guide</strong> - Used to cover some of the same sorts of information as needed by someone wanting to
chase QMs. It was a troggle-generated document at <a href="/prospecting_guide/">expo.survex.com/prospecting_guide/</a>.
It has been retired because the mapping software packages it used were terminally outdated.
</ol>
<p>QMs all use <a href="../survey/qm.html">the same QM description conventions</a>.
<h2>QMs - monitoring progress</h2>
<p>Each cave has a report listing all the extant and ticked-off QMs, e.g.
<a href="/cave/qms/1623-264/">/cave/qms/1623-264/</a> or <a href="/cave/qms/1623-161/">/cave/qms/1623-161/</a>. The 161 report includes both old (spreadsheet-era) QMs and modern (2015 survex file) QMs.
<h2>QMs - how we first record them</h2>
<p>Today we write the QM into the survex file: see <a href="../survey/qmentry.html">documenting newly found QMs</a>.
<p>We used to write them into a spreadsheet file (pre-2015). These old files are today still parsed by troggle to produce reports.
<h4>troggle/parsers/survex.py</h4>
<p>The troggle parser <em>troggle/parsers/survex.py</em> currently parses and stores all the QMs it finds in survex files. The tables where the data
is put are listed in <a href="datamodel.html">the current data model</a>.
<h4 id="qms.py">troggle/parsers/qms.py</h4>
<p>Troggle currently reports QMs separately collated for three historic caves and also imports all the QMs inside survex files.
Thus a recent cave such as 1623-264 (Balkonh&ouml;hle) will only show QMs imported from the survex files:
<ul>
<li>/cave/qms/&lt;caveslug&gt; e.g. <a href="/cave/qms/1623-264/">/cave/qms/1623-264/</a> works (slow page)
<li>/cave/&lt;caveslug&gt;-&lt;year&gt;&lt;qm_id&gt; e.g. <a href="/cave/qms/1623-264/2019-lipstic2B">/cave/qms/1623-264/2019-lipstic2B</a> broken, no data shown
</ul>
<p>There is an open issue: although we use the name of the 'block' in the survex file to disambiguate QMs in the same cave and from
the same year, blocks can still be named non-uniquely. This would crash the system as two QMs would have the same URL.
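<p>A minimal sketch of how such collisions could be spotted before they reach the URL layer. The tuple layout (cave slug, year, block name, QM number) and the function name are assumptions made for illustration; they are not troggle's actual data model or code:

```python
from collections import defaultdict

def find_colliding_qms(qms):
    """Group QMs by the key their URL would be built from.

    Each QM is a (cave_slug, year, block_name, number) tuple. This
    layout is hypothetical, not troggle's real model.
    Returns a dict mapping each repeated key to the positions in the
    input list where it occurs.
    """
    seen = defaultdict(list)
    for index, qm in enumerate(qms):
        seen[qm].append(index)
    # Keep only the keys that occur more than once.
    return {key: rows for key, rows in seen.items() if len(rows) != 1}
```

<p>Any key returned by this function identifies QMs that would share a URL and so need a block rename in the survex file.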
<p>The parser <em>troggle/parsers/qms.py</em> currently imports the <var>qm.csv</var> files used by
the 2004 perl script tablize-qms.pl (see below) into troggle using a mixture of csv and html parsers:
<code><pre>parseCaveQMs(cave='stein',inputFile=r"1623/204/qm.csv")
parseCaveQMs(cave='hauch',inputFile=r"1623/234/qm.csv")
parseCaveQMs(cave='kh', inputFile="1623/161/qmtodo.htm")
#parseCaveQMs(cave='balkonhoehle',inputFile=r"1623/264/qm.csv")</pre></code>
and reports these by cave and individually, e.g. see <a href="/cave/qms/1623-204">the 204 QMs</a> (slow page).
</p>
These URL recognisers work:
<ul>
<li>/cave/qms/&lt;caveslug&gt; e.g. <a href="/cave/qms/1623-161/">/cave/qms/1623-161/</a> (slow page)
<li>/cave/&lt;caveslug&gt;-&lt;year&gt;&lt;qm_id&gt; e.g. <a href="/cave/qms/1623-161/1997-1C">/cave/qms/1623-161/1997-1C</a>
</ul>
<p>Note that the hand-edited <var>qm.csv</var> for Balkonh&ouml;hle was apparently abandoned unfinished as we transitioned to putting the QMs in the survex files instead. It contains QMs from 2014 and 2016:<br />
<a href="../../1623/264/qm.csv" download>/1623/264/qm.csv</a> - unused <br/>
<h2>QM archeology</h2>
<h4 id="QM_helper">js/QM_helper.js</h4>
<p>A relic.
<p>This is referred to in core/admin.py and appears to help with the user interface within the
Django Admin control panel for manipulating QMs. It is not live as media/js/ is not plumbed in.
(Live javascript lives in media/jslib/ which is routed to the URL /javascript/.)
<h4 id="tabqmsqms">tablize-qms.pl</h4>
<p>This is a perl script dating from November 2004.
<p>It takes a <em>hand-edited</em> CSV file name as the program's argument and generates an HTML page listing all the QMs.
<p><a href="../../1623/258/tablize-qms.pl" download>Variant copies of it</a> (they are all slightly different) live in the three cave file folders in <em>:expoweb:/1623/</em>, in <em>258/, 234/</em>, and <em>204/</em>. These generated HTML files are live pages in the cave descriptions: <br />
<a href="../../1623/258/qm.html">/1623/258/qm.html</a><br />
<a href="../../1623/234/qm.html">/1623/234/qm.html</a><br />
<a href="../../1623/204/qm.html">/1623/204/qm.html</a><br />
<p>Note that the <var>qm.csv</var> file used as input by this script is an <em>entirely different format and table structure</em> from the <var>qms.csv</var> file produced by <a href="#svx2qm">svx2qm.py</a>.
<p>In fact the formats of these three qm.csv files are <em>not the same</em>. (These are the
"older or artisanal QM formats" referred to by Phil Withnall at the bottom of this page.)
Fields in 204/qm.csv are:
<code><pre><span style="font-size:small">Number, grade, area, description, page reference, nearest station, completion description, Comment
e.g.
C1999-204-09 C Wolp Hole in floor through dangerous boulders veined.10 Filled with rocks
</span></pre></code>
Fields in 258/qm.csv are:
<code><pre><span style="font-size:small">Cave, year, number, Grade, nearest station, description, completion description, found by, completed by
e.g.
258 2006 27 C 258.gknodel.4 Small passage to E in Germkn&ouml;del Sandeep Mavadia and Dave Loeffler
</span></pre></code>
Fields in 264/qm.csv are:
<code><pre><span style="font-size:small">Year, number, Grade, Survey folder ref#, Surveyname, Nearest Station number, Area of the cave, Description, Y if marked on drawn-up survey,
2014 7 C 2014#11 roomwithaview 4 Room With a View Room With a View: "Probably chokes" opposite stations 4 and 5 ALREADY EXPLORED PROBABLY
</span></pre></code>
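<p>To make the differences concrete, here is a minimal sketch that reads any of the three layouts into dictionaries with common key names. The column orders are taken from the field lists above; the short key names, the <var>LAYOUTS</var> table and the <var>read_qms</var> function are our own invention, and the sketch assumes comma-separated files with no header row, which may not hold for the real files:

```python
import csv
import io

# Column orders as listed above; the short key names are invented here.
LAYOUTS = {
    "204": ["number", "grade", "area", "description", "page_ref",
            "nearest_station", "completion", "comment"],
    "258": ["cave", "year", "number", "grade", "nearest_station",
            "description", "completion", "found_by", "completed_by"],
    "264": ["year", "number", "grade", "folder_ref", "survey_name",
            "nearest_station", "area", "description", "on_survey"],
}

def read_qms(layout, text):
    """Parse CSV text in one of the three layouts into a list of dicts."""
    columns = LAYOUTS[layout]
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        # Pad short rows so every record has every column.
        padded = row + [""] * (len(columns) - len(row))
        cells = (cell.strip() for cell in padded)
        records.append(dict(zip(columns, cells)))
    return records
```

<p>Once the rows are keyed by name rather than by position, fields such as grade, nearest station and description can be compared across caves even though they sit in different columns.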
<p>There are also three versions of the QM list for cave 161 (Kaninchenh&ouml;hle) apparently produced by this method but hand-edited:<br />
<a href="../../1623/161/qmaven.htm">/1623/161/qmaven.htm</a> 1996 version<br />
<a href="../../1623/161/qmtodo.htm">/1623/161/qmtodo.htm</a> 1998 version<br />
<a href="../../1623/161/qmdone.htm">/1623/161/qmdone.htm</a> 1999 (incomplete) version
</p>
<p>In the /1623/204/ folder there is a script <em>qmreader.pl</em> which apparently does the inverse of
<em>tablize-qms.pl</em>: it transforms a QM HTML file into a CSV file.
<p>As Wookey says (Slack, 7 Jan. 2020):
"I'm not quite sure what the best format is. Some combination of the
258 and 264 formats might be best. Including the cave number seems
pointless. Including 'conclusion' info seems like a good idea. I'm not
sure there what the benefit of separating the 'surveyname' and
'nearest station' fields is. Having an 'area of cave' field is somewhat useful
for grouping, even though it is sort-of repeating the 'survey-station' info.
If I was making a QM list I'd enter these fields:
year, number, Grade, nearest station, folder reference, description, found by, completed (Year), completion description/cave description link, completed by
with these details:
<ul>
<li>number is just the serial number, not the whole year-serial-grade
<li>'nearest station' does not include the cave number
<li>completed is blank (for not completed) or a year for when it was done
<li>completion description should be a link to the relevant bit of cave description, but if that doesn't exist then a short description here is OK."
</ul>
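<p>As an illustration only, the field list Wookey proposes could be written down as a record type like this. The class name, attribute names and types are assumptions; nothing like this exists in troggle:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProposedQM:
    """One row of the proposed QM list (hypothetical, for illustration)."""
    year: int
    number: int              # serial number only, not year-serial-grade
    grade: str
    nearest_station: str     # without the cave number
    folder_reference: str
    description: str
    found_by: str
    completed: Optional[int] = None   # year it was done, None if still open
    completion_description: str = ""  # ideally a link into the cave description
    completed_by: str = ""

    @property
    def is_open(self):
        """True while the QM has not been ticked off."""
        return self.completed is None
```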
<h4 id="svx2qm">svx2qm.py</h4>
<p>Philip Withnall's 2019 QM extractor <em>svx2qm.py</em> (in :loser:/qms/) can be used to generate a list of all the QMs in all the svx files in either text or CSV format. When run together with <em>find</em> and <em>xargs</em> it will produce an output listing all the QMs:
<pre><code>cd loser
find -name '*.svx' | xargs qms/svx2qm.py --format csv
</code></pre>
and <var>--format human</var> produces a simple text format.
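<p>The same thing can be sketched in Python, which avoids the problem of <em>xargs</em> splitting file names that contain spaces. This is a hedged sketch, not part of the repository: the function names are made up, it assumes <em>svx2qm.py</em> runs under the same Python interpreter, and it only relies on the <var>--format</var> option shown above. A very large number of files could still exceed the operating system's argument-length limit in a single call:

```python
import subprocess
import sys
from pathlib import Path

def svx_files(root):
    """Return every *.svx file below root, sorted for a stable order."""
    return sorted(Path(root).rglob("*.svx"))

def run_svx2qm(loser_dir, fmt="csv"):
    """Run qms/svx2qm.py once over all survex files and return its output.

    loser_dir is the top of a :loser: checkout (an assumption about
    where the script lives, taken from the command line above).
    """
    root = Path(loser_dir)
    paths = [str(p.relative_to(root)) for p in svx_files(root)]
    result = subprocess.run(
        [sys.executable, "qms/svx2qm.py", "--format", fmt] + paths,
        cwd=root, capture_output=True, text=True, check=True,
    )
    return result.stdout
```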
<p>
The 2019 copies are online in /expofiles/:
<a href="/expofiles/writeups/2019/qms2019.txt">qms2019.txt</a> and
<a href="/expofiles/writeups/2019/qms2019.csv">qms2019.csv</a>.
<p>
This will work on all survex *.svx files, even those which have not yet been run through the troggle import process.
<p>Phil says (13 April 2020): <em>"The generated files are not meant to be served by the webserver, it's a tool for people to run locally. Someone could modify it to create HTML output (or post-process the CSV output to do the same), but that is work still to be done."</em>
<h4>Even older troggle archeology</h4>
<p>Looking through urls.py and core/view_caves.py we see a lot
of archaic code for providing new QM numbers, producing lists of QMs for a given cave and for downloading QM.csv files generated by the database.
But none of it appears to be working today (5 July 2022).
<p>Troggle has archaic URL recognisers in <var>:troggle:/urls.py</var> for:
<ul>
<li>/newqmnumber/ - crashes troggle
<li>/getQMs/&lt;caveslug&gt; - crashes troggle
<li>/cave/&lt;cave-id&gt;/qm.csv - to download a <var>qm.csv</var> file (NB not qms.csv) - crashes troggle
<li>/downloadqms - crashes troggle
</ul>
So someone was busy at one time.
<h3>QMs - monitoring progress - old script</h3>
<h4 id="find-dead-qms">find-dead-qms.py</h4>
<p>This stand-alone script finds references to <em>completed</em> qms in the qm.csv files in the cave folders (/1623/ etc.) in the :expoweb: <a href="../computing/repos.html">repository</a>. It looks to see which QMs have been completed but where there is not yet a matching text in the cave description.
<blockquote><em>Quick and dirty Python script to find references to completed qms in the
cave description pages. Run this to find which bits of description
need updating.
<br>
The list of qms is read from the qm.csv file and any with an entry in the
"Completion description" column (column 7) are searched for in all the html
files.
<br>
The script prints a list of the completed qms that it found references to
and in which file.
<br>
Nial Peters - 2011
</em></blockquote>
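<p>The logic described in that docstring can be sketched roughly as follows. This is not Nial's script: the function names are invented, and it assumes the QM identifier is in the first column, that the completion description is column 7, and that the file has no header row:

```python
import csv
from pathlib import Path

COMPLETION_COLUMN = 6  # "column 7" in the description, counting from 1

def completed_qms(csv_path):
    """Return the ids (first column) of QMs with a completion description."""
    done = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Slicing gives an empty list for rows that are too short.
            completion = row[COMPLETION_COLUMN:COMPLETION_COLUMN + 1]
            if completion and completion[0].strip():
                done.append(row[0].strip())
    return done

def find_references(qm_ids, html_dir):
    """Map each QM id to the HTML files under html_dir that mention it."""
    hits = {}
    for page in sorted(Path(html_dir).rglob("*.htm*")):
        text = page.read_text(encoding="utf-8", errors="replace")
        for qm_id in qm_ids:
            if qm_id in text:
                hits.setdefault(qm_id, []).append(page.name)
    return hits
```

<p>Every id that appears in the result is a completed QM still mentioned in a cave description page, so that page is a candidate for updating.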
<hr>
<pre>
From: Philip Withnall [tecnocode]
Sent: 13 April 2020 23:41
To: Philip Sargent (Gmail)
Subject: Re: svx2qm
Hi Philip,
Hope you're well, thanks for getting in touch about this.
The generated files are not meant to be served by the webserver, it's a tool for people to run locally.
Someone could modify it to create HTML output (or post-process the CSV output to do the same),
but that is work still to be done.
I can't see any problem with moving it all to expoweb/scripts/ - so long as it is
run with the loser top level directory specified - but I might be mistaken:
find /home/expo/loser -name '*.svx' | xargs ./svx2qm.py --format human
and it should go into the Makefile too at some point.
Feel free to move it wherever; I am not planning on doing any further work on it.
The script itself just expects to be passed some (relative or absolute) paths to SVX files,
so can be placed wherever, as long as it's passed appropriate relative paths.
I haven't written any other scripts which post-process the data or otherwise format it.
I guess it all depends on what questions people are trying to answer using the QM data,
as to how (and where) best to present it. I'm afraid I don't have any suggestions there.
:Rob Watson wrote some documentation about QMs
:<a href="../survey/qmentry.html">http://expo.survex.com/handbook/survey/qmentry.html</a>
:is there anything subtle missing as to how they are used ?
Nope, I think Rob's page covers it all. That page also documents the correct QM format
which is what svx2qm.py understands. (There were some older or artisanal QM formats
floating around at one point, although I think I reformatted them all so the tool
would understand them, and so people would hopefully standardise on what Rob's
documented from then on.)
Philip</pre>
<hr>
Return to: <a href="scriptsother.html">Other scripts</a><br />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index:
<a href="trogindex.html">Index of all troggle documents</a><br /><hr /></body>
</html>