folk names plan

Philip Sargent 2022-08-31 18:02:16 +03:00
parent f8f3328486
commit b6d7057765
4 changed files with 141 additions and 0 deletions


@ -17,6 +17,8 @@ The folk.csv file is stored on the server under version control in the <var>:exp
href="../computing/repos.html">repository</a> in
<code>expoweb/folk/folk.csv</code>
<p>Note that this area is subject to a <a href="../troggle/namesredesign.html">redesign proposal</a>.
<p>Before expo starts the folk.csv file is updated.
<p>Edit folk/folk.csv, adding the new year to the end of the header line, a new column, with just a comma (blank cell) for people


@ -0,0 +1,126 @@
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>CUCC Expedition Handbook: People's names design options</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head>
<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook - People's names design options</h2>
<h1>What, How and Why : People's names</h1>
<ul>
<li><a href="#why">Why</a>
<li><a href="#maint">Maintenance constraints</a>
<li><a href="#whatold">What we have now</a>
<li><a href="#otherfolk">Further options for folk</a>
</ul>
<h2 id="why">Names: Why we need a change</h2>
<p>The <a href="#whatold">current system</a> fails completely for any name that is in any way "non-standard".
Troggle can't cope with a name that is not structured as
"Forename Surname": exactly two words, each beginning with a capital letter (with no other punctuation,
capital letters or other names or initials).
<p>There are 19 people for whom the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing
give different results. Reconciling these is a job that needs to be done; they are easy to find by running a link checker over the
folk/index.htm file. Every name in the generated
index.htm now has a hyperlink to the troggle page about that person, except
for those 19 people.
This has to be fixed as it affects ~5% of our expoers.
<p><em>[This document originally written 31 August 2022]</em>
<h2 id="maint">Names: Maintenance constraints</h2>
<p>We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal". This is a pain to maintain.
<h2 id="whatold">Names: How it works now</h2>
<p>Fundamentally we have regexes detecting whether something is a name or not - in several places. These should all be replaced by properly delimited strings.
<h4>Four different bits</h4>
<ul>
<li>In <var>urls.py</var> we have
<code>
re_path(r'^person/(?P&lt;first_name&gt;[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P&lt;last_name&gt;[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"),
re_path(r'^personexpedition/(?P&lt;first_name&gt;[A-Z]*[a-z&;]*)[^a-zA-Z]*(?P&lt;last_name&gt;[A-Z]*[a-zA-Z&;]*)/(?P&lt;year&gt;\d+)/?$', personexpedition, name="personexpedition"),
</code>
where the transmission noise is attempting to recognise a name and split it into &lt;first_name&gt; and &lt;last_name&gt;.
Naturally this fails horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>.
<li>We have the <a href="scriptscurrent.html#folk">folklist script</a> CSV file, which holds "Forename Surname (nickname)" and "Surname" as its first two columns.
The standalone script, which is run manually, uses these to produce <var>/folk/index.html</var>. The same columns are also parsed by troggle (by a regex in <var>
parsers/people.py</var>) only when a full data import is done, and troggle gets them wrong for people like <var>Lydia-Clare Leather</var>, for various 'von' and 'de' middle
'names', and for McLean, MacLeod and McAdam.
<li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files which are parsed (by regexes in <var> parsers/survex.py</var>) only when a full data
import is done.
<li>We have the <var>&lt;div class="trippeople"&gt;&lt;u&gt;Luke&lt;/u&gt;, Hannah&lt;/div&gt;</var> trip people line in each logbook entry.
These are recognised by a regex in <var>parsers/logbooks.py</var> only when a full data import is done.
</ul>
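<p>As a rough illustration of what "properly delimited" could mean for the logbook case above, the sketch below reads the names out of a trippeople line by treating the comma as the delimiter, instead of guessing at the shape of each name. It is not troggle's code: <var>TripPeopleParser</var> and <var>parse_trippeople</var> are hypothetical names, and the sketch only records whether a name was underlined, without assuming what the underlining means.

```python
from html.parser import HTMLParser


class TripPeopleParser(HTMLParser):
    """Collect text chunks from a trippeople fragment, noting which sit inside <u>."""

    def __init__(self):
        super().__init__()
        self.in_u = False
        self.parts = []  # list of (text, underlined) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "u":
            self.in_u = True

    def handle_endtag(self, tag):
        if tag == "u":
            self.in_u = False

    def handle_data(self, data):
        self.parts.append((data, self.in_u))


def parse_trippeople(fragment):
    """Return [(name, underlined), ...], splitting on commas only.

    Hypothetical helper: nothing here inspects capitalisation or word count,
    so one-word and many-word names are treated the same.
    """
    parser = TripPeopleParser()
    parser.feed(fragment)
    parser.close()
    people = []
    for text, underlined in parser.parts:
        for name in text.split(","):
            name = name.strip()
            if name:
                people.append((name, underlined))
    return people
```

<p>Because only the comma is significant, a line containing "Wookey", "Mike the Animal" or "Lydia-Clare Leather" would come through unchanged. Matching each string against a person in the database is a separate step that this sketch does not attempt.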
<p>Frankly it's amazing it even appears to work at all.
<h4>Troggle folk data importing</h4>
<p>
Troggle reads the mugshot and blurb about each person.
It reads it direct from folk.csv which has fields of URL links to those files.
It does this when troggle is run with
<code>python databaseReset.py people</code>
<p>
Troggle generates its own blurb about each person, including past expeditions and trips
taken from the logbooks (and from parsing svx files).
A link to this troggle page has been added to folk/index.htm;
this is done in make-folklist.py.
<p>
Troggle scans the blurb file and takes everything between &lt;body&gt; and &lt;hr&gt;
as the text of the blurb
(see <var>parsers/people.py</var>).
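<p>The "between body and hr" rule can be sketched in a few lines. This is only an illustration of the rule as described here, not a copy of what <var>parsers/people.py</var> does: the pattern <var>BLURB_RE</var> and the function <var>extract_blurb</var> are hypothetical names.

```python
import re

# Assumed rule: everything after the opening <body ...> tag, up to the first <hr.
# DOTALL lets the match span several lines; IGNORECASE accepts <BODY> and <HR>.
BLURB_RE = re.compile(r"<body[^>]*>(.*?)<hr", re.DOTALL | re.IGNORECASE)


def extract_blurb(html):
    """Return the blurb text with surrounding whitespace removed, or None if not found."""
    match = BLURB_RE.search(html)
    if match is None:
        return None
    return match.group(1).strip()
```

<p>A file with no &lt;hr&gt; gives <var>None</var> under this rule, so a blurb file without one would be imported without any text.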
<p>
All the blurb files have to be .htm - .html is not recognised by people.py
and trying to fix this breaks something else (weirdly, not fully investigated).
<p>
There seems to be a problem with importing blurbs with more than one image file, even though the code
in people.py only looks for the first image file, and then fails to use it.
<h4>Proposal</h4>
<p>I would start by replacing the recognisers in <var>urls.py</var> with a slug for an arbitrary text string, and interpreting it in the python code handling the page.
This would entail replacing all the database parsing bits to produce the same slug in the same way.
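<p>A minimal sketch of the slug idea, under stated assumptions: every parser and the URL handler would call one shared function, here given the hypothetical name <var>name_to_slug</var>, so that they all produce the same string for the same person. The exact slug rules are a guess and would need agreeing.

```python
import re
import unicodedata


def name_to_slug(name):
    """Turn an arbitrary name into a URL-safe slug (hypothetical helper).

    The name is never split into forename and surname, so its shape
    does not matter.
    """
    # Decompose accented letters so the base letter survives the ASCII step.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks and any character with no ASCII form after decomposition.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lower-case, then collapse each run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    return slug.strip("-")
```

<p>With this rule "Ruairidh MacLeod" becomes <var>ruairidh-macleod</var> and "Mike the Animal" becomes <var>mike-the-animal</var>, with no special cases. Django already provides a <var>slug</var> path converter for <var>urls.py</var> and a <var>slugify</var> function in <var>django.utils.text</var>, which may be preferable to a hand-written rule. Two different people could still map to the same slug, so the real design would need a way to disambiguate them.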
<p>At that point we should get the 19 people into the system even if all the other crumdph is still there.
Then we take a deep breath and look at it all again.
<h2 id="otherfolk">Folk: pending possible improvements</h2>
<p>Read about the <a href="../computing/folkupdate.html">folklist script</a> before reading the rest of this.
<p>This does some basic validation: it checks that the mugshot
images and blurb HTML files exist.
<p> The folk.csv file could be split:
<br>
folk-1.csv will be for old cavers who will not come again, so this file need never be touched.
<br>
folk-2.csv will be for recent cavers and the current expo; this needs editing every year.
<p>
The year headings of folk-1 and folk-2 need to be accurate, but they do not need to be
the same columns. So folk-2 can start in a much later year.
<p>
folk-0 will be for awkward buggers whose attendance spans decades. This needs updating whenever
one of these lags attends:
AERW, Becka, Mark Dougherty, Philip Sargent, Chris Densham, Mike Richardson
<p>
Currently (August 2022) the software ignores folk-0, -1, -2 and we have used the old folk.csv for
the 2022 expo. But we hope to have this fixed next year...
<hr />
Return to: <a href="trogdesign.html">Troggle design and future implementations</a><br />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index:
<a href="trogindex.html">Index of all troggle documents</a><br /><hr />
</body>
</html>


@ -43,6 +43,13 @@ so that the entrance description pops up.
<h3>Using Question Marks in active exploration</h3>
<p>See <a href="scriptsqms.html">the current ugly situation</a>.
<h3>Proper archive/restore of Tunnel and Therion files</h3>
<p>Strangely, we have no process at all to allow anyone to download the archived Tunnel or Therion XML files and also
download the referenced source scan files at the same time so that the references within the XML files
actually work.
<p>The XML files contain cross-reference links to the scan files <em>on the computer where the tunnelling/therioning was done</em>,
and these are different for every machine as we have no recommended standard setup.
<h3>Supporting Final Survey Preparation</h3>
<p>We have no procedure for this. And also no proper procedures (or even agreed single final location) for rigging topos either. We have a bucket folder for final drawn-up surveys on expofiles.
@ -57,6 +64,10 @@ Element, which we can archive ourseleves, and maybe we can use Kanboard (ditto)
<h2 id="badly">Things Troggle Does Badly</h2>
<h3>Managing people's names</h3>
<p>As of 2022, there are 15 people troggle can't cope with at all because their name is not structured as
"Forename Surname": where it is only two words and each begins with a capital letter (with no other punctuation,
capital letters or other names or initials). See the design document <a href="namesredesign.html">handling people's names properly</a>.
<h3>Writing Cave Descriptions</h3>
<p>In 2022 we have a working online form to create or edit the cave and entrance descriptions. But the URL
@ -150,6 +161,7 @@ complete copy, but if universal internet access is coming anyway, any such work
<h2 id="specific">Specific, Immediate problems</h2>
<ul>
<li>New systems for <a href="namesredesign.html">handling people's names properly</a>
<li>New systems for <a href="menudesign.html">website menus</a>
<li>New <a href="lbredesign.html">logbook coding system</a> - not at all urgent
<li><s>Short-term note on "logon" <a href="trogregistr.html">django-registration</a></s>


@ -24,6 +24,7 @@
<li><a href="trogregistr.html">Troggle Login and user registration</a> - proposal to remove registration (DONE)<br>
<li><a href="lbredesign.html">Troggle Logbook Format Redesign</a> - options for revising the logbook HTML format<br>
<li><a href="menudesign.html">Troggle Menu Design</a> - options for replacing the menuing system<br><br>
<li><a href="namesredesign.html">Troggle people's names redesign</a><br>
<li><a href="trogsimpler.html">Troggle - a kinder simpler troggle?</a> - Radost's proposals (critiqued)<br>
<li><a href="trogspeculate.html">Troggle Architecture Speculations</a> - as in April 2020<br>
<li><a href="trog2030.html">Troggle in 2025-2030</a> - architectural evolution proposal<br>