<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>CUCC Expedition Handbook: Peoples' names design options</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head>
<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook - Peoples' names design options</h2>

<h1>What, How and Why : Peoples' names</h1>

<ul>
<li><a href="#why">Why</a>
<li><a href="#maint">Maintenance constraints</a>
<li><a href="#whatold">What we have now</a>
<li><a href="#otherfolk">Further options for folk</a>
</ul>

<h2 id="why">Names: Why we need a change</h2>

<p>The <a href="#whatold">current system</a> completely fails with names which are in any way "non-standard".
Troggle cannot cope with a name that is not structured as
"Forename Surname": exactly two words, each beginning with a capital letter, with no other
punctuation, capital letters, extra names or initials.
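
<p>As a rough illustration of how restrictive that assumption is, the sketch below uses a
deliberately simplified two-capitalised-words matcher (this is <em>not</em> the actual troggle
regex) against some of the names mentioned on this page:
<pre><code>
import re

# Simplified illustration of the "Forename Surname" assumption: exactly two
# words, each a single capital letter followed by lower-case letters only.
# This is NOT the real troggle pattern, just a sketch of the constraint.
SIMPLE_NAME = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")

for name in ["Becka Lawson",         # fits the pattern
             "Ruairidh MacLeod",     # internal capital in the surname
             "Lydia-Clare Leather",  # hyphenated forename
             "Mike the Animal",      # three words
             "Wookey"]:              # single-word name
    print(f"{name!r}: {'ok' if SIMPLE_NAME.match(name) else 'rejected'}")
</code></pre>
<p>Only the first name passes; everything else needs special-case handling somewhere.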

<p>There are 19 people for which the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing
disagree. Reconciling these (easily found by running a link checker over the
folk/index.htm file) is a job that needs to be done. Every name in the generated
index.htm now has a hyperlink which goes to the troggle page about that person - except
for those 19 people.

This has to be fixed as it affects ~5% of our expoers.

<p><em>[This document originally written 31 August 2022]</em>

<h2 id="maint">Names: Maintenance constraints</h2>

<p>We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal". This is a pain to maintain.

<h2 id="whatold">Names: How it works now</h2>

<p>Fundamentally we have regexes detecting whether something is a name or not - in several places. These should all be replaced by properly delimited strings.

<h4>Four different bits</h4>

<ul>
<li>In <var>urls.py</var> we have
<code>
re_path(r'^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"),
re_path(r'^personexpedition/(?P<first_name>[A-Z]*[a-z&;]*)[^a-zA-Z]*(?P<last_name>[A-Z]*[a-zA-Z&;]*)/(?P<year>\d+)/?$', personexpedition, name="personexpedition"),
</code>
where the transmission noise is attempting to recognise a name and split it into <first_name> and <last_name>.
Naturally this fails horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>.

<li>We have the <a href="scriptscurrent.html#folk">folklist script</a> holding "Forename Surname (nickname)" and "Surname" as the first two columns in the CSV file.
These are used by the standalone script (run manually) to produce <var>/folk/index.html</var>, and are also parsed by troggle (by a regex in <var>
parsers/people.py</var>) only when a full data import is done. This parsing goes wrong for people like <var>Lydia-Clare Leather</var>, various 'von' and 'de' middle
'names', McLean, MacLeod and McAdam.

<li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files, which are parsed (by regexes in <var>parsers/survex.py</var>) only when a full data
import is done.

<li>We have the <var><div class="trippeople"><u>Luke</u>, Hannah</div></var> trip-people line in each logbook entry.
These are recognised by a regex in <var>parsers/logbooks.py</var> only when a full data import is done.
A simplified sketch of these last two recognisers follows this list.
</ul>
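
<p>For readers who have not looked at those files, here is a much-simplified sketch of what the
last two recognisers do. These regexes are invented for illustration; they are <em>not</em> the
ones actually used in <var>parsers/survex.py</var> or <var>parsers/logbooks.py</var>:
<pre><code>
import re

# Hypothetical, much-simplified recognisers - not the real troggle regexes.

# "*team notes Becka Lawson" style lines in survex files:
TEAM_LINE = re.compile(r"^\*team\s+(?P<role>\w+)\s+(?P<names>.+)$")

# '<div class="trippeople"><u>Luke</u>, Hannah</div>' lines in logbook entries:
TRIPPEOPLE = re.compile(r'<div class="trippeople">(?P<people>.*?)</div>')

m = TEAM_LINE.match("*team notes Becka Lawson")
print(m.group("role"), "->", m.group("names"))        # notes -> Becka Lawson

html = '<div class="trippeople"><u>Luke</u>, Hannah</div>'
people = re.sub(r"<.*?>", "", TRIPPEOPLE.search(html).group("people"))
print([p.strip() for p in people.split(",")])         # ['Luke', 'Hannah']
</code></pre>
<p>Everything still depends on the free text having exactly the expected shape, which is the
root of the problem described above.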

<p>Frankly it's amazing it even appears to work at all.

<h4>Troggle folk data importing</h4>

<p>
Troggle reads the mugshot and blurb about each person.
It reads these directly from folk.csv, which has fields of URL links to those files.
It does this when troggle is run with
<code>python databaseReset.py people</code>
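
<p>A minimal sketch of what that import step does, with the column layout assumed from the
description above (the real code in <var>parsers/people.py</var> differs in detail):
<pre><code>
import csv

# Sketch only: assumes the first two columns are "Forename Surname (nickname)"
# and "Surname", with mugshot/blurb links somewhere in the remaining fields.
with open("folk.csv", newline="", encoding="utf-8") as f:
    rows = csv.reader(f)
    header = next(rows)                      # assumed single header row
    for row in rows:
        fullname, surname = row[0], row[1]
        blurb   = next((cell for cell in row if cell.endswith(".htm")), None)
        mugshot = next((cell for cell in row if cell.endswith((".jpg", ".png"))), None)
        print(fullname, surname, blurb, mugshot)
</code></pre>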

<p>
Troggle generates its own blurb about each person, including past expeditions and trips
taken from the logbooks (and from parsing svx files).
A link to this troggle page has been added to folk/index.htm
by making make-folklist.py emit it.

<p>
Troggle scans the blurb and looks for everything between <body> and <hr>
to find the text of the blurb
(see <var>parsers/people.py</var>).
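
<p>A simplified sketch of that scan (the real code in <var>parsers/people.py</var> is more involved,
and the filename here is hypothetical):
<pre><code>
import re

# Grab everything between the opening <body> tag and the first <hr> - a sketch
# of the scan described above, not the actual parsers/people.py code.
BLURB_RE = re.compile(r"<body>(?P<blurb>.*?)<hr", re.IGNORECASE | re.DOTALL)

with open("some-blurb.htm", encoding="utf-8") as f:   # hypothetical blurb file
    match = BLURB_RE.search(f.read())
blurb_text = match.group("blurb").strip() if match else ""
</code></pre>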

<p>
All the blurb files have to be .htm - .html is not recognised by people.py,
and trying to fix this breaks something else (weirdly; not fully investigated).

<p>
There seems to be a problem with importing blurbs with more than one image file, even though the code
in people.py only looks for the first image file (and then fails to use it).

<h4>Proposal</h4>

<p>I would start by replacing the recognisers in <var>urls.py</var> with a slug for an arbitrary text string, and interpreting it in the python code handling the page.
This would entail replacing all the database parsing bits to produce the same slug in the same way.
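
<p>Something along these lines, in the Django style that <var>urls.py</var> already uses. The slug
format and the lookup are illustrative assumptions, not the actual troggle code:
<pre><code>
from django.http import HttpResponse
from django.urls import re_path


def person(request, slug):
    # Interpret the slug here (e.g. look up a Person by a slug field that every
    # parser produces in exactly the same way), instead of trying to split
    # first_name/last_name inside the URL pattern.
    return HttpResponse(f"person page for slug {slug!r}")


urlpatterns = [
    # Accept any non-slash text as the slug; no attempt to "recognise" a name.
    re_path(r"^person/(?P<slug>[^/]+)/?$", person, name="person"),
]
</code></pre>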

<p>At that point we should get the 19 people into the system even if all the other crud is still there.
Then we take a deep breath and look at it all again.

<h2 id="otherfolk">Folk: pending possible improvements</h2>

<p>Read about the <a href="../computing/folkupdate.html">folklist script</a> before reading the rest of this.
<p>This does some basic validation: it checks that the mugshot
images and blurb HTML files exist.

<p>The folk.csv file could be split:
<br>
folk-1.csv will be for old cavers who will not come again, so this file need never be touched.
<br>
folk-2.csv will be for recent cavers and the current expo; this needs editing every year.

<p>
The year headings of folk-1 and folk-2 need to be accurate, but they do not need to be
the same columns. So folk-2 can start in a much later year.

<p>
folk-0 will be for awkward buggers whose attendance spans decades. This needs updating whenever
one of these lags attends:
AERW, Becka, Mark Dougherty, Philip Sargent, Chris Densham, Mike Richardson
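
<p>A hypothetical sketch of how an importer could combine the split files. Because rows are keyed
by each file's own year headings rather than by column position, folk-2.csv can start at a much
later year than folk-1.csv and still merge cleanly (the "Name" column heading is an assumption):
<pre><code>
import csv

people = []
for filename in ("folk-0.csv", "folk-1.csv", "folk-2.csv"):
    with open(filename, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):       # keys come from each file's own header row
            years = [y for y in row if y.isdigit() and (row[y] or "").strip()]
            people.append((row["Name"], years))
</code></pre>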

<p>
Currently (August 2022) the software ignores folk-0, -1, -2 and we have used the old folk.csv for
the 2022 expo. But we hope to have this fixed next year...

<hr />
Return to: <a href="trogdesign.html">Troggle design and future implementations</a><br />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index:
<a href="trogindex.html">Index of all troggle documents</a><br /><hr />
</body>
</html>