<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>CUCC Expedition Handbook: People's names design options</title>
<link rel="stylesheet" type="text/css" href="/css/main2.css" />
</head>
<body>
<style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook - People's names design options</h2>
<h1>What, How and Why : People's names</h1>
<ul>
<li><a href="#why">Why</a>
<li><a href="#maint">Maintenance constraints</a>
<li><a href="#whatold">What we have now</a>
</ul>
<span style="color:red">This was basically fixed in 2023 by a root-and-branch replacement of people's names with a 'slug' derived from each name. However, there are still things we could do to improve 'folk':</span>
<ul>
<li><a href="#otherfolk">Further options for folk</a>
</ul>
<h2 id="why">Names: Why it is a problem</h2>
<p>The <a href="#whatold">former system</a> completely failed with names which are in any way "non standard".
Troggle couldn't cope with a name not structured as
"Forename Surname": where it is only two words and each begins with a capital letter (with no other punctuation,
capital letters or other names or initials).
<p>There were 19 people for whom the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing
gave different results.
<h2 id="maint">Names: Maintenance constraints</h2>
<p>We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal". This is a pain to maintain.
<h2 id="whatold">Names: How it works</h2>
<p>Fundamentally we have regexes detecting whether something is a name or not - in several places in the different types of raw data. However we do now use unique 'slugs' for the references between pages (since Sept. 2023).
<h4>Four different bits</h4>
<ul>
<li>We have the <a href="scriptscurrent.html#folk">folklist script</a>, whose CSV file holds "Forename Surname (nickname)" and "Surname" as its first two columns.
These are used by the standalone script, which is run manually, to produce <var>/folk/index.html</var>. The CSV file is also parsed by troggle (by a regex in <var>
parsers/people.py</var>) only when a full data import is done. This is a problem for people like <var>Lydia-Clare Leather</var>, for the various 'von' and 'de' middle
'names', and for McLean, MacLeod and McAdam.
<li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files which are parsed (by regexes in <var> parsers/survex.py</var>) when a full data import is done (or when a survex file is edited online).
<li>We have the <var>&lt;div class="trippeople"&gt;&lt;u&gt;Luke&lt;/u&gt;, Hannah&lt;/div&gt;</var> trip people line in each logbook entry.
These are recognised by a regex in <var>parsers/logbooks.py</var> when a full data import is done (or when a logbook entry is edited online).
<li>We have the names of people in a list on a wallet, which is necessary when the wallet has no attached survex file. But even when there are (one or more) attached survex files, there is a place to input a list of people's names as well. This is parsed by <var>parsers/scans.py</var>.
</ul>
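<p>
As an illustration of the kind of regex recognition described in the list above, here is a minimal sketch that pulls the names out of a <var>trippeople</var> line. It is not the code in <var>parsers/logbooks.py</var>: the function name, the pattern and the returned structure are all assumptions made for this example. It records whether each name was underlined, without deciding what the underline means.

```python
import re

# Hypothetical pattern, written for this sketch only: it captures whatever
# sits inside the <div class="trippeople"> ... </div> element.
TRIPPEOPLE_RE = re.compile(r'<div\s+class="trippeople">(.*?)</div>', re.DOTALL)


def parse_trippeople(entry_html):
    """Return a list of (name, was_underlined) tuples for one logbook entry."""
    match = TRIPPEOPLE_RE.search(entry_html)
    if match is None:
        return []
    people = []
    for chunk in match.group(1).split(","):
        was_underlined = "<u>" in chunk.lower()
        # Strip any remaining tags to leave just the text of the name.
        name = re.sub(r"<[^>]+>", "", chunk).strip()
        if name:
            people.append((name, was_underlined))
    return people


line = '<div class="trippeople"><u>Luke</u>, Hannah</div>'
print(parse_trippeople(line))
```

Note that what comes back is only the text as typed in the logbook ("Luke", "Hannah"). Matching that text to a particular person is the hard part, and this sketch does not attempt it.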
<p>Frankly it's amazing it even appears to work at all.
<p>
In <var>urls.py</var> we used to have
<code>
re_path(r'^person/(?P&lt;first_name&gt;[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P&lt;last_name&gt;[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"),
<br /><br />
re_path(r'^personexpedition/(?P&lt;first_name&gt;[A-Z]*[a-z&;]*)[^a-zA-Z]*(?P&lt;last_name&gt;[A-Z]*[a-zA-Z&;]*)/(?P&lt;year&gt;\d+)/?$', personexpedition, name="personexpedition"),
<br /><br />
re_path('wallets/person/(?P&lt;first_name&gt;[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P&lt;last_name&gt;[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', walletslistperson, name="walletslistperson"),
</code>
where the 'transmission noise' is attempting to recognise a name and split it into &lt;first_name&gt; and &lt;last_name&gt;.
Naturally this failed horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>.
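<p>
To see how such a pattern mis-splits a name, the old <var>person</var> regex can be run on its own with Python's <var>re</var> module. This is only a sketch: the pattern is copied from the old <var>urls.py</var> line, but the helper name and the form of the URL (forename and surname simply concatenated) are assumptions, since the exact URLs troggle used to generate are not recorded here.

```python
import re

# The pattern is copied verbatim from the old 'person' route.
OLD_PERSON_RE = re.compile(
    r'^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*'
    r'(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?'
)


def old_split(url_path):
    """Return (first_name, last_name) as the old route would have split them."""
    match = OLD_PERSON_RE.match(url_path)
    if match is None:
        return None
    return match.group("first_name"), match.group("last_name")


# Assumed URL form: names concatenated with no separator.
for path in ("person/Lydia-ClareLeather", "person/Wookey"):
    print(path, "->", old_split(path))
```

With these assumed inputs the hyphenated forename is cut at the hyphen, giving "Lydia-" and "ClareLeather", and a single-word name such as "Wookey" ends up with an empty surname.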
<p>
<span style="color:red">We now [October 2023] have</span>
<code>
path('person/&lt;slug:slug&gt;', person, name="person"),<br />
path('personexpedition/&lt;slug:slug&gt;/&lt;int:year&gt;', personexpedition, name="personexpedition"),<br />
path('wallets/person/&lt;slug:slug&gt;', walletslistperson, name="walletslistperson"),
</code>
which is a lot easier to maintain.
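<p>
For readers unfamiliar with slugs, the sketch below shows one way a URL-safe slug could be derived from a person's name using only the standard library. The function <var>make_slug</var> and its rules are hypothetical and are not taken from troggle, whose actual slug derivation is not described on this page. The only fixed point is that Django's <var>slug</var> path converter accepts ASCII letters, digits, hyphens and underscores.

```python
import re
import unicodedata

# The character set accepted by Django's built-in 'slug' path converter.
DJANGO_SLUG_RE = re.compile(r"[-a-zA-Z0-9_]+")


def make_slug(full_name):
    """Hypothetical slug rule: strip accents, lower-case, hyphenate the rest."""
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Any run of characters other than a-z or 0-9 becomes a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


for name in ("Ruairidh MacLeod", "Lydia-Clare Leather", "Mike the Animal", "Wookey"):
    slug = make_slug(name)
    print(name, "->", slug, bool(DJANGO_SLUG_RE.fullmatch(slug)))
```

Because the slug is an opaque identifier, the URL patterns no longer need to know anything about how a name is structured. A rule this simple does not guarantee uniqueness, so two people with the same name would still need to be told apart some other way.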
<h4>Troggle folk data importing</h4>
<p>
This still needs fixing [Feb.2024]
<p>
Troggle reads the mugshot and blurb for each person.
It finds them directly from folk.csv, which has fields holding URL links to those files.
It does this when troggle is run with
<code>python databaseReset.py people</code>
<p>
Troggle generates its own blurb about each person, including past expeditions and trips
taken from the logbooks (and from parsing svx files).
A link to this troggle page has been added to folk/index.htm
by make-folklist.py.
<p>
Troggle scans the blurb and looks for everything between &lt;body&gt; and &lt;hr&gt;
to find the text of the blurb
(see <var>parsers/people.py</var>)
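<p>
The "everything between &lt;body&gt; and &lt;hr&gt;" rule can be sketched as a single regex. This is an illustration only, not the code in <var>parsers/people.py</var>; the function name and the exact pattern are assumptions.

```python
import re

# Hypothetical pattern: capture the text after the opening <body ...> tag
# up to (but not including) the first <hr> tag.
BLURB_RE = re.compile(r"<body[^>]*>(.*?)<hr", re.DOTALL | re.IGNORECASE)


def extract_blurb(page_html):
    """Return the blurb text, or None if the page has no <body> ... <hr> span."""
    match = BLURB_RE.search(page_html)
    if match is None:
        return None
    return match.group(1).strip()


page = "<html><body><h1>A Caver</h1><p>Some text.</p><hr /></body></html>"
print(extract_blurb(page))
```

A rule like this silently returns nothing for a blurb file that has no &lt;hr&gt;, so any such file would need checking by hand.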
<p style="margin:20px">
[This now seems to have been fixed (July 2023):<ul><li>
All the blurb files have to be .htm: .html is not recognised by people.py,
and trying to fix this breaks something else (weirdly, not fully investigated).
<li>
There seems to be a problem with importing blurbs with more than one image file, even though the code
in people.py only looks for the first image file, which it then fails to use.]</ul>
<h2 id="otherfolk">Folk: pending possible improvements</h2>
<p>Read about the <a href="../computing/folkupdate.html">folklist script</a> before reading the rest of this.
<p>This does some basic validation: it checks that the mugshot
images and blurb HTML files exist.
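<p>
A check of this kind could look something like the sketch below. It is not the folklist script itself: the column names <var>Name</var>, <var>Mugshot</var> and <var>Blurb</var>, the function name and the <var>exists</var> parameter are all assumptions made for illustration, and the real folk.csv columns may differ.

```python
import csv
import io
from pathlib import Path


def find_missing_files(csv_text, root, exists=None):
    """Return (name, link) pairs whose mugshot or blurb file is not found.

    'exists' is a hypothetical hook taking a Path and returning True or False;
    by default it checks the real filesystem.
    """
    if exists is None:
        exists = lambda path: path.is_file()
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in ("Mugshot", "Blurb"):  # assumed column names
            link = (row.get(column) or "").strip()
            if link and not exists(Path(root) / link.lstrip("/")):
                missing.append((row.get("Name", ""), link))
    return missing


sample = "Name,Mugshot,Blurb\nA Caver,i/ac.jpg,l/ac.htm\nB Caver,,\n"
# Pretend that only the mugshot exists, so the blurb is reported as missing.
print(find_missing_files(sample, "folk", exists=lambda p: p.name == "ac.jpg"))
```

Rows with no mugshot or blurb link are skipped rather than reported, on the assumption that not every caver has one.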
<p> The folk.csv file could be split:
<br>
folk-1.csv will be for old cavers who will not come again, so this file need never be touched.
<br>
folk-2.csv will be for recent cavers and the current expo, this needs editing every year
<p>
The year headings of folk-1 and folk-2 need to be accurate, but the two files do not need to have
the same columns. So folk-2 can start in a much later year.
<p>
folk-0 will be for awkward buggers whose attendance spans decades. This needs updating whenever
one of these lags attends:
AERW, Becka, Mark Dougherty, Philip Sargent, Chris Densham, Mike Richardson
<p>
Currently (July 2023) the software ignores folk-0, -1, -2 and we have used the old folk.csv for
the 2023 expo. But we hope to have this fixed next year...
<hr />
Return to: <a href="trogdesign.html">Troggle design and future implementations</a><br />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index:
<a href="trogindex.html">Index of all troggle documents</a><br /><hr /></body>
</html>