update stuff now working - online edit of handbook/troggle/namesredesign.html

Philip Sargent 2023-07-26 13:56:33 +01:00
parent d74becbb34
commit 7f6fc31324


@ -1,126 +1,127 @@
<!DOCTYPE html> <!DOCTYPE html>
<html> <html>
<head> <head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>CUCC Expedition Handbook: Peoples' names design options</title> <title>CUCC Expedition Handbook: Peoples' names design options</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" /> <link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head> </head>
<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style> <body>
<h2 id="tophead">CUCC Expedition Handbook - Peoples' names design options</h2> <style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook - Peoples' names design options</h2>
<h1>What, How and Why : Peoples' names</h1>
<h1>What, How and Why : Peoples' names</h1>
<ul>
<li><a href="#why">Why</a> <ul>
<li><a href="#maint">Maintenance constraints</a> <li><a href="#why">Why</a>
<li><a href="#whatold">What we have now</a> <li><a href="#maint">Maintenance constraints</a>
<li><a href="#otherfolk">Further options for folk</a> <li><a href="#whatold">What we have now</a>
<li><a href="#otherfolk">Further options for folk</a>
</ul>
</ul>
<h2 id="why">Names: Why we need a change</h2>
<h2 id="why">Names: Why we need a change</h2>
<p>The <a href="#whatold">current system</a> completely fails with names which are in any way "non standard".
Troggle can't cope with a name not structured as <p>The <a href="#whatold">current system</a> completely fails with names which are in any way "non standard".
"Forename Surname": where it is only two words and each begins with a capital letter (with no other punctuation, Troggle can't cope with a name not structured as
capital letters or other names or initials). "Forename Surname": where it is only two words and each begins with a capital letter (with no other punctuation,
<p>There are 19 people for which the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing capital letters or other names or initials).
are different. Reconciling these (find easily using a link checker scanner on the <p>There are 19 people for which the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing
folk/.index.htm file) is a job that needs to be done. Every name in the generated are different. Reconciling these (find easily using a link checker scanner on the
index.htm now has a hyperlink which goes to the troggle page about that person. Except folk/.index.htm file) is a job that needs to be done. Every name in the generated
for those 19 people. index.htm now has a hyperlink which goes to the troggle page about that person. Except
for those 19 people.
This has to be fixed as it affects ~5% of our expoers.
<p><em>[This document originally written 31 August 2022]</em> This has to be fixed as it affects ~5% of our expoers.
<p><em>[This document originally written 31 August 2022]</em>
<h2 id="maint">Names: Maintenance constraints</h2>
<p>We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal". This is a pain to maintain. <h2 id="maint">Names: Maintenance constraints</h2>
<p>We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal". This is a pain to maintain.
<h2 id="whatold">Names: How it works now</h2>
<p>Fundamentally we have regexes detecting whether something is a name or not - in several places. These should all be replaced by properly delimited strings. <h2 id="whatold">Names: How it works now</h2>
<h4>Four different bits</h4> <p>Fundamentally we have regexes detecting whether something is a name or not - in several places. These should all be replaced by properly delimited strings.
<ul> <h4>Four different bits</h4>
<li>In <var>urls.py</var> we have <ul>
<code> <li>In <var>urls.py</var> we have
re_path(r'^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"), <code>
re_path(r'^personexpedition/(?P<first_name>[A-Z]*[a-z&;]*)[^a-zA-Z]*(?P<last_name>[A-Z]*[a-zA-Z&;]*)/(?P<year>\d+)/?$', personexpedition, name="personexpedition"), re_path(r'^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"),
</code> re_path(r'^personexpedition/(?P<first_name>[A-Z]*[a-z&;]*)[^a-zA-Z]*(?P<last_name>[A-Z]*[a-zA-Z&;]*)/(?P<year>\d+)/?$', personexpedition, name="personexpedition"),
where the transmission noise is attempting to recognise a name and split it into &lt;first_name&gt; and &lt;last_name&gt;. </code>
Naturally this fails horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>. where the transmission noise is attempting to recognise a name and split it into &lt;first_name&gt; and &lt;last_name&gt;.
Naturally this fails horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>.
<li>We have the <a href="scriptscurrent.html#folk">folklist script</a> holding "Forename Surname (nickname)" and "Surname" as the first two columns in the CSV file.
These are used by the standalone script to produce the <var>/folk/index.html</var> which is run manually, and which is also parsed by troggle (by a regex in <var> <li>We have the <a href="scriptscurrent.html#folk">folklist script</a> holding "Forename Surname (nickname)" and "Surname" as the first two columns in the CSV file.
parsers/people.py</var>) only when a full data import is done. Which it gets wrong for people like <var>Lydia-Clare Leather</var> and various 'von' and 'de' middle These are used by the standalone script to produce the <var>/folk/index.html</var> which is run manually, and which is also parsed by troggle (by a regex in <var>
'names', McLean, MacLeod and McAdam. parsers/people.py</var>) only when a full data import is done. Which it gets wrong for people like <var>Lydia-Clare Leather</var> and various 'von' and 'de' middle
'names', McLean, MacLeod and McAdam.
<li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files which are parsed (by regexes in <var> parsers/survex.py</var>) only when a full data
import is done. <li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files which are parsed (by regexes in <var> parsers/survex.py</var>) only when a full data
import is done.
<li>We have the <var>&lt;div class="trippeople"&gt;&lt;u&gt;Luke&lt;/u&gt;, Hannah&lt;/div&gt;</var> trip people line in each logbook entry.
These are recognised by a regex in <var>parsers/logbooks.py</var> only when a full data import is done. <li>We have the <var>&lt;div class="trippeople"&gt;&lt;u&gt;Luke&lt;/u&gt;, Hannah&lt;/div&gt;</var> trip people line in each logbook entry.
</ul> These are recognised by a regex in <var>parsers/logbooks.py</var> only when a full data import is done.
<p>Frankly it's amazing it even appears to work at all. </ul>
<p>Frankly it's amazing it even appears to work at all.
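<p>As an illustration of what "properly delimited strings" could mean for the logbook case, here is a minimal sketch that treats the trippeople line as a comma-delimited list instead of guessing at the shape of each name. The function name <var>trip_people</var> and both regexes are assumptions made for this example; this is not the code in <var>parsers/logbooks.py</var>.

```python
import re
from html import unescape

# Hypothetical patterns for this sketch only, not taken from troggle.
TRIPPEOPLE_RE = re.compile(r'<div class="trippeople">(.*?)</div>', re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def trip_people(entry_html):
    """Return the names on the trippeople line, or [] if there is no such line."""
    match = TRIPPEOPLE_RE.search(entry_html)
    if match is None:
        return []
    # Drop inline markup such as <u>...</u>, then decode entities like &amp;.
    text = unescape(TAG_RE.sub("", match.group(1)))
    # The comma is the only delimiter: no assumption is made about how
    # many words a name has or how it is capitalised.
    return [name.strip() for name in text.split(",") if name.strip()]
```

<p>Because only the delimiter is interpreted, "Mike the Animal" or "Lydia-Clare Leather" would pass through unchanged, and matching them to a person becomes a separate lookup step.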
<h4>Troggle folk data importing</h4>
<p> <h4>Troggle folk data importing</h4>
Troggle reads the mugshot and blurb about each person. <p>
It reads it direct from folk.csv which has fields of URL links to those files. Troggle reads the mugshot and blurb about each person.
It does this when troggle is run with It reads it direct from folk.csv which has fields of URL links to those files.
<code>python databaseReset.py people</code> It does this when troggle is run with
<p> <code>python databaseReset.py people</code>
Troggle generates its own blurb about each person, including past expeditions and trips <p>
taken from the logbooks (and from parsing svx files) Troggle generates its own blurb about each person, including past expeditions and trips
A link to this troggle page has been added to folk/index.htm taken from the logbooks (and from parsing svx files)
by making it happen in make-folklist.py A link to this troggle page has been added to folk/index.htm
<p> by making it happen in make-folklist.py
Troggle scans the blurb and looks for everything between &lt;body&gt; and &lt;hr&gt; <p>
to find the text of the blurb Troggle scans the blurb and looks for everything between &lt;body&gt; and &lt;hr&gt;
(see <var>parsers/people.py</var>) to find the text of the blurb
<p> (see <var>parsers/people.py</var>)
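<p>The "everything between &lt;body&gt; and &lt;hr&gt;" rule can be sketched as below. This is a simplified illustration under the assumption that a blurb page has a single body and that the first &lt;hr&gt; ends the blurb; the names <var>BLURB_RE</var> and <var>blurb_text</var> are invented here and the real logic in <var>parsers/people.py</var> may differ.

```python
import re

# Hypothetical pattern: capture whatever lies between the opening <body ...>
# tag and the first <hr>, across line breaks and regardless of letter case.
BLURB_RE = re.compile(r"<body[^>]*>(.*?)<hr", re.DOTALL | re.IGNORECASE)


def blurb_text(page_html):
    """Return the blurb HTML, or None if the page has no <body>...<hr> span."""
    match = BLURB_RE.search(page_html)
    return match.group(1).strip() if match else None
```

<p>A page with no &lt;hr&gt; yields <var>None</var> in this sketch, so a caller would have to decide whether that is an error or an empty blurb.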
All the blurb files have to be .htm - .html is not recognised by people.py <p style="margin:20px">
and trying to fix this breaks something else (weirdly, not fully investigated). [This now seems to have been fixed (July 2023):<ul><li>
<p> All the blurb files have to be .htm - .html is not recognised by people.py
There seems to be a problem with importing blurbs with more than one image file, even though the code and trying to fix this breaks something else (weirdly, not fully investigated).
in people.py only looks for the first image file but then fails to use it. <li>
There seems to be a problem with importing blurbs with more than one image file, even though the code
<h4>Proposal</h4> in people.py only looks for the first image file but then fails to use it.]</ul>
<p>I would start by replacing the recognisers in <var>urls.py</var> with a slug for an arbitrary text string, and interpreting it in the python code handling the page.
This would entail replacing all the database parsing bits to produce the same slug in the same way. <h4>Proposal</h4>
<p>At that point we should get the 19 people into the system even if all the other crumdph is still there. <p>I would start by replacing the recognisers in <var>urls.py</var> with a slug for an arbitrary text string, and interpreting it in the python code handling the page.
Then we take a deep breath and look at it all again. This would entail replacing all the database parsing bits to produce the same slug in the same way.
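<p>One way the slug idea might look is sketched here. Everything named in it (<var>name_slug</var>, <var>PERSON_URL</var>, the choice of lower-case ASCII with hyphens) is an assumption for illustration, not a decision recorded in this document. The point is that one function would be shared by the URL handler and by every parser, so they cannot disagree about a name.

```python
import re
import unicodedata


def name_slug(full_name):
    """Turn an arbitrary name string into a URL-safe slug."""
    # Decompose accented letters, then drop whatever is not plain ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lower-case, and collapse each run of other characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


# Hypothetical replacement for the name-recognising URL pattern: the URL
# carries an opaque slug and the page handler looks the person up by it.
PERSON_URL = re.compile(r"^person/(?P<slug>[a-z0-9-]+)/?$")
```

<p>With this sketch "Ruairidh MacLeod" becomes <var>ruairidh-macleod</var> and "Wookey" becomes <var>wookey</var>, with no special cases. Two different people could still produce the same slug, so a real implementation would need some rule for making slugs unique.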
<p>At that point we should get the 19 people into the system even if all the other crumdph is still there.
<h2 id="otherfolk">Folk: pending possible improvements</h2> Then we take a deep breath and look at it all again.
<p>Read about the <a href="../computing/folkupdate.html">folklist script</a> before reading the rest of this. <h2 id="otherfolk">Folk: pending possible improvements</h2>
<p>This does some basic validation: it checks that the mugshot
images and blurb HTML files exist. <p>Read about the <a href="../computing/folkupdate.html">folklist script</a> before reading the rest of this.
<p>This does some basic validation: it checks that the mugshot
<p> The folk.csv file could be split: images and blurb HTML files exist.
<br>
folk-1.csv will be for old cavers who will not come again, so this file need never be touched. <p> The folk.csv file could be split:
<br> <br>
folk-2.csv will be for recent cavers and the current expo, this needs editing every year folk-1.csv will be for old cavers who will not come again, so this file need never be touched.
<br>
<p> folk-2.csv will be for recent cavers and the current expo, this needs editing every year
The year headings of folk-1 and folk-2 need to be accurate, but they do not need to be
the same columns. So folk-2 can start in a much later year. <p>
The year headings of folk-1 and folk-2 need to be accurate, but they do not need to be
<p> the same columns. So folk-2 can start in a much later year.
folk-0 will be for awkward buggers whose attendance spans decades. This needs updating whenever
one of these lags attends: <p>
AERW, Becka, Mark Dougherty, Philip Sargent, Chris Densham, Mike Richardson folk-0 will be for awkward buggers whose attendance spans decades. This needs updating whenever
one of these lags attends:
<p> AERW, Becka, Mark Dougherty, Philip Sargent, Chris Densham, Mike Richardson
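<p>A rough sketch of how split files with different year columns could be read back into one table follows. The header names used in the example data and the function <var>merge_folk</var> are hypothetical; the sketch only assumes that each file has a header row and that year columns are headed by the year as digits.

```python
import csv
import io


def merge_folk(*csv_texts):
    """Merge folk CSV texts whose year columns may cover different ranges.

    Returns (rows, years): rows is a list of dicts and years is the sorted
    union of all year headings found. Years missing from a file are filled
    with an empty string, meaning "did not attend".
    """
    rows = []
    years = set()
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            rows.append(row)
            # Treat any all-digit heading as a year column (an assumption).
            years.update(col for col in row if col and col.isdigit())
    ordered_years = sorted(years, key=int)
    for row in rows:
        for year in ordered_years:
            row.setdefault(year, "")
    return rows, ordered_years
```

<p>Under these assumptions folk-1 could stop at an early year and folk-2 start much later, and the merged rows would still line up by year heading.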
Currently (August 2022) the software ignores folk-0, -1, -2 and we have used the old folk.csv for
the 2022 expo. But we hope to have this fixed next year... <p>
Currently (July 2023) the software ignores folk-0, -1, -2 and we have used the old folk.csv for
<hr /> the 2023 expo. But we hope to have this fixed next year...
Return to: <a href="trogdesign.html">Troggle design and future implementations</a><br />
Return to: <a href="trogintro.html">Troggle intro</a><br /> <hr />
Troggle index: Return to: <a href="trogdesign.html">Troggle design and future implementations</a><br />
<a href="trogindex.html">Index of all troggle documents</a><br /><hr /> Return to: <a href="trogintro.html">Troggle intro</a><br />
</body> Troggle index:
</html> <a href="trogindex.html">Index of all troggle documents</a><br /><hr /></body>
</html>