From 1750c5a62314977be6ed4f911ffea7c0747ac816 Mon Sep 17 00:00:00 2001
From: Philip Sargent <philip.sargent@gmail.com>
Date: Wed, 4 Oct 2023 23:21:14 +0300
Subject: [PATCH] updated re recent chnages

---
 handbook/troggle/namesredesign.html | 72 +++++++++++++++--------------
 1 file changed, 38 insertions(+), 34 deletions(-)

diff --git a/handbook/troggle/namesredesign.html b/handbook/troggle/namesredesign.html
index d63de7d83..b726e15bb 100644
--- a/handbook/troggle/namesredesign.html
+++ b/handbook/troggle/namesredesign.html
@@ -19,30 +19,43 @@
 </ul>
-<h2 id="why">Names: Why we need a change</h2>
+<h2 id="why">Names: Why it is a problem</h2>
-<p>The <a href="#whatold">current system</a> completely fails with names which are in any way "non standard".
-Troggle can't cope with a name not structured as
+<p>The <a href="#whatold">former system</a> completely failed with names which are in any way "non standard".
+Troggle couldn't cope with a name not structured as
 "Forename Surname": where it is only two words and each begins with a capital letter (with no other punctuation, capital letters or other names or initials).
-<p>There are 19 people for which the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing
-are different. Reconciling these (find easily using a link checker scanner on the
-folk/.index.htm file) is a job that needs to be done. Every name in the generated
-index.htm now has a hyperlink which goes to the troggle page about that person. Except
-for those 19 people.
+<p>There were 19 people for which the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing
+were different.
-This has to be fixed as it affects ~5% of our expoers.
-<p><em>[This document originally written 31 August 2022]</em>
 <h2 id="maint">Names: Maintenance constraints</h2>
 <p>We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal".
 This is a pain to maintain.
-<h2 id="whatold">Names: How it works now</h2>
-<p>Fundamentally we have regexes detecting whether something is a name or not - in several places. These should all be replaced by properly delimited strings.
+<h2 id="whatold">Names: How it works</h2>
+<p>Fundamentally we have regexes detecting whether something is a name or not - in several places in the different types of raw data. However we do now use unique 'slugs' for the references between pages (since Sept. 2023).
 <h4>Four different bits</h4>
+
 <ul>
-<li>In <var>urls.py</var> we have
+<li>We have the <a href="scriptscurrent.html#folk">folklist script</a> holding "Forename Surname (nickname)" and "Surname" as the first two columns in the CSV file.
+These are used by the standalone script to produce the <var>/folk/index.html</var> which is run manually, and which is also parsed by troggle (by a regex in <var>
+parsers/people.py</var>) only when a full data import is done. This is a problem for people like <var>Lydia-Clare Leather</var> and various 'von' and 'de' middle
+'names', McLean, MacLeod and McAdam.
+
+<li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files which are parsed (by regexes in <var> parsers/survex.py</var>) when a full data import is done (or when a survex file is edited online).
+
+
+
+<li>We have the <var><div class="trippeople"><u>Luke</u>, Hannah</div></var> trip people line in each logbook entry.
+These are recognised by a regex in <var>parsers/logbooks.py</var> when a full data import is done (or when a logbook entry is edited online).
+
+<li>We have the names of people in a list on a wallet: which is necessary when the wallet has no attached survex file. But even when there are (one or more) attached survex files, there is a place to input a list of people's names as well. This is parsed by <var>parsers/scans.py</var>.
+</ul>
+<p>Frankly it's amazing it even appears to work at all.
+
+<p>
+In <var>urls.py</var> we used to have
 <code>
 re_path(r'^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"),
@@ -50,23 +63,19 @@ This has to be fixed as it affects ~5% of our expoers.
 re_path('wallets/person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', walletslistperson, name="walletslistperson"),
 </code>
-where the transmission noise is attmpting to recognise a name and split it into <first_name> and <last_name>.
-Naturally this fails horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>.
-<li>We have the <a href="scriptscurrent.html#folk">folklist script</a> holding "Forename Surname (nickname)" and "Surname" as the first two columns in the CSV file.
-These are used by the standalone script to produce the <var>/folk/index.html</var> which is run manually, and which is also parsed by troggle (by a regex in <var>
-parsers/people.py</var>) only when a full data import is done. Which it gets wrong for people like <var>Lydia-Clare Leather</var> and various 'von' and 'de' middle
-'names', McLean, MacLeod and McAdam.
-
-<li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files which are parsed (by regexes in <var> parsers/survex.py</var>) only when a full data
-import is done.
-
-
-
-<li>We have the <var><div class="trippeople"><u>Luke</u>, Hannah</div></var> trip people line in each logbook entry.
-These are recognised by a regex in <var>parsers/logbooks.py</var> only when a full data import is done.
-</ul>
-<p>Frankly it's amazing it even appears to work at all.
+where the 'transmission noise' is attempting to recognise a name and split it into <first_name> and <last_name>.
+Naturally this failed horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>.
+<p>
+We now [October 2023] have
+<code>
+ path('person/<slug:slug>', person, name="person"),<br />
+
+ path('personexpedition/<slug:slug>/<int:year>', personexpedition, name="personexpedition"),<br />
+
+ path('wallets/person/<slug:slug>', walletslistperson, name="walletslistperson"),
+</code>
+which is a lot easier to maintain.
 <h4>Troggle folk data importing</h4>
 <p>
@@ -91,11 +100,6 @@ and trying to fix this breaks something else (weirdly, not fully investigated).
 There seems to be a problem with importing blurbs with more than one image file, even those the code in people.py only looks for the first image file but then fails to use it.]</ul>
-<h4>Proposal</h4>
-<p>I would start by replacing the recognisers in <var>urls.py</var> with a slug for an arbitrary text string, and interpreting it in the python code handling the page.
-This would entail replacing all the database parsing bits to produce the same slug in the same way.
-<p>At that point we should get the 19 people into the system even if all the other crumdph is still there.
-Then we take a deep breath and look at it all again.
 <h2 id="otherfolk">Folk: pending possible improvements</h2>