updated re recent changes

2025-03-21 01:31:56 +00:00 · 2023-10-04 23:21:14 +03:00 · 2023-10-04 23:21:14 +03:00 · 1750c5a623
commit 1750c5a623
parent 573c6d3f3b
1 changed file with 38 additions and 34 deletions
--- a/handbook/troggle/namesredesign.html
+++ b/handbook/troggle/namesredesign.html
@@ -19,30 +19,43 @@

 </ul>

-<h2 id="why">Names: Why we need a change</h2>
+<h2 id="why">Names: Why it is a problem</h2>


-<p>The <a href="#whatold">current system</a> completely fails with names which are in any way "non standard".
-Troggle can't cope with a name not structured as
+<p>The <a href="#whatold">former system</a> completely failed with names which were in any way "non standard".
+Troggle couldn't cope with a name not structured as
 "Forename Surname": where it is only two words and each begins with a capital letter (with no other punctuation,
 capital letters or other names or initials). 
-<p>There are 19 people for which the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing
-are different. Reconciling these (find easily using a link checker scanner on the
-folk/.index.htm file) is a job that needs to be done. Every name in the generated
-index.htm now has a hyperlink which goes to the troggle page about that person. Except 
-for those 19 people.
+<p>There were 19 people for whom the troggle name parsing and the separate <a href="scriptscurrent.html#folk">folklist script</a> parsing
+were different. 

-This has to be fixed as it affects ~5% of our expoers. 
-<p><em>[This document originally written 31 August 2022]</em>

 <h2 id="maint">Names: Maintenance constraints</h2>
 <p>We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal". This is a pain to maintain.

-<h2 id="whatold">Names: How it works now</h2>
-<p>Fundamentally we have regexes detecting whether something is a name or not - in several places. These should all be replaced by properly delimited strings.
+<h2 id="whatold">Names: How it works</h2>
+<p>Fundamentally we have regexes detecting whether something is a name or not - in several places in the different types of raw data. However we do now use unique 'slugs' for the references between pages (since Sept. 2023).
 <h4>Four different bits</h4>
+
 <ul>
-<li>In <var>urls.py</var> we have
+<li>We have the <a href="scriptscurrent.html#folk">folklist script</a> holding  "Forename Surname (nickname)" and  "Surname" as the first two columns in the CSV file. 
+These are used by the standalone script to produce the <var>/folk/index.html</var> which is run manually, and which is also parsed by troggle (by a regex in <var>
+parsers/people.py</var>) only when a full data import is done. This is a problem for people like <var>Lydia-Clare Leather</var> and various 'von' and 'de' middle 
+'names', McLean, MacLeod and McAdam.
+
+<li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files which are parsed (by regexes in <var> parsers/survex.py</var>)  when a full data import is done (or when a survex file is edited online).
+
+    
+
+<li>We have the <var>&lt;div class="trippeople"&gt;&lt;u&gt;Luke&lt;/u&gt;, Hannah&lt;/div&gt;</var> trip people line in each logbook entry.
+These are recognised by a regex in <var>parsers/logbooks.py</var>  when a full data import is done (or when a logbook entry is edited online).
+
+<li>We have the names of people in a list on a wallet: this is necessary when the wallet has no attached survex file. But even when there are (one or more) attached survex files, there is a place to input a list of people's names as well. This is parsed by <var>parsers/scans.py</var>.
+</ul>
+<p>Frankly it's amazing it even appears to work at all.
+
+<p>
+In <var>urls.py</var> we used to have
 <code>
    re_path(r'^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"),
  
@@ -50,23 +63,19 @@ This has to be fixed as it affects ~5% of our expoers.
  
  re_path('wallets/person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?',   walletslistperson,  name="walletslistperson"), 
 </code>
-where the transmission noise is attmpting to recognise a name and split it into &lt;first_name&gt; and &lt;last_name&gt;. 
-Naturally this fails horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>.

-<li>We have the <a href="scriptscurrent.html#folk">folklist script</a> holding  "Forename Surname (nickname)" and  "Surname" as the first two columns in the CSV file. 
-These are used by the standalone script to produce the <var>/folk/index.html</var> which is run manually, and which is also parsed by troggle (by a regex in <var>
-parsers/people.py</var>) only when a full data import is done. Which it gets wrong for people like <var>Lydia-Clare Leather</var> and various 'von' and 'de' middle 
-'names', McLean, MacLeod and McAdam.
-
-<li>We have the <var>*team notes Becka Lawson</var> lines in all our survex files which are parsed (by regexes in <var> parsers/survex.py</var>) only when a full data 
-import is done. 
-
-    
-
-<li>We have the <var>&lt;div class="trippeople"&gt;&lt;u&gt;Luke&lt;/u&gt;, Hannah&lt;/div&gt;</var> trip people line in each logbook entry.
-These are recognised by a regex in <var>parsers/logbooks.py</var> only when a full data import is done.
-</ul>
-<p>Frankly it's amazing it even appears to work at all.
+where the 'transmission noise' is attempting to recognise a name and split it into &lt;first_name&gt; and &lt;last_name&gt;. 
+Naturally this failed horribly even for relatively straightforward names such as <em>Ruairidh MacLeod</em>.
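To see how the old pattern splits a name, it can be run outside Django with Python's `re` module. This is only a sketch: the pattern is copied from the `person` entry quoted above, but the helper `split_person_url` and the concatenated URL form (`person/Lydia-ClareLeather`) are assumptions made for illustration, not something taken from the troggle code.

```python
import re

# Pattern copied from the old 'person' entry in urls.py quoted above.
OLD_PERSON_PATTERN = re.compile(
    r"^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*"
    r"(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?"
)


def split_person_url(url_path):
    """Hypothetical helper: return (first_name, last_name) as the old regex sees them."""
    match = OLD_PERSON_PATTERN.match(url_path)
    if match is None:
        return None
    return match.group("first_name"), match.group("last_name")


# A plain two-word name splits as intended (URL form is an assumption).
print(split_person_url("person/BeckaLawson"))        # ('Becka', 'Lawson')

# A hyphenated forename is cut at the hyphen, so the "surname"
# swallows the second half of the forename.
print(split_person_url("person/Lydia-ClareLeather"))  # ('Lydia-', 'ClareLeather')
```

The second result shows the kind of mis-split that any later lookup by forename and surname would then have to cope with.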
+<p>
+We now [October 2023] have
+<code>
+   path('person/&lt;slug:slug&gt;', person, name="person"),<br />
+   
+   path('personexpedition/&lt;slug:slug&gt;/&lt;int:year&gt;', personexpedition, name="personexpedition"),<br />
+   
+   path('wallets/person/&lt;slug:slug&gt;',   walletslistperson,  name="walletslistperson"), 
+</code>
+which is a lot easier to maintain.
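The slug routes only need every part of the system to derive the same slug from the same name. As a minimal sketch of that idea (the function `name_to_slug` and its exact rules are hypothetical; the slug scheme troggle actually uses is not shown here), a name could be reduced to a slug like this:

```python
import re
import unicodedata

# The character set accepted by Django's built-in 'slug' path converter.
DJANGO_SLUG = re.compile(r"[-a-zA-Z0-9_]+")


def name_to_slug(full_name):
    """Hypothetical slug rule: ASCII-fold, lowercase, hyphen-separate."""
    # Strip accents by decomposing characters and dropping non-ASCII marks.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    return hyphenated.strip("-")


for name in ["Ruairidh MacLeod", "Lydia-Clare Leather", "Mike the Animal", "Wookey"]:
    slug = name_to_slug(name)
    # Every slug produced this way is acceptable to <slug:slug> in a path().
    assert DJANGO_SLUG.fullmatch(slug)
    print(name, "->", slug)
```

With a rule of this kind, the names that needed special handling ("Wookey", "Mike the Animal") and the ones the old regex mis-split go through the same code path, and the URL pattern no longer has to guess where the forename ends.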

 <h4>Troggle folk data importing</h4>
 <p>
@@ -91,11 +100,6 @@ and trying to fix this breaks something else (weirdly, not fully investigated).
 There seems to be a problem with importing blurbs with more than one image file, even though the code
  in people.py only looks for the first image file but then fails to use it.]</ul>

-<h4>Proposal</h4>
-<p>I would start by replacing the recognisers in <var>urls.py</var> with a slug for an arbitrary text string, and interpreting it in the python code handling the page. 
-This would entail replacing all the database parsing  bits to produce the same slug in the same way.
-<p>At that point we should get the 19 people into the system even if all the other crumdph is still there.
-Then we take a deep breath and look at it all again.

 <h2 id="otherfolk">Folk: pending possible improvements</h2>