diff --git a/handbook/troggle/namesredesign.html b/handbook/troggle/namesredesign.html
index 4a69cdd7a..09cdbeabe 100644
--- a/handbook/troggle/namesredesign.html
+++ b/handbook/troggle/namesredesign.html
@@ -1,126 +1,127 @@
-
-
-
- -The current system completely fails with names which are in any way "non standard". -Troggle can't cope with a name not structured as -"Forename Surname": where it is only two words and each begins with a capital letter (with no other punctuation, -capital letters or other names or initials). -
There are 19 people for which the troggle name parsing and the separate folklist script parsing -are different. Reconciling these (find easily using a link checker scanner on the -folk/.index.htm file) is a job that needs to be done. Every name in the generated -index.htm now has a hyperlink which goes to the troggle page about that person. Except -for those 19 people. - -This has to be fixed as it affects ~5% of our expoers. -
[This document originally written 31 August 2022] - -
We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal". This is a pain to maintain. - -
Fundamentally we have regexes detecting whether something is a name or not - in several places. These should all be replaced by properly delimited strings. -
- re_path(r'^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"),
- re_path(r'^personexpedition/(?P<first_name>[A-Z]*[a-z&;]*)[^a-zA-Z]*(?P<last_name>[A-Z]*[a-zA-Z&;]*)/(?P\d+)/?$', personexpedition, name="personexpedition"),
-
-where the transmission noise is attmpting to recognise a name and split it into <first_name> and <last_name>.
-Naturally this fails horribly even for relatively straightforward names such as Ruairidh MacLeod.
-
-Frankly it's amazing it even appears to work at all. - -
-Troggle reads the mugshot and blurb about each person.
-It reads it direct from folk.csv which has fields of URL links to those files.
-It does this when troggle is run with
-python databaseReset.py people
-
-Troggle generates its own blurb about each person, including past expeditions and trips -taken from the logbooks (and from parsing svx files) -A link to this troggle page has been added to folk/index.htm -by making it happen in make-folklist.py -
-Troggle scans the blurb and looks for everything between <body> and <hr> -to find the text of the blurb -(see parsers/people.py) -
-All the blurb files have to be .htm - .html is not recognised by people.py -and trying to fix this breaks something else (weirdly, not fully investigated). -
-There seems to be a problem with importing blurbs with more than one image file, even those the code -in people.py only looks for the first image file but then fails to use it. - -
I would start by replacing the recognisers in urls.py with a slug for an arbitrary text string, and interpreting it in the python code handling the page. -This would entail replacing all the database parsing bits to produce the same slug in the same way. -
At that point we should get the 19 people into the system even if all the other crumdph is still there. -Then we take a deep breath and look at it all again. - -
Read about the folklist script before reading the rest of this. -
This does some basic validation: it checks that the mugshot -images and blurb HTML files exist. - -
The folk.csv file could be split:
-
-folk-1.csv will be for old cavers who will not come again, so this file need never be touched.
-
-folk-2.csv will be for recent cavers and the current expo, this needs editing every year
-
-
-The year headings of folk-1 and folk-2 need to be accurate , but they do not need to be -the same columns. So folk-2 can start in a much later year. - -
-folk-0 will be for awkward buggers whose attendance spans decades. This needs updating whenever -one of these lags attends: -AERW, Becka, Mark Dougherty, Philip Sargent, Chris Densham, Mike Richardson - -
-Currently (August 2022) the software ignores folk-0, -1, -2 and we have used the old folk.csv for -the 2022 expo. But we hope to have this fixed next year... - -
The current system completely fails with names which are in any way "non-standard".
+Troggle can't cope with a name that is not structured as
+"Forename Surname", i.e. exactly two words, each beginning with a capital letter (with no other punctuation,
+capital letters, names or initials).
+
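The rule described above can be expressed as a single regular expression. The sketch below encodes that description only. It is not taken from troggle, and `STANDARD_NAME` and `is_standard_name` are hypothetical names.

```python
import re

# Hypothetical check encoding the rule described above: exactly two words,
# each a capital letter followed only by lower-case letters.
STANDARD_NAME = re.compile(r"[A-Z][a-z]+ [A-Z][a-z]+")


def is_standard_name(name: str) -> bool:
    """Return True if `name` has the plain 'Forename Surname' shape."""
    return STANDARD_NAME.fullmatch(name) is not None


for name in ["Philip Sargent", "Ruairidh MacLeod", "Wookey", "Mike the Animal"]:
    print(f"{name!r}: {is_standard_name(name)}")
```

Only "Philip Sargent" passes. The other three fail the check, and these are the kinds of names this page is about.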
There are 19 people for whom the troggle name parsing and the separate folklist script parsing
+give different results. Reconciling these (they are easily found by running a link checker over the
+folk/index.htm file) is a job that still needs to be done. Every name in the generated
+index.htm now has a hyperlink to the troggle page about that person, except
+for those 19 people.
+
+This has to be fixed as it affects ~5% of our expoers.
+
[This document originally written 31 August 2022] + +
We have special code scattered across troggle to cope with "Wookey", "Wiggy" and "Mike the Animal". This is a pain to maintain. + +
Fundamentally, we have regexes in several places that try to detect whether something is a name or not. These should all be replaced by properly delimited strings.
+ re_path(r'^person/(?P<first_name>[A-Z]*[a-z\-\'&;]*)[^a-zA-Z]*(?P<last_name>[a-z\-\']*[^a-zA-Z]*[\-]*[A-Z]*[a-zA-Z\-&;]*)/?', person, name="person"),
+ re_path(r'^personexpedition/(?P<first_name>[A-Z]*[a-z&;]*)[^a-zA-Z]*(?P<last_name>[A-Z]*[a-zA-Z&;]*)/(?P\d+)/?$', personexpedition, name="personexpedition"),
+
+where the transmission noise is attempting to recognise a name and split it into <first_name> and <last_name>.
+Naturally this fails horribly even for relatively straightforward names such as Ruairidh MacLeod.
+
+Frankly it's amazing it even appears to work at all. + +
+Troggle reads the mugshot and blurb about each person.
+It reads them directly from folk.csv, which has fields containing URL links to those files.
+It does this when troggle is run with
+python databaseReset.py people
+
+Troggle generates its own blurb about each person, including past expeditions and trips
+taken from the logbooks (and from parsing svx files).
+A link to this troggle page has been added to folk/index.htm
+by the make-folklist.py script.
+
+Troggle scans the blurb and takes everything between <body> and <hr>
+as the text of the blurb
+(see parsers/people.py).
+
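A minimal sketch of that extraction rule, assuming the blurb files are plain HTML and that a single regular expression is good enough. It illustrates the behaviour described above and is not the code in parsers/people.py; `BODY_TO_HR` and `extract_blurb` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical sketch: the blurb is whatever lies between the opening <body ...>
# tag and the first <hr. Case-insensitive, and "." also matches newlines.
BODY_TO_HR = re.compile(r"<body[^>]*>(.*?)<hr", re.IGNORECASE | re.DOTALL)


def extract_blurb(html: str) -> Optional[str]:
    """Return the HTML between <body ...> and the first <hr, or None if absent."""
    match = BODY_TO_HR.search(html)
    return match.group(1).strip() if match else None


page = "<html><body>\n<h2>A. Caver</h2>\n<p>Came on expo in 1999.</p>\n<hr>\n</body></html>"
print(extract_blurb(page))
```

Returning `None` when either marker is missing gives the caller a clear signal to report a malformed blurb instead of silently importing an empty one.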
+ [This now seems to have been fixed (July 2023):
I would start by replacing the recognisers in urls.py with a slug for an arbitrary text string, and interpreting that slug in the Python code that handles the page.
+This would entail changing all the database parsing code so that it produces the same slug in the same way.
+
At that point we should get the 19 people into the system even if all the other crumdph is still there. +Then we take a deep breath and look at it all again. + +
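A sketch of the slug idea, assuming one shared function (the hypothetical `name_to_slug`) is called both when people are loaded into the database and when a URL is resolved. With Django, the URL pattern could then become something like `path("person/<slug:slug>/", person, name="person")`, because Django's built-in `slug` path converter accepts ASCII letters, digits, hyphens and underscores. Django's own `django.utils.text.slugify` could also stand in for the hand-written function. The `people` list and `lookup` function are purely illustrative.

```python
import re
import unicodedata


def name_to_slug(name: str) -> str:
    """Turn an arbitrary display name into a URL-safe slug.

    Hypothetical helper: accents are stripped, everything is lower-cased, and
    each run of characters other than ASCII letters and digits becomes one hyphen.
    """
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


# Built once from the same source data used by the database import (illustrative).
people = ["Ruairidh MacLeod", "Mike the Animal", "Wookey", "Philip Sargent"]
by_slug = {name_to_slug(n): n for n in people}


def lookup(slug: str) -> str:
    """What a view might do: map the slug from the URL back to a person."""
    return by_slug[slug]


print(name_to_slug("Ruairidh MacLeod"))  # ruairidh-macleod
print(lookup("mike-the-animal"))         # Mike the Animal
```

Because both sides derive the slug from the same function, no regex has to guess where a forename ends and a surname begins. The catch is that two different names could produce the same slug, so an import step would need to check for duplicates.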
Read about the folklist script before reading the rest of this. +
This does some basic validation: it checks that the mugshot +images and blurb HTML files exist. + +
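A sketch of that kind of check, written in Python rather than taken from the folklist script. The column names `Name`, `Mugshot` and `Blurb` are assumptions (check the real header row of folk.csv), as is the idea that the file paths are relative to a single base directory such as folk/.

```python
import csv
import io
import tempfile
from pathlib import Path

# Hypothetical column names: check the real header row of folk.csv.
NAME_COL = "Name"
FILE_COLS = ("Mugshot", "Blurb")


def missing_files(csv_text: str, base_dir: Path) -> list[tuple[str, str]]:
    """Return (person, path) pairs for every referenced file that does not exist."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in FILE_COLS:
            rel = (row.get(col) or "").strip()
            # Empty cells are allowed; only non-empty paths must point at real files.
            if rel and not (base_dir / rel).is_file():
                problems.append((row[NAME_COL], rel))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "i").mkdir()
    (base / "i" / "ann.jpg").write_bytes(b"")
    sample = "Name,Mugshot,Blurb\nAnn Example,i/ann.jpg,l/ann.htm\n"
    print(missing_files(sample, base))  # [('Ann Example', 'l/ann.htm')]
```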
The folk.csv file could be split:
+
+folk-1.csv will be for old cavers who will not come again, so this file need never be touched.
+
+folk-2.csv will be for recent cavers and the current expo; this file needs editing every year.
+
+
+The year headings of folk-1 and folk-2 need to be accurate, but they do not need to be
+in the same columns. So folk-2 can start in a much later year.
+
+
+folk-0 will be for awkward buggers whose attendance spans decades. This needs updating whenever +one of these lags attends: +AERW, Becka, Mark Dougherty, Philip Sargent, Chris Densham, Mike Richardson + +
+Currently (July 2023) the software ignores folk-0, -1, -2 and we have used the old folk.csv for +the 2023 expo. But we hope to have this fixed next year... + +
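If the software is eventually taught to read the split files, one approach is to key attendance by the year heading text rather than by column position, which is what lets folk-1 and folk-2 have different year columns. The sketch below assumes a `Name` column plus one column per year, with a non-empty cell meaning the person attended. This layout is hypothetical and not taken from folk.csv or make-folklist.py.

```python
import csv
import io

NAME_COL = "Name"  # hypothetical heading of the name column


def attendance(csv_text: str) -> dict[str, set[int]]:
    """Map each person in one file to the set of years marked for them."""
    result: dict[str, set[int]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row[NAME_COL].strip()
        years = {
            int(heading)
            for heading, cell in row.items()
            # Skip the name column, any non-year heading and any empty cell.
            if heading and heading.strip().isdigit() and cell and cell.strip()
        }
        result.setdefault(name, set()).update(years)
    return result


def merge(*csv_texts: str) -> dict[str, list[int]]:
    """Combine several files whose year columns need not line up."""
    merged: dict[str, set[int]] = {}
    for text in csv_texts:
        for name, years in attendance(text).items():
            merged.setdefault(name, set()).update(years)
    return {name: sorted(years) for name, years in merged.items()}


folk0 = "Name,1980,2023\nA. Longtimer,x,x\n"
folk1 = "Name,1976,1977\nB. Oldtimer,x,\n"
folk2 = "Name,2019,2023\nC. Newcomer,,x\n"
print(merge(folk0, folk1, folk2))
# {'A. Longtimer': [1980, 2023], 'B. Oldtimer': [1976], 'C. Newcomer': [2023]}
```

Because each file is read through its own header row, folk-2 can start in a much later year without any padding columns.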