From: Philip Sargent (Gmail) [mailto:philip.sargent@gmail.com] Sent: 19 April 2020 01:28 [original - since edited with extra refs.] To: expo-tech@lists.wookware.org Subject: vague thoughts about future troggle architecture
At our last virtual pub Sam confirmed that using today's tools to re-partition troggle with all the user interface in the user's browser would be utterly horrible using current tools (javascript frameworks: react, angular etc.).
These front-end frameworks get out of date in couple of years or so. So they don't give us the decade-long stability we need to match available maintenance effort. [ See Wikipedia list of javascript frameworks.] With our deep historical perspective ("cough"), we can expect this menagerie to sort itself out into a stable, standardised foundation within the next couple of decades but probably not within the next 10 years.
A web API to expose the troggle database (read-only) would allow some keen person to write a special-purpose app on a phone, e.g. an entrance-locator app, talking directly to the database. But replacing the whole user interface does not seem feasible yet. In 10 years: probably.
It did occur to me that we are missing a trick: 99+% of the database doesn't change except for survey data updates which, apart from during expo, happen only every week or so. And the database is only 10 MB so is entirely feasible to copy absolutely everything into the browser except for scanned images and photos.
So we could partition troggle so that all the user display bits run in the
browser (or a progressive web app) using a python interpreter running in
javascript. [yeah, expofiles would need some subset labelled as needing to
be forcibly downloaded, but the rest coming only on demand.] Some django
enthusiast must have done this already surely ? Ah yes, Brython.
github.com/brython-dev/brython
www.brython.info and
Skulpt (which has, since 2017, a full-blown
commerical system() built on top of it - by a CambridgeCL spinout)
Which is fun, but not useful. And not just because it may be immature. None of this addresses our biggest problem: devising something that can be maintained by fewer, less-expert people who can only devote short snippets of time and not long-duration immersion.
I know Wookey has been thinking of a loose federation of independent scripts working on the same data, but the more I look at troggle and the tasks it does the less I feel that would work. At the core there is a common data model that everything must understand - and the only unambiguous way of presenting that data model is working code, e.g. see Troggle architecture and click on the image to see a bigger copy. [It is out of date - if someone can quickly generate an update that would be nice. It's on my to-do list..] Much of what wallets.py does (originally by Martin Green) is in troggle already - but better. [There is a many:many relationship between svx files and wallet directories in reality, not 1:1]
Another possibility is ripping django out of troggle and leaving bare python plus a SQL database [see Trog2030 proposal]. This means that programmers would need to understand more SQL but would not need to understand "django". Arguably this could mean that we could gain.
Writing our own multi-user code would not be sensible, hence the database. But we could move to a read-only system where the only writing happens on data-import. Then we could use python 'pickle()' or 'json()' read-only data structures, but we would need to create all our own indexing and cross-referencing code (which is a much bigger job* than you might think).
There would be more lower-level code, but the different segments of the system could be in caving-sensible modules not django-meaningful modules. And we would not have all the extra language-like constructs that django introduces e.g. X.objects.set_all(), which modern editors complain about because it is a django idiom and not a function within the python codebase. (We could retain an HTML templating engine though.)
The above discussion is extremely ignorant in a couple of respects. Now (April 2021) we can properly appreciate that the part of Django that interacts with a database is actually a small part of the system. The http request/response engine is not easily replaced. And the 90 or so HTML templates do not just reformat the data given to them in python dictionaries: they directly query and traverse the database to produce tabular output. So if we 'took out' the database, most of our templates would fail utterly and need completely rewriting. It could be done, but the manpower requirement is not trivial.
There is a templating engine Nunjucks which is a port to JavaScript of the Django templating system we use (via Jinja - these are the same people who do Flask). This would be an obvious thing to use if we needed to go in that direction.
We need a templating engine because so much of the troggle coorindation output is in tables of data from diffrerent sources, e.g. see all survey data for 264.
Several organisations have moved their user-interface layer to the browser using Nunjucks including the NHS digital service and Firefox.
Currently every troggle code operation uses the django ORM search and filter operations on the central database to find any object it needs. If we don't have a central database then we have to use direct object references and we need to think about the design of a central registry object to hold these. There is a well-studied design pattern that describes this design "Big Ball of Mud" which and the contributing actions "Piecemeal growth" and "Sweeping it under the rug".
We are always using one object, e.g. a wallet, just to get at another object, e.g. a scan of some original notes, in order to check the data we are checking, e.g. a survex file. Maintaining two-way dependencies amoung all the objects is what "foreign keys" do in a database, but the problem doesn't go away when we don't have a database: it gets slightly harder.
One thing that is easier with troggle is that we don't have many object lifecycle issues. Everything is created once and lasts forever. There are only a few ephemeral objects during the initial data import from files.
Troggle today doesn't need anything complex, a single registry singleton would probably be fine (though hard to test), but if it evolves towards being a set of interacting services then a more sophisticated architecture would be needed.
The Java community found "dependency resolution" very helpful for wiring-up loosely objects/components in the late 1990s with the "Inversion of Control" technique which can be implemented in several ways, most commonly using "Dependency Injection". But for troggle we must be careful that doing this the "right" way may make the code even more inaccessible to novice caver-programmers than django is. Which is the whole point of moving away from django. Fortunately python programmers have produced some recent guidance: Python Dependency Injection Made Simple and Dependency injection and inversion of control in Python. We should probably use the simpler "Constructor Injection" variation as we need to make all our code more easily testable. Flask uses that.