expoweb/handbook/troggle/trog2030.html
2024-02-09 00:01:17 +00:00


<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Handbook Troggle Design</title>
<link rel="stylesheet" type="text/css" href="/css/main2.css" />
</head>
<body>
<style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>Troggle in 2025-2030</h1>
<h2>5-Year Plan</h2>
<p>[Philip Sargent, 1 June 2020. Async updates 21 April 2021]
<ul>
<li>I reckon django has at least another 4-5 years left as a very active project (~2025) and at least a decade or so as a well-maintained project.
<li>I reckon python has another 10-20 years at least.
<li>I reckon SQL databases have another 30 years at least.
</ul>
<p>I don't think writing our own object/SQL code is sensible:
there is such a lot going on we would create a large volume of software even if we stick close to the metal.
[I could well be wrong. That is Option 1.]
<h3>Option 2</h3>
<p>
We keep the same architecture as now, and incrementally replace modules that use django/SQL with direct object storage of collections using pickle(), shelve() and json().
Concurrency is not a problem as all data is read-only (this is not <em>entirely</em> true - see below).
We keep the url-rewriting and html-template things in django [and migrate the unit-tests (a recent innovation) from django to be run stand-alone].
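<p>
As a rough illustration of what "direct object storage" could look like, here is a minimal sketch using the standard-library shelve module. The function names, the key format and the fields of each entry are all hypothetical and are not taken from troggle.

```python
import shelve


def save_logbook_entries(store_path, entries):
    """Store each logbook entry (a plain dict) under a 'date:title' key.

    The entry fields used here are assumptions made for this sketch.
    """
    with shelve.open(str(store_path)) as db:
        for entry in entries:
            key = f"{entry['date']}:{entry['title']}"
            db[key] = entry


def load_logbook_entries(store_path, year):
    """Return the entries for one year, in date order, opening the store read-only."""
    with shelve.open(str(store_path), flag="r") as db:
        # Keys start with an ISO date, so sorting the keys sorts by date.
        return [db[key] for key in sorted(db) if key.startswith(f"{year}-")]
```

<p>
The read path opens the shelf with <var>flag="r"</var>, which matches the assumption that almost all of the data is read-only once imported.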
<p>
This means that the "django-ness" part of troggle becomes quite small.
The more modules we replace, the easier it becomes for new people to work on it, and the easier it becomes to migrate it to newer django versions, or to move entirely from django to Jinja2 [or Mako] + a URL-router
[e.g. <a href="https://werkzeug.palletsprojects.com/en/1.0.x/routing/">werkzeug</a> or routes] + a HTTP-request/response system.
The data flow through the system becomes obvious to any python programmer with no django knowledge needed.
<p>
[This could be harder than it looks if cross-referencing and pointers between collections become unmaintainable - a risk we need to watch. But other people are using Redis for this sort of thing. ]
<p>
Being memory-resident, we get a 20x speed increase. Which we don't need.
<p>
So the proposed Option 2 looks a bit like this (django is the "flawed supplier" and pickle() is the "new supplier")<br><a href="https://martinfowler.com/bliki/BranchByAbstraction.html">
<img src="https://martinfowler.com/bliki/images/branch-by-abstraction/step-2.png">Migrate to Replacement Abstraction Layer</a>
<h3>Option 2A</h3>
<p>
We also use a noSQL db with a direct and easy mapping to python collections. The obvious candidates are
<a href="https://www.mongodb.com/">MongoDB</a> or the
<a href="https://en.wikipedia.org/wiki/Zope_Object_Database">Zope Object Database</a> (ZODB). MongoDB is famous and programmers may want to work on it to get the experience, but ZODB is much closer to python. However, ZODB is now rather old, and the Django package django-zodb has not been updated for 10 years. MongoDB, for its part, has a bad impedance mismatch with Django (<a href="https://daniel.feldroy.com/when-to-use-mongodb-with-django.html">Short answer is you don't use MongoDB with Django</a>), which creates a lot of extra pointless work. If we ever need atomic transactions we should use a database and not try to fudge things ourselves, but not either of those.
[This needs to be explored, but I suspect we don't gain much compared with the effort of forcing maintainers to learn a new query language. Shelve() is already adequate.]
<h3>Option 2B</h3>
<p>We migrate to an <a href="https://www.fullstackpython.com/sanic.html">"improved Django"</a> or a "Django-lite". Django is a massive system, and it is moving with agility towards being more asynchronous, but there are already competing projects which do much the same thing in a cleaner way and (being 15 years younger) without the historical baggage, cutting out a lot of now-unneeded complexity. This looks like being a very hopeful possibility. <a href="https://sanicframework.org/en/">Sanic</a> and <a href="https://pgjones.gitlab.io/quart/">Quart</a> look like being the first of a new generation.
<p>The driver for these new Django clones is the
<a href="http://masnun.rocks/2016/11/17/exploring-asyncio-uvloop-sanic-motor/">asynchronous capabilities</a> in python 3.4 and the
<a href="https://www.encode.io/articles/hello-asgi">ASGI interface</a> for web workers which replaces WSGI. These will have much the same effect on python as Node.js had on JavaScript.
<p>But we should not be in too much of a hurry. It will take Sanic (and similar) years to get to a state where things don't
break between versions horribly every 6 months. We lived through
<a href="https://en.wikipedia.org/wiki/Django_(web_framework)#Version_history">that nastiness</a>
with Django 2006-2016 and we don't want to fall into the same mess again. Django has only just got sensibly mature
and has stopped breaking so much with every new release.
<p>A real architectural option would be to move to <a href="https://redis.io/">Redis</a> as an in-memory database
instead of using MariaDB. We don't need a fully-fledged database and these new frameworks are less closely tied to
having a SQL database and object-relational mapper (ORM) than Django is. This ties in neatly with Option 2 above:
reduce our use of the database functions within Django to vanishing point. [Oops. That would be a mistake. It is the SQL database that handles
all the multi-user non-interference, not Django. So unless we want to roll our own real-time software synchronisation.. Better not go there.]
<p>ASGI, which Django supports too (from v3.0), has the interesting effect that we no longer need a webserver like Apache or nginx to buffer requests. We can use very lightweight <a href="https://www.uvicorn.org/">uvicorn</a> instead.
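<p>To give a feel for how small the ASGI interface is, here is a minimal sketch of an ASGI application written as a bare coroutine, with no framework at all. The helper <var>call_once</var> is a hypothetical stand-in for a server, included only so the sketch can be exercised without installing one.

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI application: answer every HTTP request with plain text."""
    if scope["type"] != "http":
        return  # this sketch only handles plain HTTP requests
    body = f"Hello from {scope['path']}".encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": body})


def call_once(path):
    """Drive the app with one fake GET request and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": path}, receive, send))
    return sent
```

<p>Saved as a hypothetical module <var>hello.py</var>, the same <var>app</var> object could be served with <var>uvicorn hello:app</var>.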
<h3>Option Zero</h3>
<div style="color:blue">
<p>We need to de-cruft troggle *first*: remove 'good ideas' which were never used, trim redundancies and generally reduce it in size significantly.
<p>
We should make a list of the functions that we will drop, those that we will replace by parsing cavern output, and those that we will calculate
while importing/reading svx files.
<p>
Documentation and a working list of on-going programming projects is the key to keeping troggle in a state where someone can pick it up and do a
useful week's work, e.g. extracting the parsed logbooks to use shelve() storage throughout instead of SQL. The next time someone like Radost comes
along during the next 5 years we want to be able to use them effectively.
</div>
<p>[Decrufting and refactoring has been continuous since 2020. The ExpeditionDay class was refactored away and many unused properties of classes have been removed. DB query optimisation is only just beginning in 2023 though.]
<h3>Option Zero.1</h3>
<div style="color:blue">
<p>We should probably review and revise all the over-complex templates, originally written in 2006, which do serious amounts of database querying,
and linked object sub-querying, within the template code. This is a nightmare to maintain and debug. <!--e.g. see
the note at the bottom of <a href="/personexpedition/Wookey/1999">Wookey 1999 trips</a> - this fixed itself after de-crufting elsewhere -->
<p>A possible goal would be to create all the data that will be displayed in a page as dictionaries - generated by obviously simple python, with
some Django query optimisations if necessary - which are then handed as 'context' to the Django page template (those files in <var>
troggle/templates/xxx.html</var>), instead of using the Django-specific database object requests within the template sublanguage, e.g.
<code><pre>
{% if persondate.2 %}
&lt;td class="survexblock"&gt;&lt;a href="{% url "svx" persondate.2.survexfile.path %}"&gt;{{persondate.2.name}}&lt;/a&gt;&lt;/td&gt;
{%comment%}
&lt;td class="roles" style="padding-right: 3px; text-align:right"&gt;
{% for survexpersonrole in persondate.2.survexpersonrole_set.all %}
{{survexpersonrole.nrole}}
{% endfor %}
&lt;/td&gt;{%endcomment%}
&lt;td style="text-align:right"&gt;
{{persondate.2.legslength|stringformat:".1f"}} m
&lt;/td&gt;
{% else %}
&lt;td colspan="3"&gt; &lt;/td&gt;
{% endif %}
</pre></code>
</div>
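<p>A minimal sketch of the "dictionaries as context" idea, assuming that each <var>persondate</var> is a sequence whose third item is either a survex block or nothing, as the template fragment above suggests. The attribute names come from that fragment; the function names and the shape of the resulting dictionary are hypothetical.

```python
def survey_row(block):
    """Flatten one survex block into the plain values the page displays."""
    if block is None:
        return None  # the template would render the empty three-column cell
    return {
        "name": block.name,
        "svx_path": block.survexfile.path,
        # Same formatting as the stringformat:".1f" filter in the template.
        "legs_length": f"{block.legslength:.1f} m",
    }


def build_context(persondates):
    """Build the whole template context from plain python, with no queries left for the template."""
    return {"rows": [survey_row(persondate[2]) for persondate in persondates]}
```

<p>The template then only needs to loop over <var>rows</var> and print values, so any database access happens in one obvious place in the python code.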
<h3>Things that could be a bit sticky 1 - multi-user safety</h3>
<p>Multi-user synchronous use could be a bit tricky without a solid multi-user database sitting behind the python code. So removing all the SQL database use may not be what we want to do after all.
<p>Under all conceivable circumstances we would continue to use WSGI or
<a href="https://asgi.readthedocs.io/en/latest/introduction.html">ASGI</a> to connect our python code to a user-facing
webserver (apache, nginx, gunicorn). Every time a webpage is served, it is done by a separate thread in the webserver and essentially a
new instance of Django is created to serve it. Django relies on its multi-user SQL database (MariaDB, postgresql) to ensure that competing
updates by two instantiations of itself to the same stored object are correctly atomic. But even today, if two people try to update the
same handbook <em>webpage</em>, or the same survex <em>file</em>, at the same time we expect horrible corruption of the data. Even today,
with the SQL database, writing <em>files</em> is not coded in a properly multi-user manner. We should write some file lock/serializer code
to make this safe.
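<p>A minimal sketch of what such file lock/serializer code might look like, using only the standard library. It is an assumption-laden illustration, not existing troggle code: the function names are hypothetical, and a crashed process would leave a stale lock file behind, which real code would have to deal with.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(target, timeout=5.0, poll=0.05):
    """Hold an exclusive lock file next to `target` while the block runs."""
    lock_path = Path(str(target) + ".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if someone else already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {target}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def safe_write(target, text):
    """Write `text` to `target` under the lock, replacing the old file in one step."""
    target = Path(target)
    with file_lock(target):
        tmp = target.with_name(target.name + ".tmp")
        tmp.write_text(text, encoding="utf-8")
        os.replace(tmp, target)  # readers see either the old or the new file
```

<p>This serialises writers to the same file, but it does not detect that a second editor started from an out-of-date copy; that would need a version check on top.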
<p>The move by <a href="https://arunrocks.com/a-guide-to-asgi-in-django-30-and-its-performance/">Django
from single-threaded WSGI to asynchronous ASGI</a> began with v3.0 and was almost complete for 'views' in 3.2.
This makes the server more responsive,
but doesn't really change anything from the perspective of our need to stop users overwriting each others' work. If we just store
everything in in-memory dictionaries we may need to write our own asyncio python to do that synchronization. That would be a Bad Thing as
we are trying to make future maintenance easier, not harder. But it looks like Redis could be the solution for us.
<p style = "margin-left: 5em; margin-right: 3em"> [ Redis does multi-user concurrency (which is what we need), but most people don't use it like that.
Nearly everyone who uses Redis with Django uses it as a page cache, not as the fundamental store. So public
experience is likely to be less useful than we may hope. Learning to use Redis would probably also mean getting to grips with
Django's <a href="https://docs.djangoproject.com/en/dev/topics/cache/#redis">existing cache system</a>,
which is complex and which we don't use.
<br>
NB We do cache one page per expedition, but explicitly in our own python code. So this is a per-process cache which is good for one
person's intensive use, much like Django's
<a href="https://docs.djangoproject.com/en/dev/topics/cache/#local-memory-caching">Local-memory caching</a> ]</p>
<h3 id="frontends">Things that could be a bit sticky 2 - front-end code</h3>
<p>There is not yet a front-end (javascript or <a href="https://en.wikipedia.org/wiki/WebAssembly">WebAssembly</a>) framework on the client, i.e. a phone app or webpage, which is stable enough for us to commit
effort to (we managed to remove all the jQuery by using recent HTML5 capabilities). Flask looks interesting
(but <a href="https://adamj.eu/tech/2019/04/03/django-versus-flask-with-single-file-applications/">maybe is only simpler when
starting a new project and doesn't scale to complexity</a> the way Django does), but maybe in 2025 we
could see a good way to move all the user interface (rewritten to be GIS-centric?) to the client
(re-written in <a href="https://www.educba.com/typescript-vs-dart/">Typescript
or Dart</a>) and just have an API on the server. [We already have a proof of principle JSON export API working, see
<a href="exportjson.html">Troggle JSON</a>.]
<h4>front-end code is not just a pretty face</h4>
<p>Modern JavaScript frameworks support dynamic 'single-page websites' where all the component parts are fetched and replaced
asynchronously (this used to be called <a href="https://en.wikipedia.org/wiki/Ajax_%28programming%29">AJAX</a> when it first appeared in
1999). This is fundamentally different from how Django was originally designed: using public URLs connected to code which produces a
complete webpage based on a single template. Django <a href="https://engineertodeveloper.com/how-to-use-ajax-with-django/">can interoperate
</a> with dynamic systems but support will <a href=
"https://speakerdeck.com/andrewgodwin/just-add-await-retrofitting-async-into-django?slide=76">become increasingly baroque</a> I imagine.
<h3>Things that could be a bit sticky 3 - GIS</h3>
<p>
New functionality: e.g. making the whole thing GIS-centric is a possibility.
A GIS db could make a lot of sense. Expo has GIS expertise and we have a lot of badly-integrated GPS data, so this needs a lot of thinking to be done and we should get on with that.
<h3>API</h3>
<p>We will also need an API now-ish, whatever we do, so that keen kids can write their own special-purpose front-ends using new cool toys. Which will keep them out of our hair. We can do this easily with Django templates that generate JSON, which is <a href="https://www.cuyc.org.uk/committee/events_json_short/">what CUYC do</a>. We already have some of this: <a href="exportjson.html">JSON export</a>.
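<p>For comparison with the template approach, here is a minimal sketch of producing the same kind of JSON directly from python dictionaries with the standard json module. This is a swapped-in technique for illustration, not how the existing JSON export works, and the function name and field names are hypothetical.

```python
import json


def expedition_to_json(year, trips):
    """Serialise one expedition's trips to a JSON string.

    Each trip is assumed to be a dict with a datetime.date under 'date',
    a cave name under 'cave' and a collection of names under 'people'.
    """
    payload = {
        "expedition": year,
        "trips": [
            {
                "date": trip["date"].isoformat(),  # dates are not JSON types
                "cave": trip["cave"],
                "people": sorted(trip["people"]),
            }
            for trip in trips
        ],
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

<p>Going through json.dumps means quoting and escaping are handled by the library rather than by hand in a template.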
<h4>WebAssembly Front-Ends</h4>
[January 2024]<br />
<p>We have been waiting for more than a decade and a half for <a href="trogspeculate.html">the JavaScript Framework mess</a> to sort itself out. We want to see where we could sensibly move to a front-end+back-end architecture, instead of redrawing every screen of data on the server (see above "<a href="#frontends">Things that could be a bit sticky 2 - front-end code</a>").
<p>In 2024 it now looks as if we may be able to stretch the current architecture into a <a href="trogspeculate.html#frontends">post-JavaScript</a> era entirely because WebAssembly <a href="https://thenewstack.io/webassembly-4-predictions-for-2024/">continues to develop rapidly</a>.
<h3>Postscript</h3>
<p>Andy Waddington, who wrote the first expo website in 1996, mentioned that he could never get the hang of Django at all, and working with SQL databases would require some serious book-revision:
<p>
So a useful goal, I think, is to make 'troggle2' accessible to a generic python programmer with no specialist skills in any databases or frameworks. Put against that is the argument that that might double the volume of code to be maintained, which would be worse. Nevertheless, an aim to keep in mind.
But even 'just Python' is not that easy. Python is a much bigger language now than it used to be, with some increasingly esoteric corners, such as the new asyncio framework.
<hr />
Return to: <a href="trogdesign.html">Troggle design and future implementations</a><br />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index:
<a href="trogindex.html">Index of all troggle documents</a><br />
<hr /></body>
</html>