JSON export documentation

2024-11-21 23:01:55 +00:00 · 2021-04-12 23:07:38 +01:00 · 2021-04-12 23:07:38 +01:00 · 9252c883d4
commit 9252c883d4
parent 8505627b4d
7 changed files with 127 additions and 13 deletions
--- a/handbook/troggle/exportjson.html
+++ b/handbook/troggle/exportjson.html
@ -0,0 +1,39 @@
+<!DOCTYPE html>
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<title>Handbook Troggle JSON</title>
+<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
+</head>
+<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
+<h2 id="tophead">CUCC Expedition Handbook</h2>
+<h1>Handbook Troggle JSON</h1>
+
+<p>
+We can export cave and expo data with Django templates that generate JSON, which is <a href="https://www.cuyc.org.uk/committee/events_json_short/">what CUYC do</a> (and we have access to their code).
+
+We already have a proof-of-principle JSON export API working at 
+<a href="http://expo.survex.com/api/expeditions_json">expo.survex.com/api/expeditions_json</a> and we also have 
+a text-format export as TSV (tab-separated values), which, like comma-separated values, can be imported into spreadsheets.
+
+
+<h3>Expoadmin method</h3>
+<img  border="1" class="onright"  src='json-cmd.jpg'/>
+<p>Log in as 'expoadmin' (password: 'beery:cavey') at <a href="/admin/login/">/admin/login/</a> then look at the entries for any one of
+the objects: Caves, Expeditions, People etc.
+
+
+
+<p>
+In the top left-hand corner is a drop-down menu; 'export as json' is shown selected in the image.
+Select one or more of the objects listed and press the 'Go' button next to the drop-down menu. A JSON-formatted file, 'troggle-ouput.json', will be written to the troggle installation directory (where the troggle code .git file is).
+<p>There is also an 'export as XML' option. 
+<p>The code which adds this capability into the Troggle Administration control panel is in <var>troggle/core/admin.py</var>.
+<hr />
+Go on to: <a href="trogarch.html">Troggle architecture</a><br />
+Return to: <a href="trog2030.html">Troggle in 2025-2030</a><br />
+Troggle index: 
+<a href="trogindex.html">Index of all troggle documents</a><br />
+<hr />
+</body>
+</html>
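
Note on exportjson.html: the page describes two text exports (JSON and TSV) without showing what the output looks like. The following is a minimal standard-library sketch, not the code in troggle/core/admin.py. It assumes the JSON follows the shape Django's built-in JSON serializer produces (a list of objects with "model", "pk" and "fields" keys). The record fields and the "core.expedition" label are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical records standing in for troggle objects; the field names
# are illustrative assumptions, not troggle's real model fields.
EXPEDITIONS = [
    {"pk": 1, "year": "2018", "name": "CUCC expo 2018"},
    {"pk": 2, "year": "2019", "name": "CUCC expo 2019"},
]


def export_json(records, model_label):
    """Return JSON text as a list of {"model", "pk", "fields"} objects."""
    payload = [
        {
            "model": model_label,
            "pk": rec["pk"],
            "fields": {k: v for k, v in rec.items() if k != "pk"},
        }
        for rec in records
    ]
    return json.dumps(payload, indent=2)


def export_tsv(records):
    """Return the same records as tab-separated values with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(export_json(EXPEDITIONS, "core.expedition"))
    print(export_tsv(EXPEDITIONS))
```

The TSV text can be saved with a .tsv extension and opened directly in a spreadsheet; the JSON text can be read back with json.loads() by any client.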
--- a/handbook/troggle/exporttgz.html
+++ b/handbook/troggle/exporttgz.html
@ -0,0 +1,28 @@
+<!DOCTYPE html>
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<title>Handbook  all.tgz</title>
+<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
+</head>
+<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
+<h2 id="tophead">CUCC Expedition Handbook</h2>
+<h1>Handbook  all.tgz</h1>
+
+<p>
+The old <a href="scriptscurrent.html#makefile">Makefile</a> was an integral part of the pre-troggle 'scripts + spreadsheets' era of 
+cave data management. These days most files are regenerated whenever a data import is done into troggle, not
+on a daily basis using the cron-scheduled Makefile.
+
+<p>The all.tgz archive is not updated regularly, but when it is, you will find it at<br>
+<a href="/noinfo/all.tgz">noinfo/all.tgz</a>
+
+
+<hr />
+Go on to: <a href="trogarch.html">Troggle architecture</a><br />
+Return to: <a href="trog2030.html">Troggle in 2025-2030</a><br />
+Troggle index: 
+<a href="trogindex.html">Index of all troggle documents</a><br />
+<hr />
+</body>
+</html>
--- a/handbook/troggle/json-cmd.jpg
+++ b/handbook/troggle/json-cmd.jpg
--- a/handbook/troggle/trog2030.html
+++ b/handbook/troggle/trog2030.html
@ -10,7 +10,7 @@
 <h1>Troggle in 2025-2030</h1>

 <h2>5-Year Plan</h2>
-<p>[Philip Sargent, 1 June 2020]
+<p>[Philip Sargent, 1 June 2020. Async updates 14 April 2021]
 <ul>
 <li>I reckon django has at least another 4-5 years left as a very active project (~2025) and at least a decade or so as a well-maintained project.
 <li>I reckon python has another 10-20 years at least.
@ -23,7 +23,7 @@ there is such a lot going on we would create a large volume of software even if
 <h3>Option 2</h2>
 <p>
 We keep the same architecture as now, and incrementally replace modules that use django/SQL with direct object storage of collections using pickle(), shelve() and json().
-Concurrency is not a problem as all data is read-only.
+Concurrency is not a problem as all data is read-only (this is not <em>entirely</em> true - see below).
 We keep the url-rewriting and html-template things in django.[and migrate the unit-tests (a recent innovation) from django to be run stand-alone.]
 <p>
 This means that the "django-ness" part of troggle becomes quite small.
@ -55,28 +55,56 @@ We should have a good look  at modifying everything so that we do not read in ev
 <p>
 Documentation is the key to keeping troggle in a state where someone can pick it up and do a useful week's work, e.g. extracting the parsed logbooks to use shelve() storage throughout instead of SQL. The next time someone like Radost comes along during the next 5 years we want to be able to use them effectively.
 </div>
-<h3>Things that could be a bit sticky</h3>
-<p>
-New functionality: e.g. making the whole thing GIS-centric is a possibility.
-A GIS db could make a lot of sense. Not in scope for this discussion.
+<h3>Things that could be a bit sticky 1 - multi-user safety</h3>
+<p>Multi-user synchronous use could be a bit tricky without a solid multi-user database sitting behind the python code. So removing all the SQL database use may not be what we want to do after all.
+<p>Under all conceivable circumstances we would continue to use WSGI or 
+<a href="https://asgi.readthedocs.io/en/latest/introduction.html">ASGI</a> to connect our python code to a user-facing 
+webserver (apache, nginx, gunicorn). Every time a webpage is served, it is done by a separate thread in the webserver and essentially a 
+new instance of Django is created to serve it. Django relies on its multi-user SQL database (MariaDB, postgresql) to ensure that competing 
+updates by two instantiations of itself to the same stored object are correctly atomic. But even today, if two people try to update the 
+same handbook <em>webpage</em>, or the same survex <em>file</em>, at the same time we expect horrible corruption of the data. Even today, 
+with the SQL database, writing <em>files</em> is not coded in a properly multi-user manner. We should write some file lock/serializer code 
+to make this safe.
+
+<p>The move by <a href="https://arunrocks.com/a-guide-to-asgi-in-django-30-and-its-performance/">Django
+from single-threaded WSGI to asynchronous ASGI</a> began with v3.0 and, for 'views', was almost complete in 3.2. 
+This makes the server more responsive, 
+but doesn't really change anything from the perspective of our need to stop users overwriting each other's work. If we just store 
+everything in in-memory dictionaries we may need to write our own asyncio python to do that synchronization. That would be a Bad Thing as 
+we are trying to make future maintenance easier, not harder.
+
+<h3>Things that could be a bit sticky 2 - front-end code</h3>
 <p>There is not yet a front-end (javascript) framework on the client, i.e. a phone app or webpage, which is stable enough for us to commit
 effort to. Bits of troggle use very old jQuery ("edit this page", and the svx file editor) , and Flask looks interesting
 (but <a href="https://adamj.eu/tech/2019/04/03/django-versus-flask-with-single-file-applications/">maybe is only simpler when 
 starting a new project and doesn't scale to complexity</a> the way Django does, but maybe in 2025 we 
-could see a good way to move all the user interface (rewritten to be GIS-centric) to the client 
+could see a good way to move all the user interface (rewritten to be GIS-centric?) to the client 
 (re-written in <a href="https://www.educba.com/typescript-vs-dart/">Typescript
-or Dart</a>) and just have an API on the server. [We already have a proof of principle JSON export API working at 
-<a href="http://expo.survex.com/api/expeditions_json">expo.survex.com/api/expeditions_json</a>.]
+or Dart</a>) and just have an API on the server. [We already have a proof of principle JSON export API working, see
+<a href="exportjson.html">Troggle JSON</a>.]

+<h4>front-end code is not just a pretty face</h4>
+<p>Modern JavaScript frameworks support dynamic 'single-page websites' where all the component parts are fetched and replaced 
+asynchronously (this used to be called <a href="https://en.wikipedia.org/wiki/Ajax_%28programming%29">AJAX</a> when it first appeared in 
+1999). This is fundamentally different from how Django was originally designed: using public URLs connected to code which produces a 
+complete webpage based on a single template. Django <a href="https://engineertodeveloper.com/how-to-use-ajax-with-django/">can interoperate
+</a> with dynamic systems but support will <a href=
+"https://speakerdeck.com/andrewgodwin/just-add-await-retrofitting-async-into-django?slide=76">become increasingly baroque</a> I imagine.
+
+<h3>Things that could be a bit sticky 3 - GIS</h3>
+<p>
+New functionality: e.g. making the whole thing GIS-centric is a possibility.
+A GIS db could make a lot of sense. Expo has GIS expertise and we have a lot of badly-integrated GPS data, so this needs a lot of thinking to be done and we should get on with that.
 <h3>API</h3>
-<p>We will also need an API now-ish, whatever we do, so that keen kids can write their own special-purpose front-ends using new cool toys. Which will keep them out of our hair. We can do this easily with Django templates that generate JSON, which is <a href="https://www.cuyc.org.uk/committee/events_json_short/">what CUYC do</a>
+<p>We will also need an API now-ish, whatever we do, so that keen kids can write their own special-purpose front-ends using new cool toys. Which will keep them out of our hair. We can do this easily with Django templates that generate JSON, which is <a href="https://www.cuyc.org.uk/committee/events_json_short/">what CUYC do</a>. We already have some of this: <a href="exportjson.html">JSON export</a>.
+
 <h3>Postscript</h3>

 <p>Andy Waddington, who wrote the first expo website in 1996, mentioned that he could never get the hang of Django at all, and working with SQL databases would require some serious book-revision:
 <p>
 So a useful goal, I think, is to make 'troggle2' accessible to a generic python programmer with no specialist skills in any databases or frameworks. Put against that is the argument that that might double the volume of code to be maintained, which would be worse. Nevertheless, an aim to keep in mind.

-But even 'just Python' is not that easy. Python is a much bigger language now than it used to be, with some esoteric corners. 
+But even 'just Python' is not that easy. Python is a much bigger language now than it used to be, with some increasingly esoteric corners, such as the new asyncio framework.
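
Note on "multi-user safety" in trog2030.html: the text proposes writing some file lock/serializer code so that two simultaneous edits of the same handbook page or survex file cannot corrupt it. The following is one possible sketch using only the standard library, not existing troggle code. The names file_lock, atomic_write and update_file are hypothetical. It assumes a lock file created next to the target is acceptable, and it does not handle stale locks left behind by a crashed process.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager


@contextmanager
def file_lock(path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock on `path` by creating `path + '.lock'`.

    os.open with O_CREAT | O_EXCL fails if the lock file already exists,
    so only one thread or process at a time gets past this point.
    """
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def atomic_write(path, text):
    """Write to a temporary file, then rename it over `path` in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(text)
    os.replace(tmp_path, path)  # readers see the old or the new file, never half


def update_file(path, edit):
    """Read, modify and rewrite `path` while holding the lock."""
    with file_lock(path):
        old = ""
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                old = f.read()
        atomic_write(path, edit(old))


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "page.html")
    workers = [
        threading.Thread(target=update_file, args=(target, lambda old, i=i: old + f"edit {i}\n"))
        for i in range(5)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    with open(target, encoding="utf-8") as f:
        print(f.read())
```

Holding the lock across the whole read-modify-write is what stops one user's edit from silently overwriting another's; the atomic rename on its own only prevents half-written files.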



--- a/handbook/troggle/trogindex.html
+++ b/handbook/troggle/trogindex.html
@ -27,6 +27,9 @@
 </ul>
 <a href="serverconfig.html">Troggle server configuration</a> - how to get troggle running on a new machine (incoimplete!)<br>
 <a href="trogimport.html">Troggle - Data Import</a> - reset and import data<br>
+<ul>
+<li><a href="exportjson.html">Troggle - JSON export</a> - export data as JSON<br>
+</ul>
 <a href="trogdjango.html">Troggle and Django</a> - The Django web framework we use<br>
 <a href="trogdjangup.html">Troggle: updating Django</a> - Upgrading troggle to use a later Django version<br>
 <a href="unittests.html">Troggle unit tests</a> - test suite for programmers<br>
@ -38,8 +41,11 @@
 <a href="archnotes.html">Archive Notes</a> - old ideas and original discussions<br>
 <br>
 <a href="scriptsother.html">Additional Scripts</a> - non-django but important<br>
-<a href="scriptscurrent.html">Additional Scripts</a> - more detail<br>
-<a href="scriptsqms.html">QM (Question Mark) Scripts</a> - all five ways we do it<br>
+<ul>
+<li><a href="scriptsqms.html">QM (Question Mark) Scripts</a> - all five ways we do it<br>
+<li><a href="scriptscurrent.html">Additional Scripts</a> - more detail<br>
+<li><a href="exporttgz.html">Export all.tgz</a> - export compressed survey data (Makefile)<br>
+</ul>

 <br>
 <hr />
--- a/handbook/troggle/trogintro.html
+++ b/handbook/troggle/trogintro.html
@ -36,6 +36,7 @@ The troggle software is written and maintained by expo members.
 <li><a href="/survey_scans/">expo.survex.com/survey_scans/</a> - List of all scanned original survey notes.
 <li><a href="/survey_scans/2018%252343/">expo.survex.com/survey_scans/2018%252343/</a> - list of links to scanned notes for wallet #43 during the 2018 expo.
 </ul>
+<p>If you want to find out how to do something using troggle, then you may find it in the <a href="../troggle/trogmanual.html">troggle maintainers and advanced users manual</a>.

 <h3 id="troggle">Troggle - why we developed it</a></h3>

--- a/handbook/troggle/trogmanual.html
+++ b/handbook/troggle/trogmanual.html
@ -28,6 +28,18 @@
 <p>Troggle is completely unlike any other django installation: it has a database, but the database is rebuilt from files every time it starts. 
 <p>Most of the data entry into troggle happens during or just after the expedition.

+<h4>Advanced Users' Manual</h4>
+<p>We don't have one of these. You may find what you are looking for in <a href="scriptsother.html">Other scripts</a> above. But there are a few things which are not really 'maintenance' and are not really cave data management either, e.g.
+<ul>
+<li><a href="exportjson.html">JSON export</a> - how to extract cave and expo data in JSON format.
+<li><a href="exporttgz.html">Compressed data export</a> - how to extract cave and expo data in gzip/tar format.
+</ul>
+but do scan 
+<ul>
+<li><a href="trogindex.html">Index of all troggle documents</a> - list of everything you can do with troggle.
+</ul>
+
+
 <p>* "Troggle eats just one very big meal a year."

 <hr />