Update docs to bullseye

Philip Sargent 2022-03-16 12:46:33 +00:00
parent cc6605d06b
commit 8b0056ccd5
3 changed files with 113 additions and 57 deletions


@ -15,58 +15,87 @@ The python stand-alone script <var>databaseReset.py</var> imports data from files
<p>In the :troggle: directory:
<code><pre>$ python databaseReset.py
Usage is 'python databaseReset.py &lt;command&gt; [runlabel]'
where command is:
test - testing... imports people and prints profile. Deletes nothing.
profile - print the profile from previous runs. Import nothing.
- del - deletes last entry
- delfirst - deletes first entry
reset - normal usage: clear database and reread everything from files - time-consuming
init - initialisation. Automatic if you run reset.
caves - read in the caves (must run first after initialisation)
people - read in the people from folk.csv (must run after 'caves')
logbooks - read in the logbooks
QMs - read in the QM csv files (older caves only)
scans - the survey scans in all the wallets (must run before survex)
drawings - read in the Tunnel & Therion files - which scans the survey scans too
survex - read in the survex files - all the survex blocks and entrances x/y/z
dumplogbooks - Not used. write out autologbooks (not working?)
and [runlabel] is an optional string identifying this run of the script
in the stored profiling data 'import-profile.json'
caves and logbooks must be run on an empty db before the others as they
set up db tables used by the others.
</pre></code>
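<p>For example, a full rebuild tagged with a run label (the example output below appears to come from just such a run, labelled '3_16_2022'), followed by printing the stored timings afterwards:
<code><pre>$ python databaseReset.py reset 3_16_2022
$ python databaseReset.py profile</pre></code>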
<p>On a clean computer using sqlite a complete import takes less than 100 seconds now if nothing else is running.
On the shared expo server it takes about 1,000 seconds as it is a shared machine. More than half of the time on the server is spent reinitialising the MariaDB database.
<p>Here is an example of the output after it runs, showing which options were used recently and how long
each option took (in seconds). <code><pre>
* importing troggle/settings.py
* importing troggle/localsettings.py
- settings on loading databaseReset.py
- settings on loading databaseReset.py
- Memory footprint before loading Django: 8.746 MB
* importing troggle/settings.py
- Memory footprint after loading Django: 31.863 MB
-- start django.db.backends.sqlite3 troggle.sqlite
** Running job 3_16_2022 to troggle.sqlite
-- Initial memory in use 31.906 MB
Reinitialising db troggle.sqlite
- deleting troggle.sqlite
- Migrating: troggle.sqlite
No changes detected in app 'core'
Operations to perform:
Apply all migrations: admin, auth, contenttypes, core, sessions
....much more output from all the import steps....
** Ended job 3_16_2022 - 89.7 seconds total.
days ago -312.64 -95.05 -95.04 -10.65 this
runlabel (s) 7th 000 wsl2_2 3_05 3_16_2022
reinit (s) 1.6 3.1 3.1 2.5 1.5 -40.2%
caves (s) 7.2 12.5 12.5 7.3 6.8 -7.2%
people (s) 9.8 11.6 11.7 9.5 9.1 -3.8%
logbooks (s) 21.2 41.0 40.9 20.2 19.9 -1.5%
QMs (s) 7.7 87.5 87.0 7.8 7.1 -8.6%
scans (s) 1.7 19.9 20.2 2.0 1.7 -14.6%
survex (s) 80.2 143.6 52.1 31.0 36.5 17.8%
drawings (s) 6.0 13.5 8.9 5.2 6.9 33.8%
</pre></code>
[This data is from March 2022 on an 11-year old PC: Win10, WSL1+Ubuntu20.04, Intel Core i7+2600K, solid-state hard drive.]
<p>The last column shows the percentage change in the import runtime for each class of data. This varies quite a bit depending on
what else is running on the computer and how much has been put in virtual memory and file caches by the operating system.
<p>The file <var>import_profile.json</var> holds these historic times. Delete it to get
a clean slate.
<h3>Logging Import Errors</h3>
<p>Import glitches are documented on the <a href="http://expo.survex.com/dataissues">Data Issues</a> page. You should always check
this after any import. (Don't worry about the xTherion "Un-parsed image" messages; handling these is work in progress.)
<p>There are detailed logs created in the <var>troggle</var> folder where you ran the import from:
<code><pre>
svxblks.log
_1623.svx
svxlinear.log
loadlogbk.log</pre></code>
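<p>A quick way to scan these logs for problems (just a sketch; the exact wording of the messages varies between runs):
<code><pre>grep -inE "error|warn" svxblks.log svxlinear.log loadlogbk.log | less</pre></code>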
<p>Severe errors are also printed to the terminal where you are running the import, so watch this. It also prints to the terminal the duration of each step and the memory in use while importing the survex files.
<hr />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index:


@ -43,21 +43,28 @@ http://expo.survex.com/repositories/troggle/.git/tree/README.txt
<h3>Why no Docker container?</h3>
<p>Yes, it is true that this would greatly speed up on-boarding new programmers.
<p>But there is the significant danger that containers would get copied around and deployed without being properly cleaned up:
resulting in configuration drift and a <a href="https://martinfowler.com/bliki/SnowflakeServer.html">snowflake server situation</a>.
File permissions are a big issue.
<p>We should do both: create a Docker system for getting started, then transition programmers to script-based or recipe-based
provisioning so that systems are rebuilt cleanly. <a href="http://www.cuyc.org.uk">CUYC</a> (who also use Django) have a bash script which sets up a new django
development system. We should copy that in the first instance. Alas, we haven't got around to doing any of this yet.
<h2 id="python">Installing python</h2>
<a href="https://xkcd.com/1987/"><img src="https://imgs.xkcd.com/comics/python_environment.png" align="right" hspace="20" width="230" alt='XKCD python install'></a>
<p>Python is usually not installed by default, and in any case we need specific versions to be installed. For Ubuntu 20.04 the
default is python3.9.
<p>If you are planning on eventually helping the migration of troggle from debian Bullseye (v11) to
<a href="https://www.debian.org/releases/">Bookworm (v12)</a> then you will also want the appropriate later versions of python, e.g. python3.10 as installed below:
<pre><code>sudo apt install python3 python3-pip
sudo apt install sqlite3 sqlite3-doc
sudo apt install survex
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.10 python3.10-venv python3.10-doc binutils binfmt-support
cd /usr/bin
sudo ln -s python3 python
sudo ln -s pip3 pip </code></pre>
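<p>A quick sanity check that the tools installed above are on the path (a sketch only; your version numbers will differ):
<pre><code>python --version
python3.10 --version
pip --version
sqlite3 --version</code></pre>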
@ -87,7 +94,16 @@ sudo ln -s /mnt/c/EXPO/expowebcache expowebcache
sudo mkdir expowebcache/3d
cd ..
ls -tlA expo</pre></code>
<h4 id="EXPOFILESREMOTE">Remote EXPOFILES</h4>
<p>If you do not have a local copy of the 40GB /expofiles/, don't worry. Later on we can set <var>'EXPOFILESREMOTE = True'</var> in the
<var>localsettings.py</var> file and your test system will use the live expofiles on <var>expo.survex.com</var> (read only).
<p>If you do have <var>'EXPOFILESREMOTE = True'</var> then the forms which upload scans and photos to the server will not work as you expect.
They will upload to your local machine, but read the status of the folders from <var>expo.survex.com</var>. So you will get
confusing and apparently inconsistent behaviour: e.g. you will upload a file but then be unable to see it.
<p>For development, you mostly only need a local copy of the wallets and scanned survey notes and sketches in
<var>expofiles/surveyscans</var> which is less than 5GB.
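<p>One way to fetch just that subset is with rsync. This is only a sketch: the server-side path under the expo account and the local target folder are assumptions you will need to adapt to your own setup:
<pre><code>rsync -av --progress expo@expo.survex.com:expofiles/surveyscans/ /mnt/c/EXPO/expofiles/surveyscans/</code></pre>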
<h2>Installing Django and troggle</h2>
<img src="Django_Logo-420x180.png" align="right" hspace="20" width='210' alt='django logo'>
@ -100,13 +116,16 @@ ls -tlA expo</pre></code>
<h3 id="venv">Installing a venv</h3>
<p>We set up a <var>venv</var> specifically for python 3.9 (which is the standard version on our server, running unmodified
Bullseye (debian v11) with <a href="https://docs.djangoproject.com/en/3.2/releases/3.2/">Django 3.2.12</a>). See the <a
href="https://docs.python.org/3.9/library/venv.html">standard python documentation on venv</a> for python 3.9.10.
Create a new venv for each version of python you are using. Get that venv installed right by explicitly stating the
python version when creating it: <var>python3.9 -m venv py39d32</var>. Note that we are creating it as a sibling folder to /expoweb/ .
Note also that up to now we have been using 'sudo ..' but for installing things inside the venv we do not use 'sudo ..':
<pre><code>cd ~
cd ../expo
python3.9 -m venv py39d32
cd py39d32
source bin/activate
pip list -o</pre></code>
<p>The last command lists the default packages installed in the venv. This is for comparison later, after we
@ -119,15 +138,17 @@ the venv.
dependencies appropriate for troggle - because you have not yet cloned the troggle repo. So the first time it is easiest to just create requirements.txt yourself with a text editor. Without using git yourself, you can get the file from the website at <a href="http://expo.survex.com/repositories/troggle/.git/tree/requirements.txt">requirements.txt</a>. If you have already cloned all the repos, then just copy it.
<pre><code>cp ../troggle/requirements.txt .</pre></code>
where <var>requirements.txt</var> (note the capitalisation of the listed packages) is:
<pre><code>asgiref==3.3.4
confusable-homoglyphs==3.2.0
coverage==5.5
Django==3.2
docutils==0.14
gunicorn==20.1.0
Pillow==5.4.1
pytz==2019.1
sqlparse==0.2.4
typing-extensions==3.7.4.3
Unidecode==1.0.23</pre></code>
This will pick up the latest Django 3.2.x available, but all the other dependencies are pinned. These
dependencies are as-standard on debian Buster (10) and you will want to experiment with upgrading them
for Bullseye (v11).
<p>Once you have the file, install the listed dependencies like this:
@ -136,7 +157,7 @@ pip list -o</pre></code>
<p>Pillow (not currently in that package list) is an image handling package used to make
the prospecting map (currently disabled,
see <a href="http://expo.survex.com/prospecting_guide/">/prospecting_guide/</a>).<br>
tinymce is the wysiwyg in-browser
@ -150,13 +171,17 @@ where it describes upgrading and testing with later versions of Django.
<h3 id="troginstall">Installing troggle</h3>
<p>The <var>:troggle:</var> repo is the python source code for troggle. This is what you will be editing. There are over 9,000 lines
of python code (excluding comments and blank lines) and nearly 3,000 lines of HTML/Django template code. This is over 600 files in
over 400 folders, but only 49MB in size (plus a .git repo of 32MB).
<h4>key exchange</h4>
<p>Follow this link to <a href="../computing/keyexchange.html">register a key with the expo server</a> to get git access if you have
not already cloned the <var>:troggle:</var> repo.
<h4>git clone troggle</h4>
<p>You will do a git clone to create a folder /troggle/ containing all the troggle code. You need to clone the 'python-3' branch, if
you can see multiple branches.</p>
<pre><code>cd ~
cd ../expo
@ -164,7 +189,7 @@ git clone ssh://expo@expo.survex.com/home/expo/troggle</code></pre>
<p>Now we create soft links within the venv to all the repo folders and wherever you have a local copy of /expofiles/ and /expowebcache/:
<pre><code>cd py39d32
sudo ln -s /mnt/c/EXPO/expoweb expoweb
sudo ln -s /mnt/c/EXPO/troggle troggle
sudo ln -s /mnt/c/EXPO/loser loser
@ -223,7 +248,7 @@ localsettings.py
<p>Now edit localsettings.py and insert useful values for EXPOUSERPASS [e.g. cavey:beery], EXPOADMINUSERPASS [e.g. beery:cavey], SECRET_KEY. SECRET_KEY can be anything, it just has to be unique to each installation and invisible to anyone not a developer.
<p>Set <a href="https://docs.djangoproject.com/en/3.2/topics/email/#s-configuring-email-for-development">EMAIL_HOST and EMAIL_HOST_PASSWORD</a> to an email account you control that can send email. Then troggle can email you when some things go wrong. This may mean having to set EMAIL_PORT and MAIL_USE_TLS too (this is not used in troggle currently). Set EXPOUSER_EMAIL and EXPOADMINUSER_EMAIL to your own email address while you are doing software development. All these will be different when troggle is deployed on the public server.
<p>Set <a href="https://docs.djangoproject.com/en/4.0/topics/email/#s-configuring-email-for-development">EMAIL_HOST and EMAIL_HOST_PASSWORD</a> to an email account you control that can send email. Then troggle can email you when some things go wrong. This may mean having to set EMAIL_PORT and MAIL_USE_TLS too (this is not used in troggle currently). Set EXPOUSER_EMAIL and EXPOADMINUSER_EMAIL to your own email address while you are doing software development. All these will be different when troggle is deployed on the public server.
<p>
Now you need to edit the following settings in your localsettings.py file to match your
@ -240,7 +265,8 @@ is living.
If you do not have a local copy of /expofiles/ (40 GB), you can use the expo server copy if
you set:
<pre><code>EXPOFILESREMOTE = True</code></pre>
and then the FILES and EXPOFILES settings will be ignored. (Except for the upload forms, which will 'upload' files
<a href="#EXPOFILESREMOTE">to your local disc</a>.)
<p>
Now try this again:
@ -317,7 +343,7 @@ explains what this does and gives extra command line options.
<p>Now run the test suite:
<pre><code>python manage.py test -v 3 --traceback</code></pre>
<p>This will run the entire troggle test suite of ~80 tests (it takes only a few seconds).
<p>
If you get an error, and you probably will, have a look in the source code of the test, e.g. for this error:
@ -356,9 +382,10 @@ cd ../../troggle
</font>
</details>
<p>The test suite now tidies up after itself, so there should not be any temporary files left behind or local git commits that you
will need to clean up.
<p>The test suite has ~80 tests but does not cover all of what troggle does and does not use any real data. You need to manually test these too, <em>after</em> you have done a full <a href="trogimport.html">data import</a>:
<br>- <var><a href="http://localhost:8000/pathsreport">http://localhost:8000/pathsreport</a></var>
<br>- <var><a href="http://localhost:8000/stats">http://localhost:8000/stats</a></var>
<br>- <var><a href="http://localhost:8000/people">http://localhost:8000/people</a></var> (takes a minute or so)


@ -31,9 +31,9 @@ In the course of these migrations several unused or partly-used django plugins w
<h4>March 2022</h4>
<p>On 15th March Wookey upgraded the server to the debian release 11 <var>bullseye</var>.
At this point 'debian stable' is bullseye and has python 3.9 as
standard. We will quickly migrate to Django 3.2 LTS which is now a year old and which will be <a
href="https://www.djangoproject.com/download/#supported-versions">supported until April 2024</a>.
<var>Bullseye</var> will be <a href="https://wiki.debian.org/LTS">in support until June 2026</a>.