<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Handbook Troggle Data Import</title>
<link rel="stylesheet" type="text/css" href="/css/main2.css" />
</head>
<body>
<style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>Troggle Data Import</h1>

<h3>Troggle - Reset and import data</h3>
<p>
The stand-alone Python script <var>databaseReset.py</var> imports data from files into the troggle database (sqlite or MariaDB). It runs separately from the process that runs troggle and serves the data as web pages (via Apache), but it is plugged into the same hierarchy of Django Python files.
<p>In the :troggle: directory:
<code><pre>$ python databaseReset.py

Usage is 'python databaseReset.py &lt;command&gt; [runlabel]'
             where command is:
             test      - testing... imports people and prints profile. Deletes nothing.
             profile   - print the profile from previous runs. Import nothing.
                        -  del       - deletes last entry
                        -  delfirst  - deletes first entry

             reset     - normal usage: clear database and reread everything from files.

             init      - initialisation. Automatic if you run reset.
             caves     - read in the caves (must run first after initialisation)
             people    - read in the people from folk.csv (must run after 'caves')
             logbooks  - read in the logbooks
             QMs       - read in the QM csv files (older caves only)
             scans     - the survey scans in all the wallets (must run before survex)
             drawings  - read in the Tunnel & Therion files - which scans the survey scans too
             survex    - read in the survex files - all the survex blocks and entrances x/y/z

             dumplogbooks - Not used. write out autologbooks (not working?)

             and [runlabel] is an optional string identifying this run of the script
             in the stored profiling data 'import-profile.json'

             caves and logbooks must be run on an empty db before the others as they
             set up db tables used by the others.
</pre></code>
<p>On the main server you should <b>only ever run the full <em>reset</em></b> option. The different stages are not cleanly separated now and apparently never have been.
Later quick fixes to accommodate last-minute requests for wallets and survex files have intertwined their precedence relationships even further.
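<p>To see why partial runs are fragile, the per-command notes in the usage message can be modelled as a small dependency graph. This is a sketch only, not troggle's own code: it encodes just three notes (<em>caves</em> after <em>init</em>, <em>people</em> after <em>caves</em>, <em>scans</em> before <em>survex</em>) and uses Python's standard <var>graphlib</var> module (Python 3.9+). The general note that caves and logbooks run before all the others is left out, because the profile table further down this page shows <em>people</em> running before <em>logbooks</em>. The names <var>DEPENDS_ON</var>, <var>import_order()</var> and <var>violations()</var> are hypothetical.</p>

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each stage maps to the stages that must finish before it.
# Assumption: only three per-command notes from the usage message are
# encoded; troggle's real dependencies are more tangled than this.
DEPENDS_ON = {
    "caves": {"init"},    # "must run first after initialisation"
    "people": {"caves"},  # "must run after 'caves'"
    "survex": {"scans"},  # scans "must run before survex"
}


def import_order(stages):
    """Return one ordering of `stages` that satisfies DEPENDS_ON.

    Dependencies on stages that were not requested are silently ignored,
    which is the same shortcut that makes running single stages risky.
    """
    requested = set(stages)
    graph = {s: DEPENDS_ON.get(s, set()).intersection(requested) for s in stages}
    return list(TopologicalSorter(graph).static_order())


def violations(order):
    """List the precedence notes broken by a proposed run order."""
    position = {stage: i for i, stage in enumerate(order)}
    problems = []
    for stage, before in DEPENDS_ON.items():
        for dep in before:
            if stage in position and dep in position and position[dep] > position[stage]:
                problems.append(f"{dep} must run before {stage}")
    return problems
```

<p>For example, <code>violations(["caves", "people", "logbooks", "QMs", "scans", "survex", "drawings"])</code>, the order used in the profile table, returns an empty list, while <code>violations(["survex", "scans"])</code> reports <code>scans must run before survex</code>.</p>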

<p>On a clean computer using sqlite, a complete import now takes about 100 seconds if nothing else is running (200s if running on an SD card rather than an SSD).
On the shared expo server it takes 600s. More than half of that time is spent reinitialising the MariaDB database, which is much, much slower than
sqlite on a development machine.

<p>These options exist so that when doing software development on your own <a href="troglaptop.html">troggle laptop</a>,
you can speed up debugging (e.g. survex stuff) by skipping the logbooks data entry.

<p>Here is an example of the output after it runs, showing which options were used recently and how long
each option took (in seconds). <code><pre>
*  importing troggle/settings.py
 * importing troggle/localsettings.py
 - settings on loading databaseReset.py
 - settings on loading databaseReset.py
 - Memory footprint before loading Django: 8.746 MB
*  importing troggle/settings.py
 - Memory footprint after loading Django: 31.863 MB
-- start    django.db.backends.sqlite3 troggle.sqlite
** Running job  3_16_2022 to troggle.sqlite
-- Initial memory in use 31.906 MB
Reinitialising db troggle.sqlite
 - deleting troggle.sqlite
 - Migrating: troggle.sqlite
No changes detected in app 'core'
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, core, sessions

....much more output from all the import steps....

** Ended job   3_16_2022  -  89.7 seconds total.
     days ago   -312.64   -95.05   -95.04   -10.65     this
  runlabel (s)      7th      000   wsl2_2     3_05 3_16_2022
    reinit (s)      1.6      3.1      3.1      2.5      1.5    -40.2%
     caves (s)      7.2     12.5     12.5      7.3      6.8     -7.2%
    people (s)      9.8     11.6     11.7      9.5      9.1     -3.8%
  logbooks (s)     21.2     41.0     40.9     20.2     19.9     -1.5%
       QMs (s)      7.7     87.5     87.0      7.8      7.1     -8.6%
     scans (s)      1.7     19.9     20.2      2.0      1.7    -14.6%
    survex (s)     80.2    143.6     52.1     31.0     36.5     17.8%
  drawings (s)      6.0     13.5      8.9      5.2      6.9     33.8%
</pre></code>
[This data is from March 2022 on an 11-year-old PC: Win10, WSL1+Ubuntu20.04, Intel Core i7-2600K, solid-state drive.]
<p>The last column shows the percentage change in import runtime for each class of data compared with the previous run. This varies quite a bit depending on
what else is running on the computer and how much has been put in virtual memory and file caches by the operating system.
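<p>As a worked check, the last column can be reproduced from the two preceding runs with the usual percentage-change formula, 100 * (this - previous) / previous. This short Python sketch uses the rounded seconds from the table, so its results differ slightly from the printed percentages, which were presumably computed from unrounded times. The function name <var>percent_change()</var> is illustrative, not part of troggle.</p>

```python
def percent_change(previous: float, current: float) -> float:
    """Percentage change in runtime from the previous run to the current one.

    Negative values mean the step got faster.
    """
    if previous == 0:
        raise ValueError("previous runtime must be non-zero")
    return 100.0 * (current - previous) / previous


# Rounded seconds for the last two runs ("3_05" and "3_16_2022") in the table.
RUNS = {
    "reinit": (2.5, 1.5),
    "logbooks": (20.2, 19.9),
    "survex": (31.0, 36.5),
}

if __name__ == "__main__":
    for step, (prev, this) in RUNS.items():
        print(f"{step:>10} {percent_change(prev, this):+.1f}%")
```

<p>This prints -40.0%, -1.5% and +17.7%, against -40.2%, -1.5% and 17.8% in the table.</p>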
<p>The file <var>import_profile.json</var> holds these historic times. Delete it to get
a clean slate.
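<p>Deleting the file by hand with <code>rm</code> is all that is needed; for scripted clean-ups a minimal Python equivalent is sketched here. It assumes the file sits in the troggle directory where you run <var>databaseReset.py</var> and uses the name given in the paragraph above; the usage message spells it <var>import-profile.json</var>, so check which name your copy actually writes. The helper name <var>clear_import_profile()</var> is hypothetical.</p>

```python
from pathlib import Path


def clear_import_profile(troggle_dir: str = ".") -> bool:
    """Delete the stored timing history so the next run starts a fresh table.

    Assumes the file is named import_profile.json and lives in troggle_dir.
    Returns True if a profile file was actually removed.
    """
    profile = Path(troggle_dir) / "import_profile.json"
    existed = profile.is_file()
    profile.unlink(missing_ok=True)  # missing_ok needs Python 3.8+
    return existed
```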

<h3>Logging Import Errors</h3>
<p>Import glitches are documented on the <a href="http://expo.survex.com/dataissues">Data Issues</a> page. You should always check
this after any import. (Don't worry about the xTherion "Un-parsed image" messages; that is work in progress.)
<p>There are detailed logs created in the <var>troggle</var> folder where you ran the import from:
<code><pre>
svxblks.log
_1623.svx
svxlinear.log
loadlogbk.log</pre></code>
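<p>A quick way to skim the end of each of these logs after an import is sketched here in Python. It assumes the files are plain text (read as UTF-8, with undecodable bytes replaced) and lie in the directory you pass in; the names <var>LOG_FILES</var>, <var>tail()</var> and <var>show_logs()</var> are hypothetical and not part of troggle.</p>

```python
from collections import deque
from pathlib import Path

# Log files listed on this handbook page; written to the folder the import ran from.
LOG_FILES = ["svxblks.log", "_1623.svx", "svxlinear.log", "loadlogbk.log"]


def tail(path: Path, n: int = 10) -> list[str]:
    """Return the last n lines of a text file, keeping at most n in memory."""
    with path.open(encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


def show_logs(troggle_dir: str = ".", n: int = 10) -> None:
    """Print the last n lines of each import log, noting any that are missing."""
    for name in LOG_FILES:
        path = Path(troggle_dir) / name
        if not path.is_file():
            print(f"-- {name}: not found")
            continue
        print(f"-- {name} (last {n} lines)")
        for line in tail(path, n):
            print("   " + line)


if __name__ == "__main__":
    show_logs()
```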

<p>Severe errors are also printed to the terminal where you are running the import, so watch it. The terminal also shows the duration of each step and the memory in use while the survex files are being imported.
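<p>Because the terminal output scrolls away, it can help to keep a copy of it. A minimal Python sketch of a "tee" wrapper follows: it runs a command, echoes its combined output live and writes the same lines to a file. The function <var>run_and_tee()</var> and the log name <var>import-terminal.log</var> are hypothetical; the <code>-u</code> flag in the example makes the child Python process write unbuffered output so that lines appear as they are produced.</p>

```python
import subprocess
import sys


def run_and_tee(cmd: list[str], logfile: str) -> int:
    """Run cmd, echo its stdout and stderr live, copy them to logfile, return the exit code."""
    with open(logfile, "w", encoding="utf-8") as log, subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge errors into the same stream
        encoding="utf-8",
        errors="replace",
        bufsize=1,  # line-buffered on our side of the pipe
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
    # Leaving the Popen block waits for the process, so returncode is set.
    return proc.returncode


# Example (hypothetical), run from the troggle directory:
#   run_and_tee([sys.executable, "-u", "databaseReset.py", "reset"], "import-terminal.log")
```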


<hr />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index:
<a href="trogindex.html">Index of all troggle documents</a><br />
<hr /></body>
</html>