<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Handbook Troggle Data Import</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head>
<body>
<style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>Troggle Data Import</h1>
<h3>Troggle - Reset and import data</h3>
<p>
The python stand-alone script <var>databaseReset.py</var> imports data from files into the troggle database (sqlite or MariaDB). It is separate from the process which runs troggle and serves the data as webpages (via apache), but it is plugged into the same hierarchy of django python files.
<p>In the <var>troggle</var> directory:
<code><pre>$ python databaseReset.py
Usage is 'python databaseReset.py &lt;command&gt; [runlabel]'
where command is:

   test         - testing... imports people and prints profile. Deletes nothing.
   profile      - print the profile from previous runs. Import nothing.
      - del      - deletes last entry
      - delfirst - deletes first entry

   reset        - normal usage: clear database and reread everything from files - time-consuming
   init         - initialisation. Automatic if you run reset.
   caves        - read in the caves (must run first after initialisation)
   people       - read in the people from folk.csv (must run after 'caves')
   logbooks     - read in the logbooks
   QMs          - read in the QM csv files (older caves only)
   scans        - the survey scans in all the wallets (must run before survex)
   drawings     - read in the Tunnel &amp; Therion files - which scans the survey scans too
   survex       - read in the survex files - all the survex blocks and entrances x/y/z

   dumplogbooks - Not used. write out autologbooks (not working?)

and [runlabel] is an optional string identifying this run of the script
in the stored profiling data 'import_profile.json'

caves and logbooks must be run on an empty db before the others as they
set up db tables used by the others.
</pre></code>
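The ordering constraints in the usage text above (caves first after initialisation, people after caves, scans before survex) can be encoded explicitly. The following sketch is not part of troggle; it is a hypothetical helper showing how a wrapper script could check a planned command sequence against those documented dependencies before running anything.

```python
# Sketch: validate that a planned sequence of databaseReset.py commands
# respects the ordering rules documented above. Hypothetical helper,
# not part of troggle itself.

# Each pair (earlier, later) means 'earlier' must appear before 'later'.
ORDER_RULES = [
    ("init", "caves"),    # caves must run first after initialisation
    ("caves", "people"),  # people must run after 'caves'
    ("scans", "survex"),  # scans must run before survex
]

def check_order(plan):
    """Return a list of rule violations for a planned command sequence."""
    pos = {cmd: i for i, cmd in enumerate(plan)}
    violations = []
    for earlier, later in ORDER_RULES:
        if earlier in pos and later in pos and pos[earlier] > pos[later]:
            violations.append(f"'{earlier}' must run before '{later}'")
    return violations

good = ["init", "caves", "people", "logbooks", "QMs", "scans", "drawings", "survex"]
bad = ["init", "people", "caves", "survex", "scans"]

print(check_order(good))  # -> []
print(check_order(bad))
```

Running <var>reset</var> does all of this for you in the right order; the sketch is only useful if you are running individual steps by hand.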
<p>On a clean computer using sqlite a complete import now takes about 100 seconds if nothing else is running (200s if running from an SD card rather than an SSD).
On the shared expo server it takes about 600s, as it is a shared machine. More than half of the time on the server is spent reinitialising the MariaDB database.
<p>Here is an example of the output after it runs, showing which options were used recently and how long
each option took (in seconds). <code><pre>
* importing troggle/settings.py
* importing troggle/localsettings.py
- settings on loading databaseReset.py
- settings on loading databaseReset.py
- Memory footprint before loading Django: 8.746 MB
* importing troggle/settings.py
- Memory footprint after loading Django: 31.863 MB
-- start django.db.backends.sqlite3 troggle.sqlite
** Running job 3_16_2022 to troggle.sqlite
-- Initial memory in use 31.906 MB
Reinitialising db troggle.sqlite
- deleting troggle.sqlite
- Migrating: troggle.sqlite
No changes detected in app 'core'
Operations to perform:
Apply all migrations: admin, auth, contenttypes, core, sessions
....much more output from all the import steps....
** Ended job 3_16_2022 - 89.7 seconds total.
days ago -312.64 -95.05 -95.04 -10.65 this
runlabel (s) 7th 000 wsl2_2 3_05 3_16_2022
reinit (s) 1.6 3.1 3.1 2.5 1.5 -40.2%
caves (s) 7.2 12.5 12.5 7.3 6.8 -7.2%
people (s) 9.8 11.6 11.7 9.5 9.1 -3.8%
logbooks (s) 21.2 41.0 40.9 20.2 19.9 -1.5%
QMs (s) 7.7 87.5 87.0 7.8 7.1 -8.6%
scans (s) 1.7 19.9 20.2 2.0 1.7 -14.6%
survex (s) 80.2 143.6 52.1 31.0 36.5 17.8%
drawings (s) 6.0 13.5 8.9 5.2 6.9 33.8%
</pre></code>
[This data is from March 2022 on an 11-year-old PC: Win10, WSL1 + Ubuntu 20.04, Intel Core i7-2600K, solid-state drive.]
<p>The last column shows the percentage change in the import runtime for each class of data. This varies quite a bit depending on
what else is running on the computer and how much has been put in virtual memory and file caches by the operating system.
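That percentage is the ordinary relative change between the previous run and this one. Note that the column is evidently computed from the unrounded timings, so recomputing it from the rounded values printed in the table gives slightly different figures. A minimal illustration:

```python
def pct_change(previous, current):
    """Relative change between two timings, as a percentage."""
    return (current - previous) / previous * 100.0

# survex step: 31.0 s on the previous run, 36.5 s on this one
print(round(pct_change(31.0, 36.5), 1))  # -> 17.7 (the table shows 17.8%, from unrounded data)
```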
<p>The file <var>import_profile.json</var> holds these historic times. Delete it to get
a clean slate.
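You can also inspect the stored profiling data directly without rerunning the script. The sketch below assumes the JSON maps run labels to per-step timings; that schema is a guess for illustration (it writes and reads a fabricated file in a temporary directory), so check the real <var>import_profile.json</var> before relying on it.

```python
import json
import os
import tempfile

# Assumed schema: {runlabel: {step: seconds}}. The real import_profile.json
# layout may differ - inspect the file before relying on this.
sample = {
    "3_05":      {"caves": 7.3, "survex": 31.0},
    "3_16_2022": {"caves": 6.8, "survex": 36.5},
}

# Write a fabricated profile file, then read it back and summarise it.
path = os.path.join(tempfile.mkdtemp(), "import_profile.json")
with open(path, "w") as f:
    json.dump(sample, f)

with open(path) as f:
    profile = json.load(f)

for runlabel, steps in profile.items():
    total = sum(steps.values())
    print(f"{runlabel}: {total:.1f}s across {len(steps)} steps")
```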
<h3>Logging Import Errors</h3>
<p>Import glitches are documented on the <a href="http://expo.survex.com/dataissues">Data Issues</a> page. You should always check
this after any import. (Don't worry about the xTherion "Un-parsed image" messages; this is work in progress.)
<p>There are detailed logs created in the <var>troggle</var> folder where you ran the import from:
<code><pre>
svxblks.log
_1623.svx
svxlinear.log
loadlogbk.log</pre></code>
<p>Severe errors are also printed to the terminal where you are running the import, so watch it. The script also prints to the terminal the duration of each step and the memory in use while importing the survex files.
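Rather than reading each log end to end, you can grep them for error lines. This is a hypothetical convenience sketch, not a troggle tool; the log names come from the list above, and the case-insensitive match on "error" is a guess at their format (the demonstration fabricates a log file in a temporary folder).

```python
import os
import tempfile

# Log files produced by the import, as listed above.
LOGS = ["svxblks.log", "svxlinear.log", "loadlogbk.log"]

def find_errors(folder):
    """Return (logname, lineno, text) for each error-ish line found."""
    hits = []
    for name in LOGS:
        path = os.path.join(folder, name)
        if not os.path.exists(path):
            continue  # not every run produces every log
        with open(path) as f:
            for lineno, line in enumerate(f, start=1):
                if "error" in line.lower():
                    hits.append((name, lineno, line.strip()))
    return hits

# Demonstration with a fabricated log file:
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "loadlogbk.log"), "w") as f:
    f.write("parsed 2019 logbook\nERROR: no author for trip on 2019-07-14\n")

for name, lineno, text in find_errors(folder):
    print(f"{name}:{lineno}: {text}")
```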
<hr />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index:
<a href="trogindex.html">Index of all troggle documents</a><br />
<hr /></body>
</html>