<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Handbook Troggle Data Import</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head>
<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>Troggle Data Import</h1>

<h3>Troggle - Reset and import data</h3>
<p>
The stand-alone Python script <var>databaseReset.py</var> imports data from files into the troggle database (sqlite or MariaDB). It runs separately from the process which runs troggle and serves the data as webpages (via Apache), but it is plugged in to the same hierarchy of Django Python files.
<p>In the :troggle: directory:
<code><pre>$ python databaseReset.py

Usage is 'python databaseReset.py &lt;command&gt; [runlabel]'
             where command is:
             test      - testing... imports people and prints profile. Deletes nothing.
             profile   - print the profile from previous runs. Import nothing.
                        -  del       - deletes last entry 
                        -  delfirst  - deletes first entry
                        
             reset     - normal usage: clear database and reread everything from files - time-consuming
             
             init      - clear database (delete everything) and make empty tables. Import nothing.
             caves     - read in the caves
             people    - read in the people from folk.csv
             logbooks  - read in the logbooks
             QMs       - read in the QM csv files (older caves only)
             scans     - the survey scans in all the wallets
             drawings  - read in the Tunnel & Therion files - which scans the survey scans too
             survex    - read in the survex files - all the survex blocks and entrances x/y/z

             dumplogbooks - Not used. write out autologbooks (not working?)
             
             and [runlabel] is an optional string identifying this run of the script
             in the stored profiling data 'import-profile.json'

             caves and logbooks must be run on an empty db before the others as they
             set up db tables used by the others.
</pre></code>
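<p>As a rough sketch of how the individual commands could be driven from another script, the following builds one command line per import stage. The helper name <var>build_import_commands</var> is hypothetical and not part of troggle; only the command names and the optional runlabel come from the usage text above.

```python
import sys

# Command names copied from the usage text of databaseReset.py shown above.
KNOWN_COMMANDS = {
    "test", "profile", "reset", "init", "caves", "people",
    "logbooks", "QMs", "scans", "drawings", "survex",
}


def build_import_commands(stages, runlabel=None, python=sys.executable):
    """Return one argv list per stage (hypothetical helper, not part of troggle)."""
    commands = []
    for stage in stages:
        if stage not in KNOWN_COMMANDS:
            raise ValueError(f"unknown databaseReset.py command: {stage!r}")
        argv = [python, "databaseReset.py", stage]
        if runlabel:
            # The optional runlabel identifies this run in the profiling data.
            argv.append(runlabel)
        commands.append(argv)
    return commands


for argv in build_import_commands(["init", "caves", "logbooks"], runlabel="example"):
    print(" ".join(argv))
```

<p>Each list could then be passed to <var>subprocess.run</var> from within the :troggle: directory. For a normal full import the single <var>reset</var> command does all of this in one go.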
<p>On a clean computer with 16GB of memory and using sqlite, a complete import now takes less than 2 minutes if nothing else is running. 
On the shared expo server it takes longer if the server is in use 
(we have only a share of it).
<p>Here is an example of the output after it runs, showing which options were used recently and how long 
each option took (in seconds). <code><pre>
--   troggle.sqlite django.db.backends.sqlite3
** Running job  Profile
** Ended job Profile -  0.0 seconds total.
     days ago     -4.28    -4.13    -4.10   -3.03    -3.00
  runlabel (s)      svx      NULL   RESET    svx2    RESET2
    reinit (s)       -       1.9      1.9      -       1.8
     caves (s)       -        -      39.1      -      32.2
    people (s)       -        -      35.0      -      24.4
  logbooks (s)       -        -      86.5      -      67.3
       QMs (s)       -        -      19.3      -       0.0
survexblks (s)   1153.1       -    3917.0  1464.1   1252.9
 survexpos (s)    397.3       -     491.9   453.6    455.0
    tunnel (s)       -        -      25.5      -      23.1
     scans (s)       -        -      52.5      -      45.9
</pre></code>
[This data is from May 2020, immediately after troggle had been ported from Python 2 to Python 3 but before the survex import was re-engineered. A full reset now takes only about 5 minutes.]
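<p>To put the May 2020 figures in perspective, the per-stage times of the two full resets in the table can be summed. This is only a small arithmetic sketch: the numbers are copied from the RESET and RESET2 columns above, and the variable and function names are made up for the example.

```python
# Per-stage times in seconds, copied from the RESET and RESET2 columns of the table.
reset_times = {
    "reinit": 1.9, "caves": 39.1, "people": 35.0, "logbooks": 86.5, "QMs": 19.3,
    "survexblks": 3917.0, "survexpos": 491.9, "tunnel": 25.5, "scans": 52.5,
}
reset2_times = {
    "reinit": 1.8, "caves": 32.2, "people": 24.4, "logbooks": 67.3, "QMs": 0.0,
    "survexblks": 1252.9, "survexpos": 455.0, "tunnel": 23.1, "scans": 45.9,
}


def total_minutes(times):
    """Sum the stage times (seconds) and convert to minutes."""
    return sum(times.values()) / 60.0


print(f"RESET:  {total_minutes(reset_times):.1f} minutes")
print(f"RESET2: {total_minutes(reset2_times):.1f} minutes")
```

<p>Both runs come out at well over half an hour, almost all of it spent in the two survex stages.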
<p>The 'survexblks' option loaded all the survex files recursively, following the <var>*include</var>
statements. It took a long time when memory was low and the operating system had to page a lot. This has now been rewritten so that the import is all batched within a single database transaction.
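<p>The general idea of batching many inserts into one transaction can be illustrated with Python's standard <var>sqlite3</var> module. This is not troggle's actual import code (troggle goes through Django, so the real implementation presumably uses Django's own transaction handling); the table name <var>survexblock</var> and the function name are assumptions made for the example.

```python
import sqlite3


def insert_blocks_in_one_transaction(conn, names):
    """Insert all rows, committing once at the end rather than once per row."""
    # Used as a context manager, the connection commits when the block
    # succeeds and rolls back if an exception is raised inside it.
    with conn:
        conn.executemany(
            "INSERT INTO survexblock (name) VALUES (?)",
            [(name,) for name in names],
        )


# In-memory database with a hypothetical, much simplified table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survexblock (id INTEGER PRIMARY KEY, name TEXT)")

insert_blocks_in_one_transaction(conn, ["block-a", "block-b", "block-c"])
count = conn.execute("SELECT COUNT(*) FROM survexblock").fetchone()[0]
print(count)
```

<p>Committing once avoids paying the cost of a separate commit for every row, which is usually the dominant cost of bulk inserts into sqlite.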
<p>(That value of 0 seconds for QMs looks suspicious.)
<p>The file <var>import_profile.json</var> holds these historic times. Delete it to get 
a clean slate.
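<p>Deleting the file can also be scripted. The sketch below assumes that <var>import_profile.json</var> sits in the directory the import is run from, which this page does not state explicitly; the function name <var>clear_import_profile</var> is hypothetical.

```python
from pathlib import Path


def clear_import_profile(troggle_dir="."):
    """Delete import_profile.json if present; return True if a file was removed."""
    # Assumption: the profile file lives directly in the given directory.
    profile = Path(troggle_dir) / "import_profile.json"
    if profile.exists():
        profile.unlink()
        return True
    return False
```

<p>Calling <var>clear_import_profile()</var> a second time simply returns <var>False</var>, so it is safe to run before every import.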
<hr />
Return to: <a href="trogintro.html">Troggle intro</a><br />
Troggle index: 
<a href="trogindex.html">Index of all troggle documents</a><br />
<hr />
</body>
</html>