2020-05-18 22:25:07 +01:00
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Handbook Troggle Data Import</title>
<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
</head>
<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
<h2 id="tophead">CUCC Expedition Handbook</h2>
<h1>Troggle Data Import</h1>
<h3>Troggle - Reset and import data</h3>
<p>
The stand-alone Python script <var>databaseReset.py</var> imports data from files into the troggle database (sqlite or MariaDB). It runs separately from the process that runs troggle and serves the data as webpages (via Apache), but it is plugged into the same hierarchy of Django Python files.
<p>In the :troggle: directory:
<code><pre>$ python databaseReset.py
Usage is 'python databaseReset.py &lt;command&gt; [runlabel]'
where command is:
    reset      - normal usage: clear database and reread everything from files - time-consuming
    caves      - read in the caves
    logbooks   - read in the logbooks
    people     - read in the people from folk.csv
    QMs        - read in the QM csv files (older caves only)
    reinit     - clear database (delete everything) and make empty tables. Import nothing.
    scans      - the survey scans in all the wallets
    survex     - read in the survex files - all the survex blocks but not the x/y/z positions
    survexpos  - just the x/y/z Pos out of the survex files
    tunnel     - read in the Tunnel files - which scans the survey scans too
    profile    - print the profile from previous runs. Import nothing.
    test       - testing...
and [runlabel] is an optional string identifying this run of the script
in the stored profiling data 'import-profile.json'
if [runlabel] is absent or begins with "F-" then it will skip the :memory: pass
caves and logbooks must be run on an empty db before the others as they
set up db tables used by the others.
</pre></code>
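<p>The argument handling described in the usage message can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in <var>databaseReset.py</var>: the names <var>COMMANDS</var> and <var>parse_args</var> are hypothetical, and the only behaviour taken from the usage text is the list of commands and the rule that the <var>:memory:</var> pass is skipped when the runlabel is absent or begins with "F-".
<code><pre>
import sys

# Commands listed in the usage message (names copied from it).
COMMANDS = {
    "reset", "caves", "logbooks", "people", "QMs", "reinit",
    "scans", "survex", "survexpos", "tunnel", "profile", "test",
}


def parse_args(argv):
    """Hypothetical parser returning (command, runlabel, run_memory_pass).

    argv excludes the script name, e.g. ["reset", "RESET2"].
    """
    # Reject a missing or unknown command, or more than one runlabel.
    if not argv or argv[0] not in COMMANDS or argv[2:]:
        raise SystemExit("Usage: python databaseReset.py COMMAND [runlabel]")
    command = argv[0]
    runlabel = argv[1] if argv[1:] else None
    # Usage text: if [runlabel] is absent or begins with "F-",
    # the :memory: pass is skipped.
    run_memory_pass = runlabel is not None and not runlabel.startswith("F-")
    return command, runlabel, run_memory_pass


if __name__ == "__main__":
    print(parse_args(sys.argv[1:]))
</pre></code>
<p>For example, <var>parse_args(["reset", "RESET2"])</var> returns <var>("reset", "RESET2", True)</var>, while <var>parse_args(["survex", "F-svx"])</var> returns <var>("survex", "F-svx", False)</var>, so the in-memory pass is skipped.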
<p>On a clean computer with 16 GB of memory and using sqlite, a complete import takes about 20 minutes if nothing else is running.
On the shared expo server it can take a couple of hours if the server is in use
(we have only a share of it). On your
own computer, the first pass, which uses an in-memory sqlite database, takes only about 6 minutes.
This in-memory pass is run first so that typos and data-entry errors
are found quickly, before the slower import into the on-disk database.
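<p>Why a throwaway in-memory pass finds errors quickly can be illustrated with Python's standard <var>sqlite3</var> module, where the special database name <var>:memory:</var> gives a database that lives only in RAM. This is a hypothetical sketch, not troggle's import code: the <var>person</var> table, <var>load_people</var>, <var>two_pass_import</var>, and the choice to stop before the on-disk pass when errors are found are all assumptions made for illustration.
<code><pre>
import sqlite3


def load_people(conn, rows):
    """Hypothetical loader: insert names, return a list of error messages."""
    conn.execute("CREATE TABLE IF NOT EXISTS person (name TEXT NOT NULL UNIQUE)")
    errors = []
    for row_no, name in enumerate(rows, start=1):
        try:
            conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
        except sqlite3.IntegrityError as exc:
            # e.g. a duplicate or missing name: a data-entry error
            errors.append(f"row {row_no}: {exc}")
    conn.commit()
    return errors


def two_pass_import(rows, db_path):
    """Check rows in RAM first; write to db_path only if that pass is clean."""
    mem = sqlite3.connect(":memory:")   # lives in RAM, discarded on close
    try:
        errors = load_people(mem, rows)
    finally:
        mem.close()
    if errors:
        return errors   # ASSUMPTION: stop here so the files can be fixed first
    disk = sqlite3.connect(db_path)
    try:
        return load_people(disk, rows)
    finally:
        disk.close()
</pre></code>
<p>With the rows <var>["Alice", "Bob", "Alice"]</var> the duplicate is reported by the in-memory pass and the on-disk database file is never created.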
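<p>Here is an example of the profiling output, showing which options were used in recent runs and how long
each option took (in seconds). Each column is one run: the "days ago" row gives how long ago it was made, the "runlabel" row gives its label, and "-" means that option was not run: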
<code><pre>
-- troggle.sqlite django.db.backends.sqlite3
** Running job Profile
** Ended job Profile - 0.0 seconds total.
days ago           -4.28   -4.13   -4.10   -3.03   -3.00
runlabel (s)         svx    NULL   RESET    svx2  RESET2
reinit (s)             -     1.9     1.9       -     1.8
caves (s)              -       -    39.1       -    32.2
people (s)             -       -    35.0       -    24.4
logbooks (s)           -       -    86.5       -    67.3
QMs (s)                -       -    19.3       -     0.0
survexblks (s)    1153.1       -  3917.0  1464.1  1252.9
survexpos (s)      397.3       -   491.9   453.6   455.0
tunnel (s)             -       -    25.5       -    23.1
scans (s)              -       -    52.5       -    45.9
</pre></code>
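<p>A table like the one above can be produced from stored per-run timings. The sketch below assumes, hypothetically, that <var>import_profile.json</var> maps <var>"date"</var> to a list of Unix timestamps, <var>"runlabel"</var> to a list of labels, and each option name to a list of durations in seconds, with <var>null</var> where an option was not run. The real file layout may differ; <var>format_profile</var>, <var>load_profile</var> and <var>cell</var> are illustrative names.
<code><pre>
import json

# Row order for the printed table, taken from the example output above.
ROWS = ["reinit", "caves", "people", "logbooks", "QMs",
        "survexblks", "survexpos", "tunnel", "scans"]


def cell(value, width=8):
    """Right-align one cell; None (option not run) prints as '-'."""
    if value is None:
        return "-".rjust(width)
    if isinstance(value, float):
        return f"{value:.1f}".rjust(width)
    return str(value).rjust(width)


def format_profile(results, now):
    """Return the profile table as text; now is a Unix timestamp."""
    n_runs = len(results["date"])
    # Age of each run in days: negative, so 3 days ago prints as -3.00.
    days = [f"{(d - now) / 86400:.2f}" for d in results["date"]]
    lines = ["days ago".ljust(16) + "".join(cell(d) for d in days),
             "runlabel (s)".ljust(16) + "".join(cell(r) for r in results["runlabel"])]
    for job in ROWS:
        times = results.get(job, [None] * n_runs)
        lines.append(f"{job} (s)".ljust(16) + "".join(cell(t) for t in times))
    return "\n".join(lines)


def load_profile(path="import_profile.json"):
    """Read the stored timings (layout assumed as described above)."""
    with open(path) as f:
        return json.load(f)
</pre></code>
<p>Usage, with <var>time.time()</var> from the standard library supplying the current time: <var>print(format_profile(load_profile(), time.time()))</var>.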
<p>The 'survex' option (shown as 'survexblks' in the profile) loads all the survex files recursively, following the <var>*include</var>
statements. It is by far the slowest step, and it can take much longer if memory is low and the operating system has to page a lot.
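<p>Following <var>*include</var> statements recursively amounts to a depth-first walk over the survex files. The sketch below is hypothetical and simplified: it assumes an included path is relative to the directory of the file that includes it, appends <var>.svx</var> when the extension is omitted, and keeps a set of visited files so that an include loop cannot recurse forever. <var>walk_includes</var> and <var>INCLUDE_RE</var> are illustrative names, not troggle functions.
<code><pre>
import posixpath
import re

# Matches '*include name' or '*include "name"' at the start of a line,
# case-insensitively; a line starting with ';' is a comment and won't match.
INCLUDE_RE = re.compile(r'^\s*\*include\s+(?:"([^"]+)"|([^\s;]+))', re.IGNORECASE)


def walk_includes(path, read_text, seen=None):
    """Yield every survex file reachable from path via *include, depth-first.

    read_text(path) returns a file's contents; it is a parameter so the
    sketch can run on an in-memory dict instead of real files.
    """
    if seen is None:
        seen = set()
    if not path.endswith(".svx"):
        path += ".svx"                # simplification: extension may be omitted
    path = posixpath.normpath(path)
    if path in seen:                  # guard against include loops
        return
    seen.add(path)
    yield path
    here = posixpath.dirname(path)    # includes are resolved relative to this
    for line in read_text(path).splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            name = m.group(1) or m.group(2)
            yield from walk_includes(posixpath.join(here, name), read_text, seen)
</pre></code>
<p>Passing the file reader in as <var>read_text</var> (for example <var>lambda p: open(p).read()</var>) keeps the walk separate from the file system, so the same function can be tested on a dictionary of file contents.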
<p>(The value of 0.0 seconds for QMs in the RESET2 run looks suspicious.)
<p>The file <var>import_profile.json</var> holds these historic times. Delete it to get
a clean slate.
<hr />
Return to: <a href="datamodel.html">Troggle data model</a> in Python code <br />
<hr />
</body>
</html>