mirror of
https://expo.survex.com/repositories/expoweb/.git/
synced 2024-11-22 07:11:55 +00:00
Documenting data import
parent
5d9854a1e7
commit
451492112b
BIN
handbook/i/trogpkg.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 84 KiB |
Before Width: | Height: | Size: 22 KiB After Width: | Height: | Size: 22 KiB |
BIN
handbook/t/trogpkg-small.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 13 KiB |
@ -10,7 +10,8 @@
|
||||
<h1>Troggle Data Model (python)</h1>
|
||||
|
||||
<h3>Troggle data architecture</h3>
|
||||
Auto-generated on 3 April 2020. (Omitting all the 'Meta' sub-classes.)<code><pre><span style="color: green">
|
||||
Auto-generated on 3 April 2020. (Omitting all the 'Meta' sub-classes.)<br>
|
||||
DPhoto object removed 15 May 2020<code><pre><span style="color: green">
|
||||
# This is an auto-generated Django model module.
|
||||
# You'll have to do the following manually to clean this up:
|
||||
# * Rearrange models' order
|
||||
@ -116,26 +117,6 @@ from django.db import models
|
||||
message = models.CharField(max_length=400, blank=True)
|
||||
|
||||
|
||||
<span style="color: lime">class</span> <span style="color:blue"><b>CoreDphoto</b></span>(models.Model):
|
||||
id = models.IntegerField(<span style="color: magenta">primary_key</span> =True) <span style="color: green"># AutoField?</span>
|
||||
new_since_parsing = models.BooleanField()
|
||||
caption = models.CharField(max_length=1000, blank=True)
|
||||
<span style="color:blue">contains_logbookentry</span> = models.<span style="color:blue">ForeignKey</span> ('CoreLogbookentry', blank=True, null=True)
|
||||
file = models.CharField(max_length=100)
|
||||
is_mugshot = models.BooleanField()
|
||||
<span style="color:blue">contains_cave</span> = models.<span style="color:blue">ForeignKey</span> (CoreCave, blank=True, null=True)
|
||||
<span style="color:blue">contains_entrance</span> = models.<span style="color:blue">ForeignKey</span> ('CoreEntrance', blank=True, null=True)
|
||||
<span style="color:blue">nearest_qm</span> = models.<span style="color:blue">ForeignKey</span> ('CoreQm', db_column='nearest_QM_id', blank=True, null=True) <span style="color: green"># Field name made lowercase.</span>
|
||||
lon_utm = models.FloatField(blank=True, null=True)
|
||||
lat_utm = models.FloatField(blank=True, null=True)
|
||||
|
||||
|
||||
<span style="color: lime">class</span> <span style="color:blue"><b>CoreDphotoContainsPerson</b></span>(models.Model):
|
||||
id = models.IntegerField(<span style="color: magenta">primary_key</span> =True) <span style="color: green"># AutoField?</span>
|
||||
dphoto_id = models.IntegerField()
|
||||
<span style="color:blue">person</span> = models.<span style="color:blue">ForeignKey</span> ('CorePerson')
|
||||
|
||||
|
||||
<span style="color: lime">class</span> <span style="color:blue"><b>CoreEntrance</b></span>(models.Model):
|
||||
id = models.IntegerField(<span style="color: magenta">primary_key</span> =True) <span style="color: green"># AutoField?</span>
|
||||
new_since_parsing = models.BooleanField()
|
||||
|
@ -12,28 +12,37 @@
|
||||
<h3>Troggle data architecture</h3>
|
||||
<p>
|
||||
The core of troggle is the data architecture: the set of tables into which all the cave survey and expo data is poured and stored. These tables are what enables us to produce a large number of different but consistent reports and views.
|
||||
|
||||
<style>figure {font-weight: bold; font-size: small; font-family: sans-serif;}</style>
|
||||
<div style="display: flex">
|
||||
<div style="flex: 50%">
|
||||
<figure>
|
||||
<a href="../i/troggle-tables.jpg">
|
||||
<img src="../i/troggle-tables-small.jpg" /></a>
|
||||
<figurecaption>
|
||||
<img src="../t/troggle-tables-small.jpg" /></a>
|
||||
<br><figurecaption>Tables (Objects)</figurecaption>
|
||||
</figure>
|
||||
|
||||
</div>
|
||||
<div style="flex: 50%">
|
||||
<figure>
|
||||
<a href="../i/trogpkg.jpg">
|
||||
<img src="../t/trogpkg-small.jpg" /></a>
|
||||
<br><figurecaption>Packages</figurecaption>
|
||||
</figure>
|
||||
</div></div>
|
||||
<h3>Architecture description</h3>
|
||||
<p>Read the proposal: "<a href="/expofiles/documents/troggle/troggle_paper.pdf" download>Troggle: a novel system for cave exploration information management</a>", by Aaron Curtis</em>. But remember that this paper is an over-ambitious proposal. Only the core data management features have been built. We have none of the person management features and only two forms in use: for entering cave and cave entrance data.
|
||||
<p>Read the proposal: "<a href="/expofiles/documents/troggle/troggle_paper.pdf" download>Troggle: a novel system for cave exploration information management</a>", by Aaron Curtis</em>. But remember that this paper is an over-ambitious proposal. Only the core data management features have been built. We have none of the "person management" features, none of the "wallet progress" stuff and only two forms in use: for entering cave and cave entrance data.
|
||||
<p>
|
||||
ALSO there have been tables added to the core representation which are not anticipated in that document or this diagram, e.g. Scannedimage, Survexdirectory, Survexscansfolder, Survexscansingle, Tunnelfile, TunnelfileSurvexscansfolders, Survey. See <a href="datamodel.html">Troggle data model</a> python code (3 April 2020).
|
||||
ALSO there have been tables added to the core representation which are not anticipated in that document or this diagram, e.g. Scannedimage, Survexdirectory, Survexscansfolder, Survexscansingle, Tunnelfile, TunnelfileSurvexscansfolders, Survey. See <a href="datamodel.html">Troggle data model</a> python code (15 May 2020).
|
||||
|
||||
<h3>Troggle parsers and input files</h3>
|
||||
[describe which files they read and which tables they write to. Also say what error messages are likely on import and what to do about them.]
|
||||
<ul>logbooks
|
||||
<li>surveyscans
|
||||
<li>survex files (caves)
|
||||
<p>To understand how troggle imports the data from the survex files, tunnel files, logbooks etc., see the <a href="trogimport.html">troggle import (databaseReset.py)</a> documentation.
|
||||
<p>The following separate import operations are managed by the import utility <a href="trogimport.html">(databaseReset.py)</a>:
|
||||
<ul>
|
||||
<li>expo logbooks
|
||||
<li>folk (people)
|
||||
<li>QMs
|
||||
<li>subcaves
|
||||
<li>entrances
|
||||
<li>drawings (tunnel)
|
||||
<li>wallets of notes & scans (surveyscans)
|
||||
<li>cave survey data - survex files
|
||||
<li>QMs - question marks
|
||||
<li>drawings - tunnel & therion files
|
||||
|
||||
</ul>
|
||||
|
||||
@ -42,7 +51,11 @@ ALSO there have been tables added to the core representation which are not antic
|
||||
<p>There are only two places where this happens. This is where online forms are used to create cave entrance records and cave records. These are created in the database but also exported as files so that when troggle is rebuilt and data reimported the new cave data is there.
|
||||
|
||||
<h3>Helpful tools and scripts</h3>
|
||||
[ALSO talk about useful tools, such as those which interrogate MySQL or sqlite databases directly so that one can see the internals change as data is imported]
|
||||
<img class="onleft" width = "100px" src="https://mariadb.com/kb/static/images/logo-2018-black.95f5978ae14d.png">
|
||||
<img class="onright" width = "80px" src="https://sqlite.org/images/sqlite370_banner.gif">
|
||||
<p>The public server uses a <a href="https://mariadb.org/about/">MariaDB SQL database</a>, and development is usually done using a single-user <a href="https://sqlite.org/about.html">sqlite database</a>, which is a standard django option.
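<p>As a rough illustration of the two set-ups, the django <var>DATABASES</var> setting might look something like the sketch below. The sqlite engine and file name are the ones shown in the import documentation; every MariaDB value (database name, user, password, host, port) is a placeholder invented for this example, not the real server configuration.

```python
# Sketch of the django DATABASES setting for each case.
# Only the sqlite ENGINE and file name come from the handbook;
# every MariaDB value below is a hypothetical placeholder.
SQLITE_DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "troggle.sqlite",
    }
}

MARIADB_DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # django's MySQL/MariaDB backend
        "NAME": "troggle",         # hypothetical database name
        "USER": "expo",            # hypothetical user
        "PASSWORD": "change-me",   # placeholder, never commit a real one
        "HOST": "localhost",
        "PORT": "3306",
    }
}

# A real settings file would assign exactly one of these to DATABASES.
DATABASES = SQLITE_DATABASES
```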
|
||||
<p>
|
||||
You will find it very useful to see what is going on if you look directly at the data in the database (just a single file in the sqlite case) and browse the data in the tables. A light-weight, simple db browser is <a href="https://sqlitebrowser.org/">DB Browser for SQLite</a>. Connecting directly to the MariaDB database with a control panel or <a href="https://www.mysql.com/products/workbench/">workbench</a> gives even more tools and documentation capabilities.
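<p>If you would rather not install a browser, a few lines of python give a quick overview of the sqlite file. This is only a minimal sketch using the standard <var>sqlite3</var> module; the function name <var>table_row_counts</var> is made up for this example and is not part of troggle.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a sqlite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Table names cannot be bound as SQL parameters, so quote them.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
    finally:
        conn.close()
```

<p>Calling <var>table_row_counts("troggle.sqlite")</var> before and after an import step shows which tables that step filled. The file name is the one printed by the import script; adjust it if your settings use a different one.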
|
||||
|
||||
<hr />
|
||||
Go on to: <a href="trognotes.html">Troggle uncategorised notes to be edited</a><br />
|
||||
|
75
handbook/troggle/trogimport.html
Normal file
@ -0,0 +1,75 @@
|
||||
<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
||||
<title>Handbook Troggle Data Import</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../css/main2.css" />
|
||||
</head>
|
||||
<body><style>body { background: #fff url(/images/style/bg-system.png) repeat-x 0 0 }</style>
|
||||
<h2 id="tophead">CUCC Expedition Handbook</h2>
|
||||
<h1>Troggle Data Import</h1>
|
||||
|
||||
<h3>Troggle - Reset and import data</h3>
|
||||
<p>
|
||||
The stand-alone python script <var>databaseReset.py</var> imports data from files into the troggle database (sqlite or MariaDB). It is separate from the process which runs troggle and serves the data as webpages (via apache), but it is plugged in to the same hierarchy of django python files.
|
||||
<p>In the :troggle: directory:
|
||||
<code><pre>$ python databaseReset.py
|
||||
|
||||
Usage is 'python databaseReset.py <command> [runlabel]'
|
||||
where command is:
|
||||
reset - normal usage: clear database and reread everything from files - time-consuming
|
||||
caves - read in the caves
|
||||
logbooks - read in the logbooks
|
||||
people - read in the people from folk.csv
|
||||
QMs - read in the QM csv files (older caves only)
|
||||
reinit - clear database (delete everything) and make empty tables. Import nothing.
|
||||
scans - the survey scans in all the wallets
|
||||
survex - read in the survex files - all the survex blocks but not the x/y/z positions
|
||||
survexpos - just the x/y/z Pos out of the survex files
|
||||
|
||||
tunnel - read in the Tunnel files - which scans the survey scans too
|
||||
profile - print the profile from previous runs. Import nothing.
|
||||
|
||||
test - testing...
|
||||
|
||||
and [runlabel] is an optional string identifying this run of the script
|
||||
in the stored profiling data 'import-profile.json'
|
||||
if [runlabel] is absent or begins with "F-" then it will skip the :memory: pass
|
||||
|
||||
caves and logbooks must be run on an empty db before the others as they
|
||||
set up db tables used by the others.
|
||||
</pre></code>
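<p>The <var>reset</var> command runs everything in one go. If you want to run the steps one at a time, the ordering rule above can be sketched as below. The command names are copied from the usage text; the helper names and the order of the steps after <var>caves</var> and <var>logbooks</var> are assumptions made for this example.

```python
import subprocess
import sys

# Command names come from the usage text. "reinit", "caves" and "logbooks"
# go first as the usage text requires; the order of the rest is an assumption.
IMPORT_STEPS = ["reinit", "caves", "logbooks", "people", "QMs",
                "scans", "survex", "survexpos", "tunnel"]


def build_commands(runlabel=None, script="databaseReset.py"):
    """Return one argv list per import step, in order."""
    commands = []
    for step in IMPORT_STEPS:
        argv = [sys.executable, script, step]
        if runlabel:
            argv.append(runlabel)
        commands.append(argv)
    return commands


def run_all(runlabel=None, cwd="."):
    """Run every step in turn, stopping at the first one that fails."""
    for argv in build_commands(runlabel):
        subprocess.run(argv, cwd=cwd, check=True)
```

<p>For example, <var>run_all("my-test-run", cwd="troggle")</var> would run each step in the troggle directory and label the run in the stored profiling data.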
|
||||
<p>On a clean computer with 16GB of memory and using sqlite a complete import takes about 20 minutes if nothing else is running.
|
||||
On the shared expo server it can take a couple of hours if the server is in use
|
||||
(we have only a share of it). On your
|
||||
own computer, the first in-memory sqlite pass takes only about 6 minutes.
|
||||
The in-memory pass is run first so that typos and data-entry errors
|
||||
are found quickly.
|
||||
<p>Here is an example of the output after it runs, showing which options were used recently and how long
|
||||
each option took (in seconds):
|
||||
<code><pre>
|
||||
-- troggle.sqlite django.db.backends.sqlite3
|
||||
** Running job Profile
|
||||
** Ended job Profile - 0.0 seconds total.
|
||||
days ago -4.28 -4.13 -4.10 -3.03 -3.00
|
||||
runlabel (s) svx NULL RESET svx2 RESET2
|
||||
reinit (s) - 1.9 1.9 - 1.8
|
||||
caves (s) - - 39.1 - 32.2
|
||||
people (s) - - 35.0 - 24.4
|
||||
logbooks (s) - - 86.5 - 67.3
|
||||
QMs (s) - - 19.3 - 0.0
|
||||
survexblks (s) 1153.1 - 3917.0 1464.1 1252.9
|
||||
survexpos (s) 397.3 - 491.9 453.6 455.0
|
||||
tunnel (s) - - 25.5 - 23.1
|
||||
scans (s) - - 52.5 - 45.9
|
||||
</pre></code>
|
||||
<p>The 'survexblks' option loads all the survex files recursively following the <var>*include</var>
|
||||
statements. It can take a long time if memory is low and the operating system has to page a lot.
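<p>To give a feel for what "recursively following the <var>*include</var> statements" means, here is a minimal sketch that lists every file reachable from a top-level survex file. It is not troggle's parser: the function name <var>included_files</var> is invented for this example, and it only handles the simple cases (an optional <var>.svx</var> extension, quoted names, <var>;</var> comments, paths relative to the including file).

```python
import re
from pathlib import Path

# Matches '*include name' or '*include "name with spaces"', ignoring case.
INCLUDE_RE = re.compile(r'^\s*\*include\s+("([^"]+)"|(\S+))', re.IGNORECASE)


def included_files(svx_path, seen=None):
    """Return svx_path plus every file it includes, depth first, no repeats."""
    seen = [] if seen is None else seen
    path = Path(svx_path)
    if not path.is_file():
        # Survex allows the .svx extension to be left off.
        path = Path(str(path) + ".svx")
        if not path.is_file():
            return seen                  # missing file: skip it in this sketch
    path = path.resolve()
    if path in seen:
        return seen                      # already visited, avoids include loops
    seen.append(path)
    for line in path.read_text(errors="replace").splitlines():
        line = line.split(";", 1)[0]     # drop survex comments
        match = INCLUDE_RE.match(line)
        if match:
            target = match.group(2) or match.group(3)
            # Included paths are relative to the file that includes them.
            included_files(path.parent / target, seen)
    return seen
```

<p>The length of the returned list is a quick indication of how much work the survex import has to do.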
|
||||
<p>(That value of 0 seconds for QMs looks suspicious..)
|
||||
<p>The file <var>import_profile.json</var> holds these historic times. Delete it to get
|
||||
a clean slate.
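<p>If you would like to keep the old timings rather than lose them, a small helper can move the file aside instead of deleting it. This is a sketch only: the function name <var>clear_profile</var> and the <var>.bak</var> suffix are choices made for this example.

```python
from pathlib import Path


def clear_profile(profile="import_profile.json", keep_backup=True):
    """Remove the stored timings; return the backup path, or None."""
    path = Path(profile)
    if not path.exists():
        return None                      # already a clean slate
    if keep_backup:
        backup = path.with_name(path.name + ".bak")
        path.replace(backup)             # overwrites any older backup
        return backup
    path.unlink()
    return None
```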
|
||||
<hr />
|
||||
Return to: <a href="datamodel.html">Troggle data model</a> in python code <br />
|
||||
<hr />
|
||||
</body>
|
||||
</html>
|