rsync docm updates

Philip Sargent 2021-12-20 23:42:57 +00:00
parent f826f7a946
commit 369e8426a2
4 changed files with 44 additions and 9 deletions

@ -26,28 +26,43 @@ uploading photographs: <a href="uploading.html">uploading.html</a>.
<p> To sync all <p> To sync all
the files from the server to your local expofiles directory on your laptop:</p> the files from the server to your local expofiles directory on your laptop:</p>
<p><code>rsync -nazv --delete-after --exclude="thumbs/" --exclude="*.???.xml" --exclude="*.jpeg.xml" expo@expo.survex.com:expofiles/ /home/expo/expofiles</code></p> <pre style="font-size: small"><code>rsync -nazv --delete-after --exclude="thumbs/" --exclude="*.???.xml" --exclude="*.jpeg.xml" expo@expo.survex.com:expofiles/ /home/expo/expofiles</code></pre>
<p>To sync the local expofiles directory back to the server after you have made updates, e.g. scanned some hand-drawn surveys into expofiles/surveyscans/ (but only if your machine runs Linux):</p> <p>To sync the local expofiles directory back to the server after you have made updates, e.g. scanned some hand-drawn surveys into expofiles/surveyscans/ (but only if your machine runs Linux):</p>
<p><code>rsync -nazv --delete-after /home/expo/expofiles/surveyscans/2019/ expo@expo.survex.com:expofiles/surveyscans/2019</code></p> <pre style="font-size: small"><code>rsync -nazv --delete-after /home/expo/expofiles/surveyscans/2019/ expo@expo.survex.com:expofiles/surveyscans/2019</code></pre>
then CHECK that the list of files it produces matches the ones you absolutely intend to delete forever! ONLY THEN do it without the "-n" option. "-n" is the same as "--dry-run" which shows you the overwriting changes but doesn't actually do them. then CHECK that the list of files it produces matches the ones you absolutely intend to delete forever! ONLY THEN do it without the "-n" option. "-n" is the same as "--dry-run" which shows you the overwriting changes but doesn't actually do them.
<p>Always <p>Note that the target folder <em>has no trailing slash</em> but that the source folder <em>does</em>. Important.
<p>Always:
<ul> <ul>
<li>do a dry-run of rsync from the server to your laptop immediately before you do an upload to the server <li>do a dry-run of rsync from the server to your laptop immediately before you do an upload to the server
<li>use --delete-after <li>use <var>--delete-after </var>
<li>never use <var>--prune-empty-dirs</var> because we often need an empty folder, e.g. when preparing for the next expo we create several empty folders.
<li>work at the minimum scope of folders you need, e.g. within expofiles/photos/ or expofiles/surveyscans/ not for the whole of expofiles all at once. <li>work at the minimum scope of folders you need, e.g. within expofiles/photos/ or expofiles/surveyscans/ not for the whole of expofiles all at once.
<li>take exagerated care with the placement of the final slash in directory parameters to the rsync. Get it wrong and you duplicate things instead of updating them and it takes ages to sort out. <li>take exaggerated care with the placement of the final slash in directory parameters to the rsync. Get it wrong and you duplicate things instead of updating them and it takes ages to sort out.
</ul> </ul>
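The slash convention noted above (source folder with a trailing slash, target folder without) can be checked mechanically before rsync is run. The following is a minimal sketch, assuming a hypothetical helper called check_slashes; it is not part of the expo scripts and only inspects the two path strings.

```shell
#!/bin/sh
# check_slashes SRC DST: hypothetical helper, not part of the expo scripts.
# Returns 0 only if SRC ends in "/" and DST does not.
check_slashes() {
    case "$1" in
        */) ;;  # source ends with a slash: fine
        *) echo "source '$1' should end with a slash" >&2; return 1 ;;
    esac
    case "$2" in
        */) echo "target '$2' should not end with a slash" >&2; return 1 ;;
    esac
    return 0
}

# Example check using the upload paths from the text above.
if check_slashes /home/expo/expofiles/surveyscans/2019/ \
                 expo@expo.survex.com:expofiles/surveyscans/2019
then
    echo "slashes look right: now do the rsync dry run"
fi
```

A check like this only catches a misplaced slash; it does not replace reading the dry-run output before removing "-n".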
<p>(do be <b>incredibly</b> careful not to delete piles of stuff then rsync back, or to get the directory level of the command wrong - as it'll all get deleted on the server too, and we may not have backups!). It's <b>absolutely vital</b> to use rsync --dry-run --delete-after first to check what would be deleted. <p>(do be <b>incredibly</b> careful not to delete piles of stuff then rsync back, or to get the directory level of the command wrong - as it'll all get deleted on the server too, and we will not have backups if these are recent files!). It's <b>absolutely vital</b> to use <var>rsync --dry-run --delete-after</var> first to check what would be deleted.
<p>If your version of rsync produces output for every folder it sees, even if it is not updated, then pipe the output through
<pre style="font-size: small"><code>| grep -v "/$"</code></pre>
to hide the folders, which are listed with a trailing slash.
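To illustrate what that filter does, the sketch below applies it to simulated rsync output rather than to a real transfer. The file names are invented for the example, and real rsync output may differ in detail.

```shell
#!/bin/sh
# Simulated verbose rsync output (hypothetical names): folder lines end in "/".
sample_output() {
    printf '%s\n' \
        'surveyscans/' \
        'surveyscans/2019/' \
        'surveyscans/2019/notes1.jpg' \
        'deleting surveyscans/2019/old.jpg'
}

# Drop every line that ends in a slash, leaving only the file-level lines.
filtered=$(sample_output | grep -v "/$")
printf '%s\n' "$filtered"
```

Only the two file lines remain, which makes the list of changes and deletions easier to check by eye.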
<p>If you are using rsync from an NTFS folder on a Windows machine (even if you are using WSL to do it) you will <em>not</em> get all the files for certain as some Linux filenames are incompatible with Windows. What will happen is that rsync will invisibly change the names as it downloads them from the Linux expo server to your Windows machine, but then it forgets what it has done and tries to re-upload all the renamed files to the server even if you have touched none of them. This pollutes the server. Now there won't be any problems with simple filenames using all lowercase letters and no funny characters, but we have nothing in place to stop anyone creating an incompatible filename of that sort somewhere in that 40GB or of detecting the problem at the time. So don't do it. Be extra, extra careful and religiously use the -n (DRY RUN) setting and manually check all changes before running rsync without -n. <p>If you are using rsync from an NTFS folder on a Windows machine (even if you are using WSL to do it) you will <em>not necessarily</em> get all the files cleanly as some legal Linux filenames are incompatible with Windows. What will happen is that
<ol>
<li>rsync will invisibly change the names as it downloads them from the Linux expo server to your Windows machine, but
<li>then it forgets what it has done, and
<li>when you next try to resynchronise using rsync, it will
<li>re-upload all the <em>renamed files</em> and maybe delete the originals <em>even if you have touched none of them</em>.
<li>This pollutes the server and would break links between survex files and drawing files.
</ol>Now there won't be any problems with simple filenames using all lowercase letters and no funny characters (except for "con.jpg" and similar of course*), but we have nothing in place to stop anyone creating (using Linux) an incompatible filename of that sort somewhere in that 40GB or of detecting the problem at the time. So be extra, extra careful and religiously use the <var>-n (DRY RUN)</var> setting and manually check all changes before running rsync without -n.
<p>(We may also have an issue with rsync not using the appropriate user:group attributes for files pushed back to the server. This may not cause any problems, but watch out for it.)</p> <p>(We may also have an issue with rsync not using the appropriate user:group attributes for files pushed back to the server. This may not cause any problems, but watch out for it.)</p>
<p>* CON is an MS-DOS identifier for the CONSOLE and it is still an illegal Windows filename.
It's <a href="https://www.howtogeek.com/fyi/windows-10-still-wont-let-you-use-these-file-names-reserved-in-1974/">not the only thing</a> like that.
</dl> </dl>

@ -40,7 +40,13 @@
<p>There are also scripts running cron jobs on the server to fix file permissions and to periodically tidy <a href="../computing/repos.html">repositories</a>, and example rsync and scp scripts to help manage synchronisation of the expofiles directories which are not under version control. <p>There are also scripts running cron jobs on the server to fix file permissions and to periodically tidy <a href="../computing/repos.html">repositories</a>, and example rsync and scp scripts to help manage synchronisation of the expofiles directories which are not under version control.
<p>Apart from these scripts, troggle in full deployment <a href="serverconfig.html#js">on the server</a> also needs <br>- a running mySQL database, <br>- a running apache webserver and <br>- cgit to display git repos. <p>Apart from these scripts, troggle in full deployment <a href="serverconfig.html#js">on the server</a> also needs (at least)
<br>- a mySQL database,
<br>- a webserver such as apache
<br>- a text search utility such as xapian
<br>- cgit to display git repos.
<p> See the <a href="serverconfig.html#js">server configuration</a> for the full list,
or the smaller <a href="troglaptop.html">troggle dev</a> setup for just core software development.
<h3 id="inscripts">Old but maybe useful scripts</a></h3> <h3 id="inscripts">Old but maybe useful scripts</a></h3>
<ul> <ul>

@ -38,7 +38,7 @@ nearly half of it is directly relevant to us:
<li>Always discover why previous modernisations failed <em>first</em>. <li>Always discover why previous modernisations failed <em>first</em>.
<li>Spend a lot of time <em>problem setting</em> before you start to think about <em>problem solving</em>. <li>Spend a lot of time <em>problem setting</em> before you start to think about <em>problem solving</em>.
<li>Working through a major modernisation is all about managing <em>scope</em>. <li>Working through a major modernisation is all about managing <em>scope</em>.
<li>Code is much easier to write than it is to read. Modernisaton means a lot of <em>code reading</em>. <li>Code is much easier to write than it is to read. Modernisation means a lot of <em>code reading</em>.
<li>Human beings are absolutely terrible at estimating probabilities and risk. We always under-estimate the amount of work in a rewrite and over-estimate the likelihood of success. <li>Human beings are absolutely terrible at estimating probabilities and risk. We always under-estimate the amount of work in a rewrite and over-estimate the likelihood of success.
<li>Success does not come all at once. What are the <em>progressive success criteria</em> during the reengineering? <li>Success does not come all at once. What are the <em>progressive success criteria</em> during the reengineering?
<li>Use <em>Diagnosis, Policy, Actions</em> where there is little consensus about what success looks like. <li>Use <em>Diagnosis, Policy, Actions</em> where there is little consensus about what success looks like.

scripts/updatephotos: new normal file, 14 lines added

@ -0,0 +1,14 @@
#!/bin/sh
#script to regenerate expo photo collection
# The active version of this script lives at /expofiles/updatephotos
BASEDIR=/home/expo
WEBDIR=/home/expo
#tidy up dangling symlinks first
find ${WEBDIR}/webphotos -type l -print0 | xargs -0 --no-run-if-empty rm
#config file location set in /etc/bins/binsrc, not here.
bins -o scaled -e ${BASEDIR}/expofiles/photos ${WEBDIR}/webphotos > ${BASEDIR}/updatephoto.log 2>&1
# <html><body>Line inserted to stop link-scanners complaining</body></html>
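
A note on the symlink tidy-up in the script above: the comment mentions dangling symlinks, but find with "-type l" matches every symlink under webphotos, which may well be intended. If only dangling links were meant to be removed, one portable way to select them is sketched below. It runs in a throwaway temporary directory with invented file names, not in webphotos, and only lists the links.

```shell
#!/bin/sh
# Sketch only: build a scratch directory with one good and one broken symlink.
demo=$(mktemp -d)
touch "$demo/real.jpg"
ln -s "$demo/real.jpg" "$demo/good-link"       # target exists
ln -s "$demo/missing.jpg" "$demo/broken-link"  # target does not exist

# "test -e" follows the link, so it fails only for dangling symlinks;
# negating it selects exactly those.
dangling=$(find "$demo" -type l ! -exec test -e {} \; -print)
printf '%s\n' "$dangling"

# Remove the scratch directory again.
rm -rf "$demo"
```

Only the broken link is listed; replacing "-print" with a removal step would be a separate decision for whoever maintains the script.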