[Solved] Copy from local to cloud, prune inconsistent

Hi there,
I use the CLI version and my setup is quite straightforward: back up to local storage, copy from the local backup to cloud storage, and prune both storages identically. All seems to work fine except that the cloud storage keeps growing despite pruning. Today I got a warning that I am exceeding my limits. On the local storage the backup size is ca. 37G, but on the cloud storage it’s 50G.

I aim for a mirrored backup of the single personal server instance running in my home. Ideally, both backups should be identical, just in case something goes wrong with one of them.

The backup script:

# entries in $backup_duplicacy_daily_targets have the form "<path>@<keep options>"
for target in $backup_duplicacy_daily_targets
do
        path=`echo $target | cut -d\@ -f 1`
        cd "$path"
        echo "[`date`] Backing up $path"
        $backup_duplicacy -log -s UPLOAD_FILE -s PACK_END backup -threads 20
        sleep 1
done

sleep 10

cd /
echo "[`date`] Copying daily backup from local_backup to sftp_ihc"
$backup_duplicacy -log -s COPY_PROGRESS copy -threads 30 -from local_backup -to sftp_ihc

The script for pruning:

for target in $backup_duplicacy_daily_targets
do
        path=`echo $target | cut -d\@ -f 1`
        keep=`echo $target | cut -d\@ -f 2`
        cd "$path"
        echo "[`date`] Pruning local_backup $path $keep"
        $backup_duplicacy prune -storage local_backup -threads 30 -keep $keep
        sleep 1
        echo "[`date`] Pruning sftp_ihc $path $keep"
        $backup_duplicacy prune -storage sftp_ihc -threads 30 -keep $keep
        sleep 1
done

Frankly, I’m quite puzzled why this is the case. According to what I read on this nice support forum and in the docs, it should work “just fine”, but it doesn’t. Could you please enlighten me as to why this could happen and what I should do to avoid it?

Your prune commands need either the -a or the -id argument.

As written, they do nothing.
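
For example, the two prune lines in your script could look roughly like this (a sketch only, reusing your variables; -a prunes every snapshot ID on the storage, while -id <snapshot id> would limit it to one):

$backup_duplicacy prune -storage local_backup -threads 30 -a -keep $keep
$backup_duplicacy prune -storage sftp_ihc -threads 30 -a -keep $keep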

Thank you for the suggestion, it does make sense, but…

I have to disagree. The local storage has been pruned exactly as expected, but not the remote (cloud) one on an SFTP server.
These scripts had been running for years on my VPS until I migrated it into my home lab. Nothing was changed in the scripts.
They successfully backed up and pruned the local and remote storage (it was B2 back then), with consistent pruning on both storages.
Once I migrated, I initialized the storages (changing their names) and ran exactly the same scripts, with the outcome posted above. This is why I am puzzled.

Anyway, I will try adding the -a argument to both prune lines and see tomorrow whether it solves the problem. Thanks again!

The default snapshot id is “default”. If your storage uses the default id, perhaps that’s why it got pruned. But that’s a coincidence and should not be relied upon. Only the first storage added to the repo has the id “default”.
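
If in doubt, you can check which snapshot IDs actually exist on each storage, roughly like this (a sketch, assuming the storage names from your scripts):

duplicacy list -a -storage local_backup
duplicacy list -a -storage sftp_ihc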

Oh, I see. I still don’t understand why it worked as expected on the previous setup. Perhaps the duplicacy version was a very old one (I hadn’t upgraded it for quite a while, probably never).

I stand corrected, btw: the scripts do something (not nothing). I didn’t wait for tomorrow; I added the -a argument to the scripts and ran them manually. The size of both storages dropped to 25G. So even the local pruning hadn’t been working fully, as it turns out.

Anyway, thank you for the suggestion and the clarification. The problem is solved, case closed :)

Well, not quite.

That 25G figure was from running duplicacy check -tabular. The actual size on both storages is the same as before: 51G in the cloud and 38G on local (yeah, it’s grown a bit overnight).

I added the -exhaustive argument to both commands and ran them manually, but nothing changed: the size on disk stays the same as before.

Could you please point me in the right direction to solve this? I’m starting to think about wiping it all clean and starting over, but not knowing why it doesn’t work still stops me from doing that.

I would start with reviewing the prune logs. Then review the state of each storage: is there just one snapshot ID, and is the number of revisions under each snapshot-named folder approximately what you expect?
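
On the local storage that can be as simple as a couple of directory listings (a sketch; /path/to/local_backup and <snapshot_id> are placeholders, and this assumes the standard on-disk layout of a local duplicacy storage):

ls /path/to/local_backup/snapshots/                 # one folder per snapshot ID
ls /path/to/local_backup/snapshots/<snapshot_id>/   # one file per kept revision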

How are you checking the space utilized? du? What about the cloud storage? The available space reported there may not update in real time.

What cloud storage are you using? Does the user have permissions to rename and delete data on both destinations?

There is not much in the prune logs.
This is from the daily prune before I added the -a argument (it starts with about six hundred “Marked fossil” lines):

Marked fossil 021d42636e9ce2df8ac955a6cefd982b2bb5e2495c6df03a7b3dd13dbfcb5788
....
Marked fossil 1a5b0cc42b704137bcbbed1fbbde37fd4d6e23dddc45abbb012d09f665eae124
Marked fossil 98db43dda2ada04953914b973af01b0aef8cc989f4f4a40416fb80ce9d0fd474
Fossil collection 18 saved
Deleted cached snapshot 12_jail at revision 28

This is after I added the -a and -exhaustive arguments (it starts with about 12 thousand “Found unreferenced fossil” lines):

Found unreferenced fossil b8/828bbd739857dae045c6c938c496975393e65eb604d213066f41b3ea17dbdc.fsl
...
Found unreferenced fossil b8/edfb80b10a0ee822cbdab972eda1d70931f61d450af86edd63880d211b1a1d.fsl
Fossil collection 3 saved

No snapshot IDs though…

Yes, for the local storage I use ncdu (it’s like du but with some visual perks). For the cloud I rely upon the provider’s info (it’s delayed by several minutes, but they bill me based upon it). But anyway, I would definitely have spotted a difference of several GB :)
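
Roughly, the comparison I’m making is this (a sketch; /path/to/local_backup stands in for my actual storage path):

duplicacy check -tabular -storage local_backup   # what duplicacy reports (the 25G figure)
ncdu /path/to/local_backup                       # what is actually on disk (the 38G figure)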

It’s an SFTP share as a service from a local provider, which has proved to be quite reliable and rather inexpensive. I do have all the necessary permissions; I can even log in to it with FileZilla and do whatever I want with the files there, including deleting them. So permissions (755 for folders and 644 for files) and credentials (owner and group = my account) are not the issue here.

I dug further and found the difference between the two prune logs (local and cloud) from before I added the -a argument to the script. For the local one there are lines, as you suggested, with snapshot IDs (ca. 4,000 lines in total):

Snapshot 7_jail revision 9 was created after collection 11
Snapshot 8_jail revision 9 was created after collection 11
Snapshot 9_jail revision 9 was created after collection 11
Snapshot 12_jail revision 33 was created after collection 11
Snapshot 1_jail revision 9 was created after collection 11
Snapshot 2_jail revision 9 was created after collection 11
Snapshot 4_jail revision 9 was created after collection 11
Snapshot 5_jail revision 33 was created after collection 11
Snapshot 6_jail revision 9 was created after collection 11
Snapshot 0_root revision 33 was created after collection 11
Snapshot 15_jail revision 9 was created after collection 11
Snapshot 3_jail revision 33 was created after collection 11
Deleted fossil dbaab1d6c8e98ee72d0c4132a40fe5b9a8d148c31f31eefecfd66f5a8d299cfa (collection 11)
Deleted fossil c551e0040352efee64f2816918355b46b4c654ff6f8c43d79d23248e74798c83 (collection 11)
Deleted fossil 7004d3036ffb33d7350e9cea8392af0e918c20f43323d4259a66346eb694b288 (collection 11)
Deleted fossil cf72ebfad18514513640da43b7b69580d413a250735542f55407234bc2d0cea6 (collection 11)
...
Marked fossil 9180b50a9284af8ed3de10e0612cb46deb7ae0acf82f6273b02ecbe460c5b0a8
Marked fossil a19c5bb9b65bddbbe3fa315505ba9ca478c5dbd5455a4695255ad27faf29391e
Marked fossil de99c18d6d869f8319f7a51ef625ce6ed83c9a97dfe7c7a0735f02e407b4c226
Marked fossil fbbc1e6b84decbec75860041f6a7e455d0ee1b7c7d1ca9964654a9d11a83b5f7
...
Marked fossil 764eec5065d44f020dfe2f20890120cc82d1db4f579d7c79888cdd56b3b04268
Fossil collection 16 saved
Deleted cached snapshot 12_jail at revision 23
Deleted cached snapshot 12_jail at revision 24
Deleted cached snapshot 12_jail at revision 25
Deleted cached snapshot 12_jail at revision 26

But for the cloud one there are only 60 “Marked fossil” lines; the log starts and ends with them, with nothing else in between:

Marked fossil 0898290d70cbd62dabcf10d2b3098b3d036bfe4a9b98ff1c8d19f6bd90a05c61
Marked fossil e7126e4cfa573525608b6d67abf9425e4c05f391d2d64a971ce5aa4787edf02f
Marked fossil 15a67e94e71823b904d977f6ca7f1b1d5ee62cd1aed583f56afa2f19cb5cb24f

So I reckon this is because there is no ID (not even the default one) for the cloud storage, as you kindly suggested.

The most peculiar thing is that the prune logs for the commands with the -a argument are completely empty: zero-byte files, without a single line in them. In the console (I ran the scripts manually) there are lines like these:

Storage set to /var/duplicacy
Keep no snapshots older than 22 days
Fossil collection 1 found
Fossils from collection 1 can't be deleted because deletion criteria aren't met
Fossil collection 2 found
Fossils from collection 2 can't be deleted because deletion criteria aren't met
Fossil collection 3 found
Fossils from collection 3 can't be deleted because deletion criteria aren't met
No snapshot to delete
[Fri Jul 19 10:26:38 MSK 2024] Pruning weekly in sftp_ihc / 0:22
Storage set to sftp://p627403@p627403.backup.ihc.ru/tuft
Keep no snapshots older than 22 days
Fossil collection 1 found
Fossils from collection 1 can't be deleted because deletion criteria aren't met
Fossil collection 2 found
Fossils from collection 2 can't be deleted because deletion criteria aren't met
Fossil collection 3 found
Fossils from collection 3 can't be deleted because deletion criteria aren't met
No snapshot to delete
...

And it continues like this for each storage in the script, with more or less the same content.

What I think is that prune somehow removes the obsolete snapshots from the count (the duplicacy check command), but the files stay on disk for some reason that is incomprehensible, at least to me.

Any suggestions?

It seems it works as designed. You can read more here: duplicacy/duplicacy_paper.pdf at master · gilbertchen/duplicacy · GitHub, but the gist of it is that duplicacy is fully concurrent: it has to support weird scenarios where five machines run backups to a specific storage while seven other machines try to prune it at the same time. There is safety built in that prevents immediate deletion of chunks; the fossilization step is part of that mechanism. For example, you don’t want prune to nuke a chunk that is needed for an in-progress backup.
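
In practice the two-step deletion looks roughly like this (a sketch, not a literal transcript of your runs, using the sftp_ihc storage and the 0:22 policy from your logs as examples):

# 1st prune run: unreferenced chunks are only renamed to *.fsl fossils
# and recorded in a fossil collection ("Fossil collection N saved")
duplicacy prune -storage sftp_ihc -a -keep 0:22
# ...later, after newer backups exist for the snapshot IDs, another run
# deletes the fossils from collections whose criteria are met ("Deleted fossil ...")
duplicacy prune -storage sftp_ihc -a -keep 0:22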

If you really want to free space right away, you can promise duplicacy that it is, right now, the only process touching the datastore, and turn off the safety by providing the -exclusive flag.

Then the pruned data will be deleted right away. But if the promise is broken, bad things will happen.
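
Something along these lines, for example (a sketch only; run it while no backup, copy, or other prune is touching either storage, and again using the 0:22 policy from your logs as an example):

duplicacy prune -storage local_backup -a -keep 0:22 -exclusive
duplicacy prune -storage sftp_ihc -a -keep 0:22 -exclusive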

Alternatively, you can keep running backups and prunes, and eventually that data will be gone. (I believe it’s after one week and one backup, whichever is later, but don’t quote me on this.)
