Impact of Prune on storage Copy operations

Use Case:

  • I run Copy from local to cloud
  • Immediately after that, both storages are Pruned
  • I then run a second Copy job, and a large number of chunks are marked for upload/copy
  • Rough numbers: 190K chunks flagged, 6,500 chunks copied

Can you please help me understand what is happening in the Prune operation that results in that many chunks being flagged and uploaded?

One possibility: you prune the cloud storage harder than the local one, so when you copy the second time you again have to copy some revisions that were pruned from the cloud. (And if you pruned a second time, nothing would be “deleted” from local, but something would again be deleted from cloud.)

The Prune options are identical for both local and cloud.

The local Prune (or the local Prune in combination with the cloud Prune) appears to result in a LOT of chunks being touched. I’m requesting some insight into what is going on there.

Separate but related: is there a simple way to determine the total number of chunks contained in a storage pool?

duplicacy check -tabular should do it.
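
For example (assuming the cloud storage is named wasabi, as it is later in this thread), you can run it against each storage separately:

duplicacy check -tabular
duplicacy check -tabular -storage wasabi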

I think so, but wanted to confirm what I’m seeing. I have four repositories backing up to the same storage pool. So does the total chunk count equal the sum of the unique chunks from each repository?

Might be a decent minor feature request to add the total storage stats at the end of the “-tabular” output.

I’d say: total chunk count = sum of the unique chunks from each repository + number of shared chunks
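
To make that concrete with purely hypothetical numbers: if four repositories report 5,000, 3,000, 2,000 and 1,000 unique chunks and the storage as a whole holds 15,000 chunks, then, rearranging the formula above:

shared chunks = total chunks - sum of unique chunks = 15,000 - (5,000 + 3,000 + 2,000 + 1,000) = 4,000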

And how does one determine shared chunks? I see sum footers that show the total chunks for each repository along with the total unique chunks, but even if I subtract out the unique chunks, I still don’t see how to arrive at an accurate number of shared chunks in the storage pool.

Can I suggest the following?

  • run the first copy and then prune
  • run duplicacy check -tabular on both storages
  • run the second copy
  • run duplicacy check -tabular on both storages again

Then post the output from all duplicacy check -tabular commands.

As requested.

Initial Copy
Command(s) used:
duplicacy copy -from default -to wasabi -threads 2 | grep -v "skipped at the destination"

Results:
Copy complete, 33912 total chunks, 420 chunks copied, 33492 skipped

Log file:
https://drive.google.com/open?id=15XpMjyw8MmcSTrSKNOlFAfj0vXsWKeDX

Prune
Command(s) used:
duplicacy prune -all -keep 0:180 -keep 7:30 -keep 1:7
duplicacy prune -exclusive -all -keep 0:180 -keep 7:30 -keep 1:7 -storage wasabi

Log file:
https://drive.google.com/open?id=1fUiLA3IgSUkNaCXihVoBSWyaXjtsATEp

Initial Storage Check
Command(s) used:
duplicacy check -all -tabular
duplicacy check -all -tabular -storage wasabi

Log file:
https://drive.google.com/open?id=1QGu_Ld3J18_jDjscIjtIpjHFnHSFBxIU

2nd Copy
Command(s) used:
duplicacy copy -from default -to wasabi -threads 2 | grep -v "skipped at the destination"

Results:
Copy complete, 28719 total chunks, 5104 chunks copied, 23615 skipped

Log file:
https://drive.google.com/open?id=1Dv-mmy4MPLT8zvc0AfaC8e19n3sdO7DV

2nd Storage Check
Command(s) used:
duplicacy check -all -tabular
duplicacy check -all -tabular -storage wasabi

Log file:
https://drive.google.com/open?id=1ZZZNR1t0BmcbaHBq1HOVo3PC9eIl1OR-

Notes: prior to running the 2nd storage check, at least one cron backup job started. I killed the job, but am unsure of the impact on the “Check” results. If I need to re-do this, please let me know.

For snapshot id J742845-W10-J742845-J742845, the prune commands produced different revisions on two storages after revision 862:

One storage:

160. All chunks referenced by snapshot J742845-W10-J742845-J742845 at revision 862 exist
161. All chunks referenced by snapshot J742845-W10-J742845-J742845 at revision 936 exist

The other storage:

160. All chunks referenced by snapshot J742845-W10-J742845-J742845 at revision 862 exist
161. All chunks referenced by snapshot J742845-W10-J742845-J742845 at revision 923 exist

This is likely due to the two prune commands starting at different times, causing different retention frequencies to be selected. For example, with -keep 1:7 a snapshot taken 6 days and 23 hours before one prune run is not yet subject to the one-per-day rule, while for a prune run started two hours later it is, so the two runs can end up keeping different revisions.

A quick fix I can think of is to provide a -now option to override the current time, so that two prune commands will use the same base time for deciding the retention frequencies.
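
If such an option existed (it does not yet; the flag and its placeholder argument below are only hypothetical), the two prune commands above could be given the same base time, e.g.:

duplicacy prune -all -keep 0:180 -keep 7:30 -keep 1:7 -now <same-base-time>
duplicacy prune -exclusive -all -keep 0:180 -keep 7:30 -keep 1:7 -storage wasabi -now <same-base-time>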

Thanks.

So, perform a Prune on both storages using the “-now” option, correct? That must be an undocumented feature :wink:

Question: I was performing backups from that repository directly to the cloud copy when traveling (I have since stopped). Would that have created this situation?

Maybe I jumped the gun? “-now” is not a recognized option.

-now is not yet available. It’s just an idea for a quick-fix.

:+1:

I’m guessing the other option would be to perform a “nuclear” prune and remove all snapshots older than, say, 30 days in both storages. It seems like that may accomplish the goal of getting the storages back in sync. Or am I missing something?
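
For illustration, such a prune might look something like the following (a sketch reusing the storage name and -keep syntax from earlier in this thread; -keep 0:30 keeps no snapshots older than 30 days):

duplicacy prune -all -keep 0:30
duplicacy prune -all -keep 0:30 -storage wasabi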

Isn’t this issue discussed somewhere on GitHub as well? I think there was something along the lines of “prune is dependent on the time it starts when checking which revisions to delete”, and there was an idea to make prune always take the “first” revision of a day when pruning per day. Maybe I’m wrong, but I think I remember reading such a topic somewhere.

There doesn’t appear to be any additional input here, so I’ll add a postscript and try to summarize my understanding.

I performed a “nuclear” prune on both the local and cloud storage pools to remove all snapshots older than 30 days. This appears to have fixed the issue, and the entire backup job now runs in 2-4 hours (that includes creating/copying images of two server volumes, backing up the server target volume to the local Duplicacy storage pool, pruning both the local and cloud storage pools, and copying the local storage pool to the cloud with snapshots from four targets).

I’ve noticed that during the prune on the cloud storage, a recent snapshot from the most active target is removed, and the copy process then re-adds that snapshot to the cloud. I’m not sure whether that’s something to be watched or not.

My impression is that I likely caused the issue initially by backing up one of the targets directly to the cloud storage pool on several occasions, thereby ‘confusing’ Duplicacy as it attempted to prune what was no longer needed in both pools and then copy data from local to cloud. The result was older snapshots being pruned out of the cloud and then re-uploaded during the copy process.

Is that an accurate explanation?

If so, then it seems reasonable to state that backing up directly to a storage pool that is primarily used for replication is a BAD IDEA. :slightly_smiling_face: Correct?

Regarding this point specifically, see this topic, which is somewhat related. I use a local storage as a “buffer” for backing up to the cloud, which seems similar to your use case.

Yes, that has been my understanding. Additionally, it seems that doing so will create problems downstream as you attempt to prune each storage and execute subsequent copies.

So… I believe the short answer is: don’t do it!

This is fixed by “Align snapshot times to the beginning of days when calculating the di…” (gilbertchen/duplicacy@22a0b22 on GitHub).

Specifically, the commit makes sure that, if you run the prune command at different times but on the same day, the same set of snapshots will be deleted.
