Impact of Prune on storage Copy operations


#1

Use Case:

  • I run Copy from local to cloud
  • Immediately after that, both storages are Pruned
  • I then run a second Copy job, a large number of chunks are marked for upload/copy
  • Rough numbers - 190K chunks flagged, 6500 chunks copied

Can you please help me understand what is happening in the Prune operation that results in that many chunks being flagged and uploaded?


#2

One possibility: you prune the cloud storage harder than the local one, so when you copy the second time you have to re-copy revisions that were pruned from the cloud. (And if you pruned a second time, nothing would be "deleted" from local, but something would again be deleted from cloud.)
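The effect can be illustrated with a toy model (all revision numbers below are hypothetical): copy transfers whatever the destination lacks, so any revision pruned only from the cloud gets re-copied on the next run.

```python
# Toy model: copy transfers everything that exists locally but not in the
# cloud, so revisions pruned only from the cloud are re-copied next time.

local_revisions = set(range(1, 11))             # local keeps revisions 1-10
cloud_revisions = local_revisions - {1, 2, 3}   # cloud pruned harder: lost 1-3

to_copy = local_revisions - cloud_revisions     # the next copy re-uploads these
print(sorted(to_copy))                          # [1, 2, 3]
```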


#3

The Prune options are identical for both local and cloud.

The local Prune (or the local Prune in combination with the cloud Prune) appears to be resulting in a LOT of chunks being touched. I’m requesting some insight into what is going on there.

Separate but related: is there a simple way to determine the total number of chunks contained in a storage pool?


#4

duplicacy check -tabular should do it.


#5

I think so, but wanted to confirm what I’m seeing. I have four repositories backing up to the same storage pool. So the total chunk count equals the unique chunks from each repository?

Might be a decent minor feature request to add the total storage stats at the end of the “-tabular” output.


#6

I’d say: total chunk count = number of unique chunks from each repository + number of shared chunks


#7

And how does one determine shared chunks? I see summary footers that show the total chunks for each repository along with the total unique chunks, but even if I subtract out the unique chunks, I still don't see how to arrive at an accurate number of shared chunks in the storage pool.
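One way to see the relationship: given the actual chunk sets, the pool total is the size of their union, and the shared count is the pool total minus the per-repository unique counts. A hypothetical sketch with made-up chunk IDs (a chunk can be shared by two, three, or four repositories, which is why the per-repository footers alone don't give the breakdown):

```python
# Hypothetical chunk sets for four repositories sharing one storage pool.
repos = {
    "repoA": {1, 2, 3, 4},
    "repoB": {3, 4, 5},
    "repoC": {4, 5, 6},
    "repoD": {7},
}

pool = set().union(*repos.values())   # every chunk in the pool
total_chunks = len(pool)              # 7

# "unique" = referenced by exactly one repository
unique_per_repo = {
    name: len(chunks - set().union(*(c for n, c in repos.items() if n != name)))
    for name, chunks in repos.items()
}

# shared = everything that is referenced by more than one repository
shared = total_chunks - sum(unique_per_repo.values())
print(total_chunks, unique_per_repo, shared)
```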


#8

Can I suggest the following?

  • run the first copy and then prune
  • run duplicacy check -tabular on both storages
  • run the second copy
  • run duplicacy check -tabular on both storages again

Then post the output from all duplicacy check -tabular commands.


#9

As requested.

Initial Copy
Command(s) used:
duplicacy copy -from default -to wasabi -threads 2 | grep -v "skipped at the destination"

Results:
Copy complete, 33912 total chunks, 420 chunks copied, 33492 skipped

Log file:
https://drive.google.com/open?id=15XpMjyw8MmcSTrSKNOlFAfj0vXsWKeDX

Prune
Command(s) used:
duplicacy prune -all -keep 0:180 -keep 7:30 -keep 1:7
duplicacy prune -exclusive -all -keep 0:180 -keep 7:30 -keep 1:7 -storage wasabi

Log file:
https://drive.google.com/open?id=1fUiLA3IgSUkNaCXihVoBSWyaXjtsATEp

Initial Storage Check
Command(s) used:
duplicacy check -all -tabular
duplicacy check -all -tabular -storage wasabi

Log file:
https://drive.google.com/open?id=1QGu_Ld3J18_jDjscIjtIpjHFnHSFBxIU

2nd Copy
Command(s) used:
duplicacy copy -from default -to wasabi -threads 2 | grep -v "skipped at the destination"

Results:
Copy complete, 28719 total chunks, 5104 chunks copied, 23615 skipped

Log file:
https://drive.google.com/open?id=1Dv-mmy4MPLT8zvc0AfaC8e19n3sdO7DV

2nd Storage Check
Command(s) used:
duplicacy check -all -tabular
duplicacy check -all -tabular -storage wasabi

Log file:
https://drive.google.com/open?id=1ZZZNR1t0BmcbaHBq1HOVo3PC9eIl1OR-

Notes: prior to running the 2nd storage check, at least one cron backup job started. I killed the job, but am unsure of the impact on the “Check” results. If I need to re-do this, please let me know.


#10

For snapshot id J742845-W10-J742845-J742845, the prune commands produced different revisions on two storages after revision 862:

Local:
160. All chunks referenced by snapshot J742845-W10-J742845-J742845 at revision 862 exist
161. All chunks referenced by snapshot J742845-W10-J742845-J742845 at revision 936 exist

Wasabi:
160. All chunks referenced by snapshot J742845-W10-J742845-J742845 at revision 862 exist
161. All chunks referenced by snapshot J742845-W10-J742845-J742845 at revision 923 exist

This is likely due to two prune commands starting at different times causing different retention frequencies to be selected.

A quick fix I can think of is to provide a -now option to override the current time, so that two prune commands use the same base time when deciding the retention frequencies.
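The dependence on the prune start time can be sketched with a simplified model (an illustration of the idea, not duplicacy's actual code; the rules mirror the `-keep 0:180 -keep 7:30 -keep 1:7` options used above). The same snapshot can fall on different sides of a `-keep` age threshold depending on when each prune runs:

```python
from datetime import datetime, timedelta

# Simplified model: each -keep n:m rule applies to snapshots older than
# m days, with age measured from the moment prune runs.
def bucket(snapshot_time, now, rules=((0, 180), (7, 30), (1, 7))):
    age_days = (now - snapshot_time).total_seconds() / 86400
    for interval, threshold in rules:   # rules ordered oldest-first
        if age_days >= threshold:
            return interval             # keep one revision per `interval` days
    return None                         # younger than all thresholds: keep all

snap = datetime(2019, 1, 1, 12, 0)
run1 = snap + timedelta(days=6, hours=20)   # prune local in the evening
run2 = snap + timedelta(days=7, hours=4)    # prune cloud the next morning

print(bucket(snap, run1), bucket(snap, run2))  # None 1 -> different decisions
```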


#11

Thanks.

So, perform a Prune on both storages using the “-now” option, correct? That must be an undocumented feature :wink:

Question: I was (and have since stopped) performing backups from that repository directly to the cloud copy when traveling. Would that have created this situation?


#12

Maybe I jumped the gun? “-now” is not a recognized option.


#13

-now is not yet available. It’s just an idea for a quick-fix.


#14

:+1:

I’m guessing the other option would be to perform a “nuclear” prune and remove all snapshots older than, say, 30 days in both storages. It seems like that may accomplish the goal of getting the storages back in sync. Or am I missing something?


#15

Isn't this issue discussed somewhere on GitHub as well? I think there was something along the lines of "prune depends on the time it starts when checking which revisions to delete", and there was an idea to make prune always take the first revision of a day when pruning per day. Maybe I'm wrong, but I think I remember reading such a topic somewhere.


#16

Doesn’t appear to be any additional input here, so I’ll add a postscript and try to summarize my understanding.

I performed a "nuclear" prune on both the local and cloud storage pools to remove all snapshots older than 30 days. This appears to have fixed the issue, and the entire backup job now runs in 2-4 hours (that includes creating/copying images of two server volumes, backing up the server target volume to the local Duplicacy storage pool, pruning both the local and cloud storage pools, and copying the local storage pool, with snapshots from four targets, to the cloud).

I've noticed that during the prune on the cloud storage, a recent snapshot from the most active target is removed, and the copy process then re-adds that snapshot to the cloud. Not sure if that's something to watch or not.

My impression is that I likely caused the issue initially by backing up one of the targets directly to the cloud storage pool on several occasions, thereby 'confusing' Duplicacy as it attempted to prune what was no longer needed in both pools and then copy data from local to cloud. The result was older snapshots being pruned out of the cloud and then re-uploaded during the copy process.

Is that an accurate explanation?

If so, then it seems reasonable to state that backing up directly to a storage pool that is primarily used for replication is a BAD IDEA. :slightly_smiling_face: Correct?


#17

Regarding this point specifically, see this topic, which is somewhat related. I use a local storage as a "buffer" for backup to the cloud, which seems similar to your use case.


#18

Yes, that has been my understanding. And, additionally, it seems that doing so will create problems downstream as you attempt to prune each storage and execute subsequent copies.

So… I believe the short answer is: don’t do it!


#19

This is fixed by commit gilbertchen/duplicacy@22a0b22 on GitHub ("Align snapshot times to the beginning of days when calculating the di…").

Specifically, the commit makes sure that, if you run the prune command at different times but on the same day, the same set of snapshots will be deleted.
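A sketch of what day-alignment buys (hypothetical code, not the commit itself): once timestamps are truncated to midnight before computing ages, two prune runs on the same day compute the same ages and therefore make the same deletion decisions.

```python
from datetime import datetime

def day_start(t):
    # Align a timestamp to 00:00 of its day, as the fix does for snapshot times.
    return t.replace(hour=0, minute=0, second=0, microsecond=0)

def age_days(snapshot_time, now):
    return (day_start(now) - day_start(snapshot_time)).days

snap = datetime(2019, 1, 1, 12, 0)
run1 = datetime(2019, 1, 8, 2, 0)    # early-morning prune
run2 = datetime(2019, 1, 8, 23, 0)   # late-night prune, same day

print(age_days(snap, run1), age_days(snap, run2))  # 7 7 -> same decisions
```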