Feature Request: Periodically write the list of verified chunks

Hello.
I want to execute duplicacy check -chunks on a relatively large repository. However, my computer regularly crashes. As duplicacy keeps a list of already verified chunks in .duplicacy/cache/storage/verified_chunks, this would usually not be a problem, as duplicacy could resume the check process from where it ended on the previous crash.

However, duplicacy only writes to .duplicacy/cache/storage/verified_chunks when the process exits normally, not when my computer crashes. The list of already verified chunks is therefore never persisted, and the progress is lost.
This is why I propose a feature that writes check progress to .duplicacy/cache/storage/verified_chunks periodically, so that progress is preserved if the computer crashes. For example, duplicacy could write .duplicacy/cache/storage/verified_chunks after every 1000 chunks checked.

I’d appreciate your feedback in this regard. Best, Juri

This is a wrong solution to a wrong problem.

Computers don’t crash for no reason out of the blue. Lost progress checking chunks is the least of your worries. Your actual data can get corrupted or destroyed.

Find the culprit and address the crashes — that’s the solution. You may want to start with running memtest86, followed by disk check, and if memory is fine — analyzing crash dumps/panic logs.

I’d like to see this as well, for a backup I make online to a destination with a slow upload speed. Even when I select just one revision of the huge backup, the estimate says the chunk check would take almost 4 days to finish. However, my internet connection (and likely the connection at the destination) is cut every 24 hours. I haven’t yet tested how Duplicacy reacts to that, but to be on the safe side and not lose all the progress, it would be great if Duplicacy saved progress periodically, perhaps every 2 hours or so?

Thanks!

Okay, so the chunk check does survive the connection dropping and reconnecting every 24 hours. But even with the check limited to a single revision, it still takes over 2 weeks to finish for the initial backup revision.

I had started the process and after a week, I lost all the progress as I had an automated shutdown that I forgot to turn off on the machine running the check.

Restarting the check now and I hope I won’t lose the progress again due to something else.

I’m really hoping in a future update, an option can be added to allow for periodically saving the progress on the chunk checks!

Another downside of Duplicacy not periodically writing verified chunks to disk is that its memory consumption increases on large chunk verifications. The current verification, covering a total of 233901 chunks, is at 34172 verified chunks, with Duplicacy already taking 500M of memory. With 9 days still to go, that’s going to tie up quite a lot of memory over a long period of time.

I have submitted a PR on GitHub (Pull Request #662 in gilbertchen/duplicacy, "periodically save check progress to verified_chunks", by gutjuri) that adds periodic saves (for now, once every hour).
I’d be grateful for feedback and, eventually, a merge of this PR.

However, I think that this PR will not solve the memory issue you described in post #5 of this thread (by rainforest1155).

That’s great, thank you!

Thinking about the memory use again, I’m guessing it’s not really that bad and in the last day it actually went slightly down again.