How to properly do an initial backup to B2?

This question is related.

  1. It seems my initial backup attempts restart from scratch every time there’s an internet disruption or PC shutdown. Are we supposed to do a completely uninterrupted initial backup? It appears Duplicacy is not picking up from where it left off.

  2. Before I noticed this issue, I had 16GB initially uploaded to B2. Now that Duplicacy may have restarted from scratch, I can see B2 holds about 40GB of data. Incidentally, Duplicacy was interrupted again, and it appears to be starting from scratch once more; I’m basing this assumption on the blue progress bar in the web GUI. What will happen to the “interrupted/incomplete” files already uploaded to B2? Will they eventually be pruned even though they are incomplete “stray” data, or will they stay there forever?

  3. Since I don’t have high-speed internet and Duplicacy seems to restart the backup from scratch when interrupted, I don’t think I will ever complete my initial backup to B2. Is there a way I can make it resume from where it left off? How do I properly upload to B2 over a slow connection?

You’ll notice that the bitrate on the first part of the backup is extremely high, probably higher than your upstream bandwidth. That’s because at this point Duplicacy is only querying the B2 API to see whether each chunk is already in the bucket; it’s not doing any actual data transfer. This lets Duplicacy skip the chunks that were previously uploaded, though it still takes time to call the B2 API and parse the response. Later in the backup, you’ll see the bitrate drop to either the upload limit you set or whatever it takes to saturate your upstream. Additionally, if you click the status bar, it’ll open a log in a new tab and you’ll see Duplicacy skipping chunks.
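
If you ever run the CLI directly (or add options to the backup job in the web GUI), the resume behaviour is the same: re-running the backup only uploads chunks that aren’t already in the bucket. A minimal sketch for a slow connection, where the path and the numbers are only placeholders:

cd /path/to/repository
duplicacy -log backup -stats -threads 2 -limit-rate 500

-limit-rate caps the upload speed (in KB/s) so the connection stays usable, and -stats prints summary statistics at the end of the backup.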

Do you mean to say Duplicacy is actually picking up from where it left off?

How come the status bar went from about 50% (before the internet disconnection) back down to a steady 20%? It gives me the impression it started from scratch.

:d: picks up from where it left off, and even in the cases where it doesn’t, it will still find the chunks already existing in B2 during the backup process and won’t re-upload them. This is the main use case for deduplication.

As for why the progress bar always starts from 0% even though the previous backup might have reached 90%: I guess this is just the simplest way to implement it.

IMO you shouldn’t worry. :d: will pick up the existing chunks and the initial backup will finish eventually.

If you want to finish it sooner, you may want to use the Filters/Include-exclude patterns to add only one folder and let that backup finish. Then add a second folder and start a new backup, and so on (see the sketch below). It’s true that this way the first few backups may take a little longer, but you’re sure that the initial backup finishes sooner and that some folders are already saved!
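
A minimal sketch of such a filter set, assuming the usual + (include) / - (exclude) pattern syntax where the first matching pattern wins; Documents is just a placeholder for the first folder you want to protect, and the exact wildcard rules are described in the Filters documentation:

+Documents/*
+Documents/
-*

Once that first backup completes, you’d add the next folder’s patterns above the final -* line and run the backup again; the chunks already uploaded for Documents are simply skipped.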

@TheBestPessimist

Thanks. So it’s safe to say those initial GBs of data in B2 that were interrupted at any point will remain useful, and that no data in B2 is “stray” or “useless”? One concern is that I might be paying B2 to store data that is no longer being used by :d:.

I think there will still be some unused chunks; the chunking algorithm is only so good.

After the initial backup you can run a prune for any chunks which are left unused with:

duplicacy -d -log prune -exclusive -exhaustive

and use whatever -keep options you need, or none.
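
For reference on the -keep syntax: each -keep n:m option means “for snapshots older than m days, keep one snapshot every n days”, and n = 0 deletes all snapshots older than m days. The retention values below are purely illustrative:

duplicacy -log prune -keep 0:180 -keep 7:30 -a -exhaustive

i.e. delete snapshots older than 180 days, thin snapshots older than 30 days down to one per week, apply this to all backup IDs (-a), and sweep the storage for unreferenced chunks (-exhaustive).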

I’m using the web GUI; I guess I only need to add -exclusive -exhaustive to the options of the scheduled prune job? Sorry, I’m a newb at this.

Basically, if I add them to the current options of my scheduled prune job, it should look like this?

-keep 0:180 -keep 30:90 -keep 7:30 -keep 1:7 -a -exclusive -exhaustive

Yeah, that looks good enough. Do note that you only need to run the prune with -exclusive -exhaustive once, after the initial backups are completed, just to clean up any unused chunks. Afterwards you should remove those two flags, as they make the prune much slower.

Not just slower, but risky: -exclusive shouldn’t be used while backups are running. Also, I think he can get away with just -exhaustive; it won’t delete the pruned chunks on the first run, but they’ll go eventually.
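
Putting it together with the -keep options from the earlier post (shown as CLI equivalents; in the web GUI the same options just go into the prune job’s options field):

duplicacy -log prune -keep 0:180 -keep 30:90 -keep 7:30 -keep 1:7 -a -exhaustive
duplicacy -log prune -keep 0:180 -keep 30:90 -keep 7:30 -keep 1:7 -a

The first command is the one-off cleanup after the initial backup completes; the second is the regular scheduled prune once the -exhaustive sweep is no longer needed.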

Thanks guys :sunglasses:
