Reducing Backup Size

I have various backups from the past few years. One repository’s source data is 292GB on disk. I don’t really need as much history as I had for this repository, so I pruned some older revisions to reduce the amount of data on the storage. However, the storage is still reporting a much larger size: 420,401M.

 snap | rev |                          | files |    bytes | chunks |    bytes |  uniq |    bytes |   new |    bytes |
 music | 588 | @ 2024-01-03 03:09       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 | 61101 | 420,401M |
 music | 595 | @ 2024-01-10 03:00       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | 602 | @ 2024-01-17 03:00       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | 608 | @ 2024-01-23 03:01       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | 609 | @ 2024-01-24 03:00       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | 610 | @ 2024-01-25 03:00       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | 611 | @ 2024-01-26 03:00       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | 612 | @ 2024-01-27 03:00       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | 613 | @ 2024-01-28 03:00       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | 614 | @ 2024-01-29 03:00       | 10160 | 299,977M |  61101 | 420,401M |     0 |        0 |     0 |        0 |
 music | all |                          |       |          |  61101 | 420,401M | 59785 | 414,079M |       |          |

It looks to me like all revisions are identical in this situation, so why are the 2nd and 3rd bytes columns reporting so much higher?

Thanks

Did you add Erasure Coding to the storage?

One of the recommended configurations is 5:2 - 5 data and 2 parity shards - so 300,000M / 5 * (5 + 2) = 420,000M.

Your storage is slightly compressed, though, at ~414,000M for all snapshots.
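
Plugging the exact numbers from the table into that formula: 299,977M × (5 + 2) / 5 ≈ 419,968M, within about 0.1% of the reported 420,401M; the small remainder would presumably be per-chunk erasure-coding headers and padding (assuming a 5:2 setup here).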

Aha! That’s definitely it. I had forgotten about that.

Thanks!

Reading a little more about erasure coding, it seems like I have it enabled for no really good reason, since my storage is in Backblaze B2. Is it possible to remove the erasure coding from an existing storage?

You would need to re-encode all chunks.

I.e. create a copy-compatible storage and copy the snapshots there.
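
Roughly, something like this with the CLI, assuming your existing storage is named default; the new storage name and bucket below are placeholders, and the exact options are worth double-checking with duplicacy add -help and duplicacy copy -help:

    # add a second, copy-compatible storage; leaving out -erasure-coding should create it without EC
    # (add -e if you want the new storage encrypted as well)
    duplicacy add -copy default no-ec music b2://new-bucket
    # copy every snapshot from the existing storage into the new one
    duplicacy copy -from default -to no-ec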

In your case however I would just start a new backup to a new bucket/prefix, and then delete the old.

Music is immutable, no reason to waste resources copying the storage.

I.e. create a copy-compatible storage and copy the snapshots there.

Wouldn’t that just copy the encoded chunks with the erasure coding info or am I misunderstanding how copy works?

In your case however I would just start a new backup to a new bucket/prefix, and then delete the old.

Music is immutable, no reason to waste resources copying the storage.

This was just an example. I have a bunch of snapshots that do have a few years’ worth of history I’d rather not lose.

duplicacy copy definitely decrypts the chunks and re-encrypts them. For example, you can create a copy-compatible storage with a different encryption password; it would not have worked otherwise. I’d be very surprised if erasure coding were not treated the same way: it’s just another chunk storage parameter.

Edit: confirmation: Erasure Coding and Copy Command - #10 by gchen

To save money on B2 egress you can connect to B2 via Cloudflare; Duplicacy supports that use case by allowing you to provide an alternative download URL.

Ok, then it sounds like I can just do that to reduce the size of my overall storage.

For B2, I thought they changed it recently to make egress free up to a certain amount or something. I might be misremembering this though.

Actually, thinking more about it, I have a local copy of the backups that is identical to the B2 version. My current backup plan is to back up locally and then copy to B2.

I would build the new storage locally first and copy to it, and if I do that I think it’s probably simpler to just copy to a new bucket in B2 and then delete the old one. There should be no egress required in that situation.

Why not keep local storage erasure-coded? Unless you have a checksumming filesystem like zfs or btrfs.

I would create a new storage on B2, without EC, copy-compatible with your local storage. Then copy from the local storage to that new storage.
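
Once that copy finishes, re-running the tabular check against the new storage should show the chunks bytes column back down around the source size (~300,000M instead of 420,401M). Something like the following, though check duplicacy check -help for the exact flags:

    duplicacy check -tabular -storage new-b2

where new-b2 is whatever name you give the new B2 storage.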

Oh yes, you are right, they jacked up the storage price and now allow you to download 3x the stored amount for free.

Why not keep local storage erasure-coded? Unless you have a checksumming filesystem like zfs or btrfs.

I would create a new storage on B2, without EC, copy-compatible with your local storage. Then copy from the local storage to that new storage.

I was just thinking this as well. I thought both storages had to ditch erasure coding because I assumed copy would transfer the chunks WITH the encoded info, but now that I understand it doesn’t, I should be good to just re-copy the local storage offsite and remove the old bucket.