Prune retention policy for Amazon Glacier

delebru · 27 August 2018 03:28

Hi, I just started trialing Duplicacy and must say that the prune manual read a bit ambiguous to me too, until I found -keep n:m -> “Keep 1 snapshot every n days for snapshots older than m days.”

Can someone please tell me if I’m right with the following command and expected results?

duplicacy prune -keep 0:364 -keep 24:182 -keep 7:91 -keep 1:1

Keep 1 snapshot per day for snapshots older than 1 day.
Keep 1 snapshot every 7 days for snapshots older than 91 days.
Keep 1 snapshot every 24 days for snapshots older than 182 days.
Keep no snapshots older than 364 days.

And if snapshots are taken daily, could I remove “-keep 1:1”?
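One way to sanity-check the expected results and the “-keep 1:1” question is a small simulation. The sketch below is a simplified Python model of the quoted -keep n:m rule, written for this thread and not Duplicacy's actual prune code. It assumes that each snapshot falls under the rule with the largest m it exceeds, that snapshots are visited from oldest to newest, that a snapshot is kept only if it is at least n days newer than the last one kept under the same rule, and that n = 0 deletes everything that rule covers. The function name `prune` and the sample ages are hypothetical.

```python
from typing import List, Optional, Tuple

Rule = Tuple[int, int]  # (n, m): keep 1 snapshot every n days for snapshots older than m days


def prune(ages: List[float], policy: List[Rule]) -> List[float]:
    """Return the snapshot ages (in days) that survive `policy`, oldest first.

    Simplified model of `-keep n:m` (an assumption), not Duplicacy's implementation.
    """
    rules = sorted(policy, key=lambda r: r[1], reverse=True)  # largest m first
    kept: List[float] = []
    current: Optional[Rule] = None
    last_kept: Optional[float] = None  # age of the last snapshot kept under `current`
    for age in sorted(ages, reverse=True):  # walk from oldest to newest
        rule = next((r for r in rules if age > r[1]), None)
        if rule is None:
            kept.append(age)  # younger than every m: no rule applies
            continue
        if rule != current:
            current, last_kept = rule, None  # entering a new age band
        n = rule[0]
        if n == 0:
            continue  # n = 0 deletes every snapshot in this band
        if last_kept is None or last_kept - age >= n:
            kept.append(age)  # at least n days newer than the last kept one
            last_kept = age
    return kept


if __name__ == "__main__":
    full = [(0, 364), (24, 182), (7, 91), (1, 1)]
    daily = list(range(400))  # one snapshot per day, ages 0..399 days
    print(prune(daily, full) == prune(daily, full[:3]))  # effect of dropping 1:1
    six_hourly = [i / 4 for i in range(1600)]  # four snapshots per day
    print(len(prune(six_hourly, full)), len(prune(six_hourly, full[:3])))
```

In this model, with exactly one snapshot per day both policies keep the same snapshots, so “-keep 1:1” only makes a difference when there is more than one snapshot per day.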

My reasoning is that I’m planning to use Amazon Glacier, which charges for a minimum of 90 days of storage: “archives stored in Glacier have a minimum 90 days of storage, and archives deleted before 90 days incur a pro-rated charge equal to the storage charge for the remaining days”. So it wouldn’t make sense to delete snapshots before the 90-day period…
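The quoted pro-rated rule is easy to put into numbers. This is a minimal sketch, assuming a 90-day minimum, a flat per-GB-month price passed in by the caller (the $0.004 below is a placeholder, not a quoted AWS rate), and pro-rating over a 30-day month; AWS's exact billing granularity may differ. The function name `early_deletion_fee` is hypothetical.

```python
def early_deletion_fee(size_gb: float, days_stored: float,
                       price_per_gb_month: float,
                       minimum_days: int = 90,
                       days_per_month: float = 30.0) -> float:
    """Pro-rated charge for deleting data before Glacier's minimum duration.

    Assumption: the fee equals the storage charge for the remaining days,
    pro-rated over a 30-day month. Prices are caller-supplied placeholders.
    """
    remaining_days = max(0.0, minimum_days - days_stored)
    return size_gb * price_per_gb_month * remaining_days / days_per_month


if __name__ == "__main__":
    # 100 GB deleted after 30 days at a placeholder price of $0.004/GB-month
    print(early_deletion_fee(100, 30, 0.004))
    # Deleted after 91 days: past the minimum, so no extra charge
    print(early_deletion_fee(100, 91, 0.004))
```

Under these assumptions, anything pruned only after it is more than 90 days old incurs no early-deletion charge, which is the point of keeping every snapshot for the first 91 days.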

TheBestPessimist · 27 August 2018 04:55

Moved this post to a new topic. Hijackin’ an unrelated topic ain’t cool.

Yep, what you have there is correct.

I have a few suggestions:

You could use -keep 7:92 instead of 7:91. It’s just a larger window, but I think it’s safer this way.
You should use prune -all to prune all the repositories at the same time, and run prune from only one machine, not all of them. This helps with speed and with the download bandwidth used (explained in: Prune command details, section Only one device should run prune).
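Both the 7:91 vs 7:92 choice and the Glacier minimum come down to the earliest age at which pruning can delete anything. The hypothetical helper below (not part of Duplicacy) parses -keep values, checks that they are listed in decreasing m order as Duplicacy's prune documentation asks, and compares the earliest deletion age with 90 days plus any delay before a lifecycle rule moves objects to Glacier. It assumes one snapshot per day (so an n = 1 rule never deletes anything) and that Glacier's 90-day clock starts at the lifecycle transition rather than at upload; both are assumptions worth confirming. The names `parse_keep` and `check_policy` and the 2-day delay are illustrative.

```python
import re
from typing import List, Optional, Tuple


def parse_keep(values: List[str]) -> List[Tuple[int, int]]:
    """Turn -keep values such as "7:92" into (n, m) tuples."""
    rules = []
    for value in values:
        match = re.fullmatch(r"(\d+):(\d+)", value)
        if match is None:
            raise ValueError(f"bad -keep value: {value!r}")
        rules.append((int(match.group(1)), int(match.group(2))))
    return rules


def check_policy(values: List[str], transition_days: int = 0,
                 glacier_min_days: int = 90) -> Tuple[Optional[int], bool]:
    """Return (earliest deletion age in days, whether it clears the minimum).

    Assumes one snapshot per day, so an n = 1 rule never deletes anything,
    and that Glacier's minimum is counted from the lifecycle transition.
    """
    rules = parse_keep(values)
    ms = [m for _, m in rules]
    if ms != sorted(ms, reverse=True):
        raise ValueError("-keep options must be listed in decreasing m order")
    deleting = [m for n, m in rules if n != 1]
    if not deleting:
        return None, True  # nothing is ever deleted
    earliest = min(deleting)  # snapshots older than this may be removed
    return earliest, earliest >= glacier_min_days + transition_days


if __name__ == "__main__":
    # Hypothetical lifecycle rule that transitions objects 2 days after upload
    print(check_policy(["0:364", "24:182", "7:91", "1:1"], transition_days=2))
    print(check_policy(["0:364", "24:182", "7:92", "1:1"], transition_days=2))
```

With a transition delay of a couple of days, 7:91 would no longer clear the minimum in this model while 7:92 would, which is one concrete reason a larger window can be safer.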

delebru · 27 August 2018 05:06

Awesome, thanks!!! Didn’t mean to hijack someone else’s topic. I was just trying to avoid the noobish mistake of starting a new topic for an existing debate.

saspus · 28 August 2018 05:58

I will hijack this topic now.

I thought Glacier was very difficult to support due to thawing requirements. For example, to download even a config file one would need to defrost that bucket (or just the file?), which can take several hours. This makes Glacier unsuitable for backup purposes (only for archiving, and only if you design a clever strategy of moving only chunk files to Glacier while leaving snapshot and config files on warmer storage). That can be achieved more or less easily on Azure with automatic lifecycle management rules; I’m not sure whether the same is possible with Glacier.

Regardless, I thought Duplicacy, like every other backup solution I know of, does not and will not support Glacier. Is that not correct?

Second, Glacier ends up more expensive than B2 or Wasabi: egress costs will bankrupt you. It is suitable for archiving, not for backup. Is there any reason you are trying to use storage designed for archiving with a backup solution instead of the aforementioned cheaper alternatives?

delebru · 28 August 2018 06:26

Well, we already have lots of services with AWS, and it just makes it simple to keep our offsite backups with them too. Considering Glacier’s prices are quite low and we don’t have that much data, we wouldn’t save enough to justify adding another service provider.

To “upload to Glacier” you actually upload to an S3 bucket, and lifecycle rules transfer all or certain files to Glacier. So if you have any recommendations on how to customize the lifecycle rules for Duplicacy to work with Glacier, please let me know.
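One way to express “certain files” is a prefix filter in the lifecycle rule. This is a minimal boto3 sketch, assuming the Duplicacy storage keeps its usual `config`, `snapshots/` and `chunks/` layout under an optional prefix in the bucket, so that only chunk files move to Glacier while `config` and `snapshots/` stay in S3 Standard. The rule ID, prefix and transition delay are placeholders, and the helper names are hypothetical. Note that gchen's reply below explains that backups also need to download chunks, so this alone does not make Glacier a working Duplicacy backup target.

```python
import json


def chunks_to_glacier_rule(storage_prefix: str = "", days: int = 1) -> dict:
    """Build an S3 lifecycle configuration that moves only chunk files to Glacier.

    Assumes the Duplicacy storage layout <prefix>config, <prefix>snapshots/...
    and <prefix>chunks/...; only the chunks/ prefix is transitioned.
    """
    return {
        "Rules": [
            {
                "ID": "duplicacy-chunks-to-glacier",  # placeholder rule name
                "Filter": {"Prefix": f"{storage_prefix}chunks/"},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_rule(bucket: str, config: dict) -> None:
    """Upload the configuration (needs boto3 and AWS credentials).

    Note: this replaces any lifecycle configuration already on the bucket.
    """
    import boto3  # imported here so building the dict works without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config)


if __name__ == "__main__":
    print(json.dumps(chunks_to_glacier_rule("nextcloud/"), indent=2))
```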

In our case, we do a local backup to a storage server through SFTP, then the storage server syncs the backups to S3 (where lifecycle rules transfer all contents to Glacier). Now that we are almost certainly switching to Duplicacy for our Nextcloud server, I would like to have Duplicacy handle both storages instead of the backup server syncing to S3. Would that be possible?

gchen · 28 August 2018 19:22

@saspus is correct. For the backup command Duplicacy needs to download the config and snapshot files as well as chunks that compose the latest snapshot (if these chunks can’t be found in the local cache), and this is incompatible with the Glacier API. In addition, there doesn’t seem to be a simple lifecycle rule that can keep these files for instant access.

delebru · 28 August 2018 21:03

That makes sense, thanks! I’ll have Duplicacy do only the onsite backups, then have our backup server sync the backed-up files to Glacier.

If we ever need to retrieve the files from Glacier, I could add a storage on the Duplicacy client and point it to S3. Or is there anything that would stop that from working?

saspus · 29 August 2018 02:42

Yes, you can do that. Or you can add the S3 bucket as a bit-identical storage to your repository, copy the content to it with the copy command, and then use lifecycle rules to move it to Glacier. That assumes copy does not need to read the target storage, which I’m not 100% sure of.

Yes, plus thawing the data 12 hours in advance.

Yes: time, manual labor, and money.

To access data in the archive you need to thaw it first, which is a time cost right away. See “Data Retrievals”.

Because you don’t know which chunks are used by the revision you are restoring, you would likely end up needing to thaw the entire archive. This will cost quite a bit, and that cost comes on top of the egress cost when you actually download the data. API calls are charged in addition.
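The “thaw everything” cost is simple arithmetic once prices are known. This rough sketch takes every price from the caller (the example figures are placeholders, not AWS's actual rates) and assumes one restore request per chunk file, with the chunk count estimated from an average chunk size of 4 MiB, which is assumed here to be Duplicacy's default. The function name `full_thaw_cost` is hypothetical.

```python
import math


def full_thaw_cost(size_gib: float,
                   retrieval_per_gib: float,
                   egress_per_gib: float,
                   price_per_1000_requests: float,
                   avg_chunk_mib: float = 4.0) -> dict:
    """Estimate the cost of restoring an entire Glacier-backed storage.

    All prices are caller-supplied placeholders; one restore request is
    assumed per chunk file of roughly avg_chunk_mib MiB.
    """
    chunks = math.ceil(size_gib * 1024 / avg_chunk_mib)
    costs = {
        "retrieval": size_gib * retrieval_per_gib,  # thawing fee
        "egress": size_gib * egress_per_gib,        # downloading out of AWS
        "requests": chunks / 1000 * price_per_1000_requests,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    # 1000 GiB with placeholder prices; substitute your region's real rates
    print(full_thaw_cost(1000, retrieval_per_gib=0.01, egress_per_gib=0.09,
                         price_per_1000_requests=0.05))
```

Whatever the actual rates, retrieval and egress scale with the whole archive rather than with the revision being restored, which is the cost problem described above.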

Alternatively, you could unfreeze only the snapshot files, manually identify the chunks they use, and unfreeze those, if Glacier supports that level of granularity; but this takes even more effort, an understanding of the Duplicacy storage format, and easily avoidable labor.

In other words, you would effectively be taking on a manual, one-off implementation of Glacier support for Duplicacy, something even the developers decided was not worth the effort.

And then there is the question of what all that headache buys you: one less account to maintain at a different provider, and perhaps a few cents less per TB per month in storage costs. Those savings would be wiped out many times over by Glacier thawing, S3 egress, and API costs if you ever need to restore anything. Amazon Glacier really does not want you to ever touch that data.

Therefore, attempting to use Glacier for backup is more laborious, less reliable (nobody has attempted it, and an unknown number of weird issues await), and more expensive than the alternatives (some of which have zero egress cost, comparable monthly storage cost, and instant data availability).

Not trying to convince you either way, just wanted to point out the non-trivial hurdles associated with the unsupported configuration of using Glacier as a backup destination.

delebru · 29 August 2018 05:15

saspus, yes, I do agree that Glacier is far from ideal considering all the previous points, but in our case it is mostly a 3rd alternative in case the office gets hit by a UFO or something goes terribly wrong, in which case the time to thaw all the data will most likely not be a real issue.

Our Nextcloud data is redundant in 2 servers with automated ZFS syncs every 15 minutes. Backed up daily to a local storage, then that backup is archived on Glacier. If we didn’t have this level of local redundancy I would most likely use B2 for the backups.

saspus · 29 August 2018 05:30

but in our case is mostly a 3rd alternative

Ah, in this case it should be perfectly fine. After thawing, the data should be accessible via S3 normally, as far as I understand; I haven’t used Glacier enough to be sure. In any case you would be able to retrieve that data locally and then restore from that local storage. A Duplicacy storage does not care about the transport (e.g. you can back up to S3, then download to a local drive and restore from there).

Our Nextcloud data is redundant in 2 servers with automated ZFS syncs every 15 minutes. Backed up daily to a local storage, then that backup is archived on Glacier.

I would just feel safer with at least one offsite backup other than Glacier, so that a power surge does not take down the entire thing (unless your second Nextcloud server is offsite, in which case it’s awesome).

delebru · 29 August 2018 06:02

I would just feel safer with at least one offsite backup other than Glacier, so that a power surge does not take down the entire thing (unless your second Nextcloud server is offsite, in which case it’s awesome).

This is a great point, and the intention is actually to have the 2nd server at our ISP’s datacenter so backups don’t even have to go through the internet. That’s another reason we like AWS: we can store our data in Sydney, which is very close to New Zealand, so we don’t have to rely on a very crowded Pacific cable if we ever need to restore those backups.

I hope you don’t jinx our fairly stable power

towerbr · 29 August 2018 13:54

Just reinforcing what was said above, Glacier is not a backup solution, but rather an archiving solution …

https://www.google.com.br/search?q=backup+vs+archive