Feature request: Store metadata chunks, or any chunks containing metadata, in a different folder

I’m working with Amazon S3 and having a lot of difficulty getting it to work with Deep Archive. It was working fine until I missed a few days of backups; now, to restart, it’s trying to check metadata chunks, but they need to be restored first.

It would be amazing if we could store those chunks in another folder, say “/metadata” instead of “/chunks”, so we could set lifecycle rules that move only data chunks to Deep Archive and leave metadata alone, avoiding this problem. Deep Archive is nearly a quarter of the price of the next cheapest option… this could improve the usability of Duplicacy with AWS in a HUGE way and reduce cost for a lot of users.
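
For what it’s worth, the lifecycle side of this would be simple once the prefixes are split. A rough sketch with boto3, where the bucket name and the one-day transition age are just placeholders:

```python
import boto3

# Rough sketch only: assumes the proposed layout (data chunks under "chunks/",
# metadata under "metadata/"); bucket name and the 1-day age are placeholders.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-duplicacy-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "data-chunks-to-deep-archive",
                "Filter": {"Prefix": "chunks/"},  # never matches metadata/
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 1, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```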

I’m even willing to offer a bounty for development to make it happen. I’d write the code myself, but I’m just not experienced enough. Right now I’m stuck restoring a huge number of chunks just to sort out which ones are metadata so I can restart my backups. This feature would remove all that trouble.

Can’t argue with that; I’ve already suggested the same. 🙂 AFAICS there wouldn’t be any downside, and it would even benefit local pooling software, where such a /metadata directory could be duplicated across drives for extra redundancy. Can’t imagine it’d be terribly difficult to implement either - the main issues would be maintaining backwards compatibility and migrating existing storages.

I’ll address this feature after merging the memory optimization PR. My plan is to pass the isMetadata flag (which is already present in that PR) down to the storage backend, so each backend can decide what to do when a chunk is uploaded with this flag set.

On the other hand, I don’t want every backend to move metadata chunks to this special /metadata folder. Maybe the best way is to add a new S3 backend to be used with AWS Deep Archive.
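
Roughly, the idea is just a prefix decision at upload time. A sketch in Python-style pseudocode (Duplicacy itself is written in Go, and these names are illustrative, not the actual storage interface):

```python
def chunk_path(chunk_id: str, is_metadata: bool) -> str:
    """Route metadata chunks to their own top-level folder so lifecycle
    rules scoped to chunks/ never touch them. Names are hypothetical."""
    prefix = "metadata" if is_metadata else "chunks"
    # Keep the usual two-character fan-out subdirectory either way.
    return f"{prefix}/{chunk_id[:2]}/{chunk_id[2:]}"


print(chunk_path("ab12cd34ef", is_metadata=True))   # metadata/ab/12cd34ef
print(chunk_path("ab12cd34ef", is_metadata=False))  # chunks/ab/12cd34ef
```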

[quote="gchen, post:3, topic:6145"]
Maybe the best way is to add a new S3 backend to be used with AWS Deep Archive.
[/quote]

That would be preferable given that the Wasabi backend is a pass-through to most of S3Storage.

All of this would be AMAZING! I’m happy to test/help/contribute in any way I can… this would be great both for S3 Deep Archive use and for tiered MinIO deployments on bare metal, distributing over RAID sets, ZFS, shingled drives, etc…

I’ll stay tuned here, but please let me know if there is anything I can do to help!

It would be nice if we could somehow force this feature to be enabled for local storages too (SFTP and network shares as well) - I’m looking forward to being able to duplicate metadata on my DrivePool.

I’m dying here… I don’t know why this ran perfectly for weeks with everything transitioning to Deep Archive after being one day old, but now all of a sudden my backups are failing left and right, and checks are failing too, still trying to download these intermixed chunks.

I don’t want to bail, but if I can’t count on the backups, that’s a problem. I mean, Duplicati is a mess because of its local DB, and if you lose that you’re screwed… but it does work perfectly with Deep Archive, which saves me 75% on my S3 bill each month.

I even wrote a Python utility to bulk-restore objects just to get things running again… which worked fine, but now that the restored copies are expiring, everything is failing again.
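
For anyone else stuck the same way, the gist of such a utility is just paginating the chunks/ prefix and issuing restore requests. A rough sketch with boto3 (the bucket name, restore window, and tier below are placeholders, not necessarily what my script uses):

```python
import boto3

# Rough sketch: ask S3 for a temporary restore of every archived object under
# the chunks/ prefix. Bucket, restore window, and tier are placeholders.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket="my-duplicacy-bucket", Prefix="chunks/"):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass") != "DEEP_ARCHIVE":
            continue  # already in a retrievable storage class
        s3.restore_object(
            Bucket="my-duplicacy-bucket",
            Key=obj["Key"],
            RestoreRequest={
                "Days": 7,  # how long the restored copy stays available
                "GlacierJobParameters": {"Tier": "Bulk"},  # cheapest tier, ~48h
            },
        )
```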

I guess this may have something to do with the cache. By default, the metadata chunks needed by the last backup are always kept in the local cache, so if you only run backups in sequence, Duplicacy doesn’t need to download any chunks from the storage. But when you try to run other commands, things will break.

So is the best approach just to run all of my backups on each machine one right after the other, and then only do checks right after all the backups have run?

I would not run check with -chunks at all: the S3 API already guarantees data integrity; if a file was uploaded successfully, it’s correct in the cloud.

Check without -chunks may still be worth running, to ensure that all the necessary chunks are present (once the placement of metadata in a different folder is implemented), mostly to protect against duplicacy prune shenanigans. If you don’t prune, then even that is of little value.

I’m looking through the options in the web GUI and I don’t see a way to run a check without -chunks. It’s mostly to ensure all the uploads worked and to get the current size of the storage, right? Is there an option I can add that I’m missing?

I’m reading about verified_chunks in this post: Check command details. The thing is, I can’t seem to find it anywhere inside the Docker container.

I think I may have just figured it out… I have my Duplicacy cache folder in Docker mapped to /tmp on the host, so after a reboot everything got cleared out, and that’s why it’s stuck checking again…

Sound plausible??

That is definitely the reason it failed. Also, when you run the check you should only check the last backup; otherwise it will need to download chunks from the storage. Unfortunately there isn’t a convenient way to check only the last revision, so you’ll have to parse the output of duplicacy list and find the revision number of the last backup.
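
Something along these lines would do it; a rough sketch that assumes the usual "Snapshot <id> revision <N> created at ..." output of duplicacy list and the -r option of check:

```python
import re
import subprocess

# Rough sketch of the workaround: pull revision numbers out of `duplicacy list`
# and check only the newest one. Assumes the usual
# "Snapshot <id> revision <N> created at ..." output and check's -r option.
output = subprocess.run(
    ["duplicacy", "list"], capture_output=True, text=True, check=True
).stdout

revisions = [int(n) for n in re.findall(r"revision (\d+)", output)]
if revisions:
    subprocess.run(["duplicacy", "check", "-r", str(max(revisions))], check=True)
```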

I saw that the above-mentioned memory optimization PR appears to be live. @gchen, is this still on your radar, whether as a custom S3 storage backend or as a change to the metadata behavior?

Thanks!

I plan to implement this feature after the new web GUI release is done.
