Copy-compatible storage: can I back up to either and copy both ways?


#1

Suppose I create a storage A, and then add a second storage B with the -copy flag so that it is copy-compatible with A.

Do I now have to back up to A and copy to B, or can I instead back up to B and copy to A?

Probably not a common use case, but out of curiosity: can I sometimes back up to one, sometimes to the other, and occasionally run copies in either direction? (Perhaps I have a storage device at home and one at work, and want to back up a laptop to whichever is local.)


#2

Be careful: depending on how often you back up to each storage, the revisions in the two storages will not be “aligned”.

You may end up with two storages that have compatible configurations but whose backups (revisions) are not aligned, so you will not be able to easily copy between them.


#3

I’m not sure I follow. What do you mean by “aligned”?


#4

Let’s say you make the first backup to storage A. This would be revision 1 (R1) in storage A.

You continue to work with your files, make some modifications, and then make a second backup, but this time to storage B. This backup will also be numbered revision 1.

So you now have two “revisions 1”, one in each storage, but they are not equivalent: they represent two different states of the files.

You can no longer copy revision 1 between the storages.

Extend this example to dozens of revisions on each storage and everything gets messed up: you can no longer easily copy between them.

You would need very tight control over the backup schedule for the two storages to avoid conflicts between revision numbers.
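A toy model may make the numbering problem concrete. The sketch below is a simplified Python illustration, not Duplicacy's code: it assumes, as described above, that each storage keeps an independent revision counter per repository (snapshot) id, and it treats two revisions as equivalent only if they record the same file state. The `Storage` class and `copy_conflicts` helper are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Toy storage: each snapshot id gets its own revision counter.

    Simplified model for illustration only; not Duplicacy's implementation.
    """
    name: str
    # (snapshot_id, revision) -> description of the file state that was backed up
    snapshots: dict = field(default_factory=dict)

    def next_revision(self, snapshot_id: str) -> int:
        # Numbering depends only on what is already in *this* storage.
        revs = [r for (sid, r) in self.snapshots if sid == snapshot_id]
        return max(revs, default=0) + 1

    def backup(self, snapshot_id: str, state: str) -> int:
        rev = self.next_revision(snapshot_id)
        self.snapshots[(snapshot_id, rev)] = state
        return rev


def copy_conflicts(src: Storage, dst: Storage) -> list:
    """Return (id, revision) keys present on both storages with different contents."""
    return [key for key, state in src.snapshots.items()
            if key in dst.snapshots and dst.snapshots[key] != state]


a = Storage("A")
b = Storage("B")
a.backup("laptop", "files on Monday")   # revision 1 on A
b.backup("laptop", "files on Tuesday")  # also revision 1 on B
print(copy_conflicts(a, b))             # [('laptop', 1)]
```

Both storages hand out revision 1 for the same id, yet the two revisions describe different file states, which is exactly the clash that blocks a copy.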


#5

This applies only when you use the same repository id when backing up to two storages (i.e., the same repository id is used in the init and add commands).

If you want to copy between these two storages from time to time, you can use different repository ids, which should not cause any conflict.
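Under the same kind of simplified model (a hypothetical sketch, not Duplicacy's implementation), giving each storage its own repository id means the two storages never produce the same (id, revision) key, so copies in both directions simply merge. The ids `laptop-home` and `laptop-work` and the helpers `backup` and `copy` are made up for illustration.

```python
def backup(storage: dict, snapshot_id: str, state: str) -> int:
    """Toy backup: the next revision number is local to this storage and this id."""
    rev = max((r for (sid, r) in storage if sid == snapshot_id), default=0) + 1
    storage[(snapshot_id, rev)] = state
    return rev


def copy(src: dict, dst: dict) -> None:
    """Toy copy: add every missing revision; refuse if an (id, revision) key clashes."""
    for key, state in src.items():
        if key in dst and dst[key] != state:
            raise ValueError(f"revision clash for {key}")
        dst.setdefault(key, state)


home, work = {}, {}
backup(home, "laptop-home", "Monday")     # laptop-home revision 1
backup(work, "laptop-work", "Tuesday")    # laptop-work revision 1
backup(home, "laptop-home", "Wednesday")  # laptop-home revision 2

copy(home, work)  # no clash: the ids differ
copy(work, home)
print(sorted(home))                 # [('laptop-home', 1), ('laptop-home', 2), ('laptop-work', 1)]
print(sorted(home) == sorted(work)) # True
```

With a shared id, the same two `backup` calls would both produce key `("laptop", 1)` and `copy` would raise; with separate ids each storage ends up holding both histories.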


#6

I understood he wants to back up the same repository:


#7

Oh! I was assuming the revision number was a property of the repository, not the repository+storage combination. That is, I figured the revision number increments every time a backup is run on a particular repository, regardless of which storage is used. So, I might have revision #1 on one storage, then revision #2 on another storage, and after running a copy each way, they’d be in sync. Might this be more robust?


#8

Here’s the specific situation I’m in which started me down this path of questions:

I’ve created a repository on B2 and successfully run a large backup to it; there are a couple of dozen snapshots there already. Now I want to switch to running backups locally to my NAS device and then copying to B2.

I could delete everything on B2 and start over, but I’d rather not if I can avoid it. I can also run a single copy “backwards” from B2 to the NAS, which I would also prefer to avoid if possible.

So, based on our discussion, my understanding is this (am I correct?):

  1. I can create a new local storage on my NAS using the add --copy command.
  2. I can run a backup to the new NAS storage.
  3. This is where I’m a bit fuzzy. I think that, as long as my last backup to B2 was recent, most of the same chunks will now exist on both storages even though I haven’t run a copy yet, but each snapshot will exist on only one of them.
  4. If I already have, say, 20 snapshot revisions on B2, I’ll have to run 20 local backups back-to-back to “fast-forward” the revision number on the NAS. After that, all further backups on the NAS can be copied to B2 successfully. Is this correct?

#9

I wouldn’t go down this path; it gets quite messy…

As @gchen alluded to, you can avoid the problem of revision alignment by giving a unique repository id to each storage that you back up to directly.

When you copy from A to B or from B to A, two things happen: the referenced chunks are copied across (the main benefit, particularly for de-duplication and incremental backups), and the snapshots for each repository id are copied across.

Your single repository can have a different repository id for each storage. So rather than using the same repository id for storages A and B in the configuration, make them different.
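The two halves of a copy described above can be sketched with a toy content-addressed store. This is a hypothetical Python model, not Duplicacy's code: SHA-256 of each piece stands in for the real chunk hashing, snapshots are just lists of chunk ids keyed by (id, revision), and the sketch deliberately ignores the revision-clash problem discussed earlier in the thread. The names `Storage`, `chunk_id`, `copy` and `laptop-nas` are made up.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    # Stand-in for chunk hashing: identical content always yields the same id.
    return hashlib.sha256(data).hexdigest()


class Storage:
    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}                     # chunk id -> content
        self.snapshots: dict[tuple[str, int], list[str]] = {}  # (id, revision) -> chunk ids

    def backup(self, snapshot_id: str, pieces: list[bytes]) -> int:
        revision = max((r for (sid, r) in self.snapshots if sid == snapshot_id), default=0) + 1
        ids = []
        for data in pieces:
            cid = chunk_id(data)
            self.chunks.setdefault(cid, data)  # de-duplication: each chunk is stored once
            ids.append(cid)
        self.snapshots[(snapshot_id, revision)] = ids
        return revision


def copy(src: Storage, dst: Storage, snapshot_id: str) -> int:
    """Copy one id's snapshots plus the chunks they reference; return chunks transferred."""
    transferred = 0
    for (sid, rev), ids in src.snapshots.items():
        if sid != snapshot_id:
            continue
        for cid in ids:
            if cid not in dst.chunks:  # chunks already at the destination are skipped
                dst.chunks[cid] = src.chunks[cid]
                transferred += 1
        # Simplification: an existing (id, revision) on dst is kept, never overwritten.
        dst.snapshots.setdefault((sid, rev), ids)
    return transferred


nas, b2 = Storage(), Storage()
nas.backup("laptop-nas", [b"photos", b"docs"])
nas.backup("laptop-nas", [b"photos", b"docs", b"notes"])
first = copy(nas, b2, "laptop-nas")
second = copy(nas, b2, "laptop-nas")
print(first, second)  # 3 0
```

The second revision shares two chunks with the first, so the first copy transfers only three distinct chunks, and an immediate repeat transfers nothing.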


#10

I think what you want is:

  1. I can create a new local storage on my NAS using the add --copy command.
  2. Run a copy command from B2 to NAS.
  3. I can run a backup to the new NAS storage and copy from NAS to B2.
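The sequence above can be traced with a toy model of revision numbering. This is a hypothetical Python sketch, not Duplicacy's implementation; it assumes B2 already holds 20 revisions of a single repository id, here called `laptop` (a made-up name), and that the NAS reuses that id. Because the NAS receives B2's history first, its next backup continues at revision 21 instead of restarting at 1.

```python
def backup(storage: dict, snapshot_id: str, state: str) -> int:
    """Toy backup: the next revision number is local to this storage and this id."""
    rev = max((r for (sid, r) in storage if sid == snapshot_id), default=0) + 1
    storage[(snapshot_id, rev)] = state
    return rev


def copy(src: dict, dst: dict) -> None:
    """Toy copy: add missing revisions; raise if the same key holds different states."""
    for key, state in src.items():
        if key in dst and dst[key] != state:
            raise ValueError(f"revision clash for {key}")
        dst.setdefault(key, state)


b2, nas = {}, {}
for day in range(1, 21):
    backup(b2, "laptop", f"day {day}")  # existing history on B2: revisions 1..20

copy(b2, nas)                           # step 2: bring the history down to the NAS
rev = backup(nas, "laptop", "day 21")   # step 3: back up locally
print(rev)                              # 21: the NAS continues B2's numbering
copy(nas, b2)                           # step 3: push the new revision up to B2
print(len(b2))                          # 21
```

In this model, skipping step 2 would make the NAS start again at revision 1, and the copy back to B2 would raise on the clash with B2's own revision 1.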

#11

That’s an interesting idea. However, by definition, a backup tool must store all the information needed for a restore in the storage, not locally in the repository. To implement this, Duplicacy would need to store all revision information in each storage, even for revisions that are not in that storage.

This would avoid a lot of problems when using the copy command with multiple storages, which I think is a common use case, with many users keeping one storage locally and another in the cloud.

@gchen, is this easy to implement? Maybe in the new 2.2.0 version?


#12

That’s what I’m trying to avoid, because downloading data from B2 is comparatively slow and expensive when all of that data should already be local, since it is, after all, a backup!

As it happens, with only about 100 GB of data, I can do that today without it being a big deal. But I’m about to move to a place where I’ll only have a 200 kbps download speed, so I’d like to take this opportunity to learn the most efficient and scalable way to solve this sort of problem!