I had used Crashplan since it launched, and recently switched to Duplicacy (after a months-long research and evaluation phase). Before Crashplan, I used other popular backup products I don't even remember anymore.
Crashplan and the others backed up newest-files-first. It wasn't configurable, but that's what I wanted anyway. (Crashplan's ordering is supposedly a little more complex than that, also taking file size into account, but it's clearly weighted more heavily toward modified date.)
Duplicacy seems to back up in alphabetical path order. (Or by node ID? Or simply in whatever order the filesystem returns entries to the APIs being called?)
It would be fantastic if the file backup priority were configurable! Even simple options would help, such as a single choice among:
- Date: (oldest|newest) first
- Size: (largest|smallest) first
- Alphabetical by path (asc|desc)
Why? Consider my use case and requirements, which aren't unique:
- My backup set will take a year or two to complete to B2, assuming I can get it to go faster. (Or longer if not.)
- The farther back in time you go in my dataset (mostly photos), the more redundantly the files are already backed up, both locally and in the cloud.
- While I eventually want them all backed up to one place, the older the files get, the less urgent backing them up becomes.
- Like many pros (with slight variations), I store my photos and videos in folders named YYYY/YYYYMMDD.
- My newer data is simply “more important” than older data, for various reasons (taxes, for example).
- Most importantly: I can’t risk waiting 12-24 months for my newest stuff to get backed up to the cloud.
So as it is now, I have to resort to various tricks with `mount --bind` and filters to approximate what I need. Which, of course, makes the following problem worse.
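For context, that workaround looks roughly like this (the paths are illustrative, and the patterns reflect my understanding of Duplicacy's filters file syntax, so treat it as a sketch rather than a recipe):

```
# Expose only the newest year to the repository, so it uploads first:
#   mount --bind /data/photos/2019 /backup-staging/photos/2019
#
# Then, in .duplicacy/filters, include that year and exclude the rest:
+photos/2019/
+photos/2019/*
-*
```

Once the newest year is safely uploaded, the bind mount and filters have to be redone for the next-newest year, and so on backwards.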
I realize that with the current chunking algorithm, sorting by newest first would likely hurt deduplication, resulting in unnecessarily more data to upload and store (and pay for).
But that problem, in turn, wouldn't exist if the issues noted, acknowledged, and objectively measured in the discussion “Inefficient Storage after Moving or Renaming Files? #334” were addressed, e.g. by the PR waiting to be merged to master, “Bundle or chunk #456”.
That single PR alone would likely save me thousands of dollars in cloud storage bills, over the long run.
Edits Sept 13 2019:
- Clarified reasons for needing newest-first priority.
- Referenced discussion #334 (inefficient deduplication with moved/renamed files).
- Referenced PR #456 (“bundle or chunk” algorithm to solve that problem).