Any thoughts on the optimum number of threads and chunk size for Backblaze B2?
I normally get 1-2MB/s from single thread uploading, and since each thread will get a different B2 upload server, I would suggest using a number of threads that can max out your upload bandwidth.
The default 4 MB chunk size should work. You can also try reducing that to, say, 1 MB, in order to get better deduplication, then use more threads to overcome the increased overhead associated with smaller chunks.
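As a rough sketch of that rule of thumb: the thread count needed to max out a link is roughly the upload bandwidth divided by the per-thread rate, rounded up. The function name and the 100 Mbit/s uplink are made up for illustration, and the model assumes throughput scales linearly with threads, which real B2 uploads may not do.

```python
import math


def threads_to_saturate(link_mbit_per_s: float, per_thread_mbyte_per_s: float) -> int:
    """Estimate the threads needed to fill a link, assuming linear scaling."""
    if per_thread_mbyte_per_s <= 0:
        raise ValueError("per-thread rate must be positive")
    link_mbyte_per_s = link_mbit_per_s / 8  # megabits -> megabytes
    return max(1, math.ceil(link_mbyte_per_s / per_thread_mbyte_per_s))


# Example 100 Mbit/s uplink at the 1-2 MB/s per thread quoted above.
print(threads_to_saturate(100, 2))  # 7
print(threads_to_saturate(100, 1))  # 13
```

Treat the result as a starting point for trials rather than a setting to copy blindly.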
I was struggling with dismal backup speeds from Northern Europe to B2. The server has 100 Mbps fibre and I can easily get 2-3 MB/s from it over SSH to Asia, but I got only 20 KB/s to B2. Then I found the -threads option, tried it with 5, and suddenly I am getting 3-5 MB/s to B2. I don't know why the one-thread backup was so slow - it could have been due to some ISP throttling or something similar. Quite mysterious.
Anyway, @gchen, how about having e.g. 4 threads as the default for B2? I think many users would appreciate a faster default setting.
I'm not him, but I disagree. Default behavior should be consistent across backends.
Have you found the root cause of the issue? 20*5 does not add up to 5000
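To spell out the arithmetic behind that remark, here is a small sketch using only the figures reported above (20 KB/s with one thread, 3-5 MB/s with five). It assumes 1 MB = 1000 KB for simplicity.

```python
single_thread_kb_s = 20   # reported rate with one thread
threads = 5               # value passed to -threads
observed_mb_s = (3, 5)    # reported range with five threads

# If throughput scaled linearly with the thread count:
expected_kb_s = single_thread_kb_s * threads

# Actual speedup over the single-thread rate (assuming 1 MB = 1000 KB):
speedup = [mb * 1000 / single_thread_kb_s for mb in observed_mb_s]

print(expected_kb_s)  # 100
print(speedup)        # [150.0, 250.0]
```

A 150-250x gain from a 5x increase in threads suggests that the single-thread run was being held back by something other than a fixed per-connection limit.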
Hi,
No, I was unable to find out the root cause (and happy with increased performance of the multiple threads).
Your argument rests on consistency of behavior in the technical sense; my argument is based on the optimal default user experience. Since, from a black-box perspective, threads will just increase performance, does it really make sense to drag all the platforms down to the lowest common denominator, or should each storage type get good defaults depending on its capabilities? I think this is a question where the devs need to think about how to get the balance right.
I have a problem with this, in the sense that if you are running this (backup) on a NAS, where you have limited memory but more time to waste, you may want to use only 1 thread.
There is also the factor of latency: those in Europe/Australia will have much more latency and will therefore need more upload threads than those in the US.
And you could also see it this way: if you have a slow DSL connection (again, think US), increasing the number of threads will only waste resources, since you're also killing the bandwidth for every other device in your household.
All in all, I think that doing a few trials when you start your backups, to test how many threads work best for your upload speed and other network conditions, is not a very difficult task.
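One way to read the results of such trials, sketched under assumptions: record the throughput for each thread count you try, then take the smallest count that gets close to the best rate, so memory and bandwidth are not spent on threads that add little. The function name, the 10% tolerance, and the numbers below are placeholders for illustration, not real measurements.

```python
def pick_thread_count(results: dict, tolerance: float = 0.10) -> int:
    """Return the smallest thread count whose throughput is within
    `tolerance` of the best measured throughput."""
    if not results:
        raise ValueError("no measurements")
    best = max(results.values())
    good_enough = [n for n, rate in results.items() if rate >= best * (1 - tolerance)]
    return min(good_enough)


# Placeholder values (threads -> MB/s), for illustration only.
trial = {1: 0.5, 2: 1.0, 4: 2.0, 8: 3.8, 16: 4.0}
print(pick_thread_count(trial))  # 8
```

With these placeholder values, 16 threads are barely faster than 8, so 8 is the cheaper choice.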
Thanks, these are fair points.
The performance of individual B2 upload servers isn't reliable. If you're unlucky you can get an overloaded server that causes the upload to be very slow. Fortunately, B2 always forces you to change upload server(s) frequently by sending back HTTP errors (or even closing the connections), so you'll never get stuck on a slow server for long.
With regard to the default number of threads for B2, you're not the first one to request this change, but as @saspus and @TheBestPessimist pointed out, there are downsides to doing this, so I would rather not change it now.
Perhaps this "optimal default user experience" could be something for the GUI version? Though perhaps not entirely hidden but as a setting that is on by default?
Also, I guess it would be helpful to provide recommended thread settings for various backends (where applicable) here on the forum. If anyone has something to contribute regarding this, please start a topic in #how-to!
As a wild suggestion - for the CLI version, could not each storage's thread count (and, come to think of it, limit-rate) be established with the set command, so it doesn't have to be specified every time?
This seems like a great idea at first. I even wanted to suggest doing that for any command line modifier - to avoid specifying the same command line parameter over and over; in a way, creating your own default behaviors.
But then thinking about it a bit more it starts to seem less and less appealing. We already have a way to accomplish that via shell scripts. It would make sense to only implement the basic minimum required set of features and let the users go wild in the shell or GUI, the Unix way.
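A minimal sketch of that scripting approach, assuming the user keeps their own per-storage defaults outside Duplicacy. The storage names "b2" and "nas" and the values are hypothetical; -storage, -threads and -limit-rate are existing options of the backup command.

```python
import shlex

# Hypothetical per-storage defaults maintained by the user, not by Duplicacy.
STORAGE_DEFAULTS = {
    "b2": {"threads": 4, "limit_rate": None},
    "nas": {"threads": 1, "limit_rate": 2000},
}


def backup_command(storage: str) -> list:
    """Build the argument list for a backup to the given storage."""
    opts = STORAGE_DEFAULTS.get(storage, {})
    cmd = ["duplicacy", "backup", "-storage", storage]
    if opts.get("threads"):
        cmd += ["-threads", str(opts["threads"])]
    if opts.get("limit_rate"):
        cmd += ["-limit-rate", str(opts["limit_rate"])]
    return cmd


print(shlex.join(backup_command("b2")))
# duplicacy backup -storage b2 -threads 4
```

The list can be handed to subprocess.run from a cron job or scheduled task, which keeps the per-storage choices in one place without changing the CLI.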
saspus, your argument can be used to reject almost any feature, since you can always point a user to Python libraries she can use to "script" around the problem. But in reality that scripting is not very practical, so following your guideline does not make it the right decision as such. If stopping feature creep were the primary goal of duplicacy development, why is there even a set command if we can "let the users go wild in the shell"? Feature decisions are always about the balance of pros and cons.
@Droolio's proposal is very sensible to me as an end user looking for a practical solution rather than dogmatic implementation guidelines.
When you put it this way, yes, that proposal does make some sense if we think of it not as "another way to specify command line arguments to save time typing" but instead as "treating the number of threads as part of the storage backend configuration". Whether it is the right thing to do I'm still not sure. You might want to alter the number of threads depending on your connection (LAN vs LTE), and therefore it should not be part of the storage configuration; it should be a function of the environment.
The current implementation of it as a command line argument is therefore the most appropriate.
However, I still disagree that we should add redundant features just because they seem to simplify one particular use case: this is what front ends are for. We have the GUI for that. The command line utility is a backend, and its interface should be clean, logical, and unambiguous, without 20 different ways to accomplish the same things. Those add unnecessary complexity, ambiguity, and bloat, and increase the possibility of misconfiguring things.
Well, the CLI is the UI for many servers that do not even have an X server installed. There are not "20 different ways to accomplish the same things", but two: command line options or settings in the preferences file.
The implementation and code base management aspect is for gchen to assess. For an end-user, adding support for threads settings in preferences file would be a nice addition. Or at least has two votes so far.
It's one too many.
Well, the CLI is the UI for many servers that do not even have an X server installed
You know what I meant. GUI is an example of a frontend. Frontend does not have to be graphical.
Or at least has two votes so far.
Those hearts are not votes for the feature. They mean that the reader thinks the comment contributed to the discussion. One of these two "votes" is mine, by the way. If you tap on a number you can see who voted.
I tried to tell you politely that there are two different opinions here. Let's agree to disagree.
I don't think you meant that gchen should remove the preferences file support completely from duplicacy, did you?
To clarify, the set command doesn't save these options on the storage backend - they're for each repository id on the client side. A single repository can have multiple storages so you might want to have a way to remember each environment internally, just as -no-backup is remembered on our fallible behalf.
With you on the Unix way, I don't think it precludes the possibility of allowing defaults to be overridden with a command line switch, especially in exceptional situations as above. And if you continually switch connections, you might simply choose not to "set" anything at all.
Since the set command exists already for similar purposes, feature creep is minimal, and my proposal was strictly for thread counts and bandwidth limits.
But one problem with the idea I can think of already is deciding which thread count and bandwidth limit to use for the copy command. We have source and destination threads, and upstream and downstream values for each. The lesser of the two applicable values? What if one isn't set? Could it make a more intelligent choice? Might seem straightforward, yet not necessarily so. Would Duplicacy benefit from having a different thread count for source and destination? If not, why does copy already have both -download-limit-rate and -upload-limit-rate options?
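To make the open question concrete, here is one possible policy, sketched with hypothetical function names; it is not how Duplicacy behaves today. A command line switch overrides a stored value, a stored value overrides the built-in default, and for copy the lesser of whichever per-storage values are set is used.

```python
from typing import Optional


def resolve(cli_value: Optional[int], stored_value: Optional[int], default: int) -> int:
    """Precedence: command line switch, then stored setting, then default."""
    if cli_value is not None:
        return cli_value
    if stored_value is not None:
        return stored_value
    return default


def copy_threads(source: Optional[int], destination: Optional[int], default: int = 1) -> int:
    """Use the lesser of the values that are set; fall back to the default."""
    configured = [v for v in (source, destination) if v is not None]
    return min(configured) if configured else default


print(copy_threads(4, 8))        # 4
print(copy_threads(None, 8))     # 8
print(copy_threads(None, None))  # 1
print(resolve(2, 4, 1))          # 2
```

Taking the minimum is the conservative choice; whether an unset side should instead pull the result down to the default is exactly the kind of ambiguity raised above.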
BTW I'm not overly attached to this idea... I just threw it out there as a suitable enough improvement - a compromise to having hard-coded defaults for storage backends. I definitely don't suggest the init command pre-populate these settings based on recommended values. That's definitely up to the user to define.