How BorgBackup works and User Guide for BorgBackup


I decided to copy this reply over to its own thread so more people can benefit from it. It is currently a work in progress: I will add examples of how to use the program and which commands suit certain operations.


Apparently many people are interested in a good backup solution in general, so I will present the one I mainly use. Personally, I think it is better than all the backup solutions I have seen discussed in this whole forum, put together.

BorgBackup GitHub

BorgBackup Documentation

What is BorgBackup?

BorgBackup (short: Borg) is a deduplicating backup program. Optionally, it supports compression and authenticated encryption.

The main goal of Borg is to provide an efficient and secure way to back up data. The data deduplication technique used makes Borg suitable for daily backups, since only changes are stored. The authenticated encryption technique makes it suitable for backups to targets that are not fully trusted.

Main Features

Space efficient storage

Deduplication based on content-defined chunking is used to reduce the number of bytes stored: each file is split into a number of variable length chunks and only chunks that have never been seen before are added to the repository.
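
To get a feeling for why chunk-level deduplication saves space, here is a toy sketch using only standard tools. It is not how Borg is implemented: it assumes fixed 4 KiB chunks and GNU coreutils (`split`, `sha256sum`), whereas Borg cuts variable-length chunks at content-defined (buzhash-based) boundaries, so that inserting data near the start of a file does not shift every later chunk.

```shell
#!/bin/sh
# Toy sketch of chunk-level deduplication, NOT Borg's implementation.
# Assumptions: fixed 4 KiB chunks and GNU coreutils; Borg itself uses
# variable-length, content-defined chunks.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Two "files" that share their first 8 KiB
head -c 8192 /dev/zero > file1
{ head -c 8192 /dev/zero; printf 'new data'; } > file2

# Cut both files into 4 KiB chunks
mkdir chunks
split -b 4096 file1 chunks/file1.
split -b 4096 file2 chunks/file2.

# Chunks referenced by the two files vs. distinct chunks (by SHA-256)
# that a deduplicating repository would actually have to store
total=$(ls chunks | wc -l | tr -d ' ')
unique=$(sha256sum chunks/* | cut -d ' ' -f 1 | sort -u | wc -l | tr -d ' ')
echo "chunks referenced: $total, chunks stored: $unique"
```

It prints `chunks referenced: 5, chunks stored: 2`: the four identical zero-filled chunks are stored once, plus the one small chunk holding the new bytes.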

Speed
  • performance critical code (chunking, compression, encryption) is implemented in C/Cython
  • local caching of files/chunks index data
  • quick detection of unmodified files

Data encryption

All data can be protected using 256-bit AES encryption; data integrity and authenticity are verified using HMAC-SHA256. Data is encrypted client-side.

Compression

All data can be optionally compressed:

  • lz4 (super fast, low compression)
  • zstd (wide range from high speed and low compression to high compression and lower speed)
  • zlib (medium speed and compression)
  • lzma (low speed, high compression)
Off-site backups

Borg can store data on any remote host accessible over SSH. If Borg is installed on the remote host, big performance gains can be achieved compared to using a network filesystem (sshfs, nfs, …).
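Backups mountable as filesystems

Backup archives are mountable as userspace filesystems for easy interactive backup examination and restores (e.g. by using a regular file manager).
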
Free and Open Source Software
  • security and functionality can be audited independently
  • licensed under the BSD (3-clause) license, see License for the complete license

The catch with BorgBackup is that it takes some time to understand the concept and to use it appropriately, especially for people not familiar with advanced backup solutions. But once you get the gist of it, it is a pleasure to use, and I am sure most people would stick with it for most of their backup targets.

Here’s how I personally would explain how it works and how I use it

The program has two main functions: creating a repository, and creating archives within a repository.

Repository

The repository is basically a world that stores all your content as archives. The special thing about this world is that every single thing exists only once. Everyone can seem to have their own copy of anything, e.g. everyone can have an orange and use it in this world, but there is actually only a single orange in the whole world.

Archive

An archive is a certain state of the world, defined by the time at which that state was captured (basically a snapshot). Let’s say that yesterday your brother had an orange, but your sister and you had none. Today, your brother gives you his orange, so now you have one, and your sister and brother have none. If you create a snapshot, i.e. an archive, every evening, there will be two archives: one from yesterday, where your brother had the orange and your sister and you didn’t, and one from today, where you have the orange and your brother and sister have none.

These are two entire archives that are standalone (you can delete one of them and the other one remains intact), yet together they take up almost exactly the space of a single archive. All that changed in the world is that the orange changed its owner, so no new data had to be stored: the second archive reuses the data the first one already stored and only adds a little metadata.

Now if your sister gets an orange tomorrow, so that you and your sister have one each, tomorrow’s archive will only increase the size of the repository by a couple of bytes (assuming that recording who owns an orange takes a couple of bytes). NOTE: the orange itself is not duplicated. The only additional thing saved is that your sister has an orange now; an identical orange exists only once in the whole world, as explained above.

Now comes the even more interesting part. Let’s say there are major changes all over the world every day, but for now the only thing you care about is the orange situation at home. Your very first archive already contains the whole world (e.g. the root directory /). Further backups only make a snapshot of the orange situation (e.g. /home/*/oranges-directory). This directory is part of the root directory, so all of its data is already in the initial backup and doesn’t need to be stored again. The only things stored with the newest archive are the changes in the oranges-directory; changes in all other places are simply ignored.

Real world example

I had a repository containing an initial archive of my root directory /. Yesterday, I created an additional archive of my Downloads folder (/home/user/Downloads), because I downloaded some .deb files. Today, I updated my Debian archive mirror, so I only backed up the /var/debian folder. Tomorrow, I will back up the whole root directory / once again.

How much space will all this use? I have two separate backups of the whole root directory / from two separate days, and yet the space used is pretty much what the root directory / needs, plus the couple of .deb files I downloaded. Nothing else. My Debian archive mirror only updated the packages and didn’t add any new ones, and my system overall didn’t change much, except that I have a couple more .deb files in my Downloads folder. So you can have hundreds of different archives, each saving the state of a location at the time the snapshot was taken, and the repository will barely grow (each archive only adds a little metadata) unless you actually add new data. Backing up everything you need therefore takes very little extra space, and on top of that you can optionally compress everything, so the space needed is EVEN SMALLER.

Real world example from my Raspberry Pi system:

The root directory / of my Raspberry Pi 3B takes about 12-14 GB on my SD card. The initial Borg archive of the whole SD card takes up about 4 GB after low compression (you can compress the data even more if you have a more powerful machine).
Do you have several Raspberry Pis but don’t want to use ~4 GB for each of them? No problem: just make archives of all the Pis in the same repository. If their data is more or less the same, the repository will be maybe ~4.5-5 GB in size, despite backing up 4 Raspberry Pis (a real-world example from my own setup).


I hope I explained the system well enough, since I had to try out BorgBackup several times to finally get the gist of how best to use it.

P.S.: You can also safely encrypt all your backup data. I personally don’t need that option, but it definitely pumps up the value of this backup solution by a whole lot, as well.


Borg 1.1.9

Installation

Debian

Install Dependencies

sudo apt install python3 python3-dev python3-pip python-virtualenv libssl-dev openssl libacl1-dev libacl1 build-essential libfuse-dev fuse pkg-config

Install BorgBackup through Python pip

Note that this step can take some time, especially on very old machines or SBCs.

sudo pip3 install 'borgbackup[fuse]'    # quotes keep the shell from treating [fuse] as a glob

Ubuntu

Install Dependencies

sudo apt install python3 python3-dev python3-pip python-virtualenv libssl-dev openssl libacl1-dev libacl1 build-essential libfuse-dev fuse pkg-config
sudo usermod -aG fuse "$USER"    # adds your user to the fuse group; log out and back in for it to take effect

Install BorgBackup through Python pip

Note that this step can take some time, especially on very old machines or SBCs.

sudo pip3 install 'borgbackup[fuse]'    # quotes keep the shell from treating [fuse] as a glob

User Guide

Create backup of your development files to another hard drive

  • ~/src is the folder with your development files
  • /dev/sda (mounted at /) is where your system is installed, and /dev/sdb (mounted at /mnt/usb-hdd0) is an attached external hard drive
mkdir -p /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME     # creates the directory for the repository
cd /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME           # changes to this directory
borg --verbose init --encryption=none /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME   # initializes a borg repository there
cat README                                              # verifies that the repository was created
borg --verbose --progress create --stats --comment "Initial backup of my dev files." /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME::firstbackup /home/$USER/src
# creates the initial archive for your dev files in the repository created above

Now backing up your dev files the next day, again…

borg --verbose --progress create --stats --comment "Backup of my dev files after several commits to my main project." /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME::secondbackup /home/$USER/src
# creates the second archive for your dev files in the same repository

Wait, when did I back up my dev files the last time and how much space do they take, exactly?

borg --verbose info --last 1 /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME/
# shows all information about the newest backup in this repository
# e.g. time/date of creation, comment, size, etc.

Wait, what files exactly did I back up in my last backup?

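LAST=$(borg list --short --last 1 /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME)   # name of the newest archive
borg --verbose list --short /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME::$LAST
# prints all file and directory names contained in the newest archive;
# "borg list" on the repository alone only prints archive names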

Create a complete backup of your entire OS

In case you haven’t created a repository as in the example above yet:

# creates the directory for the repository
mkdir -p /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME
# changes to this directory
cd /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME
# initializes a borg repository in it
borg --verbose init --encryption=none /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME
# verifies that the repository was created
cat README

Now the actual backup of the whole system:

sudo borg --progress --verbose create --comment "Use as root. My first system backup." -e /dev -e /run -e /tmp -e /sys -e /proc -e /mnt -s /mnt/usb-hdd0/backups/borg/repos/$HOSTNAME::firstsystembackup /
# This creates a backup of the whole system, excluding locations
# that are regenerated at every boot, as well as the /mnt location,
# because we don't want to back up our backups, especially when the
# backup drive contains several hundred GB of data.

Note that it is important not to mix up owners when backing up and later restoring backups. If you are unsure whether you own everything in a location, you should run the backup as root. For a whole-system backup you have no choice but to run it as root.

To be continued…

Thank you for moving this to a more visible location. I still want to give it a go one of these days, and I think this will make it easier to find than trying to remember to dig it out of the Clonezilla thread :slight_smile:


I am also adding examples for use cases I find common and/or useful. The documentation is extensive, but it’s not easy to puzzle everything together quickly, so I summarize the relevant parts here, which should make Borg even easier to use.


I’ve been using rsync to backup via a simple script I wrote. The compression, de-duplication and encryption are options I wish I had when I set this up. It would save a TON of space! However, if it creates one massive file it would not play nice with CrashPlan, as CrashPlan would be constantly trying to copy that whole file offsite. You see, I have a NAS which holds all my live data. I use rsync to create a copy of that NAS drive nightly to a large internal HDD backup drive. CrashPlan backs up the files on that large HDD offsite. If the compression/encryption worked on an individual file basis, then applied the de-dup before it writes that file to the backup drive, CrashPlan would be happy to copy only new/changed files offsite. Is this possible with BORG?


@IrwinElectronics
Thank you for your question.

Data is divided into chunks, and the deduplication is based on those chunks. (You can even tune the chunk size yourself via --chunker-params.) Whenever new data contains a chunk that is already in the repository, that chunk is not written again. There will never be just one huge backup file with Borg: the chunks are packed into many segment files of limited size inside the repository’s data directory.

Another plus is that Borg keeps local caches (chunk and file indexes) to speed up backups. Borg does not have to be installed on the server side to store data there, but it is highly recommended: with Borg running on the server, repository operations happen on the server itself instead of over a network filesystem, which speeds up backups immensely. Depending on the type of data and the amount of change since the last backup, backing up huge amounts of data can take a minute or less if little has changed.

Alternatively, you could back up to your primary and secondary drive during each initial backup instead of chaining them. However, I assume you want to keep the nightly backups of your current setup, so you could also use Borg for those nightly backups of your initial backups, which would also save a lot of time in the long run.

I don’t know CrashPlan so I don’t know how it works, but if it, similarly to rsync, copies only modified files (I would go for the safe route and check against checksums, not metadata) then it should be able to copy only what’s necessary properly.

As explained, everything is first divided into individual chunks, and further operations like compression and encryption are then applied to these chunks. Size-wise, CrashPlan would see many segment files of limited size rather than one huge file. New data goes into new segment files; existing ones are only replaced when Borg compacts the repository, e.g. after archives have been deleted.


I found the following, which provides information on backup mirroring:
https://borgbackup.readthedocs.io/en/stable/faq.html?highlight=chunk#can-i-copy-or-synchronize-my-repo-to-another-location

In terms of chunk size, see the --chunker-params option here:
https://borgbackup.readthedocs.io/en/stable/usage/create.html?highlight=chunk

I installed BORG, created the repository and the script, and was ready to create my first backup when I came to another realization. To perform a restore, all “chunks” would have to be present. That means if I needed to restore a single file from CrashPlan, for example /path/some_program.conf, I would have to download ALL chunks to extract that SINGLE file. Or I would have to change the way CrashPlan backs up. But I believe the way CrashPlan is installed, it only backs up local data; it needs to be installed at a user level to back up a NAS device. That’s why I have rsync back up the NAS to a local drive, which also gives me a local backup.

You’ve got a great idea and some great features, but the chunks wouldn’t meet my needs. I need /path/some_app.conf to be encrypted and written only once to /backup_path. From there, CrashPlan would grab it, encrypt it again, and send it offsite for storage. A GUI would be nice too, but it is not a requirement. My rsync script makes a local archive of all files. I save them to a VeraCrypt virtual drive, so they’re still encrypted on the local system. However, they are neither compressed nor de-duplicated.

That’s not to say I can’t take your idea and run with it. Since I run everything from a script, that script can also handle compression and de-dup. Tailoring a solution to my setup and exact requirements may be better than adapting my needs to other software. But I’ll keep BORG and the repository in case I find a perfect use for it.

You might need to elaborate on your situation. If I needed to restore a file from the repository, I would run borg extract for just that file, accessing the repository remotely. You do not need to download the whole repository for that: only the chunks that make up that file (plus some archive metadata) are transferred.

That said, you can also mount individual archives, as well as whole repositories, which then act like a mounted device. From there you can use all the commands you would use on your normal folders. For example, if you want to scp a file from ~ to a remote location, you can do exactly the same from your mounted archive or repository: borg mount it, cd to the target location, scp the single file. Done. No downsides to that.
