Backing up Gitea to Backblaze
Gitea is a self-hosted Git server. Backblaze is a cloud storage provider.
This page has brief instructions for backing up Git repositories hosted in Gitea to Backblaze B2 cloud storage.
Assume we have a Gitea server running in a Docker container and we want to back up the contents of the server.
There are multiple ways of backing up a Docker-hosted Gitea, including:
- back up the Docker volume from the host, outside of the container
- run the gitea dump command inside the container, then copy the data out of the container
- back up individual repositories from outside the container
Here I focus on backing up individual repositories, because backing up the whole Docker volume or using gitea dump to back up the whole Git server (1) makes it difficult to restore one repository and (2) backs up all the git-lfs data, making each backup file very large. If I want to restore a small repository from Backblaze, I don't want to have to download a massive file containing all my Git data to do it.
Backing up a git repository to file
The gitea dump command does not support specifying which repository to back up; it backs up everything. So we clone the repository to a directory outside the Docker container and then back up that directory.
This kind of backup does not save the Gitea database, which contains data not stored by Git, such as issues.
We do these steps:
- clone the git repository
- then fetch its lfs data
- then tar everything up to one file
git clone --mirror [repo-name] [temp-dir-name]
cd [temp-dir-name]
git lfs fetch --all
tar -czf [backup-file-name] .
A bash script that takes the repository name as a parameter looks like this:
#!/bin/bash
# require exactly one non-empty argument: the repository name
if [ "$#" -ne 1 ] || [ -z "$1" ]; then
    echo "Usage: $0 repo"
    exit 1
fi
# check we are in the correct directory
if [[ ! -f "$PWD/docker-compose.yml" ]]; then
    echo "You must run this inside a folder containing docker-compose.yml"
    exit 1
fi
# stop on errors
set -e
REPO="$1"
URL="ssh://git@tower:2222/jfarrow/$REPO.git"
HOST_BACKUP_DIR=/home/jfarrow/backblaze/backups
TEMP_REPO_DIR="/tmp/$REPO"
# start from an empty temp directory (removing it also clears hidden files)
rm -rf "$TEMP_REPO_DIR"
mkdir -p "$TEMP_REPO_DIR"
TAR_FILENAME="gitea_backup_$REPO.tar"
HOST_BACKUP_FILE="$HOST_BACKUP_DIR/$TAR_FILENAME"
# use git clone to copy the non-lfs files
git clone --mirror "$URL" "$TEMP_REPO_DIR"
pushd "$TEMP_REPO_DIR"
# fetch lfs content
git lfs fetch --all
# archive the mirror clone, including the fetched lfs objects
tar -cf "$HOST_BACKUP_FILE" .
popd
# upload the archive using the Python script described below
source .venv/bin/activate
python upload.py "$HOST_BACKUP_FILE"
deactivate
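Before uploading, it can be worth checking that the archive really looks like a mirror clone. The following is a minimal sketch, not part of the script above: the function name missing_mirror_entries and the list of expected entries are my own assumptions, based on the files a bare Git repository normally has at its top level.

```python
import tarfile
from pathlib import PurePosixPath

# Assumption: entries a bare (mirror) repository normally has at its top level.
EXPECTED_ENTRIES = ("HEAD", "config", "objects", "refs")


def missing_mirror_entries(archive_path, expected=EXPECTED_ENTRIES):
    """Return the expected top-level entries that are absent from the tar file."""
    top_level = set()
    with tarfile.open(archive_path) as archive:
        for name in archive.getnames():
            # "tar -cf file ." stores names like "./HEAD"; PurePosixPath drops the "."
            parts = PurePosixPath(name).parts
            if parts:
                top_level.add(parts[0])
    return [entry for entry in expected if entry not in top_level]
```

An empty list means the archive has the expected layout; anything else suggests the clone or the tar step did not complete.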
Uploading the backup file to Backblaze
In the previous stage we created a file called something like /home/jfarrow/backblaze/backups/gitea_backup_test.tar, where "test" is the repository name.
Now we can use the Backblaze Python SDK (b2sdk) to upload that file.
Making a python virtual environment
These commands make a Python virtual environment containing the Backblaze SDK (b2sdk) and python-dotenv.
python -m venv .venv
source .venv/bin/activate
pip install b2sdk
pip install python-dotenv
Using python-dotenv
This allows you to store values such as the Backblaze application key ID and application key in a file called ".env", like this:
APPLICATION_KEY_ID=0077777777776100000000001
APPLICATION_API_KEY=K77777777777777777777777777777I
The Python code loads these values into the environment, where they can be read with os.getenv() calls.
The point of doing this is that you add the .env file to your .gitignore file, so that the API keys are never stored in Git.
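Because os.getenv() quietly returns None for a missing name, a typo in .env only shows up later as an authorization failure. A small check like the one below can catch that earlier. This is a sketch under assumptions: missing_settings is a hypothetical helper of mine, and it only looks at the two variable names used on this page.

```python
import os

# The two settings this page stores in .env.
REQUIRED_SETTINGS = ("APPLICATION_KEY_ID", "APPLICATION_API_KEY")


def missing_settings(environ=os.environ, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are unset or empty."""
    return [name for name in required if not environ.get(name)]


# Intended use, after load_dotenv() has been called:
#     missing = missing_settings()
#     if missing:
#         sys.exit(f"missing settings in .env: {', '.join(missing)}")
```

Passing the environment in as a parameter keeps the check easy to try out with a plain dictionary.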
Python code to upload the file
If you have the name of an existing Backblaze bucket and the application key id and API key, you can upload a file using this code:
import argparse
import datetime
import os
import sys
from pathlib import Path
from zoneinfo import ZoneInfo
import b2sdk.v2 as b2
from dotenv import load_dotenv
# the file to upload is the only command-line argument
parser = argparse.ArgumentParser()
parser.add_argument("file", type=str)
args = parser.parse_args()
local_file = args.file
if not os.path.exists(local_file):
    print(f"file {local_file} does not exist")
    sys.exit(1)
# read APPLICATION_KEY_ID and APPLICATION_API_KEY from .env
load_dotenv()
application_key_id = os.getenv("APPLICATION_KEY_ID")
application_key = os.getenv("APPLICATION_API_KEY")
# put the local date and time into the uploaded file name
tz = ZoneInfo("Pacific/Auckland")
now = datetime.datetime.now(tz)
file_name = Path(local_file).stem + now.strftime("_%Y%m%d-%H%M%S") + Path(local_file).suffix
print(f"sending {file_name}")
# no custom file info is attached to the upload
metadata = {}
info = b2.InMemoryAccountInfo()
b2_api = b2.B2Api(info)
b2_api.authorize_account("production", application_key_id, application_key)
# replace INSERT_YOUR_BUCKET_NAME with the name of an existing bucket
bucket = b2_api.get_bucket_by_name("INSERT_YOUR_BUCKET_NAME")
uploaded_file = bucket.upload_local_file(local_file=local_file, file_name=file_name, file_infos=metadata)
This code uploads the file under a new name that includes the date and time, so gitea_backup_test.tar becomes something like gitea_backup_test_20250828-095421.tar.
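The renaming rule can be tried out on its own. The sketch below pulls the same stem, timestamp and suffix logic into a function; the name timestamped_name is my own, and the timestamp is passed in rather than taken from the clock so the result is repeatable.

```python
import datetime
from pathlib import Path


def timestamped_name(local_file, now):
    """Insert a _YYYYMMDD-HHMMSS stamp between the file's stem and its suffix."""
    path = Path(local_file)
    return path.stem + now.strftime("_%Y%m%d-%H%M%S") + path.suffix


stamp = datetime.datetime(2025, 8, 28, 9, 54, 21)
print(timestamped_name("/home/jfarrow/backblaze/backups/gitea_backup_test.tar", stamp))
# prints: gitea_backup_test_20250828-095421.tar
```

Note that Path.suffix is only the last extension, so a name ending in .tar.gz would get the stamp placed between ".tar" and ".gz".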
Usage
The upload script is run by these lines at the end of the bash script above:
source .venv/bin/activate
python upload.py $HOST_BACKUP_FILE
deactivate
Backing up using gitea dump
The gitea dump command executes inside the Gitea Docker container. It backs up all of the Git data and the Gitea database. It also backs up all of the git-lfs files for all repositories, so it can create a very large backup file.
We back up to a file inside the Docker container and then copy that file to a directory on the host, from where it can be copied to off-site backup.
The steps to create a backup file are:
# for the container called 'testgitea'
# delete existing backup file in the container
docker exec -u git -i testgitea bash -c "rm -f /tmp/backup.zip"
# do the back up
docker exec -u git -i testgitea bash -c "/app/gitea/gitea dump -c /data/gitea/conf/app.ini --skip-log --file /tmp/backup.zip"
# copy the backup file from the container to the host
docker cp testgitea:/tmp/backup.zip ~/backblaze/backups/backup.zip
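Once the dump is on the host, a quick look at its top-level layout shows which directories will need to be moved during a restore. This is a minimal sketch using Python's standard zipfile module; top_level_counts is a hypothetical helper of mine, and it makes no assumption about which entries a particular Gitea version writes.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def top_level_counts(zip_path):
    """Count the archive entries under each top-level name in the zip file."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() reads only the archive's index, not the file contents
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            if parts:
                counts[parts[0]] += 1
    return dict(counts)
```

Calling top_level_counts on the copied backup.zip returns a dictionary mapping each top-level name to its number of entries, which is a cheap way to confirm the dump is not empty before sending it off-site.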
Restoring
There is no single command for restoring a backup, so we have to manually unzip the backup file and move directories to the correct places.
Note that the instructions at https://docs.gitea.com/next/administration/backup-and-restore are not correct, at least for my configuration. For example, this step from those instructions:
# restore the repositories itself
mv repos/* /data/git/gitea-repositories/
does not work, because the repository directory is called /data/git/repositories, not /data/git/gitea-repositories.
Given the file ~/backblaze/backups/backup.zip, we can restore it using these commands:
# assuming we have retrieved the backup file from off-site storage
# copy the host file into the container
# (copying into /data instead of /tmp would let the file survive a container restart)
docker cp ~/backblaze/backups/backup.zip testgitea:/tmp/backup.zip
# count the lines in the listing of backup.zip (roughly one per entry)
unzip -l backup.zip | wc -l
# > 37492
# to fix "unzip: short read" error
# login to container as root and install a better version of unzip
docker exec -it --user=root testgitea /bin/sh
apk add unzip
# to fix "error: invalid zip file with overlapped components (possible zip bomb)"
# use export UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE
# unzip
cd /tmp
export UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE; unzip backup.zip
# restore in this order:
# lfs files are backed up to data/lfs but restored to /data/git/lfs
# repos are backed up to repos/jfarrow but restored
# to /data/git/repositories/jfarrow
# then
# all other files in data are restored to /data/gitea
rm -rf /data/git/lfs
mv /tmp/data/lfs /data/git/lfs
rm -rf /tmp/data/lfs
rm -rf /data/git/repositories
mkdir /data/git/repositories
mv /tmp/repos/jfarrow /data/git/repositories/jfarrow
rm -rf /tmp/repos
cp -r /tmp/data/. /data/gitea/
# adjust file permissions
chown -R git:git /data
# Regenerate Git Hooks
/usr/local/bin/gitea -c '/data/gitea/conf/app.ini' admin regenerate hooks
# exit the container shell (or press Ctrl-D)
exit
# restart the container
docker compose down
docker compose up -d
References
My Unreal Engine VCS setup - Gitea + Git + LFS + Locking
Backblaze Python Examples
How to upload files to Backblaze B2 using Python