Backing up Gitea to Backblaze

Gitea is a self-hosted git server. Backblaze is a cloud storage provider.

This page has brief instructions for backing up git repositories hosted in gitea to Backblaze B2 cloud storage.

Assuming we have a gitea server hosted in a docker container, as described here, we want to back up the contents of the server.

There are multiple ways of backing up docker hosted gitea, including:

  • back up the docker volume from the host, outside of the container
  • run the gitea backup command inside the container then copy the data out of the container
  • back up individual repositories from outside the container

Here I focus on backing up individual repositories because backing up the whole docker volume or using gitea dump to back up the whole git server (1) makes it difficult to restore one repository and (2) backs up all the git-lfs data making each backup file very large. If I want to restore a small repository from Backblaze backup servers I don't want to have to download some massive file containing all my git data to do it.

Backing up a git repository to file

The gitea dump command does not support specifying which repository to back up; it just backs up everything. So we need to clone the repository to another directory outside the docker container and then back up that directory.

This kind of backup does not save the gitea database which contains stuff not stored by git, such as issues.

We do these steps:

  • clone the git repository
  • then fetch its lfs data
  • then tar everything up to one file

git clone --mirror [repo-name] [temp-dir-name]
cd [temp-dir-name]
git lfs fetch --all
tar -czf [backup-file-name] .

A bash script which takes the repository name as a parameter looks like:

#!/bin/bash

if [ "$#" -ne 1 ]; then
    echo "Usage: $0 repo"
    exit 1
fi

# check we are in the correct directory
if [[ ! -f "$PWD/docker-compose.yml" ]]; then
    echo "You must run this inside a folder containing docker-compose.yml"
    exit 1
fi

# stop on errors
set -e

REPO=$1
URL="ssh://git@tower:2222/jfarrow/$REPO.git"
HOST_BACKUP_DIR=/home/jfarrow/backblaze/backups
TEMP_REPO_DIR="/tmp/$REPO"
# start from an empty directory, otherwise git clone refuses to run
rm -rf "$TEMP_REPO_DIR"
mkdir -p "$TEMP_REPO_DIR"

TAR_FILENAME="gitea_backup_$REPO.tar"
HOST_BACKUP_FILE="$HOST_BACKUP_DIR/$TAR_FILENAME"

# use git clone to copy the non-lfs files
git clone --mirror "$URL" "$TEMP_REPO_DIR"
pushd "$TEMP_REPO_DIR"
# fetch lfs content
git lfs fetch --all
# tar the mirror clone and its lfs objects into one file
tar -cf "$HOST_BACKUP_FILE" .
popd

# upload the backup file to Backblaze
source .venv/bin/activate
python upload.py "$HOST_BACKUP_FILE"
deactivate
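
Before uploading, it can be worth checking that the tar file really contains a mirror clone. The following is a minimal sketch, not part of the script above: the helper name looks_like_mirror_archive is my own, and it assumes that a bare repository always has HEAD, config, objects and refs at its top level.

```python
import tarfile


def looks_like_mirror_archive(tar_path):
    """Return True if the archive has the top-level entries of a bare repository."""
    expected = {"HEAD", "config", "objects", "refs"}
    top_level = set()
    with tarfile.open(tar_path) as archive:
        for name in archive.getnames():
            # entries are stored relative to the repository root, e.g. "./HEAD"
            parts = [p for p in name.split("/") if p not in ("", ".")]
            if parts:
                top_level.add(parts[0])
    return expected.issubset(top_level)
```

Calling looks_like_mirror_archive("/home/jfarrow/backblaze/backups/gitea_backup_test.tar") returns True or False. It only checks names, so it does not prove that the git objects or lfs files inside are intact.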

Uploading the backup file to Backblaze

In the previous stage we created a file called something like /home/jfarrow/backblaze/backups/gitea_backup_test.tar where the "test" part of the file name is the repository name.

Now we can use the Backblaze python API to upload that file.

Making a python virtual environment

These commands make a python virtual environment containing the Backblaze API.

python -m venv .venv
source .venv/bin/activate
pip install b2sdk
pip install python-dotenv

Using python-dotenv

This allows you to store values such as the Backblaze application key ID and application API key in a file called ".env", like this:

APPLICATION_KEY_ID=0077777777776100000000001
APPLICATION_API_KEY=K77777777777777777777777777777I

The python code loads these values into the environment and they can be accessed using os.getenv() calls.

The point in doing this is that you add the .env file to your .gitignore file, so that the API keys are never stored in git.
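
If the .env file is missing or a name is misspelled, os.getenv() quietly returns None and the failure only shows up later, when Backblaze rejects the login. A small check catches that earlier. This is a sketch only: read_credentials is a hypothetical helper, and it assumes load_dotenv() has already been called so the values are in the environment.

```python
import os

REQUIRED = ("APPLICATION_KEY_ID", "APPLICATION_API_KEY")


def read_credentials(environ=os.environ):
    """Return (key_id, api_key), raising if either value is missing or empty."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing settings in .env: " + ", ".join(missing))
    return tuple(environ[name] for name in REQUIRED)
```

The environ parameter defaults to the real environment; passing a plain dict instead makes the function easy to try out without touching any real keys.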

Python code to upload the file

If you have the name of an existing Backblaze bucket and the application key id and API key, you can upload a file using this code:

import os
import sys
import argparse
import datetime
from pathlib import Path
from zoneinfo import ZoneInfo

import b2sdk.v2 as b2
from dotenv import load_dotenv

# replace with the name of your existing Backblaze bucket
BUCKET_NAME = "INSERT_YOUR_BUCKET_NAME"

parser = argparse.ArgumentParser()
parser.add_argument("file", type=str)
args = parser.parse_args()

local_file = args.file

if not os.path.exists(local_file):
    print(f"file {local_file} does not exist")
    sys.exit(1)

# load APPLICATION_KEY_ID and APPLICATION_API_KEY from .env
load_dotenv()

info = b2.InMemoryAccountInfo()
b2_api = b2.B2Api(info)

application_key_id = os.getenv("APPLICATION_KEY_ID")
application_key = os.getenv("APPLICATION_API_KEY")

# add the local date and time to the uploaded file name
tz = ZoneInfo("Pacific/Auckland")
now = datetime.datetime.now(tz)
file_name = Path(local_file).stem + now.strftime("_%Y%m%d-%H%M%S") + Path(local_file).suffix

print(f"sending {file_name}")

metadata = {}

b2_api.authorize_account("production", application_key_id, application_key)

bucket = b2_api.get_bucket_by_name(BUCKET_NAME)

uploaded_file = bucket.upload_local_file(file_name=file_name, local_file=local_file, file_infos=metadata)

This code uploads the file and renames it with the date and time, so gitea_backup_test.tar becomes something like gitea_backup_test_20250828-095421.tar.
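
The renaming step can be tried on its own. This sketch pulls the same expression into a function (the name timestamped_name is mine) and takes the time as a parameter instead of reading the clock, so the result is repeatable:

```python
import datetime
from pathlib import Path


def timestamped_name(local_file, now):
    """Insert the date and time between the file stem and its suffix."""
    path = Path(local_file)
    return path.stem + now.strftime("_%Y%m%d-%H%M%S") + path.suffix


when = datetime.datetime(2025, 8, 28, 9, 54, 21)
print(timestamped_name("/home/jfarrow/backblaze/backups/gitea_backup_test.tar", when))
# prints gitea_backup_test_20250828-095421.tar
```

Note that the directory part is dropped, so the file lands at the top level of the bucket. With a double extension such as .tar.gz the timestamp would end up between .tar and .gz, because Path.suffix only returns the last extension.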

Usage

This code to upload the file is at the end of the bash script above:

source .venv/bin/activate
python upload.py $HOST_BACKUP_FILE
deactivate

Backing up using gitea dump

The gitea dump command executes inside the gitea docker container. It backs up all of the git data and the gitea database. It also backs up all of the git-lfs files for all repositories, so it can create a very big backup file.

We are backing up to a file inside the docker container and then copying that file to a directory on the host, from where it can be copied to off-site backup.

The steps to create a backup file are:

# for the container called 'testgitea'
# delete existing backup file in the container
docker exec -u git -i testgitea bash -c "rm -f /tmp/backup.zip"
# do the back up
docker exec -u git -i testgitea bash -c "/app/gitea/gitea dump -c /data/gitea/conf/app.ini --skip-log --file /tmp/backup.zip"
# copy the backup file from the container to the host
docker cp testgitea:/tmp/backup.zip ~/backblaze/backups/backup.zip
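
Once backup.zip is on the host, it is useful to see how it is laid out before relying on it, because the restore steps depend on the directory names inside the file. A minimal sketch using only the Python standard library follows; summarise_dump is a hypothetical helper, and the top-level names it reports may differ between gitea versions.

```python
import zipfile
from collections import Counter


def summarise_dump(zip_path):
    """Count the entries under each top-level name in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # namelist() reads the archive index only, it does not unpack anything
        names = archive.namelist()
    return Counter(name.split("/", 1)[0] for name in names)
```

For example, summarise_dump(os.path.expanduser("~/backblaze/backups/backup.zip")) (with import os) returns a Counter mapping each top-level name to its number of entries. This shows the layout only; it does not test that the compressed data can be read back.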

Restoring

There is no single command for restoring a backup (see here), so we have to manually unzip the backup file and move directories to the correct places.

Note that the instructions at https://docs.gitea.com/next/administration/backup-and-restore are not correct, at least for my configuration. For example:

restore the repositories itself

mv repos/* /data/git/gitea-repositories/

does not work: the repository directory is called /data/git/repositories, not /data/git/gitea-repositories.

Given the file ~/backblaze/backups/backup.zip we can restore this using these commands:

# assuming we get the backup file from offline storage
# copy the host file into the container
# note: this uses /tmp; copy into /data instead if the file must survive a container restart
docker cp ~/backblaze/backups/backup.zip testgitea:/tmp/backup.zip

# to count the lines in the listing of backup.zip
unzip -l backup.zip | wc -l
> 37492

# to fix "unzip: short read" error
# log in to the container as root and install a better version of unzip
docker exec -it --user=root testgitea /bin/sh
apk add unzip
# to fix "error: invalid zip file with overlapped components (possible zip bomb)"
# use export UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE

# unzip
cd /tmp
export UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE; unzip backup.zip

# restore in this order
# lfs files are backed up to data/lfs but restored to /data/git/lfs
# repos are backed up to repos/jfarrow but restored
# to /data/git/repositories/jfarrow
# then
# all other files in data are restored to /data/gitea

rm -rf /data/git/lfs
mv /tmp/data/lfs /data/git/lfs
rm -rf /tmp/data/lfs

rm -rf /data/git/repositories
mkdir /data/git/repositories
mv /tmp/repos/jfarrow /data/git/repositories/jfarrow
rm -rf /tmp/repos

cp -r /tmp/data/. /data/gitea/

# adjust file permissions
chown -R git:git /data
# Regenerate Git Hooks
/usr/local/bin/gitea -c '/data/gitea/conf/app.ini' admin regenerate hooks
# exit the container shell
^D

# restart the container
docker compose down
docker compose up -d
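
The path mapping used by the restore commands above can be written down as a small function, which makes it easier to check where any one file from the backup should end up. This is only a sketch of the layout on my server, following the mv and cp commands above; restore_target is a hypothetical name and other configurations may use different directories.

```python
from pathlib import PurePosixPath


def restore_target(archive_path):
    """Map a path inside backup.zip to its location in the container."""
    parts = PurePosixPath(archive_path).parts
    if parts[:2] == ("data", "lfs"):
        # lfs objects live under the git directory, not under gitea
        return str(PurePosixPath("/data/git/lfs", *parts[2:]))
    if parts[:1] == ("repos",):
        return str(PurePosixPath("/data/git/repositories", *parts[1:]))
    if parts[:1] == ("data",):
        # everything else in data goes to the gitea directory
        return str(PurePosixPath("/data/gitea", *parts[1:]))
    # files outside data and repos are not moved by the steps above
    return None
```

The data/lfs test has to come before the general data test, which matches the order of the shell commands: lfs is moved out of /tmp/data first so that the final cp does not copy it into /data/gitea.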

References

My Unreal Engine VCS setup - Gitea + Git + LFS + Locking
Backblaze Python Examples
How to upload files to Backblaze B2 using Python