Skip to main content

Automated Backup System

Virtalis Hub comes with an automated back up system that allows you to restore to an earlier snapshot in the event of a disaster. We will install Velero to back up the state of your Kubernetes Cluster and uses a custom built solution which leverages Restic to back up the persistent data imported into Virtalis Hub.

You should consider creating regular backups of your buckets which hold the backed-up data in case of failure. This will bed one through your cloud provider or manually if you host your own bucket.

Alternatively, you can consider using your own backup solution. A good option is the Persistent Volume Snapshot which creates a snapshot of a persistent volume at a point in time. The biggest caveat is that they’re only supported on a number of platforms such as Azure and AWS.

If you opt in for a different solution to the one we provide, you have to be mindful of the fact that not all databases used by Virtalis Hub support live backups. This means that the databases have to be taken offline before backing up.

The following databases in use by Virtalis Hub must be taken offline for the duration of the backup:

  • Minio
  • Neo4j

Set Up the Deployment Shell

Export the variable:

export HUB_NAMESPACE=hub

Navigate to the directory containing Virtalis Hub installation files:

cd /home/root/Hub/k8s

Installation

Creating a StorageLocation

Recommended:

Follow the “Create S3 Bucket” and “Set permissions for Velero” sections from the link below and ensure that you create the following 2 buckets under your s3 bucket:

  • hub-restic
  • hub-velero

GitHub - vmware-tanzu/velero-plugin-for-aws: Plugins to support Velero on AWS

Export the address and port of the bucket you have created:

export S3_BUCKET_ADDRESS=<address>
#i.e S3_BUCKET_ADDRESS=192.168.1.3, S3_BUCKET_ADDRESS=mydomain.com
export S3_BUCKET_PORT=<port>
export S3_BUCKET_PROTOCOL=<http or https>
export S3_BUCKET_REGION=<region>

Not recommended - create an s3 bucket on the same cluster, alongside Virtalis Hub:

Customize the persistence.size if the total size of your data exceeds 256gb and change the storage class HUB_SC if needed:

export HUB_SC=local-path
kubectl create ns hub-backup

#check if pwgen is installed for the next step
command -v pwgen

kubectl create secret generic hub-s3-backup -n hub-backup \
--from-literal='access-key'=$(pwgen 30 1 -s | tr -d '\n') \
--from-literal='secret-key'=$(pwgen 30 1 -s | tr -d '\n')

helm upgrade --install hub-s3-backup bitnami/minio \
-n hub-backup --version 3.6.1 \
--set persistence.storageClass=$HUB_SC \
--set persistence.size=256Gi \
--set mode=standalone \
--set resources.requests.memory='150Mi' \
--set resources.requests.cpu='250m' \
--set resources.limits.memory='500Mi' \
--set resources.limits.cpu='500m' \
--set disableWebUI=true \
--set useCredentialsFile=true \
--set volumePermissions.enabled=true \
--set defaultBuckets="hub-velero hub-restic" \
--set global.minio.existingSecret=hub-s3-backup

cat <<EOF > credentials-velero
[default]
aws_access_key_id=$(kubectl get secret hub-s3-backup \
-n hub-backup -o jsonpath="{.data.access-key}" | base64 --decode)
aws_secret_access_key=$(kubectl get secret hub-s3-backup \
-n hub-backup -o jsonpath="{.data.secret-key}" | base64 --decode)
EOF

Export the address and port of the bucket you have created:

export S3_BUCKET_ADDRESS=hub-s3-backup-minio.hub-backup.svc.cluster.local
export S3_BUCKET_PORT=9000
export S3_BUCKET_PROTOCOL=http
export S3_BUCKET_REGION=local

Set Up Variables

For the duration of this installation, you must navigate to the k8s folder that is downloadable by following the Virtalis Hub Installation Guide.

Make scripts executable:

sudo chmod +x \
trigger-database-restore.sh \
trigger-database-backup.sh \
trigger-database-backup-prune.sh \
install-backup-restore.sh

Export the following variables:

export ACR_REGISTRY_NAME=virtaliscustomer

Export the address of the hub-restic bucket:

export REPO_URL=s3:$S3_BUCKET_PROTOCOL://\
$S3_BUCKET_ADDRESS:$S3_BUCKET_PORT/hub-restic

Optional configuration variables:

export MANAGED_TAG=<custom image tag for Virtalis Hub services>
export DISABLE_CRON<seto to true to install without an automated cronSchedule>

Velero Installation

The following steps will assume you named your Velero bucket “hub-velero”.

Add the VMware helm repository and update:

helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update

Install Velero:

helm install velero vmware-tanzu/velero \
--namespace velero \
--create-namespace \
--set-file credentials.secretContents.cloud\
=./credentials-velero \
--set configuration.provider=aws \
--set configuration.backupStorageLocation.name\
=hub-velero \
--set configuration.backupStorageLocation.bucket\
=hub-velero \
--set configuration.backupStorageLocation.config.region\
=$S3_BUCKET_REGION \
--set configuration.backupStorageLocation.config.s3Url\
=$S3_BUCKET_PROTOCOL://$S3_BUCKET_ADDRESS:$S3_BUCKET_PORT \
--set configuration.backupStorageLocation.config.publicUrl\
=$S3_BUCKET_PROTOCOL://$S3_BUCKET_ADDRESS:$S3_BUCKET_PORT \
--set configuration.backupStorageLocation.config.s3ForcePathStyle\
=true \
--set initContainers[0].name=velero-plugin-for-aws \
--set initContainers[0].image=velero/velero-plugin-for-aws:v1.1.0 \
--set initContainers[0].volumeMounts[0].mountPath=/target \
--set initContainers[0].volumeMounts[0].name=plugins \
--set snapshotsEnabled=false \
--version 2.23.1 \
--set deployRestic=true

Install the velero cli client:

wget https://github.com/vmware-tanzu/velero/releases\
/download/v1.5.3/velero-v1.5.3-linux-amd64.tar.gz
tar -xzvf velero-v1.5.3-linux-amd64.tar.gz
rm -f velero-v1.5.3-linux-amd64.tar.gz
sudo mv $(pwd)/velero-v1.5.3-linux-amd64/velero /usr/bin/
sudo chmod +x /usr/bin/velero

Manually create a single backup to verify that the connection to the aws bucket is working:

velero backup create test-backup-1 \
--storage-location=hub-velero --include-namespaces $HUB_NAMESPACE

Watch the status of the backup until it’s finished, this should show up as complete if everything was set up correctly:

watch -n2 velero backup get

Create a scheduled backup:

velero create schedule cluster-backup --schedule="45 23 * * 6" \
--storage-location=hub-velero --include-namespaces $HUB_NAMESPACE

This schedule will run a backup every Saturday at 23:45PM.

Restic Integration

The custom restic integration uses Kubernetes jobs to mount the data, encrypt it, and send it to a bucket. Kubernetes Custom Resource Definitions are used to store the information about the resticre positories as well as any created backups.

The scheduled data backup runs on every Friday at 23:45PM by default. This can be modified by editing the cronSchedule field in all values.yaml files located in backup-restore/helmCharts/<release_name>/ with the exception of common-lib.

All the performed backups are offline backups,therefore Virtalis Hub will be unavailable for that period as a number of databases have to be taken down.

Create an AWS bucket with the name “hub-restic” by following the same guide from the Velero section.

Replace the keys and create a secret containing the hub-restic bucket credentials:

kubectl create secret generic hub-restic-bucket-creds \
-n "$HUB_NAMESPACE" \
--from-literal='AWS_ACCESS_KEY'='<ACCESS_KEY>' \
--from-literal='AWS_SECRET_KEY'='<SECRET_KEY>'

If you instead opted in to deploy an s3 bucket on the same cluster, run this instead:

kubectl create secret generic hub-restic-bucket-creds \
-n "$HUB_NAMESPACE" \
--from-literal='AWS_ACCESS_KEY'=$(kubectl get secret hub-s3-backup \
-n hub-backup -o jsonpath="{.data.access-key}" | base64 --decode) \
--from-literal='AWS_SECRET_KEY'=$(kubectl get secret hub-s3-backup \
-n hub-backup -o jsonpath="{.data.secret-key}" | base64 --decode)

Export the address of the hub-restic bucket:

export REPO_URL=s3:$S3_BUCKET_PROTOCOL://\
$S3_BUCKET_ADDRESS:$S3_BUCKET_PORT/hub-restic

Run the installation:

./install-backup-restore.sh

Check if all the -init-repository- jobs have completed:

kubectl get pods -n $HUB_NAMESPACE | grep init-repository

Query the list of repositories:

kubectl get repository -n $HUB_NAMESPACE

The output should look something like this with the status of all repositories showing as Initialized:

NAME                    STATUS        SIZE   CREATIONDATE
artifact-binary-store Initialized 0B 2021-03-01T10:21:53Z
artifact-store Initialized 0B 2021-03-01T10:21:57Z
job-db Initialized 0B 2021-03-01T10:21:58Z
keycloak-db Initialized 0B 2021-03-01T10:21:58Z
vrdb-binary-store Initialized 0B 2021-03-01T10:21:58Z
vrdb-store Initialized 0B 2021-03-01T10:22:00Z

Once you are happy to move on, delete the completed job pods:

kubectl delete jobs -n $HUB_NAMESPACE -l app=backup-restore-init-repository

Trigger a manual backup:

./trigger-database-backup.sh

After a while, all the -triggered-backup- jobs should show up as Completed:

kubectl get pods -n "$HUB_NAMESPACE" | grep triggered-backup

Query the list of snapshots:

kubectl get snapshot -n "$HUB_NAMESPACE"

The output should look something like this with the status of all snapshots showing as Completed:

NAME                       STATUS      ID       CREATIONDATE
artifact-binary-store... Completed 62e... 2021...
artifact-store-neo4j-... Completed 6ae... 2021...
job-db-mysql-master-1... Completed 944... 2021...
keycloak-db-mysql-mas... Completed 468... 2021...
vrdb-binary-store-min... Completed 729... 2021...
vrdb-store-neo4j-core... Completed 1c2... 2021...

Once you are happy to move on, delete the completed job pods:

kubectl delete jobs -n $HUB_NAMESPACE -l app=backup-restore-triggered-backup

Triggering a Manual Backup

Consider scheduling system downtime and scaling down the ingress to prevent people from accessing the server during the backup procedure.

Note down the replica count for nginx before scaling it down:

kubectl get deploy -n ingress-nginx 
export NGINX_REPLICAS=<CURRENT_REPLICA_COUNT>

Scale down the nginx ingress service:

kubectl scale deploy --replicas=0 ingress-nginx-ingress-controller \
-n ingress-nginx

Create a cluster resource level backup:

velero backup create cluster-backup-$(date +"%m-%d-%Y") \
--storage-location=hub-velero --include-namespaces $HUB_NAMESPACE

Check the status of the velero backup:

watch -n2 velero backup get

Create a database level backup:

./trigger-database-backup.sh

Check the status of the database backup:

watch -n2 kubectl get snapshot -n "$HUB_NAMESPACE"

Once you are happy to move on, delete the completed job pods:

kubectl delete jobs -n $HUB_NAMESPACE -l app=backup-restore-triggered-backup

Restoring Data

Restoration Plan

Plan your restoration by gathering a list of the snapshot IDs you will be restoring from and export them.

Begin by querying the list of repositories:

kubectl get repo -n "$HUB_NAMESPACE"
NAME STATUS SIZE CREATIONDATE
artifact-binary-store Initialized 12K 2021-07-02T12:03:26Z
artifact-store Initialized 527M 2021-07-02T12:03:29Z
comment-db Initialized 180M 2021-07-02T12:03:37Z
job-db Initialized 181M 2021-07-02T12:03:43Z
keycloak-db Initialized 193M 2021-07-02T12:03:43Z
vrdb-binary-store Initialized 12K 2021-07-02T12:03:46Z
vrdb-store Initialized 527M 2021-07-02T12:02:44Z

Run a dry run of the restore script to gather a list of the variables you can export:

DRY_RUN=true ./trigger-database-restore.sh

Sample output:

Error: ARTIFACT_BINARY_STORE_RESTORE_ID has not been exported. Please run 'kubectl get snapshot -n develop -l repository=artifact-binary-store' to see a list of available snapshots.
Error: ARTIFACT_STORE_RESTORE_ID has not been exported. Please run 'kubectl get snapshot -n develop -l repository=artifact-store'...
...

Query available snapshots or use the commands returned in the output above to query by specific repositories:

kubectl get snapshot -n "$HUB_NAMESPACE"

This should return a list of available snapshots:

NAME                            STATUS      ID       CREATIONDATE
artifact-binary-store-mini... Completed 4a2... 2021-07-0...
artifact-store-neo4j-core-... Completed 41d... 2021-07-0...
comment-db-mysql-master-16... Completed e72... 2021-07-0...
job-db-mysql-master-162522... Completed eb5... 2021-07-0...
keycloak-db-mysql-master-1... Completed 919... 2021-07-0...
vrdb-binary-store-minio-16... Completed cf0... 2021-07-0...
vrdb-store-neo4j-core-1625... Completed 08d... 2021-07-0...
It’s strongly advised to restore all the backed-up data using snapshots from the same day to avoid data corruption.

Note down the replica count for nginx before scaling it down:

kubectl get deploy -n ingress-nginx
export NGINX_REPLICAS=<CURRENT_REPLICA_COUNT>

Scale down the nginx ingress service to prevent people from accessing Virtalis Hub during the restoration process:

kubectl scale deploy --replicas=0 ingress-nginx-ingress-controller \
-n ingress-nginx

Run the Restore Script

./trigger-database-restore.sh

Get list of backups created with velero:

velero backup get

Example:

NAME            STATUS      ERRORS   WARNINGS   CREATED     EXPIRES   STORAGE LOCATION   SELECTOR
test-backup-1 Completed 0 0 2022-0... 29d hub-velero <none>
test-backup-2 Completed 0 0 2022-0... 29d hub-velero <none>

Restore:

velero restore create --from-backup <backup name>

Unset exported restore id’s:

charts=( $(ls backup-restore/helmCharts/) ); \
for chart in "${charts[@]}"; do if [ $chart == "common-lib" ]; \
then continue; fi; id_var="$(echo ${chart^^} | \
sed 's/-/_/g')_RESTORE_ID"; unset ${id_var}; done

After a while, all the -triggered-restore- jobs should show up as Completed:

kubectl get pods -n "$HUB_NAMESPACE" | grep triggered-restore

Once you are happy to move on, delete the completed job pods:

kubectl delete jobs -n $HUB_NAMESPACE \
-l app=backup-restore-triggered-restore

Watch and wait for all pods that are running to be Ready:

watch -n2 kubectl get pods -n "$HUB_NAMESPACE"

Scale back nginx:

kubectl scale deploy --replicas="$NGINX_REPLICAS" \
ingress-nginx-ingress-controller -n ingress-nginx

Pruning Backups

Velero

Get list of backups created with velero:

velero backup get

Example:

NAME            STATUS      ERRORS   WARNINGS   CREATED     EXPIRES   STORAGE LOCATION   SELECTOR
test-backup-1 Completed 0 0 2022-0... 29d hub-velero <none>
test-backup-2 Completed 0 0 2022-0... 29d hub-velero <none>

Delete a single backup:

velero backup delete <name>

Example:

velero backup delete test-backup-1

Restic Get list of Hub backups:

kubectl get snapshot -n $HUB_NAMESPACE

Example:

artifact-binary-store-minio-1642502735    Completed   5daa1580   2022-01-18T10:45:35Z
artifact-binary-store-minio-1642512232 Completed b1fb6a15 2022-01-18T13:23:52Z
artifact-binary-store-minio-1642512501 Completed 764c989e 2022-01-18T13:28:21Z
artifact-store-neo4j-core-1642502736 Completed f8233016 2022-01-18T10:45:36Z
artifact-store-neo4j-core-1642512234 Completed df2606b1 2022-01-18T13:23:54Z
artifact-store-neo4j-core-1642512502 Completed a9d9470b 2022-01-18T13:28:22Z
comment-db-mysql-master-1642502737 Completed da0de6a7 2022-01-18T10:45:37Z
comment-db-mysql-master-1642512237 Completed 37a860e4 2022-01-18T13:23:57Z
comment-db-mysql-master-1642512503 Completed d5e04e5e 2022-01-18T13:28:23Z
job-db-mysql-master-1642502739 Completed 51db2aa5 2022-01-18T10:45:39Z
job-db-mysql-master-1642512235 Completed 40c5fae1 2022-01-18T13:23:55Z
job-db-mysql-master-1642512505 Completed dbc0921e 2022-01-18T13:28:25Z
keycloak-db-mysql-master-1642502742 Completed 70ab969a 2022-01-18T10:45:42Z
keycloak-db-mysql-master-1642512235 Completed 6df99e96 2022-01-18T13:23:55Z
keycloak-db-mysql-master-1642512506 Completed 25c93be7 2022-01-18T13:28:26Z
product-tree-db-mysql-master-1642502740 Completed ba4edb53 2022-01-18T10:45:40Z
product-tree-db-mysql-master-1642512238 Completed 47880e3b 2022-01-18T13:23:58Z
product-tree-db-mysql-master-1642512504 Completed 378dd1e1 2022-01-18T13:28:24Z
vrdb-binary-store-minio-1642502746 Completed c7109c6b 2022-01-18T10:45:46Z
vrdb-binary-store-minio-1642512247 Completed 18f03082 2022-01-18T13:24:07Z
vrdb-binary-store-minio-1642512510 Completed 5c47d40c 2022-01-18T13:28:30Z
vrdb-store-neo4j-core-1642502748 Completed bd692195 2022-01-18T10:45:48Z
vrdb-store-neo4j-core-1642512241 Completed df6cbaf9 2022-01-18T13:24:01Z
vrdb-store-neo4j-core-1642512510 Completed 89ae8fcc 2022-01-18T13:28:30Z

Prune backups based on a restic policy.

Learn more about the different policies here: Removing backup snapshots — restic 0.16.4 documentation

PRUNE_FLAG="<policy>" ./trigger-database-backup-prune.sh

Example:

Keep the latest snapshot and delete everything else:

PRUNE_FLAG="--keep-last 1" ./trigger-database-backup-prune.sh

After:

NAME                                      STATUS      ID         CREATIONDATE
artifact-binary-store-minio-1642512501 Completed 764c989e 2022-01-18T13:28:21Z
artifact-store-neo4j-core-1642512502 Completed a9d9470b 2022-01-18T13:28:22Z
comment-db-mysql-master-1642512503 Completed d5e04e5e 2022-01-18T13:28:23Z
job-db-mysql-master-1642512505 Completed dbc0921e 2022-01-18T13:28:25Z
keycloak-db-mysql-master-1642512506 Completed 25c93be7 2022-01-18T13:28:26Z
product-tree-db-mysql-master-1642512504 Completed 378dd1e1 2022-01-18T13:28:24Z
vrdb-binary-store-minio-1642512510 Completed 5c47d40c 2022-01-18T13:28:30Z
vrdb-store-neo4j-core-1642512510 Completed 89ae8fcc 2022-01-18T13:28:30Z