Automate MongoDB backup and restore using AWS S3, GitHub Actions, and Node.js

Nermin Imamovic
Feb 12, 2023 · 6 min read

MongoDB is a popular NoSQL document database that is used to store large amounts of unstructured data. It is critical to back up this data so that it can be easily restored in the event of a failure or other problem.

Automating repetitive processes is always a good idea. Automation saves developers time while reducing the possibility of human error. We will use GitHub Actions to automate the software development workflow. Overall, automating database backup and restore allows software development teams to work more efficiently and effectively, letting them focus on more complex tasks like design and innovation rather than worrying about their data’s safety and availability.

Introduction

Using Node.js and GitHub Actions, there are several approaches to automating MongoDB backup and restore. We can use the GitHub repository or a cloud provider to store backups. In this article, we’ll show you how to use MongoDB Database Tools, Node.js, AWS S3, and GitHub Actions to back up your MongoDB database automatically.

First, let’s go over some fundamental terms and concepts related to MongoDB, MongoDB Database Tools, Node.js, Child Processes in Node.js, GitHub Actions, and AWS S3.

MongoDB is a NoSQL database that is document-oriented and used for high-volume data storage. MongoDB, unlike traditional relational databases, uses collections and documents to provide high scalability, performance, and flexibility.

MongoDB Database Tools are a set of command-line utilities used to interact with a MongoDB deployment.

Node.js is a JavaScript runtime based on the V8 JavaScript engine in Chrome. It is a cross-platform open-source runtime environment for executing JavaScript code outside of a web browser.

In Node.js, child processes are separate processes that run independently of the main Node.js process. They can be used in Node.js for a variety of tasks, including running multiple independent tasks, performing background tasks, and breaking down a large task into smaller, more manageable parts. In Node.js, the most commonly used methods for creating child processes are exec(), spawn(), and fork():

  • The exec() method runs a shell command in a child process and buffers its output.
  • The spawn() method starts a new process and returns a ChildProcess object that can be used to communicate with it.
  • The fork() method is a variant of spawn() intended specifically for running Node.js scripts; it sets up an IPC channel so the child process can exchange messages with the parent (a short sketch follows this list).
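
To make the differences concrete, here is a minimal sketch (not from the original article); the commands are arbitrary examples, and worker.js is a hypothetical script that calls process.send():

import { exec, spawn, fork } from 'child_process';

// exec(): run a shell command and buffer its entire output.
exec('ls -la', (error, stdout, stderr) => {
  if (error) throw error;
  console.log(stdout);
});

// spawn(): start a process and stream its output via a ChildProcess object.
const child = spawn('node', ['--version']);
child.stdout.on('data', (data) => console.log(`version: ${data}`));

// fork(): run another Node.js script with an IPC channel for messages.
const worker = fork('./worker.js');
worker.on('message', (msg) => console.log('from worker:', msg));
worker.send({ task: 'start' });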

AWS S3 (Simple Storage Service) is a cloud-based object storage service with industry-leading scalability, data availability, security, and performance that can be used to store any type of data and files.

GitHub Actions is a YAML-based tool for automating the software development workflow directly in a GitHub repository. It allows developers to focus on writing code by automating tasks like building, testing, and deploying. Workflows are triggered by a variety of events, including pushes, pull requests, releases, scheduled cron jobs, manual dispatches, and so on.
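
As a minimal illustration (not part of the original setup), a workflow declares its triggers in the on section; the names below are placeholders:

name: Example

on:
  push:                  # runs on every push to the main branch
    branches: [main]
  schedule:
    - cron: '0 0 * * *'  # runs daily at midnight UTC
  workflow_dispatch:     # allows manual triggering from the GitHub UI

jobs:
  example:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Triggered by ${{ github.event_name }}"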

In this example, we will create an AWS S3 bucket for storing backups, run the mongodump and mongorestore commands, and connect to the database using a MongoDB connection string.

Development

1. Create an AWS S3 bucket

The AWS documentation provides a full explanation of how to create an S3 bucket.

2. Run mongodump and mongorestore

MongoDB Database Tools must be installed in order to run the mongodump and mongorestore commands. Installation instructions are available in the MongoDB documentation.

To dump the database, use the following command (replace uri with your connection string and dumpPath with the path where the archive should be written):

mongodump --uri ${uri} --gzip --archive=${dumpPath}

To restore your database from that archive, use the following command:

mongorestore --uri ${uri} --gzip --archive=${dumpPath}
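
For example, with concrete (entirely hypothetical) values, the two commands might look like this:

mongodump --uri 'mongodb+srv://user:password@cluster0.example.mongodb.net/mydb' --gzip --archive=./backup.gz
mongorestore --uri 'mongodb+srv://user:password@cluster0.example.mongodb.net/mydb' --gzip --archive=./backup.gz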

3. Promisify the exec function

To take advantage of all the benefits of Node.js, we will implement promises in our project and wrap the exec function from the child_process module so it can be used with async/await.

import { exec as execNonPromise } from 'child_process';

// Wraps the callback-based exec() in a promise so it can be awaited.
export default function exec(command) {
  return new Promise((resolve, reject) => {
    execNonPromise(command, (error, stdout, stderr) => {
      if (error) {
        return reject(error);
      }
      // mongodump and mongorestore write their progress output to stderr,
      // so stderr alone does not indicate a failure here.
      if (stderr) {
        return resolve(stderr);
      }
      return resolve(stdout);
    });
  });
}
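
Usage then looks like this (the command shown is just an example). Note that Node.js also ships util.promisify, which can wrap exec in a similar way, though it resolves with an { stdout, stderr } object instead of a single string:

import exec from './exec.js';

(async () => {
  // Any shell command works; mongodump --version just prints the tool version.
  const output = await exec('mongodump --version');
  console.log(output);
})();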

4. Functions to back up and restore the database

Here is an example of how to back up the database and push the archive to the S3 bucket.

import path from 'path';
import fs from 'fs';
import AWS from 'aws-sdk';
import exec from './exec.js';

AWS.config.update({
  accessKeyId: process.env.ACCESS_KEY_ID,
  secretAccessKey: process.env.SECRET_ACCESS_KEY,
});

const uri = process.env.URI;
const backupName = process.env.BACKUP_NAME;
const bucket = process.env.BUCKET;

(async () => {
  const s3 = new AWS.S3();
  const __dirname = path.resolve();

  const dumpPath = path.resolve(__dirname, backupName);

  const command = `mongodump --uri '${uri}' --gzip --archive=${dumpPath}`;

  try {
    // Dump the database to a gzipped archive on disk.
    await exec(command);

    // Upload the archive to S3.
    const readStream = fs.createReadStream(dumpPath);
    const params = {
      Bucket: bucket,
      Key: backupName,
      Body: readStream,
    };

    await s3.putObject(params).promise();

    console.log('Successful backup!');
  } catch (err) {
    console.log(`Backup failed: ${err}`);
  }
})();
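
As a side note: for very large dumps, or for streams whose length the SDK cannot determine up front, aws-sdk v2 also offers s3.upload(), which performs a managed multipart upload and accepts arbitrary streams. A minimal sketch of that variant, reusing the variables from the script above:

// Drop-in alternative to the putObject() call above; upload() splits
// large bodies into multipart chunks automatically.
await s3
  .upload({
    Bucket: bucket,
    Key: backupName,
    Body: fs.createReadStream(dumpPath),
  })
  .promise();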

Here is an example of how to restore the database from our S3 backup to a new database (the URI is provided via the GitHub Actions workflow):

import path from 'path';
import fs from 'fs';
import AWS from 'aws-sdk';
import exec from './exec.js';

AWS.config.update({
  accessKeyId: process.env.ACCESS_KEY_ID,
  secretAccessKey: process.env.SECRET_ACCESS_KEY,
});

const uri = process.env.NEW_DATABASE_URI || process.env.URI;
const backupName = process.env.BACKUP_NAME;
const bucket = process.env.BUCKET;

// Downloads the backup object from S3 and writes it to a local file.
const s3download = ({ bucketName, keyName }) => {
  const s3 = new AWS.S3();

  const params = {
    Bucket: bucketName,
    Key: keyName,
  };

  const file = fs.createWriteStream(keyName);

  return new Promise((resolve, reject) => {
    s3.getObject(params)
      .createReadStream()
      .on('error', (error) => {
        return reject(error);
      })
      .pipe(file)
      // Resolve only once the write stream has flushed everything to disk,
      // not merely when the download stream ends.
      .on('finish', () => {
        return resolve();
      });
  });
};

(async () => {
  const __dirname = path.resolve();

  const dumpPath = path.resolve(__dirname, backupName);
  const command = `mongorestore --uri '${uri}' --gzip --archive=${dumpPath}`;

  try {
    // Fetch the archive from S3, then restore it into the target database.
    await s3download({
      bucketName: bucket,
      keyName: backupName,
    });

    await exec(command);

    console.log('Restore successful!');
  } catch (err) {
    console.log(`Restore failed: ${err}`);
  }
})();

5. Add GitHub Actions

We need to add the environment variables in GitHub: the repository secrets (ACCESS_KEY_ID, SECRET_ACCESS_KEY, URI) and the repository variables (BACKUP_NAME, BUCKET).

Finally, we will add the GitHub Actions workflows that back up and restore our database. A cron job will be used to trigger the backup workflow every four hours.
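
As a quick reference (not from the original article), the five cron fields are minute, hour, day of month, month, and day of week; the schedule used below reads as follows:

0 */4 * * *
│  │  │ │ └── day of week (any)
│  │  │ └──── month (any)
│  │  └────── day of month (any)
│  └───────── hour (every 4th hour: 00, 04, ..., 20 UTC)
└──────────── minute (0)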

name: Backup

on:
  schedule:
    - cron: '0 */4 * * *'

jobs:
  backup:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [14.x]

    env:
      ACCESS_KEY_ID: ${{ secrets.ACCESS_KEY_ID }}
      SECRET_ACCESS_KEY: ${{ secrets.SECRET_ACCESS_KEY }}
      URI: ${{ secrets.URI }}
      BACKUP_NAME: ${{ vars.BACKUP_NAME }}
      BUCKET: ${{ vars.BUCKET }}

    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v2
        with:
          node-version: ${{ matrix.node-version }}

      - name: Install mongo-tools
        run: sudo wget https://fastdl.mongodb.org/tools/db/mongodb-database-tools-debian92-x86_64-100.3.1.deb && sudo apt install ./mongodb-database-tools-*.deb

      - name: Install dependencies
        run: npm ci

      - name: Run backup
        run: node backup.js

The GitHub Action to restore the database will be triggered manually via workflow_dispatch.

name: Restore

on:
  workflow_dispatch:
    inputs:
      NEW_DATABASE_URI:
        type: string
        required: false

jobs:
  restore:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [14.x]

    env:
      NEW_DATABASE_URI: ${{ github.event.inputs.NEW_DATABASE_URI }}
      ACCESS_KEY_ID: ${{ secrets.ACCESS_KEY_ID }}
      SECRET_ACCESS_KEY: ${{ secrets.SECRET_ACCESS_KEY }}
      URI: ${{ secrets.URI }}
      BACKUP_NAME: ${{ vars.BACKUP_NAME }}
      BUCKET: ${{ vars.BUCKET }}

    steps:
      - name: Checkout Code
        uses: actions/checkout@v2

      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v2
        with:
          node-version: ${{ matrix.node-version }}

      - name: Install mongo-tools
        run: sudo wget https://fastdl.mongodb.org/tools/db/mongodb-database-tools-debian92-x86_64-100.3.1.deb && sudo apt install ./mongodb-database-tools-*.deb

      - name: Install dependencies
        run: npm ci

      - name: Restore database
        run: node restore.js
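
Besides the Actions tab in the GitHub UI, a workflow_dispatch workflow can also be started from the GitHub CLI; for example (the URI value is a placeholder):

gh workflow run Restore -f NEW_DATABASE_URI='mongodb+srv://user:password@new-cluster.example.mongodb.net/mydb'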

Finally, we can see our backup in the S3 bucket.

The full code can be found in the accompanying repository. Of course, we can also run the backup and restore locally on our machine: clone the repo and export the environment variables ACCESS_KEY_ID, SECRET_ACCESS_KEY, URI, BACKUP_NAME, BUCKET, and NEW_DATABASE_URI.
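
For example (all values are placeholders):

export ACCESS_KEY_ID='AKIA...'     # AWS access key with access to the bucket
export SECRET_ACCESS_KEY='...'     # matching AWS secret key
export URI='mongodb+srv://user:password@cluster0.example.mongodb.net/mydb'
export BACKUP_NAME='backup.gz'
export BUCKET='my-backup-bucket'

node backup.js    # create a dump and upload it to S3
node restore.js   # download the dump and restore it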

Conclusion

It is always beneficial to have access to resilient data with complete integrity. Regular backups help ensure that your data stays safe and easy to recover.

Benefits

  • All backup versions in our S3 bucket remain accessible to us.
  • The same approach can be applied to other databases (e.g., PostgreSQL, MySQL).

Drawbacks

  • This method requires some setup; some managed database services provide automatic backups that are much easier to use.

As future work, we could implement a similar backup for a PostgreSQL database.

