[LINUX] NAS backup with php and rsync

This is a PHP script that performs mirroring and generation-management backups with rsync, written for the following environment: a NAS built with Xubuntu and Samba, with separate HDDs mounted for network sharing and for backup. I think it can be adapted even if your configuration differs slightly.

** Drive for NAS ** Directly under its root, create a directory /data/ for the network drive. Mount this drive at /home/nas/, and share /home/nas/data/ via Samba to use it as the NAS network drive.

** Backup drive ** Directly under its root, create /data/ for mirroring and /generation/ for generation-management backups. Mount this drive at /home/nas_backup/, so that /home/nas_backup/data/ becomes the mirroring destination of /home/nas/data/, and /home/nas_backup/generation/ becomes the generation backup destination of /home/nas_backup/data/.

This may be hard to picture from text alone, so here it is as a diagram: blockimage.jpg

Note that generation-management backups use rsync's --link-dest option, which relies on hard links, so the destination drive must use a file system that handles hard links properly. Otherwise, every generation backup will consume as much disk space and processing time as a full backup. In my case I use ext4.
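Before relying on --link-dest, hard-link support on the backup drive can be probed directly from PHP. This is a minimal sketch, assuming the system temp directory stands in for the backup drive's mount point:

```php
<?php
// Probe hard-link support: create a file, hard-link it, check the link count.
// sys_get_temp_dir() stands in for the backup drive mount point (an assumption).
$dir  = sys_get_temp_dir();
$file = tempnam($dir, 'hl_');
file_put_contents($file, 'test');

$linkName = $file . '.link';
$ok    = @link($file, $linkName);         // fails on filesystems without hard links (e.g. FAT)
$nlink = $ok ? stat($file)['nlink'] : 0;  // 2 when the hard link was created

print ($ok && $nlink === 2) ? "hard links supported\n" : "hard links NOT supported\n";

// Clean up
@unlink($linkName);
@unlink($file);
```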


script

mirroring.php This script takes the NAS shared directory as the backup source and mirrors it to the backup destination drive using rsync's --delete option.

mirroring.php


<?php
/**
 *rsync mirroring
 */

//Mirroring source directory
define('SOURCE_DIR', '/home/nas/data/');

//Mirroring destination directory
define('BACKUP_DIR', '/home/nas_backup/data/');

//Other rsync option examples: '--exclude=/temp/ --exclude=/*.bak';
define('OTHER_OPTIONS', '');

/**
 *
 */

set_time_limit(0);
date_default_timezone_set('Asia/Tokyo');

//Directory for saving temporary files
define('TEMP_DIR', (file_exists('/dev/shm/') ? '/dev/shm/.' : '/var/tmp/.'). md5(__DIR__));
if(!file_exists(TEMP_DIR)) {
    mkdir(TEMP_DIR);
    chmod(TEMP_DIR, 0700);
}

$tempFile = TEMP_DIR. '/mirroring.tmp';
$temps = getTmpFile($tempFile);

//Delimiter correction for each directory name
$sourceDir = preg_replace('|/+$|', '/', SOURCE_DIR. '/');
$backupDir = preg_replace('|/+$|', '/', BACKUP_DIR. '/');

//Exit if there is no backup source / backup destination
if(!file_exists($sourceDir) || strpos($backupDir, ':') === false && !file_exists($backupDir)) {
    print "The source '{$sourceDir}' or backup '{$backupDir}' destination directory does not exist.\n";
    exit;
}

//Check the backup source disk usage, and if there is no change from the previous time, finish without doing anything
//However, the block size may not change when renaming or updating a small size.
//If more than 1 hour has passed since the last mirroring, mirroring will be performed regardless of the change in block size.
exec("df {$sourceDir}", $ret);
$usedSize = (preg_split('/\s+/', $ret[1]))[2];
$prevUsedSize = isset($temps['prev_used_size']) ? (time() - filemtime($tempFile) < 3600 ? $temps['prev_used_size'] : 0) : 0;
if($usedSize == $prevUsedSize) exit;

//Lock file name
$lockFilename = TEMP_DIR. '/backup.lock';

//If the lock file exists, it is considered that the process with the same name is running and ends.
if(file_exists($lockFilename)) {
    print "A process with the same name is running.\n";
    exit;
} else {
    //Lock file creation
    if(!@file_put_contents($lockFilename, 'Process is running.')) {
        print "Could not create `$lockFilename`.\nSet the permissions of the directory `". TEMP_DIR. "` to 0700.\n";
        exit;
    }
    chmod($lockFilename, 0600);
}

//Information update saved in tmp file
//In the case of mirroring, the number of blocks used in the backup source
$temps['prev_used_size'] = $usedSize;
setTmpFile($tempFile, $temps);

$updateDirList = getUpdataDirList($sourceDir);
if(!$updateDirList) {
    $updateDirList[] = $sourceDir;
}

foreach($updateDirList as $dir) {
    $path = str_replace($sourceDir, '', $dir);
    //rsync command
    $command = implode(" ", [
            'rsync -avH',
            '--delete',
            OTHER_OPTIONS,
            '"'. preg_replace('|/+$|', '/', ($sourceDir. $path. '/')). '"',
            '"'. preg_replace('|/+$|', '/', ($backupDir. $path. '/')). '"',
        ]);
    print "$command\n";
    exec($command);
}

//Lock file deletion
unlink($lockFilename);

exit;

/**
 *
 */

//Get tmp file
function getTmpFile($fn) {
    if(file_exists($fn)) {
        $tmp = file_get_contents($fn);
        return(json_decode($tmp, true));
    }
    return [];
}

//tmp file save
function setTmpFile($fn, $temps) {
    if(json_encode(getTmpFile($fn)) != json_encode($temps)) {
        if(!@file_put_contents($fn, json_encode($temps))) {
            print "Could not create `$fn`.\nSet the permissions of the directory `". TEMP_DIR. "` to 0700.\n";
            exit;
        }
        chmod($fn, 0600);
    }
}

//Get update directory
function getUpdataDirList($sourceDir) {
    $duFile = TEMP_DIR. '/prev_du.txt';
    $prevDirList = duToArray($duFile);

    exec("du {$sourceDir} > {$duFile}");
    chmod($duFile, 0600);
    $dirList = duToArray($duFile);

    $tmpArr = [];
    foreach($dirList as $k => $v) {
        if(isset($prevDirList[$k]) && $prevDirList[$k] != $v) $tmpArr[$k] = $v;
    }
    unset($prevDirList, $dirList);

    $retArr = $tmpArr;
    foreach($tmpArr as $k => $v) {
        foreach($tmpArr as $k_ => $v_) {
            if($k == $k_) continue;
            if(isset($retArr[$k]) && strpos($k_, $k) === 0) unset($retArr[$k]);
        }
    }
    return array_keys($retArr);
}

//Convert the result of the du command to an array
function duToArray($duFile) {
    $retArr = [];
    if(file_exists($duFile)) {
        if($fp = @fopen($duFile, 'r')) {
            while(($l = fgets($fp)) !== false) {
                $l = trim($l);
                if(!$l) continue;
                $l = explode("\t", $l);
                $retArr[$l[1]] = $l[0];
            }
            fclose($fp);
        }
    }
    return $retArr;
}

If the backup source's disk usage has not changed since the last run, rsync is skipped and the script exits, so the load should stay low even when the script runs frequently; still, adjust the schedule to suit your environment. The usage check uses the df command, which cannot detect updates that do not change the block count, such as file renames or small content changes. For that reason, if more than one hour has passed since the last run, rsync is executed even when the disk usage appears unchanged.
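The used-block count is taken from the second line of df output with a whitespace split, the same way the script does it. A minimal sketch against a canned df line (the sample values are made up):

```php
<?php
// Parse the "Used" column (index 2) from df output, as mirroring.php does.
// $ret stands in for the output array of exec("df {$sourceDir}", $ret).
$ret = [
    'Filesystem     1K-blocks     Used Available Use% Mounted on',
    '/dev/sdb1      960302096 51846276 859605912   6% /home/nas',
];
$usedSize = (preg_split('/\s+/', $ret[1]))[2];
print "$usedSize\n"; // "51846276"
```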

** Main setting items **

//Mirroring source directory
define('SOURCE_DIR', '/home/nas/data/');

Specify the directory to be the mirroring source.

//Mirroring destination directory
define('BACKUP_DIR', '/home/nas_backup/data/');

Specify the directory to mirror to. You can also specify a remote destination by prefixing it with "username@hostname:".

define('BACKUP_DIR', 'username@hostname:/home/username/data/');

When a remote destination is specified, passwordless [public key authentication login](https://akebi.jp/temp/ssh-keygen.html) must be set up properly so that no password prompt blocks the remote login during automatic execution with cron.
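For reference, the script decides whether the destination is remote by looking for a ":" in the path (the same test used before its file_exists() check), so the local existence check is skipped for remote destinations. A minimal sketch, with a helper name of my own:

```php
<?php
// A destination containing ':' is treated as remote (rsync's user@host:path
// form), so the local file_exists() check is skipped for it.
function isRemote($backupDir) {
    return strpos($backupDir, ':') !== false;
}

var_dump(isRemote('/home/nas_backup/data/'));         // bool(false)
var_dump(isRemote('user@host:/home/user/data/'));     // bool(true)
```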


generation.php A script that creates generation-management backup directories from the mirrored directory using rsync's --link-dest option. If the backup drive is on a remote machine rather than on the NAS itself, install this script on the remote side as well.

generation.php


<?php
/**
 *rsync generation backup
 */

//Backup source directory
define('SOURCE_DIR', '/home/nas_backup/data/');

//Backup destination directory
define('BACKUP_DIR', '/home/nas_backup/generation/');

//Other rsync option examples: '--exclude=/temp/ --exclude=/*.bak';
define('OTHER_OPTIONS', '');

//Number of backup generations
define('BACKUP_GENERATION', 200);

//Disk space threshold to delete old backups(%)
//If it is 0, the disk capacity is not checked.
define('THRESHOLD', 95);

/**
 *
 */

set_time_limit(0);
date_default_timezone_set('Asia/Tokyo');

//Directory for saving temporary files
define('TEMP_DIR', (file_exists('/dev/shm/') ? '/dev/shm/.' : '/var/tmp/.'). md5(__DIR__));
if(!file_exists(TEMP_DIR)) {
    mkdir(TEMP_DIR);
    chmod(TEMP_DIR, 0700);
}

//Delimiter correction for each directory name
$sourceDir = preg_replace('|/+$|', '/', SOURCE_DIR. '/');
$backupDir = preg_replace('|/+$|', '/', BACKUP_DIR. '/');

//Exit if there is no backup source / backup destination
if(!file_exists($sourceDir) || !file_exists($backupDir)) {
    print "The source '{$sourceDir}' or backup '{$backupDir}' destination directory does not exist.\n";
    exit;
}

$nowDate = date('Y-m-d_Hi');

//Lock file name
$lockFilename = TEMP_DIR. '/backup.lock';

//If the lock file exists, it is considered that the process with the same name is running and waits for up to 2 minutes. If it is not released during that time, it ends.
$time = time();
while(file_exists($lockFilename)) {
    sleep(1);
    if($time + 120 < time()) {
        print "A process with the same name is running.\n";
        exit;
    }
}
//Lock file creation
if(!@file_put_contents($lockFilename, 'Process is running.')) {
    print "Could not create `$lockFilename`.\nSet the permissions of the directory `". TEMP_DIR. "` to 0700.\n";
    exit;
}
chmod($lockFilename, 0600);

//Get backed up directory name
$backupList = getBackupList($backupDir);

//Thin out old backups
$processed = [];
foreach($backupList as $backupName) {
    if(!preg_match('/^(\d{4})-(\d\d)-(\d\d)_(\d\d)(\d\d)/', $backupName, $m) || isset($processed[$backupName])) continue;
    list($year, $month, $day, $hour, $minute) = array_slice($m, 1);
    $fDate = "$year-$month-$day $hour:$minute";

    //If more than one month has passed, delete the ones other than the last one of the month
    if(time() >= strtotime("$fDate +1 month")) {
        $pickup = [];
        foreach($backupList as $tmp) {
            if(substr($tmp, 0, 7) == "{$year}-{$month}" && substr($tmp, 0, 10) <= "{$year}-{$month}-{$day}") $pickup[] = $tmp;
        }
        rsort($pickup);
        foreach(array_slice($pickup, 1) as $tmp) {
            deleteBackup($backupDir, $tmp, $processed);
        }
    }
    //If more than one day has passed, delete the ones other than the last one of the day
    elseif(time() >= strtotime("$fDate +1 day")) {
        $pickup = [];
        foreach($backupList as $tmp) {
            if(substr($tmp, 0, 10) == "{$year}-{$month}-{$day}" && $tmp <= $backupName) $pickup[] = $tmp;
        }
        rsort($pickup);
        foreach(array_slice($pickup, 1) as $tmp) {
            deleteBackup($backupDir, $tmp, $processed);
        }
    }
}
//Reacquire the backed up directory name
$backupList = getBackupList($backupDir);

//Delete old backups until disk usage drops below the specified percentage
sort($backupList);
while(THRESHOLD && checkPercentage($backupDir) && count($backupList) > 1) {
    $command = "rm -rf {$backupDir}{$backupList[0]}";
    array_shift($backupList);
    print "$command\n";
    exec($command);
}

//If you have an existing generation backup
if(count($backupList)) {
    rsort($backupList);
    //Delete backups that exceed the number of saved generations from the oldest
    if(count($backupList) >= BACKUP_GENERATION) {
        $delNames = array_slice($backupList, BACKUP_GENERATION -1);
        foreach($delNames as $del) {
            $command = "rm -rf {$backupDir}{$del}";
            print "$command\n";
            exec($command);
        }
    }
}

//New backup directory name
$backupName = "{$nowDate}/";

//rsync command
$command = implode(" ", [
        "rsync -avH",
        OTHER_OPTIONS,
        "--link-dest={$sourceDir}",
        $sourceDir,
        sprintf("%s%s", $backupDir, $backupName),
    ]);
print "$command\n";
exec($command);

//Reacquire the backed up directory name
$backupList = getBackupList($backupDir);
//Get only the log by the difference from the backup one generation ago
if(count($backupList) > 1) {
    rsort($backupList);
    $command = "rsync -avHn --delete --exclude=/_rsync.log {$backupDir}{$backupList[0]}/ {$backupDir}{$backupList[1]}/ > {$backupDir}_rsync.log";
    exec($command);
    exec("mv {$backupDir}_rsync.log {$backupDir}{$backupList[0]}");
}

//Lock file deletion
unlink($lockFilename);

exit;

/**
 *
 */

//Get existing backup directory name
function getBackupList($backupDir) {
    $backupList = [];
    if($dir = opendir($backupDir)) {
        while($fn = readdir($dir)) {
            if(preg_match('/^\w{4}-\w{2}-\w{2}_\w{4,6}$/', $fn) && is_dir("{$backupDir}{$fn}")) {
                $backupList[] = $fn;
            }
        }
        closedir($dir);
    }
    return $backupList;
}

//Delete backup
function deleteBackup($backupDir, $str, &$processed) {
    if(isset($processed[$str])) return;
    if(file_exists("{$backupDir}{$str}")) {
        $command = "rm -rf {$backupDir}{$str}";
        print"$command\n";
        exec($command);
        $processed[$str] = 1;
    }
}

//Disk usage check
function checkPercentage($backupDir) {
    exec("df {$backupDir}", $ret);
    if(!isset($ret[1])) return false;
    if(preg_match('/(\d+)\%/', $ret[1], $ret)) {
        if($ret[1] >= THRESHOLD) return true;
    }
    return false;
}

A directory named after the execution date and time is created, and a backup as of that moment is kept in it. Thanks to rsync's --link-dest option, only newly added or changed files are stored as actual data; everything else is added as a hard link, so disk consumption and processing time are comparable to an incremental backup, yet each backup created is equivalent to a full backup. This script also handles pruning: backups older than one day are deleted except for the last one of each day, backups older than one month are deleted except for the last one of each month, and old backups are deleted, oldest first, until disk usage falls below the percentage specified by THRESHOLD.
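The thinning decision is driven by strtotime() arithmetic on the date embedded in each directory name. A minimal sketch of the same check with a fixed "now" so the result is deterministic (the helper name is mine):

```php
<?php
// Decide whether a backup directory name (YYYY-MM-DD_HHMM) is older than a
// given interval, as generation.php does with strtotime("$fDate +1 month").
function isOlderThan($backupName, $interval, $now) {
    if(!preg_match('/^(\d{4})-(\d\d)-(\d\d)_(\d\d)(\d\d)/', $backupName, $m)) return false;
    list($year, $month, $day, $hour, $minute) = array_slice($m, 1);
    $fDate = "$year-$month-$day $hour:$minute";
    return $now >= strtotime("$fDate +$interval");
}

$now = strtotime('2024-03-15 12:00');
var_dump(isOlderThan('2024-01-10_0600', '1 month', $now)); // bool(true):  thin to the month's last backup
var_dump(isOlderThan('2024-03-14_0600', '1 day',   $now)); // bool(true):  thin to the day's last backup
var_dump(isOlderThan('2024-03-15_0600', '1 day',   $now)); // bool(false): kept as-is
```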

To make effective use of hard links, --link-dest is usually pointed at the backup one generation back, but in this case $sourceDir itself is already a mirrored copy of the data, so it is the directory passed to --link-dest. This saves space and reduces processing time at the same time.

** Main setting items **

//Backup source directory
define('SOURCE_DIR', '/home/nas_backup/data/');

Specify the mirroring destination directory in mirroring.php.

//Backup destination directory
define('BACKUP_DIR', '/home/nas_backup/generation/');

Specify where generation-management backups are saved. Under this directory, a subdirectory is created in the YYYY-MM-DD_HHMM format for each run, and each generation's backup is stored in it. Because of rsync's --link-dest option, unchanged files are stored as hard links rather than copies, so no more disk space is consumed than necessary.

//Number of backup generations
define('BACKUP_GENERATION', 200);

Specify the number of generations to keep. When the number of generation backups exceeds this value, the oldest are deleted; however, because of the thinning process and the disk-capacity-based deletion, backups may be removed before this count is reached.
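The count-based pruning sorts the existing directories newest-first and deletes everything from index BACKUP_GENERATION − 1 onward, leaving room for the backup about to be created. A minimal sketch with a cap of 3 and made-up directory names:

```php
<?php
// Prune by generation count: sort newest-first, then everything from index
// (cap - 1) onward is deleted, leaving room for the backup about to be made.
$cap = 3; // stands in for BACKUP_GENERATION
$backupList = ['2024-03-01_0000', '2024-03-02_0000', '2024-03-03_0000', '2024-03-04_0000'];
rsort($backupList);
$delNames = count($backupList) >= $cap ? array_slice($backupList, $cap - 1) : [];
print_r($delNames); // the two oldest: 2024-03-02_0000 and 2024-03-01_0000
```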

//Disk space threshold to delete old backups(%)
//If it is 0, the disk capacity is not checked.
define('THRESHOLD', 95);

The script checks the backup destination's disk usage (%) with the df command; when it reaches this value, the oldest backups are deleted in order until usage falls below it. If set to 0, this deletion is skipped, but note that nothing else suppresses rsync, so it will still run even when the destination is short on free space.


crontab configuration example

# rsync mirroring
* * * * * php /Script installation path/mirroring.php &> /dev/null
* * * * * sleep 30; php /Script installation path/mirroring.php &> /dev/null

# rsync generation backup
0 */6 * * * php /Script installation path/generation.php &> /dev/null

In the example above, the first block runs mirroring every 30 seconds, and the second block runs a generation-management backup every 6 hours.
