[LINUX] So long, "dd | gzip"! 20x faster just by switching to "dd | pigz"

If you're a Unix user, you've probably reached for the dd command for all sorts of small backup-and-restore jobs:

This time it's a backup from a 500 GB NVMe SSD to a SATA HDD (backing up the pre-installed Windows before turning the machine into a GPU box).

sudo dd if=/dev/nvme0n1 of=/somedrive/bkp.image

# Interrupted partway through to check the speed
347762688 bytes (348 MB, 332 MiB) copied, 21.6687 s, 16.0 MB/s

You know it would help to add a couple of options:

・ bs=16MB: reading in bigger chunks is faster, up to a point
・ iflag=nocache: avoids filling the page cache

sudo dd if=/dev/nvme0n1 of=/somedrive/bkp.image \
  bs=16MB \
  iflag=nocache

# Interrupted partway through to check the speed
434501120 bytes (435 MB, 414 MiB) copied, 24.5498 s, 17.7 MB/s
# With an SSD, the speed barely changes, surprisingly. Back in the day, tuning the bs size really mattered.
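If you want to see the block-size effect for yourself without touching a real disk, a rough sketch like the following works (reading /dev/zero into /dev/null, so the numbers reflect only per-call overhead, not real device speed; the 256 MiB volume is an arbitrary choice):

```shell
# Compare dd throughput at different block sizes.
# Tiny blocks mean many more read()/write() syscalls for the same data.
for bs in 512 4K 16M; do
  printf 'bs=%s: ' "$bs"
  # copy ~256 MiB at each block size; dd reports the speed on stderr
  dd if=/dev/zero of=/dev/null bs="$bs" count=$((268435456 / $(numfmt --from=iec "$bs"))) 2>&1 | tail -n 1
done
```

On most machines the 512-byte run is dramatically slower than the rest, while 4K vs 16M differ much less, which matches the "faster up to a point" observation above.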

Since this is Unix, you know that piping through gzip will compress the image and save disk space.

sudo dd if=/dev/nvme0n1 \
  bs=16MB \
  iflag=nocache \
  oflag=nocache,dsync \
  | pv | gzip > /somedrive/bkp.image.gz

  # pv is a handy utility that shows the pipe's progress

#output of pv
810MiB 0:00:17 [21.6MiB/s]

It's a little faster. Since disk I/O costs more than CPU here, shrinking the data before it hits the disk lets the pipeline push more of it through.
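A back-of-envelope way to see why (all numbers hypothetical): if the HDD can only sink around 100 MB/s but the data compresses 5:1, the pipe can drain roughly five times that from the source, until the compressor itself becomes the bottleneck:

```shell
# Hypothetical numbers: HDD write ceiling and compression ratio
hdd_mbps=100
ratio=5
# effective read rate from the source while the HDD is the bottleneck
echo "effective read rate: $((hdd_mbps * ratio)) MB/s"
```

In practice single-threaded gzip caps out far below that ceiling (the ~21 MiB/s above), which is exactly the motivation for parallelizing the compression step.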


After backing up this way for more than 10 years, it suddenly hit me: "Many-core CPUs are the norm now, so surely compression could be parallelized and run faster?"

There was!!! pigz = "Parallel Implementation of GZip"

❯ pigz --help
Usage: pigz [options] [files ...]
  will compress files in place, adding the suffix '.gz'. If no files are
  specified, stdin will be compressed to stdout. pigz does what gzip does,
  but spreads the work over multiple processors and cores when compressing.

Wonderful. Whenever an ordinary person like me thinks something up, a smart person has already built it. You can use it as a drop-in replacement for the gzip command.
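Since pigz writes standard gzip streams, a quick sanity check (assuming pigz is installed) confirms the two are interchangeable in either direction:

```shell
# pigz output is plain gzip, so either tool can decompress the other's stream
echo "hello" | pigz | gzip -dc    # prints: hello
echo "hello" | gzip | pigz -dc   # prints: hello
```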

Let's try it:

sudo dd if=/dev/nvme0n1 \
  bs=16MB \
  iflag=nocache \
  oflag=nocache,dsync \
  | pv | pigz > /somedrive/bkp.image.gz  # only this line changed: gzip -> pigz

[  99168000000 bytes (99 GB, 92 GiB) copied, 265 s, 374 MB/s92.7GiB 0:04:26  [ 359MiB/s] 
[  99552000000 bytes (100 GB, 93 GiB) copied, 266 s, 374 MB/s93.1GiB 0:04:27 [ 381MiB/s] 
[  99952000000 bytes (100 GB, 93 GiB) copied, 267 s, 374 MB/s93.4GiB 0:04:28 [ 335MiB/s] 

21.6 MB/s -> 374 MB/s: 17.3 times faster!!!

[  498288000000 bytes (498 GB, 464 GiB) copied, 1271 s, 392 MB/s 464GiB 0:21:12 [ 428MiB/s] 
[  498736000000 bytes (499 GB, 464 GiB) copied, 1272 s, 392 MB/s 464GiB 0:21:13 [ 430MiB/s] 
[  499200000000 bytes (499 GB, 465 GiB) copied, 1273 s, 392 MB/s 465GiB 0:21:14 [ 456MiB/s]

It speeds up even more in the second half of the run, presumably because unused sectors filled with zeros compress extremely well. That's 21x!
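You can see why the zero-filled tail flies by: a long run of zeros compresses almost to nothing, so those sectors cost nearly zero disk writes.

```shell
# 100 MiB of zeros shrinks to roughly 100 KB (near deflate's ~1000:1 ceiling)
dd if=/dev/zero bs=1M count=100 2>/dev/null | gzip | wc -c
```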

pigz, pigz, pigz (written three times so I don't forget it)

It's laughably easy for such an explosive speedup, so

I think it's worth remembering!

-rwxrwxrwx 1 root root 29G Dec  6 12:28  bkp.image.gz 
The raw 500 GB SSD compressed down to a mighty 29 GB. Thank you, pigz!
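For completeness, restoring goes the other way (hypothetical paths from this post; triple-check of= before running, since dd will happily overwrite the wrong disk):

```shell
# Decompress the image and write it back to the device.
# pigz -dc decompresses to stdout (decompression is mostly single-threaded,
# but it still works fine as a gzip replacement here).
pigz -dc /somedrive/bkp.image.gz | sudo dd of=/dev/nvme0n1 bs=16M
```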

# One-liner to swap gzip for pigz in existing scripts (keep a backup; -i edits the file in place)
sed -i 's/gzip/pigz/g' somescript.sh

# It was included by default on Ubuntu 18
# On macOS:
brew install pigz

# Bonus 2: the machine used this time
❯ ls /dev/disk/by-id
nvme-Samsung_SSD_960_EVO_500GB_S3X4NB0K142330K -> ../../nvme0n1
ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6ZX6483 -> ../../sda

❯ sudo cat /proc/cpuinfo | grep "model name" | uniq
model name	: Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
# A 10-core, 20-thread CPU (so maybe that explains the "roughly 20x")