gzip

The data compression program

Concept Index

  • concatenated files
  • Environment
  • overview
  • tapes
  • bzip2

    a block-sorting file compressor

    bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ]
    bunzip2 [ -fkvsVL ] [ filenames ]
    bzcat [ -s ] [ filenames ] # - decompresses files to stdout
    bzip2recover filename # - recovers data from damaged bzip2 files

    More at wikipedia. See the link at the bottom too.

    The command-line options are similar to those of GNU gzip.

    Files named on the command line (or expanded by globbing) are each replaced by a compressed version with the suffix .bz2 appended to the name.
    Compressed files retain ownership, permissions, and modification date (access and change dates are not preserved).

    Existing output files are not overwritten; specify --force to overwrite them.

    If no file names are specified, bzip2 reads from standard input and writes to standard output (useful for piping elsewhere).

    bunzip2 (or bzip2 -d) decompresses the specified files; files that were not created by bzip2 are skipped with a warning.
    The name of the decompressed file is derived from that of the compressed file as follows:

    filename.bz2 → filename
    filename.bz → filename
    filename.tbz2 → filename.tar
    filename.tbz → filename.tar
    anyothername → anyothername.out

    If the file does not end in a recognised ending, .bz2, .bz, .tbz2 or .tbz, bzip2 warns that it cannot determine the name of the original file, and uses the original name with .out appended.
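
    The naming rules above can be sketched as a small shell function. This is only an illustration: bz_outname is a hypothetical helper, not part of bzip2, and it predicts the name without touching any file.

```shell
# Hypothetical helper: predict the output name that decompression
# would choose, following the suffix table above.
bz_outname() {
    case "$1" in
        *.bz2)  printf '%s\n' "${1%.bz2}" ;;
        *.bz)   printf '%s\n' "${1%.bz}" ;;
        *.tbz2) printf '%s\n' "${1%.tbz2}.tar" ;;
        *.tbz)  printf '%s\n' "${1%.tbz}.tar" ;;
        *)      printf '%s\n' "$1.out" ;;   # unrecognised ending
    esac
}

# Example names are made up for the demonstration.
bz_outname backup.tbz2      # backup.tar
bz_outname notes.txt.bz2    # notes.txt
bz_outname mystery          # mystery.out
```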

    Decompressing the concatenation of two or more compressed files produces the concatenation of the corresponding uncompressed files.

    Integrity testing (-t) of concatenated compressed files is supported.

    Files are written to the standard output when -c is given.
    Multiple files may be compressed and decompressed this way.
    The resulting outputs are fed sequentially to stdout. Compressing multiple files in this manner generates a stream containing multiple compressed file representations.

    bzcat (or bzip2 -dc) decompresses to the standard output.

    bzip2 reads arguments from $BZIP2 and $BZIP, in that order, and will process them before any arguments read from the command line. This gives a convenient way to supply default arguments.

    Return values:
    0 normal exit,
    1 environmental problems (file not found, invalid flags, I/O errors),
    2 to indicate a corrupt compressed file,
    3 internal consistency error (e.g., a bug)
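
    In a script, the status can be turned into a readable message. A minimal sketch, assuming the return values listed above; bzip2_status_text is a hypothetical helper name and archive.bz2 is a placeholder file name.

```shell
# Hypothetical helper: map a bzip2 exit status to a short description.
bzip2_status_text() {
    case "$1" in
        0) echo "normal exit" ;;
        1) echo "environmental problem (file not found, invalid flags, I/O error)" ;;
        2) echo "corrupt compressed file" ;;
        3) echo "internal consistency error" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Typical use (archive.bz2 is a placeholder):
#   bzip2 -t archive.bz2; bzip2_status_text $?
bzip2_status_text 2
```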

        zip warning: missing end signature--probably not a zip file (did you
        zip warning: remember to use binary mode when you transferred it?)
        zip warning: (if you are trying to read a damaged archive try -F)
    
    zip error: Zip file structure invalid (auth.log.0.bz2) 

    -d
    --decompress
    -z
    --compress
    -t
    --test
    Test integrity of the files without writing output
    -c
    --stdout
    write to standard output. Useful for piping.
    -f
    --force
    1. Overwrite existing output files.
    2. Break hard links to files.
    3. When decompressing, pass files without the correct magic header bytes through unmodified.
    -k
    --keep
    Keep input files
    -s
    --small
    Reduce memory usage (at the expense of creating larger output files)
    -q
    --quiet
    Suppress warnings.
    I/O errors and critical events will not be suppressed.
    -v
    --verbose
    Show the compression ratio on stderr. Multiple -v's increase the verbosity.
    -L
    --license
    -V
    --version
    -1 (or --fast) to -9 (or --best)
    Set the block size to 100 k, 200 k .. 900 k when compressing.
    Block sizes below the default of 900 k are only useful in very small memory environments.
    --fast and --best are aliases for GNU gzip compatibility.
    --fast doesn't make things significantly faster; --best merely selects the default behaviour.
    -- treats all subsequent arguments as file names, even if they start with a dash,
    example: bzip2 -- -myfilename

    Recovering data from damaged files:

  • bzip2 compresses files in blocks, which are handled independently. If an error causes a file to become damaged, it may be possible to recover data from the undamaged blocks in the file.

    bzip2, bunzip2 and bzcat are the same program, and the decision about what actions to take is done on the basis of which name is used.

    Author Julian Seward, jsewardbzip.org. bzip.org:

    --list, -l

    gzip -l *gz
              compressed        uncompressed  ratio uncompressed_name
                     20                   0   0.0% smother.diske-
                     20                   0   0.0% smother.diskf-
                     20                   0   0.0% smother.diskg-
                     20                   0   0.0% smother.diskh-
              798830592          3596346423  77.8% smother_wd0e
              798830672          3596346423  77.8% (totals) 
    with --verbose
     gzip -lv *gz
    method  crc     date  time           compressed        uncompressed  ratio uncompressed_name
    defla 00000000 Sep  1 15:00                  20                   0   0.0% smother.diske-
    defla 00000000 Sep  1 15:01                  20                   0   0.0% smother.diskf-
    defla 00000000 Sep  1 15:01                  20                   0   0.0% smother.diskg-
    defla 00000000 Sep  1 15:01                  20                   0   0.0% smother.diskh-
    defla dbd673f2 Sep  1 16:09           798830592          3596346423  77.8% smother_wd0e
                                          798830672          3596346423  77.8% (totals) 
    The uncompressed size is given as -1 for files not in gzip format, such as compressed .Z files.
    To get the uncompressed size for such a file, use: zcat file.Z | wc -c
    The crc is given as ffffffff for a file not in gzip format.
    Title and totals lines are not displayed with --quiet.

    Errors

     bzip2: xxx.zip: bad magic number (file not created by bzip2)
    
    You can use the `bzip2recover' program to attempt to recover
    data from undamaged sections of corrupted files.
    > echo $?
    2
    > file  3.31.1_H+2.zip  
    xxx.zip: Zip archive data, at least v2.0 to extract, compression method=store
    Or try gzip.
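
    The error above came from handing a zip archive to bzip2. Before picking a tool, the leading magic bytes can be checked. This is a minimal sketch: sniff_format is a hypothetical helper, it only knows three signatures, and the file command remains the more thorough check.

```shell
# Hypothetical helper: guess the compression format from the first bytes.
sniff_format() {
    # First 4 bytes as lowercase hex with separators removed.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \t\n')
    case "$magic" in
        1f8b*)    echo gzip ;;     # 0x1f 0x8b
        425a68*)  echo bzip2 ;;    # "BZh"
        504b0304) echo zip ;;      # "PK" 0x03 0x04
        *)        echo unknown ;;
    esac
}

# Demonstration on scratch files that contain only the signatures.
sniff=$(mktemp -d)
printf '\037\213\010\000' > "$sniff/a"
printf 'BZh9'             > "$sniff/b"
printf 'PK\003\004'       > "$sniff/c"
for f in "$sniff/a" "$sniff/b" "$sniff/c"; do
    sniff_format "$f"
done
```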


    gzip

    --decompress
    --uncompress
    -d
    -r
    --recursive
    Travel the directory structure recursively.
    -S suf
    --suffix suf
    Use suffix suf instead of .gz.
    A null suffix forces gunzip to try decompression on all given files regardless of suffix, as in:
     gunzip -S "" *        (*.* for MSDOS) 
    -t
    --test
    Test the integrity of the compressed file.
    --quiet
    -q
    Suppress warning messages.
    -v
    --verbose
    Display the name and percentage reduction for each file on stderr.
    --no-name
    -n
    When compressing, do not save the original name and time stamp.
    (The original name is always saved if the name had to be truncated.)

    When decompressing, do not restore the original name (remove only the gzip suffix from the compressed file name)
    and do not restore the original time stamp (copy it from the compressed file).
    This is the default when decompressing.

    --name
    -N
    When compressing, save the original name and time stamp (default).
    When decompressing, restore the original name and time stamp.
    --stdout
    --to-stdout
    -c
    Write output on standard output. Do not change the input file.
    With several input files, the output consists of a sequence of independently compressed members.
    For best compression, concatenate input files before compressing them.
    --fast
    --best
    -n (n is a digit from 1 to 9)
    Specify the speed/compression tradeoff:
    --fast or -1 fastest / less compression
    -6 biased towards high compression at the expense of speed (default)
    --best or -9 slowest / more compression
    --force
    -f
    Force compression or decompression even if
    the file has multiple links,
    the corresponding file already exists, or
    the compressed data is read from or written to a terminal.

    If the input is not in a format recognized by gzip and
    --stdout is also given, copy the input without change to the standard output: let
    zcat behave as cat.

    If --force is not given, and gzip is not running in the background, prompt to verify whether an existing file should be overwritten.
    --help
    -h
    --version
    -V
    --license
    -L
    Display the gzip license then quit.

    Find all gzip files in the current directory and subdirectories, and extract them in place without destroying the original:

            find . -name '*.gz' -print | sed 's/^\(.*\)[.]gz$/gunzip < "&" > "\1"/' | sh 
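
    The pipeline above builds shell commands as text, so file names containing quotes or other shell metacharacters can break it. A minimal sketch of an alternative that passes each name as an argument instead; gunzip_tree is a hypothetical helper, and the scratch directory and file names are made up for the demonstration.

```shell
# Hypothetical helper: extract every *.gz below directory $1 in place,
# keeping the compressed originals.
gunzip_tree() {
    find "$1" -type f -name '*.gz' -exec sh -c '
        for f do
            gunzip < "$f" > "${f%.gz}"
        done
    ' sh {} +
}

# Demonstration in a scratch directory with awkward file names.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
echo "top level" > "$demo/a.txt"
echo "nested"    > "$demo/sub dir/it's b.txt"
gzip "$demo/a.txt" "$demo/sub dir/it's b.txt"

gunzip_tree "$demo"
ls -R "$demo"
# rm -rf "$demo"    # clean up when done
```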

    Advanced usage

    Multiple compressed files can be concatenated; gunzip then extracts all members at once.
    If one member is damaged, other members might still be recovered.

    Better compression is usually obtained if all members are decompressed and then recompressed in a single step.

    Concatenating gzip files:

         gzip --to-stdout file1  > foo.gz
         gzip --to-stdout file2 >> foo.gz 
    gunzip --to-stdout foo is equivalent to cat file1 file2

    To recompress concatenated files to get better compression: zcat old.gz | gzip > new.gz
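
    The whole sequence can be tried end to end in a scratch directory. This is only a sketch: the file contents are made up, and gzip -c and gzip -dc are used as the short forms of gzip --to-stdout and gunzip --to-stdout.

```shell
work=$(mktemp -d)
printf 'alpha\nalpha\nalpha\n' > "$work/file1"
printf 'beta\nbeta\nbeta\n'    > "$work/file2"
cat "$work/file1" "$work/file2" > "$work/expected"

# Build a two-member file by appending independently compressed streams.
gzip -c "$work/file1"  > "$work/foo.gz"
gzip -c "$work/file2" >> "$work/foo.gz"

# Decompressing gives both inputs back to back.
gzip -dc "$work/foo.gz" | cmp - "$work/expected" && echo "members match"

# Recompress everything as a single member and compare the sizes.
gzip -dc "$work/foo.gz" | gzip > "$work/new.gz"
wc -c "$work/foo.gz" "$work/new.gz"
```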

    If a compressed file consists of several members, the uncompressed size and CRC reported by --list apply to the last member only.
    To display the uncompressed size for all members, use:

         zcat file.gz | wc -c 
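
    Wrapped as a function, the same idea looks like this. A minimal sketch: gz_total_size is a hypothetical helper, and gzip -dc stands in for zcat so the function does not depend on how zcat treats suffixes.

```shell
# Hypothetical helper: total uncompressed size of all members, in bytes.
gz_total_size() {
    gzip -dc "$1" | wc -c | tr -d ' '
}

# Two members of 5 and 7 bytes; the total across both is 12.
members=$(mktemp -d)
printf '12345'   | gzip -c  > "$members/multi.gz"
printf '1234567' | gzip -c >> "$members/multi.gz"
gz_total_size "$members/multi.gz"
```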

    To create a single archive with multiple members, use an archiver such as tar or zip.
    GNU tar supports -z to invoke gzip.

    gzip is a complement to tar, not a replacement.

    Environment

    $GZIP holds default options; they are interpreted first and can be overridden by explicit command-line parameters.
    For example:
    for sh:    GZIP="-8v --name"; export GZIP
    for csh:   setenv GZIP "-8v --name"
    for MSDOS: set GZIP=-8v --name 

    Using gzip on tapes

    When writing compressed data to a tape, pad the output with zeroes to a block boundary.
    When the data is read and the whole block is passed to gunzip for decompression, gunzip detects that there is extra trailing garbage after the compressed data and emits a warning by default; use --quiet to suppress the warning.
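
    The padding can be tried without a tape drive. In this sketch a regular file stands in for the tape, the 512-byte block size is an assumption, and the exit status of gunzip is deliberately ignored because implementations differ on whether trailing padding counts as a warning.

```shell
tape=$(mktemp -d)
printf 'payload line\n' > "$tape/data"
gzip -c "$tape/data" > "$tape/data.gz"

# Pad with zero bytes up to the next multiple of the block size.
block=512
size=$(wc -c < "$tape/data.gz")
pad=$(( (block - size % block) % block ))
dd if=/dev/zero bs=1 count="$pad" 2>/dev/null >> "$tape/data.gz"

# Read the whole padded block back; -q (--quiet) hides the warning.
gunzip -q -c < "$tape/data.gz" > "$tape/restored" || true
cmp "$tape/data" "$tape/restored" && echo "payload intact"
```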

    Overview

    A digression: Taken from bzip2.txt 9/11/10 v1.06

    The original documentation contains substantial discussions related to legacy versions running on VM, MSDOS... which had file systems with severe limitations, including the length of file names.

    gzip reduces the size of files using Lempel-Ziv coding (LZ77), replacing each file by one with the extension .gz.
    If no files are specified, or if a name is -, standard input is compressed to the standard output.
    Only regular files are compressed (symbolic links are ignored).

    gzip saves the original name and time stamp in the compressed file and uses them when decompressing with -N.

    gunzip takes a list of files on the command line and replaces each file whose name ends with .gz with an uncompressed file without the .gz extension.
    zcat is identical to gunzip -c; it decompresses files whether they have a .gz suffix or not.
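
    A short round trip shows the replace-in-place behaviour described above. This is only a sketch in a scratch directory; note.txt is a made-up file name.

```shell
rt=$(mktemp -d)
printf 'hello, gzip\n' > "$rt/note.txt"

gzip "$rt/note.txt"            # note.txt is replaced by note.txt.gz
ls "$rt"

gunzip -c "$rt/note.txt.gz"    # print the contents, leave the .gz file alone
gunzip "$rt/note.txt.gz"       # note.txt.gz is replaced by note.txt
ls "$rt"
```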

    Apple gzip 272.250.1
    usage: gzip [-123456789acdfhklLNnqrtVv] [-S .suffix] [<file> [<file> ...]]
     -1 --fast            fastest (worst) compression
     -2 .. -8             set compression level
     -9 --best            best (slowest) compression
     -c --stdout          write to stdout, keep original files
        --to-stdout
     -d --decompress      uncompress files
        --uncompress
     -f --force           force overwriting & compress links
     -k --keep            don't delete input files during operation
     -l --list            list compressed file contents
     -N --name            save or restore original file name and time stamp
     -n --no-name         don't save original file name or time stamp
     -r --recursive       recursively compress files in directories
    
     -q --quiet           output no warnings
     -S .suf              use suffix .suf instead of .gz
        --suffix .suf
     -t --test            test compressed file
     -V --version         display program version
     -v --verbose         print extra statistics
     -h --help            display this help
    
    As on Monterey:
    gzip -V  Apple gzip 353.100.22
    gzip --license
    Apple gzip 353.100.22 (based on FreeBSD gzip 20190107, NetBSD gzip 20150113)
       Copyright (c) 1997, 1998, 2003, 2004, 2006 Matthew R. Green
    …
    
    www.bzip.org