Проект OpenNet: MAN bzip2 (1) Команды и прикладные программы пользовательского уровня (FreeBSD и Linux)

Интерактивная система просмотра системных руководств (man-ов)

bzip2 (1)
>> bzip2 (1) ( Solaris man: Команды и прикладные программы пользовательского уровня )
bzip2 (1) ( FreeBSD man: Команды и прикладные программы пользовательского уровня )
bzip2 (1) ( Linux man: Команды и прикладные программы пользовательского уровня )

NAME
     bzip2, bunzip2 - a block-sorting file compressor, v0.9.0
     bzcat - decompresses files to stdout
     bzip2recover - recovers data from damaged bzip2 files


SYNOPSIS
     bzip2 [ -cdfkstvzVL123456789 ] [ filenames ... ]
     bunzip2 [ -fkvsVL ] [ filenames ... ]
     bzcat [ -s ] [ filenames ... ]
     bzip2recover filename


DESCRIPTION
     bzip2 compresses  files  using  the  Burrows-Wheeler  block-
     sorting  text  compression  algorithm,  and  Huffman coding.
     Compression  is  generally  considerably  better  than  that
     achieved  by  more conventional LZ77/LZ78-based compressors,
     and approaches the performance of the PPM family of statist-
     ical compressors.

     The command-line options are deliberately  very  similar  to
     those of GNU Gzip, but they are not identical.

     bzip2  expects  a  list  of  file  names  to  accompany  the
     command-line  flags.  Each  file is replaced by a compressed
     version of itself, with the name "original_name.bz2".   Each
     compressed  file  has the same modification date and permis-
     sions as the corresponding original, so that  these  proper-
     ties  can be correctly restored at decompression time.  File
     name handling is naive in the sense that there is no mechan-
     ism  for  preserving  original  file  names, permissions and
     dates in filesystems which  lack  these  concepts,  or  have
     serious file name length restrictions, such as MS-DOS.

     bzip2 and bunzip2 will by  default  not  overwrite  existing
     files; if you want this to happen, specify the -f flag.

     If no file names are specified, bzip2 compresses from  stan-
     dard  input  to  standard  output.  In this case, bzip2 will
     decline to write compressed output to a  terminal,  as  this
     would be entirely incomprehensible and therefore pointless.

     bunzip2 (or bzip2 -d ) decompresses and restores all  speci-
     fied  files  whose  names end in ".bz2".  Files without this
     suffix are ignored. Again,  supplying  no  filenames  causes
     decompression from standard input to standard output.

     bunzip2 will correctly decompress a file which is  the  con-
     catenation  of  two or more compressed files.  The result is
     the concatenation of the corresponding  uncompressed  files.
     Integrity  testing  (-t) of concatenated compressed files is
     also supported.

     You can also compress or decompress files  to  the  standard
     output  by  giving  the  -c  flag.   Multiple  files  may be
     compressed and decompressed like this.  The  resulting  out-
     puts  are fed sequentially to stdout.  Compression of multi-
     ple files in this manner generates a stream containing  mul-
     tiple compressed file representations.  Such a stream can be
     decompressed correctly only by bzip2 version 0.9.0 or later.
     Earlier  versions of bzip2 will stop after decompressing the
     first file in the stream.

     bzcat (or bzip2 -dc ) decompresses all  specified  files  to
     the standard output.

     Compression is always performed, even if the compressed file
     is  slightly  larger  than the original.  Files of less than
     about one hundred  bytes  tend  to  get  larger,  since  the
     compression  mechanism has a constant overhead in the region
     of 50 bytes.  Random data (including the output of most file
     compressors) is coded at about 8.05 bits per byte, giving an
     expansion of around 0.5%.

     As a self-check for your protection, bzip2 uses 32-bit  CRCs
     to  make  sure  that  the  decompressed version of a file is
     identical to the original. This guards against corruption of
     the  compressed  data,  and against undetected bugs in bzip2
     (hopefully very unlikely).  The chances of  data  corruption
     going  undetected  is  microscopic, about one chance in four
     billion for each file processed.  Be aware, though, that the
     check  occurs  upon  decompression,  so it can only tell you
     that that something is wrong.  It can't help you recover the
     original uncompressed data.  You can use bzip2recover to try
     to recover data from damaged files.

     Return values: 0 for a  normal  exit,  1  for  environmental
     problems  (file not found, invalid flags, I/O errors, &c), 2
     to indicate a corrupt compressed file,  3  for  an  internal
     consistency error (eg, bug) which caused bzip2 to panic.


MEMORY MANAGEMENT
     Bzip2 compresses large files  in  blocks.   The  block  size
     affects  both the compression ratio achieved, and the amount
     of memory needed both  for  compression  and  decompression.
     The flags -1 through -9 specify the block size to be 100,000
     bytes through 900,000 bytes (the default) respectively.   At
     decompression-time,  the  block size used for compression is
     read from the header of the  compressed  file,  and  bunzip2
     then  allocates  itself just enough memory to decompress the
     file.  Since block sizes are stored in compressed files,  it
     follows  that  the  flags  -1 to -9 are irrelevant to and so
     ignored during decompression.  Compression and decompression
     requirements, in bytes, can be estimated as:

           Compression:   400k + ( 7 x block size )

           Decompression: 100k + ( 4 x block size ), or
                          100k + ( 2.5 x block size )

     Larger  block  sizes  give  rapidly   diminishing   marginal
     returns; most of the compression comes from the first two or
     three hundred k of block size, a fact worth bearing in  mind
     when using bzip2 on small machines.  It is also important to
     appreciate that the decompression memory requirement is  set
     at compression-time by the choice of block size.

     For files compressed with the default 900k block size,  bun-
     zip2  will require about 3700 kbytes to decompress.  To sup-
     port decompression of any file on a 4 megabyte machine, bun-
     zip2  has  an  option to decompress using approximately half
     this amount of memory,  about  2300  kbytes.   Decompression
     speed  is  also  halved,  so you should use this option only
     where necessary.  The relevant flag is -s.

     In general, try and use the largest block size  memory  con-
     straints   allow,   since  that  maximises  the  compression
     achieved.  Compression and decompression speed are virtually
     unaffected by block size.

     Another significant point applies to files which  fit  in  a
     single  block -- that means most files you'd encounter using
     a large block size.  The amount of real  memory  touched  is
     proportional  to  the  size  of  the file, since the file is
     smaller than a  block.   For  example,  compressing  a  file
     20,000 bytes long with the flag -9 will cause the compressor
     to allocate around 6700k of memory, but only  touch  400k  +
     20000  *  7 = 540 kbytes of it.  Similarly, the decompressor
     will allocate 3700k but only touch 100k + 20000 *  4  =  180
     kbytes.

     Here is a table which summarises the  maximum  memory  usage
     for  different  block  sizes.   Also  recorded  is the total
     compressed size for 14 files of the Calgary Text Compression
     Corpus  totalling  3,141,622  bytes.  This column gives some
     feel for how compression varies with block size.  These fig-
     ures  tend to understate the advantage of larger block sizes
     for larger files, since the Corpus is dominated  by  smaller
     files.

                Compress   Decompress   Decompress   Corpus
         Flag     usage      usage       -s usage     Size

          -1      1100k       500k         350k      914704
          -2      1800k       900k         600k      877703
          -3      2500k      1300k         850k      860338
          -4      3200k      1700k        1100k      846899
          -5      3900k      2100k        1350k      845160
          -6      4600k      2500k        1600k      838626
          -7      5400k      2900k        1850k      834096
          -8      6000k      3300k        2100k      828642
          -9      6700k      3700k        2350k      828642


OPTIONS
     -c --stdout
          Compress or decompress to  standard  output.   -c  will
          decompress  multiple  files  to  stdout,  but will only
          compress a single file to stdout.

     -d --decompress
          Force decompression.   bzip2,  bunzip2  and  bzcat  are
          really  the  same  program, and the decision about what
          actions to take is done on the basis of which  name  is
          used.   This  flag overrides that mechanism, and forces
          bzip2 to decompress.

     -z --compress
          The complement to -d: forces compression, regardless of
          the invokation name.

     -t --test
          Check integrity of the  specified  file(s),  but  don't
          decompress   them.    This   really  performs  a  trial
          decompression and throws away the result.

     -f --force
          Force overwrite of output files.  Normally, bzip2  will
          not overwrite existing output files.

     -k --keep
          Keep (don't delete) input files during  compression  or
          decompression.

     -s --small
          Reduce memory usage, for compression, decompression and
          testing.   Files  are  decompressed  and tested using a
          modified algorithm which only requires  2.5  bytes  per
          block byte.  This means any file can be decompressed in
          2300k of memory, albeit at about half the normal speed.

          During compression, -s selects a block  size  of  200k,
          which  limits  memory use to around the same figure, at
          the expense of your compression ratio.   In  short,  if
          your  machine  is  low on memory (8 megabytes or less),
          use -s for everything.  See MEMORY MANAGEMENT above.

     -v --verbose
          Verbose mode -- show the  compression  ratio  for  each
          file  processed.   Further  -v's increase the verbosity
          level, spewing out lots of information  which  is  pri-
          marily of interest for diagnostic purposes.

     -L --license -V --version
          Display the software version, license terms and  condi-
          tions.

     -1 to -9
          Set the block size to 100  k,  200  k  ..  900  k  when
          compressing.   Has  no  effect when decompressing.  See
          MEMORY MANAGEMENT above.

     --repetitive-fast
          bzip2 injects some small pseudo-random variations  into
          very  repetitive blocks to limit worst-case performance
          during compression.  If sorting runs into difficulties,
          the block is randomised, and sorting is restarted. Very
          roughly, bzip2 persists for three times as  long  as  a
          well-behaved  input would take before resorting to ran-
          domisation.  This flag makes it give up much sooner.


     --repetitive-best
          Opposite of --repetitive-fast; try a lot harder  before
          resorting to randomisation.


RECOVERING DATA FROM DAMAGED FILES
     bzip2 compresses files in blocks,  usually  900kbytes  long.
     Each   block  is  handled  independently.   If  a  media  or
     transmission error causes a multi-block .bz2 file to  become
     damaged,  it may be possible to recover data from the undam-
     aged blocks in the file.

     The compressed representation of each block is delimited  by
     a  48-bit pattern, which makes it possible to find the block
     boundaries with reasonable certainty.  Each block also  car-
     ries  its  own  32-bit  CRC,  so  damaged blocks can be dis-
     tinguished from undamaged ones.

     bzip2recover is a simple program whose purpose is to  search
     for  blocks in .bz2 files, and write each block out into its
     own .bz2 file.  You can  then  use  bzip2  -t  to  test  the
     integrity of the resulting files, and decompress those which
     are undamaged.

     bzip2recover takes a single argument, the name of  the  dam-
     aged  file,  and writes a number of files "rec0001file.bz2",
     "rec0002file.bz2", etc,  containing  the  extracted  blocks.
     The  output  filenames are designed so that the use of wild-
     cards in subsequent processing -- for  example,  "bzip2  -dc
     rec*file.bz2  >  recovered_data"  --  lists the files in the
     "right" order.

     bzip2recover should be of most use dealing with  large  .bz2
     files,  as  these  will  contain many blocks.  It is clearly
     futile to use it on damaged single-block files, since a dam-
     aged block cannot be recovered.  If you wish to minimise any
     potential data loss through media  or  transmission  errors,
     you might consider compressing with a smaller block size.


PERFORMANCE NOTES
     The sorting phase of compression  gathers  together  similar
     strings in the file.  Because of this, files containing very
     long runs  of  repeated  symbols,  like  "aabaabaabaab  ..."
     (repeated  several  hundred  times)  may compress extraordi-
     narily slowly.  You can use the  -vvvvv  option  to  monitor
     progress  in great detail, if you want.  Decompression speed
     is unaffected.

     Such pathological cases seem  rare  in  practice,  appearing
     mostly  in  artificially-constructed test files, and in low-
     level disk images.  It may be inadvisable to  use  bzip2  to
     compress  the  latter.  If  you  do  get a file which causes
     severe slowness in compression, try making the block size as
     small as possible, with flag -1.

     bzip2 usually  allocates  several  megabytes  of  memory  to
     operate  in, and then charges all over it in a fairly random
     fashion.  This means that performance, both for  compressing
     and  decompressing,  is  largely  determined by the speed at
     which your machine can  service  cache  misses.  Because  of
     this, small changes to the code to reduce the miss rate have
     been observed to give disproportionately  large  performance
     improvements.  I imagine bzip2 will perform best on machines
     with very large caches.


CAVEATS
     I/O error messages are not as  helpful  as  they  could  be.
     Bzip2  tries hard to detect I/O errors and exit cleanly, but
     the details of what the problem  is  sometimes  seem  rather
     misleading.

     This  manual  page  pertains  to  version  0.9.0  of  bzip2.
     Compressed data created by this version is entirely forwards
     and backwards compatible with the previous  public  release,
     version  0.1pl2, but with the following exception: 0.9.0 can
     correctly decompress multiple concatenated compressed files.
     0.1pl2 cannot do this; it will stop after decompressing just
     the first file in the stream.

     Wildcard expansion for Windows 95 and NT is flaky.

     bzip2recover uses 32-bit integers to represent bit positions
     in  compressed  files,  so it cannot handle compressed files
     more than 512 megabytes long.  This could easily be fixed.


AUTHOR
     Julian Seward, jseward@acm.org.

     http://www.muraroa.demon.co.uk

     The ideas embodied in bzip2 are due to (at least)  the  fol-
     lowing  people:   Michael Burrows and David Wheeler (for the
     block sorting transformation), David Wheeler (again, for the
     Huffman  coder),  Peter  Fenwick  (for the structured coding
     model in the original bzip, and many refinements), and Alis-
     tair Moffat, Radford Neal and Ian Witten (for the arithmetic
     coder in the original bzip). I am much  indebted  for  their
     help, support and advice.  See the manual in the source dis-
     tribution for pointers to sources of documentation.   Chris-
     tian  von  Roques  encouraged  me to look for faster sorting
     algorithms, so as to  speed  up  compression.   Bela  Lubkin
     encouraged  me to improve the worst-case compression perfor-
     mance.  Many people sent patches,  helped  with  portability
     problems,  lent  machines,  gave  advice  and were generally
     helpful.

NOTES
     Source for bzip2 is available in the SUNWbzipS package.
Партнёры:
Хостинг:
Закладки на сайте
Проследить за страницей
Created 1996-2026 by Maxim Chirkov
Добавить, Поддержать, Вебмастеру