OpenNet Project: MAN findaffix (1) User-level commands and application programs (FreeBSD and Linux)

NAME
     ispell, buildhash, munchlist, findaffix, tryaffix, icombine,
     ijoin - Interactive spelling checking

SYNOPSIS
     ispell [common-flags] [-M|-N] [-Lcontext] [-V] files
     ispell [common-flags] -l
     ispell [common-flags] [-f file] [-s] {-a|-A}
     ispell [-d file] [-w chars] -c
     ispell [-d file] [-w chars] -e[1-4]
     ispell [-d file] -D
     ispell -v[v]

     common-flags:
          [-t] [-n] [-b] [-x] [-B] [-C] [-P] [-m] [-S] [-d  file]
          [-p file] [-w chars] [-W n] [-T type]

     buildhash [-s] dict-file affix-file hash-file
     buildhash -s count affix-file

     munchlist [-l aff-file] [-c conv-file] [-T suffix]
               [-s hash-file] [-D] [-v] [-w chars] [files]

     findaffix [-p|-s] [-f] [-c] [-m min] [-M max] [-e elim]
               [-t tabchar] [-l low] [files]

     tryaffix [-p|-s] [-c] expanded-file affix[+addition]

     icombine [-T type] [aff-file]

     ijoin [-s|-u] join-options file1 file2

DESCRIPTION
     Ispell is fashioned after the spell program from ITS (called
     ispell on Twenex systems.)  The most common usage is "ispell
     filename".  In this case,  ispell  will  display  each  word
     which  does  not  appear in the dictionary at the top of the
     screen and allow you to  change  it.   If  there  are  "near
     misses" in the dictionary (words which differ by only a sin-
     gle letter, a missing or extra letter, a pair of  transposed
     letters,  or  a missing space or hyphen), then they are also
     displayed on following lines.  As  well  as  "near  misses",
     ispell  may  display  other guesses at ways to make the word
     from a known root, with  each  guess  preceded  by  question
     marks.  Finally, the line containing the word and the previ-
     ous line are printed at the bottom of the screen.   If  your
     terminal  can  display  in reverse video, the word itself is
     highlighted.  You have the option of replacing the word com-
     pletely,  or  choosing one of the suggested words.  Commands
     are single characters as follows (case is ignored):


          R    Replace the misspelled word completely.

          Space
               Accept the word this time only.

          A    Accept the word for the rest of this  ispell  ses-
               sion.

          I    Accept the word, capitalized as it is in the file,
               and update private dictionary.

          U    Accept the word, and add an  uncapitalized  (actu-
               ally,  all lower-case) version to the private dic-
               tionary.

          0-n  Replace with one of the suggested words.

          L    Look up words in system dictionary (controlled  by
               the WORDS compilation option).

          X    Write the rest of  this  file,  ignoring  misspel-
               lings, and start next file.

          Q    Exit immediately and leave the file unchanged.

          !    Shell escape.

          ^L   Redraw screen.

          ^Z   Suspend ispell.

          ?    Give help screen.

     If the -M switch is specified, a one-line mini-menu  at  the
     bottom  of  the  screen  will summarize these options.  Con-
     versely, the -N switch may be used  to  suppress  the  mini-
     menu.   (The  minimenu is displayed by default if ispell was
     compiled with the MINIMENU option, but  these  two  switches
     will always override the default).

     If the -L flag is given, the specified number is used as the
     number  of lines of context to be shown at the bottom of the
     screen (The default is to calculate the amount of context as
     a  certain  percentage  of  the screen size).  The amount of
     context is subject to a system-imposed limit.

     If the -V flag is given, characters that are not in  the  7-
     bit ANSI printable character set will always be displayed in
     the style of "cat -v", even  if  ispell  thinks  that  these
     characters  are  legal  ISO Latin-1 on your system.  This is
     useful when working  with  older  terminals.   Without  this
     switch, ispell will display 8-bit characters "as is" if they
     have been defined as string characters for the  chosen  file
     type.

     "Normal" mode, as well as the -l, -a, and  -A  options  (see
     below) also accepts the following "common" flags on the com-
     mand line:

          -t   The input file is in TeX or LaTeX format.

          -n   The input file is in nroff/troff format.

          -b   Create a backup file by appending  ".bak"  to  the
               name of the input file.

          -x   Don't create a backup file.

          -B   Report run-together words with missing  blanks  as
               spelling errors.

          -C   Consider run-together words as legal compounds.

          -P   Don't generate extra root/affix combinations.

          -m   Make possible root/affix combinations that  aren't
               in the dictionary.

          -S   Leave the list of guesses unsorted, in order of probable
               correctness.

          -d file
               Specify an alternate dictionary file.   For  exam-
               ple,  use -d deutsch to choose a German dictionary
               in a German installation.

          -p file
               Specify an alternate personal dictionary.

          -w chars
               Specify additional characters that can be part  of
               a word.

          -W n Specify length of words that are always legal.

          -T type
               Assume a given formatter type for all files.

     The  -n  and  -t  options  select  whether  ispell  runs  in
     nroff/troff (-n) or TeX/LaTeX (-t) input mode.  (The default
     is  controlled  by  the  DEFTEXFLAG  installation   option.)
     TeX/LaTeX  mode  is  also automatically selected if an input
     file has the extension ".tex", unless overridden by  the  -n
     switch.   In  TeX/LaTeX  mode, whenever a backslash ("\") is
     found, ispell will skip to the next whitespace or  TeX/LaTeX
     delimiter.   Certain commands contain arguments which should
     not be checked, such as labels and  reference  keys  as  are
     found  in  the  \cite command, since they contain arbitrary,
     non-word arguments.  Spell checking is also suppressed  when
     in math mode.  Thus, for example, given

          \chapter {This is a Ckapter} \cite{SCH86}

     ispell will find "Ckapter" but not  "SCH".   The  -t  option
     does  not  recognize  the TeX comment character "%", so com-
     ments are also spell-checked.  It also assumes correct LaTeX
     syntax.   Arguments  to  infrequently used commands and some
     optional arguments are sometimes checked unnecessarily.  The
     bibliography will not be checked if ispell was compiled with
     IGNOREBIB defined.   Otherwise,  the  bibliography  will  be
     checked but the reference key will not.

     References for the tib(1) bibliography system, that is, text
     between  a "[." or "<." and ".]" or ".>" will always
     be ignored in TeX/LaTeX mode.

     The -b and -x options control whether ispell leaves a backup
     (.bak) file for each input file.  The .bak file contains the
     pre-corrected text.  If there are  file  opening  /  writing
     errors, the .bak file may be left for recovery purposes even
     with the -x option.  The default for  this  option  is  con-
     trolled by the DEFNOBACKUPFLAG installation option.

     The -B and  -C  options  control  how  ispell  handles  run-
     together  words,  such  as "notthe" for "not the".  If -B is
     specified, such words will  be  considered  as  errors,  and
     ispell will list variations with an inserted blank or hyphen
     as possible replacements.  If -C is specified,  run-together
     words  will  be considered to be legal compounds, so long as
     both components are in the dictionary, and each component is
     at  least as long as a language-dependent minimum (3 charac-
     ters, by default).  This is useful  for  languages  such  as
     German  and  Norwegian, where many compound words are formed
     by concatenation.  (Note that compounds formed from three or
     more  root  words  will  still  be  considered errors).  The
     default for this option is language-dependent; in  a  multi-
     lingual installation the default may vary depending on which
     dictionary you choose.
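
     As a rough illustration of the -B and -C rules above, the following
     Python sketch treats the dictionary as a plain set of words.  The
     function names, the set-based dictionary, and the min_len parameter
     are assumptions made for this example; they are not part of ispell,
     and case handling is ignored.

```python
def split_candidates(word):
    """Yield every way to cut word into two non-empty parts."""
    for i in range(1, len(word)):
        yield word[:i], word[i:]


def is_legal_compound(word, dictionary, min_len=3):
    """-C style check: exactly two dictionary words, each at least min_len long."""
    return any(
        len(left) >= min_len and len(right) >= min_len
        and left in dictionary and right in dictionary
        for left, right in split_candidates(word)
    )


def blank_or_hyphen_suggestions(word, dictionary):
    """-B style replacements: insert a blank or a hyphen between two known words."""
    suggestions = []
    for left, right in split_candidates(word):
        if left in dictionary and right in dictionary:
            suggestions.append(left + " " + right)
            suggestions.append(left + "-" + right)
    return suggestions


words = {"not", "the", "one"}                    # hypothetical tiny dictionary
print(is_legal_compound("notthe", words))        # two roots
print(is_legal_compound("nottheone", words))     # three roots
print(blank_or_hyphen_suggestions("notthe", words))
```

     Only two-way splits are tried, which is why a compound of three
     roots is still reported as an error in this sketch.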

     The -P and -m options control when ispell automatically gen-
     erates  suggested root/affix combinations for possible addi-
     tion to your personal dictionary.  (These are the entries in
     the  "guess" list which are preceded by question marks.)  If
     -P is specified, such guesses are displayed only  if  ispell
     cannot  generate  any  possibilities  that match the current
     dictionary.  If -m is specified,  such  guesses  are  always
     displayed.   This  can  be  useful  if  the dictionary has a
     limited word list, or a word list with few  suffixes.   How-
     ever,  you  should  be careful when using this option, as it
     can  generate  guesses  that  produce  illegal  words.   The
     default for this option is controlled by the dictionary file
     used.

     The -S option suppresses ispell's normal behavior of sorting
     the  list  of  possible  replacement words.  Some people may
     prefer this, since it somewhat enhances the probability that
     the correct word will be low-numbered.

     The -d option is used to specify an  alternate  hashed  dic-
     tionary  file, other than the default.  If the filename does
     not contain a "/", the library  directory  for  the  default
     dictionary  file  is  prefixed; thus, to use a dictionary in
     the local directory "-d ./xxx.hash" must be used.   This  is
     useful   to  allow  dictionaries  for  alternate  languages.
     Unlike  previous  versions  of  ispell,  a   dictionary   of
     /dev/null  is  illegal,  because the dictionary contains the
     affix table.  If you need an effectively  empty  dictionary,
     create  a  one-entry  list  with  an  unlikely string (e.g.,
     "qqqqq").

     The -p option is used to specify an alternate personal  dic-
     tionary  file.   If  the  file name does not begin with "/",
     $HOME is prefixed.  Also, the environment variable WORDLIST may be
     set,  which  renames  the  personal  dictionary  in the same
     manner.  The command line overrides  any  WORDLIST  setting.
     If  neither the -p switch nor the WORDLIST environment vari-
     able is given, ispell will search for a personal  dictionary
     in  both  the  current  directory and $HOME, creating one in
     $HOME if none is found.  The preferred name  is  constructed
     by  appending  ".ispell_" to the base name of the hash file.
     For example, if you use the English  dictionary,  your  per-
     sonal dictionary would be named ".ispell_english".  However,
     if the file ".ispell_words" exists, it will be used  as  the
     personal  dictionary  regardless  of  the language hash file
     chosen.  This feature is included  primarily  for  backwards
     compatibility.

     If the -p option is not specified, ispell will look for per-
     sonal  dictionaries  in  both  the current directory and the
     home directory.  If dictionaries exist in both places,  they
     will be merged.  If any words are added to the personal dic-
     tionary, they will be written to the current directory if  a
     dictionary  already  existed  in  that place; otherwise they
     will be written to the dictionary in the home directory.
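
     The naming rules above can be sketched in Python as follows.  This
     is only an illustration: the function names and the sample path are
     hypothetical, and dropping the hash file's extension is an
     assumption inferred from the ".ispell_english" example.

```python
import posixpath


def preferred_personal_name(hash_file):
    """Build ".ispell_<base>" from the hash file name (extension dropped)."""
    base = posixpath.splitext(posixpath.basename(hash_file))[0]
    return ".ispell_" + base


def explicit_personal_path(name, home):
    """Resolve a -p or WORDLIST value: $HOME is prefixed unless it starts with "/"."""
    return name if name.startswith("/") else posixpath.join(home, name)


def choose_personal_name(hash_file, existing_names):
    """Prefer the legacy ".ispell_words" when it exists, as described above."""
    if ".ispell_words" in existing_names:
        return ".ispell_words"
    return preferred_personal_name(hash_file)


# Hypothetical installation path, used only for illustration.
print(preferred_personal_name("/usr/lib/ispell/english.hash"))
```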

     The -w option may be used to specify characters  other  than
     alphabetics  which  may also appear in words.  For instance,
     -w "&" will allow "AT&T" to be picked up.   Underscores  are
     useful  in many technical documents.  There is an admittedly
     crude provision in this option for 8-bit international char-
     acters.   Non-printing  characters  may  be specified in the
     usual way by inserting a backslash  followed  by  the  octal
     character  code;  e.g.,  "\014"  for  a form feed.  Alterna-
     tively, if "n" appears in the character string, the (up  to)
     three  characters  following are a DECIMAL code 0 - 255, for
     the character.  For example, to include bells and form feeds
     in  your  words (an admittedly silly thing to do, but aren't
     most pedagogical examples):

          n007n012

     Numeric digits other than the three following "n" are simply
     numeric  characters.  Use of "n" does not conflict with any-
     thing because actual alphabetics have no meaning - alphabet-
     ics  are  already  accepted.   Ispell will typically be used
     with input from a file, meaning that preserving  parity  for
     possible 8 bit characters from the input text is OK.  If you
     specify the -l option, and actually type text from the  ter-
     minal,  this  may  create  problems  if  your  stty settings
     preserve parity.
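
     To make the -w escape rules concrete, here is a small Python sketch
     that decodes a -w argument into the characters it names.  The
     function name is hypothetical, and the handling of edge cases (a
     backslash or an "n" not followed by digits is kept as a literal
     character) is an assumption, not documented ispell behavior.

```python
import re

# A backslash followed by one to three octal digits, or "n" followed by
# one to three decimal digits; anything else is a literal character.
_TOKEN = re.compile(r"\\([0-7]{1,3})|n([0-9]{1,3})|(.)", re.DOTALL)


def parse_word_chars(spec):
    """Return the list of extra word characters described by a -w argument."""
    chars = []
    for octal, decimal, literal in _TOKEN.findall(spec):
        if octal:
            chars.append(chr(int(octal, 8)))
        elif decimal:
            chars.append(chr(int(decimal)))
        else:
            chars.append(literal)
    return chars


# The "bells and form feeds" example from the text: decimal 7 and decimal 12.
print([ord(c) for c in parse_word_chars("n007n012")])
```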

     The -W option may be used to change the length of words that
     ispell  always  accepts  as  legal.   Normally,  ispell will
     accept all 1-character words as legal, which  is  equivalent
     to specifying "-W 1".  (The default for this switch is actually
     controlled by the MINWORD installation  option,  so  it
     may vary at your installation.)  If you want all words to be
     checked against the dictionary, regardless  of  length,  you
     might  want  to  specify "-W 0".  On the other hand, if your
     document uses a lot of three-letter acronyms,  you  would
     specify "-W 3" to accept all words of three letters or fewer.
     Regardless of the setting of this option, ispell  will  only
     generate  words  that  are  in  the  dictionary as suggested
     replacements for words; this prevents the list from becoming
     too  long.   Obviously,  this  option can be very dangerous,
     since short misspellings may be missed.   If  you  use  this
     option  a  lot, you should probably make a last pass without
     it before you publish your  document,  to  protect  yourself
     against errors.

     The -T option is used to specify a  default  formatter  type
     for  use in generating string characters.  This switch over-
     rides the default type determined from the file  name.   The
     type  argument may be either one of the unique names defined
     in the language affix file (e.g., nroff) or  a  file  suffix
     including the dot (e.g., .tex).  If no -T option appears and
     no type can be determined from the file  name,  the  default
     string  character  type  declared in the language affix file
     will be used.


     The -l or "list" option to ispell is used to produce a  list
     of misspelled words from the standard input.

     The -a option is intended to be  used  from  other  programs
     through a pipe.  In this mode, ispell prints a one-line ver-
     sion identification message, and then begins  reading  lines
     of  input.  For each input line, a single line is written to
     the standard output for each word checked  for  spelling  on
     the  line.  If the word was found in the main dictionary, or
     your personal dictionary, then the line contains only a '*'.
     If  the  word was found through affix removal, then the line
     contains a '+', a space, and the root word. If the word  was
     found  through  compound  formation  (concatenation  of  two
     words, controlled by the -C option), then the line  contains
     only a '-'.

     If the word is not in the dictionary,  but  there  are  near
     misses,  then  the  line  contains  an  '&',  a  space,  the
     misspelled word, a space, the number of near misses, a space,
     the number of characters between the beginning of the line and
     the beginning of  the  misspelled  word,  a  colon,  another
     space, and a list of the near misses separated by commas and
     spaces.  Following the near misses (and identified  only  by
     the  count  of  near misses), if the word could be formed by
     adding (illegal) affixes to a known root, is a list of  sug-
     gested  derivations,  again  separated by commas and spaces.
     If there are no near misses at all, the line format  is  the
     same,  except that the '&' is replaced by '?' (and the near-
     miss count is always zero).  The suggested derivations  fol-
     lowing the near misses are in the form:

          [prefix+] root [-prefix] [-suffix] [+suffix]

     (e.g., "re+fry-y+ies" to get "refries") where each  optional
     prefix and suffix is a string.  Also, each near miss or guess is
     capitalized the same as the input word unless such capitali-
     zation is illegal; in the latter case each near miss is cap-
     italized correctly according to the dictionary.

     Finally, if the word does not appear in the dictionary,  and
     there  are  neither near misses nor guesses, then the line
     contains a '#', a space, the misspelled word, a  space,  and
     the  character  offset from the beginning of the line.  The
     output for each line of text input  is  terminated  with  an
     additional  blank line, indicating that ispell has completed
     processing the input line.

     These output lines can be summarized as follows:

          OK:  *

          Root:
               + <root>

          Compound:
               -

          Miss:
               & <original>  <count>  <offset>:  <miss>,  <miss>,
               ..., <guess>, ...

          Guess:
               ? <original> 0 <offset>: <guess>, <guess>, ...

          None:
               # <original> <offset>

     For example, a dummy dictionary containing the words "fray",
     "Frey",  "fry",  and  "refried"  might produce the following
     response to the command "echo 'frqy refries' | ispell -a  -m
     -d ./test.hash":
          (#) International Ispell Version 3.0.05 (beta), 08/10/91
          & frqy 3 0: fray, Frey, fry
          & refries 1 5: refried, re+fry-y+ies

     This mode is also suitable for interactive use when you want
     to figure out the spelling of a single word.
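
     A driving program has to take these result lines apart again.  The
     Python sketch below parses the six line types summarized above; the
     function name and the dictionary keys it returns are choices made
     for this example only.

```python
def parse_result_line(line):
    """Parse one ispell -a result line into a small dictionary."""
    kind = line[:1]
    if kind == "*":
        return {"type": "ok"}
    if kind == "-":
        return {"type": "compound"}
    if kind == "+":
        return {"type": "root", "root": line[2:]}
    if kind == "#":
        _, original, offset = line.split()
        return {"type": "none", "original": original, "offset": int(offset)}
    if kind in ("&", "?"):
        head, _, tail = line.partition(": ")
        _, original, count, offset = head.split()
        items = [item.strip() for item in tail.split(",") if item.strip()]
        count = int(count)
        return {
            "type": "miss" if kind == "&" else "guess",
            "original": original,
            "offset": int(offset),
            "misses": items[:count],     # the first <count> entries are near misses
            "guesses": items[count:],    # anything after them is a derivation guess
        }
    raise ValueError("unrecognized result line: %r" % line)


print(parse_result_line("& refries 1 5: refried, re+fry-y+ies"))
```

     The near misses and the guesses are told apart only by the count
     field, exactly as the description above says.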

     The -A option works just like -a,  except  that  if  a  line
     begins  with  the  string  "&Include_File&", the rest of the
     line is taken as the name of a  file  to  read  for  further
     words.   Input returns to the original file when the include
     file is exhausted.  Inclusion may be nested up to five deep.
     The  key string may be changed with the environment variable
     INCLUDE_STRING (the ampersands, if any, must be included).
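
     The include mechanism can be pictured with the following Python
     sketch.  It works on an in-memory table of "files" so that it is
     self-contained; the function name, the read_lines callback, and the
     choice to raise an error past five levels are all assumptions (the
     text does not say what ispell does when the limit is exceeded).

```python
INCLUDE_KEY = "&Include_File&"   # default key string from the text above
MAX_DEPTH = 5                    # "nested up to five deep"


def expand_includes(name, read_lines, key=INCLUDE_KEY, depth=0):
    """Yield the lines of name, following include lines as they appear."""
    for line in read_lines(name):
        if line.startswith(key):
            if depth >= MAX_DEPTH:
                # Behavior past the limit is not documented; this sketch stops.
                raise ValueError("includes nested more than %d deep" % MAX_DEPTH)
            included = line[len(key):].strip()
            yield from expand_includes(included, read_lines, key, depth + 1)
        else:
            yield line


# Hypothetical in-memory "files" standing in for real ones.
files = {
    "main": ["alpha", "&Include_File& extra", "omega"],
    "extra": ["beta"],
}
print(list(expand_includes("main", files.__getitem__)))
```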

     When in the -a mode, ispell will also accept lines of single
     words  prefixed  with  any  of '*', '&', '@', '+', '-', '~',
     '#', '!', '%', or '^'.   A  line  starting  with  '*'  tells
     ispell  to insert the word into the user's dictionary (simi-
     lar to the I command).   A  line  starting  with  '&'  tells
     ispell  to  insert an all-lowercase version of the word into
     the user's dictionary (similar to the U  command).   A  line
     starting  with  '@' causes ispell to accept this word in the
     future (similar to the A command).   A  line  starting  with
     '+',  followed immediately by tex or nroff will cause ispell
     to parse future input according to the syntax of that  formatter.
     A line consisting solely of a '+' will place ispell
     in TeX/LaTeX mode (similar to the -t option) and '-' returns
     ispell   to   nroff/troff   mode  (but  these  commands  are
     obsolete).  However, string character type is  not  changed;
     the  '~'  command  must be used to do this.  A line starting
     with '~' causes ispell to set internal parameters  (in  par-
     ticular,  the  default  string  character type) based on the
     filename given in the rest of the line.  (A file  suffix  is
     sufficient,  but  the period must be included.  Instead of a
     file name or  suffix,  a  unique  name,  as  listed  in  the
     language  affix  file, may be specified.)  However, the for-
     matter parsing is not changed;  the '+' command must be used
     to  change  the  formatter.   A  line prefixed with '#' will
     cause the personal dictionary to be saved.  A line  prefixed
     with  '!'  will  turn  on terse mode (see below), and a line
     prefixed with '%' will return ispell to  normal  (non-terse)
     mode.   Any  input following the prefix characters '+', '-',
     '#', '!', or '%' is ignored, as is any input  following  the
     filename  on  a  '~' line.  To allow spell-checking of lines
     beginning with these characters, a line  starting  with  '^'
     has  that  character  removed  before  it  is  passed to the
     spell-checking code.  It is  recommended  that  programmatic
     interfaces prefix every data line with an uparrow ('^') to protect
     themselves against future changes in ispell.

     To summarize these:

          *    Add to personal dictionary

          &    Insert all-lowercase version into personal dictionary

          @    Accept word, but leave out of dictionary

          #    Save current personal dictionary

          ~    Set parameters based on filename

          +    Enter TeX mode

          -    Exit TeX mode

          !    Enter terse mode

          %    Exit terse mode

          ^    Spell-check rest of line

     In terse mode, ispell will not print  lines  beginning  with
     '*', '+', or '-', all of which indicate correct words.  This
     significantly improves running speed when the  driving  pro-
     gram is going to ignore correct words anyway.
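
     A driving program that follows the advice above might prepare its
     input along these lines.  This Python sketch only builds the text to
     be written to the pipe; the function name and the terse parameter
     are hypothetical, and actually starting "ispell -a" is left out.

```python
def build_pipe_input(lines, terse=True):
    """Return text for ispell -a: an optional '!' command, then '^'-prefixed data."""
    out = []
    if terse:
        out.append("!")          # enter terse mode
    for line in lines:
        out.append("^" + line)   # protect lines that start with a command character
    return "\n".join(out) + "\n"


print(build_pipe_input(["*this is text, not a command", "plain text"]), end="")
```

     Because every data line carries the '^' prefix, text that happens to
     begin with '*' or '#' is spell-checked instead of being taken as a
     command.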

     The -s option is only valid in conjunction with the -a or -A
     options,  and  only  on  BSD-derived systems.  If specified,
     ispell will stop itself with a  SIGTSTP  signal  after  each
     line  of  input.   It  will  not  read  more  input until it
     receives a SIGCONT signal.  This may be useful for handshak-
     ing with certain text editors.

     The -f option is only valid in conjunction with the -a or -A
     options.   If -f is specified, ispell will write its results
     to the given file, rather than to standard output.

     The -v option causes ispell to  print  its  current  version
     identification  on  the  standard  output  and exit.  If the
     switch is doubled, ispell will also print the  options  that
     it was compiled with.

     The -c, -e[1-4], and -D options  of  ispell  are  primarily
     intended  for  use  by  the  munchlist shell script.  The -c
     switch causes a list of words to be read from  the  standard
     input.   For  each  word,  a list of possible root words and
     affixes will be written to the standard output.  Some of the
     root  words  will  be  illegal and must be filtered from the
     output by other means; the munchlist script does  this.   As
     an example, the command:

          echo BOTHER | ispell -c

     produces:

          BOTHER BOTHE/R BOTH/R

     The -e switch is the reverse of -c; it expands  affix  flags
     to produce a list of words.  For example, the command:

          echo BOTH/R | ispell -e

     produces:

          BOTH BOTHER

     An optional expansion level can also be specified.  A  level
     of 1 (-e1) is the same as -e alone.  A level of 2 causes the
     original root/affix combination to be prepended to the line:

          BOTH/R BOTH BOTHER

     A level of 3 causes multiple lines to  be  output,  one  for
     each  generated  word, with the original root/affix combina-
     tion followed by the word it creates:

          BOTH/R BOTH
          BOTH/R BOTHER

     A level of 4 causes a floating-point number to  be  appended
     to  each  of the level-3 lines, giving the ratio of the total
     length of all generated words (including the  root)  to  the
     length of the root:

          BOTH/R BOTH 2.500000
          BOTH/R BOTHER 2.500000
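
     The four expansion levels differ only in how the same word list is
     laid out, which the following Python sketch reproduces.  It does not
     expand affix flags itself: the list of generated words is passed in,
     and the function name is hypothetical.  The level-4 figure is
     computed as total length divided by root length, which is what the
     value 2.500000 in the example implies (10 / 4).

```python
def format_expansion(entry, words, level=1):
    """Lay out the expansion of a root/flags entry as described for -e1 to -e4."""
    if level == 1:
        return [" ".join(words)]
    if level == 2:
        return [" ".join([entry] + words)]
    lines = ["%s %s" % (entry, word) for word in words]
    if level == 4:
        root = entry.split("/")[0]
        ratio = sum(len(word) for word in words) / len(root)
        lines = ["%s %f" % (line, ratio) for line in lines]
    return lines


for level in (1, 2, 3, 4):
    print("\n".join(format_expansion("BOTH/R", ["BOTH", "BOTHER"], level)))
```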

     Finally, the -D flag causes the affix tables from  the  dic-
     tionary file to be dumped to standard output.

     Unless your system administrator has suppressed the  feature
     to  save  space,  ispell is aware of the correct capitaliza-
     tions of words in the dictionary and in your  personal  dic-
     tionary.  As well as recognizing words that must be capital-
     ized (e.g., George) and  words  that  must  be  all-capitals
     (e.g.,  NASA), it can also handle words with "unusual" capi-
     talization (e.g., "ITCorp" or "TeX").  If a word is capital-
     ized incorrectly, the list of possibilities will include all
     acceptable capitalizations.  (More than  one  capitalization
     may  be  acceptable;  for  example, my dictionary lists both
     "ITCorp" and "ITcorp".)

     Normally, this feature will not  cause  you  surprises,  but
     there  is  one circumstance you need to be aware of.  If you
     use "I" to add a word to your  dictionary  that  is  at  the
     beginning  of a sentence (e.g., the first word of this para-
     graph if "normally" were not in the dictionary), it will  be
     marked  as "capitalization required".  A subsequent usage of
     this word without capitalization (e.g., the quoted  word  in
     the  previous  sentence) will be considered a misspelling by
     ispell, and it will suggest the  capitalized  version.   You
     must then compare the actual spellings by eye, and then type
     "I" to add the uncapitalized variant to your  personal  dic-
     tionary.  You can avoid this problem by using "U" to add the
     original word, rather than "I".

     The rules for capitalization are as follows:

     (1)  Any word may appear in all capitals, as in headings.

     (2)  Any word that is in  the  dictionary  in  all-lowercase
          form  may appear either in lowercase or capitalized (as
          at the beginning of a sentence).

     (3)  Any word that has "funny" capitalization (i.e., it con-
          tains  both  cases  and there is an uppercase character
          besides the first) must appear exactly as in  the  dic-
          tionary,  except as permitted by rule (1).  If the word
          is acceptable in all-lowercase, it must appear thus  in
          a dictionary entry.
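
     The three rules can be expressed as a short check against a single
     dictionary entry.  The Python sketch below is an approximation: the
     function names are hypothetical, and entries such as "George" or
     "NASA", which the numbered rules do not cover explicitly, are
     assumed to require an exact match apart from rule (1).

```python
def is_funny(entry):
    """Mixed case with an uppercase letter somewhere after the first character."""
    return not entry.isupper() and any(c.isupper() for c in entry[1:])


def capitalization_ok(word, entry):
    """Check word against one dictionary entry that matches it ignoring case."""
    if word.lower() != entry.lower():
        return False
    if word.isupper():                      # rule (1): all capitals always allowed
        return True
    if entry.islower():                     # rule (2): lowercase or capitalized
        return word == entry or word == entry.capitalize()
    if is_funny(entry):                     # rule (3): exact match required
        return word == entry
    # Remaining cases (e.g. "George", "NASA"): assumed to need an exact match.
    return word == entry


print(capitalization_ok("Normally", "normally"), capitalization_ok("Tex", "TeX"))
```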

  buildhash
     The buildhash program builds  hashed  dictionary  files  for
     later use by ispell. The raw word list (with affix flags) is
     given in dict-file, and the affix  flags  are  defined  by
     affix-file.  The hashed output is written to hash-file.  The
     formats of the two input files are described  in  ispell(4).
     The  -s (silent) option suppresses the usual status messages
     that are written to the standard error device.

  munchlist
     The munchlist shell script is used to  reduce  the  size  of
     dictionary  files,  primarily personal dictionary files.  It
     is also  capable  of  combining  dictionaries  from  various
     sources.   The  given  files  are read (standard input if no
     arguments are given), reduced to a minimal set of roots  and
     affixes  that will match the same list of words, and written
     to standard output.

     Input for munchlist consists of raw words (e.g., from your
     personal  dictionary  files)  or root and affix combinations
     (probably generated in earlier munchlist runs).   Each  word
     or root/affix combination must be on a separate line.

     The -D (debug) option leaves temporary  files  around  under
     standard  names instead of deleting them, so that the script
     can be debugged.  Warning:  this option can eat up an  enor-
     mous amount of temporary file space.

     The -v (verbose)  option  causes  progress  messages  to  be
     reported  to  stderr so you won't get nervous that munchlist
     has hung.

     If the -s (strip) option is specified, words that are in the
     specified  hash-file  are  removed from the word list.  This
     can be useful with personal dictionaries.

     The -l option can be used to specify an alternate affix-file
     for munching dictionaries in languages other than English.

     The -c option can be used to convert dictionaries that  were
     built with an older affix file, without risk of accidentally
     introducing unintended affix combinations into the  diction-
     ary.

     The -T option allows  dictionaries  to  be  converted  to  a
     canonical  string-character format.  The suffix specified is
     looked up in the affix file (-l  switch)  to  determine  the
     string-character  format used for the input file; the output
     always uses  the  canonical  string-character  format.   For
     example,  a dictionary collected from TeX source files might
     be converted to canonical format by specifying -T tex.

     The -w option is passed on to ispell.

  findaffix
     The findaffix shell script is  an  aid  to  writers  of  new
     language  descriptions  in choosing affixes.  The given dic-
     tionary files (standard input if none are given)  are  exam-
     ined  for  possible  prefixes  (-p  switch)  or suffixes (-s
     switch, the  default).   Each  commonly-occurring  affix  is
     presented  along  with  a  count  of  the number of times it
     appears and an estimate of the number of bytes that would be
     saved  in  a  dictionary  hash  file if it were added to the
     language table.  Only  affixes  that  generate  legal  roots
     (found in the original input) are listed.

     If the "-c" option is not given, the output lines are in the
     following format:

          strip/add/count/bytes

     where strip is the string that should  be  stripped  from  a
     root  word  before  adding the affix, add is the affix to be
     added, count is a count of the number  of  times  that  this
     strip/add  combination  appears, and bytes is an estimate of
     the number of bytes that might be saved in the raw  diction-
     ary  file  if  this  combination is added to the affix file.
     The field separator in the output will be the character (tabchar)
     specified by the -t switch;  the default is a slash ("/").

     If the -c ("clean output") option is given,  the  appearance
     of  the output is made visually cleaner (but harder to post-
     process) by changing it to:

          -strip+add<tab>count<tab>bytes

     where strip, add, count, and bytes are as before, and  <tab>
     represents the ASCII tab character.
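
     For post-processing, both output formats can be read back with a
     few lines of Python.  The function names are hypothetical and the
     sample counts are invented for illustration; the clean-format parser
     assumes the strip string itself contains no "+".

```python
def parse_default_line(line, sep="/"):
    """Parse "strip/add/count/bytes"; sep is the -t field separator."""
    strip, add, count, saved = line.split(sep)
    return strip, add, int(count), int(saved)


def parse_clean_line(line):
    """Parse the -c form "-strip+add<tab>count<tab>bytes"."""
    affix, count, saved = line.split("\t")
    strip, _, add = affix[1:].partition("+")   # drop the leading "-"
    return strip, add, int(count), int(saved)


# Invented sample lines, not real findaffix output.
print(parse_default_line("y/ies/42/310"))
print(parse_clean_line("-y+ies\t42\t310"))
```

     The default format is the easier one to split mechanically, which
     matches the remark that -c output is harder to post-process.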

     The method used to generate possible affixes will also  gen-
     erate  longer affixes which have common headers or trailers.
     For example, the two words "moth" and "mother" will generate
     not  only  the  obvious substitution "+er" but also "-h+her"
     and "-th+ther" (and possibly even longer ones, depending  on
     the  value  of  min).  To prevent cluttering the output with
     such affixes, any affix pair that  shares  a  common  header
     (or,  for prefixes, trailer) string longer than elim charac-
     ters (default 1) will be suppressed.  You may  want  to  set
     "elim" to a value greater than 1 if your language has string
     characters; usually the need for this parameter will  become
     obvious when you examine the output of your findaffix run.
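
     The "moth"/"mother" example can be reproduced with the Python sketch
     below.  It is not the algorithm the findaffix script uses; it only
     shows how the longer pairs arise and how an elim-style filter would
     act on them.  The function names are hypothetical, and the filter
     takes the wording "longer than elim characters" literally, which may
     differ from the real script.

```python
import os.path


def suffix_pairs(shorter, longer, min_stem=3):
    """List (strip, add) suffix pairs that turn shorter into longer."""
    common = len(os.path.commonprefix([shorter, longer]))
    pairs = []
    # Shorten the stem one character at a time, down to the minimum length.
    for stem_len in range(common, min_stem - 1, -1):
        pairs.append((shorter[stem_len:], longer[stem_len:]))
    return pairs


def shared_header_len(strip, add):
    """Length of the common leading string of the strip and add strings."""
    return len(os.path.commonprefix([strip, add]))


def apply_elim(pairs, elim=1):
    """Drop pairs whose strip and add strings share a header longer than elim."""
    return [pair for pair in pairs if shared_header_len(*pair) <= elim]


pairs = suffix_pairs("moth", "mother", min_stem=2)
print(pairs)
print(apply_elim(pairs, elim=1))
```

     With min_stem set to 2 the sketch yields "+er", "-h+her", and
     "-th+ther"; lowering it further produces still longer pairs, as the
     text notes.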

     Normally, the affixes are sorted according to  the  estimate
     of  bytes  saved.   The  -f  switch may be used to cause the
     affixes to be sorted by frequency of appearance.

     To save output file space, affixes which occur fewer than 10
     times  are eliminated; this limit may be changed with the -l
     switch.  The -M switch  specifies  a  maximum  affix  length
     (default 8).  Affixes longer than this will not be reported.
     (This saves on temporary disk space and makes the script run
     faster.)

     Affixes which generate stems shorter than 3  characters  are
     suppressed.   (A stem is the word after the strip string has
     been removed, and before the add  string  has  been  added.)
     This  reduces both the running time and the size of the out-
     put file.  This limit may be changed  with  the  -m  switch.
     The  minimum stem length should only be set to 1 if you have
     a lot of free time and disk space (in the range of many days
     and hundreds of megabytes).

     The findaffix script requires  a  non-blank  field-separator
     character  for  internal use.  Normally, this character is a
     slash ("/"), but if the slash appears as a character in  the
     input word list, a different character can be specified with
     the -t switch.

     Ispell dictionaries should be expanded before being  fed  to
     findaffix;  in  addition,  characters  that  are  not in the
     English alphabet (if any) should be translated to lowercase.

  tryaffix
     The tryaffix shell script is used to estimate the effective-
     ness  of a proposed prefix (-p switch) or suffix (-s switch,
     the default) with a given expanded-file.  Only one affix can
     be  tried with each execution of tryaffix, although multiple
     arguments can be used to describe varying forms of the  same
     affix flag (e.g., the D flag for English can add either D or
     ED depending on whether a trailing E  is  already  present).
     Each  word  in the expanded dictionary that ends (or begins)
     with the chosen suffix (or prefix) has that suffix  (prefix)
     removed; the dictionary is then searched for root words that
     match the stripped word.  Normally, all matching  roots  are
     written  to  standard  output, but if the -c (count) flag is
     given, only a statistical summary of the results is written.
     The  statistics  given are a count of words the affix poten-
     tially applies to and an estimate of the number of  diction-
     ary bytes that a flag using the affix would save.  The esti-
     mate will be high if  the  flag  generates  words  that  are
     currently  generated by other affix flags (e.g., in English,
     bathers can be generated by either bath/X or bather/S).

     The dictionary file, expanded-file, must already be expanded
     (using  the -e switch of ispell) and sorted, and things will
     usually work best if uppercase has been folded to lower with
     'tr'.

     The affix arguments are things to be stripped from the  dic-
     tionary file to produce trial roots:  for English, con (pre-
     fix) and ing (suffix) are examples.  The addition  parts  of
     the  argument  are letters that would have been stripped off
     the root before adding the affix.  For example,  in  English
     the  affix  ing  normally  strips e for words ending in that
     letter (e.g., like becomes liking) so we might run:
          tryaffix expanded-file ing ing+e

     to cover both cases.
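
     The matching step described above can be approximated by  the
     following  Python  sketch.   It  is  not the tryaffix script: the
     function name and the tiny word list are hypothetical, only the
     suffix  case  is  handled, and the byte-savings estimate is left
     out because the text does not say how it is computed.

```python
def try_suffix(words, specs):
    """Return (word, root) pairs for words a suffix flag could generate.

    words: set of expanded, lowercase dictionary words.
    specs: affix arguments such as ['ing', 'ing+e'], where the part
           after '+' is re-added to the stripped word to form the root.
    """
    matches = []
    for word in sorted(words):
        for spec in specs:
            affix, _, addition = spec.partition("+")
            # The word must end with the affix and leave a non-empty stem.
            if not word.endswith(affix) or len(word) <= len(affix):
                continue
            root = word[: len(word) - len(affix)] + addition
            if root in words:
                matches.append((word, root))
    return matches


words = {"like", "liking", "walk", "walking", "sing"}
print(try_suffix(words, ["ing", "ing+e"]))
```

     Here "walking" is matched through ing and "liking" through
     ing+e, while "sing" yields no root in the list.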

     All of the shell scripts contain documentation as commentary
     at  the  beginning;  sometimes these comments contain useful
     information beyond the scope of this manual page.

     It is possible to install ispell in such a way  as  to  only
     support ASCII range text if desired.

  icombine
     The icombine program is a helper for munchlist.  It reads  a
     list  of  words in dictionary format (roots plus flags) from
     the standard input, and produces a reduced list on  standard
     output   which  combines  common  roots  found  on  adjacent
     entries.  Identical roots which have  differing  flags  will
     have  their  flags  combined, and roots which have differing
     capitalizations  will  be  combined  in  a  way  which  only
     preserves   important   capitalization   information.    The
     optional aff-file specifies a language  file  which  defines
     the  character  sets  used  and  the meanings of the various
     flags.  The -T switch can be used to select  among  alterna-
     tive  string  character  types by giving a dummy suffix that
     can be found in an altstringtype statement.
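
     The flag-combining step can be pictured with a  short  Python
     sketch.   It  assumes  entries  of  the form root/flags with one
     letter per flag (as in bath/X above) and ignores the capitali-
     zation  handling  and the affix file entirely; the function name
     is hypothetical and this is not how icombine itself is written.

```python
def combine_adjacent(entries):
    """Merge adjacent entries that share a root, uniting their flags.

    Assumes 'root/FLAGS' entries with single-letter flags; an entry
    without '/' has no flags.
    """
    combined = []  # list of [root, set_of_flag_letters]
    for entry in entries:
        root, _, flags = entry.partition("/")
        letters = set(flags.replace("/", ""))
        if combined and combined[-1][0] == root:
            combined[-1][1].update(letters)
        else:
            combined.append([root, letters])
    return [
        root + ("/" + "".join(sorted(letters)) if letters else "")
        for root, letters in combined
    ]


print(combine_adjacent(["bath/S", "bath/X", "bather/S", "cat"]))
```

     Because only adjacent entries are compared, the input must al-
     ready be sorted so that identical roots sit next to each other.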

  ijoin
     The ijoin program is a re-implementation  of  join(1)  which
     handles  long  lines and 8-bit characters correctly.  The -s
     switch specifies that the sort(1) program  used  to  prepare
     the  input to ijoin uses signed comparisons on 8-bit charac-
     ters; the -u switch specifies  that  sort(1)  uses  unsigned
     comparisons.  All other options and behaviors of join(1) are
     duplicated as exactly as possible based on the manual  page,
     except that ijoin will not handle newline as a field separa-
     tor.  See the join(1) manual page for more information.
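
     The difference between the two comparison modes can  be  seen
     in  a  small  Python  sketch.   The key functions are hypothetical
     and only model how signed and unsigned 8-bit  comparisons  or-
     der bytes with the high bit set; they are not taken from ijoin.

```python
def unsigned_key(line):
    """Order bytes as values 0..255 (what -u tells ijoin to expect)."""
    return list(line)


def signed_key(line):
    """Order bytes as values -128..127 (what -s tells ijoin to expect)."""
    return [b - 256 if b >= 128 else b for b in line]


# b"\xe9t\xe9" is an 8-bit (Latin-1) word; its first byte has the high bit set.
words = [b"zebra", b"\xe9t\xe9"]
print(sorted(words, key=unsigned_key))  # 8-bit word sorts after "zebra"
print(sorted(words, key=signed_key))    # 8-bit word sorts before "zebra"
```

     A join over files sorted one way but compared  the  other  way
     would see its input as out of order, which is why the switch has
     to match the sort(1) in use.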

ENVIRONMENT
     DICTIONARY
          Default dictionary to use, if no -d flag is given.

     WORDLIST
          Personal dictionary file name

     INCLUDE_STRING
          Code for file inclusion under the -A option

     TMPDIR
          Directory used for some of munchlist's temporary files

FILES
     /usr/local/lib/english.hash
          Hashed dictionary (may be found  in  some  other  local
          directory, depending on the system).

     /usr/local/lib/english.aff
          Affix-definition file for munchlist

     /usr/dict/web2 or /usr/dict/words
          For the Lookup function (depending on the WORDS  compi-
          lation option).

     $HOME/.ispell_hashfile
          User's private dictionary

     .ispell_hashfile
          Directory-specific private dictionary

SEE ALSO
     spell(1),  egrep(1),  look(1),  join(1),  sort(1),   sq(1L),
     tib(1L), ispell(4L), english(4L)

BUGS
     Ispell takes anywhere from several to many seconds  to  read
     in the hash table, depending on the table's size.

     When all  options  are  enabled,  ispell  may  take  several
     seconds  to  generate  all  the guesses at corrections for a
     misspelled word; on slower machines this time is long enough
     to be annoying.

     The hash table is stored as a quarter-megabyte  (or  larger)
     array, so a PDP-11 or 286 version does not seem likely.

     Ispell should understand more troff syntax,  and  deal  more
     intelligently with contractions.

     Although small personal dictionaries are sorted before  they
     are  written  out,  the order of capitalizations of the same
     word is somewhat random.

     When the -x flag is specified, ispell will unlink any exist-
     ing .bak file.

     There are too many flags, and many of them have non-mnemonic
     names.

     Munchlist does not deal very  gracefully  with  dictionaries
     which  contain "non-word" characters.  Such characters ought
     to be deleted from the dictionary with a warning message.

     Findaffix and munchlist require tremendous amounts  of  tem-
     porary  file  space for large dictionaries.  They do respect
     the TMPDIR  environment  variable,  so  this  space  can  be
     redirected.  However, a lot of the temporary space needed is
     for sorting, so TMPDIR is only a  partial  help  on  systems
     with an uncooperative sort(1).  ("Cooperative" is defined as
     accepting the undocumented -T switch).  At its  peak  usage,
     munchlist  takes  10  to  40 times the original dictionary's
     size in Kb.  (The larger  ratio  is  for  dictionaries  that
     already  have  heavy  affix use, such as the one distributed
     with ispell).  Munchlist  is  also  very  slow;  munching  a
     normal-sized  dictionary  (15K  roots,  45K  expanded words)
     takes around an hour on a small workstation.  (Most of  this
     time  is spent in sort(1), and munchlist can run much faster
     on machines that have a more modern sort that  makes  better
     use  of  the  memory  available  to  it.)  Findaffix is even
     worse; the smallest English dictionary cannot  be  processed
     with  this  script  in  a  mere 50Kb of free space, and even
     after specifying switches  to  reduce  the  temporary  space
     required,  the  script will run for over 24 hours on a small
     workstation.

AUTHOR
     Pace Willisson (pace@mit-vax), 1983,  based  on  the  PDP-10
     assembly  version.   That version was written by R. E. Gorin
     in 1971, and later revised by W. E. Matson (1974) and W.  B.
     Ackerman (1978).

     Collected, revised, and enhanced  for  the  Usenet  by  Walt
     Buehring, 1987.

     Table-driven multi-lingual version by Geoff Kuenning,  1987-
     88.

     Large dictionaries provided by Bob Devine (vianet!devine).

     A complete list of contributors is too long to include here,
     but  it  is  distributed with the ispell sources in the file
     "Contributors".

VERSION
     The version of ispell  described  by  this  manual  page  is
     International Ispell Version 3.1.00, 10/08/93.