Unternehmensberatung Dieckmann

DIHYPH Hyphenation Interfacing




DHINT.C is the interface between the text system and all DIHYPH hyphenation
modules. It not only recodes text characters into hyphenation code, but also
takes into account:
- language specialities
- typographic rules
- compound words
- special character handling
- letter standardization
- text-/typesetting-commands



Calling hyphenation



Before calling hyphenation, the text system must set some parameters and may
override others (see also "DHDEF.C" and "DHEXT.H"):


Parameters to be set each time !
They describe the position of the "word" to be hyphenated within text array "line":

afc      = index of first letter of "word".

eol      = index of letter in "word" exceeding right margin

alc      = index of last  letter of "word".
           To get all possible hyphens, set:  eol >= alc;
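
The three position parameters can be derived from the text line and the right margin. The following is only a sketch: find_word_bounds is a hypothetical helper that is not part of DIHYPH, it assumes that a "word" is delimited by blanks, and it assumes that afc, eol and alc are plain int values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of DIHYPH: locate the blank-delimited
   word that covers the margin index and derive afc, eol and alc for it.
   margin = index of the first character that no longer fits on the line.
   Returns 1 if a word covers the margin index, 0 otherwise. */
int find_word_bounds(const char *line, int margin,
                     int *afc, int *eol, int *alc)
{
    int len = (int)strlen(line);
    int first, last;

    assert(margin >= 0);
    if (margin >= len || line[margin] == ' ')
        return 0;                       /* nothing sticks out past the margin */

    first = margin;
    while (first > 0 && line[first - 1] != ' ')
        first--;                        /* walk back to the start of the word */
    last = margin;
    while (last + 1 < len && line[last + 1] != ' ')
        last++;                         /* walk forward to the end of the word */

    *afc = first;                       /* index of first letter of the word */
    *eol = margin;                      /* index of letter exceeding the margin */
    *alc = last;                        /* index of last letter of the word */
    return 1;
}
```

The text system would copy the three results into the DIHYPH parameters of the same names before calling hyphenation. To obtain all possible hyphens instead, it would set eol to a value >= alc as described above.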


Parameters that may be overwritten !                  Default:

vs       = Minimum length of first syllable.          2       *)

ns       = Minimum length of last  syllable.          2       *)

minwl    = Minimum word length to be hyphenated       4       *)

spprm    = Bit 1: 4711-Splitting allowed
               2: "eol" = last hyphen position

dhpath[] = Directory name holding runtime files       \dihyph\

exfile[] = Name(s) of special exception file(s)

codspac  = RAM size for code-files DHCOnn             257L    *)

tabspac  = RAM size for one/more table-files DHTAnn   29000L  *)

excspac  = RAM size for largest exception catalogue   2000L   *)

exdspac  = RAM size for largest exception record      8000L   *)

                                 Language dependent:          *)

*) All default values may be overwritten by the user
by editing file 'DHDFLT.CFG'
before calling DIHYPH.



The text system calls hyphenation by:


rc = DHYPH(line, nn);            **)

rc =  0:  O.K.
     -1:  File "DHCOnn" or "DHTAnn" not found or wrong.
     -3:  Incorrect language-no.

line   =  Character array defined in text system
          holding word to be hyphenated.

nn     =  language-no. (01 =German, 02 =English, etc.).
**) For the Unicode version, DHYPH.C has to be replaced by DHYPHEUC.c !

Once DIHYPH is installed, every update version and every newly added language
consists of just a couple of disk files.
No compiling or linking of programs is necessary for that.
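
A text system will usually wrap the call in a little bookkeeping. The sketch below is illustrative only: the enum names and both helper functions are hypothetical, the return code values are the three listed above, and the word length test assumes that "minwl" simply counts the characters from afc to alc.

```c
#include <assert.h>

/* Return codes as documented above; the enum names are illustrative. */
enum { DH_OK = 0, DH_FILE_ERROR = -1, DH_BAD_LANGUAGE = -3 };

/* Map a DHYPH return code to a short message. */
const char *dhyph_rc_text(int rc)
{
    switch (rc) {
    case DH_OK:           return "O.K.";
    case DH_FILE_ERROR:   return "file DHCOnn or DHTAnn not found or wrong";
    case DH_BAD_LANGUAGE: return "incorrect language-no.";
    default:              return "undocumented return code";
    }
}

/* Hypothetical pre-call check: the word must be non-empty, eol must not
   lie before it, and it must reach the minimum word length (assumed to
   be a plain character count). */
int dhyph_worth_calling(int afc, int eol, int alc, int minwl)
{
    assert(minwl > 0);
    if (afc < 0 || alc < afc || eol < afc)
        return 0;
    return (alc - afc + 1) >= minwl;
}

/* Possible call site in the text system (not compiled here, because
   DHYPH and its external parameters come from the DIHYPH sources):

       if (dhyph_worth_calling(afc, eol, alc, 4)) {
           rc = DHYPH(line, 1);                     // 01 = German
           if (rc != 0)
               puts(dhyph_rc_text(rc));
       }
*/
```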



Returning from hyphenation



Three external parameters (defined in hyphenation) have to be evaluated by
the text system (see: "quality ranking" and "Evaluation of array RADR"):

had     = integer index of the "line" character to be shifted to the next
          line before inserting a hyphen at this position

          (had = 0:  no hyphenation possible !).

hpw     = see: Evaluation of array RADR.

ic      = character holding the letter to be inserted at "had".

Note: Index of first "line"-character is 0, etc.


Some examples:      [ ] means "delete"     ( ) means "insert"

Letter      Return-
index:      parameters:
0123456789  had hpw ic  print line after  H & J
----------  --- --- --  ---------------------------

aber        0   0       ...........................
                        aber.......................

Jo-Ann      3   1       ........................Jo-
                        Ann........................

Schiffahrt  5   2   f   ..................Schif(f-)
                        fahrt......................

asszony     2   2   z   .....................as(z-)
                        szony......................

Couve-Flor  5   3   -   ...................Couve(-)
                        -Flor......................

Dackel      3   6   k   ..................Da[c](k-)
                        kel........................
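
The example rows above can be reproduced with a few lines of C. This is a sketch under assumptions, not the DIHYPH implementation: apply_hyphen and the HPW_ names are hypothetical, the word is passed on its own so that "had" counts from the start of the word, and the bit meanings of "hpw" (1 = split without hyphen, 2 = insert "ic", 4 = erase the letter before the split) are inferred from the six rows.

```c
#include <assert.h>
#include <string.h>

/* Bit meanings inferred from the example rows (an assumption). */
enum { HPW_NO_HYPHEN = 1, HPW_INSERT_IC = 2, HPW_ERASE = 4 };

/* Split word at index had.
   head receives the text that stays on the current line,
   tail receives the text that is shifted to the next line.
   Both buffers must hold at least strlen(word) + 3 bytes.
   Returns 0 if had == 0 (no hyphenation possible), else 1. */
int apply_hyphen(const char *word, int had, int hpw, char ic,
                 char *head, char *tail)
{
    size_t n;

    assert(had >= 0 && (size_t)had <= strlen(word));
    if (had == 0) {
        head[0] = '\0';                 /* whole word moves to the next line */
        strcpy(tail, word);
        return 0;
    }
    n = (size_t)had;
    if (hpw & HPW_ERASE)
        n--;                            /* drop the letter before the split */
    memcpy(head, word, n);
    if (hpw & HPW_INSERT_IC)
        head[n++] = ic;                 /* insert the extra letter */
    if (!(hpw & HPW_NO_HYPHEN))
        head[n++] = '-';                /* insert the hyphen itself */
    head[n] = '\0';
    strcpy(tail, word + had);
    return 1;
}
```

With ("Dackel", 3, 6, 'k') the sketch yields "Dak-" and "kel", and with ("Couve-Flor", 5, 3, '-') it yields "Couve-" and "-Flor", matching the rows above.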


Hyphenation demonstration resulting from 8 different
"end-of-line" (right margin) conditions.

Sample-word... 4711(System.22)-NN./AB2'Processor
right             :  :      : : :  :   :      :
margin........... 1  2      3 4 5  6   7      8
:
: results:
:              .................................
1              4711(System.22)-NN./AB2'Processor

2              .................................
               4711(System.22)-NN./AB2'Processor

3              ........................4711(Sys-
               tem.22)-NN./AB2'Processor........

4              ........................4711(Sys-
               tem.22)-NN./AB2'Processor........

5              .................4711(System.22)-
               NN./AB2'Processor................

6              .............4711(System.22)-NN./
               AB2'Processor....................

7              .............4711(System.22)-NN./
               AB2'Processor....................

8              ..4711(System.22)-NN./AB2'Proces-
               sor..............................



Action Codes



Before entering the DIHYPH hyphenation logic, every text character is automatically
converted into a language-specific DIHYPH Action Code by the interface program
DHINT.C using code-file "DHCOnn" (nn=language-no.).

Text Code                Action-Code (hex.)

letters:                       01 - 1E  depending on language

others :
       Ignored character       00
       Space                   20
'      Apostrophe              22
*      Asterisk                23
-  /   Hyphen characters       24
       Forbidden hyphenation   25
+      Plus                    26
#      Number sign             27
.      Point                   28
,      Comma                   29
(      Bracket on              2A
)      Bracket off             2B
       Forced hyphenation      2C
0 - 9  Numbers                 30 - 39

`      Accent grave            41
'      Accent acute            42
¨      Accent dieresis         43
°      Accent angstrom         44
       other  accents          00

       all other characters    21
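
The fixed part of this table can be written down as a lookup. The sketch below is not DHINT.C: action_code_nonletter is a hypothetical name, it covers only the ASCII entries of the table, and it returns -1 for ASCII letters because their codes (01 - 1E) are language dependent and come from the DHCOnn code file.

```c
#include <assert.h>
#include <ctype.h>

/* Action code for a non-letter character, taken from the table above.
   Returns -1 for ASCII letters, whose codes depend on the language.
   Characters above 7F are not looked up here and fall into the
   "all other characters" class, which a real code file may override. */
int action_code_nonletter(int ch)
{
    assert(ch >= 0 && ch <= 255);
    if (isalpha(ch))
        return -1;                      /* language dependent: 01 - 1E */
    if (isdigit(ch))
        return 0x30 + (ch - '0');       /* numbers: 30 - 39 */
    switch (ch) {
    case ' ':  return 0x20;             /* space */
    case '\'': return 0x22;             /* apostrophe */
    case '*':  return 0x23;
    case '-':
    case '/':  return 0x24;             /* hyphen characters */
    case '+':  return 0x26;
    case '#':  return 0x27;
    case '.':  return 0x28;
    case ',':  return 0x29;
    case '(':  return 0x2A;
    case ')':  return 0x2B;
    case '`':  return 0x41;             /* accent grave */
    default:   return 0x21;             /* all other characters */
    }
}
```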



DHCOnn code files

(nn = language no.) have a specific structure:


Lines starting with:

1)     Blank   or    ---   are treated as comment lines.

1.1)   ---A    Any non-hyphen character in position four of the first line
               means that letters A to Z and a to z are standard
               ASCII (hex. 41 - 5A and 61 - 7A).  Else see point 3.1 .


2)     '       (Apostrophe) have a maximum length of 25 characters each.
               These lines have to start and end with ' apostrophes;
               the text between them is error text used in exception
               dictionary programs.
               This text may be translated into other languages.


Construction of action code

3)    UUAA  An Action Code is represented by 4 hexadecimal digits.
            The lower two hex. digits (AA) are described above
            (see: Action Codes).

3.1)        The higher two hex. digits (UU) are used for 'special'
            lower-/uppercase conversion:
            Every lowercase non-standard letter holds the hex.
            position of the corresponding uppercase letter in the first
            two hex. digits (UU).

            Sample Action Code  8E01  means:
            Letter  ä  gets Action Code  01  and is converted to
            letter  Ä  on hex. position  8E  (see: DHCO02 table).

Note:
Special command codes from word composition systems are not allowed in
DHCOnn files !
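
Splitting such a 4-digit entry into its two halves is straightforward. The following sketch assumes that an entry is available as a string of exactly four hexadecimal digits; split_code_entry is a hypothetical name, and how entries are laid out within a DHCOnn line is not covered here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Split a 4-digit hex entry "UUAA" into
     uu = hex. position of the corresponding uppercase letter
     aa = Action Code.
   Returns 1 on success, 0 if entry is not exactly four hex digits. */
int split_code_entry(const char *entry, unsigned *uu, unsigned *aa)
{
    unsigned long v;
    int i;

    assert(entry != NULL);
    for (i = 0; i < 4; i++)
        if (!isxdigit((unsigned char)entry[i]))
            return 0;                   /* too short or not a hex digit */
    if (entry[4] != '\0')
        return 0;                       /* too long */

    v = strtoul(entry, NULL, 16);
    *uu = (unsigned)(v >> 8) & 0xFFu;   /* higher two hex digits */
    *aa = (unsigned)v & 0xFFu;          /* lower two hex digits */
    return 1;
}
```

For the sample entry 8E01 this gives uu = 8E and aa = 01.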



Hyphenation quality ranking



DIHYPH hyphenations not only hyphenate with highest accuracy
but also return a ranking from 1 (best) to 5 (worst) for every hyphen.

Although grammatically correct, some hyphenations are much better than others.
So it is very often better not to select the hyphen next to the right margin but
another one if its ranking is better.
In addition, a hyphen ranked better than 4 is in all probability not an incorrect
hyphenation.

Note: Do not look for best rankings only !

Hyphen ranking is either defined by algorithm and program tables or may be
inserted as digits 1 to 5 in the exception dictionary.
A hyphen ranking of 4 need not be a bad one; sometimes it is the only one available:
sat-4el-4lite  /  no-4to-4rious  /  Chi-4hua-4hua  /  ran-4dom-4ize

Some examples showing effect of hyphenation ranking

( 1, 2 = good      3 = quite good      4 = acceptable      5 = bad):

DIHYPH hyphenation with ranking is much better   than without ranking
----------------------------------------------   --------------------

English:
auto-1mo-5bile           auto-mobile             automo-bile
chemo-1ther-4apy         chemo-therapy           chemother-apy
ex-1ca-5vate             ex-cavate               exca-vate
micro-1or-5gan-4ism      micro-organism          microor-ganism
mid-1sum-4mer            mid-summer              midsum-mer
mis-1in-5formed          mis-informed            misin-formed
mon-1ox-5ide             mon-oxide               monox-ide
per-2se-5cute            per-secute              perse-cute

French:
bis-1an-5nuel            bis-annuel              bisan-nuel
cis-1al-5pine            cis-alpine              cisal-pine
co-2ad-5ju-4teur         co-adjuteur             coad-juteur
trans-1al-5pine          trans-alpine            transal-pine

German:
ent-1ge-5gen-1tre-4ten   ent-gegentreten         entge-gentreten
Fahr-1er-5laub-4nis      Fahr-erlaubnis          Fahrer-laubnis
Non-4nen-2klo-4ster      Nonnen-kloster          Nonnenklo-ster
See-1ad-5ler             See-adler               Seead-ler
Volks-1or-5che-4ster     Volks-orchester         Volksor-chester
wohl-1er-5ge-4hen        wohl-ergehen            wohler-gehen
Zi-4vil-1an-5zug         Zivil-anzug             Zivilan-zug
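
One way a text system might use the rankings is sketched below. The selection policy is an assumption made for illustration and is not prescribed by DIHYPH: choose_hyphen (a hypothetical name) takes a word in the ranked notation of the table above and returns the hyphen closest to the right margin whose ranking is not worse than a chosen limit.

```c
#include <assert.h>
#include <ctype.h>

/* pattern  = word with ranked hyphens, e.g. "auto-1mo-5bile"
   fit      = number of letters that may precede the hyphen on this line
   max_rank = worst ranking still accepted (1 to 5)
   Returns the number of letters kept on the current line,
   or 0 if no hyphen qualifies. */
int choose_hyphen(const char *pattern, int fit, int max_rank)
{
    int letters = 0;                    /* letters seen so far */
    int best = 0;
    const char *p;

    assert(max_rank >= 1 && max_rank <= 5);
    for (p = pattern; *p != '\0'; p++) {
        if (*p == '-' && isdigit((unsigned char)p[1])) {
            int rank = p[1] - '0';
            if (letters <= fit && rank <= max_rank)
                best = letters;         /* later hits lie closer to the margin */
            p++;                        /* skip the ranking digit */
        } else {
            letters++;                  /* ordinary character of the word */
        }
    }
    return best;
}
```

With fit = 6, "auto-1mo-5bile" gives 4 (auto-mobile) when max_rank is 4, and 6 (automo-bile) when every ranking is accepted, which mirrors the two right-hand columns of the table.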



Evaluation of array RADR

on return from hyphenation


Every letter (and combined-word hyphen) is described within RADR by one
"RADR word" (= 2 integer fields) starting with RADR field 1. The first field (iiii)
holds the position of the letter relative to the start of the word (afc + iiii); the
second field holds the hyphen bits (h), the hyphen ranking (q) and the letter
possibly to be inserted (ic).

RADR field 0 (bbbb) is the index of the RADR field that holds the parameters for
the hyphenation next to the right margin (see values: had, hpw, ic).
Variable "CAP" is the index of the end of the "RADR" array.

The following example demonstrates the connection between text word and RADR array;
dots (...) in the example word symbolize possible text commands or characters
ignored by hyphenation.


Meaning of "radr[nn]" int-words:
int radr[nn]
         nn =  00      01   02      03   04      05   06 ... "cap"
              bbbb    iiii hqic    iiii hqic    iiii hqic

radr-word  description:
---------------------------------------------------------------------
 bbbb      index to radr word holding hyphenation         | one
           next to right text-margin                      | int-word
---------------------------------------------------------------------
 iiii      index to text character                        |
                                                          |
--------                                                  | one text-
 hqic      h   =hyphenation bits:                         | character
                 0000  insert hyphen, split (standard)    | field
                 0001    no   hyphen, split (compound)    |
                 0011  insert "ic" but no hyphen, split   | =
                 0010  insert "ic", insert hyphen, split  |
                 0100  erase letter, insert hyphen, split | two
                 1xxx  hyphen from exception dictionary   | integer
                                                          | words.
            q  =hyphenation quality (ranking 1-5)         |
                                                          |
           ic  =character to be inserted (00 = no)        |
---------------------------------------------------------------------


Sample word: . . D a c k e l - . . S c h i f f a h r t e n . . .
Index:      00  02   04     08    0B      0F      13    16


sample                       m e a n i n g
word   nn nn  radr[nn]   erase: insert: quality:
-----  -- --  ---- ----  ------ ------- --------
              bbbb:
       00     0013

              iiii hqic:
  D    01 02  0002 0000
  a    03 04  0003 0000
  c    05 06  0004 646B     c     k -      4
  k    07 08  0005 0000
  e    09 0A  0006 0000
  l    0B 0C  0007 0000
  -    0D 0E  0008 1100                    1
  S    0F 10  000B 0000
  c    11 12  000C 0000
  h    13 14  000D 0000
  i    15 16  000E 0000
  f    17 18  000F 2166           f -      1
  f    19 1A  0010 0000
  a    1B 1C  0011 0000
  h    1D 1E  0012 0000
  r    1F 20  0013 0400             -      4
  t    21 22  0014 0000
  e    23 24  0015 0000
  n    25 26  0016 0000
----   ----------------
       27  =  'cap'



Resulting
hyphenation:   . . D a k- k e l - . . S c h i f f- f a h r- t e n . . .
                       1        2               3        4

1   erase  'c',
    insert 'k',   insert '-',   split behind '-' of quality 4.
2                               split behind '-' of quality 1.
3   insert 'f',   insert '-',   split behind '-' of quality 1.
4                 insert '-',   split behind '-' of quality 4.

Attention:
As the example above shows, hyphenation bits may be combined
(e.g. h = 6 = 0100 + 0010 for 'c': erase letter, insert "ic", insert hyphen, split).
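
Decoding the array can be sketched as follows. The layout of "hqic" (h in the top four bits, q in the next four, ic in the low eight bits of a 16-bit value) is read off the sample values above. The function names are hypothetical, and treating q = 0 as "no hyphen at this letter" is an assumption based on the 0000 entries of the sample.

```c
#include <assert.h>

struct hqic {
    unsigned h;                         /* hyphenation bits */
    unsigned q;                         /* hyphenation quality, 1 - 5 */
    unsigned ic;                        /* character to insert, 00 = none */
};

/* Unpack one 16-bit hqic value, e.g. 646B -> h = 6, q = 4, ic = 'k'. */
struct hqic unpack_hqic(unsigned v)
{
    struct hqic r;

    assert(v <= 0xFFFFu);
    r.h  = (v >> 12) & 0xFu;
    r.q  = (v >> 8) & 0xFu;
    r.ic = v & 0xFFu;
    return r;
}

/* Count the hyphen positions described in radr[1] .. radr[cap - 1].
   radr[0] (bbbb) is skipped; each letter uses two fields, iiii and hqic.
   Assumption: q != 0 marks a hyphen position. */
int count_hyphens(const int *radr, int cap)
{
    int nn, count = 0;

    assert(cap >= 1);
    for (nn = 1; nn + 1 < cap; nn += 2) {
        unsigned v = (unsigned)radr[nn + 1] & 0xFFFFu;
        if (unpack_hqic(v).q != 0)
            count++;
    }
    return count;
}
```

Applied to the complete sample array with cap = 27 (hex.), count_hyphens would report the four hyphens marked 1 to 4 above.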