Unternehmensberatung Dieckmann
DHINT.C is the interface between the text system and all DIHYPH hyphenations.
It not only recodes text characters into hyphenation code, but also takes into
account all:
- language specialities
- typographic rules
- compound words
- special character handling
- letter standardization
- text-/typesetting-commands
Calling hyphenation
Before calling hyphenation, the following parameters have to be / may be set
by the text system (see also "DHDEF.C" and "DHEXT.H"):

Parameters to be set each time !
To describe the position of the "word" to be hyphenated within text array "line":

  afc = index of first letter of "word".
  eol = index of letter in "word" exceeding right margin.
  alc = index of last letter of "word".

To get all possible hyphens, set: eol >= alc;

Parameters that may be overwritten !                              Default:

  vs       = Minimum length of first syllable.                    2 *)
  ns       = Minimum length of last syllable.                     2 *)
  minwl    = Minimum word length to be hyphenated.                4 *)
  spprm    = Bit 1: 4711-Splitting allowed
                 2: "eol" = last hyphen position
  dhpath[] = Directory name holding runtime files.                \dihyph\
  exfile[] = Name(s) of special exception file(s).
  codspac  = RAM size for code-files DHCOnn.                      257L *)
  tabspac  = RAM size for one/more table-files DHTAnn.            29000L *)
  excspac  = RAM size for largest exception catalogue.            2000L *)
  exdspac  = RAM size for largest exception record.               8000L *)

Language dependent: *)
*) All default values may be overwritten by the user
by editing file 'DHDFLT.CFG' before calling DIHYPH.
The text system calls hyphenation by:

  rc = DHYPH(line, nn); *)

  rc   =  0: O.K.
         -1: File "DHCOnn" or "DHTAnn" not found or wrong.
         -3: Incorrect language-no.
  line = Character array defined in text system holding word to be hyphenated.
  nn   = language-no. (01 = German, 02 = English, etc.).

*) For Unicode version DHYPH.C has to be replaced by DHYPHEUC.c !
Once DIHYPH is installed, every update version and every new language added
to the system is just a couple of disk files.
No compiling or linking of programs is necessary for that.
Returning from hyphenation
Three external parameters (defined in hyphenation) have to be evaluated by
the text system (see: "quality ranking" and "Evaluation of array radr"):

  had = integer index of "line" character to be shifted to next line
        before inserting hyphen at this position
        (had = 0: no hyphenation possible !).
  hpw = see: Evaluation of array RADR.
  ic  = character holding letter to be inserted at "had".

Note: Index of first "line"-character is 0, etc.
Some examples:   [ ] means "delete"   ( ) means "insert"

  Letter      Return-
  index:      parameters:
  0123456789  had hpw ic   print line after H & J
  ----------  --- --- --   ---------------------------
  aber          0   0      ...........................
                           aber.......................
  Jo-Ann        3   1      ........................Jo-
                           Ann........................
  Schiffahrt    5   2  f   ..................Schif(f-)
                           fahrt......................
  asszony       2   2  z   .....................as(z-)
                           szony......................
  Couve-Flor    5   3  -   ...................Couve(-)
                           -Flor......................
  Dackel        3   6  k   ..................Da[c](k-)
                           kel........................
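The examples above can be turned into a small routine on the text system side. The following is only a sketch, not part of DIHYPH: the function name split_line and its buffer parameters are hypothetical, and it assumes that "hpw" carries the same bit values as the hyphenation bits (h) listed under "Evaluation of array RADR" (the hpw values 1, 2, 3 and 6 in the examples are consistent with that reading). It also assumes that the letter to be erased sits directly in front of "had", with no ignored characters in between.

```c
#include <string.h>

/* Hyphenation bits as listed under "Evaluation of array RADR".
   ASSUMPTION: hpw uses these same bit values. */
#define H_NO_HYPHEN 0x1   /* split without adding a hyphen (compound) */
#define H_INSERT_IC 0x2   /* insert character "ic" before the split   */
#define H_ERASE     0x4   /* erase the letter in front of the split   */

/* Hypothetical helper, not a DIHYPH function.
   Writes the end of the current print line to "head" and the start of
   the next print line to "tail".
   Returns 0 if had is 0 (no hyphenation possible), out of range, or if
   a buffer is too small; returns 1 otherwise. */
int split_line(const char *line, int had, int hpw, char ic,
               char *head, size_t head_size,
               char *tail, size_t tail_size)
{
    size_t len = strlen(line);
    size_t keep, n;

    if (had <= 0 || (size_t)had > len)
        return 0;

    keep = (size_t)had;
    if (hpw & H_ERASE)
        keep--;                      /* drop the letter before the split */

    /* room for: kept letters + optional ic + optional hyphen + NUL */
    if (keep + 3 > head_size || len - (size_t)had + 1 > tail_size)
        return 0;

    memcpy(head, line, keep);
    n = keep;
    if (hpw & H_INSERT_IC)
        head[n++] = ic;
    if (!(hpw & H_NO_HYPHEN))
        head[n++] = '-';
    head[n] = '\0';

    strcpy(tail, line + had);        /* character at "had" starts the next line */
    return 1;
}
```

Called with the values from the table ("Dackel", had = 3, hpw = 6, ic = 'k'), the sketch yields "Dak-" and "kel"; with ("Couve-Flor", 5, 3, '-') it yields "Couve-" and "-Flor".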
Hyphenation demonstration resulting from 8 different
"end-of-line" (right margin) conditions.

Sample word:  4711(System.22)-NN./AB2'Processor

Results for right margin positions 1 to 8:

     end of line                         next line
  1  .................................   4711(System.22)-NN./AB2'Processor
  2  .................................   4711(System.22)-NN./AB2'Processor
  3  ........................4711(Sys-   tem.22)-NN./AB2'Processor........
  4  ........................4711(Sys-   tem.22)-NN./AB2'Processor........
  5  .................4711(System.22)-   NN./AB2'Processor................
  6  .............4711(System.22)-NN./   AB2'Processor....................
  7  .............4711(System.22)-NN./   AB2'Processor....................
  8  ..4711(System.22)-NN./AB2'Proces-   sor..............................
Action Codes
Before entering the DIHYPH hyphenation logic, every text character is automatically
converted into a language-specific DIHYPH Action Code by interface program
DHINT.C using code-file "DHCOnn" (nn = language-no.).

  Text        Code                     Action-Code (hex.)
  ----------  -----------------------  ------------------------------
  letters:                             01 - 1E  depending on language
  others:     Ignored character        00
              Space                    20
  '           Apostrophe               22
  *           Asterisk                 23
  - /         Hyphen characters        24
              Forbidden hyphenation    25
  +           Plus                     26
  #           Number sign              27
  .           Point                    28
  ,           Comma                    29
  (           Bracket on               2A
  )           Bracket off              2B
              Forced hyphenation       2C
  0 - 9       Numbers                  30 - 39
  `           Accent grave             41
  '           Accent acute             42
  ¨           Accent dieresis          43
  °           Accent angstrom          44
              other accents            00
              all other characters     21
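As an illustration of the fixed part of this table, the following sketch maps plain ASCII characters to their Action Codes. It is not how DHINT.C works internally (the real conversion is driven by the DHCOnn code file), and the function name action_code is hypothetical. Letters return -1 here because their codes (01 - 1E) depend on the language; the accent rows other than grave, and the "forbidden" and "forced" hyphenation codes, are left out because the table does not tie them to a single ASCII character.

```c
/* Hypothetical sketch: Action Codes for non-letter ASCII characters,
   taken from the table above. Returns -1 for the letters A-Z and a-z,
   whose codes are language dependent and come from DHCOnn.
   ASSUMPTION: input is plain 7-bit ASCII; anything not listed falls
   into the "all other characters" row (21). */
int action_code(unsigned char ch)
{
    if (ch >= '0' && ch <= '9')
        return 0x30 + (ch - '0');          /* Numbers: 30 - 39 */
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z'))
        return -1;                         /* language dependent */

    switch (ch) {
    case ' ':  return 0x20;                /* Space             */
    case '\'': return 0x22;                /* Apostrophe        */
    case '*':  return 0x23;                /* Asterisk          */
    case '-':
    case '/':  return 0x24;                /* Hyphen characters */
    case '+':  return 0x26;                /* Plus              */
    case '#':  return 0x27;                /* Number sign       */
    case '.':  return 0x28;                /* Point             */
    case ',':  return 0x29;                /* Comma             */
    case '(':  return 0x2A;                /* Bracket on        */
    case ')':  return 0x2B;                /* Bracket off       */
    case '`':  return 0x41;                /* Accent grave      */
    default:   return 0x21;                /* all other characters */
    }
}
```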
DHCOnn code files
(nn =language no.) have a specific construction:
Lines starting with:
  1)   Blank or ---    are treated as comment lines.
  1.1) ---A            Any non-hyphen character in position four of the first
                       line means that letters A to Z and a to z are standard
                       ASCII (hex. 41 - 5A and 61 - 7A). Else see point 3.1.
  2)   ' (Apostrophe)  have a maximum length of 25 characters each.
                       These lines have to start and end with ' apostrophes;
                       the text between them is error text used in exception
                       dictionary programs. This text may be translated into
                       other languages.

Construction of action code
  3)   UUAA            An Action Code is represented by 4 hexadecimal digits.
                       Lower two hex. digits (AA) are described above
                       (see: Action Codes).
  3.1)                 Higher two hex. digits (UU) are used for 'special'
                       lower-/uppercase conversion: every lowercase
                       non-standard letter holds the hex. position of the
                       corresponding uppercase letter in the first two hex.
                       digits (UU).
                       Sample: Action Code 8E01 means: letter ä gets Action
                       Code 01 and is converted to letter Ä on hex. position 8E
                       (see: DHCO02 table next page !).

Note
Special command codes from word composition systems are not allowed in
DHCOnn files !
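To make the UUAA layout concrete, here is a minimal sketch that splits one such entry into its two parts. The function name parse_uuaa is hypothetical and the sketch only looks at the first four characters of the text it is given; how entries are arranged within a DHCOnn line is not covered here.

```c
/* Hypothetical helper: decodes the first four characters of "text"
   as hex digits UUAA.
   *upper_pos receives UU (hex. position of the uppercase letter),
   *action    receives AA (the Action Code).
   Returns 1 on success, 0 if one of the four characters is not a
   hex digit (this includes a text shorter than four characters). */
int parse_uuaa(const char *text, unsigned *upper_pos, unsigned *action)
{
    unsigned value = 0;
    int i;

    for (i = 0; i < 4; i++) {
        char c = text[i];
        unsigned digit;

        if (c >= '0' && c <= '9')
            digit = (unsigned)(c - '0');
        else if (c >= 'A' && c <= 'F')
            digit = (unsigned)(c - 'A') + 10;
        else if (c >= 'a' && c <= 'f')
            digit = (unsigned)(c - 'a') + 10;
        else
            return 0;

        value = value * 16 + digit;
    }

    *upper_pos = value >> 8;      /* UU */
    *action    = value & 0xFF;    /* AA */
    return 1;
}
```

For the sample entry 8E01 the sketch reports uppercase position 8E and Action Code 01.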
Hyphenation quality ranking
DIHYPH hyphenations are not only able to hyphenate with highest accuracy
but can also return a ranking from 1 (best) to 5 (worst) for every hyphen.
Although grammatically correct, some hyphenations are much better than others.
So very often it is better not to select the hyphen next to the right margin but
another one if its ranking is better.
In addition, a hyphen ranked better than 4 is in all probability not an incorrect
hyphenation.
Note: Do not look for best rankings only !
Hyphen ranking is either defined by algorithm and program tables or may be
inserted as digits 1 to 5 in the exception dictionary.
A hyphen ranking of 4 need not be a bad one; sometimes it is the only one available:
sat-4el-4lite / no-4to-4rious / Chi-4hua-4hua / ran-4dom-4ize
Some examples showing effect of hyphenation ranking
(1, 2 = good   3 = quite good   4 = acceptable   5 = bad):

  DIHYPH hyphenation with ranking is much better   than without ranking
  ----------------------------------------------   --------------------
  English:
  auto-1mo-5bile            auto-mobile            automo-bile
  chemo-1ther-4apy          chemo-therapy          chemother-apy
  ex-1ca-5vate              ex-cavate              exca-vate
  micro-1or-5gan-4ism       micro-organism         microor-ganism
  mid-1sum-4mer             mid-summer             midsum-mer
  mis-1in-5formed           mis-informed           misin-formed
  mon-1ox-5ide              mon-oxide              monox-ide
  per-2se-5cute             per-secute             perse-cute
  French:
  bis-1an-5nuel             bis-annuel             bisan-nuel
  cis-1al-5pine             cis-alpine             cisal-pine
  co-2ad-5ju-4teur          co-adjuteur            coad-juteur
  trans-1al-5pine           trans-alpine           transal-pine
  German:
  ent-1ge-5gen-1tre-4ten    ent-gegentreten        entge-gentreten
  Fahr-1er-5laub-4nis       Fahr-erlaubnis         Fahrer-laubnis
  Non-4nen-2klo-4ster       Nonnen-kloster         Nonnenklo-ster
  See-1ad-5ler              See-adler              Seead-ler
  Volks-1or-5che-4ster      Volks-orchester        Volksor-chester
  wohl-1er-5ge-4hen         wohl-ergehen           wohler-gehen
  Zi-4vil-1an-5zug          Zivil-anzug            Zivilan-zug
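How a text system weighs ranking against closeness to the right margin is its own decision. The following sketch shows one possible policy, chosen here only for illustration: take the hyphen closest to the margin whose ranking is not worse than a caller-supplied limit. The function name choose_hyphen and both parameters are hypothetical; the input uses the "auto-1mo-5bile" notation from the examples, not the RADR array.

```c
#include <ctype.h>

/* Hypothetical helper, one possible selection policy.
   ranked:      word in the notation used above, e.g. "auto-1mo-5bile"
                (a '-' followed by a digit marks a hyphen and its ranking).
   max_letters: number of letters that still fit before the right margin
                (the hyphen itself is not counted).
   max_rank:    worst ranking the caller accepts (1 = best, 5 = worst).
   Returns the number of letters that stay on the first line,
   or 0 if no hyphen qualifies. */
int choose_hyphen(const char *ranked, int max_letters, int max_rank)
{
    int letters = 0;   /* letters seen so far */
    int best = 0;
    const char *p;

    for (p = ranked; *p != '\0'; p++) {
        if (*p == '-' && isdigit((unsigned char)p[1])) {
            int rank = p[1] - '0';
            if (letters <= max_letters && rank <= max_rank)
                best = letters;    /* later matches lie closer to the margin */
            p++;                   /* skip the ranking digit */
        } else {
            letters++;
        }
    }
    return best;
}
```

With "auto-1mo-5bile", room for six letters and a limit of 4, the sketch returns 4 (auto-mobile) and skips the rank 5 hyphen after "automo"; with a limit of 5 it returns 6.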
Evaluation of array RADR
On return from hyphenation,
every letter (and combined-word hyphen) is described within RADR by one
"RADR word" (= 2 integer fields), starting with RADR field 1. The first field (iiii)
holds the position of the letter relative to the start of the word (afc + iiii); the
second field holds the hyphenation bits (h), the hyphen ranking (q) and the letter
possibly to be inserted (ic).
RADR field 0 (bbbb) is the index to the RADR field that holds the parameters for the
hyphenation next to the right margin (see values: had, hpw, ic).
Variable "CAP" is the index to the end of the "RADR" array.
The following example demonstrates the connection between text word and RADR array;
the dots (...) in the example word symbolize possible text commands or characters
ignored by hyphenation.
Meaning of "radr[nn]" int-words:

  int radr[nn]
  nn =  00    01   02    03   04    05   06    ...  "cap"
        bbbb  iiii hqic  iiii hqic  iiii hqic

  radr-word  description:
  ---------------------------------------------------------------------
  bbbb       index to radr word holding hyphenation     | one
             next to right text-margin                  | int-word
  ---------------------------------------------------------------------
  iiii       index to text character                    |
  ----------------------------------------------------- | one text-
  hqic       h = hyphenation bits:                      | character
               0000 insert hyphen, split (standard)     | field
               0001 no hyphen, split (compound)         |
               0011 insert "ic" but no hyphen, split    | =
               0010 insert "ic", insert hyphen, split   |
               0100 erase letter, insert hyphen, split  | two
               1xxx hyphen from exception dictionary    | integer
                                                        | words.
             q = hyphenation quality (ranking 1-5)      |
             ic = character to be inserted (00 = no)    |
  ---------------------------------------------------------------------

  Sample word:  . . D a c k e l - . . S c h i f f a h r t e n . . .
  Index:        00  02  04      08    0B      0F      13    16

  sample                        m e a n i n g
  word   nn  nn  radr[nn]       erase:  insert:  quality:
  -----  --  --  ---- ----      ------  -------  --------
  bbbb:  00      0013
  iiii hqic:
  D      01  02  0002 0000
  a      03  04  0003 0000
  c      05  06  0004 646B      c       k -      4
  k      07  08  0005 0000
  e      09  0A  0006 0000
  l      0B  0C  0007 0000
  -      0D  0E  0008 1100                       1
  S      0F  10  000B 0000
  c      11  12  000C 0000
  h      13  14  000D 0000
  i      15  16  000E 0000
  f      17  18  000F 2166              f -      1
  f      19  1A  0010 0000
  a      1B  1C  0011 0000
  h      1D  1E  0012 0000
  r      1F  20  0013 0400              -        4
  t      21  22  0014 0000
  e      23  24  0015 0000
  n      25  26  0016 0000
  -----  ----------------
         27 = 'cap'

  Resulting hyphenation:

  . . D a k- k e l - . . S c h i f f- f a h r- t e n . . .
           1       2                3        4

  1  erase 'c', insert 'k', insert '-', split behind '-' of quality 4.
  2  split behind '-' of quality 1.
  3  insert 'f', insert '-', split behind '-' of quality 1.
  4  insert '-', split behind '-' of quality 4.
Attention:
As the example above shows, combinations of hyphenation bits are possible
(h = 6 in 646B combines 0100 "erase letter" with 0010 "insert ic") !
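Reading the RADR array on the text system side could look like the sketch below. It is an illustration only: the names RadrEntry, decode_radr and list_hyphens are hypothetical, and two things are assumed from the sample values rather than stated by the description: that each hqic field is a 16-bit value with one hex digit for h, one for q and two for ic, and that q = 0 means "no hyphen behind this letter".

```c
#include <stdio.h>

/* One decoded "RADR word" (two integer fields).
   ASSUMPTION: hqic is laid out as hex digits h, q, ic, ic,
   as the sample values 646B, 1100, 2166 and 0400 suggest. */
typedef struct {
    int index;   /* iiii: position of the letter, relative to afc  */
    int h;       /* hyphenation bits                                */
    int q;       /* ranking 1-5; 0 assumed to mean "no hyphen here" */
    int ic;      /* character to be inserted, 0 = none              */
} RadrEntry;

/* Decodes the RADR word stored in radr[nn] and radr[nn + 1]. */
RadrEntry decode_radr(const int *radr, int nn)
{
    RadrEntry e;
    int hqic = radr[nn + 1];

    e.index = radr[nn];
    e.h  = (hqic >> 12) & 0xF;
    e.q  = (hqic >> 8) & 0xF;
    e.ic = hqic & 0xFF;
    return e;
}

/* Prints every hyphen found in radr[1] .. radr[cap - 1]
   and returns how many were found. */
int list_hyphens(const int *radr, int cap)
{
    int nn, count = 0;

    for (nn = 1; nn + 1 < cap; nn += 2) {
        RadrEntry e = decode_radr(radr, nn);
        if (e.q == 0)
            continue;              /* assumed: no hyphen behind this letter */
        printf("index %02X: h=%X q=%d ic=%02X\n",
               (unsigned)e.index, (unsigned)e.h, e.q, (unsigned)e.ic);
        count++;
    }
    return count;
}
```

Applied to the sample array above (cap = 27 hex), the sketch finds four hyphens, behind the letters at index 04, 08, 0F and 13.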