Unternehmensberatung Dieckmann

DITECT Interface




Calling DITECT and returning from it



Because DITECT uses some DIHYPH program functions, the calling program must
set the desired pathname in both arrays, "dtpath[100]" (for DITECT) and
"dhpath[100]" (for DIHYPH), before DITECT (or DIHYPH) is called.

The typesetting system defines the text area to be checked by DITECT as follows:

NT:                          /* Get next text area for spell-checking */
       :
   afc = int-index of first text-character to be checked.
   alc = int-index of last  text-character to be checked.
NP:
   rc  = DTECT (nn, text);   /* nn = int-language-no. (1 =German)     */
   if (rc   == -1) ... ;     /* Program error, missing files. Abort   */
   if (errm >   0) ... ;     /* Evaluate error markings.              */
   if (afc  < alc) goto NP;  /* Check remaining part of text.         */
   else            goto NT;  /* Now get next text-area for checking   */

END:                         /* At end of job, typesetting-system     */
   DHCLOSAL();               /* closes all open files and             */
   if (DHSTAT(dtxc) == 0)    /* If "DTnnEXC.mmm" is empty,            */
     { DHDELET(dtmp);        /*   delete "DTnnTMP.mmm"                */
       DHDELET(dtxc);        /*   and    "DTnnEXC.mmm"                */
     }
   else
     { if (etmp == 1)        /* else and if wanted so,                */
       DHDELET(dtmp);        /* delete only "DTnnTMP.mmm"             */
     }
   DHFREEAL();               /* then free all RAM-allocations.        */

'afc' and 'alc' define the text area to be spell-checked.
The size of this area is unlimited, because it is checked sentence by sentence.

After DITECT returns with 'errm' > 0, the typesetting system has to
evaluate the character array 'charr[ ]' to find the marked errors and has to
position the text-editor cursor directly on the erroneous text position.
Correct words that DITECT has wrongly marked as not found in the dictionary may
be stored immediately, "short-term" or "medium-term" (see: 'ftmp').
From then on, DITECT will 'know' them.

Where possible, DITECT always stops checking at the end of a sentence, stores
the index of the next sentence in 'afc' and returns to the calling program.
After evaluating all marked errors, the calling program calls DITECT again,
until the defined text area has been checked.
When 'afc' > 'alc', the calling program defines the next text area, and so on.
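
The NP loop above can also be written without labels. The following is only a minimal sketch: 'afc', 'alc' and 'errm' are assumed to be plain int globals, and check_fn, check_area and fake_dtect are hypothetical names introduced for illustration. fake_dtect merely imitates DITECT's "one sentence per call" behaviour so that the loop can be exercised without the real library.

```c
#include <assert.h>
#include <stddef.h>

/* Variables named after the document's 'afc', 'alc' and 'errm'.
   ASSUMPTION: plain int globals; the real declarations may differ. */
int afc;   /* index of the first character still to be checked */
int alc;   /* index of the last character to be checked        */
int errm;  /* number of errors marked by the most recent call  */

/* Hypothetical function type standing in for DTECT(nn, text). */
typedef int (*check_fn)(int lang, const char *text);

/* Checks text[first..last] sentence by sentence.
   Returns the total number of marked errors, or -1 on a program error. */
int check_area(check_fn check, int lang, const char *text,
               int first, int last, void (*on_errors)(int count))
{
    int total = 0;

    assert(check != NULL && text != NULL);
    afc = first;
    alc = last;
    do {
        int before = afc;
        int rc = check(lang, text);

        if (rc == -1)
            return -1;               /* program error, missing files */
        if (errm > 0) {
            total += errm;
            if (on_errors != NULL)
                on_errors(errm);     /* evaluate the error markings here */
        }
        if (afc <= before)
            return -1;               /* defensive: checker did not advance */
    } while (afc < alc);

    return total;
}

/* Stand-in for DTECT, for illustration only: "checks" one sentence,
   reports one error if the sentence contains '#', and moves afc past
   the sentence's closing period. */
int fake_dtect(int lang, const char *text)
{
    int i = afc;

    (void)lang;
    errm = 0;
    while (i <= alc && text[i] != '.') {
        if (text[i] == '#')
            errm = 1;
        i++;
    }
    afc = i + 1;
    return 0;
}
```

In a real integration the typesetting system would pass a wrapper around DTECT instead of fake_dtect, and would inspect 'charr' inside the on_errors callback. The "did not advance" test is an added safeguard and is not part of the documented calling sequence.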



Return-array 'charr'



After DITECT returns, the typesetting system has to evaluate the array 'charr'
   to get the position and type of each spelling error.
Charr-field 0     = 2-byte error count.
Charr-field 1 - n = 4 bytes each, holding character information.
Characters that are unimportant for the spelling check are skipped.
The length of 'charr' is: 0 to cap-1 ('cap' = int-value).
The maximum length of the 'charr' array is defined by the int-value 'charm'.


                          error         error
                            |             |
Example-sentence:           i t ' s  a  t y x t - l i n e .

                          | |        |  |       |       |   |
Hex. character-index:    00 01      06 08      0C      10  12
     _______________________|        |  |       |       |________________________________________
     |                           ____|  |       |____________________                           |
|    |                           |      |                           |                           |    |
| charr -field:                                                                                      |c
| 0|  1   |  2   |  3   |  4   |  5   |  6   |  7   |  8   |      |      |                  ...  n   |a
|  |      |      |      |      |      |      |      |      |      |      |      |      |      |      |p
|ee|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|ii|c|e|
|02|01|2|2|02|0|0|03|0|0|04|0|0|06|1|0|08|1|0|09|0|1|0A|0|0|0B|0|0|0C|0|0|0D|0|0|0E|0|0|0F|0|0|10|0|0|
|--|------|------|------|------|------|------|------|------|------|------|------|------|------|------|-
|0 |2  4 5|6  8 9|10    |      |      |22     ...                                           = Byte-no.

  |  | | |                                         |
  |  | | |                                         |
  |  | | |____  Error-Byte set ! __________________|
  |  | |          (see:  "Error-type")
  |  | |
  |  | |______  Char.-byte: Char.-type is
  |  |            00      = Letter, hyphen (-), apostrophe (') or period (.)
  |  |            01      = Start of word
  |  |            02      = Start of sentence
  |  |            04      = Ending abbreviation dot  ( etc. )
  |  |
  |  |________  Two index-bytes  ( = position of text-character).
  |___________  No. of errors found  (or int-value  'errm' ).
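
The layout shown above (a 2-byte error count followed by 4-byte fields of two index bytes, one char-type byte and one error byte) could be decoded roughly as follows. This is a sketch under stated assumptions: the byte order of the two-byte values is not given in this description, so low-byte-first is assumed here, and 'cap' is taken to be the number of bytes in use. All function and struct names are hypothetical; charr_example holds the first seven fields of the diagram.

```c
#include <assert.h>
#include <stddef.h>

/* One decoded 4-byte charr field. */
struct charr_field {
    unsigned index;       /* position of the character in the text        */
    unsigned char type;   /* 0 letter, 1 word start, 2 sentence start,
                             4 ending abbreviation dot                     */
    unsigned char error;  /* 0 = no error, otherwise a code from errtp[]  */
};

/* ASSUMPTION: two-byte values are stored low byte first. */
static unsigned read_u16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Byte offset of field k (k >= 1): field 1 starts at byte 2. */
size_t charr_offset(size_t k)
{
    assert(k >= 1);
    return 2 + 4 * (k - 1);
}

/* Field 0: number of errors found. */
unsigned charr_error_count(const unsigned char *charr)
{
    assert(charr != NULL);
    return read_u16(charr);
}

/* Decodes field k (k >= 1). */
struct charr_field charr_get(const unsigned char *charr, size_t k)
{
    struct charr_field f;
    const unsigned char *p;

    assert(charr != NULL);
    p = charr + charr_offset(k);
    f.index = read_u16(p);
    f.type = p[2];
    f.error = p[3];
    return f;
}

/* Returns the number of the first field >= from whose error byte is set,
   or 0 if there is none. ASSUMPTION: cap counts the bytes in use. */
size_t charr_next_error(const unsigned char *charr, size_t cap, size_t from)
{
    size_t fields, k;

    assert(charr != NULL && cap >= 2 && from >= 1);
    fields = (cap - 2) / 4;
    for (k = from; k <= fields; k++)
        if (charr[charr_offset(k) + 3] != 0)
            return k;
    return 0;
}

/* First seven fields of the diagram above, low byte first. */
const unsigned char charr_example[] = {
    0x02, 0x00,              /* field 0: two errors                  */
    0x01, 0x00, 0x02, 0x02,  /* field 1: sentence start, error 2     */
    0x02, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x00, 0x00,
    0x04, 0x00, 0x00, 0x00,
    0x06, 0x00, 0x01, 0x00,  /* field 5: start of word               */
    0x08, 0x00, 0x01, 0x00,  /* field 6: start of word               */
    0x09, 0x00, 0x00, 0x01   /* field 7: error 1                     */
};
```

With this numbering, field 6 starts at byte 22, which agrees with the byte numbers printed under the diagram.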


Error-type

DITECT sets 7 different error indices for different types of spelling errors.
In the array "errtp[]", a specific error code may be defined for every error
index, just as the text/publishing system needs it.

For example:
errtp[] = { 2, 4, 6, 8, 10, 12, 14, 0 }
or better:
errtp[] = { 1, 2, 3, 4, 5, 6, 7, 0 }
            |  |  |  |  |  |  |  | Index  Type of spelling error        
            |  |  |  |  |  |  |  |_  0    unused
            |  |  |  |  |  |  |____  7    automatic replacement
            |  |  |  |  |  |_______  6    word refused by user
            |  |  |  |  |__________  5    space is missing
            |  |  |  |_____________  4    double words
            |  |  |________________  3    wrong capital initial letter
            |  |___________________  2    wrong small   initial letter
            |______________________  1    incorrect spelling
In the case of:

    errtp[] = { 1, 2, 3, 1, 1, 1, 1 }
all errors are of type "incorrect spelling" (=1), except for
"wrong small" (=2) or "wrong capital" (=3) initial letter.


The error type defined in "errtp[]" is stored in the "error-byte" of the array
"charr" whenever an error occurs.
If the user does not want words of a specific error type to be marked by DITECT,
that error type may be set to "0" in "errtp[]". For example, with:

    errtp[] = { 1, 2, 0, 1,  1,  1, 1 }
all errors resulting from a wrong capital initial letter are ignored by DITECT.


Two consecutive words

a) Both words are correct:
They may still be incorrect as a combination (e.g. 'Barbara Streisand') if this
combination is listed as refused in the dictionary.
Because checking all combinations reduces program performance, it is only done
when +8 is added to the switch "mexsw".
When such an expression is incorrect (= refused), +50 is added to error type
6 or 7 (giving 56 or 57) to signal that both words together
- have to be rejected (error-type 56) or
- have to be automatically replaced (error-type 57).

b) One or both words are incorrect (e.g. 'Barbra'):
They may still be correct as a combination (e.g. 'Barbra Streisand') if this
combination is found in the dictionary.
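
The +50 convention from case a) can be undone when the error byte is read. A minimal sketch, assuming that the codes placed in "errtp[]" are all below 50 so that the offset is unambiguous; the struct and function names are hypothetical:

```c
#include <assert.h>

/* An error byte split into its base type and the two-word flag. */
struct error_code {
    int type;       /* error type without the +50 offset          */
    int two_words;  /* 1 if the marking covers both words, else 0 */
};

/* ASSUMPTION: every code in errtp[] is below 50. */
struct error_code split_error_byte(int error_byte)
{
    struct error_code ec;

    assert(error_byte >= 0);
    ec.two_words = error_byte >= 50;
    ec.type = ec.two_words ? error_byte - 50 : error_byte;
    return ec;
}
```

For an error byte of 56 this yields type 6 with the two-word flag set; a plain 7 yields type 7 with the flag clear.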


Error-type 6 (or 56): Rejected expression

When DITECT marks an expression with error-type 6 (or 56 = two words), a list is
displayed showing one or more words, one per line. The user may select one of
these words to replace the incorrect text word.
When the replacement is at the start of a sentence, the initial letter of the
selected proposal must be a capital. The calling program can do this easily with
the following function, where "ptr_prop" is a char pointer to the selected proposal:
DTCAPIT (ptr_prop);


Error-type 7 (or 57): Automatic replacement

When DITECT marks an expression with error-type 7 (or 57 = two words), the
calling system will find the replacement expression in the first or second line
of the proposal list (percentage 101), but must not display this proposal list!
When the automatic replacement occurs at the start of a sentence, DITECT converts
the initial letter to a capital automatically.



Proposal word list



When DITECT has found a spelling error, the array 'prbuf' holds up to 20 words
that are most similar to the erroneous word.

Every word in this proposal list is stored in 50 bytes: the first byte holds
the binary percentage of similarity, followed by the word, terminated by a
binary zero. Unused 'prbuf' lines have a percentage of binary zero.

For example, when
errours is an unknown or incorrect word, the proposal list looks like:

82 e r r o r s
81 e r r o r
71 e r r o r f u l
66 E r r o l
 :
 :
|  |                        |
0  1 2 3 4 5       ...     49   = 'prbuf'-index  0-49
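
Reading the proposal list from the calling side might look like the sketch below. It assumes that 'prbuf' can be viewed as 20 lines of 50 bytes each; the real declaration is not shown in this description. The helper names and prbuf_example are hypothetical, and the example buffer reproduces the four proposals listed above.

```c
#include <assert.h>
#include <stddef.h>

enum { PR_LINES = 20, PR_LINE_LEN = 50 };

/* Similarity percentage of a line; 0 means the line is unused. */
int proposal_percent(unsigned char (*prbuf)[PR_LINE_LEN], int line)
{
    assert(prbuf != NULL && line >= 0 && line < PR_LINES);
    return prbuf[line][0];
}

/* Zero-terminated word of a line, or NULL if the line is unused. */
const char *proposal_word(unsigned char (*prbuf)[PR_LINE_LEN], int line)
{
    if (proposal_percent(prbuf, line) == 0)
        return NULL;
    return (const char *)&prbuf[line][1];
}

/* Number of lines that hold a proposal. */
int proposal_count(unsigned char (*prbuf)[PR_LINE_LEN])
{
    int line, n = 0;

    for (line = 0; line < PR_LINES; line++)
        if (proposal_percent(prbuf, line) != 0)
            n++;
    return n;
}

/* The example list for "errours"; the remaining lines stay zero. */
unsigned char prbuf_example[PR_LINES][PR_LINE_LEN] = {
    "\x52" "errors",    /* 82 */
    "\x51" "error",     /* 81 */
    "\x47" "errorful",  /* 71 */
    "\x42" "Errol"      /* 66 */
};
```

A display routine would loop over the lines, stop showing entries whose percentage is zero, and print the percentage next to each word if the user interface calls for it.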


Attention

When DITECT has to check not just one word but a text article with one or more
sentences, the calling system has to call DITECT as follows:

1. Set the switch "prbs= 0;" before calling, so that DITECT finds all error words
within the text and stores the error index and error type in the array 'charr'.

2. Do not display all error-marked words at once, but one after the other.

3. Before displaying a word, evaluate the type of error in the array 'charr' and
decide whether a proposal list is useful for correcting this word (normally only
for error-types 1, 3 and 6). If so, call DITECT again to check only that
erroneous word, but with the switch "prbs= 1;".
After DITECT has checked the word, display the proposal list, wait for user
action and look for the next error in the array 'charr' (repeat step 3, and so on).


In case of:
- Proposal list switch 'prbs' = 0 (see file 'DTDFLT.CFG'),
- Double words (... word word ...),
- Incorrect small initial letter at start of sentence,
- Missing space error
a proposal list is not stored (all 20 percentages in 'prbuf' are binary zero).

When +1 is added to the parameter "usuk" (see file: dtdflt.cfg), unwanted exception
words like Photo* are always displayed as the first proposal (followed by three
asterisks, ***) to show why this (perhaps correct-looking) word is marked by DITECT.

Program speed:
When DITECT searches for proposal words, it assumes that the first two letters
of the word are correct. For example, the incorrect word "widerholen" would yield
the correct proposal "wiederholen", but for "weiderholen" that proposal is not
found, because the second letter is incorrect. Here the switch "usuk" +2 (= 2 or 3)
can help, but program performance drops, as many more words have to be checked.
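
The effect of the two-letter assumption can be illustrated with a simple prefix test. This is not DITECT's search routine, only a sketch of the rule as described; the function name is hypothetical and the comparison is assumed to be case-sensitive.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if 'candidate' begins with the same two letters as 'typed',
   i.e. if it would still be considered under the two-letter rule. */
int shares_first_two_letters(const char *typed, const char *candidate)
{
    assert(typed != NULL && candidate != NULL);
    return strncmp(typed, candidate, 2) == 0;
}
```

Under this rule "wiederholen" remains a candidate for "widerholen" (both begin with "wi"), but not for "weiderholen" (which begins with "we").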



File - description



File DTnn.BIN
is the strongly compressed binary dictionary containing (nearly) all words
or expressions of language nn.
The version number (4 digits, e.g. 3.09) is stored at file address plus 18.

File DTEXnn.TXT
has to be considered an appendix to file DTnn.BIN.
When it grows very large, this file should be merged into DTnn.BIN, which
can only be done by U.B. Dieckmann.
This may happen perhaps once a year, perhaps never.
After that, this file has to be erased from the user's disk.

File DTnnEXC.mmm
When DITECT marks an error, the user may decide whether the word is really
incorrect or not. If it is incorrect, the user will correct it.
If it is correct, the user may decide whether to store it medium-term in this
file, depending on the switch 'ftmp'.
There must be an interaction between user and system to call the storing
function.
There may be files like this with up to 999 different mmm-numbers.
These files are never erased automatically until they have been checked
and stored in file "DTEXnn.TXT" by an authorized person, e.g. by calling the
program "DTEXA.BAT", which also automatically creates the new catalogue
"DTEXnn.CAT" and erases the medium-term files.

File DTnnTMP.mmm
If a marked word is correct, the user may decide whether to store it short-term
in this file, depending on the switch 'ftmp'.
There must be an interaction between user and system to call the storing
function.
This binary file prevents DITECT from stopping again and again at the same
unknown expression. Once a word is stored here, it is not marked again.
As such expressions (e.g. names etc.) are usually specific to one text document,
this file should be erased (ftmp = 5 or 6) at the end of the document, as keeping
it for longer would slow the program down considerably.


User-dependent file-no. 'usef'

Parameter 'usef' = 0;
Every workstation automatically gets a new free file-no. mmm for files
"DTnnTMP.mmm" and "DTnnEXC.mmm" (nn = language-no., e.g. 01 for German).

Parameter 'usef' = mmm;
The workstation that defined this number (mmm e.g. = 24) works with the files
"DTnnTMP.24" and "DTnnEXC.24", whether or not these files already exist.
This workstation-defined user number 'usef' must not be set via the configuration
file "DTDFLT.CFG", because this file is on the server and is therefore valid for
every user. Instead, the 'usef' definition has to be requested from the user by
the calling system "HERMES" and set in 'extern int usef'.

In this case the 'usef' definition has to be erased from the file "DTDFLT.CFG";
otherwise the 'usef' value set by the calling system would be overwritten by the
'usef' definition in "DTDFLT.CFG"!

The same may happen with other user-dependent definitions such as
'csch', 'ftmp', 'minwl', 'minkl' and 'mexsw'!



Calling "short-" or "medium-term" storage


DTSTORW ( text, wi, ftmp);
            |    |   |_   ftmp =  Storage-switch
            |    |
            |    |_____   wi   =  Index to start of word in array  'charr'
            |               e.g.:  word "tyxt" has  'wi' = 22 .
            |               Textword thus defined is stored without
            |               possible typesetting-commands into  "short-"
            |               or  "medium-term"  file, depending on value
            |               of  "ftmp".
            |
            |__________   text to be checked.


To store words, another function may be used instead of "DTSTORW"
when the single word ends with a binary zero and contains no typesetting
commands:

DTFILSS (word, 1, ftmp);
          |         |___ 0  file-storage switched off
          |              1  short-term file-storage
          |              2  short- and medium-term storage
          |
          |_____________ pointer to word to be stored.



Global values definable by user.


The following values may be changed either via the file  DTDFLT.CFG
or  - if possible -  by the publishing system via keyboard:

name       value Meaning                                      default
mexsw            multiple search:                                 6
             0   = switched off
             1   = on combined-words (e.g. Jo-Ann)
             2   = on combined-words and
                   on compoundwords (see: minkl)
            +4   = on double words  ".. word word .."
            +8   = on two correct neighbouring words

minkl        n   Minimum length of word compounds.                5

prbs             proposal-word-list:                              1
             0   = switched off
             1   = switched on

usuk         1   = refused words are displayed***                 1
            +0   = Standard proposal search (improved speed)
            +2   = Standard proposal search (lower speed)
            +4   = Strong   proposal search (slow speed)
            +8   = Limited  proposal search (high speed)

csch             Check capital/small initial letter:              6
             0   = switched off
             1   = within sentence
             2   = at start of and within sentence
            +4   = Don't check words with 1-4
                      capital letters, e.g.  UBD
            +8   = Don't check words following "

ftmp             Storage of new (unknown) words:                  6
             0   = switched off
             1   = short-term  (write/read)
             2   = medium- and short-term
            +4   = delete short-term file at end of job

usef       nnn   Use short-/medium-term file-no.  nnn
             0   = new file-no. is automatically defined          0

charm            Max. text-size (charm/4 = 2500 characters)   10000
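
The additive switches in the table ("+4", "+8") read like bit flags on top of a base value of 0, 1 or 2. Under that assumption, which this description does not state explicitly, a value such as the default mexsw = 6 can be split as sketched below. The names are hypothetical, and the split applies to 'mexsw', 'csch' and 'ftmp' only; 'usuk' uses +2 as an option and would need a different split.

```c
#include <assert.h>

/* A switch value split into its base setting and its additive options. */
struct additive_switch {
    int base;   /* 0, 1 or 2                  */
    int plus4;  /* 1 if the +4 option is set  */
    int plus8;  /* 1 if the +8 option is set  */
};

/* ASSUMPTION: +4 and +8 are independent bits added to the base value. */
struct additive_switch split_switch(int value)
{
    struct additive_switch s;

    assert(value >= 0);
    s.base  = value & 3;
    s.plus4 = (value & 4) != 0;
    s.plus8 = (value & 8) != 0;
    return s;
}
```

The default mexsw = 6 then reads as base 2 (combined words and compound words) plus the double-word check, without the search on two correct neighbouring words.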