Program t1accent -- adds new accents to Type1 fonts (pfb)
=========================================================
                                            Petr Ol\v{s}\'ak
                                            April 1998

The program t1accent reads a font in Type1 format (pfb) and adds new
accented letters using the callsubr Type1 operator. The behavior of
the program is controlled by a configuration file. The following
information is stored there:

  -- the names of the new characters,
  -- the file with the encoding vector of the newly generated font,
  -- the declaration of the position of the accents with respect to
     the base characters,
  -- the accent shape defined as a PostScript procedure (using moveto,
     lineto, curveto, rmoveto...)

The output of the program is a text file with the new font in the
pseudo-PostScript language, in the same format as the output of
t1disasm (a program from the free t1utils package). You can make
further modifications to this file manually (or with common text
filters) and then convert the result into a pfb file using t1asm
(from the t1utils package).

The hinting information for a new accented character is copied from
the base character. Usually, new accents do not need new hinting.

The callsubr Type1 operator is more flexible for accents than the seac
operator, because seac strictly requires the Standard Encoding for
both the accents and the base characters. Moreover, two different
shapes of the same accent are usually needed for lowercase and
uppercase letters, which is impossible with the seac operator.

The "overlapping" problem is not solved by the t1accent program. This
means that new accents must have their own path, independent of the
path of the base character. If these paths overlap, the behavior of
the rendering algorithm of a PostScript RIP is undefined (the
overlapped area is sometimes white and sometimes black). This
limitation of t1accent does not matter for Czech and Slovak accented
letters, but it becomes more important for other languages (such as
Polish).
If overlapping exists, you can use Fontographer or a similar program
to correct this problem manually.


1. Running t1accent
-------------------

You can start t1accent from the command line with one parameter, the
name of the configuration file:

   t1accent config.file

The behavior of the program is defined in the configuration file,
including the names of the input *.pfb files. No other command line
options are implemented in this version of the program.


2. The recommended procedure for making a new font with accented letters
------------------------------------------------------------------------

Suppose we have the files "font.pfb" and "font.afm" of some original
Type1 font without accents. We prepare a new font whose information is
stored in the files "csfont.pfb", "csfont.afm" and "csfont.tfm". The
work is divided into several steps, and several public domain programs
are used.

  1. Convert font.pfb into human readable form using t1disasm from
     t1utils:

        t1disasm font.pfb font.tmp

  2. Prepare the t1accent configuration file. You can use the *.tab
     files from the t1accent package and modify them for your
     purposes. You can copy some parts of the code from font.tmp. For
     example, the code of the caron accent may be included as a new
     procedure /lc-caron, and /uc-caron may be derived from the same
     code with a modification (using 1 0.7 scaled, for example),
     because the caron for capital letters has a different shape.

  3. Choose the implicit encoding vector of the new font using an
     appropriate *.enc file (see the ENC declaration in the
     configuration file). Note: the implicit encoding vector of the
     font is irrelevant for processing TeX dvi files with dvips,
     because the needed encoding vector can be set in psfonts.map
     (the configuration file of dvips). On the other hand, it is
     important that the character names match the shapes of the
     characters (Attention: some DTP studios produced fonts with
     absolutely incompatible names!)

  4. Convert font.pfb to font.pps using:

        t1accent config.file

  5. Edit font.pps: you can add some copyright notices and so on.

  6. Convert font.pps into a new csfont.pfb using t1asm from t1utils:

        t1asm -b font.pps csfont.pfb

If you need to prepare a new csfont.afm file, you can use the free
printafm.ps and process it using Ghostscript. The resulting csfont.afm
file has a good implicit encoding, but the kern information is
missing. You can use some programs to add the kern information from
the original font.afm (my program a2ac, for instance), but the result
has a bad implicit encoding. If you merge the results of both
processes, you get a csfont.afm with the right implicit encoding and
with correct kern information.

If you need to prepare the csfont.tfm file, use fontinst or afm2tfm on
the file csfont.afm from the previous step. afm2tfm has to be run so
that it creates an (otherwise unused) virtual font, in order to store
the kern information in the tfm properly:

   afm2tfm csfont.afm -T file.enc -v csfont rfont

file.enc has to be the same encoding file as the one used in steps 3
and 4 above. The resulting rfont.tfm has no kerning information and
can be removed.

   vptovf csfont.vpl csfont.vf csfont.tfm

csfont.vf is only a by-product and is not needed. Remove the
unneeded files:

   rm rfont.tfm csfont.vpl csfont.vf


3. The format of the configuration file
---------------------------------------

The configuration file of t1accent is a text file. The end of a line
is interpreted as a space. The '%' is the comment character -- all
characters from '%' up to and including the first end of line are
ignored. Tab characters and carriage return characters (visible when
a DOS file is viewed under UNIX) are interpreted as spaces. Several
consecutive spaces are interpreted as one space.

The keywords NAME, SRC, IN, ENC, REM, VAR, SUB, CHAR, BBOX and END
(the uppercase letters are required) can be used in the configuration
file. The keywords have to be separated by spaces. The information
after these words has a special format (described below).
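The lexical rules above can be illustrated by a short Python sketch.
It is not taken from the t1accent sources: the function name normalize
is hypothetical, and the sketch assumes that a comment swallows its
end of line, which is how the rule above is worded. Quoted REM texts
and { } bodies get no special treatment here.

```python
import re


def normalize(config_text):
    """Apply the lexical rules of the configuration file to raw text.

    Hypothetical helper, written only from the rules stated above.
    """
    # A comment runs from '%' up to and including the first end of line
    # (assumption: the end of line is dropped together with the comment).
    text = re.sub(r"%[^\n]*(?:\n|$)", "", config_text)
    # Ends of line, tabs and carriage returns are interpreted as spaces.
    text = re.sub(r"[\n\t\r]", " ", text)
    # Several consecutive spaces are interpreted as one space; leading and
    # trailing spaces are stripped only for convenience.
    return re.sub(r" +", " ", text).strip()


raw = "IN name1 % first input font\n\tname2\r\nEND\n"
print(normalize(raw))  # prints: IN name1 name2 END
```

After this step the text can be split on single spaces, and every
token equal to one of the ten keywords starts a new declaration.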
If the information after any keyword is complete, another keyword can
be used, or the same keyword can be used repeatedly (except for the
keywords NAME, SRC, ENC and END). The order of the keywords is
irrelevant, with these exceptions:

  * The END keyword ends the reading of the configuration file.
  * If a variable is used, the VAR declaration of this variable has to
    come _before_ this point.
  * If a subroutine name is used, the SUB declaration of this
    subroutine has to come _before_ this point.
  * The order of the SUBs and CHARs in the configuration file is the
    same as the order of the corresponding information in the *.pps
    output. However, this has no effect on the behavior of the new
    font.

Now the syntax and semantics of all keywords will be described in
detail.

***** The NAME keyword

It has the form:

   NAME Original-Name -> New-Name

The symbol "->" surrounded by spaces is required. The program replaces
all occurrences of the sequence "Original-Name" by the sequence
"New-Name" in the first part of the font text (this part ends with the
"/Encoding" sequence). If you need more complicated "find+replace"
behavior, you can use sed, awk, perl or a similar tool on the output
text file *.pps after t1accent processing. Example:

   NAME Times-Roman -> Times-Roman-CZ

***** The SRC keyword

It has the form:

   SRC /directory/with/source/files/

The text after the SRC keyword declares the directory where the *.pfb
input files will be looked for. The character '/' (in UNIX) or '\' (in
DOS) is required at the end of the directory text. The SRC keyword may
be present at most once in the configuration file. If the SRC
declaration is absent, the program opens the input files in the
current directory. Examples:

   SRC /usr/local/share/texmf/fonts/type1/public/cm/

or

   SRC d:\emtex\fonts\type1\cm\

***** The IN keyword

It has the form:

   IN name1 name2 ...

The non-empty list of names is separated by spaces. The list is
delimited by the occurrence of a new keyword NAME, SRC, IN, ENC, REM,
VAR, SUB, CHAR, BBOX or END.
Thus, a keyword itself cannot be an element of the list. The elements
of the list are the names of the input *.pfb files (without
extension!). The program t1accent is able to read these files if they
are present in the directory declared by the SRC keyword. For every
input file in the list, the program creates an output file with the
extension .pps in the current directory.

If you use IN repeatedly, this has the same effect as using IN only
once with all names specified. For example:

   IN name1 IN name2 END

is the same as:

   IN name1 name2 END

***** The ENC keyword

It has the form:

   ENC name

All other information after "name" (separated by a space) is ignored
until a new keyword occurs. The ENC keyword has to be present exactly
once in the configuration file. The "name" is the name of an *.enc
file (without extension!). The implicit encoding vector of the newly
created font is defined in this *.enc file. The format of the *.enc
file is the same as the format of the *.enc files in the dvips
package. The t1accent program reads only the information between the
[ and ] brackets in this file. The number of charnames between these
brackets has to be exactly 256.

***** The VAR declaration

It has the form:

   VAR /name value

The name of a newly declared variable has to be preceded by a slash,
and "value" is an ordinary real number (possibly with a minus sign
and/or a decimal point). It is strongly recommended that the name of
the variable differ from the charnames in the font and from the names
of the SUBs in the configuration file. Any other information after
"value" (separated by a space) is ignored until a new keyword occurs.
Using the def operator after "value" is recommended, because an
arithmetic parser with postfix operators ended by def is planned for
a future version of t1accent. For example, you can write:

   VAR /ac-shift 250 def

and the form:

   VAR /new-variable old-variable dup mul neg def

is planned for a future version. The "value" is stored in the variable
with the name "name" for later use in SUB or CHAR declarations.
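The rules for IN, ENC and VAR above can be illustrated by the
following Python sketch. It is not the parser of t1accent: all
function names and the encoding name "myenc" are hypothetical, and the
sketch assumes that the text was already reduced to space separated
tokens. It also ignores the fact that a token inside a SUB body or a
REM text could look like a keyword.

```python
KEYWORDS = {"NAME", "SRC", "IN", "ENC", "REM", "VAR",
            "SUB", "CHAR", "BBOX", "END"}


def split_declarations(tokens):
    """Group tokens into (keyword, arguments) pairs; reading stops at END."""
    declarations = []
    for token in tokens:
        if token == "END":
            break
        if token in KEYWORDS:
            declarations.append((token, []))
        elif declarations:
            declarations[-1][1].append(token)
    return declarations


def input_names(declarations):
    """Names from all IN lists, so repeated IN equals one long IN list."""
    return [name for keyword, args in declarations
            if keyword == "IN" for name in args]


def encoding_name(declarations):
    """The name after ENC; ENC has to be present exactly once."""
    names = [args[0] for keyword, args in declarations
             if keyword == "ENC" and args]
    if len(names) != 1:
        raise ValueError("ENC has to be present exactly once")
    return names[0]


def variables(declarations):
    """Map each VAR name to its value; tokens after the value are ignored."""
    result = {}
    for keyword, args in declarations:
        if keyword != "VAR":
            continue
        if len(args) < 2 or not args[0].startswith("/"):
            raise ValueError("VAR needs /name and a value")
        result[args[0][1:]] = float(args[1])
    return result


config = "IN name1 IN name2 ENC myenc VAR /ac-shift 250 def END ignored"
decls = split_declarations(config.split())
print(input_names(decls))    # ['name1', 'name2']
print(encoding_name(decls))  # myenc
print(variables(decls))      # {'ac-shift': 250.0}
```

The trailing def after the value is simply skipped here, which matches
the rule that everything after "value" is ignored until a new keyword
occurs.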
***** The SUB keyword -- a declaration of a new subroutine

It has the form:

   SUB /name { subroutine text }

All information after the closing '}' is ignored until a new keyword
occurs. You can add the def operator after the closing '}', but only
for the sake of "elegance". The "name" of the procedure has to be
preceded by a slash and delimited by a space. The "subroutine text" is
a sequence of postfix operators with their operands. The following
operators are possible:

  1. The special t1accent operators:
        scaled vvaxis
  2. The PostScript operators:
        moveto lineto curveto rcurveto
  3. All Type1 Buildchar operators:
        hsbw sbw callsubr return seac closepath rmoveto hmoveto
        vmoveto rlineto hlineto vlineto rrcurveto hvcurveto vhcurveto
        dotsection hstem hstem3 vstem vstem3 div ... callothersubr
        pop setcurrentpoint

For more information about PostScript and Type1 operators see [1] and
[2].

The program t1accent transforms the PostScript operators into Type1
Buildchar operators and interprets the scaled and vvaxis operators.
The subroutine text is written to the output (after conversion) as a
new Type1 subroutine.

The program t1accent keeps track of the position of the current point.
If you do not use the return operator in the subroutine text, the
program appends these two operators to the output:

   <-x0> <-y0> rmoveto return

where <-x0> and <-y0> are the negated coordinates of the current
point. As a result, the current point has the same position before and
after the subroutine is executed. If you do not need this
"intelligence", you can write the return operator explicitly in the
subroutine text.

The scaled operator sets the current horizontal and vertical scale
coefficients, which are applied to all relative dimensions in the
operands of Type1 operators. The initial values are (1, 1) at the
start of a SUB. A repeated use of scaled changes both coefficients
relative to their previous values (the new operands multiply the
previous coefficients).
For example:

   SUB /ac-accent {
      100 100 scaled   % the global scale factor
      2 7 moveto       % current point = (200,700)
      1 .5 scaled      % the whole path will be deformed with respect
                       % to the point (200,700) in the y-axis direction
      5 5 lineto       % this will be transformed to:
                       % 100*(5-2) 0.5*100*(5-7) rlineto
      8 7 lineto       % this is: 100*(8-5) 0.5*100*(7-5) rlineto
      5 5.5 lineto
      closepath
   }

The program converts this subroutine text into a new Type1 subroutine
described in the pseudo-PostScript language:

   dup {
      200 700 rmoveto
      300 -100 rlineto
      300 100 rlineto
      -300 -75 rlineto
      closepath
      -500 -625 rmoveto
      return
   } NP

Note that the -500 -625 rmoveto returns the current point to (0,0),
because the Type1 closepath does not move the current point (in
contrast to the PostScript operator of the same name).

The vvaxis operator sets the x-coordinate of the visual vertical axis
of the accent. This information is used by the putcenter operator in
the CHAR declaration. If vvaxis is not used, the position of the
visual axis is calculated as the center of the bounding box of the
path drawn by the procedure. In the previous example vvaxis is not
used, so the x-coordinate of the visual axis is (200+800)/2 = 500.
The vvaxis operator sets the new value with respect to the current
point and the scale factor, but the current point is not moved. For
example, .5 vvaxis before the first moveto and after the first scaled
sets the x-coordinate of the visual axis to 50. The same .5 vvaxis
after the first moveto sets the value to 250, and .5 vvaxis after the
5 5 lineto sets the value to 550.

It is recommended to set the vvaxis of the accents with respect to the
slant of the font, because the putcenter operator in CHAR does not
take the slant into account.

You can use numeric operands, variable names or subroutine names in
the subroutine text.
For example:

   SUB /one-dot {
      200 700 moveto
      200 670 225 645 255 645 curveto
      285 645 310 670 310 700 curveto
      310 730 285 755 255 755 curveto
      225 755 200 730 200 700 curveto
   }
   SUB /lc-dieresis {
      155 vvaxis  % x-coordinate of the vvaxis: 200/2 + radius of the dot
      one-dot callsubr
      200 hmoveto
      one-dot callsubr
   }

yields:

   dup {
      200 700 rmoveto
      -30 25 -25 30 vhcurveto
      30 25 25 30 hvcurveto
      30 -25 25 -30 vhcurveto
      -30 -25 -25 -30 hvcurveto
      closepath
      -200 -700 rmoveto
      return
   } |
   dup {
      callsubr
      200 hmoveto
      callsubr
      -200 hmoveto
      return
   } |

Note that:

  1. The t1accent program automatically inserts the closepath
     operator if neither closepath nor return is used explicitly and
     the subroutine uses some "drawing" operator (rlineto, rrcurveto,
     ...).
  2. When t1accent calculates the current point in a subroutine, it
     never descends into nested subroutines, so the calculated
     bounding box may be incorrect. An explicit vvaxis is recommended
     in such situations.

***** The CHAR keyword -- a declaration of a new character

There are three variants of the CHAR declaration:

  1. Inserting an accent:
        CHAR /new-name /base-name /sub-name put-operator
  2. Correcting the values of the base character and (possibly)
     inserting a new accent:
        CHAR /new-name /base-name correct /sub-name put-operator
  3. Defining a new character:
        CHAR /new-name { character text }

"new-name" is the name of the newly declared character, "base-name" is
the name of the base character (it has to be present in the input
*.pfb font) and "sub-name" is the name of the procedure for the
accent. You can use the reserved name ".noinsert" (write /.noinsert
including the slash) instead of "sub-name" if no accent needs to be
inserted.

The character text (see variant 3) can include the same information as
the subroutine text in a SUB declaration, except that the return
operator is not allowed and the endchar operator is possible. If the
endchar operator is not present, t1accent adds this operator (and
possibly closepath) automatically.
The hsbw, seac or sbw operator is needed in the character text (see
[2] for more details).

The correct operator (see variant 2) takes three numeric operands,
which are interpreted in the following way:

   first operand  ... the left sidebearing point is changed: the
                      operand is added to <sbx>
   second operand ... the glyph of the character is shifted
                      vertically: a vmoveto with this operand is
                      inserted
   third operand  ... the width parameter of hsbw is changed: the
                      operand is added to <wx>

Note: shifting a glyph in the x-axis direction is performed by
changing the left sidebearing point, because the current point is
initialized to the left sidebearing point at the beginning of every
character.

The correct operator reads the parameters of the hsbw operator in the
base character, changes its operands <sbx> and <wx>, and inserts the
vmoveto before the first "drawing" operator. It also changes the
operands of hstem and hstem3 according to the vertical shift of the
glyph.

The put-operator is one of the following words:

   putorigin  ... puts the accent from the procedure with respect to
                  the origin
   putsidebar ... puts the accent from the procedure with respect to
                  the left sidebearing point
   putcenter  ... puts the accent from the procedure with respect to
                  the center of the character and the vvaxis of the
                  accent
   putafter   ... puts the accent from the procedure with respect to
                  the origin of the next character, calculated from
                  the character width

The put-operator has two numeric operands <x> and <y>. The accent is
shifted by <x> to the right and by <y> upwards.

Let us now describe the insertion of the procedure more precisely.
Let the base character have the hsbw operator with the operands
(possibly corrected by the correct operator):

   <sbx> <wx> hsbw

If putsidebar is used, the program copies the base character text into
the new character text and inserts the following text before the first
callsubr or the first "drawing" operator:

   <x> <y> rmoveto callsubr <-x> <-y> rmoveto

Here the inserted callsubr calls the accent subroutine /sub-name.

The putorigin inserts:

   <x-sbx> <y> rmoveto callsubr <-x+sbx> <-y> rmoveto

When putcenter is used, we get the result:

   <x-sbx-cx+wx/2> <y> rmoveto callsubr <-x+sbx+cx-wx/2> <-y> rmoveto

where <cx> is the x-coordinate of the visual vertical axis of the
accent (see the SUB declaration, namely the vvaxis operator).
The putafter yields:

   <x+wx-sbx> <y> rmoveto callsubr <-x-wx+sbx> <-y> rmoveto

The t1accent program does some optimization of the character text
code. This means that two consecutive rmoveto operators are merged
into one, and an rmoveto is changed to hmoveto or vmoveto where
possible.

The subroutine has to return the current point to the same position
as it was at the start of the subroutine. If it does not, an
unexpected result occurs: the glyph of the base character is shifted.
If the return operator is not used in the subroutine text, this
condition on the current point position after callsubr is satisfied
automatically.

***** The REM keyword -- remark

It has the form:

   REM "text of the notice"

The "text of the notice" (without the quotes but with the '%' comment
character) is added after the first line of the *.pps output file. The
first line of the *.pps output is unchanged, because a '%!' line is
expected in the input file and the same line is copied to the output.
The quotes around "text of the notice" are required. Two or more REM
keywords produce two or more comment lines in the output. Example:

   REM "TestFonts -- pfb version 1.0, generated from XY fonts. NO WARRANTY."
   REM "Accents are added using t1accent program, (c) Petr Olsak, 1998"

The first three lines of the output look like:

   %!FontType1-1.0: TestFont-Roman-BoldItalic 12345
   % TestFonts -- pfb version 1.0, generated from XY fonts. NO WARRANTY.
   % Accents are added using t1accent program, (c) Petr Olsak, 1998

***** The BBOX keyword

The BBOX keyword is followed by four integers: the first two are the
coordinates of the lower left corner and the last two are the
coordinates of the upper right corner of the BoundingBox of one
character. You can use several BBOX keywords -- the program t1accent
calculates the minimum of the lower left coordinates and the maximum
of the upper right coordinates over all BBOX keywords used. The
program also calculates two of these four values (but not the other
two) from each composed character declared via CHAR /new-name
/base-name /sub-name. Then the program reads the global FontBBox from
the input and calculates the new FontBBox from the calculated values
and the values of the input FontBBox.
The new values are written into the new FontBBox information.

***** The END keyword -- end of the configuration file

If the END keyword occurs in the configuration file, all text after
this END mark is ignored. The END keyword has the same effect as the
end of the configuration file.


4. References
-------------

[1] Adobe Systems Incorporated. PostScript Language Reference Manual
    (Red Book), Addison Wesley, 2nd edition, 1990.
[2] Adobe Systems Incorporated. Adobe Type1 Font Format (Black Book),
    Addison Wesley, 1990. The full text is available on ftp.adobe.com
    in PDF format.
[3] Ol\v{s}\'ak Petr. Typografick\'y syst\'em TeX (Typesetting system
    TeX), CSTUG, Prague 1995.