Program t1accent -- adds new accents into Type1 fonts (pfb)
       =============================================================
       Petr Ol\v{s}\'ak                                   April 1998

The program t1accent reads a font in Type1 format (pfb) and adds new
accented letters using the Type1 callsubr operator. The program's behavior
is controlled by a configuration file, which stores the following
information:
-- the names of the new characters,
-- the file with the encoding vector of the newly generated font,
-- the declaration of the position of accents with respect to the base
   characters,
-- the accent shapes defined as PostScript procedures (using moveto, lineto,
   curveto, rmoveto...).

The output of the program is a text file containing the new font in a
pseudo-PostScript language, in the same format as the output of t1disasm
(a program from the free t1utils package). You can make further
modifications to this file manually (or with common text filters) and then
convert the result into a pfb file using t1asm (from the t1utils package).

The hinting information for a new accented character is copied from the
base character. Usually, new accents do not need new hinting.

The Type1 callsubr operator is more flexible for accents than the seac
operator, because seac strictly requires Standard Encoding for both the
accents and the base characters. Two different shapes of the same accent
are usually needed for lowercase and uppercase letters, and this is
impossible with the seac operator.

The "overlapping" problem is not solved in the t1accent program. This means
that new accents have to have their own path, independent of the path of the
base character. If these paths overlap, the behavior of the rendering
algorithm of a PostScript RIP is undefined (the overlapped area is sometimes
white and sometimes black). This limitation of t1accent does not matter for
Czech and Slovak accented letters, but it becomes more important for other
languages (such as Polish). If overlapping exists, you can use Fontographer
or a similar program to correct the problem manually.


1. Running of t1accent
----------------------

You can start t1accent from the command line with one parameter, the name
of the configuration file:

t1accent config.file

The behavior of the program is defined in the configuration file, including
the names of the input *.pfb files. No other command line options are
implemented in this version of the program.


2. The recommended action for making a new font with accented letters
---------------------------------------------------------------------

Suppose we have the "font.pfb" and "font.afm" files of some original Type1
font without accents. We prepare the new font, with the information about it
stored in the files "csfont.pfb", "csfont.afm" and "csfont.tfm". The process
is divided into several steps and several public domain programs are used.

1. Convert the font.pfb into human readable form using t1disasm from
   t1utils:
      t1disasm font.pfb font.tmp
2. Prepare the t1accent configuration file. You can use the *.tab files
   from the t1accent package and modify them for your purposes. You can copy
   some parts of code from font.tmp. For example, the code of the caron
   accent may be included as a new procedure /lc-caron, and /uc-caron may
   be derived from the same code but modified (using 1 0.7 scaled, for
   example), because the caron for capital letters has a different shape.
3. Choose the implicit encoding vector of the new font using an appropriate
   *.enc file (see the ENC declaration in the configuration file). Note: the
   implicit encoding vector of the font is irrelevant for processing TeX
   dvi files with dvips, because the needed encoding vector can be set in
   psfonts.map (the configuration file of dvips). On the other hand, it is
   important that the character names match the shapes of the characters
   (Attention: some DTP studios have produced fonts with absolutely
   incompatible names!)
4. Convert the font.pfb to font.pps using:
      t1accent config.file
5. Edit the font.pps: You can add some copyright notices and so on.
6. Convert the font.pps into a new csfont.pfb using t1asm from t1utils:
      t1asm -b font.pps csfont.pfb

If you need to prepare a new csfont.afm file, you can use the free
printafm.ps and process it with Ghostscript. The resulting csfont.afm file
has a good implicit encoding, but the kern information is missing. You can
use some program to add kern information from the original font.afm (my
program a2ac, for instance), but that result has a bad implicit encoding. If
you merge the results of both processes, you get a csfont.afm with the right
implicit encoding and with correct kern information.

If you need to prepare the csfont.tfm file, use fontinst or afm2tfm on the
file csfont.afm from the previous step. afm2tfm has to be run so that it
creates an (otherwise unused) virtual font, because only then is the kern
information saved properly in the tfm:
      afm2tfm csfont.afm -T file.enc -v csfont rfont
file.enc has to be the same encoding file as used in steps 3 and 4 above.
The resulting rfont.tfm has no kerning information and can be removed.
      vptovf csfont.vpl csfont.vf csfont.tfm
csfont.vf is only a by-product and is not needed. Remove the temporary
files:
      rm rfont.tfm csfont.vpl csfont.vf


3. The format of the configuration file
---------------------------------------

The configuration file of t1accent is a text file. The end of line is
interpreted as a space. The '%' is a comment character -- all characters
from '%' to the first end of line (including the end of line) are ignored.
Tab characters and carriage return characters (visible when a DOS file is
viewed in UNIX) are interpreted as spaces. Several consecutive spaces are
interpreted as one space.
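
The lexical rules above can be illustrated with a minimal Python sketch. It
is not taken from the t1accent sources: the function name normalize_config
is hypothetical, and the sketch assumes that '%' always starts a comment (it
does not treat quoted REM text specially).

```python
import re

def normalize_config(text: str) -> str:
    """Apply the comment and whitespace rules to raw configuration text."""
    # A comment runs from '%' through the end of that line, newline included.
    without_comments = re.sub(r"%[^\n]*(\n|$)", "", text)
    # Newlines, tabs and carriage returns count as spaces; runs collapse to one.
    return re.sub(r"[ \t\r\n]+", " ", without_comments).strip()

print(normalize_config("IN name1 % first font\n\tname2\r\nEND"))
```

After this step the configuration can be split on single spaces and read
token by token.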

The keywords NAME, SRC, IN, ENC, REM, VAR, SUB, CHAR, BBOX and END (the
uppercase letters are required) can be used in the configuration file.
Each keyword has to be separated by a space. The information after each
keyword has a special format (described below). Once the information after
a keyword is complete, another keyword can follow, or the same keyword can
be used again (except for the keywords NAME, SRC, ENC and END).

The order of the keywords is irrelevant, with these exceptions:
* The END keyword ends the reading of the configuration file.
* If a variable is used, the VAR declaration of this variable
  has to come _before_ this point.
* If the name of a subroutine is used, the SUB declaration of this
  subroutine has to come _before_ this point.
* The ordering of the SUBs and CHARs in the configuration file is the same
  as the ordering of the relevant information in the *.pps output. But this
  has no effect on the behavior of the new font.

Now the syntax and semantics of all keywords will be described in detail.


***** The NAME keyword

has the form:

NAME Original-Name -> New-Name

The symbol "->" surrounded by spaces is required.

The program replaces all occurrences of the sequence "Original-Name" by the
sequence "New-Name" in the first part of the font text (this part ends at
the "/Encoding" sequence). If you need more complicated "find+replace"
behavior, you can use sed, awk, perl or a similar tool on the output text
file *.pps after t1accent processing.

Example:

NAME Times-Roman -> Times-Roman-CZ
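
As an illustration of the replacement rule, here is a short Python sketch
(the function name rename_font is hypothetical and the code is not from
t1accent itself). It assumes the disassembled font is held in one string and
replaces the name only in the part before the first "/Encoding":

```python
def rename_font(font_text: str, original: str, new: str) -> str:
    """Replace `original` by `new` only before the first '/Encoding'."""
    head, sep, tail = font_text.partition("/Encoding")
    # Everything from '/Encoding' onwards is copied unchanged.
    return head.replace(original, new) + sep + tail

sample = ("/FontName /Times-Roman def\n"
          "/Encoding StandardEncoding def\n"
          "% Times-Roman outlines follow\n")
print(rename_font(sample, "Times-Roman", "Times-Roman-CZ"))
```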


***** The SRC keyword

has the form:

SRC /directory/with/source/files/

The text after the SRC keyword declares the directory where the *.pfb input
files will be found. The character '/' (in UNIX) or '\' (in DOS) is
required at the end of the directory text. The SRC keyword may be
present at most once in the configuration file. If the SRC declaration is
absent, the program opens the input files in the current directory.

Examples:

SRC /usr/local/share/texmf/fonts/type1/public/cm/

or

SRC d:\emtex\fonts\type1\cm\


***** The IN keyword

has the form:

IN name1 name2 ...

The non-empty list of names is separated by spaces. The list is delimited
by the occurrence of a new keyword NAME, SRC, IN, ENC, REM, VAR, SUB, CHAR,
BBOX or END. Thus, a keyword itself cannot be an element of the list.

The elements of the list are the names of the input *.pfb files (without
extension!). The program t1accent is able to read these files if they are
present in the directory declared by the SRC keyword. For every input file
in the list, the program creates an output file with the extension .pps in
the current directory.

If you use IN repeatedly, this has the same effect as using IN only once
with all names specified. For example:

IN name1
IN name2
END

is the same as:

IN name1 name2
END
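
The equivalence of the two forms can be seen in a small Python sketch of
how the name list might be collected. The function collect_input_names is
hypothetical, and the sketch assumes the configuration has already been
split into space-separated tokens; it ignores the special cases of keywords
appearing inside SUB braces or REM quotes.

```python
KEYWORDS = {"NAME", "SRC", "IN", "ENC", "REM", "VAR", "SUB", "CHAR", "BBOX", "END"}

def collect_input_names(tokens):
    """Gather the names that follow every IN keyword, stopping at END."""
    names = []
    in_list = False
    for tok in tokens:
        if tok == "END":
            break
        if tok in KEYWORDS:
            # Any keyword closes the running list; only IN opens a new one.
            in_list = (tok == "IN")
        elif in_list:
            names.append(tok)
    return names

print(collect_input_names("IN name1 IN name2 END".split()))
print(collect_input_names("IN name1 name2 END".split()))
```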


***** The ENC keyword

has the form:

ENC name

All other information after "name" (separated by a space) is ignored until
a new keyword occurs. The ENC keyword has to be present exactly once in
the configuration file.

The "name" is the name of an *.enc file (without extension!). The implicit
encoding vector of the newly created font is defined in this *.enc file. The
format of the *.enc file is the same as the format of the *.enc files in
the dvips package. t1accent reads only the information between the [ and ]
brackets in this file. The number of charnames between these brackets has
to be exactly 256.
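
The reading of the encoding vector can be sketched in Python as follows.
This is only an illustration under stated assumptions: the function name
read_encoding_vector is hypothetical, '%' comments in the *.enc file are
stripped first, and every charname is assumed to be written with a leading
slash.

```python
import re

def read_encoding_vector(enc_text: str) -> list:
    """Return the charnames found between '[' and ']' in *.enc text."""
    body = re.sub(r"%[^\n]*", "", enc_text)      # drop PostScript comments
    start = body.index("[")
    end = body.index("]", start)
    names = re.findall(r"/([^\s/\[\]]+)", body[start + 1:end])
    if len(names) != 256:
        raise ValueError(f"expected 256 charnames, found {len(names)}")
    return names

demo = "/DemoEncoding [\n" + "\n".join(f"/c{i}" for i in range(256)) + "\n] def\n"
print(read_encoding_vector(demo)[:3])
```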


***** The VAR declaration

has the form:

VAR /name value

The name of the newly declared variable has to be preceded by a slash, and
"value" is an ordinary real number (possibly with a minus sign and/or
decimal point). It is strongly recommended to choose a variable name that
differs from the charnames in the font and from the SUB names in the
configuration file.

Any other information after "value" (separated by a space) is ignored until
a new keyword occurs. Writing the def operator after "value" is recommended,
because an arithmetic parser with postfix operators ended by def is
planned for a future version of t1accent. You can write
for example:

VAR /ac-shift 250 def

and the form of:

VAR /new-variable old-variable dup mul neg def

is planned in a future version.

The "value" is stored in the variable with the name "name" for future use in
SUB or CHAR declarations.


***** The SUB keyword -- a declaration of a new subroutine

has the form:

SUB /name { subroutine text }

All information after the closing '}' is ignored until a new keyword occurs.
You can add the def operator after the closing '}' purely for "elegance".

The "name" of the procedure has to be preceded by a slash and delimited
by a space. The "subroutine text" is a sequence of postfix operators
with their operands. The following operators are possible:

1. The special t1accent operators:
     <sx> <sy>  scaled
     <dx>       vvaxis
2. The PostScript operators:
     <x>  <y>   moveto
     <x>  <y>   lineto
     <x1> <y1> <x2> <y2> <x> <y> curveto
     <dx1> <dy1> <dx2> <dy2> <dx> <dy> rcurveto
3. All Type1 Buildchar operators:
     <sbx> <wx> hsbw
     <sbx> <sby> <wx> <wy>  sbw
     <number> callsubr
     return
     <asb> <adx> <ady> <bchar> <achar>  seac
     closepath
     <dx> <dy> rmoveto
     <dx> hmoveto
     <dy> vmoveto
     <dx> <dy> rlineto
     <dx> hlineto
     <dy> vlineto
     <dx1> <dy1> <dx2> <dy2> <dx3> <dy3> rrcurveto
     <dx1> <dx2> <dy2> <dy3> hvcurveto
     <dy1> <dx2> <dy2> <dx3> vhcurveto
     dotsection
     <y> <dy> hstem
     <y0> <dy0> <y1> <dy1> <y2> <dy2> hstem3
     <x> <dx> vstem
     <x0> <dx0> <x1> <dx1> <x2> <dx2> vstem3
     <num1> <num2> div
     <arg1> ... <argn> <n> <number> callothersubr
     pop
     <x> <y> setcurrentpoint

For more information about PostScript and Type1 operators see [1] and
[2].

Program t1accent transforms the PostScript operators into Type1 Buildchar
operators and interprets the scaled and vvaxis operators. The subroutine
text is written into output (after conversion) as a new Type1 subroutine.

Program t1accent calculates the position of the current point. If you do
not use the return operator in the subroutine text, the program adds these
two operators to the output:

<-x0> <-y0> rmoveto
return

where (<x0>, <y0>) are the coordinates of the current point at the end of
the subroutine text. The result is that the current point has the same
position before and after the subroutine is executed. If you do not need
this "intelligence", you can write the return operator explicitly in the
subroutine text.

The scaled operator sets the current scale coefficients <sx> and <sy>
applied to all relative dimensions in the operands of Type1 operators. The
initial values are (1, 1) at the start of each SUB. Each further use of
scaled changes <sx> and <sy> relative to their previous values. For example:

SUB /ac-accent {
  100 100 scaled % the global scaled factor
  2 7 moveto     % current point = (200,700)
  1 .5 scaled    % the whole path will be deformed with respect
                 % to the point (200,700) in the y-axis direction
  5 5 lineto     % this will be transformed to:
                 %             100*(5-2)  0.5*100*(5-7)  rlineto
  8 7 lineto     % this is:    100*(8-5)  0.5*100*(7-5)  rlineto
  5 5.5 lineto
  closepath
}

Program converts this subroutine text into new Type1 subroutine
described in pseudo-PostScript language:

dup <number of proc.> {
  200 700 rmoveto
  300 -100 rlineto
  300 100 rlineto
  -300 -75 rlineto
  closepath
  -500 -625 rmoveto
  return
  } NP

Note that the -500 -625 rmoveto returns the current point to (0,0),
because the Type1 closepath does not move the current point (unlike the
PostScript operator of the same name).
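
The arithmetic of this conversion can be traced with a minimal Python
sketch. It is a model inferred from the example, not the t1accent source:
the names convert_sub and fmt are hypothetical, only scaled, moveto, lineto
and closepath are handled, and a SUB without an explicit return is assumed.

```python
def fmt(v):
    """Print 300.0 as '300' and -75.0 as '-75'."""
    return f"{v:g}"

def convert_sub(ops):
    """Turn absolute moveto/lineto operands into relative Type1 operators."""
    sx, sy = 1.0, 1.0   # scale coefficients, (1, 1) at the start of a SUB
    ux, uy = 0.0, 0.0   # last point as written in the operands (unscaled)
    x, y = 0.0, 0.0     # current point in output units
    out = []
    for op, *args in ops:
        if op == "scaled":
            sx, sy = sx * args[0], sy * args[1]   # relative to previous values
        elif op in ("moveto", "lineto"):
            dx, dy = sx * (args[0] - ux), sy * (args[1] - uy)
            ux, uy = args
            x, y = x + dx, y + dy
            out.append(f"{fmt(dx)} {fmt(dy)} r{op}")
        elif op == "closepath":
            out.append("closepath")   # Type1 closepath keeps the current point
        else:
            raise ValueError(f"operator not handled in this sketch: {op}")
    # No explicit return: move back to the starting point, then return.
    out.append(f"{fmt(0 - x)} {fmt(0 - y)} rmoveto")
    out.append("return")
    return out

ac_accent = [("scaled", 100, 100), ("moveto", 2, 7), ("scaled", 1, .5),
             ("lineto", 5, 5), ("lineto", 8, 7), ("lineto", 5, 5.5),
             ("closepath",)]
print("\n".join(convert_sub(ac_accent)))
```

Running the sketch on the /ac-accent example prints the same seven operator
lines as the subroutine body listed above.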

The vvaxis operator sets the x-coordinate of the visual vertical axis of
the accent. This information is used by the putcenter operator in the
CHAR declaration. If vvaxis is not used, the position of the visual axis
is calculated as the center of the bounding box of the path drawn by the
procedure. In the previous example vvaxis is not used, thus the
x-coordinate of the visual axis is (200+800)/2 = 500.

The vvaxis sets the new value with respect to the current point and <sx>
scaled factor, but current point is not moved. For example, the .5 vvaxis
before the first moveto and after the first scaled sets the x-coordinate of
the visual axis to 50. The same .5 vvaxis after the first moveto sets the
value to 250 and .5 vvaxis after the 5 5 lineto sets the value to 550.
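
The numbers in this paragraph can be checked with a few lines of Python.
The helper names vvaxis_position and default_axis are hypothetical; the
formulas are simply the rules stated above written out.

```python
def vvaxis_position(current_x, sx, operand):
    """Axis set by '<dx> vvaxis': relative to the current point, scaled by sx."""
    return current_x + sx * operand

def default_axis(path_xs):
    """Axis used when vvaxis is absent: center of the path's bounding box."""
    return (min(path_xs) + max(path_xs)) / 2

print(vvaxis_position(0, 100, .5))       # before the first moveto
print(vvaxis_position(200, 100, .5))     # after 2 7 moveto
print(vvaxis_position(500, 100, .5))     # after 5 5 lineto
print(default_axis([200, 500, 800, 500]))
```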

It is recommended to set the vvaxis of the accents with respect to the slant
factor of the font, because the putcenter operator in CHAR does not
calculate the slant factor.

You can use the numeric operands or variable names or subroutine names
in subroutine text. For example:

SUB /one-dot {
   200 700 moveto
   200 670 225 645 255 645 curveto
   285 645 310 670 310 700 curveto
   310 730 285 755 255 755 curveto
   225 755 200 730 200 700 curveto
    }
SUB /lc-dieresis {
    155 vvaxis % x-coordinate of the vvaxis: 200/2 + radius of the dot
    one-dot callsubr
    200 hmoveto
    one-dot callsubr
    }

yields:

dup <the number of the subroutine one-dot> {
	200 700 rmoveto
	-30 25 -25 30 vhcurveto
	30 25 25 30 hvcurveto
	30 -25 25 -30 vhcurveto
	-30 -25 -25 -30 hvcurveto
	closepath
	-200 -700 rmoveto
	return
	} |
dup <the number of the subroutine lc-dieresis> {
        <the number of the subroutine one-dot> callsubr
	200 hmoveto
        <the number of the subroutine one-dot> callsubr
	-200 hmoveto
	return
	} |

Note that:

1. t1accent automatically inserts the closepath operator if neither this
   operator nor return is used explicitly and if the subroutine uses
   some "drawing" operator (rlineto, rrcurveto, ...).
2. When t1accent calculates the current point in a subroutine, it never
   descends into nested subroutines, thus the calculation of the bounding
   box may be incorrect. An explicit vvaxis is recommended in such cases.
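
The conversion of the curveto lines in /one-dot into the shorter hvcurveto
and vhcurveto forms can be sketched in Python. This is an illustration of
the Type1 operator definitions from [2], not the actual t1accent code; the
function name relative_curve is hypothetical and integer coordinates are
assumed.

```python
def relative_curve(cur, x1, y1, x2, y2, x3, y3):
    """Convert an absolute curveto from point `cur` to a Type1 operator."""
    cx, cy = cur
    dx1, dy1 = x1 - cx, y1 - cy
    dx2, dy2 = x2 - x1, y2 - y1
    dx3, dy3 = x3 - x2, y3 - y2
    if dy1 == 0 and dx3 == 0:      # starts horizontal, ends vertical
        return f"{dx1} {dx2} {dy2} {dy3} hvcurveto"
    if dx1 == 0 and dy3 == 0:      # starts vertical, ends horizontal
        return f"{dy1} {dx2} {dy2} {dx3} vhcurveto"
    return f"{dx1} {dy1} {dx2} {dy2} {dx3} {dy3} rrcurveto"

point = (200, 700)                 # current point after 200 700 moveto
for curve in [(200, 670, 225, 645, 255, 645), (285, 645, 310, 670, 310, 700),
              (310, 730, 285, 755, 255, 755), (225, 755, 200, 730, 200, 700)]:
    print(relative_curve(point, *curve))
    point = curve[4:]              # the end point becomes the current point
```

The four printed lines correspond to the four curve operators in the
one-dot subroutine shown above.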


***** The CHAR keyword -- a declaration of a new character

There are three variants of the CHAR declaration:

1. Inserting an accent:

CHAR /new-name /base-name /sub-name <x> <y> put-operator

2. Correction of values of base character and (possibly) inserting a new
   accent:

CHAR /new-name /base-name <dsbx> <dy> <dwx> correct
      /sub-name <x> <y> put-operator

3. Definition of a new character:

CHAR /new-name { character text }

"new-name" is the name of the newly declared character, "base-name" is the
name of the base character (it has to be present in the input *.pfb font)
and "sub-name" is the name of the procedure for the accent. You can use the
reserved name ".noinsert" (write /.noinsert including the slash) instead of
"sub-name" if no accent needs to be inserted.

The character text (see variant 3) can include the same information
as the subroutine text in a SUB declaration, except that the return
operator is not allowed and the endchar operator is possible. If the
endchar operator is not present, t1accent adds this operator (and possibly
closepath) automatically. One of hsbw, seac or sbw is needed in the
character text (see [2] for more details).

The correct operator (see variant 2) interprets its operands in the
following way:
<dsbx> ... left sidebearing point is changed:  <sbx> = <original sbx> + <dsbx>
<dy>   ... the glyph of character is shifted:  <dy> vmoveto
<dwx>  ... <wx> parameter of hsbw is changed:  <wx> = <original wx> + <dwx>
Note: shifting of a glyph in the x-axis direction is performed by changing
the left sidebearing point, because the current point is initialized to the
left sidebearing point at the beginning of every character.

The correct operator reads the parameters of the hsbw operator in the base
character, changes its operands <sbx> and <wx>, and inserts
<dy> vmoveto before the first "drawing" operator. It also adjusts the
operands of hstem and hstem3 according to the <dy> shift of the
glyph.
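
A minimal Python sketch of these adjustments follows. The function name
apply_correct is hypothetical, and the treatment of hstem is an assumption:
the sketch adds <dy> to the <y> operand of each horizontal stem hint and
leaves the stem height unchanged.

```python
def apply_correct(sbx, wx, dsbx, dy, dwx, hstems):
    """Return corrected hsbw operands and shifted (y, height) hstem pairs."""
    new_sbx = sbx + dsbx                 # left sidebearing point is moved
    new_wx = wx + dwx                    # character width is changed
    # Assumption: a vertical shift of the glyph moves each hstem by dy.
    shifted = [(y + dy, height) for y, height in hstems]
    return new_sbx, new_wx, shifted

# Hypothetical base character: 50 500 hsbw with two horizontal stems.
print(apply_correct(50, 500, 10, 120, 20, [(0, 20), (650, 20)]))
```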

The put-operator is one of the following words:

putorigin  ... puts the accent from the procedure with respect to the origin
putsidebar ... puts the accent from the procedure with respect to the left
               sidebearing point
putcenter  ... puts the accent from the procedure with respect to the
               center of the character and vvaxis of the accent
putafter   ... puts the accent from the procedure with respect to the next
               character origin calculated from character width.

The put-operator has two numeric operands <x> and <y>. The accent is
shifted by <x> to the right and by <y> upwards.

Let us now describe the insertion of the procedure more precisely.
Let the base character have the hsbw operator with operands (possibly
corrected by the correct operator):

<sbx> <wx> hsbw

If <x> <y> putsidebar is used, then program copies the base character
text into new character text and inserts the following text
before the first callsubr or first "drawing" operator:

<x> <y>        rmoveto
<number sub.>  callsubr
<-x> <-y>      rmoveto

The <x> <y> putorigin inserts:

<x-sbx> <y>    rmoveto
<number sub.>  callsubr
<-x+sbx> <-y>  rmoveto

When <x> <y> putcenter is used, we get the result:

<x-sbx-cx+wx/2> <y>   rmoveto
<number sub.>         callsubr
<-x+sbx+cx-wx/2> <-y> rmoveto

where <cx> is the x-coordinate of the visual vertical axis of the accent
(see the SUB declaration, namely the vvaxis operator).

The <x> <y> putafter yields:

<x+wx-sbx> <y>   rmoveto
<number sub.>    callsubr
<-x-wx+sbx> <-y> rmoveto
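
The four cases can be summarized in a short Python sketch. The function
names put_offset and insertion are hypothetical and the sample numbers are
made up; the formulas are the ones listed above.

```python
def put_offset(operator, x, y, sbx, wx, cx=0):
    """Offset of the first rmoveto for the given put-operator."""
    if operator == "putsidebar":
        dx = x
    elif operator == "putorigin":
        dx = x - sbx
    elif operator == "putcenter":
        dx = x - sbx - cx + wx / 2
    elif operator == "putafter":
        dx = x + wx - sbx
    else:
        raise ValueError(f"unknown put-operator: {operator}")
    return dx, y

def insertion(operator, x, y, sbx, wx, subr_number, cx=0):
    """The three lines inserted into the copy of the base character."""
    dx, dy = put_offset(operator, x, y, sbx, wx, cx)
    return [f"{dx:g} {dy:g} rmoveto",
            f"{subr_number} callsubr",
            f"{0 - dx:g} {0 - dy:g} rmoveto"]

# Hypothetical values: 50 500 hsbw, accent axis cx = 155, subroutine no. 4.
print("\n".join(insertion("putcenter", 0, 20, 50, 500, 4, cx=155)))
```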

The t1accent program does some optimization of the character text code:
two consecutive rmoveto operators are merged into one, and an rmoveto
is changed to hmoveto or vmoveto where possible.

The subroutine has to return the current point to the same position as it
was at the start of the subroutine. If it does not, an unexpected result
occurs: the glyph of the base character is shifted. If the return operator
was not used in the subroutine text, the condition on the current point
position after callsubr is satisfied automatically.
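
This optimization can be sketched in Python as follows. The function name
optimize_moves and the tuple representation of operators are assumptions
made for the illustration; the real program may differ in details.

```python
def optimize_moves(ops):
    """Merge consecutive rmoveto operators, then shorten them if possible.

    Each op is a tuple (name, operand, ...), e.g. ("rmoveto", 45, 700).
    """
    merged = []
    for op in ops:
        if op[0] == "rmoveto" and merged and merged[-1][0] == "rmoveto":
            _, px, py = merged[-1]
            merged[-1] = ("rmoveto", px + op[1], py + op[2])
        else:
            merged.append(tuple(op))
    out = []
    for name, *args in merged:
        if name == "rmoveto" and args[1] == 0:
            out.append(f"{args[0]} hmoveto")     # purely horizontal move
        elif name == "rmoveto" and args[0] == 0:
            out.append(f"{args[1]} vmoveto")     # purely vertical move
        else:
            out.append(" ".join([*map(str, args), name]))
    return out

char_text = [("rmoveto", 45, 700), ("callsubr", 4), ("rmoveto", -45, -700),
             ("rmoveto", 30, 700), ("rlineto", 10, 0)]
print("\n".join(optimize_moves(char_text)))
```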

***** The REM keyword -- remark

has the form:

REM "text of the notice"

The "text of the notice" (without the quotes but with the '%' comment
character in front) is added after the first line in the *.pps output file.
The first line of the *.pps output is unchanged, because '%!' is expected
in the input file and the same line is copied to the output. The quotes
around "text of the notice" are required. Two or more REM keywords make two
or more comment lines in the output. Example:

REM "TestFonts -- pfb version 1.0, generated from XY fonts. NO WARRANTY."
REM "Accents are added using t1accent program, (c) Petr Olsak, 1998"

The first three lines on the output look like:

%!FontType1-1.0: TestFont-Roman-BoldItalic 12345
% TestFonts -- pfb version 1.0, generated from XY fonts. NO WARRANTY.
% Accents are added using t1accent program, (c) Petr Olsak, 1998
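
The placement of the remarks can be sketched in Python. The function name
add_remarks is hypothetical; the sketch assumes the whole *.pps text is
available as one string whose first line is the '%!' header.

```python
def add_remarks(pps_text: str, remarks) -> str:
    """Insert one '% ...' comment line per remark after the first line."""
    first, _, rest = pps_text.partition("\n")
    comments = "".join(f"% {remark}\n" for remark in remarks)
    return first + "\n" + comments + rest

header = "%!FontType1-1.0: TestFont-Roman-BoldItalic 12345\n11 dict begin\n"
print(add_remarks(header, ["first notice", "second notice"]))
```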


***** The BBOX keyword

has the form:

BBOX <lx> <ly> <ux> <uy>

where <lx> <ly> <ux> <uy> are integers: (<lx>, <ly>) are the coordinates of
the lower left corner and (<ux>, <uy>) are the coordinates of the upper
right corner of the BoundingBox of one character. You can use several BBOX
keywords -- program t1accent calculates the minimum of <lx>, <ly> and the
maximum of <ux>, <uy> over all BBOX keywords used. The program also
calculates <ly> and <uy> (but not <lx>, <ux>) for each composed character
declared via CHAR /new-name /base-name /sub-name.
Then the program reads the global FontBBox from the input and calculates a
new FontBBox from the calculated values and the values of the original
FontBBox. The new values are written into the new FontBBox information.
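
The min/max rule can be written as a short Python sketch. The function name
merge_bboxes is hypothetical and the numbers are made up. For brevity the
sketch treats every contribution as a full four-number box, whereas a
composed character contributes only its <ly> and <uy> values.

```python
def merge_bboxes(font_bbox, boxes):
    """Combine the original FontBBox with BBOX declarations (lx, ly, ux, uy)."""
    lx, ly, ux, uy = font_bbox
    for blx, bly, bux, buy in boxes:
        lx, ly = min(lx, blx), min(ly, bly)   # lower left corner: minimum
        ux, uy = max(ux, bux), max(uy, buy)   # upper right corner: maximum
    return lx, ly, ux, uy

print(merge_bboxes((-50, -200, 1000, 750), [(0, 0, 600, 920), (-80, -10, 500, 700)]))
```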


***** The END keyword -- end of the configuration file

If the END keyword occurs in the configuration file, all text after
this END mark is ignored. The END keyword has the same effect as the end of
the configuration file.


4. References
-------------

[1] Adobe Systems Incorporated. PostScript Language Reference Manual (Red
    Book), Addison Wesley, 2nd edition, 1990.
[2] Adobe Systems Incorporated. Adobe Type1 Font Format (Black Book),
    Addison Wesley, 1990. The full text is available on
    ftp.adobe.com in PDF format.
[3] Ol\v{s}\'ak Petr. Typografick\'y syst\'em TeX (Typesetting system TeX),
    CSTUG, Prague 1995.