CSplain

Czech and Slovak in plain TeX

Petr Olšák, May 2016


CSplain is a conservative extension of Knuth's plain TeX. The csplain format differs from the plain format in that the CS-fonts are used by default instead of the CM fonts, which gives you:

  • direct processing of Czech and Slovak letters (without using macros),
  • hyphenation patterns for the Czech and Slovak languages.

News in CSplain in May 2016:

  • font-files are supported by the ff-mac.tex macro file, which provides a concept of independent font modifiers. The feature is described and illustrated in the article KP-fonts in plain TeX,
  • Unicode math font support is added in the file uni-math.tex,
  • the \fontfam macro from OPmac can list all font families available in font-files, and a font catalog can be printed.

Since December 2012, CSplain supports:

  • implicit input encoding UTF-8,
  • powerful font management (including resizing),
  • usage of TeX, eTeX, pdfTeX, XeTeX or LuaTeX,
  • internal encodings: by CSfonts or by T1 or Unicode (the last one only with XeTeX or LuaTeX),
  • hyphenation patterns for 50+ languages in various internal encodings,
  • the powerful OPmac macro package which is a part of the csplain package.

csplain.tar.gz
csplain - TUGboat article
the lecture in Brno

The csplain.tar.gz package contains not only the files needed for generating the csplain format but also further macro support. It is a part of CSTeX. You probably don't need to download and extract the csplain package from this site, because it is a part of the common TeX distributions (TeX Live, MiKTeX).

Contents

  1. Formats csplain and pdfcsplain
  2. Making the formats
  3. Input and internal encoding
  4. Using Czech and Slovak
  5. The macro file opmac.tex
  6. The UTF-8 input
  7. Fonts in csplain
  8. Hyphenation patterns of various languages differently encoded
  9. Recommended reading

1. Formats csplain and pdfcsplain

The csplain format produces DVI output by default, while pdfcsplain produces PDF output by default. To run TeX with the csplain format (on a document named document.tex), use the command

   csplain document

The file document.dvi is created. To run TeX with the pdfcsplain format (on a document named document.tex), use the command

   pdfcsplain document

The file document.pdf is created. The commands csplain and pdfcsplain are implemented differently depending on the TeX distribution and operating system.
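
As a minimal sketch, a complete document.tex for either command could look like the following. The sample sentence is only illustrative; \chyph is described in chapter 4.

   \chyph % use format csplain
   Příliš žluťoučký kůň úpěl ďábelské ódy.
   \bye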

2. Making the formats

The following formats should be installed in your TeX distribution automatically:

1. csplain.fmt ...... input: UTF-8, output: DVI, engine: pdfTeX+encTeX,
                      commandline: csplain document 
2. pdfcsplain.fmt ... input: UTF-8, output: PDF, engine: pdfTeX+encTeX,
                      commandline: pdfcsplain document
3. pdfcsplain.fmt ... input: UTF-8, output: PDF, engine: LuaTeX,
                      commandline: luatex -fmt pdfcsplain document
4. pdfcsplain.fmt ... input: UTF-8, output: PDF, engine: XeTeX,
                      commandline: xetex -fmt pdfcsplain document

Formats 1 and 2 are intended for common usage. If you are an expert, you can try formats 3 and 4.

If you need to generate the formats manually, here are the command lines:

1. pdftex -jobname csplain -ini -enc csplain-utf8.ini
   ... the csplain.fmt file is created, save it to .../web2c/pdftex/
2. pdftex -jobname pdfcsplain -ini -enc csplain-utf8.ini   
   ... the pdfcsplain.fmt file is created, save it to .../web2c/pdftex/
3. luatex -jobname pdfcsplain -ini csplain.ini
   ... the pdfcsplain.fmt file is created, save it to .../web2c/luatex/
4. xetex -jobname pdfcsplain -ini -etex csplain.ini
   ... the pdfcsplain.fmt file is created, save it to .../web2c/xetex/

You have to run the texhash command (or something similar in your distribution) after the files are installed.

Note: You can generate your own formats for XeTeX or LuaTeX based on CSplain. See the files xeplain.ini and luaplain.ini.

Note 2: Neither csplain nor opmac needs the eTeX extension. If you need it (or your macros need it), you can add the option -etex after the -enc option in lines 1 and 2. The TeX Live distribution uses this option by default when it generates the csplain formats.

3. Input and internal encoding

Input encoding. In old versions of csplain, the input encoding depended on the operating system used. The new version (from December 2012) accepts only UTF-8 encoding.

Internal encoding. The default in csplain is the CSfont encoding (derived from ISO-8859-2). The CSfonts are loaded in csplain by default. You can use other fonts, but only with the same encoding.

If you need to use T1 encoded fonts, you have to write the following line at the beginning of your document:

\input t1code

If XeTeX or LuaTeX is used, neither the default encoding nor the T1 encoding is usable for Czech and Slovak texts, because these engines work internally in Unicode, which differs from the mentioned encodings in the Czech and Slovak alphabets. So you have to write the following at the beginning of your document:

\input ucode

and all fonts you are using have to be Unicode encoded (OpenType format). For example you can use \input lmfonts.
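
A sketch of a document prepared for XeTeX or LuaTeX, assuming the format was generated as in line 3 or 4 of chapter 2. The order of the first two lines follows the text above; the sample sentence is illustrative only.

   \input ucode   % switch the internal encoding to Unicode
   \input lmfonts % load Unicode (OpenType) Latin Modern fonts
   \chyph         % Czech hyphenation, see chapter 4
   Žluťoučký kůň úpěl ďábelské ódy.
   \bye

Such a file would be processed, for example, by luatex -fmt pdfcsplain document.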

4. Using Czech and Slovak

Csplain starts with the same default behavior as plain TeX: English hyphenation is set, the control sequences \v, \' expand to the \accent primitive, and \nonfrenchspacing is active. The only difference is the default size of the typesetting area: csplain sets one-inch margins on A4 paper, while plain TeX sets one-inch margins on letter paper.

To initialize the hyphenation patterns and to make the sequences \v, \', \^, \`, \", \r expand to the real characters, the following commands are provided:

   \chyph     % initializes Czech hyphenation and \frenchspacing
   \shyph     % initializes Slovak hyphenation and \frenchspacing
   \csaccents % changes the behavior of \', \v, \^, \`, \" and \r,
              % which now expand to characters of the CSfont

Recommendation: The first line of the document should look like this:

   \chyph % use format csplain

When a user processes such a document with another format, \chyph isn't defined and the above line appears in the error message including the comment, so the user can see which format the document has to be processed with.

To return to the original settings:

   \ehyph      % the default U.S. hyphenation and \nonfrenchspacing
   \cmaccents  % \', \v etc. expand to the \accent primitive

Other commands are just shortcuts to some of the characters in the CS-fonts:

   \clqq     % Czech left double quotation mark
   \crqq     % Czech right double quotation mark
   \flqq     % French left double quotation mark
   \frqq     % French right double quotation mark
   \promile  % permille character
   \uv       % the text quoted by Czech quotes: \uv{text}
   \ogonek a % Polish letter a with ogonek (assembled from components)
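
The commands of this chapter might be combined as in the following sketch. It assumes the default CSfonts and uses only macros listed above; the sample words are illustrative.

   \chyph      % Czech hyphenation and \frenchspacing
   \csaccents  % \v, \' etc. now expand to CSfont characters
   \uv{\v Re\v c} is set in Czech quotes,
   \flqq text\frqq{} in French quotes, and 5~\promile{} is a permille value.
   \ehyph      % back to U.S. hyphenation and \nonfrenchspacing
   This paragraph is hyphenated by the English patterns.
   \bye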

cstexman.pdf

For more information about the defaults in csplain and its differences from plain TeX, see the Manual on CSTeX, paragraphs 4.3 and 9.4.

5. The macro file opmac.tex

The csplain format is designed as a minimal extension of plain TeX, so the format itself offers no features beyond the plain TeX commands and the commands described in the previous chapter. It is the basis for low-level processing of Czech and Slovak texts of all kinds. However, the user would have to program the commonly needed features: automatic table of contents, numbering, cross-references, verbatim environment, hyperlinks, font size switching, etc. The user does not need to do this work when using the opmac.tex macro file. This file has been a part of the csplain package since the end of 2012.

For more information about these macros, see the OPmac web page.

6. UTF-8 encoded csplain

This chapter describes the behavior of csplain generated for UTF-8 input encoding using encTeX, i.e. in TeX and pdfTeX. A short note about XeTeX and LuaTeX is at the end of this chapter. Since 2012, UTF-8 encoded csplain is the recommended default.

CSplain format with UTF-8 input implicitly recognizes the following characters in input files:

  1. All ASCII characters (128 characters called ``seven-bit chars'')
  2. ÁáÄäČčĎďÉéĚěÍíĹĺĽľŇňÓóÖöÔôŔŕŘřŠšŤťÚúŮůÜüÝýŽž characters.
  3. Characters that are defined in plainTeX or csplain as control sequences: \ss, \l, \L, \ae, \oe, \AE, \OE, \o, \O, \i, \j, \aa, \AA, \S, \P, \copyright, \dots, \dag, \ddag, \clqq, \crqq, \elqq, \erqq, \elq, \erq, \flqq, \frqq, \promile. The UTF-8 codes of these characters are transformed into these control sequences by the TeX input processor and they are transformed back to the UTF-8 codes during \write.

If any other character occurs in the input file (long dash, non-breaking space, etc.), UTF-8 encoded csplain displays a message on the terminal similar to this:

  WARNING: unknown UTF-8 code: `€ = ^^e2^^82^^ac' (line: 42)

and it inserts a black box into the DVI or PDF output. The user can map the undefined code to a control sequence and define that sequence, like this:

  \mubyte\eurochar ^^e2^^82^^ac\endmubyte % code of the euro character mapped to \eurochar
  \def\eurochar{{\eurofont e}}            % definition of \eurochar
  \font\eurofont=feymr10                  % the font used

The following files are prepared to extend the set of recognized UTF-8 codes (each code is mapped to a control sequence and that sequence is defined):

  utf8lat1.tex ... mapping of UTF-8 codes from Latin-1 Supplement U+0080--U+00FF
  utf8lata.tex ... mapping of UTF-8 codes from Latin Extended-A U+0100--U+017F
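
A sketch of how these files and a custom mapping might be used together with pdfTeX+encTeX. The control sequence name \myendash is hypothetical (chosen for this example); the bytes ^^e2^^80^^93 are the UTF-8 code of U+2013 (en dash), which lies outside both blocks listed above.

  \input utf8lat1 % accept UTF-8 codes U+0080--U+00FF
  \input utf8lata % accept UTF-8 codes U+0100--U+017F
  \mubyte\myendash ^^e2^^80^^93\endmubyte % map U+2013 to \myendash
  \def\myendash{--}                       % typeset it by the -- ligature

The custom mapping follows the same pattern as the \eurochar example: first the byte sequence is bound to a control sequence by \mubyte, then the control sequence is given a meaning.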

I expect that someone will extend the mapping of UTF-8 codes to other important blocks of the Unicode table in the same manner as the files utf8lat1.tex and utf8lata.tex.

Another example of supporting new UTF-8 codes is the file cyrchars.tex, which supports Cyrillic characters natively (without explicit font switching). More documentation is at the end of that file.

If you give csplain an input file that isn't encoded in UTF-8, the following error message appears:

  ! UTF-8 INPUT IS CORRUPTED! May be you are using another input encoding.

In such a case, you can add one of the following two \input commands to your document:

  \input utf8off ...  switches off the UTF-8 encoding, input / output is in ISO-8859-2
  \input mixcodes ... a mix of the following encodings can follow:
                      UTF-8, ISO-8859-2 and CP1250. Everything is processed
                      correctly without having to use a switch.
                      Output by \write is stored in UTF-8.

XeTeX and LuaTeX support UTF-8 input encoding natively without encTeX, so the warnings about unknown UTF-8 characters don't occur.

7. Fonts in csplain

The default font family in CSplain is CSfont, a mild extension of Knuth's Computer Modern fonts. It is possible to switch to another font family using one of the following font-files:

  \input lmfonts      % Latin Modern fonts
  \input ctimes       % Times font family
  \input chelvet      % Helvetica font family
  \input cavantga     % AvantGarde font family
  \input cbookman     % Bookman font family
  \input cncent       % NewCenturySchlbk font family
  \input cpalatin     % Palatino font family
  \input cs-bera      % Bera
  \input cs-arev      % ArevSans
  \input cs-charter   % Charter
  \input cs-antt      % Antykwa Torunska
  \input cs-polta     % Antykwa Poltawskiego
  \input cs-termes    % TeX Gyre Termes
  \input cs-adventor  % TeX Gyre Adventor
  \input cs-bonum     % TeX Gyre Bonum
  \input cs-heros     % TeX Gyre Heros
  \input cs-pagella   % TeX Gyre Pagella
  \input cs-schola    % TeX Gyre Schola
  \input cs-cursor    % TeX Gyre Cursor
  \input cs-libertine % Linux Libertine
  \input kp-fonts     % KP-fonts

kpfonts-plain.pdf

A font-file loads one font family, typically with variants accessible by the \rm, \bf, \it and \bi selector macros. Several font-files provide font modifiers for the selector macros. For example, \cond\it selects the condensed slanted variant if the cs-heros family is loaded. Detailed information about font modifiers, including large illustrations, is in the article KP-fonts in plain TeX. Macro programmers can find inspiration and technical documentation in the file cs-heros.tex.
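
For illustration, a sketch that loads one font-file and uses the selectors and the modifier named above. It assumes that the cs-heros font-file and its fonts are installed; the sample text is arbitrary.

  \input cs-heros % TeX Gyre Heros family
  \rm Upright, {\bf bold}, {\it slanted}, {\bi bold slanted},
  {\cond\it condensed slanted}.
  \bye

Each selector is kept inside a group so that the following text returns to \rm.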

opmac-u-en.pdf

The \fontfam macro is available when the OPmac macros are used. \fontfam gives a list of available families (implemented via font-files), prints a simple font catalogue and selects the given font family using \input font-file. Information about font modifiers and the default math font collection for each text font family is listed too.
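
A hedged sketch of \fontfam usage with OPmac. The bracketed arguments are written as I recall them from the OPmac documentation (a family name selects the family, ? lists the families, catalog prints the catalogue); treat them as assumptions and check opmac-u-en.pdf.

  \input opmac
  \fontfam[Heros]     % select the Heros family (loads its font-file)
  % \fontfam[?]       % list the available families on the terminal
  % \fontfam[catalog] % print the font catalogue
  Text in Heros.
  \bye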

Math font collections are supported by macro files named foo-math.tex; for example, tx-math.tex supports the TX collection of math fonts (visually compatible with Times). Each foo-math.tex provides at least the symbol set of the AMS fonts. The math alphabets \frak (Fraktur), \script (a script more rounded than \cal), \bbchar (double-struck letters), \bf, \bi (bold sans-serif alphabet, upright and slanted) are ready. When you keep the default CSfonts, you can do \input ams-math.tex to enlarge the symbol set and to make resizing of the math fonts possible. Note that OPmac loads ams-math.tex by default.

A resizing engine for text and math fonts is ready in the CSplain format. OPmac gives the user the comfortable macros \typosize and \typoscale for this. How to use this engine without OPmac is documented in the csfontsm.tex file.
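
A sketch of the two OPmac macros, assuming the OPmac syntax \typosize[font size/baselineskip] (in points) and \typoscale[font factor/baselineskip factor] (in thousandths of the current values).

  \input opmac
  \typosize[11/13]          % 11pt fonts with 13pt \baselineskip
  Normal text.

  {\typoscale[1200/1200]    % 1.2 times larger, only inside this group
   Larger text.\par}
  \bye

The \par inside the group makes sure the paragraph is finished while the larger \baselineskip is still in effect.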

Many fonts in TeX distributions are encoded only in T1 encoding, which is incompatible with CSfont encoding used in csplain. But this does not matter, just type at the beginning of your document:

   \input t1code

and you can work with T1 encoded fonts. CSplain internally switches to T1 encoding including hyphenation patterns. If you are using UTF-8 input, you need not worry about anything else; the macro file t1code changes the encoding tables for the input processor itself. When encTeX isn't used, you have to take care of the transcoding in another way.

If you write \input t1code before \input font-file, the corresponding T1 encoded fonts are loaded.
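
A sketch of this order, assuming pdfTeX with encTeX and an installed T1 variant of the Times fonts; ctimes is one of the font-files listed earlier and the sample sentence is illustrative.

   \input t1code % switch the internal encoding to T1 first
   \input ctimes % now the T1 encoded Times fonts are loaded
   \chyph        % Czech hyphenation patterns in T1 encoding
   Žluťoučký kůň úpěl ďábelské ódy.
   \bye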

When XeTeX or LuaTeX is used, \input font-file loads the corresponding Unicode-ready fonts in OTF format. The default font family preloaded in csplain (CSfonts) does not work for Czech texts when XeTeX or LuaTeX is used. In such a case, \input font-file (for example \input lmfonts) is explicitly needed at the beginning of the document.

8. Hyphenation patterns of various languages differently encoded

Hyphenation patterns are loaded when the format is generated. CSplain is ready to load hyphenation patterns of 54 languages (see here) in the three possible encodings. By default, it reads only English (the default pattern of plainTeX) and the Czech and Slovak patterns encoded by CSfont and T1 (Cork). If the generation by 16-bit TeX engine is detected, the Czech and Slovak hyphenation patterns are loaded in Unicode too.

Czech patterns are switched on by \chyph (or \czlang, which does the same thing), Slovak by \shyph (or \sklang) and English by \ehyph (or \uslang). These switches operate in the encoding set by the command \input t1code (T1 encoding) or \input ucode (Unicode). If no such \input command is used, hyphenation patterns are initialized in the CSfont encoding.

Other hyphenation patterns can be loaded during format generation if you uncomment the corresponding line in the file hyphen.lan. Alternatively, you can add the request to load hyphenation patterns to the command line which generates the format, like this:

pdftex -ini -enc "\let\plCork=y \let\enc=u \input csplain.ini"

This example generates csplain with UTF-8 encoding and loads the default hyphenation patterns plus the hyphenation patterns of our Polish friends (patterns encoded in Cork). You can switch these hyphenation patterns on with the command \pllang in your document. (Something like \phyph is no longer supported, because there are 54 possible languages and their hyphenation patterns but only 26 letters in the alphabet.) \pllang will not work until \input t1code is used, because the Polish hyphenation patterns are loaded only in the Cork encoding (aka T1).
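
A sketch of a document using the Polish patterns, assuming the format was generated with \let\plCork=y as in the command line above. \showhyphens is the plain TeX macro that reports the hyphenation points it finds on the terminal and in the log file; the word is only a sample.

   \input t1code  % Polish patterns are loaded in T1 (Cork) only
   \input lmfonts % a font family, loaded here in its T1 variant
   \pllang        % switch to Polish hyphenation
   \showhyphens{konstytucja}
   \bye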

If a 16-bit TeX engine (LuaTeX, XeTeX) is detected, it is possible to load hyphenation patterns marked \..Unicode, e.g. \deUnicode, \ruUnicode, \plUnicode. You can switch to these hyphenation patterns with \delang, \rulang, \pllang, \czlang, ..., if preceded by \input ucode. It is also necessary to typeset in Unicode with some Unicode font, otherwise the output will be garbled. At the moment, you can use \input lmfonts to load the Latin Modern fonts in Unicode. The other possibility is to use the TeX Gyre fonts \input cs-termes, \input cs-adventor, ..., \input cs-schola, which are able to load the Unicode variants of these fonts too.

After \input ucode, csplain sets the \lccode values only for the Czech and Slovak alphabets. If you are using a different language, you have to set the needed \lccode values yourself. Otherwise the Unicode-encoded hyphenation patterns of foreign languages will not work.
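
The \lccode setting for another language could be sketched as follows for Polish with XeTeX or LuaTeX, assuming the format was generated with the \plUnicode patterns. Only two letter pairs are shown; the remaining Polish letters would need the same treatment.

   \input ucode
   \input lmfonts
   \lccode"0105="0105 \lccode"0104="0105 % U+0105 ą, U+0104 Ą
   \lccode"0142="0142 \lccode"0141="0142 % U+0142 ł, U+0141 Ł
   \pllang

Both the lowercase and the uppercase letter get the code of the lowercase letter as their \lccode, which is what TeX's hyphenation algorithm expects.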

9. Recommended reading

Items are listed in the suggested order.

  1. Petr Olšák: First meeting with TeX.
  2. Petr Olšák: TeX for Pragmatists.
  3. Petr Olšák: TeXbook inside out.
  4. Petr Olšák: TeX typesetting system.
  5. Donald Knuth: The TeXbook.