Czech and Slovak in plain TeX

Petr Olšák, 2019

CSplain (and pdfcsplain) is a conservative extension of Knuth's plain TeX. The csplain format differs from the plain format in that CSfonts are used by default instead of the CM fonts, which provides:

direct processing of Czech and Slovak letters (without using macros),
hyphenation patterns for the Czech and Slovak language.

News in CSplain in May 2016:

font-files are supported by the ff-mac.tex macro file, which provides a concept of independent font modifiers. The feature is described and illustrated in the article KP-fonts in plain TeX,
the Unicode Math font support is added in the file uni-math.tex,
the \fontfam macro from OPmac can list all font families available in font-files and print a font catalog.

CSplain has supported the following since Dec. 2012:

implicit input encoding UTF-8,
powerful font management (including resizing),
usage of TeX, eTeX, pdfTeX, XeTeX or LuaTeX,
internal encodings: by CSfonts or by T1 or Unicode (the last one only with XeTeX or LuaTeX),
hyphenation patterns for 50+ languages in various internal encodings,
the powerful OPmac macro package which is a part of the csplain package.

csplain.tar.gz
csplain - TUGboat article
the lecture in Brno

The csplain.tar.gz package contains not only the files needed to generate the csplain format but also further macro support. It was a part of CSTeX. You probably don't need to download and extract the csplain package from this site, because it is a part of the common TeX distributions (TeX Live, MiKTeX).

Contents

Formats csplain and pdfcsplain
Making the formats
Input and internal encoding
Using Czech and Slovak
The macro file opmac.tex
The UTF-8 input
Fonts in csplain
Hyphenation patterns of various languages differently encoded
Recommended reading

1. Formats luacsplain, csplain and pdfcsplain

CSplain works with classical TeX, pdfTeX, LuaTeX or XeTeX. The last two are Unicode TeX engines, i.e. the Unicode internal encoding is used, which is recommended for new users. CSplain with LuaTeX can be run by the command

   luacsplain document   (...or luatex -fmt pdfcsplain document)

The command luacsplain generates the PDF output document.pdf. The source document.tex must first declare the font family used in the document, for example by \input lmfonts. See section 7 for more possibilities. When the Czech language is used, the appropriate hyphenation patterns must be initialized by \chyph. See section 8 for more possibilities. So the source of a minimal document looks like:

\input lmfonts   % Unicode Latin Modern fonts
\chyph           % Czech hyphenation patterns
Tady je český text.
\bye

CSplain was developed over a long time, and for a very long period it was used only with classical TeX and pdfTeX (non-Unicode TeX engines). As a result, several elaborate encoding hacks are used in these variants of CSplain. Conservative users still use these variants, so this documentation covers the non-Unicode variants too.

The command csplain uses classical TeX and outputs to the DVI format, while the command pdfcsplain outputs to PDF by default using pdfTeX. More precisely:

   pdfcsplain document

creates the PDF output document.pdf. The declaration \input lmfonts is not needed in this case, but if you use it, the Latin Modern fonts are loaded in a specific non-Unicode encoding. The default fonts of CSplain are CSfonts.
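
A minimal sketch of a document for pdfcsplain with the default CSfonts, assuming the source is saved in UTF-8 as document.tex (the sample sentence is only a placeholder):

   \chyph           % Czech hyphenation patterns, the default CSfonts are kept
   Tady je český text.
   \bye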

You can use CSplain in Overleaf even though it is primarily configured for LaTeX documents. It is sufficient to add a file named latexmkrc to the main directory of your project. This file must include one line:

$pdflatex = 'pdfcsplain %O %S';

2. Making the formats

The following formats should be installed in your TeX distribution automatically:

1. csplain.fmt ...... input: UTF-8, output: DVI, engine: pdfTeX+encTeX,
                      commandline: csplain document 
2. pdfcsplain.fmt ... input: UTF-8, output: PDF, engine: pdfTeX+encTeX,
                      commandline: pdfcsplain document
3. luacsplain.fmt ... input: UTF-8, output: PDF, engine: LuaTeX,
                      commandline: luacsplain document or
                      luatex -fmt pdfcsplain document
4. pdfcsplain.fmt ... input: UTF-8, output: PDF, engine: XeTeX,
                      commandline: xetex -fmt pdfcsplain document

If you need to generate the formats manually, here are the command lines:

1. pdftex -jobname csplain -ini -etex -enc csplain-utf8.ini
   ... the csplain.fmt file is created, save it to .../web2c/pdftex/
2. pdftex -jobname pdfcsplain -ini -etex -enc csplain-utf8.ini   
   ... the pdfcsplain.fmt file is created, save it to .../web2c/pdftex/
3. luatex -jobname luacsplain -ini csplain.ini
   ... the luacsplain.fmt file is created, save it to .../web2c/luatex/
4. xetex -jobname pdfcsplain -ini -etex csplain.ini
   ... the pdfcsplain.fmt file is created, save it to .../web2c/xetex/

You have to run the texhash command (or its equivalent in your distribution) after the files are installed.

Note: You can generate your own formats for XeTeX or LuaTeX based on CSplain. See the files xeplain.ini and luaplain.ini.

3. Input and internal encoding

Input encoding. Old versions of csplain used an input encoding that depended on the operating system. The new version (from December 2012) accepts only the UTF-8 encoding.

Internal encoding in pdfTeX. The default for csplain is the CSfont encoding 8z (derived from ISO-8859-2). The CSfonts are loaded in csplain by default. You can use other fonts, but they must have the same encoding. If you need to use T1 encoded fonts, then you have to write the following lines at the beginning of your document:

\input t1code  % Cork encoding (T1, also called 8t) is set
\input lmfonts % LM fonts in the declared encoding are used instead of CSfonts
               % (other alternatives are possible, see section 7)
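
For illustration, a complete document for pdfcsplain with T1 encoded Latin Modern fonts might look like the following sketch. The sample sentence is only a placeholder; it uses just the letters that csplain recognizes by default:

\input t1code    % switch the internal encoding to T1 (8t)
\input lmfonts   % Latin Modern fonts in the declared encoding
\chyph           % Czech hyphenation in the declared encoding
Příliš žluťoučký kůň úpěl ďábelské ódy.
\bye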

Internal encoding in XeTeX or LuaTeX is Unicode. But neither XeTeX nor LuaTeX is able to load Unicode fonts into the prepared format (as the default set of fonts). So CSfonts are preloaded, but they are incompatible with Unicode. It is necessary to type:

\input lmfonts % LM fonts in Unicode are used (other alternatives are possible)

at the beginning of the document.

4. Using Czech and Slovak

csplain starts with the same default behavior as plain TeX: English hyphenation is set, the control sequences \v, \' expand to the \accent primitive, and \nonfrenchspacing is active. The only difference is in the default dimensions of the typesetting area: csplain sets one-inch margins on A4 paper, while plain TeX sets one-inch margins on letter paper.

The following commands are reserved for initializing the hyphenation patterns and for making the sequences \v, \', \^, \`, \" and \r expand to real characters:

   \chyph     % initializes Czech hyphenation and \frenchspacing
   \shyph     % initializes Slovak hyphenation and \frenchspacing
   \csaccents % changes the behavior of \', \v, \^, \`, \" and \r,
              % which now expand to characters of the CSfont

Recommendation: The first line of the document should be:

   \chyph % use format csplain

When a user processes such a document with another format, \chyph is not defined and the line above appears in the error message, including the comment, so the user can see which format the document has to be processed with.

To return to the original settings:

   \ehyph      % the default U.S. hyphenation and \nonfrenchspacing
   \cmaccents  % \', \v etc. expand to the \accent primitive
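
A short sketch of switching between the Czech and the default settings within one document, assuming pdfcsplain with the default CSfonts (the sample sentences are placeholders):

   \chyph \csaccents   % Czech hyphenation; \v, \' now give CSfont characters
   \v Cesk\'y text s h\'a\v cky a \v c\'arkami.

   \ehyph \cmaccents   % back to U.S. hyphenation; accents built by \accent
   A caf\'e in an English paragraph.
   \bye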

Other commands are just shortcuts to some of the characters in the CS-fonts:

   \clqq     % Czech left double quotation mark
   \crqq     % Czech right double quotation mark
   \flqq     % French left double quotation mark
   \frqq     % French right double quotation mark
   \promile  % permille character
   \uv       % the text quoted by Czech quotes: \uv{text}
   \ogonek a % Polish letter a with ogonek (assembled from components)
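
The shortcuts might be used as in the following sketch, which assumes pdfcsplain with the default CSfonts (the sentences are placeholders):

   \chyph
   Řekl: \uv{Dobrý den}.                 % Czech quotes around the argument
   Ručně: \clqq Dobrý den\crqq.          % the same marks typed explicitly
   Francouzsky: \flqq Bonjour\frqq.      % French quotation marks
   Alkohol v krvi: 0,5\,\promile.        % permille character
   \bye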

5. The macro file opmac.tex

The csplain format is designed as a minimal extension of plain TeX, so the format itself offers no features beyond the plain commands and the commands described in the previous section. It is the basis for low-level processing of Czech and Slovak texts of all kinds. However, the user would have to program the commonly needed features: automatic creation of the table of contents, numbering, cross-references, verbatim environment, hyperlinks, font size switching etc. The user does not need to do this work when using the opmac.tex macro file. This file has been a part of the csplain package since the end of 2012.

For more information about these macros, see the OPmac web page.
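
As a rough sketch of what OPmac takes over, the following document uses a title, a table of contents, numbered sections and a cross-reference. The macro names \tit, \sec, \maketoc, \label and \ref are taken from the OPmac documentation as far as I recall it and should be checked there; the label name and the text are placeholders:

   \input opmac        % loads OPmac on top of csplain
   \chyph

   \tit Ukázkový dokument

   \maketoc            % table of contents, complete after the second run

   \label[uvod]        % label for the following section
   \sec Úvod

   Text první sekce.

   \sec Závěr

   Viz sekci~\ref[uvod].
   \bye

The empty line after each title is intentional, so that the end of the title is unambiguous.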

6. UTF-8 encoded csplain

This section describes the behavior of csplain generated for the UTF-8 input encoding using encTeX, i.e. in TeX and pdfTeX. A short note about XeTeX and LuaTeX is at the end of this section.

CSplain format implicitly recognizes the following characters in input files:

All ASCII characters (128 characters called ``seven-bit chars'')
ÁáÄäČčĎďÉéĚěÍíĹĺĽľŇňÓóÖöÔôŔŕŘřŠšŤťÚúŮůÜüÝýŽž characters.
Characters that are defined in plain TeX or csplain as control sequences: \ss, \l, \L, \ae, \oe, \AE, \OE, \o, \O, \i, \j, \aa, \AA, \S, \P, \copyright, \dots, \dag, \ddag, \clqq, \crqq, \elqq, \erqq, \elq, \erq, \flqq, \frqq, \promile. UTF-8 codes for these characters are transformed into these control sequences by the TeX input processor and they are transformed back to the UTF-8 codes during \write.

If any other character occurs in the input file (em dash, non-breaking space, etc.), csplain prints a message similar to this on the terminal:

  WARNING: unknown UTF-8 code: `€ = ^^e2^^82^^ac' (line: 42)

and it inserts a black box into the DVI or PDF output. The user can map the undefined code to a control sequence and define that sequence, like this:

  \mubyte\eurochar ^^e2^^82^^ac\endmubyte % code of the character € mapped to \eurochar
  \def\eurochar{{\eurofont e}}            % definition of \eurochar
  \font\eurofont=feymr10                  % the font used
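
The two characters named above as examples could be mapped in the same way. The following sketch assumes that the em dash (U+2014, UTF-8 bytes e2 80 94) and the non-breaking space (U+00A0, UTF-8 bytes c2 a0) are not mapped yet; the names \emdashchar and \nbspchar are hypothetical:

  \mubyte\emdashchar ^^e2^^80^^94\endmubyte % em dash mapped to \emdashchar
  \def\emdashchar{---}                      % typeset by the usual ligature
  \mubyte\nbspchar ^^c2^^a0\endmubyte       % non-breaking space mapped to \nbspchar
  \def\nbspchar{~}                          % typeset as a tie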

The following files are prepared to extend the set of understood UTF-8 codes (mapped to a control sequence and defined):

  utf8lat1.tex ... mapping of UTF-8 codes from Latin-1 Supplement U+0080--U+00FF
  utf8lata.tex ... mapping of UTF-8 codes from Latin Extended-A U+0100--U+017F

I suppose that someone will extend the mapping of UTF-8 codes to other important blocks of the Unicode table in the same manner as in the files utf8lat1.tex and utf8lata.tex.

Another example of supporting new UTF-8 codes is the file cyrchars.tex, which supports Cyrillic characters natively (without explicit font switching). More documentation is at the end of that file.

If you give csplain an input file which isn't encoded in UTF-8, this error message appears:

  ! UTF-8 INPUT IS CORRUPTED! May be you are using another input encoding.

In such a case, you can add one of the following two \input commands to your document:

  \input utf8off ...  switches off the UTF-8 encoding, input / output is in ISO-8859-2
  \input mixcodes ... the input can be a mix of the following encodings:
                      UTF-8 or ISO-8859-2 and CP1250. Everything is processed
                      correctly without having to use a switch.
                      Output by \write is stored in UTF-8.
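
A sketch of a document that accepts mixed input encodings, assuming pdfcsplain and that \input mixcodes comes before any non-ASCII character (the sample sentence is a placeholder):

  \input mixcodes  % accept UTF-8, ISO-8859-2 or CP1250 input
  \chyph
  Tady je český text.
  \bye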

XeTeX and LuaTeX support the UTF-8 input encoding natively, without encTeX. Thus the warnings about missing UTF-8 characters can look different.

7. Fonts in csplain

The default font family in CSplain is CSfont, which is a mild extension of Knuth's Computer Modern fonts. It is possible to switch to another font family using one of the following font-files:

  \input lmfonts       % Latin Modern fonts
  \input ctimes        % Times font family
  \input chelvet       % Helvetica font family
  \input cavantga      % AvantGarde font family
  \input cbookman      % Bookman font family
  \input cncent        % NewCenturySchlbk font family
  \input cpalatin      % Palatino font family
  \input cs-bera       % Bera
  \input cs-arev       % ArevSans
  \input cs-charter    % Charter
  \input cs-antt       % Antykwa Torunska
  \input cs-polta      % Antykwa Poltawskiego
  \input cs-termes     % TeX Gyre Termes, similar to Times
  \input cs-heros      % TeX Gyre Heros, similar to Helvetica
  \input cs-adventor   % TeX Gyre Adventor, similar to AvantGarde
  \input cs-bonum      % TeX Gyre Bonum, similar to Bookman
  \input cs-schola     % TeX Gyre Schola, similar to NewCenturySchlbk
  \input cs-pagella    % TeX Gyre Pagella, similar to Palatino
  \input cs-cursor     % TeX Gyre Cursor, similar to Courier 
  \input cs-libertine  % Linux Libertine
  \input cs-ebgaramond % EB Garamond
  \input kp-fonts      % KP-fonts

kpfonts-plain.pdf

A font-file loads one font family, typically with variants accessible by the \rm, \bf, \it and \bi selector-macros. Several font-files provide font modifiers of the selector-macros. For example, \cond\it selects the condensed slanted variant if the cs-heros family is loaded. Detailed information about font modifiers, including large illustrations, is in the article KP-fonts in plain TeX. Macro programmers can find inspiration and technical documentation in the file cs-heros.tex.
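
A small sketch of the selector-macros and one font modifier, assuming the cs-heros font-file is installed (the text is a placeholder). The modifier is kept inside a group so that it does not affect the rest of the document:

  \input cs-heros   % TeX Gyre Heros, provides the \cond modifier
  \rm Regular text, {\bf bold}, {\it slanted} and {\bi bold slanted}.

  {\cond\it Condensed slanted text.}
  \bye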

The mentioned font-files load fonts in the declared encoding (\input t1code or \input il2code in pdfTeX, or Unicode in XeTeX or LuaTeX). Only the files ctimes...cpalatin and kp-fonts do not support Unicode. But there are good alternatives to ctimes...cpalatin from the TeX Gyre project, and these alternatives support Unicode.

opmac-u-en.pdf

The \fontfam macro is available when the OPmac macros are used. \fontfam gives a list of available families (implemented via font-files), prints a simple font catalogue and selects the given font family using \input font-file. The information about font modifiers and the default math font collection for each text font family is listed too.
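
A hedged sketch of \fontfam usage follows. The bracketed argument syntax and the special value [?] are assumptions based on my recollection of the OPmac documentation (opmac-u-en.pdf) and should be verified there:

  \input opmac
  \fontfam[?]       % assumed: writes the list of available families
  \fontfam[Termes]  % assumed: selects the family by its name
  Text in the selected family.
  \bye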

Math font collections are supported by macro files of the form foo-math.tex; for example, tx-math.tex supports the TX collection of math fonts (visually compatible with Times). Each foo-math.tex provides a symbol set at least as large as the AMS fonts symbol set. The math alphabets \frak (Fraktur), \script (script more rounded than \cal), \bbchar (double-stroke letters), \bf, \bi (bold sans-serif alphabet, normal and slanted) are ready. When you keep the default CSfonts, you can do \input ams-math.tex in order to enlarge the symbol set and to make resizing of the math fonts possible. Note that OPmac loads ams-math.tex by default.
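
A minimal sketch of the math alphabets with the default CSfonts, assuming ams-math.tex is installed. Each alphabet selector is enclosed in braces together with its letter, which keeps its effect local:

  \input ams-math   % AMS symbols and the math alphabets
  The algebra ${\frak g}$ acts on ${\bbchar R}^n$, and
  ${\script L}$ looks different from ${\cal L}$.
  \bye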

The resizing engine for text and math fonts is ready in the CSplain format. OPmac gives the user the comfortable macros \typosize and \typoscale for this. How to use this engine without OPmac is documented in the csfontsm.tex file.
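
A sketch of both macros, assuming the parameter conventions of the OPmac documentation: \typosize takes the font size and the baselineskip in points, \typoscale takes factors where 1000 means no change.

  \input opmac
  \typosize[12/14]       % 12pt font, 14pt baselineskip
  Text in twelve points.

  {\typoscale[800/800]   % 0.8 of the current font size and baselineskip
   Smaller text inside a group.\par}
  \bye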

8. Hyphenation patterns of various languages differently encoded

Hyphenation patterns are loaded when the format is generated. CSplain is ready to load hyphenation patterns of 57 languages, some of them in three possible encodings: 8z (CSfonts), 8t (Cork) and U (Unicode).

(pdf)csplain in pdfTeX (since March 2019) loads by default hyphenation patterns of the following languages: us(USplain), cs(czech,8z), sk(slovak,8z), cs(czech,8t), sk(slovak,8t), it(italian), engb(UKenglish), de(ngerman,8t), fr(french,8t), pl(polish,8t), es(spanish,8t), sl(slovenian,8t), fi(finnish,8t). The user can select the hyphenation by the macros \uslang, \cslang, \sklang, \itlang, \engblang, \delang, \frlang, \pllang, \eslang, \sllang, \filang. If the hyphenation is available only in the 8t encoding, then \input t1code and \input lmfonts must be done before such a selection. The selectors \cslang, \sklang work in the previously declared encoding (8z or 8t); the encoding 8z is the default. For backward compatibility there are aliases \chyph=\cslang, \shyph=\sklang and \ehyph=\uslang.

pdfcsplain in XeTeX loads by default the patterns us(USplain), cs(czech,8z), sk(slovak,8z), cs(czech,U), sk(slovak,U), it(italian), engb(UKenglish). Further hyphenation patterns can be loaded during format generation, see below.
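
A sketch of language switching in pdfcsplain, assuming the default set of preloaded patterns listed above. The 8t encoding is declared first because the German and French patterns exist only in it (the sample sentences are placeholders without accented letters outside the default set):

\input t1code    % 8t encoding, needed by \delang and \frlang
\input lmfonts   % Latin Modern fonts in the declared encoding
\cslang Český text s českým dělením slov.

\delang Deutscher Text mit deutscher Silbentrennung.

\frlang Un texte avec la coupure des mots.
\bye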

pdfcsplain in LuaTeX (since March 2019) makes all languages supported by the TeX distribution directly usable (57 languages now). The hyphenation patterns are loaded at the first occurrence of a selector \cslang, \delang, \frlang, \itlang etc. in the document (the number of such selectors is the same as the number of supported languages). The list of all supported languages can be obtained by expanding the \langlist macro. A selector (\cslang, \delang, etc.) can be used more than once in the document, but the patterns are loaded only once. See the file lua-hyphen.lan for more information.

If you are not using LuaTeX and your language is not in the list of preloaded hyphenation patterns, then such hyphenation patterns can be loaded during format generation. You can uncomment the corresponding line in the file hyphen.lan and regenerate the format. Or it is possible to add the request for loading hyphenation patterns to the command line which generates the format, like this:
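
A sketch of the on-demand loading in LuaTeX. The use of \message to show the expansion of \langlist is an assumption; the exact form of the list is not specified here:

\input lmfonts           % Unicode Latin Modern fonts
\cslang Český text.      % Czech patterns are loaded at this first use

\delang Deutscher Text.  % German patterns are loaded at this first use

\cslang Opět česky.      % already loaded, only selected again

\message{\langlist}      % assumed: writes the list of languages to the terminal
\bye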

pdftex -ini -jobname pdfcsplain -etex -enc "\let\daCork=y \let\enc=u \input csplain.ini"

This example generates pdfcsplain and loads the default hyphenation patterns plus the hyphenation patterns of Danish in Cork encoding. You can switch these hyphenation patterns on with the selector \dalang in your document. \dalang will not work until \input t1code is used, because the Danish hyphenation patterns are loaded in Cork encoding only.

9. Recommended reading

Items are listed in the suggested order.