Text files are sequences of UTF-8 (8-bit Unicode Transformation Format) encoded characters without markup.
They can be edited with simple system editors and are highly portable. Although they lack markup, the
files follow a fixed format, described below, that facilitates machine processing.
Text files lack Documentary Hypothesis markings.
Text files can be obtained in one of four ways:
- When viewing biblical text, press the "Text" button
at the bottom of a text page. The browser displays the file, which can be saved
via the browser's "Save Page as ..." command. The text file will have the Layout
and Content of the original text. (The font and font size are not in the file and are browser-dependent.)
- Text files for entire biblical books can be obtained by clicking on the book name on the Home
page. The middle row of the resulting page offers a table of available formats.
Click on the "Text" item to view the entire book in text format.
- Text files can also be obtained from the Server.txt by entering a query URL into a
browser address bar. Only two parameters, layout and content, are permitted,
each taking a value from the pulldown lists. For example, a query can
display Deuteronomy 26:5-9 in Qere-only Layout with Consonants Content. Note that the parameter names layout and content
are not capitalized, although the values Qere-only and Consonants are.
See the Servers page for more information
about using the Server.txt.
- Zipped archives of complete Tanach books in text format, Tanach.txt.zip, are available in the TextFiles directory.
See the "Zipped archives of Tanach books" section of the
Text file applications:
Because the files are without markup, the display of the text files is completely dependent on
the implementation of the Unicode bi-directional (bidi) text algorithm in the displaying application.
Although Unicode has been a web standard for a long time,
many applications do not have full implementations
of the bidi algorithm. Not all applications reliably display Unicode text files
that contain both left-to-right and right-to-left text.
Fortunately, the files appear correctly in all systems' default editors and in many other applications.
(Surviving entries from the table of tested applications and fonts: Taamey D Web; Frank Ruehl CLM; "Do not use WordPad!")
Even if the display is distorted by an application's shortcomings in bidi implementation, machine processing of the files will not be affected.
Unicode contains three helpful
directional characters: the left-to-right embedding (LRE, 202a), the right-to-left embedding (RLE, 202b), and the
pop directional formatting (PDF, 202c). LRE and RLE set the text direction
for the next block of characters, and PDF ends that block.
The text files have been formatted with LRE, RLE, and PDF characters. Applications that
fail to handle these characters correctly will produce erroneous displays.
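When a downstream tool mishandles these characters, a program can remove them or check that they are balanced before passing the text on. The following is a minimal Python sketch of that idea. The function names `strip_directional` and `embedding_balanced` are illustrative and are not part of the site or its servers. Removing the characters may change how a bidi-aware application orders the text on screen.

```python
# Directional formatting characters named in the text above.
LRE = "\u202a"  # left-to-right embedding
RLE = "\u202b"  # right-to-left embedding
PDF = "\u202c"  # pop directional formatting

# Translation table that deletes the three characters.
_STRIP_TABLE = {ord(ch): None for ch in (LRE, RLE, PDF)}


def strip_directional(text: str) -> str:
    """Return text with every LRE, RLE, and PDF character removed."""
    return text.translate(_STRIP_TABLE)


def embedding_balanced(line: str) -> bool:
    """Return True if every LRE/RLE in the line is closed by a later PDF."""
    depth = 0
    for ch in line:
        if ch in (LRE, RLE):
            depth += 1
        elif ch == PDF:
            depth -= 1
            if depth < 0:  # a PDF with nothing left to close
                return False
    return depth == 0
```

A balanced line returns to embedding level zero at its end, which is the property the format relies on from line to line.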
Lines consist of either an LRE or RLE character followed by a sequence of Unicode characters terminated by
a PDF character, a carriage return (000d), and a linefeed (000a). By this approach each line
starts with a directional specification and then 'escapes' this direction at the end of the line. Thus there is no change
in embedding level from line to line.
Lines containing labels or blanks begin with the LRE character and then the prefix xxxx,
followed by the text, and then are
terminated by a PDF character, a carriage return (000d), and a linefeed (000a).
Blank lines and a line with the chapter number are inserted
between chapters for ease of reading.
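The two line types described above can be told apart by their first character. The sketch below assumes the file content has been read with its line endings intact, and it treats anything that does not match the described shape as "other". The name `classify_lines` is hypothetical. The label prefix is left inside the returned body because its exact value is not assumed here.

```python
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"


def classify_lines(data: str):
    """Yield (kind, body) for each CRLF-terminated line in data.

    kind is "label" for lines opened by LRE, "hebrew" for lines opened
    by RLE, and "other" for anything that does not fit the format.
    The opening LRE/RLE and the closing PDF are removed from body.
    """
    for raw in data.split("\r\n"):
        if not raw:
            continue  # empty piece left after the final CRLF
        if raw[0] == LRE and raw.endswith(PDF):
            yield "label", raw[1:-1]
        elif raw[0] == RLE and raw.endswith(PDF):
            yield "hebrew", raw[1:-1]
        else:
            yield "other", raw
```

In Python, opening the file with `open(path, encoding="utf-8", newline="")` keeps the carriage returns so that splitting on `"\r\n"` works as written.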
All Hebrew text lines begin with an RLE and follow this order:
- The initial RLE,
- one (1) non-breaking space (00a0),
If the Layout is "Full" or "Note-free":
- the verse number in a three-digit-wide field padded with non-breaking spaces (00a0),
- a sof pasuq '׃' (05c3, acting in place of a colon),
- the chapter number in a three-digit-wide field padded with non-breaking spaces (00a0),
"Text-only" and "Qere-only" Layouts omit the above 3 fields.
- one (1) non-breaking space (00a0),
- the Hebrew text with transcription notes (see below), and
- the line termination consisting of a PDF character, a carriage return (000d), and a linefeed (000a).
In parsing the text, the first number found is the verse number, not the chapter number.
Hebrew text lines may contain transcription notes depending on the selected Layout.
These notes are denoted by values within square brackets, i.e. [x].
Because the square brackets and possibly the note itself may be left-to-right characters, all notes are preceded
by an LRE character and followed by a PDF character. Thus there is no change
in embedding level after entering and leaving a transcription note.
Except in the Qere-only Layout, ketiv/qere sections are marked
with '*'s as in the original Michigan-Claremont coding.
The text files with "Full" Layout and "Accent" Content contain all the information
available in the Unicode/XML Leningrad Codex. The ordering of cantillation marks
may differ from that of the Unicode/XML Leningrad Codex, however. The mark ordering is that suggested by
John Hudson in his
SBL Hebrew Font User Manual.
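The field order for Hebrew text lines and the bracketed transcription notes lend themselves to a small parser. The Python sketch below is one possible reading of the description, not the site's own code. The name `parse_hebrew_line` is hypothetical. Because the description does not say on which side the number fields are padded, the pattern accepts non-breaking spaces on either side. Ketiv/qere '*' marks are left in the text untouched.

```python
import re

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
NBSP, SOF_PASUQ = "\u00a0", "\u05c3"

# Leading non-breaking space(s), then an optional "verse ׃ chapter" group.
# The verse number comes first, as the description states.
_PREFIX = re.compile(
    rf"^{NBSP}+(?:(\d{{1,3}}){NBSP}*{SOF_PASUQ}{NBSP}*(\d{{1,3}}){NBSP}+)?"
)
# A transcription note: LRE, bracketed value, PDF.
_NOTE = re.compile(rf"{LRE}\[([^\]]*)\]{PDF}")


def parse_hebrew_line(line: str):
    """Split one Hebrew line into (chapter, verse, text, notes).

    chapter and verse are None when the Layout omits the number fields.
    notes lists the bracketed values; text has the notes removed.
    """
    body = line.rstrip("\r\n")
    if not (body.startswith(RLE) and body.endswith(PDF)):
        raise ValueError("not a Hebrew text line")
    body = body[1:-1]  # drop the opening RLE and the closing PDF
    m = _PREFIX.match(body)
    if m is None:
        raise ValueError("missing leading non-breaking space")
    verse = int(m.group(1)) if m.group(1) else None
    chapter = int(m.group(2)) if m.group(2) else None
    rest = body[m.end():]
    notes = _NOTE.findall(rest)
    text = _NOTE.sub("", rest)
    return chapter, verse, text, notes
```

Returning the notes separately keeps the Hebrew text free of left-to-right runs while preserving the note values for later use.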