Commit 8677541

doc: Write about dictionary formats
1 parent 9f77bed commit 8677541

4 files changed

+168
-0
lines changed

doc/advanced_usage.md

Lines changed: 2 additions & 0 deletions
@@ -7,5 +7,7 @@ other than English.
 ```{toctree}
 :maxdepth: 1
 
+translation_language
+dict_formats
 plugins
 ```

doc/conf.py

Lines changed: 26 additions & 0 deletions
@@ -63,3 +63,29 @@
     "light_logo": "dolores.svg",
     "dark_logo": "dolores.svg",
 }
+
+from pygments.lexer import RegexLexer, bygroups
+from pygments import token as t
+from sphinx.highlighting import lexers
+
+class RTFLexer(RegexLexer):
+    name = "rtf"
+
+    tokens = {
+        "root": [
+            (r"(\\[a-z*\\_~\{\}]+)(-?\d+)?", bygroups(t.Keyword, t.Number.Integer)),
+            (r"{\\\*\\cxcomment\s+", t.Comment.Multiline, "comment"),
+            (r"({)(\\\*\\cxs)(\s+)([A-Z#0-9\-/#!,]+)(})",
+                bygroups(t.Operator, t.Keyword, t.Text, t.String, t.Operator)),
+            (r"{", t.Operator),
+            (r"}", t.Operator),
+            (r".+?", t.Text),
+        ],
+        "comment": [
+            (r"{", t.Comment.Multiline, "#push"),
+            (r"}", t.Comment.Multiline, "#pop"),
+            (r".+", t.Comment.Multiline),
+        ]
+    }
+
+lexers["rtf"] = RTFLexer(startinline=True)

doc/dict_formats.md

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
# Dictionary Formats

Plover supports multiple proprietary steno dictionary formats, as well as
some open formats widely used by the Open Steno community. At a high level,
there are two main types of dictionaries: static and programmatic.

## Static Dictionaries

Static dictionaries consist of entries mapping steno outlines to translations.
This is the simplest type of dictionary, and most Plover dictionaries will
be of this form.

### JSON

The most common format for steno dictionaries in Plover is the **JavaScript
Object Notation** (JSON) format. This consists of a series of key-value pairs
separated by commas and surrounded by curly brackets `{}`:

```json
{
  "KAT": "cat",
  "KAT/HROG": "catalog",
  "KA/TA/HROG": "catalog",
  "-S": "{^s}"
}
```

In each key-value pair, the key is the [canonical steno notation](steno-notation)
of the outline, with the strokes separated by slashes, and the value is the
translation for that outline in Plover's [translation language](translation_language).
This format is used for dictionaries because it matches Plover's internal
storage format almost exactly.
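
As a rough illustration of the structure described above, the following sketch
loads a JSON dictionary with Python's standard `json` module and splits each
key into its strokes. The sample data mirrors the entries above; the helper
name `load_outlines` is hypothetical and is not part of Plover's API.

```python
import json

# Sample dictionary text, mirroring the entries shown above.
SAMPLE = """
{
  "KAT": "cat",
  "KAT/HROG": "catalog",
  "KA/TA/HROG": "catalog",
  "-S": "{^s}"
}
"""


def load_outlines(text):
    """Map each outline, as a tuple of strokes, to its translation.

    Hypothetical helper for illustration; not Plover's own loader.
    """
    entries = json.loads(text)
    # Strokes within an outline are separated by slashes.
    return {tuple(key.split("/")): value for key, value in entries.items()}


outlines = load_outlines(SAMPLE)
print(outlines[("KAT", "HROG")])  # catalog
```

Keying by a tuple of strokes is a choice made for this sketch; it makes the
number of strokes in an outline easy to check with `len`.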

### RTF/CRE

Another common dictionary format, which is also supported by most proprietary
steno software, is the
[**Rich Text Format** with Court Reporting Extensions](http://www.legalxml.org/workgroups/substantive/transcripts/cre-spec.htm)
(RTF/CRE) format. It was designed as an interchange format between steno
systems, and Plover supports some of the features defined by the format.

```rtf
{\rtf1\ansi\cxrev100\cxdict
{\*\cxs KAT}cat
{\*\cxs KAT/HROG}catalog
{\*\cxs KA/TA/HROG}catalog
{\*\cxs -S}\cxds s{\*\cxcomment -s suffix}
}
```

In RTF dictionaries, the steno outline is written in the same notation, but
the translation isn't written in Plover's translation language. Instead, it
uses RTF-specific formatting controls, which each supporting steno system maps
to its own commands. RTF also supports some per-entry metadata, such as
comments and historical usage data, but Plover does not read these.
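
To make the entry layout concrete, here is a minimal sketch that pulls outline
and translation pairs out of the small sample above with regular expressions.
It assumes one entry per line and non-nested comment groups, and it is not
Plover's actual RTF parser; the name `parse_entries` is hypothetical.
Formatting controls such as `\cxds` are left in the raw translation untouched.

```python
import re

SAMPLE = r"""{\rtf1\ansi\cxrev100\cxdict
{\*\cxs KAT}cat
{\*\cxs KAT/HROG}catalog
{\*\cxs KA/TA/HROG}catalog
{\*\cxs -S}\cxds s{\*\cxcomment -s suffix}
}"""

# One entry per line: a {\*\cxs OUTLINE} group followed by the raw translation.
ENTRY = re.compile(r"^\{\\\*\\cxs ([^}]+)\}(.*)$", re.MULTILINE)
# A comment group without nested braces.
COMMENT = re.compile(r"\{\\\*\\cxcomment [^{}]*\}")


def parse_entries(text):
    """Return {outline tuple: raw RTF translation} for a simple dictionary."""
    entries = {}
    for match in ENTRY.finditer(text):
        outline = tuple(match.group(1).split("/"))
        # Drop comment groups; keep formatting controls as-is.
        raw = COMMENT.sub("", match.group(2)).strip()
        entries[outline] = raw
    return entries


entries = parse_entries(SAMPLE)
print(entries[("KAT",)])  # cat
print(entries[("-S",)])   # \cxds s
```

Converting controls like `\cxds` into Plover's translation language is a
separate step that this sketch does not attempt.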

It's generally not recommended to maintain RTF dictionaries, since they can be
slow to parse and the format isn't especially well defined. The format is
still useful when, for example, professional stenographers want to use their
existing personal dictionaries with Plover.

### Proprietary Formats

With the help of plugins, Plover also supports the native dictionary formats
of some proprietary steno software:

- [plover-casecat-dictionary](https://github.com/marnanel/plover_casecat_dictionary) -- Stenograph Case CATalyst dictionaries (`.sgdct`)
- [plover-digitalcat-dictionary](https://github.com/marnanel/plover_digitalcat_dictionary) -- Stenovations digitalCAT dictionaries (`.dct`)
- [plover-eclipse-dictionary](https://github.com/marnanel/plover_eclipse_dictionary) -- Advantage Software Eclipse dictionaries (`.dix`)

## Programmatic Dictionaries

Programmatic dictionaries, instead of containing a list of entries, calculate
translations on the fly, the moment Plover requests them. This is most useful
for highly regular dictionaries, such as a symbol system or a syllabic theory.

The [plover-python-dictionary](https://github.com/benoit-pierre/plover_python_dictionary)
plugin adds support for programmatic dictionaries written in Python, which can
be used in Plover just like static ones.

Programmatic dictionaries primarily expose a lookup function, which calculates
a translation for a given steno outline. Some dictionaries may also provide a
reverse-lookup function, which calculates all the possible outlines that
translate to a particular text.

```{data} LONGEST_KEY
The maximum number of strokes that this dictionary can translate. Plover uses
this value to optimize dictionary lookups by only using this dictionary when
looking up outlines this length or shorter.

This attribute is **required**.
```

```{function} lookup(outline: Tuple[str]) -> str
Given an outline which is a tuple of steno strokes, returns the translation for
this outline, or raises a `KeyError` when no translation is available. The
translation should be in Plover's [translation language](translation_language).

This function is **required**.
```

```{function} reverse_lookup(translation: str) -> List[Tuple[str]]
Given a translation in Plover's [translation language](translation_language),
returns the list of possible outlines that translate to it. The list may be
empty if there are no possible outlines in this dictionary.

This function is *optional*; the dictionary still works without implementing
it, but it will not support searching in the Lookup tool.
```
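
The contract above can also be read from the caller's side. The following
sketch is an assumption about how a host might consume these three members,
not Plover's actual lookup code; `example_dict`, `translate_or_none`, and
`reverse_or_empty` are hypothetical names introduced for illustration.

```python
from types import SimpleNamespace


def _lookup(outline):
    # A one-entry dictionary: raise KeyError for anything else.
    if outline == ("KP-PL",):
        return "example"
    raise KeyError(outline)


# Hypothetical stand-in for a loaded dictionary module with no reverse_lookup.
example_dict = SimpleNamespace(LONGEST_KEY=1, lookup=_lookup)


def translate_or_none(dictionary, outline):
    """Return the translation for `outline`, or None if there is none."""
    # Outlines longer than LONGEST_KEY can never match, so skip the call.
    if len(outline) > dictionary.LONGEST_KEY:
        return None
    try:
        return dictionary.lookup(outline)
    except KeyError:
        return None


def reverse_or_empty(dictionary, translation):
    """Call reverse_lookup only if the dictionary provides it."""
    reverse = getattr(dictionary, "reverse_lookup", None)
    return reverse(translation) if reverse is not None else []


print(translate_or_none(example_dict, ("KP-PL",)))          # example
print(translate_or_none(example_dict, ("KP-PL", "KP-PL")))  # None
print(reverse_or_empty(example_dict, "example"))            # []
```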

Here is an example of a very basic programmatic dictionary which just
translates `KP-PL` to `example`:

```python
LONGEST_KEY = 1


def lookup(outline):
    assert len(outline) == 1

    stroke = outline[0]
    if stroke == "KP-PL":
        return "example"
    else:
        raise KeyError


def reverse_lookup(translation):
    if translation == "example":
        return [("KP-PL",)]
    else:
        return []
```
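
For a dictionary with more than one entry, a table-driven variant keeps
`lookup` and `reverse_lookup` consistent by deriving both from the same data.
This is a sketch only: the strokes and translations in `SYMBOLS` are made up
for illustration and do not come from any real steno theory.

```python
LONGEST_KEY = 1

# Hypothetical stroke-to-translation table; these entries are made up.
SYMBOLS = {
    "KP-PL": "example",
    "SKP-PL": "sample",
    "KP-PLS": "examples",
}

# Invert the table once so reverse lookups don't scan every entry.
REVERSE = {}
for stroke, translation in SYMBOLS.items():
    REVERSE.setdefault(translation, []).append((stroke,))


def lookup(outline):
    if len(outline) != 1:
        raise KeyError(outline)
    # A missing stroke raises KeyError, as the interface requires.
    return SYMBOLS[outline[0]]


def reverse_lookup(translation):
    # Return a copy so callers can't mutate the shared table.
    return list(REVERSE.get(translation, []))
```

Building `REVERSE` from `SYMBOLS` means a new entry only has to be added in
one place for both directions to stay in sync.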

doc/translation_language.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
# Translation Language

In addition to translating normal text, Plover supports some special
formatting operators as well as commands to control the steno engine behavior.
This page describes all of these operators and how they are represented in
translations.