Commit 64be2b0

add some documentation and notes
1 parent 17d54c0 commit 64be2b0

4 files changed

+56
-2
lines changed

JankSQL/Engines/BTreeEngine/BTreeTable.cs

+1-1
@@ -75,7 +75,7 @@ internal BTreeTable(string tableName, ExpressionOperandType[] keyTypes, IEnumera
7575
/// <summary>
7676
/// Initializes a new instance of the <see cref="BTreeTable"/> class as a heap.
7777
/// Creates a "heap" table with no unique index. Our approach to this is a table that has a fake
78-
/// "uniquifier" key as its bookmar_key. That single-column bookmark key maps to the values,
78+
/// "uniquifier" key as its bookmark_key. That single-column bookmark key maps to the values,
7979
/// which are all the columns given.
8080
/// </summary>
8181
/// <param name="tableName">string with the name of our table.</param>

README.md

+4-1
@@ -28,7 +28,7 @@ On the other hand, we can expect that I'll want to extend the grammar to support
2828

2929
### Storage
3030

31-
The storage engine is based on the [CSharpTest.Net](https://github.com/csharptest/CSharpTest.Net.Collections) B-Tree implementation. The engines are pluggable through the `IEngine` and `IEngineTable` interfaces. Implementations for in-memory and on-disk storage against the CSharpTest B-Tree are supplied. A limited implementation against a CSV flat-file is also supplied.
31+
The storage engine is based on the p. The engines are pluggable through the `IEngine` and `IEngineTable` interfaces. Implementations for in-memory and on-disk storage against the CSharpTest B-Tree are supplied. A limited implementation against a CSV flat-file is also supplied.
3232

3333
### Tests
3434

@@ -65,6 +65,9 @@ The project is buildable, and I intend that the main branch always has all of it
6565

6666
There are lots of language features being added as I work, so the best way to see what's supported is to scan through the tests.
6767

68+
# Documentation
69+
70+
I've started writing [documentation](docs/index.md).
6871

6972
# Licensing
7073

docs/TableStructure.md

+42
@@ -0,0 +1,42 @@
1+
2+
# Table Structure
3+
4+
JankSQL uses the [CSharpTest.Net](https://github.com/csharptest/CSharpTest.Net.Collections) B-Tree implementation, which is some amazing software. It implements a simple interface in its `BTree` generic class so that we can make a BTree of keys and values over `BTree<Key, Value>`. The class supports persistence and locking, and provides enumerators that look up keys and walk the values available starting at a given key.
5+
6+
In JankSQL, the `BTree` class is used with the `Tuple` class that implements a tuple of typed values, each represented with the `ExpressionOperand` class. `Tuple` represents a set of values, so it's used for both the key and the value. Thus, JankSQL's use of `BTree` is always on `BTree<Tuple, Tuple>`. `Tuple` has helpers that implement the comparison and persistence interfaces that `BTree` requires.
7+
8+
## Tables
9+
10+
JankSQL implements a table, then, with a key-value store built on a `BTree<Tuple, Tuple>` object. The value `Tuple` contains all of the columns of the table. The key is a `Tuple` that contains a single integer which is used as a monotonically increasing row ID.
11+
12+
Since the table has no index, any operation against it is a scan. Inserting a new row simply adds one to the last used key and inserts the row as the value for that key. Deleting a row simply removes the row, and the key number is not reused.
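
As a rough sketch of the insert, delete, and scan behavior described above: the class below is hypothetical and is not JankSQL's code. It assumes `SortedDictionary<int, object[]>` as a stand-in for `BTree<Tuple, Tuple>`, with a plain `int` as the row ID and an `object[]` as the row payload.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the heap table; not JankSQL's actual class.
// SortedDictionary<int, object[]> plays the role of BTree<Tuple, Tuple>.
public sealed class HeapSketch
{
    private readonly SortedDictionary<int, object[]> rows =
        new SortedDictionary<int, object[]>();

    private int lastRowId = 0;

    public int Count => rows.Count;

    // Insert: add one to the last used key and store the row under it.
    public int Insert(params object[] columns)
    {
        int rowId = ++lastRowId;
        rows.Add(rowId, columns);
        return rowId;
    }

    // Delete: remove the row. The counter is not rewound, so the key is not reused.
    public bool Delete(int rowId)
    {
        return rows.Remove(rowId);
    }

    // With no index, finding rows means visiting every row in row ID order.
    public IEnumerable<KeyValuePair<int, object[]>> Scan(Func<object[], bool> predicate)
    {
        foreach (KeyValuePair<int, object[]> pair in rows)
        {
            if (predicate(pair.Value))
            {
                yield return pair;
            }
        }
    }
}
```

Deleting row 2 and then inserting again yields row ID 3 in this sketch, which matches the "not reused" behavior described above.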
13+
14+
For now, this approach is quite adequate, but it does mean that a table can't survive more than 2<sup>32</sup> operations because the row ID value will wrap around. (This is tracked by [Issue #2](https://github.com/mikeblas/JankSQL/issues/2).)
15+
16+
Conceptually, we can consider a table's fundamental storage (sometimes called its "heap", perhaps incorrectly) to be a map between the row ID and the actual row payload: `BTree<RowID, Tuple>`. It's just that the row ID itself is implemented as a `Tuple`, too.
17+
18+
## Unique Indexes
19+
20+
A unique index in Jank augments the fundamental `BTree` with another access path. Each index is implemented as a map from the keys of the index to the row ID. We can consider the table and the first index as an example:
21+
22+
```csharp
23+
BTree<Tuple, Tuple> theTable; // key: row ID, value: rows
24+
BTree<Tuple, Tuple> firstIndex; // key: index key, value: row ID
25+
```
26+
27+
To find a row, we can look it up by key in `firstIndex` to get a row ID. Then, to get the remaining columns, the row ID is used to probe `theTable` to get that payload.
28+
29+
Any number of indexes can be created, all referencing back to `theTable` via the row ID key.
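
The two-step probe can be sketched roughly as follows. This is an illustration under assumptions, not JankSQL's implementation: `SortedDictionary` stands in for `BTree<Tuple, Tuple>`, rows are `string[]`, the index covers only column 0, and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a table plus one unique index; not JankSQL's code.
public sealed class UniqueIndexSketch
{
    // key: row ID, value: row (all columns)
    private readonly SortedDictionary<int, string[]> theTable =
        new SortedDictionary<int, string[]>();

    // key: index key (column 0 in this sketch), value: row ID
    private readonly SortedDictionary<string, int> firstIndex =
        new SortedDictionary<string, int>(StringComparer.Ordinal);

    private int lastRowId = 0;

    public int Insert(params string[] columns)
    {
        // Check the unique index first so both maps stay consistent on failure.
        if (firstIndex.ContainsKey(columns[0]))
        {
            throw new InvalidOperationException("duplicate key in unique index");
        }

        int rowId = ++lastRowId;
        theTable.Add(rowId, columns);
        firstIndex.Add(columns[0], rowId);
        return rowId;
    }

    // Step 1: probe the index for the row ID. Step 2: probe the table for the payload.
    public bool TryFind(string key, out string[] row)
    {
        if (firstIndex.TryGetValue(key, out int rowId))
        {
            row = theTable[rowId];
            return true;
        }

        row = Array.Empty<string>();
        return false;
    }
}
```

Additional indexes would be more maps of the same shape, each storing the same row ID as its value.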
30+
31+
## Non-unique Indexes
32+
33+
Classically, BTrees implement only unique indexes: keys can't be duplicated. CSharpTest's implementation is no different, so Jank must provide some mechanism for handling duplicate key values in non-unique indexes.
34+
35+
Jank's approach simply appends a unique ID to the key set. If a non-unique index is created with key columns `Col1` and `Col2`, the effective key becomes `(Col1, Col2, uniquifier)`. A probe for a value in a non-unique index is naturally a scan, since zero, one, or more entries may match the key.
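
Here is a minimal sketch of the uniquifier idea, with some assumptions: a C# value tuple stands in for JankSQL's `Tuple`, `SortedDictionary` stands in for the B-Tree, and `Col1` is a string while `Col2` is an integer. The class and its members are hypothetical names for illustration only.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of a non-unique index on (Col1, Col2); not JankSQL's code.
public sealed class NonUniqueIndexSketch
{
    // key: (Col1, Col2, uniquifier), value: row ID
    private readonly SortedDictionary<(string Col1, int Col2, int Uniquifier), int> index =
        new SortedDictionary<(string Col1, int Col2, int Uniquifier), int>();

    private int nextUniquifier = 0;

    // Duplicate (Col1, Col2) pairs are fine: the uniquifier makes each full key distinct.
    public void Add(string col1, int col2, int rowId)
    {
        index.Add((col1, col2, nextUniquifier++), rowId);
    }

    // A probe is a scan over the run of keys that share the (Col1, Col2) prefix.
    // A real B-Tree would seek straight to the first such key; this stand-in
    // has no seek operation, so it walks from the start and stops after the run.
    public IEnumerable<int> Probe(string col1, int col2)
    {
        foreach (KeyValuePair<(string Col1, int Col2, int Uniquifier), int> entry in index)
        {
            int cmp = (entry.Key.Col1, entry.Key.Col2).CompareTo((col1, col2));
            if (cmp < 0)
            {
                continue;
            }

            if (cmp > 0)
            {
                yield break;
            }

            yield return entry.Value;
        }
    }
}
```

Because the uniquifier is the last key component, all entries with the same `(Col1, Col2)` sit next to each other in key order, which is what makes the probe a short range scan rather than a walk of the whole index.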
36+
37+
Jank is limited again by using a 32-bit integer here, so any non-unique index can have only 2<sup>32</sup> keys with the same value.
38+
39+
## Index maintenance
40+
41+
Adding a row to the table, or removing one from it, updates all indexes. Updating a value in an existing row must update the indexes that cover that column.
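
The maintenance rules can be sketched like this, assuming a table with a single unique index on column 0. As before, `SortedDictionary` is a stand-in for the B-Tree, and every name here is hypothetical rather than taken from JankSQL.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of index maintenance; not JankSQL's code.
public sealed class MaintainedTableSketch
{
    private readonly SortedDictionary<int, string[]> theTable =
        new SortedDictionary<int, string[]>();

    // Unique index covering column 0 only. key: column 0 value, value: row ID
    private readonly SortedDictionary<string, int> indexOnCol0 =
        new SortedDictionary<string, int>(StringComparer.Ordinal);

    private int lastRowId = 0;

    // Adding a row updates the table and every index.
    public int Insert(params string[] row)
    {
        if (indexOnCol0.ContainsKey(row[0]))
        {
            throw new InvalidOperationException("duplicate key in unique index");
        }

        int rowId = ++lastRowId;
        theTable.Add(rowId, row);
        indexOnCol0.Add(row[0], rowId);
        return rowId;
    }

    // Removing a row removes its entry from every index, too.
    public void Delete(int rowId)
    {
        string[] row = theTable[rowId];
        indexOnCol0.Remove(row[0]);
        theTable.Remove(rowId);
    }

    // An update touches an index only when that index covers the changed column.
    public void Update(int rowId, int column, string value)
    {
        string[] row = theTable[rowId];
        if (column == 0 && !string.Equals(row[0], value, StringComparison.Ordinal))
        {
            if (indexOnCol0.ContainsKey(value))
            {
                throw new InvalidOperationException("duplicate key in unique index");
            }

            indexOnCol0.Remove(row[0]);
            indexOnCol0.Add(value, rowId);
        }

        row[column] = value;
    }

    public bool TryGetRowId(string key, out int rowId)
    {
        return indexOnCol0.TryGetValue(key, out rowId);
    }
}
```

Updating column 1 in this sketch leaves `indexOnCol0` untouched, while updating column 0 replaces the old index entry with a new one pointing at the same row ID.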
42+

docs/index.md

+9
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
# Documentation
2+
3+
There are notes here about the implementation details, as well as information about writing code to use JankSQL.
4+
5+
6+
## Implementation
7+
8+
* [Table Structure](TableStructure.md) describes how tables and indexes are built.
9+
