diff --git a/_config.yml b/_config.yml
index 8f31b82a..32833dbc 100644
--- a/_config.yml
+++ b/_config.yml
@@ -33,6 +33,7 @@ sphinx:
         tskit: ["https://tskit.dev/tskit/docs/stable", null]
         msprime: ["https://tskit.dev/msprime/docs/stable", null]
         pyslim: ["https://pyslim.readthedocs.io/en/stable", null]
+        numpy: ["https://numpy.org/doc/stable/", null]
       myst_enable_extensions:
       - colon_fence
       - deflist
diff --git a/_toc.yml b/_toc.yml
index 9bdd798c..ae0e00e0 100644
--- a/_toc.yml
+++ b/_toc.yml
@@ -13,6 +13,10 @@ parts:
   - file: simplification
   - file: metadata
   - file: tskitr
+- caption: Analysis
+  chapters:
+  - file: analysing_tree_sequences
+  - file: analysing_trees
 - caption: Backward time simulations
   # TODO this will be broken down more finely as we add more content
   chapters:
diff --git a/analysing_tree_sequences.md b/analysing_tree_sequences.md
new file mode 100644
index 00000000..02fec1de
--- /dev/null
+++ b/analysing_tree_sequences.md
@@ -0,0 +1,25 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.12
+    jupytext_version: 1.9.1
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+```{currentmodule} tskit
+```
+
+(sec_analysing_tree_sequences)=
+
+# Analysing tree sequences
+
+:::{todo}
+Add all stats material from https://tskit.dev/tskit/docs/stable/tutorial.html
+
+Should we rename this "calculating statistics", or something else with the word "statistics" in the name?
+:::
\ No newline at end of file
diff --git a/analysing_trees.md b/analysing_trees.md
new file mode 100644
index 00000000..819c3061
--- /dev/null
+++ b/analysing_trees.md
@@ -0,0 +1,409 @@
+---
+jupytext:
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.12
+    jupytext_version: 1.9.1
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+
+```{currentmodule} tskit
+```
+
+```{code-cell} ipython3
+:tags: [remove-cell]
+import io
+import pickle
+
+import msprime
+import tskit
+
+def tree_traversals():
+    nodes = """\
+    id      is_sample   time
+    0       1           0
+    1       1           0
+    2       1           0
+    3       1           0
+    4       1           0
+    5       0           1
+    6       0           2
+    7       0           3
+    """
+    edges = """\
+    left    right   parent  child
+    0       1       5       0,1,2
+    0       1       6       3,4
+    0       1       7       5,6
+    """
+    # NB same tree as used above, and we're using the same diagram.
+    ts = tskit.load_text(
+        nodes=io.StringIO(nodes), edges=io.StringIO(edges), strict=False
+    )
+    ts.dump("data/tree_traversals.trees")
+
+def different_time_samples():
+    samples = [
+        msprime.Sample(0, 0),
+        msprime.Sample(0, 1),
+        msprime.Sample(0, 20),
+    ]
+    ts = msprime.simulate(
+        Ne=1e6,
+        samples=samples,
+        demographic_events=[
+            msprime.PopulationParametersChange(time=10, growth_rate=2, population_id=0),
+        ],
+        random_seed=42,
+    )
+    ts.dump("data/different_time_samples.trees")
+
+def parsimony_simple():
+    ts = msprime.sim_ancestry(3, random_seed=42)
+    ts.dump("data/parsimony_simple.trees")
+
+def parsimony_map():
+    # pick a seed that gives the tips in the right order
+    ts = msprime.sim_ancestry(3, sequence_length=100, random_seed=38)
+    ts.dump("data/parsimony_map.trees")
+    ts = msprime.sim_mutations(
+        ts, rate=0.01, random_seed=123, discrete_genome=False)
+    data = [
+        {'pos':v.site.position, 'alleles': v.alleles, 'genotypes':v.genotypes}
+        for v in ts.variants()]
+    with open("data/parsimony_map.pickle", "wb") as f:
+        pickle.dump(data, f)
+
+
+def create_notebook_data():
+    tree_traversals()
+    different_time_samples()
+    parsimony_simple()
+    parsimony_map()
+
+# create_notebook_data()  # uncomment to recreate the tree seqs used in this notebook
+```
+
+(sec_analysing_trees)=
+# Analysing trees
+
+There are a number of different ways we might want to analyse a single  {class}`Tree`.
+Most involve some sort of traversal over the nodes, mutations, or branches in the tree.
+{program}`tskit` provides various way of traversing through a tree, and also some
+built in phylogenetic algorithms such as {meth}`Tree.map_mutations` which efficently
+places mutations ("characters" in phylogenetic terminology) on a given tree.
+
+
+(sec_analysing_trees_traversals)=
+
+## Tree traversals
+
+There are various a single {class}`Tree`, traversals in various orders are possible using the
+{meth}`~Tree.nodes` iterator. For example, in the following tree we can visit the
+nodes in different orders:
+
+
+```{code-cell} ipython3
+import tskit
+from IPython.display import SVG, display
+
+ts = tskit.load("data/tree_traversals.trees")
+tree = ts.first()
+display(SVG(tree.draw_svg()))
+
+for order in ["preorder", "inorder", "postorder"]:
+    print(f"{order}:\t", list(tree.nodes(order=order)))
+```
+
+Much of the time, the specific ordering of the nodes is not important
+and we can leave it out (defaulting to preorder traversal). For example,
+here we compute the total branch length of a tree:
+
+```{code-cell} ipython3
+total_branch_length = sum(tree.branch_length(u) for u in tree.nodes())
+print(f"Total branch length: {total_branch_length}")
+```
+
+Note that this is also available as the {attr}`Tree.total_branch_length` attribute.
+
+### Traversing upwards
+
+For many applications it is useful to be able to traverse upwards from the
+leaves. We can do this using the {meth}`Tree.parent` method, which
+returns the parent of a node. For example, we can traverse upwards from
+each of the samples in the tree:
+
+```{code-cell} ipython3
+for u in tree.samples():
+    path = []
+    v = u
+    while v != tskit.NULL:
+        path.append(v)
+        v = tree.parent(v)
+    print(u, "->", path)
+```
+
+### Traversals with information
+
+Sometimes we will need to traverse down the tree while maintaining
+some information about the nodes that are above it. While this
+can be done using recursive algorithms, it is often more convenient
+and efficient to use an iterative approach. Here, for example,
+we define an iterator that yields all nodes in preorder along with
+their path length to root:
+
+```{code-cell} ipython3
+def preorder_dist(tree):
+    for root in tree.roots:
+        stack = [(root, 0)]
+        while len(stack) > 0:
+            u, distance = stack.pop()
+            yield u, distance
+            for v in tree.children(u):
+                stack.append((v, distance + 1))
+
+print(list(preorder_dist(tree)))
+```
+
+
+(sec_tutorial_networkx)=
+
+## Networkx
+
+Traversals and other network analysis can also be performed using the sizeable
+[networkx](https://networkx.github.io/documentation/stable/index.html)
+library. This can be achieved by calling {meth}`Tree.as_dict_of_dicts` to
+convert a {class}`Tree` instance to a format that can be imported by networkx to
+create a graph:
+
+```{code-cell} ipython3
+import networkx as nx
+
+g = nx.DiGraph(tree.as_dict_of_dicts())
+print(sorted(g.edges))
+```
+
+### Traversing upwards in networkx
+
+We can revisit the above examples and traverse upwards with
+networkx using a depth-first search algorithm:
+
+```{code-cell} ipython3
+g = nx.DiGraph(tree.as_dict_of_dicts())
+for u in tree.samples():
+    path = [u] + [parent for parent, child, _ in
+                  nx.edge_dfs(g, source=u, orientation="reverse")]
+    print(u, "->", path)
+```
+
+
+### Calculating distances to the root
+
+Similarly, we can yield the nodes of a tree along with their distance to the
+root in pre-order in networkx as well. Running this on the example above gives us
+the same result as before:
+
+```{code-cell} ipython3
+g = nx.DiGraph(tree.as_dict_of_dicts())
+for root in tree.roots:
+    print(sorted(list(nx.shortest_path_length(g, source=root).items())))
+```
+
+
+### Finding nearest neighbors
+
+If some samples in a tree are not at time 0, then finding the nearest neighbor
+of a sample is a bit more involved. Instead of writing our own traversal code
+we can again draw on a networkx algorithm.
+Let us start with an example tree with three samples that were sampled at
+different time points:
+
+```{code-cell} ipython3
+ts = tskit.load("data/different_time_samples.trees")
+tree = ts.first()
+SVG(tree.draw_svg(y_axis=True, time_scale="rank"))
+```
+
+The generation times for these nodes are as follows:
+
+```{code-cell} ipython3
+for u in tree.nodes():
+    print(f"Node {u}: time {tree.time(u)}")
+```
+
+Note that samples 0 and 1 are about 35 generations apart from each other even though
+they were sampled at almost the same time. This is why samples 0 and 1 are
+closer to sample 2 than to each other.
+
+For this nearest neighbor search we will be traversing up and down the tree,
+so it is easier to treat the tree as an undirected graph:
+
+```{code-cell} ipython3
+g = nx.Graph(tree.as_dict_of_dicts())
+```
+
+When converting the tree to a networkx graph the edges are annotated with their
+branch length:
+
+```{code-cell} ipython3
+for e in g.edges(data=True):
+    print(e)
+```
+
+We can now use the "branch_length" field as a weight for a weighted shortest path
+search:
+
+```{code-cell} ipython3
+import collections
+import itertools
+
+# a dictionary of dictionaries to represent our distance matrix
+dist_dod = collections.defaultdict(dict)
+for source, target in itertools.combinations(tree.samples(), 2):
+    dist_dod[source][target] = nx.shortest_path_length(
+        g, source=source, target=target, weight="branch_length"
+    )
+    dist_dod[target][source] = dist_dod[source][target]
+
+# extract the nearest neighbor of nodes 0, 1, and 2
+nearest_neighbor_of = [min(dist_dod[u], key=dist_dod[u].get) for u in range(3)]
+
+print(dict(zip(range(3), nearest_neighbor_of)))
+```
+
+
+(sec_analysing_trees_parsimony)=
+
+## Parsimony
+
+The {meth}`Tree.map_mutations` method finds a parsimonious explanation for a
+set of discrete character observations on the samples in a tree using classical
+phylogenetic algorithms.
+
+```{code-cell} ipython3
+tree = tskit.load("data/parsimony_simple.trees").first()
+alleles = ["red", "blue", "green"]
+genotypes = [0, 0, 0, 0, 1, 2]
+styles = [f".n{j} > .sym {{fill: {alleles[g]}}}" for j, g in enumerate(genotypes)]
+display(SVG(tree.draw_svg(style="".join(styles))))
+
+ancestral_state, mutations = tree.map_mutations(genotypes, alleles)
+print("Ancestral state = ", ancestral_state)
+for mut in mutations:
+    print(f"Mutation: node = {mut.node} derived_state = {mut.derived_state}")
+```
+
+So, the algorithm has concluded, quite reasonably, that the most parsimonious
+description of this state is that the ancestral state is red and there was
+a mutation to blue and green over nodes 4 and 5.
+
+### Building tables
+
+One of the main uses of {meth}`Tree.map_mutations` is to position mutations on a tree
+to encode observed data. In the following example we show how a set
+of tables can be updated using the {ref}`Tables API<tskit:sec_tables_api>`; here we
+infer the location of mutations in an simulated tree sequence of one tree,
+given a set of site positions with their genotypes and allelic states:
+
+```{code-cell} ipython3
+import pickle
+ts = tskit.load("data/parsimony_map.trees")
+with open("data/parsimony_map.pickle", "rb") as file:
+    data = pickle.load(file)  # Load saved variant data from a file
+display(SVG(ts.draw_svg(size=(500, 300), time_scale="rank")))
+print("Variant data: pos, genotypes & alleles as described by the ts.variants() iterator:")
+for i, v in enumerate(data):
+    print(f"Site {i} (pos {v['pos']:7.4f}): alleles: {v['alleles']}, genotypes: {v['genotypes']}")
+print()
+tree = ts.first()  # there's only one tree anyway
+tables = ts.dump_tables()
+# Infer the sites and mutations from the variants.
+for variant in data:
+    ancestral_state, mutations = tree.map_mutations(variant["genotypes"], variant['alleles'])
+    site_id = tables.sites.add_row(variant['pos'], ancestral_state=ancestral_state)
+    info = f"Site {site_id}: parsimony sets ancestral state to {ancestral_state}"
+    parent_offset = len(tables.mutations)
+    for mut in mutations:
+        parent = mut.parent
+        if parent != tskit.NULL:
+            parent += parent_offset
+        mut_id = tables.mutations.add_row(
+            site_id, node=mut.node, parent=parent, derived_state=mut.derived_state)
+        info += f", and places mutation {mut.id} to {mut.derived_state} above node {mut.node}"
+    print(info)
+new_ts = tables.tree_sequence()
+```
+
+```{code-cell} ipython3
+mut_labels = {}  # An array of labels for the mutations
+for mut in new_ts.mutations():  # Make pretty labels showing the change in state
+    site = new_ts.site(mut.site)
+    older_mut = mut.parent >= 0  # is there an older mutation at the same position?
+    prev = new_ts.mutation(mut.parent).derived_state if older_mut else site.ancestral_state
+    mut_labels[site.id] = f"{mut.id}: {prev}→{mut.derived_state}"
+
+display(SVG(new_ts.draw_svg(size=(500, 300), mutation_labels=mut_labels, time_scale="rank")))
+```
+
+
+### Parsimony and missing data
+
+The Hartigan parsimony algorithm in {meth}`Tree.map_mutations` can also take missing data
+into account when finding a set of parsimonious state transitions. We do this by
+specifying the special value {data}`tskit.MISSING_DATA` (-1) as the state, which is
+treated by the algorithm as "could be anything".
+
+For example, here we state that sample 0 is missing, and use the colour white to indicate
+this:
+
+```{code-cell} ipython3
+tree = tskit.load("data/parsimony_simple.trees").first()
+alleles = ["red", "blue", "green", "white"]
+genotypes = [tskit.MISSING_DATA, 0, 0, 0, 1, 2]
+styles = [f".n{j} > .sym {{fill: {alleles[g]}}}" for j, g in enumerate(genotypes)]
+display(SVG(tree.draw_svg(style="".join(styles))))
+
+ancestral_state, mutations = tree.map_mutations(genotypes, alleles)
+print("Ancestral state = ", ancestral_state)
+for mut in mutations:
+    print(f"Mutation: node = {mut.node} derived_state = {mut.derived_state}")
+```
+
+
+The algorithm decided, again, quite reasonably, that the most parsimonious explanation
+for the input data is the same as before. Thus, if we used this information to fill
+out mutation table as above, we would impute the missing value for 0 as red.
+
+The output of the algorithm can be a little surprising at times. Consider this example::
+
+```{code-cell} ipython3
+tree = msprime.simulate(6, random_seed=42).first()
+alleles = ["red", "blue", "white"]
+genotypes = [1, -1, 0, 0, 0, 0]
+node_colours = {j: alleles[g] for j, g in enumerate(genotypes)}
+ancestral_state, mutations = tree.map_mutations(genotypes, alleles)
+print("Ancestral state = ", ancestral_state)
+for mut in mutations:
+    print(f"Mutation: node = {mut.node} derived_state = {mut.derived_state}")
+SVG(tree.draw(node_colours=node_colours))
+```
+
+
+Note that this is putting a mutation to blue over node 6, **not** node 0 as
+we might expect. Thus, we impute here that node 1 is blue. It is important
+to remember that the algorithm is minimising the number of state transitions;
+this may not correspond always to what we might consider the most parsimonious
+explanation.
+
+## Fast tree-based algorithms using numba
+
+:::{todo}
+Add a few examples here of how to use numba to speed up tree based dynamic programming
+algorithms. There are a good number of worked-up examples with timings in
+[issue #63](https://github.com/tskit-dev/tutorials/issues/63)
+:::
+
diff --git a/data/construction_example.trees b/data/construction_example.trees
new file mode 100644
index 00000000..5c7fc9af
Binary files /dev/null and b/data/construction_example.trees differ
diff --git a/data/different_time_samples.trees b/data/different_time_samples.trees
new file mode 100644
index 00000000..233cd155
Binary files /dev/null and b/data/different_time_samples.trees differ
diff --git a/data/parsimony_map.pickle b/data/parsimony_map.pickle
new file mode 100644
index 00000000..55dba4ef
Binary files /dev/null and b/data/parsimony_map.pickle differ
diff --git a/data/parsimony_map.trees b/data/parsimony_map.trees
new file mode 100644
index 00000000..0ac459ab
Binary files /dev/null and b/data/parsimony_map.trees differ
diff --git a/data/parsimony_simple.trees b/data/parsimony_simple.trees
new file mode 100644
index 00000000..cfb30429
Binary files /dev/null and b/data/parsimony_simple.trees differ
diff --git a/data/tables_example.trees b/data/tables_example.trees
new file mode 100644
index 00000000..51cda181
Binary files /dev/null and b/data/tables_example.trees differ
diff --git a/data/tables_example_muts.trees b/data/tables_example_muts.trees
new file mode 100644
index 00000000..30c71489
Binary files /dev/null and b/data/tables_example_muts.trees differ
diff --git a/data/tree_traversals.trees b/data/tree_traversals.trees
new file mode 100644
index 00000000..cdc41671
Binary files /dev/null and b/data/tree_traversals.trees differ
diff --git a/data_structures.md b/data_structures.md
index 568185dd..3c87de18 100644
--- a/data_structures.md
+++ b/data_structures.md
@@ -11,12 +11,529 @@ kernelspec:
   name: python3
 ---
 
+```{currentmodule} tskit
+```
+
+```{code-cell} ipython3
+:tags: [remove-cell]
+import io
+import pickle
+
+import msprime
+import tskit
+
+def tables_examples():
+    nodes = """\
+    id      is_sample   time
+    0       1           0
+    1       1           0
+    2       1           0
+    3       0           0.15
+    4       0           0.6
+    5       0           0.8
+    6       0           1.0
+    """
+    edges = """\
+    left    right   parent  child
+    20      80      3       0
+    20      80      3       2
+    0       100     4       1
+    0       20      4       2
+    80      100     4       2
+    20      80      4       3
+    80      100     5       0
+    80      100     5       4
+    0       20      6       0
+    0       20      6       4
+    """
+    sites = """\
+    id    position    ancestral_state
+    0     15          A
+    1     42          G
+    2     60          T
+    """
+    mutations = """\
+    site    node    derived_state    parent    time
+    0       4       G                -1        0.9
+    1       1       A                -1        0.4
+    2       3       C                -1        0.55
+    2       0       T                2         0.1
+    """
+    ts = tskit.load_text(
+        nodes=io.StringIO(nodes),
+        edges=io.StringIO(edges),
+        strict=False,
+    )
+    ts.dump("data/tables_example.trees")
+    ts = tskit.load_text(
+        nodes=io.StringIO(nodes),
+        edges=io.StringIO(edges),
+        sites=io.StringIO(sites),
+        mutations=io.StringIO(mutations),
+        strict=False,
+    )
+    ts.dump("data/tables_example_muts.trees")
+
+def construction_example():
+    nodes = """\
+    id      is_sample   time
+    0       1           0
+    1       1           0
+    2       0           10
+    3       1           0
+    4       0           20
+    """
+    edges = """\
+    left    right   parent  child
+    0       1000    2       0
+    0       1000    2       1
+    0       1000    4       2
+    0       1000    4       3
+    """
+    sites = """\
+    id    position    ancestral_state
+    0     500          A
+    """
+    mutations = """\
+    site    node    derived_state    parent
+    0       3       G                -1
+    """
+    ts = tskit.load_text(
+        nodes=io.StringIO(nodes),
+        edges=io.StringIO(edges),
+        sites=io.StringIO(sites),
+        mutations=io.StringIO(mutations),
+        strict=False,
+    )
+    ts.dump("data/construction_example.trees")
+    
+
+def create_notebook_data():
+    tables_examples()
+    construction_example()
+
+# create_notebook_data()  # uncomment to recreate the tree seqs used in this notebook
+```
+
+
 (sec_data_structures)=
 
 # Data structures
 
+(sec_data_structures_tables)=
 
 ## Tables
 
+Internally, a tree sequence can be thought of as a set of tables, and {program}`tskit`
+provides an interface for dealing with these tables directly. This is particularly
+relevant when you wish to {ref}`edit or otherwise modify<sec_data_structures_editing>`
+a tree sequence, although tables are also useful for bulk access to data contained in
+the tree sequence, such as the times of all nodes.
+
+There are eight tables that together define a tree sequence, although some may be empty,
+and together they form a {class}`TableCollection`.
+The tables are defined in the official {program}`tskit` documentation for
+{ref}`Table Definitions <tskit:sec_table_definitions>`, and the
+{ref}`Tables API <tskit:sec_tables_api>` section in the docs describes how to work with
+them. In this tutorial we give some pointers about what you can and cannot do with them.
+
+### Correspondance between tables and trees
+
+Consider the following sequence of trees:
+
+```{code-cell} ipython3
+from IPython.display import SVG
+ts = tskit.load("data/tables_example.trees")
+SVG(ts.draw_svg(y_axis=True))
+```
+
+Ancestral recombination events have produced three different trees
+that relate the three sampled genomes ``0``, ``1``, and ``2`` to each other
+along the chromosome of length 100.
+
+#### Node and edge tables
+
+Each node in each of the above trees represents a particular ancestral genome
+(a *haploid* genome; diploid individuals would be represented by two nodes).
+Details about each node, including the time it lived, are stored in a
+{class}`NodeTable` (details {ref}`here<tskit:sec_mutation_table_definition>`)
+
+```{code-cell} ipython3
+ts.tables.nodes
+```
+
+Importantly, the first column, ``id``, is not actually recorded, and is
+only shown when printing out node tables (as here) for convenience.
+The second column, ``flags`` records a ``1`` for the individuals that are *samples*,
+i.e., whose entire genealogical history is recorded by these trees.
+(Note that the trees above record that node 3 inherited from node 4
+on the middle portion of the genome, but not on the ends.)
+
+The way the nodes are related to each other (i.e. the tree topology) is stored
+in an {class}`EdgeTable` (details {ref}`here<tskit:sec_edge_table_definition>`).
+Since some edges are present in more than one tree (e.g., node 1 inherits from node 4
+across the entire sequence), each edge contains not only the IDs of a parent and a child
+node, but also the left and right positions which define the genomic region for which it
+appears in the trees:
+
+```{code-cell} ipython3
+ts.tables.edges
+```
+
+Since node 3 is most recent, the edge that says that nodes 0 and 2 inherit
+from node 3 on the interval between 0.2 and 0.8 comes first.  Next are the
+edges from node 4: there are four of these, as the edge from node 4 to node
+1 is shared across the entire sequence, and for each of the three
+genomic intervals there is an additional child node. At this
+point, we know the full tree on the middle interval.  Finally, edges
+specifying the common ancestor of 0 and 4 on the remaining intervals (parents 6
+and 5 respectively) allow us to construct all trees across the entire interval.
+
+
+
+#### Sites and mutations tables
+
+Most tree sequences have DNA variation data associated with them,
+{ref}`stored as mutations overlaid on the trees<sec_what_is_dna_data>`:
+
+```{code-cell} ipython3
+ts = tskit.load("data/tables_example_muts.trees")
+mut_labels = {}  # An array of labels for the mutations, listing position & allele change
+for mut in ts.mutations():  # This entire loop is just to make pretty labels
+    site = ts.site(mut.site)
+    older_mut = mut.parent >= 0  # is there an older mutation at the same position?
+    prev = ts.mutation(mut.parent).derived_state if older_mut else site.ancestral_state
+    mut_labels[mut.id] = "{}→{} @{:g}".format(prev, mut.derived_state, site.position)
+
+SVG(ts.draw_svg(y_axis=True, mutation_labels=mut_labels))
+```
+
+There are four mutations in the depiction above,
+marked by red crosses: one above node ``4`` on the first tree which records an A to G
+transition at position 15, another above node ``1`` in the second tree which records a G
+to A transition at position 45, and the final two above nodes ``2`` and ``0`` recording
+transitions, both at position 60, on the second tree. The positions are recorded in the
+{class}`SiteTable` (details {ref}`here<tskit:sec_site_table_definition>`):
+
+```{code-cell} ipython3
+ts.tables.sites
+```
+
+As with node tables, the ``id`` column is **not** actually recorded, but is
+implied by the position in the table.  The mutations themselves are recorded in the
+{class}`MutationTable` (details {ref}`here<tskit:sec_mutation_table_definition>`).
+This associates each mutation with a site ID, the ID of the node above which the
+mutation occurs, the derived state to which the allele has mutated, and (optionally) a
+time at which the mutation occured.
+
+
+```{code-cell} ipython3
+ts.tables.mutations
+```
+
+Where there are multiple mutations at the same site, there can be mutations
+stacked on top of each other. The "parent" column therefore contains the ID of the
+mutation immediately above the current one at the same site, or -1 if there is no parent.
+
+These sites and mutations allow us to calculate the DNA sequence, or haplotype, for each
+of the sample nodes:
+
+```{code-cell} ipython3
+for sample, h in enumerate(ts.haplotypes()):
+    print(f"Sample {sample}: {h}")
+```
+
+#### Other tables
+
+The other tables in a {class}`TableCollection` are the {class}`IndividualTable` (which
+records which {ref}`individual organism<sec_nodes_or_individuals>` a node is contained
+in), the {class}`PopulationTable` and the {class}`MigrationTable`, and the
+{class}`ProvenanceTable` which records the {ref}`tskit:sec_provenance` of the data in
+a tree sequence. These won't be covered in this tutorial.
+
+#### Metadata
+
+You may have noticed ``metadata`` columns in all the tables above. All tables can have
+(optional) metadata attached to their rows, as described in the {ref}`sec_metadata`
+section of the official {program}`tskit` documentation.
+The {ref}`metadata tutorial<sec_tutorial_metadata>` provides more information about how
+to set and use these metadata columns in tree sequence tables.
+
+
+### Accessing table data
+
+To look at a the contents of a table, you can simply `print` it:
+
+```{code-cell} ipython3
+print(ts.tables.mutations)
+```
+
+But {program}`tskit` also provides access to the rows and columns of each table.
+
+#### Row access
+
+Rows can be accessed using square braces, which will return an object containing the
+raw values:
+
+```{code-cell} ipython3
+row = ts.tables.mutations[1]
+print(f"A mutation at site {row.site} exists at time {row.time}")
+print(row)
+```
+
+Additionally, many rows can be extracted into a new table using slices or
+{ref}`numpy indexing<numpy:basics.indexing>`
+
+```{code-cell} ipython3
+ts.tables.mutations[2:4]
+```
+
+:::{note}
+When manipulating table data, it is quite possible to create a table collection that
+does not correspond to a valid tree sequence. For instance, if we replaced the mutations
+table in our original tree sequence with the table above, the parent column
+would refer to an invalid mutation ID (ID 2, when in the new tables we only have
+mutations with IDs 0 and 1). In this particular case there is a method,
+{meth}`TableCollection.compute_mutation_parents`, which will recalculate the parent IDs
+from the trees, but there is no general way to ensure that a manipulated table will
+remain valid.
+:::
+
+Rows can also be *added* to a table using ``.add_row()``. However, when modifying tables
+from a tree sequence, you should always modify a *copy* of the original data,
+using the {meth}`TreeSequence.dump_tables` method:
+
+```{code-cell} ipython3
+new_tables = ts.dump_tables()
+new_pos = 10
+new_site_id = new_tables.sites.add_row(position=new_pos, ancestral_state="G")
+print("New empty site allocated at position {new_pos} with ID {new_site_id}")
+new_tables.sites
+```
+
+Note that one of the requirements of a tree sequence is that the sites are listed in
+order of position, so this new table will require sorting (see
+{ref}`sec_data_structures_tables_creating`)
+
+
+#### Column access
+
+An entire column in a table can be extracted as a {program}`numpy` array from the table
+object. For instance, if ``n`` is a {class}`NodeTable`, then ``n.time``
+will return an array containing the birth times of the individuals whose genomes
+are represented by the nodes in the table. *However*, it is important to note that this
+is a copy of the data, and modifying individual elements of ``n.time`` will *not* change
+the node table ``n``. To change the column data, you need to take a copy, modify it,
+and assign it back in. For example, here we add 0.25 to every ``time`` except the first
+in the node table from our current example:
+
+```{code-cell} ipython3
+node_table = new_tables.nodes
+times = node_table.time
+print("Old node times:", times)
+times[1:] = times[1:] + 0.25
+node_table.time = times
+node_table
+```
+
+When assigning columns like this, an error will be raised if the column is not of the
+expected length:
+
+```{code-cell} ipython3
+:tags: [raises-exception]
+node_table.time = times[2:]
+```
+
+(sec_data_structures_tables_creating)=
+
+### Turning tables into a tree sequence
+
+The {meth}`TableCollection.tree_sequence` method will attempt to turn a table collection
+into a tree sequence. This is not guaranteed to work: it's possible you have created a
+nonsensical tree sequence where, for example, a child node has multiple parents at
+a given position. The {ref}`tskit:sec_valid_tree_sequence_requirements` section of the
+official tskit docs lists the requirements; these include the correct order for rows in
+a number of tables. For instance, the sites table is
+{ref}`required<tskit:sec_site_requirements>` to be sorted in order of position. Since
+this is not true of the sites table in the ``new_tables`` collection we have just
+created, we need to {meth}`~TableCollection.sort` the table collection before turning
+it into a tree sequence
+
+```{code-cell} ipython3
+# new_ts = new_tables.tree_sequence()  # This won't work
+new_tables.sort()
+new_ts = new_tables.tree_sequence()  # Now it will
+# Plot without mutation labels, for clarity 
+SVG(new_ts.draw_svg(y_axis=True, y_gridlines=True, mutation_labels={}))
+```
+
+Here you can see that a new empty site has been added at position 10 (represented by a
+tickmark above the X axis with no mutations on it), and all the nodes except node 0
+have had their times increased by 0.25.
+
+### Constructing a tree sequence
+
+With the tools above in mind, we can now see how to construct a tree sequence by hand.
+It's unlikely that you would ever need to do this from scratch, but it's helpful to
+understand how tables work. We'll build the simplest possible tree sequence, a single
+tree that looks like this:
+
+```{code-cell} ipython3
+SVG(tskit.load("data/construction_example.trees").draw_svg(y_axis=True))
+```
+
+Starting with an empty set of tables, we can fill, say, the node information by using
+{meth}`NodeTable.add_row` as follows:
+
+```{code-cell} ipython3
+tables = tskit.TableCollection(sequence_length=1e3)
+node_table = tables.nodes  # set up an alias, for efficiency
+node_table.add_row(flags=tskit.NODE_IS_SAMPLE)  # Node 0: defaults to time 0
+node_table.add_row(flags=tskit.NODE_IS_SAMPLE)  # Node 1: defaults to time 0
+node_table.add_row(time=10)  # Node 2: not a sample
+node_table.add_row(flags=tskit.NODE_IS_SAMPLE)  # Node 3
+node_table.add_row(time=20)  # Node 4
+node_table
+```
+
+The ``.add_row()`` method is natural (and should be reasonably efficient) if
+new records appear one-by-one. Alternatively ``.set_columns()`` can be used to
+set columns for all the rows at once, by passing in numpy arrays of the appropriate
+type. We'll use that for the edges:
+
+```{code-cell} ipython3
+import numpy as np
+edge_table = tables.edges
+edge_table.set_columns(
+    left=np.array([0.0, 0.0, 0.0, 0.0]),
+    right=np.array([1e3, 1e3, 1e3, 1e3]),
+    parent=np.array([2, 2, 4, 4], dtype=np.int32),  # References IDs in the node table
+    child=np.array([0, 1, 2, 3], dtype=np.int32),  # References IDs in the node table
+)
+edge_table
+```
+
+And finally we can add a site and a mutation: here we'll use 0/1 mutations rather than
+ATGC.
+
+```{code-cell} ipython3
+site_id = tables.sites.add_row(position=500.0, ancestral_state='0')
+tables.mutations.add_row(site=site_id, node=2, derived_state='1')
+ts = tables.tree_sequence()
+print("A hand-built tree sequence!")
+SVG(ts.draw_svg(y_axis=True))
+```
+
+:::{note}
+The ordering requirements for a valid tree sequence
+{ref}`do not specify<tskit:sec_node_requirements>` that rows in the
+node table have to be sorted in any particular order, e.g. by time. By convention,
+sample nodes are often the first ones listed in the node table, and this is the node
+order returned by {meth}`~TreeSequence.simplify`, but the example above shows that
+sample nodes need not necessarily be those with the IDs $0..n$.
+:::
+
+(sec_data_structures_editing)=
+
+## Editing tree sequences
+
+Sometimes we wish to make some minor modifications to a tree sequence that has
+been generated by a simulation. However, tree sequence objects are **immutable**
+and so we cannot edit them in place.
+
+
+Several {program}`tskit` methods will return a new tree sequence that has been
+modified in some way. For example:
+- {meth}`TreeSequence.delete_sites` returns a tree sequence with certain sites
+  deleted
+- {meth}`TreeSequence.delete_intervals` and {meth}`TreeSequence.keep_intervals` 
+  return a tree sequence whose trees cover a smaller fraction of the genome (and which can
+  be combined with {meth}`TreeSequence.trim` to change the coordinate system)
+- {meth}`TreeSequence.simplify` returns a new tree sequence with some sample nodes removed
+  (see the {ref}`simplification tutorial<sec_simplification>`).
+- {meth}`TreeSequence.union` returns a new tree sequence formed by merging two other tree
+  sequences together.
+
+However, if you want to do something not covered by those built-in methods, you will
+need to edit a copy of the underlying tables and then create a new tree sequence from
+the modified tables. In the following example, we use this
+approach to remove all singleton sites from a given tree sequence.
+
+```{code-cell} ipython3
+def strip_singletons(ts):
+    tables = ts.dump_tables()
+    tables.sites.clear()
+    tables.mutations.clear()
+    for tree in ts.trees():
+        for site in tree.sites():
+            assert len(site.mutations) == 1  # Only supports infinite sites muts.
+            mut = site.mutations[0]
+            if tree.num_samples(mut.node) > 1:
+                site_id = tables.sites.append(site)
+                mut = mut.replace(site=site_id)  # set the new site id
+                tables.mutations.append(mut)
+    return tables.tree_sequence()
+```
+
+This function takes a tree sequence containing some infinite sites mutations as
+input, and returns a copy in which all singleton sites have been removed.
+The approach is very simple: we get a copy of the underlying
+table data in a {class}`TableCollection` object, and first clear the
+site and mutation tables. We then consider each site in turn,
+and if the number of samples with
+the mutation is greater than one, we add the site and mutation to our
+output tables using {meth}`SiteTable.append` and {meth}`MutationTable.append`. These
+functions act exactly like {meth}`SiteTable.add_row` and {meth}`MutationTable.add_row`
+but they take an existing site or mutation and add all its details as a new row.
+
+
+:::{note}
+In this code we consider only simple infinite sites mutations,
+where we cannot have back or recurrent mutations. These would require a slightly
+more involved approach where we keep a map of mutation IDs so that
+mutation ``parent`` values could be computed.
+:::
+
+After considering each site, we then create a new tree sequence using
+the {meth}`TableCollection.tree_sequence` method on our updated tables.
+We can test this on a tree sequence that has been mutated using
+{func}`msprime.sim_mutations` with the ``discrete_genome``
+parameter set to ``False``, which creates infinite sites mutations:
+
+```{code-cell} ipython3
+import msprime
+
+ts = msprime.sim_ancestry(10, random_seed=123)
+ts = msprime.sim_mutations(ts, rate=10, discrete_genome=False, random_seed=123)
+print(ts.num_sites, "sites in the simulated tree sequence")
+
+ts_new = strip_singletons(ts)
+print(ts_new.num_sites, "sites after removing singletons")
+```
+
+:::{todo}
+Add another example here where we use the array oriented API to edit
+the nodes and edges of a tree sequence. Perhaps recapitating would be a
+good example?
+:::
+
+(sec_data_structures_trees)=
 
 ## Trees
+
+A {class}`Tree` represents a single tree in a {class}`TreeSequence`.
+The {program}`tskit` Tree implementation differs from most tree libraries by
+using **integer IDs** to refer to nodes rather than objects. Thus, when we wish to
+find the parent of the node with ID '0', we use ``tree.parent(0)``, which
+returns another integer. If '0' does not have a parent in the current tree
+(e.g., if it is a root), then the special value {data}`tskit.NULL`
+({math}`-1`) is returned. The children of a node are found using the
+{meth}`Tree.children` method. To obtain information about a particular node,
+one may either use ``tree.tree_sequence.node(u)`` to which returns the
+corresponding {class}`Node` instance, or use the {attr}`Tree.time` or
+{attr}`Tree.population` shorthands.
+
+:::{todo}
+Add an example of using the tree structure. Note that traversing through the
+structure is covered in a different tutorial.
+:::
diff --git a/getting_started.md b/getting_started.md
index 5f6f08ea..929e62b7 100644
--- a/getting_started.md
+++ b/getting_started.md
@@ -56,6 +56,8 @@ ts = msprime.sim_mutations(ts, rate=1e-8, random_seed=4321)
 ts
 ```
 
+You can see that there are many thousands of trees in this tree sequence.
+
 :::{note}
 Since we simulated the ancestry of 20 *diploid* individuals, our tree sequence
 contains 40 *sample nodes*, one for each genome.
@@ -68,25 +70,61 @@ sequence. This process underlies many tree sequence algorithms, including those
 encounter later in this tutorial for calculating 
 {ref}`population genetic statistics<tskit:sec_stats>`.
 To iterate over your own tree sequence you can use
-{meth}`TreeSequence.trees()<tskit.TreeSequence.trees>`. Here, for example, is
-how to check whether all trees in the tree sequence have fully coalesced (which is to be
-expected in reverse-time, coalescent simulations, but not always for tree sequences
-produced by {ref}`forward simulation <sec_tskit_forward_simulations>`).
+{meth}`TreeSequence.trees()<tskit.TreeSequence.trees>`.
+
+```{code-cell} ipython3
+print(f"== Tree sequence has {ts.num_trees} trees ==")
+for tree in ts.trees():
+    print(f"Tree {tree.index} covers {tree.interval}")
+    if tree.index >= 4:
+        print("...")
+        break
+print(f"Tree {ts.last().index} covers {ts.last().interval}")
+```
+
+Here we've also used {meth}`TreeSequence.last()<tskit.TreeSequence.last>` to access
+the last tree directly; it may not surprise you that there's a corresponding
+{meth}`TreeSequence.first()<tskit.TreeSequence.first>` method to return the first tree.
+
+:::{warning}
+The code above doesn't store the tree anywhere, but merely performs a calculation on
+each tree within the ``for`` loop.  That's because, for efficiency, the
+{meth}`trees()<tskit.TreeSequence.trees>` method repeatedly returns the same tree object,
+updated internally to reflect the (usually small) changes from tree to tree along the
+sequence. For that reason, this won't work:
+
+```
+list(ts.trees())  # Don't do this! Every tree in the list will be an identical "null" tree
+```
+
+If you really do want separate instances of each tree (inefficient, and for large tree
+sequences risks using up all your computer memory), you can use the
+{meth}`TreeSequence.aslist()<tskit.TreeSequence.aslist>` method.
+:::
+
+Above, we stopped iterating after Tree 4 to limit the printed output, but iterating
+forwards through trees in a tree sequence (or indeed backwards using the standard Python
+``reversed`` function) is efficient. That means it's quick, for example to check if all
+the trees in a tree sequence have fully coalesced (which is to be expected in
+reverse-time, coalescent simulations, but not always for tree sequences produced by
+{ref}`forward simulation <sec_tskit_forward_simulations>`).
 
 ```{code-cell} ipython3
+import time
+start = time.time()
 for tree in ts.trees():
     if tree.has_multiple_roots:
         print("Tree {tree.index} has not coalesced")
         break
 else:
-    print("All trees in the tree sequence have coalesced")
+    print(f"All {ts.num_trees} trees have coalesced. Checked in {time.time()-start} seconds")
 ```
 
-Since all the trees *have* coalesced, at each position in the genome this tree sequence
-must have only one most recent common ancestor (MRCA) of the 40 sample nodes. Below, we
-iterate over the trees again, finding the IDs of the root (MRCA) node for each tree. The
+Now that we know all trees have coalesced, we know that at each position in the genome
+all the 40 sample nodes must have one most recent common ancestor (MRCA). Below, we
+iterate over the trees, finding the IDs of the root (MRCA) node for each tree. The
 time of this root node can be found via the {meth}`tskit.TreeSequence.node` method, which
-returns a {class}`node object<tskit.Node>` with attributes including the node time:
+returns a {class}`node object<tskit.Node>` whose attributes include the node time:
 
 ```{code-cell} ipython3
 import matplotlib.pyplot as plt
@@ -105,17 +143,21 @@ plt.show()
 ```
 
 It's obvious that there's something unusual about the trees in the middle of this
-chromosome, where the selective sweep occurred. It's not particularly efficient to pull
-out a tree from the middle of a tree sequence, but it *can* be done, via the
-{meth}`TreeSequence.at()<tskit.TreeSequence.at>` method. Here's the tree at the location
-of the sweep, drawn using the  {meth}`Tree.draw_svg()<tskit.Tree.draw_svg>` method
-(see the {ref}`visualization tutorial <sec_tskit_viz>` for more visualization
-possibilities):
+chromosome, where the selective sweep occurred. 
+
+Currently, it's not particularly efficient to pull out individual trees from the middle
+of a tree sequence
+(please comment on [this issue](https://github.com/tskit-dev/tskit/issues/684) if you
+have a need for this to be efficient) but it *can* be done, via the
+{meth}`TreeSequence.at()<tskit.TreeSequence.at>` method. Here's the tree at location
+$5\ 000\ 000$ --- the position of the sweep --- drawn using the 
+{meth}`Tree.draw_svg()<tskit.Tree.draw_svg>` method (see the
+{ref}`visualization tutorial <sec_tskit_viz>` for more visualization possibilities):
 
 ```{code-cell} ipython3
 from IPython.display import SVG
 
-swept_tree = ts.at(5_000_000)  # or you can get e.g. the first tree using ts.first()
+swept_tree = ts.at(5_000_000)  # or you can get e.g. the nth tree using ts.at_index(n)
 intvl = swept_tree.interval
 print(f"Tree number {swept_tree.index}, which runs from position {intvl.left} to {intvl.right}:")
 # Draw it at a wide size, to make room for all 40 tips
diff --git a/requirements.txt b/requirements.txt
index c49632c0..c833bd18 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,4 +3,5 @@ matplotlib
 seaborn
 pandas
 numpy
+networkx
 msprime>=1.0