doc: add external links

jdreo · jdreo · commit 31d34a522d3f · 2026-03-04T08:32:09.000+01:00
diff --git a/docs/sections/glob.rst b/docs/sections/glob.rst
@@ -1,10 +1,13 @@
 
 In some specific cases, you may want to load several data files at once, and
-merge them in a single table before mapping the data. For instance, "parquet"
+merge them into a single table before mapping the data. For instance,
+"`parquet <https://parquet.apache.org/>`_"
 files often come as a set of files.
 
-To do so, you can use the "globbing" syntax that you may know from your command
-line shell.
+To do so, you can use the
+"`globbing <https://en.wikipedia.org/wiki/Glob_(programming)>`_" syntax that
+you may know from your command line
+`shell <https://en.wikipedia.org/wiki/Shell_(computing)>`_.
 
 For instance, if you want to select all the files ending with the ``.parquet``
 extension in the ``my_dir`` directory:
diff --git a/docs/sections/how_to.rst b/docs/sections/how_to.rst
@@ -212,7 +212,7 @@ accessing the list of node and edge types:
 How to map properties on several nodes of the same type
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-In some cases there might be a need to filter properties of the same ontological type. 
+In some cases there might be a need to filter properties of the same ontological type.
 For example, if you have a table of proteins defining sources and targets of interactions, and  you want to have the uniProt IDs as a property of these nodes:
 
 ====== ====== ================= =================
@@ -222,7 +222,7 @@ A      B      uniprot_id_A      uniprot_id_B
 C      A      uniprot_id_C      uniprot_id_A
 ====== ====== ================= =================
 
-In a conventional way of mapping, you would map the ``SOURCE`` column to the node type ``protein`` and the ``TARGET`` column to the node type ``protein``. 
+In a conventional way of mapping, you would map the ``SOURCE`` column to the node type ``protein`` and the ``TARGET`` column to the node type ``protein``.
 
 By default, OntoWeaver will attach properties to all nodes of the same *type*. The ``UNIPROT_ID_SOURCE`` and ``UNIPROT_ID_TARGET`` columns would hence be mapped as properties to the type ``protein``.
 
@@ -511,11 +511,11 @@ How to load multiple Parquet files?
 How to access several keys in nested dictionaries?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The *get* transformer allows you to access a value located in nested key-stores.
+The *nested* transformer allows you to access a value located in nested key-value stores.
 But it can only access *one* value.
 
 If you want to access several different keys in the same cell, then you will
-have to call the *get* transformer again, with the same first key, but with
+have to call the *nested* transformer again, with the same first key, but with
 different sequence of keys.
 
 For instance, if you have this data table:
@@ -531,19 +531,19 @@ For instance, if you have this data table:
 Then, you will want to access first the column named "WORDS", and the key
 named "en" in the nested JSON object.
 
-To do so with *get*, you need to indicate the *sequence* of keys, in the order
+To do so with *nested*, you need to indicate the *sequence* of keys, in the order
 of the nesting. For instance:
 
 .. code:: yaml
 
     transformers:
-        - get:
+        - nested:
             keys:
                 - WORDS
                 - en
             to_object: word  # The usual.
             via_relation: has_en_translation
-        - get:
+        - nested:
             keys:
                 - WORDS
                 - fr
diff --git a/docs/sections/install.rst b/docs/sections/install.rst
@@ -17,7 +17,7 @@ You can install the necessary dependencies in a virtual environment like this:
 UV will create a virtual environment according to your configuration
 (either centrally or in the project folder).
 You can then run any script or command using ``uv run``.
-For isntance, to run the ontoweave command: ``uv run ontoweave``.
+For instance, to run the ontoweave command: ``uv run ontoweave``.
 
 
 Output Database
@@ -38,10 +38,12 @@ documentation <https://biocypher.org/output/index.html>`__).
 Graph visualization with Neo4j
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Neo4j is a popular graph database management system that offers a
-flexible and efficient way to store, query, and manipulate complex,
-interconnected data. Cypher is the query language used to interact with
-Neo4j databases. In order to visualize graphs extracted from databases
+`Neo4j <https://neo4j.com>`_ is a popular graph database management system that
+offers a flexible and efficient way to store, query, and manipulate complex,
+interconnected data.
+`Cypher <https://en.wikipedia.org/wiki/Cypher_(query_language)>`_ is the query
+language used to interact with Neo4j databases.
+In order to visualize graphs extracted from databases
 using OntoWeaver and BioCypher, you can download the `Neo4j Graph
 Database Self-Managed <https://neo4j.com/deployment-center/>`__ for your
 operating system. It has been extensively tested with the Community
diff --git a/docs/sections/intro_SKG.rst b/docs/sections/intro_SKG.rst
@@ -182,7 +182,9 @@ for you than a relational database:
   drastically change the way you query the data.
 * Queries can leverage the taxonomy very easily.
 * Overall, we find queries on SKG to be more easy to state (especially with
-  the GQL language) than their counterpart (using SQL, for instance).
+  the `GQL language <https://www.gqlstandards.org>`_) than their counterpart
+  (using `SQL <https://fr.wikipedia.org/wiki/Structured_Query_Language>`_,
+  for instance).
 
 For example, here is a GQL query meaning "show me all the drugs for which there
 is a sensitivity to a sleep disorder":
@@ -233,17 +235,18 @@ connected to what other type of node.
 To do so, an ontology file is taking a very generic approach: everything there
 is modelled as a "triple": a (subject) is linked by a [predicate] to an (object).
 This means that *everything in an ontology* is represented by:
-``(subject)--[predicate]->(object)``. Nodes are subjects or objects, edges are
+``(subject)-[predicate]->(object)``. Nodes are subjects or objects, edges are
 predicates. But "node having a type" is also represented as a triple:
-``(my node)--[is a]->(my type)``.
+``(my node)-[is a]->(my type)``.
 
 
 What is *Turtle*?
 ~~~~~~~~~~~~~~~~~
 
 There are several formats with different syntaxes to write down ontology files,
-but we will only see here the "turtle" one, which is the most readable by a
-human being. In turtle, a triple is written as a "sentence":
+but we will only see here the
+"`turtle <https://www.w3.org/TeamSubmission/turtle>`_" one, which is the most
+readable by a human being. In turtle, a triple is written as a "sentence":
 
 .. code-block:: ttl
 
@@ -269,16 +272,17 @@ For instance, this is a valid turtle section:
 What is *OWL*?
 ~~~~~~~~~~~~~~
 
-*OWL* is a ---slightly eccentric--- accronym for "Web Ontology Language".
+`OWL <https://www.w3.org/OWL>`_ is a ---slightly eccentric--- acronym for
+"Web Ontology Language".
 The "language" it refers to is a pre-defined set of predicates and types
 (hence the term "vocabulary").
 This language defines how to *model* a SKG, using a standardized vocabulary
 on which everyone can agree.
 
 But in fact, OWL is built up on top of *two* other standards:
 
-1. the Resource Description Framework (RDF),
-2. the RDF Schema (RDFS).
+1. the `Resource Description Framework <https://www.w3.org/RDF/>`_ (RDF),
+2. the `RDF Schema <https://www.w3.org/TR/rdf12-schema>`_ (RDFS).
 
 Of course, RDFS is a vocabulary that sits on top of RDF, with a carefully chosen
 ---and not at all confusing--- name.
@@ -375,6 +379,8 @@ Why do I need OntoWeaver to make an SKG?
 
 OntoWeaver is a tool that shines if you need to build up an SKG that:
 
+* has an original graph structure, the one that *you* understand, and the one
+  that best suits your needs, not those of a random computer scientist,
 * is integrating several heterogeneous data sources,
 * is automatically built, in a reproducible way,
 * allows using independent data sources, for which import scripts and taxonomies
diff --git a/docs/sections/iterable_data.rst b/docs/sections/iterable_data.rst
@@ -10,8 +10,9 @@ However, technically, OntoWeaver can consume any iterable data, providing that
 it has an "Adapter" class knowing how to do it.
 
 The more generic adapters run a query on a data file, which issue a set of
-iterable data. For instance, you can run a XPath query on an XML document,
-or a JMESPath query on a JSON file.
+iterable data. For instance, you can run a
+`XPath <https://www.w3.org/TR/xpath/>`_ query on an XML document,
+or a `JMESPath <https://jmespath.org/>`_ query on a JSON file.
 
 For a basic usage through the `ontoweave` command, OntoWeaver will guess the
 input data type from the input file extension. Then, it will read the
@@ -40,7 +41,7 @@ csv, tsv, txt,  xls, xlsx, xlsm, xlsb, odf, ods, odt, json, html, hdf,
 feather, parquet, pickle, orc, sas, spss, stata.
 
 Simple tables being the most common data format, we use it for all examples in
-the :ref:`Mapping API` section.
+the :ref:`mapping-api` section.
 
 
 Web Ontology data
@@ -101,7 +102,7 @@ OWL & automap
 The simplest way to read the input data from an ontology file is to use
 the *automatic* OWL adapter.
 This adapter can be used by passing the ``automap`` keyword in place of a mapping
-file into the ``ontoweave`` command, or the ``weave`` function:
+file into the ``ontoweave`` command, or the :py:func:`ontoweaver.weave` function:
 
 .. code:: sh
 
@@ -110,8 +111,8 @@ file into the ``ontoweave`` command, or the ``weave`` function:
 This will automatically map the individuals defined into the input graph found
 in the ontology file to the types found in the taxonomy of the *same* ontology
 file.
-Using this ``OWLAutoAdapter``, you thus don't need to define a mapping, it will
-be automatically extracted from the input ontology file.
+With this :py:class:`ontoweaver.owl.OWLAutoAdapter`, you thus don't need to
+define a mapping: it is automatically extracted from the input ontology file.
 
 .. figure:: ../OntoWeaver__owl-automap.svg
 
@@ -178,7 +179,8 @@ as a subject, and then map "object properties" via a relation.
    ``owl:DataProperty``.
 
 
-If you need to call the adapter yourself, use the ``OWLAdapter`` class.
+If you need to call the adapter yourself, use the
+:py:class:`ontoweaver.owl.OWLAdapter` class.
 
 
 OWL Example
diff --git a/docs/sections/mapping_api.rst b/docs/sections/mapping_api.rst
@@ -1,10 +1,12 @@
-Mapping API
------------
+.. _mapping-api:
+
+Writing a mapping
+-----------------
 
 OntoWeaver essentially creates a Biocypher adapter from the description
 of a mapping from a table to ontology types. As such, its core input is
-a dictionary, that takes the form of a YAML file. This configuration
-file indicates:
+a dictionary, that takes the form of a `YAML <https://yaml.org>`_ file.
+This configuration file indicates:
 
 - to which (node) type to map each line of the table,
 - to which (node) type to map columns of the table,
@@ -15,9 +17,11 @@ How are config files related?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 It may be difficult to understand how the type tags indicated in OntoWeaver's
-*mapping* are related to the types indicated in BioCypher's *schema*.
+*mapping* are related to the types indicated in
+`BioCypher's schema <https://biocypher.org/BioCypher/reference/schema-config>`_.
 
-In the schema, the header of a block is the RDFS label that lies in the
+In the schema, the header of a block is the
+`RDFS <https://www.w3.org/TR/rdf12-schema/>`_ label that lies in the
 taxonomy of the ontology file, while the ``label_in_input`` is a kind of tag
 that is written in the mapping, after a keyword (e.g. ``to_object``).
 
@@ -67,9 +71,10 @@ For example, if you have the following CSV table of phenotypes/patients:
    0,A
    1,B
 
-and if you target the Biolink ontology, using a schema configuration
-(i.e. subset of types), defined in your ``schema_config.yaml`` file, as
-below:
+and if you target the
+`Biolink ontology <https://biolink.github.io/biolink-model/>`_, using a schema
+configuration (i.e. subset of types), defined in your ``schema_config.yaml``
+file, as below:
 
 .. code:: yaml
 
@@ -252,9 +257,12 @@ nested
 ~~~~~~
 
 The *nested* transformer can access values in nested key-value store.
-For instance, if your table cells contains a Python dictionary,
-or a Pandas one-dimensional DataFrame, or a flat JSON object string,
-*nested* will be able to access a value into it.
+For instance, if your table cells contain a
+`Python dictionary <https://docs.python.org/3/tutorial/datastructures.html#dictionaries>`_,
+or a
+`Pandas one-dimensional DataFrame <https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dataframe>`_,
+or a flat `JSON object <https://www.json.org>`_ string, *nested* will be able
+to access a value in it.
 
 For instance, if your table looks like:
 
@@ -285,9 +293,10 @@ of the nesting. For instance:
 .. note::
 
    The *nested* transformer can detect and parse JSON object notation, but if the
-   nested cell value is not a string, it will try to access it with the bracket
-   syntax, e.g. ``value[key]``. This should be enough to allow it to use a large
-   number of data structures.
+   nested cell value is not a string, it will try to access it as a Python
+   variable, using the bracket syntax, e.g. ``value[key]``.
+   This should be enough to allow it to handle a large number of data structures,
+   provided that they can be accessed with this syntax.
 
 
 split_nested