Commit 31d34a5

doc: add external links
1 parent f6f9b3e commit 31d34a5

6 files changed

Lines changed: 67 additions & 45 deletions

docs/sections/glob.rst

Lines changed: 6 additions & 3 deletions
@@ -1,10 +1,13 @@
11

22
In some specific cases, you may want to load several data files at once, and
3-
merge them in a single table before mapping the data. For instance, "parquet"
3+
merge them in a single table before mapping the data. For instance,
4+
"`parquet <https://parquet.apache.org/>`_"
45
files often come as a set of files.
56

6-
To do so, you can use the "globbing" syntax that you may know from your command
7-
line shell.
7+
To do so, you can use the
8+
"`globbing <https://en.wikipedia.org/wiki/Glob_(programming)>`_" syntax that
9+
you may know from your command line
10+
`shell <https://en.wikipedia.org/wiki/Shell_(computing)>`_.
811

912
For instance, if you want to select all the files ending with the ``.parquet``
1013
extension in the ``my_dir`` directory:

docs/sections/how_to.rst

Lines changed: 7 additions & 7 deletions
@@ -212,7 +212,7 @@ accessing the list of node and edge types:
212212
How to map properties on several nodes of the same type
213213
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
214214

215-
In some cases there might be a need to filter properties of the same ontological type.
215+
In some cases there might be a need to filter properties of the same ontological type.
216216
For example, if you have a table of proteins defining sources and targets of interactions, and you want to have the uniProt IDs as a property of these nodes:
217217

218218
====== ====== ================= =================
@@ -222,7 +222,7 @@ A B uniprot_id_A uniprot_id_B
222222
C A uniprot_id_C uniprot_id_A
223223
====== ====== ================= =================
224224

225-
In a conventional way of mapping, you would map the ``SOURCE`` column to the node type ``protein`` and the ``TARGET`` column to the node type ``protein``.
225+
In a conventional way of mapping, you would map the ``SOURCE`` column to the node type ``protein`` and the ``TARGET`` column to the node type ``protein``.
226226

227227
By default, OntoWeaver will attach properties to all nodes of the same *type*. The ``UNIPROT_ID_SOURCE`` and ``UNIPROT_ID_TARGET`` columns would hence be mapped as properties to the type ``protein``.
228228

@@ -511,11 +511,11 @@ How to load multiple Parquet files?
511511
How to access several keys in nested dictionaries?
512512
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
513513

514-
The *get* transformer allows you to access a value located in nested key-stores.
514+
The *nested* transformer allows you to access a value located in nested key-stores.
515515
But it can only access *one* value.
516516

517517
If you want to access several different keys in the same cell, then you will
518-
have to call the *get* transformer again, with the same first key, but with
518+
have to call the *nested* transformer again, with the same first key, but with
519519
different sequence of keys.
520520

521521
For instance, if you have this data table:
@@ -531,19 +531,19 @@ For instance, if you have this data table:
531531
Then, you will want to access first the column named "WORDS", and the key
532532
named "en" in the nested JSON object.
533533

534-
To do so with *get*, you need to indicate the *sequence* of keys, in the order
534+
To do so with *nested*, you need to indicate the *sequence* of keys, in the order
535535
of the nesting. For instance:
536536

537537
.. code:: yaml
538538
539539
transformers:
540-
- get:
540+
- nested:
541541
keys:
542542
- WORDS
543543
- en
544544
to_object: word # The usual.
545545
via_relation: has_en_translation
546-
- get:
546+
- nested:
547547
keys:
548548
- WORDS
549549
- fr

docs/sections/install.rst

Lines changed: 7 additions & 5 deletions
@@ -17,7 +17,7 @@ You can install the necessary dependencies in a virtual environment like this:
1717
UV will create a virtual environment according to your configuration
1818
(either centrally or in the project folder).
1919
You can then run any script or command using ``uv run``.
20-
For isntance, to run the ontoweave command: ``uv run ontoweave``.
20+
For instance, to run the ontoweave command: ``uv run ontoweave``.
2121

2222

2323
Output Database
@@ -38,10 +38,12 @@ documentation <https://biocypher.org/output/index.html>`__).
3838
Graph visualization with Neo4j
3939
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4040

41-
Neo4j is a popular graph database management system that offers a
42-
flexible and efficient way to store, query, and manipulate complex,
43-
interconnected data. Cypher is the query language used to interact with
44-
Neo4j databases. In order to visualize graphs extracted from databases
41+
`Neo4j <https://neo4j.com>`_ is a popular graph database management system that
42+
offers a flexible and efficient way to store, query, and manipulate complex,
43+
interconnected data.
44+
`Cypher <https://en.wikipedia.org/wiki/Cypher_(query_language)>`_ is the query
45+
language used to interact with Neo4j databases.
46+
In order to visualize graphs extracted from databases
4547
using OntoWeaver and BioCypher, you can download the `Neo4j Graph
4648
Database Self-Managed <https://neo4j.com/deployment-center/>`__ for your
4749
operating system. It has been extensively tested with the Community

docs/sections/intro_SKG.rst

Lines changed: 14 additions & 8 deletions
@@ -182,7 +182,9 @@ for you than a relational database:
182182
drastically change the way you query the data.
183183
* Queries can leverage the taxonomy very easily.
184184
* Overall, we find queries on SKG to be more easy to state (especially with
185-
the GQL language) than their counterpart (using SQL, for instance).
185+
the `GQL language <https://www.gqlstandards.org>`_) than their counterpart
186+
(using `SQL <https://fr.wikipedia.org/wiki/Structured_Query_Language>`_,
187+
for instance).
186188

187189
For example, here is a GQL query meaning "show me all the drugs for which there
188190
is a sensitivity to a sleep disorder":
@@ -233,17 +235,18 @@ connected to what other type of node.
233235
To do so, an ontology file is taking a very generic approach: everything there
234236
is modelled as a "triple": a (subject) is linked by a [predicate] to an (object).
235237
This means that *everything in an ontology* is represented by:
236-
``(subject)--[predicate]->(object)``. Nodes are subjects or objects, edges are
238+
``(subject)-[predicate]->(object)``. Nodes are subjects or objects, edges are
237239
predicates. But "node having a type" is also represented as a triple:
238-
``(my node)--[is a]->(my type)``.
240+
``(my node)-[is a]->(my type)``.
239241

240242

241243
What is *Turtle*?
242244
~~~~~~~~~~~~~~~~~
243245

244246
There are several formats with different syntaxes to write down ontology files,
245-
but we will only see here the "turtle" one, which is the most readable by a
246-
human being. In turtle, a triple is written as a "sentence":
247+
but we will only see here the
248+
"`turtle <https://www.w3.org/TeamSubmission/turtle>`_" one, which is the most
249+
readable by a human being. In turtle, a triple is written as a "sentence":
247250

248251
.. code-block:: ttl
249252
@@ -269,16 +272,17 @@ For instance, this is a valid turtle section:
269272
What is *OWL*?
270273
~~~~~~~~~~~~~~
271274

272-
*OWL* is a ---slightly eccentric--- acronym for "Web Ontology Language".
275+
*`OWL <https://www.w3.org/OWL>`_* is a ---slightly eccentric--- acronym for
276+
"Web Ontology Language".
273277
The "language" it refers to is a pre-defined set of predicates and types
274278
(hence the term "vocabulary").
275279
This language defines how to *model* a SKG, using a standardized vocabulary
276280
on which everyone can agree.
277281

278282
But in fact, OWL is built up on top of *two* other standards:
279283

280-
1. the Resource Description Framework (RDF),
281-
2. the RDF Schema (RDFS).
284+
1. the `Resource Description Framework <https://www.w3.org/RDF/>`_ (RDF),
285+
2. the `RDF Schema <https://www.w3.org/TR/rdf12-schema>`_ (RDFS).
282286

283287
Of course, RDFS is a vocabulary that sits on top of RDF, with a carefully chosen
284288
---and not at all confusing--- name.
@@ -375,6 +379,8 @@ Why do I need OntoWeaver to make an SKG?
375379

376380
OntoWeaver is a tool that shines if you need to build up an SKG that:
377381

382+
* has an original graph structure, the one that *you* understand, and the one
383+
that best suits your needs, not the ones of a random computer scientist,
378384
* is integrating several heterogeneous data sources,
379385
* is automatically built, in a reproducible way,
380386
* allows using independent data sources, for which import scripts and taxonomies

docs/sections/iterable_data.rst

Lines changed: 9 additions & 7 deletions
@@ -10,8 +10,9 @@ However, technically, OntoWeaver can consume any iterable data, providing that
1010
it has an "Adapter" class knowing how to do it.
1111

1212
The more generic adapters run a query on a data file, which issue a set of
13-
iterable data. For instance, you can run a XPath query on an XML document,
14-
or a JMESPath query on a JSON file.
13+
iterable data. For instance, you can run a
14+
`XPath <https://www.w3.org/TR/xpath/>`_ query on an XML document,
15+
or a `JMESPath <https://jmespath.org/>`_ query on a JSON file.
1516

1617
For a basic usage through the `ontoweave` command, OntoWeaver will guess the
1718
input data type from the input file extension. Then, it will read the
@@ -40,7 +41,7 @@ csv, tsv, txt, xls, xlsx, xlsm, xlsb, odf, ods, odt, json, html, hdf,
4041
feather, parquet, pickle, orc, sas, spss, stata.
4142

4243
Simple tables being the most common data format, we use it for all examples in
43-
the :ref:`Mapping API` section.
44+
the :ref:`mapping-api` section.
4445

4546

4647
Web Ontology data
@@ -101,7 +102,7 @@ OWL & automap
101102
The simplest way to read the input data from an ontology file is to use
102103
the *automatic* OWL adapter.
103104
This adapter can be used by passing the ``automap`` keyword in place of a mapping
104-
file into the ``ontoweave`` command, or the ``weave`` function:
105+
file into the ``ontoweave`` command, or the :py:func:`ontoweave.weave` function:
105106

106107
.. code:: sh
107108
@@ -110,8 +111,8 @@ file into the ``ontoweave`` command, or the ``weave`` function:
110111
This will automatically map the individuals defined into the input graph found
111112
in the ontology file to the types found in the taxonomy of the *same* ontology
112113
file.
113-
Using this ``OWLAutoAdapter``, you thus don't need to define a mapping, it will
114-
be automatically extracted from the input ontology file.
114+
Using this :py:class:`ontoweaver.owl.OWLAutoAdapter`, you thus don't need to
115+
define a mapping, it will be automatically extracted from the input ontology file.
115116

116117
.. figure:: ../OntoWeaver__owl-automap.svg
117118

@@ -178,7 +179,8 @@ as a subject, and then map "object properties" via a relation.
178179
``owl:DataProperty``.
179180

180181

181-
If you need to call the adapter yourself, use the ``OWLAdapter`` class.
182+
If you need to call the adapter yourself, use the
183+
:py:class:`ontoweaver.owl.OWLAdapter` class.
182184

183185

184186
OWL Example

docs/sections/mapping_api.rst

Lines changed: 24 additions & 15 deletions
@@ -1,10 +1,12 @@
1-
Mapping API
2-
-----------
1+
.. _mapping-api:
2+
3+
Writing a mapping
4+
-----------------
35

46
OntoWeaver essentially creates a Biocypher adapter from the description
57
of a mapping from a table to ontology types. As such, its core input is
6-
a dictionary, that takes the form of a YAML file. This configuration
7-
file indicates:
8+
a dictionary, that takes the form of a `YAML <https://yaml.org>`_ file.
9+
This configuration file indicates:
810

911
- to which (node) type to map each line of the table,
1012
- to which (node) type to map columns of the table,
@@ -15,9 +17,11 @@ How are config files related?
1517
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1618

1719
It may be difficult to understand how the type tags indicated in OntoWeaver's
18-
*mapping* are related to the types indicated in BioCypher's *schema*.
20+
*mapping* are related to the types indicated in
21+
`BioCypher's *schema* <https://biocypher.org/BioCypher/reference/schema-config>`_.
1922

20-
In the schema, the header of a block is the RDFS label that lies in the
23+
In the schema, the header of a block is the
24+
`RDFS <https://www.w3.org/TR/rdf12-schema/>`_ label that lies in the
2125
taxonomy of the ontology file, while the ``label_in_input`` is a kind of tag
2226
that is written in the mapping, after a keyword (e.g. ``to_object``).
2327

@@ -67,9 +71,10 @@ For example, if you have the following CSV table of phenotypes/patients:
6771
0,A
6872
1,B
6973

70-
and if you target the Biolink ontology, using a schema configuration
71-
(i.e. subset of types), defined in your ``schema_config.yaml`` file, as
72-
below:
74+
and if you target the
75+
`Biolink ontology <https://biolink.github.io/biolink-model/>`_, using a schema
76+
configuration (i.e. subset of types), defined in your ``schema_config.yaml``
77+
file, as below:
7378

7479
.. code:: yaml
7580
@@ -252,9 +257,12 @@ nested
252257
~~~~~~
253258

254259
The *nested* transformer can access values in nested key-value store.
255-
For instance, if your table cells contains a Python dictionary,
256-
or a Pandas one-dimensional DataFrame, or a flat JSON object string,
257-
*nested* will be able to access a value into it.
260+
For instance, if your table cells contains a
261+
`Python dictionary <https://docs.python.org/3/tutorial/datastructures.html#dictionaries>`_,
262+
or a
263+
`Pandas one-dimensional DataFrame <https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dataframe>`_,
264+
or a flat `JSON object <https://www.json.org>`_ string, *nested* will be able
265+
to access a value into it.
258266

259267
For instance, if your table looks like:
260268

@@ -285,9 +293,10 @@ of the nesting. For instance:
285293
.. note::
286294

287295
The *nested* transformer can detect and parse JSON object notation, but if the
288-
nested cell value is not a string, it will try to access it with the bracket
289-
syntax, e.g. ``value[key]``. This should be enough to allow it to use a large
290-
number of data structures.
296+
nested cell value is not a string, it will try to access it as a Python
297+
variable, using the bracket syntax, e.g. ``value[key]``.
298+
This should be enough to allow it to use a large number of data structures,
299+
providing that they can be accessed with this syntax.
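
The behaviour described in this note can be illustrated with a minimal Python sketch. This is not OntoWeaver's actual implementation: the helper name ``get_nested`` and the sample ``row`` are hypothetical, and the sketch only assumes what the note states, namely that string values are parsed as JSON and that anything else is accessed with the bracket syntax, following the sequence of keys in order.

```python
import json
from typing import Any, Sequence


def get_nested(value: Any, keys: Sequence[Any]) -> Any:
    """Follow `keys`, in order, through nested containers.

    Hypothetical helper, for illustration only.
    """
    for key in keys:
        # A string is assumed to hold JSON object notation: parse it first.
        if isinstance(value, str):
            value = json.loads(value)
        # Anything else only needs to support the bracket syntax.
        value = value[key]
    return value


# A table row whose "WORDS" cell holds a flat JSON object string.
row = {"WORDS": '{"en": "hello", "fr": "bonjour"}'}

english = get_nested(row, ["WORDS", "en"])
french = get_nested(row, ["WORDS", "fr"])
```

Because only the bracket syntax is required, the same loop works on dictionaries, lists, and any other object that implements ``__getitem__``.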
291300

292301

293302
split_nested
