Direct mapping vs. CSV2RDF, problematic case with foreign keys
Consider three simple tables, called People, Addresses, and Departments respectively (see the original example in the Direct Mapping document):
People:

| ID | fname | addr | deptName | deptCity |
|---|---|---|---|---|
| 7 | Bob | 18 | accounting | Cambridge |

Addresses:

| ID | city | state |
|---|---|---|
| 18 | Cambridge | MA |

Departments:

| ID | name | city |
|---|---|---|
| 23 | accounting | Cambridge |

A relational table always has a schema, which the RDF Direct Mapping makes use of. In this example, the schema defines that:
- For the `People` table
  - `ID` is the primary key of the table, and the cells are integers
  - The `fname` column contains strings
  - The `addr` column contains integers
- For the `Addresses` table
  - `ID` is the primary key of the table, and the cells are integers
  - Both the `city` and `state` columns contain strings
- For the `Departments` table
  - `ID` is the primary key of the table, and the cells are integers
  - Both the `name` and `city` columns contain strings
- The `addr` column in the `People` table contains foreign keys that reference the `ID` field of the `Addresses` table.
- The `deptName` and `deptCity` columns in the `People` table form a joint foreign key that references the `name` and `city` pair of the `Departments` table; that pair is a unique (candidate) key of the `Departments` table.
The generated RDF Graph is as follows:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://foo.example/DB/People/ID=7> rdf:type <http://foo.example/DB/People>;
<http://foo.example/DB/People/#ID> 7;
<http://foo.example/DB/People/#fname> "Bob";
<http://foo.example/DB/People/#addr> 18;
<http://foo.example/DB/People/#ref-addr> <http://foo.example/DB/Addresses/ID=18>;
<http://foo.example/DB/People/#deptName> "accounting";
<http://foo.example/DB/People/#deptCity> "Cambridge";
<http://foo.example/DB/People/#ref-deptName;deptCity> <http://foo.example/DB/Departments/ID=23>
.
<http://foo.example/DB/Addresses/ID=18> rdf:type <http://foo.example/DB/Addresses>;
<http://foo.example/DB/Addresses/#ID> 18;
<http://foo.example/DB/Addresses/#city> "Cambridge";
<http://foo.example/DB/Addresses/#state> "MA"
.
<http://foo.example/DB/Departments/ID=23> rdf:type <http://foo.example/DB/Departments>;
<http://foo.example/DB/Departments/#ID> 23;
<http://foo.example/DB/Departments/#name> "accounting";
<http://foo.example/DB/Departments/#city> "Cambridge"
.
The major difference, compared to the simpler example with foreign keys, is the use of unique keys. To generate the right object URI in the RDF graph (i.e., http://foo.example/DB/Departments/ID=23), the RDF Direct Mapping processor has to find the row in the target table (Departments) whose unique key combination (name, city) matches the deptName and deptCity values, and then establish the subject URI for that specific row (http://foo.example/DB/Departments/ID=23), which can then be used as the object URI. This can be done because the RDF Direct Mapping processor has access to all the tables within the same relational database. If these tables are related through foreign keys and unique candidate keys, it is straightforward to access all the tables in parallel and establish the relationships needed to generate the triples.
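The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any particular Direct Mapping implementation: the helper names (`row_uri`, `build_unique_index`, `reference_triple`) are hypothetical, the tables are held in memory as lists of dictionaries, and the URI shapes simply follow the example graph on this page.

```python
BASE = "http://foo.example/DB/"

# Rows of the example tables, keyed by column name (in-memory stand-ins
# for the relational tables; an assumption made for this sketch).
people = [{"ID": 7, "fname": "Bob", "addr": 18,
           "deptName": "accounting", "deptCity": "Cambridge"}]
departments = [{"ID": 23, "name": "accounting", "city": "Cambridge"}]


def row_uri(table, row, primary_key="ID"):
    """Build the subject URI of a row from its table name and primary key."""
    return f"{BASE}{table}/{primary_key}={row[primary_key]}"


def build_unique_index(table, rows, key_columns):
    """Map each unique-key value tuple to the subject URI of its row."""
    return {tuple(row[c] for c in key_columns): row_uri(table, row)
            for row in rows}


def reference_triple(table, row, fk_columns, index):
    """Return the (subject, predicate, object) triple for a joint foreign key."""
    target = index[tuple(row[c] for c in fk_columns)]
    predicate = f"{BASE}{table}/#ref-{';'.join(fk_columns)}"
    return (row_uri(table, row), predicate, target)


# Index Departments by its unique key (name, city), then resolve Bob's
# (deptName, deptCity) pair to the subject URI of the matching row.
dept_index = build_unique_index("Departments", departments, ["name", "city"])
triple = reference_triple("People", people[0],
                          ["deptName", "deptCity"], dept_index)
print(triple)
```

The key point is that the object URI depends on the `ID` of a row in another table, so the processor needs the whole `Departments` table (or an index over it) at hand while it processes `People`.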
However, this combination cannot be handled in the CSVW Metadata. The situation for CSV files is indeed different: CSV tables are typically single and potentially very large files, so a CSV processor cannot be expected to handle several CSV tables in parallel. That is why the Metadata specification does not introduce features that would force a processor to access several tables randomly; it also means that such an RDF mapping cannot be reproduced in CSV2RDF.
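To make the constraint concrete, here is a minimal sketch of a row-at-a-time pass over a single CSV file. It is not the CSV2RDF conversion algorithm: the function `stream_triples` is hypothetical, and the URI shapes are borrowed from the Direct Mapping example above purely for comparison. It only shows what information is available when each row is processed in isolation.

```python
import csv
import io

BASE = "http://foo.example/DB/"

# The People table as a CSV file (held in a string for this sketch).
people_csv = io.StringIO(
    "ID,fname,addr,deptName,deptCity\n"
    "7,Bob,18,accounting,Cambridge\n"
)


def stream_triples(table, lines, primary_key="ID"):
    """Yield one triple per cell, looking only at the current row."""
    for row in csv.DictReader(lines):
        subject = f"{BASE}{table}/{primary_key}={row[primary_key]}"
        for column, value in row.items():
            yield (subject, f"{BASE}{table}/#{column}", value)


triples = list(stream_triples("People", people_csv))
for triple in triples:
    print(triple)
```

The pass can emit the `deptName` and `deptCity` values as literals, but the value 23 needed for the `Departments` row URI never appears in the `People` file, so the reference triple cannot be produced without consulting the other table.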