Import Data
Net.Create can import nodes and edges via separate .csv files.
The easiest way to set up a CSV file for import is to first export a few nodes/edges from your existing project. The key is to set up the Template first with the appropriate headers. Then you can export the csv files, modify them, and reimport them.
For both nodes and edges, you should be able to:
- Add new nodes/edges
- Partially replace existing nodes/edges
- Partially replace existing nodes/edges AND add new nodes/edges in the same file
During an import, Net.Create uses node and edge ids to match imported data to existing data.
If the id does not match an existing node or edge id, the app will output an error message listing the problem id and the row of the id. "row" refers to the line number in the csv file. Line 1 is the header. Line 2 would be the first data row.
To add a new node or edge, you need to use an id of "new". For example, to add "Tacitus" and "Granicus", you would define an import csv file with two rows where the ID is set to "new".
import_node.csv
ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
new,"Tacitus",
new,"Granicus"
You can mix "new" and replacements, e.g. this will replace existing node 1 with "Claudius", and add "Tacitus" and "Granicus".
ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
1,"Claudius",
new,"Tacitus",
new,"Granicus"
The "new" keyword is case-insensitive: "NEW", "New", "new", and "nEw" all work.
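The matching rules above can be sketched in a few lines of Python. This is only an illustration of the logic as described on this page, not Net.Create's actual importer: the function name `classify_rows` is made up, and it assumes ids are whole numbers and that the id column header is `ID`.

```python
import csv
import io

def classify_rows(csv_text, existing_ids):
    """Split import rows into additions, replacements, and id errors.

    Hypothetical helper: assumes numeric ids and an "ID" header.
    """
    added, replaced, errors = [], [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so the first data row is row 2.
    for row_number, row in enumerate(reader, start=2):
        raw_id = (row.get("ID") or "").strip()
        if raw_id.lower() == "new":
            # "new" is matched case-insensitively.
            added.append(row)
        elif raw_id.isdigit() and int(raw_id) in existing_ids:
            replaced.append(row)
        else:
            errors.append(f"row {row_number}: unknown id {raw_id!r}")
    return added, replaced, errors

sample = 'ID,Label\n1,"Claudius"\nNEW,"Tacitus"\n99,"Granicus"\n'
added, replaced, errors = classify_rows(sample, existing_ids={1, 2, 3})
```

Here "Claudius" replaces node 1, "Tacitus" is added, and id 99 is reported as a problem on row 4.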
The fields required by the graph for import are defined by the template. Currently, any field that is defined in the template and NOT HIDDEN is required when you import. E.g. if you define 6 fields in the template, then the import file MUST have 6 fields with headers that match the exportLabel fields defined in the template, or the importer will report missing fields.
Any fields that are marked in the template as hidden will not be exported, nor will they be required for import.
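As a rough sketch of that rule: the required headers are the exportLabel values of every field that is not hidden. The template shape used here (a list of dicts with `exportLabel` and `hidden` keys) and the function names are assumptions made for illustration; check your own template file for the real structure.

```python
def required_headers(field_defs):
    """Return the exportLabel of every non-hidden field (assumed template shape)."""
    return [f["exportLabel"] for f in field_defs if not f.get("hidden", False)]

def missing_headers(csv_headers, field_defs):
    """List required headers that are absent from the csv header row."""
    present = set(csv_headers)
    return [h for h in required_headers(field_defs) if h not in present]

# Hypothetical node field definitions; "Notes" is hidden, so it is not required.
node_fields = [
    {"exportLabel": "ID"},
    {"exportLabel": "Label"},
    {"exportLabel": "Type"},
    {"exportLabel": "Notes", "hidden": True},
]
```

With these definitions, a csv whose header row is `ID,Label` would be missing `Type`, while `Notes` is never asked for.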
While all non-hidden headers are required in the csv file, you can skip fields in the node/edge data rows. For example, when importing a node, you can just specify "id" and "label" and skip the other fields, e.g.:
ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
1,"Claudius",
new,"Tacitus",,"He's the man!"
It's possible we might want to relax this and just require only the main fields (e.g. id and label with nodes, source target and id with edges). This will give you more flexibility when importing. On the other hand, you can also leave the fields empty, so long as you have the right headers in the csv.
The fields required in the import/export headers are defined via the exportLabel property in templates. The exportLabel maps the headers to built-in Net.Create fields.
You can rename exportLabel fields to match the fields required/used by your external graph application. For example, if your graph application expects type to be labeled as OPTIONS, you can set the node type exportLabel to OPTIONS. The labels are case sensitive.
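To picture the mapping, here is a small Python sketch that turns a csv row keyed by export headers back into built-in field names. The dictionary `export_labels` and the function `to_internal` are hypothetical; they only illustrate the renaming idea and the case-sensitive match.

```python
def to_internal(row, export_labels):
    """Map csv headers back to built-in field names.

    export_labels maps a built-in field name to its exportLabel header
    (an assumed shape, for illustration only).
    """
    by_header = {header: field for field, header in export_labels.items()}
    # Headers are compared exactly, so "options" does not match "OPTIONS".
    return {by_header[h]: value for h, value in row.items() if h in by_header}

export_labels = {"id": "ID", "label": "Label", "type": "OPTIONS"}
row = {"ID": "1", "Label": "Claudius", "OPTIONS": "Person"}
internal = to_internal(row, export_labels)
```

A header spelled `options` would simply not be recognized, which is the kind of mismatch that shows up as a missing-field error.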
Since we're using .csv as the export/import data format, there are a few considerations for encoding:
- Carriage returns are allowed inside of quotes.
- Double quotes need to be encoded. If you need to use a double quote inside a field, write two of them next to each other. Excel should automatically double the quotes when exporting to a CSV file. NOTE: An extraneous double quote will probably generate a bad header error during import. Here's an example of a valid and invalid use of quotes:
ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
// valid -- note the correct use of two "" around Egyptians
7,"Alexandria","Place","Alexandria is one of two places where ""Egyptians"" introduced the plague.","","",""
// invalid -- the un-doubled " around Egyptians will cause import to fail, usually with a bad header message
7,"Alexandria","Place","Alexandria is one of two places where "Egyptians" introduced the plague.","","",""
There are probably other exceptions that we'll need to add validation for, especially control characters.
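If you generate import files from a script rather than Excel, a CSV library will usually handle the quote doubling for you. The sketch below uses Python's standard `csv` module as a stand-in for any CSV-aware tool; it is not part of Net.Create.

```python
import csv
import io

notes = 'Alexandria is one of two places where "Egyptians" introduced the plague.'

# Write one row; the writer doubles any embedded double quotes.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow([7, "Alexandria", "Place", notes])
encoded_line = buffer.getvalue()

# Reading the line back restores the original text.
decoded_row = next(csv.reader(io.StringIO(encoded_line)))
```

The encoded line contains `""Egyptians""`, and reading it back gives the notes text with single pairs of quotes again.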
The app provides two levels of validation during import. When you select a file for import, the system will:
- Check the file's headers to make sure they match the expected headers as defined in the template. If they do not, an error message is displayed.
- Read the data and run simple validation on ids. If it encounters an error, it will display the errors. You can then fix and/or select a new file to import.
Errors Caught:
- Headers in import csv do not match headers defined in template. (Hide or remove the field in the template to make it not required for import)
- Node or edge uses an invalid id
- Edge refers to nonexistent source or target node ids
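The edge check in the list above could look roughly like this. It is a sketch under assumptions: the function name is made up, the column headers `Source` and `Target` are guesses (yours are whatever the template's exportLabel values say), and ids are assumed to be whole numbers.

```python
def check_edge_endpoints(edge_rows, node_ids):
    """Report edges whose source or target is not an existing node id.

    Hypothetical helper; "Source" and "Target" are assumed header names.
    """
    errors = []
    # Line 1 is the header, so the first data row is row 2.
    for row_number, edge in enumerate(edge_rows, start=2):
        for column in ("Source", "Target"):
            value = (edge.get(column) or "").strip()
            if not value.isdigit() or int(value) not in node_ids:
                errors.append(
                    f"row {row_number}: {column} {value!r} is not an existing node id"
                )
    return errors

edges = [
    {"ID": "new", "Source": "1", "Target": "2"},
    {"ID": "new", "Source": "1", "Target": "42"},
]
edge_errors = check_edge_endpoints(edges, node_ids={1, 2})
```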
Errors Not Caught:
- If your data values do not match up to the header values, the system will blindly import the values and you might end up with data matched to the wrong header/field, e.g. "Date" might get matched to "Label".
- The row number that an error is reported on can be thrown off if there are carriage returns in quoted text.
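To see why row numbers can drift, compare a count of records with the line number a text editor would show. The sketch below uses Python's standard `csv` module and made-up data; it does not claim to reproduce which of the two numbers Net.Create reports.

```python
import csv
import io

text = (
    'ID,Label,Notes\n'
    '1,"Claudius","first line\nsecond line"\n'
    '2,"Tacitus",""\n'
)

reader = csv.reader(io.StringIO(text, newline=""))
next(reader)  # skip the header (line 1)

positions = []
for record_number, row in enumerate(reader, start=2):
    # reader.line_num is the number of physical lines read so far.
    positions.append((row[1], record_number, reader.line_num))
```

"Tacitus" is the third record, but because the quoted note on the previous record spans two lines, it sits on line 4 of the file.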
Troubleshooting
Oftentimes, bad encoding errors (e.g. mismatched double quotes, extraneous commas, extraneous carriage returns) will result in a Header error. If you see an error about mismatched headers and you know your headers are well defined and match the non-hidden headers in the template, then you might try the following to troubleshoot:
- Open the csv in a text editor or Excel to make sure there aren't stray characters.
- Open the csv in Excel to make sure each record gets its own row. If not, then you might have some stray carriage returns (usually appearing outside of quoted text).
- Try deleting everything but the header and one line of data and see if that imports. If that imports, then the culprit is somewhere within your data encoding.
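If the file is too large to inspect by eye, a short script can point at likely culprits. This is a sketch, not a Net.Create tool: the function name is hypothetical, and it only flags records that have more fields than the header row, which is what a stray comma outside of quotes tends to produce. Records with fewer fields are not flagged, since skipping trailing fields is allowed.

```python
import csv
import io

def suspicious_records(csv_text):
    """Return (line_number, field_count) for records wider than the header."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    findings = []
    for row in reader:
        if len(row) > len(header):
            # line_num is the physical line on which this record ended.
            findings.append((reader.line_num, len(row)))
    return findings

sample_csv = (
    'ID,Label,Type\n'
    '1,"Claudius","Person"\n'
    '2,Tacitus, the historian,"Person"\n'
)
findings = suspicious_records(sample_csv)
```

Here the unquoted comma in "Tacitus, the historian" splits the label into two fields, so line 3 is reported with 4 fields.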
After importing data, the app will display a list of nodes/edges that have been replaced as well as a count of all the nodes/edges that were added or replaced.
The ability to import is restricted to prevent accidental or unintended imports.
Admins are always allowed to import data.
Non-admin users are normally not allowed to import data. However, there is a Template setting to allow logged in users to import data. If the template setting allowLoggedInUserToImport is set to true, then logged in users will be able to import data. By default, allowLoggedInUserToImport is false.
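Put as a decision rule, the permission check described above amounts to the following. The function and its arguments are hypothetical; only the setting name allowLoggedInUserToImport and its default of false come from the template.

```python
def can_import(is_admin, is_logged_in, template):
    """Sketch of the import permission rule (not the app's actual code)."""
    if is_admin:
        # Admins are always allowed to import.
        return True
    # Everyone else needs to be logged in AND have the setting enabled.
    return bool(is_logged_in and template.get("allowLoggedInUserToImport", False))
```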
Since importing modifies the database, during an import, editing the Template and editing individual Nodes and Edges are locked out. Conversely, if someone is editing a Node, Edge, or Template, Importing is locked out. This prevents accidental overwriting of data.
If you navigate away from the Import panel, the import is cancelled, and the edit lock is released. This might be a little surprising and awkward so we might want to revisit this. But there wasn't another clear way to "Cancel" the import lockout.
Importing is also disabled in standalone mode, since you are not allowed to modify the database.
Every time you click the "Import" button to import data (nodes or edges), the Net.Create server will make a backup of the current database into the runtime/backups folder before executing the import. The backup file will be named the same as the open database file with a timestamp appended.
If you are running this on a server or using nc-multiplex, you'll want to periodically monitor the runtime/backups folder to make sure it does not grow too large, and clear out old backup files as needed.
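One way to keep the folder in check is a small cleanup script run by hand or from a scheduled job. The sketch below is not shipped with Net.Create; the function name and the choice to keep the 20 most recently modified files are arbitrary assumptions. It deletes files permanently, so try it on a copy of the folder first.

```python
from pathlib import Path

def prune_backups(backup_dir, keep=20):
    """Delete all but the `keep` most recently modified files.

    Returns the names of the deleted files. Hypothetical helper.
    """
    files = sorted(
        (p for p in Path(backup_dir).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = []
    for old in files[keep:]:
        old.unlink()
        removed.append(old.name)
    return removed
```

For example, `prune_backups("runtime/backups", keep=20)` run from the Net.Create folder would leave the 20 newest backups in place.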
If you need to restore a backup, you can copy it to runtime/ and open it directly via a ./nc.js --dataset=xxx call, or rename the database file back to the original name, copy it over the original in runtime/, and open that via ./nc.js.
Template editing, importing data, and node and edge editing are all mutually exclusive actions: if someone on the network is doing one of those activities, others are prevented from starting any of them. (The one exception is node and edge editing: while you are editing a node or edge, others are prevented from editing that same node or edge, editing the template, or importing, but they can still edit other nodes and edges.) When the edit/import is complete, the lock on editing should be released.
Every once in a while, the release message is lost and the edit lock remains in place. If this happens, an administrator can go to the More > Import / Export panel and click the "Force Unlock All" button to release the edit lock and re-enable template editing, importing, and node and edge editing.
WARNING: Use this with utmost caution! If someone is actively editing or importing, you can delete their work, or even worse, corrupt the database!
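The locking rules above can be modeled with a small class. This is a simplified sketch of the behavior as described on this page, with made-up names; it ignores networking and lost release messages, and is not how the server is actually implemented.

```python
class EditLocks:
    """Toy model of the template/import/node-edge edit locks."""

    def __init__(self):
        self.exclusive = None  # "template" or "import" while one is active
        self.items = set()     # ids of nodes/edges currently being edited

    def acquire_exclusive(self, kind):
        # Template editing and importing need everything else to be idle.
        if self.exclusive is not None or self.items:
            return False
        self.exclusive = kind
        return True

    def acquire_item(self, item_id):
        # A node/edge can be edited unless a template edit or import is
        # running, or someone else is editing that same node/edge.
        if self.exclusive is not None or item_id in self.items:
            return False
        self.items.add(item_id)
        return True

    def release_exclusive(self):
        self.exclusive = None

    def release_item(self, item_id):
        self.items.discard(item_id)

    def force_unlock_all(self):
        # Mirrors "Force Unlock All": drops every lock, finished or not.
        self.exclusive = None
        self.items.clear()
```

Note that `force_unlock_all` clears locks without knowing whether the work behind them has finished, which is why the real button needs so much care.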
For more information, see #217.