
Import Data

Flynn Duniho edited this page May 29, 2024 · 3 revisions

Net.Create can import nodes and edges via separate .csv files.

The easiest way to set up a CSV file for import is to first export a few nodes/edges from your existing project. The key is to set up the Template first with the appropriate headers. Then you can export the csv files, modify them, and reimport them.

For both nodes and edges, you should be able to:

  • Add new nodes/edges
  • Partially replace existing nodes/edges
  • Partially replace existing nodes/edges AND add new nodes/edges in the same file

Replacing Existing Nodes and Edges

During an import, Net.Create uses node and edge ids to match imported data to existing data.

If the id does not match an existing node or edge id, the app will output an error message listing the problem id and the row of the id. "row" refers to the line number in the csv file. Line 1 is the header. Line 2 would be the first data row.

Adding New Nodes/Edges

To add a new node or edge, you need to use an id of "new". For example, to add "Tacitus" and "Granicus", you would define an import csv file with two rows where the ID is set to "new".

import_node.csv

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
new,"Tacitus",
new,"Granicus"

You can mix "new" and replacements, e.g. this will replace existing node 1 with "Claudius", and add "Tacitus" and "Granicus".

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
1, "Claudius",
new,"Tacitus",
new,"Granicus"

The "new" keyword is case-insensitive. e.g. "NEW", "New", "new", and "nEw" all work.
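The matching rules above can be sketched in a few lines of Python. This is an illustration only, not Net.Create's actual importer: the function name `classify_rows` and the `existing_ids` set are hypothetical, and the sketch assumes the id is the first column of each row.

```python
import csv
import io


def classify_rows(csv_text, existing_ids):
    """Sort data rows into additions, replacements, and id errors.

    Hypothetical helper: assumes the id is the first column and that
    existing_ids holds the ids already in the project, as strings.
    """
    to_add, to_replace, errors = [], [], []
    # skipinitialspace tolerates rows written as: 1, "Claudius",
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    next(reader)  # line 1 is the header
    for row_number, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        row_id = row[0].strip()
        if row_id.lower() == "new":  # the "new" keyword is case-insensitive
            to_add.append(row)
        elif row_id in existing_ids:
            to_replace.append(row)
        else:
            errors.append((row_number, row_id))  # unknown id and its row
    return to_add, to_replace, errors


sample = (
    "ID,Label,Type,Notes,Info,Degrees,Created,Last Updated\n"
    '1, "Claudius",\n'
    'NEW,"Tacitus",\n'
    '99,"Granicus"\n'
)
added, replaced, problems = classify_rows(sample, existing_ids={"1"})
print(len(added), len(replaced), problems)
```

Here node 1 is treated as a replacement, "Tacitus" as an addition, and id 99 is reported with its row because it matches nothing in `existing_ids`.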

Defining Import Fields

The fields required by the graph for import are defined by the template. Currently any fields that are defined in the template and NOT HIDDEN will be required when you import. E.g. if you define 6 fields in the template, then the import file MUST have 6 fields with headers that match the exportLabel fields defined in the template, or the importer will complain about missing fields.

Any fields that are marked in the template as hidden will not be exported, nor will they be required for import.

While all non-hidden headers are required in the csv file, you can skip fields in the node/edge data rows. For example, when importing a node, you can just specify "id" and "label" and skip the other fields, e.g.:

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
1, "Claudius",
new,"Tacitus",,"He's the man!"

It's possible we might want to relax this and require only the main fields (e.g. id and label for nodes; source, target, and id for edges). This would give you more flexibility when importing. On the other hand, you can already leave fields empty, so long as you have the right headers in the csv.

The fields required in the import/export headers are defined via the exportLabel property in templates. The exportLabel will map the headers to built-in Net.Create fields.

You can rename exportLabel fields to match the fields required/used by your external graph application. For example, if your graph application expects type to be labeled as OPTIONS, you can set the node type exportLabel to OPTIONS. The labels are case sensitive.
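As a rough illustration of how the required headers follow from the template, the sketch below derives them from a made-up field table and compares them with a file's header row. Only `exportLabel` and the idea of hidden fields come from the text above; the dictionary layout, the `hidden` key, and the `missing_headers` helper are assumptions for this example and do not reflect the real template format.

```python
import csv
import io

# Hypothetical stand-in for the node field definitions in a template.
# The layout and the "hidden" key are assumptions for illustration.
NODE_FIELDS = {
    "id": {"exportLabel": "ID", "hidden": False},
    "label": {"exportLabel": "Label", "hidden": False},
    "type": {"exportLabel": "OPTIONS", "hidden": False},
    "notes": {"exportLabel": "Notes", "hidden": True},
}


def missing_headers(csv_text, fields):
    """Return required export labels that are absent from the header row."""
    required = [f["exportLabel"] for f in fields.values() if not f["hidden"]]
    header = next(csv.reader(io.StringIO(csv_text)))
    present = set(header)
    # The comparison is case sensitive, as the labels are.
    return [label for label in required if label not in present]


ok = missing_headers("ID,Label,OPTIONS\n1,Claudius,Person\n", NODE_FIELDS)
wrong_case = missing_headers("ID,Label,options\n1,Claudius,Person\n", NODE_FIELDS)
print(ok)          # []
print(wrong_case)  # ['OPTIONS']
```

The hidden "Notes" field is not required, while "options" in lower case does not satisfy the "OPTIONS" label.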

Encoding and Special Characters

Since we're using .csv as the export/import data format, there are a few considerations for encoding:

  • Carriage returns are allowed inside of quotes.
  • Double quotes inside a field need to be escaped by doubling them: write two double quotes next to each other. Excel should do this automatically when exporting to a CSV file. NOTE: An extraneous double quote will probably generate a bad header error during import. Here's an example of a valid and invalid use of quotes:
ID,Label,Type,Notes,Info,Degrees,Created,Last Updated

// valid -- note the correct use of two "" around Egyptians
7,"Alexandria","Place","Alexandria is one of two places where ""Egyptians"" introduced the plague.","","",""

// invalid -- the unescaped double quotes around Egyptians will cause import to fail, usually with a bad header message
7,"Alexandria","Place","Alexandria is one of two places where "Egyptians" introduced the plague.","","",""

There are probably other exceptions that we'll need to add validation for, especially control characters.
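If you generate import files from a script rather than from Excel, a CSV library will normally handle the quote doubling for you. The following sketch uses Python's standard `csv` module to write the Alexandria row. It is only an example of producing correctly escaped output, not part of Net.Create.

```python
import csv
import io

row = [
    7,
    "Alexandria",
    "Place",
    'Alexandria is one of two places where "Egyptians" introduced the plague.',
]

buffer = io.StringIO()
# QUOTE_NONNUMERIC quotes every string field; embedded double quotes are
# doubled automatically because doublequote=True is the default.
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
writer.writerow(row)
encoded = buffer.getvalue()
print(encoded, end="")

# Reading the line back recovers the original text with the quotes un-doubled.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded[3])
```

The written line contains `""Egyptians""`, matching the valid example above.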

Error Checking

The app provides two levels of validation during import. When you select a file for import, the system will:

  1. Check the file's headers to make sure they match the expected headers as defined in the template. If they do not, an error message is displayed.
  2. Read the data and run simple validation on ids. If it encounters an error, it will display the errors. You can then fix and/or select a new file to import.

Errors Caught:

  • Headers in import csv do not match headers defined in template. (Hide or remove the field in the template to make it not required for import)
  • Node or edge uses an invalid id
  • Edge refers to nonexistent source or target node ids

Errors Not Caught:

  • If your data values do not match up to the header values, the system will blindly import the values and you might end up with data matched to the wrong header/field, e.g. "Date" might get matched to "Label".
  • The row number that an error is reported on can be thrown off if there are carriage returns in quoted text.
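The row-number drift is easy to reproduce with Python's standard `csv` module, which exposes the number of physical lines read through `reader.line_num`. The three-column header below is a shortened, made-up example, not a real template.

```python
import csv
import io

sample = (
    "ID,Label,Notes\n"
    '1,"Claudius","First line\nsecond line"\n'
    'new,"Tacitus",""\n'
)

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header on line 1
positions = []
for record_number, row in enumerate(reader, start=2):
    # reader.line_num is the number of physical lines read so far,
    # so it runs ahead of the record count after a multi-line field.
    positions.append((row[0], record_number, reader.line_num))

for row_id, record_number, last_line in positions:
    print(f"id={row_id} record={record_number} ends_on_line={last_line}")
```

"Tacitus" is the third record counting the header, but it sits on line 4 of the file, so a reported row number and the line number in a text editor can disagree.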

Troubleshooting

Oftentimes, bad encoding errors (e.g. mismatched double quotes, extraneous commas, extraneous carriage returns) will result in a Header error. If you see an error about mismatched headers and you know your headers...

  1. are well defined
  2. match the non-hidden headers in the template

...then you might try the following to troubleshoot:

  • Open the csv in a text editor or Excel to make sure there aren't stray characters.
  • Open the csv in Excel to make sure each record gets its own row. If not, then you might have some stray carriage returns (usually appearing outside of quoted text).
  • Try deleting everything but the header and one line of data and see if that imports. If that imports, then the culprit is somewhere within your data encoding.
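Some of these checks can be automated before you import. The sketch below is a hypothetical pre-check, separate from Net.Create: it parses the file with Python's `csv` module in strict mode, which rejects a stray unescaped double quote, and flags rows that have more fields than the header (often a sign of an extraneous comma). Rows with fewer fields are not flagged, since skipping trailing fields is allowed.

```python
import csv
import io


def precheck(csv_text):
    """Return a list of (line, message) problems found by a strict parse."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text), strict=True)
    try:
        header = next(reader)
        for row in reader:
            if len(row) > len(header):
                problems.append((reader.line_num, "more fields than headers"))
    except csv.Error as err:
        # strict=True makes the parser reject a stray, unescaped double quote.
        problems.append((reader.line_num, str(err)))
    except StopIteration:
        problems.append((0, "file is empty"))
    return problems


good = 'ID,Label,Notes\n1,"Claudius","says ""hi"""\n'
bad_quote = 'ID,Label,Notes\n1,"Claudius","says "hi""\n'
extra_field = "ID,Label\n1,Claudius,extra\n"

print(precheck(good))
print(precheck(bad_quote))
print(precheck(extra_field))
```

A clean result does not guarantee that the import will succeed. It only rules out two common encoding problems.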

Import Report

After importing data, the app will display a list of nodes/edges that have been replaced as well as a count of all the nodes/edges that were added or replaced.

Import Permissions

The ability to import is restricted to prevent accidental or unintended imports.

Roles: Admins vs Non-admins

Admins are always allowed to import data.

Non-admin users are normally not allowed to import data. However, there is a Template setting that allows logged-in users to import data: if allowLoggedInUserToImport is set to true, then logged-in users will be able to import data. By default, allowLoggedInUserToImport is false.

Edit Lock During Import

Since importing modifies the database, during an import, editing the Template and editing individual Nodes and Edges are locked out. Conversely, if someone is editing a Node, Edge, or Template, Importing is locked out. This prevents accidental overwriting of data.

If you navigate away from the Import panel, the import is cancelled, and the edit lock is released. This might be a little surprising and awkward so we might want to revisit this. But there wasn't another clear way to "Cancel" the import lockout.

Standalone Mode

Importing is also disabled in standalone mode, since you are not allowed to modify the database.

Import Backups

Every time you click the "Import" button to import data (nodes or edges), the Net.Create server will make a backup of the current database into the runtime/backups folder before executing the import. The backup file will be named the same as the open database file with a timestamp appended.

If you are running this on a server or using nc-multiplex, you'll want to periodically check the runtime/backups folder to make sure it does not grow too large, and clear out old backup files as needed.
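One way to keep the folder in check is a small housekeeping script. The sketch below is an assumption-laden example rather than a Net.Create tool: `prune_backups` is a hypothetical helper that keeps the most recently modified files and deletes the rest, and the demonstration runs against a temporary folder with made-up file names instead of a real runtime/backups folder.

```python
import os
import tempfile
from pathlib import Path


def prune_backups(folder, keep=10):
    """Delete all but the `keep` most recently modified files in folder."""
    files = sorted(
        (p for p in Path(folder).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = []
    for old in files[keep:]:
        old.unlink()
        removed.append(old.name)
    return removed


# Demonstration on a throwaway folder, not on real backups.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        path = Path(tmp) / f"backup_{i}.bak"  # made-up file names
        path.write_text("{}")
        os.utime(path, (1000 + i, 1000 + i))  # fake increasing timestamps
    removed = prune_backups(tmp, keep=2)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed)
print(remaining)
```

Before pointing something like this at a real backups folder, consider printing the files it would delete first, since deleted backups cannot be recovered.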

If you need to restore a backup, you can either copy it to runtime/ and open it directly with ./nc.js --dataset=xxx, or rename the backup file back to the original database name, copy it over the original in runtime/, and open that via ./nc.js.

Admin Tools

"Force Unlock All"

Template editing, importing data, and node and edge editing are all mutually exclusive actions: if someone on the network is doing one of those activities, others are prevented from doing any of them. (The one exception is node and edge editing: if you are editing a node or edge, others are prevented from editing that same node or edge, editing the template, or importing, but they can still edit other nodes and edges.) When the edit/import is complete, the lock on editing should be released.

Every once in a while, the release message is lost and the edit lock remains in place. If this happens, an administrator can go to the More > Import / Export panel and click the "Force Unlock All" button to release the edit lock and re-enable template editing, importing, and node and edge editing.

WARNING: Use this with utmost caution! If someone is actively editing or importing, you can delete their work, or even worse, corrupt the database!


For more information, see #217.
