Feature: Import Data #217

benloh · 2022-03-03T00:24:37Z

Branch: dev-bl/import

Overview

This adds an Import Data feature to Net.Create.

This is a pretty significant feature that ended up touching ALL aspects of the app. A very thorough QA cycle will be needed.

To Test

To test, first export nodes and edges from an existing graph. (NOTE: please create new exports. Do not use exported csv files created previously from the dev-bl/export branch; the previous export routine had errors.)

git fetch && git checkout dev-bl/import
./nc.js --dataset=yourdataset
Select "More..."
Select "Import/Export"
Click "Export Nodes"
Click "Export Edges"

Next, try modifying the labels of the exported nodes, and reimport them.

Open the exported nodes csv.
Change the label to something else.
Select "More..."
Select "Import/Export"
Click "Choose File" for nodes.
The "Import" button should appear if the files are valid.
Click "Import"

Next, create a new graph and try importing the nodes and edges.

Open the exported nodes csv file.
Change all the "ids" to "new"
ctrl-c to quit Net.Create.
./nc.js --dataset=empty
Select "More..."
Select "Import/Export"
Click "Choose File" for nodes.
The "Import" button should appear if the files are valid.
Click "Import"
Your original dataset should be recreated

Other things to test:

Try adding new nodes/edges to an existing graph. Any matching ids will be replaced; any ids marked "new" will be newly created.
Try adding ONLY nodes, or ONLY edges. You can add them in any order one at a time.
Open Net.Create remotely. You should not be able to import data.
Open Net.Create remotely with ?admin=true. You should be able to import data.
Edit your existing template: add allowLoggedInUserToImport and set it to true. Open Net.Create remotely without admin privileges. Import is disabled. Log in. Import is now enabled.

Test error checking:

Try importing nodes with bad ids
Try importing edges with bad ids
Try importing edges that link to bad node ids
Try importing nodes/edges that have a missing header field
Try importing nodes/edges that have missing fields in the rows
Try importing nodes/edges that have linefeed characters inside text
Try importing nodes/edges that have other unexpected characters -- please document any bugs that emerge.

Import Feature

Net.Create can import nodes and edges via separate .csv files.

The easiest way to set up a CSV file for import is to first export a few nodes/edges from your existing project. The key is to set up the Template first with the appropriate headers. Then you can export and modify the csv files, then reimport them.

For both nodes and edges, you should be able to:

Add new nodes/edges
Partially replace existing nodes/edges
Partially replace existing nodes/edges AND add new nodes/edges in the same file

Replacing Existing Nodes and Edges

During an import, Net.Create uses node and edge ids to match imported data to existing data.

If the id does not match an existing node or edge id, the app will output an error message listing the problem id and the row of the id. "row" refers to the line number in the csv file. Line 1 is the header. Line 2 would be the first data row.

Adding New Nodes/Edges

To add a new node or edge, you need to use an id of "new". For example, to add "Tacitus" and "Granicus", you would define an import csv file with two rows where the ID is set to "new".

import_node.csv

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
new,"Tacitus",
new,"Granicus"

You can mix "new" and replacements, e.g. this will replace existing node 1 with "Claudius", and add "Tacitus" and "Granicus".

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
1,"Claudius",
new,"Tacitus",
new,"Granicus"

The "new" keyword is case-insensitive. e.g. "NEW", "New", "new", and "nEw" all work.
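The matching rules described so far can be sketched as a small classifier. This is a minimal illustration, not Net.Create's actual import code: the function name, the row objects, and the error wording are all hypothetical. It also assumes one csv line per record, so the reported row is simply the record index plus 2.

```javascript
// Hypothetical helper: names and data shapes are illustrative only.
// rows: objects parsed from the csv data rows, e.g. { id: "new", label: "Tacitus" }
// existingIds: Set of ids (as strings) already present in the graph
function classifyImportRows(rows, existingIds) {
  const toAdd = [];
  const toReplace = [];
  const errors = [];
  rows.forEach((row, index) => {
    // Line 1 of the csv is the header, so the first data row is line 2.
    const csvRow = index + 2;
    const id = String(row.id).trim();
    if (id.toLowerCase() === "new") {
      toAdd.push(row); // "new" is case-insensitive
    } else if (existingIds.has(id)) {
      toReplace.push(row); // id matches existing data
    } else {
      errors.push(`Row ${csvRow}: id "${id}" does not match an existing id`);
    }
  });
  return { toAdd, toReplace, errors };
}

const result = classifyImportRows(
  [
    { id: "1", label: "Claudius" },
    { id: "NEW", label: "Tacitus" },
    { id: "99", label: "Granicus" },
  ],
  new Set(["1", "2"])
);
console.log(result.errors);
```

Here "Claudius" replaces node 1, "Tacitus" is added, and the row with id 99 is reported as an error on row 4.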

Defining Import Fields

The fields required by the graph for import are defined by the template. Currently any fields that are defined in the template and NOT HIDDEN will be required when you import. E.g. if you define 6 fields in the template, then the import file MUST have 6 fields with headers that match the exportLabel fields defined in the template, or the importer will complain about missing fields.

Any fields that are marked in the template as hidden will not be exported, nor will they be required for import.

While all non-hidden headers are required in the csv file, you can skip fields in the node/edge data rows. For example, when importing a node, you can just specify "id" and "label" and skip the other fields, e.g.:

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated
1,"Claudius",
new,"Tacitus",,"He's the man!"

It's possible we might want to relax this and require only the main fields (e.g. id and label for nodes; source, target, and id for edges). This will give you more flexibility when importing. On the other hand, you can also leave the fields empty, so long as you have the right headers in the csv.

The fields required in the import/export headers are defined via the exportLabel property in templates. The exportLabel will map the headers to built-in Net.Create fields.

You can rename exportLabel fields to match the fields required/used by your external graph application. For example, if your graph application expects type to be labeled as OPTIONS, you can set the node type exportLabel to OPTIONS. The labels are case sensitive.
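As a rough sketch of the header check described in this section: the template shape below (an array of field definitions with `exportLabel` and `hidden` properties) and the function name are assumptions made for illustration, and the real template format may differ. Reporting unexpected extra headers is also an assumption of this sketch.

```javascript
// Hypothetical template shape; only the exportLabel and hidden concepts
// come from the description above.
function findHeaderProblems(templateFields, csvHeaders) {
  // Hidden fields are neither exported nor required for import.
  const required = templateFields
    .filter(field => !field.hidden)
    .map(field => field.exportLabel);
  // Labels are case sensitive, so compare them exactly.
  const missing = required.filter(label => !csvHeaders.includes(label));
  const unexpected = csvHeaders.filter(label => !required.includes(label));
  return { missing, unexpected };
}

const nodeFields = [
  { exportLabel: "ID", hidden: false },
  { exportLabel: "Label", hidden: false },
  { exportLabel: "OPTIONS", hidden: false }, // node type renamed via exportLabel
  { exportLabel: "Notes", hidden: true }, // hidden: not required
];
const problems = findHeaderProblems(nodeFields, ["ID", "Label", "Type"]);
console.log(problems);
```

With these inputs the file is missing the OPTIONS header, and Type is flagged because the template no longer uses that label.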

Encoding / Special Characters

Since we're using .csv as the export/import data format, there are a few considerations for encoding:

Carriage returns are allowed inside of quotes.
Double quotes need to be encoded. If you need to use a double quote inside a field, write two of them next to each other. Excel should automatically double any quotes when exporting to a CSV file. NOTE: An extraneous double quote will probably generate a bad header error during import. Here's an example of a valid and invalid use of quotes:

ID,Label,Type,Notes,Info,Degrees,Created,Last Updated

// valid -- note the correct use of two "" around Egyptians
7,"Alexandria","Place","Alexandria is one of two places where ""Egyptians"" introduced the plague.","","",""

// invalid -- the unescaped double quotes around Egyptians will cause import to fail, usually with a bad header message
7,"Alexandria","Place","Alexandria is one of two places where "Egyptians" introduced the plague.","","",""

There are probably other exceptions that we'll need to add validation for, especially control characters.
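To make the quoting rule concrete, here is a minimal sketch of doubling quotes on the way out and undoing it on the way in. Both function names are hypothetical, and the parser is deliberately simplified (it expects one complete record as a single string); it is not the csv library the app actually uses.

```javascript
// Wrap a value in double quotes, doubling any double quotes it contains.
function encodeCsvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Split one csv record into fields, honoring quoted fields and "" escapes.
function parseCsvRecord(record) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < record.length; i++) {
    const ch = record[i];
    if (inQuotes) {
      if (ch === '"' && record[i + 1] === '"') {
        field += '"'; // "" inside quotes is one literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and line breaks are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

const note = 'One of two places where "Egyptians" introduced the plague.';
const record = ["7", encodeCsvField("Alexandria"), encodeCsvField(note)].join(",");
console.log(record);
console.log(parseCsvRecord(record)[2] === note);
```

Encoding and then parsing returns the original note text, which is the round trip an export followed by an import relies on.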

Error Checking

The app provides two levels of validation during import. When you select a file for import, the system will:

Check the file's headers to make sure they match the expected headers as defined in the template. If they do not, an error message is displayed.
Read the data and run simple validation on ids. If it encounters an error, it will display the errors. You can then fix and/or select a new file to import.

Errors Caught:

Headers in import csv do not match headers defined in template. (Hide or remove the field in the template to make it not required for import)
Node or edge uses an invalid id
Edge refers to nonexistent source or target node ids
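The edge check in the last item could look something like the sketch below. The edge shape ({ id, source, target } with string ids), the function name, and the message wording are assumptions for illustration only.

```javascript
// Hypothetical helper: report edges whose source or target is not a known node id.
function findEdgeErrors(edges, knownNodeIds) {
  const errors = [];
  edges.forEach((edge, index) => {
    const csvRow = index + 2; // line 1 of the csv is the header
    for (const end of ["source", "target"]) {
      if (!knownNodeIds.has(edge[end])) {
        errors.push(`Row ${csvRow}: ${end} node id "${edge[end]}" not found`);
      }
    }
  });
  return errors;
}

const edgeErrors = findEdgeErrors(
  [
    { id: "new", source: "1", target: "2" },
    { id: "new", source: "3", target: "42" },
  ],
  new Set(["1", "2", "3"])
);
console.log(edgeErrors);
```

Only the second edge is reported, because its target 42 is not among the known node ids.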

Errors Not Caught:

If your data values do not match up to the header values, the system will blindly import the values and you might end up with data matched to the wrong header/field.
The row number that an error is reported on can be thrown off if there are carriage returns in quoted text.

Troubleshooting

Oftentimes, bad encoding errors (e.g. mismatched double quotes, extraneous commas, extraneous carriage returns) will result in a Header error. If you see an error about mismatched headers and you know your headers are well defined and match the non-hidden headers in the template, then you might try the following to troubleshoot:

Open the csv in a text editor or Excel to make sure there aren't stray characters.
Open the csv in Excel to make sure each record gets its own row. If not, then you might have some stray carriage returns (usually appearing outside of quoted text).
Try deleting everything but the header and one line of data and see if that imports. If that imports, then the culprit is somewhere within your data encoding.

Import Report

After importing data, the app will display a list of nodes/edges that have been replaced as well as a count of all the nodes/edges that were added or replaced.

Import Permissions

Roles: Admins vs Non-admins

Admins are always allowed to import data.

Non-admin users are normally not allowed to import data. This pull request adds a new option to the Templates to allow logged in users to import data. If the template setting allowLoggedInUserToImport is set to true, then logged in users will be able to import data. By default allowLoggedInUserToImport is false.

Edit States

Since importing modifies the database, during an import, editing the Template and editing individual Nodes and Edges is locked out. Conversely, if someone is editing a Node, Edge, or Template, Importing is locked out. This prevents accidental overwriting of data.

If you navigate away from the Import panel, the import is cancelled. This might be a little surprising and awkward so we might want to revisit this. But there wasn't another clear way to "Cancel" the import lockout.

Standalone Mode

Importing is also disabled in standalone mode, since you are not allowed to modify the database.
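Putting the role and standalone rules from this section together, the permission decision might be sketched as follows. The session object and function name are hypothetical; only allowLoggedInUserToImport comes from the template. The sketch also assumes standalone mode wins over the admin flag, since the database cannot be modified there. Edit locks are a separate check and are not covered here.

```javascript
// Hypothetical session shape: { isStandalone, isAdmin, isLoggedIn, template }
function canImport({ isStandalone, isAdmin, isLoggedIn, template }) {
  if (isStandalone) return false; // database is read-only in standalone mode
  if (isAdmin) return true; // admins are always allowed to import
  // Non-admins need to be logged in AND the template must opt in.
  return Boolean(isLoggedIn && template.allowLoggedInUserToImport);
}

const template = { allowLoggedInUserToImport: true };
console.log(canImport({ isStandalone: false, isAdmin: false, isLoggedIn: false, template }));
console.log(canImport({ isStandalone: false, isAdmin: false, isLoggedIn: true, template }));
```

A template without the setting behaves as if it were false, matching the default described above.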

Import Backups

Every time you click the "Import" button to import data (nodes or edges), the Net.Create server will make a backup of the current database into the runtime/backups folder before executing the import. The backup file will be named the same as the open database file with a timestamp appended.

If you are running this on a server or using nc-multiplex you'll want to periodically monitor the runtime/backups folder to make sure it does not grow too large. You'll want to periodically clear out the backup files.

If you need to restore a backup, you can either copy it to runtime/ and open it directly via ./nc.js --dataset=xxx, or rename it back to the original database file name, copy it over the original in runtime/, and open that via ./nc.js.

Admin Tools

"Force Unlock All"

Template editing, importing data, and node and edge editing are all mutually exclusive actions: If someone on the network is doing one of those activities, others are prevented from doing the same. (The one exception is that if you are editing a node or edge, others are only prevented from editing the same node or edge and editing the template or importing, but they can still edit other nodes and edges.) When the edit/import is complete the lock on editing should be released.
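The locking rules in the paragraph above can be read as the following decision function. This is only a sketch of the rules as described: the lock-state shape, the action names, and the function name are hypothetical and do not reflect how the app tracks locks internally.

```javascript
// Hypothetical lock state describing what OTHER users are currently doing:
// { templateBeingEdited, importActive, lockedNodeIds: Set, lockedEdgeIds: Set }
function isActionAllowed(action, locks) {
  // Template editing and importing block every other action.
  if (locks.templateBeingEdited || locks.importActive) return false;
  const anyItemLocked =
    locks.lockedNodeIds.size > 0 || locks.lockedEdgeIds.size > 0;
  switch (action.type) {
    case "editTemplate":
    case "import":
      return !anyItemLocked; // blocked while any node or edge is being edited
    case "editNode":
      return !locks.lockedNodeIds.has(action.id); // only the same node is blocked
    case "editEdge":
      return !locks.lockedEdgeIds.has(action.id); // only the same edge is blocked
    default:
      return false;
  }
}

const locks = {
  templateBeingEdited: false,
  importActive: false,
  lockedNodeIds: new Set([7]),
  lockedEdgeIds: new Set(),
};
console.log(isActionAllowed({ type: "editNode", id: 7 }, locks));
console.log(isActionAllowed({ type: "editNode", id: 8 }, locks));
console.log(isActionAllowed({ type: "import" }, locks));
```

With node 7 being edited by someone else, editing node 8 is still allowed, while editing node 7 and importing are both blocked.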

Every once in a while, the release message is lost and the edit lock remains in place. If this happens, an administrator can go to the More > Import / Export panel and click the "Force Unlock All" button to release the edit lock and re-enable template editing, importing, and node and edge editing.

WARNING: Use this with utmost caution! If someone is actively editing or importing, you can delete their work, or even worse, corrupt the database!

Other Changes

`revision` update bug

There was a bug where the revision field was either not updated at all, or was being updated only once per session (after a reload), instead of being updated with every database update. It now properly updates every time you edit and save a node or edge.

TOML template file update

The default toml template and schema have been updated. You'll want to review or update your existing template files.

"weight"-ready

While the UI (EdgeEditor) does not currently support it, the app logic now calculates edge line sizes by summing up edge weight values. E.g. if two nodes are connected by three edges with weights of 1, 2, and 4, the size of the edge will be 7.

weight defaults to 1. It can support values smaller than 1.
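A minimal sketch of the summing rule, assuming edges shaped like { source, target, weight } and treating A-B and B-A as the same pair (both are assumptions of this example, as is the function name):

```javascript
// Hypothetical helper: total weight per connected node pair.
function edgeSizes(edges) {
  const sizes = new Map();
  for (const edge of edges) {
    // Build the same key regardless of edge direction.
    const key = [edge.source, edge.target].sort().join("|");
    // A missing weight counts as the default of 1.
    const weight = edge.weight === undefined ? 1 : edge.weight;
    sizes.set(key, (sizes.get(key) || 0) + weight);
  }
  return sizes;
}

const sizes = edgeSizes([
  { source: 1, target: 2, weight: 2 },
  { source: 2, target: 1, weight: 0.5 }, // values smaller than 1 are fine
  { source: 1, target: 2 }, // no weight: defaults to 1
  { source: 2, target: 3, weight: 4 },
]);
console.log(sizes.get("1|2")); // 3.5
console.log(sizes.get("2|3")); // 4
```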

weight is not currently settable via the EdgeEditor UI. EdgeTable does not yet display weight either. It is also not saved in the database.

Optimization

We've done a little bit of optimizing the d3 render loop -- node sizes are now calculated before the render and done more efficiently.

Data Model Refinement

This is mostly under the hood stuff, but we have now more formally separated the raw network data from the rendered d3 data. This should make it easier to do future updates.

Force Updates

You might notice graphs look very different. With the refinement of the data model, we updated the way d3 is rendering forces. Hopefully this is an improvement, but we will probably have to do some exploration to make sure all graphs look better.

Node Table ID Sorting

In debug mode, NodeTables show IDs. You can now sort by the ID.

benloh · 2022-03-25T19:05:33Z

@jdanish @kalanicraig I've completely rewritten the import UI and validation. Now as soon as you select a file, we do both header checking and id validation and report the results immediately. This has a number of implications:

If you are trying to import edges that reference novel node ids in a node import file, you will first have to select the node file to get it to load and validate, then the edges file. If you load the edges file first, the system will not find the novel node ids.
If you define new node ids using the "new" keyword, you will not be able to define edges that link them, since you have no ids to define the links. I suppose in a future version we can allow linking by source/target labels, but that will add another layer of complexity to an already very complex import system.
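Both implications follow from which node ids are known at the moment the edge file is validated. A rough sketch, with a hypothetical function name and row shape:

```javascript
// Hypothetical helper: collect the node ids an edge file may reference.
function collectKnownNodeIds(existingNodeIds, nodeFileRows = []) {
  const known = new Set(existingNodeIds);
  for (const row of nodeFileRows) {
    const id = String(row.id).trim();
    // Rows marked "new" have no id yet, so edges cannot point at them.
    if (id.toLowerCase() !== "new") known.add(id);
  }
  return known;
}

// Edge file selected first: only ids already in the graph are known.
const beforeNodeFile = collectKnownNodeIds(["1", "2"]);
// Node file selected first: its explicit ids are known too.
const afterNodeFile = collectKnownNodeIds(["1", "2"], [
  { id: "7", label: "Alexandria" },
  { id: "new", label: "Tacitus" },
]);
console.log(beforeNodeFile.has("7")); // false
console.log(afterNodeFile.has("7")); // true
console.log(afterNodeFile.has("new")); // false
```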

Please give it a whirl! Hopefully it's an improvement.

jdanish · 2022-03-25T19:33:03Z

Initial reaction is "sweet!!"

Will bang on it, but looks good to me so far!!

benloh · 2022-03-25T21:07:49Z

EdgeTable sorting was broken due to NCDATA changes. Sorting on all fields should now work again.

benloh · 2022-03-25T21:34:44Z

benloh · 2022-03-25T23:59:34Z

Edge filtering functionality is now restored. NCDATA changes resulted in filters operating on ids rather than labels.

benloh · 2022-03-29T15:40:15Z

QA

standalone mode works
nc-multiplex works
requiredForImportExport will be implemented next round -- implementing it properly gets very complex as we have to deal with pre-existing fields, field removal vs hiding, and arbitrary field import.

jdanish · 2022-03-29T15:42:03Z

Awesome. Just to be really clear, the comment about "requiredForImportExport" means that in the current version (until we get more funding), if you hide a field via template, it will not export or import. So, anything you want in the graph should be listed as visible for that process.

I think that's fine and mostly users treat them as identical for now, just want to make sure we know. Thanks!

benloh · 2022-03-29T15:53:12Z

Yeah. I had started to implement it last night, but as I sketched things out, I realized it was WAY too complicated if we want to properly handle all the cases. See netcreateorg#32

Think of "hidden" as a way to temporarily turn off fields that you might want to restore later. If you don't need a field at all, you can just not include it in the template.

benloh · 2022-04-11T12:52:33Z

Kalani wrote: I’ve now tested import/export, filtering, standalone and template changes on 4-5 different networks, including a Japanese-language network and a network with markup in the notes, and not had problems with any of them. I’d guess there are still some bugs floating around, but it’s probably time to merge into dev and also maybe to release nc-multiplex.

benloh added 30 commits March 2, 2022 09:18

import: m_forceProperties is a constant.

7c08f96

import: Clarify that 'data' is {nodes, edges}

3023b5c

import: Doc, add debug statements

24dc3ed

import: Add method to clear SVG objects (not currently used)

f6e11d3

import: Remove unused parameter.

43787ea

import: Fix bug where imported data would generate error because of missing `_nlog`.

64382b5

import: If edges are not defined, use default size for nodes. Otherwise, radius would not be defined.

9976451

import: When importing data, merge import data into existing data.

3f395dd

import: Only update simulation force if edges are defined.

103bf33

import: Add _HandleFilteredD3DataUpdate method so FILTEREDD3UPDATE AppState can be deregistered.

e9cf8fd

import: Add deregister method for d3-simpenetgraph. Add methods for UDATA handlers so they can be deregistered.

916b249

import: Add methods to contruct and destruct d3NetGraph. Needed when importing new data.

9f9d097

import: Gracefully handle no edge data, especially when importing.

c3bea5f

import: Add sort on "ID" button for NodeTable.

5ac7954

import: Comment out "heartbeat" log. Just uncomment if you REALLY want it shown.

34fed88

import: Add built-in meta data to schema.

0a5894e

import: Add methods for inserting or updating imported nodes and edges to the database.

d97fa5f

import: Add UI for selecting and managing import files.

46a842f

import: Load import data.

071c237

import: Add Import message handlers.

b5bf6ea

import: Skip edge filter updates if edges have not been defined.

eae66ae

import: Add CLI debug log commands.

586b406

import: Close "More" panel after importing data.

eca7a7c

import: Properly unmount EdgeEdtior, removing listeners.

ee1dc0b

import: EdgeEditor now formally uses NCDATA (raw data) rather than d3-processed edge objectgs where `source` and `target` have been transformed form `id` to node objects.

1583b37

import: Rename 'D3DATA' to 'NCDATA' to better reflect its raw data form and distinguish it from data directly used and modified by D3.

e5f37d9

import: Rename 'export-logic.js' to 'importexport-logic.js'

80d41df

import: Remove bad debug output.

b36f055

import: Export missing parameter fields as empty ""

a5f27c3

import: nc-logic needs to update its copy of NCDATA when NCDATA app state changes.

8fcc82b

benloh added 9 commits March 24, 2022 11:54

import-revamp: Move importexport-logic out of nc-logic into ImportExport to avoid extraneous UNISYS calls to nc-logic.

99d62a3

import: Fix missing isBeingEdited declaration. It got hidden behind a debug logger.

2ffd2b8

import: Fix missing isBeingEdited declaration. It got hidden behind a debug logger.

35d905b

import-revamp: Complete refactor of import. Streamline UI management.

4d9e6be

* dev-bl/import: import: Fix missing `isBeingEdited` declaration. It got hidden behind a debug logger.

import-revamp: Lint/Remove debug.

fdb713a

import-revamp: Improve messaging language.

ee531e7

import-revamp: Enable "Import" even if only one file is valid. Any invalid file is automatically removed so it won't be imported.

38fd81c

import-revamp: Clarify whether import file will be imported.

7efdea0

import-revamp: Clear messages and reset "Import" button after import.

e2710b3

import-revamp: Fix EdgeTable sorting. NCDATA changes broke references to edges.source.label and edge.target.label.

875de81

import-revamp: Fix edge filtering to look up source/target node labels when comparing filter values. (Filtering by source/target did not work because they were using 'id' not the 'label')

3b6767f

This was referenced Mar 26, 2022

EdgeTable entries use defaultTransparency even when filters are off #197

Closed

Handling Transparency #199

Closed

benloh added 2 commits March 26, 2022 10:31

import: Tweak edge transparency values so that the difference between highlight vs unhighlighted is more pronounced. Otherwise it was never clear which lines were considered unhighlighted, especially for thicker lines.

a6eae84

import: Doc

a098792

benloh marked this pull request as ready for review March 29, 2022 15:21

benloh added 2 commits March 29, 2022 14:21

import: Doc import, templates

379c005

import: Fix deprecated edge references to source.id and target.id. NCDATA now treat edge.source and edge.target as `numbers`.

93bc853

benloh mentioned this pull request Mar 31, 2022

Delete node causes error #235

Closed

benloh merged commit 5d2359d into dev Apr 11, 2022

benloh deleted the dev-bl/import branch May 28, 2023 00:11
