Feature: Import Data #217
…se, radius would not be defined.
…pState can be deregistered.
…DATA handlers so they can be deregistered.
…importing new data.
…s to the database.
…d3-processed edge objects where `source` and `target` have been transformed from `id` to node objects.
…rm and distinguish it from data directly used and modified by D3.
…ort to avoid extraneous UNISYS calls to nc-logic.
… a debug logger.
… a debug logger.
* dev-bl/import: import: Fix missing `isBeingEdited` declaration. It got hidden behind a debug logger.
…valid file is automatically removed so it won't be imported.
@jdanish @kalanicraig I've completely rewritten the import UI and validation. Now as soon as you select a file, we do both header checking and id validation and report the results immediately. This has a number of implications:
Please give it a whirl! Hopefully it's an improvement.
Initial reaction is "sweet!!" Will bang on it, but looks good to me so far!!
… to edges.source.label and edge.target.label.
EdgeTable sorting was broken due to NCDATA changes. Sorting on all fields should now work again.
To Do
…s when comparing filter values. (Filtering by source/target did not work because they were using 'id' not the 'label')
Edge filtering functionality is now restored. NCDATA changes resulted in filters operating on ids rather than labels.
… highlight vs unhighlighted is more pronounced. Otherwise it was never clear which lines were considered unhighlighted, especially for thicker lines.
QA
Awesome. Just to be really clear, the comment about "requiredForImportExport" means that in the current version (until we get more funding), if you hide a field via template, it will not export or import. So, anything you want in the graph should be listed as visible for that process. I think that's fine and mostly users treat them as identical for now, just want to make sure we know. Thanks!
Yeah. I had started to implement it last night, but as I sketched things out, I realized it was WAY too complicated if we want to properly handle all the cases. See netcreateorg#32. Think of "hidden" as a way to temporarily turn off fields that you might want to restore later. If you don't need a field at all, you can just leave it out of the template.
…DATA now treat edge.source and edge.target as `numbers`.
Kalani wrote: I’ve now tested import/export, filtering, standalone and template changes on 4-5 different networks, including a Japanese-language network and a network with markup in the notes, and not had problems with any of them. I’d guess there are still some bugs floating around, but it’s probably time to merge into dev and also maybe to release nc-multiplex.
Branch: `dev-bl/import`
Overview
This adds an Import Data feature to Net.Create.
This is a pretty significant feature that ended up touching ALL aspects of the app. A very thorough QA cycle will be needed.
To Test
To test, first export nodes and edges from an existing graph. (NOTE: please create new exports. Do not use exported csv files created previously from the `dev-bl/export` branch; there were errors in the previous export routine.)
git fetch && git checkout dev-bl/import
./nc.js --dataset=yourdataset
Next, try modifying the labels of the exported nodes, and reimport them.
Next, create a new graph and try importing the nodes and edges.
"new"
`ctrl-c` to quit Net.Create.
./nc.js --dataset=empty
Other things to test:
`?admin=true`. You should be able to import data.
`allowLoggedInUserToImport` to true. Open Net.Create remotely without admin privileges. Import is disabled. Log in. Import is now enabled.
Test error checking:
Import Feature
Net.Create can import nodes and edges via separate `.csv` files.
The easiest way to set up a CSV file for import is to first export a few nodes/edges from your existing project. The key is to set up the Template first with the appropriate headers. Then you can export and modify the csv files, then reimport them.
For both nodes and edges, you should be able to:
Replacing Existing Nodes and Edges
During an import, Net.Create uses node and edge ids to match imported data to existing data.
If the id does not match an existing node or edge id, the app will output an error message listing the problem id and the row of the id. "row" refers to the line number in the csv file. Line 1 is the header. Line 2 would be the first data row.
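The id check and row numbering described above can be sketched roughly as follows. This is a minimal illustration, not the importer's actual code: the function name `findUnknownIds`, the parsed-row shape, and the use of a `Set` of numeric ids are all assumptions. Rows whose id is "new" are skipped because they are additions (see Adding New Nodes/Edges).

```javascript
// Hypothetical sketch: names and data shapes are assumptions, not Net.Create's code.
// `rows` are parsed CSV data rows (header already removed);
// `existingIds` is a Set of numeric ids currently in the graph.
function findUnknownIds(rows, existingIds) {
  const errors = [];
  rows.forEach((row, index) => {
    const id = String(row.id).trim();
    if (id.toLowerCase() === "new") return; // additions are not matched against existing ids
    if (!existingIds.has(Number(id))) {
      // Line 1 of the csv file is the header, so the first data row is line 2.
      errors.push({ id, row: index + 2 });
    }
  });
  return errors;
}

const existing = new Set([1, 2, 3]);
const parsed = [
  { id: "1", label: "Claudius" },
  { id: "42", label: "Tacitus" },
];
console.log(findUnknownIds(parsed, existing));
```

Here the second data row is reported as row 3, because the header occupies line 1 of the file.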
Adding New Nodes/Edges
To add a new node or edge, you need to use an id of "new". For example, to add "Tacitus" and "Granicus", you would define an import csv file with two rows where the ID is set to "new".
import_node.csv
You can mix "new" and replacements, e.g. this will replace existing node 1 with "Claudius", and add "Tacitus" and "Granicus".
The "new" keyword is case-insensitive. e.g. "NEW", "New", "new", and "nEw" all work.
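As a rough illustration of mixing "new" rows with replacements, the sketch below splits parsed rows into additions and replacements using a case-insensitive check. The helper names `isNewId` and `splitImportRows` and the row shape are hypothetical; how Net.Create assigns ids to added nodes is not shown here.

```javascript
// Hypothetical sketch: helper names and row shape are assumptions for illustration.
function isNewId(id) {
  // "NEW", "New", "new", and "nEw" are all treated the same.
  return String(id).trim().toLowerCase() === "new";
}

function splitImportRows(rows) {
  const toAdd = [];
  const toReplace = [];
  for (const row of rows) {
    (isNewId(row.id) ? toAdd : toReplace).push(row);
  }
  return { toAdd, toReplace };
}

const sampleRows = [
  { id: "1", label: "Claudius" },   // replaces existing node 1
  { id: "new", label: "Tacitus" },  // added
  { id: "NEW", label: "Granicus" }, // added
];
const split = splitImportRows(sampleRows);
console.log(split.toAdd.map((r) => r.label).join(", "));     // Tacitus, Granicus
console.log(split.toReplace.map((r) => r.label).join(", ")); // Claudius
```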
Defining Import Fields
The fields required by the graph for import are defined by the template. Currently any fields that are defined in the template and NOT HIDDEN will be required when you import. E.g. if you define 6 fields in the template, then the import file MUST have 6 fields with headers that match the `exportLabel` fields defined in the template, or the importer will complain about missing fields.
Any fields that are marked in the template as hidden will not be exported, nor will they be required for import.
While all non-hidden headers are required in the csv file, you can skip fields in the node/edge data rows. For example, when importing a node, you can just specify "id" and "label" and skip the other fields, e.g.:
It's possible we might want to relax this and just require only the main fields (e.g. id and label with nodes, source target and id with edges). This will give you more flexibility when importing. On the other hand, you can also leave the fields empty, so long as you have the right headers in the csv.
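The header requirement can be sketched as a simple comparison between the csv headers and the `exportLabel` values of the non-hidden template fields. This is only an illustration: the field-definition shape (`{ exportLabel, hidden }`), the sample labels, and the function names are assumptions, not the real template schema.

```javascript
// Hypothetical sketch: the template shape and labels below are assumed for illustration.
function requiredHeaders(fieldDefs) {
  // Hidden fields are neither exported nor required for import.
  return fieldDefs.filter((def) => !def.hidden).map((def) => def.exportLabel);
}

function missingHeaders(csvHeaders, fieldDefs) {
  const present = new Set(csvHeaders); // exact match, so "Type" does not satisfy "TYPE"
  return requiredHeaders(fieldDefs).filter((label) => !present.has(label));
}

const nodeDefs = [
  { exportLabel: "ID", hidden: false },
  { exportLabel: "LABEL", hidden: false },
  { exportLabel: "TYPE", hidden: false },
  { exportLabel: "NOTES", hidden: true },
];
console.log(missingHeaders(["ID", "LABEL"], nodeDefs).join(", ")); // TYPE
```

An empty result means every required header is present; individual data rows may still leave those columns blank.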
The fields required in the import/export headers are defined via the `exportLabel` property in templates. The `exportLabel` will map the headers to built-in Net.Create fields.
You can rename `exportLabel` fields to match the fields required/used by your external graph application. For example, if your graph application expects `type` to be labeled as `OPTIONS`, you can set the node `type` exportLabel to `OPTIONS`. The labels are case sensitive.
Encoding / Special Characters
Since we're using `.csv` as the export/import data format, there are a few considerations for encoding:
There are probably other exceptions that we'll need to add validation for, especially control characters.
Error Checking
The app provides two levels of validation during import. When you select a file for import, the system will:
Errors Caught:
id
Errors Not Caught:
The `row` number that an error is reported on can be thrown off if there are carriage returns in quoted text.
Troubleshooting
Oftentimes, bad encoding errors (e.g. mismatched double quotes, extraneous commas, extraneous carriage returns) will result in a Header error. If you see an error about mismatched headers and you know your headers are...
...then you might try the following to troubleshoot:
Import Report
After importing data, the app will display a list of nodes/edges that have been replaced as well as a count of all the nodes/edges that were added or replaced.
Import Permissions
Roles: Admins vs Non-admins
Admins are always allowed to import data.
Non-admin users are normally not allowed to import data. This pull request adds a new option to the Templates to allow logged in users to import data. If the template setting `allowLoggedInUserToImport` is set to `true`, then logged in users will be able to import data. By default `allowLoggedInUserToImport` is `false`.
Edit States
Since importing modifies the database, during an import, editing the Template and editing individual Nodes and Edges is locked out. Conversely, if someone is editing a Node, Edge, or Template, Importing is locked out. This prevents accidental overwriting of data.
If you navigate away from the Import panel, the import is cancelled. This might be a little surprising and awkward so we might want to revisit this. But there wasn't another clear way to "Cancel" the import lockout.
Standalone Mode
Importing is also disabled in standalone mode, since you are not allowed to modify the database.
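Taken together, the permission rules above might be expressed as a single check. This is a hedged sketch only: the function name `canImport` and the `isStandalone`, `isAdmin`, and `isLoggedIn` parameters are hypothetical; only `allowLoggedInUserToImport` comes from the template setting described above. The edit-state lockout is a separate condition and is not modeled here.

```javascript
// Hypothetical sketch: parameter names are illustrative, except `allowLoggedInUserToImport`.
function canImport({ isStandalone, isAdmin, isLoggedIn, template }) {
  if (isStandalone) return false; // standalone mode never modifies the database
  if (isAdmin) return true;       // admins are always allowed to import
  // Non-admins must be logged in AND the template must opt in (defaults to false).
  return Boolean(isLoggedIn && template.allowLoggedInUserToImport);
}

const optIn = { allowLoggedInUserToImport: true };
console.log(canImport({ isStandalone: false, isAdmin: false, isLoggedIn: false, template: optIn })); // false
console.log(canImport({ isStandalone: false, isAdmin: false, isLoggedIn: true, template: optIn }));  // true
```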
Import Backups
Every time you click the "Import" button to import data (nodes or edges), the Net.Create server will make a backup of the current database into the `runtime/backups` folder before executing the import. The backup file will be named the same as the open database file with a timestamp appended.
If you are running this on a server or using `nc-multiplex`, you'll want to periodically monitor the `runtime/backups` folder and clear out old backup files so that it does not grow too large.
If you need to restore a backup, you can copy it to `runtime/` and open it directly via a `./nc.js --dataset=xxx` call, or rename the database file back to the original name, copy it over the original in `runtime/`, and open that via `./nc.js`.
Admin Tools
"Force Unlock All"
Template editing, importing data, and node and edge editing are all mutually exclusive actions: If someone on the network is doing one of those activities, others are prevented from doing the same. (The one exception is that if you are editing a node or edge, others are only prevented from editing the same node or edge and editing the template or importing, but they can still edit other nodes and edges.) When the edit/import is complete the lock on editing should be released.
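The locking rules in the paragraph above can be sketched as a small predicate. This is an assumption-laden illustration, not Net.Create's lock implementation: the `state` shape, the action names, and the `"node:1"`-style keys are all hypothetical.

```javascript
// Hypothetical sketch of the mutual-exclusion rules; not the app's actual lock code.
// `state.lockedIds` is a Set of keys (e.g. "node:1", "edge:7") currently being edited.
function isAllowed(action, state) {
  const { templateBeingEdited, importActive, lockedIds } = state;
  // Template editing and importing lock out every other editing action.
  if (templateBeingEdited || importActive) return false;
  switch (action.type) {
    case "editTemplate":
    case "import":
      return lockedIds.size === 0;      // blocked while any node or edge is being edited
    case "editItem":
      return !lockedIds.has(action.id); // other nodes and edges remain editable
    default:
      return false;
  }
}

const lockState = { templateBeingEdited: false, importActive: false, lockedIds: new Set(["node:1"]) };
console.log(isAllowed({ type: "editItem", id: "node:2" }, lockState)); // true
console.log(isAllowed({ type: "editItem", id: "node:1" }, lockState)); // false
console.log(isAllowed({ type: "import" }, lockState));                 // false
```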
Every once in a while, the release message is lost and the edit lock remains in place. If this happens, an administrator can go to the More > Import / Export panel and click the "Force Unlock All" button to release the edit lock and re-enable template editing, importing, and node and edge editing.
WARNING: Use this with utmost caution! If someone is actively editing or importing, you can delete their work, or even worse, corrupt the database!
Other Changes
`revision` update bug
There was a bug where the `revision` field was either not updated at all, or was being updated only once per session (after a reload), instead of being updated with every database update. It now properly updates every time you edit and save a node or edge.
TOML template file update
The default toml template and schema have been updated. You'll want to review or update your existing template files.
"weight"-ready
While the UI (EdgeEditor) does not currently support it, the app logic now calculates edge line sizes by summing up edge `weight` values. e.g. if two nodes are connected by three edges with a weight of 1, 2, and 4, the size of the edge will be 7.
`weight` defaults to 1. It can support values smaller than 1.
`weight` is not currently settable via the EdgeEditor UI. EdgeTable does not yet display `weight` either. It is also not saved in the database.
Optimization
We've done a little bit of optimizing the d3 render loop -- node sizes are now calculated before the render and done more efficiently.
Data Model Refinement
This is mostly under the hood stuff, but we have now more formally separated the raw network data from the rendered d3 data. This should make it easier to do future updates.
Force Updates
You might notice graphs look very different. With the refinement of the data model, we updated the way d3 is rendering forces. Hopefully this is an improvement, but we will probably have to do some exploration to make sure all graphs look better.
Node Table ID Sorting
In debug mode, NodeTables show IDs. You can now sort by the ID.