Hi there,
I imported the musicbrainz database to Neo4j using the following approach, helped by @jexp:
Define 2 indexes (one mbid exact, for MBIDs, and one mb fulltext, for everything else) in batch.properties:
batch_import.keep_db=false
batch_import.mapdb_cache.disable=true
batch_import.node_index.mb=fulltext
batch_import.node_index.mbid=exact
batch_import.csv.quotes=false
cache_type=none
use_memory_mapped_buffers=true
neostore.nodestore.db.mapped_memory=300M
neostore.relationshipstore.db.mapped_memory=3G
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=0M
neostore.propertystore.db.index.keys.mapped_memory=15M
neostore.propertystore.db.index.mapped_memory=15M
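As a quick sanity check on the settings above, here is a small sketch that adds up the mapped_memory values. The helper name total_mapped_mb is made up for illustration, and I am assuming G means 1024M; it only sums what the file asks for and says nothing about how Neo4j actually allocates it.

```python
import re

# Assumption: binary units, so 1G = 1024M. Only M and G appear in the settings above.
UNIT_IN_MB = {"M": 1, "G": 1024}

def total_mapped_mb(properties_text):
    """Sum every *.mapped_memory value in a properties text, in megabytes."""
    total = 0
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or not key.strip().endswith(".mapped_memory"):
            continue  # skip settings that are not mapped_memory sizes
        match = re.fullmatch(r"(\d+)([MG])", value.strip())
        if match:
            total += int(match.group(1)) * UNIT_IN_MB[match.group(2)]
    return total

settings = """\
cache_type=none
neostore.nodestore.db.mapped_memory=300M
neostore.relationshipstore.db.mapped_memory=3G
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=0M
neostore.propertystore.db.index.keys.mapped_memory=15M
neostore.propertystore.db.index.mapped_memory=15M
"""

print(total_mapped_mb(settings))  # 4402
```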
Then put the indexing instructions directly in the header rows of the nodes.csv and rels.csv files, so we don't need the ...index.csv files anymore; see https://github.com/jexp/batch-import -> automatic indexing
kind:string:mb comment status position name:string:mb area gender format barcode number ended length end_date_year begin_date_year mbid:string:mbid type:string:mb pk
artist Talkshow Boy f e8d94cf5-fafa-48fc-a6fa-aa50cf54d7f3 288762
artist Vibulator f 735bfaad-6eb1-4f9c-b21d-cbaef7c79a92 97944
artist Eat Me f c38a93e8-2ecf-4848-b1d2-364202d9dc0c Group 499198
artist Uffe Andersen f a7f3c871-3ba3-40b1-ba58-d08b40312789 Person 514886
artist Headust f eda60727-7036-437b-b53d-ae472818ee3a 212148
artist Sons Of The Subway f 232d5716-c2b2-47e1-aa0c-264ec69e6a18 100774
artist The Poe Boy Family f 672d599e-6a6c-456e-98ba-dac5a45e3ed8 43132
artist Ralph Gusovius Germany Male f 1950 6ecfcea1-677d-427b-a38b-9c76ce92e313 Person 295052
artist Elastik Band f 46e0639c-1ccf-45f5-b886-4cbf5549a2a1 61467
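For what it's worth, here is a rough sketch of how an exporter could write a file like this. It assumes tab-separated output with no quoting (matching batch_import.csv.quotes=false) and uses only a subset of the columns above; COLUMNS, header_field, format_row and write_nodes are hypothetical names, not part of batch-import or of any existing exporter.

```python
import sys

# (name, type, index) per column; None means "no type / not indexed".
# Subset of the header shown above, chosen for illustration only.
COLUMNS = [
    ("kind", "string", "mb"),
    ("name", "string", "mb"),
    ("gender", None, None),
    ("mbid", "string", "mbid"),
    ("type", "string", "mb"),
    ("pk", None, None),
]

def header_field(name, col_type, index):
    """Build one header cell: name, name:type or name:type:index."""
    if index:
        return f"{name}:{col_type or 'string'}:{index}"
    if col_type:
        return f"{name}:{col_type}"
    return name

def clean(value):
    """Empty string for missing values; strip characters that would break a row."""
    if value is None:
        return ""
    return str(value).replace("\t", " ").replace("\n", " ")

def format_header():
    return "\t".join(header_field(*col) for col in COLUMNS)

def format_row(entity):
    # Every column gets a cell, so empty fields keep their position.
    return "\t".join(clean(entity.get(name)) for name, _, _ in COLUMNS)

def write_nodes(out, entities):
    out.write(format_header() + "\n")
    for entity in entities:
        out.write(format_row(entity) + "\n")

write_nodes(sys.stdout, [
    {"kind": "artist", "name": "Eat Me", "type": "Group",
     "mbid": "c38a93e8-2ecf-4848-b1d2-364202d9dc0c", "pk": 499198},
    {"kind": "artist", "name": "Vibulator",
     "mbid": "735bfaad-6eb1-4f9c-b21d-cbaef7c79a92", "pk": 97944},
])
```

The point of the sketch is that the index name lives in the header cell, and that missing values are still written as empty cells so the columns stay aligned.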
And then import the two files with something like
java -Xmx10G -server -Dfile.encoding=UTF-8 -jar ~/neo/batch-import/target/batch-import-jar-with-dependencies.jar ./graph.db nodes.csv rels.csv
WDYT? It would make the output a lot easier, and the import took about 10 min on my machine (160M properties, 75M relationships, ...).