This is an internal repository containing the workdirs with project-specific scripts that are called from Kubeflow pipelines.
Three repositories are required to test-run dataset preparation locally: wikidata-workdir, qald, and genie-toolkit.
Install each one separately, using the master branch of wikidata-workdir, the wip/entity-recovery-mode branch of qald, and the wip/qald branch of genie-toolkit. Then, inside the wikidata-workdir directory, create a configuration file named config.mk with the following lines:
geniedir=<PATH_TO_YOUR_GENIE_INSTALLATION>
qalddir=<PATH_TO_YOUR_QALD_INSTALLATION>

To generate a sample dataset, run the following command:
make datadir

This will generate a small sample dataset with oracle NED. If the ReFinED entity linker is desired, add the following options to the command:
entity_recovery_mode=true
refined_model=models/refined
ned=refined
synthetic_ned=refined
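The setup and generation steps above can be collected into a small script. This is a minimal sketch, not part of the repository: the installation paths are hypothetical placeholders, and the helper name `datadir_cmd` is made up for illustration. It writes config.mk and prints the `make datadir` command, optionally with the ReFinED options, instead of running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical install locations; override via environment variables
# or edit them to point at your own checkouts.
GENIE_DIR="${GENIE_DIR:-/path/to/genie-toolkit}"
QALD_DIR="${QALD_DIR:-/path/to/qald}"

# Write config.mk in the current directory (run this inside wikidata-workdir).
cat > config.mk <<EOF
geniedir=${GENIE_DIR}
qalddir=${QALD_DIR}
EOF

# Print the dataset-generation command (hypothetical helper).
# $1: "oracle" (default, oracle NED) or "refined" (ReFinED entity linker).
datadir_cmd() {
  local mode="${1:-oracle}"
  local -a cmd=(make datadir)
  if [ "$mode" = "refined" ]; then
    cmd+=(entity_recovery_mode=true refined_model=models/refined
          ned=refined synthetic_ned=refined)
  fi
  echo "${cmd[*]}"
}

datadir_cmd refined
```

Printing rather than executing lets you check the assembled options first; to run the command directly, replace `echo "${cmd[*]}"` with `"${cmd[@]}"`.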
To evaluate an existing model:
- Install genienlp.
- Download the model using the following command, where <path> is the folder containing the model under the Azure bucket pvc-a8853620-9ac7-4885-a30e-0ec357f17bb6. The model will be downloaded under models/<model_name>.
./sync-models.sh <path> <model_name>
- Run the following command to evaluate, where <eval_set> is eval for the dev set and test for the test set.
make \
refined_model=models/refined \
entity_recovery_mode=true \
ned=refined \
metric=answer \
eval_set=<eval_set> \
<eval_set>/<model_name>.results

Note that generating the manifest.tt file takes a very long time. Once it has been generated and no update is needed, add the option update_manifest=false to all make commands above to save time.
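The evaluation command above, together with the update_manifest=false option, can be assembled by a small helper. This is a hedged sketch: `eval_cmd` is a hypothetical function that only builds the make command from the options listed above, rejects evaluation sets other than `eval` and `test`, and appends `update_manifest=false` when asked to reuse an existing manifest.tt. The model name `my-model` is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the evaluation command for a model downloaded to models/<model_name>.
# $1: eval set ("eval" = dev set, "test" = test set)
# $2: model name
# $3: "skip-manifest" to reuse an existing manifest.tt (optional)
eval_cmd() {
  local eval_set="$1" model_name="$2" manifest="${3:-}"
  case "$eval_set" in
    eval|test) ;;
    *) echo "eval_set must be 'eval' or 'test', got '$eval_set'" >&2; return 1 ;;
  esac
  local -a cmd=(make
    refined_model=models/refined
    entity_recovery_mode=true
    ned=refined
    metric=answer
    "eval_set=${eval_set}")
  if [ "$manifest" = "skip-manifest" ]; then
    cmd+=(update_manifest=false)
  fi
  # The make target comes last: <eval_set>/<model_name>.results
  cmd+=("${eval_set}/${model_name}.results")
  echo "${cmd[*]}"
}

# Example with a placeholder model name.
eval_cmd eval my-model skip-manifest
```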
If a command fails partway through, or if the dataset has been updated, run make safe-clean to clean up the folder before rerunning the command.
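When these steps are scripted, the failure case can be handled automatically. A minimal sketch, assuming a single retry is enough: the hypothetical helper `run_with_clean_retry` runs a command and, if it fails, runs `make safe-clean` and tries the command once more. It cannot detect dataset updates, so run `make safe-clean` yourself in that case.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a command; on failure, clean up with `make safe-clean` and retry once.
# Returns the exit status of the last attempt.
run_with_clean_retry() {
  if "$@"; then
    return 0
  fi
  echo "command failed; running 'make safe-clean' and retrying once" >&2
  make safe-clean
  "$@"
}

# Example (run inside wikidata-workdir):
# run_with_clean_retry make datadir
```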