Workflows
In order to do flowchart analysis on verbs, we need to correct some coding errors.
We also need to enrich the constituents surrounding the verb occurrences with higher-level features that can be used as input for the flowchart decisions.
The notebook enrich defines a correction workflow and an enrichment workflow. Both workflows generate spreadsheets of data, which are copied and filled out manually. The filled-out copies are then read back in and converted to additional features.
The correction workflow comes first. The enrichment workflow makes use of the corrected features. The flowchart algorithm makes use of the enriched features.
In the correction and enrichment workflows we perform the following tasks:
- correction workflow
  - generate correction sheets for selected verbs
  - process the set of filled-in correction sheets
- enrichment workflow
  - generate sheets with new, computed features (valence-related, based on the corrected values) to be edited manually
  - transform the set of filled-in enrichment sheets into an annotation package
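The round trip in the list above (generate a sheet, fill it in by hand, read it back) can be sketched as follows. This is a minimal illustration, not the code of the enrich notebook: the column names, node numbers, verb lexemes, and function labels are all hypothetical, and CSV stands in for whatever sheet format the notebook actually uses.

```python
import csv
import io

# Hypothetical records: (phrase node id, verb lexeme, current function label).
phrases = [
    (1001, "NTN", "Cmpl"),
    (1002, "NTN", "Objc"),
    (1003, "FJM", "Adju"),
]


def generate_sheet(rows, selected_verbs):
    """Write a blank correction sheet with an empty 'corrected' column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["node", "verb", "function", "corrected"])
    for node, verb, function in rows:
        if verb in selected_verbs:
            writer.writerow([node, verb, function, ""])
    return out.getvalue()


def read_sheet(text):
    """Read a filled-in sheet, keeping only rows where a correction was entered."""
    corrections = {}
    for row in csv.DictReader(io.StringIO(text)):
        value = row["corrected"].strip()
        if value:
            corrections[int(row["node"])] = value
    return corrections


blank = generate_sheet(phrases, {"NTN"})
# Simulate the manual step: someone enters a new label for node 1001.
filled = blank.replace("1001,NTN,Cmpl,", "1001,NTN,Cmpl,Loca")
corrections = read_sheet(filled)
```

Rows left blank in the sheet produce no entry, so the resulting mapping contains only the nodes whose label was actually changed by hand.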
In the correction workflow, Janet enters corrections in the sheets. In the enrichment workflow, Janet inspects the automatically generated enrichments, suggests improvements to the enrichment algorithm, and enters ad hoc improvements.
The results of the workflows are stored as a new data module in text-fabric format, which can be used alongside the big dataset of the Hebrew Bible, also in text-fabric format. The new data module offers the corrections and enrichments as new data features.
We restrict ourselves to verb occurrences where the verb is the nucleus of a phrase with function predicate. Verbs also occur in other kinds of phrases, and these occurrences can have complements too. These cases are coded very differently in the database. See, for example, Joshua 3:8 (and you command the priest carrying the ark ...).
Another limitation is that we restrict ourselves to verb occurrences in the root formation Qal.
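The two restrictions amount to a simple filter on verb occurrences. The sketch below assumes each occurrence is available as a small record; the field names and the label values "Pred" and "qal" are assumptions made for illustration and may not match the feature values in the actual dataset.

```python
from dataclasses import dataclass


@dataclass
class VerbOccurrence:
    node: int             # hypothetical node id of the verb occurrence
    lexeme: str           # verb lexeme (illustrative values)
    stem: str             # root formation, assumed to be coded as e.g. "qal"
    phrase_function: str  # function of the phrase the verb is the nucleus of


def in_scope(occ):
    """Keep only Qal verbs that head a phrase with function predicate."""
    return occ.phrase_function == "Pred" and occ.stem == "qal"


occurrences = [
    VerbOccurrence(1, "NTN", "qal", "Pred"),
    VerbOccurrence(2, "NTN", "nif", "Pred"),  # excluded: not Qal
    VerbOccurrence(3, "NFJ", "qal", "Subj"),  # excluded: not a predicate phrase
]
selected = [occ.node for occ in occurrences if in_scope(occ)]
```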
Because the flowchart assigns meanings to verbs depending on the number and nature of the complements found in their context, it is important that the phrases in those clauses are labelled correctly, i.e. that the function feature for those phrases has the correct value.
However, correcting all constituents surrounding all verbs is a daunting task, so we have singled out some verbs for this stage. This set of verbs is open: the workflow is implemented in such a way that new verbs can be added easily.
We need higher-level features that characterize the constituents (clauses and phrases) that are associated with verb occurrences. We compute them in several stages:
- we define a list of default values for higher-level features, based on the existing features;
- we apply generic enrichment logic, by taking into account computed characteristics of the constituents we find in the context;
- we modify the generic logic by verb-dependent rules;
- the values computed at this stage are used for "blank" enrichment sheets;
- we draw in hand-written enrichments from filled-out enrichment sheets.
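The stages above form a chain in which each later stage can override the result of the earlier ones. A minimal sketch of that layering follows. The feature name role, its values, the is_locative and preposition keys, and the rule for NTN are all hypothetical, chosen only to show the order of precedence; the real enrichment logic lives in the notebook.

```python
def default_values(phrase):
    """Stage 1: defaults derived from existing features (illustrative mapping)."""
    mapping = {"Objc": "direct_object", "Cmpl": "complement"}
    return {"role": mapping.get(phrase["function"], "other")}


def generic_logic(phrase, values):
    """Stage 2: refine using computed characteristics of the constituent."""
    if values["role"] == "complement" and phrase.get("is_locative"):
        return {**values, "role": "location"}
    return values


def ntn_rule(phrase, values):
    """Stage 3 example: a verb-dependent rule (hypothetical)."""
    if values["role"] == "complement" and phrase.get("preposition") == "L":
        return {**values, "role": "indirect_object"}
    return values


VERB_RULES = {"NTN": ntn_rule}


def enrich(phrase, verb, manual):
    """Apply the stages in order; hand-written values win over computed ones."""
    values = default_values(phrase)
    values = generic_logic(phrase, values)
    rule = VERB_RULES.get(verb)
    if rule is not None:
        values = rule(phrase, values)
    # At this point the values are what a "blank" enrichment sheet would show.
    # Stage 5: overlay whatever was entered by hand for this phrase, if anything.
    return {**values, **manual.get(phrase["node"], {})}


phrase = {"node": 7, "function": "Cmpl", "preposition": "L"}
computed = enrich(phrase, "NTN", manual={})
overridden = enrich(phrase, "NTN", manual={7: {"role": "location"}})
```

Here computed carries the value produced by the verb-dependent rule, while overridden shows the manual entry taking precedence over it.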