Discussion
The meaning of verbs depends critically on the number and nature of the complements found in their neighbourhood. It is possible to write a flowchart that lists those meanings as a function of the patterns of complements.
Having a linguistic text database of the Hebrew Bible at our disposal, it is possible to implement those flowcharts as an algorithm and apply that algorithm to all verb occurrences in the Hebrew Bible.
While the idea expressed above is simple, the execution of it meets a number of challenges.
The ETCBC data has been encoded through an ongoing effort spanning decades, during which principles of encoding and linguistic theories were subject to change. This has led to several inconsistencies in the details of the encoding, in particular in the assignment of functions to phrases.
We have met this challenge by implementing a data correction workflow, where faulty phrase function assignments have been replaced by better ones.
Moreover, the stance of the ETCBC encoders is objectivistic: a piece of markup is applied only when there is objective, measured evidence for it. The traits on which the flowcharts base their decisions, on the other hand, are often at a higher level of interpretation. For example, the notion of indirect object is not present in the ETCBC encoding.
We have met this challenge by enriching the encoding with a set of higher-level features that we compute from the ensemble of lower-level features. We also have a workflow in place to manually adjust the outcome of this process.
However, the present results do not rely on manual enrichment, because the enrichment algorithm is still a work in progress. For now, we bet on improving the algorithm rather than supplying manual enhancements. At this stage, manual enhancements are counterproductive: re-checking them after every update to the algorithm adds an extra layer of complexity, especially while the vocabulary of the enrichments is still in development as well.
When talking about the meanings of Hebrew verbs in English, the verbal valence patterns in the English language interfere with those in the Hebrew language. This hampers a clear organization of meanings of Hebrew verbs in their own terms, and may camouflage complexities in Hebrew meanings, or complicate simple meaning structures.
We deal with this challenge by decoupling the valence patterns from the meanings. So, for each verb context, we annotate its valence pattern, categorize the relevant bits of context, and leave it there. For selected verbs, we have an explicit map from valence patterns to meanings in English, a.k.a. a flowchart. For those verbs, we will insert a link to the flowchart of that verb.
We need to discriminate between various types of direct objects, especially when there are multiple direct objects in a clause.
If there are multiple direct objects in a clause, we will compute which is the principal one. The others are deemed secondary ones. If there is only one direct object, we do not mark it as principal.
An object can be a phrase or a clause.
We will treat clauses marked as `Objc` by feature `rela` as direct objects.
Additionally, we identify clauses marked as `InfC` by feature `typ` as direct objects if they are preceded by the preposition `L` and if there is a direct object phrase elsewhere in the clause.
By the rules stated later on, these object clauses are not marked as principal direct objects, because principal direct objects are sought at the phrase level only.
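The two conditions for treating a clause as a direct object can be sketched as a small predicate. This is only an illustration: the `Clause` record and its fields `preceded_by_l` and the argument `has_object_phrase_elsewhere` are hypothetical names, not features of the ETCBC data, and the value `"other"` is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    rela: str            # value of feature `rela`, e.g. "Objc"
    typ: str             # value of feature `typ`, e.g. "InfC"
    preceded_by_l: bool  # hypothetical flag: the clause is preceded by preposition L


def is_object_clause(clause: Clause, has_object_phrase_elsewhere: bool) -> bool:
    """Return True if the clause counts as a direct object under the two rules above."""
    # Rule A: clauses marked as Objc by feature rela are direct objects.
    if clause.rela == "Objc":
        return True
    # Rule B: infinitive clauses preceded by L count as direct objects,
    # but only if a direct object phrase occurs elsewhere in the clause.
    return (
        clause.typ == "InfC"
        and clause.preceded_by_l
        and has_object_phrase_elsewhere
    )
```

Keeping the "object phrase elsewhere" condition as a separate argument reflects that it depends on the surrounding clause rather than on the object clause itself.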
There are many cases where there is a direct object without it being marked as such in the data. Those are cases where there are no objective, unambiguous signals for a direct object. We call them implied objects. Examples:
- the relative pronoun in relative clauses
- complements starting with MN (from) or L (to)
- assumed objects: objects that figure in the context, but do not have a concrete presence in the clause under consideration
In the case of implied objects we have to guess. Initially we assume that there are no implied objects.
Later, when we inspect individual cases, we can mark principal objects and implied objects manually for those cases where these rules do not suffice.
When there are multiple direct objects, we use the rules formulated by (Janet Dyk, Reinoud Oosting and Oliver Glanz, 2014) to determine which one is the principal one. The rules are stated below, together with some remarks on how we apply them to our data.
When looking for principal direct objects, we restrict ourselves to direct objects at the phrase level, which are either complete phrases or pronominal suffixes within phrases. The following rules express a preference for the principal direct object. In a given context, the direct object preferred by those rules is selected as the principal direct object. We only apply these rules if there are at least two direct objects. If there is only one direct object, it is not marked as principal.
In a given clause, we collect all phrases with function `PreO` or `PtcO`.
If this collection is non-empty, we pick the one that is textually first (by rule 3 below) and stop applying rules.
Otherwise, we proceed as follows.
We collect all the phrases with function `Objc`.
If this collection is empty, there will not be a principal object.
Otherwise, we split it into marked and unmarked object phrases.
An object phrase is marked if and only if it contains, somewhere, the object marker `>T`.
If there are marked object phrases, we pick the one that is textually first (by rule 3 below) and stop applying rules.
Otherwise we proceed with the next rule.
We only arrive here if there are multiple `Objc` phrases, none of which is marked.
In this case, we take the textually first one (by rule 3) which has the value `det` for its feature `det`, if there is one, and stop applying rules.
Otherwise we proceed with the next rule.
This rule (rule 3) is implicitly applied whenever one of the preceding rules yields more than one candidate for the principal object.
Furthermore, we arrive here if the previous rules have not selected any principal direct object, while we do have more than one `Objc` phrase.
In this case, we pick the textually first `Objc` phrase.
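The preference rules above can be sketched as a single function. This is a minimal illustration, not the project's implementation: the `Phrase` record, its `position` and `has_object_marker` fields, and the default value `"und"` for `det` are assumptions made for the example. As a simplification, the "at least two direct objects" check counts phrase-level objects only and leaves clause-level objects out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phrase:
    position: int                    # hypothetical: textual order within the clause
    function: str                    # phrase function, e.g. "PreO", "PtcO", "Objc"
    has_object_marker: bool = False  # hypothetical: phrase contains the marker >T
    det: str = "und"                 # value of feature det; "und" is a placeholder


def textually_first(phrases: List[Phrase]) -> Phrase:
    """Rule 3: prefer the phrase that comes first in the text."""
    return min(phrases, key=lambda p: p.position)


def principal_object(phrases: List[Phrase]) -> Optional[Phrase]:
    """Pick the principal direct object among the phrases of one clause."""
    suffixed = [p for p in phrases if p.function in ("PreO", "PtcO")]
    objc = [p for p in phrases if p.function == "Objc"]
    # The rules only apply when there are at least two direct objects
    # (simplification: only phrase-level objects are counted here).
    if len(suffixed) + len(objc) < 2:
        return None
    # Pronominal suffixes on the predicate win outright.
    if suffixed:
        return textually_first(suffixed)
    # Next preference: object phrases marked with >T.
    marked = [p for p in objc if p.has_object_marker]
    if marked:
        return textually_first(marked)
    # Next preference: determined phrases (det == "det").
    determined = [p for p in objc if p.det == "det"]
    if determined:
        return textually_first(determined)
    # Fallback: the textually first Objc phrase.
    return textually_first(objc)
```

Each preference level falls through to the next only when it yields no candidate, and ties within a level are always broken by textual order.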
In case there is a principal object, we divide the other objects into two kinds:
- clause objects
- phrase objects
We will give the phrase objects the grammatical label `NP_direct_object`.
In some cases, a complement functions as an object, such as in Genesis 21:13: "I make him (into) a people."
Candidates are those complements that:
- start with either preposition `L` or `K`, and
- have no pronominal suffix on the `L` or `K` in question, and
- do not have a body part following that `L` or `K`.
We generate the grammatical labels `L_object` and `K_object` in these cases.
The flowchart will make a distinction between `L_object` and `K_object`.
An L/K object is never a principal direct object.
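The candidate test for L/K objects can be sketched as follows. The `Complement` record and its field names are hypothetical and only serve the illustration; how the leading preposition, its suffix, and a following body part are detected in the data is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Complement:
    first_prep: Optional[str]    # hypothetical: leading preposition, e.g. "L" or "K"
    prep_has_suffix: bool        # hypothetical: that preposition carries a pronominal suffix
    followed_by_body_part: bool  # hypothetical: a body part follows the preposition


def lk_object_label(c: Complement) -> Optional[str]:
    """Return "L_object" or "K_object" for candidate complements, else None."""
    # The complement must start with L or K.
    if c.first_prep not in ("L", "K"):
        return None
    # A pronominal suffix or a following body part disqualifies the complement.
    if c.prep_has_suffix or c.followed_by_body_part:
        return None
    return f"{c.first_prep}_object"
```

Because an L/K object is never a principal direct object, a label produced this way would sit alongside, not replace, the outcome of the principal object rules.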
The ETCBC database has no feature that marks indirect objects. We will use computation to determine whether a complement is an indirect object or a locative. This computation is just an approximation.
- `# loc lexemes`: how many distinct lexemes with a locative meaning occur in the complement (given by a fixed list)
- `# topo`: how many lexemes with `nametype = topo` occur in the complement (`nametype` is a feature of the lexicon)
- `# prep_b`: how many occurrences of the preposition `B` occur in the complement
- `# h_loc`: how many H-locales are carried on words in the complement
- `body_part`: is 2 if the phrase starts with the preposition `L` followed by a body part, else 0
- `locativity` ($loc$): a crude measure of the locativity of the complement, just the sum of `# loc lexemes`, `# topo`, `# prep_b`, `# h_loc` and `body_part`.
- `# prep_l`: how many occurrences of the preposition `L` or `>L` with a pronominal suffix on it occur in the complement
- `# L prop`: how many occurrences of `L` or `>L` plus proper name or person reference word occur in the complement
- `indirect object` ($ind$): a crude indicator of whether the complement is an indirect object, just the sum of `# prep_l` and `# L prop`
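As an illustration of how the two scores are assembled from the counts listed above, here is a minimal sketch. The `ComplementCounts` record and its field names are hypothetical stand-ins for the counts; how each count is obtained from the data is not shown.

```python
from dataclasses import dataclass


@dataclass
class ComplementCounts:
    n_loc_lexemes: int = 0          # "# loc lexemes"
    n_topo: int = 0                 # "# topo"
    n_prep_b: int = 0               # "# prep_b"
    n_h_loc: int = 0                # "# h_loc"
    l_plus_body_part: bool = False  # phrase starts with L followed by a body part
    n_prep_l: int = 0               # "# prep_l"
    n_l_prop: int = 0               # "# L prop"

    @property
    def body_part(self) -> int:
        # body_part contributes 2 when present, else 0.
        return 2 if self.l_plus_body_part else 0

    @property
    def loc(self) -> int:
        # locativity: plain sum of the locative signals.
        return (
            self.n_loc_lexemes + self.n_topo + self.n_prep_b
            + self.n_h_loc + self.body_part
        )

    @property
    def ind(self) -> int:
        # indirect object indicator: plain sum of the two L-based signals.
        return self.n_prep_l + self.n_l_prop
```

For example, a complement with one topographic name, one occurrence of `B`, and nothing else would get `loc == 2` and `ind == 0` under this sketch.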
We decide between locative (L), indirect object (I) and neither (C) as follows.
In words:
- if there are positive signals for L or I and none for the other, we choose the one for which there are positive signals;
- if there are positive signals for both L and I, we follow the majority count, but only if the difference is at least two;
- in all other cases we leave it at C: not necessarily locative and not necessarily indirect object.
See the enrich notebook for a more formal account of the decision.
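The decision described in words above can be sketched as a small function of the two scores $loc$ and $ind$. This is an illustration of the three cases as stated here, not a copy of the code in the enrich notebook; the function name `decide` is hypothetical.

```python
def decide(loc: int, ind: int) -> str:
    """Classify a complement as "L" (locative), "I" (indirect object) or "C"."""
    # Positive signals for exactly one of the two: choose that one.
    if loc > 0 and ind == 0:
        return "L"
    if ind > 0 and loc == 0:
        return "I"
    # Positive signals for both: follow the majority,
    # but only if the difference is at least two.
    if loc > 0 and ind > 0 and abs(loc - ind) >= 2:
        return "L" if loc > ind else "I"
    # All other cases: not necessarily locative, not necessarily indirect object.
    return "C"
```

Requiring a margin of two when both kinds of signal are present keeps borderline complements in the neutral category C, where they can still be inspected manually.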