Refine Polyline-to-Bounding Box Matching Strategy #106

@Saidgurbuz

Description

In the current matching strategy, a point on a polyline is associated with the smallest bounding box that contains it.

    if box["l"] <= point[0] <= box["r"] and box["t"] <= point[1] <= box["b"]:
        current_area = (box["r"] - box["l"]) * (box["b"] - box["t"])
        if index == -1 or current_area < area:
            area = current_area
            index = i
            box_result = box

This approach works for certain link types, such as to_footnote, to_value, and to_caption. However, for links like reading_order, merge, or group, we expect the points to be associated with the outermost bounding boxes under certain conditions. For example, in the case of a table, the reading_order should be attached to the table's bounding box, not to the bounding box of an individual table_row.

To ensure the validation methods defined in PR #102 work as intended, the find_box function needs to be updated accordingly.
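
One possible shape for the updated function, as a rough sketch only: the link-type names are taken from the description above, but the `find_box(boxes, point, link_type)` signature, the `OUTERMOST_LINK_TYPES` set, the `label` key, and the use of "largest containing box" as a stand-in for "outermost" are all assumptions, not the project's actual code. The "certain conditions" mentioned above are not modelled here.

```python
# Hypothetical sketch: names and signature are illustrative assumptions.
# Link types that should attach to the outermost (here: largest) containing box.
OUTERMOST_LINK_TYPES = {"reading_order", "merge", "group"}


def box_area(box):
    """Area of a box given as a dict with l/t/r/b edges."""
    return (box["r"] - box["l"]) * (box["b"] - box["t"])


def contains(box, point):
    """True if the (x, y) point lies inside or on the border of the box."""
    return box["l"] <= point[0] <= box["r"] and box["t"] <= point[1] <= box["b"]


def find_box(boxes, point, link_type):
    """Return (index, box) matched to `point`, or (-1, None) if none contains it."""
    prefer_largest = link_type in OUTERMOST_LINK_TYPES
    index, box_result, area = -1, None, None
    for i, box in enumerate(boxes):
        if not contains(box, point):
            continue
        current_area = box_area(box)
        if index == -1:
            better = True
        elif prefer_largest:
            better = current_area > area
        else:
            better = current_area < area
        if better:
            index, box_result, area = i, box, current_area
    return index, box_result


# A table with one row; the point lies inside both boxes.
boxes = [
    {"label": "table", "l": 0, "t": 0, "r": 100, "b": 60},
    {"label": "table_row", "l": 0, "t": 0, "r": 100, "b": 20},
]
point = (50, 10)
print(find_box(boxes, point, "to_caption")[1]["label"])     # table_row
print(find_box(boxes, point, "reading_order")[1]["label"])  # table
```

With this split, to_footnote, to_value, and to_caption keep the existing smallest-box behaviour, while reading_order, merge, and group resolve to the enclosing box. Picking by area is only a proxy for nesting; if boxes can overlap without one containing the other, an explicit containment check between candidate boxes may be needed instead.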
