feat: Support rich table cell content #285
base: main
Signed-off-by: Christoph Auer <[email protected]>
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🔴 Require two reviewers for test updates: This rule is failing. When test data is updated, we require two reviewers.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Codecov Report
Attention: Patch coverage is
Signed-off-by: Panos Vagenas <[email protected]>
doc = _construct_doc()

html_pred = doc.export_to_html()
Similar to other tests (see here for example), we could simplify the test results by removing the styling:
Suggested change: replace html_pred = doc.export_to_html() with html_pred = doc.export_to_html(html_head="")
"""TableCell."""

bbox: Optional[BoundingBox] = None
row_span: int = 1
Do we need a field here? I think this could be a method computing end_row_offset_idx - start_row_offset_idx.
We can address this, but it is unrelated to the changes in this PR, and we cannot break backward compatibility.
bbox: Optional[BoundingBox] = None
row_span: int = 1
col_span: int = 1
Do we need a field here? I think this could be a method computing end_col_offset_idx - start_col_offset_idx.
We can address this, but it is unrelated to the changes in this PR, and we cannot break backward compatibility.
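The reviewers' suggestion in the two threads above could look roughly like the following minimal sketch. It uses a hypothetical stdlib dataclass, CellGeometry, instead of the real pydantic TableCell, keeps row_span and col_span as stored fields for backward compatibility, and adds hypothetical derived properties computed from the offsets. It assumes end offsets are exclusive, which is what the reviewers' end - start formula implies.

```python
from dataclasses import dataclass


@dataclass
class CellGeometry:
    """Hypothetical stand-in for the span-related fields of TableCell."""

    start_row_offset_idx: int
    end_row_offset_idx: int
    start_col_offset_idx: int
    end_col_offset_idx: int
    # Stored fields kept so existing serialized documents still load.
    row_span: int = 1
    col_span: int = 1

    @property
    def derived_row_span(self) -> int:
        # Assumes exclusive end offsets, so the span is simply end - start.
        return self.end_row_offset_idx - self.start_row_offset_idx

    @property
    def derived_col_span(self) -> int:
        return self.end_col_offset_idx - self.start_col_offset_idx

    def spans_consistent(self) -> bool:
        # True when the stored spans agree with the offsets.
        return (
            self.row_span == self.derived_row_span
            and self.col_span == self.derived_col_span
        )


cell = CellGeometry(
    start_row_offset_idx=0,
    end_row_offset_idx=2,
    start_col_offset_idx=1,
    end_col_offset_idx=2,
    row_span=2,
)
print(cell.derived_row_span, cell.derived_col_span, cell.spans_consistent())  # 2 1 True
```

Keeping the stored fields and only adding derived properties is what lets such a change avoid breaking backward compatibility; the consistency check could later become part of validation.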
row_span: int = 1
col_span: int = 1
start_row_offset_idx: int
end_row_offset_idx: int
Can we check somewhere that end_* > start_*?
We can generally add a model validator for such checks.
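The check discussed in this thread could be sketched as follows, assuming exclusive end offsets so that a valid cell needs end_* > start_*. To stay dependency-free, the sketch uses a hypothetical stdlib dataclass with __post_init__; in a Pydantic v2 model such as TableCell, the same logic would typically live in a method decorated with @model_validator(mode="after").

```python
from dataclasses import dataclass


@dataclass
class CellOffsets:
    """Hypothetical stand-in for the offset fields of TableCell."""

    start_row_offset_idx: int
    end_row_offset_idx: int
    start_col_offset_idx: int
    end_col_offset_idx: int

    def __post_init__(self) -> None:
        # Reject empty or inverted ranges (end offsets assumed exclusive).
        if self.end_row_offset_idx <= self.start_row_offset_idx:
            raise ValueError(
                f"end_row_offset_idx ({self.end_row_offset_idx}) must be greater "
                f"than start_row_offset_idx ({self.start_row_offset_idx})"
            )
        if self.end_col_offset_idx <= self.start_col_offset_idx:
            raise ValueError(
                f"end_col_offset_idx ({self.end_col_offset_idx}) must be greater "
                f"than start_col_offset_idx ({self.start_col_offset_idx})"
            )


CellOffsets(0, 1, 0, 1)  # accepted: a single 1x1 cell
try:
    CellOffsets(2, 2, 0, 1)  # rejected: empty row range
except ValueError as err:
    print(err)
```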
Feature
This PR introduces changes that allow table cells to contain arbitrary rich content instead of a basic text string. To enable this, the TableCell model now inherits from NodeItem. To remain backward compatible, the TableCell.text field is still present (default = ""), and the self_ref uses a relative JSON pointer ("0") to refer to itself as long as the TableCell is constructed in isolation, before it belongs to a table in a document (as in previous usage conventions). A new method, TableCell.has_rich_content(), reports whether the table cell contains child nodes.

New methods in TableItem allow modifying a table's cells after the table has been added to a DoclingDocument, which ensures proper reference handling of table cells. TableItem.update_cell inserts or overwrites a cell in a table and returns it, so the returned cell can be passed as the parent argument to DoclingDocument.add_text and other APIs. TableItem.delete_cells deletes the cells in a given row/column index range.

Table cells with a valid self_ref are created by the update_cell method and look like this example: #/tables/0/data/table_cells/2
The serializers are updated to handle table cells with rich content, ensuring that the text field is ignored when children are present. Children are serialized like any other subtree in a document.
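The serialization rule described above can be sketched minimally with hypothetical stand-in types rather than docling-core's serializer classes: a cell with children is rendered from its child nodes, and its text field is used only when it has no children. The node store keyed by JSON pointer, the flat text-only children, and the " | " row format are assumptions for illustration; a real serializer would recurse into arbitrary subtrees.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TextNode:
    """Hypothetical leaf node, e.g. a paragraph referenced from a cell."""

    text: str


@dataclass
class Cell:
    """Hypothetical table cell: plain text plus optional child references."""

    text: str = ""
    children: List[str] = field(default_factory=list)

    def has_rich_content(self) -> bool:
        return bool(self.children)


def serialize_cell(cell: Cell, nodes: Dict[str, TextNode]) -> str:
    # Rich cells ignore `text` and render their children instead.
    if cell.has_rich_content():
        return " ".join(nodes[ref].text for ref in cell.children)
    return cell.text


def serialize_row(cells: List[Cell], nodes: Dict[str, TextNode]) -> str:
    return " | ".join(serialize_cell(c, nodes) for c in cells)


nodes = {"#/texts/0": TextNode("first para"), "#/texts/1": TextNode("second para")}
row = [
    Cell(text="plain"),
    Cell(text="ignored", children=["#/texts/0", "#/texts/1"]),
]
print(serialize_row(row, nodes))  # plain | first para second para
```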