
Implement more efficient chunking strategies #2103

@bobome-ola

Description


Motivation

Currently, chunking is possible using the Layout, Page, Fixed-size, and Paragraph strategies, with optional overlap. I would suggest an additional strategy focused solely on quality: LLM chunking. An LLM is called to produce coherent and relevant chunks, each chunk expressing a single idea, concept, or thought.

In all chunking strategies, an LLM can be used to generate additional metadata to improve reranking by Azure AI Search when semantic search is enabled. For each chunk, the LLM would populate:

  • "Title": a one-sentence summary of the chunk
  • "Keywords": the main keywords extracted from the chunk

Please note that for the reranking to be useful, ticket 2093 needs to be implemented first.
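The enrichment step above could look something like the sketch below. This is only illustrative: the function name `enrich_chunk`, the prompt wording, and the output field names ("title", "keywords") are assumptions, and the LLM is passed in as a generic callable so any chat model (or a stub, as in the example) can be plugged in.

```python
import json
from typing import Callable

def enrich_chunk(chunk: str, llm: Callable[[str], str]) -> dict:
    """Ask an LLM for a one-sentence title and the main keywords of a chunk.

    `llm` is any callable taking a prompt string and returning the model's
    text reply; prompt and field names are illustrative, not from the repo.
    """
    prompt = (
        "Return a JSON object with two fields for the passage below:\n"
        '  "title": a single short sentence summarizing the passage\n'
        '  "keywords": a list of the main keywords of the passage\n\n'
        "Passage:\n" + chunk
    )
    fields = json.loads(llm(prompt))
    # The enriched document would then be indexed so the semantic reranker
    # can use "title" and "keywords" alongside the chunk content.
    return {
        "content": chunk,
        "title": fields["title"],
        "keywords": fields["keywords"],
    }

# Example with a stubbed LLM; a real deployment would call a chat model here.
fake_llm = lambda _prompt: (
    '{"title": "Overview of chunking strategies.",'
    ' "keywords": ["chunking", "LLM", "metadata"]}'
)
doc = enrich_chunk("Currently the chunking is possible using Layout, ...", fake_llm)
```

The same callable could also drive the proposed LLM chunking strategy itself, with a different prompt asking the model to split a page into self-contained passages.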

Tasks

To be filled in by the engineer picking up the issue

  • Task 1
  • Task 2
  • ...

Metadata

  • Assignees: none
  • Labels: enhancement (New feature or request)
  • Milestone: none