Confluence Connector Pagination #3320

WildDogOne · 2025-03-22T20:22:40Z

Bug Description

The full sync on the Confluence connector only pulls 50 documents if a CQL query is set.

To Reproduce

Set a CQL query as an "advanced rule" in the connector's "sync rules", for example:

    [
      {
        "query": "created >= now('-5y')"
      }
    ]

Expected behavior

Pull the Confluence content created in the last 5 years (obvious overkill, but that is a different story).

Environment

8.17.3

Solution

I have been playing around with the "paginated_api_call" function in "confluence.py" and noticed that the function looks for a next link.
However, according to the API documentation, the /api/search call does not actually seem to return one:
https://docs.atlassian.com/atlassian-confluence/REST/6.6.0/#content-search

It seems that pagination for a search has to be done by moving the start window.
Here is a quick proof of concept that still follows the next link in case another function needs it:

    async def paginated_api_call(self, url_name, **url_kwargs):
        """Make a paginated API call for Confluence objects using the passed url_name.
        Args:
            url_name (str): URL Name to identify the API endpoint to hit
        Yields:
            response: JSON response.
        """
        base_url = os.path.join(self.host_url, URLS[url_name].format(**url_kwargs))
        # Assumes base_url already carries a query string, so "&start=" can be appended.
        url = f"{base_url}&start=0"

        while True:
            try:
                self._logger.debug(f"Starting pagination for API endpoint {url}")
                response = await self.api_call(url=url)
                json_response = await response.json()
                yield json_response

                # "_links" may be missing, so fall back to an empty dict.
                links = json_response.get("_links") or {}
                start = json_response.get("start", 0)
                size = json_response.get("size", 0)
                total_size = json_response.get("totalSize", 0)

                if links.get("next"):
                    # Follow the server-provided next link when one exists.
                    url = os.path.join(self.host_url, links.get("next")[1:])
                elif size > 0 and start + size < total_size:
                    # No next link (search endpoint): advance the start window.
                    url = f"{base_url}&start={start + size}"
                else:
                    self._logger.debug("No more data to fetch")
                    return
            except Exception as exception:
                self._logger.warning(
                    f"Skipping data for type {url_name} from {base_url}. Exception: {exception}."
                )
                break
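
The start-window idea can also be tried in isolation, without a Confluence instance. The following is only a minimal sketch: `paginate_by_start` and `fake_search` are hypothetical names that are not part of the connector, and the fake endpoint simply mimics the `start`, `size` and `totalSize` fields used above.

```python
import asyncio


async def paginate_by_start(fetch_page, limit=50):
    """Yield result pages by advancing the start offset until totalSize is reached."""
    start = 0
    while True:
        page = await fetch_page(start=start, limit=limit)
        yield page
        size = page.get("size", 0)
        total = page.get("totalSize", 0)
        # Stop on an empty page too, so a wrong totalSize cannot loop forever.
        if size == 0 or start + size >= total:
            return
        start += size


async def fake_search(start, limit):
    """Hypothetical stand-in for the search endpoint, serving 120 fake results."""
    results = list(range(120))[start:start + limit]
    return {"results": results, "start": start, "size": len(results), "totalSize": 120}


async def collect():
    # Drain the async generator into a list of pages.
    return [page async for page in paginate_by_start(fake_search)]


pages = asyncio.run(collect())
print([page["size"] for page in pages])  # [50, 50, 20]
```

With 120 fake results and a page size of 50, the loop fetches three pages instead of stopping after the first 50 documents.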

While debugging this I also found another issue in the function "search_by_query": it never checks whether "entity_details" exists, so if it is None, the call fails.
I fixed this with an additional condition:

    async def search_by_query(self, query):
        async for entity in self.confluence_client.search_by_query(query=query):
            # entity can be space or content
            entity_details = entity.get(SPACE) or entity.get(CONTENT)

            if not entity_details:
                continue
            if (
                entity_details.get("type", "") == "attachment"
                and entity_details.get("container", {}).get("title") is None
            ):
                continue
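
The effect of the extra guard can be checked on plain dictionaries. This is a rough sketch only: `usable_entities` is a hypothetical helper, the values of `SPACE` and `CONTENT` are assumed here rather than taken from the connector, and the sample results are made up.

```python
# Assumed key names; the connector defines its own SPACE and CONTENT constants.
SPACE = "space"
CONTENT = "content"


def usable_entities(entities):
    """Yield entity details, skipping results without details and orphaned attachments."""
    for entity in entities:
        details = entity.get(SPACE) or entity.get(CONTENT)
        if not details:
            # Without this guard, details.get(...) below raises AttributeError on None.
            continue
        if (
            details.get("type", "") == "attachment"
            and details.get("container", {}).get("title") is None
        ):
            continue
        yield details


sample_results = [
    {"content": {"type": "page", "title": "Home"}},
    {"title": "result with neither space nor content"},
    {"content": {"type": "attachment", "container": {}}},
    {"content": {"type": "attachment", "container": {"title": "Home"}}},
    {"space": {"key": "ENG"}},
]

kept = list(usable_entities(sample_results))
print(len(kept))  # 3
```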


seanstory · 2025-03-28T19:34:20Z

Thanks for filing, @WildDogOne! Indeed, this feels like a bug worth fixing.

WildDogOne added the bug Something isn't working label Mar 22, 2025

WildDogOne linked a pull request Mar 22, 2025 that will close this issue

Improve pagination handling in Confluence API client #3321

Open

11 tasks

seanstory added the community-driven label Mar 28, 2025

seanstory added effort:low priority:medium labels Mar 28, 2025

erikcurrin-elastic added the sdh-driven label Apr 9, 2025
