
Merge main vol7 #1891

Merged
dexters1 merged 3 commits into dev from merge-main-vol7 on Dec 11, 2025

Conversation

dexters1 (Collaborator) commented Dec 11, 2025

Description

Add commits from main to dev branch

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring
  • Performance improvement
  • Other (please specify):

Screenshots/Videos (if applicable)

Pre-submission Checklist

  • I have tested my changes thoroughly before submitting this PR
  • This PR contains minimal changes necessary to address the issue/feature
  • My code follows the project's coding standards and style guidelines
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if applicable)
  • All new and existing tests pass
  • I have searched existing PRs to ensure this change hasn't been submitted already
  • I have linked any relevant issues in the description
  • My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

Summary by CodeRabbit

  • Refactor

    • Removed permission validation checks from the data processing pipeline, streamlining the overall workflow and reducing processing steps.
    • Updated task sequences across task handlers to reflect the removal of the validation step.
  • Documentation

    • Updated processing pipeline documentation and example code to reflect the new streamlined task sequence.


martin0731 and others added 3 commits November 13, 2025 08:31

## Description
This PR removes the obsolete `check_permissions_on_dataset` task and all
its related imports and usages across the codebase.
The authorization logic is now handled earlier in the pipeline, so this
task is no longer needed.
These changes simplify the default Cognify pipeline and make the code
cleaner and easier to maintain.

### Changes Made
- Removed `cognee/tasks/documents/check_permissions_on_dataset.py`
- Removed import from `cognee/tasks/documents/__init__.py`
- Removed import and usage in `cognee/api/v1/cognify/cognify.py`
- Removed import and usage in `cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py`
- Updated comments in `cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py` (index positions changed)
- Removed usage in `notebooks/cognee_demo.ipynb`
- Updated documentation in `examples/python/simple_example.py` (process description)

---

## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation update
- [x] Code refactoring
- [x] Other (please specify): Task removal / cleanup of deprecated function

---

## Pre-submission Checklist
- [ ] **I have tested my changes thoroughly before submitting this PR**
- [x] **This PR contains minimal changes necessary to address the issue**
- [x] My code follows the project's coding standards and style guidelines
- [ ] All new and existing tests pass
- [x] I have searched existing PRs to ensure this change hasn't been submitted already
- [x] I have linked any relevant issues in the description (Closes #1771)
- [x] My commits have clear and descriptive messages

---

## DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
dexters1 requested a review from hajdul88 on December 11, 2025 at 18:12
dexters1 self-assigned this on Dec 11, 2025
@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

coderabbitai bot (Contributor) commented Dec 11, 2025

Walkthrough

The pull request removes the permission validation step from the Cognify processing pipeline. This includes eliminating the check_permissions_on_dataset function entirely, removing its imports and references from task pipelines in both the main API and evaluation framework, and updating example documentation to reflect the revised task sequence.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Permission check function removal | `cognee/tasks/documents/check_permissions_on_dataset.py`, `cognee/tasks/documents/__init__.py` | Deleted the check_permissions_on_dataset async function and its import, removing the entire permission validation module |
| Task pipeline updates | `cognee/api/v1/cognify/cognify.py`, `cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py` | Removed the check_permissions_on_dataset import and deleted the permission check task from the default and temporal task lists |
| Base task indices selection | `cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py` | Changed the base task indices in get_no_summary_tasks and get_just_chunks_tasks from [0, 1, 2] to [0, 1]. Because the permission task no longer sits at index 1, extract_chunks moves from index 2 to index 1, so the new indices still select classification and chunk extraction and only the removed permission check is dropped |
| Documentation and examples | `examples/python/simple_example.py` | Updated status messages to reflect the new pipeline sequence, replacing the permission check step with text chunk extraction and reordering subsequent steps |
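
The following sketch shows how removing one task shifts positional indices in the base task list. It assumes the default order before this PR was classify, permission check, chunk extraction, graph extraction; the task names are illustrative strings, not the Task objects that cognee's get_default_tasks() actually returns.

```python
from typing import List

# Assumed default task order before this PR; plain strings stand in for
# the Task objects that the real get_default_tasks() returns.
BEFORE: List[str] = [
    "classify_documents",
    "check_permissions_on_dataset",
    "extract_chunks_from_documents",
    "extract_graph_from_data",
]
# After the PR the permission task is gone, so later tasks shift left by one.
AFTER: List[str] = [name for name in BEFORE if name != "check_permissions_on_dataset"]


def select(tasks: List[str], indices: List[int]) -> List[str]:
    """Return the tasks at the given positional indices."""
    return [tasks[i] for i in indices]


print(select(BEFORE, [0, 1, 2]))  # classify, permission check, chunk extraction
print(select(AFTER, [0, 1]))  # classify, chunk extraction
print(select(AFTER, [0, 1, 2]))  # stale indices would also pull in graph extraction
```

Under these assumptions, [0, 1] on the new list selects the same tasks that [0, 1, 2] selected before, minus the permission check, while keeping [0, 1, 2] would quietly add graph extraction.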

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

The changes follow a consistent, repetitive pattern of removing the same permission validation functionality across multiple files. All modifications are straightforward deletions with minor textual updates. No complex logic, architectural changes, or intricate dependencies are introduced.

Possibly related PRs

  • PR #1786: Directly related; removes the same check_permissions_on_dataset function, its imports, and usages across the codebase
  • PR #641: Related to eval_framework task-getter pipeline modifications, particularly task-selection logic in TaskGetters
  • PR #700: Related to get_default_tasks_by_indices.py changes and base task selection logic

Suggested labels

run-checks

Suggested reviewers

  • hajdul88
  • Vasilije1990

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description only states 'Add commits from main to dev branch' without explaining the actual changes made. The pre-submission checklist is entirely unchecked, indicating incomplete preparation. | Provide a detailed description of the changes, including why permission validation was removed and how it affects the pipeline. Complete or update the pre-submission checklist to reflect the actual work done. |
| Title check | ❓ Inconclusive | The title 'Merge main vol7' is vague and does not clearly convey the specific changes made in the pull request, such as the removal of permission checks or the pipeline reorganization. | Use a more descriptive title that summarizes the main changes, such as 'Remove permission validation from cognify pipeline' or 'Refactor cognify pipeline to remove permission checks'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.00%, which meets the required threshold of 80.00%. |

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (2)

33-35: Confirm base task ordering and reduce reliance on hard‑coded indices

The change to [0, 1] assumes that get_default_tasks now returns classify and extract‑chunks at positions 0 and 1 (as per the comment). Please double‑check that this ordering is still correct after removing the permission task; otherwise this helper will silently change which steps run in the “no summary” pipeline.

Longer‑term, consider selecting tasks by a stable identifier (e.g., name/type) instead of positional indices to make this more robust to future changes in the default task list.
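
One way to follow the longer-term suggestion is to look tasks up by the name of the function they wrap. The sketch below uses a hypothetical Task dataclass with an executable attribute and placeholder task functions; it is not cognee's Task class, and matching on the function's __name__ is an assumed convention.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Task:
    """Hypothetical pipeline task that wraps a callable."""

    executable: Callable


# Placeholder task functions; only their names matter for selection.
def classify_documents(data): ...
def extract_chunks_from_documents(data): ...
def extract_graph_from_data(data): ...


def select_tasks_by_name(tasks: Iterable[Task], names: List[str]) -> List[Task]:
    """Return the tasks whose executable's __name__ is in `names`, in the order of `names`.

    Raises KeyError if a requested task is absent, so removing or renaming a
    default task fails loudly instead of silently changing the pipeline.
    """
    by_name = {task.executable.__name__: task for task in tasks}
    missing = [name for name in names if name not in by_name]
    if missing:
        raise KeyError(f"tasks not found in the default list: {missing}")
    return [by_name[name] for name in names]


default_tasks = [
    Task(classify_documents),
    Task(extract_chunks_from_documents),
    Task(extract_graph_from_data),
]
base = select_tasks_by_name(default_tasks, ["classify_documents", "extract_chunks_from_documents"])
print([task.executable.__name__ for task in base])
# → ['classify_documents', 'extract_chunks_from_documents']
```

With this approach, deleting a task such as the permission check no longer changes which steps a helper selects; asking for a task that was removed raises instead.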


54-56: Align docstring and comment with actual task set, and avoid duplicated base‑task logic

Here you make the same [0, 1] assumption as above and the comment says 0=classify, 1=extract_chunks, but the docstring for get_just_chunks_tasks says “only chunk extraction and data points addition.” If classification is indeed included in base_tasks, it would be good to clarify the docstring (or adjust the indices) so callers know exactly what this pipeline does.

Given both helpers share the same base‑task selection, you might also consider a small shared helper (e.g., _get_classify_and_chunk_tasks(...)) to avoid duplicating the index knowledge in multiple places.
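
A minimal sketch of the shared-helper suggestion, assuming the default order after this PR starts with classify and extract chunks. Here get_default_tasks is a simplified stand-in that returns strings, the task names are assumed, and the graph and data-point steps are appended by name purely for illustration; the real helpers build those tasks with their own arguments. The docstrings state explicitly that classification is included, which addresses the docstring mismatch noted above.

```python
from typing import List


# Hypothetical stand-in: cognee's get_default_tasks() returns Task objects
# and takes configuration arguments; strings keep the sketch self-contained.
def get_default_tasks() -> List[str]:
    return [
        "classify_documents",
        "extract_chunks_from_documents",
        "extract_graph_from_data",
        "summarize_text",
        "add_data_points",
    ]


# The only place that knows where classification and chunking sit.
_CLASSIFY_AND_CHUNK_INDICES = (0, 1)


def _get_classify_and_chunk_tasks() -> List[str]:
    """Return the classification and chunk-extraction tasks from the default list."""
    tasks = get_default_tasks()
    return [tasks[i] for i in _CLASSIFY_AND_CHUNK_INDICES]


def get_no_summary_tasks() -> List[str]:
    """Classify, chunk, extract the graph, and add data points; skip summarization."""
    return _get_classify_and_chunk_tasks() + ["extract_graph_from_data", "add_data_points"]


def get_just_chunks_tasks() -> List[str]:
    """Classify documents, extract chunks, and add data points; no graph or summaries."""
    return _get_classify_and_chunk_tasks() + ["add_data_points"]
```

If the default order changes again, only _CLASSIFY_AND_CHUNK_INDICES needs updating.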

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 46ddd4f and 59f8d12.

📒 Files selected for processing (7)
  • cognee/api/v1/cognify/cognify.py (2 hunks)
  • cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py (0 hunks)
  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (2 hunks)
  • cognee/tasks/documents/__init__.py (0 hunks)
  • cognee/tasks/documents/check_permissions_on_dataset.py (0 hunks)
  • examples/python/simple_example.py (1 hunks)
  • notebooks/cognee_demo.ipynb (1 hunks)
🔥 Files not summarized due to errors (1)
  • notebooks/cognee_demo.ipynb: Error: Server error: no LLM provider could handle the message
💤 Files with no reviewable changes (3)
  • cognee/tasks/documents/check_permissions_on_dataset.py
  • cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py
  • cognee/tasks/documents/__init__.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use 4-space indentation in Python code
Use snake_case for Python module and function names
Use PascalCase for Python class names
Use ruff format before committing Python code
Use ruff check for import hygiene and style enforcement with line-length 100 configured in pyproject.toml
Prefer explicit, structured error handling in Python code

Files:

  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py
  • cognee/api/v1/cognify/cognify.py
  • examples/python/simple_example.py

⚙️ CodeRabbit configuration file

**/*.py: When reviewing Python code for this project:

  1. Prioritize portability over clarity, especially when dealing with cross-Python compatibility. However, with the priority in mind, do still consider improvements to clarity when relevant.
  2. As a general guideline, consider the code style advocated in the PEP 8 standard (excluding the use of spaces for indentation) and evaluate suggested changes for code style compliance.
  3. As a style convention, consider the code style advocated in CEP-8 and evaluate suggested changes for code style compliance.
  4. As a general guideline, try to provide any relevant, official, and supporting documentation links to any tool's suggestions in review comments. This guideline is important for posterity.
  5. As a general rule, undocumented function definitions and class definitions in the project's Python code are assumed incomplete. Please consider suggesting a short summary of the code for any of these incomplete definitions as docstrings when reviewing.

Files:

  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py
  • cognee/api/v1/cognify/cognify.py
  • examples/python/simple_example.py
cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use shared logging utilities from cognee.shared.logging_utils in Python code

Files:

  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py
  • cognee/api/v1/cognify/cognify.py
cognee/api/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Public APIs should be type-annotated in Python where practical

Files:

  • cognee/api/v1/cognify/cognify.py
🧠 Learnings (2)
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.

Applied to files:

  • cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py
📚 Learning: 2024-10-16T07:06:28.669Z
Learnt from: borisarzentar
Repo: topoteretes/cognee PR: 144
File: cognee/tasks/chunking/query_chunks.py:1-17
Timestamp: 2024-10-16T07:06:28.669Z
Learning: The `query_chunks` function in `cognee/tasks/chunking/query_chunks.py` is used within the `search` function in `cognee/api/v1/search/search_v2.py`.

Applied to files:

  • cognee/api/v1/cognify/cognify.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: End-to-End Tests / Conversation sessions test (Redis)
  • GitHub Check: End-to-End Tests / Conversation sessions test (FS)
  • GitHub Check: End-to-End Tests / Test dataset database handlers in Cognee
  • GitHub Check: End-to-End Tests / Test Cognify - Edge Centered Payload
  • GitHub Check: End-to-End Tests / Test Entity Extraction
  • GitHub Check: End-to-End Tests / Test multi tenancy with different situations in Cognee
  • GitHub Check: End-to-End Tests / Test graph edge ingestion
  • GitHub Check: End-to-End Tests / Deduplication Test
  • GitHub Check: End-to-End Tests / Concurrent Subprocess access test
  • GitHub Check: End-to-End Tests / Run Telemetry Pipeline Test
  • GitHub Check: End-to-End Tests / Test permissions with different situations in Cognee
  • GitHub Check: End-to-End Tests / Test using different async databases in parallel in Cognee
  • GitHub Check: End-to-End Tests / Server Start Test
  • GitHub Check: End-to-End Tests / Test Feedback Enrichment
  • GitHub Check: End-to-End Tests / S3 Bucket Test
  • GitHub Check: Basic Tests / Run Integration Tests
  • GitHub Check: Basic Tests / Run Linting
  • GitHub Check: Basic Tests / Run Simple Examples
  • GitHub Check: Basic Tests / Run Unit Tests
  • GitHub Check: Basic Tests / Run Formatting Check
  • GitHub Check: CLI Tests / CLI Functionality Tests
  • GitHub Check: CLI Tests / CLI Integration Tests
🔇 Additional comments (4)
notebooks/cognee_demo.ipynb (1)

591-597: Limited visibility into substantive changes.

The provided code fragment only shows Jupyter notebook cell metadata (lines 591–597), with "execution_count": null appearing as the changed line. However, the AI summary indicates significant functional changes: removal of the check_permissions_on_dataset function and its references from task pipelines, plus updated example documentation.

The visible cell metadata structure is standard and unproblematic, but we cannot verify the actual implementation changes claimed in the summary without seeing the notebook cell contents.

Please verify that:

  • All references to check_permissions_on_dataset have been removed from task definitions in this notebook
  • The example task sequence documentation has been updated to reflect the removal of the permission validation step
  • The notebook cells execute without errors after the removal

If possible, provide the full diff or the substantial portions of changed cells to enable proper review of the pipeline changes.
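
The first check can be done without the rendered diff by scanning the notebook JSON directly. This sketch uses only the standard library and assumes the notebook follows nbformat 4, where each cell's "source" is either a string or a list of strings; it inspects cell sources only, not stored outputs.

```python
import json
from pathlib import Path
from typing import List, Tuple


def find_references(notebook_path: str, needle: str) -> List[Tuple[int, int, str]]:
    """Return (cell_index, line_number, line) for each cell-source line containing `needle`."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    hits: List[Tuple[int, int, str]] = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        source = cell.get("source", [])
        # nbformat allows "source" to be one string or a list of line strings.
        if isinstance(source, str):
            source = source.splitlines(keepends=True)
        for line_number, line in enumerate(source, start=1):
            if needle in line:
                hits.append((cell_index, line_number, line.rstrip("\n")))
    return hits
```

For example, find_references("notebooks/cognee_demo.ipynb", "check_permissions_on_dataset") should return an empty list once the notebook no longer mentions the task. Executing the notebook end to end is still needed for the third check.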

cognee/api/v1/cognify/cognify.py (2)

319-319: Docstring update aligns with permission removal.

The updated parameter description for user is consistent with the removal of permission checks from the temporal pipeline. However, this change is part of the broader security concern regarding the removal of check_permissions_on_dataset.

Note: Please address the security verification requested in the previous comment about the removal of permission validation.


81-85: No changes needed: permission validation is properly implemented.

The processing pipeline documentation correctly lists technical steps (classification, chunking, extraction, etc.), which are distinct from authorization checks. Permission validation is not part of the documented processing pipeline because it's a cross-cutting concern handled at the pipeline boundary, not a processing step.

Permission validation is already implemented via resolve_authorized_user_datasets() in cognee/modules/pipelines/operations/pipeline.py:44, which verifies "write" permissions before any processing begins. This is the correct architectural pattern—authorization at the pipeline boundary, not within the documented technical pipeline steps.
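
The boundary pattern described above can be sketched as follows. All names here (User, authorize_datasets, run_pipeline, and the two toy tasks) are hypothetical and only illustrate the idea of checking permissions once before any task runs; this is not cognee's resolve_authorized_user_datasets() or the code in pipeline.py.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List, Set


@dataclass
class User:
    """Hypothetical user with a per-dataset permission map."""

    id: str
    permissions: Dict[str, Set[str]] = field(default_factory=dict)


class PermissionDeniedError(Exception):
    """Raised when a user lacks a required permission on a dataset."""


def authorize_datasets(user: User, datasets: List[str], permission: str) -> List[str]:
    """Hypothetical boundary check: verify `permission` on every dataset up front."""
    denied = [d for d in datasets if permission not in user.permissions.get(d, set())]
    if denied:
        raise PermissionDeniedError(f"user {user.id} lacks '{permission}' on {denied}")
    return datasets


Task = Callable[[str], Awaitable[str]]


async def classify(value: str) -> str:
    return f"classified:{value}"


async def extract_chunks(value: str) -> str:
    return f"chunked:{value}"


async def run_pipeline(user: User, datasets: List[str], tasks: List[Task]) -> List[str]:
    """Authorize once, then run the processing tasks with no per-task permission step."""
    authorized = authorize_datasets(user, datasets, "write")
    results = []
    for dataset in authorized:
        value = dataset
        for task in tasks:  # each task receives the previous task's output
            value = await task(value)
        results.append(value)
    return results


alice = User("alice", {"docs": {"read", "write"}})
print(asyncio.run(run_pipeline(alice, ["docs"], [classify, extract_chunks])))
# → ['chunked:classified:docs']
```

Because the check happens before the task loop, a task list that no longer contains a permission step still never processes a dataset the user cannot write to.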

examples/python/simple_example.py (1)

35-41: Example documentation updated to reflect new pipeline.

The progress narration has been updated to accurately reflect the removal of the permission check step from the cognify pipeline. The new sequence correctly describes the updated processing flow.

Note: This example change is consistent with the API modifications in cognee/api/v1/cognify/cognify.py. However, please ensure the security implications of removing permission validation are addressed (see previous comments).

dexters1 merged commit 7b3d997 into dev on Dec 11, 2025
150 of 156 checks passed
dexters1 deleted the merge-main-vol7 branch on December 11, 2025 at 19:11
coderabbitai bot mentioned this pull request on Jan 11, 2026
16 tasks

Projects

None yet


4 participants