Feature/autolayout improvements by 09hanvaik · Pull Request #412 · Data-Simply/openretailscience

09hanvaik · 2025-11-20T09:04:50Z

Summary

Implement Reingold-Tilford algorithm for automatic tree layout with O(n) complexity
Add contour-based collision detection to prevent node overlap
Support both LR (left-to-right) and TB (top-to-bottom) orientations
Extract magic numbers to named constants
Consolidate get_left_contour/get_right_contour into single get_contour() function

- Replace manual positioning with Reingold-Tilford algorithm - Remove num_rows/num_cols parameters from TreeGrid.__init__() - Remove "position" keys from tree structure, only "children" required - Add automatic parent centering over children - Update RevenueTree to use simplified auto-layout API - Update all TreeGrid tests for new API - Add comprehensive layout verification tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…r preservation - Replace _find_parent() O(n²) helper with O(n) parent_map dictionary - Use more Pythonic patterns (deque initialization, max with default) - Fix bug where Qwen's version sorted leaves alphabetically instead of preserving tree structure order - Add noqa: C901 for acceptable algorithm complexity - Maintain correct two-pass Reingold-Tilford inspired approach 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

…n rendering IMPROVEMENTS: - Extract DRY violation: Created build_node() helper function in revenue_tree.py * Reduced 84 lines of repetitive node definitions to 46 lines (45% reduction) * Eliminates code duplication across 7 node definitions - Fix inconsistent spacing calculations in TreeGrid * horizontal_spacing and vertical_spacing now use consistent logic * Both now calculate as: node_dimension + fixed_gap * Removed asymmetric division by grid_spacing - Extract magic numbers to named constants * Added DEFAULT_VERTICAL_GAP, DEFAULT_HORIZONTAL_GAP, DEFAULT_LR_DEPTH_GAP * Added 10 layout constants to DetailedTreeNode * Improves maintainability and clarity - Consolidate duplicate contour functions * Unified get_left_contour and get_right_contour into single get_contour() * Reduced 28 lines to 16 lines with parameterized behavior * Eliminates near-duplicate code - Fix TB orientation connection rendering * Added threshold check for small horizontal offsets * Uses simple straight lines when offset < 2× curve_radius * Eliminates jagged appearance on nearly-vertical connections * Preserves smooth S-curves for larger offsets PRESERVED: - TB/LR orientation support for future speeddrill feature - All existing functionality and public API - Test coverage maintained at 99% for revenue_tree.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

IMPROVEMENTS: - Add shared fixtures for common tree structures * simple_three_node_tree for SimpleTreeNode tests * detailed_three_node_tree for DetailedTreeNode tests - Consolidate redundant BaseRoundedBox vertex tests * Reduced 5 tests to 3 with better coverage * Combined partial radius tests into single test - Parametrize duplicate orientation tests * Combined test_orientation_lr and test_orientation_tb * Now uses @pytest.mark.parametrize for both orientations - Add missing test scenarios (TestMissingScenarios class): * test_tb_orientation_position_mapping - verify Y axis mapping * test_custom_spacing_parameters - custom spacing overrides * test_default_spacing_uses_constants - verify constant usage * test_detailed_node_without_contribution - optional field * test_contour_prevents_overlap_in_complex_tree - overlap prevention * test_tree_with_fixture - fixture usage validation * test_lr_orientation_depth_maps_to_x - LR depth mapping Test count: 57 tests (was 52), all passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Fix ruff linting issues (magic numbers, unused variables, SIM114) - Add matplotlib Agg backend to avoid tkinter errors in CI - Format code with ruff formatter - All 57 tests now pass with 97% coverage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

claude · 2025-11-20T09:05:06Z

Claude finished @09hanvaik's task —— View job

PR Review Complete ✅

Read and analyze changed files
Review code quality and best practices
Check for potential bugs or issues
Evaluate performance considerations
Check security concerns
Review test coverage
Post comprehensive review feedback

Summary

I've completed a comprehensive review of this PR implementing the Reingold-Tilford auto-layout algorithm. The detailed review has been posted as a separate comment.

Overall Assessment: High-quality work with excellent implementation ⭐

Key Findings:

✅ Outstanding algorithm implementation with proper O(n) complexity
✅ Comprehensive test coverage (525+ lines of well-structured tests)
✅ Excellent code quality with proper constant extraction
✅ No security concerns
⚠️ One medium-priority issue: spacing inconsistency between orientations (detailed in review)
💡 Several minor suggestions for improvement

Recommendation: Approve with minor suggestions. The spacing inconsistency should be reviewed before merge, but all other items are nice-to-haves.

claude · 2025-11-20T09:07:09Z

Code Review Summary

I've completed a comprehensive review of this PR implementing the Reingold-Tilford auto-layout algorithm. Overall, this is excellent work with high code quality, comprehensive test coverage, and solid adherence to best practices. Below are my findings organized by category.

✅ Strengths

Algorithm Implementation

Outstanding implementation of the Reingold-Tilford algorithm with O(n) complexity in tree_diagram.py:419-570
Contour-based collision detection is correctly implemented and prevents node overlap
Clean separation of layout computation (_compute_positions) from rendering (render)
Both LR and TB orientations are well-supported with proper coordinate mapping

Code Quality

Excellent constant extraction throughout:
- SimpleTreeNode: GREEN_THRESHOLD = 1.0, RED_THRESHOLD = -1.0 (lines 202-203)
- DetailedTreeNode: Layout constants like HEADER_HEIGHT_RATIO, NODE_PADDING_HORIZONTAL (lines 832-841)
- TreeGrid: Connection and spacing constants (lines 334-341)
Clear, descriptive variable names and comprehensive docstrings
Good use of type annotations throughout

Test Coverage

Exceptional test suite with 525 lines of well-structured tests
Proper use of pytest.mark.parametrize for testing multiple scenarios efficiently
Realistic retail domain data in tests (revenue, customers, transactions)
Tests focus on behavior validation, not implementation details
Good use of fixtures for test data reuse

Architecture

Clean abstraction with TreeNode ABC allowing multiple node implementations
Proper separation of concerns between nodes, grid, and rendering
Validation of inputs with clear error messages

🔍 Issues Found

1. Code Quality - Consolidation Opportunity (Minor)

Location: tree_diagram.py:472-499

The PR summary mentions consolidating get_left_contour and get_right_contour into a single get_contour() function, which is great! However, the implementation could be slightly cleaner:

def get_contour(node_id: str, mod_sum: float, get_leftmost: bool) -> dict[int, float]:
    contour: dict[int, float] = {}
    node_depth = depths[node_id]
    contour[node_depth] = prelim_x[node_id] + mod_sum
    
    children = children_map.get(node_id, [])
    if children:
        child_contours = [get_contour(child, mod_sum + modifier[node_id], get_leftmost) for child in children]
        for child_contour in child_contours:
            for depth, x in child_contour.items():
                if (
                    depth not in contour
                    or (get_leftmost and x < contour[depth])
                    or (not get_leftmost and x > contour[depth])
                ):
                    contour[depth] = x
    return contour

Suggestion: The conditional logic on lines 493-497 could be simplified. Consider extracting the comparison logic:

def should_update_contour(new_x: float, current_x: float, get_leftmost: bool) -> bool:
    return get_leftmost and new_x < current_x or not get_leftmost and new_x > current_x

# Then use:
if depth not in contour or should_update_contour(x, contour[depth], get_leftmost):
    contour[depth] = x

This is a minor readability improvement - the current code is correct and clear enough.

2. Potential Bug - Spacing Inconsistency (Medium)

Location: tree_diagram.py:395-407

There's an inconsistency in how spacing is calculated between orientations:

# Lines 395-398
default_vertical = node_height + self.DEFAULT_VERTICAL_GAP
default_horizontal = node_width + self.DEFAULT_HORIZONTAL_GAP
self.vertical_spacing = vertical_spacing if vertical_spacing is not None else default_vertical
self.horizontal_spacing = horizontal_spacing if horizontal_spacing is not None else default_horizontal

# Lines 401-407
if self.orientation == "LR":
    self.depth_spacing = node_width + self.DEFAULT_LR_DEPTH_GAP  # Uses DEFAULT_LR_DEPTH_GAP (1.0)
else:
    self.depth_spacing = self.vertical_spacing  # Uses vertical_spacing (node_height + 0.8)

Issue: In LR orientation, depth_spacing uses DEFAULT_LR_DEPTH_GAP = 1.0, but in TB orientation it reuses vertical_spacing which includes DEFAULT_VERTICAL_GAP = 0.8. This asymmetry might cause inconsistent spacing between orientations.

Recommendation: Consider either:

Using consistent gap values (DEFAULT_DEPTH_GAP = 1.0 for both), or
Documenting why LR needs more space (e.g., for connection line routing)

3. Test Quality - Missing Edge Case (Minor)

Location: test_tree_diagram.py

The test suite is comprehensive, but there's one missing edge case:

Missing: Test for trees with None in children list (though the code handles it via .get("children", []) or [])

Suggestion: Add a test like:

def test_none_children_treated_as_leaf(self):
    """Test that None children value is treated as empty list."""
    tree_structure = {
        "root": {
            "header": "Root",
            "percent": 5.0,
            "value1": "00K",
            "value2": "5K",
            "children": None,  # Explicit None instead of [] or missing
        },
    }
    grid = TreeGrid(tree_structure=tree_structure, node_class=SimpleTreeNode)
    ax = grid.render()
    assert len(ax.patches) == 2  # Should render as leaf node

4. Code Quality - Magic Number Still Present (Minor)

Location: revenue_tree.py:80

Line 80 contains a magic number:

period_count = 2
if len(df.index) == period_count:

Recommendation: While this is named as a constant locally, consider making it a module-level constant if this value is significant to the domain logic, or add a comment explaining why exactly 2 periods requires special handling.

5. Documentation - Algorithm Complexity (Minor)

Location: tree_diagram.py:419

The docstring mentions O(n) complexity, which is excellent! However, it would be helpful to explain why it's O(n):

"""Compute node positions using Reingold-Tilford algorithm with contour tracking.

Time Complexity: O(n) where n is the number of nodes. Each node is visited exactly
once in the post-order pass and once in the pre-order pass. Contour merging at each
node processes only the depths present in its subtrees.
"""

🔒 Security

No security concerns identified. The code:

Properly validates inputs (empty tree_structure, invalid orientation, invalid node_class)
Uses safe dictionary operations with .get() and proper default handling
No external input processing or file I/O operations
No SQL, command injection, or XSS risks

⚡ Performance

Excellent performance characteristics:

✅ O(n) layout algorithm as claimed
✅ Proper use of dictionaries for O(1) lookups
✅ No nested loops over large datasets
✅ Efficient contour tracking with depth-based dictionaries

Minor optimization opportunity: Lines 490-498 iterate over child contours. For very wide trees (many children), this could be optimized with a single pass, but this is likely premature optimization for typical use cases.

🧪 Test Coverage

Outstanding test coverage (91.6% based on PR description):

✅ Comprehensive coverage includes:

Base class instantiation validation
Node rendering with various data configurations
Color threshold testing at boundaries and ranges
Missing required keys validation
Layout algorithm correctness (centering, spacing, depths)
Both orientations (LR and TB)
Edge cases (single node, wide trees, deep trees, asymmetric trees)
Integration tests with both node types

✅ Tests follow best practices:

Use pytest.mark.parametrize effectively (lines 248-257, 282-285, 567-586, 682-690)
Proper fixtures for test data reuse
Realistic retail domain data
Focus on behavior, not implementation
Clear, descriptive test names

Suggestions:

Add test for children: None case (mentioned above)
Consider adding a performance test that verifies O(n) complexity for large trees (e.g., 100+ nodes)

📋 Best Practices Compliance

✅ Follows CLAUDE.md guidelines:

Google-style docstrings with proper types
Type annotations throughout
Snake_case for functions/variables, CamelCase for classes
Max line length respected (120 chars)
Double quotes for strings
Proper use of is not None instead of truthiness checks
Constants extracted with descriptive names
Vectorized operations where applicable (numpy/matplotlib)

✅ Clean git history:

Conventional commits used
Logical commit progression
Clear commit messages

🎯 Recommendations

High Priority

Review spacing inconsistency between LR and TB orientations (Issue Mkdocs Deploy #2 above)

Medium Priority

Document why LR needs DEFAULT_LR_DEPTH_GAP = 1.0 vs TB's 0.8
Add test for children: None edge case

Low Priority (Nice-to-have)

Consider the contour comparison logic simplification (Issue Removed missing extensions from mkdocs.yml #1)
Add complexity explanation to algorithm docstring (Issue Testing pipeline Trigger #5)
Explain or extract the period_count = 2 magic number (Issue Mkdocs pipeline #4)

✨ Conclusion

This is high-quality work that successfully implements a non-trivial algorithm with excellent code organization, comprehensive testing, and strong adherence to best practices. The issues identified are minor and don't block merging.

Recommendation: APPROVE with minor suggestions

The spacing inconsistency (Issue #2) is the only item I'd suggest addressing before merge, as it could cause unexpected visual differences between orientations. All other items are nice-to-haves that could be addressed in follow-up PRs.

Great job on this implementation! The Reingold-Tilford algorithm is well-executed and the test coverage is exemplary. 🎉

codecov · 2025-11-20T09:08:24Z

Codecov Report

❌ Patch coverage is 93.61702% with 12 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
pyretailscience/plots/tree_diagram.py	93.33%	8 Missing and 4 partials ⚠️

Files with missing lines	Coverage Δ
pyretailscience/analysis/revenue_tree.py	`99.13% <100.00%> (+0.01%)`	⬆️
pyretailscience/plots/tree_diagram.py	`95.91% <93.33%> (-2.85%)`	⬇️

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

09hanvaik and others added 5 commits November 17, 2025 12:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feature/autolayout improvements#412

Feature/autolayout improvements#412
09hanvaik wants to merge 5 commits into
mainfrom
feature/autolayout-improvements

09hanvaik commented Nov 20, 2025

Uh oh!

claude Bot commented Nov 20, 2025 •

edited

Loading

Uh oh!

claude Bot commented Nov 20, 2025

Uh oh!

codecov Bot commented Nov 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

09hanvaik commented Nov 20, 2025

Uh oh!

claude Bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Summary

Uh oh!

claude Bot commented Nov 20, 2025

Code Review Summary

✅ Strengths

Algorithm Implementation

Code Quality

Test Coverage

Architecture

🔍 Issues Found

1. Code Quality - Consolidation Opportunity (Minor)

2. Potential Bug - Spacing Inconsistency (Medium)

3. Test Quality - Missing Edge Case (Minor)

4. Code Quality - Magic Number Still Present (Minor)

5. Documentation - Algorithm Complexity (Minor)

🔒 Security

⚡ Performance

🧪 Test Coverage

📋 Best Practices Compliance

🎯 Recommendations

High Priority

Medium Priority

Low Priority (Nice-to-have)

✨ Conclusion

Uh oh!

codecov Bot commented Nov 20, 2025

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Nov 20, 2025 •

edited

Loading