johnathan79717

This PR is not intended to be merged; it exists only to measure performance when the degree is reduced.

Before halving the degree:

----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
construct_proof_ultrahonk_power_of_2/15        386 ms          301 ms            2

════════════════════════════════════════════════════════════════════════════════════════════════════════════
  Benchmark Results
════════════════════════════════════════════════════════════════════════════════════════════════════════════
  ├─ sumcheck.prove                                                                    [1]  17.9 %   169.2ms    (56.40 ms x 3)
    ├─ PartiallyEvaluatedMultivariates constructor                                       [2]  56.0 %             
      ├─ spinning main thread                                                              [3]  19.3 %             
      ├─ do_iterations()                                                                   [3]  14.3 %             
      └─ (other)                                                                           [3]  66.4 %             
    ├─ sumcheck loop                                                                     [2]  33.0 %             
      ├─ compute_univariate_with_row_skipping                                              [3]  51.5 %             
        ├─ NOTE: Shared children. Can add up to > 100%.
        ├─ spinning main thread                                                              [4]  26.8 %             
        ├─ do_iterations()                                                                   [4]  24.4 %             
      ├─ spinning main thread                                                              [3]  13.7 %             
      ├─ do_iterations()                                                                   [3]  7.2  %             
      └─ (other)                                                                           [3]  27.5 %             
    ├─ rest of sumcheck round 1                                                          [2]  3.8  %             
      ├─ spinning main thread                                                              [3]  74.9 %             
      └─ do_iterations()                                                                   [3]  61.0 %             
    ├─ compute_univariate_with_row_skipping                                              [2]  3.7  %             
    └─ GateSeparatorPolynomial::compute_beta_products                                    [2]  1.4  %             
      ├─ do_iterations()                                                                   [3]  41.8 %             
      └─ (other)                                                                           [3]  58.2 %      
----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
construct_proof_ultrahonk_power_of_2/20       2012 ms         1453 ms            1

════════════════════════════════════════════════════════════════════════════════════════════════════════════
  Benchmark Results
════════════════════════════════════════════════════════════════════════════════════════════════════════════
├─ sumcheck.prove                                                                    [1]  19.6 %   289.1ms   
    ├─ sumcheck loop                                                                     [2]  56.1 %   162.2ms    (8.54 ms x 19)
      ├─ spinning main thread                                                              [3]  47.1 %             
      ├─ compute_univariate_with_row_skipping                                              [3]  47.0 %             
        ├─ NOTE: Shared children. Can add up to > 100%.
        ├─ do_iterations()                                                                   [4]  108.8%             
        └─ spinning main thread                                                              [4]  30.2 %             
      └─ do_iterations()                                                                   [3]  30.7 %             
    ├─ compute_univariate_with_row_skipping                                              [2]  15.3 %             
    ├─ PartiallyEvaluatedMultivariates constructor                                       [2]  14.4 %             
      ├─ spinning main thread                                                              [3]  40.3 %             
      ├─ do_iterations()                                                                   [3]  38.0 %             
      └─ (other)                                                                           [3]  21.7 %             
    ├─ rest of sumcheck round 1                                                          [2]  11.6 %             
      ├─ do_iterations()                                                                   [3]  99.5 %             
      └─ spinning main thread                                                              [3]  97.3 %             
    └─ GateSeparatorPolynomial::compute_beta_products                                    [2]  2.3  %             
      ├─ do_iterations()                                                                   [3]  24.5 %             
      └─ (other)                                                                           [3]  75.5 %         

After halving the degree:

----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
construct_proof_ultrahonk_power_of_2/15        386 ms          302 ms            2

════════════════════════════════════════════════════════════════════════════════════════════════════════════
  Benchmark Results
════════════════════════════════════════════════════════════════════════════════════════════════════════════
  ├─ sumcheck.prove                                                                    [1]  16.1 %   156.7ms    (52.24 ms x 3)
    ├─ PartiallyEvaluatedMultivariates constructor                                       [2]  54.0 %             
      ├─ spinning main thread                                                              [3]  14.2 %             
      ├─ do_iterations()                                                                   [3]  12.5 %             
      └─ (other)                                                                           [3]  73.4 %             
    ├─ sumcheck loop                                                                     [2]  35.0 %             
      ├─ compute_univariate_with_row_skipping                                              [3]  49.2 %             
        ├─ NOTE: Shared children. Can add up to > 100%.
        ├─ do_iterations()                                                                   [4]  22.9 %             
        ├─ spinning main thread                                                              [4]  22.3 %             
      ├─ spinning main thread                                                              [3]  10.5 %             
      ├─ do_iterations()                                                                   [3]  7.7  %             
      └─ (other)                                                                           [3]  32.7 %             
    ├─ rest of sumcheck round 1                                                          [2]  4.1  %             
      ├─ spinning main thread                                                              [3]  73.6 %             
      └─ do_iterations()                                                                   [3]  52.4 %             
    ├─ compute_univariate_with_row_skipping                                              [2]  3.9  %             
    └─ GateSeparatorPolynomial::compute_beta_products                                    [2]  1.6  %             
      ├─ do_iterations()                                                                   [3]  23.5 %             
      └─ (other)                                                                           [3]  76.5 %             
----------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations
----------------------------------------------------------------------------------
construct_proof_ultrahonk_power_of_2/20       1851 ms         1413 ms            1

════════════════════════════════════════════════════════════════════════════════════════════════════════════
  Benchmark Results
════════════════════════════════════════════════════════════════════════════════════════════════════════════
  ├─ sumcheck.prove                                                                    [1]  19.0 %   245.4ms   
    ├─ sumcheck loop                                                                     [2]  48.4 %   118.7ms    (6.25 ms x 19)
      ├─ compute_univariate_with_row_skipping                                              [3]  48.7 %             
        ├─ NOTE: Shared children. Can add up to > 100%.
        ├─ do_iterations()                                                                   [4]  112.5%             
        └─ spinning main thread                                                              [4]  27.1 %             
      ├─ spinning main thread                                                              [3]  40.8 %             
      └─ do_iterations()                                                                   [3]  32.3 %             
    ├─ PartiallyEvaluatedMultivariates constructor                                       [2]  19.2 %             
      ├─ do_iterations()                                                                   [3]  33.3 %             
      ├─ spinning main thread                                                              [3]  32.1 %             
      └─ (other)                                                                           [3]  34.6 %             
    ├─ rest of sumcheck round 1                                                          [2]  15.2 %             
      ├─ do_iterations()                                                                   [3]  99.6 %             
      └─ spinning main thread                                                              [3]  98.3 %             
    ├─ compute_univariate_with_row_skipping                                              [2]  14.4 %             
    └─ GateSeparatorPolynomial::compute_beta_products                                    [2]  2.6  %             
      └─ do_iterations()                                                                   [3]  341.8%             

@johnathan79717

Results of running relations_bench:

| Benchmark | Before (ns) | After (ns) | Change (%) | Max Degree Before | Max Degree After | Degree Reduction |
|---|---|---|---|---|---|---|
| **execute_relation_for_pg_univariates** | | | | | | |
| UltraArithmetic | 1682 | 1358 | -19.3% | 6 | 3 | 50% |
| DeltaRangeConstraint | 2001 | 1160 | -42.0% | 6 | 3 | 50% |
| Elliptic | 3031 | 1952 | -35.6% | 6 | 3 | 50% |
| Memory | 3871 | 2687 | -30.6% | 6 | 3 | 50% |
| NonNativeField | 2524 | 1880 | -25.5% | 6 | 3 | 50% |
| LogDerivLookup | 2483 | 1984 | -20.1% | 5 | 3 | 40% |
| UltraPermutation | 3525 | 2806 | -20.4% | 6 | 3 | 50% |
| Poseidon2External | 3070 | 1285 | -58.1% | 7 | 3 | 57% |
| Poseidon2Internal | 1790 | 1111 | -37.9% | 7 | 3 | 57% |
| **execute_relation_for_univariates** | | | | | | |
| UltraArithmetic | 1521 | 1219 | -19.9% | 6 | 3 | 50% |
| DeltaRangeConstraint | 2087 | 1243 | -40.4% | 6 | 3 | 50% |
| Elliptic | 2685 | 1651 | -38.5% | 6 | 3 | 50% |
| Memory | 3268 | 2117 | -35.2% | 6 | 3 | 50% |
| NonNativeField | 2309 | 1678 | -27.3% | 6 | 3 | 50% |
| LogDerivLookup | 1781 | 1336 | -25.0% | 5 | 3 | 40% |
| UltraPermutation | 1994 | 1265 | -36.6% | 6 | 3 | 50% |
| Poseidon2External | 2961 | 1374 | -53.6% | 7 | 3 | 57% |
| Poseidon2Internal | 1673 | 1027 | -38.6% | 7 | 3 | 57% |
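
The Change (%) and Degree Reduction columns follow directly from the raw timings and degrees. A minimal sketch for recomputing them is below; the helper names `percent_change` and `degree_reduction` are hypothetical and not part of relations_bench, and only three rows are copied in for illustration.

```python
def percent_change(before: float, after: float) -> float:
    """Signed relative change in percent; negative means the run got faster."""
    return (after - before) / before * 100.0


def degree_reduction(before: int, after: int) -> float:
    """Relative drop in max degree, in percent."""
    return (before - after) / before * 100.0


# (before ns, after ns, max degree before, max degree after), copied from the table.
rows = {
    "UltraArithmetic": (1682, 1358, 6, 3),
    "LogDerivLookup": (2483, 1984, 5, 3),
    "Poseidon2External": (3070, 1285, 7, 3),
}

for name, (t0, t1, d0, d1) in rows.items():
    print(f"{name}: {percent_change(t0, t1):+.1f}% time, "
          f"{degree_reduction(d0, d1):.0f}% degree")
```

Comparing the two printed percentages per row shows at a glance which relations benefit more or less than their degree reduction would suggest.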

@johnathan79717
Copy link
Contributor Author

Sumcheck.prove Degree Scaling Analysis

Executive Summary

Analysis of performance changes in sumcheck.prove when halving the degrees of relations, comparing benchmarks for construct_proof_ultrahonk_power_of_2/15 and /20.

Performance Data Tables

Circuit Size 2^15

| Component | Before (ms) | After (ms) | Change (ms) | % Change | Scales with Degree |
|---|---|---|---|---|---|
| sumcheck.prove (total) | 120.1 | 106.2 | -13.9 | -11.6% | Partial |
| sumcheck loop | 59.1 | 52.4 | -6.7 | -11.3% | Partial |
| └─ compute_univariate in loop | 42.3 | 35.9 | -6.4 | -15.1% | Partial |
|     └─ accumulate_relation_univariates | 24.8 | 19.1 | -5.7 | -23.0% | ✅ Yes |
|     └─ extend_edges | 7.3 | 7.3 | 0.0 | 0.0% | ❌ No |
|     └─ batch_over_relations | 1.3 | 0.7 | -0.6 | -46.2% | ✅ Yes |
| └─ partially_evaluate in loop | 14.6 | 14.8 | +0.2 | +1.4% | ❌ No |
| compute first univariate | 31.4 | 26.0 | -5.4 | -17.2% | Partial |
| └─ accumulate_relation_univariates | 23.7 | 18.2 | -5.5 | -23.2% | ✅ Yes |
| └─ extend_edges | 6.9 | 6.9 | 0.0 | 0.0% | ❌ No |
| └─ batch_over_relations | 0.1 | 0.2 | +0.1 | +100.0% | ✅ Yes |
| PartiallyEvaluatedMultivariates constructor | 12.4 | 12.5 | +0.1 | +0.8% | ❌ No |
| rest of sumcheck round 1 | 12.0 | 11.8 | -0.2 | -1.7% | ❌ No |
| └─ first partially_evaluate | 11.8 | 11.7 | -0.1 | -0.8% | ❌ No |
| GateSeparatorPolynomial::compute_beta_products | 0.6 | 0.6 | 0.0 | 0.0% | ❌ No |
| extend_edges in compute_virtual_contribution | 0.1 | 0.1 | 0.0 | 0.0% | ❌ No |

Circuit Size 2^20

| Component | Before (ms) | After (ms) | Change (ms) | % Change | Scales with Degree |
|---|---|---|---|---|---|
| sumcheck.prove (total) | 793.0 | 704.7 | -88.3 | -11.1% | Partial |
| sumcheck loop | 388.3 | 346.7 | -41.6 | -10.7% | Partial |
| └─ compute_univariate in loop | 290.4 | 249.1 | -41.3 | -14.2% | Partial |
|     └─ accumulate_relation_univariates | 179.8 | 127.9 | -51.9 | -28.9% | ✅ Yes |
|     └─ extend_edges | 104.3 | 111.2 | +6.9 | +6.6% | ❌ No |
|     └─ batch_over_relations | 0.5 | 0.3 | -0.2 | -40.0% | ✅ Yes |
| └─ partially_evaluate in loop | 97.1 | 97.1 | 0.0 | 0.0% | ❌ No |
| compute first univariate | 272.9 | 226.0 | -46.9 | -17.2% | Partial |
| └─ accumulate_relation_univariates | 177.7 | 128.9 | -48.8 | -27.5% | ✅ Yes |
| └─ extend_edges | 92.1 | 94.0 | +1.9 | +2.1% | ❌ No |
| └─ batch_over_relations | 0.0 | 0.1 | +0.1 | N/A | ✅ Yes |
| rest of sumcheck round 1 | 92.7 | 92.8 | +0.1 | +0.1% | ❌ No |
| └─ first partially_evaluate | 92.7 | 92.7 | 0.0 | 0.0% | ❌ No |
| PartiallyEvaluatedMultivariates constructor | 32.8 | 34.0 | +1.2 | +3.7% | ❌ No |
| GateSeparatorPolynomial::compute_beta_products | 4.8 | 4.7 | -0.1 | -2.1% | ❌ No |
| extend_edges in compute_virtual_contribution | 0.0 | 0.0 | 0.0 | 0.0% | ❌ No |

Summary Statistics

| Circuit Size | Total Time Before | Total Time After | Improvement | Degree-Dependent Work (After) | Degree-Independent Work (After) |
|---|---|---|---|---|---|
| 2^15 | 120.1ms | 106.2ms | 11.6% | ~19.8ms (18.6%) | ~86.4ms (81.4%) |
| 2^20 | 793.0ms | 704.7ms | 11.1% | ~128.2ms (18.2%) | ~576.5ms (81.8%) |
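
The split in this table appears to match the in-loop accumulate_relation_univariates and batch_over_relations rows added together, with everything else counted as degree-independent. The sketch below reproduces the figures under that assumption; `split_work` is a hypothetical helper, and note that the accumulate_relation_univariates time inside "compute first univariate" is not included in this split.

```python
# In-loop rows from the performance tables above, after halving (values in ms).
AFTER = {
    "2^15": {"total": 106.2, "accumulate": 19.1, "batch": 0.7},
    "2^20": {"total": 704.7, "accumulate": 127.9, "batch": 0.3},
}


def split_work(total: float, accumulate: float, batch: float):
    """Return (degree-dependent ms, degree-independent ms, dependent share in %)."""
    dependent = accumulate + batch
    independent = total - dependent
    return dependent, independent, dependent / total * 100.0


for size, row in AFTER.items():
    dep, indep, share = split_work(row["total"], row["accumulate"], row["batch"])
    print(f"{size}: dependent {dep:.1f} ms ({share:.1f}%), independent {indep:.1f} ms")
```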

Key Findings

Components That Scale with Degree

These components show significant performance improvements when degrees are halved:

1. accumulate_relation_univariates

  • 2^15: 24.8ms → 19.1ms (23% improvement in loop)
  • 2^20: 179.8ms → 127.9ms (29% improvement in loop)
  • Scaling: Sub-linear; halving the degree cuts time by 23-29% rather than 50%
  • Reason: Directly processes relation evaluations; fewer degree terms = less computation

2. batch_over_relations

  • 2^15: 1.3ms → 0.7ms (46% improvement)
  • 2^20: 0.5ms → 0.3ms (40% improvement)
  • Scaling: Proportional to degree
  • Reason: Batches univariate contributions; fewer terms to combine
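
One way to quantify how closely a component tracks the degree is to divide its observed time reduction by the degree reduction. The sketch below assumes a nominal 50% degree cut (the per-relation cuts in relations_bench range from 40% to 57%); `scaling_efficiency` is a hypothetical name. A value of 1.0 would mean time falls in direct proportion to degree.

```python
def scaling_efficiency(before_ms: float, after_ms: float, degree_cut: float = 0.5) -> float:
    """Observed fractional time reduction divided by the fractional degree cut."""
    observed = (before_ms - after_ms) / before_ms
    return observed / degree_cut


# In-loop accumulate_relation_univariates timings from the tables above.
for size, before, after in [("2^15", 24.8, 19.1), ("2^20", 179.8, 127.9)]:
    print(f"{size}: efficiency {scaling_efficiency(before, after):.2f}")
```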

Components That DON'T Scale with Degree

These components show little to no improvement when degrees are halved:

1. extend_edges

  • 2^15: 7.3ms → 7.3ms (0% change in loop)
  • 2^20: 104.3ms → 111.2ms (7% slower in loop)
  • Scaling: Independent of degree
  • Reason: Only extends from degree 2 to MAX_PARTIAL_RELATION_LENGTH; input is always degree 2 regardless of relation degrees

2. partially_evaluate

  • 2^15: 14.6ms → 14.8ms (1% slower)
  • 2^20: 97.1ms → 97.1ms (0% change)
  • Scaling: Independent of degree
  • Reason: Updates polynomial evaluations at challenges; workload depends on number of polynomials and circuit size, not relation degrees

3. PartiallyEvaluatedMultivariates constructor

  • 2^15: 12.4ms → 12.5ms (1% slower)
  • 2^20: 32.8ms → 34.0ms (4% slower)
  • Scaling: Independent of degree
  • Reason: Memory allocation and initialization based on circuit size, not relation degrees

4. GateSeparatorPolynomial::compute_beta_products

  • 2^15: 0.6ms → 0.6ms (0% change)
  • 2^20: 4.8ms → 4.7ms (2% improvement, within noise)
  • Scaling: Independent of degree
  • Reason: Computes products of gate challenges; depends on circuit structure, not relation degrees

5. extend_edges in compute_virtual_contribution

  • 2^15: 0.1ms → 0.1ms (0% change)
  • 2^20: 0.0ms → 0.0ms (below measurement resolution)
  • Scaling: Negligible computation regardless of degree
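
To illustrate why extend_edges and partially_evaluate do not depend on the relation degree, here is a minimal sketch of both operations. It is not taken from the codebase: it assumes partially_evaluate folds adjacent pairs of a multilinear evaluation table, as is standard in sumcheck, and that the "degree 2" input to extend_edges means an edge given by two evaluations (at 0 and 1). Plain integers stand in for field elements, and the function names only mirror the profile labels.

```python
def partially_evaluate(table: list[int], challenge: int) -> list[int]:
    """Fold one variable: new[i] = t[2i] + challenge * (t[2i+1] - t[2i]).

    Work is proportional to the table size only.
    """
    return [
        table[2 * i] + challenge * (table[2 * i + 1] - table[2 * i])
        for i in range(len(table) // 2)
    ]


def extend_edge(v0: int, v1: int, length: int) -> list[int]:
    """Extend the line through (0, v0) and (1, v1) to the points 0..length-1.

    Work is proportional to the target length only.
    """
    delta = v1 - v0
    return [v0 + i * delta for i in range(length)]


print(partially_evaluate([1, 3, 5, 9], 2))  # [5, 13]
print(extend_edge(1, 3, 5))                 # [1, 3, 5, 7, 9]
```

In this sketch neither function looks at any relation: the first scales with the number of table entries, the second with the target length and the number of edges extended.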

Overall Performance Impact

Circuit Size 2^15

  • Total sumcheck.prove time: 120.1ms → 106.2ms (12% improvement)
  • Degree-dependent work: ~26.1ms → ~19.8ms (24% improvement)
  • Degree-independent work: ~86.4ms (81% of total after optimization)

Circuit Size 2^20

  • Total sumcheck.prove time: 793.0ms → 704.7ms (11% improvement)
  • Degree-dependent work: ~180.3ms → ~128.2ms (29% improvement)
  • Degree-independent work: ~576.5ms (82% of total after optimization)
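
Taking the degree-dependent figures above at face value, an Amdahl-style bound shows how much more could be gained from degree reduction alone: even if the remaining degree-dependent work cost nothing, the speedup is capped by the degree-independent part. `max_speedup` is a hypothetical helper, not an existing function.

```python
def max_speedup(dependent_ms: float, total_ms: float) -> float:
    """Amdahl bound: speedup if the degree-dependent part took zero time."""
    fraction = dependent_ms / total_ms
    return 1.0 / (1.0 - fraction)


# Degree-dependent work and total sumcheck.prove time after halving (ms).
for size, dep, total in [("2^15", 19.8, 106.2), ("2^20", 128.2, 704.7)]:
    print(f"{size}: at most {max_speedup(dep, total):.2f}x from further degree reduction")
```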

Bottleneck Analysis

Primary Bottlenecks (Degree-Independent)

  1. extend_edges: 17-20% of in-loop compute_univariate time at 2^15, 36-45% at 2^20

    • Always extends from degree 2 polynomials
    • Could benefit from specialized handling for low-degree inputs
  2. partially_evaluate: 25-28% of sumcheck loop time

    • Polynomial bookkeeping overhead
    • Potential for memory access optimization
  3. Memory allocation: PartiallyEvaluatedMultivariates constructor

    • One-time cost but significant (4-12% of total)

Secondary Bottlenecks (Partially Degree-Dependent)

  1. accumulate_relation_univariates: Still 51-53% of in-loop compute_univariate after halving (59-62% before)
    • Improved with degree reduction but remains dominant
    • Further optimization possible through relation-specific handling
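
The extend_edges share of compute_univariate can be checked against the in-loop rows of the performance tables. A small sketch, with `share` as a hypothetical helper:

```python
def share(part_ms: float, whole_ms: float) -> float:
    """Percentage of whole_ms spent in part_ms."""
    return part_ms / whole_ms * 100.0


# (label, extend_edges ms, compute_univariate in loop ms), from the tables above.
rows = [
    ("2^15 before", 7.3, 42.3),
    ("2^15 after", 7.3, 35.9),
    ("2^20 before", 104.3, 290.4),
    ("2^20 after", 111.2, 249.1),
]

for label, part, whole in rows:
    print(f"{label}: extend_edges is {share(part, whole):.0f}% of compute_univariate")
```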

Recommendations

Some of these AI recommendations may not make sense, but I'll just leave them here.

Short-term Optimizations

  1. Optimize extend_edges for degree-2 inputs: Since the input is always degree 2, create a specialized fast path
  2. Improve partially_evaluate memory access patterns: Potential for better cache utilization
  3. Consider lazy initialization for PartiallyEvaluatedMultivariates

Long-term Architectural Considerations

  1. Separate degree-dependent and independent computations for better parallelization
  2. Investigate alternative polynomial representation that reduces bookkeeping overhead
  3. Profile memory access patterns in partially_evaluate for cache optimization

Conclusion

Halving the relation degrees provides an 11-12% improvement in total sumcheck.prove time, but 81-82% of the remaining computation is degree-independent. The primary bottlenecks are:

  • Polynomial extension (extend_edges)
  • Evaluation bookkeeping (partially_evaluate)
  • Memory management

These degree-independent operations dominate the runtime and represent the best opportunities for further optimization.
