Performance regression in "spellcheck" string processing benchmark #50458

Closed
@KristofferC

The benchmark at https://github.com/JuliaCI/BaseBenchmarks.jl/blob/master/src/problem/SpellCheck.jl regressed on 1.10 compared to 1.9: it takes roughly 3.7x as long and makes about 5.6x as many allocations. A repro that can be copy-pasted is

using Downloads
spellcheck = "https://raw.githubusercontent.com/JuliaCI/BaseBenchmarks.jl/master/src/problem/data/norvig_spellcheck.txt"
mkpath("data")
Downloads.download(spellcheck, joinpath("data", "norvig_spellcheck.txt"))

module ProblemBenchmarks
    using Downloads
    const PROBLEM_DATA_DIR = joinpath(dirname(@__FILE__), "data")
    file = Downloads.download("https://raw.githubusercontent.com/JuliaCI/BaseBenchmarks.jl/master/src/problem/SpellCheck.jl")
    include(file)
end

using BenchmarkTools

@btime ProblemBenchmarks.SpellCheck.perf_spellcheck()

This gives

1.324 s (23983215 allocations: 1.49 GiB) # 1.9
4.900 s (133224596 allocations: 4.76 GiB) # 1.10

Quickly looking at a profile, this looks suspicious:

 13╎    ╎    ╎    ╎    ╎    ╎    ╎   2154 none:0; (::Main.ProblemBenchmarks.SpellCheck.var"#4#9")(::Tuple{Tuple{String, String}, Char})
 18╎    ╎    ╎    ╎    ╎    ╎    ╎    2057 @Base/strings/substring.jl:225; string
489╎    ╎    ╎    ╎    ╎    ╎    ╎     489  @Base/strings/substring.jl:229; _string(::String, ::Vararg{Union{Char, SubString{String}, String, Symbol}})
156╎    ╎    ╎    ╎    ╎    ╎    ╎     156  @Base/strings/substring.jl:231; _string(::String, ::Vararg{Union{Char, SubString{String}, String, Symbol}})
  3╎    ╎    ╎    ╎    ╎    ╎    ╎     555  @Base/strings/substring.jl:243; _string(::String, ::Vararg{Union{Char, SubString{String}, String, Symbol}})
551╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 552  @Base/tuple.jl:72; iterate(t::Tuple{String, Vararg{Union{Char, SubString{String}, String, Symbol}}}, i::Int64)
165╎    ╎    ╎    ╎    ╎    ╎    ╎     165  @Base/strings/substring.jl:246; _string(::String, ::Vararg{Union{Char, SubString{String}, String, Symbol}})
  1╎    ╎    ╎    ╎    ╎    ╎    ╎     593  @Base/strings/substring.jl:254; _string(::String, ::Vararg{Union{Char, SubString{String}, String, Symbol}})
586╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 592  @Base/tuple.jl:72; iterate(t::Tuple{String, Vararg{Union{Char, SubString{String}, String, Symbol}}}, i::Int64)
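
To narrow this down without the full benchmark, the hot call can be isolated. The sketch below assumes, based only on the signatures in the profile, that the closure concatenates a `String`, a `Char`, and a `String` through `string`. The names `concat` and `run_many` are hypothetical helpers, not part of BaseBenchmarks.

```julia
# Hypothetical stand-in for the closure in the profile. It is assumed to
# build a new String from (String, Char, String), as the signatures suggest.
concat(a::String, c::Char, b::String) = string(a, c, b)

# Call `concat` n times and sum the result sizes so the calls are not
# optimized away.
function run_many(a::String, c::Char, b::String, n::Int)
    total = 0
    for _ in 1:n
        total += sizeof(concat(a, c, b))
    end
    return total
end

run_many("spel", 'l', "ing", 1)  # warm-up call so compilation is not measured

n = 100_000
bytes = @allocated run_many("spel", 'l', "ing", n)
println("bytes allocated per call: ", bytes / n)
```

Running this on 1.9 and 1.10 and comparing the per-call figure should show whether the extra allocations come from `string` itself. `@btime` from the repro above could be used in place of `@allocated` to compare timings as well.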

In 1.9, there seems to be way less time spent in that part:

   24╎    ╎    ╎    ╎    ╎    ╎    ╎   201   @Base/array.jl:0; (::Main.ProblemBenchmarks.SpellCheck.var"#4#9")(::Tuple{Tuple{String, String}, Char})
     ╎    ╎    ╎    ╎    ╎    ╎    ╎    63    @Base/strings/substring.jl:237; string(::String, ::Char, ::Vararg{Union{Char, SubString{String}, String, Symbol}})
   63╎    ╎    ╎    ╎    ╎    ╎    ╎     63    @Base/strings/string.jl:90; _string_n
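
The two profiles point at different source lines in `substring.jl` for the same call, so it may help to check which method each version selects. This is a minimal sketch using reflection functions from `Base`. The argument types `(String, Char, String)` are an assumption taken from the profile signatures, and the printed method and location will differ between Julia versions.

```julia
# Argument types assumed from the profile: (String, Char, String).
argtypes = Tuple{String, Char, String}

# Method that `string` dispatches to for these argument types.
m = which(string, argtypes)
println(m)

# Inferred return type(s); anything other than String would hint at
# an inference problem in the call.
println(Base.return_types(string, argtypes))
```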

Labels: performance, regression, strings
