Skip to content

Invalid read of size 4 at 0x4F8F2D8: incr_sub_inclusive_time (NYTProf.xs:2242) #227

@jsoref

Description

@jsoref

Reproduced using 71b1fd5

Running on ubuntu jammy arm

Linux lima-rancher-desktop 6.6.71-0-virt #1-Alpine SMP PREEMPT_DYNAMIC 2025-01-10 14:56:02 aarch64 aarch64 aarch64 GNU/Linux
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy

ii  perl           5.34.0-3ubuntu1.3 arm64        Larry Wall's Practical Extraction and Report Language
ii  valgrind       1:3.18.1-1ubuntu2 arm64        instrumentation framework for building dynamic analysis tools
valgrind output
$ NYTPROF=addpid=1 valgrind `which perl` -d -d:NYTProf -T -w -Ilib t/UnknownWordSplitter.t
==14227== Memcheck, a memory error detector
==14227== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==14227== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==14227== Command: /usr/bin/perl -d -d:NYTProf -T -w -Ilib t/UnknownWordSplitter.t
==14227== 
1..63
ok 1 - use CheckSpelling::UnknownWordSplitter;
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10
ok 11
ok 12
ok 13
ok 14
ok 15
ok 16
ok 17
ok 18
ok 19
ok 20
ok 21
ok 22
ok 23
ok 24
ok 25
ok 26
ok 27
ok 28
ok 29
ok 30
ok 31
ok 32
ok 33
ok 34
ok 35
ok 36
ok 37
ok 38
ok 39
ok 40
ok 41
ok 42
ok 43
ok 44
ok 45
ok 46
ok 47
ok 48
ok 49
ok 50
ok 51
ok 52
ok 53
==14227== Invalid read of size 4
==14227==    at 0x4F8F2D8: incr_sub_inclusive_time (NYTProf.xs:2242)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x80fb740 is 38,752 bytes inside a block of size 38,920 free'd
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x21E5CB: Perl_sv_inc (in /usr/bin/perl)
==14227==    by 0x4F8F287: incr_sub_inclusive_time (NYTProf.xs:2226)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x24277B: Perl_pp_regcomp (in /usr/bin/perl)
==14227==    by 0x4F94E0F: pp_subcall_profiler (NYTProf.xs:2719)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
==14227== Invalid read of size 4
==14227==    at 0x4F8F3F4: incr_sub_inclusive_time (NYTProf.xs:2259)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x80fb6cc is 38,636 bytes inside a block of size 38,920 free'd
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x21E5CB: Perl_sv_inc (in /usr/bin/perl)
==14227==    by 0x4F8F287: incr_sub_inclusive_time (NYTProf.xs:2226)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x24277B: Perl_pp_regcomp (in /usr/bin/perl)
==14227==    by 0x4F94E0F: pp_subcall_profiler (NYTProf.xs:2719)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
==14227== Invalid read of size 4
==14227==    at 0x4F8EDB8: subr_entry_destroy (NYTProf.xs:2057)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x80fb6c8 is 38,632 bytes inside a block of size 38,920 free'd
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x21E5CB: Perl_sv_inc (in /usr/bin/perl)
==14227==    by 0x4F8F287: incr_sub_inclusive_time (NYTProf.xs:2226)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x24277B: Perl_pp_regcomp (in /usr/bin/perl)
==14227==    by 0x4F94E0F: pp_subcall_profiler (NYTProf.xs:2719)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
==14227== Invalid read of size 8
==14227==    at 0x4F8EE68: subr_entry_destroy (NYTProf.xs:2069)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x80fb730 is 38,736 bytes inside a block of size 38,920 free'd
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x21E5CB: Perl_sv_inc (in /usr/bin/perl)
==14227==    by 0x4F8F287: incr_sub_inclusive_time (NYTProf.xs:2226)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x24277B: Perl_pp_regcomp (in /usr/bin/perl)
==14227==    by 0x4F94E0F: pp_subcall_profiler (NYTProf.xs:2719)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
==14227== Invalid write of size 8
==14227==    at 0x4F8EE78: subr_entry_destroy (NYTProf.xs:2071)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x80fb730 is 38,736 bytes inside a block of size 38,920 free'd
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x21E5CB: Perl_sv_inc (in /usr/bin/perl)
==14227==    by 0x4F8F287: incr_sub_inclusive_time (NYTProf.xs:2226)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x24277B: Perl_pp_regcomp (in /usr/bin/perl)
==14227==    by 0x4F94E0F: pp_subcall_profiler (NYTProf.xs:2719)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
==14227== Invalid read of size 8
==14227==    at 0x4F8EE7C: subr_entry_destroy (NYTProf.xs:2073)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x80fb758 is 38,776 bytes inside a block of size 38,920 free'd
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x21E5CB: Perl_sv_inc (in /usr/bin/perl)
==14227==    by 0x4F8F287: incr_sub_inclusive_time (NYTProf.xs:2226)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x24277B: Perl_pp_regcomp (in /usr/bin/perl)
==14227==    by 0x4F94E0F: pp_subcall_profiler (NYTProf.xs:2719)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
==14227== Invalid write of size 8
==14227==    at 0x4F8EE8C: subr_entry_destroy (NYTProf.xs:2075)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x80fb758 is 38,776 bytes inside a block of size 38,920 free'd
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x21E5CB: Perl_sv_inc (in /usr/bin/perl)
==14227==    by 0x4F8F287: incr_sub_inclusive_time (NYTProf.xs:2226)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x24277B: Perl_pp_regcomp (in /usr/bin/perl)
==14227==    by 0x4F94E0F: pp_subcall_profiler (NYTProf.xs:2719)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
==14227== Invalid read of size 8
==14227==    at 0x4F8EE94: subr_entry_destroy (NYTProf.xs:2077)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x80fb6d8 is 38,648 bytes inside a block of size 38,920 free'd
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x21E5CB: Perl_sv_inc (in /usr/bin/perl)
==14227==    by 0x4F8F287: incr_sub_inclusive_time (NYTProf.xs:2226)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x486A190: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFF8F: Perl_safesysrealloc (in /usr/bin/perl)
==14227==    by 0x23BAD3: Perl_save_alloc (in /usr/bin/perl)
==14227==    by 0x1E6517: Perl_mg_get (in /usr/bin/perl)
==14227==    by 0x24277B: Perl_pp_regcomp (in /usr/bin/perl)
==14227==    by 0x4F94E0F: pp_subcall_profiler (NYTProf.xs:2719)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
Attempt to free unreferenced scalar: SV 0x7fcac80, Perl interpreter: 0x4b26040 at lib/CheckSpelling/UnknownWordSplitter.pm line 370, <FILE> in /tmp/YlXQd2Buf_ - line 1.
==14227== Invalid write of size 4
==14227==    at 0x4F94A28: memcpy (string_fortified.h:29)
==14227==    by 0x4F94A28: subr_entry_setup (NYTProf.xs:2478)
==14227==    by 0x4F94DE3: pp_subcall_profiler (NYTProf.xs:2709)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x8131690 is 0 bytes inside a block of size 12 free'd
==14227==    at 0x4867AD0: free (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x21417F: Perl_sv_clear (in /usr/bin/perl)
==14227==    by 0x212C9B: Perl_sv_free2 (in /usr/bin/perl)
==14227==    by 0x4F8EE77: subr_entry_destroy (NYTProf.xs:2070)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227==    by 0x20ADD7: Perl_pp_leavesub (in /usr/bin/perl)
==14227==    by 0x4F92023: pp_leave_profiler (NYTProf.xs:2943)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x177FCF: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x4865058: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFDBF: Perl_safesysmalloc (in /usr/bin/perl)
==14227==    by 0x20B1AF: Perl_sv_grow (in /usr/bin/perl)
==14227==    by 0x21CD37: Perl_sv_setpvn (in /usr/bin/perl)
==14227==    by 0x4F94B3F: subr_entry_setup (NYTProf.xs:2567)
==14227==    by 0x4F94DE3: pp_subcall_profiler (NYTProf.xs:2709)
==14227==    by 0x178343: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==    by 0x1E288B: Perl_warner (in /usr/bin/perl)
==14227==    by 0x4F8EE8B: subr_entry_destroy (NYTProf.xs:2074)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227== 
==14227== Invalid write of size 1
==14227==    at 0x4F94A2C: memcpy (string_fortified.h:29)
==14227==    by 0x4F94A2C: subr_entry_setup (NYTProf.xs:2478)
==14227==    by 0x4F94DE3: pp_subcall_profiler (NYTProf.xs:2709)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x8131694 is 4 bytes inside a block of size 12 free'd
==14227==    at 0x4867AD0: free (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x21417F: Perl_sv_clear (in /usr/bin/perl)
==14227==    by 0x212C9B: Perl_sv_free2 (in /usr/bin/perl)
==14227==    by 0x4F8EE77: subr_entry_destroy (NYTProf.xs:2070)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227==    by 0x20ADD7: Perl_pp_leavesub (in /usr/bin/perl)
==14227==    by 0x4F92023: pp_leave_profiler (NYTProf.xs:2943)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x177FCF: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x4865058: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFDBF: Perl_safesysmalloc (in /usr/bin/perl)
==14227==    by 0x20B1AF: Perl_sv_grow (in /usr/bin/perl)
==14227==    by 0x21CD37: Perl_sv_setpvn (in /usr/bin/perl)
==14227==    by 0x4F94B3F: subr_entry_setup (NYTProf.xs:2567)
==14227==    by 0x4F94DE3: pp_subcall_profiler (NYTProf.xs:2709)
==14227==    by 0x178343: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==    by 0x1E288B: Perl_warner (in /usr/bin/perl)
==14227==    by 0x4F8EE8B: subr_entry_destroy (NYTProf.xs:2074)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227== 
==14227== Invalid write of size 1
==14227==    at 0x486D0C4: __GI_memcpy (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x4F94A43: memcpy (string_fortified.h:29)
==14227==    by 0x4F94A43: subr_entry_setup (NYTProf.xs:2479)
==14227==    by 0x4F94DE3: pp_subcall_profiler (NYTProf.xs:2709)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x8131695 is 5 bytes inside a block of size 12 free'd
==14227==    at 0x4867AD0: free (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x21417F: Perl_sv_clear (in /usr/bin/perl)
==14227==    by 0x212C9B: Perl_sv_free2 (in /usr/bin/perl)
==14227==    by 0x4F8EE77: subr_entry_destroy (NYTProf.xs:2070)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227==    by 0x20ADD7: Perl_pp_leavesub (in /usr/bin/perl)
==14227==    by 0x4F92023: pp_leave_profiler (NYTProf.xs:2943)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x177FCF: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x4865058: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFDBF: Perl_safesysmalloc (in /usr/bin/perl)
==14227==    by 0x20B1AF: Perl_sv_grow (in /usr/bin/perl)
==14227==    by 0x21CD37: Perl_sv_setpvn (in /usr/bin/perl)
==14227==    by 0x4F94B3F: subr_entry_setup (NYTProf.xs:2567)
==14227==    by 0x4F94DE3: pp_subcall_profiler (NYTProf.xs:2709)
==14227==    by 0x178343: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==    by 0x1E288B: Perl_warner (in /usr/bin/perl)
==14227==    by 0x4F8EE8B: subr_entry_destroy (NYTProf.xs:2074)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227== 
==14227== Invalid read of size 1
==14227==    at 0x48700D8: __memcpy_chk (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x4F8F0EF: memcpy (string_fortified.h:29)
==14227==    by 0x4F8F0EF: incr_sub_inclusive_time (NYTProf.xs:2170)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x813169c is 0 bytes after a block of size 12 free'd
==14227==    at 0x4867AD0: free (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x21417F: Perl_sv_clear (in /usr/bin/perl)
==14227==    by 0x212C9B: Perl_sv_free2 (in /usr/bin/perl)
==14227==    by 0x4F8EE77: subr_entry_destroy (NYTProf.xs:2070)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227==    by 0x20ADD7: Perl_pp_leavesub (in /usr/bin/perl)
==14227==    by 0x4F92023: pp_leave_profiler (NYTProf.xs:2943)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x177FCF: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x4865058: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFDBF: Perl_safesysmalloc (in /usr/bin/perl)
==14227==    by 0x20B1AF: Perl_sv_grow (in /usr/bin/perl)
==14227==    by 0x21CD37: Perl_sv_setpvn (in /usr/bin/perl)
==14227==    by 0x4F94B3F: subr_entry_setup (NYTProf.xs:2567)
==14227==    by 0x4F94DE3: pp_subcall_profiler (NYTProf.xs:2709)
==14227==    by 0x178343: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==    by 0x1E288B: Perl_warner (in /usr/bin/perl)
==14227==    by 0x4F8EE8B: subr_entry_destroy (NYTProf.xs:2074)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227== 
==14227== Invalid read of size 1
==14227==    at 0x48700E8: __memcpy_chk (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x4F8F0EF: memcpy (string_fortified.h:29)
==14227==    by 0x4F8F0EF: incr_sub_inclusive_time (NYTProf.xs:2170)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0x813169a is 10 bytes inside a block of size 12 free'd
==14227==    at 0x4867AD0: free (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x21417F: Perl_sv_clear (in /usr/bin/perl)
==14227==    by 0x212C9B: Perl_sv_free2 (in /usr/bin/perl)
==14227==    by 0x4F8EE77: subr_entry_destroy (NYTProf.xs:2070)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227==    by 0x20ADD7: Perl_pp_leavesub (in /usr/bin/perl)
==14227==    by 0x4F92023: pp_leave_profiler (NYTProf.xs:2943)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x177FCF: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==  Block was alloc'd at
==14227==    at 0x4865058: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-arm64-linux.so)
==14227==    by 0x1DFDBF: Perl_safesysmalloc (in /usr/bin/perl)
==14227==    by 0x20B1AF: Perl_sv_grow (in /usr/bin/perl)
==14227==    by 0x21CD37: Perl_sv_setpvn (in /usr/bin/perl)
==14227==    by 0x4F94B3F: subr_entry_setup (NYTProf.xs:2567)
==14227==    by 0x4F94DE3: pp_subcall_profiler (NYTProf.xs:2709)
==14227==    by 0x178343: Perl_call_sv (in /usr/bin/perl)
==14227==    by 0x1E1873: ??? (in /usr/bin/perl)
==14227==    by 0x1E2237: Perl_vwarner (in /usr/bin/perl)
==14227==    by 0x1E288B: Perl_warner (in /usr/bin/perl)
==14227==    by 0x4F8EE8B: subr_entry_destroy (NYTProf.xs:2074)
==14227==    by 0x23FD0B: Perl_leave_scope (in /usr/bin/perl)
==14227== 
==14227== Use of uninitialised value of size 8
==14227==    at 0x212DA0: ??? (in /usr/bin/perl)
==14227==    by 0x213C5B: Perl_sv_clear (in /usr/bin/perl)
==14227==    by 0x212C9B: Perl_sv_free2 (in /usr/bin/perl)
==14227==    by 0x4F8EE8B: subr_entry_destroy (NYTProf.xs:2074)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227== 
==14227== Invalid read of size 1
==14227==    at 0x212DA0: ??? (in /usr/bin/perl)
==14227==    by 0x213C5B: Perl_sv_clear (in /usr/bin/perl)
==14227==    by 0x212C9B: Perl_sv_free2 (in /usr/bin/perl)
==14227==    by 0x4F8EE8B: subr_entry_destroy (NYTProf.xs:2074)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  Address 0xff00000012 is not stack'd, malloc'd or (recently) free'd
==14227== 
==14227== 
==14227== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==14227==  Access not within mapped region at address 0xFF00000012
==14227==    at 0x212DA0: ??? (in /usr/bin/perl)
==14227==    by 0x213C5B: Perl_sv_clear (in /usr/bin/perl)
==14227==    by 0x212C9B: Perl_sv_free2 (in /usr/bin/perl)
==14227==    by 0x4F8EE8B: subr_entry_destroy (NYTProf.xs:2074)
==14227==    by 0x4F8F40F: incr_sub_inclusive_time (NYTProf.xs:2262)
==14227==    by 0x4F9531F: pp_subcall_profiler (NYTProf.xs:2915)
==14227==    by 0x1F8613: Perl_runops_standard (in /usr/bin/perl)
==14227==    by 0x179A87: perl_run (in /usr/bin/perl)
==14227==    by 0x151F37: main (in /usr/bin/perl)
==14227==  If you believe this happened as a result of a stack
==14227==  overflow in your program's main thread (unlikely but
==14227==  possible), you can try to increase the size of the
==14227==  main thread stack using the --main-stacksize= flag.
==14227==  The main thread stack size used in this run was 8388608.
==14227== 
==14227== HEAP SUMMARY:
==14227==     in use at exit: 16,987,198 bytes in 84,617 blocks
==14227==   total heap usage: 246,803 allocs, 162,186 frees, 36,524,787 bytes allocated
==14227== 
==14227== LEAK SUMMARY:
==14227==    definitely lost: 0 bytes in 0 blocks
==14227==    indirectly lost: 0 bytes in 0 blocks
==14227==      possibly lost: 7,330,244 bytes in 17,063 blocks
==14227==    still reachable: 9,656,954 bytes in 67,554 blocks
==14227==                       of which reachable via heuristic:
==14227==                         newarray           : 53,728 bytes in 1,640 blocks
==14227==         suppressed: 0 bytes in 0 blocks
==14227== Rerun with --leak-check=full to see details of leaked memory
==14227== 
==14227== Use --track-origins=yes to see where uninitialised values come from
==14227== For lists of detected and suppressed errors, rerun with: -s
==14227== ERROR SUMMARY: 33 errors from 15 contexts (suppressed: 0 from 0)
Segmentation fault
t/UnknownWordSplitter.t
#!/usr/bin/env -S perl -T -w -Ilib

use 5.022;
use feature 'unicode_strings';
use strict;
use warnings;
use Encode qw/decode_utf8 FB_DEFAULT/;
use Cwd 'abs_path';
use File::Basename;
use File::Temp qw/ tempfile tempdir /;
use Capture::Tiny ':all';

use Test::More;
plan tests => 63;

use_ok('CheckSpelling::UnknownWordSplitter');

sub read_file {
    my ($file) = @_;
    local $/ = undef;
    my ($content, $output);
    if (open $output, '<:utf8', $file) {
        $content = <$output>;
        close $output;
    }
    return $content;
}

sub check_output_file {
    my ($file, $expected) = @_;
    my $content = read_file($file);
    is($content, $expected);
}

sub sort_lines {
    my ($text) = @_;
    return join "\n", (sort (split /\n/, $text));
}

sub check_output_file_sorted_lines {
    my ($file, $expected) = @_;
    is(sort_lines(read_file($file)), sort_lines($expected));
}

$ENV{splitter_timeout} = 300000;
my ($fh, $filename) = tempfile();
print $fh "foo
Mooprh
BROADDEPlay

bar";
close $fh;
is(CheckSpelling::UnknownWordSplitter::file_to_re($filename), "(?:foo)|(?:Mooprh)|(?:BROADDEPlay)|(?:bar)");
$CheckSpelling::UnknownWordSplitter::word_match = CheckSpelling::UnknownWordSplitter::valid_word();
is($CheckSpelling::UnknownWordSplitter::word_match, q<(?^u:\b(?:\w|'){3,}\b)>);
$CheckSpelling::UnknownWordSplitter::shortest=100;
$CheckSpelling::UnknownWordSplitter::longest="";
CheckSpelling::UnknownWordSplitter::load_dictionary($filename);
is(scalar %CheckSpelling::UnknownWordSplitter::dictionary, 4);
is($CheckSpelling::UnknownWordSplitter::shortest, 3);
is($CheckSpelling::UnknownWordSplitter::longest, 13);
is($CheckSpelling::UnknownWordSplitter::word_match, q<(?^u:\b(?:[A-Z]|[a-z]|'){3,13}\b)>);
$ENV{'INPUT_LONGEST_WORD'} = 5;
$ENV{'INPUT_SHORTEST_WORD'} = '';
CheckSpelling::UnknownWordSplitter::load_dictionary($filename);
is(scalar %CheckSpelling::UnknownWordSplitter::dictionary, 4);
is($CheckSpelling::UnknownWordSplitter::word_match, '(?^u:\b(?:[A-Z]|[a-z]|\'){3,5}\b)');
my $directory = tempdir();
open $fh, '>:utf8', "$directory/words";
print $fh 'bar
foo
';
close $fh;
my $output_dir;
my $dirname = tempdir();
CheckSpelling::UnknownWordSplitter::init($dirname);

open $fh, '>', "$dirname/forbidden.txt";
print $fh '# forbidden
# donut
\bdonut\b

# Flag duplicated "words"
\s([A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})\s\g{-1}\s
';
close $fh;
%CheckSpelling::UnknownWordSplitter::dictionary = ();
my $output_directory;
open(my $outputFH, '>', \$output_directory) or die; # This shouldn't fail
my $oldFH = select $outputFH;
CheckSpelling::UnknownWordSplitter::main($directory, ($filename));
select $oldFH;
ok($output_directory =~ /.*\n/);
chomp($output_directory);
ok(-d $output_directory);
check_output_file("$output_directory/name", $filename);
check_output_file("$output_directory/stats", '{words: 2, unrecognized: 1, unknown: 1, unique: 2}');
check_output_file("$output_directory/unknown", 'Play
');
check_output_file("$output_directory/warnings", ":3:8 ... 12: `Play`
");
open $fh, '>:utf8', $filename;
print $fh ("bar "x1000)."\n\n";
close $fh;
$output_dir=CheckSpelling::UnknownWordSplitter::split_file($filename);
check_output_file("$output_dir/skipped", 'average line width (4001) exceeds the threshold (1000). (minified-file)
');
open $fh, '>:utf8', $filename;
print $fh "FooBar baz Bar elf baz bar supercalifragelisticexpialidocious
FooBarBar
";
close $fh;
$CheckSpelling::UnknownWordSplitter::forbidden_re='FooBarBar';
$output_dir=CheckSpelling::UnknownWordSplitter::split_file($filename);
$CheckSpelling::UnknownWordSplitter::forbidden_re='$^';
check_output_file("$output_dir/name", $filename);
check_output_file("$output_dir/stats", '{words: 4, unrecognized: 3, unknown: 2, unique: 2}');
check_output_file_sorted_lines("$output_dir/warnings", ":1:8 ... 11: `baz`
:1:20 ... 23: `baz`
:1:16 ... 19: `elf`
:2:1 ... 10, Warning - `FooBarBar` matches a line_forbidden.patterns entry. (forbidden-pattern)
");
check_output_file("$output_dir/unknown", 'baz
elf
');

$CheckSpelling::UnknownWordSplitter::largest_file = 1;
$output_dir=CheckSpelling::UnknownWordSplitter::split_file($filename);
$CheckSpelling::UnknownWordSplitter::forbidden_re='$^';
check_output_file("$output_dir/name", $filename);
check_output_file("$output_dir/stats", undef);
check_output_file("$output_dir/skipped", "size `72` exceeds limit `1`. (large-file)
");
$CheckSpelling::UnknownWordSplitter::largest_file = 1000000;
$CheckSpelling::UnknownWordSplitter::patterns_re = 'i.';
$ENV{'INPUT_LONGEST_WORD'} = 8;
CheckSpelling::UnknownWordSplitter::load_dictionary($filename);
is(scalar %CheckSpelling::UnknownWordSplitter::dictionary, 1);
$output_dir=CheckSpelling::UnknownWordSplitter::split_file($filename);
check_output_file("$output_dir/name", $filename);
check_output_file("$output_dir/stats", '{words: 0, unrecognized: 13, unknown: 8, unique: 0}');
check_output_file_sorted_lines("$output_dir/warnings", ":1:1 ... 4: `Foo`
:1:12 ... 15: `Bar`
:1:16 ... 19: `elf`
:1:20 ... 23: `baz`
:1:24 ... 27: `bar`
:1:28 ... 36: 'supercal'
:1:38 ... 43: 'ragel'
:1:4 ... 7: `Bar`
:1:48 ... 51: 'exp'
:1:8 ... 11: `baz`
:2:1 ... 4: `Foo`
:2:4 ... 7: `Bar`
:2:7 ... 10: `Bar`
");
check_output_file("$output_dir/unknown", 'Bar
bar
baz
elf
exp
Foo
ragel
supercal
');
$CheckSpelling::UnknownWordSplitter::patterns_re = '$^';

close $fh;
open $fh, '>', "$dirname/words";
print $fh 'apple
banana
cherry
donut
egg
fruit
grape
';
close $fh;
CheckSpelling::UnknownWordSplitter::init($dirname);
($fh, $filename) = tempfile();
print $fh "banana cherry
cherry fruit fruit egg
fruit donut grape donut banana
egg \xE2\x80\x99ham
grape
";
close $fh;
$output_dir=CheckSpelling::UnknownWordSplitter::split_file($filename);
check_output_file("$output_dir/name", $filename);
check_output_file("$output_dir/stats", '{words: 9, unrecognized: 1, unknown: 1, unique: 5, forbidden: [2,1], forbidden_lines: [3:7:12,2:7:20]}');
check_output_file_sorted_lines("$output_dir/warnings", ":2:7 ... 20, Warning - ` fruit fruit ` matches a line_forbidden.patterns entry: `\\s([A-Z]{3,}|[A-Z][a-z]{2,}|[a-z]{3,})\\s\\g{-1}\\s`. (forbidden-pattern)
:3:19 ... 24, Warning - `donut` matches a line_forbidden.patterns entry: `\\bdonut\\b`. (forbidden-pattern)
:3:7 ... 12, Warning - `donut` matches a line_forbidden.patterns entry: `\\bdonut\\b`. (forbidden-pattern)
:4:6 ... 9: `ham`
");
check_output_file("$output_dir/unknown", 'ham
');
open $fh, '>', "$dirname/candidates.txt";
print $fh '# grape
grape

# pig
ham

';
close $fh;
unlink("$dirname/forbidden.txt");
CheckSpelling::UnknownWordSplitter::init($dirname);
open($outputFH, '>', \$output_directory) or die; # This shouldn't fail
$oldFH = select $outputFH;
CheckSpelling::UnknownWordSplitter::main($directory, ($filename));
select $oldFH;
ok($output_directory =~ /.*\n/);
chomp($output_directory);
ok(-d $output_directory);
check_output_file("$output_directory/stats", '{words: 13, unrecognized: 1, unknown: 1, unique: 6, candidates: [0,1], candidate_lines: [0,4:6:9], forbidden: [0,0], forbidden_lines: [0,0]}');
check_output_file_sorted_lines("$output_directory/warnings", ":4:6 ... 9: `ham`
");
check_output_file("$output_directory/unknown", 'ham
');

open $fh, '>', "$dirname/block-delimiters.list";
print $fh '# test
fruit
donut
';
close $fh;

CheckSpelling::UnknownWordSplitter::init($dirname);
open($outputFH, '>', \$output_directory) or die; # This shouldn't fail
$oldFH = select $outputFH;
CheckSpelling::UnknownWordSplitter::main($directory, ($filename));
select $oldFH;
ok($output_directory =~ /.*\n/);
chomp($output_directory);
ok(-d $output_directory);
check_output_file("$output_directory/name", $filename);
check_output_file("$output_directory/stats", '{words: 4, unrecognized: 1, unknown: 1, unique: 4, candidates: [0,1], candidate_lines: [0,4:6:9], forbidden: [0,0], forbidden_lines: [0,0]}');
check_output_file_sorted_lines("$output_directory/warnings", ":4:6 ... 9: `ham`");
check_output_file("$output_directory/unknown", 'ham
');

open $fh, '>', "$dirname/block-delimiters.list";
print $fh '# test
fruit
missing
';
close $fh;

CheckSpelling::UnknownWordSplitter::init($dirname);
open($outputFH, '>', \$output_directory) or die; # This shouldn't fail
$oldFH = select $outputFH;
CheckSpelling::UnknownWordSplitter::main($directory, ($filename));
select $oldFH;
ok($output_directory =~ /.*\n/);
chomp($output_directory);
ok(-d $output_directory);
check_output_file("$output_directory/name", $filename);
check_output_file("$output_directory/stats", '{words: 2, unrecognized: 0, unknown: 0, unique: 2, candidates: [0,0], candidate_lines: [0,0], forbidden: [0,0], forbidden_lines: [0,0]}');
check_output_file_sorted_lines("$output_directory/warnings", ":2:1 ... 1, Warning - failed to find matching end marker for `fruit` (unclosed-block-ignore-begin)
:5:1 ... 1, Warning - expected to find end block marker `missing` (unclosed-block-ignore-end)");
check_output_file("$output_directory/unknown", '');

open $fh, '>', "$dirname/block-delimiters.list";
print $fh '# test
fruit
';
close $fh;

($fh, $filename) = tempfile();
close $fh;
my $early_warnings = $filename;
$ENV{early_warnings} = $early_warnings;
my ($stdout, $stderr, @result) = capture { CheckSpelling::UnknownWordSplitter::init($dirname); };
my $warnings = $stderr;
check_output_file($early_warnings, "$dirname/block-delimiters.list:1:Block delimiters must come in pairs (uneven-block-delimiters)
");
is($warnings, "block-delimiter unmatched S: `fruit`
");

$dirname = tempdir();
($fh, $filename) = tempfile();
close $fh;
$ENV{PATH}='/usr/bin';
$ENV{INPUT_USE_MAGIC_FILE}=1;
CheckSpelling::UnknownWordSplitter::init($dirname);
$output_dir=CheckSpelling::UnknownWordSplitter::split_file($filename);
check_output_file("$output_dir/skipped", "appears to be a binary file ('inode/x-empty'). (binary-file)
");

$dirname = tempdir();
($fh, $filename) = tempfile();
print $fh "\x00"x5;
close $fh;
CheckSpelling::UnknownWordSplitter::init($dirname);
$CheckSpelling::UnknownWordSplitter::INPUT_LARGEST_FILE = 0;
$CheckSpelling::UnknownWordSplitter::INPUT_LARGEST_FILE = undef;
$output_dir=CheckSpelling::UnknownWordSplitter::split_file($filename);
check_output_file("$output_dir/skipped", "appears to be a binary file ('application/octet-stream'). (binary-file)
");

my $hunspell_dictionary_path = tempdir();
$ENV{'hunspell_dictionary_path'} = $hunspell_dictionary_path;
open $fh, '>', "$hunspell_dictionary_path/test.dic";
close $fh;
open $fh, '>', "$hunspell_dictionary_path/test.aff";
close $fh;

$dirname = tempdir();
open $fh, '>:encoding(UTF-8)', "$directory/words";
print $fh "bar
fo'od
gunz
";
close $fh;
#$ENV{'dict'} = "$directory/words";
CheckSpelling::UnknownWordSplitter::init($dirname);
($fh, $filename) = tempfile();
print $fh "bar
gunz
foad
fooo'd
fo'od
fa'ad
";
close $fh;
$output_dir=CheckSpelling::UnknownWordSplitter::split_file($filename);
lib/Check-Spelling/UnknownWordSplitter.pm
#! -*-perl-*-

# ~/bin/w
# Search for potentially misspelled words
# Output is:
# misspellled
# woord (WOORD, Woord, woord, woord's)
package CheckSpelling::UnknownWordSplitter;

use 5.022;
use feature 'unicode_strings';
use strict;
use warnings;
no warnings qw(experimental::vlb);
use Encode qw/decode_utf8 encode FB_DEFAULT/;
use File::Basename;
use Cwd 'abs_path';
use File::Temp qw/ tempfile tempdir /;
use CheckSpelling::Util;
our $VERSION='0.1.0';

my ($longest_word, $shortest_word, $word_match, $forbidden_re, $patterns_re, $candidates_re, $disable_word_collating, $check_file_names);
my $begin_block_re = '';
my @begin_block_list = ();
my @end_block_list = ();
my ($ignore_pattern, $upper_pattern, $lower_pattern, $not_lower_pattern, $not_upper_or_lower_pattern, $punctuation_pattern);
my ($shortest, $longest) = (255, 0);
my @forbidden_re_list;
my @candidates_re_list;
my $hunspell_dictionary_path;
my @hunspell_dictionaries;
my %dictionary = ();
my $base_dict;
my %unique;
my %unique_unrecognized;
my ($last_file, $words, $unrecognized) = ('', 0, 0);

my $disable_flags;

sub test_re {
  my ($expression) = @_;
  return eval { qr /$expression/ };
}

sub quote_re {
  my ($expression) = @_;
  return $expression if $expression =~ /\?\{/;
  $expression =~ s/
   \G
   (
      (?:[^\\]|\\[^Q])*
   )
   (?:
      \\Q
      (?:[^\\]|\\[^E])*
      (?:\\E)?
   )?
/
   $1 . (defined($2) ? quotemeta($2) : '')
/xge;
  return $expression;
}

sub file_to_list {
  my ($re) = @_;
  my @file;
  return @file unless open(FILE, '<:utf8', $re);

  local $/=undef;
  my $file=<FILE>;
  close FILE;
  my $line_number = 0;
  for (split /\R/, $file) {
    ++$line_number;
    next if /^#/;
    chomp;
    next unless s/^(.+)/(?:$1)/;
    my $quoted = quote_re($1);
    if (test_re $quoted) {
      push @file, $_;
    } else {
      my $error = $@;
      my $home = dirname(__FILE__);
      $error =~ s/$home.*?\.pm line \d+\./$re line $line_number (bad-regular-expression)/;
      print STDERR $error;
      push @file, '(?:\$^ - skipped because bad-regular-expression)';
    }
  }

  return @file;
}

sub list_to_re {
  my (@list) = @_;
  @list = map { my $quoted = quote_re($_); test_re($quoted) ? $quoted : '' } @list;
  @list = grep { $_ ne '' } @list;
  return '$^' unless scalar @list;
  return join "|", (@list);
}

sub file_to_re {
  my ($re) = @_;
  return list_to_re(file_to_list($re));
}

sub not_empty {
  my ($thing) = @_;
  return defined $thing && $thing ne ''
}

sub parse_block_list {
  my ($re) = @_;
  my @file;
  return @file unless (open(FILE, '<:utf8', $re));

  local $/=undef;
  my $file=<FILE>;
  my $last_line = $.;
  close FILE;
  for (split /\R/, $file) {
    next if /^#/;
    chomp;
    s/^\\#/#/;
    next unless /^./;
    push @file, $_;
  }

  my $pairs = (0+@file) / 2;
  my $true_pairs = $pairs | 0;
  unless ($pairs == $true_pairs) {
    my $early_warnings = CheckSpelling::Util::get_file_from_env('early_warnings', '/dev/null');
    open EARLY_WARNINGS, ">>:encoding(UTF-8)", $early_warnings;
    print EARLY_WARNINGS "$re:$last_line:Block delimiters must come in pairs (uneven-block-delimiters)\n";
    close EARLY_WARNINGS;
    my $i = 0;
    while ($i < $true_pairs) {
      print STDERR "block-delimiter $i S: $file[$i*2]\n";
      print STDERR "block-delimiter $i E: $file[$i*2+1]\n";
      $i++;
    }
    print STDERR "block-delimiter unmatched S: `$file[$i*2]`\n";
    @file = ();
  }

  return @file;
}

sub valid_word {
  # shortest_word is an absolute
  our ($shortest, $longest, $shortest_word, $longest_word);
  $shortest = $shortest_word if $shortest_word;
  if ($longest_word) {
    # longest_word is an absolute
    $longest = $longest_word;
  } elsif (not_empty($longest)) {
    # we allow for some sloppiness (a couple of stuck keys per word)
    # it's possible that this should scale with word length
    $longest += 2;
  }
  our ($upper_pattern, $lower_pattern, $punctuation_pattern);
  my $word_pattern = join '|', (grep { defined $_ && /./ } ($upper_pattern, $lower_pattern, $punctuation_pattern));
  $word_pattern = q<\\w|'> unless $word_pattern;
  if ((defined $shortest && not_empty($longest)) &&
      ($shortest > $longest)) {
    $word_pattern = "(?:$word_pattern){3}";
    return qr/$word_pattern/;
  }
  $shortest = 3 unless defined $shortest;
  $longest = '' unless defined $longest;
  $word_match = "(?:$word_pattern){$shortest,$longest}";
  return qr/\b$word_match\b/;
}

sub load_dictionary {
  my ($dict) = @_;
  our ($word_match, $longest, $shortest, $longest_word, $shortest_word, %dictionary);
  $longest_word = CheckSpelling::Util::get_val_from_env('INPUT_LONGEST_WORD', undef);
  $shortest_word = CheckSpelling::Util::get_val_from_env('INPUT_SHORTEST_WORD', undef);
  our ($ignore_pattern, $upper_pattern, $lower_pattern, $not_lower_pattern, $not_upper_or_lower_pattern, $punctuation_pattern);
  $ignore_pattern = CheckSpelling::Util::get_file_from_env_utf8('INPUT_IGNORE_PATTERN', q<[^a-zA-Z']>);
  $upper_pattern = CheckSpelling::Util::get_file_from_env_utf8('INPUT_UPPER_PATTERN', '[A-Z]');
  $lower_pattern = CheckSpelling::Util::get_file_from_env_utf8('INPUT_LOWER_PATTERN', '[a-z]');
  $not_lower_pattern = CheckSpelling::Util::get_file_from_env_utf8('INPUT_NOT_LOWER_PATTERN', '[^a-z]');
  $not_upper_or_lower_pattern = CheckSpelling::Util::get_file_from_env_utf8('INPUT_NOT_UPPER_OR_LOWER_PATTERN', '[^A-Za-z]');
  $punctuation_pattern = CheckSpelling::Util::get_file_from_env_utf8('INPUT_PUNCTUATION_PATTERN', q<'>);
  %dictionary = ();

  open(DICT, '<:utf8', $dict);
  while (!eof(DICT)) {
    my $word = <DICT>;
    chomp $word;
    next unless $word =~ $word_match;
    my $l = length $word;
    $longest = -1 unless not_empty($longest);
    $longest = $l if $l > $longest;
    $shortest = $l if $l < $shortest;
    $dictionary{$word}=1;
  }
  close DICT;

  $word_match = valid_word();
}

sub hunspell_dictionary {
  my ($dict) = @_;
  my $name = $dict;
  $name =~ s{/src/index/hunspell/index\.dic$}{};
  $name =~ s{.*/}{};
  my $aff = $dict;
  my $encoding;
  $aff =~ s/\.dic$/.aff/;
  if (open AFF, '<', $aff) {
    while (<AFF>) {
      next unless /^SET\s+(\S+)/;
      $encoding = $1 if ($1 !~ /utf-8/i);
      last;
    }
    close AFF;
  }
  return {
    name => $name,
    dict => $dict,
    aff => $aff,
    encoding => $encoding,
    engine => Text::Hunspell->new($aff, $dict),
  }
}

sub init {
  my ($configuration) = @_;
  our ($word_match, %unique, $patterns_re, @forbidden_re_list, $forbidden_re, @candidates_re_list, $candidates_re);
  our ($begin_block_re, @begin_block_list, @end_block_list);
  our $sandbox = CheckSpelling::Util::get_file_from_env('sandbox', '');
  our $hunspell_dictionary_path = CheckSpelling::Util::get_file_from_env('hunspell_dictionary_path', '');
  our $timeout = CheckSpelling::Util::get_val_from_env('splitter_timeout', 30);
  if ($hunspell_dictionary_path) {
    our @hunspell_dictionaries = ();
    if (eval 'use Text::Hunspell; 1') {
      my @hunspell_dictionaries_list = glob("$hunspell_dictionary_path/*.dic");
      for my $hunspell_dictionary_file (@hunspell_dictionaries_list) {
        push @hunspell_dictionaries, hunspell_dictionary($hunspell_dictionary_file);
      }
    } else {
      print STDERR "Could not load Text::Hunspell for dictionaries (hunspell-unavailable)\n";
    }
  }

  if (-e "$configuration/block-delimiters.list") {
    my @block_delimiters = parse_block_list "$configuration/block-delimiters.list";
    if (@block_delimiters) {
      @begin_block_list = ();
      @end_block_list = ();

      while (@block_delimiters) {
        my ($begin, $end) = splice @block_delimiters, 0, 2;
        push @begin_block_list, $begin;
        push @end_block_list, $end;
      }

      $begin_block_re = join '|', (map { '('.quote_re("\Q$_\E").')' } @begin_block_list);
    }
  }

  my (@patterns_re_list, %in_patterns_re_list);
  if (-e "$configuration/patterns.txt") {
    @patterns_re_list = file_to_list "$configuration/patterns.txt";
    $patterns_re = list_to_re @patterns_re_list;
    %in_patterns_re_list = map {$_ => 1} @patterns_re_list;
  } else {
    $patterns_re = undef;
  }

  if (-e "$configuration/forbidden.txt") {
    @forbidden_re_list = file_to_list "$configuration/forbidden.txt";
    $forbidden_re = list_to_re @forbidden_re_list;
  } else {
    $forbidden_re = undef;
  }

  if (-e "$configuration/candidates.txt") {
    @candidates_re_list = file_to_list "$configuration/candidates.txt";
    @candidates_re_list = map { my $quoted = quote_re($_); $in_patterns_re_list{$_} || !test_re($quoted) ? '' : $quoted } @candidates_re_list;
    $candidates_re = list_to_re @candidates_re_list;
  } else {
    $candidates_re = undef;
  }

  our $largest_file = CheckSpelling::Util::get_val_from_env('INPUT_LARGEST_FILE', 1024*1024);

  my $disable_flags = CheckSpelling::Util::get_file_from_env('INPUT_DISABLE_CHECKS', '');
  our $disable_word_collating = $disable_flags =~ /(?:^|,|\s)word-collating(?:,|\s|$)/;
  our $disable_minified_file = $disable_flags =~ /(?:^|,|\s)minified-file(?:,|\s|$)/;
  our $disable_single_line_file = $disable_flags =~ /(?:^|,|\s)single-line-file(?:,|\s|$)/;

  our $check_file_names = CheckSpelling::Util::get_file_from_env('check_file_names', '');

  our $use_magic_file = CheckSpelling::Util::get_val_from_env('INPUT_USE_MAGIC_FILE', '');

  $word_match = valid_word();

  our $base_dict = CheckSpelling::Util::get_file_from_env('dict', "$configuration/words");
  $base_dict = '/usr/share/dict/words' unless -e $base_dict;
  load_dictionary($base_dict);
}

sub split_line {
  our (%dictionary, $word_match, $disable_word_collating);
  our ($ignore_pattern, $upper_pattern, $lower_pattern, $not_lower_pattern, $not_upper_or_lower_pattern, $punctuation_pattern);
  our @hunspell_dictionaries;
  our $shortest;
  my $shortest_threshold = $shortest + 2;
  my $pattern = '.';
  # $pattern = "(?:$upper_pattern){$shortest,}|$upper_pattern(?:$lower_pattern){2,}\n";

  # https://www.fileformat.info/info/unicode/char/2019/
  my $rsqm = "\xE2\x80\x99";

  my ($words, $unrecognized) = (0, 0);
  my ($line, $unique_ref, $unique_unrecognized_ref, $unrecognized_line_items_ref) = @_;
    $line =~ s/(?:$rsqm|&apos;|&#39;|\%27|&#8217;|&#x2019;|&rsquo;|\\u2019|\x{2019}|')+/'/g;
    $line =~ s/(?:$ignore_pattern)+/ /g;
    while ($line =~ s/($upper_pattern{2,})($upper_pattern$lower_pattern{2,})/ $1 $2 /g) {}
    while ($line =~ s/((?:$lower_pattern|$punctuation_pattern)+)($upper_pattern)/$1 $2/g) {}
    for my $token (split /\s+/, $line) {
      next unless $token =~ /$pattern/;
      $token =~ s/^(?:'|$rsqm)+//g;
      $token =~ s/(?:'|$rsqm)+s?$//g;
      my $raw_token = $token;
      $token =~ s/^[^Ii]?'+(.*)/$1/;
      $token =~ s/(.*?)'+$/$1/;
      next unless $token =~ $word_match;
      if (defined $dictionary{$token}) {
        ++$words;
        $unique_ref->{$token}=1;
        next;
      }
      if (@hunspell_dictionaries) {
        my $found = 0;
        for my $hunspell_dictionary (@hunspell_dictionaries) {
          my $token_encoded = defined $hunspell_dictionary->{'encoding'} ?
            encode($hunspell_dictionary->{'encoding'}, $token) : $token;
          next unless ($hunspell_dictionary->{'engine'}->check($token_encoded));
          ++$words;
          $dictionary{$token} = 1;
          $unique_ref->{$token}=1;
          $found = 1;
          last;
        }
        next if $found;
      }
      my $key = lc $token;
      if (defined $dictionary{$key}) {
        ++$words;
        $unique_ref->{$key}=1;
        next;
      }
      unless ($disable_word_collating) {
        $key =~ s/''+/'/g;
        $key =~ s/'[sd]$// unless length $key >= $shortest_threshold;
      }
      if (defined $dictionary{$key}) {
        ++$words;
        $unique_ref->{$key}=1;
        next;
      }
      ++$unrecognized;
      $unique_unrecognized_ref->{$raw_token}=1;
      $unrecognized_line_items_ref->{$raw_token}=1;
    }
    return ($words, $unrecognized);
}

sub skip_file {
  my ($temp_dir, $reason) = @_;
  open(SKIPPED, '>:utf8', "$temp_dir/skipped");
  print SKIPPED $reason;
  close SKIPPED;
}

sub split_file {
  my ($file) = @_;
  our (
    $unrecognized, $shortest, $largest_file, $words,
    $word_match, %unique, %unique_unrecognized, $forbidden_re,
    @forbidden_re_list, $patterns_re, %dictionary,
    $begin_block_re, @begin_block_list, @end_block_list,
    $candidates_re, @candidates_re_list, $check_file_names, $use_magic_file, $disable_minified_file,
    $disable_single_line_file,
    $sandbox,
  );
  our ($ignore_pattern, $upper_pattern, $lower_pattern, $not_lower_pattern, $not_upper_or_lower_pattern, $punctuation_pattern);

  # https://www.fileformat.info/info/unicode/char/2019/
  my $rsqm = "\xE2\x80\x99";

  my @candidates_re_hits = (0) x scalar @candidates_re_list;
  my @candidates_re_lines = (0) x scalar @candidates_re_list;
  my @forbidden_re_hits = (0) x scalar @forbidden_re_list;
  my @forbidden_re_lines = (0) x scalar @forbidden_re_list;
  my $temp_dir = tempdir(DIR=>$sandbox);
  print STDERR "checking file: $file\n" if defined $ENV{'DEBUG'};
  open(NAME, '>:utf8', "$temp_dir/name");
    print NAME $file;
  close NAME;
  my $file_size = -s $file;
  if (defined $largest_file) {
    unless ($check_file_names eq $file) {
      if ($file_size > $largest_file) {
        skip_file($temp_dir, "size `$file_size` exceeds limit `$largest_file`. (large-file)\n");
        return $temp_dir;
      }
    }
  }
  if ($use_magic_file) {
    if (open(my $file_fh, '-|',
              '/usr/bin/file',
              '-b',
              '--mime',
              '-e', 'cdf',
              '-e', 'compress',
              '-e', 'csv',
              '-e', 'elf',
              '-e', 'json',
              '-e', 'tar',
              $file)) {
      my $file_kind = <$file_fh>;
      close $file_fh;
      if ($file_kind =~ /^(.*?); charset=binary/) {
        skip_file($temp_dir, "appears to be a binary file ('$1'). (binary-file)\n");
        return $temp_dir;
      }
    }
  }
  open FILE, '<', $file;
  binmode FILE;
  my $head;
  read(FILE, $head, 4096);
  $head =~ s/(?:\r|\n)+$//;
  my $dos_new_lines = () = $head =~ /\r\n/gi;
  my $unix_new_lines = () = $head =~ /\n/gi;
  my $mac_new_lines = () = $head =~ /\r/gi;
  local $/;
  if ($unix_new_lines == 0 && $mac_new_lines == 0) {
    $/ = "\n";
  } elsif ($dos_new_lines >= $unix_new_lines && $dos_new_lines >= $mac_new_lines) {
    $/ = "\r\n";
  } elsif ($mac_new_lines > $unix_new_lines) {
    $/ = "\r";
  } else {
    $/ = "\n";
  }
  seek(FILE, 0, 0);
  ($words, $unrecognized) = (0, 0);
  %unique = ();
  %unique_unrecognized = ();

  local $SIG{__WARN__} = sub {
    my $message = shift;
    $message =~ s/> line/> in $file - line/;
    chomp $message;
    print STDERR "$message\n";
  };

  open(WARNINGS, '>:utf8', "$temp_dir/warnings");
  our $timeout;
  eval {
    local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
    alarm $timeout;

    my ($current_begin_marker, $next_end_marker, $start_marker_line) = ('', '', '');
    my $offset = 0;
    LINE: while (<FILE>) {
      if ($. == 1) {
        unless ($disable_minified_file) {
          if ($file_size >= 512 && length($_) == $file_size) {
            skip_file($temp_dir, "file is a single line file. (single-line-file)\n");
            last;
          }
        }
      }
      $_ = decode_utf8($_, FB_DEFAULT);
      if (/[\x{D800}-\x{DFFF}]/) {
        skip_file($temp_dir, "file contains a UTF-16 surrogate. This is not supported. (utf16-surrogate-file)\n");
        last;
      }
      s/\R$//;
      s/^\x{FEFF}// if $. == 1;
      next unless /./;
      my $raw_line = $_;
      my $parsed_block_markers;

      # hook for custom multiline based text exclusions:
      if ($begin_block_re) {
        FIND_END_MARKER: while (1) {
          while ($next_end_marker ne '') {
            next LINE unless /\Q$next_end_marker\E/;
            s/.*?\Q$next_end_marker\E//;
            ($current_begin_marker, $next_end_marker, $start_marker_line) = ('', '', '');
            $parsed_block_markers = 1;
          }
          my @captured = (/^.*?$begin_block_re/);
          last unless (@captured);
          for my $capture (0 .. $#captured) {
            if ($captured[$capture]) {
              ($current_begin_marker, $next_end_marker, $start_marker_line) = ($begin_block_list[$capture], $end_block_list[$capture], "$.:1 ... 1");
              s/^.*?\Q$begin_block_list[$capture]\E//;
              $parsed_block_markers = 1;
              next FIND_END_MARKER;
            }
          }
        }
        next if $parsed_block_markers;
      }

      # hook for custom line based text exclusions:
      if (defined $patterns_re) {
        s/($patterns_re)/"="x length($1)/ge;
      }
      my $previous_line_state = $_;
      my $line_flagged;
      if ($forbidden_re) {
        while (s/($forbidden_re)/"="x length($1)/e) {
          $line_flagged = 1;
          my ($begin, $end, $match) = ($-[0] + 1, $+[0] + 1, $1);
          my $found_trigger_re;
          for my $i (0 .. $#forbidden_re_list) {
            my $forbidden_re_singleton = $forbidden_re_list[$i];
            my $test_line = $previous_line_state;
            if ($test_line =~ s/($forbidden_re_singleton)/"="x length($1)/e) {
              next unless $test_line eq $_;
              my ($begin_test, $end_test, $match_test) = ($-[0] + 1, $+[0] + 1, $1);
              next unless $begin == $begin_test;
              next unless $end == $end_test;
              next unless $match eq $match_test;
              $found_trigger_re = $forbidden_re_singleton;
              my $hit = "$.:$begin:$end";
              $forbidden_re_hits[$i]++;
              $forbidden_re_lines[$i] = $hit unless $forbidden_re_lines[$i];
              last;
            }
          }
          my $wrapped = CheckSpelling::Util::wrap_in_backticks($match);
          if ($found_trigger_re) {
            $found_trigger_re =~ s/^\(\?:(.*)\)$/$1/;
            my $quoted_trigger_re = CheckSpelling::Util::wrap_in_backticks($found_trigger_re);
            print WARNINGS ":$.:$begin ... $end, Warning - $wrapped matches a line_forbidden.patterns entry: $quoted_trigger_re. (forbidden-pattern)\n";
          } else {
            print WARNINGS ":$.:$begin ... $end, Warning - $wrapped matches a line_forbidden.patterns entry. (forbidden-pattern)\n";
          }
          $previous_line_state = $_;
        }
      }
      # This is to make it easier to deal w/ rules:
      s/^/ /;
      my %unrecognized_line_items = ();
      my ($new_words, $new_unrecognized) = split_line($_, \%unique, \%unique_unrecognized, \%unrecognized_line_items);
      $words += $new_words;
      $unrecognized += $new_unrecognized;
      my $line_length = length($raw_line);
      for my $token (sort CheckSpelling::Util::case_biased keys %unrecognized_line_items) {
        my $found_token = 0;
        my $raw_token = $token;
        $token =~ s/'/(?:'|\x{2019}|\&apos;|\&#39;)+/g;
        my $before;
        if ($token =~ /^$upper_pattern$lower_pattern/) {
          $before = '(?<=.)';
        } elsif ($token =~ /^$upper_pattern/) {
          $before = "(?<!$upper_pattern)";
        } else {
          $before = "(?<=$not_lower_pattern)";
        }
        my $after = ($token =~ /$upper_pattern$/) ? "(?=$not_upper_or_lower_pattern)|(?=$upper_pattern$lower_pattern)" : "(?=$not_lower_pattern)";
        while ($raw_line =~ /(?:\b|$before)($token)(?:\b|$after)/g) {
          $line_flagged = 1;
          $found_token = 1;
          my ($begin, $end, $match) = ($-[0] + 1, $+[0] + 1, $1);
          next unless $match =~ /./;
          my $wrapped = CheckSpelling::Util::wrap_in_backticks($match);
          print WARNINGS ":$.:$begin ... $end: $wrapped\n";
        }
        unless ($found_token) {
          if ($raw_line !~ /$token.*$token/ && $raw_line =~ /($token)/) {
            my ($begin, $end, $match) = ($-[0] + 1, $+[0] + 1, $1);
            print WARNINGS ":$.:$begin ... $end: '$match'\n";
          } else {
            my $offset = $line_length + 1;
            my $wrapped = CheckSpelling::Util::wrap_in_backticks($raw_token);
            print WARNINGS ":$.:1 ... $offset, Warning - Could not identify whole word $wrapped in line. (token-is-substring)\n";
          }
        }
      }
      if ($line_flagged && $candidates_re) {
        $_ = $previous_line_state;
        s/($candidates_re)/"="x length($1)/ge;
        if ($_ ne $previous_line_state) {
          $_ = $previous_line_state;
          for my $i (0 .. $#candidates_re_list) {
            my $candidate_re = $candidates_re_list[$i];
            next unless $candidate_re =~ /./ && $raw_line =~ /$candidate_re/;
            if (($_ =~ s/($candidate_re)/"="x length($1)/e)) {
              my ($begin, $end) = ($-[0] + 1, $+[0] + 1);
              my $hit = "$.:$begin:$end";
              $_ = $previous_line_state;
              my $replacements = ($_ =~ s/($candidate_re)/"="x length($1)/ge);
              $candidates_re_hits[$i] += $replacements;
              $candidates_re_lines[$i] = $hit unless $candidates_re_lines[$i];
              $_ = $previous_line_state;
            }
          }
        }
      }
      unless ($disable_minified_file) {
        s/={3,}//g;
        $offset += length;
        my $ratio = int($offset / $.);
        my $ratio_threshold = 1000;
        if ($ratio > $ratio_threshold) {
          skip_file($temp_dir, "average line width ($ratio) exceeds the threshold ($ratio_threshold). (minified-file)\n");
        }
      }
    }
    if ($next_end_marker) {
      if ($start_marker_line) {
        my $wrapped = CheckSpelling::Util::wrap_in_backticks($current_begin_marker);
        print WARNINGS ":$start_marker_line, Warning - failed to find matching end marker for $wrapped (unclosed-block-ignore-begin)\n";
      }
      my $wrapped = CheckSpelling::Util::wrap_in_backticks($next_end_marker);
      print WARNINGS ":$.:1 ... 1, Warning - expected to find end block marker $wrapped (unclosed-block-ignore-end)\n";
    }

    alarm 0;
  };
  if ($@) {
    die unless $@ eq "alarm\n";
    print WARNINGS ":$.:1 ... 1, Warning - Could not parse file within time limit. (slow-file)\n";
    skip_file($temp_dir, "could not parse file within time limit. (slow-file)\n");
  }

  close FILE;
  close WARNINGS;

  if ($unrecognized || @candidates_re_hits || @forbidden_re_hits) {
    open(STATS, '>:utf8', "$temp_dir/stats");
      print STATS "{words: $words, unrecognized: $unrecognized, unknown: ".(keys %unique_unrecognized).
      ", unique: ".(keys %unique).
      (@candidates_re_hits ? ", candidates: [".(join ',', @candidates_re_hits)."]" : "").
      (@candidates_re_lines ? ", candidate_lines: [".(join ',', @candidates_re_lines)."]" : "").
      (@forbidden_re_hits ? ", forbidden: [".(join ',', @forbidden_re_hits)."]" : "").
      (@forbidden_re_lines ? ", forbidden_lines: [".(join ',', @forbidden_re_lines)."]" : "").
      "}";
    close STATS;
    open(UNKNOWN, '>:utf8', "$temp_dir/unknown");
      print UNKNOWN map { "$_\n" } sort CheckSpelling::Util::case_biased keys %unique_unrecognized;
    close UNKNOWN;
  }

  return $temp_dir;
}

sub main {
  my ($configuration, @ARGV) = @_;
  our %dictionary;
  unless (%dictionary) {
    init($configuration);
  }

  # read all input
  my @reports;

  for my $file (@ARGV) {
    my $temp_dir = split_file($file);
    push @reports, "$temp_dir\n";
  }
  print join '', @reports;
}

1;
lib/CheckSpelling/Util.pm
#! -*-perl-*-

use v5.20;
use feature 'unicode_strings';

package CheckSpelling::Util;

use Encode qw/decode_utf8 encode_utf8 FB_DEFAULT/;
use HTTP::Date;
use feature 'signatures';
no warnings qw(experimental::signatures);

our $VERSION='0.1.0';

sub get_file_from_env {
  my ($var, $fallback) = @_;
  return $fallback unless defined $ENV{$var};
  $ENV{$var} =~ /(.*)/s;
  return $1;
}

sub get_file_from_env_utf8 {
  return decode_utf8(get_file_from_env(@_));
}

sub get_val_from_env {
  my ($var, $fallback) = @_;
  return $fallback unless defined $ENV{$var};
  $ENV{$var} =~ /^(\d+)$/;
  return $1 || $fallback;
}

sub case_biased :prototype($$) ($a, $b) {
  lc($a) cmp lc($b) || $a cmp $b;
}

sub wrap_in_backticks {
  my ($a) = @_;
  my $longest = 0;
  while ($a =~ /(`+)/g) {
    my $length = length $1;
    $longest = $length if $length > $longest;
  }
  my $q = '`'x ($longest + 1);
  return "$q$a$q";
}

1;

The code in question comes from https://github.com/check-spelling/check-spelling/tree/ff42ff6e416558ffab656fdb0a029d32737703de although it isn't identical, I've tried to remove some code that isn't relevant in an effort to rule things out.

The code in question is roughly:
this original alloc:

op = run_original_op(op_type);

and this:
sv_inc(AvARRAY(subr_call_av)[NYTP_SCi_CALL_COUNT]);

which appears to change the shape of an object

and then a number of stale references to an object:

if (subr_entry->called_cv_depth <= 1) {

NYTP_write_call_return(out, subr_entry->subr_prof_depth, called_subname_pv, incl_subr_ticks, excl_subr_ticks);

Death under:
sv_free(subr_entry->called_subnam_sv);

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions