Compiled regex MatchString(string) with $ in compile expression and Hindi characters in match string. #62295

Closed
mdonahue-godaddy opened this issue Aug 25, 2023 · 3 comments

mdonahue-godaddy commented Aug 25, 2023

What version of Go are you using (go version)?

$ go version
1.20.x - 1.21.x

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE='on'
GOARCH='arm64'
GOBIN=''
GOCACHE='[REDACTED]/Library/Caches/go-build'
GOENV='[REDACTED]/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='[REDACTED]/go/pkg/mod'
GONOPROXY='[REDACTED]'
GONOSUMDB='[REDACTED]'
GOOS='darwin'
GOPATH='[REDACTED]/go'
GOPRIVATE='[REDACTED]'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/opt/go/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/opt/go/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.0'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='0'
GOMOD='[REDACTED]/github.com/[REDACTED]/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/d9/fy99_6gj5dbb4fv8ns59wfx00000gq/T/go-build3450324136=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

The behavior reproduces on the latest macOS and in the Go Playground: https://play.golang.com/p/x6q9rbx33ag

package main

import (
	"fmt"
	"regexp"
)

const (
	DomainRegexText = `^[\p{L}\p{N}\-]{1,63}$`
)

func main() {
	domainRegex := regexp.MustCompile(DomainRegexText)
	domains := []string{"", "a", "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz-0123456789", "abcdefghijklmnopqrstuvwxyz-abcdefghijklmnopqrstuvwxyz-0123456789", "कारबीमा"}
	for _, domain := range domains {
		fmt.Printf("'%s' with len := %d\n", domain, len(domain))

		if domainRegex.MatchString(domain) {
			fmt.Println("REGEX matched!")
		} else {
			fmt.Println("REGEX did NOT match!!")
		}
	}
}

Output:

'' with len := 0
REGEX did NOT match!!
'a' with len := 1
REGEX matched!
'abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz-0123456789' with len := 63
REGEX matched!
'abcdefghijklmnopqrstuvwxyz-abcdefghijklmnopqrstuvwxyz-0123456789' with len := 64
REGEX did NOT match!!
'कारबीमा' with len := 21
REGEX did NOT match!!

What did you expect to see?

I'm not a Hindi language expert, but it seems that 'कारबीमा' should match the regex.

What did you see instead?

'कारबीमा' failed to match.

Dropping the $ from the compiled regex allows it to match, but then the length limit is no longer enforced for strings over 63 characters (as expected without the end anchor).
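
A minimal sketch of why removing `$` seems to help, assuming the same pattern as above with only the end anchor dropped. Without `$`, `MatchString` is satisfied by a matching prefix, so `FindString` can be used to see how much of the input was actually consumed. The names `unanchored` and `matchedPrefix` are illustrative and not part of the original report. Note also that `len` reports bytes, while `{1,63}` counts code points.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// unanchored is the pattern from the report with the trailing $ removed
// (an assumption made for this illustration).
var unanchored = regexp.MustCompile(`^[\p{L}\p{N}\-]{1,63}`)

// matchedPrefix returns the leading part of s that the unanchored
// pattern consumed, or "" if nothing matched.
func matchedPrefix(s string) string {
	return unanchored.FindString(s)
}

func main() {
	inputs := []string{
		"कारबीमा",
		"abcdefghijklmnopqrstuvwxyz-abcdefghijklmnopqrstuvwxyz-0123456789",
	}
	for _, s := range inputs {
		p := matchedPrefix(s)
		// Print byte length, rune count, and how many runes were matched.
		fmt.Printf("bytes=%d runes=%d matched runes=%d prefix=%q\n",
			len(s), utf8.RuneCountInString(s), utf8.RuneCountInString(p), p)
	}
}
```

In this sketch the Hindi string matches only because its first rune is a letter; the rest of the string is ignored, which is also why the 64-character string is no longer rejected.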

Contributor

fzipp commented Aug 25, 2023

Not every rune in the string "कारबीमा" is a letter, some are marks:
https://go.dev/play/p/sF5sYpZRAeW

You can match marks with \p{M}.
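
As a rough sketch of that suggestion: the program below classifies each rune of the string with the standard `unicode` package and then tries the original pattern with `\p{M}` added to the character class. The names `withMarks` and `category` are illustrative, and the extended pattern is an assumption about what the fix might look like, not a full check of internationalized domain label rules.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// withMarks is the pattern from the report with \p{M} (marks) added
// to the character class; this is one possible fix, not the only one.
var withMarks = regexp.MustCompile(`^[\p{L}\p{M}\p{N}\-]{1,63}$`)

// category names the broad Unicode category of r, limited to the
// categories that matter for this pattern.
func category(r rune) string {
	switch {
	case unicode.IsLetter(r):
		return "letter"
	case unicode.IsMark(r):
		return "mark"
	case unicode.IsNumber(r):
		return "number"
	default:
		return "other"
	}
}

func main() {
	const domain = "कारबीमा"
	// Ranging over a string yields runes (code points), not bytes.
	for _, r := range domain {
		fmt.Printf("%U %q %s\n", r, r, category(r))
	}
	fmt.Println("matches with \\p{M}:", withMarks.MatchString(domain))
}
```

The vowel signs in the string are combining marks rather than letters, so a class limited to `\p{L}` and `\p{N}` stops at the first of them.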

seankhliao closed this as not planned on Aug 25, 2023
@mdonahue-godaddy
Author

Thank you, I will look at that more closely.

Author

mdonahue-godaddy commented Aug 26, 2023

That resolved the issue. So the issue was on my side. Thanks again!

golang locked and limited conversation to collaborators on Aug 25, 2024