Discussion:
Does character class %s ever match hardspace?
Dirk Laurie
2018-10-22 09:27:33 UTC
The manual says:

The definitions of letter, space, and other character groups depend on
the current locale.

Is there a standard locale in which character class %s matches the
hardspace character?
Egor Skriptunoff
2018-10-22 20:03:28 UTC
Post by Dirk Laurie
Is there a standard locale in which character class %s matches the
hardspace character?
Yes, of course.
For example, Russian Windows locale.

Lua 5.3.5 Copyright (C) 1994-2018 Lua.org, PUC-Rio
> os.setlocale""
Russian_Russia.1251
> for code = 0, 255 do
>>   if string.char(code):match"%s" then
>>     print(code)
>>   end
>> end
9
10
11
12
13
32
160
Gé Weijers
2018-10-25 01:58:25 UTC
On my macOS machine, the following character codes match '%s':

"C" locale: 9 10 11 12 13 32

"en_US.UTF-8" locale: 9 10 11 12 13 32 133 160

133 == NEXT LINE (NEL)
160 == NO-BREAK SPACE (NBSP)
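
The comparison above can be reproduced with a small script. This is only a sketch: `space_codes` is a hypothetical helper name, not something from the thread or the Lua library, and the locale names in the loop are assumed to be installed (a missing one is reported rather than raised). Only the "ctype" category is switched, since that is the category the character classes depend on, and the previous setting is restored afterwards.

```lua
-- Hypothetical helper: list the byte values (0-255) that match %s
-- under the given locale, then restore the previous locale.
local function space_codes(locale)
  local previous = os.setlocale(nil, "ctype")   -- query the current setting
  if not os.setlocale(locale, "ctype") then
    return nil, "locale not available: " .. locale
  end
  local codes = {}
  for code = 0, 255 do
    if string.char(code):match("%s") then
      codes[#codes + 1] = code
    end
  end
  os.setlocale(previous, "ctype")               -- put things back
  return codes
end

-- The locale names below are examples; availability depends on the system.
for _, name in ipairs{ "C", "en_US.UTF-8" } do
  local codes, err = space_codes(name)
  print(name, codes and table.concat(codes, " ") or err)
end
```

The "C" line should always read 9 10 11 12 13 32. What the other locales print depends on the platform's C library, which is the point of the comparison.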
--
Gé
Dirk Laurie
2018-10-25 05:11:22 UTC
Post by Gé Weijers
"C" locale: 9 10 11 12 13 32
"en_US.UTF-8" locale: 9 10 11 12 13 32 133 160
133 == NEXT LINE (NEL)
160 == NO-BREAK SPACE (NBSP)
Thanks to you and Egor. Egor's I understand: it is an 8-bit character
set. I find your example a little surprising, though. It's not that
way on Ubuntu. Are you using Lua 5.3? Surely single characters in the
range 128-255 are not legal UTF-8?

$ lua
Lua 5.3.5 Copyright (C) 1994-2018 Lua.org, PUC-Rio
> os.setlocale"en_US.UTF-8"
en_US.UTF-8
> for k=0,255 do if string.char(k):match"%s" then io.write(k,' ') end end
9 10 11 12 13 32 > utf8.len"The\160quick brown fox"
nil 4
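
To spell out the UTF-8 point: a lone byte 160 is not a valid UTF-8 sequence, and NO-BREAK SPACE (U+00A0) is encoded as the two bytes 0xC2 0xA0. Because Lua patterns work byte by byte, %s cannot match that two-byte sequence as a unit in any locale. The sketch below assumes Lua 5.3 (for the `utf8` library); `normalize_nbsp` is a hypothetical helper that replaces the encoded NBSP with an ordinary space so that %s-based patterns behave the same regardless of locale.

```lua
-- NBSP is code point U+00A0; in UTF-8 it is the two bytes 0xC2 0xA0.
local nbsp = utf8.char(0xA0)
assert(nbsp == "\xC2\xA0")

-- A bare byte 160 is invalid UTF-8: utf8.len returns nil plus the
-- position of the first invalid byte.
print(utf8.len("The\160quick brown fox"))            -- nil   4
-- The properly encoded NBSP counts as one character.
print(utf8.len("The" .. nbsp .. "quick brown fox"))  -- 19

-- Hypothetical helper: turn every UTF-8 encoded NBSP into a plain space.
local function normalize_nbsp(s)
  return (s:gsub("\xC2\xA0", " "))   -- parentheses drop gsub's count result
end

print(normalize_nbsp("The" .. nbsp .. "quick"):match("^The%squick$"))
```

Matching the explicit byte sequence avoids relying on how the C library classifies byte 160 in a given locale.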
Gé Weijers
2018-10-25 18:27:27 UTC
Post by Dirk Laurie
Post by Gé Weijers
"C" locale: 9 10 11 12 13 32
"en_US.UTF-8" locale: 9 10 11 12 13 32 133 160
133 == NEXT LINE (NEL)
160 == NO-BREAK SPACE (NBSP)
Thanks to you and Egor. Egor's I understand: it is an 8-bit character
set. I find your example a little surprising, though. It's not that
way on Ubuntu. Are you using Lua 5.3? Surely single characters in the
range 128-255 are not legal UTF-8?
I’m using 5.3. Different libc, I guess.