Discussion:
OR, quantifier support in Lua patterns
Sai Manoj Kumar Yadlapati
2018-09-27 04:39:19 UTC
Hi all,

Lua supports its own version of regular expression matching.
But it has neither alternation with | (the pipe symbol) nor counted
quantifiers such as a{1,5}, meaning a can occur anywhere from 1 to 5 times.

Both of these are present in PCRE. I am curious to know why they are not
supported. Was this left out intentionally, or was it never considered?

Thanks
Sai Manoj
Andrew Gierth
2018-09-27 05:15:39 UTC
Sai> Hi all,
Sai> Lua supports its own version of regular expression matching.

Well, to be precise it supports a pattern-matching function that falls a
long way short of regular expressions.

Sai> But it doesn't have the | (pipe symbol) support and the quantifier
Sai> support - a{1,5} meaning a can occur anywhere from 1 to 5 times.

Sai> Both of these are present in PCRE. I am curious to know why these
Sai> are not supported. Is it not supported intentionally or was it
Sai> never considered?

Maybe this answers your question:

% size liblua-5.3.so libpcre.so
   text    data     bss     dec      hex  filename
 236048    6457       0  242505  0x3b349  liblua-5.3.so
 483084    1237     152  484473  0x76479  libpcre.so

i.e. PCRE is nearly double the size of the entirety of Lua. (Even a
relatively minimal POSIX regexp implementation would be 2.5 times the
size of the Lua string library - ~50kB vs. ~20kB on my system.)

You can use LPEG instead (which is even more powerful than regular
expressions though has a bit of a learning curve), or if you're not
worried about size then there's a Lua binding for PCRE.
--
Andrew.
Jim
2018-09-28 20:18:14 UTC
On Thu, Sep 27, 2018 at 6:40 AM Sai Manoj Kumar Yadlapati
Post by Sai Manoj Kumar Yadlapati
Lua supports its own version of regular expression matching.
But it doesn't have the | (pipe symbol) support and the quantifier support - a{1,5} meaning a can occur anywhere from 1 to 5 times.
Both of these are present in PCRE. I am curious to know why these are not supported. Is it not supported intentionally or was it
never considered?
this is a very useful and often needed feature that Lua's builtin patterns
lack. i would also like to see it added; it is already on my wishlist of
items that should be added to Lua.

"|" grouping of alternatives is not only present in PCRE, but also in
POSIX regex, which is contained in the libc of unix systems and hence
usable without the need for extra libs like PCRE.
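
A minimal sketch of the POSIX route mentioned above, assuming a system whose libc provides <regex.h>. The helper name ere_matches is made up for illustration; regcomp(), regexec() and regfree() are the standard POSIX calls. With REG_EXTENDED, both alternation and interval quantifiers such as a{1,5} are available:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of Lua or POSIX).
 * Returns 1 if `text` matches the POSIX extended regex `pattern`,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED selects ERE syntax, which has | and {m,n};
     * REG_NOSUB says we only want a yes/no answer, no captures. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, ere_matches("^(cat|dog)$", "dog") and ere_matches("^a{1,5}$", "aaa") both report a match, while ere_matches("^a{1,5}$", "aaaaaa") does not.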
Lorenzo Donati
2018-09-29 08:38:30 UTC
Post by Sai Manoj Kumar Yadlapati
Hi all,
Lua supports its own version of regular expression matching.
But it doesn't have the | (pipe symbol) support and the quantifier support
- a{1,5} meaning a can occur anywhere from 1 to 5 times.
Both of these are present in PCRE. I am curious to know why these are not
supported. Is it not supported intentionally or was it never considered?
Thanks
Sai Manoj
To reinforce what Andrew said in his reply: please note that Lua
patterns are NOT regular expressions. That is, they don't have the same
expressive power as regexes, and that's /by design/. The goal was/is to
keep Lua's size small.

I can't say whether implementing alternation (i.e. the OR operator) would
increase Lua's size by much, but I suspect it would.

-- Lorenzo
Jim
2018-09-30 17:25:04 UTC
that's /by design/. The goal was/is to keep Lua size small.
I can't say if implementing alternation (i.e. that OR operator) will
increase Lua size by much, but I suspect it will.
how come squirrel has them ?
does that make squirrel NOT small ?
or has that more to do with it being written in c++ ?
(which was an unnecessary mistake imo)

btw: squirrel separated the interpreter and its std libs
into 2 different c libs which looks like a good idea to me.
Lorenzo Donati
2018-10-01 08:01:32 UTC
Post by Jim
that's /by design/. The goal was/is to keep Lua size small.
I can't say if implementing alternation (i.e. that OR operator) will
increase Lua size by much, but I suspect it will.
how come squirrel has them ?
does that make squirrel NOT small ?
or has that more to do with it being written in c++ ?
(which was an unnecessary mistake imo)
I don't know squirrel, so I can't say. Anyway, did you compare the size
(both source and executable) of Lua with those of squirrel?

If squirrel can implement a full regex engine in less space than Lua, it
could be worth pointing that out to Lua team.

On the other hand I suspect squirrel regex engine may be implemented
using C++ regex classes, so the implementation could be very terse in
squirrel source. Moreover even executable size could be smaller because
the object code of the C++ regex engine could reside in some
system/platform DLL, if not linked statically into the squirrel interpreter.

Keep in mind that Lua is written in very portable C (almost all C89,
some few parts C99) and its pattern facility is built into the source,
so it can be compiled on any system with a barebone C compiler. Lua
/can/ be compiled as C++, but it doesn't use any C++ library facility
that is not also in a C library.

If squirrel regex implementation relies on C++-specific libraries,
comparing it to Lua is not actually fair: you should compare it against
Lua /together with/ a regex engine binding, like a PCRE binding, instead.
Post by Jim
btw: squirrel separated the interpreter and its std libs
into 2 different c libs which looks like a good idea to me.
If you really don't need some Lua library, you can compile a version of
the Lua interpreter with it disabled. This is not exposed as a default
option at the Lua code level because Lua is primarily an engine to be
embedded in some custom C code, the so-called "C application" (this is by
design). So a C programmer can choose which libraries to include in the
compilation anyway.

Moreover, if Lua is compiled as a DLL on a PC-class machine, the C
application can be small by simply linking to Lua dynamically.

If you really need Lua code statically linked to your C code, then you
can customize what parts of Lua you really need anyway.

The standard interpreter is just a very lightweight C application that
happens to embed a Lua engine. Lua "the language" wasn't designed to be
run only in the context of a command line interpreter.
Jim
2018-10-01 20:57:47 UTC
Post by Lorenzo Donati
I don't know squirrel, so I can't say. Anyway, did you compare the size
(both source and executable) of Lua with those of squirrel?
well, yes, squirrel(-lang.org) is indeed bigger:
static libs:
interpreter/vm lib:
-rw-r--r-- 1 root root 523K May 27 2016 /usr/local/lib/libsquirrel_static.a
its std lib (which is in a separate lib):
-rw-r--r-- 1 root root 138K May 27 2016 /usr/local/lib/libsqstdlib_static.a
vs:
-rw-r--r-- 1 root root 423K Jul 24 22:52 /usr/local/lib64/liblua.a
for lua interpreter + stdlib

binaries:
-rwxr-xr-x 1 root root 20K May 27 2016 /usr/local/bin/sq*
-rwxr-xr-x 1 root root 415K May 27 2016 /usr/local/bin/sq_static*
(the interpreter/vm bin is also used for compiling bytecode, no
separate binary necessary)

vs lua:
-rwxr-xr-x 1 root root 213K Jul 24 22:14 /usr/local/bin/lua*
-rwxr-xr-x 1 root root 144K Jul 24 22:14 /usr/local/bin/luac*
Post by Lorenzo Donati
On the other hand I suspect squirrel regex engine may be implemented
using C++ regex classes, so the implementation could be very terse in
squirrel source.
indeed, i had not considered this, as i am no c++ user and try to avoid
the crap at all costs, but under windows heavy usage of c++ seems to be
the norm.

but the squirrel author has implemented a "tiny regex lib" in ansi C
(T-Rex is a minimalistic regular expression library written in ANSI C)
because he "couldn't find any free regular expression library that wasn't huge
and bloated, while most of the time he needed just basic functionalities"
as he wrote on
http://www.demichelis.net/default.aspx?content=projects&template=projects

a quick look into the squirrel sources reveals that he has implemented the
squirrel regex functions in sqstdlib/sqstdrex.cpp (663 lines) in procedural
c style without use of any c++ stdlib regex helper classes.

so does that really bloat squirrel and make it BIG in any way ?
or is it just the usual cheap excuse as in "we can't bloat lua with binary/octal
integer literals" (which everyone else has of course) but have hex
integer literals, since it does NOT bloat the language in any way and
can't be done with tonumber().

i am really tired of always the same lamenting.
we would not use lua if we had not already written thousands of lines of
binding c code (which was a very stupid decision we bitterly regret by now).
the only reason that stopped us from using squirrel in the first place was
that it is written in c++ (with all the dependencies that introduces
without any gain).
if squirrel could be rewritten in c we would use it instantly and port
all the c binding code to it.

the squirrel c api is also much better designed.
Post by Lorenzo Donati
Keep in mind that Lua is written in very portable C (almost all C89,
some few parts C99) and its pattern facility is built into the source,
so it can be compiled on any system with a barebone C compiler. Lua
/can/ be compiled as C++, but it doesn't use any C++ library facility
that is not also in a C library.
well, the usual lamento.
on unix you get posix regex and LOTS of other useful functions for FREE
by the c lib (which you have to use anyway) or as direct syscalls.

when building via "make linux" (for instance) you know what platform is used
and what it offers (at least std posix functions).
(you can also check feature macros if you prefer)
so instead of linking against big bloated crap like libreadline use what's
already in the c lib for FREE.
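
A small sketch of the feature-macro check mentioned above, assuming a platform that ships <unistd.h>. The function name posix_revision is hypothetical; _POSIX_VERSION is the standard macro a POSIX system defines in that header:

```c
#include <unistd.h>

/* Hypothetical helper: returns the POSIX revision the headers claim
 * to support (for example 200809L for POSIX.1-2008), or 0 when
 * _POSIX_VERSION is not defined by <unistd.h>. */
long posix_revision(void)
{
#ifdef _POSIX_VERSION
    return (long)_POSIX_VERSION;
#else
    return 0L;
#endif
}
```

A build could use a check like this to decide whether POSIX-only bindings should be compiled in at all.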

that could provide a table "regex" (or "re") that makes use of the c lib's
extended posix regex (which has to be there as required by the posix std.
btw: are the c++ std regex classes implemented using this c lib support ?)

there should also be a table "posix" (or "unix" or just "sys") that contains
the std posix functions (like chdir, mkdir, setenv and the like) and is a
metatable of the "os" table.

that "posix" table should also have the metatable "linux" on Linux, which
should contain Linux-only bindings (similar on freebsd, solaris etc)
Post by Lorenzo Donati
If squirrel regex implementation relies on C++-specific libraries,
which is not the case.
Post by Lorenzo Donati
Moreover, if Lua is compiled as a DLL on a PC-class machine, the C
application can be small by simply linking to Lua dynamically.
The standard interpreter is just a very lightweight C application that
happens to embed a Lua engine. Lua "the language" wasn't designed to be
run only in the context of a command line interpreter.
i totally understand that, but having above additional tables does not cost much
and should also be provided with the possibility of disabling them when they are
not needed (or even harmful) analogous to the tables/modules Lua
already provides.
Sean Conner
2018-10-02 04:11:30 UTC
Post by Jim
but the squirrel author has implemented a "tiny regex lib" in ansi C
(T-Rex is a minimalistic regular expression library written in ANSI C)
because he "couldn't find any free regular expression library that wasn't huge
and bloated, while most of the time he needed just basic functionalities"
as he wrote on
http://www.demichelis.net/default.aspx?content=projects&template=projects
a quick look into the squirrel sources reveals that he has implemented the
squirrel regex functions in sqstdlib/sqstdrex.cpp (663 lines) in procedural
c style without use of any c++ stdlib regex helper classes.
Well, one could wrap those regex functions into a Lua module so it's
available for you to use.
Post by Jim
so does that really bloat and make squirrel BIG in any way ? or is it just
the usual cheap excuse as in "we cant bloat lua with binary/octal integer
literals" (which anyone else has of course) but have hex integer literals,
since it does NOT bloat the language in any way and can't be done with
tonumber().
That's for Luiz and Roberto to answer. For me, I never needed to use
octal, and I'm past the need for binary literals (I wouldn't mind them, but
I'm not lamenting their lack).
Post by Jim
i am really tired of always the same lamenting. we would not use lua if we
had not already written thousands of lines of binding c code (which was a
very stupid decision we bitterly regret by now). the only reason that
stopped us from using squirrel in the first place was that is written in
c++ (with all the dependencies that introduces without any gain). if
squirrel could be rewritten in c we would use it instantly and port all
the c binding code to it.
Why was Lua picked in the first place if you now regret it? Is it the
fact that Lua doesn't have real regex that makes it suck? Or are there
other factors that make you regret the choice of Lua?
Post by Jim
the squirrel c api is also much better designed.
Is this the language described by squirrel-lang.org? Because if so, the
API seems very close to the Lua API (it downright seems Lua influenced the
design from what I can tell). What is it about the Lua C API that sucks?
Or why is the squirrel one better?

Because from my brief look, they seem very similar.
Post by Jim
Post by Lorenzo Donati
Keep in mind that Lua is written in very portable C (almost all C89,
some few parts C99) and its pattern facility is built into the source,
so it can be compiled on any system with a barebone C compiler. Lua
/can/ be compiled as C++, but it doesn't use any C++ library facility
that is not also in a C library.
well, the usual lamento. on unix you get posix regex and LOTS of other
useful functions for FREE by the c lib (which you have to use anyway) or
as direct syscalls.
In some respects yes. In other respects no. On Linux you need to link
with pthreads if you use that; not so on other systems. On Solaris you need
to link with libsocket and libnsl if you want to use the network API
(socket(), bind(), accept(), etc) but not so with other Unix systems. On
Windows, POSIX isn't part of the C library (although I could be wrong, but
I would find it surprising).
Post by Jim
when building via "make linux" (for instance) you know what platform is
used and what it offers (at least std posix functions). (you can also
check feature macros if you prefer) so instead of linking against big
bloated crap like libreadline use what's already in the c lib for FREE.
that could provide a table "regex" (or "re") that makes use of the c lib's
extended posix regex (which has to be there as required by the posix std.
btw: are the c++ std regex classes implemented using this c lib support ?)
I don't know which Unix you are using, but the ones I've had experience
with never came with regex "for free" (as part of libc).
Post by Jim
there should be also a table "posix" (or "unix" or just "sys") that
contains the std posix functions (like chdir, mkdir, setenv and the like)
and is a metatable of the "os" table.
that "posix" table should also have the metatable "linux" on Linux which
should contain Linux-only bindings (similar on freebsd, solaris etc)
There are Lua modules that provide such functionality but I'm guessing you
want those built into the base Lua distribution. Roberto and Luiz have
different priorities; if you agree with them, use Lua. If you don't, don't
use Lua.

-spc
Dirk Laurie
2018-10-02 06:32:34 UTC
<troll_alert>
Post by Sean Conner
Post by Jim
there should be also a table "posix" (or "unix" or just "sys") that
contains the std posix functions (like chdir, mkdir, setenv and the like)
and is a metatable of the "os" table.
that "posix" table should also have the metatable "linux" on Linux which
should contain Linux-only bindings (similar on freebsd, solaris etc)
</troll_alert>

"there should be" is not an acceptable way of making a dubious
suggestion more plausible.
Post by Sean Conner
There are Lua modules that provide such functionality but I'm guessing you
want those built into the base Lua distribution. Roberto and Luiz have
different priorities; if you agree with them, use Lua. If you don't, don't
use Lua.
Or have a standard set of patches that moulds your personal Lua
version to your liking.
E.g. on my machine "lua -l lexer=pl.lexer" is legal.

-- Dirk
Jim
2018-10-02 21:44:01 UTC
Post by Dirk Laurie
"there should be" is not an acceptable way of making a
why ? this should be there to make some use of it.
Post by Dirk Laurie
dubious suggestion more plausible.
how is that dubious ? for me it's very clear.
the example i gave is also very clear and to the point.
could you please explain how it is dubious ?
Jim
2018-10-02 22:57:47 UTC
^^^^^^^ why are you trying to
ridicule me ?
Post by Sean Conner
Well, one could wrap those regex functions into a Lua module so it's
available for you to use.
sure, that's what we will do.
how about directly using squirrel that has that already available ?
Post by Sean Conner
That's for Luiz and Roberto to answer. For me, I never needed to use
octal, and I'm past the need for binary literals (I wouldn't mind them, but
I'm not lamenting their lack).
we make heavy use of octal integer literals and that's absolutely nothing
extraordinary or exotic in any way.
Post by Sean Conner
Why was Lua picked in the first place if you now regret it? Is it the
fact that Lua doesn't have real regex that makes it suck? Or are there
other factors that make you regret the choice of Lua?
we thought it could be used as a scripting language, akin to perl, python, ruby
and all the others.

we failed to recognize that it was only designed for authors that use it as
a config language for their c program to avoid inventing one of their own.
Lua is obviously not designed to run scripts that do any real work.
it's just a toy, a study of how a config language could look, it's not made
for scripting like say perl or other scripting languages,
it simply was not made for this and thus obviously is not up for the task.
we thought having an interpreter around was for interpreting scripts as the
perl interpreter for instance does.

it's a really poor design that changes with every release because something
is broken again or poorly thought out.
for instance: how dumb and clueless must one be to release a language that
has only floating point arithmetic at a time when 386, 486SX and other cpus
without an fpu were in wide usage ?
we don't use any floating point arithmetic btw., integer is enough for us.
was Lua designed as a new fortran that could also be used as a config language ?
and now look how many decades (!!!!) it took until someone decided that
integer arithmetic and bitwise ops weren't that bad.

now look at the regex module, it doesn't work with lua5.3, so we don't even
have a binding for posix regex (that would suffice for us, no need for PCRE),
so we have to implement our own.

have a look at lua rocks, which uses a collection of unmaintained garbage
lying and rotting around for years, abandoned by the authors since they
stopped using lua for obvious reasons.
that crap is not even usable on linux, and it's a total mess on solaris.
imagine perl, python or anyone else delivering such poor tooling ...
Post by Sean Conner
Is this the language described by squirrel-lang.org?
well, obviously, mr. genius, that's why i mentioned it.
Post by Sean Conner
Because if so, the
API seems very close to the Lua API (it downright seems Lua influenced the
design from what I can tell). What is it about the Lua C API that sucks?
Or why is the squirrel one better?
squirrel was based on Lua, the author tried to fix some of the main problems.
a brief look at its api is not enough, read and compare it point by point with
lua's api, then use it in some code and you will understand what i mean.
Post by Sean Conner
Because from my brief look, they seem very similar.
from a brief look lua might also look like a scripting language, but a
brief look is not enough. use the squirrel api and you will see what i mean.
Post by Sean Conner
In some respects yes. In other respects no. On Linux you need to link
with pthreads if you use that; not so on other systems. On Solaris you need
to link with libsocket and libnsl if you want to use the network API
(socket(), bind(), accept(), etc) but not so with other Unix systems. On
Windows, POSIX isn't part of the C library (although I could be wrong, but
I would find it surprising).
#include <unistd.h>
#include <lua.h>

/* Lua C function: pushes the caller's real user ID and returns 1 result */
static int Sgetuid ( lua_State * const L )
{
  /* getuid() always succeeds, as required by posix */
  lua_pushinteger ( L, getuid () ) ;
  return 1 ;
}

that's the same on linux, solaris, aix, all of the bsds, and even crap
like macos X.
same for geteuid(), get(g)id(), get(p)pid(), umask(), fork() to name
a few. how does that complicate anything, how exactly does that
bring in pthreads, i missed the point, could you make this more clear
and enlighten us a bit ?
Post by Sean Conner
I don't know which Unix you are using, but the ones I've had experience
with never came with regex "for free" (as part of libc).
well posix requires regex to be found in <regex.h>, so every unix has
them, from aix to the bsds, it's in the libc that one has to use anyway.
so it's there for FREE on all relevant unix platforms.
Post by Sean Conner
There are Lua modules that provide such functionality but I'm guessing
and dont work for any recent lua version ? like the regex module ?
Post by Sean Conner
you want those built into the base Lua distribution.
exactly.
Post by Sean Conner
different priorities; if you agree with them, use Lua. If you don't, don't
use Lua.
exactly.

i figured out recently that the ruby c api is quite usable by hand, though i
don't much like ruby and the oo style that is forced on all its users.
the advantage is that it is a scripting language, not just a config lang,
and scripting is what we are doing.
so we started porting our c bindings and helper functions to ruby as an
interim workaround, but that's not the final solution.

in the meantime we had a look at several script languages with a usable
c api, from older ones like forth (fth), (regina) rexx, tcl, to javascript
implementations like duktape and mujs to newer inventions like ring-lang.net
but so far we have not found anything that pleases us and in the long run
we will have to implement our own solution as no existing tool is up to the
task (at least we did not find one).

that was exactly what we tried to avoid as we were not interested in
inventing the next extension lang (that only we and no one else will use)
and reinvent the wheel.
Sean Conner
2018-10-02 23:44:32 UTC
Post by Jim
^^^^^^^ why are you trying to
ridicule me ?
I've been using that opening line in email since the early 90s (and you
can check other messages I've sent on this list to see it in use), and this
is the first time it has received a negative response.
Post by Jim
Post by Sean Conner
Well, one could wrap those regex functions into a Lua module so it's
available for you to use.
sure, that's what we will do. how about directly using squirrel that has
that already available ?
Because this is the *Lua* mailing list, not the *Squirrel* mailing list?
Post by Jim
Post by Sean Conner
Why was Lua picked in the first place if you now regret it? Is it the
fact that Lua doesn't have real regex that makes it suck? Or are there
other factors that make you regret the choice of Lua?
we thought it could be used as a scripting language, akin to perl, python,
ruby and all the others.
we failed to recognize that it was only designed for authors that use it
as a config language for their c program to avoid inventing one of their
own. Lua is obviously not designed to run scripts that do any real work.
That's news to me, since I use Lua to process SIP messages for Verizon
Wireless. It currently handles around 60,000,000 messages per day without
issue (and we expect that level to rise 10-fold over the next year).
Post by Jim
it's just a toy, a study of how a config language could look, it's not
made for scripting like say perl or other scripting languages, it simply
was not made for this and thus obviously is not up for the task. we
thought having an interpreter around was for interpreting scripts as the
perl interpreter for instance does.
That would also be news to wireshark users, as Lua is used there. It's
also used in multiple online games as a scripting language. Oh, and Redis
also uses Lua for scripting. Guess we're all deluding ourselves into
thinking Lua is a programming language.
Post by Jim
it's a really poor design that changes with every release because something
is broken again or poorly thought out.
for instance: how dumb and clueless must one be to release a language that
has only floating point arithmetic at a time when 386, 486SX and other cpus
without an fpu were in wide usage ?
Netscape did the exact same thing with JavaScript back in the 90s, and it
still only supports floating point. It's not necessarily a *bad* design
choice, given that at the time systems were 32-bit and one can easily do
53-bit integer arithmetic with IEEE-754 floating point (I know there are
systems with non-IEEE-754 floating point but they tend to be rare, or were
designed prior to 1985 when the IEEE-754 standard was released). Doing that
means you only have one numeric type to support. It's only with the rise of
64-bit CPUs that such a design becomes problematic, which is why it was
changed for Lua 5.3.
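
To make the point about integers in floating point concrete, here is a minimal sketch, assuming the platform's double is an IEEE-754 binary64 value (a 53-bit significand: 52 stored bits plus one implicit bit) and the default round-to-nearest mode. The function name fits_in_double is made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the integer `n` survives a round
 * trip through double unchanged, 0 otherwise. Only meant for values
 * well inside the int64_t range (|n| far below 2^63), so that the
 * conversion back to an integer is always defined. */
int fits_in_double(int64_t n)
{
    /* volatile forces the value to be stored as a real double,
     * avoiding wider intermediate precision on some compilers */
    volatile double d = (double)n;
    return (int64_t)d == n;
}
```

Under those assumptions every integer up to 2^53 = 9007199254740992 round-trips exactly, while 2^53 + 1 does not, which is why a single floating-point number type was workable on 32-bit systems and becomes a limitation once 64-bit integers are common.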
Post by Jim
Post by Sean Conner
Is this the language described by squirrel-lang.org?
well, obviously, mr. genius, that's why i mentioned it.
I wanted to make sure I had the right references.
Post by Jim
Post by Sean Conner
Because if so, the
API seems very close to the Lua API (it downright seems Lua influenced the
design from what I can tell). What is it about the Lua C API that sucks?
Or why is the squirrel one better?
squirrel was based on Lua, the author tried to fix some of the main problems.
a brief look at its api is not enough, read and compare it point by point with
lua's api, then use it in some code and you will understand what i mean.
Post by Sean Conner
Because from my brief look, they seem very similar.
from a brief look lua might also look like a scripting language, but a
brief look is not enough. use the squirrel api and you will see what i mean.
Post by Sean Conner
In some respects yes. In other respects no. On Linux you need to link
with pthreads if you use that; not so on other systems. On Solaris you need
to link with libsocket and libnsl if you want to use the network API
(socket(), bind(), accept(), etc) but not so with other Unix systems. On
Windows, POSIX isn't part of the C library (although I could be wrong, but
I would find it surprising).
#include <unistd.h>
static int Sgetuid ( lua_State * const L )
{
  /* getuid() always succeeds, as required by posix */
  lua_pushinteger ( L, getuid () ) ;
  return 1 ;
}
thats the same on linux, solaris, aix, all of the bsds, and even crap
like macos X.
same for geteuid(), get(g)id(), get(p)pid(), umask(), fork() to name
a few. how does that complicate anything, how exactly does that
bring in pthreads, i missed the point, could you make this more clear
and enlighten us a bit ?
Okay, round two. If I have a program that makes use of pthreads, on
Solaris it comes "for free" (your terms) in libc. On Linux, the pthreads
API is NOT in libc, so it's not "for free" in that regard---you have to
link with libpthread.

If I have a program that uses socket(), bind(), accept(), listen(), etc.
(the Berkeley sockets API), those calls come "for free" on Linux---they're
part of libc. On Solaris, they are not "for free"---they are not part of
libc and you are required to link against libsocket and libnsl.

On Windows, you don't even get get*id(), umask(), fork() or wait() AT ALL!
Windows does not natively support POSIX.
Post by Jim
Post by Sean Conner
I don't know which Unix you are using, but the ones I've had
experience with never came with regex "for free" (as part of libc).
well posix requires regex to be found in <regex.h>, so every unix has
them, from aix to the bsds, its in the libc that one has to use anyway. so
its there for FREE on all relevant unix platforms.
Again, not always so in my experience.
Post by Jim
Post by Sean Conner
There are Lua modules that provide such functionality but I'm guessing
and dont work for any recent lua version ? like the regex module ?
Post by Sean Conner
you want those built into the base Lua distribution.
exactly.
Post by Sean Conner
different priorities; if you agree with them, use Lua. If you don't, don't
use Lua.
exactly.
-spc
Russell Haley
2018-10-03 02:07:48 UTC
Post by Sean Conner
Post by Jim
^^^^^^^ why are you trying to
ridicule me ?
I've been using that opening line in email since the early 90s (and you
can check other messages I've sent on this list to see it in use), and this
is the first time it has received a negative response.
I was devastated when I realized you answered everyone like that. I thought
someone had finally recognized my brilliance. ;)

Russ
Post by Sean Conner
Post by Jim
Post by Sean Conner
Well, one could wrap those regex functions into a Lua module so it's
available for you to use.
sure, that's what we will do. how about directly using squirrel that has
that already available ?
Because this is the *Lua* mailing list, not the *Squirrel* mailing list?
Post by Jim
Post by Sean Conner
Why was Lua picked in the first place if you now regret it? Is is
the
Post by Jim
Post by Sean Conner
fact that Lua doesn't have real regex that makes it suck? Or are there
other factors that make you regret the choice of Lua?
we thougt it could be used as a scripting language, akin to perl, python,
ruby and all the others.
we failed to recognize that it was only designed for authors that use it
as a config language for their c program to avoid inventing one of their
own. Lua is obviously not designed to run scripts that do any real work.
That's news to me, since I use Lua to process SIP messages for Verizon
Wireless. It currently handles around 60,000,000 messages per day without
issue (and we expect that level to rise 10-fold over the next year).
Post by Jim
it's just a toy, a study of how a config language could look, it's not
made for scripting like say perl or other scripting languages, it simply
was not made for this and thus obviously is not up for the task. we
thought having an interpreter around was for interpreting scripts as the
perl interpreter for instance does.
That would also be news to wireshark users, as Lua is used there. It's
also used in multiple online games as a scripting language. Oh, and Redis
also uses Lua for scripting. Guess we're all deluding ourselves into
thinking Lua is a programing language.
Post by Jim
it's a really poor design that changes with every release because
something
Post by Jim
is broken again or poorly thought out.
for instance: how dumb and clueless must one be to release a language
that
Post by Jim
has only floating point arithmetic in a time were 386, 486SX and other
cpus
Post by Jim
without an fpu were in wide usage ?
Netscape did the exact same thing with Javascript back in the 90s, and it
still only supports floating point. It's not neccessarily a *bad* design
choice, given at the time systems were 32-bit and one can easily do 52-bit
integer arithmatic with IEEE-754 floating point (I know there are systems
with non-IEEE-754 floating point but they tend to be rare, or were designed
prior to 1985 when IEEE-754 standard was released). Doing that means you
only have one numeric type to support. It's only with the rise in 64-bit
CPUs that such a design becomes problematic and why it was changed for Lua
5.3.
Post by Jim
Post by Sean Conner
Is this the language described by squirrel-lang.org?
well, obvously, mr. genius, that's why i mentioned it.
I wanted to make sure I had the right references.
Post by Jim
Post by Sean Conner
Because if so, the
API seems very close to the Lua API (it downright seems Lua influenced the
design from what I can tell). What is it about the Lua C API that sucks?
Or why is the squirrel one better?
squirrel was based on Lua, the author tried to fix some of the main problems.
a brief look on its api is not enough, read and compare it point by point with
lua's api, then use it in some code and you will understand what i am about.
Post by Jim
Post by Sean Conner
Because from my brief look, they seem very similar.
from a brief look lua might also look like a scripting language, but a brief
look is not enough. use the squirrel api and you will see what i mean.
Post by Sean Conner
In some respects yes. In other respects no. On Linux you need to link
with pthreads if you use that; not so on other systems. On Solaris you need
to link with nt if you want to use the network API (socket(), bind(),
accept(), etc) but not so with other Unix systems. On Windows, POSIX isn't
part of the C library (although I could be wrong, but I would find it
surprising).
#include <unistd.h>
#include <lua.h>

static int Sgetuid ( lua_State * const L )
{
    /* getuid() always succeeds, as required by posix */
    lua_pushinteger ( L, getuid () ) ;
    return 1 ;  /* one result left on the Lua stack */
}
thats the same on linux, solaris, aix, all of the bsds, and even crap
like macos X.
same for geteuid(), get(g)id(), get(p)pid(), umask(), fork() to name
a few. how does that complicate anything, how exactly does that
bring in pthreads, i missed the point, could you make this more clear
and enlighten us a bit ?
Okay, round two. If I have a program that makes use of pthreads, on
Solaris it comes "for free" (your terms) in libc. On Linux, the pthreads
API is NOT in libc, so it's not "for free" in that regard---you have to
link with libpthread.
If I have a program that uses socket(), bind(), accept(), listen(), etc.
(the Berkeley sockets API), those calls come "for free" on Linux---they're
part of libc. On Solaris, they are not "for free"---they are not part of
libc and you are required to link against libnt.
On Windows, you don't even get get*id(), umask(), fork() or wait() AT ALL!
Windows does not natively support POSIX.
Post by Jim
Post by Sean Conner
I don't know which Unix you are using, but the ones I've had
experience with never came with regex "for free" (as part of libc).
well posix requires regex to be found in <regex.h>, so every unix has
them, from aix to the bsds, its in the libc that one has to use anyway. so
its there for FREE on all relevant unix platforms.
Again, not always so in my experience.
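For readers who want to see what the <regex.h> interface under discussion offers, here is a minimal sketch, assuming a POSIX system that ships regcomp() and regexec(). It shows that POSIX extended regular expressions cover both features asked about at the top of the thread: alternation with | and interval quantifiers such as a{1,5}. The wrapper name `ere_match` is made up for this example, and whether an extra link flag is needed depends on the platform, which is exactly the point in dispute above.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper: returns 1 if text matches the POSIX extended
   regular expression in pattern, 0 if it does not, and -1 if the
   pattern fails to compile. */
int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is wanted, no submatch positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, `ere_match("^(cat|dog)$", "dog")` and `ere_match("^a{1,5}$", "aaa")` should both report a match, while six a's exceed the interval and do not.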
Post by Jim
Post by Sean Conner
There are Lua modules that provide such functionality but I'm guessing
Post by Jim
and dont work for any recent lua version ? like the regex module ?
Post by Sean Conner
you want those built into the base Lua distribution.
exactly.
Post by Sean Conner
different priorities; if you agree with them, use Lua. If you don't, don't
use Lua.
exactly.
-spc
Paul Merrell
2018-10-03 07:47:59 UTC
Post by Jim
we thought it could be used as a scripting language, akin to perl, python, ruby
and all the others.
Post by Jim
we failed to recognize that it was only designed for authors that use it as
a config language for their c program to avoid inventing one of their own.
Lua is obviously not designed to run scripts that do any real work.
it's just a toy, a study of how a config language could look, it's not made
for scripting like say perl or other scripting languages,
it simply was not made for this and thus obviously is not up for the task.
we thought having an interpreter around was for interpreting scripts as the
perl interpreter for instance does.

Gee, I must have dreamed that I wrote hundreds of extending scripts in
Lua for the NoteCase Pro outliner over the last several years and that
all these other developers embed Lua as a scripting engine for their
users. <https://sites.google.com/site/marbux/home/where-lua-is-used>

:-)
Post by Jim
it's a really poor design that changes with every release because something
is broken again or poorly thought out.

NoteCase Pro has upgraded to the latest Lua version with each release
since v. 5.1. In all that time, I've only needed slight tweaks in
three scripts because of Lua changes. YMMV.

Best regards,

Paul
Roberto Ierusalimschy
2018-10-03 13:04:40 UTC
Please, don't feed the trolls.

Thanks,

-- Roberto
Gé Weijers
2018-10-03 22:18:00 UTC
Post by Sean Conner
On Windows, you don't even get get*id(), umask(), fork() or wait() AT ALL!
Windows does not natively support POSIX.
Well, now we have this on Windows 10:

https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux


--
Gé
Lorenzo Donati
2018-10-04 15:49:01 UTC
Post by Sean Conner
Post by Jim
^^^^^^^ why are you trying to
ridicule me ?
I've been using that opening line in email since the early 90s (you
can check other messages I've sent on this list to see it in use), and this
is the first time it has received a negative response.
It has always brought me memories of Arthurian or "Tolkienian" sagas.
Nice literary touch in our cold, harsh world of bits and bytes! ;-)

[...]
Post by Sean Conner
-spc
Tim Hill
2018-10-04 00:13:51 UTC
Post by Jim
in the meantime we had a look at several script languages with a usable
c api, from older ones like forth (fth), (regina) rexx, tcl, to java script
implementations like duktape and mujs to newer inventions like ring-lang.net <http://ring-lang.net/>
but so far we have not found anything that pleases us and in the long run
we will have to implement our own solution as no existing tool is up to the
task (at least we did not find one).
Well it looks to me like you are looking for perfection. When you have completed your perfect language please let us know so we can all start using it.

ALL languages are compromises and balance conflicting requirements. You are of course free to disagree with the decisions made for Lua, but that doesn't make them “crap”, it just means the Lua authors and you disagree on those compromises.

—Tim
Victor Krapivensky
2018-10-09 21:47:28 UTC
Post by Sai Manoj Kumar Yadlapati
Hi all,
Lua supports its own version of regular expression matching.
But it doesn't have the | (pipe symbol) support and the quantifier support
- a{1,5} meaning a can occur anywhere from 1 to 5 times.
Both of these are present in PCRE. I am curious to know why these are not
supported. Is it not supported intentionally or was it never considered?
Thanks
Sai Manoj
To reinforce what Andrew said in his reply: please note that Lua patterns
are NOT regular expressions. That is, they haven't got the same expressive
power as regexes, and that's /by design/. The goal was/is to keep Lua size
small.
I can't say if implementing alternation (i.e. that OR operator) will
increase Lua size by much, but I suspect it will.
I am not sure why everybody seems to believe that a (non-PCRE) regular
expression engine has to be complex. See
https://swtch.com/~rsc/regexp/regexp1.html for an implementation in less than
400 lines of C (probably less rewritten in a "modern" style).
-- Lorenzo
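To give a feel for how little code a restricted matcher with alternation can take, here is a minimal sketch. It is NOT the Thompson NFA construction from the linked article: it is a plain recursive backtracking matcher, written only for illustration, that supports literal characters, `.`, a postfix `*` on a single character, and top-level `|`. It has no grouping, escapes, or captures, and it must match the whole text. The names `match_here` and `tiny_match` are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Match the pattern p[0..plen) against the whole of text.
   Supports literals, '.', and a postfix '*' on one character. */
static int match_here(const char *p, size_t plen, const char *text)
{
    if (plen == 0)
        return *text == '\0';           /* pattern used up: text must be too */
    if (plen >= 2 && p[1] == '*') {
        for (;;) {                      /* zero or more copies of p[0] */
            if (match_here(p + 2, plen - 2, text))
                return 1;
            if (*text == '\0' || (p[0] != '.' && p[0] != *text))
                return 0;
            text++;
        }
    }
    if (*text != '\0' && (p[0] == '.' || p[0] == *text))
        return match_here(p + 1, plen - 1, text + 1);
    return 0;
}

/* Returns 1 if any '|'-separated alternative matches the whole text. */
int tiny_match(const char *pattern, const char *text)
{
    const char *start = pattern;

    for (;;) {
        const char *bar = strchr(start, '|');
        size_t len = bar ? (size_t)(bar - start) : strlen(start);

        if (match_here(start, len, text))
            return 1;
        if (bar == NULL)
            return 0;
        start = bar + 1;                /* move on to the next alternative */
    }
}
```

With this sketch, `tiny_match("cat|dog", "dog")` and `tiny_match("ab*c", "abbbc")` succeed, while `tiny_match("cat|dog", "cow")` fails. Backtracking keeps the code short but can take exponential time on some patterns, which is the problem the automaton-based approach in the article avoids.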
Ricardo Ramos Massaro
2018-10-10 18:16:06 UTC
On Tue, Oct 9, 2018 at 6:47 PM Victor Krapivensky
See https://swtch.com/~rsc/regexp/regexp1.html for an implementation in less than
400 lines of C (probably less rewritten in a "modern" style).
That's some really nice code, but it's missing *at least* one essential
feature that people expect from a regex engine: captures.

The author calls captures "submatch extraction" and claims that
"Thompson-style algorithms can be adapted to track submatch boundaries
without giving up efficient performance". If true, it would be nice to see
how simple and small a fast implementation can be.

- Ricardo
Roberto Ierusalimschy
2018-10-10 19:13:15 UTC
Post by Ricardo Ramos Massaro
See https://swtch.com/~rsc/regexp/regexp1.html for an implementation in less than
400 lines of C (probably less rewritten in a "modern" style).
That's some really nice code, but it's missing *at least* one
essential feature that people expect
from a regex engine: captures.
The author calls captures "submatch extraction" and claims that
"Thompson-style algorithms
can be adapted to track submatch boundaries without giving up
efficient performance". If true,
it would be nice to see how simple and small a fast implementation can be.
It also misses character classes. Although it shouldn't be difficult
to add them, I don't think the proposed implementation would handle
them nicely. Translating '[a-z]' to 'a|b|c|d|...|z' and then running
that non-deterministically would involve a lot of states. It also
doesn't show the 're2post' function, doesn't handle errors (e.g.,
out of memory), etc.
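One common way around the state blow-up described above is to give the automaton a single "class" state that tests membership in a bitmap, instead of expanding '[a-z]' into 26 alternatives. The sketch below is an assumption about how such a state's data could look, not something taken from the article's code; the type and function names are made up for this example.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical representation: one bit per possible unsigned char value. */
typedef struct {
    unsigned char bits[UCHAR_MAX / CHAR_BIT + 1];
} CharClass;

/* Start with an empty class. */
void class_clear(CharClass *cc)
{
    memset(cc->bits, 0, sizeof cc->bits);
}

/* Add every character in the inclusive range lo..hi to the class. */
void class_add_range(CharClass *cc, unsigned char lo, unsigned char hi)
{
    unsigned int c;

    for (c = lo; c <= hi; c++)
        cc->bits[c / CHAR_BIT] |= (unsigned char)(1u << (c % CHAR_BIT));
}

/* Constant-time membership test, usable as the check in a single state. */
int class_has(const CharClass *cc, unsigned char c)
{
    return (cc->bits[c / CHAR_BIT] >> (c % CHAR_BIT)) & 1;
}
```

After `class_add_range(&cc, 'a', 'z')`, one call to `class_has` answers what the expanded form would need 26 states to decide, at a cost of 32 bytes per class on a system with 8-bit chars.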

And, of course, it cannot implement back-references, as matching with
them is NP-complete.

-- Roberto
Taj Khattra
2018-10-11 18:36:47 UTC
On Wed, Oct 10, 2018 at 11:16 AM Ricardo Ramos Massaro
Post by Ricardo Ramos Massaro
The author calls captures "submatch extraction" and claims that
"Thompson-style algorithms
can be adapted to track submatch boundaries without giving up
efficient performance". If true,
it would be nice to see how simple and small a fast implementation can be.
the next article in the series explains how to add support for submatch
tracking "simply" by adding a new bytecode instruction to a regexp vm:
https://swtch.com/~rsc/regexp/regexp2.html
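A minimal sketch of the idea, under stated assumptions: the instruction set below (char, match, jmp, split, save) and the recursive backtracking loop are written here for illustration, and the names `Inst`, `vm_run` and `example_prog` are hypothetical. Because it backtracks, the sketch shows only how a save instruction records submatch boundaries; it does not demonstrate the efficient Thompson-style simulation.

```c
#include <stddef.h>

enum Op { OP_CHAR, OP_MATCH, OP_JMP, OP_SPLIT, OP_SAVE };

typedef struct {
    enum Op op;
    int c;      /* OP_CHAR: character to match */
    int x, y;   /* OP_JMP: go to x; OP_SPLIT: try x first, then y */
    int n;      /* OP_SAVE: index of the capture slot to fill */
} Inst;

/* Runs prog from instruction pc against the text at sp by backtracking.
   Returns 1 on a match; saved[] then holds the recorded positions. */
int vm_run(const Inst *prog, int pc, const char *sp, const char **saved)
{
    const Inst *in = &prog[pc];
    const char *old;

    switch (in->op) {
    case OP_CHAR:
        if (*sp != in->c)               /* also fails at the final '\0' */
            return 0;
        return vm_run(prog, pc + 1, sp + 1, saved);
    case OP_MATCH:
        return 1;
    case OP_JMP:
        return vm_run(prog, in->x, sp, saved);
    case OP_SPLIT:
        if (vm_run(prog, in->x, sp, saved))
            return 1;
        return vm_run(prog, in->y, sp, saved);
    case OP_SAVE:
        old = saved[in->n];
        saved[in->n] = sp;              /* record a submatch boundary */
        if (vm_run(prog, pc + 1, sp, saved))
            return 1;
        saved[in->n] = old;             /* undo when backtracking */
        return 0;
    }
    return 0;
}

/* Hand-compiled program for the regular expression (a+)b. */
const Inst example_prog[] = {
    { OP_SAVE,  0,   0, 0, 0 },         /* 0: start of capture */
    { OP_CHAR,  'a', 0, 0, 0 },         /* 1: match one 'a' */
    { OP_SPLIT, 0,   1, 3, 0 },         /* 2: loop to 1, or continue at 3 */
    { OP_SAVE,  0,   0, 0, 1 },         /* 3: end of capture */
    { OP_CHAR,  'b', 0, 0, 0 },         /* 4: match 'b' */
    { OP_MATCH, 0,   0, 0, 0 }          /* 5: success */
};
```

Running `example_prog` from instruction 0 on "aab" with a two-slot `saved` array leaves slot 0 at the start of the text and slot 1 two characters in, so the capture is "aa".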

Jay Carlson
2018-10-10 20:20:33 UTC
I take no position on the suitability of this implementation.
Post by Victor Krapivensky
I am not sure why everybody seems to believe that a (non-PCRE) regular
expression engine has to be complex. See
https://swtch.com/~rsc/regexp/regexp1.html for an implementation in less than
400 lines of C (probably less rewritten in a "modern" style).
Counting semicolons:

$ cat *.c *.h | tr -c -d ';' | wc
0 1 656

Using https://dwheeler.com/sloccount/

$ sloccount regcomp.c regcomp.h regerror.c regerror.h regexec.c regsub.c regaux.c rregexec.c rregsub.c
[...]
Totals grouped by language (dominant language first):
ansic: 1139 (100.00%)
[...]

But with clang -O2 on Darwin-x86_64:

$ gsize --total -d libregexp9.a | tail -1
10500 0 752 11252 2bf4 (TOTALS)

And with clang -Os, it's 8293 bytes. gcc -Os on Linux is 7503 bytes.
--
Jay