Discussion:
LPEG: P(fct) seems not to consume input
Dirk Laurie
2018-10-31 19:00:17 UTC
I define a function to be used with LPEG that simply returns the first
character. It _cannot_ match the empty string:

function fct(str)
if #str==0 then return false
else return 2,str:sub(1,1)
end
end

I turn it into an LPEG pattern:

lpeg.version() --> 1.0.0
patt=lpeg.P(fct)

"patt" is supposed to do exactly the same as lpeg.C(1). It works OK on its own:
patt:match"" --> nil
patt:match"abc" --> a

But I can't make it match twice:
(patt*patt):match"abc" --> a a [expected: a b]

This seems to contradict the manual's statement that: "If the call
returns a number, the match succeeds and the returned number becomes
the new current position."
Sean Conner
2018-10-31 19:48:48 UTC
Post by Dirk Laurie
I define a function to be used with LPEG that simply returns the first
function fct(str)
if #str==0 then return false
else return 2,str:sub(1,1)
end
end
The manual states:

lpeg.P (value)

...

If the argument is a function, returns a pattern equivalent
to a match-time capture over the empty string.

And for the match-time capture:

lpeg.Cmt(patt, function)

Creates a match-time capture. Unlike all other captures,
this one is evaluated immediately when a match occurs. It
forces the immediate evaluation of all its nested captures
and then calls function.

The given function gets as arguments the entire subject, the
current position (after the match of patt), plus any capture
values produced by patt.

So the function should be:

function fct(subject,position,capture)
  if #subject == 0 then
    return false
  else
    return position,subject:sub(position-1,position-1)
  end
end
Post by Dirk Laurie
lpeg.version() --> 1.0.0
patt=lpeg.P(fct)
You aren't giving it any pattern to match, so I suspect it's the same as
if you did:

patt = lpeg.Cmt(lpeg.P"", fct)
Post by Dirk Laurie
patt:match"" --> nil
patt:match"abc" --> a
(patt*patt):match"abc" --> a a [expected: a b]
Because the pattern, the empty string, is being matched twice by the two
calls. If you change it to:

patt = lpeg.P(lpeg.P(1) * fct)

it will work.
Post by Dirk Laurie
This seems to contradict the manual's statement that: "If the call
returns a number, the match succeeds and the returned number becomes
the new current position."
Also, your original code is always returning the first character,
regardless of where the match happens.
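
As a small sketch of the point about arguments (assuming LPeg 1.0 can be loaded with require "lpeg"; the helper name probe is made up for this illustration), one can record what a function pattern actually receives. It gets the whole subject and the current position, not the remaining substring:

```lua
local lpeg = require "lpeg"

-- Hypothetical helper: records the arguments LPeg passes to it.
local seen = {}
local function probe(subject, position)
  seen[#seen + 1] = { subject, position }
  return position            -- succeed without consuming any input
end

-- Skip two characters, then call probe as a function pattern.
local p1 = lpeg.P(2) * lpeg.P(probe)
local r1 = p1:match("abc")

-- The same thing written as an explicit match-time capture
-- over the empty pattern.
local p2 = lpeg.P(2) * lpeg.Cmt(lpeg.P(true), probe)
local r2 = p2:match("abc")

print(seen[1][1], seen[1][2])   -- abc   3
print(seen[2][1], seen[2][2])   -- abc   3
print(r1, r2)                   -- 3     3
```

In both cases the function sees the full subject "abc" and position 3, which is why a function that always looks at str:sub(1,1) keeps returning the first character.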

-spc
Dirk Laurie
2018-10-31 20:18:42 UTC
Post by Sean Conner
Post by Dirk Laurie
I define a function to be used with LPEG that simply returns the first
function fct(str)
if #str==0 then return false
else return 2,str:sub(1,1)
end
end
function fct(subject,position,capture)
if #subject == 0 then
return false
else
return position,subject:sub(position-1,position-1)
end
end
Aha! I don't get *str+pos, I get *str,pos.

Let me try that in my notation:

function fct(str,pos)
if #str<pos then return false
else return pos+1, str:sub(pos,pos)
end
end

patt=lpeg.P(fct)
(patt^-4):match"abc" --> a b c [as expected, no fourth value]

But:
(patt^0):match"abc" --> stdin:1: loop body may accept empty string

I don't understand why I get that.
Sean Conner
2018-10-31 20:43:21 UTC
Post by Dirk Laurie
Post by Sean Conner
Post by Dirk Laurie
I define a function to be used with LPEG that simply returns the first
function fct(str)
if #str==0 then return false
else return 2,str:sub(1,1)
end
end
function fct(subject,position,capture)
if #subject == 0 then
return false
else
return position,subject:sub(position-1,position-1)
end
end
Aha! I don't get *str+pos, I get *str,pos.
function fct(str,pos)
if #str<pos then return false
else return pos+1, str:sub(pos,pos)
end
end
function fct(str,pos)
  if #str < pos then
    return false
  else
    return pos + 1 , str:sub(pos,pos)
  end
end

There, fixed that for you 8-P
Post by Dirk Laurie
patt=lpeg.P(fct)
(patt^-4):match"abc" --> a b c [as expected, no fourth value]
(patt^0):match"abc" --> stdin:1: loop body may accept empty string
I don't understand why I get that.
I don't either. When I get that error, I start making changes to the code
until the error goes away and the code does what I want. You could try:

(patt^1 + lpeg.Cc(false)):match"abc"

-spc
Albert Chan
2018-10-31 22:30:50 UTC
Post by Sean Conner
Post by Dirk Laurie
patt=lpeg.P(fct)
(patt^-4):match"abc" --> a b c [as expected, no fourth value]
(patt^0):match"abc" --> stdin:1: loop body may accept empty string
I don't understand why I get that.
I don't either. When I get that error, I start making changes to the code
(patt^1 + lpeg.Cc(false)):match"abc"
-spc
That is an LPeg safety feature: it checks that fixedlen(patt) > 0.
Since patt = P(fct) matches the pattern "", its fixedlen is 0.

The whole point of the check is to keep patt^n from getting into an infinite loop.

P(fct) does not know that fct will skip forward.
To be safe, LPeg assumes no skipping.
To be doubly safe, P(fct) is not allowed to go "backward".

patt^1 + Cc(false) will not compile either.
The possible infinite loop is still there.
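
A minimal sketch of the behaviour described above, assuming LPeg 1.0 loaded with require "lpeg" and using Dirk's corrected fct. The error is raised when the repetition is built, before any subject is matched, and it affects ^0 and ^1 but not the bounded form ^-4:

```lua
local lpeg = require "lpeg"

-- Dirk's corrected function: consumes one character and captures it.
local function fct(str, pos)
  if #str < pos then return false end
  return pos + 1, str:sub(pos, pos)
end

local patt = lpeg.P(fct)

-- Unbounded repetition is rejected at construction time, because LPeg
-- treats a bare function pattern as one that may match the empty string.
local ok0, err0 = pcall(function() return patt^0 end)
local ok1, err1 = pcall(function() return patt^1 end)
print(ok0, err0)   -- false, message ends in "loop body may accept empty string"
print(ok1, err1)   -- false, same message

-- "At most four" repetitions is not a loop, so it builds and runs.
local a, b, c, d = (patt^-4):match("abc")
print(a, b, c, d)  -- a   b   c   nil
```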
Dirk Laurie
2018-11-01 05:02:18 UTC
Post by Albert Chan
That is an LPeg safety feature: it checks that fixedlen(patt) > 0.
Since patt = P(fct) matches the pattern "", its fixedlen is 0.
The whole point of the check is to keep patt^n from getting into an infinite loop.
P(fct) does not know that fct will skip forward.
No, it doesn't, but it is obvious and easy for P(fct) to call
fct("",1) and check that "false" is returned, and taking the message
"may accept empty string" literally, that's all that the check should
be worried about.

I have not yet had the temerity to look into the LPEG source, but this
time, I might.
Albert Chan
2018-11-01 12:01:02 UTC
Post by Dirk Laurie
Post by Albert Chan
The whole point of the check is to keep patt^n from getting into an infinite loop.
P(fct) does not know that fct will skip forward.
No, it doesn't, but it is obvious and easy for P(fct) to call
fct("",1) and check that "false" is returned, and taking the message
"may accept empty string" literally, that's all that the check should
be worried about.
That would be hard to do: the check would have to cover not just position 1
but every other position as well.
Also, the first argument of fct(s,i) is the string being matched, not "".
It is possible that fct() stops advancing for some (s,i).

I had a similar issue trying to patch Lpeg to go backward.
https://github.com/achan001/LPeg-anywhere

The solution is to assume that moving back n positions has a fixedlen of -n.
Infinite loops can then happen, but you gain flexibility in matching.

lua> lpeg = require 'lpeg' -- my patched lpeg
lua> -- 3 steps forward, 2 steps back, fixedlen = 3-2 = 1
lua> patt = lpeg.C(3) * lpeg.B(-2)

lua> lpeg.match(patt^0, '123456')
123 234 345 456

Albert Chan
2018-10-31 21:15:41 UTC
Post by Sean Conner
Because the pattern, the empty string, is being matched twice by the two
patt = lpeg.P(lpeg.P(1) * fct)
it will work.
With this patt, the check for the empty string in fct is not needed.
Since P(fct) does not capture anything, the capture argument is not needed either.

patt is just a one-liner:

patt = 1 * P(function(s, i) return i, s:sub(i-1, i-1) end)
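
A quick check of the one-liner, assuming LPeg 1.0 loaded with require "lpeg" (written with lpeg.P(1) spelled out). Because the leading 1 always consumes a character, the pattern can be sequenced and also repeated with ^0:

```lua
local lpeg = require "lpeg"
local P = lpeg.P

-- Consume one character, then capture the character just consumed.
local patt = P(1) * P(function(s, i) return i, s:sub(i - 1, i - 1) end)

local none = patt:match("")                  -- P(1) fails on empty input
local a, b = (patt * patt):match("abc")
local x, y, z = (patt^0):match("abc")        -- builds without the loop error

print(none)      -- nil
print(a, b)      -- a   b
print(x, y, z)   -- a   b   c
```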
Dirk Laurie
2018-10-31 21:43:07 UTC
Post by Albert Chan
Post by Sean Conner
Because the pattern, the empty string, is being matched twice by the two
patt = lpeg.P(lpeg.P(1) * fct)
it will work.
With this patt, the check for empty string in fct is not needed
Since P(fct) does not capture anything, argument capture is not needed.
I'll be using P(fct)/action in the application, which translates a
script to Lua.
Post by Albert Chan
patt = 1 * P(function(s, i) return i, s:sub(i-1, i-1) end)
This is a toy pattern, illustrating the problem I had at first, which
Sean has cleared up for me. The actual pattern I wish to capture is
based on a Lua pattern involving "%b", which in LPEG requires
techniques I have not mastered.
Sean Conner
2018-10-31 21:58:32 UTC
Post by Dirk Laurie
This is a toy pattern, illustrating the problem I had at first, which
Sean has cleared up for me. The actual pattern I wish to capture is
based on a Lua pattern involving "%b", which in LPEG requires
techniques I have not mastered.
There was a thread kind of about this last year on this list. Start here
for a direct reference to %b and LPeg:

http://lua-users.org/lists/lua-l/2017-10/msg00126.html
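
For readers without the archive at hand, here is a rough sketch of one common way to get the effect of "%b()" in LPeg. It is the recursive-grammar idea from the balanced parentheses example in the LPeg manual, not necessarily what the linked post does, and it assumes LPeg 1.0 loaded with require "lpeg":

```lua
local lpeg = require "lpeg"
local P, S, V, C = lpeg.P, lpeg.S, lpeg.V, lpeg.C

-- A one-rule grammar: "(", then any mix of non-parenthesis characters
-- and nested balanced groups, then ")".
local balanced = P{ "(" * ((1 - S"()") + V(1))^0 * ")" }

local text = C(balanced):match("(a(b)c)d")   -- capture the balanced prefix
local stop = balanced:match("(a(b)c)d")      -- position just after it
local bad  = balanced:match("(a(b c")        -- unbalanced input

print(text)   -- (a(b)c)
print(stop)   -- 8
print(bad)    -- nil
```

Like string.find with "^%b()", this only matches at the start of the subject; searching for a balanced group elsewhere needs an explicit skip in front of it.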

-spc
Dirk Laurie
2018-11-01 05:38:23 UTC
Post by Sean Conner
Post by Dirk Laurie
This is a toy pattern, illustrating the problem I had at first, which
Sean has cleared up for me. The actual pattern I wish to capture is
based on a Lua pattern involving "%b", which in LPEG requires
techniques I have not mastered.
There was a thread kind of about this last year on this list. Start here
http://lua-users.org/lists/lua-l/2017-10/msg00126.html
Memory, n.
The faculty by which a lua-l member remains aware of past own contributions.

I can still remember clearly mine :-), an opinion that has not changed.

http://lua-users.org/lists/lua-l/2017-10/msg00125.html

But thanks for reminding me of your demonstration of %b, and
especially that it could also do multibyte delimiters. It will also
solve the trickier problem in my actual application.