Discussion:
LPeg question about substitution captures with group captures
Sean Conner
2018-12-08 05:54:37 UTC
I'm working on a personal project [1] and for some media types, I'm using
mailcap files to specify external programs to view media types not directly
supported by the program I'm writing. So I have a mailcap file:

application/x-foo; foo -t %t %s
application/x-bar; bar -t %t

This, I can parse [2]. The first field is the MIME type, followed by the
command to run, but there are substitutions that need to happen before the
command is run. The '%t' is replaced by the MIME type, and the '%s' is
replaced by the file; if '%s' is *NOT* specified, then the data is piped in
via stdin. This is where I'm having an issue. I would like to have LPeg do
the substitutions, but the part I'm having trouble with is indicating whether
'%s' was indeed part of the command. While I could check whether '%s' exists
in the string before I do the substitution, I'd prefer not to have to.

My current attempt:

  lpeg = require "lpeg"

  char = lpeg.P"%s" * lpeg.Carg(1) / "%1" * lpeg.Cg(lpeg.Cc(false),'redirect')
       + lpeg.P"%t" * lpeg.Carg(2) / "%1"
       + lpeg.R" ~"
  cmd  = lpeg.Cg(lpeg.Cc(true),'redirect')
       * lpeg.Cs(char^1)
       * lpeg.Cb'redirect'

  print(cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo"))
  print(cmd:match("bar -t %t", 1,"/tmp/foo.bar","application/x-bar"))

This outputs:

foo -t application/x-foo /tmp/bar.foo true
bar -t application/x-bar true

I'd like the output to be:

foo -t application/x-foo /tmp/bar.foo false
bar -t application/x-bar true

Now, lpeg.Cg() states:

An anonymous group serves to join values from several captures into
a single capture. A named group has a different behavior. In most
situations, a named group returns no values at all. Its values are
only relevant for a following back capture or when used inside a
table capture.

and lpeg.Cs():

Creates a substitution capture, which captures the substring of the
subject that matches patt, with substitutions. For any capture
inside patt with a value, the substring that matched the capture is
replaced by the capture value (which should be a string). The final
captured value is the string resulting from all replacements.

I'm using a named group to track if I need redirection or not, and since a
named group does not return a value, it shouldn't affect the substitution
capture (and it doesn't). But the group capture in the char expression
seems to be ignored.

What's going on here? Am I misunderstanding the documentation?

-spc

[1] A gopher client for those curious.

[2] There's more to the format but I don't want to bog down the issue
more than I have to, and as I said, parsing the mailcap file isn't
the issue.
Sean Conner
2018-12-08 06:17:17 UTC
Post by Sean Conner
I'm using a named group to track if I need redirection or not, and since a
named group does not return a value, it shouldn't affect the substitution
capture (and it doesn't). But the group capture in the char expression
seems to be ignored.
What's going on here? Am I misunderstanding the documentation?
I think I'm misunderstanding the documentation. lpeg.Cb() states:

Creates a back capture. This pattern matches the empty string and
produces the values produced by the most recent group capture named
name (where name can be any Lua value).

Most recent means the last complete outermost group capture with the
given name. A Complete capture means that the entire pattern
corresponding to the capture has matched. An Outermost capture means
that the capture is not inside another complete capture.

In the same way that LPeg does not specify when it evaluates
captures, it does not specify whether it reuses values previously
produced by the group or re-evaluates them.

So even if I were to use lpeg.Cmt() to force evaluation of all nested
captures, I'm still not guaranteed to get what I want (I think---I tried and
no, it still didn't work, but I would like to hear from Roberto whether I'm
interpreting this correctly).

-spc (I would still like to find an LPeg solution, but not hopeful ... )
Dirk Laurie
2018-12-08 09:45:02 UTC
Post by Sean Conner
So even if I were to use lpeg.Cmt() to force evaluation of all nested
captures, I'm still not guaranteed to get what I want (I think---I tried and
no, it still didn't work, but I would like to hear from Roberto whether I'm
interpreting this correctly).
1. Is this a question about what Cb, Cg and Cmt are supposed to do or
a challenge to achieve your task?
2. Are you aware that Lua without Lpeg can do that task effortlessly?
Sean Conner
2018-12-08 10:37:13 UTC
Dirk Laurie
2018-12-08 11:07:26 UTC
Post by Sean Conner
Personally, I prefer LPeg as I find it easier to read than the Lua
patterns.
There is nothing that forces you to write a Lua pattern as a single
daunting string literal. You can compose it as table.concat{first,
second, third}, defining the components separately. As you would in a
decently written piece of Lpeg code.
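
For reference, the plain-Lua route could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name expand and its parameters are hypothetical (not from any post in this thread), only '%t' and '%s' are handled, and the pattern is assembled with table.concat as suggested above.

```lua
-- Hypothetical helper (not from the thread): expand a mailcap command
-- template using only the standard string library.
-- Returns the expanded command plus a flag that is true when the data
-- must be piped in via stdin (that is, when no %s was present).
local function expand(template, filename, mimetype)
  -- Build the pattern from separately named pieces.
  local percent = "%%"      -- matches a literal '%'
  local code    = "([st])"  -- captures the letter that follows it
  local pattern = table.concat{ percent, code }

  local redirect = true
  -- gsub returns two values; only the resulting string is kept.
  local cmd = template:gsub(pattern, function(c)
    if c == "s" then
      redirect = false      -- the command takes the file name itself
      return filename
    end
    return mimetype         -- c == "t"
  end)
  return cmd, redirect
end

print(expand("foo -t %t %s", "/tmp/bar.foo", "application/x-foo"))
print(expand("bar -t %t",    "/tmp/foo.bar", "application/x-bar"))
```

Because the replacement comes from a function, the file name and MIME type are inserted literally, so a '%' inside either one is not reinterpreted by gsub.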
Andrew Gierth
2018-12-08 09:20:29 UTC
Sean> I'd like the output to be:

Sean> foo -t application/x-foo /tmp/bar.foo false
Sean> bar -t application/x-bar true

My solution:

  local lpeg = require "lpeg"

  local P,R = lpeg.P, lpeg.R
  local Carg,Cc,C,Cg,Ct,Cs = lpeg.Carg, lpeg.Cc, lpeg.C, lpeg.Cg, lpeg.Ct, lpeg.Cs

  local char = P"%s" * Carg(1) * Cg(Cc(true),'noredirect')
             + P"%t" * Carg(2)
             + C(P"%") -- I'm guessing a P"%%" * Cc"%" is missing here
             + C((R" ~" - P"%")^1)

  local cmd = Ct( char^1 )

  t = cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo")
  print(table.concat(t), not t.noredirect)
  t = cmd:match("bar -t %t", 1,"/tmp/foo.bar","application/x-bar")
  print(table.concat(t), not t.noredirect)
--
Andrew.
Sean Conner
2018-12-08 22:32:20 UTC
That's a nice solution and it lends itself to an easy way to support
redirection if required:

  t = cmd:match(cmdstring,1,filename,type)
  if not t.noredirect then
    table.insert(t,string.format("< %s",filename))
  end
  print(table.concat(t),not t.noredirect)

The only change I'd make is to remove the double negation aspect of it.

  local char = P"%s" * Carg(1) * Cg(Cc(false),'redirect')
             + ...
  local cmd = Ct(Cg(Cc(true),'redirect') * char^1)

Unless that, too, is technically undefined per the LPeg spec (I hope
not---I use that idiom [1] quite often).

-spc (Because I was taught don't use no double negatives ... )

[1] Of setting a table field to a defined value before parsing the rest
of the string.
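
Put together as a complete program, the 'redirect' variant might look like the sketch below. It assumes LPeg is installed, and it relies on a later named group with the same name overwriting the earlier field inside a table capture; that is the idiom described above, and whether the manual guarantees the ordering is the open question. The wrapper name expand is hypothetical.

```lua
local lpeg = require "lpeg"   -- assumes the LPeg module is available

local P, R = lpeg.P, lpeg.R
local C, Carg, Cc, Cg, Ct = lpeg.C, lpeg.Carg, lpeg.Cc, lpeg.Cg, lpeg.Ct

-- Each alternative yields one string for the array part of the table.
-- The %s alternative additionally sets the named field to false.
local char = P"%s" * Carg(1) * Cg(Cc(false), 'redirect')
           + P"%t" * Carg(2)
           + C(P"%")                 -- a stray '%' is kept as is
           + C((R" ~" - P"%")^1)     -- a run of ordinary characters

-- Default the field to true before scanning the command.
local cmd = Ct(Cg(Cc(true), 'redirect') * char^1)

-- Hypothetical wrapper: returns the expanded command and the flag.
local function expand(template, filename, mimetype)
  local t = cmd:match(template, 1, filename, mimetype)
  if not t then return nil end
  return table.concat(t), t.redirect
end

print(expand("foo -t %t %s", "/tmp/bar.foo", "application/x-foo"))
print(expand("bar -t %t",    "/tmp/foo.bar", "application/x-bar"))
```

Since the pieces are collected in a table and joined once with table.concat, no back capture is needed to read the flag.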
Gabriel Bertilson
2018-12-08 22:34:17 UTC
Cs is apparently a barrier that blocks outside access to all its
captures. They are only accessible to patterns inside of Cs, not those
outside. Minimal testcase:

  local lpeg = require 'lpeg'
  setmetatable(_ENV, {__index = lpeg})

  local patt1 = Cg(Cc('test!'), 'not inside Cs') * Cb 'not inside Cs'
  local patt2 = Cs(Cg(Cc('test!'), 'inside Cs')) * Cb 'inside Cs'

  print(patt1:match '') -- no error
  print(patt2:match '') -- error!

So you can't do a substitution over the whole "char" pattern and
access the capture named "redirect". But if Cs is put at a lower level
of the pattern, you can:

  lpeg = require "lpeg"

  char = lpeg.Cs(lpeg.P"%s" * lpeg.Carg(1) / "%1")
         * lpeg.Cg(lpeg.Cc(false),'redirect')
       + lpeg.Cs(lpeg.P"%t" * lpeg.Carg(2) / "%1")
       + lpeg.C(lpeg.R" ~")
  cmd  = lpeg.Cg(lpeg.Cc(true),'redirect')
       * lpeg.Cf(char^1, function (a, b) return a .. (b or "") end)
       * lpeg.Cb'redirect'

  print(cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo"))
  print(cmd:match("bar -t %t", 1,"/tmp/foo.bar","application/x-bar"))

It's a messy solution because it requires concatenating all the
captures from "char^1" (inefficient because it creates a bunch of
intermediate string objects!). I don't know if this is better or worse
than the solutions already posted.

— Gabriel