Discussion:
Questions about string.pack() and string.unpack()
Sean Conner
2014-11-20 01:06:27 UTC
I'm looking over string.pack() and string.unpack() in Lua 5.3, and I have
a few questions:

1. Why is it string.unpack(fmt,s[,pos]) instead of string.unpack(s,fmt[,pos])?
This means I can't do:

magic,timestamp,tag = raw:unpack("I4I4I4")

Instead, it's:

magic,timestamp,tag = string.unpack("I4I4I4",raw)

This also means I may have to cache the string table in a local variable
in a module when it might be avoided. The former also "reads" better
to me.

2. I notice there isn't an option to return the rest of the string. The
closest you have is the "c[n]" format, but that only works when you know the
length of the string. In my case, the rest of the packet isn't binary
but all text, and thus the regular Lua patterns or LPeg can be used. I
could use raw:sub(pastbinaryportion) to get it, but again, it seems like
it could be a bit cleaner with a way to return just the rest of the
string.

3. Is the following a legal format?

">I4<i4"

That is, a big endian integer, followed by a little endian integer?

4. The Lua lpack module includes a way to specify native endian.
string.pack() and string.unpack() default to native endian, but if #3 is
valid, then a way to flip back to the native endian would be nice.

5. Could we get these functions in a standalone module for Lua 5.1 and 5.2?
I would love to be able to use these (lots of network parsing and what
not) but I'm currently stuck at Lua 5.1 at work for the time being [1].
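
On question 2, a minimal sketch assuming Lua 5.3 or later: string.unpack() returns, after the unpacked values, the index of the first byte it did not read, so that index can be passed to raw:sub() to get the rest of the string. The packet contents and variable names below are made up for illustration.

```lua
-- Requires Lua 5.3 or later. The packet contents are invented.
local raw = string.pack("I4I4I4", 0xCAFEBABE, 1416445587, 42)
            .. "Subject: hello\n"

-- string.unpack returns the unpacked values followed by the index of
-- the first unread byte.
local magic, timestamp, tag, pos = string.unpack("I4I4I4", raw)

-- Everything from pos onward is the text portion of the packet.
local text = raw:sub(pos)
```

From there, text can be handed to the regular Lua patterns or to LPeg.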

-spc (Also, the link to "debug.sizeof" is broken in the 5.3 manual)

[1] Major reason being the remote possibility of using LuaJIT. While
currently we're running on SPARC (not supported by LuaJIT), there is
a very remote possibility of a switch to Linux.
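
On questions 3 and 4, a small sketch assuming the released Lua 5.3 format options: ">" and "<" can each appear anywhere in a format and apply to the options that follow, and "=" switches back to native endianness. The values are arbitrary.

```lua
-- Requires Lua 5.3 or later.
-- ">" switches to big endian, "<" to little endian.
local packed = string.pack(">I4<i4", 1, 1)
-- bytes: 00 00 00 01 | 01 00 00 00

local big, little = string.unpack(">I4<i4", packed)

-- "=" restores the machine's native byte order mid-format, so the
-- second field below is laid out like a plain "I2".
local mixed = string.pack(">I2=I2", 1, 1)
local native = string.pack("I2", 1)
```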
Coda Highland
2014-11-20 02:46:43 UTC
Post by Sean Conner
I'm looking over string.pack() and string.unpack() in Lua 5.3, and I have
1. Why is string.unpack(fmt,s[,pos]) instead of string.unpack(s,fmt[,pos])?
magic,timestamp,tag = raw:unpack("I4I4I4")
magic,timestamp,tag = string.unpack("I4I4I4",raw)
This also means I may have to cache the string table in a local variable
in a module when it might be avoided. The former also "reads" better
to me.
The rationale is that most functions (e.g. printf) take their
format strings as the first parameter.

Why not ("l4l4l4"):unpack(raw)? That avoids caching the string table.

/s/ Adam
Sean Conner
2014-11-20 03:08:02 UTC
Post by Coda Highland
Post by Sean Conner
I'm looking over string.pack() and string.unpack() in Lua 5.3, and I have
1. Why is string.unpack(fmt,s[,pos]) instead of string.unpack(s,fmt[,pos])?
magic,timestamp,tag = raw:unpack("I4I4I4")
magic,timestamp,tag = string.unpack("I4I4I4",raw)
This also means I may have to cache the string table in a local variable
in a module when it might be avoided. The former also "reads" better
to me.
The rationale is that most functions (e.g. printf) take their
format strings as the first parameter.
Why not ("l4l4l4"):unpack(raw)?
The joke answer is: I want 4-byte unsigned integers, not an invalid
format trying to unpack native longs.

I suppose you could also do:

packetlayout = "I4I4I4"

magic,timestamp,tag = packetlayout:unpack(raw)

-- elsewhere in the code

raw = packetlayout:pack(MAGICCOOKIE,os.time(),myid)

which I could live with (DRY [1] and all that).

-spc

[1] Don't Repeat Yourself
Dirk Laurie
2014-11-20 10:43:57 UTC
Post by Sean Conner
5. Could we get these functions in a standalone module for Lua 5.1 and 5.2?
http://www.inf.puc-rio.br/~roberto/struct
Sean Conner
2014-11-20 17:01:22 UTC
Post by Dirk Laurie
Post by Sean Conner
5. Could we get these functions in a standalone module for Lua 5.1 and 5.2?
http://www.inf.puc-rio.br/~roberto/struct
Oh, thanks.

But I noticed that the format string between the two is slightly
different:

    struct      Lua 5.3
    c0          s[n]
    s           z
    (none)      Xop

The rest are all fine.

-spc
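
The Lua 5.3 side of that comparison can be tried directly. This is only a sketch, assuming Lua 5.3 or later, of what "z" and "s[n]" produce; it does not exercise the struct library.

```lua
-- Requires Lua 5.3 or later.
-- "z" writes a zero-terminated string.
local zstr = string.pack("z", "hi")     -- bytes: 'h' 'i' 00
-- "s1" writes a one-byte length followed by the string bytes.
local lstr = string.pack("s1", "hi")    -- bytes: 02 'h' 'i'

-- Both formats read back with the same option that wrote them.
local a, nexta = string.unpack("z", zstr)
local b, nextb = string.unpack("s1", lstr)
```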
Oliver Kroth
2014-11-27 12:44:34 UTC
Hi,

I tested string.pack() and string.unpack() in Lua 5.2.3 (I copied them
into lstrlib.c), and they work fine.

And of course :-) some questions came up:

1) Why is the default length for c (simple string) in string.pack 1, and
not the string's length?
pack() throws an error if the string's length does not match the coded
length, but there is no method to encode the length as a variable.

2) Why is there no way to put literal characters into the pack()ed string?

3) The description for the 'X' specifier,
"an empty item that aligns according to option op (which is otherwise
ignored)",
was not clear to me at first. Possibly
"align like but ignore the format option op"
would be quicker to grasp.

--
Oliver
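
On point 3, a short sketch of how 'X' behaves, assuming Lua 5.3 or later. The format strings are arbitrary examples; note that alignment only takes effect once "!" has set a maximum alignment.

```lua
-- Requires Lua 5.3 or later.
-- "!4" sets the maximum alignment to 4 bytes.
-- "b" writes one byte. "Xi4" writes no value and consumes no argument;
-- it only adds the padding that an "i4" at this position would need.
local fmt = "!4bXi4"
local packed = string.pack(fmt, 7)      -- bytes: 07 00 00 00

-- string.packsize reports the same length without packing anything.
local aligned_size = string.packsize(fmt)
-- Without "!" the maximum alignment is 1, so X adds no padding.
local unaligned_size = string.packsize("bXi4")
```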
Roberto Ierusalimschy
2014-11-27 13:34:58 UTC
Post by Oliver Kroth
1) why is in string.pack the default length for c (simple string) 1,
and not the string's length?
Coding a variable-length string without its length would be
unreadable (there would be no way to unpack it). (In particular,
we want that anything packed with some format 'f' should be readable
with exactly the same format.)
Post by Oliver Kroth
pack() throws an error if the string's length does not match the
coded length, but there is no method to encode the length as a
variable
Don't you mean option "s"?
Post by Oliver Kroth
2) why is there no way to put literal characters into the pack()ed string?
Binary data usually do not contain fixed stuff. (Even the name 'pack'
implies not wasting space...). You can always code literal characters
with a combination of "c" and literal strings.

-- Roberto
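
Both suggestions can be sketched as follows, assuming Lua 5.3 or later. The "RIFF" tag and the size value are only illustrative and do not form a valid file header.

```lua
-- Requires Lua 5.3 or later.
-- "s2" stores a 2-byte length before the string, so the same format
-- can read the value back without knowing the length in advance.
local packed = string.pack("<s2", "hello")
local value = string.unpack("<s2", packed)

-- A fixed tag can be written with "c" by passing a literal string of
-- exactly that length as the argument.
local header = string.pack("<c4I4", "RIFF", 36)
local tag, size = string.unpack("<c4I4", header)
```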
Oliver Kroth
2014-11-27 14:48:32 UTC
Hello Roberto,
Post by Roberto Ierusalimschy
Post by Oliver Kroth
1) why is in string.pack the default length for c (simple string) 1,
and not the string's length?
Coding a variable-length string without its length would be
unreadable (there would be no way to unpack it). (In particular,
we want that anything packed with some format 'f' should be readable
with exactly the same format.)
There are a few encodings out there that don't put the length of the
string in front of it.
Look at the PKZIP file header: there are three strings involved
(compressed file content, file name, and comment),
but their lengths are encoded in separate fields.

Possibly we need a special length encoding that is read from the
parameter list,
similar to the printf field width (or precision) specifier '*':
    -- get numerics
    version, gpf, method, mtime, mdate, crc, compressed, length, namelen,
    extralen, offset = ('<I2I2I2I2I2I4I4I4I2I2'):unpack(data)
    -- get strings
    filename, extra, content = ("c*c*c*"):unpack(data, offset, namelen,
    extralen, length)
Post by Roberto Ierusalimschy
Post by Oliver Kroth
pack() throws an error if the string's length does not match the
coded length, but there is no method to encode the length as a
variable
Don't you mean option "s"?
No, the s option takes the actual string's length and encodes it in as
many bytes as specified.
I mean the c option that requires a length that must match the string's
length.
Post by Roberto Ierusalimschy
Post by Oliver Kroth
2) why is there no way to put literal characters into the pack()ed string?
Binary data usually do not contain fixed stuff. (Even the name 'pack'
implies not wasting space...). You can always code literal characters
with a combination of "c" and literal strings.
False for "do not contain fixed stuff". Check RIFF files, PNG, PKZIP.
All of these contain fixed header strings.

True for the "c" plus literal statement.

--
Oliver
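
The "c*" option above is a proposal and is not available in Lua 5.3. One possible workaround, sketched here under that assumption, is to build the format string at run time from the lengths read earlier. unpack_sized is a hypothetical helper, and the record layout is made up (it is not the real PKZIP header).

```lua
-- Requires Lua 5.3 or later. unpack_sized is a hypothetical helper,
-- not part of the standard library.
local function unpack_sized(data, pos, ...)
  local parts = {}
  for i, len in ipairs({...}) do
    -- One fixed-size "c<len>" option per requested string.
    parts[i] = ("c%d"):format(len)
  end
  return string.unpack(table.concat(parts), data, pos)
end

-- Made-up record: two 1-byte lengths, then the two strings back to back.
local data = string.pack("I1I1", 5, 3) .. "a.txt" .. "xyz"

-- First read the lengths, then use them to read the strings.
local namelen, extralen, offset = string.unpack("I1I1", data)
local filename, extra, nextpos = unpack_sized(data, offset, namelen, extralen)
```

The same idea works for packing, at the cost of formatting a new string for each record.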