So if I understand correctly, neither the metatable nor the table is changed;
what changes is only the presence of the object in the "list" of objects to
be finalized, which is filled at the start of the mark phase with all known
objects; each object is then removed from the list when it is reached from
the stack and marked as reachable.
At the end of the mark phase, what remains is a list of unreachable objects
that will need to be finalized; then the finalization step starts, which
takes each object from the list and removes it, then calls its finalizer if
there is one; but is there any action in the finalizer that determines
whether the object will then be swept?
The ONLY action I see is the fact that it calls setmetatable(); you are
saying that this does NOT change the metatable, strange!
But it must also do something else, which will then mark the object as
not to be swept; however, the call to setmetatable() is not the end of the
finalizer, which has still not returned to the GC sweeper; the finalizer may
still change the state of the metatable *after* calling setmetatable(), so
it could still set or remove its "__gc" entry. And nothing else will
happen before the finalizer returns, so there will be nothing that can
actually set the required bit/flag in the object itself properly.
Let's suppose that the GC then inspects the metatable at the end to see if
there's a __gc entry mapped to a function: how can it determine that the
function called setmetatable() or changed the entry in its metatable, and
differentiate that from a finalizer that did nothing at all?
There must be an action taken by the finalizer to effectively indicate to
the GC that the object must not be swept and must instead be marked for
later finalization.
The finalizer may also resurrect that object by linking it to another
"live" object (i.e. a reachable object that has already been marked)
without calling setmetatable() at all, but it can still set or reset the
__gc entry of its existing metatable.
All we know is that an object has a "state", which is one of: active but
not yet marked (possible only at the start of or during the mark phase,
impossible during the sweep phase), active and marked, dead and to be
finalized, finalized and to be swept, or resurrected (to be made active but
not yet marked again at the end of the sweep phase). This state is not
enough to determine whether what a finalizer does (or does not do) will
cause the object to be swept or to be finalized again later.
The only reliable info is that, just before calling the finalizer, the GC
will clear the link of the object to its metatable: it is then up to the
finalizer to reattach the metatable by calling setmetatable() with a
suitable __gc entry attached to a finalizer function (not necessarily the
same function as the current finalizer itself). If there's no such call to
setmetatable(), or if the finalizer clears the __gc entry or sets it to a
non-function, and if the object has not been resurrected by the finalizer
by linking it to an object with a "marked" status or an object with a "dead
to finalize" status (processed later in the same sweep cycle), then the
object will be swept by the GC just after the finalizer has returned.
That's what is not clearly documented: what is the actual status of the
object that tells the GC that an object being finalized must not be swept
after its finalizer has been called? There must be an action taken by the
finalizer itself; by default, if this action is not taken by the finalizer,
then the finalization will be immediately followed by sweeping.
And the only such action I can see is the finalizer calling setmetatable()
to set or restore the metatable, which was detached from the object by the
GC just before calling the finalizer (simply by clearing the internal
pointer to the object's metatable); so when the finalizer calls
setmetatable() to set it to a non-nil value, this has the desired effect of
indicating to the GC that the object must not be swept.
E.g.:
- a TCP network session socket that has been closed but is still kept for
about one minute in FIN_WAIT state, during which that socket may still be
resurrected, in order to reuse its allocated port number and allow a fast
restart with its existing reception/transmission windows and MTU: this can
be useful for security against DoS attacks, to prevent a server from
exhausting all its port numbers, but also for privacy reasons, to secure
all sessions
- another usage is to allow closed files to have some delay before they
get physically flushed, or because the flush itself may take long and may
need to be tested and retried several times before giving up and logging
some severe errors to inform the user or the program itself that something
bad happened asynchronously, without forcing close() to block until
flushing is fully completed
- another usage may be to delay the power-down of a previously used device
(e.g. turning off a screen display only several minutes after there was no
longer any new message to display), because turning the device back on may
be very lengthy if it was turned off immediately after a close
- another usage may be to release other OS or external resources (e.g.
returning local memory used by Lua to the OS by forcing all "weak" objects
to be deallocated, including for example caches, or deleting caches stored
in the filesystem once a "grace delay" during which they could still be
reused has expired)
- another usage would be to start a
reorganization/optimization/defragmentation of the storage, or to physically
clear storage entries that are no longer in use: this could be I/O intensive
on large volumes, so such clearing will be done after a grace period, when
it can be performed more easily and with lower impact by doing it
sequentially instead of in random order on disk
Basically, finalizers are there to delay operations that can be postponed
without blocking a program that no longer immediately needs an object. This
still allows a program to reconstruct the object (notably "weak" objects
for caches) much faster if the underlying structures were not cleared and
their finalization was delayed for a grace period.
What you quote explains just that there are lists of objects from which
candidates are extracted, but it still does not indicate clearly which
action a finalizer takes to effectively change the state of the object so
that the GC will not sweep it when the finalizer returns. The GC must
then have already modified the state of the object (to indicate that it
MUST be swept) just before calling the finalizer, and the finalizer
optionally decides to change that state again and indicate that now it MUST
NOT be swept by the GC: the finalizer itself cannot change the various
lists of objects maintained only by the GC itself, and it cannot change its
"generation" model (if generations are used in Lua 5.4 to subdivide the
lists of objects into smaller subsets, where GC and finalization will be
faster on young objects than on objects in older generations that have
survived more than one cycle and are less likely to need to be swept soon).
Post by Philippe Verdy
It's not very well documented, but when a finalizer gets called on an
object, just before calling it, the GC first clears the associated
metatable if the object being finalized is a table: in the finalizer
for an object whose type is 'table' or 'userdata', if you use
getmetatable(self), it's not clearly documented whether you'll get
nil or the same metatable whose "__gc" entry is now nil
(something that would be better, allowing you to store the "cnt"
variable inside the metatable itself along with the "__gc" variable,
instead of in the object being finalized).
That's complete nonsense. Any modification of the metatable would be
unsafe as these are commonly used on several objects (though not in this
example), so the collection / finalization of the first such object
would break the finalization of all other objects with the same shared
metatable.
Post by Philippe Verdy
For an object (table or userdata) to be finalized when collected, you
must mark it for finalization. You mark an object for finalization
when you set its metatable and the metatable has a field indexed by
the string `"__gc"`. Note that if you set a metatable without a
`__gc` field and later create that field in the metatable, the object
will not be marked for finalization.
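To make that paragraph concrete, here is a minimal sketch, assuming Lua 5.3 or 5.4 (in 5.1, `__gc` only applies to userdata). The helper names `make_marked` and `make_unmarked` and the `finalized` table are made up for this illustration. It only shows that the mark is taken when setmetatable() is called, not when `__gc` is later added to the metatable.

```lua
-- Records which finalizers actually ran.
local finalized = {}

local function make_marked()
  local mt = { __gc = function() finalized.marked = true end }
  setmetatable({}, mt)   -- __gc is present now: the object gets marked
end

local function make_unmarked()
  local mt = {}
  setmetatable({}, mt)   -- no __gc yet: the object is NOT marked
  mt.__gc = function() finalized.unmarked = true end   -- too late
end

make_marked()
make_unmarked()
collectgarbage()   -- full cycle; pending finalizers run
collectgarbage()

print(finalized.marked, finalized.unmarked)   --> true    nil
```

The objects are created inside helper functions so that no stale reference to them lingers in the caller's stack frame.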
And §3
Post by Philippe Verdy
When a marked object becomes garbage, it is not collected immediately
by the garbage collector. Instead, Lua puts it in a list. After the
collection, Lua goes through that list. For each object in the list,
it checks the object's __gc metamethod: If it is a function, Lua
calls it with the object as its single argument; if the metamethod is
not a function, Lua simply ignores it.
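A small sketch of what this implies, again assuming Lua 5.3/5.4: since the `__gc` value is only looked up when the object is actually collected, a placeholder such as `true` is enough to get the object marked, and the real function can be installed afterwards. The names `log` and `make` are hypothetical.

```lua
local log = {}
local mt = { __gc = true }   -- any non-nil value is enough to mark objects

local function make() setmetatable({}, mt) end
make()                       -- the new table is marked for finalization

-- Install the real finalizer only now, after the object was marked.
mt.__gc = function() log[#log + 1] = "finalized" end

collectgarbage()
print(#log)                  --> 1
```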
And further §5
Post by Philippe Verdy
Because the object being collected must still be used by the
finalizer, that object (and other objects accessible only through
it) must be resurrected by Lua. Usually, this resurrection is
transient, and the object memory is freed in the next
garbage-collection cycle. However, if the finalizer stores the object
in some global place (e.g., a global variable), then the resurrection
is permanent. Moreover, if the finalizer marks a finalizing object
for finalization again, its finalizer will be called again in the
next cycle where the object is unreachable. In any case, the object
memory is freed only in a GC cycle where the object is unreachable
and not marked for finalization.
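As an illustration of the "permanent resurrection" case, here is a sketch assuming Lua 5.3/5.4. The `graveyard` table stands in for the "global place" and is a made-up name, as are `calls` and `make`. The finalizer stores the object somewhere reachable, so it stays alive, and because it is not marked again the finalizer runs only once.

```lua
local graveyard = {}   -- hypothetical "global place" keeping objects alive
local calls = 0

local mt = {
  __gc = function(obj)
    calls = calls + 1
    graveyard[#graveyard + 1] = obj   -- permanent resurrection
  end,
}

local function make() setmetatable({ name = "phoenix" }, mt) end
make()

collectgarbage()
collectgarbage()   -- object is reachable again and no longer marked

print(calls, graveyard[1].name)   --> 1    phoenix
```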
(I wouldn't call that "not very well documented"...)
If, when you setmetatable(), there's _anything_ non-nil at `__gc` in the
metatable, the thing gets flagged for finalization. (This is a property
of the table/userdata, not the metatable.)
When the thing is later collected and it has the "to be finalized" bit
set, this bit is cleared and, if _at this point_ the value at `__gc` in
the metatable is a function, that function gets run.
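Here is a quick way to check this, as a sketch assuming Lua 5.3/5.4 (all names below are made up): two objects share one metatable, both get finalized, and inside the finalizer getmetatable() still returns that same, unmodified metatable.

```lua
local names = {}
local still_attached = true
local shared_mt = {}

shared_mt.__gc = function(obj)
  names[#names + 1] = obj.name
  -- The metatable is neither detached from the object nor modified.
  still_attached = still_attached
    and getmetatable(obj) == shared_mt
    and type(shared_mt.__gc) == "function"
end

local function make(name) setmetatable({ name = name }, shared_mt) end
make("first")
make("second")

collectgarbage()
table.sort(names)   -- do not rely on the finalization order here
print(table.concat(names, ","), still_attached)   --> first,second    true
```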
(And no matter what it'll do, the object survives until the next
collection. Now _usually_, the "to be finalized" bit isn't re-enabled
by the `__gc` method and so the thing will be collected normally by the
next cycle... but you can re-flag it (by again calling setmetatable()
using a metatable with a `__gc` field), and even keep it around
indefinitely in an "undead" state: it's "dead" / fully unreachable from
the rest of the Lua state (hooks don't run during `__gc`), but it can
still do arbitrary stuff with the state.)
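The re-flagging can be sketched as follows, assuming Lua 5.3/5.4; the counter `calls` and the limit of three are arbitrary choices for this illustration. The finalizer re-flags the object twice, so it runs in three consecutive cycles and the object is freed after that.

```lua
local calls = 0
local mt = {}

mt.__gc = function(obj)
  calls = calls + 1
  if calls < 3 then
    setmetatable(obj, mt)   -- re-flag: run again in the next cycle
  end
end

local function make() setmetatable({}, mt) end
make()

for _ = 1, 5 do collectgarbage() end
print(calls)   --> 3
```

Note that the counter lives in an upvalue here; it could just as well be stored in the metatable, since that is not touched by the collector.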
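A fun / silly use of that is to make the computer beep on every
collection cycle:

  -- Ring the terminal bell (BEL, "\7") once per GC cycle; the finalizer
  -- re-flags the table so that it runs again in the next cycle.
  setmetatable( {}, { __gc = function(t)
      io.stderr:write("\7") ; setmetatable(t,getmetatable(t)) end }
  )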
(This is easy to pre-load via the `-e` / `-l` options, and might be
useful for debugging... in fact, the Lua tests do something similar, just
writing a '.' for every collection instead of making it beep.)
You might also (ab)use this to trigger bookkeeping tasks (once per GC
cycle), if you have no better way to do that. (A fixed "every $n
invocations of a function" scheme might not work (it could fire _both_
too rarely and too often, at different times), and in certain restricted
situations (games etc.), this might be as good as it gets... but note that
this is slightly racy: _any_ allocation can trigger a GC cycle, so
protect your data structures / make sure you're not reading inconsistent
state when triggered in the middle of some change.)
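One possible shape for such a hook, as a sketch only and assuming Lua 5.3/5.4: `on_gc_cycle` is a made-up helper, not part of any library. It creates a sentinel table whose finalizer re-flags itself and then calls the given callback.

```lua
-- Hypothetical helper: call `callback` once per completed GC cycle.
local function on_gc_cycle(callback)
  local mt = {}
  mt.__gc = function(sentinel)
    setmetatable(sentinel, mt)   -- re-flag before doing any work
    callback()
  end
  setmetatable({}, mt)           -- unreachable sentinel, dies every cycle
end

local cycles = 0
on_gc_cycle(function() cycles = cycles + 1 end)

for _ = 1, 3 do collectgarbage() end
print(cycles >= 3)
```

The callback should stay short and must tolerate being run at any allocation point, for the reasons given above.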
And of course there's LOTS of other stuff that you can do...
-- nobody