seqlua: Extension for handling sequential data in Lua
=====================================================

This package is an experimental extension for the Lua 5.2 programming language which:

* allows ``ipairs(seq)`` to accept either tables or functions (i.e. function iterators) as an argument,
* adds a new function ``string.concat(separator, seq)`` that concatenates either table entries or function return values,
* provides auxiliary C functions and macros to simplify iterating over both tables and iterator functions with a generic statement.

Existing ``__ipairs`` or ``__index`` (but not ``__len``) metamethods are respected by both the Lua functions and the C functions and macros. The ``__ipairs`` metamethod takes precedence over ``__index``, while the ``__len`` metamethod is never used. Metamethod handling is explained in detail in the last section ("Respected metamethods") of this README.

In Lua, this extension is loaded by ``require "seqlua"``. In order to use the auxiliary C functions and macros, add ``#include <seqlualib.h>`` to your C file and ensure that the functions implemented in ``seqlualib.c`` are statically or dynamically linked with your C Lua library.


Motivation
----------

Sequential data (such as arrays or streams) is often represented in two different ways:

* as an ordered set of values (usually implemented as an array in other programming languages, or as a sequence in Lua: a table with numeric keys {1..n}, each associated with a value),
* as some sort of data stream (sometimes implemented as a class of objects providing certain methods, or as an iterator function in Lua: a function that returns the next value with every call, where nil indicates the end of the stream).

Quite often, when functions work on sequential data, it shouldn't matter in which form the sequential data is provided to the function. As an example, consider a function that writes a sequence of strings to a file.
Such a function could be fed either with an array of strings (a table with numeric keys in Lua) or with a (possibly infinite) stream of data (an iterator function in Lua).

A function in Lua that accepts a table might look as follows:

    function write_lines(lines)
      for i, line in ipairs(lines) do
        io.stdout:write(line)
        io.stdout:write("\n")
      end
    end

In contrast, a function in Lua that accepts an iterator function would have to be implemented differently:

    function write_lines(get_next_line)
      for line in get_next_line do
        io.stdout:write(line)
        io.stdout:write("\n")
      end
    end

If one wanted to create a function that accepts either a sequence in the form of a table or an iterator function, then one might need to write:

    do
      local function write_line(line)
        io.stdout:write(line)
        io.stdout:write("\n")
      end
      function write_lines(lines)
        if type(lines) == "function" then
          for line in lines do
            write_line(line)
          end
        else
          for i, line in ipairs(lines) do
            write_line(line)
          end
        end
      end
    end

Obviously, this isn't something we want to do in every function that accepts sequential data. Therefore, we usually settle on one of the first two forms and thus prevent the other representation of sequential data from being passed to the function.

This extension, however, modifies Lua's ``ipairs`` function in such a way that it automatically accepts either a table or an iterator function as its argument. Thus, the first of the three ``write_lines`` functions above will accept both (table) sequences and (function) iterators.

In addition to the modification of ``ipairs``, it also provides C functions and macros to iterate over values in the same manner as a generic loop statement with ``ipairs`` would do.

This extension doesn't aim to supersede Lua's concept of iterator functions. While metamethods (see section "Respected metamethods" below) may be used to customize iteration behavior on values, this extension isn't intended to replace the common practice of using function closures as iterators.
Consider the following example:

    function write_lines(lines)
      for i, line in ipairs(lines) do
        io.stdout:write(line)
        io.stdout:write("\n")
      end
    end

    local result = sql_query("SELECT * FROM actor ORDER BY birthdate")

    -- assert(type(result:get_column_entries("name")) == "function")
    write_lines(result:get_column_entries("name"))

Note, however, that in the case of repeated or nested loops, using function iterators may not be feasible, because a function iterator is exhausted after the first pass:

    function print_list_twice(seq)
      for i = 1, 2 do
        for i, v in ipairs(seq) do
          print(v)
        end
      end
    end

    print_list_twice(io.stdin:lines())  -- won't work as expected

Where desired, it is possible to use metamethods to customize iteration behavior:

    function print_rows(rows)
      for i, row in ipairs(rows) do
        print_row(row)
      end
    end

    local result = sql_query("SELECT * FROM actor ORDER BY birthday")
    assert(type(result) == "userdata")

    -- we may rely on the ``__index`` or ``__ipairs`` metamethod to
    -- iterate through all result rows here:
    print_rows(result)  -- no need to use ":rows()" or a similar syntax

    -- but we can also still pass an individual set of result rows to the
    -- print_rows function:
    print_rows{result[1], result[#result]}

This extension, however, doesn't respect the ``__len`` metamethod due to the following considerations:

* An efficient implementation in which ``for i, v in ipairs(tbl) do ... end`` neither creates a closure nor repeatedly evaluates ``#tbl`` seems to be impossible.
* Respecting ``__len`` could be used to implement sparse arrays, but this would require iterating functions to expect ``nil`` as a potential value. This may lead to problems because ``nil`` is usually also used to indicate the absence of a value.

If such behavior is desired, though, it can still be implemented through the ``__ipairs`` metamethod.
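As a rough illustration of that last point, the following sketch implements a sparse array whose iteration is bounded by an explicit length rather than by ``__len``. The ``n`` field and the ``sparse`` and ``sparse_next`` helpers are hypothetical names chosen for this example and are not part of seqlua. The ``__ipairs`` metamethod returns an iterator function, a static argument, and a start index, and the sketch assumes a Lua version whose ``ipairs`` honors ``__ipairs`` (such as Lua 5.2):

```lua
-- Hypothetical stateless iterator: walks indices 1..t.n, including holes.
local function sparse_next(t, i)
  i = i + 1
  if i <= t.n then
    return i, t[i]  -- t[i] may be nil; only a nil index ends the loop
  end
end

local sparse_mt = {
  __ipairs = function(self)
    -- iterator function, static argument, start index
    return sparse_next, self, 0
  end
}

-- Hypothetical constructor: stores the intended length in the field "n".
function sparse(n, t)
  t.n = n
  return setmetatable(t, sparse_mt)
end

local s = sparse(4, {[1] = "a", [4] = "d"})
for i, v in ipairs(s) do
  print(i, v)
end
-- prints (when ``__ipairs`` is honored):
-- 1 a
-- 2 nil
-- 3 nil
-- 4 d
```

Because ``sparse_next`` is shared by all sparse arrays, starting an iteration does not create a new closure. Any loop body that consumes such a value has to be prepared for ``v`` being ``nil``.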
Unless manually done by the user in the ``__ipairs`` metamethod, the ``ipairs`` function as well as the corresponding C functions and macros provided by this extension never create any closures or other values that need to be garbage collected.


Lua part of the library
-----------------------

The modified ``ipairs(seq)`` and the new ``string.concat(sep, seq)`` functions accept either a table or a function as ``seq``. This is demonstrated in the following examples:

    require "seqlua"

    t = {"a", "b", "c"}

    for i, v in ipairs(t) do
      print(i, v)
    end
    -- prints:
    --  1   a
    --  2   b
    --  3   c

    print(string.concat(",", t))
    -- prints: a,b,c

    function alphabet()
      local letter = nil
      return function()
        if letter == nil then
          letter = "a"
        elseif letter == "z" then
          return nil
        else
          letter = string.char(string.byte(letter) + 1)
        end
        return letter
      end
    end

    for i, v in ipairs(alphabet()) do
      print(i, v)
    end
    -- prints:
    --  1   a
    --  2   b
    --  3   c
    --  ...
    --  25  y
    --  26  z

    print(string.concat(",", alphabet()))
    -- prints: a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z

    function filter(f)
      return function(seq)
        return coroutine.wrap(function()
          for i, v in ipairs(seq) do f(v) end
        end)
      end
    end

    alpha_beta_x = filter(function(v)
      if v == "a" then
        coroutine.yield("alpha")
      elseif v == "b" then
        coroutine.yield("beta")
      elseif type(v) == "number" then
        for i = 1, v do
          coroutine.yield("X")
        end
      end
    end)

    print((","):concat(alpha_beta_x{"a", 3, "b", "c", "d"}))
    -- prints: alpha,X,X,X,beta

    print((","):concat(alpha_beta_x(alphabet())))
    -- prints: alpha,beta


C part of the library
---------------------

In ``seqlualib.h``, the following macros are defined:

    #define seqlua_iterloop(L, iter, idx) \
      for ( \
        seqlua_iterinit((L), (iter), (idx)); \
        seqlua_iternext(iter); \
      )

and

    #define seqlua_iterloopauto(L, iter, idx) \
      for ( \
        seqlua_iterinit((L), (iter), (idx)); \
        seqlua_iternext(iter); \
        lua_pop((L), 1) \
      )

These macros allow iteration over either tables or iterator functions, as the following example function demonstrates:

    #include <stdio.h>      // fputs
    #include <lua.h>
    #include <lauxlib.h>    // luaL_tolstring
    #include <seqlualib.h>

    int printcsv(lua_State *L) {
      seqlua_Iterator iter;
      seqlua_iterloop(L, &iter, 1) {
        if (seqlua_itercount(&iter) > 1) fputs(",", stdout);
        fputs(luaL_tolstring(L, -1, NULL), stdout);
        // two values need to be popped (the value pushed by
        // seqlua_iternext and the value pushed by luaL_tolstring)
        lua_pop(L, 2);
      }
      fputs("\n", stdout);
      return 0;
    }

Once registered as a Lua function, ``printcsv`` can be called like this:

    printcsv{"a", "b", "c"}
    -- prints: a,b,c

    printcsv(assert(io.open("testfile")):lines())
    -- prints: line1,line2,... of "testfile"

NOTE: During iteration using ``seqlua_iterloop``, ``seqlua_iterloopauto``, or ``seqlua_iterinit``, three extra elements are stored on the stack (in addition to the value). These extra elements are removed automatically when the loop ends (i.e. when ``seqlua_iternext`` returns zero). The value pushed onto the stack for every iteration step has to be removed manually from the stack, unless ``seqlua_iterloopauto`` is used.


Respected metamethods
---------------------

The Lua functions and the C functions and macros provided by this extension automatically respect an existing ``__index`` metamethod. An existing ``__ipairs`` metamethod, however, takes precedence.

If the ``__ipairs`` field of a value's metatable is set, then it must always refer to a function. When iteration starts over a value that has such a metamethod, this function is called with ``self`` (i.e. the value itself) passed as first argument. The return values of the ``__ipairs`` metamethod may take one of the following four forms:

* ``return function_or_callable, static_argument, startindex`` causes the three arguments to be returned by ``ipairs`` without further modification. When the C macros and functions are used for iteration, the behavior matches the generic loop statement in Lua: ``for i, v in function_or_callable, static_argument, startindex do ... end``
* ``return "raw", table`` will result in iteration over the table ``table`` using ``lua_rawgeti``
* ``return "index", table_or_userdata`` will result in iteration over the table or userdata while respecting any ``__index`` metamethod of the table or userdata value
* ``return "call", function_or_callable`` will use the callable value as (function) iterator, where the function is expected to return a single value without any index (the index is inserted automatically when using the ``ipairs`` function for iteration)

These possibilities are demonstrated by the following example code:

    require "seqlua"

    do
      local function ipairsaux(t, i)
        i = i + 1
        if i <= 3 then
          return i, t[i]
        end
      end
      custom = setmetatable(
        {"one", "two", "three", "four", "five"},
        {
          __ipairs = function(self)
            return ipairsaux, self, 0
          end
        }
      )
    end
    print(string.concat(",", custom))
    -- prints: one,two,three
    -- (note that "four" and "five" are not printed)

    tbl = {"alpha", "beta"}

    proxy1 = setmetatable({}, {__index = tbl})
    for i, v in ipairs(proxy1) do print(i, v) end
    -- prints:
    --  1   alpha
    --  2   beta

    proxy2 = setmetatable({}, {
      __ipairs = function(self)
        return "index", proxy1
      end
    })
    for i, v in ipairs(proxy2) do print(i, v) end
    -- prints:
    --  1   alpha
    --  2   beta
    print(proxy2[1])
    -- prints: nil

    cursor = setmetatable({
      "alice", "bob", "charlie", pos=1
    }, {
      __call = function(self)
        local value = self[self.pos]
        if value == nil then
          self.pos = 1
        else
          self.pos = self.pos + 1
        end
        return value
      end,
      __ipairs = function(self)
        return "call", self
      end
    })
    for i, v in ipairs(cursor) do print(i, v) end
    -- prints:
    --  1   alice
    --  2   bob
    --  3   charlie
    print(cursor())
    -- prints: alice
    for i, v in ipairs(cursor) do print(i, v) end
    -- prints:
    --  1   bob
    --  2   charlie
    -- (note that "alice" has been returned earlier)

    coefficients = setmetatable({1.25, 3.14, 17.5}, {
      __index = function(self) return 1 end,
      __ipairs = function(self) return "raw", self end
    })
    for i, v in ipairs(coefficients) do print(i, v) end
    -- prints:
    --  1   1.25
    --  2   3.14
    --  3   17.5
    -- (note that iteration terminates even if coefficients[4] == 1)
    print(coefficients[4])
    -- prints: 1
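
The automatic index numbering of the ``"call"`` form can be pictured in plain Lua roughly as follows. This is only a sketch of the observable behavior and not the extension's actual implementation; ``call_aux``, ``ipairs_call`` and ``values`` are hypothetical names introduced for this example, and the sketch does not require seqlua to be loaded:

```lua
-- Hypothetical helper: advances a plain iterator function f and numbers
-- the values it returns. Iteration ends when f returns nil.
local function call_aux(f, i)
  local v = f()
  if v ~= nil then
    return i + 1, v
  end
end

-- Returns the usual generic-for triple; no closure is created here.
function ipairs_call(f)
  return call_aux, f, 0
end

-- A simple iterator function over a fixed list (for the demo only).
local function values(...)
  local list, pos = {...}, 0
  return function()
    pos = pos + 1
    return list[pos]
  end
end

for i, v in ipairs_call(values("alice", "bob", "charlie")) do
  print(i, v)
end
-- prints:
-- 1 alice
-- 2 bob
-- 3 charlie
```

The counter lives in the loop's control variable rather than in an upvalue, which is one way to number values without allocating anything per iteration.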