lines: view files as list of lines

The lines library (and its submodules) enables line-based processing of text. The #lines module itself exports an API that works on either a list of strings (aka lines) or files treated as indexed lines (i.e. #lines.File).

lines is the fundamental building block of civstack's ele editor and pegl parsers: it is much easier as a developer to think about human-readable text as a list of lines than as a list of bytes, and this is especially true for an editor. #lines.File and #lines.EdFile use a separate index file to make reading and writing files as lines performant for more real-world use cases.

sub / insert / remove semantics

lines.sub(t, l,c, l2,c2) returns the text from span l.c -> l2.c2.

For instance, suppose you have the following two lines of text:

1234 6789
abcd fghi

You would get the following values:

1.1 1.2	{'12'}
1.6 1.10	{'6789', ''} - last char goes to next line
1.6 2.0	{'6789', ''} - next line zero char the same
1.6 2.2	{'6789', 'ab'}
1.10 2.2	{'', 'ab'}
1.10 3.0	{'', 'abcd fghi'} - EoF does not have new line
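
The convention in the table above can be reproduced with a small stand-alone function. This is only a sketch under assumptions: subSketch is a hypothetical name, it takes a plain Lua table of strings, and it is not the library's lines.sub implementation. It treats a column one past the end of a line as that line's newline, which is the same position as column 0 of the next line.

```lua
-- Sketch only: re-implements the span convention from the table above.
-- `subSketch` is a hypothetical helper, NOT the real lines.sub.
local function subSketch(t, l, c, l2, c2)
  -- One column past the end of a line addresses its newline, which is
  -- the same position as column 0 of the following line.
  if t[l2] and c2 > #t[l2] then l2, c2 = l2 + 1, 0 end
  if l == l2 then return { t[l]:sub(c, c2) } end
  local out = { t[l]:sub(c) }                 -- rest of the first line
  for i = l + 1, l2 - 1 do                    -- whole middle lines
    out[#out + 1] = t[i]
  end
  -- The last line contributes its first c2 chars; a line past EoF adds
  -- nothing because EoF has no newline.
  if t[l2] then out[#out + 1] = t[l2]:sub(1, c2) end
  return out
end

local text = { '1234 6789', 'abcd fghi' }
print(table.concat(subSketch(text, 1, 6, 2, 2), '|'))  --> 6789|ab
```

Normalizing "end of line" to "column 0 of the next line" first is what makes rows 1.6 1.10 and 1.6 2.0 produce the same result.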

The methods span, remove, offset, offsetOf and insert are all designed to use these conventions to enable reversibility. When you remove a span, it will modify the lines object in-place, returning the span you removed. If you re-insert that span in the same place it will return the table to its previous state. Along with being easy to understand, this architecture is fundamental to how undo/redo works in the Ele editor.

Mod lines

The lines module, providing a uniform API for lines-like objects.

You can also call this module directly to get a table of lines from a string.

Functions

fn join(t) -> string
Join a table of strings with newlines.
fn span(l, c, l2, c2) -> (l, c?, l2, c2?)
Enables addressing lines via either (l,l2) or (l,c, l2,c2) span.
fn bound(t, l,c, tlen, ln) -> l, c
Bound the line/col for the lines table.
- l will be from 1 to #t+1.
- c will be from 0 to #t[l]+1.
tlen is the precomputed #t and ln is the pre-fetched t[l].
This can handle negative integers.
fn boundSpan(t, l,c, l2,c2, tlen)
Bound a span from l,c -> l2,c2.
fn insert(t, ins, l,c) -> nil
insert string at l, c

Note: this is NOT performant (O(N)) for large tables.
See: #Gap (or similar) for handling real-world workloads.
fn sort(...) -> l1, c1, l2, c2
Sort the span
fn sub(l, ...) -> {str}, l,c
Get the sub-span of the lines.
fn usub(l, ...) -> {str}, l,c
Get the UTF8 aware sub-span of the lines.
fn map(lines) -> table
create a table of lineText -> {lineNums}
fn offset(t, off, l,c) -> l2,c2
Get the l, c with the +/- offset applied
fn offsetOf(t, l,c, l2,c2) -> int
get the byte offset
fn find(t, pat, l,c) -> (l, c, c2)
Find the pattern starting at l/c. Note: matches are only within a single line.
fn findBack(t, pat, l,c)
find the pattern (backwards) starting at l/c
fn remove(t, l,c, l2,c2) -> string|table
remove span (l, c) -> (l2, c2), return what was removed
fn box(t, l1, c1, l2, c2, fill) -> lines
return the box of the lines.
Outside the box is not returned.
***1------------------+**
***|l1,c1 = top left  |**
***| bot right = l2,c2|**
***+------------------2**
*So no '*' chars are returned.*
fn getIndent(t, l) -> str?
Get the indentation of line.
fn autoIndent(t, l) -> string?
Get the autoIndent to use for line.
fn load(f, close) -> (table?, errstr?)
load lines from file or path. On error return (nil, errstr)
fn dump(t, f, close, chunk)
Write lines t to file f in chunks (default = 16KiB). If f is a string then it is opened as a file and closed when done.
fn write(t, ...) -> true
Logic to make a table behave like a file:write(...) method.
This is NOT performant, especially for large lines.

Command lines.diff

Diffing module and command
Cmd Usage: ldiff 'file/path1.txt' 'file/path2.txt'
Lib Usage: io.fmt(ldiff.Diff(linesA, linesB))

This library/cmd creates readable diffs using the "patience diff" algorithm. The code was written from scratch referencing only the algorithm outline below, but I want to give special thanks to James Coglan for his excellent blog post.

Fundamentals of patience diff:

Skip unchanged lines on both top and bottom.
Find unique lines in both sets and "align" them using "longest increasing subsequence".
Repeat for each aligned section.
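
The alignment step can be sketched as follows. This is an illustrative sketch, not the library's code: uniqueIndex and alignUnique are hypothetical names, and the inputs are assumed to be plain Lua lists of strings. It collects lines that occur exactly once in both lists and then keeps the longest run whose indices increase in both, using patience sorting.

```lua
-- Sketch only: hypothetical helpers, not part of lines.diff.

-- Map lineText -> index for lines that occur exactly once in t.
local function uniqueIndex(t)
  local count, index = {}, {}
  for i, line in ipairs(t) do
    count[line] = (count[line] or 0) + 1
    index[line] = i
  end
  for line, n in pairs(count) do
    if n > 1 then index[line] = nil end
  end
  return index
end

-- Return a list of {bi, ci} pairs for lines unique in both b and c,
-- keeping the longest subsequence where bi and ci both increase.
local function alignUnique(b, c)
  local ub, uc = uniqueIndex(b), uniqueIndex(c)
  local seq = {}                       -- pairs ordered by ci
  for ci, line in ipairs(c) do
    if uc[line] and ub[line] then seq[#seq + 1] = { ub[line], ci } end
  end
  -- Patience sorting: piles[k] ends the best increasing run of length k.
  local piles, prev = {}, {}
  for _, p in ipairs(seq) do
    local lo, hi = 1, #piles + 1
    while lo < hi do                   -- first pile with top bi >= p's bi
      local mid = math.floor((lo + hi) / 2)
      if piles[mid][1] < p[1] then lo = mid + 1 else hi = mid end
    end
    prev[p] = piles[lo - 1]            -- nil when p starts a new run
    piles[lo] = p
  end
  local out, p = {}, piles[#piles]
  while p do                           -- walk the back-pointers
    table.insert(out, 1, p)
    p = prev[p]
  end
  return out
end

local pairsFound = alignUnique({ 'a', 'b', 'c', 'd' }, { 'c', 'a', 'b', 'd' })
for _, p in ipairs(pairsFound) do print(p[1], p[2]) end
```

In this example 'c' moved to the front, so it is left out of the alignment and the remaining three lines anchor the diff.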

Types: Diff

Record Diff

Data structure which holds the result of computing the difference between two lists of lines.

Fields b and c are just the original base/change lines.

noc, rem and add are lists of integers which represent the length of a block. For instance, if for a given index rem=3 and add=2 it means that three lines were removed from b and two were added to c. If noc=10 that means that there is a block of 10 identical lines.

Fields:

b base, aka raw original lines
c change, aka raw new lines
len len of diff blocks (aka len of below fields). It's not possible to use # for below, since some values are nil.
noc nochange range (in both)
rem removed from b
add added from c

Methods

fn:map(nocFn, chgFn)
Iterate through nochange and change blocks, calling the functions for each
- nocFn(baseStart, numUnchanged, changeStart, numUnchanged)
- chgFn(baseStart, numRemoved, changeStart, numAdded)
Note that the num removed/added will be nil if none were added/removed.
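
To make the block fields concrete, here is a sketch of walking noc, rem and add while tracking line positions. It rests on a loud assumption: each index from 1 to len holds either a noc count or a rem/add pair. The actual layout inside Diff may differ, and mapBlocks is a hypothetical stand-in operating on a plain table, not the real :map method.

```lua
-- Sketch only. ASSUMPTION: each index holds either noc[i] or rem[i]/add[i].
-- `mapBlocks` is a hypothetical stand-in for Diff:map.
local function mapBlocks(d, nocFn, chgFn)
  local bl, cl = 1, 1                  -- next line in base / change
  for i = 1, d.len do
    local noc, rem, add = d.noc[i], d.rem[i], d.add[i]
    if noc then
      nocFn(bl, noc, cl, noc)
      bl, cl = bl + noc, cl + noc
    else
      chgFn(bl, rem, cl, add)          -- rem or add may be nil
      bl, cl = bl + (rem or 0), cl + (add or 0)
    end
  end
end

-- Two unchanged lines, then 1 removed / 2 added, then one unchanged line.
local d = { len = 3, noc = { 2, nil, 1 }, rem = { nil, 1 }, add = { nil, 2 } }
local events = {}
mapBlocks(d,
  function(b, n, c) events[#events + 1] = ('noc b=%d c=%d n=%d'):format(b, c, n) end,
  function(b, r, c, a)
    events[#events + 1] = ('chg b=%d c=%d rem=%d add=%d'):format(b, c, r or 0, a or 0)
  end)
for _, e in ipairs(events) do print(e) end
```

Because some entries are nil, the loop runs to d.len rather than using #, which matches the reason the len field exists.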

Record lines.Writer

Deprecated: use ds.bytearray instead. This will be removed.

A lines table with a write method and a few other file-like methods. This is NOT performant, especially for small writes or large lines. It is useful for tests and cases where simplicity is more important than performance.

Methods

fn set(name)
Create a parser spec record. These have the fields kind and name and must define the parse method.
fn get(name)
Create a parser spec record. These have the fields kind and name and must define the parse method.
fn write(t, ...) -> true
Logic to make a table behave like a file:write(...) method.
This is NOT performant, especially for large lines.
fn flush()
Function that does nothing and returns nothing.
fn extend(r, l) -> r
This is used by types implementing :extend. It uses their get and set methods to implement extend in a for loop.
Types do this if they may yield in their get/set, which is not allowed through a C boundary like table.move.
fn icopy(r)
For types implementing :copy() method.

Record Gap

Line-based gap buffer. The buffer is composed of two lists (stacks) of lines:

The "bot" (aka bottom) contains line 1 -> curLine. curLine is at #bot. Data gets added to bot.
The "top" buffer is used to store data in lines after "bot" (aka after curLine). If the cursor is moved to a previous line then data is moved from bot to top.

Gap gives a file-like write API which may not be the most performant for some workloads (e.g. writing single characters).
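
The two-stack idea can be sketched in a few lines. This is a simplified illustration, not the Gap record itself: GapSketch, newGap, len and insert are hypothetical names, and storing top in reverse order (so that its last element is the line right after the gap) is an assumption of this sketch.

```lua
-- Sketch only: a minimal line gap buffer, NOT the real Gap record.
local GapSketch = {}
GapSketch.__index = GapSketch

local function newGap(lines)
  local bot = {}
  for i, line in ipairs(lines) do bot[i] = line end
  return setmetatable({ bot = bot, top = {} }, GapSketch)
end

function GapSketch:len() return #self.bot + #self.top end

-- Move the gap so that #bot == l. Lines pop off one stack onto the other.
function GapSketch:setGap(l)
  l = math.max(0, math.min(l, self:len()))   -- clamp to a valid position
  local bot, top = self.bot, self.top
  while #bot > l do top[#top + 1] = table.remove(bot) end  -- bot -> top
  while #bot < l do bot[#bot + 1] = table.remove(top) end  -- top -> bot
end

function GapSketch:get(l)
  local nb = #self.bot
  if l <= nb then return self.bot[l] end
  -- top is reversed: its last element is line nb + 1.
  return self.top[#self.top - (l - nb) + 1]
end

-- Insert `line` so that it becomes line l: O(1) once the gap is in place.
function GapSketch:insert(l, line)
  self:setGap(l - 1)
  self.bot[l] = line
end

local g = newGap({ 'a', 'b', 'd' })
g:insert(3, 'c')
print(g:get(1), g:get(2), g:get(3), g:get(4))
```

Repeated edits near the same line are cheap because only the lines between the old and new gap positions move.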

Fields:

top array of lines on the top (after the gap, near end).
bot array of lines on the bottom (line 1 -> curLine, near start).
path the path this was read from or nil.
readonly whether to throw errors on write.

Methods

fn:icopy() -> list
Make a copy of the gap to a Lua list.
fn:reader() -> Gap
fn load(T, f, close) -> Gap?, err?
Load gap from file, which can be a path. Returns nil, err on error.
fn:get(l) -> string
Get a specific line index.
fn:set(l, v)
Set a specific line index with the value.
fn:inset(i, values, rmlen) -> rm?
See ds.inset for documentation.
fn:extend(lns) -> self
Extend gap with the lines.
fn:setGap(l)
set the gap to the line number, making l == #g.bot.
fn:write(...)
fn dumpf(t, f, close, chunk)
Write lines t to file f in chunks (default = 16KiB). If f is a string then it is opened as a file and closed when done.

Record lines.U3File

A file of 3 byte (24 bit) integers. These are commonly used for indexing lines.

This object supports get/set index operations including appending. Every operation (except consecutive reads/writes) requires a file seek.
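
A fixed-width integer file makes indexing a matter of arithmetic: value i lives at byte offset (i-1)*3. The sketch below illustrates that with Lua 5.3+ string.pack. The helper names are hypothetical and the big-endian byte order is an assumption; the byte order U3File actually uses is not stated here.

```lua
-- Sketch only (requires Lua 5.3+). Hypothetical helpers, not U3File's API.
-- ASSUMPTION: big-endian; the real on-disk byte order may differ.
local FMT = '>I3'

local function u3encode(v) return string.pack(FMT, v) end
local function u3decode(s) return (string.unpack(FMT, s)) end
local function u3offset(i) return (i - 1) * 3 end  -- byte offset of index i

-- Read the value at 1-based index i, or nil if it is out of bounds.
local function u3get(f, i)
  f:seek('set', u3offset(i))
  local s = f:read(3)
  if not s or #s < 3 then return nil end
  return u3decode(s)
end

local f = assert(io.tmpfile())
for _, v in ipairs({ 0, 300, 0xFFFFFF }) do f:write(u3encode(v)) end
print(u3get(f, 2), u3get(f, 3), u3get(f, 4))
```

Three bytes cover values up to 16777215, which is why such a file can address line starts in files up to 16 MiB.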

Fields:

f
path
mode
len
sz the size of each value

Methods

fn create(T, ...) -> icreate(T, 3, ...)
fn:reload() -> IFile?, errmsg?
Reload IFile from path.
fn load(T, ...) -> iload(T, 3, ...)
fn:flush()
fn:close()
fn:closed() -> bool
fn:getbytes(i)
get bytes. If index out of bounds return nil. Panic if there are read errors.
fn:get(i)
get value at index
fn:setbytes(i, v)
fn:set(i, v)
set value at index
fn:move(to, mvFn) -> self
Move the IFile's path to to.
mvFn must be of type fn(from, to). If not provided, civix.mv will be used.
This can be done on both closed and opened files.
The IFile will re-open on the new file regardless of the previous state.
fn:reader() -> IFile?, err?
Get a new read-only instance with an independent file-descriptor.
Warning: currently the reader's len will be static, so this should be mostly used for temporary cases. This might be changed in the future.

Record File

Usage: File{'path/to/file.txt', mode='r'}
Indexed file of lines supporting modes 'r' and 'a+'.

Use EdFile instead if you need to do non-append edits.

Fields:

path path of this file.
mode 'r', 'a' or 'a+'
f open (normal) file object
idx line index of f
cache cache of lines
loadIdxFn default=lines.futils.loadIdx

Methods

fn:close()
fn:flush() -> ok, errmsg?
fn:write(...) -> ok, errmsg?
append to file
fn:get(i) -> line
Get line at index
fn:set(i, v)
Set line at index
fn:reader() -> lines.File?, err?
Get a new read-only instance with an independent file-descriptor.
This allows reading the file while another coroutine writes it (via lap).

Record EdFile

EdFile: an editable line-based file object, optimized for indexed and consecutive reads and writes.

Usage:

local ed = EdFile(path, mode)
ed:set(1, 'first line')
ed:set(2, 'second line')
ed:set(1, 'changed first line')
ed:close()

Fields:

lf indexed append-only file.
dats list of Slc | Gap objects.
lens rolling sum of dat lengths.
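
A rolling sum of chunk lengths lets a line number be mapped to its chunk with a binary search. The sketch below shows the idea on plain tables; rollingSum and locate are hypothetical names and this is not EdFile's actual lookup code.

```lua
-- Sketch only: hypothetical helpers, not EdFile internals.

-- lens[k] = total number of lines held by chunks 1..k.
local function rollingSum(lengths)
  local lens, sum = {}, 0
  for k, n in ipairs(lengths) do
    sum = sum + n
    lens[k] = sum
  end
  return lens
end

-- Return (k, j): line i is the j-th line of chunk k. nil if out of range.
local function locate(lens, i)
  if i < 1 or #lens == 0 or i > lens[#lens] then return nil end
  local lo, hi = 1, #lens
  while lo < hi do                     -- first k with lens[k] >= i
    local mid = math.floor((lo + hi) / 2)
    if lens[mid] < i then lo = mid + 1 else hi = mid end
  end
  return lo, i - (lens[lo - 1] or 0)
end

local lens = rollingSum({ 3, 2, 4 })   -- chunks of 3, 2 and 4 lines
print(locate(lens, 4))                 -- chunk 2, first line of that chunk
```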

Methods

fn:get(i) -> line
Get line at index
fn:write(...) -> self?, errmsg?
fn:set(i, v)
Set line at index.
fn:reader()
Return a read-only view of the EdFile which shares the associated data structures.
fn:flush()
Flush the .lf member (which can only be extended). To write all data to disk you must call :dumpf().
fn:close()
Note: to write all data to disk you must call :dumpf().
fn:dumpf(f)
Dump contents to file or path.
fn:extend(values)
Appends to lf for extend when possible.
fn:inset(i, values, rmlen) -> rm?
insert into EdFile's dats.

Mod lines.futils

Utilities for file loading of lines. Generally users shouldn't need to use this file.

Functions

fn forceLoadIdx(f, idxpath)
Can be used instead of loadIdx to force a reload of the index, ignoring modification times/etc.
This is useful in some situations where stat is not available.
fn loadIdx(f, idxpath, fmode, reindex) -> idxFile
Load or reindex the file at path to/from idxpath.
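
The idea behind a line index can be sketched on an in-memory string: record the byte offset where each line starts, then slice by offset instead of scanning. This is an illustration only; buildIndex and lineAt are hypothetical names, and the real loadIdx works with files and a separate index file rather than strings.

```lua
-- Sketch only: hypothetical helpers showing what a line index stores.

-- idx[l] is the 0-based byte offset at which line l starts.
local function buildIndex(text)
  local idx, pos = { 0 }, 1
  while true do
    local nl = text:find('\n', pos, true)   -- plain (non-pattern) search
    if not nl then break end
    idx[#idx + 1] = nl   -- the byte after '\n' has 0-based offset nl
    pos = nl + 1
  end
  return idx
end

-- Fetch line l without its trailing newline, or nil if out of range.
local function lineAt(text, idx, l)
  local s = idx[l]
  if not s then return nil end
  local e = idx[l + 1]                      -- start of the next line, if any
  return text:sub(s + 1, e and e - 1 or #text)
end

local text = 'ab\ncd\n\nxyz'
local idx = buildIndex(text)
print(#idx, lineAt(text, idx, 4))
```

With offsets stored as fixed-width integers, fetching line l from a file needs one seek into the index and one seek into the data.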

Mod lines.motion

Helper methods for moving a cursor around a lines-like 2D grid.

The notation l.c is used to refer to line, column where both are indexed by 1.

Functions

fn decDistance(s, e) -> int
Move s closer to e by 1.
If they are equal do nothing.
fn lcLe(l, c, l2, c2) -> bool
Return whether l.c is equal to or before l2.c2.
fn lcGe(l, c, l2, c2) -> bool
Return whether l.c is equal to or after l2.c2
fn topLeft(l, c, l2, c2) -> (l, c)
Return the top-left (aka the minimum) of two points.
fn lcWithin(l, c, l1, c1, l2, c2) -> bool
fn wordKind(ch) -> ws|sym|let
Given a character, return its word-kind: ws (whitespace), sym (symbol), let (letter).
fn pathKind(ch) -> ws|sym|path
Given a character, return its path-kind: ws (whitespace), sym (symbol), path (path).
fn forword(s, si, getKind) -> int
Get the start of the next word from si (start-index).
fn backword(s, ei, getKind) -> int
Get the start of the previous word from ei (end-index).
fn getRange(s, i, getKind) -> si,ei
get the range[si,ei] of whatever is at s[i].
fn findBack(s, pat, ei, plain) -> int
find backwards from ei (end index).
This searches for the pattern and returns the LAST one found. This is HORRIBLY non-performant, only use for small amounts of data (like a line).
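
The cost mentioned above comes from searching forward repeatedly and keeping the last hit. A sketch of that approach follows. findBackSketch is a hypothetical name, and two details are assumptions of this sketch rather than documented behavior: the match must lie entirely within s[1..ei], and both the start and end index are returned.

```lua
-- Sketch only: forward search that keeps the LAST match, NOT the real findBack.
-- ASSUMPTION: the whole match must fit inside s[1..ei].
local function findBackSketch(s, pat, ei, plain)
  local head = s:sub(1, ei or #s)
  local fs, fe, init = nil, nil, 1
  while init <= #head + 1 do
    local ms, me = head:find(pat, init, plain)
    if not ms then break end
    fs, fe = ms, me          -- remember this match, then keep looking
    init = ms + 1
  end
  return fs, fe
end

print(findBackSketch('foo bar foo baz', 'foo', 15, true))
```

Every call rescans from the start of the string, so the work grows with the number of matches times the string length, which is acceptable for a single line.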

table: raw table

Mod lines.kev

kev: "Key Equal Value" serialization format.

This is an extremely common format in many unix utilities, "good enough" for a large number of configuration use cases. The format is simple: a file containing lines of key=value. The input and output are a table of key,val strings (though tostring is called for to()). Lines which start with # or don't have = in them are ignored.

Nested data is absolutely not supported. Spaces are treated as literal both before and after =. If you want a key containing = or key/value containing newline then use a different format (or write your own).
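
The rules above are small enough to sketch in full. The functions below are hypothetical re-implementations (kevFrom and kevTo are not the module's from and to), written only to make the parsing rules concrete: split at the first =, keep spaces literally, skip # lines and lines without =.

```lua
-- Sketch only: hypothetical helpers, NOT lines.kev.from / lines.kev.to.

-- Parse a list of lines into a table of key -> value strings.
local function kevFrom(lines)
  local t = {}
  for _, line in ipairs(lines) do
    if line:sub(1, 1) ~= '#' then
      -- Split at the FIRST '='; spaces on either side are kept literally.
      local k, v = line:match('^([^=]*)=(.*)$')
      if k then t[k] = v end             -- lines without '=' are ignored
    end
  end
  return t
end

-- Serialize a table into key=value lines, calling tostring on both sides.
local function kevTo(t)
  local out = {}
  for k, v in pairs(t) do
    out[#out + 1] = tostring(k) .. '=' .. tostring(v)
  end
  table.sort(out)  -- sorted only so this sketch's output is deterministic
  return out
end

local cfg = kevFrom({ '# comment', 'name=civ', 'path = /tmp', 'noequals', 'a=b=c' })
print(cfg.name, cfg['path '], cfg.a)
```

Note that 'path = /tmp' yields the key 'path ' with a trailing space and the value ' /tmp' with a leading one, which is what treating spaces as literal means in practice.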

Functions

fn to(t)
Convert a table to a list of key=value lines.
fn from(lines, to)
Convert key=value lines to a table.
fn load(f)
fn dump(t)