lines: view files as list of lines

The lines library (and its submodules) enables line-based processing of text. The #lines module itself exports an API that works either on a list of strings (aka lines) or on files treated as indexed lines (e.g. #lines.File).

lines is the fundamental building block of civstack's ele editor and pegl parsers: as a developer, it is much easier to think about human-readable text as a list of lines than as a list of bytes, and this is especially true for an editor. #lines.File and #lines.EdFile use a separate index file to make reading and writing files as lines performant for more real-world use cases.

sub / insert / remove semantics

lines.sub(t, l,c, l2,c2) returns the text from span l.c -> l2.c2.

For instance, suppose you have the following text:

1234 6789
abcd fghi

You would get the following values:

1.1 1.2 {'12'}
1.6 1.10 {'6789', ''} - last char goes to next line
1.6 2.0 {'6789', ''} - next line zero char the same
1.6 2.2 {'6789', 'ab'}
1.10 2.2 {'', 'ab'}
1.10 3.0 {'', 'abcd fghi'} - EoF does not have new line
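
The table above can be reproduced with a small standalone sketch. The function subSketch below is hypothetical and is not the library's implementation. It only encodes the conventions as read from the examples: column len+1 is the newline, column 0 of a line is the newline of the previous line, and the last line has no newline.

```lua
-- Hypothetical sketch of the span convention; not the library's implementation.
local function subSketch(t, l, c, l2, c2)
  if c2 == 0 then            -- l2.0 means "the newline ending line l2-1"
    l2 = l2 - 1; c2 = #t[l2] + 1
  end
  local function tail(out)   -- a span that covers a newline ends with ''
    if c2 > #t[l2] and l2 < #t then out[#out + 1] = '' end
    return out
  end
  if l == l2 then return tail{t[l]:sub(c, c2)} end
  local out = {t[l]:sub(c)}                       -- rest of the first line
  for i = l + 1, l2 - 1 do out[#out + 1] = t[i] end -- whole middle lines
  out[#out + 1] = t[l2]:sub(1, c2)                -- start of the last line
  return tail(out)
end

local text = {'1234 6789', 'abcd fghi'}
print(table.concat(subSketch(text, 1, 6, 2, 2), '|'))
```

Each row of the table maps to one call, for example subSketch(text, 1, 10, 3, 0) for the last row.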

The methods span, remove, offset, offsetOf and insert are all designed to use these conventions to enable reversibility. When you remove a span, it modifies the lines object in-place and returns the span you removed. If you re-insert that span in the same place, the table returns to its previous state. Along with being easy to understand, this architecture is fundamental to how undo/redo works in the Ele editor.
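
The reversibility property can be illustrated with a simplified model that treats the lines as one newline-joined string. All names below (flatOffset, splitSketch, removeSketch, insertSketch) are hypothetical, and unlike the library's remove they return a new table instead of modifying in place. This is only a sketch of the idea.

```lua
-- Hypothetical helpers modelling lines as one newline-joined string.
local function flatOffset(t, l, c)  -- 1-based byte offset of position l.c
  local off = 0
  for i = 1, l - 1 do off = off + #t[i] + 1 end  -- +1 for each newline
  return off + c
end
local function splitSketch(s)       -- split a string back into lines
  local t = {}
  for line in (s .. '\n'):gmatch('(.-)\n') do t[#t + 1] = line end
  return t
end
local function removeSketch(t, l, c, l2, c2)
  local s = table.concat(t, '\n')
  local a, b = flatOffset(t, l, c), flatOffset(t, l2, c2)
  -- returns the remaining lines and the removed span
  return splitSketch(s:sub(1, a - 1) .. s:sub(b + 1)), s:sub(a, b)
end
local function insertSketch(t, l, c, span)
  local s = table.concat(t, '\n')
  local a = flatOffset(t, l, c)
  return splitSketch(s:sub(1, a - 1) .. span .. s:sub(a))
end

local before = {'1234 6789', 'abcd fghi'}
local after, span = removeSketch(before, 1, 6, 2, 2)
local restored = insertSketch(after, 1, 6, span)
print(table.concat(restored, '\n') == table.concat(before, '\n'))
```

An undo stack built on this idea only needs to record the position and the removed span for each edit.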

Mod lines

The lines module, providing a uniform API for lines-like objects.

You can also call this module directly to get a table of lines from a string.

Functions

Command lines.diff

Diffing module and command
Cmd Usage: ldiff 'file/path1.txt' 'file/path2.txt'
Lib Usage: io.fmt(ldiff.Diff(linesA, linesB))

This library/cmd creates readable diffs using the "patience diff" algorithm. The code was written from scratch referencing only the algorithm outline below, but I want to give special thanks to James Coglan for his excellent blog post.

Fundamentals of patience diff:

Types: Diff

Record Diff

Data structure that holds the result of computing the difference between two lists of lines.

Fields b and c are just the original base/change lines.

noc, rem and add are lists of integers which represent the length of a block. For instance, if for a given index rem=3 and add=2 it means that three lines were removed from b and two were added to c. If noc=10 that means that there is a block of 10 identical lines.
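
As an illustration of how such block lengths can be consumed, here is a sketch that walks a plain table shaped like the description above. The ordering within one index (unchanged block first, then removals, then additions) and the renderSketch function are assumptions made for this example, not something the library documents.

```lua
-- Assumed layout (hypothetical): parallel lists noc/rem/add; for each index
-- the unchanged block comes first, then the removals, then the additions.
local function renderSketch(d)
  local out, bi, ci = {}, 1, 1   -- bi/ci: next unread line in d.b / d.c
  for i = 1, #d.noc do
    for _ = 1, d.noc[i] do       -- identical lines advance both sides
      out[#out + 1] = '  ' .. d.b[bi]; bi = bi + 1; ci = ci + 1
    end
    for _ = 1, d.rem[i] do       -- removed lines come from the base
      out[#out + 1] = '- ' .. d.b[bi]; bi = bi + 1
    end
    for _ = 1, d.add[i] do       -- added lines come from the change
      out[#out + 1] = '+ ' .. d.c[ci]; ci = ci + 1
    end
  end
  return out
end

local d = {
  b = {'a', 'b', 'c'}, c = {'a', 'x', 'y', 'c'},
  noc = {1, 1}, rem = {1, 0}, add = {2, 0},
}
print(table.concat(renderSketch(d), '\n'))
```

Storing only block lengths keeps the record small, since the line contents are read from b and c on demand.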

Fields:

Methods

Record lines.Writer

Deprecated: use ds.bytearray instead. This will be removed.

A lines table with a write method and a few other file-like methods. This is NOT performant, especially for small writes or large lines. It is useful for tests and cases where simplicity is more important than performance.

Methods

Record Gap

Line-based gap buffer. The buffer is composed of two lists (stacks) of lines

Gap provides a file-like write API, which may not be the most performant for some workloads (e.g. writing single characters).
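
A minimal sketch of the two-stack idea follows. The field names bot and top and the functions newGap, moveGap, insertLine and getLine are hypothetical and chosen for this example; the actual Gap record may be organized differently.

```lua
-- Hypothetical line gap buffer: `bot` holds the lines before the gap,
-- `top` holds the lines after the gap in reverse order (a stack).
local function newGap(lines)
  local g = {bot = {}, top = {}}
  for i, l in ipairs(lines) do g.bot[i] = l end
  return g
end
local function moveGap(g, l)        -- place the gap right after line l
  while #g.bot > l do g.top[#g.top + 1] = table.remove(g.bot) end
  while #g.bot < l and #g.top > 0 do
    g.bot[#g.bot + 1] = table.remove(g.top)
  end
end
local function insertLine(g, l, s)  -- insert s so that it becomes line l
  moveGap(g, l - 1)
  g.bot[#g.bot + 1] = s
end
local function getLine(g, l)
  if l <= #g.bot then return g.bot[l] end
  return g.top[#g.top - (l - #g.bot) + 1]  -- nil when l is past the end
end

local g = newGap{'a', 'b', 'c', 'd'}
insertLine(g, 2, 'x')
print(getLine(g, 2), getLine(g, 3))
```

Consecutive edits near the same line are cheap because the gap is already in place; only moving the gap costs time proportional to the distance moved.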

Fields:

Methods

Record lines.U3File

A file of 3 byte (24 bit) integers. These are commonly used for indexing lines.

This object supports get/set index operations including appending. Every operation (except consecutive reads/writes) requires a file seek.
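
To make the fixed-width layout concrete, here is a sketch of packing a 24-bit integer into three bytes and computing where index i lives in the file. The helper names and the big-endian byte order are assumptions for this example; the source does not state the on-disk byte order.

```lua
-- Hypothetical helpers; big-endian byte order is an assumption here.
local function encodeU3(n)
  assert(n >= 0 and n < 2^24, 'value does not fit in 24 bits')
  return string.char(math.floor(n / 65536) % 256,
                     math.floor(n / 256) % 256,
                     n % 256)
end
local function decodeU3(s)
  local a, b, c = s:byte(1, 3)
  return a * 65536 + b * 256 + c
end
-- Byte offset of the 1-based index i: every entry is exactly 3 bytes.
local function seekPos(i) return (i - 1) * 3 end

print(decodeU3(encodeU3(70000)), seekPos(4))
```

Because every entry has the same width, a random get or set is one seek to seekPos(i) followed by a 3-byte read or write, which is why only consecutive accesses avoid the seek.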

Fields:

Methods

Record File

Usage: File{'path/to/file.txt', mode='r'}
Indexed file of lines supporting modes 'r' and 'a+'.

Use EdFile instead if you need to do non-append edits.

Fields:

Methods

Record EdFile

EdFile: an editable line-based file object, optimized for indexed and consecutive reads and writes.

Usage:

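local ed = EdFile(path, mode)    -- open the file at `path` as editable lines
ed:set(1, 'first line')          -- write line 1
ed:set(2, 'second line')         -- write line 2
ed:set(1, 'changed first line')  -- replace line 1
ed:close()                       -- close the file when done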

Fields:

Methods

Mod lines.futils

Utilities for file loading of lines. Generally users shouldn't need to use this file.

Functions

Mod lines.motion

Helper methods for moving a cursor around a lines-like 2D grid.

The notation l.c is used to refer to line, column where both are indexed by 1.

Functions

table: raw table

Mod lines.kev

kev: "Key Equal Value" serialization format.

This is an extremely common format in many unix utilities, "good enough" for a large number of configuration use cases. The format is simple: a file containing lines of key=value. The input and output are a table of key,val strings (though tostring is called for to()). Lines which start with # or don't have = in them are ignored.

Nested data is absolutely not supported. Spaces are treated as literal both before and after =. If you want a key containing = or key/value containing newline then use a different format (or write your own).
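
The rules above are small enough to sketch in a few lines. The functions kevFrom and kevTo below are hypothetical stand-ins written for this example and are not the module's own functions. Sorting the output is an extra choice made here so the result is stable.

```lua
-- Hypothetical kev reader/writer following the rules described above.
local function kevFrom(lines)
  local t = {}
  for _, line in ipairs(lines) do
    if line:sub(1, 1) ~= '#' then              -- skip comment lines
      local k, v = line:match('^(.-)=(.*)$')   -- split at the first '='
      if k then t[k] = v end                   -- lines without '=' are ignored
    end
  end
  return t
end
local function kevTo(t)
  local out = {}
  for k, v in pairs(t) do
    out[#out + 1] = tostring(k) .. '=' .. tostring(v)
  end
  table.sort(out)  -- sorted only to make the output deterministic
  return out
end

local cfg = kevFrom{'# comment', 'name=ele', 'path = /tmp', 'no equals', 'x=1=2'}
print(cfg.name, cfg['path '], cfg.x)
```

Note that 'path = /tmp' yields the key 'path ' (with a trailing space) and the value ' /tmp' (with a leading space), since spaces around = are literal.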

Functions