Name

rtree-bsrt — BSRT serialisation of R-trees

DESCRIPTION

The rtree_bsrt_write function writes a simple serialised version of an R-tree to a file in the BSRT (binary serialised R-tree) format. This file can used to rebuild the tree in other programs. These files can be created from CSV files using the rtree-build (1) utility.

BSRT files are functionally equivalent to the JSON files described in rtree-json (5) but are typically a tenth of the size.

Outline

The file begins with the "magic number" BSRt, then the version of the format as a little-endian 2-byte unsigned integer (a uint16_t). At present the version is two. The remainder of the file is version-dependant, but always ends with the footer "fin", since I like French cinema.

In version two, the body of the file consists of a header, six little-endian uint16_t: the dimension, page-size, the size (in bytes) of the rtree_coord_t type, the size of a bounding rectangle (or cuboid), the size of a branch and the size of a node (including its branches). Then the serialised root node.

A node has a header of two uint16_t: the level and the branch count. It then has as many branches as that count.

Branches are of two types depending on the value of the node level. If it is zero then it is a leaf branch, it contains the rectangle (twice as many rtree_coord_t as the dimension), and the "id" of that rectangle, a rtree_id_t. If the node level is positive then we have an internal branch. This also has a rectangle, but then a child node.

The library writes the current version of the format, but will read any version.

Endianness

All integers and floats in BSRT files are in little-endian format, reading and writing of these is handled by OS-specific functions; for glibc these are described in endian (3) and these are well tested; for OSs other than Linux and C standard libraries other than glibc, not tested at all.

The author would welcome collaboration with users of big-endian hardware and non-glibc C standard libraries (Windows, BSD variants, Solaris, ...) to handle endianness on these platforms.

Portability

A design feature of the library is that a node (including its set of branches) should fit in a page of memory, though the user may specify that multiple nodes fit in a branch. To ensure this we determine the number of branches in a node from the branch-size, the page-size and the nodes-per-page. The branch size depends on the floating point types used for rtree_coord_t (which describes the rectangles) and the integer type used for rtree_id_t; these are set when the library is compiled.

So that the "nodes in a page" constraint is not violated, the library will fail to load a file with different values of the page-size, the rtree_coord_t size and the rtree_id_t size. Typically errno (3) will be set to EINVAL in this case.

In practice, this means that BSRT file are not portable between 32- and 64-bit systems (these typically have page sizes of 1,024 and 4,096 bytes respectively), nor between instances of the library compiled with different rtree_id_t and rtree_coord_t types. They should be portable between machines of different endianness, notwithstanding the notes in the previous section.

AUTHOR

J.J. Green, .