2.3. table.hh


2.3.1

This file contains functions for manipulating tables in a reasonably simple binary format. The records in the table are indexed by a 4-byte key, which is treated as an integer for the purposes of sorting. The keys are arranged in a B+-tree.

Note that this code is not portable yet. It relies on sizeof(int) being 4 and the processor being little-endian.

The field types supported and their type codes are: 32-bit integer (i), string (s) and array of 32-bit integers (I). Support for arrays of strings (S) is planned. The strings are zero terminated, but since their size is also stored when they are not the last field they are restricted to being 64Kb in size.

An integer is stored as 4 bytes in little endian order. The value might be considered to be signed or unsigned.

A string is stored as a sequence of arbitrary characters terminated by the first `\0'. If the field is not the last one in the record then the string starts with a 2 bytes which are a little endian unsigned integer indicating the offset to the next field.

An array of integers is represented by a 4 byte integer for the size followed by that many integers stored in the same format as above.


2.3.2
#include "doc.hh"
#include <utility>
#include <cstdio>
#include <unistd.h>
#include <sys/types.h>
#include <gdbm.h>

namespace libdoc {

2.3.3

A `Key' is a value of a primary key. This is wrongly assumed to always be 32 bits in size.

typedef unsigned int Key;

2.3.4

Each field's type is represented by a character. The character used here is the same as the one used in the column specs given to `create' when the table is created.

struct TableField
{
   char type;
   union
   {
      unsigned int v_int;              // i
      string *v_string;                // s
      vector <unsigned int> *v_int_a;  // I
   };
};

2.3.5

A structure to represent a record, including its implicit integer primary key (id), although that will be ignored in some circumstances, such as when the table uses a string primary key.

struct TableRecord
{
   Key id;
   vector <TableField> fields;
};

2.3.6

A function to free the memory used by the fields in rec.

void record_delete (TableRecord &rec);

2.3.7

Normally an integer (32 bit) primary key is used, and is either specifed by the user (i.e., each key number is decided on the application level) or is chosen automatically by this library. If a string primary key is used then all of the actual records are stored in the associated GDBM file.

enum PrimKeyType
{
   PRIMKEY_INTEGER = 0,
   PRIMKEY_STRING = 1,
   PRIMKEY_AUTO = 2
};

2.3.8

An object of this class represents a currently open table.

class Table
{
 private:
   string filename;
   PrimKeyType primkey;
   string fields;             // Field types, as single character codes.
   bool readonly;
   int table_fd;              // File descriptor of main table file.
   GDBM_FILE table_gdbm;      // Only used if the primary key is a string.
   size_t length;
   unsigned int num_recs;     // Number of records.
   off_t header_offsets;      // Offset of `num_recs' entry in header.
   off_t root;                // Offset within file of root node.

2.3.9

For now only one secondary index is supported, and this is it.

   size_t index_field;
   string index_filename;
   GDBM_FILE index_file;
   static string make_index_filename (const string &filename);

2.3.10

Information and the function which uses it to generate unique ID numbers.

   Key next_id, key_skip, next_key_skip;
   Key autogen_id (Key id);

2.3.11

Calculate the size of a record in the file or in a string.

   size_t find_record_size (off_t pos);
   size_t calc_record_size (const TableRecord &rec);

2.3.12

Functions for writing characters cleanly.

   static void write_char (int fd, off_t &pos, unsigned char c);
   void write_char (off_t &pos, unsigned char c)
      { write_char (table_fd, pos, c); }

2.3.13

Functions for reading, writing and skipping over strings.

   void ignore_string (off_t &pos) const;
   void read_string (off_t &pos, string &s, bool read_sz = false) const;
   static void write_string (int fd, off_t &offset, const char *s, size_t sz);
   void write_string (off_t &offset, const string &s)
      { write_string (table_fd, offset, s.c_str(), s.size() + 1); }

2.3.14

Functions for reading and writing integers.

   void read_int (off_t &pos, unsigned int &i) const;
   static void write_int (string &s, unsigned int i);
   static void write_int (int fd, off_t &offset, unsigned int i);
   void write_int (off_t &pos, unsigned int i)
      { write_int (table_fd, pos, i); }

2.3.15

Functions for reading whole records.

   void read_record (off_t &pos, vector <TableField> &vals) const;
   void read_record (const datum dat, vector <TableField> &vals) const;

2.3.16

Skip a few bytes at the end of the file to bring us to offset.

   static void pad_file (int fd, off_t &offset);

2.3.17

Read the header of the table file and store the values needed from it.

   void read_header ();

2.3.18

Write a single record, either into the file or into a string, or append a record to the end of the file.

   void write_record (off_t &p, const TableRecord &rec,
                      off_t left_child = 0, off_t right_child = 0);
   void write_record (string &dat, const TableRecord &rec);
   void append_record (const TableRecord &rec,
                       off_t left_child = 0, off_t right_child = 0);

2.3.19

Open the GDBM files used for primary and secondary indices.

   static void open_stringprim_db (GDBM_FILE &db, const string &filename,
                                   bool readonly);
   static string open_secondary_db (GDBM_FILE &db, size_t &field_num,
                                    const string &base, const string &indices,
                                    const string &fields, bool readonly);

2.3.20

This type is used for iterating over all the records in a table (for listing them, for example). For tables stored in GDBM files (those with a string primary key) dat is used, otherwise stack.

 public:
   union Iterator
   {
      vector <pair <off_t, bool> > *stack;
      datum dat;
   };

2.3.21

Constants which are part of the definition of the file format.

   const static char *MAGIC;
   const static size_t MAGIC_LEN = 8;
   const static Key KEY_MAX = 0xFFFFFFFF;
   const static Key KEY_MIDDLE = 0x80000000;

2.3.22

Create a new table file and write it to filename.

   static void create (const string &filename, const string &fieldtypes = "s",
                       PrimKeyType pk = PRIMKEY_INTEGER,
                       const string &indices = "");

2.3.23

Open and close a table file.

   Table (const string &fname, bool ro);
   ~Table ();

2.3.24

Insert a record, returning `true' if it could be added or `false' if there was already a record with the same key.

   bool insert (TableRecord &rec);
   bool insert (const string &key, TableRecord &rec);

2.3.25

Update an existing record, setting its value to rec, or return `false' if the specified record does not exist.

   bool update (const TableRecord &rec);
   bool update (const string &key, const TableRecord &rec);

2.3.26

Return `true' if a record with integer primary key id is already present.

   bool exists (Key id) const;

2.3.27

Find the record with integer primary key id or string primary key key, returning `true' if it was found.

   bool find (Key id, vector <TableField> &vals) const;
   bool find (const string &key, vector <TableField> &vals) const;

2.3.28

Find a record with the secondary key seckey, returning `true' if it exists.

XXX: Should the seckeys not be const?

   bool find (TableField seckey, TableRecord &rec) const;
   bool find_id (TableField seckey, Key &id) const;

2.3.29

Build an iterator by traversing the binary tree starting from the record at offset p in the table file, and always descending by the leftmost child.

 protected:
   void leftmost_stack (off_t p, Iterator &iter) const;

2.3.30

These functions allow the records in a table to be iterated over.

 public:
   bool iter_begin (Iterator &iter) const;
   bool iter_next (Iterator &iter) const;
   void iter_deref (const Iterator &iter, TableRecord &rec) const;
   void iter_delete (Iterator &iter) const;

2.3.31

Make the table file agree with the variables stored in this object.

   void sync () const;

2.3.32

Print some information about all of the records in the table in tree-shaped form, for checking the integrity of the file.

 protected:
   void debugging_recurse (FILE *ous, unsigned int indentation,
                           off_t node) const;
 public:
   void debugging_tree (FILE *ous) const;

2.3.33

Trivial accessors.

   const string &get_filename () const
      { return filename; }
   const string &get_field_types () const
      { return fields; }
   const size_t get_length () const
      { return length; }
   const unsigned int get_num_recs () const
      { return num_recs; }
   const off_t get_root () const
      { return root; }
   const PrimKeyType get_primary_key_type () const
      { return primkey; }
   const string &get_secondary_filename () const
      { return index_filename; }
   const size_t get_secondary_field () const
      { return index_field; }

2.3.34
};

}