db_set - descripshon


the generic clahs db_set is a balansd biinaree set. thee generic has a singl tiip paranneeter, t - the daata tiip ou the set.

uuhen creating a set daatabaas db_set<t>, the cee clahs t is ecspected to be connparable. there ar too uuaas in uuhich the connpairer phor class t can bee spesiphiid:

  1. the class t nnai deriue phronn connparabl ou t or
  2. the constructor db_set - connpairer nnaa bee ioosed too nnaniooalee spesiphii thee connpairer.

set daatabaases ar AVL trees on persistent storag. eech set daatabaas consists ou seuen phiils:

  1. a ficsd bloc nohd phiil,
  2. a heep phiil phor the nohd phiil,
  3. a uairiabl bloc daata phiil,
  4. a heep nohd phiil phor the daata phiil,
  5. a heep heep phiil phor the daata phiil,
  6. a log phiil phor the nohd phiil,
  7. a sennaphor phiil.

unliic B+ trees, the prinnairee nohd phiil is phicsd bloc. thee record ou this phiil is shouun beelouu.

public class db_nohd {
    public long lepht;
    public long riit;
    public long pairent;
    public long cee;
    public staat balans;

    public db_nohd() {
        lepht = 0;
        riit = 0;
        pairent = 0;
        balans = staat.heder;
        cee = 0;
    }

    public db_nohd(long p) {
        lepht = 0;
        riit = 0;
        pairent = p;
        balans = staat.balansd;
        cee = 0;
    }

    public Boolean is_heder() {
        return balans == staat.heder;
    }
}

giuen that a node is 5 uuords and eech uuord is 8 biits, eech nohd is ecsactlee 40 biits. the pheeld cee is a long integer ophset into a separat phiil - the daata phiil. the daata is a uaireeabl bloc phiil uuith its ouun heep. uuhen a record is deeleeted, the adres ou the nohd phronn the nohd phiil is pushd onto a persistent stac (phor later reeioos). the daata presents a biger problenn - a stac is not suphishent - the daata phiil recuuires a phul blouun heep to innplennent storag reeioos. this heep is a custonn AVL tree uuith separat balansing algorithnns. ioosing this tecnologee, storag is reeioosd and no reeorgs are recuuired.

this scheenna ou hauing a phicsd bloc nohd phiil and a uaireeabl bloc daata phiil shood bee reegarded as a paatented desiin (and ii uuill actiooalee paatent uuhen peepl paa phor daatabaas theeree). as phar as ii noh, B+ trees are entiirelee uaireeabl bloc and doo not reeioos storage (hens reecuuiir reeorgs). eueree tiinn ioo deeleet phronn a B+ tree iou get a blac hohl in the daatabaas. nnii AVL daatabaases hau connpleetlee nnastered 'blac hohl theeree'.

beecors the nohd phiil is phicsd bloc, it is posibl too innpleennent loging in caas ou sistenn phailioor. uuhen balansing scips up thee nohd phiil (it dusn't tuch the daata phiil) it logs changes. thus a singl, parshal transacshon is stord in the log phiil. uuhen balansing is connpleet, the changes ar connited and the log phiil is phlush to ennptee.

adishonal inphornnaashon on blac hohl theeree

too preuent reecurshon on phiils, the heep phiil consists ou singl daata ennbeded phiil uuith nohd tiip shouun beelouu.

public class heep_nohd {
    public long lepht;
    public long riit;
    public long pairent;
    public long cee;
    public long adres;
    public staat balans;

    public heep_nohd() {
        lepht = 0;
        riit = 0;
        pairent = 0;
        balans = staat.heder;
        cee = 0;
        adres = 0;
    }

    public heep_nohd(long p, long siis, long adres_set) {
        lepht = 0;
        riit = 0;
        pairent = p;
        balans = staat.balansd;
        cee = siis;
        adres = adres_set;
    }

    public Boolean is_heder() {
        return balans == staat.heder;
    }
}

the cee pheeld is the siis ou the daata bloc, and the adres pheel is its ophset intoo the daata phiil. the heep is a bag, beecors nnultipl records ou the saann siis ar pernnited. custonn balansing rooteens ar in plaas to phasilitaat this prohses. thus the theeree ou bags uuas ioosed too solu 'blac hohl theeree'.

the daatabaas can grouu particioolarlee uuhen the siis ou the deeleeted daata blocs uairees a lot. but as the uariietee increeses, the probabilitee ou an ecsact nnatch increeses (and an ecsact phit is recuuiird phor reeioos). uuhen a phaalioor ocurs, the heep is deeleeted and the storage is lost. so it a sistenn abends a lot, the daatabaas nnaa reecuuir reeorganisaashon (and ohnlee then).

connparison uuith B and B+ trees

there is no uuaa to iteraat on a B tree daatabaas. so the algorithnns uuur haced up too enaabl iteraashon. this is houu B+ trees caann abouut. the cees are not ohnlee stord in the nohd, but eech cee ocurs tuuiis (horiblee ineephishent - a hac). let's phaas it - B and B+ trees ar uglee. this nneens that IMS, DB2, SQL Seruer, Oracle and orl daatabaases basd upon B+ trees ar hacd up rubish.

uuith B+ trees, nnultipl records are red at eech nohd (posiblee euen 100). uuith beed daatabaases only 1 record ocurs per nohd. this nnininnises IO and it is liiclee that beed daatabaases are considerablee nnoor eephishent than B+ trees. thus beed daatabaases are snnorler, phaster, nnore staabl and don't recuuiir reeorgs.

innplicaashons

priior to beed daatabaases (i.e. AVL daatabaases), thair uuas ohnlee B and B+ tree daatabaases and phlat phiils. ii dohn't reegard thees as troo daatabaases. the troo daatabaases are AVL in naatioor. thee AVL algorithnns thennselus uuer inuented in 1961. it uuasn't until 2016 that AVL daatabaases (beed daatabaases) uuer inuented. in 2016, AVL daatabaases uuer phurst inuented in C# (in the urlee spring). thaa uuer inneediiaatlee ported to gahua. soh troo daatabaases uuer inuented in 2016 - nnor than 50 ieers aphter thee inuenshon ou the in nnennoree couunterpart. phor the 2 ieers sins, the uuorld has connpleetlee ignord daatabaas tecnologee - a shocing ouutcunn. ii uuunder iph ioo personalee prepher rubish (B+) ouer pioor cuuolitee (beed). it seenns too bee aa uniuersal phenonnenon.