K-D-B trees

A K-D-B-tree (k-dimensional B-tree) is a tree data structure used in computer science to subdivide a k-dimensional search area. The K-D-B-tree’s goal is to give the search efficiency of a balanced k-d tree while also giving the block-oriented storage of a B-tree to optimize external memory accesses.

A K-D-B-tree, like the k-d tree, arranges points in k-dimensional space, which is beneficial for tasks like range-searching and multidimensional database queries. K-D-B-trees split space into two subspaces by comparing items in the same domain. By using a point in only one of the domains, or axes in this case, space is divided in the same way that a k-d tree is divided: all other values are either less than or greater than the current value and fall to the left and right of the splitting plane, respectively.

In contrast to a k-d tree, each half-space does not have its own node. Instead, nodes in the K-D-B-tree are saved as pages, as in a B-tree, and the tree keeps a pointer to the root page.

Structure:

The K-D-B-tree contains two types of pages:

Region pages: A set of (region, child) pairs, including a description of the enclosing region and a link to the child page for that region.
Point pages: A group of (point, place) pairings. In the case of databases, the location may refer to the database record’s index, but for points in k-dimensional space, it can be interpreted as the point’s coordinates in that space.

When introducing an element into a K-D-B-tree, the size of a node exceeds its ideal size, and page overflows occur. Because the K-D-B-tree aims to maximize external memory accesses, such as those from a hard drive, a page is regarded to have overflowed or be overfilled if the node size exceeds the external memory page size.

The K-D-B-tree preserves the following features during insertion/deletion operations:

The graph is a tree with several branches. Region pages can never be empty and must always refer to child pages. Point pages represent the leaf nodes of the tree.
The route length to the tree’s leaves is the same for all queries, much like a B-tree.
The regions that comprise a region page are not connected.
If the root is a region page, the total search space is the union of its regions.
If the child of a (region, child) pair in a region page is likewise a region page, the region is the union of all the regions in the child.
In the previous example, if a child is a point page, all points must be contained in the region.

Operations on a K-D-B-tree

Queries

A query on a K-D-B-tree is a range search over all domains or axes in the tree. The query region is a collection of intervals. The query area in k-space can be seen as a bounding volume around some subspace in the full k-dimensional search space. A query can be classified into one of three types:

Because certain intervals cover a whole domain or axis, the query is a partial range query.
Some intervals are points, and others are whole domains. The query is a partial match query.
Because the intervals are all points, the bounding volume is likewise a point. This is a query for precise matches.

Algorithm

If the tree’s root is null, terminate; otherwise, let the page be root.
If the page is a point page, return every point in a (point, location) pair within the query region.

If a page is a region page, set the page as the child and recurse from step 2 for all (region, child) pairings where region and query region meet.

Considering the above definitions, a potential “K-D-B tree” might be a data structure that combines the concepts of K-D trees and B-trees. It could be a multidimensional self-balancing search tree that organizes points in a K-dimensional space while maintaining the balancing properties of B-trees. Such a data structure could offer efficient multidimensional range searches and other operations on large datasets.