K-D-B trees

K-D-B trees

A K-D-B-tree (k-dimensional B-tree) is a tree data structure used in computer science to subdivide a k-dimensional search area. The K-D-B-tree’s goal is to give the search efficiency of a balanced k-d tree while also giving the block-oriented storage of a B-tree to optimize external memory accesses.

A K-D-B-tree, like the k-d tree, arranges points in k-dimensional space, which is beneficial for tasks like range-searching and multidimensional database queries. K-D-B-trees split space into two subspaces by comparing items in the same domain. By using a point in only one of the domains, or axes in this case, space is divided in the same way that a k-d tree is divided: all other values are either less than or greater than the current value and fall to the left and right of the splitting plane, respectively.

In contrast to a k-d tree, each half-space does not have its own node. Instead, nodes in the K-D-B-tree are saved as pages, as in a B-tree, and the tree keeps a pointer to the root page.

Structure:

The K-D-B-tree contains two types of pages:

  • Region pages: A set of (region, child) pairs, including a description of the enclosing region and a link to the child page for that region.
  • Point pages: A group of (point, place) pairings. In the case of databases, the location may refer to the database record’s index, but for points in k-dimensional space, it can be interpreted as the point’s coordinates in that space.

When introducing an element into a K-D-B-tree, the size of a node exceeds its ideal size, and page overflows occur. Because the K-D-B-tree aims to maximize external memory accesses, such as those from a hard drive, a page is regarded to have overflowed or be overfilled if the node size exceeds the external memory page size.

The K-D-B-tree preserves the following features during insertion/deletion operations:

  • The graph is a tree with several branches. Region pages can never be empty and must always refer to child pages. Point pages represent the leaf nodes of the tree.
  • The route length to the tree’s leaves is the same for all queries, much like a B-tree.
  • The regions that comprise a region page are not connected.
  • If the root is a region page, the total search space is the union of its regions.
  • If the child of a (region, child) pair in a region page is likewise a region page, the region is the union of all the regions in the child.
  • In the previous example, if a child is a point page, all points must be contained in the region.

Operations on a K-D-B-tree

Queries

A query on a K-D-B-tree is a range search over all domains or axes in the tree. The query region is a collection of intervals. The query area in k-space can be seen as a bounding volume around some subspace in the full k-dimensional search space. A query can be classified into one of three types:

  • Because certain intervals cover a whole domain or axis, the query is a partial range query.
  • Some intervals are points, and others are whole domains. The query is a partial match query.
  • Because the intervals are all points, the bounding volume is likewise a point. This is a query for precise matches.

Algorithm

  1. If the tree’s root is null, terminate; otherwise, let the page be root.
  2. If the page is a point page, return every point in a (point, location) pair within the query region.
  1. If a page is a region page, set the page as the child and recurse from step 2 for all (region, child) pairings where region and query region meet.

Considering the above definitions, a potential “K-D-B tree” might be a data structure that combines the concepts of K-D trees and B-trees. It could be a multidimensional self-balancing search tree that organizes points in a K-dimensional space while maintaining the balancing properties of B-trees. Such a data structure could offer efficient multidimensional range searches and other operations on large datasets.