Array DBMS

Array databases aim to offer flexible, scalable storage and retrieval for multi-dimensional array data.

Managing arrays requires novel techniques, particularly because traditional database tuples and objects tend to fit well into a single database page – the unit of disk access on the server, typically 4 KB – while array objects can easily span several storage media.

To this end, arrays are partitioned during insertion into so-called tiles or chunks of convenient size, which then act as units of access during query evaluation.
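The partitioning idea can be illustrated with a minimal sketch. It assumes a 2-D array held as a Python list of rows and a fixed, regular tile shape; the function names `partition` and `tiles_for_subset` are hypothetical and do not correspond to the API of any particular Array DBMS.

```python
def partition(array, tile_shape):
    """Split a 2-D list of rows into tiles, keyed by each tile's origin (row, col)."""
    t_rows, t_cols = tile_shape
    n_rows, n_cols = len(array), len(array[0])
    tiles = {}
    for r in range(0, n_rows, t_rows):
        for c in range(0, n_cols, t_cols):
            # Slicing clips automatically, so border tiles may be smaller.
            tiles[(r, c)] = [row[c:c + t_cols] for row in array[r:r + t_rows]]
    return tiles


def tiles_for_subset(tile_shape, row_range, col_range):
    """Return origins of the tiles that intersect a half-open query box."""
    t_rows, t_cols = tile_shape
    r_lo, r_hi = row_range
    c_lo, c_hi = col_range
    # Round the lower bounds down to the nearest tile boundary.
    first_r = (r_lo // t_rows) * t_rows
    first_c = (c_lo // t_cols) * t_cols
    return [(r, c)
            for r in range(first_r, r_hi, t_rows)
            for c in range(first_c, c_hi, t_cols)]


# A 4 x 6 array with cell value = row * 6 + col, cut into 2 x 3 tiles.
array = [[r * 6 + c for c in range(6)] for r in range(4)]
tiles = partition(array, (2, 3))
```

In this sketch, a query on rows 1 to 2 and columns 0 to 1 only needs the tiles with origins `(0, 0)` and `(2, 0)`; the other two tiles are never read.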

As in SQL, expressions of arbitrary complexity can be built on top of a set of core array operations.
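As a rough illustration of such composition, the sketch below defines three toy operations on a 2-D list of rows: subsetting, cell-wise application of a function, and aggregation. The names `subset`, `induce`, and `condense` are assumptions chosen for this example, not the operators of any specific array query language.

```python
from operator import add


def subset(array, row_range, col_range):
    """Cut out a half-open box [lo, hi) along both axes."""
    (r_lo, r_hi), (c_lo, c_hi) = row_range, col_range
    return [row[c_lo:c_hi] for row in array[r_lo:r_hi]]


def induce(func, array):
    """Apply func to every cell, preserving the array's shape."""
    return [[func(v) for v in row] for row in array]


def condense(op, array, init):
    """Fold all cells into one scalar using a binary operation."""
    result = init
    for row in array:
        for v in row:
            result = op(result, v)
    return result


array = [[r * 6 + c for c in range(6)] for r in range(4)]

# Composite expression: count the cells greater than 10 inside a sub-box.
count = condense(add, induce(lambda v: v > 10, subset(array, (1, 3), (0, 2))), 0)
```

Each operation takes and returns plain values, so they nest freely, which is the property that lets a small core carry arbitrarily complex expressions.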

Because of the extensions made to the data and query model, Array DBMSs are sometimes subsumed under the NoSQL category, in the sense of "not only SQL".

The relational data model, which prevails today, does not directly support the array paradigm to the same extent as sets and tuples.

ISO SQL lists an array-valued attribute type, but this is only one-dimensional, with almost no operational support, and not usable for the application domains of Array DBMSs.

Another option is to resort to BLOBs ("binary large objects"), which are the equivalent of files: byte strings of (conceptually) unlimited length, but again without any query language functionality such as multi-dimensional subsetting.
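To make the limitation concrete, the following sketch shows what an application has to do itself when a 2-D array is stored as an opaque byte string. The row-major layout, the 16-bit unsigned cell type, and the helper names are assumptions for illustration only.

```python
import struct

# Assumed cell encoding: unsigned 16-bit integer, little-endian.
CELL = struct.Struct("<H")


def to_blob(array):
    """Serialize a 2-D list of rows into one row-major byte string."""
    return b"".join(CELL.pack(v) for row in array for v in row)


def read_window(blob, n_cols, row_range, col_range):
    """Extract a half-open 2-D window by computing byte offsets by hand."""
    (r_lo, r_hi), (c_lo, c_hi) = row_range, col_range
    window = []
    for r in range(r_lo, r_hi):
        # The database only sees bytes; shape and cell size live in the client.
        start = (r * n_cols + c_lo) * CELL.size
        end = (r * n_cols + c_hi) * CELL.size
        window.append([v for (v,) in CELL.iter_unpack(blob[start:end])])
    return window


array = [[r * 6 + c for c in range(6)] for r in range(4)]
blob = to_blob(array)
window = read_window(blob, 6, (1, 3), (2, 4))
```

The array's shape, cell type, and linearization order are all knowledge held outside the database, so the database cannot evaluate or optimize such a subsetting request on its own.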

[1] This system offers the precursor of a 2-D array query language, albeit still procedural and without suitable storage support.

A first declarative query language suitable for multiple dimensions and with an algebra-based semantics was published by Baumann, together with a scalable architecture.

A map algebra, suitable for 2-D and 3-D spatial raster data, was published by Mennis et al.[6] In terms of Array DBMS implementations, the rasdaman system has the longest implementation track record of n-D arrays with full query support.

Oracle GeoRaster offers chunked storage of 2-D raster maps, albeit without SQL integration.

When adding arrays to databases, all facets of database design need to be reconsidered – ranging from conceptual modeling (such as suitable operators) through storage management (such as management of arrays spanning multiple media) to query processing (such as efficient processing strategies).

Compression is also useful for transmitting results, since network bandwidth is often a limiting factor for the large data volumes under consideration.
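A minimal sketch of the idea, assuming the result is already available as raw bytes and using general-purpose DEFLATE compression from Python's standard `zlib` module. The helper names are hypothetical, and real systems may use other codecs, including lossy ones, depending on the data.

```python
import zlib


def compress_result(raw: bytes, level: int = 6) -> bytes:
    """Compress a serialized query result before sending it over the network."""
    return zlib.compress(raw, level)


def decompress_result(payload: bytes) -> bytes:
    """Restore the original bytes on the receiving side."""
    return zlib.decompress(payload)


# A highly regular 4 KB result (all zero bytes) as an easy-to-compress case.
raw = bytes(4096)
payload = compress_result(raw)
restored = decompress_result(payload)
```

How much is saved depends heavily on the data: smooth or sparse rasters compress well, while noisy measurements may shrink only slightly.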

A tile-based storage structure suggests a tile-by-tile processing strategy (called tile streaming in rasdaman).
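The general principle of tile-by-tile processing can be sketched as follows. This is an illustration under simplifying assumptions (a 2-D list of rows, regular tiles, a sum aggregate) and does not reproduce rasdaman's actual implementation; `iter_tiles` and `streamed_sum` are hypothetical names.

```python
def iter_tiles(array, tile_shape):
    """Yield one tile at a time instead of materializing all tiles."""
    t_rows, t_cols = tile_shape
    for r in range(0, len(array), t_rows):
        for c in range(0, len(array[0]), t_cols):
            yield [row[c:c + t_cols] for row in array[r:r + t_rows]]


def streamed_sum(tiles):
    """Aggregate over a stream of tiles, holding only one tile in memory."""
    total = 0
    for tile in tiles:
        total += sum(sum(row) for row in tile)
    return total


array = [[r * 6 + c for c in range(6)] for r in range(4)]
total = streamed_sum(iter_tiles(array, (2, 3)))
```

Because the aggregate is updated per tile, peak memory is bounded by the tile size rather than by the size of the whole array.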

In many – if not most – cases where some phenomenon is sampled or simulated, the result is a rasterized data set which can conveniently be stored, retrieved, and forwarded as an array.

A de facto standard in the Earth Science communities is OPeNDAP, a data transport architecture and protocol.

A declarative geo raster query language, Web Coverage Processing Service (WCPS), has been standardized by the Open Geospatial Consortium (OGC).

The new standard, adopted in Fall 2018, is named ISO 9075 SQL Part 15: MDA (Multi-Dimensional Arrays).

Euclidean neighborhood of elements in arrays
Transformation of a query to a more efficient, but equivalent version during array query optimization