Online aggregation

Online aggregation is a technique for improving the interactive behavior of database systems processing expensive analytical queries.

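Rather than waiting for the query to finish, the user receives running estimates of the result that are refined as processing continues. For example, if the final answer is 1000, then after k seconds the user gets an estimate in the form of a confidence interval such as [990, 1020] with 95% probability.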
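
The idea of a running estimate with a confidence interval can be illustrated with a minimal Python sketch. It is not taken from any of the systems discussed in this article: the function name `online_sum_estimates`, its parameters, and the use of a normal-approximation interval with a finite population correction are all assumptions made for illustration. The sketch scans the values in random order and periodically reports an estimate of their SUM together with an approximate 95% interval.

```python
import math
import random


def online_sum_estimates(values, report_every, z=1.96, seed=0):
    """Yield (rows_seen, estimate, low, high) for SUM(values).

    Hypothetical helper for illustration: rows are visited in random
    order, and an interval is reported every `report_every` rows.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random scan order stands in for random sampling
    n_total = len(order)

    count, mean, m2 = 0, 0.0, 0.0  # Welford running mean and squared deviations
    for x in order:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

        if count % report_every == 0 or count == n_total:
            variance = m2 / (count - 1) if count > 1 else 0.0
            # Finite population correction: the interval shrinks to zero
            # width once every row has been seen.
            fpc = (n_total - count) / n_total
            half_width = z * n_total * math.sqrt(variance / count * fpc)
            estimate = n_total * mean
            yield count, estimate, estimate - half_width, estimate + half_width
```

Iterating over the generator gives one report per batch of rows; the final report, produced after all rows are seen, has zero interval width and equals the exact sum up to floating-point error.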

[2] In 2007, Jermaine et al. designed and implemented a prototype database system called Database-Online (or DBO) that computes group-by aggregate queries over multiple tables in an online and, more importantly, scalable fashion.

[3] All approaches to online aggregation rely on random sampling, which is non-trivial in a distributed environment because of the inspection paradox of renewal-reward theory.
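
The inspection paradox itself can be demonstrated with a small simulation. The sketch below is only an illustration of the paradox, not the model or estimator of any system mentioned here; the function names and the choice of exponentially distributed task durations are assumptions. Tasks run back to back, and the simulation records the duration of whichever task is in progress at a fixed inspection time. On average that duration is noticeably longer than a typical task (close to twice the mean for exponential durations), which is why a snapshot of a running system does not see a uniform random sample of the work.

```python
import random


def covering_interval_length(inspect_at, mean_duration, rng):
    """Return the duration of the task that is in progress at time `inspect_at`.

    Assumption for illustration: task durations are exponential with
    the given mean and tasks run one after another without gaps.
    """
    clock = 0.0
    while True:
        duration = rng.expovariate(1.0 / mean_duration)
        if clock + duration > inspect_at:
            return duration  # this task spans the inspection time
        clock += duration


def average_covering_length(trials=2000, inspect_at=50.0, mean_duration=1.0, seed=0):
    """Average, over independent runs, the duration of the inspected task."""
    rng = random.Random(seed)
    total = sum(
        covering_interval_length(inspect_at, mean_duration, rng)
        for _ in range(trials)
    )
    return total / trials
```

Comparing `average_covering_length()` with the `mean_duration` argument shows the bias: longer tasks are more likely to be the one caught by the inspection.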

In 2011, Pansare et al. proposed a Bayesian model to deal with the inspection paradox and implemented online aggregation for a MapReduce-like environment.