Drift plus penalty

In the mathematical theory of probability, the drift-plus-penalty method is used for optimization of queueing networks and other stochastic systems.

Let Q_1(t), …, Q_K(t) denote the backlogs of K queues and define the Lyapunov function L(t) = (1/2) ∑_{i=1}^K Q_i(t)^2. The Lyapunov drift is defined:

Δ(t) = L(t+1) − L(t)

Every slot t, the current queue state is observed and control actions are taken to greedily minimize a bound on the following drift-plus-penalty expression:

Δ(t) + V·p(t)

where p(t) is the penalty function and V is a non-negative weight.
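As a minimal sketch of this greedy rule (the action set, penalty function, and per-action queue changes below are hypothetical placeholders, not part of the method's definition):

```python
def choose_action(actions, penalty, queue_change, Q, V):
    # Greedy drift-plus-penalty rule: among the allowed actions, pick the one
    # minimizing V*p(a) + sum_i Q_i(t)*y_i(a), a bound on drift plus V*penalty.
    def score(a):
        y = queue_change(a)  # per-queue change y_i(a) caused by action a
        return V * penalty(a) + sum(q * yi for q, yi in zip(Q, y))
    return min(actions, key=score)
```

With a large backlog Q_i, the queue term dominates and actions that drain queue i are preferred; with a large V, the penalty term dominates, which is the tradeoff discussed below.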

Like backpressure routing, this method typically does not require knowledge of the probability distributions for job arrivals and network mobility.

This drift-plus-penalty technique was later used to minimize average power[1] and optimize other penalty and reward metrics.

Suppose that minimization of the time average of p(t) must be done subject to time-average constraints on a collection of K other functions y_1(t), …, y_K(t):

Minimize:  lim_{t→∞} (1/t) ∑_{τ=0}^{t−1} E[p(τ)]
Subject to:  lim_{t→∞} (1/t) ∑_{τ=0}^{t−1} E[y_i(τ)] ≤ 0 for all i ∈ {1, …, K}

Every slot t, the network controller observes a new random event.

The set of available control actions could be a finite list of abstract elements, an uncountably infinite (and possibly non-convex) collection of real-valued vectors, and so on.

The above problem poses each constraint in the standard form of a time average expectation of an abstract process y_i(t) being non-positive.

For each constraint i, define a virtual queue with update equation Q_i(t+1) = max[Q_i(t) + y_i(t), 0]. This update equation is identical to that of a virtual discrete-time queue with backlog Q_i(t) and with y_i(t) being the difference between new arrivals and new service opportunities on slot t. Intuitively, stabilizing these virtual queues ensures the time averages of the constraint functions are less than or equal to zero, so the desired constraints are satisfied.
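A minimal sketch of this virtual queue, assuming a hypothetical constraint that the time average of some process a(t) stay at most c = 2.0, encoded as y(t) = a(t) − 2.0:

```python
def update_virtual_queue(Q, y):
    # Virtual queue update: Q(t+1) = max(Q(t) + y(t), 0).
    # y(t) plays the role of arrivals minus service on slot t.
    return max(Q + y, 0.0)

# Hypothetical constraint: keep the time average of a(t) at most c = 2.0,
# i.e. y(t) = a(t) - 2.0 must have a non-positive time average.
Q = 0.0
for a in [3.0, 1.0, 0.5, 2.5, 1.0]:
    Q = update_virtual_queue(Q, a - 2.0)
```

If Q(t)/t → 0, then the time average of y(t) is at most zero, which is exactly the desired constraint.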

The weight V can be tuned to place more or less emphasis on penalty minimization, which results in a performance tradeoff.

A key feature of this algorithm is that it does not require knowledge of the probability distribution of the random event process.[5]

The drift-plus-penalty expression need only be minimized to within some additive constant C ≥ 0 on each slot; the case C = 0 corresponds to exact minimization of the desired expression on every slot t. This section shows the algorithm results in a time average penalty that is within O(1/V) of optimality, with a corresponding O(V) tradeoff in average queue size.

Thus, the time average expected penalty can be made arbitrarily close to the optimal value by choosing V large. However, the parameter V also affects the size of the queues, which determines the speed at which the time average constraint functions converge to a non-positive number.

Related probability-1 performance bounds for infinite-horizon time average queue size and penalty can be derived using the drift-plus-penalty method together with martingale theory.[15]

The above analysis considers constrained optimization of time averages in a stochastic system that does not have any explicit queues.

A related problem is the minimization of a convex function of time averages subject to constraints, such as:

Minimize f(x̄) subject to g_i(x̄) ≤ 0 for all i ∈ {1, …, K}, and x(t) ∈ A for all t  (Eq. 7)

where x̄ is the vector of time averages of the decisions x(t), and f, g_1, …, g_K are convex functions.

However, the Lagrange multiplier analysis was later strengthened by Huang and Neely to recover the original O(1/V), O(V) tradeoffs, while showing that queue sizes are tightly clustered around the Lagrange multiplier of a corresponding deterministic optimization problem.[20][21]

When implemented for non-stochastic functions, the drift-plus-penalty method is similar to the dual subgradient method of convex optimization theory, with the exception that its output is a time average of primal variables, rather than the primal variables themselves.[4][6]

A related primal-dual technique for maximizing utility in a stochastic queueing network was developed by Stolyar using a fluid model analysis.[16][17]

The Stolyar analysis does not provide analytical results for a performance tradeoff between utility and queue size.

A later analysis of the primal-dual method for stochastic networks proves a similar O(1/V), O(V) utility and queue size tradeoff, and also shows local optimality results for minimizing non-convex functions of time averages, under an additional convergence assumption.

Related primal-dual algorithms for utility maximization without queues were developed by Agrawal and Subramanian[22] and by Kushner and Whiting.[23]

The drift-plus-penalty algorithm is known to ensure similar performance guarantees for more general ergodic processes.[5]

The drift-plus-penalty method can be extended to treat systems that operate over variable size frames.

Define penalty and constraint functions as:

y_0(t) = f(x(t)), and y_i(t) = g_i(x(t)) for i ∈ {1, …, K}.

Define the following time averages:

x̄(t) = (1/t) ∑_{τ=0}^{t−1} x(τ),  ȳ_i(t) = (1/t) ∑_{τ=0}^{t−1} y_i(τ) for i ∈ {0, 1, …, K}.

Now consider the following time average optimization problem:

Minimize lim_{t→∞} ȳ_0(t) subject to lim_{t→∞} ȳ_i(t) ≤ 0 for all i ∈ {1, …, K}, and x(t) ∈ A for all t.  (Eq. 9)

By Jensen's inequality the following holds for all slots t > 0:

f(x̄(t)) ≤ ȳ_0(t), and g_i(x̄(t)) ≤ ȳ_i(t) for all i ∈ {1, …, K}.

From this, it can be shown that an optimal solution to the time-average problem (Eq. 9) can be achieved by solutions of the type x(t) = x* for all slots t, where x* is a vector that solves the convex program (Eq. 7). Hence, the original convex program (Eq. 7) can be solved (to within any desired accuracy) by taking the time average of the decisions made when the drift-plus-penalty algorithm is applied to the corresponding time-averaged problem (Eq. 9).[26]

However, a key difference is that the dual subgradient algorithm is typically analyzed under restrictive strict convexity assumptions that are needed for the primal variables x(t) to converge.
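As an illustrative sketch of solving a convex program by time-averaging drift-plus-penalty decisions (a toy problem of our own, not taken from the references): minimize f(x) = x² subject to g(x) = 0.5 − x ≤ 0 with x ∈ [0, 1], whose optimum is x* = 0.5.

```python
def solve_convex_program(V=20.0, T=5000):
    # Drift-plus-penalty applied to: minimize x^2  s.t.  0.5 - x <= 0, x in [0,1].
    # (Toy instance for illustration; the optimum is x* = 0.5.)
    Q = 0.0          # virtual queue for the constraint g(x) = 0.5 - x
    total = 0.0
    for _ in range(T):
        # Minimize V*x^2 + Q*(0.5 - x) over x in [0, 1]:
        # setting the derivative 2*V*x - Q to zero gives x = Q/(2V), then clip.
        x = min(max(Q / (2.0 * V), 0.0), 1.0)
        total += x
        Q = max(Q + (0.5 - x), 0.0)   # virtual queue update
    return total / T                  # time average of the primal decisions

x_bar = solve_convex_program()
```

The guarantee is on the time average x̄(t), which approaches x* = 0.5; the transient deficit is governed by the final queue size, reflecting the O(V) tradeoff.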

As a special case, suppose f and the g_i are linear, so that the problem is a linear program with f(x) = ∑_{n=1}^N c_n x_n and g_i(x) = ∑_{n=1}^N a_{in} x_n − b_i, and suppose A is the hyper-rectangle of vectors with x_n ∈ [x_min,n, x_max,n] for each n. Then the above algorithm reduces to the following: Every slot t and for each variable n in {1, …, N}, choose x_n(t) in [x_min,n, x_max,n] to minimize the expression:

[V c_n + ∑_{i=1}^K Q_i(t) a_{in}] x_n(t)

Then update the queues Q_i(t) as before.

This amounts to choosing each variable x_n(t) according to the simple bang-bang control policy:

x_n(t) = x_min,n if V c_n + ∑_{i=1}^K Q_i(t) a_{in} ≥ 0, and x_n(t) = x_max,n otherwise.

Since the primal variables x_n(t) are always either x_min,n or x_max,n, they can never converge to the optimal solution if the optimal solution is not a vertex point of the hyper-rectangle A.
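A minimal simulation of the bang-bang rule on a toy linear program of our own (minimize x_1 + x_2 subject to 1 − x_1 − x_2 ≤ 0 with each x_n ∈ [0, 1]): every per-slot decision is a vertex of [0, 1]², yet the time averages approach an optimal point (≈ (0.5, 0.5)) that is not a vertex.

```python
def run_bang_bang(V=50.0, T=10000):
    # Toy LP: minimize c.x  s.t.  a.x - b <= 0, with each x_n in [0, 1].
    c = [1.0, 1.0]        # objective coefficients
    a = [-1.0, -1.0]      # so that a.x - b = 1 - x1 - x2 with b = -1
    b = -1.0
    Q = 0.0               # single virtual queue for the single constraint
    sums = [0.0, 0.0]
    for _ in range(T):
        # Bang-bang rule: x_n = x_min (0) if V*c_n + Q*a_n >= 0, else x_max (1).
        x = [0.0 if V * c[n] + Q * a[n] >= 0.0 else 1.0 for n in range(2)]
        for n in range(2):
            sums[n] += x[n]
        g = sum(a[n] * x[n] for n in range(2)) - b   # = 1 - x1 - x2
        Q = max(Q + g, 0.0)                          # virtual queue update
    return [s / T for s in sums]

x_avg = run_bang_bang()
```

The virtual queue hovers near V and the decisions alternate between the vertices (0, 0) and (1, 1), so x(t) never converges, while the time averages satisfy x̄_1 + x̄_2 ≈ 1, the optimal LP value.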