Reward Variance in Markov Chains: A Calculational Approach
Tom Verhoeff
Eindhoven University of Technology
2004
Abstract
This paper introduces a new method for calculating the variance of the reward accumulated until absorption in a Markov chain. Traditional approaches derive the variance from the second moment (the expectation of the squared reward), which risks catastrophic cancellation when the second moment and the squared first moment are nearly equal. The proposed technique instead establishes a direct system of linear equations for the variance at each state, requiring only the first moment (the reward expectation). By conditioning on the first transition, the variance at a state is expressed recursively in terms of the variances and expectations at neighboring states, avoiding explicit calculation of the second moment.
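As a sketch of the kind of recursion involved (the notation here is illustrative, not necessarily the paper's): writing $e(s)$ for the expected reward accumulated from state $s$ until absorption, $v(s)$ for its variance, $p(s,t)$ for the transition probabilities, and $r(s,t)$ for the per-transition rewards, conditioning on the first transition gives
\[
e(s) = \sum_{t} p(s,t)\,\bigl(r(s,t) + e(t)\bigr),
\qquad
v(s) = \sum_{t} p(s,t)\,\Bigl(v(t) + \bigl(r(s,t) + e(t) - e(s)\bigr)^{2}\Bigr),
\]
which is linear in the unknowns $v(s)$ once the expectations $e(s)$ have been determined.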
The derivation leverages properties of expectations over walks until absorption in a designated subset of states. The resulting variance equations can be solved efficiently, e.g., by backward substitution for absorbing chains without cycles. Extensions handle covariances between different reward structures. Applying the technique to a large Markov chain model of Yahtzee demonstrates its practical value: variances and covariances of final scores are computed for different category scoring choices. Overall, the method offers a numerically robust and computationally efficient approach to analyzing stochastic processes modeled as absorbing Markov chains.
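A minimal numerical sketch of the approach for an acyclic absorbing chain, solved by backward substitution; the states, probabilities, and rewards below are invented for illustration and are not taken from the paper:

```python
# Sketch: expected reward and variance until absorption in a small acyclic
# absorbing Markov chain, computed directly (no second moments) by
# conditioning on the first transition and substituting backwards.

# transitions[s] = list of (next_state, probability, reward);
# absorbing states have an empty list.
transitions = {
    "A": [("B", 0.5, 1.0), ("C", 0.5, 2.0)],
    "B": [("C", 1.0, 3.0)],
    "C": [],  # absorbing
}

# Process states in reverse topological order so successors are solved first.
order = ["C", "B", "A"]

e = {}  # e[s]: expected total reward from s until absorption
v = {}  # v[s]: variance of that total reward

for s in order:
    succ = transitions[s]
    if not succ:              # absorbing state: no further reward accumulates
        e[s], v[s] = 0.0, 0.0
        continue
    # First moment: condition on the first transition.
    e[s] = sum(p * (r + e[t]) for t, p, r in succ)
    # Variance directly, via the law of total variance over the first step.
    v[s] = sum(p * (v[t] + (r + e[t] - e[s]) ** 2) for t, p, r in succ)

print(e)  # {'C': 0.0, 'B': 3.0, 'A': 3.0}
print(v)  # {'C': 0.0, 'B': 0.0, 'A': 1.0}
```

For chains with cycles the same per-state equations remain linear in the variances and can be solved with a standard linear solver instead of backward substitution.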