Homogeneous Markov chains


Developing a class for working with Markov chains

  • C++
  • Algorithms

Today I would like to tell you about writing a class to simplify the work with Markov chains.

Details are under the cut.

Basic knowledge:

Representation of graphs as an adjacency matrix and the basic concepts of graph theory; knowledge of C++ for the practical part.

Theory

A Markov chain is a sequence of random events with a finite or countable number of outcomes, characterized by the property that, loosely speaking, with a fixed present, the future is independent of the past. Named after A. A. Markov (senior).

In simple terms, a Markov chain is a weighted graph. Events are located at its vertices, and the weight of the edge connecting vertices A and B is the probability that event A will be followed by event B.

Quite a few articles have been written about the use of Markov chains, but we will continue to develop the class.

Here is an example of a Markov chain:

In the following, we will consider this scheme as an example.

Obviously, if there is only one outgoing edge from vertex A, then its weight will be equal to 1.

Notation
At the vertices we have events (A, B, C, D, E, …). On the edges we have the probability that the i-th event will be followed by the j-th event. For convenience, I numbered the vertices (No. 1, No. 2, etc.).

A matrix is the adjacency matrix of a directed weighted graph that represents the Markov chain (more on this below). In this particular case the matrix is also called the transition probability matrix, or simply the transition matrix.

Matrix representation of a Markov chain
We will represent the Markov chain by a matrix, namely by the adjacency matrix used to represent graphs.

Let me remind you:

The adjacency matrix of a graph G with a finite number of vertices n (numbered from 1 to n) is a square matrix A of size n, in which the element aij equals the number of edges from the i-th vertex of the graph to the j-th vertex.

More about adjacency matrices can be found in a course on discrete mathematics.

In our case, the matrix will have a size of 10x10, let's write it:

0 50 0 0 0 0 50 0 0 0
0 0 80 20 0 0 0 0 0 0
0 0 0 0 100 0 0 0 0 0
0 0 0 0 100 0 0 0 0 0
0 0 0 0 0 100 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 70 30 0
0 0 0 0 0 0 0 0 0 100
0 0 0 0 0 0 0 0 0 100
0 0 0 0 0 100 0 0 0 0

Idea
Take a closer look at our matrix. In each row, the non-zero values stand in the columns whose numbers correspond to possible next events, and the non-zero value itself is the probability that this event will occur.

Thus, the entry in row i and column j is the probability that the event with number j occurs right after the event with number i.

Those familiar with probability theory may have guessed that each row defines a discrete probability distribution.

Markov chain traversal algorithm

1) Initialize the current position k with the zero vertex.
2) If the vertex is not final, generate a number m from 0…n−1 according to the probability distribution in row k of the matrix, where n is the number of vertices and m is the number of the next event. Otherwise, stop.
3) Set the current position k to the number of the generated vertex m.
4) Go to step 2.

Note: a vertex is final if its probability distribution (its row of the matrix) is all zeros (see the 6th row of the matrix).

Pretty neat algorithm, right?

Implementation

In this article, I want to present separately only the implementation of the traversal described above. Initializing and filling the Markov chain is of no particular interest (see the complete code at the end).

Implementation of the traversal algorithm

template <class Element>
Element *Markov<Element>::Next(int StartElement = -1)
{
    if (Markov<Element>::Initiated)                  // if the adjacency matrix is created
    {
        if (StartElement == -1)                      // if the default start element is used
            StartElement = Markov<Element>::Current; // then continue (in the constructor Current = 0)
        std::random_device rd;
        std::mt19937 gen(rd());
        std::discrete_distribution<> dicr_distr(Markov<Element>::AdjacencyMatrix.at(Current).begin(),
                                                Markov<Element>::AdjacencyMatrix.at(Current).end()); // initialize the container to generate a number based on the probability distribution
        int next = dicr_distr(gen);                  // generate the next vertex
        if (next == Markov<Element>::size())         // a subtlety of the generator: if the probability distribution is all zeros, it returns the number of elements
            return NULL;
        Markov<Element>::Current = next;             // change the current vertex
        return &(Markov<Element>::elems.at(next));   // return the value at the vertex
    }
    return NULL;
}

This algorithm looks especially simple thanks to std::discrete_distribution. It is rather difficult to describe in words how this distribution works, so let us take the 0th row of our matrix as an example:

0 50 0 0 0 0 50 0 0 0

The generator will return either 1 or 6, each with probability 0.5. That is, it returns the column number (which is also the number of the vertex in the chain) to which the walk continues.
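To see this behaviour in isolation, here is a minimal standalone sketch (separate from the class above) that feeds row 0 directly into std::discrete_distribution and counts what it generates; note that the weights do not have to be normalized, so 50 and 50 work just as well as 0.5 and 0.5:

#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::vector<double> row = {0, 50, 0, 0, 0, 0, 50, 0, 0, 0};  // row 0 of our matrix
    std::random_device rd;
    std::mt19937 gen(rd());
    std::discrete_distribution<> next_vertex(row.begin(), row.end());

    int counts[10] = {};
    for (int i = 0; i < 10000; ++i)
        ++counts[next_vertex(gen)];                  // only indices 1 and 6 are ever produced

    for (int i = 0; i < 10; ++i)
        std::cout << i << ": " << counts[i] << "\n"; // roughly 5000 each for 1 and 6, 0 elsewhere
    return 0;
}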

An example program that uses the class:

Implementation of a program that traverses the Markov chain from the example

#include <iostream>
#include "Markov.h"
#include <string>
#include <fstream>

using namespace std;

int main()
{
    Markov<string> chain;
    ofstream outs;
    outs.open("out.txt");
    ifstream ins;
    ins.open("matrix.txt");
    int num;
    double Prob = 0;
    (ins >> num).get();                              // number of vertices
    string str;
    for (int i = 0; i < num; i++)
    {
        getline(ins, str);
        chain.AddElement(str);                       // add a vertex
    }
    if (chain.InitAdjacency())                       // initialize the matrix with zeros
    {
        for (int i = 0; i < chain.size(); i++)
        {
            for (int j = 0; j < chain.size(); j++)
            {
                (ins >> Prob).get();
                if (!chain.PushAdjacency(i, j, Prob))    // write the value into the matrix
                {
                    cerr << "Adjacency matrix write error" << endl;
                }
            }
        }
        outs << chain.At(0) << " ";                  // print the 0th vertex
        for (int i = 0; i < 20 * chain.size() - 1; i++)  // generate 20 chains
        {
            string *str = chain.Next();
            if (str != NULL)                         // if the previous vertex was not final
                outs << (*str).c_str() << " ";       // print the value of the vertex
            else
            {
                outs << std::endl;                   // if it was final, start again from the beginning
                chain.Current = 0;
                outs << chain.At(0) << " ";
            }
        }
        chain.UninitAdjacency();                     // release the matrix
    }
    else
        cerr << "Can not initialize Adjacency matrix" << endl;
    ins.close();
    outs.close();
    cin.get();
    return 0;
}


An example of the output that the program generates:

A Markov chain is a chain of events in which the probability of each event depends only on the previous state.

This article is of a survey nature, written on the basis of the sources listed at the end, which are cited in places.

Introduction to the theory of Markov chains

A Markov chain is a sequence of random events in which the probability of each event depends only on the state in which the process is currently located and does not depend on earlier states. A finite discrete chain is defined by:

  • the set of states S = {S1, …, Sn};
  • the vector of initial probabilities p(0) = (p(0)(1), …, p(0)(n)), specifying the distribution over states at time 0;
  • the matrix of transition probabilities P = {pij}, where pij is the probability of transition from state Si to state Sj, with ∑_{j=1…n} pij = 1 for every i.

An example of a transition probability matrix with a set of states S = {S1, …, S5} and a vector of initial probabilities p(0) = (1, 0, 0, 0, 0):

Using the vector of initial probabilities and the transition matrix, one can calculate the stochastic vector p(n), a vector composed of the probabilities p(n)(i) that the process will be in state i at time n. The vector p(n) can be obtained using the formula:

p(n) = p(0) × P^n

As n grows, the vectors p(n) in some cases stabilize: they converge to a probability vector ρ, which can be called the stationary distribution of the chain. Stationarity manifests itself in the fact that, taking p(0) = ρ, we get p(n) = ρ for any n.
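Here is a small sketch of this computation: it applies the formula p(n) = p(0) × P^n one step at a time (multiplying the current vector by P) and prints the successive vectors, so the stabilization of p(n) can be watched directly. The 3×3 matrix and the initial vector are assumed purely for illustration, since the example matrix itself is not reproduced above.

#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One step of the chain: p(n+1)_j = sum_i p(n)_i * P_ij
std::vector<double> step(const std::vector<double> &p, const Matrix &P)
{
    std::vector<double> next(p.size(), 0.0);
    for (size_t i = 0; i < p.size(); ++i)
        for (size_t j = 0; j < p.size(); ++j)
            next[j] += p[i] * P[i][j];
    return next;
}

int main()
{
    Matrix P = {{0.2, 0.8, 0.0},               // assumed transition matrix, each row sums to 1
                {0.5, 0.0, 0.5},
                {0.3, 0.3, 0.4}};
    std::vector<double> p = {1.0, 0.0, 0.0};   // p(0): the process starts in state 1

    for (int n = 1; n <= 20; ++n)
    {
        p = step(p, P);
        std::printf("p(%2d) = %.4f %.4f %.4f\n", n, p[0], p[1], p[2]);
    }
    return 0;
}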

The simplest criterion that guarantees convergence to a stationary distribution is as follows: if all elements of the transition probability matrix P are positive, then, as n tends to infinity, the vector p(n) tends to the vector ρ, which is the unique solution of the system ρ × P = ρ, ∑_i ρ(i) = 1.

It can also be shown that if all elements of the matrix P^n are positive for some positive n, then the vector p(n) will stabilize all the same.

The proof of these assertions is given in detail.

The Markov chain is depicted as a transition graph, whose vertices correspond to the states of the chain and whose arcs correspond to the transitions between them. The weight of the arc (i, j) connecting vertices si and sj is equal to the probability pij of transition from the first state to the second. The graph corresponding to the matrix shown above:

Classification of the states of Markov chains

When considering Markov chains, we may be interested in the behavior of the system in a short period of time. In this case, the absolute probabilities are calculated using the formulas from the previous section. However, it is more important to study the behavior of the system over a large time interval, when the number of transitions tends to infinity. Next, definitions of the states of Markov chains are introduced, which are necessary to study the behavior of the system in the long term.

Markov chains are classified depending on the possibility of transition from one state to another.

The groups of states of a Markov chain (subsets of vertices of the transition graph) that correspond to dead-end vertices of the order diagram of the transition graph are called ergodic classes of the chain. If we look at the graph shown above, we see that it has one ergodic class M1 = {S5}, reachable from the strongly connected component corresponding to the subset of vertices M2 = {S1, S2, S3, S4}. The states belonging to ergodic classes are called essential, and the rest are called inessential (although such names do not agree well with common sense). An absorbing state si is a special case of an ergodic class consisting of a single state: once the process enters such a state, it stops there. For si we have pii = 1, i.e., in the transition graph only one edge leaves it, a loop.

Absorbing Markov chains are used as timing models of programs and computational processes. When modeling a program, the states of the chain are identified with the blocks of the program, and the matrix of transition probabilities determines the order of transitions between the blocks, which depends on the structure of the program and on the distribution of the input data, whose values affect the development of the computational process. By representing the program as an absorbing chain, one can calculate the number of calls to program blocks and the program execution time, estimated by means, variances and, if necessary, distributions. These statistics can later be used to optimize the program code, applying low-level methods to speed up the critical parts of the program. This technique is called code profiling.

For example, in Dijkstra's algorithm, the chain has the following states:

    vertex (v), extracting a new vertex from the priority queue, transition to state b only;

    begin (b), the beginning of the cycle of enumeration of outgoing arcs for the weakening procedure;

    analysis (a), analysis of the next arc, possible transition to a, d, or e;

    decrease (d), decrease in the estimate for some graph vertex, transition to a;

    end (e), end of the loop, move to the next vertex.

It remains to set the transition probabilities between the vertices, and then we can study the duration of transitions between vertices, the probabilities of reaching different states, and other average characteristics of the process (a simulation sketch is given below).
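Below is a minimal sketch of such a model. The transition probabilities depend on the particular graph being processed, so every number in the matrix, as well as the extra absorbing state "stop" that models termination of the algorithm, is an assumption made only for illustration; the sketch simulates the chain until absorption and estimates the average number of visits to each block.

#include <cstdio>
#include <random>
#include <vector>

int main()
{
    // States: 0 = v (extract vertex), 1 = b (loop begin), 2 = a (analyze arc),
    //         3 = d (decrease estimate), 4 = e (loop end),
    //         5 = stop (assumed absorbing state: the queue is empty, the algorithm ends).
    // All probabilities below are illustrative assumptions, not measured values.
    std::vector<std::vector<double>> P = {
        {0.0, 0.9, 0.0, 0.0, 0.0, 0.1},   // v -> b, or the queue is empty -> stop
        {0.0, 0.0, 1.0, 0.0, 0.0, 0.0},   // b -> a
        {0.0, 0.0, 0.3, 0.3, 0.4, 0.0},   // a -> a, d or e
        {0.0, 0.0, 1.0, 0.0, 0.0, 0.0},   // d -> a
        {1.0, 0.0, 0.0, 0.0, 0.0, 0.0},   // e -> v
        {0.0, 0.0, 0.0, 0.0, 0.0, 1.0}    // stop -> stop
    };

    std::mt19937 gen(std::random_device{}());
    const int runs = 100000;
    std::vector<double> visits(6, 0.0);

    for (int r = 0; r < runs; ++r)
    {
        int state = 0;                    // every run starts in v
        while (state != 5)                // walk the chain until absorption
        {
            visits[state] += 1.0;
            std::discrete_distribution<> next(P[state].begin(), P[state].end());
            state = next(gen);
        }
    }

    const char *names[] = {"v", "b", "a", "d", "e"};
    for (int i = 0; i < 5; ++i)
        std::printf("average visits to %s per run: %.2f\n", names[i], visits[i] / runs);
    return 0;
}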

Similarly, the computational process, which is reduced to requests for system resources in the order determined by the program, can be represented by an absorbing Markov chain, the states of which correspond to the use of system resources - processor, memory and peripheral devices, transition probabilities reflect the order of access to various resources. Due to this, the computational process is presented in a form convenient for the analysis of its characteristics.

A Markov chain is said to be irreducible if any state Sj can be reached from any other state Si in a finite number of transitions. In this case, all states of the chain are said to communicate, and the transition graph is a strongly connected component. The process generated by an ergodic chain, starting in some state, never terminates, but successively passes from one state to another, ending up in different states with different frequencies depending on the transition probabilities. Therefore, the main characteristic of an ergodic chain is the set of probabilities of the process being in the states Sj, j = 1, …, n, i.e., the fraction of time the process spends in each of the states. Irreducible chains are used as models of system reliability. Indeed, if a resource that the process uses very often fails, the operability of the entire system is in jeopardy. In such a case, duplicating the critical resource can help avoid failures. The states of the system, which differ in the composition of operational and failed equipment, are interpreted as the states of the chain; transitions between them correspond to failures and repairs of devices and to changes in the connections between them made to keep the system operational. Estimates of the characteristics of an irreducible chain give an idea of the reliability of the behaviour of the system as a whole. Such chains can also serve as models of the interaction of devices with tasks arriving for processing.

Examples of using

Failure Service System

The server consists of several blocks, for example modems or network cards, which receive requests from users for service. If all blocks are busy, the request is lost. If one of the blocks receives a request, it becomes busy until the end of its processing. As the states we take the number of idle blocks. Time will be discrete. Denote by α the probability that a request arrives. We also assume that the service time is random and consists of independent continuations, i.e., a request is served in one step with probability β, and with probability (1 − β) it is served after this step as a new request. This gives the probability (1 − β)β for a two-step service, (1 − β)^2·β for a three-step service, and so on. Consider an example with 4 devices operating in parallel. Let us write the matrix of transition probabilities for the chosen states:

It can be seen that the chain has a unique ergodic class, and hence the system p × P = p has a unique solution in the class of probability vectors. Let us write down the equations of the system that allow us to find this solution:


Now we know the probabilities πi that i blocks are occupied in the system in the stationary regime. Then p4 is the fraction of time during which all blocks in the system are occupied, so that the system does not respond to requests. The results obtained apply to any number of blocks. Now they can be used: one can compare the cost of additional devices with the reduction of the time during which the system is completely busy.
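As a rough cross-check of such stationary results, here is a small simulation sketch. The values of α and β, and the order of events inside one step (count a possible loss first, then finish services, then accept a new request), are assumptions chosen only for illustration; the sketch estimates the fraction of time during which all blocks are busy.

#include <cstdio>
#include <random>

int main()
{
    const int blocks = 4;            // number of devices working in parallel
    const double alpha = 0.3;        // assumed probability that a request arrives in a step
    const double beta  = 0.2;        // assumed probability that a busy block finishes in a step

    std::mt19937 gen(std::random_device{}());
    std::bernoulli_distribution arrival(alpha), finish(beta);

    int busy = 0;                    // current number of busy blocks
    long long steps = 10000000, all_busy = 0;

    for (long long t = 0; t < steps; ++t)
    {
        if (busy == blocks)
            ++all_busy;                              // a request arriving now would be lost
        int finished = 0;
        for (int b = 0; b < busy; ++b)               // each busy block finishes independently
            if (finish(gen))
                ++finished;
        busy -= finished;
        if (arrival(gen) && busy < blocks)
            ++busy;                                  // accept the request if a block is free
    }
    std::printf("fraction of time all %d blocks are busy: %.4f\n",
                blocks, (double)all_busy / steps);
    return 0;
}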

You can read more about this example in .

Decision processes with a finite and infinite number of stages

Consider a process in which there are several matrices of transition probabilities. For each moment of time, the choice of one or another matrix depends on the decision we made. The above can be understood with the following example. As a result of the analysis of the soil, the gardener evaluates its condition with one of three numbers: (1) - good, (2) - satisfactory, or (3) - poor. At the same time, the gardener noticed that the productivity of the soil in the current year depends only on its condition in the previous year. Therefore, the probabilities of soil transition without external influences from one state to another can be represented by the following Markov chain with matrix P1:

Naturally, soil productivity deteriorates over time. For example, if last year the state of the soil was satisfactory, then this year it can only remain the same or become bad, but it will never become good. However, the gardener can influence the state of the soil and change the transition probabilities in the matrix P1 to the corresponding probabilities of the matrix P2:

Now we can assign to each transition from one state to another an income function, defined as the profit or loss over a one-year period. The gardener can choose whether or not to use fertilizer; his final income or loss depends on this choice. Let us introduce the matrices R1 and R2, which define the income functions depending on the cost of fertilizer and the quality of the soil:

Finally, the gardener faces the question of which strategy to choose in order to maximize the average expected income. Two types of problems can be considered: with a finite and with an infinite number of stages. Here we solve the decision problem for a finite number of stages; in that case the gardener's activity will certainly end at some point. Let the gardener intend to stop his occupation after N years. Our task now is to determine the optimal strategy of the gardener's behaviour, that is, the strategy that maximizes his income. The finiteness of the number of stages manifests itself in the fact that the gardener does not care what happens to his land in year N + 1 (only the years up to N inclusive matter to him). It is now clear that in this case the problem of finding a strategy turns into a dynamic programming problem. If we denote by fn(i) the maximum average expected income that can be obtained in stages n through N inclusive, starting from the state with number i, then it is easy to derive the recursive relation

fn(i) = max_k Σ_j pijk·(rijk + fn+1(j)),

where k is the number of the strategy used. This equation is based on the fact that the total income rijk + fn+1(j) is obtained as a result of a transition from state i at stage n to state j at stage n + 1 with probability pijk.

Now the optimal solution can be found by computing fn(i) sequentially in the backward direction (n = N, …, 1). Introducing the vector of initial probabilities into the statement of the problem does not complicate its solution.
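Below is a minimal sketch of this backward recursion. The article's own matrices P1, P2, R1 and R2 are not reproduced above, so the values in the sketch are assumed purely for illustration (they only respect the property that without fertilizer the soil never improves); the usual boundary condition f_{N+1}(j) = 0 is used.

#include <cstdio>
#include <vector>

int main()
{
    const int N = 5;                   // planning horizon in years
    const int states = 3;              // 0 = good, 1 = satisfactory, 2 = bad

    // k = 0: no fertilizer (the soil never improves), k = 1: fertilizer is applied.
    // Transition probabilities and yearly incomes are illustrative assumptions.
    double P[2][3][3] = {
        {{0.2, 0.5, 0.3}, {0.0, 0.5, 0.5}, {0.0, 0.0, 1.0}},
        {{0.3, 0.6, 0.1}, {0.1, 0.6, 0.3}, {0.05, 0.4, 0.55}}
    };
    double R[2][3][3] = {
        {{7, 6, 3}, {0, 5, 1}, {0, 0, -1}},
        {{6, 5, -1}, {7, 4, 0}, {6, 3, -2}}
    };

    // f[i] = maximum expected income from the current stage through stage N, starting in state i
    std::vector<double> f(states, 0.0);          // boundary condition f_{N+1}(j) = 0

    for (int n = N; n >= 1; --n)                 // backward induction over the stages
    {
        std::vector<double> f_new(states);
        for (int i = 0; i < states; ++i)
        {
            double best = -1e18;
            int best_k = 0;
            for (int k = 0; k < 2; ++k)          // pick the better of the two strategies
            {
                double value = 0.0;
                for (int j = 0; j < states; ++j)
                    value += P[k][i][j] * (R[k][i][j] + f[j]);
                if (value > best) { best = value; best_k = k; }
            }
            f_new[i] = best;
            std::printf("stage %d, state %d: strategy %d, expected income %.3f\n",
                        n, i, best_k, best);
        }
        f = f_new;
    }
    return 0;
}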

This example is also discussed in .

Modeling word combinations in text

Consider a text consisting of words. Imagine a process in which the states are words, so that when the process is in state (si), it passes to state (sj) according to the matrix of transition probabilities. First of all, the system has to be "trained": a sufficiently large text is fed to the input in order to estimate the transition probabilities. Then one can build trajectories of the Markov chain. An increase in the semantic load of a text constructed with the Markov chain algorithm is possible only by increasing the order, where a state is not a single word but a tuple of greater size: pairs (u, v), triples (u, v, w), and so on. Whether in chains of the first or of the fifth order, there will still be little sense. Meaning begins to appear when the order grows to at least the average number of words in a typical phrase of the source text. But it is impossible to go far in this direction, because the growth of the semantic load of the text in high-order Markov chains is much slower than the decline in its uniqueness. A text built on Markov chains of, say, the thirtieth order would still not be meaningful enough to interest a person, yet would already be quite similar to the original text; moreover, the number of states in such a chain would be enormous.
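A minimal sketch of a first-order word chain is given below. The training corpus, the starting word and the output length are assumptions made only for illustration; storing every observed successor (with repetitions) reproduces the empirical transition probabilities when a successor is then picked uniformly at random.

#include <iostream>
#include <map>
#include <random>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    // Tiny illustrative corpus; a real model would be trained on a large text.
    std::string corpus = "the cat sat on the mat the cat ate the fish";

    // Training: for every word remember all words that followed it.
    std::map<std::string, std::vector<std::string>> next;
    std::istringstream in(corpus);
    std::string prev, word;
    in >> prev;
    while (in >> word)
    {
        next[prev].push_back(word);
        prev = word;
    }

    // Generation: walk the chain starting from an assumed start word.
    std::mt19937 gen(std::random_device{}());
    std::string state = "the";
    for (int i = 0; i < 10; ++i)
    {
        std::cout << state << ' ';
        auto it = next.find(state);
        if (it == next.end() || it->second.empty())   // final state: no known successor
            break;
        std::uniform_int_distribution<size_t> pick(0, it->second.size() - 1);
        state = it->second[pick(gen)];
    }
    std::cout << '\n';
    return 0;
}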

This technology is now, unfortunately, very widely used on the Internet to create content for web pages. People who want to increase traffic to their site and improve its ranking in search engines try to stuff their pages with as many search keywords as possible. But search engines use algorithms that can distinguish real text from an incoherent jumble of keywords. So, in order to deceive the search engines, texts created by a generator based on a Markov chain are used. There are, of course, also positive examples of using Markov chains to work with text: they are used for determining authorship and for analyzing the authenticity of texts.

Markov chains and lotteries

In some cases, a probabilistic model is used to predict numbers in various lotteries. However, there seems to be no point in using Markov chains to model the sequence of different draws. What happened to the balls in one draw does not affect the results of the next draw, because after the draw the balls are collected and, in the next draw, placed into the lottery drum in a fixed order; the connection with the previous draw is lost. The sequence in which the balls fall within a single draw is another matter. In this case, the fall of the next ball is determined by the state of the lottery drum at the moment the previous ball fell. Thus, the sequence of balls falling in one draw is a Markov chain, and such a model can be used. There is, however, a great difficulty in analyzing number lotteries. The state of the lottery drum after the next ball has fallen determines the subsequent events, but the problem is that this state is unknown to us. All we know is that a certain ball has fallen out. But when this ball falls out, the remaining balls can be arranged in many different ways, so there is a group of very many states corresponding to the same observed event. Therefore, we can only construct a matrix of transition probabilities between such groups of states. These probabilities are averages of the transition probabilities between the individual states, which of course reduces the effectiveness of applying the Markov chain model to number lotteries.

Similarly to this case, such a model can be used for weather forecasting, currency quotes, and other systems where historical data is available and newly arriving information can be used in the future. In cases where only the manifestations of the system are known but not its internal (hidden) states, hidden Markov models can be applied; they are discussed in detail in Wikibooks (hidden Markov models).

The methods of mathematical description of Markov random processes in a system with discrete states depend on the moments of time (known in advance or random) at which transitions of the system from state to state can occur.
If transitions of the system from state to state are possible only at pre-fixed moments of time, we are dealing with a Markov random process with discrete time. If a transition is possible at an arbitrary random moment, we are dealing with a Markov random process with continuous time.
Let there be a physical system S that can be in n states S1, S2, …, Sn. Transitions from state to state are possible only at the moments t1, t2, …, tk; let us call these moments steps. We will consider the random process in the system S as a function of the integer argument 1, 2, …, k, where the argument is the step number.
Example: S1 → S2 → S3 → S2.
Let Si(k) denote the event that after k steps the system is in the state Si.
For any k, the events S1(k), S2(k), …, Sn(k) form a complete group of events and are mutually exclusive.

The process in the system can be represented as a chain of events.
Example: S1(0), S2(1), S3(2), S5(3), ….
Such a sequence is called a Markov chain if, for each step, the probability of transition from any state Si to any state Sj does not depend on when and how the system came to the state Si.
Let, at any moment after any k-th step, the system S be able to be in one of the states S1, S2, …, Sn, i.e., one event from the complete group of events can occur: S1(k), S2(k), …, Sn(k). Let us denote the probabilities of these events:
P1(1) = P(S1(1)); P2(1) = P(S2(1)); …; Pn(1) = P(Sn(1));
P1(2) = P(S1(2)); P2(2) = P(S2(2)); …; Pn(2) = P(Sn(2));
…
P1(k) = P(S1(k)); P2(k) = P(S2(k)); …; Pn(k) = P(Sn(k)).
It is easy to see that for each step number the condition
P1(k) + P2(k) + … + Pn(k) = 1
holds.
Let us call these probabilities the state probabilities. Consequently, the problem is stated as follows: find the probabilities of the system's states for any k.
Example. Let there be a system that can be in any of six states. Then the processes occurring in it can be depicted either as a graph of changes of the system's state (Fig. 7.9, a) or as a graph of the system's states (Fig. 7.9, b).

Fig. 7.9
The processes in the system can also be depicted as a sequence of states: S1, S3, S2, S2, S3, S5, S6, S2.
The state probability at the (k + 1)-th step depends only on the state at the k-th step.
For any step k there exist certain probabilities of the system's transition from any state to any other state; let us call these probabilities the transition probabilities of the Markov chain.
Some of these probabilities will be equal to 0 if the transition from one state to another is impossible in one step.
A Markov chain is called homogeneous if the transition probabilities do not depend on the step number; otherwise it is called inhomogeneous.
Let there be a homogeneous Markov chain, and let the system S have n possible states S1, …, Sn. Suppose that for each state the probability of transition to any other state in one step is known, i.e., Pij (from Si to Sj in one step); then we can write the transition probabilities as a matrix.

P = ‖Pij‖, i, j = 1, …, n.   (7.1)
On the diagonal of this matrix are the probabilities that the system passes from the state Si to the same state Si.
Using the previously introduced events Sj(k), the transition probabilities can be written as conditional probabilities:
Pij = P(Sj(k) | Si(k − 1)).
Obviously, the sum of the entries in each row of matrix (7.1) is equal to one, since the events form a complete group of incompatible events.
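This property is easy to check programmatically; a small sketch (the 2×2 matrix is an assumed example):

#include <cmath>
#include <cstdio>
#include <vector>

// Returns true if every row of P is a probability distribution (entries in [0, 1], row sum 1).
bool is_stochastic(const std::vector<std::vector<double>> &P, double eps = 1e-9)
{
    for (const auto &row : P)
    {
        double sum = 0.0;
        for (double p : row)
        {
            if (p < 0.0 || p > 1.0)
                return false;
            sum += p;
        }
        if (std::fabs(sum - 1.0) > eps)
            return false;
    }
    return true;
}

int main()
{
    std::vector<std::vector<double>> P = {{0.7, 0.3}, {0.4, 0.6}};
    std::printf("%s\n", is_stochastic(P) ? "valid transition matrix" : "not stochastic");
    return 0;
}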

When considering Markov chains, as well as when analyzing a Markov random process, various state graphs are used (Fig. 7.10).

Fig. 7.10

This system can be in any of six states, where Pij is the probability of transition of the system from the state Si to the state Sj. For this system we could also write the equations expressing that the system was in some state and did not leave it during the time t:

In the general case the Markov chain is inhomogeneous, i.e., the probability Pij changes from step to step. Suppose that the matrix of transition probabilities at each step is given; then the probability that the system S will be in the state Si at the k-th step can be found by the formula

Knowing the matrix of transition probabilities and the initial state of the system, one can find the state probabilities after any k-th step. Let the system be in the state Sm at the initial moment of time. Then for t = 0
Pm(0) = 1, Pi(0) = 0 for i ≠ m.
Let us find the probabilities after the first step. From the state Sm the system passes to the states S1, S2, etc. with probabilities Pm1, Pm2, …, Pmm, …, Pmn. Then after the first step the probabilities will be

Pi(1) = Pmi, i = 1, 2, …, n.   (7.2)

Let us find the state probabilities after the second step: Pi(2), i = 1, …, n. We will calculate these probabilities using the total probability formula
P(A) = Σ_i P(Hi)·P(A | Hi).
The hypotheses will be the following statements:

  • after the first step the system was in the state S1 (hypothesis H1);
  • after the first step the system was in the state S2 (hypothesis H2);
  • …
  • after the first step the system was in the state Sn (hypothesis Hn).
The probabilities of the hypotheses are known from expression (7.2). The conditional probabilities of transition into each state under each hypothesis are also known and are recorded in the transition matrices. Then, according to the total probability formula, we obtain the probability of any state after the second step:

Pi(2) = Σ_{j=1…n} Pj(1)·Pji,  i = 1, …, n.   (7.3)

Formula (7.3) sums over all transition probabilities Pij, but only those different from zero are actually taken into account. The probability of any state after the k-th step:

Pi(k) = Σ_{j=1…n} Pj(k − 1)·Pji,  i = 1, …, n.   (7.4)

Thus, the probability of a state after the k-th step is determined by the recursive formula (7.4) through the probabilities of the (k − 1)-th step.

Task 6. The matrix of transition probabilities of a Markov chain in one step is given. Find the transition matrix of this chain in three steps.
Solution. The transition matrix of a system is a matrix that contains all the transition probabilities of this system:

Each row of the matrix contains the probabilities of events (transition from the state i into a state j), which form a complete group, so the sum of the probabilities of these events is equal to one:

Denote by pij(n) the probability that, as a result of n steps (trials), the system passes from state i to state j. For example, p25(10) is the probability of transition from the second state to the fifth in ten steps. Note that for n = 1 we obtain the transition probabilities pij(1) = pij.
We are faced with the task: knowing the transition probabilities pij, find the probabilities pij(n) of the system's transition from state i to state j in n steps. To do this we introduce an intermediate (between i and j) state r. In other words, we assume that from the initial state i the system passes in m steps to the intermediate state r with probability pir(m), after which, in the remaining n − m steps, it passes from the intermediate state r to the final state j with probability prj(n − m). By the total probability formula we get
pij(n) = Σ_r pir(m)·prj(n − m).
This formula is called Markov's equality. Using it, one can find all the probabilities pij(n) and, consequently, the matrix Pn itself. Since matrix calculus leads to the goal faster, let us write the matrix relation that follows from the obtained formula in general form: Pn = P^n (the n-th power of the one-step transition matrix).
Calculate the transition matrix of the Markov chain in three steps using the resulting formula:

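Since the task's own one-step matrix is not reproduced above, here is a sketch of the computation itself with an assumed 2×2 stochastic matrix: the three-step matrix is obtained as P^3 = P·P·P, and the inner loop over the intermediate state corresponds exactly to Markov's equality.

#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

Matrix multiply(const Matrix &A, const Matrix &B)
{
    size_t n = A.size();
    Matrix C(n, std::vector<double>(n, 0.0));
    for (size_t i = 0; i < n; ++i)
        for (size_t r = 0; r < n; ++r)        // r is the intermediate state, as in Markov's equality
            for (size_t j = 0; j < n; ++j)
                C[i][j] += A[i][r] * B[r][j];
    return C;
}

int main()
{
    Matrix P = {{0.1, 0.9},                   // assumed one-step transition matrix
                {0.6, 0.4}};
    Matrix P3 = multiply(multiply(P, P), P);  // three-step transition matrix P^3
    for (const auto &row : P3)
    {
        for (double p : row)
            std::printf("%.4f ", p);
        std::printf("\n");
    }
    return 0;
}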

Task #1. The transition probability matrix for the Markov chain is:
.
The distribution over states at time t=0 is determined by the vector:
π0 = (0.5; 0.2; 0.3)
Find: a) the distribution over states at the moments t = 1, 2, 3, 4;
b) the stationary distribution.

Let s1, s2, …, sn be all possible states of the system in a homogeneous Markov chain, and let P = (pij) be the stochastic matrix defining this chain, composed of the transition probabilities pij (see page 381).

Denote by pij(q) the probability that the system is in the state sj at the moment q if it is known that at the moment 0 the system was in the state si (i, j = 1, …, n). Obviously, pij(1) = pij. Using the theorems on addition and multiplication of probabilities, we can easily find

pij(q) = Σ_{r=1…n} pir(q − 1)·prj   (i, j = 1, …, n),

or, in matrix notation,

P(q) = P(q − 1)·P,   where P(q) = (pij(q)).

Hence, giving q successively the values 1, 2, 3, …, we obtain the important formula

P(q) = P^q.

If the limits

lim_{q→∞} pij(q)   (i, j = 1, …, n)

exist, or, in matrix notation, the limit

lim_{q→∞} P^q

exists, then these quantities are called limiting or final transition probabilities.
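A sketch of how these limits can be examined numerically: raise an assumed stochastic matrix to a high power by repeated squaring and look at the result. For a regular chain (in the terminology introduced below), the rows of the limit coincide, i.e., the limiting transition probabilities do not depend on the initial state.

#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

Matrix multiply(const Matrix &A, const Matrix &B)
{
    size_t n = A.size();
    Matrix C(n, std::vector<double>(n, 0.0));
    for (size_t i = 0; i < n; ++i)
        for (size_t k = 0; k < n; ++k)
            for (size_t j = 0; j < n; ++j)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

int main()
{
    Matrix P = {{0.50, 0.50, 0.00},      // assumed stochastic matrix of a regular chain
                {0.25, 0.50, 0.25},
                {0.00, 0.50, 0.50}};

    Matrix Q = P;
    for (int i = 0; i < 6; ++i)          // repeated squaring: Q = P^(2^6) = P^64
        Q = multiply(Q, Q);

    // All rows of Q (approximately) coincide: the limiting transition probabilities
    // no longer depend on the initial state.
    for (const auto &row : Q)
    {
        for (double p : row)
            std::printf("%.6f ", p);
        std::printf("\n");
    }
    return 0;
}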

To find out in which cases there are limiting transition probabilities and to derive the corresponding formulas, we introduce the following terminology.

We will call a stochastic matrix P and the corresponding homogeneous Markov chain correct if the matrix P has no characteristic numbers that are different from unity but equal to unity in absolute value, and regular if, in addition, unity is a simple root of the characteristic equation of the matrix P.

A correct matrix is characterized by the fact that in its normal form (69) (p. 373) the matrices are primitive. For a regular matrix, additionally .

In addition, a homogeneous Markov chain is called indecomposable, decomposable, acyclic, cyclic, if for this chain the stochastic matrix is, respectively, indecomposable, decomposable, primitive, imprimitive.

Since a primitive stochastic matrix is a special kind of regular matrix, an acyclic Markov chain is a special kind of regular chain.

We will show that limiting transition probabilities exist only for correct homogeneous Markov chains.

Indeed, let be the minimal polynomial of a regular matrix . Then

According to Theorem 10, we can assume that

Based on formula (24) Ch. V (page 113)

(96)

where is the reduced adjoint matrix and

If is a regular matrix, then

and therefore on the right side of formula (96) all the terms, except for the first one, tend to zero as . Therefore, for a regular matrix, there is a matrix composed of limiting transition probabilities, and

The converse is obvious. If there exists the limit

then the matrix cannot have a characteristic number for which , but , since then there would be no limit [The same limit must exist due to the existence of the limit (97").]

We have proved that for a correct (and only a correct) homogeneous Markov chain there exists a matrix . This matrix is determined by formula (97).

Let us show how the matrix can be expressed in terms of the characteristic polynomial

and the associated matrix .

From identity

by virtue of (95), (95") and (98) it follows:

Therefore, formula (97) can be replaced by the formula

(97)

For a regular Markov chain, since it is a particular type of correct chain, the matrix exists and is determined by either of the formulas (97), (97"). In this case, formula (97") also has the form

2. Consider a correct chain of general type (a non-regular one). We write the corresponding matrix in normal form

(100)

where are primitive stochastic matrices, and indecomposable matrices have maximum characteristic numbers . Assuming

,

write in the form

(101)

But , since all the characteristic numbers of the matrix are less than unity in absolute value. That's why

(102)

Since are primitive stochastic matrices, then the matrices according to formulas (99) and (35) (p. 362) are positive

and in each column of any of these matrices, all elements are equal to each other:

.

Note that the normal form (100) of the stochastic matrix corresponds to the division of the system states into groups:

Each group in (104) corresponds to its own group of rows in (101). According to the terminology of A. N. Kolmogorov, the states of the system included in the first groups are called essential, and the states included in the remaining groups are called inessential.

It follows from the form (101) of the matrix that, in any finite number of steps (from moment to moment), only the following transitions of the system are possible: a) from an essential state to an essential state of the same group, b) from an inessential state to an essential state, and c) from an inessential state to an inessential state of the same or an earlier group.

From the form (102) of the matrix it follows that in the limit a transition is possible only into an essential state, i.e., the probability of transition into any inessential state tends to zero as the number of steps grows. Therefore, essential states are sometimes also called limit states.

3. From formula (97) it follows:

.

This shows that each column of the matrix is an eigenvector of the stochastic matrix for the characteristic number 1.

For a regular matrix, the number 1 is a simple root of the characteristic equation, and to this number there corresponds only one (up to a scalar factor) eigenvector of the matrix. Therefore, in any j-th column of the matrix, all elements are equal to the same non-negative number:

Thus, in a regular chain, the limiting transition probabilities do not depend on the initial state.

Conversely, if in some correct homogeneous Markov chain the limiting transition probabilities do not depend on the initial state, i.e., formulas (104) hold, then in scheme (102) for the matrix, . But then the chain is regular.

For an acyclic chain, which is a special case of a regular chain, is a primitive matrix. Therefore, for some (see Theorem 8 on p. 377). But then and .

Conversely, from it follows that for some , and this, according to Theorem 8, means that the matrix is primitive and, therefore, the given homogeneous Markov chain is acyclic.

We formulate the results obtained in the form of the following theorem:

Theorem 11. 1. In order for all limiting transition probabilities to exist in a homogeneous Markov chain, it is necessary and sufficient that the chain be correct. In this case the matrix , composed of the limiting transition probabilities, is determined by formula (95) or (98).

2. In order for the limiting transition probabilities in a correct homogeneous Markov chain to be independent of the initial state, it is necessary and sufficient that the chain be regular. In this case the matrix is determined by formula (99).

3. In order for all limiting transition probabilities to be different from zero in a regular homogeneous Markov chain, it is necessary and sufficient that the chain be acyclic.

4. Let us introduce columns of absolute probabilities

(105)

where is the probability of the system being in the state (,) at the moment. Using the theorems of addition and multiplication of probabilities, we find:

(,),

or in matrix notation

where is the transposed matrix for the matrix .

All absolute probabilities (105) are determined from formula (106) if the initial probabilities and the matrix of transition probabilities are known

Let us introduce into consideration the limiting absolute probabilities

Passing in both parts of equality (106) to the limit at , we obtain:

Note that the existence of a matrix of limiting transition probabilities implies the existence of limiting absolute probabilities for any initial probabilities and vice versa.

From formula (107) and from the form (102) of the matrix, it follows that the limiting absolute probabilities corresponding to insignificant states are equal to zero.

Multiplying both sides of the matrix equality

to the right, we, by virtue of (107), obtain:

i.e., the column of marginal absolute probabilities is the eigenvector of the matrix for the characteristic number .

If the given Markov chain is regular, then it is a simple root of the characteristic equation of the matrix . In this case, the column of limiting absolute probabilities is uniquely determined from (108) (since and ).

Let a regular Markov chain be given. Then from (104) and from (107) it follows:

(109)

In this case, the limiting absolute probabilities do not depend on the initial probabilities .

Conversely, may not depend on in the presence of formula (107) if and only if all rows of the matrix are the same, i.e.

,

and therefore (by Theorem 11) is a regular matrix.

If is a primitive matrix, then , and hence due to (109)

Conversely, if all and do not depend on the initial probabilities, then in each column of the matrix all elements are the same and according to (109) , and this, according to Theorem 11, means that is a primitive matrix, i.e., this chain is acyclic.

It follows from the above that Theorem 11 can be formulated as follows:

Theorem 11. 1. In order for all limiting absolute probabilities to exist in a homogeneous Markov chain for any initial probabilities, it is necessary and sufficient that the chain be correct.

2. In order for a homogeneous Markov chain to have limiting absolute probabilities for any initial probabilities and not depend on these initial probabilities, it is necessary and sufficient that the chain be regular.

3. In order for a homogeneous Markov chain to have positive limiting absolute probabilities for any initial probabilities and these limiting probabilities not to depend on the initial ones, it is necessary and sufficient that the chain be acyclic.

5. Consider now a homogeneous Markov chain of general type with a matrix of transition probabilities .

Let us take the normal form (69) of the matrix and denote by the imprimitivity indices of the matrices in (69). Let be the least common multiple of integers . Then the matrix does not have characteristic numbers equal in absolute value to one, but different from one, i.e., is a regular matrix; at the same time - the smallest indicator, at which - the correct matrix. We call a number the period of a given homogeneous Markov chain and. Conversely, if and defined by formulas (110) and (110").

The average limiting absolute probabilities corresponding to non-essential states are always equal to zero.

If there is a number in the normal form of the matrix (and only in this case), the average limiting absolute probabilities do not depend on the initial probabilities and are uniquely determined from equation (111).
