[PYTHON] Basics of Quantum Information Theory: Data Compression (2)

\def\bra#1{\mathinner{\left\langle{#1}\right|}} \def\ket#1{\mathinner{\left|{#1}\right\rangle}} \def\braket#1#2{\mathinner{\left\langle{#1}\middle|#2\right\rangle}}

Introduction

Last time, I understood "Shannon's coding theorem in noise-free channels" in classical information theory, so this time it is the quantum version of "Noisy". We will study "Schumach's coding theorem in quantum channels without noise". Shannon's theorem could be proved using the concept of "typical series", but Schumacher's theorem can also be proved using a similar concept with almost the same logical expansion. After that explanation, let's simulate the coding using the quantum calculation simulator qlazy.

The following documents were used as references.

  1. Nielsen, Chan "Quantum Computer and Quantum Communication (3)" Ohmsha (2005)
  2. Ishizaka, Ogawa, Kawachi, Kimura, Hayashi "Introduction to Quantum Information Science" Kyoritsu Shuppan (2012)
  3. Tomita "Quantum Information Engineering" Morikita Publishing (2017)

Entanglement fidelity

Before that, let's take a look at the "entanglement fidelity" that we will use in a later discussion. I haven't touched on it in previous articles, so let's take a quick look at its definition and nature.

Entertainment fidelity is a measure of how well a CPTP map preserves the state of a system. Suppose that the density operator $ \ rho ^ {A} $ in a quantum system A becomes $ \ Gamma (\ rho ^ {A}) $ by the CPTP map. At this time, if the purification of $ \ rho ^ {A} $ is $ \ rho ^ {AR} = \ ket {\ Psi} ^ {AR} \ bra {\ Psi} ^ {AR} $, entanglement faithfulness Degree $ F_ {e} ^ {2} $ uses fidelity $ F $ between two density operators to

F_{e}^{2}(\rho^{A}, \Gamma) = F(\rho^{AR}, (\Gamma \otimes I^{R})(\rho^{AR}))^{2} = \bra{\Psi}^{AR} (\Gamma \otimes I^{R})(\rho^{AR}) \ket{\Psi}^{AR}  \tag{1}

It is defined as [^ 1]. This is nothing but the inner product of the purified states, so its upper limit is 1.

[^ 1]: In Introduction to Quantum Information Science, which has a detailed explanation of entanglement fidelity, $ F_ {e} ^ {2} The $ symbol was used. It seems to be a habit of adding 2 to the shoulder of $ F_ {e} $ to mean that it is the square of fidelity.

Also, the CPTP map uses the Claus operator $ \ {M_k \} $ to $ \ Gamma (\ rho) = \ sum_ {k} M_ {k} \ rho M_ {k} ^ {\ dagger} $ I was able to express it. Then, the entanglement fidelity is

F_{e}^{2}(\rho^{A}, \Gamma) = \sum_{k} |Tr(\rho^{A} M_k)|^{2}  \tag{2}

You can also write like this. this is,

\begin{align}
F_{e}^{2}(\rho^{A}, \Gamma) &= \bra{\Psi}^{AR} \sum_{k} (M_k \otimes I_k) \rho^{AR} (M_{k}^{\dagger} \otimes I_k) \ket{\Psi}^{AR} \\
&= \sum_{k} \bra{\Psi}^{AR} (M_k \otimes I_k) \ket{\Psi}^{AR} \bra{\Psi}^{AR} (M_{k}^{\dagger} \otimes I_k) \ket{\Psi}^{AR}  \\
&= \sum_{k} Tr(\rho^{AR} (M_k \otimes I_k)) Tr(\rho^{AR} (M_{k}^{\dagger} \otimes I_k)) \\
&= \sum_{k} Tr(\rho^{A} M_k) Tr(\rho^{A} M_{k}^{\dagger}) \\
&= \sum_{k} |Tr(\rho^{A} M_k)|^{2}  \tag{3}
\end{align}

You can see from that.

Problem setting

Well then, the main subject.

This time, the target is not "classical information" but "quantum information". In other words, "classical information" is transmitted on the "quantum state" (= encoded), and the quantum state itself is transmitted instead of being measured (= decoded) on the receiving side to obtain "classical information". .. At that time, we will consider how the "quantum state" can be compressed. It may sound easy to say that, but first of all, let's check what you are actually trying to do physically.

In quantum information theory, the quantum state is the density operator $ \ rho $. What is described by the density operator is generally a mixed state. A mixed state manifested itself as a stochastic ensemble of pure states, or as a large pure state subset. For example, the former is the state that Alice has a photon launcher that can set the pure state arbitrarily, and it appears when it is stochastically switched and fired, and the latter is a part of the qubit of the quantum computer. It is a state that appears by taking out (for example,).

Consider transmitting one after another with the state described by this density operator as one unit. In the case of classical information theory, the transmission of the English alphabet in chronological order was modeled by the sequence of random variables $ X_1, X_2, \ cdots $, but similarly, the quantum states to be transmitted are dense. Model as a series of operators $ \ rho_1, \ rho_2, \ cdots $. In Alice's example, Alice first performs a photon firing action equivalent to $ \ rho_1 $, then takes a breather, then performs a photon firing action equivalent to $ \ rho_2 $, and so on. Or, in fact, there may be multiple Alice (!), Alice 1 is in charge of $ \ rho_1 $, Alice 2 is in charge of $ \ rho_2 $, and all Alice shoots photons at once. Maybe. In the example of a quantum computer, we take out a specific partial qubit of the quantum state obtained by a certain quantum circuit, consider it as $ \ rho_1 $, emit it to the quantum communication path, take a break, and do the same thing. It is an image that generates a density operator series by repeatedly releasing $ \ rho_2 $ and so on. Alternatively, you may use a large quantum circuit to create a series of $ \ rho_1, \ rho_2, \ cdots $ in parallel, as in the case of multiple Alices earlier [^ 2]. ].

[^ 2]: Now, I have illustrated two images, one is when the states described by the density operator are emitted one after another in chronological order, and the other is when multiple emissions are made at once. Is a unitary transformation that spans multiple states, that is, it is entangled with each other (described later), so it is better to have multiple images emitted at the same time in your head. It may be easy. Technically, I think it doesn't matter which one you can entangle.

Here, the generated quantum state is transmitted via some transmission line. Alice's photons may be delivered to Bob through an optical fiber, and quantum states emitted from a quantum computer may be delivered to another quantum computer via the quantum internet [^ 3]. In any case, since it is a story involving a physical system, it is better not to think that an infinite dimensional quantum state can be transmitted instantly. Each $ \ rho_i $ is a density operator defined on the Hilbert space of the $ d $ dimension (hence the qubit number is $ \ log d $). If you want to send $ n $ in bulk, the number of qubits required to generate information is naturally $ n \ log d $. Since there must be physical restrictions on the transmission path, some state conversion is applied to transmit in a form in which the number of qubits is reduced as much as possible, and the receiving side performs some conversion to the received quantum state to obtain the original quantum state. I want to restore as little error as possible.

[^ 3]: I think the realization of the quantum Internet will be several decades later, but it's fun to imagine (delusion?) The future in which quantum computers around the world are connected like this!

At this time, is there a limit to the number of qubits that can be compressed? If so, how is it quantified? The "Schumacher's coding theorem" that we will study this time is the answer.

Typical subspace

First, as in the case of Shannon's theorem, simply model the source. It is assumed that each $ \ rho_i $ has the same probability distribution (= has the same element when expressed in a matrix), and $ \ rho_i $ belonging to different $ i $ are independent of each other (= uncorrelated). Such a quantum source is called an "i.i.d quantum source". Subsequent discussions assume this.

Now, consider the density operator series $ \ {\ rho_i \} $ as one composite system $ \ sigma $. That is,

\sigma = \rho_1 \otimes \rho_2 \cdots  \tag{4}

is. This $ \ rho_1, \ rho_2, \ cdots $ is subscripted for distinction, but the entity has the same shape. If you represent it with the symbol $ \ rho $ and decompose the spectrum,

\rho = \sum_{x} p(x) \ket{x} \bra{x}  \tag{5}

Can be written. If $ \ rho $ is in one quantum state, its spectral decomposition is linear for two orthogonal states, for example $ \ ket {up} \ bra {up} $ and $ \ ket {down} \ bra {down} $. It becomes a sum, and its coefficient p (x) can be interpreted as the probability that the state $ \ ket {x} $ will occur. That is, the probability that the state is $ \ ket {up} $ is p (up), the probability that the state is $ \ ket {down} $ is p (down), and the sum of the two is 1. Also, the quantum entropy $ S (\ rho) $ at this time is equal to the classical entropy $ H (p (x)) $ for p (x) = {p (up), p (down)}. Then, the probability that $ n $ pieces are taken out from the infinite number of $ \ rho $ series and that the state is $ \ ket {x_1} \ ket {x_2} \ cdots \ ket {x_n} $ is You can think of it exactly as in the case of classical theory [^ 4]. That is, there is a typical pattern that is very likely to occur, and its probability is $ p (x_1, x_2, \ cdots, x_n) = p (x_1) p (x_2) \ cdots p (x_n) \ approximation 2 It can be approximated by ^ {-nS (p)} $.

[^ 4]: For example, if you think of $ \ ket {up} $ as the "front" and $ \ ket {down} $ as the "back", you may throw coins many times. You can think in the same way.

And the probability that the typical pattern will occur can be quantitatively characterized as having a width of $ \ epsilon $ as follows.

2^{-n(S(\rho)+\epsilon)} \leq p(x_1)p(x_2) \cdots p(x_n) \leq 2^{-n(S(\rho)-\epsilon)}  \tag{6}

This is also

|\frac{1}{n} \log \frac{1}{p(x_1) \cdots p(x_n)} - S(\rho) | \leq \epsilon  \tag{7}

Can be rewritten as [^ 5].

[^ 5]: Please refer to equations (3) (8) (9) in Previous article.

Here, the set of states $ \ ket {x_i} \ ket {x_2} \ cdots \ ket {x_n} $ corresponding to the typical series pattern that occurs with the probability characterized by this $ \ epsilon $ is "$ ". We call it "epsilon $ typical subspace ($ \ epsilon $ -typical subspace)" and write it as $ T (n, \ epsilon) $.

Then, we can also define a projection operator $ P (n, \ epsilon) $ on a typical subspace of $ \ epsilon $.

P(n,\epsilon) = \sum_{x \in T(n,\epsilon)} \ket{x_1}\bra{x_1} \otimes \ket{x_2}\bra{x_2} \otimes \cdots \otimes \ket{x_n}\bra{x_n}  \tag{8}

It can be written as (this is a concept not found in classical theory).

Now that we're ready, let's look at three theorems that hold for typical subspaces.

Typical subspace theorem (1)

The description of Nielsen, Chan is quoted as it is.

"Fix $ \ epsilon> 0 $. For any $ \ delta> 0 $ and a sufficiently large $ n $

Tr(P(n,\epsilon)\rho^{\otimes n}) \geq 1-\delta  \tag{9}

It is almost the same as "Typical Series Theorem (1)" in classical information theory. In the previous article (https://qiita.com/SamN/items/12cc4f3057b36381e474), I explained what the unique wording of $ \ epsilon-\ delta $ means, so I won't repeat it here. The difference is that the probability of being a $ \ epsilon $ typical subspace is represented by $ Tr $. Is this okay? Since we have mapped the original density operator to the $ \ epsilon $ typical subspace by a projection operation and traced it, this is the probability of being a $ \ epsilon $ typical subspace.

Let's prove it.

[Proof]

\begin{align}
Tr(P(n,\epsilon) \rho^{\otimes n}) &= Tr(\sum_{x \in T(n,\epsilon)} p(x_1) \ket{x_1}\bra{x_1} \otimes p(x_2) \ket{x_2}\bra{x_2} \otimes \cdots \otimes p(x_n) \ket{x_n}\bra{x_n}) \\
&= \sum_{x \in T(n,\epsilon)} p(x_1) \cdots p(x_n) Tr(\ket{x_1}\bra{x_1} \otimes \cdots \otimes \ket{x_n}\bra{x_n}) \\
&= \sum_{x \in T(n,\epsilon)} p(x_1) \cdots p(x_n)  \tag{10}
\end{align}

Therefore, the left side of equation (9) is equal to the "probability of being a typical series" in classical information theory. Therefore, it can be proved immediately from the typical series theorem (1) proved in Previous article. (End of proof)

Typical subspace theorem (2)

The description of Nielsen, Chan is quoted.

"Any fixed\epsilon>0When\delta>0AgainstT(n,\epsilon)Dimension|T(n,\epsilon)|=Tr(P(n,\epsilon))Satisfies the following equation.

(1-\delta) 2^{n(S(\rho)-\epsilon)} \leq |T(n,\epsilon)| \leq 2^{n(S(\rho)+\epsilon)}  \tag{11}

here,T(n,\epsilon)Please note that we are referring to "dimensions" rather than "numbers". In classical information|T(n,\epsilon)|Is\epsilon典型系列の「数」を表していましたが、量子情報でIs、T(n,\epsilon)Is、\epsilon典型部分空間、すなわち全体のヒルベルト空間に対する部分空間なので、数でIsなく次元と呼ぶのが適切です。この部分空間へIs射影演算子P(n,\epsilon)It can be reached by applying. And the projection operatorP(n,\epsilon)Is、式(8)Was defined in. That\sumの中身Is各々が直交する軸への射影であり、それを典型的になっているxの分だけ足し合わせる格好になっています。従って、古典情報の典型系列の数に相当するものIs、量子情報においてIs、状態xのバリエーションの数、つまり、典型部分空間の次元|T(n,\epsilon)|ということになります。また、典型部分空間の次元Is、式(8)As you can see by taking the traces on both sides ofTr(P(n,\epsilon))Is equal to This theorem(2)の記述に関する注意ポイントIs以上です。それでIs、証明します。

[Proof]

By replacing $ H (X) $ in the typical series theorem (2) proved in previous article with $ S (\ rho) $, You can prove it immediately. (End of proof)

Typical subspace theorem (3)

The description of Nielsen, Chan is quoted.

"$ S (n) $ is an operator that projects $ H ^ {\ otimes n} $, which is at most $ 2 ^ {nR} $ dimension, into any subspace. Where $ R \ <S (\ rho) $ Is fixed. At this time, for any $ \ delta > 0 $ and $ n $ that is large enough,

Tr(S(n)\rho^{\otimes n}) \leq \delta  \tag{12}

Similar to the typical series theorem (3) in previous article, the claim of this theorem is sufficient no matter how small $ \ delta $ is taken. If we take a large $ n $, we can make equation (12) hold. In short, it says that the limit of $ n \ to \ infinity $ is $ Tr (S (n) \ rho ^ {\ otimes n}) \ to 0 $. Let's prove it.

[Proof]

Divide the projection trace using $ S (n) $ (that is, the sum of probabilities) into a trace for a typical subspace and a trace for an atypical subspace.

Tr(S(n)\rho^{\otimes n}) = Tr(S(n) \rho^{\otimes n} P(n,\epsilon)) + Tr(S(n) \rho^{\otimes n} (I-P(n,\epsilon)))  \tag{13}

Where $ P (n, \ epsilon) $ and $ \ rho ^ {\ otimes n} $ are commutative, that is,

\begin{align}
\rho^{\otimes n} P(n,\epsilon) &= \sum_{x \in T(n,\epsilon)} p(x_1) \cdots p(x_n) \ket{x_1}\bra{x_1} \cdots \ket{x_n}\bra{x_n} \\
&= P(n,\epsilon) \rho^{\otimes n}  \tag{14}
\end{align}

And since $ (P (n, \ epsilon)) ^ 2 = P (n, \ epsilon) $ (because it is a projection), the first term on the right side of equation (13) is

Tr(S(n) P(n,\epsilon) \rho^{\otimes n} P(n,\epsilon))  \tag{15}

It will be. Here, the eigenvalue of $ P (n, \ epsilon) \ rho ^ {\ otimes n} P (n, \ epsilon) $ is $ p (x_1) p (x_2) \ cdots p (x_n) $, and the expression From (6), the upper limit is $ 2 ^ {-n (S (\ rho)-\ epsilon)} $. $ S (n) $ in the trace of equation (15) is equivalent to the operation of extracting $ 2 ^ {nR} $ eigenvalues [^ 6], and they are added together in the trace, so in the end,

[^ 6]: $ P (n, \ epsilon) \ rho ^ {\ otimes n} P (n, \ epsilon) $ is the basis of a typical subspace $ \ ket {x_1} \ cdots \ ket {x_n} $ When diagonalized with, the eigenvalues $ p (x_1) \ cdots p (x_n) $ are lined up in a row on the diagonal components. You can think of $ S (n) $ as a projection that takes $ 2 ^ {nR} $ from it and zeros the rest.

Tr(S(n) P(n,\epsilon) \rho^{\otimes n} P(n,\epsilon)) \leq 2^{nR} 2^{-n(S(\rho)-\epsilon)} \approx 2^{-n(S(\rho)-R)}  \tag{16}

Since the inequality was established and it was assumed that $ R <S (\ rho) $, the first term on the right side of equation (13) converges to 0 at the limit of $ n \ to \ infinity $.

Next, consider the second term on the right-hand side of equation (13). Since $ S (n) \ leq I $ and $ S (n) $ and $ \ rho ^ {\ otimes n} (I-P (n, \ epsilon)) $ are both positive values,

0 \leq Tr(S(n)\rho^{\otimes n} (I-P(n,\epsilon))) \leq Tr(\rho^{\otimes n} (I-P(n,\epsilon)))  \tag{17}

Holds [^ 7]. $ IP (n, \ epsilon) \ to 0 $ at the limit of $ n \ to \ infinty $, so $ Tr (S (n) \ rho ^ {\ otimes n} (IP (n, \ epsilon))) \ to \ infinty 0 $, that is, the second term on the right-hand side of equation (13) also converges to 0.

Therefore, equation (13) converges to 0 at the limit of $ n \ to \ infinity $. (End of proof)

[^ 7]: Hermitian operator $ A $ is positive for any $ \ ket {\ psi} $, $ \ bra {\ psi} A \ ket {\ psi} \ geq 0 It says that $ holds. Also, when $ \ bra {\ psi} (AB) \ ket {\ psi} \ geq 0 $ holds for the Hermitian operators $ A, B $ and any $ \ ket {\ psi} $, $ Write A \ geq B $. Under the premise that $ AB $ is also a positive value for the positive Hermitian operators $ A and B $, $ Tr (AB) \ geq 0 $ holds, and if you look at $ B = I $ , $ Tr (A) \ geq 0 $. The equal sign holds only when $ A = 0 $, that is, $ A = 0 \ Leftrightarrow Tr (A) = 0 $. Then, $ A \ geq B \ Rightarrow Tr (A) \ geq Tr (B) $ holds, and $ A \ geq B, C \ geq 0 \ Rightarrow Tr (AC) \ geq Tr (BC) $ can be proved. I will. And so on are used behind the scenes.

Schumacher's coding theorem in a noise-free channel

Now we are ready to prove Schumacher's theorem, which is today's main event. Again, I will quote the description of Nielsen, Chan.

"$ \ {H, \ rho \} $ is the iid quantum source. If $ R> S (\ rho) $, then the source $ \ {H, \ rho \} $ On the other hand, there is a reliable compression method with a rate of $ R $. If $ R <S (\ rho) $, any compression method with a rate of $ R $ is unreliable. "

$ \ {H, \ rho \} $ is a series of states (density operator) $ \ rho $ defined on a certain Hilbert space $ H $, which is used as an iid quantum information source. The premise is that you should first imagine it. At that time, if the transmission rate $ R $ is set to a value slightly larger than the quantum entropy $ S (\ rho) $, it is claimed that a reliable compression method exists. Let's prove it.

[Proof]

First, when $ R> S (\ rho) $.

Consider $ \ epsilon> 0 $ that satisfies $ S (\ rho) + \ epsilon \ leq R $. From the typical subspace theorem (2)

|T(n,\epsilon)| \leq 2^{n(S(\rho)+\epsilon)} \leq 2^{nR}  \tag{18}

Is established. Here, $ H_ {c} ^ {n} $ is a Hilbert space of arbitrary $ 2 ^ {nR} $ dimension and contains $ T (n, \ epsilon) $ [^ 8].

[^ 8]: The Hilbert space that describes the original quantum source $ \ rho_1 \ otimes \ cdots \ rho_n $ is $ H ^ {\ otimes n} $, and the Hilbert space that describes the encoded quantum source. The subspace of is written as $ H_ {c} ^ {n} (\ subset H ^ {\ otimes n}) $. Then, $ H_ {c} ^ {n} $ is determined to contain $ \ epsilon $ typical subspace $ T (n, \ epsilon) $. Since equation (18) holds, it is possible to decide so.

The encoding is done as follows. Measure the quantum information $ \ sigma \ equiv \ rho_1 \ otimes \ cdots \ rho_n $ generated from the quantum source with the projective operators $ P (n, \ epsilon), IP (n, \ epsilon) $ that are orthogonal to each other. I will. If the obtained result is expressed as "0", "1", if "0" is obtained, it means that it is in a typical subspace state, and if it is "1", it means that it is in a state on a typical subspace. Instead, it is interpreted as a state on an atypical subspace. If it is "0", the quantum state is sent as it is, and if it is "1", it is replaced with some appropriate state and sent (for example, $ \ ket {00 \ cdots 0} $).

As a result, the original quantum state is transformed as follows.

C^{n}(\sigma) \equiv P(n,\epsilon) \sigma P(n,\epsilon) + \sum_{i} A_i \sigma A_i^{\dagger} \tag{19}

Where $ C ^ {n} $ is

C^{n}: H^{\otimes n} \rightarrow H_{c}^{n}  \tag{20}

It is an operation that maps like. In addition, the second term on the right side describes the effect of replacing the atypical subspace with an appropriate state, for example.

A_i \equiv \ket{0}\bra{i}  \tag{21}

You can think of it as something like ($ \ ket {i} $ is the basis for the orthogonal complement of a typical subspace).

Decryption operation $ D ^ {n} $ is an identity operation [^ 9].

D^{n}(\sigma) = \sigma  \tag{22}

[^ 9]: When encoding, I sent it as it is without doing anything for the state on the typical subspace, so it is correct to do nothing when decoding. If it is in a state on an atypical subspace, we have given up because of an error, so we will leave it as it is without doing anything in particular when decoding.

The entanglement fidelity $ F_ {e} ^ {2} $ evaluates how much the original state changed during the encoding and decoding process. From the fact that the maximum value of entanglement fidelity is 1, using equation (2) by the Klaus operator, and from the theorem (1) of the typical subspace,

\begin{align}
1 &\geq F_{e}^{2}(\rho^{\otimes n}, D^{n} \circ C^{n}) \\
&\geq |Tr(\rho^{\otimes n} P(n,\epsilon))|^{2} \\
&\geq |1-\delta|^{2} \geq 1-2\delta  \tag{23}
\end{align}

Is established. Therefore, no matter how small $ \ delta> 0 $ is, if you bring a sufficiently large $ n $, the above equation can be established. That is, at the limit of $ n \ to \ infinity $, the entanglement fidelity converges to 1. In other words, there is a reliable compression method with a rate of $ R $.

Next is the case of $ R <S (\ rho) $.

The coding operation can be thought of as a mapping from $ H ^ {\ otimes n} $ to a subspace of the $ 2 ^ {nR} $ dimension by the projection operator $ S (n) $.

The Claus operator for the coding operation $ C ^ {n} $ is $ \ {C_j \} $, and the Claus operator for the decoding operation $ D ^ {n} $ is $ \ {D_k \} $. Put. That is, the coding is

\begin{align}
&\rho_1 \otimes \cdots \otimes \rho_n \\
&\rightarrow \rho_{1}^{\prime} \otimes \cdots \otimes \rho_{n}^{\prime} \\
&= C_{1} \rho_1 C_{1}^{\dagger} \otimes \cdots \otimes C_{n} \rho_n C_{n}^{\dagger} = \sum_{j} C_{j} \rho^{\otimes n} C_{j}^{\dagger}  \tag{24}
\end{align}

So, the decryption is

\begin{align}
&\rho_{1}^{\prime} \otimes \cdots \otimes \rho_{n}^{\prime} \\
&\rightarrow \rho_{1}^{\prime\prime} \otimes \cdots \otimes \rho_{n}^{\prime\prime} \\
&= D_{1} \rho_{1}^{\prime} D_{1}^{\dagger} \otimes \cdots \otimes D_{n} \rho_{n}^{\prime} D_{n}^{\dagger} = \sum_{k} D_{k} \rho^{\prime \otimes n} D_{k}^{\dagger}  \tag{25}
\end{align}

is. Then, the entanglement fidelity by encoding / decoding is

F_{e}^{2}(\rho^{\otimes n}, D^{n} \circ C^{n}) = \sum_{j,k} |Tr(D_{k} C_{j} \rho^{\otimes n})|^{2}  \tag{26}

It will be. Since $ C_j $ is a projection operation on the same subspace as the $ 2 ^ {nR} $ dimensional subspace projected by $ S (n) $,

C_j = S(n) C_j  \tag{27}

is. Also, the $ 2 ^ {nR} $ dimension Hilbert space mapped by $ S (n) $ sets the projection operator to the subspace mapped by $ D_k $ as $ S ^ {k} (n) $. Then,

S^{k}(n) D_{k} S(n) = D_{k} S(n)  \tag{28}

Is established. From equations (27) and (28)

D_{k} C_{j} = D_{k} S(n) C_{j} = S^{k}(n) D_{k} S(n) C_{j} = S^{k}(n) D_{k} C_{j} \tag{29}

Substituting this into equation (26)

F_{e}^{2}(\rho^{\otimes n}, D^{n} \circ C^{n}) = \sum_{j,k} |Tr(D_{k} C_{j} \rho^{\otimes n} S^{k}(n))|^{2}  \tag{30}

It will be. Furthermore, from the Cauchy-Schwartz inequality,

F_{e}^{2}(\rho^{\otimes n}, D^{n} \circ C^{n}) \leq \sum_{j,k} Tr(D_{k} C_{j} \rho^{\otimes n} C_{j}^{\dagger} D_{k}^{\dagger}) Tr(S^{k}(n) \rho^{\otimes n}) \tag{31}

Holds [^ 10].

[^ 10]: However, it is a mystery how this formula transformation is done. I would like to supplement it when I understand it.

From the typical subspace theorem (3), the second trace on the right side is for any $ \ delta> 0 $.

Tr(S^{k}(n) \rho^{\otimes n}) \leq \delta  \tag{32}

Can be done. Therefore, equation (31)

F_{e}^{2}(\rho^{\otimes n}, D^{n} \circ C^{n}) \leq \delta \sum_{j,k} Tr(D_{k} C_{j} \rho^{\otimes n} C_{j}^{\dagger} D_{k}^{\dagger}) = \delta \tag{33}

It will be. Here, the last equal sign consists of saving the trace of the density operator by $ C ^ {n}, D ^ {n} $. From equation (33), the entanglement fidelity converges to 0 at the limit of $ n \ to \ infinity $, so after all, if $ R <S (\ rho) $, any compression method is unreliable. It was proved that. (End of proof)

Coding simulation

There is an example of Schumacher's encoding in "Column 12.4" of Nielsen, Chan, so I would like to implement it. As one qubit state

\rho = \frac{1}{4}
\begin{pmatrix}
3 & 1 \\
1 & 1
\end{pmatrix}  \tag{34}

Consider the density operator. Four of these are transmitted together. In other words, $ \ rho ^ {\ otimes 4} $ is transmitted as one series. According to Schumacher's coding theorem, we should have considered that the transmission rate is slightly larger than the quantum entropy $ S (\ rho) $. In the case of equation (34), $ S (\ rho) $ is about 0.6 when calculated, so it is assumed that the transmission rate is $ R = 0.75 $ and 4 qubits are compressed to 3 qubits. So, I would like to try to restore the 4-qubit state again on the decoding side.

When Schumacher is encoded, it is projected onto a typical subspace, but before that, we need to think about the basis for diagonalizing $ \ rho $. That is, it decomposes $ \ rho $ into spectra (as in equation (5)).

\rho = p \ket{\bar{0}} \bra{\bar{0}} +(1-p) \ket{\bar{1}} \bra{\bar{1}}  \tag{35}

It will be. here,

\begin{align}
\ket{\bar{0}} &= \cos \frac{\pi}{8} \ket{0} + \sin \frac{\pi}{8} \ket{1}  \\
\ket{\bar{1}} &= -\sin \frac{\pi}{8} \ket{0} + \cos \frac{\pi}{8} \ket{1}  \tag{36}
\end{align}
p = \frac{1}{4} (3 + \tan \frac{\pi}{8}) = 0.85..  \tag{37}

Is [^ 11].

[^ 11]: Equation (34) can be easily derived by substituting equations (36) and (37) into equation (35). Also, by diagonalizing equation (34) (solving the eigenvalue problem), equations (35) (36) (37) can be derived. It's a simple exercise in linear algebra.

Based on this basis, $ \ rho $ has a probability of being in the state $ \ ket {\ bar {0}} $ $ p \ approx 0.85 $, $ \ ket {\ bar {1}} $. It is a mixture of pure states with a probability of $ 1-p \ approx 0.15 $. The original expression (34) of $ \ rho $ is expressed by the basis $ \ ket {0}, \ ket {1} $, so the first thing to do when encoding Schumacher is this. It is a unitary transformation equivalent to such a basis transformation.

As you can easily see from equation (36), the unitary transformation is a rotating gate that rotates about $-\ pi / 4 $ around the Y axis. You can apply it separately for each of the 4 qubits.

Then, the probability that $ \ ket {\ bar {0}} $ will occur is $ p \ approx 0.85 $, so the state of the result of applying this rotating gate

\sum_{x_1=\bar{0}}^{\bar{1}} \sum_{x_2=\bar{0}}^{\bar{1}} \sum_{x_3=\bar{0}}^{\bar{1}} \sum_{x_4=\bar{0}}^{\bar{1}} C(x_1,x_2,x_3,x_4) \ket{x_1 x_2 x_3 x_4}  \tag{38}

In, the larger the number of $ \ bar {0} $ appearing in $ \ ket {x_1 x_2 x_3 x_4} $, the higher the probability [^ 12].

[^ 12]: To be exact, $ 4 \ times 0.85 = 3.4 $, so the probability of combining the states where the number of $ \ bar {0} $ is around $ 3.4 $ is very high compared to the others. It should get bigger. I'm saying that.

This state with a large number of $ \ bar {0} $ constitutes a typical subspace, so it is sufficient to determine this and assign an appropriate quantum state for transmission, but this operation is unitary. The difficulty of quantum information theory is that it must be executed as a transformation. The idea is to arrange them in ascending order of the number of $ \ bar {1} $ and convert them in a dictionary-like order. The specific list is as follows (I omitted it here because it is troublesome to put a bar above each number).

\begin{align}
& \ket{0000} \rightarrow \ket{0000} \\
& \ket{0001} \rightarrow \ket{0001} \\
& \ket{0010} \rightarrow \ket{0010} \\
& \ket{0100} \rightarrow \ket{0011} \\
& \ket{1000} \rightarrow \ket{0100} \\
& \ket{0011} \rightarrow \ket{0101} \\
& \ket{0101} \rightarrow \ket{0110} \\
& \ket{0110} \rightarrow \ket{0111} \\ 
& \ket{1001} \rightarrow \ket{1000} \\
& \ket{1010} \rightarrow \ket{1001} \\
& \ket{1100} \rightarrow \ket{1010} \\
& \ket{0111} \rightarrow \ket{1011} \\
& \ket{1011} \rightarrow \ket{1100} \\
& \ket{1101} \rightarrow \ket{1101} \\
& \ket{1110} \rightarrow \ket{1110} \\
& \ket{1111} \rightarrow \ket{1111} \tag{39}
\end{align}

You just have to perform the conversion. How about that. The smaller the number of $ \ bar {1} $, the higher the order, that is, the probabilistically probable states are clustered at the top, and a specific qubit (the right side of the four). The transformation is done so that many states can be expressed with only 3 qubits). This is a unitary because it is a replacement operation as a whole. In other words, after applying the basis transformation to each qubit, the unitary transformation corresponding to this permutation should be performed.

Since the transmission rate is set to 3 qubits, the leftmost qubit is discarded and only the 3 qubits on the right are left for transmission.

When decoding, first, add the discarded qubits with $ \ ket {0} $ to make them into a 4-qubit state, and execute the inverse transformation of the substitution in equation (39). Finally, the inverse transformation of the basis transformation is performed for each of the 4 qubits. This should almost restore the original state.

Implementation

I actually tried the above flow. Here is the entire Python code.

import numpy as np
from qlazypy import QState,DensOp

PERM = {'0000':'0000', '0001':'0001', '0010':'0010', '0100':'0011',
        '1000':'0100', '0011':'0101', '0101':'0110', '0110':'0111', 
        '1001':'1000', '1010':'1001', '1100':'1010', '0111':'1011',
        '1011':'1100', '1101':'1101', '1110':'1110', '1111':'1111'}

def perm_matrix(PERM):

    dim = len(PERM)
    perm_mat = np.zeros((dim,dim))
    iperm_mat = np.zeros((dim,dim))

    for k,v in PERM.items():
        col = int(k,2)
        row = int(v,2)
        perm_mat[row][col] = 1
        iperm_mat[col][row] = 1

    return (perm_mat, iperm_mat)

def create_densop():

    mat_1 = np.array([[0.75,0.25],[0.25,0.25]])
    de_1 = DensOp(matrix=mat_1)
    de_ini = de_1.composite(num=4)
    de_1.free()

    return de_ini

def coder(de_ini, id_all, id_comp, theta, perm_mat):

    de_tmp = de_ini.clone()
    [de_tmp.ry(i, phase=-theta) for i in id_all]
    de_tmp.apply(perm_mat)
    de_comp = de_tmp.partial(id_comp)  # 4-qubit -> 3-qubit state
    de_tmp.free()

    return de_comp

def decoder(de_comp, id_all, theta, iperm_mat):

    qs_0 = QState(1)
    de_0 = DensOp(qstate=[qs_0])
    de_fin = de_0.tenspro(de_comp)  # add 1-qubit (|0>)
    de_fin.apply(iperm_mat)
    [de_fin.ry(i, phase=theta) for i in id_all]

    qs_0.free()
    de_0.free()

    return de_fin
    
if __name__ == '__main__':

    # settings
    id_all = [0,1,2,3]
    id_comp = [1,2,3]
    theta = 0.25
    (perm_mat,iperm_mat) = perm_matrix(PERM)

    # initial state (4-qubit state)
    de_ini = create_densop()

    # coding (4-qubit -> 3-qubit state)
    de_comp = coder(de_ini, id_all, id_comp, theta, perm_mat)

    # decoding (3-qubit -> 4-qubit state)
    de_fin = decoder(de_comp, id_all, theta, iperm_mat)

    # fidelity
    print("fidelity = {:.6f}".format(de_fin.fidelity(de_ini)))

    de_ini.free()
    de_comp.free()
    de_fin.free()

I'll give you a rough idea of what you're doing. Look at the main processing section.

# settings
id_all = [0,1,2,3]
id_comp = [1,2,3]
theta = 0.25
(perm_mat,iperm_mat) = perm_matrix(PERM)

Then, set the list of the entire qubit number (id_all) and the list of qubit numbers to be compressed (id_comp), and set the angle (theta) for basis conversion to $ 0.25 \ pi $ [^ 13]. .. In addition, the function perm_matrix creates a transformation matrix and its inverse transformation matrix for permutation operations. At that time, the dictionary data PERM defined at the top is referenced (the dictionary corresponds to equation (39)). See the function definition for the processing content. You will use this permutation matrix later to create unitary transformations that you apply to the state.

[^ 13]: qlazy uses $ \ pi $ radians for all rotation angles, so if you specify $ 0.25 $, it will be $ 0.25 \ pi. Represents $. I thought it would be a hassle to write $ \ pi $ each time on the program, so I made this specification, but it was the correct answer. It's quite comfortable.

# initial state (4-qubit state)
de_ini = create_densop()

Create the initial state (density operator) with. In the function create_densop, the process of creating the product state by collecting four 1-quantum density operators corresponding to equation (34) is performed.

de_ini = de_1.composite(num=4)

That one line is the process. By giving argument 4 with the composite method (added in v0.0.31), it returns the product state of 4 quanta.

# coding (4-qubit -> 3-qubit state)
de_comp = coder(de_ini, id_all, id_comp, theta, perm_mat)

And encode it. Compress the state of 4 qubits to 3 qubits. Look at the contents of the function coder.

def coder(de_ini, id_all, id_comp, theta, perm_mat):

    de_tmp = de_ini.clone()
    [de_tmp.ry(i, phase=-theta) for i in id_all]
    de_tmp.apply(perm_mat)
    de_comp = de_tmp.partial(id_comp)  # 4-qubit -> 3-qubit state
    de_tmp.free()

    return de_comp

In the first line, copy the initial state (de_ini) to temporary (de_tmp) with the clone method, and execute the process for this. In the second line, the base is converted by the rotation gate (in v0.0.30, the gate operation processing for the density operator was added. For the quantum gate $ U $, $ U \ rho U ^ {\ dagger} $ Execute). If you use the comprehension notation, you only need one line. The third line performs the unitary transformation equivalent to the replacement. If you specify a permutation matrix with the apply method, this also requires only one line. I'd really like to do it with a combination of quantum gates, but I broke it, sweat [^ 14]. Finally, on the 4th line, the leftmost qubit (0th qubit in the program) is discarded (partial trace is taken) to make it a 3 qubit state (de_comp). I will return this.

[^ 14]: Actually, this is "Practice Exercise 12.6" of Nielsen, Chan. It is currently under consideration, so I will supplement it if I understand it.

Return to the main processing section.

# decoding (3-qubit -> 4-qubit state)
de_fin = decoder(de_comp, id_all, theta, iperm_mat)

Enter the 3 qubit state and output the decoded 4 qubit state. Look at the contents of the function decoder.

def decoder(de_comp, id_all, theta, iperm_mat):

    qs_0 = QState(1)
    de_0 = DensOp(qstate=[qs_0])
    de_fin = de_0.tenspro(de_comp)  # add 1-qubit (|0>)
    de_fin.apply(iperm_mat)
    [de_fin.ry(i, phase=theta) for i in id_all]

    qs_0.free()
    de_0.free()

    return de_fin

Prepare the density operator for the quantum state $ \ ket {0} $ to be added in the first and second lines (de_0). The third line creates the product state (de_fin) of the quantum state (de_0) and the compressed quantum state (de_comp). On line 4, the reverse of the replacement operation is performed. In the 5th line, the reverse of the basis conversion is applied to each qubit. Finally, it returns the result (de_fin).

Return to the main processing section.

# fidelity
print("fidelity = {:.6f}".format(de_fin.fidelity(de_ini)))

Calculates and displays the fidelity of the original state (de_ini) and the decrypted state (de_fin). that's all.

Operation check

The execution result is shown. It's very simple because it only shows the fidelity. If the fidelity is 1.0, it means that compression is achieved without error.

fidelity = 0.970213

And there was a slight error.

As you can see from equation (39), the lower you go, the less likely it is to occur. However, it should have a finite probability value. Since the compression is done so that the lower half is deleted, it is natural that an error will occur. Schumacher's coding theorem argues that this error is small enough at the limit of $ n \ to \ infinity $. So, if you change the value of $ n $ to $ 40,400,4000, \ cdots $, you can see how the error becomes infinitely small. However, since I cannot try such a huge number of qubits in the simulator, I will say that Schumacher's coding simulation was possible with the result when $ n = 4 $.

in conclusion

Since I studied "when there is no noise" this time, I would like to go to "when there is noise" next time, but before that, it is probably better to understand "quantum error correction", so next time , I would like to attack that direction. There seems to be a lot of topics to cover, so I think I'll spend some time. After that, I'm planning to say "when there is noise" or "quantum cryptography" on top of that.

that's all

Recommended Posts

Basics of Quantum Information Theory: Data Compression (1)
Basics of Quantum Information Theory: Data Compression (2)
Basics of Quantum Information Theory: Entropy (2)
Basics of Quantum Information Theory: Horebaud Limits
Basics of Quantum Information Theory: Quantum State Tomography
Basics of Quantum Information Theory: Fault Tolerant Quantum Computation
Basics of Quantum Information Theory: Quantum Error Correction (Shor's Code)
Basics of Quantum Information Theory: Quantum Error Correction (CSS Code)
Basics of Quantum Information Theory: Quantum Error Correction (Stabilizer Code: 4)
Basics of Quantum Information Theory: Universal Quantum Calculation by Toric Code (1)
Basics of Quantum Information Theory: Logical Operation by Toric Code (Brading)
Read "Basics of Quantum Annealing" Day 5
Read "Basics of Quantum Annealing" Day 6
Basics of Tableau Basics (Visualization Using Geographic Information)
[Introduction to Data Scientists] Basics of Python ♬
[Pandas] Basics of processing date data using dt
Basics of Python ①
Basics of python ①
Basics of pandas for beginners ② Understanding data overview
[Basics of data science] Collecting data from RSS with python
Extract the band information of raster data with python
Numerical summary of data
Basics of Python scraping basics
# 4 [python] Basics of functions
Basics of network programs?
Basics of Perceptron Foundation
Preprocessing of prefecture data
Basics of regression analysis
Selection of measurement data
Basics of python: Output
Compression / decompression of zipfile
Let's utilize the railway data of national land numerical information
Data processing that eliminates the effects of confounding factors (theory)
[Introduction to Data Scientists] Basics of Python ♬ Functions and classes