09 August 2020

Proof of a tensor trace inequality using Frobenius norm

The text is a formal proof of an inequality involving the Frobenius norm, a matrix norm defined as the square root of the sum of the absolute squares of the matrix elements. The inequality states that the full contraction of a rank-2 tensor with itself is greater than or equal to one-third of the square of its trace. The proof uses matrix notation and properties and the Cauchy-Schwarz inequality for the Frobenius inner product, and it provides references and explanations for each step. The text also gives a numerical illustration and an application of the inequality in physics.

The inequality in question is

$$\varkappa_{\beta}^{\alpha}\varkappa_{\alpha}^{\beta} \geq \frac{1}{3}\left(\varkappa_{\alpha}^{\alpha}\right)^2 \tag{1}$$ where $\varkappa_{\alpha\beta} = \frac{\partial \gamma_{\alpha\beta}}{\partial t}$ is the time derivative of the three-dimensional metric tensor $\gamma_{\alpha\beta}$, and the indices of $\varkappa_{\alpha\beta}$ are raised with $\gamma^{\alpha\beta}$.

Inequality (1) is found in Landau and Lifshitz, The Classical Theory of Fields, Section 97, the second volume of their Course of Theoretical Physics, where it helps to prove that the metric determinant goes to zero in finite time, i.e., that there is a necessary singularity of the metric in a synchronous reference frame. Landau and Lifshitz write in a footnote that inequality (1) is "easily" verified by bringing the tensor $\varkappa_{\alpha}^{\beta}$ to diagonal form. The following formal proof avoids the diagonalization altogether.

Proof

To prove the inequality

$$ \varkappa^{\beta}_{\alpha} \varkappa^{\alpha}_{\beta} \geq \frac{1}{3}(\varkappa^{\alpha}_{\alpha})^2 $$

for a symmetric rank-2 3D tensor $\varkappa^{\beta}_{\alpha}$ (the case needed in the Landau-Lifshitz application), we can use the following steps:

  1. Write $\varkappa^{\beta}_{\alpha}$ as a $3 \times 3$ matrix $K$ with elements $K_{ik} = \varkappa^{k}_{i}$.
  2. Then $\varkappa^{\beta}_{\alpha} \varkappa^{\alpha}_{\beta} = \sum_{i=1}^3 \sum_{k=1}^3 K_{ik} K_{ki} = Tr(KK)$; since $K$ is symmetric, $K_{ki} = K_{ik}$, and this is the sum of squares of all elements of $K$, i.e., $Tr(K^T K)$. Also, $\varkappa^{\alpha}_{\alpha} = \sum_{i=1}^3 K_{ii}$ is the trace of $K$.
  3. Use the Cauchy-Schwarz inequality for the Frobenius inner product of two matrices, which states that

    $$\left(\langle K, L \rangle_F \right)^2 \leq \left( \|K\|_F \right)^2 \left( \|L\|_F \right)^2 $$ or

    $$ \left[Tr(K^T L) \right]^2 \leq \left( \|K\|_F \right)^2 \left( \|L\|_F \right)^2 $$

    where $\langle K, L \rangle_F = Tr(K^T L)$ is the Frobenius inner product of two matrices, $Tr$ denotes the trace, $\|K\|_F = \sqrt{Tr(K^T K)}$ is the Frobenius norm of a matrix $K$, and $L$ is any matrix with compatible dimensions.

  4. Choose $L = I$, where $I$ is the identity matrix. Then we have

    $$ \left[Tr(K^T I) \right]^2 \leq \left( \|K\|_F \right)^2 \left( \|I\|_F \right)^2 $$

    which, since $Tr(K^T) = Tr(K)$, simplifies to

    $$ \left[Tr(K) \right]^2 \leq \left( \|K\|_F \right)^2 \left( \|I\|_F \right)^2 $$

  5. Use the fact that $\|I\|_F = \sqrt{n}$ for an $n\times n$ identity matrix. We get $$ \left[Tr(K) \right]^2 \leq n Tr(K^T K) $$
  6. Substitute back $\varkappa^{k}_{i}$ for $K_{ik}$, with $n = 3$ and $Tr(K^T K) = Tr(KK)$ by the symmetry of $K$. We get

    $$ (\varkappa^{\alpha}_{\alpha})^2 \leq 3 (\varkappa^{\beta}_{\alpha}\varkappa^{\alpha}_{\beta}) $$

  7. Divide both sides by 3 and swap the sides to get the desired inequality.

Therefore, we have proved that $ \varkappa^{\beta}_{\alpha} \varkappa^{\alpha}_{\beta} \geq \frac{1}{3}(\varkappa^{\alpha}_{\alpha})^2. $                    
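The result is easy to check numerically. Here is a minimal Wolfram Language sketch, in keeping with the Mathematica computations further below; the random symmetric test matrix is my own arbitrary choice, not data from the post:

```mathematica
(* check inequality (1) on a random real symmetric 3x3 matrix *)
m = RandomReal[{-2, 2}, {3, 3}];
k = (m + Transpose[m])/2;   (* symmetrize, as in the Landau-Lifshitz case *)
lhs = Tr[k.k];              (* the contraction of the tensor with itself *)
rhs = Tr[k]^2/3;            (* one third of the squared trace *)
{lhs, rhs, lhs >= rhs}      (* the last entry is always True *)
```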

It is easy to see that this proof extends to symmetric matrices of any dimension $n$, with the constant $\frac{1}{3}$ replaced by $\frac{1}{n}$. In addition, the Cauchy-Schwarz inequality for the Frobenius inner product holds for complex matrices as well, with the transpose replaced by the conjugate transpose.
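For example, the $n$-dimensional version $Tr(K^2) \geq \frac{1}{n}\left(Tr\,K\right)^2$ can be spot-checked the same way (the choice $n = 5$ here is arbitrary, for illustration only):

```mathematica
n = 5;
m = RandomReal[{-1, 1}, {n, n}];
k = (m + Transpose[m])/2;   (* random real symmetric n x n matrix *)
Tr[k.k] >= Tr[k]^2/n        (* True *)
```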

Explanation for the proof

This proof is based on the idea of using matrix notation and properties to simplify the tensor expression. It also uses a clever trick of applying a well-known inequality for matrices to get a lower bound for the tensor contraction. The following is a step-by-step explanation of the terse proof given in the previous section.

  1. In this step, we write the tensor $\varkappa^{\beta}_{\alpha}$ as a matrix $K$ with elements $K_{ik} = \varkappa^{k}_{i}$. So, instead of a tensor, which is a geometric and physical object, we get a matrix, which is an algebraic object (linear algebra). A matrix is easier to work with because we can use matrix operations and rules. For example, we can use the fact that the trace of a matrix is equal to the sum of its diagonal elements. We don't have to worry about covariance and contravariance of indices because there is no implicit basis and no basis vectors. Matrix elements are viewed as scalars and not as vector components.

    Especially important in our case is the fact that we can replace the contraction $\varkappa^{\beta}_{\alpha} \varkappa^{\alpha}_{\beta}$ with the trace of the matrix product $KK$. A tensor product is a way of combining two vector spaces into a new vector space that captures the properties of bilinear maps. The tensor product symbol is $\otimes$, and the tensor product of two vectors $u$ and $v$ is written as $u \otimes v$.

    For example, if $$ u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \text{and} \, v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \text{then} \, u \otimes v = \begin{bmatrix} u_1 v_1 \\ u_1 v_2 \\ u_2 v_1 \\ u_2 v_2 \\ u_3 v_1 \\ u_3 v_2 \end{bmatrix} $$ The tensor product of two vector spaces $V$ and $W$, denoted by $V \otimes W$, is the vector space that consists of all linear combinations of elementary tensors of the form $u \otimes v$, where $u \in V$ and $v \in W$. For example, if $V$ and $W$ are both two-dimensional vector spaces with bases $\{e_1, e_2\}$ and $\{f_1, f_2\}$, respectively, then $V \otimes W$ is a four-dimensional vector space with basis $\{e_1 \otimes f_1, e_1 \otimes f_2, e_2 \otimes f_1, e_2 \otimes f_2\}$.

    The tensor product of two vector spaces has the property that any bilinear map from $V \times W$ to another vector space $Z$ can be uniquely factorized as a linear map from $V \otimes W$ to $Z$. This is called the universal property of the tensor product and can be expressed as $$ \begin{array}{ccc} V \times W & \xrightarrow{\phi} & Z \\ \downarrow & & \uparrow \\ V \otimes W & \xrightarrow{\tilde{\phi}} & Z \end{array} $$ where $\phi$ is a bilinear map, $\tilde{\phi}$ is a linear map, and the vertical arrows are natural maps that send $(u,v)$ to $u \otimes v$ and $z$ to $z$.

  2. In this step, we rewrite the tensor contractions in terms of matrix elements. In tensor notation we use the Einstein summation convention, which means that repeated indices are summed over. For example, $\varkappa^{\beta}_{\alpha} \varkappa^{\alpha}_{\beta} = \sum_{i=1}^3 \sum_{k=1}^3 K_{ik} K_{ki}$ means that we multiply each element of $K$ by the element at the mirrored position and then add them all up, which is the trace of $KK$. Because $K$ is symmetric, $K_{ki} = K_{ik}$, so this is the same as the sum of squares of all elements of $K$. Similarly, $\varkappa^{\alpha}_{\alpha} = \sum_{i=1}^3 K_{ii}$ means that we add up all the diagonal elements of $K$; this is the trace of $K$. Both $\varkappa^{\beta}_{\alpha} \varkappa^{\alpha}_{\beta}$ and $\varkappa^{\alpha}_{\alpha}$ are scalars in their tensor and their matrix forms. Having scalars on both sides makes it possible to compare their magnitudes with equalities or inequalities.
  3. In this step, we use a powerful inequality for matrices called the Cauchy-Schwarz inequality. It says that if we have two matrices $K$ and $L$, then their inner product (which is like a dot product, but for matrices) cannot be larger in absolute value than the product of their norms (which are like lengths, but for matrices). The inner product and the norm are defined in the Frobenius fashion, using traces of products of matrices. The inequality can be written in two equivalent ways: either using inner products or using traces. (A short code illustration of this step and of step 1 is given after this list.)
  4. In this step, we choose a special matrix $L$ to apply the inequality. We pick $L = I$, where $I$ is the identity matrix, which has 1s on the diagonal and 0s everywhere else. This makes things simpler because when we multiply any matrix by $I$, we get back the same matrix. So the Frobenius inner product of $K$ and $I$ is just the trace of $K$. Then we have

    $$ |Tr(K)| \leq \|K\|_F \|I\|_F $$

  5. In this step, we square both sides of the inequality to get rid of the absolute value sign. We also use another fact about identity matrices: their norm is equal to the square root of their size (the number of rows or columns). So if we have a 3x3 identity matrix, its norm is equal to $\sqrt{3}$. Then we have $$ (Tr(K))^2 \leq 3 Tr(K^T K) $$
  6. In this step, we go back to tensor notation by replacing $K_{ik}$ with $\varkappa^{k}_{i}$ and simplifying. Here we use the symmetry of $K$: since $K^T = K$, the trace of $K^T K$ is equal to the trace of $KK$, which is exactly the contraction $\varkappa^{\beta}_{\alpha}\varkappa^{\alpha}_{\beta}$. Then we have

    $$ (\varkappa^{\alpha}_{\alpha})^2 \leq 3 (\varkappa^{\beta}_{\alpha}\varkappa^{\alpha}_{\beta}) $$

  7. In this final step, we divide both sides by 3 and rearrange them to get our desired result.
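Here is the code sketch referred to in steps 1 and 3 above: a Wolfram Language illustration of the flattened outer product and of the Cauchy-Schwarz inequality for the Frobenius inner product. All names and test values are mine, chosen for illustration only:

```mathematica
(* step 1: the outer (tensor) product of two vectors, flattened to a column *)
u = {u1, u2, u3};
v = {v1, v2};
Flatten[Outer[Times, u, v]]   (* {u1 v1, u1 v2, u2 v1, u2 v2, u3 v1, u3 v2} *)

(* step 3: Cauchy-Schwarz for the Frobenius inner product, on random matrices *)
frob[a_, b_] := Tr[Transpose[a].b]
m1 = RandomReal[{-1, 1}, {3, 3}];
m2 = RandomReal[{-1, 1}, {3, 3}];
frob[m1, m2]^2 <= frob[m1, m1] frob[m2, m2]   (* True *)
```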

So what does this proof tell us? It tells us that no matter what symmetric tensor $\varkappa^{\beta}_{\alpha}$ we choose, its contraction with itself will always be greater than or equal to one-third of its trace squared. One application of this property is the singularity argument of Landau and Lifshitz mentioned above.

Below, it is written in another form, which is more convenient for the numerical computation: $$\varkappa^1_1 \varkappa^1_1 + \varkappa^1_2 \varkappa^2_1 + \varkappa^1_3 \varkappa^3_1 + \varkappa^2_1 \varkappa^1_2 + \varkappa^2_2 \varkappa^2_2 + \varkappa^2_3 \varkappa^3_2 + \varkappa^3_1 \varkappa^1_3 + \varkappa^3_2 \varkappa^2_3 + \varkappa^3_3 \varkappa^3_3 \geq \frac{1}{3} \left(\varkappa^1_1 + \varkappa^2_2 + \varkappa^3_3\right)^2$$ with the indices $\alpha$ and $\beta$ running through 1, 2, 3 (i.e., pertaining to tensors in 3-dimensional space).
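In Wolfram Language, the two sides of this component form can be written as explicit index sums over a symbolic matrix (the symbol x is just a placeholder for $\varkappa$):

```mathematica
k = Array[x, {3, 3}];
lhs = Sum[k[[a, b]] k[[b, a]], {a, 3}, {b, 3}];   (* contraction over both indices *)
rhs = Sum[k[[a, a]], {a, 3}]^2/3;                 (* one third of the squared trace *)
```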

In a $3 \times 3$ matrix form, this tensor looks like: $$\varkappa_{\alpha}^{\beta} \equiv \left( \begin{array}{ccc} \varkappa _{11} & \varkappa _{12} & \varkappa _{13} \\ \varkappa _{21} & \varkappa _{22} & \varkappa _{23} \\ \varkappa _{31} & \varkappa _{32} & \varkappa _{33} \\ \end{array} \right)$$ As for any matrix, diagonalizing the matrix $\varkappa_{\alpha}^{\beta}$ requires solving the characteristic equation $|\mathbf{A} - \lambda\mathbf{I}| = 0$, where the determinant $|\mathbf{A} - \lambda \mathbf{I}| = (\lambda_1 - \lambda )(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda)$ is the characteristic polynomial. The roots of the characteristic equation, $\lambda_i$, which are called eigenvalues, may be real or complex. Further computations are done with algorithms available in the free online tool Wolfram|Alpha or in the commercial program Mathematica. For the matrix $\varkappa_{\alpha}^{\beta}$, using the function CharacteristicPolynomial, we find the characteristic polynomial to be: $$\left(\lambda ^2-\lambda \varkappa _{11}-\lambda \varkappa _{22}-\varkappa _{12} \varkappa _{21}+\varkappa _{11} \varkappa _{22}\right) \left(\varkappa _{33}-\lambda \right)+\varkappa _{31} \left(\lambda \varkappa _{13}-\varkappa _{22} \varkappa _{13}+\varkappa _{12} \varkappa _{23}\right)-\varkappa _{32} \left(-\lambda \varkappa _{23}-\varkappa _{13} \varkappa _{21}+\varkappa _{11} \varkappa _{23}\right)$$
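The displayed polynomial is reproduced by the following call (with x again standing in for $\varkappa$):

```mathematica
k = Array[x, {3, 3}];
CharacteristicPolynomial[k, λ]   (* equals Det[k - λ IdentityMatrix[3]] *)
```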

This is a cubic polynomial ($\lambda$ appears to the power of 3) and, respectively, it has 3 roots. Setting the characteristic polynomial equal to zero and using the function Roots, we find the exact roots expressed in radicals

$$\lambda_1 = \frac{1}{3} C + \frac{R}{3 \sqrt[3]{2}} - \frac{\sqrt[3]{2} \left(-C^2 - B\right)}{3 R}$$

$$\lambda_2 = \frac{1}{3} C - \frac{\left(1 - i \sqrt{3}\right) R}{6 \sqrt[3]{2}} + \frac{\left(1 + i \sqrt{3}\right) \left(-C^2 - B\right)}{3 \cdot 2^{2/3}\, R}$$

$$\lambda_3 = \frac{1}{3} C - \frac{\left(1 + i \sqrt{3}\right) R}{6 \sqrt[3]{2}} + \frac{\left(1 - i \sqrt{3}\right) \left(-C^2 - B\right)}{3 \cdot 2^{2/3}\, R}$$

where $$R = \sqrt[3]{-A + \sqrt{4 \left(-C^2 -B \right)^3 + A^2}}$$ and the quantities $A$, $B$, and $C$ are

$$\begin{multline} A = -2 \varkappa _{11}^3+3 \varkappa _{22} \varkappa _{11}^2+3 \varkappa _{33} \varkappa _{11}^2+3 \varkappa _{22}^2 \varkappa _{11}+3 \varkappa _{33}^2 \varkappa _{11}-9 \varkappa _{12} \varkappa _{21} \varkappa _{11}-9 \varkappa _{13} \varkappa _{31} \varkappa _{11}+ \\ 18 \varkappa _{23} \varkappa _{32} \varkappa _{11}-12 \varkappa _{22} \varkappa _{33} \varkappa _{11}-2 \varkappa _{22}^3-2 \varkappa _{33}^3+3 \varkappa _{22} \varkappa _{33}^2-9 \varkappa _{12} \varkappa _{21} \varkappa _{22}+18 \varkappa _{13} \varkappa _{22} \varkappa _{31}- \\ 27 \varkappa _{12} \varkappa _{23} \varkappa _{31}-27 \varkappa _{13} \varkappa _{21} \varkappa _{32}-9 \varkappa _{22} \varkappa _{23} \varkappa _{32}+3 \varkappa _{22}^2 \varkappa _{33}+18 \varkappa _{12} \varkappa _{21} \varkappa _{33}-9 \varkappa _{13} \varkappa _{31} \varkappa _{33}-9 \varkappa _{23} \varkappa _{32} \varkappa _{33} \end{multline}$$ $$B = 3 \left(\varkappa _{12} \varkappa _{21}-\varkappa _{11} \varkappa _{22}+\varkappa _{13} \varkappa _{31}+\varkappa _{23} \varkappa _{32}-\varkappa _{11} \varkappa _{33}-\varkappa _{22} \varkappa _{33}\right)$$ $$C = \varkappa _{11}+\varkappa _{22}+\varkappa _{33}$$

For a generic non-symmetric matrix, the first root is real and the other two roots are complex conjugates (for a symmetric matrix, all three roots are real). These roots are, in fact, eigenvalues, so we can find them immediately, without resorting to the characteristic polynomial, with the function Eigenvalues.
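Both routes can be reproduced directly (symbol names are again placeholders):

```mathematica
k = Array[x, {3, 3}];
Roots[CharacteristicPolynomial[k, λ] == 0, λ]   (* the three roots in radicals *)
Eigenvalues[k]                                  (* the same roots, obtained directly *)
```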

These exact symbolic eigenvalues, however, do not allow us to estimate the elements of $\varkappa_{\alpha}^{\beta}$ in order to substitute them into inequality (1). We have to solve each eigenvalue equation numerically with respect to all the variables concerned. This makes 9 elements of the matrix plus the eigenvalue itself, for a total of 10 variables. One equation with 10 unknowns has infinitely many solutions. In such a case, using the function NSolve, we can find only those real solutions that are situated at the intersection of the infinite solution set with randomly chosen hyperplanes. For systems of algebraic equations, NSolve computes a numerical Gröbner basis using an efficient monomial ordering and then uses eigensystem methods to extract numerical roots; for this, Mathematica uses Buchberger's algorithm. In our case, NSolve finds that the infinite solution set is 3-dimensional and accordingly uses 3 intersecting hyperplanes.
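A sketch of this kind of NSolve computation (my reconstruction, not the original notebook; the random slices, and therefore the returned points, differ from run to run):

```mathematica
k = Array[x, {3, 3}];
vars = Append[Flatten[k], λ];
(* one polynomial equation in 10 unknowns; NSolve cuts the positive-dimensional
   solution set with random hyperplanes and returns isolated real points *)
NSolve[CharacteristicPolynomial[k, λ] == 0, vars, Reals]
```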

Using NSolve for each eigenvalue with these random hyperplanes, we find that the first and third eigenvalue each have 2 real solutions, and the second eigenvalue returns an empty set (no real solutions). For the first eigenvalue:

$ \lambda_1 \to 1.88139,$

$ \left( \begin{array}{ccc} 0.476244 & -0.642266 & 0.612353 \\ 3.20349 & -1.52838 & 0.169494 \\ -0.0777897 & 1.31714 & -1.81492 \\ \end{array} \right)$

$ \lambda_2 \to 1.46087,$

$ \left( \begin{array}{ccc} -1.02157 & -0.64777 & 0.0954454 \\ 2.14034 & -1.22996 & 0.861299 \\ -1.03221 & -0.248212 & -0.0702314 \\ \end{array} \right)$

and for the third eigenvalue:

$ \lambda_3 \to 1.70399,$

$ \left( \begin{array}{ccc} -0.850372 & 0.28731 & -1.43228 \\ 2.00158 & -0.0211683 & 0.14678 \\ -0.115587 & 0.111843 & -0.552966 \\ \end{array} \right)$

$ \lambda_4 \to 1.17485,$

$ \left( \begin{array}{ccc} 0.350488 & 1.42482 & -0.466813 \\ 1.96659 & -1.45947 & 1.31125 \\ -0.70384 & -0.199493 & -0.169194 \\ \end{array} \right)$

Substituting in turn each of these 4 solutions in inequality (1), we find $$\varkappa^1_1 \varkappa^1_1 + \varkappa^1_2 \varkappa^2_1 + \varkappa^1_3 \varkappa^3_1 + \varkappa^2_1 \varkappa^1_2 + \varkappa^2_2 \varkappa^2_2 + \varkappa^2_3 \varkappa^3_2 + \varkappa^3_1 \varkappa^1_3 + \varkappa^3_2 \varkappa^2_3 + \varkappa^3_3 \varkappa^3_3 = 3.97483$$ $$\frac{1}{3} \left(\varkappa _{11}^2+\varkappa _{22}^2+\varkappa _{33}^2\right) = 1.95224$$ It is no coincidence that, although the 4 solutions are different, all of them give the same values for the left- and right-hand sides. However, we must not forget that the intersecting hyperplanes were chosen randomly; if we choose other hyperplanes, the values will be different. The important thing is that the left-hand side is always greater than or equal to the right-hand side.

In the application in the Landau and Lifshitz book, the tensor $\varkappa_{\alpha}^{\beta}$ is symmetric. This can be seen from its definition $$\varkappa_{\alpha\beta} \equiv \frac{\partial \gamma_{\alpha\beta}}{\partial t}$$ Because the 3-dimensional metric tensor $\gamma_{\alpha\beta}$ is symmetric, $\varkappa_{\alpha\beta}$ is symmetric at any given moment, i.e. $\varkappa_{\alpha\beta} = \varkappa_{\beta\alpha}$, or in matrix form $$\varkappa_{\alpha}^{\beta} \equiv \left( \begin{array}{ccc} \varkappa _{11} & \varkappa _{12} & \varkappa _{13} \\ \varkappa _{12} & \varkappa _{22} & \varkappa _{23} \\ \varkappa _{13} & \varkappa _{23} & \varkappa _{33} \\ \end{array} \right)$$ Then inequality (1) takes the form $$\varkappa^1_1 \varkappa^1_1 + \varkappa^2_2 \varkappa^2_2 + \varkappa^3_3 \varkappa^3_3 + 2 (\varkappa^1_2 \varkappa^2_1 + \varkappa^1_3 \varkappa^3_1 + \varkappa^2_3 \varkappa^3_2) \geq \frac{1}{3} \left(\varkappa^1_1 + \varkappa^2_2 + \varkappa^3_3\right)^2$$ There is no need for the above diagonalization. Just note that, by the elementary inequality $(a+b+c)^2 \leq 3(a^2+b^2+c^2)$, the right-hand side is at most $\varkappa _{11}^2+\varkappa _{22}^2+\varkappa _{33}^2$, which is the sum of the first 3 terms on the left-hand side, while the remaining terms on the left-hand side are non-negative because they are the squares of the off-diagonal elements. This is enough to prove inequality (1) without further ado. Equality holds only when $\varkappa_{\alpha}^{\beta}$ is proportional to the unit tensor $\delta_{\alpha}^{\beta}$, i.e., when all off-diagonal elements vanish and all diagonal elements are equal.
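The sum-of-squares decomposition behind this argument can be checked symbolically; the difference of the two sides vanishes exactly when all the squares do, i.e., for a matrix proportional to the identity:

```mathematica
(* symmetric case: LHS - RHS of inequality (1) is an explicit sum of squares *)
s = {{a11, a12, a13}, {a12, a22, a23}, {a13, a23, a33}};
diff = Tr[s.s] - Tr[s]^2/3;
sos = ((a11 - a22)^2 + (a11 - a33)^2 + (a22 - a33)^2)/3 +
   2 (a12^2 + a13^2 + a23^2);
Expand[diff - sos]   (* returns 0 *)
```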

A PDF file with the Mathematica calculations can be found at my Google drive.
