Sunday 12 January 2020

Free photons are massless... because they are free!



The correct thing to say is that photons are massless, even though they have momentum and energy, which explains why they interact with matter: that is what happens in the photoelectric effect, for example, where photons knock electrons out of a metal plate.

This poses, however, two difficulties for us students:

First, how to reconcile this with the definition of momentum as "mass times velocity"? Doesn't this mean that one of the two things is wrong: either light is massless, in which case it should not have momentum, or it does have momentum, in which case it should have "some sort of" mass...?

Second, the famous formula:

E = m{c^2}

which people wear on T-shirts... doesn't it mean that light, which is energy, is mass?

Given this, some physicists answer: well, light does have mass, it has the so-called "relativistic mass", and that is why it interacts with things and why it has momentum. To justify this, they follow this route:

- The fully-fledged formula is actually not the one that you see on T-shirts, but this longer one:

{E^2} = {({m_0}{c^2})^2} + {(pc)^2}

- In the theory of relativity, momentum is not simply m*v: the product is multiplied by the so-called Lorentz or gamma factor:

{\bf{p}} = \frac{1}{{\sqrt {1 - {{(v/c)}^2}} }}{m_0}{\bf{v}}

- If you replace p in the first formula with its relativistic value, after a few algebraic operations, you get:

E = \frac{{{m_0}{c^2}}}{{\sqrt {1 - {v^2}/{c^2}} }}

- In this new formula, m with the subscript 0 (m_0) is the "rest mass", that is to say, the mass of a particle that is at rest relative to you. A free photon cannot be at rest relative to you, because you cannot travel at the speed of light, but it can have relativistic mass, which is rest mass times the gamma factor:

m = \frac{1}{{\sqrt {1 - {{(v/c)}^2}} }}{m_0}

- But then... if v in this formula is the speed of the particle and c is the speed of light, when you speak of a photon, it turns out that v = c, so the gamma factor is 1/0... If you add to this that the rest mass of the photon is also 0, the expression looks as follows: (1/0)·0. However you read it, it is an undefined expression. How do you make sense of this?

Well, you can ask those physicists who defend this approach. There are several explanations around, but I simply don't grasp them. I will only mention that the "excuse" is often that v tends to c and you get the undefined expression only in the limit, when v = c...
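For reference, the "few algebraic operations" mentioned in the route above can be written out as a short sketch. The symbol \gamma is just my shorthand for the gamma factor. Note that the third line divides by 1 - v^2/c^2, so the manipulation itself already presupposes v < c:

```latex
\begin{aligned}
E^2 &= (m_0 c^2)^2 + (pc)^2, \qquad p = \gamma m_0 v, \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}} \\
E^2 &= m_0^2 c^4 + \gamma^2 m_0^2 v^2 c^2 = m_0^2 c^4 \left( 1 + \gamma^2 \frac{v^2}{c^2} \right) \\
1 + \gamma^2 \frac{v^2}{c^2} &= \frac{(1 - v^2/c^2) + v^2/c^2}{1 - v^2/c^2} = \gamma^2 \\
E &= \gamma m_0 c^2 = \frac{m_0 c^2}{\sqrt{1 - v^2/c^2}}
\end{aligned}
```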

In turn, the majority of physicists reject the above reasoning and the very concept of "relativistic mass".

For them the famous formula,

E = m{c^2}

is only applicable to particles at rest in the relevant reference frame, not to free photons. In other words, it applies to massive particles or to a photon that is trapped within a massive object. In effect, a photon can contribute to the mass of an object if, for example, it goes into a cavity with perfectly reflecting mirrors: in this case, if you could measure with an incredibly precise instrument, you would be able to detect a tiny increase in the inertia of the object. However, as long as the photon is free, it has no mass, full stop.

The photon does have momentum, although its value does not come from the usual formula (mass times velocity), but from quantum mechanics. The reasoning is as follows:

- In the above full formula, replace m with 0 and simplify as follows:

\begin{array}{l}
{E^2} = {m^2}{c^4} + {p^2}{c^2} = {p^2}{c^2} \to \\
E = pc
\end{array}

- As per quantum mechanics, the energy of a photon is its frequency f times the Planck constant h:

E = hf

- So the photon's momentum is:

p = \frac{{hf}}{c}
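To put numbers on the three steps above, here is a minimal Python sketch. The function names and the sample frequency (roughly that of green light) are my own choices for illustration; h and c are the exact SI values.

```python
# Exact SI values of the Planck constant and the speed of light.
PLANCK_H = 6.62607015e-34       # J*s
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def photon_energy(frequency_hz):
    """Energy of a photon from E = h*f, in joules."""
    return PLANCK_H * frequency_hz


def photon_momentum(frequency_hz):
    """Momentum of a free photon from p = h*f/c, in kg*m/s."""
    return PLANCK_H * frequency_hz / SPEED_OF_LIGHT


# Assumed sample value: green light, about 5.45e14 Hz.
green_light_hz = 5.45e14
energy = photon_energy(green_light_hz)
momentum = photon_momentum(green_light_hz)

print(f"E = {energy:.3e} J")
print(f"p = {momentum:.3e} kg*m/s")
print(f"p*c = {momentum * SPEED_OF_LIGHT:.3e} J (equals E, since m = 0)")
```

No mass appears anywhere in the calculation: the momentum comes entirely from the frequency.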

That said, what is my point? Well, I want to highlight that the mistake of those "some physicists" lies in the step of replacing p with the formula for momentum (relativistic momentum, in this case), and I want to show what this error means, because it is a very common issue in many fields of knowledge.

You cannot do this move (if we were playing, we would say that it is an "illegal move") when you are talking about a free photon, because, if you do, you are using the concepts at hand ("free photon", "mass" and "momentum") beyond their respective domains of applicability, which in turn have been defined based on the observation of how nature works.

It is not strange, therefore, that you arrive at an undefined expression, i.e. a dead-end street, which is proof that you have previously taken a wrong turn. An undefined expression is one that breaches the rules for building meaningful mathematical expressions, so whenever you end up with one, it is because you took the wrong path.

And it is not an excuse to say that, "in the context of calculus, you often encounter undefined expressions, but this difficulty is overcome through the use of limits". It is true that in calculus, limits enable you to get rid of the undefined expression, but that is because it should never have been there in the first place, and what you do by resorting to limits is to solve that initial trouble!

I will explain myself briefly, because the matter deserves a calmer discussion and for now I just want to outline it. When you seek, for instance, an instantaneous speed (a derivative), you are frustrated because we define speed as the ratio between distance traversed and time elapsed. This is a logical thing to do, because that is the way to physically measure speed. Hence if something is instantaneous, you should divide by 0, which is mathematically forbidden... But you don't despair, because you understand that the operative definition of a concept (how in practice you obtain its value) is one thing and the reality that the concept aims at grasping is another. In this case, such reality is a "state of motion" that can perfectly well be growing continuously and thus have a different value at each instant. This reality surpasses measurement and surpasses the algebraic expression of a ratio. So the trick could not be easier: get rid of the ratio, i.e. carry out a number of algebraic operations, so that the time interval no longer sits in the denominator of a fraction. Then it is said that you make such an interval tend to 0 and eliminate it. I would simply say that you make it equal to 0, because the concept itself, in its purest form, does not need any time interval at all.
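As a small worked example of that trick, take a position that grows as x(t) = t^2. This is a sample function that I am choosing only for illustration. The ratio is rewritten until the time interval has left the denominator, and only then is the interval set to 0:

```latex
\begin{aligned}
\frac{\Delta x}{\Delta t} &= \frac{(t + \Delta t)^2 - t^2}{\Delta t}
  = \frac{2t\,\Delta t + (\Delta t)^2}{\Delta t}
  = 2t + \Delta t \\
v(t) &= 2t \qquad \text{(after setting } \Delta t = 0 \text{ in } 2t + \Delta t \text{)}
\end{aligned}
```

Setting the interval to 0 in the very first fraction gives 0/0. Doing so in the last expression gives a perfectly defined speed.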

In conclusion, in calculus we get an undefined because we ourselves created a language problem (we used a ratio to refer to what is not a ratio in nature). So it is legitimate that we sort out the problem by trying to extract the underlying reality that our language was hiding.

Instead here, if we get an undefined, it is because we are being inconsistent with nature and with the concepts that we have adopted to reflect nature.

The key assumption is that there are things that can travel at the speed of light (we call them light, among other things) and others that can't travel at the speed of light (we call them massive). [I would add that sometimes the former (light) becomes the latter (mass) when it gets trapped among the walls of something massive. And I would like to generalize the idea to other forms of energy, but this would be a speculation beyond the scope of this post.]

Given this, the momentum of massive things is defined as mass times velocity (multiplied by the gamma factor, if you want to account for relativistic effects). Hence this concept is designed from scratch as one that applies only to objects having mass, that is to say, objects slower than light. Therefore, if you plug this definition (gamma * mass * v) into the energy formula, it must be because you are assuming that you are facing something massive, that is to say, something whose v is less than c. Otherwise, if you were thinking of a free photon, you would simply have made the wrong move. So forget about playing with limits to justify your mistake.

Last but not least, in this light, if the momentum of free photons is just hf/c, couldn't we dispense with this concept and talk only about light's energy, especially bearing in mind that, if we use natural units (where c = 1), the numerical value of the momentum of light equals that of its energy? I initially thought so, but then I realized that energy is a scalar (it has just magnitude), whereas momentum is a vector (it also has direction)... Thus the concept of the photon's momentum proves useful when analyzing its collision with another particle, like an electron ("Compton shift"): after the collision, each particle takes a direction (it is said that the photon is "scattered" and the electron "recoils"), but both directions are correlated, because momentum must be conserved. Thus by observing the direction of the photon's scattering, one could deduce the direction of the electron's recoil... were it not for the fact that... how can we guess the scattering angle of the photon? This remark paves the way for an interesting reflection about the uncertainty principle and randomness, but that will be another day.
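The correlation between the two directions can be sketched in Python. This is only an illustration under my own assumptions: the electron starts at rest, the photon comes in along the X axis, the wavelength and the scattering angle are sample values, and the scattered wavelength comes from the standard Compton formula. The function name is hypothetical.

```python
import math

PLANCK_H = 6.62607015e-34        # J*s (exact SI value)
SPEED_OF_LIGHT = 299_792_458.0   # m/s (exact SI value)
ELECTRON_MASS = 9.1093837015e-31  # kg (CODATA 2018 value)


def compton_recoil(wavelength_m, scattering_angle_rad):
    """Return the scattered wavelength and the electron's momentum (px, py).

    Assumes an electron initially at rest and a photon arriving along +X.
    """
    compton_wavelength = PLANCK_H / (ELECTRON_MASS * SPEED_OF_LIGHT)
    scattered_wavelength = wavelength_m + compton_wavelength * (
        1.0 - math.cos(scattering_angle_rad)
    )
    p_in = PLANCK_H / wavelength_m           # photon momentum before
    p_out = PLANCK_H / scattered_wavelength  # photon momentum after
    # Momentum conservation, component by component: the electron takes
    # whatever the photon no longer carries.
    electron_px = p_in - p_out * math.cos(scattering_angle_rad)
    electron_py = -p_out * math.sin(scattering_angle_rad)
    return scattered_wavelength, (electron_px, electron_py)


# Assumed sample values: an X-ray photon scattered by 60 degrees.
scattered_wavelength, (px, py) = compton_recoil(7.1e-11, math.radians(60.0))
recoil_angle_deg = math.degrees(math.atan2(py, px))

print(f"scattered wavelength: {scattered_wavelength:.4e} m")
print(f"electron recoil angle: {recoil_angle_deg:.1f} degrees from the X axis")
```

Energy alone, being a scalar, could not tell you on which side of the X axis the electron recoils. The momentum components do.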

Wednesday 1 January 2020

Baptizing perpendicularity and understanding dot product

In my studies, I like to spot mental models that appear repeatedly, in either the same or different areas. This is useful, because often when analyzing a problem, you hit on the idea that it can be solved through a model that you are acquainted with.

One of these models is the "generalization of a concept", which is often used in Mathematics and about which there is extensive literature.

One way, among others, to look at this process is to see it as the realization that one was assuming an arbitrary restriction in the conditions of the phenomenon. Since reality proves that the world does not always have such a restriction, it is lifted. Thus you elaborate a more general concept, which is valid for cases built with and without the relevant restriction.

But the point of this post is that, once you have given birth to the general concept, you should be able to baptize it. By this I mean, not so much giving it a more or less appropriate name, as giving it a faithful abstract description, which communicates its essence. This is what is going to enlighten you as to the meaning of the concept in question, both at the elementary and the general level. Thus it may happen that you do not understand well what you were doing with an apparently basic idea until you super-generalize it; or you don't grasp the general concept unless you see its evolution from the basic level.

I will illustrate this idea with an example: the generalized meaning of perpendicularity and how this helps to understand why and to what extent the dot product works when done analytically, i.e. component-wise. In this work, I will start with a more mathematical/abstract approach, but soon give way to the paradigm that this Blog promotes, which is: see everything from a practical point of view, as a problem-solving technique.

Perpendicularity or orthogonality

This notion appears with a geometric meaning: two vectors (understood as "little arrows") are perpendicular if they form an angle of 90 degrees.

However, geometric definitions work in 2D or 3D spaces. But what if you lift this restriction and start playing with 4D or 10D or even infinite-dimensional vectors? (That is, by the way, another nice generalization: functions are infinite-dimensional vectors, where the input plays the role of dimensions and the output acts as coefficients or coordinates or values of the vector in each dimension.)

Well, in that case, you use the algebraic version of the dot product.

The dot product also initially receives a geometric definition: it is an operation whereby the moduli of the vectors are multiplied, but then you apply a percentage, a trigonometric ratio (the cosine of the angle formed by the vectors), which in turn measures to what extent the vectors point in the same direction. Thus when one vector is lying over the other (they are parallel = angle is zero), the cosine is 1, so the ratio is 100%, because both vectors point in the same direction. Instead, when the vectors are perpendicular (angle is 90 degrees), the cosine is 0, which means that the ratio is 0% as well, because the vectors point in totally different directions.

However, the cosine is of no use anymore after you leave 3D behind. Fortunately, the dot product technique also evolves to adapt to the new higher-than-3D environment and takes an algebraic form guaranteeing the same effect: the dot product is also the result of (i) multiplying the respective coordinates and (ii) adding up those products.

This works for the simplest cases. For example, the dot product of the unit vectors of a 2D orthonormal basis [(0,1) and (1,0)] is 0x1 + 1x0 = 0 + 0 = 0, thus proving that such vectors are perpendicular. But it also works in the advanced cases, like in Fourier analysis, where the sum is an integral (because the number of dimensions is infinite), but the structure is analogous.
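Both cases can be tried out with a few lines of Python. This is a minimal sketch: the function names are mine, and the integral is only approximated by a sum over many sample points, which is itself a way of seeing the function as a very long vector.

```python
import math


def dot(a, b):
    """Component-wise dot product: multiply matching coordinates, then add."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def function_dot(f, g, start, stop, samples=10_000):
    """Approximate the integral of f(x)*g(x) with a midpoint sum.

    Each sample point plays the role of one dimension of the 'vector'.
    """
    step = (stop - start) / samples
    total = 0.0
    for k in range(samples):
        x = start + (k + 0.5) * step
        total += f(x) * g(x)
    return total * step


# The 2D basis vectors from the text.
basis_dot = dot((0, 1), (1, 0))

# The function analogue: sine and cosine over one full period.
sin_cos = function_dot(math.sin, math.cos, 0.0, 2.0 * math.pi)
sin_sin = function_dot(math.sin, math.sin, 0.0, 2.0 * math.pi)

print("basis vectors:", basis_dot)
print("sin . cos (approximately zero):", sin_cos)
print("sin . sin (approximately pi):", sin_sin)
```

Sine and cosine come out as perpendicular in the same sense as (0,1) and (1,0), whereas the product of sine with itself does not vanish.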

Time now for the baptism. We have kept the term "orthogonality", but this is just a vestige of the geometric context where the concept was born, in which dimensions were directions in the plane. The abstract meaning is "total dimensional discrepancy", that is to say, if we are comparing vectors a and b, a has no component at all (zero amount) in the dimensions where b has components and vice versa. As to the dot product, in the new context, it is often given another name, "inner product". But more important than that is its new abstract meaning: if it initially revealed the extent to which vectors share a direction in the plane, in a generalized sense it means "dimensional similarity". Thus, for example, in Fourier analysis dimensions are time points in one reference frame or frequencies in another.

Problem: orthogonality is the aim but it is also the condition

Now to the problem. This algebraic dot product is a tool for detecting orthogonality, but at the same time it only works when there is orthogonality, in that the basis is orthogonal. How to prove this?

On the StackExchange forum, I have found an answer that presents the solution very "mathematically". Let us follow it and later check how you could have also followed this path "intuitively", just by understanding the deep meaning of perpendicularity:

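We first multiply the components of the two vectors in a "total war" manner (each component of one vector against both components of the other):

{\bf{a}} \cdot {\bf{b}} = ({a_x}{\bf{i}} + {a_y}{\bf{j}}) \cdot ({b_x}{\bf{i}} + {b_y}{\bf{j}}) = {a_x}{b_x}({\bf{i}} \cdot {\bf{i}}) + {a_x}{b_y}({\bf{i}} \cdot {\bf{j}}) + {a_y}{b_x}({\bf{j}} \cdot {\bf{i}}) + {a_y}{b_y}({\bf{j}} \cdot {\bf{j}})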



Then we apply the definition of the dot product, in two senses:

  • the product of the two unit basis vectors i and j, if we are requiring that the basis be orthogonal, will again be 1x1 but multiplied by a 0% ratio of dimensional coincidence, so it is 0; because of this, the two middle terms vanish;
  • the product of the unit basis vector i with itself is the product of the moduli (1x1) with a ratio of 100% dimensional coincidence, so it is 1; the same applies to the product of j with itself; because of this, the first and the last products become simple scalar products of the homogeneous components.
Thus the expression reduces to the following:

{\bf{a}} \cdot {\bf{b}} = {a_x}{b_x} + {a_y}{b_y}



We thus conclude that this algebraic dot product technique is valid to the extent that we are relying on an orthogonal basis, because otherwise the two middle terms would not have vanished and the answer would depend on which angle, other than 90 degrees, separates the two basis vectors, i and j.
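This conclusion can be checked numerically. The Python sketch below is only an illustration with values that I made up: a basis whose second vector is tilted 60 degrees away from the first, and two sample vectors given by their components in that basis. As a reference value, it converts everything to ordinary Cartesian coordinates first.

```python
import math


def dot_cartesian(a, b):
    """Reference dot product of two vectors given in Cartesian coordinates."""
    return a[0] * b[0] + a[1] * b[1]


def to_cartesian(components, basis_i, basis_j):
    """Rebuild a vector from its components in the basis (basis_i, basis_j)."""
    c1, c2 = components
    return (c1 * basis_i[0] + c2 * basis_j[0],
            c1 * basis_i[1] + c2 * basis_j[1])


def dot_full_expansion(a, b, basis_i, basis_j):
    """'Total war' expansion that keeps the two middle (cross) terms."""
    ii = dot_cartesian(basis_i, basis_i)
    ij = dot_cartesian(basis_i, basis_j)
    jj = dot_cartesian(basis_j, basis_j)
    return (a[0] * b[0] * ii
            + (a[0] * b[1] + a[1] * b[0]) * ij
            + a[1] * b[1] * jj)


def dot_shortcut(a, b):
    """Component-wise shortcut, which silently assumes an orthonormal basis."""
    return a[0] * b[0] + a[1] * b[1]


# Assumed sample data: unit basis vectors separated by 60 degrees.
basis_i = (1.0, 0.0)
skewed_j = (math.cos(math.radians(60.0)), math.sin(math.radians(60.0)))
a = (2.0, 1.0)  # components of a in the skewed basis
b = (1.0, 3.0)  # components of b in the skewed basis

true_dot = dot_cartesian(to_cartesian(a, basis_i, skewed_j),
                         to_cartesian(b, basis_i, skewed_j))
full = dot_full_expansion(a, b, basis_i, skewed_j)
shortcut = dot_shortcut(a, b)

print("reference value:", true_dot)
print("with middle terms:", full)
print("shortcut without middle terms:", shortcut)
```

With the tilted basis the shortcut misses the reference value, because the middle terms contribute (a_x b_y + a_y b_x) times the cosine of 60 degrees. With an orthogonal j, for instance (0.0, 1.0), the two calculations agree.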

Now, the intuitive and practical approach. The coordinates of a vector are like the information provided by Cinderella's slipper: quantities that you measure to serve as "clues" for catching criminals (solving problems). You can also call them "whistle-blowers". Obviously, a spy who repeats exactly the same as another one is superfluous: you shouldn't pay him! Hence the minimum requirement for hiring a set of whistle-blowers is that each of them contributes something new, even if they somehow repeat themselves. In mathematical jargon, it is said that those informers are "linearly independent". I would say that they are "helpful". But one that provides totally fresh and new information might be preferable, because it is "original" (technically, "perpendicular"), so that this way you can optimize your network: each specialist will investigate a different fact.

Let us check if that is the case. The dot product is like combining two sets of reports about two suspects (two vectors), so as to check to what extent both are "pointing to the same solution of the crime". For this purpose, the informers lay their reports on the table. All possible combinations among reports are like the "total war" product mentioned before. But soon you realize that you can mix apples with apples and pears with pears, but not apples with pears.

Combining apples with apples is what you do when you multiply homogeneous (100% dimensionally coincident) quantities, like in the above-mentioned first and last terms: ax * bx and ay * by. For example, you combine the reports for direction X (ax * bx), and it may happen that you get a larger or smaller positive product (because you combine + with + or - with -) or a larger or smaller negative product (because you multiply + with -); that will mean that direction X contributes, respectively, with a vote for coincidence (if the product is +) or for discrepancy (if it is -). Finally, you do the ballot or vote counting: you add up the mutually scaled reports for X and Y and thus get the measure of the overall coincidence.

Should you also add up the middle terms, i.e. the products between heterogeneous quantities, between apples and pears (ax * by or ay * bx)? No, because by definition you know that these clues are not overlapping at all, they refer to completely different facts; hence if the purpose is to learn to what extent they point in the same direction (generically, to what extent they are dimensionally coincident), the answer is zero, so they make your life simpler by not casting any vote.

That is how orthogonality at the level of information sources (definition of basis vectors) helps you detect orthogonality (or any other degree of dimensional coincidence) at the level of problem solving.