Friday, 8 September 2023

A reflection about the null spacetime interval

A frequent question in the context of special relativity (“SR”) is this: in the ordinary physical space, which is Euclidean, the null distance between two points is 0 and geometrically appears as an ideal point, which does not have any extension; however, in Minkowskian spacetime, the null interval between two spacetime points or events, which is also numerically 0, can be any lightlike interval, i.e. the interval between any two events connected by a light signal, which may have any extension when painted on a diagram (sometimes people call this extension the ruler length or Euclidean length; we will call it here "visual length"). How can that be?

A mystical answer is what you often find in pop science books and on the internet: things like "photons do not experience time", which in turn prompt all types of wild (but attractive to some people) speculations, about whether time is an illusion and stuff like that.

A more reasonable but incomplete answer is: you are blinded by your Euclidean intuition; there is certainly an analogy between Euclidean and Minkowskian spaces, but analogy is not identity and so things function differently in the second context: just accept that the Minkowskian null interval has 0 numerical value even if it has some visual length when painted in a diagram.

But the ambitious answer consists of delving into how the analogy plays out, so as to see clearly why this has to be so: why the numerical value of the lightlike interval must be 0, while the visual length is not.

For this purpose, the rule of thumb is as follows: detach yourself from the specific situation that is problematic in the novel area, elevate to a more abstract level and look for a general formulation embracing both the old and the new area; then study how this generic spirit is dealt with in each domain, given its specificities. 

If you do this here and you adopt for this purpose (as one should always do) a positivistic or operational approach (that is to say, you define the intervals of a space based on what they measure and how it is measured), you will realize that null intervals are associated with the idea of "no measurement". At this stage, I have hesitated between two formulations. Initially, I focused on the object of measurement and thus stated that non-null intervals are what you can directly measure with the available physical instruments, whereas a null interval is what you cannot directly measure with the said instruments (only indirectly). Lately, I have thought of another version focusing on how you measure: any instrument contains an element or agent that participates in the measured property; thus you obtain a non-null interval when you use an instrument having that feature, and a null interval when your instrument does not have the measured property or you use it in such a manner that, in the end, the element in question has done nothing. In the end, I have concluded that the two versions are complementary, as we will immediately see.

This definition fits ordinary space. Here what you measure is pure lengths of objects or spatial distances and you do it with instruments having such property, i.e. having physical extension, like rulers (X, Y and Z rulers). Hence non-null intervals are the values obtained with rulers, while null intervals are ideal points having no extension, which is what: in the first version, you don't directly get with rulers (only indirectly as an intersection between two lines); in the second version, you would get if you tried to measure with ideal points or you measure with rulers but you undo what you have done and return to the ideal point of departure.   

But note that having no visual length is only the specific form or guise that a null interval takes in this particular space. If we now apply the above-mentioned general formulation to spacetime, then we must look at what you measure and how you do it in practice in this new space. What you measure is the distance between events and you do it by making use of an instrument that also mimics this reality, i.e. an agent traveling from one event to another, in particular an agent whose standard is light and which is active in both axes, i.e. what we usually call the X and the cT axes.

At this stage, we would probably need a lengthier explanation, which I will give elsewhere, but let me just make a telegraphic summary for our present purposes. I do not mean that all clocks and rulers must be constructed with light. Instead of light, we could talk about another agent traveling at the same speed, which is the universal speed limit (for example, gluons are also said to travel at the same speed). Also note that I do not mean that one of those best-in-class agents must necessarily appear in each instrument. Certainly, a clock can have inside its walls any other oscillating agent. But then you must convert its values into meters, which you do using the speed of light as a conversion factor. And how do you measure this speed? You do it by organizing a round-trip competition between any agent inside a clock and light oscillating inside another, where it turns out that the latter traverses about 300 million meters while the former ticks 1 second. Then you fix that ratio once and for all by redefining the meter as the length that light traverses in 1/299,792,458 of a second, also in a one-way trip. But what if you need to measure distances between distant events? In that case, you need to synchronize the clocks at the two endpoints of the interval. And you do this through the so-called Einstein-Poincaré convention or radar convention, by virtue of which a clock is set to read the time for a light signal to cover a round trip to its location, divided by 2. So there are massive walls at the start and end of the spacetime measurement operation in all cases.
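The radar convention just described can be sketched in a few lines. This is only an illustration under my own assumptions: the function name `radar_coordinates` and the sample times are hypothetical, and the observer is taken to be at rest, reading both the emission and the reception time on one and the same clock.

```python
C = 299_792_458.0  # speed of light in m/s (exact, by the definition of the meter)

def radar_coordinates(t_send, t_receive):
    """Assign (time, distance) to a remote event from the emission and
    reception times of a light signal, both read on the same local clock."""
    t_event = (t_send + t_receive) / 2.0        # midpoint of the round trip
    distance = C * (t_receive - t_send) / 2.0   # half the round-trip path
    return t_event, distance

# Hypothetical example: the signal leaves at t = 0 s and its echo returns at t = 2 s.
t_event, distance = radar_coordinates(0.0, 2.0)
print(t_event, distance)
```

Note that both outputs come from a round trip between two "walls" (the emitter and the reflector): the one-way leg alone never yields a reading.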

Given this, the general formulation is adapted as follows: spacetime non-null intervals are the values obtained with spacetime clocks and rulers, having light oscillating between massive walls, while null intervals are the lightlike trajectories because they are: in the first version, what you don't directly get with spacetime instruments (only indirectly as a combination of distance traversed and time employed); in the second version, what you would get if you tried to measure with light alone because indeed light is the standard agent of the measurement instruments but it is not apt for measurement unless it is forced to make round trips within massive walls.

The geometry of spacetime confirms this. If you look at how spacetime is drawn in a Minkowski diagram, you will notice that the null interval must be placed where it cannot be overlapped by either the cT or the X axis of any frame. In such a diagram, a frame takes a privileged position and has perpendicular axes, while the axes of the other frames get closer to each other as one increases the relative velocity with regard to the first frame, but they never overlap with what we could call the light axis, i.e. the null interval. Of course, all light intervals or so-called lightlike intervals can be described using a combination of a cT and an X interval (of the same size, so that their subtraction is 0). No problem: in ordinary space, too, a null interval (i.e. an ideal point) can be described using both an X and a Y axis (when they intersect, by the way). However, neither in ordinary space nor in spacetime do you get a single axis on top of a null interval, precisely for the indicated reasons: because the null interval is what cannot be directly measured (with the values that populate your axes) and because it is also what cannot measure (what does not furnish the values that populate those axes).
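To make the contrast between the numerical value and the visual length concrete, here is a minimal sketch. The function names and the sample events are my own, and I assume one spatial dimension and the sign convention (cT)^2 - X^2. It computes both quantities for a lightlike and a timelike separation, before and after a change of frame.

```python
import math

def interval_squared(c_dt, dx):
    """Minkowski interval squared, with the sign convention (cT)^2 - X^2."""
    return c_dt**2 - dx**2

def visual_length(c_dt, dx):
    """Euclidean ("ruler") length of the same segment as drawn on the diagram."""
    return math.hypot(c_dt, dx)

def boost(c_dt, dx, beta):
    """Components of the same separation in a frame moving with velocity beta*c."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (c_dt - beta * dx), gamma * (dx - beta * c_dt)

light = (3.0, 3.0)     # cT and X of the same size: a lightlike separation
timelike = (5.0, 3.0)  # cT larger than X: a timelike separation

for name, (c_dt, dx) in (("light", light), ("timelike", timelike)):
    b_c_dt, b_dx = boost(c_dt, dx, 0.6)
    print(name,
          interval_squared(c_dt, dx), visual_length(c_dt, dx),
          interval_squared(b_c_dt, b_dx), visual_length(b_c_dt, b_dx))
```

The interval squared comes out the same in both frames (zero for light), whereas the visual length changes from frame to frame, which suggests that it is a feature of the drawing rather than of the interval.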

As a bonus, I would make a link with the concept of eigenvector. 

In ordinary space, the null vector cannot be the eigenvector of the transformation matrix, i.e. it is not the vector that remains unaltered after a change of basis or transformation of values into the language of another frame. Here the eigenvector is the line crossing the origin of the system, which acts as a rotation axis, but it is not an ideal point without any extension. This makes sense because, in a generalized sense, the common thing is the element or agent with which you make measurements, which is the same no matter the orientation of the axis and the reference frame. In particular, in this space, this is the extension of things, which as noted does not vary regardless of the axis and regardless of the perspective or frame, which corresponds to a rotation of the coordinate system.

But in spacetime things are different. Here the agent with which you make measurements is light or at least light's standard, so it is the eigenvector. But at the same time in this context the said eigenvector can be the null vector because, despite being a necessary condition for measuring, it is not sufficient: with it alone (i.e. without walls to bounce off), you cannot measure.
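The eigenvector claim can be checked numerically. The sketch below is my own illustration, with hypothetical helper names, and it assumes the standard 1+1 dimensional Lorentz boost matrix acting on (cT, X) components. For comparison it applies a rotation in the plane of the two components; the rotation axis mentioned above would be perpendicular to that plane.

```python
import math

def boost_matrix(beta):
    """Standard Lorentz boost acting on (cT, X) components."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [[gamma, -gamma * beta],
            [-gamma * beta, gamma]]

def rotation_matrix(theta):
    """Ordinary rotation in a Euclidean plane."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def apply(matrix, vector):
    """Multiply a 2x2 matrix by a 2-component vector."""
    return [matrix[0][0] * vector[0] + matrix[0][1] * vector[1],
            matrix[1][0] * vector[0] + matrix[1][1] * vector[1]]

light_ray = [1.0, 1.0]  # lightlike direction: equal cT and X components

boosted = apply(boost_matrix(0.6), light_ray)             # still along (1, 1)
rotated = apply(rotation_matrix(math.pi / 6), light_ray)  # leaves that line

print(boosted, rotated)
```

Strictly speaking, the boost keeps the direction of the light ray but rescales its components (by the Doppler factor), so "unaltered" should be read as "still on the same line", and the interval of the rescaled vector is still 0.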


Sunday, 2 May 2021

Fluids explained in terms of "involved" numbers



As discussed in the post "Forces and numbers", you can guess what happens in many interactions by taking into account how much mass should be involved on each side. As a way of speaking (a sloppy way, I admit), I call this a question of numbers, but the key realization is that what matters is not the number that there is, but the number that is ready to participate (i.e. be involved) in the interaction. 

I want now to go into more detail discussing fluids, both...

... static fluids

Here the question is whether the body will float or sink into the fluid and to what extent. Books say that everything depends on the relative density of the body. That looks like pointing at numbers: it seems like the winner should be the one that encapsulates higher numbers within a given volume. But that is not enough: we still need to know how many of those guys will get involved.

a) Let us start with the case of a body (like a log of balsa wood) that is less dense than water. 

This object will float, yes, but the truth is it starts sinking... The reason is that the wood is solid and so there are rigid connections between its molecules. Thus when the first row of wood molecules clashes with the first more massive layer of water, wood soldiers will call the rest of their solid army through their connections and persuade them to take part in the interaction, to the extent needed. Instead in the water band the intermolecular cohesive forces barely serve to keep the particles side by side, but there is no solidarity among them: if one is pushed aside, it will slide over its colleagues, which will not participate in the interaction, until they are pushed themselves. This explains that the balsa wood commences by penetrating the water: it does because it outnumbers its "involved" opponents.

That does not mean, however, that the fight is over. The reason is that, although water molecules do not get "involved" in the fight until they are displaced, from that moment onward all of them stay involved. Hence the wood log will only be winning the battle until it has displaced, and thus involved in the interaction, water molecules amounting to its own mass, i.e. (in a sloppy way) until it is equal-numbered by involved water molecules. This will happen when it is only partially submerged, precisely because it is less dense than water, so the latter manages to pack the equivalent of the body's mass into only the submerged volume.

b) If the test body were of the same density as water, it would only be equal-numbered when it became totally submerged, with its top side at the level of the surface of the water, because precisely at that moment the involved (displaced) water mass matches its own mass.

c) And if the body were denser than water, it would sink to the bottom because its mass will always outnumber the involved water mass opponents, which are as usual those displaced by its volume and hence of less mass than its own mass.

That is Archimedes' principle, explained in terms of numbers or, if you want, in terms of "the winner is at each moment whoever involves more mass", taking into account that by definition the solid body involves 100% of its mass but the fluid involves only the displaced mass.
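Cases a), b) and c) can be condensed into one small function. This is a sketch under my own assumptions: the function name is hypothetical, and the densities for balsa and steel are rough illustrative values, not measurements.

```python
def submerged_fraction(body_density, fluid_density):
    """Fraction of the body's volume below the surface at equilibrium.

    A floating body stops sinking when the displaced ("involved") fluid mass
    equals its own mass, i.e. when the fraction equals the density ratio.
    The result is capped at 1.0: a body as dense as the fluid is fully
    submerged, and a denser one can never be equal-numbered, so it sinks.
    """
    return min(body_density / fluid_density, 1.0)

WATER = 1000.0  # kg/m^3
BALSA = 160.0   # kg/m^3, assumed illustrative value
STEEL = 7850.0  # kg/m^3, assumed illustrative value

for name, density in (("balsa", BALSA), ("water-like", WATER), ("steel", STEEL)):
    print(name, submerged_fraction(density, WATER))
```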

As natural as this rule looks  (involved water molecules are those having been displaced), there is a story behind it. 

The reason you read in books is that (i) pressure depends on depth, (ii) pressure acts in all directions, (iii) the pressures on opposite lateral sides of the body cancel out, (iv) we are left with pressure from below acting upward minus pressure from above acting downward and (v) force is pressure times area, in this case the area of the sinking body, so the net force is (density of water * g * height of body * area of body) = (m of displaced water / V of displaced water) * g * h * A = (m/V) * g * V = m * g, i.e. you get the weight of the displaced water mass.
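For readers who prefer symbols, the five steps above can be written out as follows. This is a sketch under simplifying assumptions of mine: the body is a fully submerged vertical prism of horizontal cross-section A and height h, the water is incompressible with density \rho_w, and the symbols p_top, p_bottom, d and m_w (pressures at the top and bottom faces, depth of the top face, mass of the displaced water) are my own notation. The `align*` environment requires the amsmath package.

```latex
\begin{align*}
p_{\text{top}} &= p_0 + \rho_w g d, \qquad
p_{\text{bottom}} = p_0 + \rho_w g (d + h) \\
F_{\text{net}} &= \left(p_{\text{bottom}} - p_{\text{top}}\right) A
               = \rho_w \, g \, h \, A \\
               &= \frac{m_w}{V} \, g \, V = m_w \, g
\end{align*}
```

Here V = hA is the volume of the body, which is also the volume of the displaced water, and p_0 is the pressure at the surface, which cancels out in the difference.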

Isn't this as clear as cold and arid? 

In trying a more colorful alternative, I would start by noting the experimental fact that "water floats on water". In general, a fluid floats on itself. This means that it is at equilibrium: the weight of a column of water (gravity force exerted on it by the Earth) is matched and counteracted by the support force exerted by what is below. This accounts for the fact that there is an upward force equal to the weight of the column of water reaching down to the bottom surface of the body.

But why does upper water float and not penetrate the slightest into water? It does because it is of the same density, but with a difference with respect to what we said about solids: wood of the same density as water partially sinks because, being a solid, it does win the battle against the first layers of water by calling into service all its molecules; instead, water cannot do that, so it does not penetrate into neighboring water, but what it does do is push such adjacent molecules and thus transfer pressure downward, until it meets the ground. And the ground is another thing. The ground is a solid that exerts a constraint force: it is very massive and rigid and thus able to call into service as many soldiers as needed to avoid penetration and match whatever rests on it. That is why the support force sustaining the patch of water at each level of depth is equal to its weight: because it is held by the ground, with the water in between acting as a neutral messenger between above and below.

.... and dynamic fluids

Here the question is... I would say that in practical terms it boils down to whether a body may suddenly be pushed because of a pressure difference that causes the fluid to rush from some area to another.

Again this will happen because one side outnumbers the other, due to some physical configuration, of which there are two basic types.

One type is those situations where there is some sort of wall that produces this difference: bigger numbers on one side than the other. For example:

  • Narrowing: when a pipe narrows, the molecules at the wider section outnumber those at the narrow one, so pressure is greater behind than in front and the more advanced molecules are accelerated and hence acquire more velocity. Thus the water comes out of a hose with a nozzle at greater velocity. It is important here to disambiguate "pressure": we say that the fluid comes out with more external pressure, meaning that the ordered motion of the fluid as a bulk in one direction is faster, but that is because the internal pressure, meaning the random motion of its individual molecules in all directions, is lower at the narrowing than in the preceding wider area.
  • Shielding: a smaller boat passes by a bigger one, which shields it from the pressure of that side of the ocean;  so the mass on the other side outweighs the mass on the side of the passing big boat and pushes towards it the water and together with it the small boat.
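The narrowing case can be put into numbers with the textbook continuity equation and Bernoulli's relation. Neither is named in this post, so take this as an outside sketch: I assume an ideal, incompressible, horizontal flow, and the function name and sample values are hypothetical.

```python
def narrowing(v1, area1, area2, p1, density=1000.0):
    """Speed and internal pressure in the narrow section of a horizontal pipe,
    assuming an ideal incompressible fluid (water by default, in kg/m^3)."""
    v2 = v1 * area1 / area2                    # continuity: A1 * v1 = A2 * v2
    p2 = p1 + 0.5 * density * (v1**2 - v2**2)  # Bernoulli at equal height
    return v2, p2

# Hypothetical pipe: the cross-section shrinks to a quarter of its size.
v2, p2 = narrowing(v1=1.0, area1=4.0e-4, area2=1.0e-4, p1=200_000.0)
print(v2, p2)
```

The bulk velocity goes up in the narrow section while the internal pressure goes down, which is the disambiguation of "pressure" made in the first bullet.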

The  other type is situations where directly the fluid is pushed and accelerated by some factor, thus reducing the numbers in that part:

  • You blow between two pieces of paper, so as the air is passing between them more quickly, the pressure is lower there than outside, hence the pieces of paper are drawn to each other.

In both cases, the cause is a pressure drop and the effect is higher velocity. Don't be confused because in the second type we first accelerate the fluid. We do that, say, in the horizontal direction, but this creates a lower pressure area, which then again causes the fluid to rush in the vertical direction to fill such an area.





Monday, 12 October 2020

Why is v^2/r centripetal acceleration in uniform circular motion?


Abstract: To get motion that is uniform (velocity is not changing in modulus) but circular (velocity is always changing in direction), you need a force that is applied ever-perpendicularly (always giving a new direction, never giving more of the same direction) and you can only do that if the acceleration comes with a certain modulus, which in turn depends (i) directly on the velocity that the object comes with and (ii) inversely on the distance, i.e. the radius. That is logical: if the object comes faster, you need more force to give it a new direction; if it is farther away, attaining circularity is less of a change of direction. In practice, you can achieve this by adjusting the force to the velocity (tension force) or by adjusting the velocity to the force (gravity force). To get to the specific formula (v^2/r), we take a route that is sometimes given but is less popular: consider first the velocity change. If you paint the evolution of the velocity vector after one round, with always the same modulus but ever-changing direction, you get that velocity has changed by 2πv. But to get acceleration, we should divide by the time, which is the path (2πr) divided by v. Thus we get 2πv / (2πr/v) = v^2/r.


We start with an object that is moving by inertia and hence with constant velocity v, i.e. in a straight line (no change in direction) and with always the same speed v (no change in modulus).

After applying acceleration generated by whatever cause, we stipulate that the object: 

  • takes a "circular" trajectory, which means that it will keep the same distance away (the radius) from whatever reference it is orbiting around 
  • and keeps "uniform" circular motion, which means that the speed v of the object remains the same. 

But, if you think of it, our acceleration source must take care of the two things (circularity and uniformity) together; it cannot produce one without the other: if something is causing circular motion, it cannot be at the same time changing the speed of the object. It may happen that the source causes a curve where both the direction and the speed change, but that will not be a circle: it may be an open curve (a parabola for example) or even a closed orbit if you wish (an ellipse), but not a perfect circle. Why so? Think that any acceleration vector can be of two types: perpendicular or not. If it is perpendicular, it means that it has no component at all in the direction of the body's motion. (That is what perpendicularity means, total "unrelation": if you project one perpendicular vector onto another, the shadow is none, that is to say, one vector has no component at all in the direction of the other.) Therefore, it will have no effect at all in that direction: it will not add anything to, or subtract anything from, the modulus of the velocity. Keep doing it at each ideal instant and you will get that the object is always acquiring a new direction, but never more of the previous one. That is a circular trajectory, which we can thus define as something caused by an "ever-perpendicular" source having, therefore, an effect only in terms of direction and not modulus.

(Before you start disbelieving, note that I am talking about a single source of acceleration. If you factor in several sources, it may be, like in some example below, that you get non-uniform circular motion.)

Now, have we thus finished? Is it then a sufficient condition that the acceleration vector is constantly perpendicular to the inertial trajectory of the object? Well, yes... but you will not get that "constancy in perpendicularity" of the acceleration vector unless the latter also comes with a certain modulus. To understand why, it is convenient to jump, provisionally, from kinematics to dynamics and consider two idealized situations where you can attain such a result.

One possibility is with tension force. Think of this example: an asteroid that is flying by and you attach it to a planet with a very long string, so long that the planet, despite being very massive, is not exerting any significant gravity acceleration, all the job being done by contact force as transmitted by the string. Tension force is similar to normal force in that they are constraint forces: they will gather as much force as necessary to keep the body from penetrating the floor (in the case of normal force) or escaping away (in the case of tension). So the possibility that the asteroid flies away, as long as the string does not break, is not an issue. We can also trust that tension will always act perpendicularly because the string is attached to a pivot and hence rotating with the asteroid, so it will always act by giving it precisely the direction that it does not have. Thus we have attained our objective (circular trajectory), but only because we have somehow cheated: we have chosen a display where the acceleration is automatically adjusted to whatever is needed, not only in terms of direction (thanks to the pivot) but also in terms of modulus (thanks to the string acting as a constraint force).

(Once inside the circle, the asteroid, if endowed with an engine, could accelerate and achieve non-uniform circular motion, because the tension force will automatically adjust to do what is required at each new instant to keep the asteroid from flying away, but that non-uniformity is the work of the engine, not of the string.)

A second possibility is that now there is no string. The planet must do its job only by means of gravity. But acceleration due to gravity is what it is; it does not adjust to the target. That is a problem. Imagine that we launch a satellite from the height of a mountain in parallel to the ground. At this initial stage, gravity is acting perpendicularly to the body's path. But such a body will start falling... Thankfully the horizon is also going down due to the curvature of the earth... Now, if the speed of the object is too high, it will fall at a lesser rate than the fall of the horizon, perpendicularity will be lost and the object will not follow a circular path; it may still be caught in an elliptical trajectory or it may even fly away. If the speed is too low, again perpendicularity will be lost and the body will be sped up towards the ground. But if it is the right speed, if we strike the adequate balance between the inertia of the object (as given by its speed) and the gravitational acceleration, then the body will fall at the same rate as the ground and we will get a circular trajectory. (That is why in nature gravitational orbits are usually not circular, but parabolic, elliptical and so on.)

We could look for other examples, but by now it should be clear that the applied force will have to match somehow the velocity of the object, either (i) because the force self-adjusts or (ii) because the velocity is the right one by chance or, more probably, because it is adjusted by very careful design. (Notice that we are not saying that the driving force that is responsible for the acceleration "has" a velocity v. The planet around which the satellite orbits does not have such velocity. This velocity is just the challenge that it faces and which determines what the force must be, to be up to the task.)

On another note, we can also guess that the distance to the object, which should become the radius "r" of the circle, also matters, but in this case in inverse proportion: the farther away from the center, the flatter the circular trajectory will look, and so a change in direction at this softer rate will be less demanding.

Thus we have guessed that centripetal acceleration for circular motion must look like v/r... 

To find out the exact formula, we can start looking, not for the time rate of the change in velocity (which is what acceleration is), but for the velocity change after a while, say for instance after a full revolution. 

It helps us to know that the modulus of the initial velocity vector is always equal to that of the final velocity vector. Imagine that we paint those vectors as radii of a circle and we draw red lines between each initial and each final vector; these red lines correspond to the velocity increase vector for each lapse because they correspond to the vectorial subtraction of the initial velocity from the final velocity. That is the picture drawn at the start of the post (sorry I cannot cite the source of the picture, I forgot where I took it from).

Because we have measured at intervals, the circle looks like a jagged wheel. But if we now imagine infinitesimal changes, we will compose a perfect circumference. The length of a circumference is 2π times its radius, which radius we have stipulated to be the original and never-changing modulus of the velocity. All this leads to a modulus (for the velocity increase) of 2πv.

Here there is a little logical jump to be made. Which is the "while", i.e. the temporal lapse that the said velocity increase corresponds to? The writings that use this method quickly assume that it is the time lapse required for the object to describe a full round. This is true but not so obvious. Let us not forget that the above picture does not emulate the spatial trajectory of the satellite: the v radii actually paint the constant speed of the orbiting object, which is always spatially tangential (perpendicular to the spatial radius), whereas the red vectors paint how this tangential velocity spatially changes direction (in parallel to the spatial radius). In other words, the v circumference is symmetrical to the spatial circumference. I am also assuming that this works, although I would like to see a more categorical confirmation.  

Anyhow, this 2πv that we have landed upon is the accumulated velocity increase after a round. But we were looking for the acceleration, that is to say, the ratio of the change of velocity per unit of time. Therefore we need to divide our velocity increase for the round by the time required by such a round. This time is the space traversed by the object along the round (i.e. 2πr) divided by the constant velocity v. Thus we get 2πv / (2πr/v) = v / (r/v) = v^2/r.
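The "jagged wheel" argument can be checked numerically. The sketch below uses a function name and sample values of my own. It samples the velocity vector at equally spaced instants of one revolution, adds up the moduli of the successive velocity changes and divides by the period. It relies on the fact that the velocity, being perpendicular to the radius at every instant, turns through the same angle as the radius and so completes its own circle in exactly one period.

```python
import math

def centripetal_from_polygon(v, r, steps):
    """Estimate the acceleration from a polygon of velocity changes.

    The velocity vector keeps modulus v while its direction angle advances
    by 2*pi over one revolution; each side of the polygon is one "red line".
    """
    total_dv = 0.0
    for k in range(steps):
        a0 = 2 * math.pi * k / steps
        a1 = 2 * math.pi * (k + 1) / steps
        dvx = v * (math.cos(a1) - math.cos(a0))
        dvy = v * (math.sin(a1) - math.sin(a0))
        total_dv += math.hypot(dvx, dvy)  # modulus of this velocity change
    period = 2 * math.pi * r / v          # time needed for one full round
    return total_dv / period

v, r = 3.0, 2.0
estimate = centripetal_from_polygon(v, r, steps=10_000)
exact = v**2 / r
print(estimate, exact)
```

With few steps the estimate falls short of v^2/r (the jagged wheel is shorter than the circumference); as the number of steps grows it approaches v^2/r from below.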

Finally, just note that in real problems this weird "circular acceleration" may not be (in fact very often it is not) something happening autonomously: it is usually a component of another force, but it does help to know its value. And it may also happen that the speed of the body is changing (it is accelerating tangentially, like a bob in a pendulum), but it does help to know the acceleration that is needed at each instant (in view of the relevant v) to keep it in a circular track.
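To tie the formula back to the mountain launch discussed earlier, one can ask which speed makes the available gravitational acceleration exactly equal to the required v^2/r. The sketch below rests on assumptions of mine: a spherical Earth with no atmosphere, a launch close to the surface, rounded values for g and for the radius, and hypothetical function names.

```python
import math

def required_centripetal_acceleration(v, r):
    """Acceleration needed to keep a body of speed v on a circle of radius r."""
    return v**2 / r

def circular_orbit_speed(g, r):
    """Speed at which a given acceleration g is exactly the required v^2 / r."""
    return math.sqrt(g * r)

G_SURFACE = 9.81        # m/s^2, approximate value near the Earth's surface
EARTH_RADIUS = 6.371e6  # m, approximate mean radius

v_circ = circular_orbit_speed(G_SURFACE, EARTH_RADIUS)
print(v_circ)  # roughly 7.9 km/s

# A faster launch would need more acceleration than gravity supplies,
# a slower one less: in both cases perpendicularity is lost.
print(required_centripetal_acceleration(1.1 * v_circ, EARTH_RADIUS) > G_SURFACE)
print(required_centripetal_acceleration(0.9 * v_circ, EARTH_RADIUS) < G_SURFACE)
```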


Wednesday, 20 May 2020

Forces and numbers

With coronavirus forcing people's confinement around the world, many of us are reminded of the time when Isaac Newton fled the bubonic plague to seek refuge on a farm belonging to his family and, during the two years of isolation that followed, produced work of genius. I have neither his genius nor the time he had at his disposal, since unfortunately I keep teleworking and I am also distracted by other studies, but I will at least share some thoughts on Newton's laws of motion that have been haunting me these days. No discovery at all, of course, but at least a way of presenting things that I deem pedagogical and that is not usually mentioned.

The gist of the message is that the strength of a force is not something that is given prior to the interaction between two objects, but depends on how much of the respective masses gets involved. One could simplify this as a question of the "number of soldiers being called to service". A question that requires the following steps:
  • only if the acting force is gravity, then the number is 100%, because gravitational force gives particles no option to take part or not in the war; things are transparent to gravity, so the force affects all its targets equally, at least in local phenomena where the field is uniform; 
  • in other cases, like in contact forces, the force triggers on the constituents of a body a chain reaction, so how many particles are involved depends on their readiness to cooperate in the interaction, which in turn has to do with the cohesion of the material (whether it is solid or fluid);
  • in particular, in an interaction between a medium and an object falling due to gravity, the mass that will cooperate in the interaction will be 100% of the mass of the less massive object (when the support material is a solid) or the mass of the displaced fluid (when the support medium is a fluid).

Things are transparent to gravity and hence do not resist it

Gravity is a force that acts at a distance. In that, it is like electrostatic force. In fact, the formulas for both are very similar: G (Mm)/r^2 and k(Qq)/r^2. And in both cases the way to act at a distance is not mysterious: both a charge Q and a mass M create attraction "fields" over their surrounding spaces (well, in the case of Q that is for opposite charges; for like charges, it is a "repulsion" field). A massive or an electrically charged object sends around messages like this: «I am here with my mass "M" or charge "Q", so if you are a mass "m" or a charge "q", respectively, please be accelerated...» The first time that the field is created or if it changes, the message will take some time to reach its targets (if the Sun disappeared, the effect on the orbit of the Earth would only be perceived after about eight minutes), but once the field is there, it accelerates whatever falls into its net. The question for us is: at what rate? Let us look at the factors that compose the above-mentioned force formulas:

- Intentionally we leave aside, for the time being, the constants "G" and "k".

- We choose to reason on the basis of a uniform field: we consider a given distance "r" and then we assume that any additional distances till one target or the other are negligible, so "r^2" is a fixed value. 

- The next step is dividing by the mass of the object which is under the influence of the field. This results from Newton's second law of motion: Force = mass times acceleration (F = ma), so to guess acceleration you must divide F by m. In the case of electrostatic force, this means that you don't know in advance how much acceleration you will get, unless you are also given the mass of the object in question. Instead, things are easier in the case of a gravitational field, because "by chance" you also had "m" in the numerator of the force formula, so when you divide by "m" as well, both terms cancel out. This means in practice that, in order to obtain the acceleration caused by, say, the Earth on any object located in its gravitational field, you don't need to know the mass of the concerned object: you just need M (the mass of the source), G (which is a constant) and r (which we deem fixed in a uniform field). The "passive" mass is hence irrelevant.
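The cancellation can be seen with a few numbers. This is a sketch with my own function names, rounded constants and made-up sample masses and charges; the only point is which inputs the resulting acceleration depends on.

```python
G = 6.674e-11  # N m^2 / kg^2, rounded
K = 8.988e9    # N m^2 / C^2, rounded value for vacuum

def gravitational_acceleration(source_mass, r, target_mass):
    force = G * source_mass * target_mass / r**2
    return force / target_mass  # the target mass cancels out

def electrostatic_acceleration(source_charge, r, target_charge, target_mass):
    force = K * source_charge * target_charge / r**2
    return force / target_mass  # nothing cancels: the mass is still needed

EARTH_MASS = 5.972e24   # kg, rounded
EARTH_RADIUS = 6.371e6  # m, rounded

feather = gravitational_acceleration(EARTH_MASS, EARTH_RADIUS, 0.005)
steel_ball = gravitational_acceleration(EARTH_MASS, EARTH_RADIUS, 1.0)
print(feather, steel_ball)  # the same rate, up to rounding

# Hypothetical charges: same q, same Q, same distance, different masses.
light_body = electrostatic_acceleration(1e-6, 1.0, 1e-6, 0.005)
heavy_body = electrostatic_acceleration(1e-6, 1.0, 1e-6, 1.0)
print(light_body, heavy_body)  # different rates
```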

This is the universality of free fall, which lies at the heart of the equivalence principle and goes back to Galileo. All objects are accelerated towards a given gravitational source, say the Earth, at the same rate, no matter their respective masses. Of course the crash between the Earth and a neutron star (an incredibly massive object) will happen in much less time than the crash between the Earth and a feather, but that is only because the Earth itself will be much more accelerated by the star than by the feather. But both star and feather will pass by a "neutral" reference point (an inertial one, i.e. one that is not accelerated itself) at the same time, because as I said the two things are accelerated equally. Certainly, in a real-life case, at the surface of the Earth, we have air resistance, but that is a complication we are not considering now, as we are analyzing a "pure" gravitational attraction. So, without an atmosphere, if you drop a feather one day and repeat the experiment a week later with a steel ball and you record the meeting times, would they be the same? Well, strictly speaking, no, because as I said in terms of meeting times, it does count how much the object attracts the Earth; so, in theory, there should be a tiny difference in favor of the steel ball (less meeting time), although this advantage would be ridiculously small and utterly negligible. However, if we released feather and ball together, they would be jointly attracting the Earth at the same time, as if they were the same object, so even such an imperceptible difference would not arise.

So, by looking at the sheer mathematical equation, we see a crucial difference between gravitational and electrostatic force: how much m1 or m2 will get accelerated, depends (in a uniform field) only on one variable factor, i.e. M; instead, in order to guess how much q1 or q2 will get accelerated by Q, you need to know much more: not only Q, but also if we talk about q1 or q2 and also if q1, for instance, has mass m or m' or whatever. 
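The contrast can be checked numerically. The following is a minimal sketch, assuming the usual inverse-square forms F = GMm/r² and F = kQq/r²; the function names, the rounded constants and the sample masses and charges are illustrative choices, not part of the original argument.

```python
import math

G = 6.674e-11   # gravitational constant, N m^2 / kg^2 (rounded)
K = 8.988e9     # Coulomb constant in vacuum, N m^2 / C^2 (rounded)

def gravitational_acceleration(source_mass, target_mass, r):
    """Magnitude of the acceleration of the target: F / m with F = G M m / r^2."""
    force = G * source_mass * target_mass / r**2
    return force / target_mass          # the target mass cancels out

def electrostatic_acceleration(source_charge, target_charge, target_mass, r):
    """Magnitude of the acceleration of the target: F / m with F = k Q q / r^2."""
    force = K * source_charge * target_charge / r**2
    return force / target_mass          # nothing cancels: the mass still matters

# Hypothetical sample values: the Earth as source, a feather and a steel ball as targets.
EARTH_MASS, EARTH_RADIUS = 5.972e24, 6.371e6
FEATHER, STEEL_BALL = 0.005, 5.0        # kg

g_feather = gravitational_acceleration(EARTH_MASS, FEATHER, EARTH_RADIUS)
g_ball = gravitational_acceleration(EARTH_MASS, STEEL_BALL, EARTH_RADIUS)

# Same charge on both targets (1 microcoulomb), same source charge, 1 m away.
e_feather = electrostatic_acceleration(1e-6, 1e-6, FEATHER, 1.0)
e_ball = electrostatic_acceleration(1e-6, 1e-6, STEEL_BALL, 1.0)

print(g_feather, g_ball)    # equal up to rounding
print(e_feather / e_ball)   # ratio of the masses, inverted
```

With gravity the two targets get the same acceleration; with the electrostatic force the lighter target is accelerated a thousand times more, because its mass is a thousand times smaller.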

What is the physical difference that this math is reflecting?  

Sometimes (even in textbooks) you hear that this is so because there is a funny coincidence: the Earth attracts the steel ball more than the feather, but the steel ball would "resist" more, so there is a trade-off between the two things and thus ultimately the accelerations experienced by ball and feather would be identical. I dislike this explanation, because a variable "resistance" implies a variable "action": it suggests that the Earth is adjusting its force to the opposition that it may encounter; it sounds as if the Earth were acting like a radar, i.e. sending scouts around, getting feedback that the challenge is harder or easier, processing it and then sending back an appropriate action. But this is not true: as I said, the Earth creates a field in the surrounding space, which means that it sends out messages, but it does not care about "who" may be on the other side of the line and it does not wait for an answer.

To provide an alternative explanation, it is useful to look at the factors that we had temporarily left aside: the constants "G" or "k". "G" is constant everywhere, irrespective of the medium through which gravity propagates. Instead the "k" for the electrostatic force depends on the characteristics of the medium, which may be more or less apt to propagate the force. This means that certain materials can act as an attenuating filter for electrostatic force and even as a barrier (that is the case of insulators). Instead, gravity cannot be screened or attenuated (other than by distance).

This becomes important when the force reaches the target because its atoms can also be regarded as a medium. If an electrostatic influence reaches two equally charged objects, the charges in them will react in a similar manner, but such charges are embedded in their respective bodies, which act as loads that must be dragged along. Thus the participants in an electrostatic interaction do "resist" in the proportion of their respective masses or, which amounts to the same, they get accelerated in inverse proportion to those masses. Instead gravitational force travels unimpeded through the body in question and reaches all its constituents without the need for mediation: transmission of the message happens "mouth to mouth". Thus the same message reaches all particles of the body, which are thus accelerated equally, without the need for any of them to pull on the others... (Well, for completeness, I should recall that we are always assuming a uniform field across the accelerated body: that the distance to all parts of the object in question is irrelevant and so there are no "tidal effects", which is what happens when a participant in the interaction is so big that the field ceases to be uniform. For example, the side of the Earth facing the Sun experiences a stronger field than the side facing away, which is farther away by a non-negligible distance... But that is not the case we are considering, so we leave it aside.)

Hence the equivalence principle holds, not because heavier (i.e. more massive) objects resist more, but precisely because they do not resist: no matter if the object has many or few constituents, they are all reached by the gravitational field simultaneously and react in unison, without mutually interacting, so that is why they "free-fall" (in vacuum) side by side. Thus we have accounted for the acceleration version of the formula: the acceleration of anything in the vicinity of a ball is set by the ball's mass m; the acceleration of anything in the vicinity of the Earth is set by the Earth's mass M. What about the force formula, how do you compose it...? Well, it is curious, because we have just ruled out "resistance", but it immediately returns through the back door: it may be that a body does not physically do anything to resist gravity, but anyhow it must suffer its effects in inverse proportion to the resistance (or inertia) that it does show in other types of interaction. Explaining why requires a little excursus.

If things don't resist gravity, then they must attract by their resistance capacity

Note that the "active gravitational parameter" (what causes gravity and goes into the gravitation formula) could theoretically be a factor having nothing to do with inertial mass.  This is what in fact happens with electrostatic attraction, where the strength of the interaction is given by the involved electric charges (Q and q), the distance r and the constant k, nothing else. It would be no problem if the same happened with gravity, i.e. if this force were created by a sort of gravitational charge being unrelated to mass. But what is indispensable is that the ensuing effect (acceleration) be distributed in inverse proportion to inertial masses: if we called such gravitational charge C and c, we would need acceleration by Cc/m and Cc/M, respectively. In other words, we need the objects to meet at the "center of mass" (COM) of the system that they jointly form. 

That is a must in all interactions. I will give you some other examples that a guy told me about some time ago in a discussion on a forum (I will even quote him):
  • Two magnets, M1 and M2, are sitting on a frictionless surface with their opposite poles facing each other. Both magnets have the same inertial mass, but M1 produces a stronger magnetic field than M2. Now, will M2 have a higher acceleration than M1? Where will they meet? Of course, they will have the same acceleration and they will meet at their COM (which is determined by their inertial mass), which in this case is the mid-point.
  • You and I are sitting in chairs on a frictionless surface. I reach out and pull you toward me. The result is that we both meet at our COM. Now we repeat the experiment, except this time you reach out and pull me toward you. The result is the same. We both meet at our COM. We repeat the experiment again, except this time we both pull simultaneously, me pulling with a slightly greater force than you. The result is exactly the same. We will always meet at our COM.
The same happens when the force in question is repulsive. If two charges are mutually repelled, the total effect will be dependent upon the charges (Qq) but it will be distributed in inverse proportion to the masses. Likewise when two balls collide: in the simple example where they have the same masses and crash against each other at the same speed, if the collision is perfectly elastic (no loss of energy), the ensuing acceleration will be evenly distributed: each ball will keep the same speed but in opposite direction, because the masses are equal.
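The "always meet at the COM" claim can be checked with a small calculation. This is a sketch under simplifying assumptions that are mine, not the quoted poster's: one-dimensional motion, both bodies starting at rest, and a pull of constant magnitude acting equally and oppositely on the two bodies. The function names are hypothetical.

```python
import math

def meeting_point(m1, m2, x1, x2, force):
    """Position where two bodies meet when pulled together by equal and opposite forces.

    Assumes x1 < x2, both bodies initially at rest, constant force magnitude.
    """
    a1 = force / m1                      # body 1 accelerates toward +x
    a2 = force / m2                      # body 2 accelerates toward -x
    gap = x2 - x1
    t_squared = 2 * gap / (a1 + a2)      # the gap closes as (a1 + a2) * t^2 / 2
    return x1 + 0.5 * a1 * t_squared     # where body 1 is at that moment

def center_of_mass(m1, m2, x1, x2):
    return (m1 * x1 + m2 * x2) / (m1 + m2)

# Hypothetical example: a 60 kg and a 90 kg person sitting 10 m apart.
for pull in (1.0, 5.0, 250.0):           # weaker or stronger pull, in newtons
    print(pull, meeting_point(60, 90, 0.0, 10.0, pull), center_of_mass(60, 90, 0.0, 10.0))
```

The strength of the pull changes how soon the two meet, not where: the meeting point coincides with the center of mass for every value of the force.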

This example serves to justify why it must be the same with gravity. Imagine that the two equal balls A and B are mutually attracted due to gravity. Then, if their "resistance" (what we could call their passive gravitational mass) were not proportional to inertial mass, but to something else, the two balls would not meet at the center of mass, but let us say closer to A. However, then they bounce against each other and now they must be accelerated in the opposite direction in proportion to inertial masses, that is to say, equally. The outcome of this would be that the system as a whole would be accelerated in the direction of A. It would have self-accelerated (accelerated without any external force acting on it), which would violate Newton's Laws of motion. And now realize that the same would happen when the system is a compact object: if the constituents of a body were thrown against each other by gravity and not "resisting" in proportion to masses, the body would self-accelerate.

So what is the solution? Well, if we want it both ways, if we want the equivalence principle to hold, i.e. all masses being attracted equally (out of a physical reason, i.e. "transparency") and we also want that the gravitational effect is distributed between the two sides in inverse proportion to inertial masses (out of respect for Newton's Laws of motion), we need that the active gravitational mass ("gravitational charge") be equal to inertial mass.

Translation into a mathematical derivation

I have the disadvantage that I need to express in many words what others see with a formula. But I have also the advantage of always trying to harmonize the two languages. So I will repeat now the whole story with formulae. 

I find it more convenient to reason on the basis of the so-called "reduced mass" reference frame because this handles the total acceleration between the two masses. This frame represents the perspective of one of the masses participating in the interaction (let us say it is M, because it is more massive). M is being accelerated itself, but we disregard that and we attribute all the motion to the other one (m). Based on a derivation that I will spare you, this means that the mass of the object that is deemed to move is a combination of the two masses, which takes this form: Mm/(M+m). This is called the "reduced mass" because it is smaller than either of the two masses (when M is much larger than m, it is just a little less than m) and it is usually referred to with the Greek letter μ. Likewise, the acceleration of m in this frame is in fact the "total acceleration", sometimes called the "relative acceleration".

So let us start with an interaction where we do not pre-judge if the "charge" (the cause for the acceleration, here Q and q) is related or not to mass. In this case, in accordance with Newton's laws of motion, the relative acceleration will be the force F divided by the reduced mass, as follows:




Now let us do what is also usual: we have started with a reference frame where it is easier to reason, but let us move to the habitual one, which is an inertial reference, for example, the center of mass of the system, towards which the two bodies accelerate at their respective acceleration rates. For this purpose, what we have to do -always in accordance with Newton's laws- is distributing the total acceleration in inverse proportion to the masses, i.e. M takes a share of acceleration equal to m's share in the sum of the masses and vice versa, like this:


Now, if we asked no more, gravity would be something akin to electrostatic force, i.e. it would be caused by a sort of gravitational charge that is unrelated to inertial mass. But it cannot be so, we need the red term to drop out, so that the acceleration of any object (no matter its mass) is conditioned exclusively by the mass of the other object (equivalence principle). We need it because that is what matches experiments and what logic requires, in view of the also empirical fact that things are transparent to gravity. And the only way to obtain it is that Qq is equal to Mm, i.e. that the factors generating the gravitational effect are the same ones that determine its distribution between the objects in question.
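The steps of this derivation can be written out as follows. This is a sketch that assumes the inverse-square force law F = kQq/r² used earlier for the generic "charge"; the labels a_rel, a_m and a_M (relative acceleration, and the accelerations of m and of M toward the center of mass) are introduced here only for illustration.

```latex
\begin{align}
a_{rel} &= \frac{F}{\mu} = \frac{kQq}{r^2}\cdot\frac{M+m}{Mm} \\
a_m &= a_{rel}\,\frac{M}{M+m} = \frac{kQq}{r^2\,m} = \frac{k}{r^2}\,M\,\frac{Qq}{Mm} \\
a_M &= a_{rel}\,\frac{m}{M+m} = \frac{kQq}{r^2\,M} = \frac{k}{r^2}\,m\,\frac{Qq}{Mm}
\end{align}
```

Written this way, the factor Qq/(Mm) is what must reduce to 1: with Qq = Mm (and k playing the role of G), the acceleration of m becomes GM/r² and that of M becomes Gm/r², so each object's acceleration depends only on the mass of the other one.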

Support forces: ground and fluids

With this clarified, we can move to the analysis of the other type of force, where inertial mass does play a passive role, i.e. it triggers actual resistance. Time to talk about contact forces. We will look in particular at support forces: those that are present when you are holding an object, say a steel ball, which is attracted to the Earth by gravity, but -without allowing it to fall- you gently lay it on the ground or on a fluid (water or air, for example). Let us consider each of these two cases.

The steel ball seeks support on the ground

Here we label the support force as "normal force", which is equal to the weight of the object, i.e. mg, but of the opposite direction. Because of this equality, students tend to make this mistake: they think that weight and normal force are the pair of forces referred to in Newton's third law: the object would exert weight (= - mg) on the ground (which would be the action) and the ground would exert a reaction force (= +mg). The next step is wondering: here the object is at rest, but if there is always, by definition, action and reaction of equal magnitude, how come that on other occasions objects move at all, how is it that, for example, a steel ball sinks through water?

The mistake is the assumption stated above that weight is the force the object exerts on the ground: we are not facing a Newton third-law pair. Weight is the gravitational force that the Earth exerts on the object, whose 3rd law pair is the gravitational force that the object exerts on the Earth. This is so before Earth and object come into contact. If we now place the object (say the steel ball) on the ground, then it exerts a contact force (which is basically electrostatic in nature, i.e. electron shells repel each other) on the ground, which in turn exerts a contact force on the object. Those two are also a Newton 3rd law pair. Thus we have four forces, ordered by pairs: each pair is of a given nature (either gravitational or electrostatic), of a given magnitude (same amount on both sides), but reciprocal (each element of the pair acts on its opponent). If you want to guess whether the ball moves or not, you need to look at forces acting on the same thing, by picking them out from each pair: here, for example, forces acting on the ball are gravity exerted by the Earth, pointing downward, and contact force exerted by the floor, pointing upward. In this special case, the support contact force always matches the force of gravity, so the object remains motionless. But there may be other cases where the support exerts a stronger force and pushes the object upward (as happens with hot air being pushed up by cold air or cork being buoyed up to the surface of the water). Or it may happen the other way round (gravity is stronger than the support force), as happens when a steel ball sinks in water.

What is the key to telling one case from the other? Here terminology may be a little confusing. We call "normal force" the one exerted by the ground and "buoyant force" the one exerted by the fluid. The reason for the word "normal" is that the plane pushes the ball perpendicularly. But actually "buoyant force" also acts perpendicularly, so we could have called them both "normal"... Better denominations would be those revealing the real difference, the factor accounting for why contact support force is sometimes equal, other times stronger or weaker than gravity, thus causing the ball to remain still or rise or fall... Obviously, the reason has to do with whether the objects involved are solids or fluids.

When we talk about solids (say our steel ball is resting on the rocky ground of the Earth), the support force automatically matches gravity because there is discipline in both armies, the army of the ball and the army of the ground. The defining characteristic of a solid is that cohesion among its particles is high (strong bonds), so when they are pushed, they hold tight, backing up each other, to avoid penetration. 

When applied to the ball, this means that it will transmit the force of gravity. Certainly, the idea that weight is the force exerted by the ball on the ground is not accurate, but it is not without any truth. Weight is the force suffered by the ball, but the ball "transmits" it to the ground. When I step on a ball, the ball conveys my push to the floor. Likewise, when the Earth pulls from the ball, the ball also conveys this pull, in the form of a push, to the ground. But it is important to note that this transmission is fully effective in this case only because all the constituents of the ball, when facing the opposition of the ground, find the collaboration of their colleagues and cooperate to avoid being disbanded.

Likewise, with the ground. The Earth is being accelerated by the ball, which means a tiny acceleration but of many constituents, matching exactly mg. But even if I helped the ball by jumping on it, the ground would resist, as it counts for this purpose with huge resources, which are fully committed in terms of cooperation, so the call will be answered by as many particles as are necessary to avoid penetration. (For this reason, "normal force" is called a constraint force, because its effect is banning a certain direction of progress for the object: something clashing against a firm ground can take any direction it pleases, except penetration. Tension, by the way, is another constraint force, whose role is avoiding that the object escapes away.) And how many are in particular needed? Well, I am not going to say that we can ascertain such a number. What we can guess is that it will be particles sufficiently committed to stop the progress of the ball. 

Thus it turns out that this time the support force exerted by the ground on the ball matches the gravity force on the same ball. But it does not need to be like that. It is not like that when...

The steel ball seeks support on water

Here the starting point is identical. The ball is being pulled by the Earth, the ground (including the water) is being pulled against the ball. However, the interaction will be weaker because of the lack of firmness of the support. The water's army, certainly, may still count on many, many soldiers (think of the sea for example), but they show less solidarity: cohesion among troops is weaker in a liquid and even weaker in a gas. The sequence is as follows. The surface of our steel ball touches the water. Steel is denser than water, which means that the first layer of steel soldiers brings into the interaction more mass than the first layer of water molecules. Unfortunately, the latter don't have strong bonds with their liquid medium, so when they are outnumbered, they slide over one another, giving way. This does not mean that the water soldiers abandon the battle: the whole army keeps attempting to recover the lost volume, from all directions, as they also "fall" attracted by gravity: the water molecules placed between the surface and the bottom of the object push, like a piston, those below, which in turn bounce against the ground and push the ball upward; those above push downward the top side; while the lateral ones neutralize each other. The net difference is that the object is always pushed upward by those water molecules that have been displaced by its volume, which is the Archimedes principle. Thus, unlike in the previous case, because the support is a liquid, it does not involve unlimited resources: only those that I have just mentioned. That gives us the measure of the contact force applied by water on the ball (upward): mass of displaced water times g (for completeness, "g" makes sense because the particles pushing from the top are falling by gravity whereas the lower ones have been pushed by others also falling by gravity and have rebounded due to normal force exerted by the ground).

This will also be the contact force to be exerted by the ball on the water (downward), not only because this follows from Newton's third law, but also because we know the reason: if the support water medium does not involve more soldiers in the battle, neither will the invading army of the ball. By this I do not mean that only a limited number of particles will participate on the side of the ball: I am not sure if all will participate or not, but in any case, only this level of "mass cooperation" will be requested. If you think of it, this is the same that happens with the Earth in the case of normal force: it only does as much as needed to avoid penetration.

In conclusion, we can guess the value of the support force exerted by a medium against (and also by) an object falling due to gravity, as per this rule of thumb: it is g times the mass that will cooperate/be involved in the interaction, which is 100% of the mass of the less massive object (when the support material is a solid) or the mass of the displaced fluid (when the support medium is a fluid).
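The rule of thumb can be turned into a small calculation. What follows is only a sketch: the function names are hypothetical, the densities are rounded textbook values, and the fluid case assumes that the object is fully submerged.

```python
G_ACCEL = 9.81  # m/s^2, approximate value of g at the Earth's surface

def support_force(object_mass, object_volume, fluid_density=None):
    """Upward support force in newtons: g times the mass that 'cooperates'.

    fluid_density=None means solid ground: the whole mass of the object counts.
    Otherwise the cooperating mass is that of the displaced fluid (fully submerged object).
    """
    if fluid_density is None:
        cooperating_mass = object_mass
    else:
        cooperating_mass = fluid_density * object_volume
    return cooperating_mass * G_ACCEL

def net_downward_force(object_mass, object_volume, fluid_density=None):
    """Weight minus support force: positive means sinking, negative means rising."""
    return object_mass * G_ACCEL - support_force(object_mass, object_volume, fluid_density)

# Hypothetical one-litre objects; densities in kg/m^3 are rounded.
VOLUME = 1e-3
WATER, STEEL, CORK = 1000.0, 7850.0, 240.0
steel_mass, cork_mass = STEEL * VOLUME, CORK * VOLUME

print(net_downward_force(steel_mass, VOLUME))          # on solid ground: zero
print(net_downward_force(steel_mass, VOLUME, WATER))   # in water: positive, the ball sinks
print(net_downward_force(cork_mass, VOLUME, WATER))    # in water: negative, the cork rises
```

On the ground the support force matches the weight exactly; in water it is capped at the weight of the displaced water, which is less than the weight of the steel ball and more than the weight of the cork.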


Sunday, 12 January 2020

Free photons are massless... because they are free!



The correct thing to say is that photons are massless, even if they have momentum and energy, which is what explains that they interact with matter: that is what happens in the photoelectric effect, for example, where photons knock electrons out of a metal plate.

This poses, however, two difficulties for us students:

First, how to reconcile this with the definition of momentum as "mass times velocity"? Doesn't this mean that one of the two things is wrong: either light is massless, in which case it should not have momentum, or it does have momentum, in which case it should have "some sort of" mass...?

Second, the famous formula:

E = mc^2

which people wear on T-shirts... doesn't it mean that light, which is energy, is mass?

Given this, some physicists answer: well, light does have mass, it has the so-called "relativistic mass",  that is why it interacts with things and that is why it has momentum. To justify this, they follow this route:

- The fully-fledged formula is actually not the one that you see on T-shirts, but this longer one:

E^2 = (m_0 c^2)^2 + (pc)^2

- In the theory of relativity, momentum is not simply m*v, but is preceded by the so-called Lorentz or gamma factor:

\mathbf{p} = \frac{1}{\sqrt{1 - (v/c)^2}} m_0 \mathbf{v}

- If you replace p in the first formula with its relativistic value, after a few algebraic operations, you get:

E = \frac{m_0 c^2}{\sqrt{1 - v^2/c^2}}

- In this new formula, m with a subscript (m_0) is "rest mass", that is to say, the mass that a particle has when it is at rest relative to you. A free photon cannot be at rest relative to you because you cannot travel at the speed of light, but it can have relativistic mass, which is rest mass times the gamma factor:

m = \frac{1}{\sqrt{1 - (v/c)^2}} m_0

- But then... if v in this formula is the speed of the particle and c is the speed of light, when you speak of a photon, it turns out that v = c, so the gamma factor is 1/0... If you add to that that the rest mass of the photon is also 0, the expression looks as follows: (1/0)0. Anyhow, it looks like an undefined expression. How do you make sense of this?

Well, you can ask those physicists who defend this approach. There are several explanations around, but I simply don't grasp them. I will only mention that the "excuse" is often that v tends to c and you get the undefined only in the limit, when v = c....
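For reference, the "few algebraic operations" mentioned in the route above can be spelled out. This is a sketch that writes γ for the gamma factor and is valid only for v < c, where γ is finite:

```latex
\begin{align}
E^2 &= (m_0 c^2)^2 + (pc)^2 = m_0^2 c^4 + \gamma^2 m_0^2 v^2 c^2
     = m_0^2 c^4 \left(1 + \gamma^2 \frac{v^2}{c^2}\right) \\
\gamma^2 \left(1 - \frac{v^2}{c^2}\right) &= 1
     \;\Rightarrow\; 1 + \gamma^2 \frac{v^2}{c^2} = \gamma^2 \\
E^2 &= \gamma^2 m_0^2 c^4 \;\Rightarrow\; E = \gamma\, m_0 c^2 = \frac{m_0 c^2}{\sqrt{1 - v^2/c^2}}
\end{align}
```

Every step uses γ as an ordinary number, which is exactly what stops being true at v = c.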

In turn, the majority of physicists reject the above reasoning and the very concept of "relativistic mass".

For them the famous formula,

E = mc^2

is only applicable for particles at rest with the relevant reference frame, not for free photons. In other words, for massive particles or for a photon that is trapped within a massive object. In effect,  a photon can contribute to the mass of an object if, for example, it goes into a cavity with perfectly reflecting mirrors: in this case, if you could measure with an incredibly precise instrument, you would be able to detect a tiny increase in the inertia of the object. However, as long as the photon is free, it has no mass, full stop.

The photon does have momentum, although its value does not come from the usual formula (mass times velocity), but from quantum mechanics. The reasoning is in particular:

- In the above full formula, replace the rest mass m_0 with 0 and simplify as follows:

\begin{array}{l}
E^2 = m_0^2 c^4 + p^2 c^2 = p^2 c^2 \to \\
E = pc
\end{array}

- As per quantum mechanics, the Energy of a photon is its frequency f times the Planck constant h:

E = hf

- So the photon's momentum is:

p = \frac{hf}{c}
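As a numerical illustration of these three steps, here is a minimal sketch. The function names and the choice of green light with a 500 nm wavelength are illustrative assumptions; h and c are the exact SI values.

```python
import math

H = 6.62607015e-34    # Planck constant, J s (exact in SI)
C = 299_792_458.0     # speed of light, m/s (exact in SI)

def photon_energy(frequency_hz):
    """E = h f"""
    return H * frequency_hz

def photon_momentum(frequency_hz):
    """p = E / c = h f / c, obtained from E^2 = (pc)^2 with zero rest mass."""
    return photon_energy(frequency_hz) / C

# Hypothetical example: green light with a wavelength of 500 nm.
wavelength = 500e-9
frequency = C / wavelength

print(photon_energy(frequency))     # joules
print(photon_momentum(frequency))   # kg m/s, the same number as h / wavelength
```

The momentum comes out at roughly 1.3e-27 kg m/s: tiny, but not zero, and no mass was needed to compute it.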

That said, what is my point? Well, I want to highlight that the mistake of those "some physicists" lies in the step of replacing p with the formula for momentum (relativistic momentum, in this case) and what this error means, because this is a very common issue in many fields of knowledge.

You cannot do this move (if we were playing, we would say that it is an "illegal move") when you are talking about a free photon, because, if you do, you are using the concepts at hand ("free photon", "mass" and "momentum") beyond their respective domains of applicability, which in turn have been defined based on the observation of how nature works.

It is not strange, therefore, that you arrive at an undefined expression, i.e. a dead-end street, which is proof that you have previously taken a wrong turn. An undefined expression is one that breaches the rules for making meaningful mathematical expressions, so whenever you end up with one, it is because you took the wrong path.

And it is not an excuse to say that, "in the context of calculus, you often encounter undefined expressions, but this difficulty is overcome through the use of limits". It is true that in calculus, limits enable you to get rid of the undefined, but that is because the undefined should never be there from the start and what you do by resorting to limits is solving that initial trouble!

I will explain myself briefly because the matter deserves a calmer discussion and now I just want to outline it. When you seek, for instance, an instantaneous speed (a derivative), you are frustrated because we define speed as the ratio between space traversed and time elapsed. This is a logical thing to do, because that is the way to physically measure speed. Hence if something is instantaneous, you should divide by 0, which is mathematically forbidden... But you don't despair, because you understand that one thing is the operative definition of concepts (how in practice you obtain their value) and another is the reality that the concepts aim at grasping. In this case, such reality is a "state of motion" that can perfectly well be changing continuously and thus has a different value at each instant. This reality surpasses measurement and surpasses the algebraic expression of a ratio. So the trick cannot be easier: get rid of the ratio, i.e. make a number of algebraic operations, so that the time interval is not placed anymore in the denominator of a fraction. Then it is said that you make such an interval tend to 0 and eliminate it. I would simply say that you make it equal to 0, because the concept itself, in its purest form, does not need any time interval at all.
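A worked example may help. Take a hypothetical position function s(t) = t² (chosen here only for illustration) and write the average speed over an interval Δt:

```latex
\begin{align}
\frac{s(t+\Delta t) - s(t)}{\Delta t}
  &= \frac{(t+\Delta t)^2 - t^2}{\Delta t}
   = \frac{2t\,\Delta t + \Delta t^2}{\Delta t} \\
  &= 2t + \Delta t
\end{align}
```

Once the algebra has removed Δt from the denominator, nothing forbids setting Δt = 0, and what remains, 2t, is the instantaneous speed.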

In conclusion, in calculus we get an undefined because we ourselves created a language problem (we used a ratio to refer to what is not a ratio in nature). So it is legitimate that we sort out the problem by trying to extract the underlying reality that our language was hiding.

Instead here, if we get an undefined, it is because we are being inconsistent with nature and with the concepts that we have adopted to reflect nature.

The key assumption is that there are things that can travel at the speed of light (we call them light, among other things) and others that can't travel at the speed of light (we call them massive). [I would add that sometimes the former (light) becomes the latter (mass) when it gets trapped among the walls of something massive. And I would like to generalize the idea to other forms of energy, but this would be a speculation beyond the scope of this post.]

Given this, the momentum of massive things is defined as mass times velocity (preceded by the gamma factor, if you want to account for relativistic effects). Hence this concept is designed from scratch as one applicable only to objects having mass, that is to say, being slower than light. Therefore, if you plug this definition (gamma * mass * v) into the energy formula, it must be because you are assuming that you are facing something massive, that is to say, something whose v is less than c. Otherwise, if you were thinking of a free photon, you would have simply made the wrong move. So forget about playing with limits for justifying your mistake.

Last but not least, in this light, if free photons' momentum is just hf/c, couldn't we dispense with this concept and talk only about light's energy, especially bearing in mind that, if we use natural units (where c = 1), the numerical value of the momentum of light equals that of its energy? I initially thought that, but then I realized that energy is a scalar (it has just magnitude), whereas momentum is a vector (it has direction)... Thus the concept of the photon's momentum is useful when analyzing its collision with another particle, like an electron ("Compton shift"): after the collision, each particle takes a direction (it is said that the photon is "scattered" and the electron "recoils"), but both directions are correlated, because momentum must be conserved. Thus by observing the direction of the photon's scattering, one could deduce the direction of the electron's recoil... were it not for the fact that... how can we guess the scattering angle of the photon? This remark paves the way for an interesting reflection about the uncertainty principle and randomness, but that will be another day.
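The correlation between the two directions follows from plain vector bookkeeping. The sketch below assumes, for illustration only, a two-dimensional collision in which the electron is initially at rest and the incoming photon travels along +x; the function names are hypothetical, and the momentum of the scattered photon is taken as an input rather than derived from its angle.

```python
import math

def electron_recoil(p_in, p_out, scattering_angle_rad):
    """Momentum vector (px, py) of the recoiling electron.

    p_in, p_out: magnitudes of the photon momentum (hf/c) before and after.
    Momentum conservation: incoming photon = scattered photon + electron.
    """
    photon_out = (p_out * math.cos(scattering_angle_rad),
                  p_out * math.sin(scattering_angle_rad))
    return (p_in - photon_out[0], 0.0 - photon_out[1])

def direction_rad(vector):
    """Angle of a 2D vector measured from the +x axis."""
    return math.atan2(vector[1], vector[0])

# Hypothetical numbers in arbitrary units: photon scattered 60 degrees upward.
theta = math.radians(60)
recoil = electron_recoil(1.0, 0.8, theta)
print(recoil, math.degrees(direction_rad(recoil)))
```

If the photon goes up, the electron must go down: the energy alone, being a scalar, could not tell us that.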

Wednesday, 1 January 2020

Baptizing perpendicularity and understanding dot product

In my studies, I like to spot mental models that appear repeatedly, in either the same or different areas. This is useful, because often when analyzing a problem, you hit on the idea that it can be solved through a model that you are acquainted with.

One of these models is the "generalization of a concept", which is often used in Mathematics and about which there is extensive literature.

One way, among others, to look at this process is seeing it as the realization that one was assuming an arbitrary restriction in the conditions of the phenomenon. Since reality proves that the world does not always have such restriction, it is lifted. Thus you elaborate a more general concept, which is valid for cases built with and without the relevant restriction.

But the point of this post is that, once you have given birth to the general concept, you should be able to baptize it. By this I mean, not so much giving it a more or less appropriate name, as giving it a faithful abstract description, which communicates its essence. This is what is going to enlighten you as to the meaning of the concept in question, both at the elementary and the general level. Thus it may happen that you do not understand well what you were doing with an apparently basic idea until you super-generalize it; or you don't grasp the general concept unless you see its evolution from the basic level.

I will illustrate this idea with an example: the generalized meaning of perpendicularity and how this helps understanding why and to what extent dot product works, when done analytically, i.e. component-wise. In this work, I will start with a more mathematical/abstract approach, but soon give way to the paradigm that this Blog promotes, which is: see everything from a practical point of view as a problem-solving technique.

Perpendicularity or orthogonality

This notion appears with a geometric meaning: two vectors (understood as "little arrows") are perpendicular if they form an angle of 90 degrees.

However, geometric definitions work in 2D or 3D spaces. But what if you lift this restriction and start playing with 4D or 10D or even infinite-dimensional vectors? (That is by the way another nice generalization: functions are infinite-dimensional vectors, where the input plays the role of dimensions and the output acts as coefficients or coordinates or values of the vector in each dimension.)

Well, in that case, you use the algebraic version of the dot product.

The dot product initially receives also a geometric definition: it is an operation whereby the moduli of the vectors are multiplied, but then you apply a percentage, a trigonometric ratio (the cosine of the angle formed by the vectors), which in turn measures to what extent the vectors point in the same direction. Thus when one vector is lying over the other (they are parallel = angle is zero), the cosine is 1, so the ratio is 100%, because both vectors point in the same direction. Instead, when the vectors are perpendicular (angle is 90 degrees) cosine is 0, which means that the ratio is 0% as well, because the vectors point in totally different directions.

However, the cosine stops being something you can read off a drawing once you leave 3D behind. Fortunately, the dot product technique also evolves to adapt to the new higher-than-3D environment and takes an algebraic form guaranteeing the same effect: the dot product is also the result of (i) multiplying the respective coordinates and (ii) adding up those products.

This works for the simplest cases. For example, the dot product of the unit vectors of a 2D orthonormal basis [(0,1) and (1,0)] is 0x1 + 1x0 = 0 + 0 = 0, thus confirming that such vectors are perpendicular. But it also works in the advanced cases, like in Fourier analysis, where the sum is an integral (because the number of dimensions is infinite), but the structure is analogous.
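To make the "multiply the coordinates, then add" recipe tangible, here is a minimal Python sketch. The function names `dot` and `function_dot`, the 4D pair and the choice of sine and cosine over one full period are illustrative assumptions, and the integral is only approximated numerically (midpoint rule).

```python
import math

def dot(a, b):
    """Algebraic dot product: multiply matching coordinates, then add."""
    return sum(x * y for x, y in zip(a, b))

def function_dot(f, g, start, stop, samples=10_000):
    """Same recipe for functions: the sum becomes an (approximated) integral."""
    step = (stop - start) / samples
    total = 0.0
    for k in range(samples):
        t = start + (k + 0.5) * step   # midpoint of the k-th slice
        total += f(t) * g(t)
    return total * step

# Unit vectors of a 2D orthonormal basis.
print(dot((0, 1), (1, 0)))
# A 4D pair (hypothetical values) that no drawing could show as perpendicular.
print(dot((1, 2, 3, 4), (4, 3, -2, -1)))
# Sine and cosine over one full period: the result is zero up to rounding.
print(function_dot(math.sin, math.cos, 0.0, 2 * math.pi))
```

The first two calls give exactly 0; the third gives a number that differs from 0 only by floating-point noise, which is the numerical counterpart of saying that sine and cosine are "perpendicular" over a full period.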

Time now for the baptism. We have kept the term "orthogonality", but this is just a vestige of the geometric context where the concept was born, in which dimensions were directions in the plane. The abstract meaning is "total dimensional discrepancy", that is to say, if we are comparing vectors a and b, a has no component at all (zero amount) in the dimensions where b has components and vice versa. As to the dot product, in the new context, it is often given another name, "inner product". But more important than that is its new abstract meaning: if it initially revealed the extent to which vectors share a direction in the plane, in a generalized sense it means "dimensional similarity". Thus for example in Fourier analysis dimensions are time points in one reference frame or frequencies in another.

Problem: orthogonality is the aim but it is also the condition

Now to the problem. This algebraic dot product is a tool for detecting orthogonality, but at the same time it only works when there is already orthogonality, in that the basis itself is orthogonal (and, as we will see, made of unit vectors). How to prove this?

On the StackExchange forum, I have found an answer that presents the solution very "mathematically". Let us follow it and later check how you could have also followed this path "intuitively", just by understanding the deep meaning of perpendicularity:

We first multiply the components of the two vectors in a "total war" manner (each component of one vector against both components of the other):
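Written out, and assuming the two vectors are expressed as a = a_x i + a_y j and b = b_x i + b_y j (a notation inferred from the terms discussed further on), the expansion would read:

```latex
\mathbf{a}\cdot\mathbf{b}
  = (a_x\,\mathbf{i} + a_y\,\mathbf{j})\cdot(b_x\,\mathbf{i} + b_y\,\mathbf{j})
  = a_x b_x\,(\mathbf{i}\cdot\mathbf{i})
  + a_x b_y\,(\mathbf{i}\cdot\mathbf{j})
  + a_y b_x\,(\mathbf{j}\cdot\mathbf{i})
  + a_y b_y\,(\mathbf{j}\cdot\mathbf{j})
```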



Then we apply the definition of the dot product, in two senses:

  • the product of the two unit basis vectors i and j, if we are requiring that the basis be orthogonal, will be again 1x1 but multiplied by a 0% ratio of dimensional coincidence, so it is 0; because of this, the two middle terms vanish;
  • the product of the unit basis vector i with itself is the product of the moduli (1x1) with a ratio of 100% dimensional coincidence, so it is 1; the same applies to the product of j with itself; because of this, the first and the last products become simple scalar products of the homogeneous components.
Thus the expression reduces to the following:
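With i·i = j·j = 1 and i·j = j·i = 0, and assuming the vectors are written as a = a_x i + a_y j and b = b_x i + b_y j, the reduced expression would be:

```latex
\mathbf{a}\cdot\mathbf{b}
  = a_x b_x \cdot 1 + a_x b_y \cdot 0 + a_y b_x \cdot 0 + a_y b_y \cdot 1
  = a_x b_x + a_y b_y
```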



We thus conclude that this algebraic dot product technique is valid to the extent that we are relying on an orthogonal basis, because otherwise the two middle terms would not have vanished and the answer would depend on which angle, other than 90 degrees, separates the two basis vectors, i and j.
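A small numerical check can make this dependence visible. The sketch below is only an illustration: the coordinates (2, 1) and (1, 3) are hypothetical, the basis vectors are assumed to have unit length, and `full_dot` simply keeps all four terms of the "total war" product, with the product of i and j set to the cosine of the angle between them.

```python
import math

def componentwise_dot(a, b):
    """The algebraic shortcut: a_x*b_x + a_y*b_y."""
    return a[0] * b[0] + a[1] * b[1]

def full_dot(a, b, angle_deg):
    """All four terms, for unit basis vectors i and j separated by angle_deg,
    so that i.i = j.j = 1 and i.j = j.i = cos(angle)."""
    i_dot_j = math.cos(math.radians(angle_deg))
    middle_terms = (a[0] * b[1] + a[1] * b[0]) * i_dot_j
    return a[0] * b[0] + middle_terms + a[1] * b[1]

a, b = (2, 1), (1, 3)   # hypothetical coordinates, chosen for illustration
for angle in (90, 60):
    print(angle, componentwise_dot(a, b), round(full_dot(a, b, angle), 6))
```

With a 90-degree basis the two results agree (up to rounding); with a 60-degree basis the middle terms survive and the shortcut no longer gives the true dot product.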

Now, the intuitive and practical approach. The coordinates of a vector are like the information provided by Cinderella's slipper: quantities that you measure to serve as "clues" for catching criminals (solving problems). You can also call them "whistle-blowers". Obviously, a spy who repeats exactly the same as another one is superfluous: you shouldn't pay him! Hence the minimum requirement for hiring a set of whistle-blowers is that each of them contributes something new, even if they partly repeat one another. In mathematical jargon, it is said that those informers are "linearly independent". I would say that they are "helpful". But one that provides totally fresh and new information might be preferable, because it is "original" (technically, "perpendicular"), so that this way you can optimize your network: each specialist will investigate a different fact.

Let us check if that is the case. The dot product is like combining two sets of reports about two suspects (two vectors), so as to check to what extent both are "pointing to the same solution of the crime". For this purpose, the informers lay their reports on the table. All possible combinations among reports are like the "total war" product mentioned before. But soon you realize that you can mix apples with apples and pears with pears, but not apples with pears.

Combining apples with apples is what you do when you multiply homogeneous (100% dimensionally coincident) quantities, like in the above-mentioned first and last terms: a_x * b_x and a_y * b_y. For example, you combine the reports for direction X (a_x * b_x) and it may happen that you get a higher or lower positive product (because you combine + with + or - with -) or a higher or lower negative product (because you multiply + with -); that will mean that direction X contributes, respectively, a vote for coincidence (if the product is +) or for discrepancy (if it is -). Finally, you do the ballot or vote counting: you add up the mutually scaled reports for X and Y and thus get the overall measure of coincidence.

Should you also add up the middle terms, i.e. the products between heterogeneous quantities, between apples and pears (a_x * b_y or a_y * b_x)? No, because by definition you know that these clues are not overlapping at all, they refer to completely different facts; hence if the purpose is to learn to what extent they point in the same direction (generically, they are dimensionally coincident), the answer is zero, so they make your life simpler by not casting any vote.

That is how orthogonality at the level of information sources (definition of basis vectors) helps you detect orthogonality (or any other degree of dimensional coincidence) at the level of problem solving.








Thursday, 5 December 2019

Derivative as a "rail" for growth

As I have another Blog in Spanish where I mix Mathematics with law, literature and whatever else, I am going to be more restrained here and speak only about Math and Physics.

Since I wrote my latest posts, I have learnt much more, so I think that I can aim at posting only things that should be orthodox, even if I cannot help being somehow “creative”.

Today’s post is about a visual way for understanding a differentiation rule, the “power rule”.

Let us choose a well-known function, like L = f(t), which takes as input the time elapsed and generates as output the length traversed by a body within that time lapse.

This function, like any function, can be viewed as the product between the input and the average change rate. This is self-evident, since L is obviously v * t = (L/t) * t = L.

In the simplest manifestation of the function (say for example L = 2t), v is a constant, so it will be the same no matter the interval. If the body has traversed 2 meters, it will have done it in 1 second, so the ratio 2/1 is 2.  If displacement is 1 meter, time required will be 0.5 seconds, the ratio being again 2.  And what if we keep reducing the interval considered until it is 0, what we call an “instant” without duration? Well, division by 0 is not permitted: it leads to an “undefined”, i.e. an expression that breaches the rules about how to build mathematical expressions… This shouldn’t come as a surprise: if you define velocity as a rate of change over time, how come that you now want to calculate it without allowing for the lapse of any time at all? However, here we can perfectly stipulate that this playful instantaneous rate is the same as in all other cases, since it is in fact constant: it is 2 as well.

In a more complicated form, the function is L = t^2. In our example, this will happen when the body is subject to acceleration. Let me remind you that the kinematic formula for obtaining the displacement of a body that starts with velocity “v0” and undergoes acceleration “a” is L = v0*t + a*t^2/2. If we stipulate that the body starts at rest (v0 = 0) and is subject to a = 2 m/s^2, then we get the above-mentioned function, L = t^2. This can still be viewed as L = t * t, where the first t is again the average velocity, but not the instantaneous one anymore, as velocity is continuously changing (there is acceleration). (It may surprise you that I refer to t as a velocity, but it is fine, because I mean its numerical value; if 3 seconds have elapsed and the body has moved 3*3 = 9 meters, what I mean is that it has moved at an average of v m/s during t seconds; it just happens that the numerical value of v is the same as the number of seconds t that have elapsed.)
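As a quick sanity check, here is a minimal Python sketch, assuming the standard constant-acceleration formula L = v0*t + a*t^2/2 with the values stipulated above (v0 = 0, a = 2 m/s^2). The function name `displacement` and the sample times are illustrative choices.

```python
def displacement(t, v0=0.0, a=2.0):
    """Constant-acceleration displacement: L = v0*t + a*t**2/2."""
    return v0 * t + a * t ** 2 / 2

for t in (1.0, 2.0, 3.0):
    length = displacement(t)
    average_velocity = length / t
    # With v0 = 0 and a = 2, the average velocity has the same numerical value as t.
    print(t, length, average_velocity)
```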

Because this instantaneous velocity is variable, it will be a function (the derivative or L’), where the argument can only be the one that we are in possession of (time), although arranged differently. For this sort of functions, the so-called “power functions” [where the argument is raised to progressively higher powers (t, t^2, t^3…)], the rule for finding the derivative is the so-called “power rule”, which is L’ = n * t^(n-1), i.e. (i) reduce the power by 1 and (ii) multiply by the power.

We can learn this rule by heart or try to understand the logic behind it. 

I asked about such logic here, but I am not very convinced by the answer, which extracts the proof from another differentiation rule, the product rule. My intention, however, is to find a logic that could later illuminate precisely that other rule.

Another approach is to carry out the normal operation for finding derivatives and repeat it with a few power functions, until a pattern emerges.


In particular, with the function L = t^2, you would reason as follows:
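A reconstruction of that reasoning, assuming it is the usual limit of the difference quotient (the symbol \Delta t for the small time interval is an assumption), might read:

```latex
L' = \lim_{\Delta t \to 0} \frac{(t + \Delta t)^2 - t^2}{\Delta t}
   = \lim_{\Delta t \to 0} \frac{2t\,\Delta t + (\Delta t)^2}{\Delta t}
   = \lim_{\Delta t \to 0} \left( 2t + \Delta t \right)
   = 2t
```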


If you repeat this operation with L = t^3, the result is L’ = 3t^2 and thus the pattern represented by the power rule (L’ = n * t^(n-1)) clearly shines through.
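The pattern can also be watched numerically. The following Python sketch is just an illustration: it computes the average rate of change of t^n over ever shorter intervals, using exact fractions to avoid rounding noise, and prints it next to the value given by the power rule. The function names, the sample time t = 3 and the interval sizes are assumptions.

```python
from fractions import Fraction

def difference_quotient(n, t, dt):
    """Average rate of change of L = t**n between t and t + dt."""
    return ((t + dt) ** n - t ** n) / dt

def power_rule(n, t):
    """Value predicted by the power rule: n * t**(n - 1)."""
    return n * t ** (n - 1)

t = Fraction(3)
for n in (2, 3, 4):
    for dt in (Fraction(1, 10), Fraction(1, 1000)):
        print(n, dt, difference_quotient(n, t, dt), power_rule(n, t))
```

For each power, the quotient gets closer to the power-rule value as the interval shrinks.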

I am not very convinced, however, about this idea that the interval “tends to zero in the limit”, because this suggests that the value obtained for the derivative is an “approximation”. But it is no such thing, it is an exact value. It is true that in the real world you can only measure a velocity as a ratio between distance and time, but the abstract idea of a body having a state of motion at a given “instant” (null interval) is perfectly valid. The visual interpretation that I propose is precisely a description of things where you can dispense with the concept of a ratio. In particular, the idea is viewing the function (L = length traversed in our original example) as the area (A) (or volume or hyper-volume) of a geometric figure, which is stretched as you pull in the direction of the input from one or several sides; that side (or those sides) thus provides a sort of path or rail constraining the growth of the figure.

The simplest manifestation is what we can call rectangular growth, as shown in these pictures:




In both cases, the function is the same: it is A = 3x (which is the equivalent of L = 3t in the original example). The characteristic of this function is that there is a neat separation of roles: one dimension (either the horizontal or the vertical one, the choice is arbitrary) plays the role of input or growing dimension, which pulls from the other side and is logically variable; you take this out and what is left, the other dimension, is fixed, and one can view it as the side that you pull from to stretch the figure, which acts as a path or rail constraining its growth.

Next step is what we can call triangular growth, like in these pictures:






Again the function is the same for both pictures: it is A = x^2/2 (which would be L = t^2/2 in the original). Here both dimensions are growing harmoniously at the same time. After taking out one dimension, what is left is the average growth rate, which is x/2, a half-side. In turn, the derivative will be the side that you pull from and that is therefore determining how much the area grows, which is one side (x). The choice of which side to pull from is arbitrary: in the first picture I chose to pull from the vertical one; in the second one we pull from the horizontal side; but in any case we pull from only one side. The novelty now is that, yes, certainly, such path is not fixed, it is variable, but that is not a problem. We said that the derivative would be the side from which we pull and that constitutes the path for growth of the other dimension, and there is no reason for changing our mind just because such path is variable: it is in fact continuously widening following the hypotenuse of the triangle and that is why we say that the derivative, instead of the coefficient 3 or whatever number, is a variable, x_v as in the left picture or x_h as in the right one (either of them is fine, since both grow harmoniously).

In turn, a square-like growth, represented by the function A = x^2, looks as follows:


The average growth rate is one side (x), which is what is left after taking out one dimension. What is the derivative? It is two sides of the square, that is to say, 2x. The visual reason is that in order to obtain a square-like growth, as suggested by the expression x^2, you must pull from two sides, because (unlike what happened in the triangle case) only this way can you manage to cover the area that the expression x^2 demands. We could expect this by realizing that the square area x^2 is the sum of the areas of two triangles (x^2/2 + x^2/2), so the growth path should also be double (x_h + x_v = 2x). And, yes, certainly, these paths are both variable, but that is not a problem: it just happens that each side is acting as a rail for the growth of the other one and growing itself within the other’s channel. So, in this case, we have two weird aspects: unlike in the rectangular case, the path is variable and, unlike in the triangle case, there is a mixture of roles, as both sides are reciprocally and simultaneously pulling from and constraining each other. But that is not dramatic, we just address the situation mutatis mutandis: because the path is variable, the derivative is x; because there are two paths, it is 2x.
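One possible way of checking the "two sides" reading with numbers is sketched below in Python. It assumes that growing the square from side x to side x + dx adds one strip along each of the two pulled sides plus a small corner square; this decomposition and the names used are illustrative assumptions, not something shown in the pictures.

```python
from fractions import Fraction

def square_growth(x, dx):
    """Pieces of area gained when a square of side x grows to side x + dx."""
    strip_h = x * dx    # strip added along the horizontal side
    strip_v = x * dx    # strip added along the vertical side
    corner = dx * dx    # small square where the two strips meet
    return strip_h, strip_v, corner

x = Fraction(3)
for dx in (Fraction(1, 10), Fraction(1, 1000)):
    strip_h, strip_v, corner = square_growth(x, dx)
    gained = (x + dx) ** 2 - x ** 2
    # The three pieces account for all the growth; per unit of dx,
    # the two strips give 2x and the corner gives dx.
    print(dx, gained == strip_h + strip_v + corner,
          (strip_h + strip_v) / dx, corner / dx)
```

The two strips always contribute exactly 2x per unit of stretch, whatever the size of dx, while the corner's share shrinks together with dx.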

If we now face a cube-like growth, where the function is V = x^3, we can infer that the average growth rate will be one side, a face of the cube (x^2), and the derivative, based on the visual criterion, should be 3 sides (3 faces of x^2 each = 3x^2). Why? Because that is what you pull from to make the cube grow in its three dimensions and also grow along the full volume embraced by those sides, as demanded by the expression of the function. The cube has twice as many sides as dimensions (six faces for three dimensions), but you just need to pull from one side per dimension (half of the sides), what we could call a “half-skin”.

From here onwards, with 4D or higher-dimensional figures (so-called “hyper-cubes”), you apply the same algebraic rules:

  • since you must pull from the sides, to find yourself at the level thereof, you go down one dimension (you reduce the power by one) and
  • since you must pull from as many sides as the figure has dimensions, you multiply by the number of dimensions (which is the power itself).
Of course, as the case of the triangle exemplifies, all this is assuming that there is growth to the full extent of the figure (and not more); if you want a partial (or an extended) growth, the function will include the corresponding coefficient and so will the derivative.
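The same bookkeeping can be sketched for any number of dimensions. The Python example below assumes the growth of the figure is split with the binomial expansion of (x + dx)^n, and that the piece proportional to dx corresponds to the n sides you pull from; the function name, the sample values and the optional coefficient (1/2 for the triangle) are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def growth_pieces(n, x, dx, coefficient=1):
    """Pieces of coefficient * ((x + dx)**n - x**n), grouped by power of dx."""
    return [coefficient * comb(n, k) * x ** (n - k) * dx ** k
            for k in range(1, n + 1)]

x, dx = Fraction(2), Fraction(1, 1000)
for n, coefficient in ((3, 1), (4, 1), (2, Fraction(1, 2))):
    pieces = growth_pieces(n, x, dx, coefficient)
    pulled_sides = pieces[0] / dx                  # first-order piece per unit of dx
    power_rule = coefficient * n * x ** (n - 1)    # lower the power by one, multiply by the power
    print(n, coefficient, pulled_sides == power_rule)
```

In each case the first-order piece, divided by dx, matches the power rule exactly, including the triangle with its coefficient 1/2.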

Conclusion: the objective was to find a way to describe the derivative (instantaneous rate of change), at least for power functions, that does not rely on a ratio, and I think that we have found it in the idea that the derivative is the side/s that you pull from in order to stretch a geometric figure.

Some other aspects to be discussed to complete the study:

  • What if the input of the function is not a side of the figure but its radius?
  • Can you see the function as a sum of individual growths in each dimension and the derivative as the sum of the velocities in each dimension?
  • Link with the calculation of initial / final velocity in kinematic formulas.
  • How this approach can also serve to explain the product rule and other differentiation rules.
  • The link of all this with Zeno's paradoxes.