A few days ago while I was preparing dinner, I was peeling carrots and noticed how much of the original thickness of the carrot I was peeling away.
The peeled carrots were significantly more slender than the carrots I started out with.
Suddenly I had a flash of intuition from the “mad about mathematics” side of my brain: the more spherical a vegetable is, the smaller the proportion of its volume removed when it’s peeled.
Let me explain what I mean.
We can imagine that peeling is equivalent to removing a thin layer of thickness dx from a vegetable with surface area S and volume V.
The volume peeled off can be approximated by dx * S, so the proportion of removed volume relative to the total volume is given by:

dx * S / V
If you always use the same peeler, the dx value is fixed. Let’s say that normally dx may be equal to a couple of millimeters.
So, given the peeler, the removed volume is proportional to the S/V ratio.
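The intuition can be checked with a quick sketch in Python; the sizes below (a 200 cm³ vegetable, a 2 mm peel, a 20 cm long “carrot”) are made-up illustrative values:

```python
import math

def peeled_fraction_sphere(volume, dx):
    """Fraction of volume removed by peeling a layer of thickness dx
    from a sphere of the given volume."""
    r = (3 * volume / (4 * math.pi)) ** (1 / 3)
    surface = 4 * math.pi * r ** 2
    return dx * surface / volume

def peeled_fraction_cylinder(volume, length, dx):
    """Same fraction for a long cylinder (a carrot-like shape),
    peeling only the lateral surface, not the two ends."""
    r = math.sqrt(volume / (math.pi * length))
    surface = 2 * math.pi * r * length
    return dx * surface / volume

# Equal volumes: a 200 cm^3 sphere vs a 20 cm long, 200 cm^3 "carrot"
v, dx = 200.0, 0.2  # cm^3 and cm (a 2 mm peel)
print(peeled_fraction_sphere(v, dx))        # about 17% wasted
print(peeled_fraction_cylinder(v, 20.0, dx))  # about 22% wasted
```

For equal volumes, the slender cylinder loses a noticeably larger fraction of its volume to the peeler than the sphere does.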
It’s well known that, among solids of a given volume, the sphere is the one with the lowest S/V ratio.
This is connected to the analogous property of the circle: it is the planar shape that maximizes the area for a given perimeter.
Proving this property is not as trivial as it may seem. The first results were obtained by the mathematician Jacob Steiner in 1838, and later mathematicians completed the proof.
The two main ideas behind the proof are the following:
1) If a planar figure is not convex, then there is another figure with the same perimeter but greater area
2) A planar figure that is not fully symmetrical can be deformed into another planar figure with the same perimeter but larger area
As a result of 1) and 2), the planar figure that maximizes the area for a given perimeter must be convex and as symmetric as possible, and therefore it is the circle (obviously this is just a sketch of the proof).
Formally the result is known as the isoperimetric inequality: every closed curve of length L enclosing an area A satisfies

4πA ≤ L²

and equality holds only for the circle.
Then I would like to suggest the following result.
Peeler corollary to the isoperimetric inequality: given two vegetables with equal volume, the one whose shape is closer to a sphere is the one that minimizes the volume wasted peeling it.
Clearly this statement is somewhat vague because I haven’t defined what “closer to a sphere” means.
Things get complicated, however, if you have to buy a whole bag of potatoes of different sizes. In this case, in fact, to minimize the waste you should evaluate which bag has, given the same weight, the smallest total surface area (obtained from the sum of all the potatoes’ surfaces).
A bag with two large potatoes not at all spherical could have a smaller total area than a bag with many tiny perfectly spherical potatoes.
Reasoning in this way, in addition to reducing waste, will also minimize the time required for peeling all those potatoes (which is always proportional to the surface).
Keep these considerations in mind the next time you choose a sack of potatoes!
In May 1997, for the first time, a reigning world chess champion was beaten by a computer in a match under tournament conditions.
In 2016, another significant milestone was reached: a program defeated one of the best Go players in the world. This event had less media coverage than Kasparov’s defeat, but it has aroused a great deal of wonder among artificial intelligence experts.
Why did it take almost 20 years to go from the victory at chess to victory in the game of Go? Will machines overtake humans in all activities, even the most complex ones? Can we now say that machines think?
The leading characters in the 1997 match were Garry Kasparov, then the undisputed World Chess Champion, and Deep Blue, a supercomputer whose hardware and software were designed by IBM. The match consisted of six games and, as usually happens in chess tournaments, after each game the winner gains one point or, in case of a draw, half a point is given to each player.
The previous year a similar match had been held in which, although Kasparov lost a game, he won with a score of 4-2.
The second time the computer prevailed, winning two games, drawing three, and losing just one (the first), so the final score was 3.5-2.5 for Deep Blue.
How did Deep Blue choose its moves?
The computer based its analysis on an algorithm that took a board position as an input and returned as an output a value that quantified the advantage (or disadvantage) with respect to the opponent player.
This algorithm was created by IBM engineers with the help of professional chess players and took into account material advantages (as a result of piece captures) or positional advantages (as a consequence of placing a piece in a key square).
Given this algorithm, Deep Blue implemented a brute-force approach. The supercomputer calculated this function on all the possible following positions up to a certain depth and chose the move that guaranteed the best result.
Deep Blue evaluated 200 million positions per second. This enabled it to analyze a position to a depth of 6 to 8 moves when the chessboard held many pieces, or 20 or more moves when only a few pieces remained.
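The brute-force idea can be sketched in a few lines of Python. The “game” below is a toy invented purely for illustration (players alternately take a number from either end of a list, and the evaluation function is simply the score difference), not Deep Blue’s actual chess engine:

```python
def negamax(pile, depth):
    """Best achievable score difference for the player to move,
    searching every move sequence exhaustively up to `depth` plies."""
    if depth == 0 or not pile:
        return 0
    best = float("-inf")
    for take_left in (True, False):
        value = pile[0] if take_left else pile[-1]
        rest = pile[1:] if take_left else pile[:-1]
        # My gain now, minus whatever the opponent can secure afterwards.
        best = max(best, value - negamax(rest, depth - 1))
    return best

def best_move(pile, depth=6):
    """Choose the left or right end by brute-force lookahead."""
    left = pile[0] - negamax(pile[1:], depth - 1)
    right = pile[-1] - negamax(pile[:-1], depth - 1)
    return "left" if left >= right else "right"

print(best_move([3, 9, 1, 2]))  # -> "right"
```

Greedily grabbing the 3 would hand the opponent the 9; the lookahead sees this. Deep Blue applied this same exhaustive idea with a sophisticated chess-specific evaluation function and enormous dedicated hardware.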
Since then, chess programs have become a bit smarter and instead of searching through all possible moves, they only analyze the most promising variants. That’s more similar to how humans decide what the next move is: they evaluate fewer positions, but based on experience they know how to choose the most significant sequences.
For example, in 2006 the program Deep Fritz won against the world champion Vladimir Kramnik running on a standard personal computer that allowed it to evaluate “only” eight million positions per second, a lot fewer than Deep Blue.
Now, no human is able to win a chess game against the smartest computer program.
Go is a board game with very simple rules. The players take turns placing pieces called “stones” on a square board with 19×19 intersections. If a player occupies all the intersections adjacent to an opponent’s group with his stones, the opponent’s group is captured and removed from the board.
The player who scores more points, adding up the captured stones and the number of intersections surrounded by their own stones, wins the game.
Despite its simple rules (much simpler than chess), the game is extremely complex, mainly for two reasons: the number of legal moves available at each turn is far larger than in chess, and it is very hard to write a function that evaluates how favorable a position is.
As a consequence, the creation of a program capable of competing with the best Go players has been considered an ambitious challenge in the field of artificial intelligence.
AlphaGo was developed by Google DeepMind and, unlike Deep Blue, was programmed according to the machine learning approach. This approach consists of submitting examples to a program in such a way that it learns how to make decisions based on those examples without being given explicit instructions.
More specifically, AlphaGo consists of two neural networks and has been trained with the submission of many professional player games for a total of 60 million moves. One of the two neural networks is used to figure out what the most promising future moves are (policy network), while the other one assigns a value to a position to represent the probability of victory for either player (value network).
After the first phase of learning based on human player games, AlphaGo has been trained by playing against different versions of itself to further improve its way of playing.
The result of the match against Lee Sedol, one of the strongest Go players in the world, was quite clear-cut: AlphaGo won 4 out of 5 games.
Recently a new match has been held, this time between AlphaGo and Ke Jie, considered the world’s strongest Go player. AlphaGo won this match too, with a score of 3-0.
Given these stunning results, we should now begin to ask ourselves one question: do machines think?
From the point of view of the results they have achieved there is no doubt that in a certain way, machines are thinking.
Compared to humans, they think differently, that’s for sure. But planes also fly differently from birds, and we still describe what they do as flying. Why shouldn’t we say that computers think?
Machines are not yet able to perform more artistic tasks like composing music or writing coherent texts, but that’s just a matter of reaching higher levels of complexity that sooner or later will be achieved. I’m pretty sure in some years we’ll be listening to the first symphony completely composed by a computer.
For now, let’s make do with the first pop song written by a computer imitating the style of the Beatles, and then performed and mixed by humans.
Some argue that machines cannot create anything genuinely new, because whatever they do is an imitation of some human activity or something they have been taught to do.
I’d like to offer a couple of considerations against this reasoning.
First, consider AlphaGo. The program began by imitating the moves of professional players, but then improved by playing against itself, as if it had “studied” to find stronger-than-human strategies.
In a sense, AlphaGo’s playing style is new and in fact, according to professional players, AlphaGo occasionally chooses moves considered rather original.
Second, even human creativity doesn’t come out of nowhere. In an artist’s style you can find influences of other artists, elements linked to the place where the artist grew up, events they took part in, things they heard or read during their life. Artists basically transform personal experiences into pictures, sounds, or words.
Nothing prevents us from imagining a process whereby a very complex neural network could start imitating the works of one or more artists and then form a personal style, maybe with random elements inserted into its evolution or with the influence of something similar to “personal experiences” that could be linked to images, text, or music it takes its cue from.
The situation is quite clear to me. Machines are going to perform every human task the way we do, or better. In the coming years we’ll become more and more aware of this trend.
And yes, I find no reason to say that machines don’t think.
What’s your opinion?
At this time I am particularly interested in human points of view, but if some machine wants to post a comment or join the newsletter, it is welcome!
In the previous post we discussed a Monte Carlo method to calculate an approximate value for the area of a circular sector. The trick was to count how many randomly drawn points within the unit square fell inside the circle.
You can easily guess how this approach can be generalized to calculate areas with different shapes.
From calculus we know that an area can also be calculated using integrals: for example, the area under the graph of a function f between a and b is given by A = ∫[a,b] f(x) dx.
But then when we use the Monte Carlo method to find areas, we are just calculating an approximation of an integral!
This approach is not very useful when calculating integrals in a single dimension. In this case, there are more efficient deterministic methods.
However, if you want to calculate integrals over many dimensions, deterministic methods become less efficient, while Monte Carlo methods become more useful.
What causes this strange phenomenon?
Given the degree of precision you want to achieve in the calculation, deterministic methods require a number of elementary steps that grows exponentially with the number of dimensions. On the other hand, the error you expect from a Monte Carlo estimate depends only on the total number of extractions, not on the number of dimensions.
For example, to calculate a 100-dimensional integral, deterministic approaches would require such a huge number of calculations as to be practically impossible, while Monte Carlo methods are still applicable.
But wait! In what cases is it necessary to calculate integrals with so many dimensions?
It typically happens in statistical mechanics, a branch of theoretical physics that studies systems with many degrees of freedom, for example, a system composed of many particles enclosed in a box. To calculate the value taken by macroscopic quantities such as temperature or pressure, it is necessary to perform integrals on a space with a number of dimensions proportional to the number of particles.
You can understand that the number of dimensions of these statistical mechanics integrals can be very large. In cases like these, Monte Carlo methods are used more than deterministic methods.
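Here is a minimal Python sketch of Monte Carlo integration in 100 dimensions. The integrand is a toy chosen because its exact integral over the unit hypercube is known to be 0.5:

```python
import random

def mc_integrate(f, dim, n_samples):
    """Estimate the integral of f over the unit hypercube [0,1]^dim
    by averaging f at uniformly random points."""
    total = 0.0
    for _ in range(n_samples):
        point = [random.random() for _ in range(dim)]
        total += f(point)
    return total / n_samples

# Average of the coordinates over [0,1]^100; the exact integral is 0.5.
# A grid rule with just 2 nodes per axis would already need 2**100
# evaluations; Monte Carlo gets close with only 10,000 samples.
random.seed(42)
estimate = mc_integrate(lambda x: sum(x) / len(x), dim=100, n_samples=10_000)
print(estimate)  # close to 0.5
```

The sample count needed for a given accuracy does not grow with the dimension, which is exactly why these methods win in high-dimensional problems.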
Various algorithms are used to find the local minima of a function. Typically, these algorithms:
1) start from an initial point;
2) determine a direction in which the function decreases;
3) move a small step in that direction.
Applying these steps many times, these algorithms eventually reach a minimum (at which point, in step number 2, you can’t determine any direction to move in).
However, what if we have a function with many local minima and want to find the point that globally minimizes the function?
A local search process may stop at any of the many local minima of the function. How can an algorithm understand if it has found one of the many local minima or the global minimum?
Actually, there is no way to exactly establish this. The only option is to explore different areas of the search domain to increase the likelihood of finding the global minimum among the various local minima.
Many methods have been developed to realize this idea of “exploring” the domain. Among the most popular are genetic algorithms, which are inspired by the evolution of species.
These are Monte Carlo methods in which a starting population of points is created and then evolved: an algorithm pairs couples of points to generate new ones (the coordinates of the two parent points are combined to obtain the coordinates of a child point). In this pairing process, random “genetic mutations” occur, so the pairing function gives a randomized result. As the generations succeed one another, a process of natural selection keeps only the best points (those giving the lowest values of the function to be minimized).
In the meantime, the algorithm also keeps track of which point represents the best “individual” ever.
Continuing this process, the points tend to move toward the local minima while still exploring many areas of the optimization domain. At some point the process is stopped (usually by setting a limit on the number of generations) and the best individual is taken as the estimate of the global minimum (usually it is then used as the starting point for a local optimization algorithm that refines the result).
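A minimal genetic algorithm along these lines can be sketched in Python. The test function (a classic multimodal benchmark) and every parameter value below are illustrative choices, not a reference implementation:

```python
import math
import random

def rastrigin(point):
    """A classic multimodal test function; global minimum 0 at the origin,
    surrounded by a lattice of local minima."""
    return sum(x * x + 10 * (1 - math.cos(2 * math.pi * x)) for x in point)

def genetic_minimize(f, dim, bounds=(-5.0, 5.0), pop_size=60,
                     generations=150, mutation_rate=0.2, seed=1):
    """Minimal genetic algorithm: crossover mixes parents' coordinates,
    mutation adds random noise, selection keeps the fittest points."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=f)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(pop, 2)
            # Crossover: each coordinate comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Mutation: occasionally perturb one coordinate.
            if rng.random() < mutation_rate:
                i = rng.randrange(dim)
                child[i] += rng.gauss(0, 0.5)
            children.append(child)
        # Selection: keep the best individuals from parents + children.
        pop = sorted(pop + children, key=f)[:pop_size]
        best = min(best, pop[0], key=f)
    return best

best = genetic_minimize(rastrigin, dim=2)
print(best, rastrigin(best))  # a point near the origin
```

The elitist selection pulls the population toward good regions, while mutations keep exploring; in a real application this result would then be refined with a local optimizer.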
The last kind of application concerns the generation of probability distributions that can’t be derived through analytical methods.
Example: estimate the probability distribution of damage caused by tornadoes in the United States over a period of one year.
In this type of analysis there are two sources of uncertainty: how many tornadoes will happen over one year and how much damage each tornado causes. Even if you are able to assign a probability distribution to these two logical levels, it’s not always possible to put them together to get an annual loss distribution with analytical methods.
It’s much simpler to do a Monte Carlo simulation like this:
1) draw the number of tornadoes occurring during the year from its distribution;
2) for each tornado, draw the damage it causes from its distribution;
3) add up the damages to obtain the annual loss.
Repeating these three steps many times you can generate a sample of annual losses you can use to estimate the probability distribution that you couldn’t derive analytically.
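As a sketch, here is such a simulation in Python. The choice of distributions (Poisson frequency, lognormal severity) and every number below are invented for illustration, not real tornado statistics:

```python
import math
import random

def poisson_sample(rng, lam):
    """Sample a Poisson variate by multiplying uniforms (Knuth's method)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_annual_losses(n_years, mean_events=3.0, dmg_mu=1.0,
                           dmg_sigma=1.0, seed=7):
    """Monte Carlo sketch of an annual loss distribution:
    events per year ~ Poisson, damage per event ~ lognormal."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_years):
        n_events = poisson_sample(rng, mean_events)       # step 1: how many events
        damages = [rng.lognormvariate(dmg_mu, dmg_sigma)  # step 2: damage of each
                   for _ in range(n_events)]
        losses.append(sum(damages))                       # step 3: annual total
    return losses

losses = sorted(simulate_annual_losses(100_000))
print(sum(losses) / len(losses))        # mean annual loss
print(losses[int(0.99 * len(losses))])  # empirical 99th percentile
```

The sorted sample gives you the whole empirical distribution, so quantities like a 99th-percentile loss come for free even though the compound distribution has no simple closed form.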
If you liked this post, consider sharing it or signing up for the newsletter.
At first glance, it seems the answer is no. We usually think that a mathematical algorithm is deterministic, so that if you run the calculation again with the same input you should get the same result.
Yet there is a large class of algorithms that use random numbers to calculate their results. For that kind of algorithm, every time you repeat the calculation you get a different result. But what’s the usefulness of these algorithms?
The point is that in the field of applied mathematics the goal is to solve concrete problems. In these cases, if you can’t find the theoretical solution, finding a good enough solution may be acceptable. For this reason, if the next time you run the algorithm you obtain a different but still good enough result, that’s OK (what good enough means exactly depends on the particular application).
This kind of algorithm is called a Monte Carlo method or Monte Carlo simulation; these methods may be loosely defined as all those algorithms that make use of random number generators.
Let’s see a classic example, just to understand the usefulness and the main features of these methods: the calculation of an approximation of π through a Monte Carlo simulation.
Suppose you know how to generate random numbers uniformly distributed in the interval (0, 1) (all high-level programming languages can do that).
If we draw a pair of values x and y, both uniformly distributed in (0, 1), we can interpret (x, y) as a random point drawn inside the unit square with vertices (0, 0), (1, 0), (0, 1), and (1, 1).
Generating many such points, we can check which of them fall inside the circle of unit radius centered at the origin: it is sufficient to check whether the distance from the origin is less than 1.
Note: points are uniformly distributed inside the square so the probability of a random point falling inside the circle is equal to the ratio between the area of the circular sector and the area of the square.
The area of the square is 1 while the area of the circular sector is π/4 (one-fourth of the area of a unit circle), so the probability of a random point falling inside the circle is given by:

p = π/4 ≈ 0.7854

As a consequence, the ratio between the number of points falling inside the circle and the total number of points will tend toward this value.
But then we could:
1) generate a large number N of random points inside the unit square;
2) count the number n of points falling inside the circle;
3) calculate the ratio 4 * n / N,
obtaining in this way an approximation of π.
Here is an animated gif that clarifies what’s going on.
You can also easily run this simulation in Excel; here is an example file: MonteCarloPi.xlsx.
I have to underline that the purpose of this method is purely educational, because much more efficient deterministic algorithms exist to calculate the value of π to any given precision.
Nevertheless, this basic example is very useful for understanding the logic behind Monte Carlo methods: the final result is random but gradually converges to the theoretical value as the number of simulations increases.
It’s possible to show that the estimation error in a Monte Carlo simulation goes as 1/√N, where N is the number of simulations: to halve the error you have to quadruple the number of simulations.
Buffon’s needle experiment, designed in the 1700s, could be considered the oldest Monte Carlo simulation.
Suppose you have a sheet on which parallel equidistant lines are drawn. You also have many sticks and you randomly throw them on this sheet. Each stick can fall between two straight lines or intersect one or more (if long enough) parallel lines.
If the stick is shorter than the distance between the lines, you can show that the probability of a stick intersecting a line is given by:

P = 2l / (πd)

where l is the stick length and d is the distance between the parallel lines. Thanks to the fact that the formula contains π, this experiment can be used to calculate approximations of π in a way similar to the preceding one.
It’s just a matter of throwing the sticks many times to estimate the probability P. The values of l and d are known, so you can invert the formula and obtain an estimate for π.
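Rather than throwing real sticks, we can simulate the experiment; here is a sketch in Python (the stick length and line spacing are arbitrary choices):

```python
import math
import random

def buffon_pi(n_throws, stick=1.0, spacing=2.0, seed=0):
    """Estimate pi from Buffon's needle: throw sticks on lined paper,
    count the crossings, and invert P = 2*l / (pi*d)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        # Distance from the stick's center to the nearest line, and its angle.
        center = rng.uniform(0, spacing / 2)
        angle = rng.uniform(0, math.pi / 2)
        if center <= (stick / 2) * math.sin(angle):
            hits += 1
    p = hits / n_throws
    return 2 * stick / (p * spacing)

print(buffon_pi(1_000_000))  # close to 3.14
```

There is a small irony here: the simulation itself uses π (to draw the random angle), which is exactly why the physical experiment was interesting and the computer version is only a toy.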
Experiments like Buffon’s needle had no real application, because obtaining results with an acceptable degree of approximation required an unreasonably large number of repetitions.
However, around the middle of the last century, the advent of computers drastically changed the scenario. The new technology made it possible to run simulations with sufficient speed to solve practical problems that could not be solved in other ways.
The pioneers of the Monte Carlo methods were Stanislaw Ulam and John Von Neumann. In 1946 they both worked on the Manhattan Project aimed at building the atomic bomb, and they used Monte Carlo methods to carry out calculations related to neutron absorption that they couldn’t solve with more conventional approaches.
Ulam had the idea while he was recovering from an illness. While playing solitaire, he asked himself what the chance was of successfully finishing a game. The rules of solitaire made it difficult to calculate this probability.
He realized it was possible to simulate many different games with a computer using random deck arrangements and check how many times the game of solitaire could be finished. This way you could empirically calculate the probability of winning that game of solitaire.
Because the results were part of the secret plans for building the atomic bomb, it was necessary to assign a code name to the project. Because chance played a fundamental role in the estimation method, the codename Monte Carlo was chosen, and this is why even today we use this terminology.
Since then, these methods have been used in many fields: weather forecasting, elementary particle physics, astrophysics, molecular chemistry, electronics, fluid dynamics, biology, computer graphics, artificial intelligence, finance, project evaluation… and many others!
Despite their many applications, all Monte Carlo methods fall roughly into three families: numerical integration, optimization, generation of probability distributions.
In the next post we will see an example for each of these families of Monte Carlo methods.
Stay tuned!
Subsequent investigations revealed that it would be too simplistic to attribute the failure of the mission only to this problem. The biggest problem was not the unit mismatch itself, but the failure to detect and correct this mistake. This failure was caused by some imprudent choices in the way the mission was managed.
The Mars Climate Orbiter was launched on 11 December 1998 from Cape Canaveral, Florida. Along with the Mars Polar Lander, the probe was part of a project to study Martian meteorology and climate.
In particular, the Mars Climate Orbiter was designed to monitor the evolution of daily weather conditions, to study the distribution of water both on the ground and in the atmosphere, and to measure the temperature of the atmosphere.
On 23 September 1999 the probe began its final maneuvers to enter Mars orbit.
The probe had to pass behind the planet, meaning that a temporary loss of radio signal was expected. However, the radio contact was lost 49 seconds earlier than expected and was never restored.
The subsequent investigation clarified that the spacecraft was much closer to the planet than planned, so close as to be destroyed by friction with the Martian atmosphere.
Why was the probe so close to Mars?
One part of the trajectory control software that had been developed by Lockheed Martin produced a numerical output in English units. These results were then sent to another part of the software developed by NASA that interpreted them as if they were expressed in International System Units.
More precisely, the first program passed an impulse expressed in pound-force seconds, while the second part expected it to be given in newton seconds.
The result was that instead of being 226 km from the planet, the probe was only 57 km from the Martian surface.
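As a purely hypothetical sketch of the kind of mismatch involved (the impulse value below is invented), a few lines of Python show how badly the numbers diverge:

```python
# One component reports an impulse in pound-force seconds; another
# interprets the same bare number as newton seconds.
LBF_S_TO_N_S = 4.448222  # 1 pound-force second in newton seconds

impulse_lbf_s = 100.0                       # what the first program meant
correct_n_s = impulse_lbf_s * LBF_S_TO_N_S  # ~444.8 N*s, the real impulse
assumed_n_s = impulse_lbf_s                 # what the second program understood

print(correct_n_s / assumed_n_s)  # every impulse undervalued ~4.45 times
```

Every thruster firing was thus accounted for at less than a quarter of its real effect, and the small errors accumulated over months of cruise.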
In addition to the mismatch in units of measure, there were many other secondary factors that led to the disaster.
Just a few months earlier, in April, a bug was fixed in the trajectory management software. At that time the need to use the new code in the mission was urgent. This meant there wasn’t enough time to thoroughly test the changes.
Some members of the navigation team noticed signals indicating that the trajectory could be wrong. Although they discussed the discrepancy in meetings, they failed to report it following the available formal process.
The navigation team was following three different missions at the same time and due to budget cuts, the team was not adequately trained.
On the other hand, project managers required engineers to prove something was going wrong while, given the uncertainties in the trajectory, it was not even possible to prove that all was going right.
Due to uncertainties concerning the probe’s position, the team even considered the possibility of a trajectory correction. It seems that project managers decided to forgo the correction trusting in the more optimistic estimates. Nevertheless, it was not really clear who should decide to perform the correction.
This case should be studied by project managers, who need to understand how things can go terribly wrong when they’re dealing with big, challenging projects.
In particular there are 6 important lessons to learn:
For a deeper analysis of the Mars Climate Orbiter mission failure, take a look at this really well-written account by James Oberg for the IEEE Spectrum magazine.
Let’s see what they are and how they’re used in applications.
A wavelet is a real function that represents a wave-like oscillation localized in a limited range of its domain.
Here are some examples:
Given a mother wavelet ψ(t), we can define a set of child wavelets through two parameters a and b:

ψ_{a,b}(t) = (1/√a) ψ((t − b)/a)

Parameter a scales the function while b shifts it. In applications it is common to consider a discrete set of pairs (a_m, b_n), so the child functions can be indexed with discrete parameters: ψ_{m,n}, with m and n integers.
The general idea behind wavelets is that a function f can be represented as a linear combination of child wavelets:

f(t) ≈ Σ_{m,n} c_{m,n} ψ_{m,n}(t)
The function could be, for example, the sound of a musical instrument or the signal of a seismograph or electrocardiogram.
At first, the signal is recorded by sampling it at a certain frequency: for each sampling interval the value of the function is stored. If the sampling frequency is high, a signal stored in such a way can occupy a lot of memory.
Through wavelets it is possible to store the signal using only the values of the main coefficients of the wavelet expansion.
Truncating the wavelet series results in a small loss of precision in representing the function, but also in a huge saving in the amount of information to be stored; this is called compression.
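As a minimal illustration, here is one level of the Haar transform, the simplest wavelet, in plain Python; compression comes from dropping the small detail coefficients (the signal and the 0.2 threshold are made-up values):

```python
def haar_transform(signal):
    """One level of the (unnormalized) Haar wavelet transform:
    pairwise averages (coarse part) and differences (detail part)."""
    averages = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    details = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return averages, details

def haar_inverse(averages, details):
    """Rebuild the signal from averages and details."""
    signal = []
    for avg, det in zip(averages, details):
        signal += [avg + det, avg - det]
    return signal

signal = [5.0, 5.1, 5.0, 4.9, 1.0, 1.1, 0.9, 1.0]
avg, det = haar_transform(signal)
# Compression: zero out (i.e., don't store) the small detail coefficients.
det_compressed = [d if abs(d) > 0.2 else 0.0 for d in det]
approx = haar_inverse(avg, det_compressed)
print(approx)  # close to the original, stored with half the numbers
```

Real codecs apply several levels of a smoother wavelet and smarter coefficient coding, but the principle is the same: small coefficients carry little information and can be discarded.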
In the JPEG-2000 and MPEG-4 standards, images and videos are represented through a wavelet expansion. In addition to data compression, the main advantage of using wavelets in this field is to manage different resolutions of the image with a single file.
Once an image is saved as a wavelet expansion, if you want to create a low-resolution preview of the same image, it is sufficient to use fewer elements of the summation.
Different image resolutions are obtained by simply truncating the wavelet series at different depths.
Experienced readers will have noticed the similarity between the wavelet decomposition and discrete Fourier transforms.
The Fourier transforms have many properties that make them interesting from a theoretical point of view. However, wavelets have some significant advantages in applied mathematics.
1) Customization: Fourier transforms always make use of sine and cosine functions. With wavelets, on the other hand, depending on the particular application, you can choose the ones that best fit the problem.
2) Localization: signals that are analyzed in applications often consist of several blocks of information separated by intervals of near-zero signal (for example, in the case of the electrocardiogram). As a consequence, it’s more natural to decompose this kind of signal through wavelets that represent localized waves.
3) More control over Gibbs phenomenon: Fourier transforms present some problems in describing discontinuous signals. I’m referring to the so-called Gibbs phenomenon.
The classic example is that of a square wave that alternately takes the values 0 and 1. The discrete Fourier expansion of this signal presents a peak near the discontinuity with a value of about 1.09.
The left image shows the approximation using 25 harmonics, the right one using 125 harmonics. The height of the peak remains stable even as the number of terms in the Fourier series increases!
This is somewhat counterintuitive, because you would expect the series to converge to the function and so to the value 1.
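The overshoot is easy to reproduce numerically. This Python sketch builds the truncated Fourier series of the 0/1 square wave and looks for the peak just to the right of the jump at x = 0:

```python
import math

def square_wave_partial_sum(x, n_harmonics):
    """Truncated Fourier series of the square wave that alternates
    between 0 and 1 with period 2*pi (jump at x = 0)."""
    total = 0.5
    for k in range(1, 2 * n_harmonics, 2):  # only odd harmonics contribute
        total += (2 / math.pi) * math.sin(k * x) / k
    return total

# Scan for the highest value just to the right of the jump.
for n in (25, 125):
    peak = max(square_wave_partial_sum(i * 1e-4, n) for i in range(1, 5001))
    print(n, round(peak, 4))  # the overshoot stays near 1.09 for both
```

The peak narrows and moves toward the jump as n grows, but its height does not shrink; that is the Gibbs phenomenon.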
The wavelet expansion also exhibits this kind of phenomenon, but to a lesser extent than the discrete Fourier transform.
Another difference between wavelets and Fourier transforms is geometric. The sine and cosine functions used in Fourier series form a basis of the space of square-integrable functions L².
This means they are linearly independent vectors that span the whole space of functions.
Often, wavelets used in applications are frames rather than bases. A frame is a set of vectors that span the vector space but that are not linearly independent.
As a consequence, the decomposition of a signal in terms of wavelets is not unique. This feature, which might seem to be a problem, represents instead a further computational advantage, contributing to the improved numerical stability of wavelets with respect to Fourier transforms.
In 1915 Einstein published the field equations of general relativity. These formulas create a link between the presence of matter and energy and the curvature of space-time. Gravitation was then explained as a consequence of the space-time curvature caused by matter and energy.
In 1917, Einstein applied these equations in a physical model for the entire universe and realized that it was not possible to have a static universe within that model. The universe should expand or contract, but it couldn’t stay still.
At that time the idea that the universe could evolve was considered so bizarre that Einstein introduced a new term into the field equations, called the cosmological constant, just to make a static universe a feasible solution.
In 1929 Edwin Hubble made one of the most sensational discoveries of the century. He found that the galaxies beyond our Local Group were moving away from us, receding at a speed proportional to their distance. This meant that our universe is expanding.
The cosmological constant was introduced just to fit a static universe in the theory. With the discovery of the universe’s expansion the constant no longer seemed to be a necessary hypothesis.
As a consequence, from the early 1930s almost all research in the field of cosmology assumed that the cosmological constant was equal to zero.
Einstein realized that starting from the equations, he could have hypothesized about the universe’s expansion before it was experimentally discovered and called the introduction of the cosmological constant his biggest blunder.
During the 1990s, many cosmological observations began suggesting that the expansion of the universe is accelerating. In particular, in 1998 two groups of cosmologists, the Supernova Cosmology Project and the High-Z Supernova Search Team, independently came to this conclusion by observing the redshift of supernovae.
The discovery was a huge breakthrough because most cosmologists expected to find that the expansion was decelerating.
That was one of those fascinating moments in physics history when everybody expects A, and B just happens, making it clear that something deep in our theory needs to be better understood.
This acceleration was not compatible with zero cosmological constant models. After more than 60 years, scientists began again to consider the presence of this term in the equations of general relativity.
The reasons for the constant’s comeback were completely different from the ones that led Einstein to introduce it, but finally the constant regained its position in the equations.
Is everything clear now? Not at all. The interpretation of the cosmological constant is still one of the biggest mysteries in physics.
In the field equations of general relativity, you can identify two parts, the physical terms which describe the distribution of matter and energy and the geometrical terms related to the curvature of space-time.
It’s not clear whether the cosmological constant should be considered an element of the geometrical part or as a term of the energy/matter part generated by some physical process not yet identified (or even whether it should be the result of the sum of both of these components).
So far there are several hypotheses but no certainty. So, young physicists, come forward! This is a problem still waiting for someone to solve it!
For more mathematical details on the cosmological constant, take a look at this nice post by Peter Coles: One Hundred Years of the Cosmological Constant.
What happened gives us an interesting view of Renaissance mathematics and shows us what it meant to be a mathematician in the 15th and 16th centuries.
In those days, mathematicians were used to publicly challenging each other in mathematical duels. Each one would submit to the other a fixed number of problems. The one who could solve more problems won the challenge, gaining exposure, and with it, the chance of being hired by a patron.
For that reason it was common for mathematicians to keep their discoveries secret with the purpose of using them to create problems their rival wouldn’t be able to solve.
The first to shed light on the problem of finding the solution of the cubic equation

x³ + ax² + bx + c = 0
was the Italian mathematician Scipione del Ferro, born in 1465 in Bologna where his father worked in the paper industry. Perhaps his father’s work gave del Ferro access to books that would otherwise not have been easy to find.
Around 1505, he found the solution to the so-called depressed cubic, the equation of the form

x³ + px + q = 0
that is, a cubic in which the quadratic term doesn’t appear. He probably used his discovery to win mathematical challenges and gain money and reputation, but he never published his result. Just before dying in 1526, he revealed the secret to his student, Antonio Del Fiore, who began publicly boasting that he was able to solve that kind of problem.
In 1530 another mathematician, Niccolò Fontana, found the solution to another kind of cubic equation, the one without the linear term.
When Niccolò was 12, a French soldier struck him with a sword during the invasion of the city of Brescia by King Louis XII’s troops. Niccolò appeared dead, and the soldier didn’t bother to make sure. In fact, he was still alive, but his wounds left him with speech difficulties for the rest of his life. For that reason he was nicknamed tartaglia, which means “stammerer.”
Del Fiore thought Fontana was bluffing about his discovery and challenged him. Fontana instead guessed his rival could really solve depressed cubic equations and with a huge effort succeeded in independently discovering the same solution just before the challenge.
Fontana solved two different kinds of cubic equations and won the match hands down. He was able to solve all 30 problems Del Fiore submitted to him, while Del Fiore could not solve any.
It was 1535 and, as you can guess, Fontana didn’t publish his discoveries.
In 1539 the mathematician Gerolamo Cardano invited Fontana to Milan and convinced him to reveal the solution to the depressed cubic. Fontana, maybe hoping to gain a position as a professor in Milan, unveiled his discovery to Cardano, making him swear an oath not to divulge it.
After some time Cardano found a way to reduce every cubic equation into a depressed cubic, solving the problem of finding the general solution. Nevertheless his discovery was connected to Fontana’s result and he couldn’t publish anything due to the promise he’d made.
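The reduction is a change of variable: substituting x = y − a/3 makes the quadratic term cancel. The algebra sketched below follows the standard modern presentation, not Cardano's own notation:

```latex
% Substituting x = y - a/3 into x^3 + a x^2 + b x + c = 0
% makes the quadratic term cancel:
(y - a/3)^3 + a\,(y - a/3)^2 + b\,(y - a/3) + c
  = y^3 + \left(b - \frac{a^2}{3}\right) y
        + \left(\frac{2a^3}{27} - \frac{ab}{3} + c\right)
% i.e. a depressed cubic  y^3 + p y + q = 0  with
p = b - \frac{a^2}{3}, \qquad
q = \frac{2a^3}{27} - \frac{ab}{3} + c
```

Solving the depressed cubic in y then gives the root of the original equation as x = y − a/3.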
Rumors that Del Ferro had previously found the solution to the depressed cubic reached Cardano, who succeeded in finding a manuscript in which Del Ferro had written down his results. Because of this discovery, Cardano felt he was no longer obliged to keep the secret. In 1545 he published the results, giving credit to Fontana and Del Ferro for the discovery of the solution to the depressed cubic.
Fontana was upset about this and began a dispute with Cardano that resulted in a challenge between Fontana and Cardano’s student, Ludovico Ferrari.
The challenge was held in Milan, and the rules were fixed so as to favor the home competitor, Ferrari. Part of the challenge was verbal, putting Fontana at a disadvantage due to his speech difficulties. Ferrari prevailed, and as a consequence Fontana lost his academic position in Brescia and had financial problems for the rest of his life.
The work by Cardano and Ferrari on cubic and quartic equations was a huge breakthrough and laid the foundations for subsequent discoveries, in particular concerning the nature of imaginary numbers. As a matter of fact, imaginary numbers sometimes come up in the solution formula for cubic equations, even when the solution is real.
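To make that last point concrete, here is a minimal Python sketch (my own, with function names of my choosing) of Cardano's formula for the depressed cubic x³ + px + q = 0. Using complex arithmetic lets the formula pass through imaginary intermediate values even when the final root is real:

```python
import cmath

def cardano_root(p, q):
    """One root of x^3 + p*x + q = 0 via Cardano's formula.
    Assumes the principal cube root u is nonzero (true unless p = 0
    and the square root cancels -q/2)."""
    sqrt_disc = cmath.sqrt((q / 2) ** 2 + (p / 3) ** 3)
    u = (-q / 2 + sqrt_disc) ** (1 / 3)  # principal complex cube root
    v = -p / (3 * u)                     # partner term, chosen so u*v = -p/3
    return u + v

# x^3 - 15x - 4 = 0 has the real root x = 4, yet the formula passes
# through sqrt(-121): the "casus irreducibilis" that pushed
# mathematicians toward imaginary numbers.
root = cardano_root(-15, -4)
print(abs(root - 4) < 1e-9)  # True
```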
But that, as we usually say, is another story.
The set-up is quite simple: bring a helium balloon into a car and watch how it moves while the car accelerates, brakes, and turns.
We are acquainted with the way objects move inside a car when we’re driving. When we accelerate, objects tend to go toward the back of the car. When we turn right, objects tend to go left, and so on.
Clearly this is due to the inertia of the things inside the car. For example, when the car turns, they keep going straight until some force (usually the constraint force imposed by a seat or a window) causes them to turn the same way the car is turning.
If you try the experiment you’ll see that the balloon behaves in a way that’s unexpected: it moves in the opposite direction with respect to the other things. If you accelerate it goes towards the front of the car. If you turn left, it goes left. What the hell is going on?
If you’re too lazy to try this yourself (I haven’t tried either!), watch the following video from the Smarter Every Day YouTube channel, which shows the outcome of the experiment and explains what is going on through an effective analogy.
Then I will give a deeper explanation of the experiment, bringing (hang on to your seat) general relativity into it!
The analogy of the bubble in the bottle is correct, but in the case of the balloon it’s not so easy to visualize what’s happening.
Let’s first review some concepts about the buoyant force that acts on a balloon. The force always points in the opposite direction with respect to the gravitational field. Why is that?
Due to the presence of the gravitational field, the air pressure is not constant: it increases along the direction of the gravitational field. The balloon therefore experiences different pressure values over its surface. The higher pressure on one side wins against the lower pressure on the other, so the balloon is pushed toward the lower pressure, that is, opposite to the gravitational field.
Now let’s take into account one of the key concepts of general relativity: the equivalence between the effects of a gravitational field and fictitious forces in an accelerating reference frame (known as the equivalence principle).
If a reference frame has a constant acceleration the fictitious forces experienced in it are identical to those observed in the presence of a constant gravitational field.
So now things are getting clearer… when the car is accelerating, everything inside the cabin reacts as if, in addition to the Earth’s gravitational field, there were another gravitational field directed toward the back of the car. The two fields add up to give an effective gravitational field that points down and toward the back of the car.
As a consequence, the air inside the cabin arranges itself with the lower-pressure side pointing up and toward the front. And that’s why the balloon floats in that direction.
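As a rough sketch of the geometry (the numbers and function name are mine, purely illustrative): the effective field is the vector sum of gravity and the fictitious backward field, so the balloon leans forward by arctan(a/g):

```python
import math

g = 9.81  # m/s^2, Earth's gravitational acceleration

def balloon_tilt_deg(a):
    """Forward tilt angle (degrees) of a helium balloon in a car
    accelerating forward at a m/s^2: the effective field is (-a, -g),
    and the balloon floats opposite to it. A negative a (braking)
    gives a negative angle, i.e. a lean toward the back."""
    return math.degrees(math.atan2(a, g))

# A moderate acceleration of 3 m/s^2 tilts the balloon about 17 degrees forward.
print(round(balloon_tilt_deg(3.0), 1))  # 17.0
```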
Again the balloon is floating in the opposite direction of the effective gravitational field!
Thanks to Einstein for giving us such a nice framework for analyzing so many phenomena, from black holes to what happens in our car.
This is the case of the “sum to product” trigonometric formula

sin α + sin β = 2 sin((α + β)/2) cos((α − β)/2)
This formula has a direct application in explaining an acoustic phenomenon called beats. Let’s see what this is all about.
When you play two notes with slightly different pitches, the resulting sound seems to appear and disappear, as if someone were raising and lowering the volume at a certain frequency. This phenomenon is known as a beat. Trigonometry will help us explain it.
The two notes played can be represented by the trigonometric functions

sin(ω₁t) and sin(ω₂t)

that oscillate in time with the two frequencies ω₁, ω₂ (to be precise, those are angular frequencies, but in this post I will simply call them frequencies).
Acoustic phenomena are (to a good approximation) linear, so the sound of two notes played together is equal to the sum of the two single notes

s(t) = sin(ω₁t) + sin(ω₂t)

With the aid of the sum to product formula, we can write the “sound” function as

s(t) = 2 cos(ω_b t) sin(ω_m t)

where in the last term we have defined

ω_m = (ω₁ + ω₂)/2 and ω_b = (ω₁ − ω₂)/2
If the two frequencies are very close to each other, ω_m is also similar to them (it’s their mean), and ω_b is very small compared to ω_m (it’s half the difference between two similar values).

With ω_b much smaller than ω_m, we can interpret the factor 2 cos(ω_b t) as a slow periodic change in the amplitude applied to the note sin(ω_m t):

s(t) = [2 cos(ω_b t)] · sin(ω_m t)

so 2 cos(ω_b t) acts as a periodic volume change of the sound sin(ω_m t). The smaller the difference between the two original frequencies, the slower the beat frequency ω_b.
In the following image you can see an example of the two functions sin(ω₁t), sin(ω₂t) (top graph) and the resulting function s(t) (bottom graph).
And the following is the corresponding sound. At the beginning you can hear the two notes separately, and then the effect created by the two notes played together.
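The sum-to-product identity can also be checked numerically. The sketch below (the frequencies are my illustrative choice: 440 Hz and 445 Hz) verifies that the sum of the two sines equals the product form at every sample:

```python
import numpy as np

f1, f2 = 440.0, 445.0                    # two close pitches, in Hz
w1, w2 = 2 * np.pi * f1, 2 * np.pi * f2  # the corresponding angular frequencies
t = np.linspace(0.0, 1.0, 44100)         # one second of "audio" samples

sum_form = np.sin(w1 * t) + np.sin(w2 * t)
product_form = 2 * np.cos((w1 - w2) / 2 * t) * np.sin((w1 + w2) / 2 * t)

print(np.allclose(sum_form, product_form))  # True: the identity holds
print(abs(f1 - f2))  # 5.0 -> the loudness swell you hear repeats 5 times per second
```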
Beats are used to tune instruments. Let’s see how this works.
Assume you have to tune a guitar string using a tuning fork as a reference. If the string is already tuned somewhere near the note of the tuning fork, playing them together will create a beat.
Usually it’s difficult to tell whether you have to raise or lower the string tension to tune the string perfectly, so it’s easier to proceed by trial and error.
Let’s say you try raising the string tension (and with it the note’s frequency), play the string and the tuning fork again, and hear that the beat frequency has increased. That means raising the tension goes in the wrong direction. You slowly lower the string tension until the beat frequency is so small that it’s no longer noticeable. Congratulations! Your guitar string now reproduces (for every practical purpose) the same note as the tuning fork.
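The trial-and-error process comes down to watching the beat rate |f_string − f_ref| shrink. A toy sketch (the numbers and names are mine, purely illustrative):

```python
f_ref = 440.0  # tuning-fork reference pitch, Hz

def beat_frequency(f_string):
    """Beats per second heard when the string is played with the fork."""
    return abs(f_string - f_ref)

print(beat_frequency(444.0))  # 4.0 beats/s
print(beat_frequency(446.0))  # 6.0 -> faster beats: raising tension was wrong
print(beat_frequency(440.2))  # ~0.2 -> nearly in tune, barely noticeable
```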
And the other five strings? The process is the same, but instead of the tuning fork you can use notes played on the string you’ve already tuned as a reference.
Of course, there are apps you can use to tune instruments, but if you are playing your guitar on a beach surrounded by a group of appreciative listeners, you’d better know how to tune your guitar the old-fashioned way, or you’ll instantly lose their trust in your playing skills!
Ciao,
Enrico