In September 1999, the Mars Climate Orbiter probe crashed into Mars due to a mismatch between the measurement units used in different parts of the trajectory control software.
Subsequent investigations revealed that attributing the failure of the mission to this problem alone would be too simplistic. The biggest problem was not the unit mismatch itself, but the failure to detect and correct it — a failure rooted in some imprudent choices in the way the mission was managed.
The Mars Climate Orbiter was launched on 11 December 1998 from Cape Canaveral, Florida. Along with the Mars Polar Lander, the probe was part of a project to study Martian meteorology and climate.
In particular, the Mars Climate Orbiter was designed to monitor the evolution of daily weather conditions, to study the distribution of water both on the surface and in the atmosphere, and to measure the temperature of the atmosphere.
On 23 September 1999 the probe began its final maneuvers to enter Mars orbit.
The probe had to pass behind the planet, so a temporary loss of radio signal was expected. However, radio contact was lost 49 seconds earlier than expected and was never restored.
Causes of Failure
The subsequent investigation clarified that the spacecraft was much closer to the planet than planned, so close as to be destroyed by friction with the Martian atmosphere.
Why was the probe so close to Mars?
One part of the trajectory control software, developed by Lockheed Martin, produced numerical output in English units. These results were then sent to another part of the software, developed by NASA, which interpreted them as if they were expressed in International System (SI) units.
More precisely, the first piece of software passed an impulse expressed in pound-force seconds (lbf·s), while the second expected it to be given in newton seconds (N·s).
The result was that instead of passing 226 km above the planet as planned, the probe came within 57 km of the Martian surface.
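To make the mismatch concrete, here is a minimal illustrative sketch in Python — not the actual flight software, and all the numbers except the conversion factor are made up. One function reports an impulse in pound-force seconds; another consumes the bare number as if it were newton seconds, silently underestimating the thruster effect by a factor of about 4.45:

```python
# Illustrative sketch of the unit mismatch (NOT the actual flight code).
# 1 pound-force second = 4.448222 newton seconds.
LBF_S_TO_N_S = 4.448222

def thruster_impulse_lbf_s() -> float:
    """Hypothetical 'ground software': reports impulse in lbf*s."""
    return 100.0  # made-up value

def delta_v(impulse_n_s: float, mass_kg: float) -> float:
    """Hypothetical 'trajectory software': expects impulse in N*s."""
    return impulse_n_s / mass_kg  # delta-v = impulse / mass

mass = 338.0  # illustrative spacecraft mass in kg

raw = thruster_impulse_lbf_s()
wrong_dv = delta_v(raw, mass)                  # bare number misread as N*s
right_dv = delta_v(raw * LBF_S_TO_N_S, mass)   # properly converted first

print(round(right_dv / wrong_dv, 3))  # prints 4.448
```

Because nothing in the interface carries the unit, the error produces no exception and no warning — just a trajectory that is quietly wrong by a constant factor.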
In addition to the unit mismatch, there were many other secondary factors that led to the disaster.
Just a few months earlier, in April, a bug had been fixed in the trajectory management software. At that time the need to use the new code in the mission was urgent, so there wasn't enough time to thoroughly test the changes.
Some members of the navigation team noticed signals indicating that the trajectory could be wrong. Although they discussed the discrepancy in meetings, they never reported it through the available formal process.
The navigation team was supporting three different missions at the same time and, due to budget cuts, was not adequately trained.
On the other hand, project managers required engineers to prove something was going wrong while, given the uncertainties in the trajectory, it was not even possible to prove that all was going right.
Due to the uncertainties concerning the probe's position, the team even considered a trajectory correction. It seems that project managers decided to forgo the correction, trusting the more optimistic estimates. In any case, it was never really clear who was responsible for deciding to perform it.
The 6 Lessons
This case should be studied by project managers, who need to understand how things can go terribly wrong when they’re dealing with big, challenging projects.
In particular there are 6 important lessons to learn:
- Tests are fundamental. In a big project it’s not sufficient to thoroughly test each individual part of a machine or instrument. It’s crucial to also test the way each component interacts with the others. You can’t skimp on tests. A failure to detect an issue could compromise the whole project.
- The more complex the project is, the more you should worry about communication between the teams. In big projects, it is difficult to have a global vision of what’s going on. To avoid inconsistencies, it’s really essential to share information between the involved teams.
- Dissenting opinions should be evaluated carefully. It's not pleasant to hear that something could go very wrong, but project managers should have enough technical knowledge to recognize whether the dissent is mere hairsplitting or grounded in rational reasons.
- Evaluate both positive and negative signals with the same level of critical thinking. Try to objectively collect all the information about an issue, independently of its positive or negative impact, and expect the same accuracy from people who uphold optimistic or pessimistic points of view.
- In uncertain circumstances, take all the measures available to reduce risk. The uncertainties regarding the probe's position called for the prudent move of making a final trajectory correction. That choice wouldn't have threatened the mission in any case, so it was clearly warranted.
- Decision-making responsibility should be clear at every step of the project. The risk in dealing with a big project is that responsibility is somehow spread out among different teams so that everyone trusts that someone else is going to make a particular decision.
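The first two lessons — test the interactions between components, and make teams communicate — have a direct software analogue: make units part of the interface, so a mismatch fails loudly in an integration test rather than silently in flight. The following is a minimal, hypothetical sketch (the names `Impulse` and `apply_impulse` are invented for illustration):

```python
# Hypothetical sketch: carry the unit in the type, not in documentation.
from dataclasses import dataclass

@dataclass(frozen=True)
class Impulse:
    newton_seconds: float  # the one canonical unit for this interface

    @classmethod
    def from_lbf_seconds(cls, value: float) -> "Impulse":
        # Conversion happens exactly once, at the boundary.
        return cls(newton_seconds=value * 4.448222)

def apply_impulse(impulse: Impulse) -> float:
    """Trajectory code accepts only the typed quantity, never a bare float."""
    return impulse.newton_seconds

# An integration test passing a unit-less number fails immediately,
# instead of producing a plausible-looking but wrong trajectory.
try:
    apply_impulse(100.0)  # wrong: raw number, unit unknown
except AttributeError:
    print("caught unit-less value")

print(apply_impulse(Impulse.from_lbf_seconds(100.0)))  # prints 444.8222
```

A static type checker such as mypy would flag the bare-float call before the code even runs; the point is that the unit error becomes visible at the component boundary, which is exactly where the Mars Climate Orbiter's went unnoticed.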
For a deeper analysis of the Mars Climate Orbiter mission failure, take a look at this well-written account by James Oberg for IEEE Spectrum magazine.
If you liked this post, consider sharing it or signing up for the newsletter!