(The above image is from an article by Felix Salomon - 23/2/2009).
When a continuous domain is transferred onto another continuous domain, the process is called transformation.
When a discrete domain is transferred onto another discrete domain, the process is called mapping.
But when a discrete domain is transferred onto a
continuous domain, what is the process called? There is no clear name
for it, but in such a process information is destroyed. Regression is an
example. Discrete data (often expensive to obtain) is used to build a
function that fits the data, after which the data is gently removed and
life continues on the smooth and differentiable function (or surface),
to the delight of mathematicians. Typically, democratic-flavoured
approaches such as Least Squares are adopted to perpetrate the crime.
The reason we call Least Squares (and other related
methods) "democratic" is that every point contributes in equal measure
to the construction of the best-fit function, just as in a democracy
everyone gets one vote: assassins re-inserted into society count exactly
as much as respectable, hard-working, law-abiding citizens. In other
words, data points sitting in a cluster are treated exactly like
dispersed points. All that matters is the vertical distance from the
sought best-fit function.
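To make the "one point, one vote" idea concrete, here is a minimal sketch of an ordinary least squares line fit in plain Python. The function names and the five data points are made up for illustration; the only thing taken from the text is that each point contributes its squared vertical distance with the same weight.

```python
def least_squares_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Every point enters these sums with the same weight of 1.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_vertical_distances(xs, ys, slope, intercept):
    """The quantity least squares minimises: one equal 'vote' per point."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: a tight cluster near x = 1 plus two dispersed points.
xs = [0.9, 1.0, 1.1, 5.0, 9.0]
ys = [1.1, 0.9, 1.0, 5.2, 8.8]

slope, intercept = least_squares_line(xs, ys)
sse = sum_squared_vertical_distances(xs, ys, slope, intercept)
print("slope:", slope, "intercept:", intercept, "sum of squares:", sse)
```

Nothing in the sums knows whether a point belongs to the cluster or stands alone; only its vertical distance from the line is counted.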
Finally, we have the icing on the cake: correlation.
Look at the figure below, depicting two sets of points lying along a
straight line.
The regression model is the same in each case. The
correlations too! But how can that be? These two cases correspond to two
totally different situations. The physics needed to distribute points
evenly is not the same as that which makes them cluster into two groups.
And yet in both cases statistics yields a correlation coefficient of
100% without distinguishing between two evidently different situations.
What's more, in the void between the two clusters one cannot use the
regression model just like that. Assuming continuity a priori can come
at a heavy price.
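The effect is easy to reproduce. The sketch below uses made-up points on an assumed line y = 2x + 1 (the line and the x-values are illustrative, not the ones in the figure): one set evenly spread, one set split into two clusters. The fit and the correlation coefficient come out the same, while a simple look at the largest gap between neighbouring x-values tells the two sets apart immediately.

```python
import math


def fit_and_correlation(xs, ys):
    """Return (slope, intercept, Pearson r) for a least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def largest_gap(xs):
    """Widest empty stretch between neighbouring x-values."""
    ordered = sorted(xs)
    return max(b - a for a, b in zip(ordered, ordered[1:]))


def line(x):
    # Assumed underlying relation, chosen only for this illustration.
    return 2.0 * x + 1.0


even_x = [float(i) for i in range(10)]
clustered_x = [0.0, 0.1, 0.2, 0.3, 0.4, 9.6, 9.7, 9.8, 9.9, 10.0]

even_fit = fit_and_correlation(even_x, [line(x) for x in even_x])
clustered_fit = fit_and_correlation(clustered_x, [line(x) for x in clustered_x])

print("even:     ", even_fit, "largest gap:", largest_gap(even_x))
print("clustered:", clustered_fit, "largest gap:", largest_gap(clustered_x))
```

The regression output carries no trace of the void between the clusters; only the raw data does.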
Clearly this is a very simple example. The point,
however, is that not many individuals out there are curious enough to
look a bit deeper into data (yes, even visually!) and ask basic
questions when using statistics or other methods.
By the way, "regression" is defined (Merriam-Webster
Dictionary) as "trend or shift to a lower or less perfect state".
Indeed, when you kill information by replacing the original data with a
best-fit line, this is all you can expect.