A Complexity Profile is probably the most important result of a QCM analysis, so interpreting it correctly is essential.
Before doing so, it helps to consolidate a few basic concepts.
There are two types of variables in a system:
- Inputs
- Outputs

Each of these can in turn be classified as:
- Controllable
- Uncontrollable

One can be confronted with different situations:
- Variables are only inputs (e.g. accelerator pedal angle)
- Variables are only outputs (e.g. stock values, survey results)
- Both inputs and outputs are present
 
What is Complexity? 
Complexity is a measure of how much information a system “contains” and how structured that information is. One could simply sum the Shannon entropies of the individual variables and conclude that this is the total amount of information in a system. However, variables can be correlated, and correlations give rise to structure. Structure means the system can “do more” and, potentially, perform new functions. Structure is present everywhere in Nature. More structured information means more correlations within the system. Critical complexity measures how much information a system can contain before it starts to lose structure (i.e. before this information becomes meaningless). Complexity is measured in bits, since information is measured in bits.
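The naive "sum of Shannon entropies" mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the equal-width binning, the `bins` parameter, and the function names are choices made for this sketch, not part of QCM, and the sum deliberately ignores the correlations that complexity is meant to capture.

```python
import math
from collections import Counter

def shannon_entropy(values, bins=4):
    """Shannon entropy (in bits) of one variable after equal-width binning.

    The binning scheme and the default number of bins are assumptions
    made for this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant variable carries no information
    width = (hi - lo) / bins
    # Map each value to a bin index; the maximum value goes into the last bin.
    labels = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def total_entropy(columns, bins=4):
    """Sum of per-variable entropies: information with all structure ignored."""
    return sum(shannon_entropy(col, bins) for col in columns.values())

if __name__ == "__main__":
    data = {"a": [0, 0, 1, 1], "b": [5, 5, 5, 5]}
    print(total_entropy(data, bins=2))
```

Because each variable is treated in isolation, two perfectly correlated variables and two independent ones give the same total here, which is exactly the limitation the text points out.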
The importance of structure is paramount. An analogy: the mass of an atom’s nucleus is less than the sum of the masses of its components. This is because the energy that goes into the various bindings has an equivalent in terms of mass (m=E/c^2). It is this amount that is “lost” when the mass of the nucleus is measured as a whole. The same applies to complexity. It measures the information within a system based not only on the sum of the Shannon entropies of the individual variables but also on the “bindings” between the variables. This means that structure also carries information, not just each variable.
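The analogy can be written out explicitly. The first line is the standard nuclear mass defect, with Z protons, N neutrons and binding energy E_b. The second is a standard information-theoretic counterpart, the total correlation of a set of variables; it is shown here only as an illustration of "binding" information and is not claimed to be the measure QCM actually uses.

```latex
% Nuclear mass defect: the bound nucleus weighs less than its free constituents.
\Delta m = Z\,m_p + N\,m_n - m_{\text{nucleus}} = \frac{E_b}{c^2}

% Information analogue (total correlation): the joint entropy never exceeds
% the sum of the individual entropies; the gap is the information held in
% the relationships between variables.
T(X_1,\dots,X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1,\dots,X_n) \ge 0
```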
Complexity is like energy. The more energy one has, the more can be turned into work in order to accomplish something. More complexity means more information, and more information means that more can be accomplished.
What does the Complexity Map show? It shows which groups of variables vary together. It does NOT indicate whether A is causing a variation in B or vice-versa; it simply shows how variables are grouped when they change. In other words, “when variable A varies, B also varies” is all that can be said, unless one knows specifically that a certain variable is independent and controllable and that its variations are intended.
A Complexity Profile (or Complexity Spectrum), abbreviated CP, shows how much information is “removed” from a system (a multi-dimensional data array) when a particular variable is removed. The measurement is expressed as a percentage, and the contributions are ranked in descending order. A variable at the top of the CP is not necessarily the most important one, nor does it necessarily dominate or control the system in question. That conclusion holds ONLY if the variable is an input.
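The leave-one-out idea behind the profile can be sketched in a few lines. Everything about the measure here is an assumption: `stand_in_complexity` (sum of marginal entropies plus all pairwise mutual informations) is a placeholder chosen for this sketch, not the QCM complexity measure, and the input is assumed to be already discretised into labels.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def stand_in_complexity(data):
    """Placeholder measure (an assumption of this sketch, not QCM's):
    marginal entropies plus the pairwise 'bindings' between variables."""
    marginal = sum(entropy(col) for col in data.values())
    structure = sum(
        mutual_information(data[a], data[b]) for a, b in combinations(data, 2)
    )
    return marginal + structure

def complexity_profile(data):
    """Percentage of the measure lost when each variable is removed,
    ranked in descending order."""
    total = stand_in_complexity(data)
    if total == 0:
        return []
    profile = []
    for name in data:
        reduced = {k: v for k, v in data.items() if k != name}
        loss = total - stand_in_complexity(reduced)
        profile.append((name, 100.0 * loss / total))
    return sorted(profile, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # "a" and "b" move together; "c" is independent of both.
    data = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
    for name, pct in complexity_profile(data):
        print(f"{name}: {pct:.1f}%")
```

In this toy data set the two co-varying variables rank above the independent one, because removing either of them removes both its own entropy and the binding it shares with its partner. The percentages need not sum to 100.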
When the first variable in a CP is removed, all one can say for sure is that the remaining data set suffers the largest possible loss of information. The fact that a variable lies at the top of the CP does not automatically mean that it drives the business. Why is that? The first important step in a complexity analysis of any system is the synthesis of a meaningful data set. If you put in garbage, the results will be garbage in proportion to the share of garbage in the data. It is up to the user to collect, selectively rather than indiscriminately, meaningful data that correctly covers the problem at hand. Therefore, if you are completely sure that your data is correct and meaningful (i.e. of high quality), then the CP does indeed provide a ranking of the variables in terms of how much information each one contributes to the whole picture.
But what does that mean physically? It means that the variable in question varies a lot AND that it does so in unison (i.e. with structure) with numerous other variables.
The CP, therefore, is an objective way of ranking (weighing) variables, because it ranks them by how much information they carry, not by a subjective perception of importance.
And what is meant by high quality data?
- A sufficient number of samples (fewer than 10 is generally not a good starting point)
- No outliers (a few very remote points can badly skew the results)
- A well-populated data array (i.e. high density, or a relatively small fraction of null entries)
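The three criteria above can be turned into a quick pre-flight check. This is a minimal sketch, not a QCM feature: only the figure of 10 samples comes from the text. The 10% limit on null entries, the use of `None` for missing values, and the modified z-score outlier rule (median and MAD with a cutoff of 3.5) are assumptions made for illustration.

```python
import statistics

def find_outliers(values, threshold=3.5):
    """Flag points with a large modified z-score (median/MAD based).

    The rule and its threshold are assumptions of this sketch.
    """
    if not values:
        return []
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure distance against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

def data_quality_report(columns, min_samples=10, max_null_fraction=0.1):
    """Check sample count, density and outliers for a dict of columns.

    Missing entries are represented as None (an assumption).
    """
    n_rows = max(len(col) for col in columns.values())
    cells = [v for col in columns.values() for v in col]
    null_fraction = sum(v is None for v in cells) / len(cells)
    outliers = {
        name: find_outliers([v for v in col if v is not None])
        for name, col in columns.items()
    }
    return {
        "enough_samples": n_rows >= min_samples,
        "null_fraction": null_fraction,
        "well_populated": null_fraction <= max_null_fraction,
        "outliers": {name: pts for name, pts in outliers.items() if pts},
    }

if __name__ == "__main__":
    columns = {
        "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 500],
        "y": [1, 2, 3, 4, 5, 6, None, 8, 9, 10, 11, 12],
    }
    print(data_quality_report(columns))
```

A report like this does not make data meaningful, but it catches the mechanical problems before they distort the profile.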
 
Therefore, if a variable
 lies in the upper part of the CP and it is a controllable input to your
 system then indeed it is an important business driver.
What about outputs? What if you have, say, N stocks, and therefore N observable outputs from a system (a stock exchange)? How is the CP to be interpreted then? The above comment in red still holds. But can anything else be said in such a case? Probably yes.
A common question people ask (even though we think it is not a good question to ask) is that of causality. If A and B vary together, is it A that causes the variation in B or vice-versa? This question is very difficult to answer (unless one has “insider” information). It is one of those questions that have no answer and are useless to ask (is pizza better than spaghetti?). However, the Complexity Profile can help.
Let us see an example, the DJIA Index. The Complexity Map is illustrated below.
The corresponding Complexity Profile is:
This is a case in which it is impossible, for example, to say whether it is the price of Home Depot stock that drives the price of Citigroup stock or vice-versa. What does “to drive” mean? The relationship in question is shown below:
 
What really drives both stocks is the market, but that cannot be measured easily. So, what we can do is assume that if two variables co-vary (vary together), the one with the higher CP contribution “drives” the other. In this case we could say that Citigroup “dominates” Home Depot. It is very difficult to disprove such a statement (unless one has privileged information or the data has been manipulated).
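The working assumption above amounts to a simple comparison, sketched here. The function name and the percentages are placeholders invented for illustration; they are not values from the DJIA analysis. The function also does not check that the two variables actually co-vary, which the rule requires.

```python
def dominant_variable(contributions, a, b):
    """Of two co-varying variables, return the one with the larger CP
    contribution, or None if the contributions are equal.

    Assumes the caller has already established that a and b co-vary.
    """
    if contributions[a] == contributions[b]:
        return None
    return a if contributions[a] > contributions[b] else b

if __name__ == "__main__":
    # Placeholder CP contributions in percent (hypothetical values).
    example = {"stock_A": 7.5, "stock_B": 4.0}
    print(dominant_variable(example, "stock_A", "stock_B"))
```

The result is a convention for reading the profile, not a demonstration of cause and effect.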
In the case in question we could say that Citigroup dominates the DJIA Index, even though market capitalization or stock value could hint at something different. In summary, we could conclude that a Complexity Profile may help with the eternal issue of causality (which seems to trouble humanity so much).