A Complexity Profile is
probably the most important result of a complexity analysis, and it can
help shed some light on the issue of causality. Its interpretation is
therefore of paramount importance.
Before doing so, it is important to consolidate a few basic
concepts. There are two types of variables in a system:
- Inputs
- Outputs
These can be classified into two further categories:
- Controllable
- Uncontrollable
There are different situations that one can be confronted with:
- Variables are only inputs (e.g. accelerator pedal angle)
- Variables are only outputs (e.g. stock values, survey results)
- Both inputs and outputs are present
But first of all, what
is complexity? Complexity is a measure of how much information a system
“contains” and how structured this information is. One could simply
sum the Shannon entropies of the individual variables and conclude that this is
the total amount of information in a system. However, because variables
can be correlated, they give rise to structure. Structure means the
system can “do more” and, potentially, perform new functions. Structure
is present everywhere in Nature. More structured information means more
correlations within the system. Critical complexity measures how much
information a system can contain before it starts to lose this structure
(i.e. before this information becomes meaningless). Since information
is measured in bits, complexity is also measured in bits.
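To make the difference between “summing entropies” and “accounting for structure” concrete, here is a minimal Python sketch that uses only standard Shannon quantities. It is not Ontonix's complexity metric, whose formula is not given here. The helper names and the toy data are assumptions for illustration. When two variables always vary together, the sum of their individual entropies overstates how much information the system holds as a whole.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy, in bits, of a sequence of hashable observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def sum_of_entropies(columns):
    """Naive total: add up each variable's own entropy."""
    return sum(entropy(col) for col in columns)

def joint_entropy(columns):
    """Entropy of the variables taken together (each row is one joint state)."""
    return entropy(list(zip(*columns)))

# Toy data (hypothetical): two variables that always move together,
# plus one that varies independently of them.
a = [0, 1, 0, 1, 0, 1, 0, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]  # identical to a: fully correlated
c = [0, 0, 1, 1, 0, 0, 1, 1]  # independent of a and b

print(sum_of_entropies([a, b, c]))  # 3.0
print(joint_entropy([a, b, c]))     # 2.0
```

The 1-bit gap between the two totals is what information theory calls total correlation. It is information that exists only because a and b are bound together.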
The importance of
structure is paramount. An analogy: the mass of an atom’s nucleus is
less than the sum of the masses of its components. This is because the
energy that goes into the various bindings has an equivalent in terms of
mass (m=E/c^2). It is this amount that is “lost” when the mass
of the nucleus is measured as a whole. The same holds for complexity. It measures the
information within a system not only on the basis of the sum of the Shannon
entropies of the individual variables; it also takes into account the “bindings”
between the variables. This means that structure carries
information too, not just each variable. This structure is reflected in the
so-called Complexity Map.
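The nuclear analogy can be written side by side with an information-theoretic counterpart. In the first line, Z and N are the numbers of protons and neutrons, m_p and m_n are their masses and E_B is the binding energy. In the second line, TC denotes total correlation, a standard measure of shared information. It is used here as an assumed stand-in for the “bindings”, because the text does not give Ontonix's own formula.

```latex
\begin{aligned}
M_{\text{nucleus}} &= Z\,m_p + N\,m_n - \frac{E_B}{c^2},\\
H(X_1,\dots,X_n) &= \sum_{i=1}^{n} H(X_i) - TC(X_1,\dots,X_n).
\end{aligned}
```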
Complexity is like
energy. The more energy one has, the more can be turned into work in order to
accomplish something. More complexity means more information, and more
information means that more can be done.
What does the Complexity
Map show? It shows which groups of variables vary together. It does NOT
indicate whether A causes a variation in B or vice-versa; it simply
shows how variables are grouped when they change. In other words, “when
variable A varies, B also varies”: this is all that can be said, unless
one knows specifically that a certain variable is independent, is
controllable and is varied intentionally.
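For illustration only, a Complexity Map can be approximated as a graph in which two variables are linked when they vary together. The Python sketch below assumes that Pearson correlation with a hypothetical threshold of 0.7 is the test for “varying together”. Ontonix's own criterion is not described here and may capture nonlinear relationships that this one misses. The links are undirected, so, like the map, the sketch says nothing about which variable causes which.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable does not co-vary with anything
    return sxy / sqrt(sxx * syy)

def covariation_groups(data, threshold=0.7):
    """Link variables with |r| >= threshold and collect the connected groups.

    `data` maps variable names to equal-length lists of observations.
    `threshold` is a hypothetical cut-off. The returned edges are undirected.
    """
    names = list(data)
    edges = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
             if abs(pearson(data[a], data[b])) >= threshold]
    # Union-find: variables linked directly or indirectly share one root.
    parent = {name: name for name in names}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return edges, sorted(groups.values())

# Hypothetical data.
data = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],   # moves exactly with A
    "C": [1, -1, 1, -1, 1],  # uncorrelated with A and B
}
edges, groups = covariation_groups(data)
print(edges)   # [('A', 'B')]
print(groups)  # [['A', 'B'], ['C']]
```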
A Complexity Profile (or
Complexity Spectrum) shows how much information is “lost” from a system
(a multi-dimensional data array) if a particular variable is removed.
The measurement is expressed as a percentage, and the contributions to a
Complexity Profile are ranked in descending order. A variable at
the top of the CP is not necessarily the most
important one, nor does it necessarily dominate or control the system in question. This
is ONLY true if the variable is an input.
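The leave-one-out idea behind a Complexity Profile can be sketched in Python as follows. The sketch rests on two stated assumptions. First, total correlation on discrete data (sum of marginal entropies minus joint entropy) stands in for complexity. Second, “percentage terms” is taken to mean that the losses are normalized to sum to 100. Neither is claimed to be Ontonix's actual computation, and the function names and data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy, in bits, of a sequence of hashable observations."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def total_correlation(data):
    """Sum of marginal entropies minus joint entropy (stand-in for complexity)."""
    cols = list(data.values())
    if not cols:
        return 0.0
    return sum(entropy(c) for c in cols) - entropy(list(zip(*cols)))

def complexity_profile(data):
    """Share (%) of the stand-in measure lost when each variable is removed.

    Returns (name, percent) pairs ranked in descending order.
    """
    full = total_correlation(data)
    losses = {}
    for name in data:
        rest = {k: v for k, v in data.items() if k != name}
        losses[name] = full - total_correlation(rest)
    total_loss = sum(losses.values())
    profile = {name: (100 * loss / total_loss if total_loss else 0.0)
               for name, loss in losses.items()}
    return sorted(profile.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: a and b move together, c varies independently.
data = {
    "a": [0, 1, 0, 1, 0, 1, 0, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(complexity_profile(data))  # [('a', 50.0), ('b', 50.0), ('c', 0.0)]
```

Removing either a or b destroys the only structure in the data, so each accounts for half of the loss. Variable c has as much entropy as the others, but it varies independently of them, so it sits at the bottom.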
When the first variable
in a CP is removed, all one can say for sure is that the data
set without that variable experiences the largest loss of
information that the removal of any single variable can cause. The fact that a variable lies at the top of the CP does not
automatically mean that it drives the business. Why is that the case?
The first important step in a complexity analysis of any system is the
synthesis of a meaningful data set. If you put in garbage, the quality of the results
will reflect the proportion of garbage to
meaningful data. It is up to the user to collect meaningful data that
correctly captures a given problem, rather than collecting data indiscriminately. Therefore,
if you are completely sure that your data is correct and meaningful
(i.e. of high quality), then the CP does provide a correct ranking
of the variables in terms of how much information each variable contributes to the whole picture.
But what does that mean physically? It means that the variable in
question varies a lot AND does so in unison (i.e. with structure)
with numerous other variables. This means it is important: it is a
driver.
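Suppose complexity is approximated by total correlation TC (the sum of marginal entropies minus the joint entropy). This is an assumption and not necessarily Ontonix's metric. Under it, the information lost by removing a variable X_k takes a simple form, where X_{-k} denotes all the remaining variables:

```latex
\begin{aligned}
TC(X_1,\dots,X_n) - TC(X_{-k})
  &= \Big[\sum_{i=1}^{n} H(X_i) - H(X_1,\dots,X_n)\Big]
   - \Big[\sum_{i \neq k} H(X_i) - H(X_{-k})\Big]\\
  &= H(X_k) - \big[H(X_1,\dots,X_n) - H(X_{-k})\big]\\
  &= H(X_k) - H(X_k \mid X_{-k})\\
  &= I(X_k; X_{-k}) \;\le\; H(X_k).
\end{aligned}
```

The loss is capped by the variable's own entropy H(X_k), which measures how much it varies. It reaches that cap only when X_k is fully determined by the other variables, which measures how much it varies in unison with them. Under this approximation, both conditions stated above are needed for a large CP contribution.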
The CP, therefore, is an
objective way of ranking (weighing) variables, since it ranks them based on
how much information they carry. So, if a variable lies in the
upper part of the CP and is a controllable input to your system, then
it is indeed an important business driver. And what about outputs? What
if you have, say, N stocks and therefore N observable outputs from a
system (a stock exchange)? How is the CP to be interpreted then? The earlier
caveat still holds: a variable at the top of the CP does not automatically drive the system. But can anything else be said in such a
case? Probably yes.
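The decision rule in the paragraph above can be written down explicitly: a variable counts as a business driver if it lies in the upper part of the CP and is a controllable input. In this Python sketch, the Variable record, the business_drivers function and the top_fraction cutoff that defines “upper part” are all hypothetical, as are the example variables and their contributions.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    """Hypothetical record for one entry of a Complexity Profile."""
    name: str
    contribution: float  # CP contribution, in percent
    is_input: bool
    controllable: bool

def business_drivers(profile, top_fraction=0.25):
    """Names of controllable inputs found in the upper part of the CP.

    `top_fraction` (hypothetical) defines the "upper part" as that share of
    the ranked variables, always keeping at least the top one.
    """
    ranked = sorted(profile, key=lambda v: v.contribution, reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    return [v.name for v in ranked[:cutoff] if v.is_input and v.controllable]

# Made-up example: an output can top the CP without being a driver.
profile = [
    Variable("demand", 40.0, is_input=False, controllable=False),
    Variable("price", 30.0, is_input=True, controllable=True),
    Variable("weather", 20.0, is_input=True, controllable=False),
    Variable("ad_spend", 10.0, is_input=True, controllable=True),
]
print(business_drivers(profile, top_fraction=0.5))  # ['price']
```

With the default cutoff, only "demand" is in the upper part. It is an output, so the function returns an empty list.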
A common question people
ask (even though we think it is not a good question) is
that of causality. If A and B vary together, is it A that causes the
variation in B or vice-versa? This question is very difficult to answer
(unless one has “insider” information). It is one of those questions
that have no answer and are useless to ask (is pizza better than
spaghetti?). However, the Complexity Profile can help.
Let us look at an example: the DJIA Index. Its Complexity Map is illustrated below.
The corresponding CP is as follows:
This is a case in which it is impossible, for example, to say whether it is
the price of Home Depot stock that drives the price of Citigroup stock
or vice-versa. What does “to drive” even mean? The relationship in
question is shown below:
What really drives both stocks is
the market, but that cannot be measured easily. So what we can do is
assume that, if two variables co-vary (vary together), the one with the
higher CP contribution drives the other. In this case we could say that
Citigroup “dominates” Home Depot. It is very difficult to disprove such a
statement (unless one has privileged information or the data has
been manipulated).
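The working assumption just described amounts to a simple labelling rule: of two co-varying variables, the one with the higher CP contribution is taken to drive the other. The Python sketch below is hypothetical. The function name, the variable names and the numbers are invented, and the rule is a convention, not a test of causality. Pairs with equal contributions are left undecided.

```python
def dominance(pairs, profile):
    """Label the dominant variable in each co-varying pair (a heuristic, not causality).

    pairs   -- iterable of (a, b) names known to co-vary, e.g. links in a map
    profile -- dict mapping each name to its CP contribution in percent
    Returns (dominant, dominated) tuples; pairs with equal contributions are skipped.
    """
    result = []
    for a, b in pairs:
        if profile[a] > profile[b]:
            result.append((a, b))
        elif profile[b] > profile[a]:
            result.append((b, a))
        # equal contributions: the heuristic cannot tell, so nothing is reported
    return result

# Hypothetical variables and contributions.
print(dominance([("X", "Y"), ("Y", "Z")], {"X": 12.0, "Y": 30.0, "Z": 30.0}))
# [('Y', 'X')]
```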
In the case in question, we could
say that Citigroup dominates the DJIA Index, even though market
capitalization or stock price might suggest otherwise. In
summary, we could conclude that a Complexity Profile may help address the
eternal issue of causality (which seems to trouble humanity so much).
www.ontonix.com