The Mathematical Theory of Information:
(Re)Search Hints by Jan Hajek, The Netherlands
Introduction
Jan Hajek, The Netherlands, has contributed this appendix on
different functions used to measure information. Questions
can be directed to him at hajek@matheory.info.
(Re)Search Hints
When young, I was interested in cybernetics and in infotheory, but
I got an opportunity to do some IT only after 1978, when I finished my
R&D work on the automated verification/validation of communication
protocols including TCP. Let me complement Jan Kåhre's MTI with some
(re)search hints & pointers for readers living & working
in the age of an Internet still driven by TCP. Just search for the names
and keywords (JASA = Journal of the American Statistical Association, LDI =
the Law of Diminishing Information) and get your hits & bits.
Hint 0: Verify that the LDI follows from the
fact that no (ir)reversible remapping (e.g. the cascade A -> B -> C) of a
set of discrete symbols can increase info, and usually it will decrease it.
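A minimal numerical sketch in Python (mine, not from the original hints; the joint distribution p_ab and the remapping f are made up for illustration) of why a many-to-one remapping of the symbols of B cannot increase the Shannon mutual information about A:

from math import log2

def mutual_info(joint):
    # I(A;B) in bits from a joint probability table joint[a][b] = p(a, b)
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (pa[i] * pb[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Hypothetical joint distribution p(a, b) with card(A) = 2, card(B) = 3.
p_ab = [[0.30, 0.10, 0.05],
        [0.05, 0.20, 0.30]]

# Irreversible remapping C = f(B): merge the symbols 1 and 2 of B into one.
f = {0: 0, 1: 1, 2: 1}
p_ac = [[0.0, 0.0] for _ in p_ab]
for i, row in enumerate(p_ab):
    for b, p in enumerate(row):
        p_ac[i][f[b]] += p

print(mutual_info(p_ab), mutual_info(p_ac))   # I(A;C) never exceeds I(A;B)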
Hint 1: The hallmark of the LDI
inf(C@A) <= inf(B@A) is its
generality and consequently its asymmetry. The classical symmetrical
inequality for cascaded channels is
I(B;A) >= I(C;A) <= I(C;B). A
metric distance must be symmetrical,
e.g. d(A;B) = H(A|B) + H(B|A),
normalized dn = d(A;B)/H(A,B); see
C. Rajski, Entropy and metric spaces, pp.41-45 (vector diagram of a
channel!), in the book Information Theory - 4th London
Symposium, 1956, editor Cherry; Shannon 1950 on The lattice theory of
information; Yasuichi Horibe 1973, Linfoot 1957 on An information
measure of correlation, and Hamdan & Tsokos 1971 on An information
measure of association, all 3 in
Info & Control. Check and compare these via the LDI, which in
fact is much stronger than the triangle inequality, since
inf(C@A) <= inf(B@A) + any nonnegative number,
e.g. I(C@B) or 0; hence the LDI is a scale law.
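A small Python sketch (mine; the joint table is hypothetical) of Rajski's symmetrical metric distance d(A;B) = H(A|B) + H(B|A) and its normalized form dn = d(A;B)/H(A,B) mentioned above:

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def rajski_distance(joint):
    # joint[a][b] = p(a, b)
    H_AB = entropy([p for row in joint for p in row])   # joint entropy H(A,B)
    H_A  = entropy([sum(row) for row in joint])
    H_B  = entropy([sum(col) for col in zip(*joint)])
    d = (H_AB - H_B) + (H_AB - H_A)                     # H(A|B) + H(B|A)
    return d, d / H_AB                                  # d(A;B), dn

p_ab = [[0.30, 0.10, 0.05],
        [0.05, 0.20, 0.30]]
print(rajski_distance(p_ab))    # the result is the same with A and B swapped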
Hint 2: The cont() measure
and its complement 1 - cont() have
several meanings [e.g. 1 - cont(A) is the
expected probability of A] and applications ranging from diagnosis to
codebreaking. Cont() and
1 - cont() were rediscovered many
times and given many names, e.g.: Gini/Simpson index of
diversity and concentration (1912, 1938; Nature 1949, vol.163, p.688);
repeat rate by A. Turing & I.J. Good was used for code-breaking
since WWII (also in the cryptography book by A. Sinkov, 1968; Simpson,
Kullback and Leibler were cryptanalysts indexed in J. Bamford's Puzzle
Palace); relative decrease in the proportion of incorrect predictions
tau = cont(B@A)/cont(A) = (cont(A) - cont(A|B))/cont(A) = 1 - cont(A|B)/cont(A)
is eq. (31) on pp.759-760 of Goodman & Kruskal's survey paper Measures of
association for cross classifications, part I, JASA 1954, and part III,
JASA 1963, eqs. (4.4.1-3) on pp.353-354, later published as a book;
tau is eq. (11.3-22) in the book by Y. Bishop & S. Fienberg
& P. Holland, 1975, p.390, where chap. 12 derives pseudo-Bayes
estimators of probabilities employing
cont(A) in K as a posterior risk;
énergie informationnelle (informational energy) by Octav Onicescu in 1966; C.L. Sheng &
S.G.S. Shiva On Measures of information, Proc. Nat. Electronics
Conf. 1966, Ottawa, pp.798-803, watch for the bugs in the eq. (40) and
in the following inequalities on p.802 - still a nice paper; Havrda
& Charvat 1967; quadratic entropy by I. Vajda 1968; JASA 1971
p.534, 1974 p.755; quadratic mutual information by G. Toussaint 1972;
J. Zvarova 1974; Toomas Vilmansen 1972; Bhargava & Uppuluri
1975-1977; JASA 1982, p.548-580; S. Watanabe in his books Knowing and
Guessing 1969 p.14, Pattern Recognition 1985 p.150; information energy
surveyed by I.J. Taneja 1989 & 1995, with L. Pardo 1991; parabolic
entropy; Bayesian distance thoroughly analyzed by P. Devijver
1972-1979; Jan van der Lubbe has papers and a Ph.D. thesis on certainty,
1981; Pielou on Mathematical Ecology 1977; Colette Padet 1985; Tsallis
entropy since 1988 plays a role in physics; see the Encyclopedia of
Statistical Sciences and the WWWeb for diversity indices, generalized
information measures, generalized divergences, measures of
association.
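To make the tau above concrete, here is a Python sketch (mine; it assumes the reading of cont(A) as the Gini index 1 - sum_a p(a)^2 and of cont(A|B) as its B-averaged conditional version, which is how Hint 2 presents them; the joint table is hypothetical):

def gini(probs):
    return 1.0 - sum(p * p for p in probs)

def tau_B_to_A(joint):
    # joint[a][b] = p(a, b); tau measures how much knowing B improves predicting A
    cont_A = gini([sum(row) for row in joint])
    cont_A_given_B = 0.0
    for col in zip(*joint):          # one column per value b of B
        p_b = sum(col)
        if p_b > 0:
            cont_A_given_B += p_b * gini([p / p_b for p in col])
    return (cont_A - cont_A_given_B) / cont_A

p_ab = [[0.30, 0.10, 0.05],
        [0.05, 0.20, 0.30]]
print(tau_B_to_A(p_ab))   # relative decrease in the proportion of incorrect predictions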
Hint 3:
Imre Csiszar: f-divergence, f-information, f-informativity;
Moshe Ben-Bassat: f-entropies, Bayes, probability of error;
Cornelius Gutenbrunner: f-divergences as averaged minimal Bayesian risk.
Hint 4: On desiderata for (fuzzy) entropies: De Luca
& Termini in Info & Control 1972 & 1974; Bruce Ebanks in J. of
Mathematical Analysis and Applications 1983; do not miss his theorem 3.2 on p.32,
which says that font() is the only measure of fuzziness which satisfies all
6 desiderata.
Hint 5: On ent(B|a) as H(B|a): Nelson
Blachman, The amount of information that y gives about X; Tebbe & Dwyer,
Uncertainty and the probability of error, both in IEEE-IT 1968; Robert
Fano's 1963 book on Transmission of Information. Try to formulate new
ent(a|B), e.g. H(a|B), cont(a|B), etc.
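A minimal Python sketch (mine, not from the cited papers; the joint table is hypothetical) of ent(B|a) read as H(B|a), the entropy of B given one particular outcome a, whose p(a)-weighted average is the usual H(B|A):

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def H_B_given_a(joint, a):
    # joint[a][b] = p(a, b); condition on the single outcome A = a
    p_a = sum(joint[a])
    return entropy([p / p_a for p in joint[a]])

p_ab = [[0.30, 0.10, 0.05],
        [0.05, 0.20, 0.30]]
for a in range(len(p_ab)):
    print(a, H_B_given_a(p_ab, a))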
Hint 6: Check that the probability of misclassification
is bounded by
1 - rel(B@A) <= cont(A|B) <= Min[ H(A|B)/2, 1 - 2^-H(A|B) ]
rel(B@A) >= 1 - cont(A|B) >= Max[ 1 - H(A|B)/2, 2^-H(A|B) ]
see p.23/2.7a-e in the book by M. Mansuripur on Information Theory, 1987.
Which other interpretations can you see in the above inequalities?
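One numerical check in Python (my sketch; it assumes that 1 - rel(B@A) is the minimum probability of misclassifying A from B, i.e. the Bayes error, that cont(A|B) is the conditional Gini index of Hint 2, and that H(A|B) is in bits; the joint table is hypothetical):

from math import log2

def check_bounds(joint):
    # joint[a][b] = p(a, b)
    bayes_err = cont_AB = H_AB = 0.0
    for col in zip(*joint):              # one column per value b of B
        p_b = sum(col)
        cond = [p / p_b for p in col]    # p(a | b)
        bayes_err += p_b * (1 - max(cond))
        cont_AB   += p_b * (1 - sum(q * q for q in cond))
        H_AB      += p_b * -sum(q * log2(q) for q in cond if q > 0)
    upper = min(H_AB / 2, 1 - 2 ** (-H_AB))
    print(bayes_err, "<=", cont_AB, "<=", upper)

check_bounds([[0.30, 0.10, 0.05],
              [0.05, 0.20, 0.30]])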
Hint 7: The asymmetry of the LDI, implying
inf(B@A) =/ inf(A@B), is naturally desirable for
prediction, forecasting, classification, identification and diagnostic
tasks, and of course for measuring the cause-to-effect strength. The book by
A. Renyi, A Diary on Information Theory, 1987, the 3rd lecture, discusses
(a)symmetry and causality on pp.24-25 and 33, without offering a solution. Let's
reopen the discussion with a simple proposal plus its criticism. The obvious
need for asymmetrization of I(A;B) = I(B;A) may seem to
be solved simply by normalization, since
H(X|Y) <= H(X), hence
asy(B@A) = 1 - H(A|B)/H(A) = (H(A) - H(A|B))/H(A) = I(A;B)/H(A)
asy(A@B) = 1 - H(B|A)/H(B) = (H(B) - H(B|A))/H(B) = I(A;B)/H(B)
from which it is clear that asy(B@A) =/ asy(A@B) solely due to
H(A) =/ H(B).
Since H(X) increases with the number of distinct discrete values, a.k.a.
the cardinality of X, the asymmetry is trivial if
card(A) =/ card(B). The
question is whether non-Shannonian informations, such as the
tau above
[asymmetrical due to cont(B@A) =/ cont(A@B)], can be substantially
less simplistic in this respect. Can the LDI help?
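For concreteness, a Python sketch (mine; the joint table is hypothetical) of the normalized asymmetrization proposed above, asy(B@A) = I(A;B)/H(A) and asy(A@B) = I(A;B)/H(B):

from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def asymmetrized(joint):
    # joint[a][b] = p(a, b)
    H_A  = entropy([sum(row) for row in joint])
    H_B  = entropy([sum(col) for col in zip(*joint)])
    H_AB = entropy([p for row in joint for p in row])
    I_AB = H_A + H_B - H_AB              # mutual information I(A;B)
    return I_AB / H_A, I_AB / H_B        # asy(B@A), asy(A@B)

p_ab = [[0.30, 0.10, 0.05],
        [0.05, 0.20, 0.30]]
print(asymmetrized(p_ab))   # unequal here simply because H(A) =/ H(B)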
Hint 8: M. Bongard's book on Pattern Recognition, 1970,
chap.7 on Useful information, on disinformation pp.100-101; see p.121. A.
Hobson & Bin-Kang Cheng, A comparison of the Shannon and Kullback
information measures, J. of Statistical Physics, 1973, pp.301-310, p.305...
on undesirable additivity. The book by A.M. Mathai & P.N. Rathie, Basic
Concepts in Information Theory and Statistics, 1975, pp.27,68,79 on Kerridge
inaccuracy, p.84, p.90, p.100-102, p.110.
Further reading:
Contact: jankahre (at) hotmail.com