SYMPOSIUM ON
INFORMATION THEORY IN BIOLOGY
Gatlinburg, Tennessee, October 29-31, 1956
Edited by
HUBERT P. YOCKEY
Oak Ridge National Laboratory
With the assistance of
ROBERT L. PLATZMAN, Purdue University
HENRY QUASTLER, Brookhaven National Laboratory
SYMPOSIUM PUBLICATIONS DIVISION
PERGAMON PRESS
NEW YORK • LONDON • PARIS • LOS ANGELES
PERGAMON PRESS INC.
722 East 55th Street, New York 22, N. Y.
P.O. Box 47715, Los Angeles, California
PERGAMON PRESS LTD.
4 & 5 Fitzroy Square, London W.1
PERGAMON PRESS, S.A.R.L.
24 Rue des Ecoles, Paris, Ve
First published 1958
Library of Congress Card No. 58-9687
Printed in Northern Ireland at The Universities Press, Belfast
CONTENTS
PAGE
Foreword ix
A. M. Weinberg
Preface xi
PART I. INTRODUCTION
A Primer on Information Theory 3
Henry Quastler
Some Introductory Ideas Concerning the Application of Information Theory in
Biology 50
Hubert P. Yockey
PART II. STORAGE AND TRANSFER OF INFORMATION
Editorial Introduction 61
The Cryptographic Approach to the Problem of Protein Synthesis 63
George Gamow and Martynas Ycas
The Protein Text 70
Martynas Ycas
Discussion 101
Protein Structure and Information Content 103
L. G. Augenstine
Discussion 123
Specific Mechanisms of Protein Synthesis and Information Transfer in the
Developing Chick Embryo 124
H. R. Mahler, H. Walter, A. Bulbenko and D. W. Allmann
Discussion 135
The Mechanism of Action of Methyl Xanthines in Mutagenesis 136
Arthur L. Koch
Evidence for a Negative Feedback System Controlling Liver Regeneration 148
Andre D. Glinos
Fluctuations in Neural Thresholds 153
Lawrence S. Frishkopf and Walter A. Rosenblith
PART III. DETERMINATION OF INFORMATION MEASURES
Editorial Introduction 169
Chemistry and Biochemistry at Low Temperatures and Discrimination of States
and Reactivities 171
Simon Freed
Discussion 180
Information Content of Tracer Data With Respect to Steady-state Systems 181
Mones Berman and Robert L. Schoenfeld
The Domain of Information Theory in Biology 187
Henry Quastler
Discussion 196
Some Membrane Phenomena from the Point of View of Information Theory 197
Herman Branson
Efficiency of Information Transmission by Biochemical Co-factors 204
Peter D. Klein
Discussion 209
Antigenic Specificity 211
Bernard N. Jaroslow and Henry Quastler
Information Content and Biotopology of the Cell in Terms of Cell Organelles 218
Charles F. Ehret
Quantification of Performance in a Logical Task With Uncertainty 230
A. Rapoport
PART IV. DESTRUCTION OF INFORMATION
BY IONIZING RADIATION
Editorial Introduction 239
Electron Spin Resonance in the Study of Radiation Damage 241
Walter Gordy
A Physical Mechanism for the Inactivation of Proteins by Ionizing Radiation 262
Robert Platzman and James Franck
Information and Inactivation of Biological Material 276
Harold J. Morowitz
Discussion 281
The Absence of Radiation-Induced Disulfide Interchanges 283
Arthur L. Koch
A Proposed Mechanism of Protein Inactivation 287
L. G. Augenstine
Discussion 291
PART V. AGING AND RADIATION DAMAGE
Editorial Introduction 293
A Study of Aging, Thermal Killing, and Radiation Damage by Information
Theory 297
Hubert P. Yockey
Entropic Contributions to Mortality and Aging 317
George A. Sacher
A Quantitative Description of Latent Injury from Ionizing Radiation 331
H. A. Blair
Some Notes on Aging 341
Hardin B. Jones
Cancer as a Special Case of a General Degenerative Process 347
Harry Auerbach
Discussion 351
Free Radicals as a Possible Cause of Mutations and Cancer 353
Walter Gordy
PART VI. INFORMATION NETWORKS
A Probabilistic Model for Morphogenesis 359
Murray Eden
Functional Geometry and the Determination of Pattern in Mosaic Receptors 371
John R. Platt
PART VII. THE STATUS OF INFORMATION THEORY IN BIOLOGY
The Status of Information Theory in Biology: A Round Table Discussion 399
Edited by Henry Quastler
Author Index 403
Subject Index 411
FOREWORD
Alvin M. Weinberg
Director, Oak Ridge National Laboratory
The reader of this book may wonder why it is that an institution such as the
Oak Ridge National Laboratory, which is primarily interested in the control
and release of nuclear energy, should also be interested in sponsoring a meeting
on Information Theory in Health Physics and Radiobiology.
The answer rests in the fact that among the activities that are pursued at
this Laboratory there are two which bear very directly on general problems
of growth and of the impairment of growth by radiation and allied agents.
Broad programs in fundamental research in the basic physical mechanisms
and in the basic biological manifestations of radiation damage have been
established in the Health Physics Division and in the Biology Division. In
the Biology Division there is a great deal of experimental work being done on
protein synthesis, on the mechanism of action of the nucleic acids, and on
problems of the characterization of the nucleic acids. In the Health Physics
Division there is a lively interest in the problems of dosimetry and the basic
mechanisms of the interaction of radiation and matter. It is in establishing
a tie-up between the physical and biological aspects of radiation damage that
information theory may play an important role. We hope that this conference
will help to assess the value of information theory to phenomena involved
in the interaction of radiation and living matter.
$p(i)$; in our example:
$$\sum_i p(i) \cdot z_i = 1/2 + 3/8 + 3/8 + 3/8 + 4/16 + 5/32 + 5/32 = 70/32 = 2.19$$
From $p(i) = (1/2)^{z_i}$
we get: $\log_2 p(i) = z_i \cdot \log_2 (1/2)$
and, because: $\log_2 (1/2) = -1$
we have: $z_i = -\log_2 p(i)$.
We get (for p(i)'s which are integral powers of 1/2!) the following result:
Average number of binary symbols per event = $-\sum_i p(i) \log_2 p(i)$.
We will check this result for the case of equiprobable categories. For
r categories, the probability of every one will be 1/r; so:
$$-\sum_i p(i) \log_2 p(i) = -r \cdot \frac{1}{r} \log_2 \frac{1}{r} = \log_2 r$$
This is the expression previously obtained for equiprobable categories.
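The relation between code-word length and probability can be checked numerically. The following is a minimal sketch, assuming the seven probabilities implied by the terms of the worked sum above (1/2, three of 1/8, one of 1/16, two of 1/32); the function names are illustrative and do not come from the text.

```python
from math import log2

# Probabilities inferred from the terms of the worked sum:
# each term is p(i) * z_i with z_i = -log2 p(i).
probs = [1/2, 1/8, 1/8, 1/8, 1/16, 1/32, 1/32]

def code_word_lengths(probs):
    """Length z_i = -log2 p(i); valid when every p(i) is a power of 1/2."""
    return [round(-log2(p)) for p in probs]

def mean_length(probs, lengths):
    """Average number of binary symbols per event."""
    return sum(p * z for p, z in zip(probs, lengths))

def information(probs):
    """-sum p(i) log2 p(i), skipping categories with p = 0."""
    return sum(-p * log2(p) for p in probs if p > 0)

lengths = code_word_lengths(probs)
print(lengths)                        # [1, 3, 3, 3, 4, 5, 5]
print(mean_length(probs, lengths))    # 2.1875, i.e. 70/32
print(information(probs))             # 2.1875

# Equiprobable check: r = 8 categories give log2 8 = 3 symbols.
print(information([1/8] * 8))         # 3.0
```

For probabilities that are exact powers of 1/2 the mean code length and the sum $-\sum_i p(i) \log_2 p(i)$ coincide, which is what the derivation asserts.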
Any Probabilities. What if probabilities are not limited to the values 1/2,
1/4, 1/8, etc.? In this case it will not, in general, be possible to make divisions
into exactly equiprobable groups. We would suspect that in this case the
coding will be less than optimally efficient; accordingly, the average length
of a code word will be somewhat higher than $-\sum_i p(i) \log_2 p(i)$. The
approximation is usually not bad. This is illustrated in the following example, which
shows the construction of a binary code for the letters of the English alphabet,
taking into account their relative frequencies. As expected, it turns out that
each category, i, is represented by a code word of approximately $-\log_2 p(i)$
digits; accordingly, its contribution to the weighted average is not far from
the ideal value of $-p(i) \log_2 p(i)$, and the mean code length is only very slightly
greater than the limiting value of $-\sum_i p(i) \log_2 p(i)$.
Table I. Fano Code for English Letters

  1      2         3            4             5             6                 7
                            No. of digits                Contribution
  i     p(i)      Code      in code word   -log2 p(i)   to weighted      -p(i) x log2 p(i)
                                                        average (2 x 4)       (2 x 5)

  E     .132   111                3           2.92          .393            .384139
  T     .105   110                3           3.25          .315            .341411
  A     .086   101                3           3.54          .258            .304398
  O     .080   1001               4           3.64          .320            .291508
  N     .071   1000               4           3.82          .284            .270938
  R     .068   0111               4           3.88          .272            .263725
  I     .063   0110               4           3.99          .252            .251275
  S     .061   0101               4           4.04          .244            .246137
  H     .053   0100               4           4.24          .212            .224606
  D     .038   00111              5           4.72          .190            .179278
  L     .034   00110              5           4.88          .170            .165862
  F     .029   00101              5           5.11          .145            .148126
  C     .028   00100              5           5.16          .140            .144436
  M     .025   000111             6           5.32          .150            .133048
  U     .020   000110             6           5.64          .120            .112877
  G     .020   000101             6           5.64          .120            .112877
  Y     .020   000100             6           5.64          .120            .112877
  P     .020   000011             6           5.64          .120            .112877
  W     .015   000010             6           6.06          .090            .090883
  B     .014   000001             6           6.16          .084            .086218
  V     .009   0000001            7           6.80          .063            .061162
  K     .004   00000001           8           7.97          .032            .031863
  X     .002   0000000011        10           8.97          .020            .017931
  J     .001   0000000010        10           9.97          .010            .009965
  Q     .001   0000000001        10           9.97          .010            .009965
  Z     .001   0000000000        10           9.97          .010            .009965

       1.000                                               4.144           4.118347
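The totals of Table I can be re-derived with a short sketch. It assumes the probabilities and code words as transcribed from the table (the row with p = .080 is taken to be the letter O, the only letter otherwise missing); the helper names are illustrative. Recomputed from these inputs, the mean word length comes out near 4.15 digits and the limiting value near 4.12 bits, close to the printed totals.

```python
from math import log2

# Letter -> (probability, code word), transcribed from Table I.
# Assumption: the row with p = .080 belongs to "O".
FANO = {
    "E": (.132, "111"),        "T": (.105, "110"),        "A": (.086, "101"),
    "O": (.080, "1001"),       "N": (.071, "1000"),       "R": (.068, "0111"),
    "I": (.063, "0110"),       "S": (.061, "0101"),       "H": (.053, "0100"),
    "D": (.038, "00111"),      "L": (.034, "00110"),      "F": (.029, "00101"),
    "C": (.028, "00100"),      "M": (.025, "000111"),     "U": (.020, "000110"),
    "G": (.020, "000101"),     "Y": (.020, "000100"),     "P": (.020, "000011"),
    "W": (.015, "000010"),     "B": (.014, "000001"),     "V": (.009, "0000001"),
    "K": (.004, "00000001"),   "X": (.002, "0000000011"), "J": (.001, "0000000010"),
    "Q": (.001, "0000000001"), "Z": (.001, "0000000000"),
}

def is_prefix_free(table):
    """True if no code word is the beginning of another (instant decodability)."""
    codes = sorted(code for _, code in table.values())
    # In sorted order, a word that is a prefix of any other is a prefix of its successor.
    return all(not b.startswith(a) for a, b in zip(codes, codes[1:]))

def mean_length(table):
    """Weighted average number of binary digits per letter (column 6 total)."""
    return sum(p * len(code) for p, code in table.values())

def information(table):
    """Limiting value -sum p(i) log2 p(i) (column 7 total)."""
    return sum(-p * log2(p) for p, _ in table.values())

def encode(text, table):
    return "".join(table[letter][1] for letter in text)

def decode(bits, table):
    """Read digits until the accumulated word matches a code word, then emit it."""
    by_code = {code: letter for letter, (_, code) in table.items()}
    letters, word = [], ""
    for bit in bits:
        word += bit
        if word in by_code:
            letters.append(by_code[word])
            word = ""
    return "".join(letters)

print(is_prefix_free(FANO))
print(round(mean_length(FANO), 3), round(information(FANO), 3))
bits = encode("THE", FANO)
print(bits, decode(bits, FANO))   # 1100100111 THE
```

Because no code word begins another, a message needs no separators between letters; the decoder simply emits a letter as soon as the digits read so far form a complete code word.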
We have already met a situation where a binary code was less than optimally
efficient (in the sense of minimum length of code words); that was the case
of r equiprobable categories, when r was not an integral power of 2. In this
instance, it was possible to approximate optimal efficiency by symbolizing
groups of events instead of single events. The same principle works in the case
of probabilities which are not integral powers of (1/2). We will illustrate
the method in the case of a situation with two alternatives.
Example: Let there be two categories of events, 'A' and 'B', with associated
probabilities, p(A) and p(B):
p(A) = .7
p(B) = .3
The limiting value of symbols per event is:
$$-\sum_i p(i) \log_2 p(i) = -(0.7 \log_2 0.7 + 0.3 \log_2 0.3) = 0.881291 \ldots$$
If this situation is to be represented on the basis of single events, then one
needs one binary digit per event.
Event    Probability    Representation
A           0.7              1
B           0.3              0
Average number: 1.0 symbol per event; excess 12 per cent.
The following two-event clusters are possible: AA, AB, BA, BB. If the two
events are independent, then the probability that both occur is the product
of their individual probabilities :
p(AA) = p(A) · p(A), p(BA) = p(B) · p(A), etc.
Setting up a Fano code, we get:
Event    Probability    Representation
AA          .49              1
AB          .21              01
BA          .21              001
BB          .09              000
Average 1.81, or 0.905 symbols per event; excess 3 per cent.
If we can encode groups of three real events, then we get still closer to optimum
economy :
Event Probability Representation
AAA .343
AAB .147
ABA .147
BAA .147
ABB .063
BAB .063
BBA .063
BBB .027
1 1
1
1 1
1
10
11
1
Average: 2.686, or 0.895 digit per event; excess 1½ per cent.
Even with more pronounced unbalance of frequencies, the minimum value
of binary digits per word is soon approximated. For p(A) = .89 and p(B) = .11,
the limiting value is .50. In single-event-code, one needs one digit per event;
for two-event-sequences, .66 digits; for three-event-sequences, .55; and for
four-event-sequences, .52.
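The effect of coding clusters of events can be explored with a short sketch. It builds every n-event cluster of two independent alternatives and computes the mean word length of a binary code for them. As an assumption made for brevity, the sketch uses Huffman's merging procedure rather than the Fano construction of the text, and all function names are illustrative. For p(A) = .89 and clusters of one, two and three events it gives roughly 1.0, 0.66 and 0.55 digits per event, against a limiting value of about 0.50.

```python
import heapq
from itertools import product
from math import log2, prod

def information(probs):
    """Limiting value -sum p log2 p, in binary digits per event."""
    return sum(-p * log2(p) for p in probs if p > 0)

def mean_word_length(probs):
    """Mean code-word length of a Huffman code for the given probabilities.

    Each merge of the two least probable groups adds one digit to every
    word inside them, so the mean length is the sum of the merged weights.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

def digits_per_event(p_a, n):
    """Code clusters of n independent events; return digits per single event."""
    cluster_probs = [prod(c) for c in product((p_a, 1 - p_a), repeat=n)]
    return mean_word_length(cluster_probs) / n

limit = information([0.89, 0.11])
print(round(limit, 2))
for n in (1, 2, 3):
    print(n, round(digits_per_event(0.89, n), 2))
```

The cost noted in the text is visible in the sketch: the number of clusters, and with it the size of the code book, doubles with every event added to the cluster.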
We have begun our discussion of binary representation with the case of
2, 4, 8, 16, ... , equiprobable categories. We then generalized to cases with
any number of categories, and proceeded from the representation of single events
to clusters of events. Next, we introduced unequal probabilities, of value
1/2, 1/4, 1/8, ... . Finally, we dropped all restrictions. We can now state,
with full generality:
If a real situation is categorized into r categories, with associated
probabilities p(i), (where i = 1, 2, ..., r), then it is possible to represent each
event with an average of no more than $-\sum_{i=1}^{r} p(i) \log_2 p(i)$ binary symbols.
Representation Theorem. In general, the closer we want to approximate
the minimum bulk of representation, the larger the groups of sequences which
must be encoded. This entails the following penalties :
1. There will be a delay in waiting for a whole group of events to occur
or to be registered, and
2. The encoding and decoding procedures, and the code book itself, will
become the more elaborate the larger the groups coded.
It is obvious that the code which is most economical in terms of bulk of
representation is not necessarily optimum in over-all performance. There
will be cases where it might be worthwhile to sacrifice economy in word length
for ease in decoding. If the reader will work through exercise 4, then he surely
will appreciate this possibility. Whether or not minimum bulk of coding is
favorable, in a given case, cannot be derived from informational analysis.
What information theory does is to establish a limiting value of the number
of symbols, of a given kind, which are needed to represent the information in
a given factual situation; in some cases, like those here discussed, information
theory will also show how such coding economy can be achieved; but it can
never prescribe that this is what should be done.
It would be quite legitimate to inquire, at this point, why we have gone to
so much trouble to find out how to achieve binary representation with minimum
bulk ? Is not the result of doubtful value, in view of the fact that a tolerable
approximation to minimum bulk can usually be achieved with the simplest
means, and that a close approximation often entails prohibitive costs in encoding
and decoding? The answer is this: by establishing the minimum length of
code words in standard binary representation, we have implicitly established
a general condition of representability :
If an event can be represented by (on the average) n binary digits, then it
can symbolically represent, or be represented by, any other event that can
also be coded into n binary digits.
This can be immediately generalized to groups of events: Let $S_x$ and $S_y$ be
the number of real and symbolic events in a group, and $n_x$ and $n_y$ the average
binary representation per event and per symbol. Then, the general condition
of representability can be stated as follows:
$$S_y \cdot n_y \geq S_x \cdot n_x$$
EXERCISES
1. A weakness of the Paul Revere code is that there is no positive signal for "peace and
quiet". Hence, the colonists could not be sure whether the absence of a warning signal meant
"peace and quiet" or a disturbance in the communication system. Show how two lights
could be used to indicate the four situations by positive signals.
2. Any integer can be written as a sum of powers of 2 (1, 2, 4, 8, 16, ...). For instance:
27 = 16 + 8 + 2 + 1
   = 2^4 + 2^3 + 2^1 + 2^0
In binary notation, one indicates the power by position, and writes a '1' in appropriate position
if this power does enter the sum, a '0' if it does not. Thus, '27' becomes 11011.
(a) Write the following numbers in binary notation: 0,1,2,3,4,5,6,7,8,9,10,12,16,1955.
(b) Write the following binary numbers in decimal notation: 1001, 1011, 10010011, 100000.
Any proper fraction can be written as a sum of powers of 1/2, (1/2, (1/2)^2 = 1/4, (1/2)^3 = 1/8,
etc.). For instance: .75 = 1/2 + 1/4, or, in binary notation, .11.
(c) Translate into decimal notation: .001, .1001001
3. (a) Encode the message 'ABCDE' in code (a) of the five-word codes described earlier,
(b) decode the message: '000001011011' in code (b).
4. This assignment is coded in the Fano code for English letters given earlier.
001001001 10000101 1 1001 1 10001 10001001 10101001001001 1000001010001 1001010
1 101001 10000010101 1111111 1001001001001 1 1 1 1 10001 10010101 101000000101001
0101 100000001 1 1 100000101 10100010101 1 1000100001 1 101 1000010101 101 1001010
0101 100101 1 1 1 1 1 101001000100001 101111 101 101 110111 101 1000001 1 10010010010
001 1 100001 110101111111 1001001 1 100001 1111011 100101 100101 1 10001 1 1 101 1000
001001 1 1 100100101 1 10010001 100101001001001001 1 1 1 1 100001001 101 1001001 100
100101 1 10100100101 1 1001001 1 1 10 100000 1 10010000001 1 1 10000010001001 1 1 1000
001001001001 1 101 101000000101 101 1000001 1 1001 1 1 1 1 1001001001001 1 101 101000
000 101 1010001 1 1 1 1 101010101 101000101 1 1 1001 1001 1000000001 1 1 1 1 10010001 10
010110011000 111
(This assignment is very tedious but it is good practice.)
5. Given a real situation with three categories and probabilities p(A) = .8, p(B) = .15,
p(C) = .05. Construct a binary code which comes within 10 per cent of the minimum bulk.
6. A protein is thought to be a linear arrangement of amino acids of which there are
(about) twenty kinds in each cell. The specificity of a protein depends mostly on the sequence
of amino acids, i.e. a protein can be considered as a 'message' written in a twenty-letter
alphabet. It is known that, in the living cell, protein specificity is determined by nucleic
acids. These are linear arrangements of nucleotides, of which there are four different kinds.
Question: what is the minimum number of nucleotides needed, on average, to specify
each amino acid? Assume all amino acids to be equiprobable.
III. THE MEASURE OF INFORMATION OR UNCERTAINTY
It seems reasonable to equate the amount of information acquired, as a
result of an event, to the amount of uncertainty which its occurrence has
abolished*. The prior uncertainty does not depend on the event that has
actually happened, but, rather, on the whole set of events which could have
happened at this particular occasion. For instance, if one wishes to compute
how much information is acquired, on the average, by a glance at the speedo-
meter, one proceeds to estimate how uncertain a motorist is before he glances.
The amount of this uncertainty must depend on the number of needle positions
which the motorist thinks he can distinguish. Suppose his speedometer scale
reaches from zero to one hundred and he can read the position to the nearest
mile per hour; then, he will be able to distinguish 101 positions, and the amount
of his uncertainty will be somehow related to this number. However, it wouldn't
be realistic to relate his uncertainty only to this number, 101. Because, suppose
his speedometer scale ranges up to 150 instead of 100 miles per hour; yet,
when he is driving along the highway at a moderate speed, this extra portion
of scale does not contribute in any way to his uncertainty; he will be quite
sure that his needle will not be in this interval. In fact, he will expect to find
his needle somewhere within a range of about 10 m.p.h., and he will be almost
certain to find it within a somewhat larger range of, say, 20 m.p.h. Thus, to
describe his uncertainty realistically, we must not only state every possible
result of his reading, but will have to qualify each by a statement of expectation
or probability.
The Amount of Uncertainty
As before, we turn to a binary situation to obtain a simple perspective of
the problem. Suppose somebody has made a record of 100 tosses of a coin;
he has registered only whether the coin fell 'head up' or 'tail up', but neglected
all other features such as on what spot the coin came down, which direction
the head faced, etc. What is the average amount of information in the record
of any one toss? In other words, what is the amount of uncertainty before
the record is seen ?
The uncertainty must be a function of 'two', the number of alternatives;
it must be modified by their relative frequencies. If it is known that the record
is that of a coin so thoroughly biassed that 'head' always turns up, then there
will be no uncertainty at all; if the coin is moderately biassed, then the outcome
of a toss will be uncertain but not quite as much as with an unbiassed coin.
If we don't know the bias of a particular coin, then we do not know exactly
how uncertain we should feel about the outcome of a toss. If we know that
the record contains 60 'heads' and 40 'tails', then a record of 'head' will show
up with a probability of .60, a record of 'tail' with a probability .40. The
uncertainty can be described by a statement of these probabilities:
Probability of head up 0.60
Probability of tail up 0.40
In the same way we can describe any number of binary uncertainties with
a 60-40 choice between any class 'A' and its complement 'non-A' — where
'A' and 'non-A' may be males and females, hits and misses, friends and foes.
* At some time there was some discussion whether uncertainty and information should be
given opposite signs. Present usage prescribes the same sign for both.
These uncertainties differ in any number of respects from each other. They
will be of interest in very different situations; the kind of information needed
to produce certainty is not the same; neither is the usefulness of this information,
and so on. However, there is something in common between all uncertainties
which can be characterized by the probabilities:
Probability of 'A' ... .60
Probability of 'non-A' ... 1 - .60
One aspect of this 'something-in-common' is that an arrangement of any 60
A's and 40 non-A's can be coded to represent any other 60 A's and 40 non-A's
—heads or tails, males or females, hits or misses, friends or foes. Once such
representation has been established, then the uncertainty concerning one
event will be abolished by information concerning the other. We have previously
equated the amount of information with the amount of uncertainty it removes.
Accordingly, it can be said that the amounts of uncertainty and information
must be equal in all situations characterized by a binary alternative with
probabilities .60 and .40.
The foregoing consideration exposes the fundamental features of the
measure of information :
(1) Information is a measurable abstract quantity; its value does not
depend on what the information is about, just as length, or weight, or tempera-
ture have values which do not depend on the nature of the thing which is long,
heavy, or hot ;
(2) Information is related to the ensemble of possible outcomes of an
event; its value depends on the probabilities associated with these outcomes,
but not on their causes, and not on their consequences.
What remains is the development of a measure which complies with this
concept of 'amount of information'; this is merely a technical problem. An
obvious generalization states that whenever two events have the same number
of possible outcomes, and identical sets of probabilities are associated with
the two ensembles of possible outcomes, then these two events have identical
information contents. However, we wish to be able to compare events with
quite different probability sets; for instance, we wish to be able to say which
uncertainty is greater, that associated with a situation with three equiprobable
alternatives, or that where there are four possibilities with probabilities .8,
.1, .05 and .05. To answer such questions, we have to derive a measure which
is a single number, whatever the number of possible categories and their
associated probabilities.
Such a measure is readily derived from the equivalence of uncertainty
with the information which removes it. We may represent the information
content of an uncertainty-removing piece of intelligence in any manner we
wish. We stipulate that this information should be represented in a standard
fashion, namely, by using a binary alphabet. In addition we stipulate that
the binary representation be coded in such a manner that the expected number
of symbols is minimized. We thus obtain a unique number; namely, the
minimum average number of binary symbols needed to abolish the uncertainty
associated with a given situation. This number will be called the amount of
uncertainty or information of this situation.
The function here needed has already been derived as the condition of
representability. If two situations can be made to represent each other, then
information on one can abolish uncertainty concerning the other. Thus,
mutual representability implies equal information content, and representation
in the standard binary system yields a general measure of information content.
This measure is the 'amount of selective information' as defined by Shannon
and Wiener (4, 5). It is expressed as follows:
Let x be a classification with categories i and associated probabilities
p(i); then the information content of x is designated H(x) and given by*:
$$H(x) = -\sum_i p(i) \log_2 p(i)$$
The units of this function are the binary digits needed for representation
of a given event, and are called bits. It must be remembered that the 'bit' is
a technical unit of amount of information and not a small piece of information.
A single chunk of information may contain many bits or a fraction of a bit.
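The definition translates directly into a few lines of code. The sketch below is illustrative only (the function name and the input check are assumptions, not part of the text); it evaluates H(x) for an unbiassed coin, for a certain outcome, and for the 0.7/0.3 alternative used earlier, whose value of 0.881291 bits shows that a single event may carry a fraction of a bit.

```python
from math import log2

def information(probs):
    """H(x) = -sum p(i) log2 p(i), in bits.

    Categories with p(i) = 0 are skipped: they contribute nothing.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * log2(p) for p in probs if p > 0)

print(information([0.5, 0.5]))            # 1.0  (unbiassed coin)
print(information([1.0, 0.0]))            # 0.0  (no uncertainty)
print(round(information([0.7, 0.3]), 6))  # 0.881291
```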
Some Properties of the Shannon-Wiener Information Function
The Shannon-Wiener information function has been derived (admittedly, in
a loose fashion) from a consideration of standard representation of information.
We will now consider a number of its properties and see that they correspond
closely to the behavior which one would intuitively expect from a good
measure of information.
(1) Independence. Let i be one of the possible categories of an event x,
p(i) the associated probability, and F(i) the contribution of the ith category
to the uncertainty. It is desirable that F(i) be a function of and only of p(i).
The function
$$F(i) = -p(i) \log_2 p(i)$$
fulfills this requirement.
(2) Continuity. A small change of p(i) should result in a small change in
F(i); in other words, F(i) should be a continuous function of p(i). The function
p(i) log2 p(i) is continuous.
(3) Additivity. It is desirable that the total information derived from two
independent sources should be the sum of the individual information; in other
words, the uncertainty concerning independent events should be the sum of
the individual uncertainties.
* The information function looks (except for a scale factor) like Boltzmann's entropy
function; this is not a mere coincidence. The physical entropy is the amount of uncertainty
associated with a state of a system, provided all states which are physically distinguishable are
considered as different, that is, if the categorization is taken with the finest grain possible.
In most situations dealt with in information theory, large numbers of states which are physically
distinguishable are lumped into equivalent classes. The category "one light on the steeple" is
a good example; an enormous number of physically distinct states are compatible with this
definition, but they are all lumped into one class. The distinctions upon which categorizations
are based are usually a very small percentage of the distinctions one could make. Thus,
physical entropy is an upper bound of the information functions which can be associated with a
given situation, but it is a very high upper bound, usually very far from the actual value. For
this reason, I prefer not to use the word 'entropy' as synonymous with 'information'.
A very thorough discussion of the relation between information and entropy has been given
by Brillouin (9).
Let y be an event with categories j and associated probabilities p(j). Let
p(i,j) be the probability of the event pair that x falls into category i and y
into category j. Then, the function
$$H(x,y) = -\sum_{i,j} p(i,j) \log_2 p(i,j)$$
will measure the uncertainty associated with the event pair.
If x and y are independent events, then
$$p(i,j) = p(i) \cdot p(j)$$
As a matter of fact, this relation is often used to define independence. In this
case, we have
$$H(x,y) = -\sum_{i,j} p(i,j) \log_2 \left[ p(i) \cdot p(j) \right]$$
$$= -\sum_{i,j} p(i,j) \log_2 p(i) - \sum_{i,j} p(i,j) \log_2 p(j)$$
It is known that
$$\sum_j p(i,j) = p(i)$$
$$\sum_i p(i,j) = p(j)$$
Substituting these expressions, we obtain
$$H(x,y) = -\sum_i p(i) \log_2 p(i) - \sum_j p(j) \log_2 p(j)$$
$$= H(x) + H(y).$$
Thus, the Shannon-Wiener function fulfills the postulate of additivity.
(4) Natural Scale. The prototype of uncertainty is that associated with a
50-50 choice. So, the unit of uncertainty should be the uncertainty associated
with this situation. In this case, both p's have the value 1/2, and
$$H(x) = -\left( \tfrac{1}{2} \log_2 \tfrac{1}{2} + \tfrac{1}{2} \log_2 \tfrac{1}{2} \right) = 1$$
Thus, the Shannon-Wiener function is seen to have an appropriate scale factor.
We have derived the information function from the postulate of efficient
binary representation, and have found that the function so defined has the
desirable properties of independence, continuity, additivity, and natural scale.
We could have started differently, setting up these four properties as postulates.
It can be shown that these four postulates (or other sets of four similar
postulates) define uniquely the Shannon-Wiener function. Working it this way,
we would have derived the fact that the function so defined has the desirable
property of efficient binary representation.
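Additivity and the natural scale can be checked numerically. The sketch below is a minimal illustration; the two probability sets are arbitrary choices made for the example, not values from the text.

```python
from itertools import product
from math import log2

def H(probs):
    """Shannon-Wiener function, in bits; zero-probability cells are skipped."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Two independent events with arbitrary (assumed) probabilities.
p_x = [0.6, 0.4]
p_y = [0.5, 0.25, 0.25]

# Independence: p(i,j) = p(i) * p(j) for every pair of categories.
joint = [pi * pj for pi, pj in product(p_x, p_y)]
print(abs(H(joint) - (H(p_x) + H(p_y))) < 1e-12)   # True: H(x,y) = H(x) + H(y)

# Counter-example: y always equals x, so the pair is no more uncertain than x alone.
coupled = [0.6, 0.0, 0.0, 0.4]
print(H(coupled) < H(p_x) + H(p_x))                # True: the sum overstates it

# Natural scale: a 50-50 choice is the unit.
print(H([0.5, 0.5]))                               # 1.0
```

The counter-example shows why the postulate is restricted to independent events: when the events are related, the joint uncertainty falls short of the sum.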
The function F(p) is plotted against p in Fig. 1. The graph shows a curve
which originates and terminates at F = 0, and has a flat top with a maximum
of F = 0.53 for p = 0.37. Inspection of the graph reveals some more important
properties of the function F(p):
(5) F(0) = 0:
When a particular class of events is certain not to occur (p = 0), then it does
not contribute to the measure of uncertainty.
(6) F(1) = 0:
When a particular class of events is certain to occur (p = 1), i.e. excludes
all other classes, then there is no uncertainty about the outcome.
Fig. 1. Graph of F(p) = -p log2 p as a function of p
(7) Effect of Averaging:
$$F\left( \frac{p_1 + p_2}{2} \right) \geq \frac{1}{2} \left[ F(p_1) + F(p_2) \right]$$
The function of the average is greater than or at least as large as the average
of the function. When the probabilities associated with two disjoint categories
are averaged, then the uncertainty becomes larger. Figure 2 is a graphical
demonstration of this effect.
The extreme case of averaging occurs if all r categories in a classification
are considered equiprobable. Then,
$$p(i) = \frac{1}{r}$$
$$\text{max. of } H(x) = -\sum_{i=1}^{r} \frac{1}{r} \log_2 \frac{1}{r} = -r \cdot \frac{1}{r} \log_2 \frac{1}{r}$$
$$\text{max. of } H(x) = \log_2 r$$
In particular, in a binary classification,
r = 2
max. of H(x) = 1
Thus, the maximum uncertainty associated with two alternatives is one bit; it
occurs if both alternatives are equally probable (this is the case of the unbiassed
coin!).
(8) Effect of Pooling:
$$F(p_1 + p_2) < F(p_1) + F(p_2)$$
The function of the sum is smaller than the sum of the functions. That is,
pooling of two classes in one equivalence class reduces uncertainty (exactly
by that uncertainty which is associated with the distinction between the two
pooled classes). Extreme pooling results in a single category with probability 1;
this means uncertainty 0. Figure 3 demonstrates the effect of pooling.
Fig. 2. Graphical demonstration of the effect of averaging
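Properties (5) to (8) can be verified numerically for particular values. The sketch below is illustrative only: the grid search and the probabilities p1 = 0.1 and p2 = 0.3 are arbitrary choices for the example. The last check reads the parenthetical remark on pooling as follows (an interpretation, not a statement from the text): the uncertainty removed equals the uncertainty of the choice between the two pooled classes, weighted by their combined probability.

```python
from math import log2

def F(p):
    """Contribution of one category: F(p) = -p log2 p, with F(0) = 0."""
    return -p * log2(p) if p > 0 else 0.0

# (5) and (6): impossible and certain categories contribute nothing.
print(F(0.0) == 0.0, F(1.0) == 0.0)             # True True

# Flat maximum: scan a grid of probabilities in steps of 0.001.
grid = [k / 1000 for k in range(1001)]
p_max = max(grid, key=F)
print(p_max, round(F(p_max), 2))                # 0.368 0.53

# (7) Averaging two probabilities does not decrease the contribution.
p1, p2 = 0.1, 0.3
print(F((p1 + p2) / 2) >= (F(p1) + F(p2)) / 2)  # True

# (8) Pooling two classes reduces uncertainty.
s = p1 + p2
print(F(s) < F(p1) + F(p2))                     # True

# Amount removed by pooling = weighted uncertainty of the lost distinction.
removed = F(p1) + F(p2) - F(s)
distinction = s * (F(p1 / s) + F(p2 / s))
print(abs(removed - distinction) < 1e-12)       # True
```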
The function F(p) = -p log2 p has been tabulated. The reader is advised
to use Fig. 1 to obtain approximate values for use in working the exercises
below. For more precise values, one of the existing tables may be consulted
(10, 11).
EXERCISES
7. Compute the uncertainty associated with:
p(A) = .60
p(non-A) = .40
8. Compute H(x) for two alternatives, and plot the value against p(A).
9. Answer the question posed previously: which uncertainty is greater, that associated
with a situation (x) with three equiprobable alternatives, or that (y) where there are 4 possibili-
ties with probabilities .8, .1, .05 and .05.
10. Estimate the uncertainty of a motorist like the one described at the beginning of this
section.
11. Certain languages have considerably fewer letters than English (that is, about 18 to
20), yet the information content per letter is nearly the same. How is this possible?
12. A situation has an unlimited number of alternatives, with probabilities of 1/2, 1/4,
1/8, 1/16, etc. in geometric progression. What is the measure of uncertainty?
Fig. 3. Graphical demonstration of the effect of pooling
The function of the sum is on the intersection between the curve and the
ordinate over the sum; the sum of the functions is on the intersection of the same
ordinate with a straight line through the origin and the midpoint of the straight
line which connects the intersections of the curve with the ordinates over p1 and
p2; hence:
$$F(p_1 + p_2) < F(p_1) + F(p_2)$$
IV. INFORMATION MEASUREMENTS PERTAINING TO
TWO RELATED VARIABLES
In the two preceding sections we have discussed how to represent information,
and how to measure amounts of information. Both procedures become impor-
tant if information is to be manipulated. The manipulation most commonly
used is communication.
In information theory, we use the word 'communication' in a wider sense
than usual — just as the word 'information' is used in a wider sense than usual.
We understand by 'communication' any relation between variables, accomplished
by any means whatsoever, conscious or otherwise, provided that it results in a
mutual reduction of uncertainty. For instance: if one watches one of two
tennis players, without looking at the other, he derives a considerable amount
of information about the unseen player's action. Thus, the seen player transmits
information about the unseen player — although in this case, the transmission
of information is incidental and not normally utilized, as one ordinarily looks
at both players.
An Example of Two Related Variables
The following example is purposely selected to represent an instance of
unintentional communication. The table below is based on Pearson and Lee's
measurements of heights on 1376 father-daughter pairs. To simplify the analysis,
we have grouped the data in coarse intervals of 3 in. each, and converted all
frequencies into percentages.
Table II. Heights of Fathers and Daughters; Probabilities and
Information Measures

Joint probabilities of heights, p(i,j)
(Pearson and Lee's data, 1376 father-daughter pairs)

  i \ j      59.5   62.5   65.5   68.5   71.5   74.5    p(i)   -p log2 p
   53.5      .001     —      —      —      —      —     .001      .01
   56.5      .001   .007   .006   .001     —      —     .015      .09
   59.5      .005   .022   .060   .027   .005     —     .119      .37
   62.5      .004   .042   .156   .152   .039   .001    .394      .53
   65.5        —    .009   .075   .175   .095   .010    .364      .53
   68.5        —    .001   .011   .035   .039   .010    .096      .32
   71.5        —      —      —    .003   .006   .002    .011      .07
   p(j)      .010   .082   .308   .393   .184   .023   1.000     1.92
 -p log2 p    .07    .30    .52    .53    .45    .13    2.00

j = height of fathers, in 3 in. intervals
i = height of daughters, in 3 in. intervals
All heights are given as centers of intervals.
Information Functions:
$H(x) = -\sum_i p(i) \log_2 p(i) = 1.92$ bits
$H(y) = -\sum_j p(j) \log_2 p(j) = 2.00$ bits
$H(x) + H(y) = 3.92$ bits
$H(x,y) = -\sum_{i,j} p(i,j) \log_2 p(i,j) = 3.70$ bits
$T(x;y) = H(x) + H(y) - H(x,y) = 0.22$ bits
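The information functions of Table II can be recomputed from the joint probabilities. The sketch below, in Python, is an illustration rather than the author's procedure: it enters the table's three-decimal percentages (dashes as zero) and therefore reproduces the printed values only approximately, since the printed values were presumably derived from the unrounded frequencies. All names in the code are illustrative.

```python
from math import log2

# Joint probabilities p(i, j) as printed in Table II.
# Rows: daughters' heights i = 53.5 ... 71.5; columns: fathers' heights j = 59.5 ... 74.5.
# Dashes in the table are entered as 0 (an assumption).
joint = [
    [.001, 0,    0,    0,    0,    0   ],
    [.001, .007, .006, .001, 0,    0   ],
    [.005, .022, .060, .027, .005, 0   ],
    [.004, .042, .156, .152, .039, .001],
    [0,    .009, .075, .175, .095, .010],
    [0,    .001, .011, .035, .039, .010],
    [0,    0,    0,    .003, .006, .002],
]

def entropy(probs):
    """Uncertainty in bits; zero-probability categories contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Renormalize to guard against rounding in the printed table.
total = sum(sum(row) for row in joint)
joint = [[p / total for p in row] for row in joint]

p_i = [sum(row) for row in joint]          # marginal probabilities of daughters' heights
p_j = [sum(col) for col in zip(*joint)]    # marginal probabilities of fathers' heights

H_x = entropy(p_i)
H_y = entropy(p_j)
H_xy = entropy(p for row in joint for p in row)
T = H_x + H_y - H_xy

print(f"H(x) = {H_x:.2f}  H(y) = {H_y:.2f}  H(x,y) = {H_xy:.2f}  T(x;y) = {T:.2f} bits")
```

Small departures from 1.92, 2.00, 3.70 and 0.22 bits are to be expected, because the table carries only three decimals.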
From the marginal sums, the uncertainties concerning the height of daughters,
H{x), and of fathers, H(y), are computed as described in the preceding section.
The uncertainty concerning both heights in a father-daughter pair is computed
in similar fashion from the joint probabilities, p(i,j). This function is properly
called the joint uncertainty, or uncertainty of the two-part system; its symbol
is H(x,y). It is compared to the sum of the two individual uncertainties. If
the two heights were completely independent of each other, then the joint
uncertainty should be equal to the sum of the individual uncertainties. In our
case, it is smaller by 0.22 bits. The deficit is a measure of the internal constraints
in the system, which lead to an association between heights of fathers and
daughters. The function is designated by the symbol T(x;y). Its defining
equation is:
$T(x;y) = H(x) + H(y) - H(x,y)$
This information function is germane to other statistics which measure the
relatedness of two variables, such as the coefficients of correlation and of
contingency. The T-measure is of very general applicability; the values of the
variables do not have to be quantitative, not even ordered — they must only
be distinguishable. For instance, one can compute a T-measure for a relation
between color and shape.
The two functions, H and T, differ in the way in which they are affected by
change of scale. Let us consider what would have happened if we had chosen
one-inch intervals instead of three-inch intervals. It could be the case that only
one one-inch interval out of any group of three is occupied at all. Then, the
information that a certain height falls into a given three-inch interval would
automatically locate it in some one-inch interval; hence, the uncertainty is
not increased by the subdivision of intervals. However, this is an extremely
unlikely situation. It is much more likely that the three one-inch intervals are
populated with approximately equal frequencies. In this case, additional
information of log2 3 = 1.58 bits is needed to specify the proper one-inch
interval. Then, the uncertainty concerning the height of fathers with regard to
a one-inch scale will be 2.00 + 1.58 = 3.58 bits, and the uncertainty concerning
the height of daughters 1.92 + 1.58 = 3.50 bits. The joint uncertainty will be
increased by log2 9 = 3.17 bits, because each cell in the table will be
replaced by nine cells as one goes from three-inch intervals to one-inch intervals.
If one uses a still finer grain, going from inches to millimetres, then the individual
uncertainties can be increased by another 4.7 bits, the joint uncertainty by 9.3
bits. This is quite the expected behavior. The more categories are recognized,
the greater the uncertainty of classification. The uncertainty can become infinite
for a continuous function. However, it will always remain finite for any set of
real observations.
T, on the other hand, depends very little on the scale interval used. With
very coarse grouping, T tends to be less. In the extreme cases, where all heights
are pooled into one single class, all individual and joint uncertainties vanish,
and with them their differences. In the other extreme case, where measurements
are taken and registered to so many digits that no two results are alike, we must
get H(x) = H(y) = H(x,y) = T(x;y) = log2 1376. But, between these
unreasonable extremes, the measure of constraints is characteristic of the system
and not of the scale which is used in measuring it.
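The different behavior of H and T under a change of scale can be illustrated with a small computation. The sketch below, in Python, assumes the idealized case described above, in which every coarse interval splits into three equally populated sub-intervals; the two-by-two table is a hypothetical example, not data from the text, and the function names are illustrative.

```python
from math import log2

def entropy(probs):
    """Uncertainty in bits; zero-probability categories contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

def measures(joint):
    """Return H(x), H(y), H(x,y) and T(x;y) for a table of joint probabilities."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    H_x, H_y = entropy(p_x), entropy(p_y)
    H_xy = entropy(p for row in joint for p in row)
    return H_x, H_y, H_xy, H_x + H_y - H_xy

def subdivide(joint, k):
    """Replace every cell by k*k equally probable sub-cells (finer scale on both variables)."""
    fine = []
    for row in joint:
        fine_row = [p / (k * k) for p in row for _ in range(k)]
        fine.extend(list(fine_row) for _ in range(k))
    return fine

coarse = [[0.4, 0.1],
          [0.1, 0.4]]          # hypothetical joint probabilities
fine = subdivide(coarse, 3)    # e.g. three-inch intervals split into one-inch intervals

Hx_c, Hy_c, Hxy_c, T_c = measures(coarse)
Hx_f, Hy_f, Hxy_f, T_f = measures(fine)

print(f"gain in H(x):   {Hx_f - Hx_c:.2f} bits (log2 3 = {log2(3):.2f})")
print(f"gain in H(x,y): {Hxy_f - Hxy_c:.2f} bits (log2 9 = {log2(9):.2f})")
print(f"change in T:    {T_f - T_c:.2f} bits")
```

Under this assumption the individual uncertainties each grow by log2 3 bits and the joint uncertainty by log2 9 bits, so that T is left unchanged.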
Two-part Systems in General
We proceed to a general treatment of a two-part system x, y. Let i and j be
the categories of x and y, respectively, and p(i) and p(j) the associated
probabilities. Further, let p(i,j) be the probability of the joint occurrence
[(x = i) and (y = j)].
Then:
$H(x) = -\sum_i p(i) \log_2 p(i)$
$H(y) = -\sum_j p(j) \log_2 p(j)$
$H(x,y) = -\sum_{i,j} p(i,j) \log_2 p(i,j)$
We introduce the conditional probabilities,
$p_i(j) = \mathrm{Prob}\{y = j \text{ if } x = i\}$
$p_j(i) = \mathrm{Prob}\{x = i \text{ if } y = j\}$
When x = i then y must have some value j with certainty (or probability
1.0), that is
$\sum_j p_i(j) = 1$
Equally,
$\sum_i p_j(i) = 1$
Furthermore, the probability of the joint occurrence [x = i and y = j] can be
factored into the product of the probability that x equals i, times the conditional
probability that y = j if x = i; equally, it can be factored into the product of
p(j) times $p_j(i)$. So:
$p(i,j) = p(i) \cdot p_i(j) = p(j) \cdot p_j(i)$
The conditional probabilities naturally yield conditional uncertainties. For
instance, the uncertainty of j, if it is known that x = i, will be
$H_i(y) = -\sum_j p_i(j) \log_2 p_i(j)$
The average uncertainty of j, under the condition that x is known, is designated
by $H_x(y)$. It is obtained as the weighted average of the $H_i(y)$'s:
$H_x(y) = \sum_i p(i) H_i(y)$
Substituting the value of $H_i(y)$, we get
$H_x(y) = -\sum_i p(i) \sum_j p_i(j) \log_2 p_i(j)$
and remembering that
$p_i(j) = \frac{p(i,j)}{p(i)}$
we get
$H_x(y) = -\sum_{i,j} p(i,j) \log_2 \frac{p(i,j)}{p(i)}$
Expanding the logarithm gives
$H_x(y) = -\sum_{i,j} p(i,j) \log_2 p(i,j) + \sum_{i,j} p(i,j) \log_2 p(i)$.
Noting that
$\sum_j p(i,j) = p(i)$
we get
$H_x(y) = -\sum_{i,j} p(i,j) \log_2 p(i,j) + \sum_i p(i) \log_2 p(i)$.
We have seen that the first term on the right side is H(x,y) and the second
-H(x). So:
$H_x(y) = H(x,y) - H(x)$ and $H(x,y) = H(x) + H_x(y)$
A parallel development shows that
$H(x,y) = H(y) + H_y(x)$
This relation is quite obvious if put into words: the joint uncertainty concerning
two variables is equal to the uncertainty concerning either one of the variables
plus the conditional uncertainty concerning the other variable if the first one
is given.
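The relation between joint and conditional uncertainty can be verified numerically. The following Python sketch computes the conditional uncertainty directly as a weighted average of the uncertainties within each category, and compares it with the difference of joint and marginal uncertainty; the joint probabilities are a hypothetical example, and the variable names (such as H_y_given_x for the conditional uncertainty of y when x is known) are illustrative.

```python
from math import log2

def entropy(probs):
    """Uncertainty in bits; zero-probability categories contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Hypothetical joint probabilities p(i, j); rows are categories of x, columns of y.
joint = [[0.30, 0.10, 0.00],
         [0.05, 0.25, 0.30]]

p_i = [sum(row) for row in joint]          # p(i)
p_j = [sum(col) for col in zip(*joint)]    # p(j)

H_x = entropy(p_i)
H_y = entropy(p_j)
H_xy = entropy(p for row in joint for p in row)

# Conditional uncertainty of y given x: weighted average of the per-row uncertainties,
# each row being converted to conditional probabilities p(i,j) / p(i).
H_y_given_x = sum(pi * entropy(p / pi for p in row)
                  for pi, row in zip(p_i, joint) if pi > 0)

# The same for x given y, working column by column.
H_x_given_y = sum(pj * entropy(p / pj for p in col)
                  for pj, col in zip(p_j, zip(*joint)) if pj > 0)

print(f"H(x,y)                = {H_xy:.4f}")
print(f"H(x) + H(y given x)   = {H_x + H_y_given_x:.4f}")
print(f"H(y) + H(x given y)   = {H_y + H_x_given_y:.4f}")
```

Both sums agree with the joint uncertainty, whichever variable is taken first.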
[Figure 4: bar diagram relating H(x), H(y), $H_y(x)$, $H_x(y)$, T(x;y) and H(x,y).]
Fig. 4. The relation between information functions shown graphically
The difference in uncertainty concerning y, depending on whether or not x
is known,
$H(y) - H_x(y)$,
is the gain in certainty about y derived from observing x. Substituting for
$H_x(y)$, we get:
$H(y) - H_x(y) = H(y) + H(x) - H(x,y)$
The expression on the right side is the defining equation for T(x;y):
$H(y) + H(x) - H(x,y) = T(x;y)$.
It follows from this derivation that T is a symmetrical function:
$T(x;y) = T(y;x) = H(x) - H_y(x) = H(y) - H_x(y)$
and it becomes clear why T is a measure of the mutual reduction of uncertainty.
The relations between the six information functions, H(x), H(y), H(x,y), $H_x(y)$,
$H_y(x)$ and T(x;y), can be demonstrated graphically as in Fig. 4.
In normal code representation, i.e. reduced to efficient binary operations,
the information functions have the following meaning:
H(x) . . . . number of operations which specify x
$H_y(x)$ . . . . number of operations which specify x if y is given
T(x;y) . . . . number of operations which apply to the specification of both x and y
H(x,y) . . . . number of operations which specify the whole system.
Inspection of the graph shows that:
$H(x) \geq H_y(x)$
$H(y) \geq H_x(y)$,
that is, the conditional uncertainty cannot be greater than the unconditional
uncertainty.*
Communication Systems
When a system not only transmits information but exists primarily for that
purpose, then it is called a communication system. No class of two-part systems
has received as much attention as that of the communication system. In a
simple communication system, the two parts are called the source and the
destination of information. The distinction between source and destination must
be based on external grounds; the informational relations between the two are
perfectly symmetrical. The relevant states of the source are called the inputs,
or signals sent, and the relevant states of the destination are the outputs, or
signals received. A single state is called a symbol, and a higher unit composed
of several symbols, a message. The conditional probabilities for each pair of
signals sent and received form a matrix called the channel. Note that the word
'channel' is again used in a sense wider than customary. A 'channel' may but
does not have to be a means of physically conveying information. For instance,
if two variables x and y do not affect each other but are both affected by a third
variable r, then knowledge of the state of x is likely to reduce the uncertainty
concerning the state of y, and vice versa; hence, information is transmitted
between the two variables, and they are connected by a 'channel' in the sense of
information theory — although they do not communicate with each other directly.
* However, this is true only for an average conditional uncertainty, and does not apply to
every particular condition. The following example will help to fix the ideas: Consider a
diagnostic test for a certain disease; suppose the nature of the test and the occurrence of the
disease are such that in 98 per cent of the patients the test is negative ; that of the positive tests,
50 per cent are spurious ; and that virtually every case of the disease will give a positive test.
Then, if the test is not performed at all, the diagnostician's uncertainty as to the presence of the
disease in any given patient, is
$-(0.99 \log_2 0.99 + 0.01 \log_2 0.01) = 0.081$ bits/patient.
If the test was negative then the uncertainty is zero. But, if the test is positive, the chances
are equal that it is or is not spurious; hence, the uncertainty is 1.0 bit, and the diagnostician is
more in doubt than he was before. However, the average uncertainty, conditional upon his
performing the test, is reduced to
$0.98 \times 0 + 0.02 \times 1.0 = 0.020$ bits/patient.
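The arithmetic of the diagnostic-test example can be retraced in a few lines. This Python sketch uses only the figures given in the footnote (98 per cent negative tests, half of the positive tests spurious, no missed cases, hence 1 per cent of patients diseased); the variable names are illustrative.

```python
from math import log2

def entropy(probs):
    """Uncertainty in bits; zero-probability alternatives contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

p_negative, p_positive = 0.98, 0.02
p_disease = p_positive * 0.5          # half of the positive tests are genuine

# Uncertainty about the presence of the disease without any test.
H_before = entropy([1 - p_disease, p_disease])

# Uncertainty after each possible test result.
H_if_negative = entropy([1.0, 0.0])   # a negative test excludes the disease
H_if_positive = entropy([0.5, 0.5])   # a positive test is spurious half the time

# Average uncertainty, conditional upon performing the test.
H_after = p_negative * H_if_negative + p_positive * H_if_positive

print(f"before the test:        {H_before:.3f} bits/patient")
print(f"after a positive test:  {H_if_positive:.3f} bits")
print(f"average after the test: {H_after:.3f} bits/patient")
```

A positive result raises the uncertainty in that particular patient, while the average over all patients still falls.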
The information functions in a communication system are designated as
follows:
H(x) . . . . uncertainty of source
H(y) . . . . uncertainty of destination
$H_x(y)$ . . . . ambiguity
$H_y(x)$ . . . . equivocation
T(x;y) . . . . information transmitted, or communicated
Amounts of information transmitted must be referred to some unit of action.
In particular, it is customary to compute transmissions per symbol or per unit
time.
A channel which associates one and only one output with each input, and
no output with more than one input, is called a noise-free channel or transducer;
in this case,
$H(x) = H(y) = H(x,y) = T(x;y)$;
$H_x(y) = H_y(x) = 0$.
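The equalities for a noise-free channel can be confirmed on a small example. In the Python sketch below, the input probabilities and the one-to-one codebook are hypothetical choices made for illustration; any one-to-one assignment of outputs to inputs would serve equally well.

```python
from math import log2

def entropy(probs):
    """Uncertainty in bits; zero-probability categories contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

inputs = [0.5, 0.25, 0.25]   # hypothetical input probabilities p(i)
codebook = [2, 0, 1]         # hypothetical one-to-one rule: input i gives output codebook[i]

n = len(inputs)
# Joint probabilities: all of p(i) falls on the single output assigned to input i.
joint = [[inputs[i] if j == codebook[i] else 0.0 for j in range(n)]
         for i in range(n)]

H_x = entropy(sum(row) for row in joint)
H_y = entropy(sum(col) for col in zip(*joint))
H_xy = entropy(p for row in joint for p in row)
T = H_x + H_y - H_xy

print(f"H(x) = {H_x}, H(y) = {H_y}, H(x,y) = {H_xy}, T(x;y) = {T}")
print(f"conditional uncertainties: {H_xy - H_x}, {H_xy - H_y}")
```

Because every input has exactly one output and no output is shared, the two conditional uncertainties vanish and all of the source uncertainty is transmitted.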
We can think of a noise-free channel as a means by which information at
the source is represented at the destination. Physically, this involves two acts
of representation: first, states of the channel are selected so as to represent the
inputs, according to some agreed-upon code; this is called encoding. Next, the
states of the channel are translated into meaningful states at the destination ;
this is called decoding. All we have stated about representation, representability
and amounts of information could now be restated in terms of encoding and
decoding operations. In this sense, the relation which we introduced as the
'condition of representability' is also known as the Theorem of the Noise-free
Channel; and all the examples and exercises of representing information could
be re-interpreted as coding operations.
Noise — Few real channels are noise-free; in general, more than one output
can follow a particular input. For instance, the 'channel' which links a daughter's
height to her father's is far from noise-free; the following table gives the
conditional probabilities:
Table III. Data of Table II in Form of a Communication Channel

Conditional probabilities, $p_j(i)$

  j \ i     53.5   56.5   59.5   62.5   65.5   68.5   71.5    H_j(x)
   59.5       —    .10    .50    .40      —      —      —      1.36
   62.5     .01    .09    .27    .51    .11    .01      —      1.80
   65.5       —    .02    .19    .51    .24    .04      —      1.74
   68.5       —      —    .07    .39    .45    .09    .01      1.70
   71.5       —      —    .03    .21    .52    .21    .03      1.74
   74.5       —      —      —    .04    .45    .43    .09      1.55

[In the row j = 59.5 the source preserves only the three entries .10, .50 and .40; the column assigned to .10 is uncertain.]
The last column, $H_j(x)$, is the uncertainty concerning the height of the daughter
if the height of the father is known; it is not too surprising to find this uncertainty
smallest in the extreme cases, and always smaller than the unconditional
uncertainty of 1.92 bits.
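The last column of Table III can be recomputed from the conditional probabilities in each row. The Python sketch below enters the non-zero entries of each row as printed; since these are rounded to two decimals (and some rows do not sum exactly to 1), each row is renormalized first, which is an assumption of this sketch, and agreement with the printed column is only approximate.

```python
from math import log2

def entropy(probs):
    """Uncertainty in bits of a row of probabilities, renormalized to sum to 1."""
    probs = [p for p in probs if p > 0]
    total = sum(probs)
    return sum((p / total) * log2(total / p) for p in probs)

# Non-zero conditional probabilities in each row of Table III, keyed by father's height j.
table_iii = {
    59.5: [.10, .50, .40],
    62.5: [.01, .09, .27, .51, .11, .01],
    65.5: [.02, .19, .51, .24, .04],
    68.5: [.07, .39, .45, .09, .01],
    71.5: [.03, .21, .52, .21, .03],
    74.5: [.04, .45, .43, .09],
}

# Last column of Table III as printed.
printed = {59.5: 1.36, 62.5: 1.80, 65.5: 1.74, 68.5: 1.70, 71.5: 1.74, 74.5: 1.55}

computed = {j: entropy(row) for j, row in table_iii.items()}

for j in table_iii:
    print(f"j = {j}: computed {computed[j]:.2f} bits, printed {printed[j]:.2f} bits")
```

Every row stays below the unconditional uncertainty of 1.92 bits, in line with the remark above.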
The father's height 'communicates' some information about the daughter's
height; the amount communicated is 0.22 bits. It is not more than that for a
number of reasons. Some of the deficit in information about the daughter's
height is undoubtedly due to ignorance, and could be reduced by taking proper
account of various concomitant factors. Some of the uncertainty may be
irreducible, due to a truly random process — possibly the selection of the particu-
lar chromosomes which go into determining the daughter's height. In the strict
sense, the term 'noise' is reserved for the effects of random disturbances, and
not for the effects of ignorance. However, the problem of the final distinction
between uncertainty due to randomness and uncertainty due to ignorance is
an extremely delicate one; the practical information analyst will usually be
satisfied to treat any uncertainty as due to noise, which results in the greatest
reduction of certainty. This interpretation will be subject to revision in the
light of additional knowledge.
The two-part system 'father's height-daughter's height' is not a communica-
tion system, and this is one reason why so little information is transmitted.
Suppose the numbers which define the 'father's heights' categories were not
observed in a given population but could be chosen arbitrarily; for instance,
they might be input voltages applied to a system. Accordingly, the 'daughters'
heights' might be output voltages, and the table of conditional probabilities
becomes a statement of the transfer function of the system. It is obvious that
this system can be made to transmit more than 0.22 bits per symbol. For instance,
using only j = 59.5 and j = 74.5, with equal frequencies, one would transmit
about .90 bits per signal. In general: for each channel, $p_i(j)$, there exists a set
of input probabilities, p(i), which maximizes the transmission rate. The rate so
obtained is called the channel capacity.
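The figure of about .90 bits per signal can be retraced from the two extreme rows of Table III. The Python sketch below computes the transmission as the uncertainty of the output minus the average conditional uncertainty of the output for a given input. Two points are assumptions of the sketch: each row is renormalized to sum to 1, and the entry .10 in the row j = 59.5 is placed under i = 56.5 (its column is not certain, but the result is the same as long as it does not coincide with an output used by the other row).

```python
from math import log2

def entropy(probs):
    """Uncertainty in bits; zero-probability categories contribute nothing."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Two rows of Table III: input (father's height j) -> {output (daughter's height i): probability}.
channel = {
    59.5: {56.5: .10, 59.5: .50, 62.5: .40},   # column of .10 is an assumption
    74.5: {62.5: .04, 65.5: .45, 68.5: .43, 71.5: .09},
}

def transmission(channel, input_probs):
    """T = H(output) - average over inputs of the conditional output uncertainty."""
    output = {}
    conditional = 0.0
    for j, p_j in input_probs.items():
        total = sum(channel[j].values())
        row = {i: p / total for i, p in channel[j].items()}   # renormalized row
        conditional += p_j * entropy(row.values())
        for i, p in row.items():
            output[i] = output.get(i, 0.0) + p_j * p
    return entropy(output.values()) - conditional

T = transmission(channel, {59.5: 0.5, 74.5: 0.5})
print(f"transmission with two equiprobable inputs: {T:.2f} bits per signal")
```

The result falls short of the full 1 bit carried by two equiprobable inputs only because the two rows share one output, i = 62.5. Finding the input probabilities that maximize T over all inputs, that is, the channel capacity, would require a search over p(j), which this sketch does not attempt.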
Even with best utilization of the possibilities of a channel, it can do no more
than transmit all the input information, and in general it will not transmit quite
all of it. This leads to an important generalization : Manipulation of information
cannot increase its amount; it can at best preserve it, and it is likely to reduce it.
This important statement will be clarified by the discussion of an apparent
exception. Suppose A wishes to send a message to B over the channel C;
conditions being very good, B picks up not only almost perfectly the message
sent by A but acquires, in the course of doing so, a considerable amount of
information about conditions in the channel. His total information received
might be more than that contained in A's message; still, he has lost some of the
information contained in the message. In general: as a result of manipulating
information, there can be more output information than there was input
information — but the contribution of the input information to the total cannot
be more than the amount of input information.
Error Detection and Correction
A codebook states which output should be associated with any given input.
A noise-free channel fulfills these requirements perfectly. In a noisy channel
other outputs than the required ones appear; in other words, a noisy channel
produces errors. Errors lead to loss of information, and a reduction in the
rate of transmission; in a noisy channel,
$T(x;y) < H(x)$; $H_y(x) > 0$.
This loss is unavoidable. However, it is at least possible to spot and correct
the errors which have occurred. It is one of the main endeavours of information
theory to devise methods to do this efficiently.
An error in a message can never be found unless the message contains some
extra information which can be used for this purpose. For instance, if the
message consists of a string of four digits chosen without any constraint:
5 3 8 7,
one has absolutely no possibility of knowing whether or not it contains any
errors. If it has been agreed upon that the message will be repeated, then one
can detect errors :
5 4 8 7
5 3 7 7,
and if the message is repeated several times, these errors can be detected and
corrected, with arbitrary certainty if the number of replications can be made
sufficiently large:
5 3 8 7
5 3 7 7
5 3 8 7
5 4 8 7
5 3 8 1.
In the second case, the possibility of error detection was bought at the price
of making two digits do the work of one; the message is said to be 50 per cent
redundant. In the last case, the price of error correction is the use of five digits
to transmit a single one, or a redundancy of 80 per cent.
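Detection and correction by replication can be sketched in a few lines of Python. The received copies are those shown above; the decoding rule used here, a majority vote on each digit position, is one simple way of exploiting the replication and is an assumption of this sketch, as are the function names.

```python
from collections import Counter

# The five received copies of the message, as shown above.
copies = [
    [5, 3, 8, 7],
    [5, 3, 7, 7],
    [5, 3, 8, 7],
    [5, 4, 8, 7],
    [5, 3, 8, 1],
]

def differing_positions(a, b):
    """Error detection with two copies: positions at which the copies disagree."""
    return [k for k, (da, db) in enumerate(zip(a, b)) if da != db]

def majority_decode(copies):
    """Error correction with several copies: take the most frequent digit in each position."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*copies)]

def redundancy(n_copies):
    """Fraction of transmitted digits that carry no new information."""
    return 1 - 1 / n_copies

print("two copies disagree at positions:", differing_positions([5, 4, 8, 7], [5, 3, 7, 7]))
print("majority decoding of five copies:", majority_decode(copies))
print("redundancy with 2 and 5 copies:", redundancy(2), redundancy(5))
```

With two copies one learns only where the copies disagree, not which digit is right; the majority vote over five copies recovers 5 3 8 7, provided that fewer than half the copies are wrong in any one position.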
Introducing redundant information in the form of a simple replication is
straight-forward and effective, but not very economical. Error detection could
be achieved more efficiently by simply adding the sum of the digits to the message:
5 3 8 7 2 3.
Here, the redundant information is only one-third of the total. In fact, giving
only the last digit of the sum as 'signature' is almost as effective, and requires
only 1 digit in 5, or 20 per cent redundant information. The signature check
illustrates a general principle: a given amount of redundant information in a
message can be used for error checking the more effectively the more evenly it
is related to all parts of the message.
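The one-digit signature can be sketched as follows in Python. The rule used is the one described above (the last digit of the sum of the message digits); the function names and the corrupted examples are illustrative.

```python
def add_signature(digits):
    """Append the last digit of the digit sum as a one-digit signature."""
    return digits + [sum(digits) % 10]

def is_consistent(received):
    """Check whether the final digit matches the signature of the preceding digits."""
    *body, signature = received
    return sum(body) % 10 == signature

message = [5, 3, 8, 7]
sent = add_signature(message)          # 5 + 3 + 8 + 7 = 23, so the signature is 3

single_error = [5, 4, 8, 7, 3]         # one digit altered in transit
double_error = [5, 4, 7, 7, 3]         # two alterations which happen to cancel

print("sent:", sent)
print("intact message passes:", is_consistent(sent))
print("single error passes:  ", is_consistent(single_error))
print("double error passes:  ", is_consistent(double_error))
```

Any single altered digit changes the sum by an amount between 1 and 9 and is therefore detected; errors which cancel in the sum escape, which is why the one-digit signature is only almost as effective as the full sum.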
It is always possible to achieve reliability, in the presence of noise, by the
use of redundant information; in fact, one can approach perfect reliability
arbitrarily closely if one is willing to provide enough redundant information.
The amount of redundant information needed, for a given noise level and a
given desired reliability, will depend on the efficiency of coding. The ideal
relation between noise level and redundant information needed is formulated
in Shannon's fundamental Theorem of the Noisy Channel. This theorem can be
stated as follows: if a certain amount of information is to be transmitted with
perfect reliability in the presence of noise, then it is necessary to provide at
least as much redundant information as the amount of equivocation introduced
by the noise ; furthermore, this amount will be sufficient if the coding is maximally
efficient.
There exist several proofs of this theorem; none of them is easy to follow,
and all are existence proofs — that is, they prove that an error-checking code
exists which will fulfill the requirements, but they do not say how to construct
it. In fact, perfectly efficient error-checking codes seem to be realizable only in
a few special cases; however, close approximations to ideal efficiency are easily
obtained if it is permissible to use message blocks of great length (12).
The economics of error-checking are dominated by three factors:
(I) the frequency and costliness of errors
(II) the cost of adding redundant information
(III) the availability and costliness of checking procedure (encoding and
decoding).
The work of Shannon and his followers has dealt with one particular situation :
encoding and decoding procedures are supposed to be reliable and gratis, the
error frequency is to be reduced to almost zero, and redundant information is
supposed to be used as sparingly as possible. As long as the theory is not
completed even for this case, one cannot expect to develop a more general theory.
Some qualitative notions of what it will entail can be gathered from a considera-
tion of a much-used, and presumably well developed communication system,
namely, printed language. Symbols are gathered into various checking units
(words, sentences, paragraphs, chapters) ; on each level, there operate constraints
which will help to locate and correct errors. For instance, this sentence will be
read corretly even though one letter has been onitted and one word misspelled.
It 3eems that the redundancy per letter, in a coherent English text, is about 60
per cent. Paragraphs are constructed in such a way that the sense can be
grasped even if whole words or even sentences are missing or perturbed, and
the essence of a whole chapter is, in general, understandable even if a whole
paragraph should be left out.
Actual Communications System
So far we have dealt with two-part systems in a purely abstract way. 'Sources'
and 'destinations' are defined simply by the states which they can assume.
'Channels' are tables of conditional probabilities; in the simplest case, the
channel is a kind of telephone book which associates every input to some
particular output. If the association is not unequivocal, then the channel is
said to be noisy. 'Noise' is defined as a random perturbation of the input-
output link. Those are nice, clean concepts, not to be confused with realities.
The 'channel' exists on paper only, and is not the same as the mechanism which
links two parts of a system. The informational relation between heights of
fathers and daughters does not reveal the nature of the mechanisms involved;
whether fathers affect their daughters' heights by means of their genes, or of
the food they provide, or of the mother they select for them, cannot be decided
on grounds of informational relations. Indeed, I believe that Buddhist tradition
would explain the correlation on the grounds that daughters select their fathers;
as far as information theory is concerned, this is perfectly acceptable.
The scheme shown in Fig. 5 is a somewhat closer approximation to reality:
[Figure 5: block diagram. SOURCE, message, ENCODER-TRANSMITTER, signals, CHANNEL (acted upon by NOISE), signals, RECEIVER-DECODER, message, DESTINATION.]
Fig. 5. A diagrammatic representation of a communication system
It is customary to treat all links but the channel as noise-free. If need be, one
can introduce noise into the other links of the model by some straight-forward
adaptations.
If signals and channels are physical entities, then it is relevant to investigate
their physical capacity of carrying information. Suppose the nature of a unit
of action and the physical constraints are such that the channel can assume any
one of m states during one unit of action; then, these states can be made to
represent log2 m bits of information. It is the function of the encoder-transmitter
system to match the diversity of messages generated by the source to
the diversity of states which can be assumed by the channel; those, in turn, are
matched to the diversity of messages intelligible at the destination by the
receiver-decoder system.
As long as the demands on the channel are light, the matching process is
not much of a problem. However, it may become very difficult if the channel
is to be driven at capacity, and if the various states of the channel are not of
equal value; some may be more subject to noise effects than others, some may
need more time than others, some may necessitate more effort than others.
In general, one will tend to favor the safest, shortest, and easiest states. However,
this must not go too far; if one goes to the extreme of using the very 'best'
state, then the channel does not transmit any information at all. To find
optimum compromises between informational needs of source and destination
and physical capacities of the channel, between amount of information used to
carry messages and amount of information needed for noise reduction, is one
of the fundamental problems of the theory of information and communication.
EXERCISES
13. The following table gives the number of times the four possible combinations of two
flower colors with two pollen shapes were found:

                     Flower color
Pollen shape       Purple      Red
Long                 296        27
Round                 19        85

Is there information transmission between these two characters?
14. Define the following functions, and derive their values (in terms of H-functions):
T(x, y; z)
T(x; y, z)
T(x; y; z)
15. A diagnostic test gives the following results:
true negatives . . 85%
false negatives . . 5%
true positives . . 3%
false positives . . 7%
What is the informational value of the test?
What is the maximum informational value that any test could give in this situation?
16. A teletype machine sends 2.3 groups of five binary symbols per second. What is the
maximum possible rate of information transmission ?
17. Same machine as in Exercise (16). All code groups are equiprobable. Error probabili-
ties are as follows: symbols nos. 1 and 4 are always received correctly, nos. 2 and 3 are wrong
11 per cent of the time, no. 5 is wrong 1 per cent of the time. All errors are equiprobable.
Compute equivocation and amount of information transmitted.
18. You are to send 2-bit messages through a channel which has the property that one in
five binary symbols is bound to be in error. Construct four sequences of five binary messages
which will allow the reconstruction of the original message. What is the efficiency of the code?
V. ORGANIZATION
Systems, Structures, Pattern
A system is an organized whole made up of interrelated parts. Organization
is based upon the interrelations between parts. The parts may be strongly or
weakly coupled; their effect on each other may be quantitative or qualitative.
If two parts are coupled in any fashion, then knowledge of the state of one must
imply some information about the state of the other. Accordingly, any interrelation
can be technically represented as a channel. So, two components of a
system can be symbolically represented by a simple communication network of
two parts, referred to as two nodes and one channel:
[Figure 6: two nodes joined by one channel.]
Fig. 6. A simple communication network
Let H(x) be the amount of information needed to know what state x is in.
If y is known, some of this information becomes unnecessary, or redundant.
This amount, T{x;y), is an index of the degree of coherence, constraint, integra-
tion, or organization which prevails in the system.
Consider the pair of words 'green valley'. These two words form a small
system — a whole made up of interrelated parts. The whole has a meaning
which neither part alone has. The price for this feature is elimination of many
other possible connotations of 'green' and 'valley'. As a result, the information
content of the word combination is smaller than the combined information
contents of the two words. The difference must show up as redundant informa-
tion. The presence of redundancy implies that each word contains some
information about the other. This is best demonstrated by successful error
checking. The errors 'preen' for 'green', and 'volley' for 'valley' would not be
found in isolated words, but can be spotted in the pair.
System Analysis — There seem to be three general viewpoints under which
relations within a system are assessed: (a) the amount of information trans-
mitted — on the technical, semantic and pragmatic level ; (b) the degree of control
or cause-effect relations, dominance; and (c) the utility, or value, of the relation
to one or both of the related parts. Information theory deals only with the
first viewpoint. It does not concern cause-effect relations, or what causes the
information to flow, and it is not concerned either with the utility of the flow of
information.
Informational analysis of a system will be of interest if and only if the
informational challenge is serious, that is, when a system has to process informa-
tion at a rate which crowds its capabilities. The informational challenge is
the result of:
(1) The diversity which is characteristic of the tasks; this can be expressed
as H/task. A system which is faced with the same task all the time or most of
the time may be working very hard but the difficulty is not an informational
one.
(2) The precision which is required; this can be expressed as the ratio T/H.
That is, the diversity of tasks is informationally challenging only insofar as
it is expressed in a diversity of responses. A system with a small response
repertoire may be working very hard, but not in the informational domain.
(3) The time which is allotted for the fulfillment of each task. A system
with very modest informational equipment can solve many tasks if given ample
time. For instance, the extremely simple logical machine devised by Turing (13)
will solve any solvable problem if given very much time.
The time rate of informational challenge of the system is the product
$\frac{H}{\text{task}} \times \frac{T}{H} \times \frac{\text{tasks}}{\text{unit time}} = \frac{T}{\text{unit time}}$.
The informational output of the system will be measured in H-measures
but the effective output, or informational performance, in terms of T-measures,
as T per task or T per unit time. The limits of the informational performance
of a system can be found by systematically varying the informational challenge
and observing the resulting performance. In such studies it is important to
make sure that the system's performance is limited informationally, and not
by difficulties of sensing inputs or generating outputs.
It is possible to vary the informational challenge in a number of modes;
e.g. one can vary the number of sources of information, or the amount of
information per source. Challenging in various modes reveals whether or not
there exist several modes of limitation. It seems that the informational performance
which a system can produce in single tasks may be limited by the following
factors, singly or in conjunction:
(1) the amount of information which can be processed effectively in a
single task,
(2) the number of independent information-carrying components which
can be involved in a single act of information-processing,
(3) the informational contribution from each independent component,
(4) all information-carrying components must be assembled within a
certain length of time ;
(5) in addition, there seem to be two general limitations on time rates:
there is a minimum time for each act of information processing, and
(6) the over-all rate of information-processing is limited (only this last
limitation has the character of a channel capacity).
This list of limitations is based on psychological experiments (14) but is believed
to apply to all types of systems.
Multi-part Systems — The informational system analysis is not restricted
to two-part systems. A system of three components can be represented as a
three-node network with a connecting channel:
Fig. 7. A simple three-node network
Again, it is merely a matter of convenience which node, or set of nodes, one
treats as the input, or independent variate.
The treatment can be extended to any number of components. Thus, a
nine-node network is equivalent to one man receiving information from eight
sources, or feeding information into eight sinks; or, to four men watching
two sources, communicating with each other, and feeding information into
three sinks; to a sentence of nine words; to a decision based upon eight factors.
The more parts there are to a system, the more difficult becomes the infor-
mational analysis (15, 16). This is territory that has been but recently opened,
and we are still largely concerned with the formulation and highly tentative
application of concepts. It will be helpful to consider a parallel effort, namely,
the study of organization by game theory (17). One result of this study is that
each time a new player is added, the organization (the 'game') acquires a new
qualitative feature. One-person games deal with problems of maximum;
the addition of a second person introduces competition; of a third person,
coalition; of a fourth person, an asymmetric role of one player in relation to
the group of the other three. Von Neumann (17) points out that it is at this
junction that the most remarkable problems begin to appear; also at this junction,
there occurs a change from a rigorous and complete exposition to a heuristic
and incomplete one.
The situation is similar in the study of organization by information theory.
Each time a new part is added to a system, a qualitatively new information
function appears. As long as one deals with a single variable, the problem
is one of efficient use of existing variations. A two-part system introduces
relations between parts; a three-part system, relations between relations; a
four-part system, relations between a part and a complex of relations.
Unitization — It is an empirical fact that when a system is complex enough
to require very many components, the phenomenon of unitization occurs.
That is, some components get organized in such a way that they interact strongly
among each other, and act as a unit with respect to the remainder of the system
and the external world. Unitization seems to be a necessary evil; it might be
an important key for the study of complex organization and complex mental
activities. The phenomenon has never been really explained; it is possible
that a quantitative treatment will be made possible through the use of infor-
mation theory (18).
Unitization is always coupled with the phenomenon of limited span. Any
real part has a limited information content. In any single act of communication,
the capacity for non-redundant transmission of a part is limited by its own
information content. This amount must somehow be partitioned into interaction
with the external world, and interaction with the other members of the
unit. If each of these interactions is to be of significant size, then only a limited
number is possible. The interaction of a unit with the outside may be only
a fraction of the information traffic within the unit. Hence, several units can
be organized into a secondary structure of greater versatility, and this process
can be repeated on successive levels of organization.
There appears, thus, a possibility that information theory can be helpful
in formulating both the causes and the effects of unitization, and in establishing
rational interpretations of the size of the units. This would be a very important
contribution to any theory of organization.
Conclusion — We have proceeded from simple processes of representation
to discussions of communication and, finally, organization. An attempt was made
to treat in a heuristic and perspicuous manner the basic principles of Information
Theory: there exists a generalized concept of 'information' which includes
communication and organization and is so general that every real event or
structure has its informational aspects; this general concept is related to a
measurable quantity; the operation of taking a measurement of this quantity
is done by means of symbolization in a standard language. The functions
as defined obey two fundamental theorems: the Representation Theorem,
and the Theorem of the Noisy Channel. Both theorems impose a limit on
the amount of information which can be effectively processed in a given
situation; both also state that it is possible to reach this limit.
APPENDIX I
THE EVALUATION OF INFORMATION CONTENT
The examples and exercises should have familiarized the reader with the
techniques of taking information measurements. However, the investigator
who wishes to use this knowledge in his field is bound to run into some diffi-
culties. A typical difficulty is that a natural situation does not present itself
neatly classified with a complete set of categories and probability measures.
It often takes considerable ingenuity to supplement the missing components
of the picture. Wherever ingenuity must be used, the result will not be unequi-
vocal. Hence it becomes important to estimate not individual information
measures but rather whole ranges compatible with reasonable assumptions.
The Relativity of Information Measures
'Information content' is a measurable quantity, just as length; and, just
as length, it is a function and not a property of a particular set of events. The
theory of relativity asserts that the measured length of an object depends on
certain relations between the object and the measuring system. However,
under everyday conditions these relations will not produce any significant
effect and, most of the time, lengths behave as if they were properties of objects.
The information content of an event depends on the manner in which this
event is related to the frame of reference of the evaluating system. Unlike
with length, these relations are not fixed under everyday conditions. Therefore,
information content behaves only rarely as if it were a property of an event.
The amount of information, H(x), associated with an event, x, is defined
as the negative of the expectation of the logarithm of the probability that x
will fall into some category, i. Thus, the measure of information depends on three decisions:
(1) the choice of a unit event,
(2) the establishment of categories,
(3) the selection of a set of probability measures.
In general, each of these decisions involves a degree of arbitrariness. Accor-
dingly, a considerable range of information measures will be compatible with
a given real situation.
The question of an appropriate selection of a unit event cannot be solved
by mechanical application of hard and fast rules. There is a lower limit to
the size of elements, imposed by limits of observability. In general, selection
of these lower limits will force one to take cognizance of a tremendous amount
of detail, most of which is bound to be irrelevant. Thus, one will try to select
a unit event broad enough that all irrelevant details are submerged in its internal
structure, yet narrow enough so that no relevant relations get lost within the
unit event. In practice, one has to make a guess, subject to revision by later
experience. This difficulty occurs with all kinds of analyses, and is not specific
to informational analysis.
The situation is quite similar with respect to categories. There, too, exists
a bound, imposed by the capabilities of discrimination. In general a large
number of discriminations can be made which are irrelevant to the problem
at hand. For instance, if one deals with the semantic content of a printed
message, it will be quite irrelevant to categorize by shapes of letters, quality
of paper, type of printing ink, etc. The decision is not always so easy. For
instance, in categorizing the atoms found in living matter it will, by and large,
not be necessary to distinguish between isotopes; in the overwhelming majority
of occasions, differences between isotopes will have no effect. Occasionally, of
course, a particular isotope located in a sensitive spot and decaying at a critical
moment can have very large effects. In a case like this, the selection of a set of
categories becomes a matter of compromise.
The probabilities, finally, are never actually known. We have to estimate
them, on more or less sound bases. In many situations where generalized
information theory is used, the bases for estimating probabilities are rather
uncertain. Therefore, it becomes important to assess the dependence of
information functions on fluctuations of probabilities.
The contingent nature of information measures has not always been obvious.
All early applications of information theory dealt with telecommunication
systems. In all of these, all informational characteristics are perfectly well
defined. In Morse code, all we have to know is whether a particular information-
carrying element is a blackness or a whiteness, and whether it is long or short.
In pulse code modulation, the only thing that counts is presence or absence
of a pulse within a stated interval of time. In pulse amplitude modulation,
all information is vested into the amplitude of pulses. In all these cases, there
is no question about the informational characteristics of the process under
consideration.
The situation is radically different in the larger domain of applied infor-
mation theory. For instance, take the case of two people transmitting information
to each other by talking. The information-carrying element is a clause; to
simplify our analysis, let us consider just words (remembering that the infor-
mation content of a clause cannot be greater than that of its constituent words).
Now, each person culls his words from a reservoir which is known to be large,
but its actual size is not exactly known. The information content of a single
word depends on the probability of its use, and these probabilities are not
exactly known either. Furthermore, they will hardly be the same for both
persons involved in a conversation. Also, each word can have several meanings,
one of which may be more or less determined by the context. The relations
between words, meanings, and context, again, are not the same for any two
people. This is not all. Information is conveyed not only by the choice of
words but also by inflection of voice, loudness, timing, and accompanying
gestures. In such a situation we have obviously no hope ever to obtain a
precise, unequivocal, and incontestable measure of information content.
We are, thus, confronted with two alternatives. These are: not to use infor-
mation theory, or to try to devise ways of producing usable approximate
estimates. Obviously, our choice is the latter alternative (19).
Approximation Methods
It appears that the approximation methods to estimate information functions
are based on the following rules:
1. Averaging increases uncertainty;
2. Pooling decreases uncertainty;
3. Disregarding constraints increases uncertainty;
4. Rare events have small effects on uncertainty measures;
5. Small variations in probability have small effects on uncertainty measures;
6. In systems, information functions can be estimated in different ways,
and care should be taken to select the most appropriate one;
7. If it is not possible to measure the actual information functions desired,
then one can try to substitute closely related measurable quantities.
In the following paragraphs, these rules will be amplified and illustrated.
1. Averaging Increases Uncertainty — The fact was demonstrated in Section
III. It suggests a simple bracketing procedure: obtain a lower and upper
bound of uncertainty by using probabilities which are certainly more and
less unbalanced than they actually are. In particular, if the number of categories
is known but their respective probabilities are not, then one can follow Laplace's
procedure and set all probabilities equal, which maximizes uncertainty.
2. Pooling Decreases Uncertainty — This, too, has been proven in the third
section. It is equally of value in bracketing procedures: using only categories
actually discriminated puts a lower bound on uncertainty; assuming more
categories than could be of interest establishes an upper bound.
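The bracketing suggested by rules 1 and 2 can be checked numerically. The following is a minimal sketch; the four-category distribution is hypothetical and chosen only for illustration, and `entropy` is a helper name introduced here, not one used in the text.

```python
import math

def entropy(probs):
    """Uncertainty H in bits for a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distribution over four categories (an assumption, not data).
observed = [0.50, 0.25, 0.15, 0.10]
h = entropy(observed)

# Rule 1: averaging the probabilities (equiprobability) gives an upper bound.
upper = entropy([1 / len(observed)] * len(observed))

# Rule 2: pooling the last two categories into one gives a lower bound.
pooled = [0.50, 0.25, 0.15 + 0.10]
lower = entropy(pooled)

print(f"lower {lower:.3f} <= H {h:.3f} <= upper {upper:.3f}")
```

Any other distribution may be substituted for `observed`; the ordering of the three values should be unaffected.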
3. Disregarding Constraints Increases Uncertainty — Let x and y be different
events, where y may differ from x only in time or place of occurrence or in
any other respects. If H(x) is the uncertainty of x, and H_y(x) the uncertainty
of x if y is known, then:
$$H_y(x) \le H(x).$$
That is, knowing some other event, y, cannot increase the average uncertainty
concerning x; it will leave it unchanged if there is no association between x
and y; it will reduce it if constraints exist which are manifested in a statistical
association between x and y.
Rule 3 can be used for a bracketing procedure. Disregarding constraints
yields an overestimate of H(x) ; introducing constraints known to be too strong,
an underestimate.
Constraints have to be very marked to cause large changes in H(x). For
instance, the large inequalities of letter frequency in English texts reduce H
from a possible maximum of 4.7 bits per letter to 4.1 bits; the strong constraints
between successive letters and words result in an additional reduction to
1.5-2.0 bits per letter.
Formally, rule 3 is a special case of rule 1.
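Rule 3 can be illustrated with a small joint distribution. This is a sketch under stated assumptions: the joint probabilities are hypothetical, and the uncertainty of x with y known is computed as H(x, y) - H(y).

```python
import math

def entropy(probs):
    """Uncertainty H in bits for a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint probabilities p(x, y); rows are x, columns are y.
joint = [[0.30, 0.10],
         [0.10, 0.50]]

p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]

h_x = entropy(p_x)
h_xy = entropy(p for row in joint for p in row)
h_x_given_y = h_xy - entropy(p_y)      # uncertainty of x when y is known

# Disregarding the constraint means treating x and y as unassociated;
# the conditional uncertainty then rises back to H(x).
unassociated = [px * py for px in p_x for py in p_y]
h_x_given_y_free = entropy(unassociated) - entropy(p_y)

print(f"H(x) = {h_x:.3f}, with y known = {h_x_given_y:.3f}")
print(f"maximum for 26 equiprobable letters = {math.log2(26):.1f} bits")
```

The last line reproduces the 4.7 bits per letter quoted above as the maximum for a 26-letter alphabet.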
4. Small Effects of Rare Events — The information functional is a sum of
terms of the form (-p log p). This function rises steeply between zero and .10,
hence, small probabilities contribute little to the total sum. For instance,
ten equiprobable alternatives correspond to an H of 3.32. If one of these
alternatives is replaced by ten separate sub-categories, each of probability
.01, then the resulting H is 3.65. If instead of ten, one introduces 100 equiprobable
sub-categories, each with probability .001, the resulting H is 3.99,
or equivalent to sixteen equiprobable categories.
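The three figures just quoted can be recomputed directly. The sketch below uses only the probabilities stated in the paragraph; the helper name `entropy` is introduced here for convenience.

```python
import math

def entropy(probs):
    """Uncertainty H in bits for a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

ten_equal = [0.1] * 10
# One alternative replaced by ten sub-categories of probability .01 each.
split_into_10 = [0.1] * 9 + [0.01] * 10
# One alternative replaced by 100 sub-categories of probability .001 each.
split_into_100 = [0.1] * 9 + [0.001] * 100

for label, dist in [("ten equiprobable", ten_equal),
                    ("one split into 10", split_into_10),
                    ("one split into 100", split_into_100)]:
    h = entropy(dist)
    # 2**h is the number of equiprobable categories giving the same H.
    print(f"{label}: H = {h:.2f} bits, equivalent categories = {2 ** h:.1f}")
```

Going from 10 to 109 categories thus raises H by only about two-thirds of a bit.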
A good example turned up in a study by A. A. Blank. He calculated the
information content of single English words. For particular reasons, the sample
was restricted to four letter words. Thorndike's list contains 1550 such words.
H, based on the observed frequency of these words, is 8.13 bits per word.
Of these words, 119 occur with the greatest frequencies. Computing H on
the basis of these words alone gives a value of 6.34 bits per word. Thus, taking
into consideration only about one tenth of all categories already yields about
four-fifths of the final information function.
This means that information functions can be estimated successfully as
soon as the more common occurrences are categorized. The remaining
infrequent occurrences will not contribute very much, and that contribution
can be easily bracketed between values based on numbers of categories which
are certainly too small and too large.
5. Small Effects of Small Variations in Probability — The curve of the
function F(p) = -p log p has a flat top. Small changes in probability in
this region have small effects.
Consider the simplest case, of two categories. If their probabilities are
equal, then H = 1. If the ratio of the probabilities is 1:2, then H = .92. If
the ratio is 1 :3, a very considerable deviation from equality, H is still .81.
For a larger number of categories, the insensitivity of H against probability
distortion is still more pronounced. If one replaces equiprobable alternatives
by probabilities staggered arithmetically or geometrically, stipulating only
that the span between the extreme value should be not more than one order
of magnitude, then the resulting changes in H are quite small.
This implies that the assumption of equiprobability, which gives an upper
bound as stated in rule 1, will not go very far from the true value unless proba-
bilities are radically unbalanced. The stretch bracketed between an upper bound
based on equiprobability, and a lower bound based on a distortion undoubtedly
stronger than the real one, will not be very large.
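The insensitivity of H to moderate distortions of probability can be seen in a short calculation. The two-category ratios are those given above; the ten-category example, with weights staggered arithmetically from 1 to 10, is a hypothetical case constructed here to span one order of magnitude.

```python
import math

def entropy(probs):
    """Uncertainty H in bits for a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def from_weights(weights):
    """Turn relative weights into probabilities that sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

h_equal = entropy(from_weights([1, 1]))
h_1_to_2 = entropy(from_weights([1, 2]))
h_1_to_3 = entropy(from_weights([1, 3]))
print(f"1:1 -> {h_equal:.2f}, 1:2 -> {h_1_to_2:.2f}, 1:3 -> {h_1_to_3:.2f}")

# Hypothetical case: ten categories with weights 1, 2, ..., 10.
h_staggered = entropy(from_weights(range(1, 11)))
h_max = math.log2(10)
print(f"staggered: {h_staggered:.2f} bits, equiprobable: {h_max:.2f} bits")
```

Even with a tenfold spread between the rarest and the commonest category, the staggered value stays within a few tenths of a bit of the equiprobable maximum.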
6. Alternative Ways of Estimating Information Functions — In systems
with several nodes, the compound information functions can always be estimated
in several ways. For instance, in a two-node communication system,
the quantity which is the function of greatest interest, the amount of information
transmitted, T(x;y), can be computed in three alternative ways: as the difference
between input uncertainty and equivocation, as the difference between output
uncertainty and ambiguity, or as difference between the sum of uncertainties
of input and output and the uncertainty of their union. It usually is worthwhile
to inspect the data very carefully to establish which of the set of functions can
be most easily and most accurately computed. In many cases, the quantities
most readily computed are not those which result directly from the plan of obser-
vation or experimentation. For instance, in most experiments it would be
natural to measure output uncertainty and ambiguity, but it is easier to measure
input uncertainty and equivocation.
7. Substitution of Related Quantities — In many cases where it is not practical
to compute the proper information measures, one can compute information
measures associated with related quantities. Take the case of estimating the
amount of information which an individual can transmit after a single glance
at a display. This quantity is very difficult to determine; but, it is fairly easy
to determine the amount of information which can be elicited from an individual
by a short interrogation procedure after he has had a glance at the display.
This function is not quite the one we want, but presumably closely related to
it. Another example: in the case of mental arithmetic, we have no way of
estimating the actual amount of information processed, but we can readily
estimate the amount of information which must be processed if computations
are done in the way in which the subject claims he computes. In cases of this
kind one will use the measurable quantity instead of the desired one. Of
course, results so obtained have to be used with a certain amount of restraint.
Example: Rate of Information Transmission in Conversation — The working
of the approximation methods can be shown by two examples. The first
example is that which we used to illustrate the need for approximation methods;
namely, that of estimating the amount of information in conversation.
We consider first the information carried in words. To establish an upper
bound, we ask how much information must be transmitted so that the receiver
can recognize every single word spoken.
This upper bound, in bits per second, is the product of the rate of words
per second times bits per word. A rate of 2.1 words per second is typical for
lively discussions. The number of bits per word in English context has been
estimated as 6.5 bits (±25 per cent). This yields 11 to 17 bits per second.
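The arithmetic behind this upper bound is a simple product. The sketch below uses only the figures quoted above (2.1 words per second, 6.5 bits per word, plus or minus 25 per cent); the variable names are introduced here.

```python
words_per_second = 2.1
bits_per_word = 6.5
tolerance = 0.25          # the quoted +/- 25 per cent on bits per word

central = words_per_second * bits_per_word
low = central * (1 - tolerance)
high = central * (1 + tolerance)

print(f"{low:.1f} to {high:.1f} bits per second (central value {central:.2f})")
```

The upper product agrees with the 17 bits per second given in the text; the lower product comes out near 10.2, a little under the rounded figure of 11.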
Words are not the only method of communication between two persons
conversing face to face. It can be shown, however, that all other means of
communication add little to the total transmission rate.
We will now try to establish a lower bound. Of course, no general lower
bound exists; it is easy to find examples where information is transmitted at
the rate of 1 millibit per second, or less. What we want is an 'upper lower
bound', that is, a lower bound of the amount of information transmitted between
people who try to communicate at some speed, and under reasonably favorable
conditions. Such a bound is obtained by analysis of pragmatic communication.
We look at situations where the verbal messages elicit or control actions.
We make an informational analysis of the relations between actions and verbal
messages. This will yield an amount of information demonstrably transmitted,
and it certainly represents a lower bound to the amount of information com-
municated.
At this time, we have a single case where pragmatic communication has
been evaluated accurately in informational terms. Felton, Fritz and Grier (20)
measured the amount of pragmatic communication between an airplane pilot
coming in for a landing and the control tower operator. They found an average
rate of 2 bits per second, computed in terms of actual effects of the messages.
Both pilot and control tower operator have all interest to communicate as
fast as they can. On the other hand, they do so in the presence of a very high
level of noise which reduces verbal communication to probably about one
third of its optimum rate.
We conclude, thus, that information transmitted through verbal communi-
cation is certainly not less than 2 bits per second nor more than 17 bits per
second, and very likely within the range between 6 and 12 bits per second.
This estimate is rough but not at all unrealistic.
Example: Information Content per Printed Letter — A very elegant way
of computing an information measure under unfavorable conditions was
used by Shannon in his analysis of the 'entropy' of printed English (21). The
information content of a single letter is easily determined as a function of
relative letter frequencies. However, constraints between neighboring letters
lead to a reduction of information content, and in order to estimate this
reduction exactly one would have to investigate the probability distributions
for long sequences of letters. This is manifestly impossible. Shannon, therefore,
proceeded to estimate a related quantity; namely, the amount of information
concerning language constraints which can be elicited from a person familiar
with printed English by a carefully planned interrogation. The subject is given
a text which is truncated at some point; he is asked to guess the next letter.
If he is successful, then he is told to go on; if not, he is told to try again. Records
are taken of the number of times a letter is correctly identified at the first,
second, third, . . . statement. In this setup, the experimenter acts as source
of auxiliary information, emitting sequences of the type 'wrong . . . wrong
right', with an 'alphabet' of twenty-six different sequences (if repetitions are
excluded, the letter must be identified after no more than twenty-five wrong
guesses). The informational output of the auxiliary source depends on the
relative probabilities of the various sequences. These probabilities are very
unequally distributed. In a large percentage of the cases, the first statement
is correct; the most frequent message from the auxiliary source is 'right'.
The next highest probability is for the sequence 'wrong-right'. Messages
with up to three 'wrongs' make up the vast majority of cases; the remaining
categories, with from 4 to 25 'wrongs', have low probabilities. As was pointed
out before, they contribute little to the estimated value of H. This means that
we arrive at an estimate of the information furnished by the auxiliary source
essentially as a function of two to four probabilities.
The amount of information per single letter is known to be about 4.1 bits
(on the basis of relative frequency of letters in English texts). This is the amount
of information per letter which the subject needs to reconstruct the whole
text. Of this amount of information, a certain measurable fraction is furnished
by the auxiliary source. The remainder must come out of the subject's head,
and is based on his knowledge of language constraints. The amount of infor-
mation so elicited will not be quite as high as the information content of
language constraints, but it is a closely related quantity. By the ingenious
trick of effectively reducing the size of the alphabet, this quantity has been
made easily measurable.
APPENDIX II
ANSWERS TO EXERCISES
1. One light — peace and quiet
two lights, vertically — enemy approaches by land
two lights, horizontally — enemy approaches by sea
two lights, diagonally — enemy approaches by land and sea
(This is not the only possible solution)
2. (a) 0, 1, 10, 11, 100, 101, 110, 111, 10000, 1001, 1010, 1100, 10000, 11110100011
(b) 9, 11, 147, 32
(c) .125, .6703125
3. (a) 10110100010000
(b) EDCBA
4. 'Construct a confusion-free code using five binary digits for each letter and compare
the performance of this code with that of the above by encoding and decoding a message like
this one'.
Use part of the 32 code words made up of 5 binary digits, such as: 11111, 11110, 11101,
11100, etc. The message will be, on average, 21 per cent longer than with the most efficient
code (5 is 121 per cent of 4.14), but it is much easier to decode. Some of the unused code
words can be used for punctuation, etc. The teletype works on this principle.
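5. Limiting value:
-(.8 log2 .8 + .15 log2 .15 + .05 log2 .05) = .883
Single-event code:
Event    Prob.    Code    Digits    Prob. x digits
A        .80      1       1         .80
B        .15      01      2         .30
C        .05      00      2         .10
                                    1.20
Average length is 1.20 digits per event; excess is (1.20 - 0.883)/0.883 = 36 per cent.
Two-event code:
Event pair    Prob.     Digits    Prob. x digits
AA            .64       1         .64
AB            .12       3         .36
BA            .12       3         .36
AC            .04       4         .16
CA            .04       4         .16
BB            .0225     4         .09
BC            .0075     5         .0375
CB            .0075     6         .045
CC            .0025     6         .015
              1.0000              1.8675
Average length is 1.8675/2 = .934 digits per event;
excess = (0.934 - 0.883)/0.883 = 5.8 per cent < 10 per cent.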
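The average code lengths in answer 5 can be reproduced by constructing optimal binary codes for single events and for pairs. The sketch below uses Huffman's procedure, which is an assumption made here: the answer does not say how its codes were built. The expected length is obtained as the sum of the weights of all merged nodes, so the code words themselves are never needed.

```python
import heapq
import itertools
import math

def entropy(probs):
    """Uncertainty H in bits for a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_expected_length(probs):
    """Expected length, in binary digits, of a Huffman code for probs.

    Each merge of the two least probable nodes adds one digit to every
    code word beneath it, so the expected length is the sum of the
    merged weights.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

single = {"A": 0.80, "B": 0.15, "C": 0.05}
# Successive events are taken to be independent, as in the answer's table.
pairs = {a + b: single[a] * single[b]
         for a, b in itertools.product(single, repeat=2)}

limit = entropy(single.values())
per_event_single = huffman_expected_length(single.values())
per_event_pair = huffman_expected_length(pairs.values()) / 2

for label, length in [("single-event", per_event_single),
                      ("two-event", per_event_pair)]:
    excess = 100 * (length - limit) / limit
    print(f"{label}: {length:.4f} digits per event, excess {excess:.1f} per cent")
```

Coding longer blocks of events would bring the average still closer to the limiting value, at the price of a larger code book.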
6. Let x designate amino acids, and y nucleotides.
H(x) = log2 20 = 4.322        S_x = 1
H(y) = log2 4 = 2.0           S_y ≥ 4.322/2.0 = 2.161
7.
p       -p log2 p
.60     .44
.40     .53
H(x) = .97
8. The curve looks similar to F(p), but has a flatter top and is symmetrical, with a
maximum of 1.0 at p(1) = .50.
9. H(x) = log2 3 = 1.58
y:   p       -p log2 p
     .8      .26
     .1      .33
     .05     .22
     .05     .22
H(y) = 1.03
H(x) > H(y)
10. A realistic description of his uncertainty might be:
prob (55-64) = .95
prob (50-54) = .02
prob (65-70) = .02
prob (any other speed) = .01
Within each range, all speeds are considered equiprobable.
We will derive the answer in two steps, obtaining first the uncertainty as to the speed range:
Range              p       -p log2 p
55-64              .95     .07
50-54              .02     .11
65-70              .02     .11
any other speed    .01     .07
                           .36 bits
Next, we observe that the range from 55 to 64 miles per hour contains ten speeds (determined
to the nearest mile) which are equiprobable. The uncertainty measure for ten equiprobable
categories has been found to be log2 10 = 3.32. This uncertainty will arise 95 times
out of 100; its expected contribution to the total uncertainty is 3.32 × 0.95 = 3.15. The other
ranges are treated equally :
Range        No. of sub-classes (r)    log2 r    p × log2 r
55-64        10                        3.32      3.15
50-54        5                         2.32      .05
65-70        5                         2.32      .06
all other    81                        6.35      .06
                                                 3.31 bits
We thus need (on average) .36 bits to determine the range of speeds, and an additional 3.31
bits (on average) to identify the speed to the nearest mile, within the range. The total uncer-
tainty is 0.36 + 3.31 = 3.67 bits.
Of course, different expectations would yield different uncertainties.
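The two-step derivation in answer 10 rests on the fact that the uncertainty of a finely divided set equals the uncertainty of the coarse ranges plus the expected uncertainty within each range. The sketch below checks this with the probabilities and sub-class counts tabulated above; the dictionary layout and names are introduced here.

```python
import math

def entropy(probs):
    """Uncertainty H in bits for a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# range label -> (probability of the range, number of equiprobable speeds in it)
ranges = {
    "55-64": (0.95, 10),
    "50-54": (0.02, 5),
    "65-70": (0.02, 5),
    "all other": (0.01, 81),
}

# Step 1: uncertainty as to which range the speed falls in.
h_range = entropy(p for p, _ in ranges.values())
# Step 2: expected uncertainty within the range, sum of p * log2(r).
h_within = sum(p * math.log2(r) for p, r in ranges.values())
total = h_range + h_within

# Direct calculation over all 101 individual speeds, each with probability p / r.
direct = entropy(p / r for p, r in ranges.values() for _ in range(r))

print(f"{h_range:.2f} + {h_within:.2f} = {total:.2f} bits (direct: {direct:.2f})")
```

The agreement of `total` and `direct` is what justifies treating the coarse and fine classifications separately.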
11. The letters occur with more nearly equal frequencies.
12. Two bits.
13.
$$H(\text{shape}) = -\left(\frac{323}{427}\log_2\frac{323}{427} + \frac{104}{427}\log_2\frac{104}{427}\right) = .80 \text{ bits}$$
$$H(\text{color}) = -\left(\frac{315}{427}\log_2\frac{315}{427} + \frac{112}{427}\log_2\frac{112}{427}\right) = .83 \text{ bits}$$
$$H(\text{color, shape}) = -\left(\frac{296}{427}\log_2\frac{296}{427} + \frac{27}{427}\log_2\frac{27}{427} + \frac{19}{427}\log_2\frac{19}{427} + \frac{85}{427}\log_2\frac{85}{427}\right) = 1.28 \text{ bits}$$
$$T(\text{color; shape}) = .80 + .83 - 1.28 = .35 \text{ bits}$$
14. T(x, y; z) = mutual reduction of uncertainty between x and y on one hand,
z on the other
= H(x, y) + H(z) - H(x, y, z)
T(x; y, z) = H(x) + H(y, z) - H(x, y, z)
T(x; y; z) = total constraint in a tri-variate system
= H(x) + H(y) + H(z) - H(x, y, z)
15.
                 Actual
Test         pos     neg     Total
pos           3       7       10
neg           5      85       90
Total         8      92
H(y) = .40
H(x) = .47
H(x,y) = .84
T(x;y) = .03
The informational value of the test is .03 bits.
Its maximum possible informational value equals the amount of uncertainty before the
test, viz. .40 bits.
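The figures in answer 15 follow from the four cell counts 3, 7, 5 and 85. In the sketch below the rows are taken as one classification and the columns as the other; which of the two is the test and which the actual condition is an assumption about the table's layout, and it does not affect T.

```python
import math

def entropy(probs):
    """Uncertainty H in bits for a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Cell counts from the answer's table (100 cases in all).
counts = [[3, 7],
          [5, 85]]
n = sum(sum(row) for row in counts)

h_rows = entropy(sum(row) / n for row in counts)           # margins 10 and 90
h_cols = entropy(sum(col) / n for col in zip(*counts))     # margins 8 and 92
h_joint = entropy(c / n for row in counts for c in row)
t = h_rows + h_cols - h_joint

print(f"{h_cols:.2f} + {h_rows:.2f} - {h_joint:.2f} -> T = {t:.3f} bits")
```

Carried to three places, T is about 0.035 bits; the .03 of the answer results from subtracting the two-decimal values .40 + .47 - .84.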
16.
2.3 × 5 × 60 = 690 bits/minute
17. Begin by computing the output uncertainty. The probabilities of receiving each signal
are obtained as the sum of receiving it correctly (0.2 for Nos. 1 and 4, .178 for 2 and 3, .198
for 5) plus the addition due to errors (1/4 of the errors, for each erroneous transmission).
This procedure yields H(out) = 2.32 bits. Next, compute the ambiguities. These are zero for
symbols no. 1 and 4. For 2 and 3, the ambiguity can be computed as the sum of the information
needed to ascertain that an error has occurred (-0.11 log2 0.11 - 0.89 log2 0.89) plus the
information needed to find out which of the possible and equiprobable four errors has occurred,
which is 0.11 × 2.0 bits/symbol. Symbol no. 5 is treated similarly. The average of the ambiguities
is 0.31 bits, hence T equals 2.32 - 0.31 or 2.01 bits, a loss of about one-sixth of the
input information.
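The calculation in answer 17 can be followed step by step in code. The channel is reconstructed here from the figures in the answer, which is an assumption: five equiprobable input symbols, error rates of 0 for symbols 1 and 4, 0.11 for symbols 2 and 3, and 0.01 for symbol 5, with each error equally likely to produce any of the other four symbols.

```python
import math

def entropy(probs):
    """Uncertainty H in bits for a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 5
p_in = [0.2] * n
# Assumed error rate per transmitted symbol, read off the answer's figures.
p_err = [0.0, 0.11, 0.11, 0.0, 0.01]

# channel[i][j] = probability of receiving symbol j when symbol i was sent.
channel = [[(1 - e) if i == j else e / (n - 1) for j in range(n)]
           for i, e in enumerate(p_err)]

p_out = [sum(p_in[i] * channel[i][j] for i in range(n)) for j in range(n)]
h_out = entropy(p_out)

# Ambiguity: uncertainty of the output given the input, averaged over inputs.
ambiguity = sum(p_in[i] * entropy(channel[i]) for i in range(n))
transmitted = h_out - ambiguity

print(f"H(out) = {h_out:.2f}, ambiguity = {ambiguity:.2f}, T = {transmitted:.2f}")
```

For symbols 2 and 3 the row uncertainty `entropy(channel[i])` equals the two-part sum described in the answer: the uncertainty as to whether an error occurred, plus 0.11 × 2.0 bits for identifying which of the four errors it was.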
18. One solution is the following:
11000
10101
01110
00011
A single error will result in the reception of a word which is not in the code book. If one
follows the rule of substituting that message in the code book which differs from the received
one by one digit only, then every error (provided there is only one!) will be corrected.
A five-digit binary message can carry five bits of information. If it is known that one error
has occurred somewhere in a group of five symbols, then the information needed to locate
the error is log2 5 = 2.33 bits. With maximum efficiency, one should use only 2.33/5 or 46.5
per cent of redundant information (which could be achieved by coding large sequences of
five-digit words!). In our case, the redundant information is 3/5 or 60 per cent, and we trans-
mit with an efficiency of 40/53.5 = 75 per cent. (Observe that there is less uncertainty if it is
known that there is one error in every five-symbol word, than when it is only known that the
error rate is 20 per cent !)
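The error-correcting property claimed in answer 18 can be checked exhaustively. The sketch below reads the third code word as 01110 and applies the substitution rule stated in the answer; the function names are introduced here.

```python
from itertools import combinations

code_book = ["11000", "10101", "01110", "00011"]

def hamming(a, b):
    """Number of digit positions in which two words differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

def decode(received):
    """Substitute the code-book word closest to the received word."""
    return min(code_book, key=lambda word: hamming(word, received))

def flip(word, i):
    """Return word with the digit at position i inverted."""
    return word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]

min_distance = min(hamming(a, b) for a, b in combinations(code_book, 2))

# Try every possible single error on every code word.
all_corrected = all(decode(flip(word, i)) == word
                    for word in code_book
                    for i in range(len(word)))

print(f"minimum distance {min_distance}, all single errors corrected: {all_corrected}")
```

Since any two code words differ in at least three digits, a word with one error is still closer to the word sent than to any other.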
REFERENCES
1. L. Szilard: Über die Entropieverminderung in einem thermodynamischen System bei
Eingriffen intelligenter Wesen. Z. Phys. 53, 840-856 (1929).
2. R. A. Fisher: On the mathematical foundations of theoretical statistics. Phil. Trans. (A)
222, 309-368 (1922).
3. R. V. L. Hartley: Transmission of information. Bell Syst. Tech. J. 7, 535-563 (1928).
4. N. Wiener: Cybernetics, J. Wiley and Sons, New York (1948).
5. C. E. Shannon: A mathematical theory of communication. Bell Syst. Tech. J. 27,
379-423, 623-656 (1948).
6. C. E. Shannon and W. Weaver: The Mathematical Theory of Communication, University
of Illinois Press, Urbana (1949).
7. L. N. Ridenour: Computer memories. Sci. Amer. 192, 92-100 (1955).
8. R. M. Fano: The transmission of information. Tech. Rep. Mass. Inst. Tech. Res. Lab.
Electron., no. 65 (1949).
9. L. Brillouin: Science and Information Theory, Academic Press, New York (1956).
10. L. Dolansky and M. Dolansky: Tables of log2 1/p, etc. Tech. Rep. Mass. Inst. Tech.
Res. Lab. Electron., no. 227 (1952).
11. E. Klemmer: Tables for computing informational measures. Tech. Rep. A. F. Cambridge
Research Center, ARDC.
12. Articles by J. E. Golay, P. Elias, I. S. Reed, R. A. Silverman, and M. Balser in: Transactions
of the I.R.E. Professional Group on Information Theory (1954).
13. A. M. Turing: On computable numbers, with an application to the Entscheidungsproblem.
Proc. Lond. Math. Soc. 42, 230-265 (1937).
14. H. Quastler: Studies of human channel capacity. In: Information Theory, ed. by
C. Cherry, Academic Press, New York (1956).
15. Wm. McGill and H. Quastler: Standardized nomenclature: an attempt. In: Information
Theory in Psychology, ed. by H. Quastler, Free Press, Glencoe, Ill. (1955).
16. H. Quastler: Information theory terms and their psychological correlates, ibid.
17. J. von Neumann and O. Morgenstern: Theory of Games and Economic Behavior,
Princeton University Press, Princeton (1947).
18. H. Quastler, H. H. Chase, W. Montagna, M. V. Edds, Jr., P. F. Fenton, and P. B.
Weisz: Essays on biological unitization. Rep. Control Systems Laboratory, Univ. Ill.,
no. R-52 (1953).
19. A. A. Blank and H. Quastler: Notes on the estimation of information measures.
Rep. Control Systems Laboratory, Univ. Ill., no. R-56 (1954).
20. F. Fritz and G. W. Grier, Jr.: Pragmatic communication. In: Information Theory
in Psychology, ed. by H. Quastler, 232-243, Free Press, Glencoe, Ill. (1955).
21. C. E. Shannon: Prediction and entropy of printed English. Bell Syst. Tech. J. 30, 50-64
(1951).
SOME INTRODUCTORY IDEAS CONCERNING THE
APPLICATION OF INFORMATION THEORY
IN BIOLOGY
Hubert P. Yockey
Oak Ridge National Laboratory, Oak Ridge, Tennessee
Abstract — The model of protein synthesis in the cell which has been built up as the result of
the work of many researchers has been used as a basis for applying the principles of information
theory in biology. The main line of the argument has been the role of noise in the
genome. The discussion has been kept as independent as possible of special models.
It was shown that in a real organism noise must exist in the genome and that an ensemble
of organisms may be represented by a probability distribution in H, p{H, A). Individuality is
thus incorporated in a very natural way. Dancoff 's principle requires that there be a lower
limit for viability for this distribution. Ha.
The action of a deleterious agent which induces errors in the genome by acting on nucleo-
tide pairs is assumed to be represented by an equation of the first order:
^ = -j(X)p,(j) + ija)
where /(A) measures the effectiveness of the deleterious agent, of which A is a measure,
in producing defects. A differential equation for H(X) is derived and it is shown that
{dHldX)E^ as a function of A behaves like J{,X).
I. INTRODUCTION
Information theory finds its place in biological thought through its ability
to deal quantitatively with organization and specificity. The importance of
these concepts has long been recognized in biology, but this realization is
rather sterile unless a quantitative form of expression can be found. One is
reminded of a quotation from Lord Kelvin, 'When you can measure what
you are speaking about and express it in numbers, you know something about
it, but when you cannot measure it, when you cannot express it in numbers,
your knowledge is of a meagre and unsatisfactory kind.'
The need for expressing biological quantities in numbers is clear but solving
the problem of how to do it is very much like belling the cat. Biology doesn't
seem to have any problems both really simple and terribly important such as
some which occur in the physical sciences. The application of first principles
has come much more slowly in biology for perhaps this reason. That ideas
of great general application do exist in biology is exemplified by Mendel's
laws and by the theory of evolution.
One of the purposes of this article, and indeed one of the purposes of this
book, is to explore the practical and theoretical consequences that may be
found in the discovery that biochemical specificity of proteins is carried, largely
at least, by the exact order of twenty amino-acid residues. The suggestion of
Watson and Crick (1) that genetical information is carried by the exact
order of four kinds of nucleotide pairs provides a molecular vehicle for the
genetic control of protein specificity. Gamow (2) was the first to see that
this control implied the existence of a four-letter to twenty-letter code.
Thus by following the logical consequences of purely biological, or perhaps
biochemical, problems one is led directly to a problem purely mathematical
in character.
This notion of the role of order, which is basic to information theory, is
worth pursuing in biology since it provides a way of measuring what we are
speaking about and expressing it in numbers. Furthermore, from the results
of applying the theory to specific problems, we may obtain an experimental
check on the validity of these ideas as first principles. In this article we shall
apply these considerations to the storage and transfer of biochemical specificity.
We shall explore, in particular, the role of noise in the genetical message. In
my article in Part V the theory is applied to the practical problem of calculating
and understanding survivorship curves.
The present status of the means of storage and transfer of specificity is
given by Gamow, by Ycas and by Augenstine in their respective articles in
this volume. The question of the exact way in which information is destroyed
by read-off error, radiation damage, aging, thermal fluctuations, biochemical
side reactions, and so forth, is of equal importance. This problem is also
discussed in this volume but no final and detailed account can be given at
this writing. Nevertheless, since there is virtue in attempt, we shall attempt
the development of a mathematical formalism which is information theoretic
in character.
Most animals and plants exist at one time, at least, in the form of a single
cell; we can consider that cell to contain a substantial part of the directions
for the development of the organism. Since information is conserved unless
lost due to noise, it shall be assumed that the mature organism is characterized
by substantially the same information content as the fertilized egg or seed.
In order to fix the idea we shall develop the formalism on the basis of Watson
and Crick's suggestion concerning the role of DNA. It should be remembered
that the central ideas of this paper are independent of much of the detail
embodied in Watson and Crick's papers and are dependent only on the possi-
bility of genetical endowment being conveyed by a series of structures composing
an information bearing molecule.
Suppose we imagine the symbols A, B, C, D (Gamow's predilection is to
the less prosaic spades, clubs, hearts, diamonds!) arranged in one-to-one
correspondence with the nucleotide pairs of the DNA found in a particular
given cell. The cell will have been selected from a number of similar but not
identical cells in a colony under study. This colony may be thought of as
being indefinitely large, so that in principle we may consider the ensemble
of all possible organisms identifiable as being members of the colony. Since
the number of nucleotides in DNA is finite, the number of elements in this
ensemble is also finite. Because of this one-to-one correspondence it will be
seen that the set of symbol sequences, which is the mathematical model of the
ensemble of organisms, will contain the informational or specificity properties
of the ensemble of organisms.
The importance or value of a theory lies, among other things, in its capability
of treating a wide variety of phenomena from a single point of view. It is
well to think, at the start, of the field of validity this theory may have and,
if it should fail, the significance of its failure. If it should be discovered that
Watson and Crick's suggestion has very little bearing or applicability then
this development, while negative, is still a valuable result. One would then
perforce search for another explanation for the great detail and specificity
characteristic of any biological phenomenon. At present it is the most detailed
proposal based specifically on molecular chemistry. The theory here developed
is essentially statistical and may be expected to express its results in the form
of expectation values, probability distributions, and their functions. The
statistical character of the theory is directly in the line of thinking of both
modern biology and modern physics. It should be kept clearly in mind that
information theory deals with organizational problems and so some aspects
of organisms will be outside its scope. In this sense it may be that the role
information theory will play in biology will parallel that played by thermo-
dynamics in physics and chemistry.
II. NOISE IN THE GENETICAL INFORMATION
The Instability of a Perfect System
Let us consider an ensemble of organisms and discuss the communication
of information from the DNA to protein. There is evidence discussed by
Gamow and by Ycas in this volume that the code which translates information
from the four-symbol DNA code via RNA to the twenty-symbol protein
code is based on triads of nucleotide pairs. Indeed it can be seen that it must
be at least the triads since a twenty-symbol alphabet carries 4.32 bits per symbol
whereas the pairs in a four-symbol alphabet carry exactly four bits per symbol,
assuming no intersymbol constraints. The triads carry six bits per symbol
and so this represents some inherent redundance. It would be desirable to
express this formalism in terms of the DNA triads of nucleotide pairs ; however,
this requires a knowledge of the DNA to protein code. These data are missing.
Our objective is to develop the mathematical formalism in as simple a way
as possible so it appears more appropriate to consider the communication of
specificity from DNA to RNA. Here we are dealing with a coding between
two four-symbol alphabets.
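The bit counts quoted above can be checked directly. The following is a minimal sketch, assuming equiprobable letters and no intersymbol constraints (as the text does); the function names are illustrative and do not come from the original article.

```python
import math


def bits_per_block(alphabet_size: int, block_length: int = 1) -> float:
    """Maximum information, in bits, of a block of independent, equiprobable letters."""
    return block_length * math.log2(alphabet_size)


def shortest_block(source_alphabet: int, target_alphabet: int) -> int:
    """Smallest block length of source letters able to label every target letter."""
    length = 1
    while source_alphabet ** length < target_alphabet:
        length += 1
    return length


amino_acid = bits_per_block(20)    # one residue of the twenty-letter protein alphabet
pair = bits_per_block(4, 2)        # two letters of the four-letter nucleotide alphabet
triad = bits_per_block(4, 3)       # three letters of the four-letter nucleotide alphabet

print(f"amino acid: {amino_acid:.2f} bits, pair: {pair:.0f} bits, triad: {triad:.0f} bits")
print("shortest nucleotide block for 20 amino acids:", shortest_block(4, 20))
print(f"redundance of a triad code: {triad - amino_acid:.2f} bits per residue")
```

Under these assumptions a pair falls short of one residue by about a third of a bit, which is why the triad is the shortest block that can serve.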
Suppose we are considering an ensemble of organisms which is isogenic, and
further that this means that each organism is characterized by exactly the
same order of nucleotides in the DNA of its nucleus. We shall now show that
this situation is unstable and that therefore a real ensemble of organisms will
be represented by an ensemble of messages recorded in its DNA. From this
it will follow that there is a distribution in the message entropy, characteristic
of any ensemble of organisms, even one which is isogenic.
The message entropy is
$$H = H_g - H_n \tag{1}$$
where $H_g$ is the message entropy of the genetical information and $H_n$ is the
loss of information due to noise. That is, $H_n$ is the loss of information from
some fault either in the duplication process in the germ line or the somatic
line or from incorrect read-off of any kind. $H_n$ may be expressed in terms of
the read-off or transition probabilities (3) of a letter of kind $i$ to a letter of
kind $j$, $p_i(j)$. The probability of letter $i$ is $p(i)$.
$$H = H_g + \sum_{i,j} p(i)\, p_i(j) \log_2 p_i(j) \tag{2}$$
Consider the case where these probabilities are a function of some variable $\lambda$.
In the application of these considerations $\lambda$ is the measure of some deleterious
influence such as dose of ionizing radiation. Form the derivative $dH/d\lambda$:
$$\frac{dH}{d\lambda} = \log_2 e \sum_{i,j} \Big[\, p(i)\, \frac{d}{d\lambda} p_i(j) + p(i) \log_e p_i(j)\, \frac{d}{d\lambda} p_i(j) + p_i(j) \log_e p_i(j)\, \frac{d}{d\lambda} p(i) \Big] \tag{3}$$
The absolute value of $dH/d\lambda$ will become indefinitely large because of the
second term in equation (3) as any $p_i(j)$ approaches zero if $p(i) \neq 0$ and
$(d/d\lambda)\, p_i(j) \neq 0$. This may happen, in particular, if any $p_i(j)$ approaches one
for then all $p_i(k)$, $(j \neq k)$ approach zero. This situation ($p_i(j) = 1$) corresponds
to the assumption that there is always a correct reproduction in the DNA
duplication or in the RNA read-off. Under these circumstances the first term
is finite and the third term is zero.
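The divergence can be illustrated numerically. The sketch below is an assumption-laden example rather than the author's calculation: it takes a symmetric four-letter channel in which each of the three wrong letters is read with the same probability `e`, and it uses `e` itself as the variable in place of the dose. The noise entropy then has slope 3 log2((1 - 3e)/e) with respect to `e`, which grows without bound as `e` approaches zero.

```python
import math


def noise_entropy(e: float) -> float:
    """Noise entropy in bits for a symmetric four-letter channel.

    Assumption for illustration: the correct letter is read with probability
    1 - 3e and each of the three wrong letters with probability e.
    """
    total = 0.0
    for prob in (1.0 - 3.0 * e, e, e, e):
        if prob > 0.0:  # the limit of x*log(x) at x = 0 is 0
            total -= prob * math.log2(prob)
    return total


def noise_entropy_slope(e: float) -> float:
    """Analytic derivative of noise_entropy with respect to e (requires e > 0)."""
    return 3.0 * math.log2((1.0 - 3.0 * e) / e)


for e in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"e = {e:.0e}  H_n = {noise_entropy(e):.3e} bits  slope = {noise_entropy_slope(e):.1f}")
```

The entropy itself tends to zero while its slope keeps increasing, which is the sense in which a perfect read-off is unstable against any small deleterious influence.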
Watson and Crick regard a mutation as being reflected by a change in
order of the nucleotide bases in DNA. This is apparently always possible;
they have suggested a biochemical scheme by which this can be effected. This
means that in a real biological system $p(i) \neq 0$ and $(d/d\lambda)\, p_i(j) \neq 0$. A real
ensemble of organisms will be represented by an ensemble of genetic messages.
This will be true even if the ensemble is isogenic. Some noise must exist in the
genetical information; if the noise is less than equilibrium it is quickly intro-
duced.
There is some experimental evidence in support of this conclusion. Burdette
(4) prepared populations of isogenic Drosophila. One strain had the same low
incidence of tumors in both sexes (about 4 per cent) and the other had a high
incidence (about 60 to 80 per cent) even greater in males than in females. The
tumor incidence of the isogenic strains was initially much lower in each case
than the stock from which it originated. But in each case, by the twelfth genera-
tion, the tumor incidence of the isogenic strain had returned to about the same
rate as that of the original stock. Tumor incidence is a morphological mal-
function and, as shown in this and other experiments, is under genetic control.
The fact that all flies were not tumor bearing and the gradual return of the
isogenic strains to the tumor incidence of the strains from which they were
selected, reflects the accumulation of errors in the genome. The results of the
experiment are in accord with the proposition proved above.
Representation of the Ensemble of Organisms by a Probability Distribution in H:
$p(H, \lambda)$
If we grant that perfect systems do not exist, the other side of the coin is,
how imperfect may they be? This question was first discussed by Dancoff and
Quastler (5) and their conclusion, which is known as Dancoff's principle,
states that the amount of redundance is just that required to reduce the error
rate to a tolerable level. According to this principle, we may expect that errors
will continue to accumulate in the genome of a given organism until at some
point serious difficulty including death will occur. This will be reflected by
some value of H, which we call $H_a$, limited by viability. An argument for a
lower limit $H_a$ has been given previously (6).
Errors will accumulate in the genome but at the same time there is a favorable
selection for those members of the ensemble which have low equivocation.
This represents a certain reserve capacity to withstand the insults of existence.
It may therefore be expected in general that the message entropy of the ensemble
of organisms will be described by a probability distribution. This distribution
can, perhaps, be calculated from first principles, at least for simple cases, when
more is known about the storage and transfer of genetical information.
Death of an organism is defined in different ways in various fields of biology.
Permanent loss of reproductive power is the definition of death usually expressed
or implied in bacteriology (7). This is the definition chosen in spite of the fact
that there are many intermediate stages between the active living cell and the
dead cell. It is known that yeast cells which have lost the power to multiply
may still be able to ferment (8). Zelle and Hollaender (7) have recently
pointed out that attempts to explain the bactericidal effects of irradiation on
the basis of one mechanism are unrealistic. In the case of animals the cessation
of metabolism, not the loss of fertility, is the criterion of death. These criteria
of death are not really different or antagonistic. Since loss of function is implied
by loss of information content any experimentally convenient definition of
lethality may be used to suit the problem at hand. The lower end of the distribu-
tion in message entropy will therefore be determined by the specificity required
by the environment.
A communications analogy may clarify the notion further. Suppose we
have a message, with redundance, which is sent through a communication
channel with a small but finite noise level. The message contains instructions
to perform some necessary task. A recording is made and the message is sent
through again, and so forth. Eventually, depending on the noise level of the
channel and the redundance in the message, it will be just barely intelligible. No
further recordings can be made without loss of part of the required information
content. The ensemble of recordings is analogous to the ensemble of organisms.
It will be seen in either case that there is a distribution of information content
among the elements of the ensemble.
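The recording analogy can be put in numbers. The following is a minimal sketch under assumptions that are not in the original: a four-letter message, a symmetric channel in which each copy turns a letter into each of the three other letters with probability `e`, and an arbitrary intelligibility threshold in bits per letter standing in for the lower limit of viability.

```python
import math


def information_per_symbol(q: float) -> float:
    """Bits per letter that survive when a letter is correct with probability q.

    Assumes four equiprobable letters and three equally likely wrong letters.
    """
    e = (1.0 - q) / 3.0
    noise = 0.0
    for prob in (q, e, e, e):
        if prob > 0.0:
            noise -= prob * math.log2(prob)
    return 2.0 - noise


def correct_probability_after(copies: int, e: float) -> float:
    """Probability that a letter is still correct after repeated re-recording."""
    q = 1.0
    for _ in range(copies):
        # a correct letter stays correct, or a wrong letter is switched back
        q = q * (1.0 - 3.0 * e) + (1.0 - q) * e
    return q


def copies_until_below(threshold_bits: float, e: float, max_copies: int = 100_000):
    """Number of successive recordings after which the information falls below the threshold."""
    q = 1.0
    for n in range(1, max_copies + 1):
        q = q * (1.0 - 3.0 * e) + (1.0 - q) * e
        if information_per_symbol(q) < threshold_bits:
            return n
    return None  # threshold not reached within max_copies


print(copies_until_below(1.0, 0.01), copies_until_below(1.0, 0.001))
```

A quieter channel or a more tolerant threshold allows more recordings before the message stops being usable; a population of recordings taken at different stages would show the spread of information content described in the text.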
Individuality finds a place in the theory developed here in a very natural
way. This feature corresponds more to reality (9) than theories which must
explain non-uniform response as fluctuations. Besides the experiments of
Burdette mentioned above it will suffice to note one other example of biological
individuality.
Consider the experiments of Schott (10, 11), Hetzer (12), Lambert (13),
Gowen (14), discussed by Gowen (15), on Salmonella typhimurium in mice and
Salmonella gallinarum in fowl. The host population is exposed to the pathogen
and the survivors are chosen for further breeding. The case for mice is typical.
The survival ratio improved from 18 per cent to 93 per cent in six generations,
but remained nearly constant after that. One hundred per cent survival was
not achieved. The survival ratio is characteristic of the ensemble not of the
individual. Gowen (15) also prepared six strains of mice by sibling matings
for twenty or more generations. When survival was tested the survival ratios
were 1, 14, 34, 63, 64, 83 and 88 per cent. These results again stress the
importance of individuality as Gowen pointed out.*
Point Mutations and Chromosome Aberrations
We have now arrived, via our discussion, at territory familiar to the radiation
biologist. This is the controversy over the role played by point mutations and
chromosomal aberrations induced by deleterious agents such as x-rays. This
subject has been ably discussed recently by Muller, Kaufmann, Giles, Carlson,
Swanson and Stadler, and by Kimball (16). The point of view of these
authors varies. Kimball takes the stand with Lea (17) that the death of cells is
due to chromosome aberrations which become effective at cell division. Swanson
and Stadler point out that the two effects occur together and that a clear cut
separation has not yet been accomplished. Muller points out some difficulties
with the mutation by breakage interpretation. Russell (18) states that gross
chromosomal aberrations, although they cause early death of embryos, are
probably not an important radiation hazard to man.
From the point of view of this article each of these effects is a way of intro-
ducing disorganization in the genome. The point mutation mechanism is the
biological analogue of the 'white noise' of the communications engineer. The
other extreme is not found in communication engineering but involves a strong
correlation between errors and is reflected as a loss of whole paragraphs or
other gross mutilation of the message. Each of these extreme cases will be
important in applications of information theory in biology. Unfortunately, the
second case has not been studied mathematically and so it is not known how
to calculate the equivocation it introduces.
It is therefore necessary to proceed with the calculation of only the part of
the equivocation which corresponds to point mutations. Since one of our
objectives is to develop a fundamental theoretical treatment of radiation hazard
to man, Russell's comment encourages one to think that this procedure is
worthwhile. It should be remembered that equivocation from these two extreme
conditions may have the same dependence on the deleterious influence. This is a
point which requires further mathematical study.
The Interaction of the Deleterious Agent with DNA and the Decay of H
According to the Watson and Crick model of DNA there seems to be no
biochemical reason why there should be an interaction between nucleotide
pairs. The biological requirements for protein specificity do not seem to demand
an intersymbol influence (19). The matter is not closed, but the evidence favors
regarding the interaction of a deleterious agent with a nucleotide pair to be of
the first order.
We have previously suggested that the action of ionizing radiation or other
deleterious agent may be such that the nucleotide pair is altered in such a way
that it mimes another symbol as far as protein synthesis is concerned (6). It
may be thrown into an excited tautomeric form from which it recovers by
relaxation. Possibly one can account for biological recovery by such a
mechanism. The consideration of recovery is omitted from this paper for
simplicity and we shall need only the notion expressed in the first sentence of
this paragraph.
* Individuality as an integral feature in biology has been emphasized recently by Roger J.
Williams: in Biochemical Individuality, J. Wiley and Sons, New York, Chapman & Hall,
London (1956).
In view of the above remarks we may write the following equation for the
rate of change of $p_i(j)$ with $\lambda$:
$$\frac{d}{d\lambda}\, p_i(j) = -J_{ij}(\lambda)\, p_i(j) + c_{ij}(\lambda) \tag{4}$$
The first term represents the loss in nucleotides responsible for the $(i,j)$
transition. The second term is due to the gain in nucleotides engaging in the
(i,j) transition coming from other nucleotides altered by the deleterious agent.
This can be brought into sharper focus by thinking of the binary case. Suppose
q is the correct and p is the incorrect read-off probability. We are calculating the
equivocation, or damage to the message, resulting from point errors. This means
that, accordingly, a letter is not deleted but is read off either correctly or
incorrectly. This letter switching process may continue until half the letters are
correct and half are incorrect; at that point p = 1/2 and q = 1/2. The
information content vanishes. In the case of a four letter alphabet a letter which is
acted upon and which may therefore change or may retain its original read-off
character has an a priori probability of 1/4 to remain or to become a correct
letter. Thus the second term is required by the normalization condition.
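The binary case can be made concrete with a short sketch. It assumes equiprobable binary letters and independent point errors, so that the information surviving per letter is one bit minus the entropy of the error; the function name is illustrative only.

```python
import math


def binary_information(p: float) -> float:
    """Bits per letter surviving point errors in a binary message.

    p is the incorrect read-off probability and q = 1 - p the correct one.
    """
    q = 1.0 - p
    equivocation = 0.0
    for prob in (p, q):
        if prob > 0.0:
            equivocation -= prob * math.log2(prob)
    return 1.0 - equivocation


for p in (0.0, 0.01, 0.1, 0.25, 0.5):
    print(f"p = {p:<5} information = {binary_information(p):.4f} bits per letter")
```

The information falls steadily as letters are switched and reaches zero at p = q = 1/2, the point at which a letter is as likely to be wrong as right.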
Equation (4) describes the effect of the interaction of the deleterious agent,
say the x-ray dose, with the information bearing molecules in the cell. It
corresponds to current views of reaction kinetics. Should it be discovered that
some effect, for example, inter-symbol influence, should be taken into account
then equation (4) may be altered suitably. The following argument would then
still be cogent except that the new form of equation (4) would be used. Present
experimental evidence substantiates equation (4) and we have no present
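justification for greater complication. In fact the $J_{ij}(\lambda)$ and $c_{ij}(\lambda)$ represent more
detail than is available. Sum equation (4) over all $j$:
$$\sum_j \frac{d}{d\lambda}\, p_i(j) = -\sum_j J_{ij}(\lambda)\, p_i(j) + \sum_j c_{ij}(\lambda) \tag{5}$$
Since
$$\sum_j p_i(j) = 1; \qquad \sum_j \frac{d}{d\lambda}\, p_i(j) = 0 \tag{6}$$
$$0 = -\sum_j J_{ij}(\lambda)\, p_i(j) + \sum_j c_{ij}(\lambda) \tag{7}$$
If the $J_{ij}(\lambda)$ and the $c_{ij}(\lambda)$ may be replaced by an average value $J(\lambda)$ and $c(\lambda)$,
equation (7) becomes, for a four-letter alphabet:
$$0 = -J(\lambda) + 4\, c(\lambda) \tag{8}$$
$$c(\lambda) = +\tfrac{1}{4} J(\lambda) \tag{9}$$
Equation (4) may be written as follows:
$$\frac{d}{d\lambda}\, p_i(j) = -J(\lambda)\, p_i(j) + \tfrac{1}{4} J(\lambda) \tag{10}$$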
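To see what equation (10) implies, it helps to solve it in the simplest case. The sketch below assumes a constant effectiveness `J`, which is an illustrative simplification and not a claim of the original; the solution then relaxes exponentially from its starting value toward the a priori value 1/4. A crude Euler integration of equation (10) is included as a cross-check.

```python
import math


def relaxed_probability(p0: float, J: float, lam: float) -> float:
    """Closed-form solution of equation (10) for constant J (an assumption)."""
    return 0.25 + (p0 - 0.25) * math.exp(-J * lam)


def integrate_eq10(p0: float, J: float, lam: float, steps: int = 100_000) -> float:
    """Forward-Euler integration of dp/dlam = -J*p + J/4 from 0 to lam."""
    p = p0
    h = lam / steps
    for _ in range(steps):
        p += h * (-J * p + 0.25 * J)
    return p


# One row of transition probabilities: nearly perfect read-off at zero dose.
row_at_zero = (0.997, 0.001, 0.001, 0.001)
for lam in (0.0, 0.5, 2.0, 10.0):
    row = [relaxed_probability(p0, 1.0, lam) for p0 in row_at_zero]
    print(lam, [round(p, 4) for p in row], "sum =", round(sum(row), 12))
```

Each row of transition probabilities keeps its normalization at every dose, which is the role the text assigns to the second term of equation (4).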
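Given $(d/d\lambda)\, p_i(j)$ as some function of $\lambda$, equation (3) may be regarded as a
differential equation for $H(\lambda)$. This equation has a simple form if the $J_{ij}(\lambda)$
and the $c_{ij}(\lambda)$ may be replaced by their averages $J(\lambda)$ and $\tfrac{1}{4} J(\lambda)$.
$$\frac{dH}{d\lambda} = \log_2 e \sum_{i,j} \Big\{ p(i)\, J(\lambda) \big[ -p_i(j) + \tfrac{1}{4} \big] + p(i)\, J(\lambda) \big[ -p_i(j) + \tfrac{1}{4} \big] \log_e p_i(j) + p_i(j) \log_e p_i(j)\, \frac{d}{d\lambda} p(i) \Big\} \tag{11}$$
The first term vanishes on summing over $j$, since $\sum_j [-p_i(j) + \tfrac{1}{4}] = 0$ for a
four-letter alphabet, so that
$$\frac{dH}{d\lambda} = -J(\lambda) \log_2 e \sum_{i,j} p(i)\, p_i(j) \log_e p_i(j) + \tfrac{1}{4} J(\lambda) \log_2 e \sum_{i,j} p(i) \log_e p_i(j) + \log_2 e \sum_{i,j} p_i(j) \log_e p_i(j)\, \frac{d}{d\lambda} p(i) \tag{12}$$
Substituting equation (2) in equation (12) and rearranging we have
$$\frac{dH}{d\lambda} + J(\lambda)\, H = J(\lambda)\, H_g + \tfrac{1}{4} J(\lambda) \sum_{i,j} p(i) \log_2 p_i(j) + \sum_{i,j} p_i(j) \log_2 p_i(j)\, \frac{d}{d\lambda} p(i) \tag{13}$$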
The third term on the right of equation (13) is negligible for biological
systems. To show this we must discuss first the method of calculating the
$(d/d\lambda)\, p(i)$. By definition (3) the following relation holds:
$$p(i) = \sum_j p(j)\, p_j(i) \tag{14}$$
Form the derivative with respect to $\lambda$ and substitute equation (4):
$$\frac{d}{d\lambda}\, p(i) = \sum_j \Big[ p(j)\, \frac{d}{d\lambda} p_j(i) + p_j(i)\, \frac{d}{d\lambda} p(j) \Big] \tag{15}$$
$$\frac{d}{d\lambda}\, p(i) = -\sum_j J_{ji}\, p_j(i)\, p(j) + \sum_j c_{ji}\, p(j) + \sum_j p_j(i)\, \frac{d}{d\lambda} p(j) \tag{16}$$
The equations (16) are a set of differential equations for the $p(i)$. They may
be rearranged in the usual form:
$$\frac{d}{d\lambda}\, p(i) - \sum_j p_j(i)\, \frac{d}{d\lambda} p(j) = -\sum_j J_{ji}\, p_j(i)\, p(j) + \sum_j c_{ji}\, p(j) \tag{17}$$
We are interested in the conditions when the $(d/d\lambda)\, p(i)$ vanish. The condition
is of course that the terms on the right of the equations (17) are all equal to zero and
that the determinant of the coefficients of the $(d/d\lambda)\, p(i)$ be different from zero.
Among the circumstances in which this will occur are those where all $p_i(i) = q$
and all $p_i(k) = p$ $(i \neq k)$. That is, all letters are equally probable and one kind
of error is as likely as the other. In my paper in Part V the behavior of $dH/d\lambda$
under the much stronger conditions that the $J_{ij}$ and $c_{ij}$ vanish at $\lambda = 0$ will be
needed. Then, of course, providing that the determinant of the coefficients of
the $(d/d\lambda)\, p(i)$ be different from zero, all $(d/d\lambda)\, p(i) = 0$. It may therefore be
expected that except under most exceptional and special conditions the $(d/d\lambda)\, p(i)$
will be very small or will vanish.
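With the $(d/d\lambda)\, p(i)$ set to zero, equation (13) reduces to its first two right-hand terms, and this reduced form can be checked numerically. The sketch below rests on several assumptions made only for illustration: four equiprobable letters with p(i) = 1/4 held fixed, a constant J, transition probabilities that follow the constant-J solution of equation (10), a genetical message entropy of 2 bits per letter, and an initial error probability of 0.001. None of these values comes from the original.

```python
import math

J = 1.0       # constant effectiveness of the agent (assumed)
H_G = 2.0     # message entropy of the genetical information, bits per letter (assumed)
E0 = 1e-3     # probability of each wrong letter at zero dose (assumed)


def transition_row(lam: float) -> list:
    """One row of transition probabilities, relaxing toward 1/4 as in equation (10)."""
    decay = math.exp(-J * lam)
    right = 0.25 + ((1.0 - 3.0 * E0) - 0.25) * decay
    wrong = 0.25 + (E0 - 0.25) * decay
    return [right, wrong, wrong, wrong]


def message_entropy(lam: float) -> float:
    """Equation (2) with p(i) = 1/4 and rows that are permutations of one another."""
    return H_G + sum(p * math.log2(p) for p in transition_row(lam))


def eq13_sides(lam: float, h: float = 1e-6):
    """Left and right sides of equation (13) with the third term set to zero."""
    slope = (message_entropy(lam + h) - message_entropy(lam - h)) / (2.0 * h)
    left = slope + J * message_entropy(lam)
    right = J * H_G + 0.25 * J * sum(math.log2(p) for p in transition_row(lam))
    return left, right


for lam in (0.1, 0.5, 2.0):
    left, right = eq13_sides(lam)
    print(f"lambda = {lam}: left = {left:.6f}, right = {right:.6f}")
```

The two sides agree to within the accuracy of the numerical derivative, which supports the reading of equations (11) to (13) given above for this symmetric case.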
It can be further shown that for a nearly perfect system the coefficients of
the $(d/d\lambda)\, p(i)$ in equation (13) are small compared to one. Dancoff and
Quastler (5) have estimated the error rate per cell per generation to be some
$10^{-1}$ to $10^{-2}$ times the spontaneous mutation rate per cell generation (10^* to
10~i^). Taking this to mean that
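$$p_i(i) = q = (1 - p) \quad \text{and} \quad p_i(j) = p \approx 10^{-6} \quad (i \neq j)$$
we see that
$$p_i(i) \log_2 p_i(i) = +\log_2 (1 - p) \approx -p = -10^{-6}$$
$$p_i(j) \log_2 p_i(j) = -6 \times 10^{-6} \log_2 10 \approx -10^{-5} \tag{18}$$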
Because of the discussion given above this term in equation (13) may be neglected.
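The orders of magnitude in equation (18) are easy to confirm. This is a minimal sketch using the value p = 10^-6 from the text; the function name is illustrative.

```python
import math


def eq18_terms(p: float):
    """Return (q*log2(q), p*log2(p)) for q = 1 - p, the coefficients estimated in equation (18)."""
    q = 1.0 - p
    return q * math.log2(q), p * math.log2(p)


correct_term, error_term = eq18_terms(1e-6)
print(f"p_i(i) log2 p_i(i) = {correct_term:.3e}")
print(f"p_i(j) log2 p_i(j) = {error_term:.3e}")
```

Both coefficients come out at the order of 10^-6 and 10^-5 respectively, small compared to one, in line with the estimate in the text.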
Equation (13) gives the value of $(dH/d\lambda)$ at the values of $p_i(j)$ corresponding
to $H_a$. Let these values be $p'_i(j)$.
$$\left( \frac{dH}{d\lambda} \right)_{H_a} = J(\lambda) \Big[ H_g - H_a + \tfrac{1}{4} \sum_{i,j} p(i) \log_2 p'_i(j) \Big] \tag{19}$$
The coefficient of $J(\lambda)$ will be a constant so that $(dH/d\lambda)_{H_a}$ will behave as a
function of $\lambda$ like $J(\lambda)$. This result will be needed in my article in Part V.
REFERENCES
1. J. D. Watson and F. H. C. Crick: Genetical implications of the structure of deoxyribose
nucleic acid. Nature, Lond. 171, 964-967 (1953).
J. D. Watson and F. H. C. Crick: The structure of DNA. Cold Spr. Harb. Symp. Quant.
Biol. 18, 123-131 (1953).
J. D. Watson and F. H. C. Crick: Molecular structure of nucleic acids. A structure for
deoxyribose nucleic acid. Nature, Lond. 171, 737-738 (1953).
2. G. Gamow: Possible mathematical relation between deoxyribonucleic acids and proteins.
Biol. Medd., Kbh. 22, (3), 1-13 (1954).
3. C. Shannon and W. Weaver: The Mathematical Theory of Communication, University
of Illinois Press, Urbana (1949).
4. W. J. Burdette: Incidence of tumors in isogenic strains. J. Nat. Cancer Inst. 12,
709-714 (1952).
5. S. M. Dancoff and H. Quastler: The information content and error rate of living
things. In: Information Theory in Biology, ed. by Henry Quastler, 263-373, University
of Illinois Press, Urbana (1953).
6. H. P. Yockey: An application of information theory to the physics of tissue damage.
Radiat. Res. 5, 146-155 (1956).
7. M. R. Zelle and A. Hollaender: Effects of radiation on bacteria. In: Radiation Biology,
ed. by A. Hollaender, Vol. 2, chap. 10, McGraw-Hill, New York (1955).
8. O. Rahn: The order of death of organisms larger than bacteria. J. Gen. Physiol. 14,
315-337 (1930).
9. R. Pearl: On the distribution of differences in vitality among individuals. Amer.
Nat. 61, 113-131 (1927).
10. R. G. Schott: The inheritance of resistance to Salmonella aertrycke in various strains
of mice. Thesis, Iowa State College Library, 1-59 (1931).
11. R. G. Schott: The inheritance of resistance to Salmonella aertrycke in various strains
of mice. Genetics 17, 203-229 (1932).
12. H. O. Hetzer: The genetic basis for resistance and susceptibility to Salmonella aertrycke
in mice. Genetics 22, 264-283 (1937).
13. W. V. Lambert: Genetic investigations of resistance and susceptibility to disease in
laboratory animals. Rep. Agric. Res., Iowa Agric. Exp. Sta., 89-90 (1931); 91-92 (1932);
115 (1933); 142-143 (1934); 158-159 (1935); 147-148 (1936).
14. J. W. Gowen: Genetic investigations of resistance and susceptibility to disease in
laboratory animals. Rep. Agric. Res., Iowa St. Coll. Agric. Exp. Sta., 158-159 (1937);
151-153 (1938); 156-160 (1939); 192-194 (1940); 171-172 (1941); 189-190 (1942);
178-182 (1943); 204-210 (1944); 278-283 (1945); 257-260 (1946); 230-232 (1947).
15. J. W. Gowen: Significance and utilization of animal individuality in disease research.
J. Nat. Cancer Inst. 15, 555-570 (1954).
16. A. Hollaender (ed.): Radiation Biology, McGraw-Hill, New York (1955).
17. D. E. Lea: Actions of Radiations on Living Cells, Cambridge University Press,
Cambridge, England (1955).
18. W. L. Russell: Genetic effects in mammals. In: Radiation Biology, ed. by A. Hollaender,
Chap. 12, McGraw-Hill, New York (1955).
19. M. Ycas: The protein text. This volume.
PART II
STORAGE AND TRANSFER OF INFORMATION
A central issue in modern biology, which touches in some degree all branches
of that science, is the problem of species specificity and its relation to protein-
specificity and synthesis. The subject can be approached from many points of
view but the one adopted by the authors of the papers in Part II is to seek the
solution in terms of the properties of a communication system.
The justification for considering, from this point of view, a phenomenon
which looks, at first sight, to be purely biochemical lies in the recent discovery
that protein specificity is expressed as an exact order of amino acid residues.
If this is even substantially the case then it is germane to discuss such problems
in these terms. In fact, a number of current papers on protein synthesis and
specificity have recourse, at one point or another, to the language of information
theory. Since the specificity of proteins is thought to be coded in the exact order
of pairs of nucleotide bases in DNA, the relationship of DNA, RNA, and proteins
can be considered from aspects which are mathematical rather than purely bio-
chemical.
Gamow was the first to notice these mathematical aspects. He and Ycas
pursue in this part some of the issues which they reveal. The influence one hopes
these considerations will have on the experimentalist is clear. Additional data
on the amino-acid residue sequences and other structural data for a large number
of proteins can be put to immediate practical use in solving for the protein code,
and therefore in understanding more about protein synthesis. Unfortunately,
mainly due to the lack of sufficient protein text, few definite answers can be given.
But it is possible to eliminate some past errors and to phrase the question in a
sharper fashion than before.
The notion that an abstract quantity such as information is stored in the
genetic material and is transferred to proteins during their synthesis raises
immediate questions as to how this is done, how much is transferred, and how
this quantity is affected by changing experimental conditions. These questions
are attacked from different analytical and experimental points of view by the
papers by Augenstine, by Mahler, Walter, Bulbenko and Allmann, and by
Koch and by Glinos.
The information theoretic properties of communication systems of particular
concern to the papers in this part are the coding problem, the representation
theorem, and redundance. Each paper deals with issues of its own but in terms
of these ideas to a greater or lesser degree. It is in this way, among others, that
information theory may grow to be as useful to the biologist as thermodynamics
is to the chemist, whether his subject is clearly one in communication as is that
of Frishkopf and Rosenblith or somewhat less clearly that of protein specificity.
H. P. Y.
THE CRYPTOGRAPHIC APPROACH TO THE
PROBLEM OF PROTEIN SYNTHESIS
George Gamow and Martynas Ycas
University of Colorado and University of New York
Abstract — The Watson and Crick suggestion concerning the role of DNA in replication,
mutation, and protein synthesis requires a coding between the four-letter DNA alphabet and
the twenty-letter protein alphabet. An attempt has been made to discover this code by crypto-
graphic methods. Various schemes have been worked out but no success obtained at this
writing. There is hope that as the number of protein sequences increases this problem will
be solved.
Speaking about information storage and transfer in a living cell, one always likes
to compare the cell with a large factory. The cell nucleus is the manager's office,
directing the work of the factory, and the chromosomes are the file cabinets in
which all blue prints and production plans are stored. The cytoplasm is the
plant itself with the workers and machinery carrying out the actual production ;
those are, of course, the enzymes catalyzing various biochemical reactions. If
something goes wrong with the information stored in the chromosome, the
corresponding enzyme will also do a wrong thing. Consider, for example, an
enzyme which produces the pigment necessary for color vision. If the particular
section of chromosome carrying the directions for producing that pigment is
defective, the enzyme will not get the correct instructions, and will not produce
the right type of pigment. As a result, the individual will be color blind.
The materials of chromosomes and of enzymes are chemically different,
except that in both cases we deal with long molecular chains formed by the
repetition of a comparatively small number of different units. DNA
(deoxyribonucleic acid), forming the chromosomes, is a sequence of four different units
or 'bases': namely, adenine, thymine, guanine, and cytosine. For sake of
picturesque presentation, we may associate them with four suits of cards:
spades, clubs, diamonds and hearts. Each DNA molecule is equivalent to a
sequence of cards many thousand units long, and the way in which different
suits follow each other contains, in code form, the instructions to the original
cell (fertilized ovum) and its descendants to develop into a rosebush, a skunk,
or a man.
The first question is this. How is information which is carried by DNA
molecules of the chromosomes duplicated when the cell goes through the process
of division? An answer can be given on the basis of the model of DNA proposed
about three years ago by J. Watson and F. Crick (1). They started with the
fact, first noticed by E. Chargaff (2), that the number of adenines in any
given DNA molecule is always equal to the number of thymines, while the
number of guanines is always equal to the number of cytosines (3). In the
playing card analogy there are as many spades as there are clubs, and as many
diamonds as hearts. This suggests that we deal here with a double-stranded
sequence in which red and black cards are paired together. A heart is
always paired with a diamond (and vice versa), while a spade is always paired
with a club (and vice versa). The fact that DNA molecules also contain one
sugar (deoxyribose) and one phosphate for each 'base' suggests a molecular model
similar to a rope ladder. The vertical ropes on both sides are formed by
'sugar-phosphate-sugar-phosphate-' sequences, while the paired bases form rigid
horizontal steps attached to sugars on both sides. The reason why the above-
mentioned pairing of bases takes place is two-fold. Cytosine and thymine
(hearts and clubs) are 'pyrimidines', being formed by a single C-N ring with
different atomic groups attached to them. Adenine and guanine (spades and
diamonds) are 'purines', and contain in their structure two connected rings, one
with six atoms, and the other with five.
The chain shown in Fig. 1 is a sequence of sugars and phosphates. To
each sugar is attached a 'base', and in this section of the molecule you see four
different bases. Two of them (hearts and clubs) are short, and two others
(spades and diamonds) are long. Now, in order to run the second strand beside
it in the parallel way, we should attach short bases to long ones, and long
bases to short ones. Of course, in the playing card analogy again, one could
also join a heart to a spade and a club to a diamond. But this is excluded because
in these cases hydrogen atoms will be in the wrong places to form proper
hydrogen bonds between these two bases.
The evidence supplied by an x-ray diffraction pattern indicates in addition
that the DNA molecule has a helical shape, being twisted around its central
axis by 36° each step. Thus, it makes a complete turn each 10 steps.
The Watson and Crick (4) theory of duplication of DNA molecules proceeds
as follows. When the cell is ready to divide, there appears a large number of
free nucleotides in the nucleoplasm surrounding the chromosomes. A nucleotide
is defined as one of the four bases with a sugar and a phosphate attached to it.
At that time the double stranded DNA molecule splits into two single strands
along its main axis, and each strand is regenerated by catching the corresponding
free nucleotides from the surrounding medium. Thus, each heart separated by
splitting from its diamond gets another diamond from the solution, and each
diamond gets another heart. As a result, we get two new double stranded
DNA molecules, each identical with the original one. Once in a while a mistake
may be made in this duplication process, and we call it a mutation. So much
for the structure and functioning of DNA molecules.
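The pairing and duplication rules just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base sequence is hypothetical, the helper names `partner_strand` and `replicate` are not from the text, and strand direction is ignored.

```python
from collections import Counter

# Pairing rule from the text: adenine with thymine, guanine with cytosine.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand: str) -> str:
    """Return the strand that pairs base-by-base with `strand`."""
    return "".join(PAIR[base] for base in strand)


def replicate(duplex):
    """Split a duplex into single strands and rebuild a partner for each."""
    first, second = duplex
    return [(first, partner_strand(first)), (partner_strand(second), second)]


strand = "ATGCGGATTACA"  # hypothetical sequence, for illustration only
duplex = (strand, partner_strand(strand))
daughters = replicate(duplex)

# Chargaff's observation follows from the pairing rule.
counts = Counter(duplex[0] + duplex[1])
print(counts["A"] == counts["T"], counts["G"] == counts["C"])

# Each daughter duplex is identical with the original one.
print(all(daughter == duplex for daughter in daughters))
```

A mutation, in this picture, would be a single wrong entry picked up while a partner strand is being rebuilt.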
Now we come to the problem of information transfer from the chromosomes
to the enzymes. How does the sequence of bases (card suits) in DNA determine
the structure of the enzyme? Enzymes are proteins, and are formed by long
sequences of twenty different chemical groups known as amino acids. It is well
known that as many as twenty-four or twenty-five amino acids occur in proteins, but,
as Dr Ycas tells us in more detail in the next paper, one can show that the
extra ones are modifications of the original
twenty which take place after the protein molecule is synthesized. Thus, for
example, 'proline' is an original amino acid used in protein synthesis, whereas
'hydroxyproline' is its postsynthetic modification. Since we symbolized four
bases of nucleic acid molecule by four playing card suits, it is reasonable to
symbolize the twenty basic amino acids, which have complicated chemical
names, by twenty letters of a (reduced) English alphabet. Thus, one protein
molecule may look like:
. . .arreducesugarreducesug. . .
and another like:
. . . akeacoloruisionpigmentma . . .
To show how the sequence of amino acids in a protein
molecule may affect its biochemical activity, we take the example of two
closely related hormones: oxytocine and vasopressin. Both are formed by a
sequence of only nine amino acids:
Oxytocine:   Cys-Tyr-Ileu-Glun-Aspn-Cys-Pro-Leu-Gly
Vasopressin: Cys-Tyr-Phe-Glun-Aspn-Cys-Pro-Arg-Gly
The two sequences are identical except for the substitutions in the third and
eighth place. However, their functions are rather different. Oxytocine has the
property of causing the contraction of the uterus in the process of childbirth.
If you inject it into the blood of a cow, even if the cow is not pregnant it will
go through all motions it would go through if a calf were to be born. Vaso-
pressin, on the other hand, has rather different properties: it contracts the
blood vessels and causes increased blood pressure. Thus, simply by changing
two amino acids out of nine, the action of the hormone is completely changed.
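The comparison of the two hormones can be made mechanical. The sketch below takes the third and eighth residues to be Ileu and Leu in oxytocine and Phe and Arg in vasopressin; the function name `differing_positions` is an illustrative choice.

```python
OXYTOCINE = "Cys-Tyr-Ileu-Glun-Aspn-Cys-Pro-Leu-Gly".split("-")
VASOPRESSIN = "Cys-Tyr-Phe-Glun-Aspn-Cys-Pro-Arg-Gly".split("-")


def differing_positions(a, b):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


print(differing_positions(OXYTOCINE, VASOPRESSIN))  # [3, 8]
```

The same function could be applied to any pair of homologous sequences of equal length, such as insulin from two species.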
Whereas replacement of some amino acids in a protein may completely
change its biological function, there also exist replacements which distinguish
the same protein taken from different species of animals. Thus, for example,
insulin A, which is formed by a sequence of amino acids with twenty-one
members, differs for cattle and swine in the eighth and tenth place. Human
insulin, which has not yet been analyzed, possibly differs slightly from that
extracted from cattle and swine. Nevertheless, the latter are successfully used
on human patients.
Since there must exist a definite relation between the sequence of bases in
nucleic acid and the sequence of amino acids in proteins, we can ask ourselves
what this relation is. Here we have to return to our analogy of a factory. The
workers from the factory do not walk into the manager's office to find out what
to do, and the manager also does not go to the plant to instruct workers personally.
There are people, called foremen, who get the information from the
manager's office and tell the workers. In the cell the role of foreman is carried
out by RNA molecules (ribonucleic acid) which are, presumably, very similar to
the molecules of DNA. They are different only in that one oxygen atom is
missing in each sugar of DNA, and there is a slight change in one of the four
bases, which in RNA is called uracil instead of thymine. RNA is presumably
synthesized by DNA inside the nucleus and receives the set of instructions
carried by DNA. Then it passes out into the cytoplasm, and is incorporated into
the so-called microsomes, i.e. foremen's offices, where the synthesis of proteins
takes place.
We do not yet have a model of the RNA molecule. It seems, however,
that in this case the pairing rules of adenine to thymine (uracil), and guanine
to cytosine do not hold, which suggests that RNA molecules are single-stranded.
Since RNA serves as an intermediary between DNA and proteins, we have here
two problems. First, how is RNA formed by DNA ? Second, how are proteins
synthesized by RNA? The first problem may turn out not to be very difficult
because of the close similarity between the two molecules. For example,
RNA may be a non-regenerated half of DNA with small changes in sugars
and in one of the bases. It may be that the absence of the oxygen atom in RNA's
sugar is responsible for the failure to form a double-stranded configuration.
However, we still do not know the answer to this question.
The second problem concerning the synthesis of proteins by RNA mole-
cules presents more challenge to the imagination. How can a sequence formed
by four different units (four bases) be translated in a unique way into a sequence
formed by twenty units (twenty amino acids)? Here is a possibility which
seems to us to be very likely. Suppose one plays a game of poker in which
only three cards are dealt, and pays attention only to the suit of the card. How
many different hands will one have? Well, one can have a 'flush', i.e. three
cards of the same suit. There are four different flushes: three hearts, three
spades, etc. Then one can have a 'pair', i.e. two cards of the same kind, and
one different. How many of those are there? One has four choices for the
suit of the pair, and three choices for the third card. Thus, there are altogether
twelve possibilities. The poorest hand will be a 'bust', i.e. three different suits.
There are four different busts: no hearts, no diamonds, etc. We have altogether
twenty different possibilities. This 'magic number' 20 is just the number of
amino acids participating in the primary process of protein synthesis. We
may imagine that each amino acid in the synthesized protein is determined
by a triplet of bases in the RNA template.
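The count of three-card hands can be checked by enumeration. The sketch below treats a hand as an unordered selection of three suits with repetition allowed, which is the reading of the poker analogy assumed here; the name `hand_type` is illustrative.

```python
from collections import Counter
from itertools import combinations_with_replacement

SUITS = ["spades", "clubs", "diamonds", "hearts"]


def hand_type(hand):
    """Classify a three-card hand by the number of distinct suits in it."""
    return {1: "flush", 2: "pair", 3: "bust"}[len(set(hand))]


# Every unordered hand of three cards, looking at suits only.
hands = list(combinations_with_replacement(SUITS, 3))
tally = Counter(hand_type(hand) for hand in hands)

print(len(hands))  # 20 hands in all
print(tally["flush"], tally["pair"], tally["bust"])  # 4 12 4
```

If the order of the cards were taken into account, there would be 4 x 4 x 4 = 64 hands instead of 20.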
Since the distances between neighboring amino acids in the extended
polypeptide chain are equal to the distances of neighboring bases in the polynucleotide
chain (both being equal to 3.7 Å), it was at first natural to suppose
that the correlation between the two chains looks as shown in Fig. 2,
where individual bases are shown by circles and the amino acids by triangles.
This represents the so-called overlapping code in which the neighboring amino
acids have in common two bases in the RNA template. If the transfer of
information from nucleic acid to protein is carried out according to such an
overlapping code, there must exist a definite inter-symbol correlation between
the amino acids constituting protein molecules. Thus, for example, if a certain
amino acid is determined by two adenines and some other base, its neighbors
will be preferably amino acids which also contain adenine in their template
transcript. In order to see whether or not such a correlation between the
neighbors really exists in the known protein sequences, it is necessary to test
all possible assignments between the twenty amino acids and the twenty possible
base triplets. The number of all possible assignments of that type is 20!, or
about 2.4 × 10^18. Since the age of our universe (5 billion years) expressed
in seconds is only about 1.6 × 10^17, a straightforward test of that kind would
require quite a considerable time even if we could test one assignment each second! However, as it
often happens in cryptographic problems, one can sometimes find parts of the
message which reduce quite considerably the amount of necessary work. Thus
the code messages sent by German spies during the war were likely to contain
the combinations of letters corresponding to various possible ports of embark-
ation of American expeditionary troops. The same happens in protein sequence.
For example, the adrenocorticotropin molecule contains the sequence:
— Lys — Lys — Arg — Arg— Pro — Val — Lys — Val —
In this sequence there are two identical amino acids in succession followed
by another pair of identical ones. In the English language there are not many
words having such a property. (Tennessee is one of the rare examples!) Then
lys repeats again three steps later, and has identical neighbors (val) on both
sides. These facts simplify the problem to such an extent that, instead of
spending five billion years, it was possible to find a single assignment between
the amino acids in the above sequence, and the base triplets in the course of an
afternoon. At first it was thought the problem had been solved, but, when one
tried to extend these assignments to the other parts of the ACTH molecule
and to the other known protein sequences, one was led to direct contra-
dictions. In the course of subsequent decoding work, other examples leading
to similar contradictions were found, and it became clear that the thing just
will not work. In fact, as Dr Ycas discusses in the following article, it seems
that there is no correlation between the neighboring amino acids whatsoever.
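Two points of the argument above lend themselves to a quick numerical check: the size of the brute-force search, and the inter-symbol correlation that an overlapping code would impose. The sketch below assumes that an amino acid is fixed by the composition of its triplet (order ignored), as in the poker analogy, and that neighboring amino acids share two bases. The year length of 365.25 days and the name `triplet_class` are assumptions made for the illustration.

```python
import math
from itertools import product

BASES = "ATGC"

# Number of one-to-one assignments of 20 amino acids to 20 triplet classes.
assignments = math.factorial(20)
seconds_in_5e9_years = 5e9 * 365.25 * 24 * 3600
print(f"{assignments:.2e} assignments, {seconds_in_5e9_years:.2e} seconds")


def triplet_class(triplet):
    """Identify a triplet by its composition only (order ignored)."""
    return "".join(sorted(triplet))


# In the overlapping code, adjacent amino acids are read from bases 1-3
# and bases 2-4 of the same four-base window.
neighbour_pairs = {
    (triplet_class(window[:3]), triplet_class(window[1:]))
    for window in product(BASES, repeat=4)
}
classes = {cls for pair in neighbour_pairs for cls in pair}

# All 20 classes occur, but only some of the 400 ordered neighbor pairs do.
print(len(classes), len(neighbour_pairs), 20 * 20)
```

Any neighbor pair observed in a real protein but absent from `neighbour_pairs` under a given assignment contradicts that assignment, which is how a short sequence can rule out so many candidates at once.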
This negative result can only mean that the original hypothesis represented
in Fig. 2 was incorrect, and that in the process of protein synthesis the nucleic
acid molecule is not present in its extended form. If, as seems to be true, we
deal here with a "non-overlapping code" in which each amino acid is determined
by an individual base triplet of its own (Fig. 3), we are forced to assume that
the RNA molecule is shrunk by a factor of three. We can imagine, for example,
that during the process of protein synthesis the RNA molecule has the shape
of a spiral as shown in Fig. 4.
[Fig. 3. RNA-Template.]
Closely connected with the problem of a non-overlapping code is the problem
of "punctuation". Indeed, a sequence of bases can be broken into a set of
non-overlapping triplets in three different ways depending upon the base with
which we start. The three different readings of the same template can be described
mathematically as 3n, 3n + 1, and 3n + 2 (3n + 3 being the same as 3n).
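The three readings can be written out explicitly. In this sketch the template string is hypothetical and the function name `reading_frames` is illustrative; bases left over at the end of a reading are simply dropped.

```python
def reading_frames(template):
    """Break a base sequence into non-overlapping triplets, starting at base 0, 1 or 2."""
    frames = {}
    for offset in range(3):
        usable = template[offset:]
        # Step through the sequence three bases at a time, keeping full triplets only.
        frames[offset] = [usable[i:i + 3] for i in range(0, len(usable) - 2, 3)]
    return frames


template = "AUGGCUUACGGA"  # hypothetical RNA template, for illustration only
for offset, triplets in reading_frames(template).items():
    print(offset, triplets)
```

An offset of 3 reproduces the offset-0 reading with its first triplet removed, which is why only three distinct readings exist.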
[Fig. 4. Spiral form of the RNA template; the figure marks a spacing of 3.6 Å.]
As was suggested by Dr Barbara Law, three possible readings of the same
RNA template may explain an interesting regularity first noticed by Dr
Martynas Ycas. He observed about two years ago that, in the case of seven
proteins for which the sequences of amino acids were known, the total number
of amino acids in the protein molecule was a multiple of three: nine amino
acids in oxytocine and vasopressin, twenty-one in insulin A, thirty in insulin B,
thirty-nine in ACTH, 126 in ribonuclease, etc. This could be explained if one
assumes that each RNA template synthesizes the proteins in all three possible
ways, and that these three different readings are afterwards united in one
linear sequence. If this were true, there must exist a cryptographic correlation
between the first, second, and third "thirds" of each protein molecule. One
thinks of how such a correlation could be checked, but it seems to be very
difficult indeed. Recently, though, the existence of such a correlation became
rather doubtful, since two protein sequences published recently contain 29 and
124 amino acids.
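The regularity and its later breakdown amount to a divisibility test on the chain lengths quoted above. The sketch uses only the lengths named in the text; the two more recent sequences are not identified there, so they appear as bare numbers.

```python
# Chain lengths quoted in the text for the earlier sequences.
earlier = {
    "oxytocine": 9,
    "vasopressin": 9,
    "insulin A": 21,
    "insulin B": 30,
    "ACTH": 39,
    "ribonuclease": 126,
}
# Lengths of the two recently published sequences (proteins not named in the text).
recent = [29, 124]

for name, length in earlier.items():
    print(name, length, length % 3 == 0)

print([length % 3 for length in recent])  # remainders on division by three
```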
In summing up, we should say that the problem of finding the nature of the
correlation between polynucleotide chains of nucleic acids, and the polypeptide
chains of the proteins is still unsolved, although various methods for establishing
such a correlation have been worked out. We may hope, however, that with
the increased number of known protein sequences, this problem will be solved
in one way or another.
REFERENCES
1. J. D. Watson and F. H. C. Crick: Molecular structure of nucleic acids. Nature, Lond.
171, 737-738 (1953).
2. E. Chargaff: for reference see S. Zamenhof, G. Brawerman, and E. Chargaff: On the
deoxypentose nucleic acids from several micro-organisms. Biochim. Biophys. Acta. 9,
402-405 (1952).
3. G. R. Wyatt: Nucleic acids of some insect viruses. J. Gen. Physiol. 36, 201-205 (1952).
4. J. D. Watson and F. H. C. Crick: Genetical implications of the structure of deoxyri-
bose nucleic acid. Nature, Lond. 171, 964-967 (1953).
THE PROTEIN TEXT
Martynas Ycas
Department of Microbiology, State University of New York
Upstate Medical Center, Syracuse, New York
And strange to tell, among that Earthen Lot
Some could articulate, while others not:
And suddenly one more impatient cried —
'Who is the Potter, pray, and who the Pot ?'
The Book of Pots
Abstract — The sequence of residues in proteins, regarded as a text written in a twenty symbol
alphabet, is examined. The following tentative conclusions are drawn:
1. Twenty amino acids are distinguished by the protein-forming mechanism. Super-
numerary amino acids arise from the regular twenty by secondary modification of protein-
bound residues.
2. Each residue in the protein has a separate genetic representation.
3. There is no intersymbol correlation between adjacent residues.
4. Natural selection is not the only factor determining the frequency of occurrence of the
various kinds of residues. It is suggested that the method of encoding protein sequence
information in nucleic acid imposes differences in frequency of occurrence on the different
kinds of residues.
5. Peptide chains are not multiples of some fixed number of residues.
The encoding and transfer of genetic (DNA) information to RNA and protein is discussed,
as well as the problem of the independent reproduction of RNA viruses. While the data set
certain limits on the possible ways of encoding and transferring information, they are not
sufficient for a unique solution of these problems.
Ribonucleic acid of Tobacco Mosaic Virus (TMV) has been shown to deter-
mine the sequence of amino acid residues in the protein of the virus (1, 2, 3).
It seems logical therefore to believe that the sequence of other proteins is also
determined by RNA.*
Since RNA is essentially a linear sequence of four kinds of nucleotides,
while proteins are linear sequences of about twenty kinds of amino acid residues,
the RNA molecule can be regarded as a text, written in a four-symbol alphabet,
which encodes another text, the protein, written with about twenty symbols.
* The following abbreviations will be employed. RNA — ribonucleic acid; DNA — deoxy-
ribonucleic acid; Ad — adenylic acid; Gu — guanylic acid; Cy — cytidylic acid; Ur — uridylic
acid; ala — alanine; arg — arginine; asp — aspartic acid; aspn — asparagine; asx — aspartic acid
or asparagine; cys — cysteine; glu — glutamic acid; glun — glutamine; glx — glutamic acid or
glutamine; gly — glycine; his — histidine; ileu — isoleucine; leu — leucine; lys — lysine; met —
methionine; phe — phenylalanine; pro — proline; ser — serine; thr — threonine; try — trypto-
phan; tyr — tyrosine; val — valine; Hlys — hydroxylysine ; Hpro — hydroxyproline; serP —
phosphoserine. Peptides are written with the amino group to the left, the symbols being
connected by a dash ( — ). The sign (*) signifies a terminal residue. Sequences considered
uncertain are in parentheses ( ). Symbols in parentheses, with commas between (ala, gly)
mean that the sequence is not known.
Several attempts, none completely convincing, have been made to determine
the coding system employed (4, 5, 6, 7). Cryptography must be based on a
study of texts, and I shall therefore attempt an examination of protein molecules
from this point of view. The following aspects of protein structure will be
examined :
1. The number of kinds of amino acids which occur in proteins.
2. The effect of mutations on amino acid sequence.
3. Whether intersymbol correlations exist between adjacent residues.
4. The frequency of occurrence of the various amino acid residues.
5. Whether any restrictions exist on the length of peptide chains.
After considering the empirical evidence, I shall indicate its bearing on the
problem of encoding protein sequence information into the RNA molecule.
I. THE NUMBER OF AMINO ACIDS OCCURRING IN PROTEINS
In previous studies (6, 7) it has been assumed that proteins are composed of
exactly twenty different kinds of residues. Since in fact more than twenty
kinds of residues occur in proteins, the assumption requires some justification.
All organisms, from viruses to mammals, use the same building blocks
for their proteins. With minor qualifications this is also true of the nucleic
acids, but not true of the third major class of biologically-occurring high
polymers, the polysaccharides. The amino acids which invariably occur in
all organisms and virtually all proteins are the following: ala, arg, asp, aspn,
cys, glu, glun, gly, his, ileu, leu, lys, met, phe, pro, ser, thr, try, tyr, val. The
number in this list is exactly twenty.
It will be noted that I omit cystine from this list. Because of its structure,
cystine corresponds to two residues. The structure of insulin (8) shows that
one cystinyl residue can occupy non-adjacent positions in a peptide chain
or even participate in two different chains. Cystine is best regarded as an
oxidation product of cysteine, formed after incorporation of the cysteinyl
residue into the peptide chain. This view is supported by the recent discovery
of an enzyme which reversibly catalyzes the reaction
2 cysteinyl ⇌ cystinyl
when these residues are protein bound (9). Another example of such a reaction
may be the cyclic oxidation and reduction of protein SH groups during the
various stages of cell division (10).
In addition to the above twenty, other alpha amino acids occur in nature.
Some of these, such as homocysteine, citrulline and ornithine are well known
biochemical intermediates but do not occur in proteins. It is clear that the
number of amino acids which occur in proteins is limited by an inability to
incorporate, rather than make, amino acids. Hydroxyglutamic acid and
norleucine, previously believed to be protein constituents, have been shown
not to exist as natural products (11). Alpha amino-adipic acid has been isolated
from an impure protein hydrolyzate, but it has not been demonstrated that
it is a protein constituent in the same way as other amino acids (12). Diaminopimelic
acid, commonly occurring in bacteria, appears to be associated with
the polysaccharide material of the cell wall (13, 14).
Nevertheless, there are amino acids, other than the twenty enumerated,
which certainly occur in proteins. These include hydroxylysine and hydroxyproline
(in collagen), phosphoserine (in a number of different proteins (15)),
thyroxine (in thyroglobulin) and tyrosine-O-sulfate (in fibrinogen) (16).
The distribution of these amino acids is different from the regular twenty.
Whereas the twenty amino acids occur in virtually all proteins, the super-
numerary ones have an erratic distribution, being confined to one or to a few.
The suggestion was first made by Crick, that the supernumerary amino acids
are the result of modifications of some of the regularly occurring amino acids
after these have been incorporated into a peptide chain. The biochemical
evidence for this is as follows.
When one of the twenty regularly occurring amino acids is presented labeled
to an organism, it is rapidly incorporated into protein and most of the label
is found in the corresponding residue. It should be noted that glutamine and
glutamic acid are separately incorporated and do not arise one from another
by addition or subtraction of amide groups after incorporation (17). (A
similar demonstration for the analogous case of asparagine and aspartic acid
is still lacking.) Clearly, therefore, these amino acids are the precursors of
the corresponding protein-bound residues.
The supernumerary amino acids behave differently. Thus lysine is the
precursor of hydroxylysine (18), but C14- or tritium-labeled hydroxylysine
is not incorporated into collagen (19). Similarly, proline is the precursor of
hydroxyproline: it is a much better precursor of the hydroxyprolyl residues
of collagen than is hydroxyproline itself (20, 21). These amino acids, then,
are not incorporated as such, but presumably are formed by oxidation of
protein-bound proline and lysine. Phosphoserine likewise is formed by phos-
phorylation of protein-bound serine (22). Thyroxine is apparently formed from
the tyrosine residues of thyroglobulin (23). There is no information at present
on the metabolism of tyrosine-O-sulfate.
Since not all appropriate residues are secondarily modified, this interpretation
implies that the enzymes catalyzing such conversions show specificity
for sequence in the protein. At least one enzyme is known which shows such
specificity. Prostatic phosphatase dephosphorylates phosphoserine in the
sequence asx-serP-glx-ileu-ala, but not in glx-serP-ala (24). It is therefore
suggestive of some enzyme specificity that hydroxyproline in collagen occurs
mainly, if not exclusively, before glycine (25) (Table IV). Other amino acids,
as shown later, show no such neighbor preferences. The region determining
whether proline is to be oxidized or not probably includes more than three
residues, as indicated by the isolation from collagen of the tripeptides ala-
pro-gly; ala-Hpro-gly and ser-pro-gly; ser-Hpro-gly (Table IV).
The biochemical evidence thus appears to indicate that the protein-forming
mechanism selects exactly twenty different kinds of amino acids, and that the
supernumerary ones arise by secondary modification of protein-bound residues.
A possible cause for error in this conclusion should be noted. It is virtually
certain that amino acids are not incorporated as such, but in the form of some
sort of activated derivative. If the same amino acid were to form more than
one derivative, the number of items to be selected would of course exceed
twenty. There is no evidence for this at present, and only further advances
in biochemistry can decide whether this is the case.
II. GENETIC EFFECTS ON PROTEINS
There is an increasing body of evidence indicating that the details of protein
structure are genetically determined. A study of the effect of mutations on
proteins should therefore tell us something both about the nature of mutations
and the protein forming mechanism. Known cases of genetic effects on proteins
are listed below.
1. In man hemoglobin occurs in several electrophoretically distinguishable
forms, the presence of each being apparently controlled by alleles of a single
gene (26). Hemoglobin C differs significantly in amino acid composition
from hemoglobin A (27). Hemoglobin A and S have been degraded in a
controlled fashion with trypsin and the resulting peptides separated. The
difference between these hemoglobins is apparently confined to a short section
of the molecule (28).
2. Two electrophoretically different hemoglobins occur in sheep. Their
presence is determined by alleles of a single gene (29).
3. Two forms of lactoglobulin occur in cow's milk, and like the hemo-
globins are determined by different alleles of one gene. Crystallographic
investigations indicate unit cells of the same size, but there are very slight
differences in the diffraction pattern, which the investigators attribute, possibly,
to the substitution of a few amino acid residues by others (30).
4. Mutants of Neurospora and Escherichia coli produce abnormally heat-labile
forms of tyrosinase (31) and a pantothenic acid synthesizing enzyme (32),
respectively. It is clear that a change in the proteins has occurred, but unfor-
tunately there is no further information on its physico-chemical nature.
The genetic evidence indicates that there is no interaction between alleles
controlling the synthesis of different variants of one protein. If both alleles
are present, both types of protein are formed. A possible exception should
be noted. The N-terminal groups of wheat gliadin are reported to be phe,
of rye gliadin phe and glx, but unexpectedly the gliadin of wheat × rye hybrids
was found to have no amino or carboxyl terminal ends, indicating, possibly,
a cyclic protein (33). This case obviously needs further study*.
The evidence cited above shows that the properties of proteins are gene-determined,
but it does not indicate clearly what these properties are. More
detailed information is available on this point from a comparison of homo-
logous proteins of related species, if it is assumed, as is usually done, that
species differences are the result of gene mutations.
Available evidence on amino acid sequence of homologous proteins is
collected in Table I. Mutations (as inferred from differences between homologous
proteins) do not produce a general scrambling of protein sequence,
but a replacement of one or more residues, leaving the rest of the sequence
unchanged. Since homologous proteins can differ by a one residue replacement,
it is clear that individual residues, rather than groups of residues, are represented
in the genetic material.
* There is considerable confusion as to the N-terminal residues of wheat gliadin. Fraenkel-
Conrat (51) misquotes Deich and Soreni (33) as stating that the N-terminal residues are
phenylalanine and histidine, apparently because of a misunderstanding in Chemical Abstracts
(138). Koros, whose paper I was able to consult only in abstract (139), reports histidine as
N-terminal. Ramachandran and McConnell (140), working with wheat gliadin but failing to
specify the species, also find histidine. Deutsch (the same as Deich quoted above, the difference
in spelling being due to transliteration from the Cyrillic) reports that gliadin from Triticum
durum and Triticum vulgare has N-terminal phenylalanine (141). This is misquoted as tyrosine,
and tyrosine and glutamic acid, respectively, by Ramachandran and McConnell (140).
The original paper of Deutsch (141) was also unavailable to me.
Table I. Sequences in Homologous Proteins from Different Species

Protein                        Sequence                              Species

Insulin (34)                   . . . cys-thr-ser-ileu-cys . . .      Pig
                               . . . cys-ala-ser-val-cys . . .       Cattle
                               . . . cys-ala-gly-val-cys . . .       Sheep
                               . . . cys-thr-gly-ileu-cys . . .      Horse
                               . . . cys-thr-ser-ileu-cys . . .      Whale

Myoglobin (35)                 *val . . .                            Finback whale
                               *val . . .                            Sperm whale
                               *gly . . .                            Horse
                               *gly . . .                            Seal (Phoca vitulina)

Protamine (36)                 gly, ser, ala, val, ileu              Salmo irideus
(Composition, not sequence)    [subscript counts illegible]
                               gly, ser, ala, val, ileu              Salmo trutta
                               [subscript counts illegible]

Serum albumin (37, 38)         *asp-ala . . . leu*                   Man
                               *asp-thr . . . ala*                   Cattle

Cytochrome c (39)              . . . cys-ala-glun . . .              Horse, Cattle, Pig, Salmon
                               . . . cys-ser-glun . . .              Chicken

Vasopressin (40)               . . . pro-arg-gly-NH2*                Cattle
                               . . . pro-lys-gly-NH2*                Pig

Hemoglobin (41)                *val-leu . . .                        Horse, Pig
                               *val-gly . . .
                               *val-glun . . .
                               *val-leu . . .                        Dog
                               *val-gly . . .
                               *val-asx . . .
                               *val-leu . . .                        Cattle, Goat, Sheep
                               *met-gly . . .
                               *val-leu . . .                        Guinea pig
                               *val-ser . . .
                               *val-asx . . .
                               *val-leu . . .                        Rabbit, Snake
                               *val-gly . . .
                               *val-leu . . .                        Chicken

Gliadin (33)                   *phe . . .                            Wheat
                               *phe . . .
                               *phe . . .                            Rye
                               *glx . . .

Fibrinogen (42)                *tyr . . .                            Man
                               *ala . . .
                               *tyr . . .                            Cattle
                               *glx . . .

ACTH (43, 44, 45)              . . . pro-ala-gly-glu . . .           Sheep
                               . . . pro-gly-ala-glu . . .           Pig
                               . . . glu-ala-ser-glu . . .           Sheep
                               . . . glu-leu-ala-glu . . .           Pig

Hypertensive peptide (46, 47)  . . . val . . .                       Cattle
                               . . . ileu . . .                      Horse

Virus (48)                     . . . thr-ser-gly-pro-ala-thr*        TMV (M, YA strains)
                               . . . thr(thr,ala)pro-ala-thr*        TMV (HR strains)
It is possible that a mutation may suppress an amino acid determining site
altogether. This is indicated by the tentative finding of Akabori (quoted in
(41)), that the 'B' chain of fish insulin has the sequence . . . pro-lys*, as compared
with the sequence . . . pro-lys-ala* in cattle.
In some cases (ACTH, TMV), two adjacent replacements differentiate
one homologous protein from another. It is not probable that this is due to
two independent but adjacent mutations, but rather that a single mutational
event has affected two residue-determining sites. Such a view is made plausible
by the work of Benzer (49). He has shown that mutations in bacteriophage
involve small sections of DNA, of molecular dimensions, but that these sections
can be of different lengths. Presumably the length of the mutated section determines
the number of residues changed in the protein. It is perhaps not too
sanguine to hope that eventually it may become possible to measure crossover
values in terms of distance in residues along a protein chain, and thus obtain
an estimate of the number of bases in DNA determining a single residue
selecting site. The present difficulties of such an approach are of course obvious
(50).
It would be of interest to determine if there are any restrictions on the
replacement process. Restrictions might be expected on the following grounds.
More than one nucleotide must determine an amino acid site. If the process
of mutation were predominantly to change some, but not all nucleotides
determining a site, then obviously not all sites would be interconvertible in
one step. A study of any such restrictions would be of great value, since their
nature would depend on the coding principle and could be used to infer the
latter.
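The kind of restriction envisaged here can be illustrated with a small sketch. It assumes, purely for illustration, that a site consists of three nucleotides and that a single mutational step changes exactly one of them; neither assumption is established by the evidence above, and the base letters are only labels.

```python
from itertools import product

BASES = "ACGT"  # four-symbol alphabet; letters are illustrative labels


def one_step_neighbors(site):
    """Return every site that differs from `site` at exactly one position."""
    neighbors = set()
    for i, old in enumerate(site):
        for new in BASES:
            if new != old:
                neighbors.add(site[:i] + new + site[i + 1:])
    return neighbors


# All sites under the assumed triplet hypothesis.
sites = ["".join(p) for p in product(BASES, repeat=3)]
reachable = one_step_neighbors("AAA")

print(len(sites))      # number of possible triplet sites
print(len(reachable))  # sites reachable from AAA in a single step
```

Under these assumptions only 9 of the other 63 sites can be reached in one step, so most replacements would require more than one mutational event.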
Table II. Replacements Inferred from Table I and their
Frequency of Occurrence

Replacement       Occurrence
val  <-> ileu     3
ala  <-> thr      2
ala  <-> ser      2
ala  <-> gly      2
ala  <-> leu      2
ser  <-> gly      2
ala  <-> glx      1
val  <-> gly      1
val  <-> met      1
phe  <-> glx      1
glun <-> asx      1
arg  <-> lys      1
Known replacements in homologous proteins are collected in Table II.
In the small sample we have (nineteen replacements), half recur twice or more,
suggesting strongly that the process, as observed, is not a random one. Unfortunately,
the sample is not unbiased. Certain replacements are lethal or semilethal
(hemoglobin S, for example), and are, without doubt, selected against.
What we actually observe has therefore passed through the sieve of selection.
The direct genetic approach to this problem is tedious, because of the difficulty
of determining the phenotype (the amino acid sequence), and rapid progress
is scarcely to be expected. A much larger body of data on homologous proteins
may, however, enable us to reach a decision on whether the replacement
process is intrinsically restricted or not.
An additional point emerges from a consideration of such protein mole-
cules as consist of more than one chain (Table III). It will be noted that there
Table III. Terminal Residues of Proteins having more
than one Peptide Chain
(The exact number of chains is not indicated.)

Protein                          N-terminal   C-terminal   Reference
Cytochrome c                     his, his                  (51)
Growth hormone                   ala          phe, phe     (51)
Triosephosphate dehydrogenase    val, val     met, met     (51)
Collagen                                      gly, ala     (51)
Gliadin (wheat)                  phe                       (33)
Gliadin (rye)                    phe, glx                  (33)
β lactoglobulin                  leu, leu     ileu, ileu   (51)
Fibrinogen (man)                 tyr, ala                  (51)
Fibrinogen (cattle)              tyr, glx                  (51)
Hemoglobin (horse)               val, val                  (41)
Hemoglobin (cattle)              val, met                  (41)
is a strong tendency for the terminal residues of such proteins to be identical.
This is certainly not due to the chains being identical in all cases, since the
hemoglobins, for example, do differ in the penultimate positions (Table I).
Rather it appears to indicate that multi-chain proteins arise by reduplication
of genetic material, so that the several chains start out by being identical,
but gradually diverge in the course of evolution in the same way as homo-
logous proteins of different species. This hypothesis, as applied to the hemo-
globins and insulin, has been previously discussed (6). Determinations of the
residue sequence along different chains of one protein may therefore throw
additional light on the replacement process.
Table I shows that the process by which replacements become established
is very slow. Elucidation of the sequence of homologous proteins may therefore
make it possible to determine phylogenetic relations between large groups
such as phyla, which cannot now be certainly determined from morphological
and embryological evidence.
III. CORRELATIONS BETWEEN ADJACENT RESIDUES
Are there any forbidden combinations of adjacent residues? An examination
of the sequence of residues in proteins (Table IV) could provide an answer
to this question.
[Fig. 1: 20 × 20 grid in which a dot marks each dipeptide known to occur. Rows and columns are labeled ALA, ARG, ASP, ASPN, CYS, GLU, GLUN, GLY, HIS, ILEU, LEU, LYS, MET, PHE, PRO, SER, THR, TRY, TYR, VAL.]
Fig. 1. Dipeptide sequences now known to occur in proteins, compiled from
Table IV. The N-terminal amino acids are plotted in the rows, the C-terminal
in the columns.
There are of course 400 possible pairs of the twenty amino acids. The
known protein sequences in Table IV have been broken down in the following
way. A sequence, say, of ala-arg-gly is broken down into the dipeptides ala-arg,
arg-gly, and the appropriate cells in Fig. 1 are then filled, the N-terminal
residues being represented by the rows, the C-terminal by the columns. Using
all the data available in Table IV, Fig. 1 shows that somewhat more than half
of all possible dipeptide combinations are known to occur. The question
Table IV. List of Known Sequences in Proteins
Actin (52)
. . . his-ileu-phe*
Adrenocorticotropin (45)
*ser-tyr-ser-met-glu-his-phe-arg-try-gly-lys-pro-val-gly-lys-lys-arg-arg-pro-val-lys-val-tyr-pro-
ala-gly-glu-asp-asp-glu-ala-ser-glu-ala-phe-pro-leu-glu-phe*
Carboxypeptidase (53)
*aspn-ser; ser-thr
α Casein (54, 55)
serP-glx ; *lys-leu-val-ala-glx-asx
Chymotrypsinogen (56, 57)
leu-ser-arg-ileu-val ; aspn-ser-gly-(glun-ala)
Clupein (58, 59)
*pro-ser-arg; ser-ala-arg-arg* ; arg-arg-arg-arg;
Collagen (60, 61)
ala-Hpro-gly; ala-pro-gly; glx-arg; glx-Hpro-gly; gly-asx-gly; gly-glx; gly-pro-ala;
gly-pro-glx; gly-pro-gly; gly-pro-Hpro ; ser-Hpro-gly ; ser-pro-gly; ala-gly-ala; gly-gly;
ser-gly; thr-gly; ala-asx; asx-asx; asx-glx; asx-gly; glx-ala; glx-glx; glx-gly;
glx-gly-gly; glx-met; glx-phe; ser-asx; val-glx; ala-arg; arg-gly-gly; arg-val-gly;
ser-arg; val-arg; ala-lys; asx-arg; lys-gly; pro-ser; pro-thr; ser-ala; thr-ala;
lys-pro-gly ; leu-ala ; ala-ala-gly ;
Cytochrome c (39)
. . . val-glun-lys-cys-ala-glun-cys-his-thr-val-glu
γ Globulin (rabbit) (62)
*ala-leu-val-asx . . .
Glucagon (63)
*his-ser-glun-gly-thr-phe-thr-ser-asp-tyr-ser-lys-tyr-leu-asp-ser-arg-arg-ala-glun-asp-phe-val-
glun-try-leu-met-aspn-thr*
Hemoglobin (41)
*val-glun-leu; *val-leu; (horse). *val-gly; *met-gly; *val-ser; *val-glx; *val-asx;
(various species, see Table I.)
Hypertensive peptide (46)
*asp-arg-val-tyr-val-his-pro-phe-his-leu*
Insulin (cattle) (8)
'A' chain: *gly-ileu-val-glu-glun-cys-cys-ala-ser-val-cys-ser-leu-tyr-glun-leu-glu-aspn-tyr-cys-
aspn*
'B' chain: *phe-val-aspn-glun-his-leu-cys-gly-ser-his-leu-val-glu-ala-leu-tyr-leu-val-cys-gly-
glu-arg-gly-phe-phe-tyr-thr-pro-lys-ala*
β Lactoglobulin (64)
his-ileu*
Lysozyme (65, 66, 67, 68)
thr-asx-val-glx-ala ; ileu-glx-leu-ala-leu; asx-glx-ala; leu-thr-ala; glx-asx-ileu ;
thr-glx-ala-gly ; ser-asx-gly-met-asx; asx-ala-met-lys-cys-arg; val-thr-pro-gly-ala ;
ser-asx-arg; lys-phe-glx-gly ; arg-cys-glx-ala ; ser-phe-asx-glx ; thr-asx-arg-arg ;
thr-gly-asx-val ; ser-val-cys-ala-lys-gly ; gly-cys-asx ; leu-gly-ala-val ; asx-ileu-pro-cys ;
arg-cys-lys-gly ; ser-val-asx-cys-ala ; asx-leu-cys-asx ; arg-asx-cys-ileu; ser-arg-leu;
ser-asx-cys-arg-leu ; arg-asx; arg-gly; asx-asx; gly-leu; ileu-arg; ileu-asx; ileu-val; leu-leu;
ser-ala; ser-leu; val-ala; *lys-val-phe-gly-arg; arg-his-lys; asx-gly-ala-asx-leu* ;
glx-ser-phe-asx ; ala-lys-phe-glx; asx-tyr-arg-gly ; arg-gly-tyr-ileu-leu ;
asx-ala-tyr-gly-ser-leu-asx; leu-pro; ala-ala-met ;
Melanophore expanding hormone (69, 70)
*asp-glu-gly-pro-tyr-lys-met-glu-his-phe-arg-try-gly-ser-pro-pro-lys-asp*
Myoglobin (71)
* gly-leu
Ovalbumin (72, 15, 73, 74, 64)
val-ser-pro* ; asx-serP-glx-ileu-ala ; glx-serP-ala; ala-gly-val-asx-ala-ala ; cys-ala; cys-val;
cys-gly; cys-phe; thr-cys; ser-cys; cys-glx; glx-cys; phe-cys; asx-cys; val-cys;
Oxytocin (75)
*cys-tyr-ileu-glun-aspn-cys-pro-leu-gly-NH2*
Papain (76)
*ileu-pro-glu
Pepsin (77, 15)
*leu-gly-asx-asx-his-glx ; thr-serP-glx ;
Prolactin (78)
*thr-pro-val
Ribonuclease (79)
*lys-glu-thr-ala-ala-ala-lys-phe-glun-arg ; lys-ser-arg-aspn-leu-thr-lys-asp-arg ; lys-aspn ;
tyr-glun-ser-tyr ; tyr-lys; lys-his; asp-ala-ser-val*
Salmine (80, 81)
*pro-arg-arg; arg-pro-val-arg-arg; pro-ileu-arg; val-gly; arg-val-ser-arg ; arg-ileu-arg;
arg-ala-ser-arg ; arg-gly-gly-arg; arg-ser-ser-arg ; val-gly;
Serum albumin (37)
*asp-ala (man); *asp-thr (cattle);
Silk fibroin (Bombyx) (82, 83, 84)
gly-ala-gly-ala-gly-[ser-gly-(ala-gly)n]8-ser-gly-ala-ala-gly-tyr
n usually 2, mean value always 2.
gly-val-gly; tyr-gly; phe-gly; gly-ser-pro-tyr-pro ; tyr-pro-ser-tyr
Tobacco mosaic virus (48)
thr-ser-gly-pro-ala-thr*
Tropomyosin (52)
ala-ileu-met-thr-ser-ileu*
Trypsinogen (85)
*val-asp-asp-asp-asp-lys-ileu
Vasopressin (40)
*cys-tyr-phe-glun-aspn-cys-pro-arg-gly-NH2*
Wool (86)
ser-cys; gly-cys; thr-cys; ala-cys; leu-cys; cys-gly; cys-thr; cys-ala; cys-val; cys-leu;
cys-phe ;
remains whether any of the blank cells represent forbidden combinations,
or whether they are merely the result of accidents of sampling.
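The breakdown of a sequence into overlapping dipeptides, as described above for ala-arg-gly, can be sketched as follows. The function names are illustrative, and only two sequences (the example from the text and the hypertensive peptide from Table IV) are used, so the resulting grid is far sparser than Fig. 1.

```python
def dipeptides(sequence):
    """Split a hyphen-separated sequence into overlapping (N-terminal, C-terminal) pairs."""
    residues = sequence.split("-")
    return list(zip(residues, residues[1:]))


def fill_grid(sequences):
    """Collect the set of distinct dipeptides (filled cells) from several sequences."""
    filled = set()
    for seq in sequences:
        filled.update(dipeptides(seq))
    return filled


known = [
    "ala-arg-gly",                              # example used in the text
    "asp-arg-val-tyr-val-his-pro-phe-his-leu",  # hypertensive peptide, Table IV
]
grid = fill_grid(known)

print(dipeptides("ala-arg-gly"))
print(len(grid), "of 400 cells filled")
```

Each cell is keyed by the pair (row residue, column residue), which matches the convention of rows for the N-terminal residue and columns for the C-terminal one.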
To answer this question statistically, the frequencies of occurrence of various
combinations have been plotted in Fig. 2. There are more blank cells here than
in Fig. 1, as a portion of the data has been discarded to avoid obvious sources
of bias. Thus the sequences of silk, collagen, wool and protamine have been
omitted, since these proteins have an obviously aberrant structure. Likewise,
sequences of less than three residues have not been used, since the ease of
isolation of various dipeptides varies, making it possible that the frequencies
of some peptides have been systematically over- or underestimated.
Figure 2 can now be treated as a contingency table with 381 degrees of
[Fig. 2: 20 × 20 grid of dipeptide frequencies; the positions of the individual cell entries are not reproduced. Rows and columns are labeled ALA, ARG, ASP, ASPN, CYS, GLU, GLUN, GLY, HIS, ILEU, LEU, LYS, MET, PHE, PRO, SER, THR, TRY, TYR, VAL. Marginal totals:]

Residue    a     b     c
ALA       26    32    58
ARG       18    20    38
ASP       13    11    24
ASPN      24    21    45
CYS       19    19    38
GLU       14    13    27
GLUN      18    22    40
GLY       21    27    48
HIS        8     6    14
ILEU      10    11    21
LEU       20    23    43
LYS       20    16    36
MET        5     6    11
PHE       14    15    29
PRO       16    16    32
SER       28    20    48
THR       16    11    27
TRY        2     2     4
TYR       16    15    31
VAL       22    24    46
Total    330   330
Fig. 2. Frequencies of occurrence of dipeptide sequences in proteins, plotted as
in Fig. 1. The sequences of clupein, collagen, salmine, silk fibroin and wool have
not been used. Sequences of less than three residues, as well as those where the
acid and amide forms of glx and asx are not differentiated, were also not used. On
the basis of the study of Ohno (68), glx and asx in lysozyme are assigned to glun
and aspn, respectively. The seven-residue sequence common to ACTH and MEH
was counted only once. a: marginal totals of rows; b: marginal totals of columns;
c: marginal totals of rows and columns.
freedom, and the null hypothesis, that there is no correlation between adjacent
residues, tested. The deviation $\chi^2$ from the expected distribution in Fig. 2 is
calculated as:
$$\chi^2 = \sum_{i,j} \frac{\left[a_{ij} - \dfrac{(a_{i.} + a_{.i})(a_{j.} + a_{.j})}{4n}\right]^2}{\dfrac{(a_{i.} + a_{.i})(a_{j.} + a_{.j})}{4n}} \tag{1}$$
where n is the sum of the marginal totals (330), $a_{ij}$ the value of a cell in column i
and row j, $a_{i.}$ and $a_{.i}$ the marginal totals in column and row respectively of
the residue defining the column, $a_{j.}$ and $a_{.j}$ the analogous values for the residue
defining the row. For computational purposes (1) reduces to:
$$\chi^2 = n\left(4\sum_{i,j} \frac{a_{ij}^2}{(a_{i.} + a_{.i})(a_{j.} + a_{.j})} - 1\right) \tag{2}$$
From Fig. 2, $\chi^2 = 392$. The value of t, which is calculated from
$$t = \sqrt{2\chi^2} - \sqrt{2\nu - 1} \tag{3}$$
where $\nu$ is the number of degrees of freedom, is 0.414, which is less than 1.645, the 5 per cent confidence limit.
It may therefore be concluded that there is no evidence for any intersymbol
correlation between nearest neighbors. Inspection of sequences reveals like-
wise no obvious correlations of residues more than one removed from each
other, but to decide this question definitely will require more knowledge of
longer sequences than is now available.
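The test just described can be sketched in a few lines. The sketch assumes that the expected count for a cell is formed from the pooled row-plus-column totals of the two residues, divided by 4n, and that the normal deviate is obtained from the usual large-sample approximation for chi-square; the 2 × 2 table is a toy example, not data from Fig. 2. The quoted value of 0.414 is reproduced when 381 degrees of freedom are taken, which is an inference from the arithmetic rather than a figure confirmed elsewhere.

```python
import math


def chi_square_pooled(table):
    """Chi-square for a square table of adjacent-pair counts.

    table[i][j] counts pairs with residue i first and residue j second.
    Expected counts use pooled (row + column) totals: c_i * c_j / (4 n).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    pooled = [sum(table[i]) + sum(table[r][i] for r in range(k)) for i in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = pooled[i] * pooled[j] / (4 * n)
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def normal_deviate(chi2, dof):
    """Large-sample approximation: sqrt(2 chi2) - sqrt(2 dof - 1)."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * dof - 1)


toy = [[2, 1], [1, 2]]  # hypothetical two-residue alphabet
print(round(chi_square_pooled(toy), 3))
print(round(normal_deviate(392, 381), 3))
```

A deviate below 1.645 means the observed table is no further from expectation than chance alone would often produce.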
Gamow, Rich and Ycas (6) have previously studied this question of
intersymbol correlation. They examined a grid diagram, similar to Fig. 2
but embodying fewer data, to see whether the frequencies of entries follow
the Poisson distribution. This method is invalid, since it does not take into
account the fact that different amino acids occur with very different frequencies.
I am glad to avail myself of this opportunity to correct these authors.
IV. FREQUENCY OF OCCURRENCE OF DIFFERENT AMINO ACIDS
Amino acids occur with different frequencies in proteins. Some, like leucine,
are consistently abundant, others, like methionine, consistently rare. The
frequency of occurrence of the various amino acids in the bulk protein of a
whole organism, Escherichia coli, is shown in Fig. 3.
[Fig. 3: frequency plotted against rank (1 to 20) for E. coli protein; amino acids are marked •, triplets +.]
Fig. 3. Composition of bulk protein of Escherichia coli (87), amino acids arranged
in order of abundance. The values for glu, glun and asp, aspn arbitrarily
taken as half of glx and asx, respectively. The value of cysteine taken from
Roberts and Cowie (88). 'Triplets' refers to the frequencies of triplets of
nucleotides, calculated according to the hypothesis of Gamow and Ycas (7) from
the composition of E. coli RNA (89).
Data on the composition of twenty-three proteins are summarized in Table V.
This table shows that the composition of individual proteins is not too different
from that of bulk protein. The most abundant amino acid usually has a
frequency of about 0.10 to 0.12, the least 0.005 to 0.01.
Table V suggests the possibility that the differences in composition of
various proteins may be merely the result of chance fluctuations from a mean,
and not importantly related to biological function. This notion may not be
as far-fetched as might appear at first sight. The most important function of
proteins is catalysis, and the enzymatically active site probably involves only
a few amino acids. In addition, proteins of a given organism appear to have
an important mutually complementary relation to each other which enables
them to be retained by the cells. This is shown by experiments with injected
catalase. Homologous catalase injected into guinea pigs is absorbed by the
tissues, but heterologous catalase is rejected (108). Similarly, homologous
antibodies readily pass the fetal barriers in rabbits, heterologous pass much
less readily (109). This phenomenon is probably connected with the anti-
genicity of proteins. The antigenically active sites of proteins are probably
also small, and therefore the exact sequence and composition of the major
part of the protein may be irrelevant to function. It might be expected, then,
that the exact structure of small parts of a protein molecule would be rigidly
determined, and any mutation affecting this portion would be eliminated by
selection. Mutations affecting the 'irrelevant' portions may not affect the
viability of the organism, and the same protein in different species may therefore
diverge by a process of 'evolutionary drift.' That this process is real is strongly
suggested by the facts known about cytochrome c. This enzyme serves the
same function and has the same prosthetic group in both yeast and mammalian
tissues, but the two cytochromes have very different elution volumes from
ion exchange resin columns (110), almost certainly indicating a large difference
in amino acid composition.
If for each kind of residue there is a characteristic rate of replacement by
mutation, the proteins should approach a definite equilibrium composition,
if selection is a minor factor. More definitely, each protein will constitute
a 'random grab' from a universe of amino acids, the frequencies of the amino
acids in this universe being determined by the equilibrium condition.
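The approach to an equilibrium composition can be illustrated with a toy calculation. The three residue types and the replacement probabilities per step below are entirely hypothetical; the sketch only shows that fixed replacement rates drive any starting composition toward the same frequencies.

```python
def equilibrium(rates, start, steps=2000):
    """Iterate composition -> composition * rates until it settles.

    rates[i][j] is the assumed probability that residue type i is replaced
    by type j in one step (each row sums to 1).
    """
    k = len(rates)
    freq = list(start)
    for _ in range(steps):
        freq = [sum(freq[i] * rates[i][j] for i in range(k)) for j in range(k)]
    return freq


# Hypothetical replacement probabilities for three residue types.
RATES = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.03, 0.02, 0.95],
]

from_uniform = equilibrium(RATES, [1 / 3, 1 / 3, 1 / 3])
from_skewed = equilibrium(RATES, [0.9, 0.05, 0.05])

print([round(f, 3) for f in from_uniform])
print([round(f, 3) for f in from_skewed])
```

Both starting compositions end at the same frequencies, which is the sense in which the 'universe' of amino acids would be fixed by the replacement rates alone.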
Qualitative considerations suggest that there is something other than selection
which tends to make a given amino acid occur with a certain frequency. Certain
amino acids, alanine, leucine, isoleucine and valine have aliphatic side chains
lacking any obvious reactive functional group. The data on replacements
(Table II) indicate, apparently, that one is as good as another, as far as their
function in a protein is concerned. Yet leucine is systematically more abundant
than isoleucine. These two amino acids are so similar that it is difficult to
separate them by paper chromatography. Each of the other aliphatic amino
acids has its own characteristic frequency, likewise.
Quantitatively, if a sample of n items is drawn at random from a population
where an item of type A occurs with frequency p, the distribution of A in a
large series of samples is given by the binomial $(p + q)^n$, where $q = 1 - p$.
In particular, the variance $\sigma^2$ of the distribution of A is given by
$$\sigma^2 = npq \tag{4}$$
If the hypothesis of a 'random grab' is correct, then in a collection of proteins
the variances of amino acids should be related to the mean value of their
frequencies and to the size of the proteins, expressed as the number of residues
per molecule.
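Equation (4) can be checked directly from the binomial distribution. The sketch below computes the variance of the number of A residues in a protein of n residues from the exact probabilities and compares it with npq; the values n = 100 and p = 0.10 are illustrative choices, not measurements.

```python
from math import comb


def binomial_variance(n, p):
    """Variance of the count of type-A items in a random sample of n items."""
    q = 1.0 - p
    pmf = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    return sum((k - mean) ** 2 * w for k, w in enumerate(pmf))


n, p = 100, 0.10  # illustrative protein size and residue frequency
exact = binomial_variance(n, p)
formula = n * p * (1 - p)

print(exact, formula)
```

The two numbers agree to rounding error, so under the 'random grab' hypothesis the spread of a residue's abundance across proteins of equal size depends only on n and on pq.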
An immediate difficulty is that the sizes of the proteins listed in Table V
are not known, and these certainly differ one from another. It should be
particularly noted that the relevant size is not necessarily that obtained from
physical measurements of diffusion, osmotic pressure and sedimentation. This
is because there is ample evidence that physical molecules can be the result of
aggregation of smaller, chemically identical units. Furthermore, from the
evidence presented in Table III, the several peptide chains constituting some
proteins may not be identical, but are nevertheless quite similar. The statistically
relevant size of hemoglobin would then be somewhere between 600, the approxi-
mate number of residues in the whole molecule, and 150, the average size of
the four subunits.
Disregarding this difficulty, I have plotted the variance of each amino acid,
calculated from Table V, against pq (Fig. 4). All points (except glx) fall within
[Fig. 4: variance plotted against pq, axis ticks at 0.05 and 0.10, with the line for n = 100; labeled points include GLX, PRO, GLY, LEU, CYS and ARG.]
Fig. 4. Plot of variances of amino acids against pq, where p = mean frequency
of occurrence of amino acid, q = 1 - p. Line n = 100: calculated variance for
sample size (protein) of 100 residues. (glx) is the plot with the values from
tropomyosin and γ casein omitted.
(or very close to) one standard error of the line for n = 100. The fact that the
sizes of the proteins are not identical tends to scatter the points, making agreement
with the hypothesis somewhat more significant. The large deviation of glx
is due to its abundance in two proteins, γ casein and tropomyosin. If these are
omitted the agreement is good.
The evidence therefore permits (but of course does not prove) the hypothesis
that the composition of proteins is mainly determined not by selection, but
rather approximates to a 'random grab' from a single universe of amino acids.
There is of course no question that selection can produce proteins of very
[Table V: amino acid composition of twenty-three proteins; the numerical entries are not reproduced. Proteins listed: Transphosphorylase, Prothrombin, Actin, Tropomyosin, Hemoglobin, Myoglobin, Lysozyme, α Lactalbumin, Alcohol dehydrogenase, Salivary amylase, γ Casein, Papain, γ Globulin, Papaya lysozyme, Carboxypeptidase, Aldolase, Glyceraldehyde dehydrogenase, Ribonuclease, Nerve protein, Myeloma globulin, β Lactoglobulin, Serum albumin, Phosphorylase.]
unusual composition. This occurs mainly in cases where the mechanical pro-
perties of protein fibers are important, as in keratin, collagen and silk. These
have been omitted from Table V. The most extreme case known to me is the
silk of the Congolese moth Anaphe moloneyi, where glycine and alanine together
constitute 94 per cent of the entire protein (111).
Fox and Homeyer (112) have also noted the general similarity of composition
of various proteins, but have interpreted it in a quite novel manner. Their
suggestion is that proteins are similar because the time that has elapsed since
the origin of life has been too short to allow more differences to develop between
the various proteins, all of which are presumed to be descendants of a single
molecule. I believe the composition of silk tends to indicate that there has been
ample time for any conceivable differentiation.
V. LENGTH OF PEPTIDE CHAINS
I have previously called attention to the apparent fact that the number of
residues in naturally occurring peptide chains is an exact multiple of three (113).
Since then, a more exact determination of the composition of ribonuclease (79)
and the elucidation of the structure of glucagon (63) have shown that this
statement is incorrect (Table VI). In view of the predominance of chain lengths
Table VI. Length of Protein and Peptide Chains in Number of Residues
(Note: Cystine counted as two cysteine residues.)

Protein or peptide                         Number of residues   Reference
Oxytocin                                     9                  (75)
Vasopressin                                  9                  (40)
Melanophore expanding hormone I (hog)       18                  (69, 70)
Insulin 'A' chain                           21                  (8)
Glucagon                                    29                  (63)
Insulin 'B' chain                           30                  (8)
Melanophore expanding hormone II (hog)      30                  (114)
Melanophore expanding hormone (ox)          48                  (114)
Ribonuclease                               124                  (79)
that are multiples of three, it might perhaps be suspected that the exceptions
are due to secondary removal of residues, as occurs, for example, in the activa-
tion of pepsinogen, trypsinogen, chymotrypsinogen and fibrinogen. The tenta-
tive finding of Akabori (quoted in (41)), that the B chain of fish insulin has
twenty-nine residues, rather than the thirty found in cattle insulin, makes it
doubtful that secondary removal of residues is the explanation. Since twenty-
nine (the number of residues in glucagon) is a prime number, and not a factor
in the chain lengths of other peptides, it seems reasonable to conclude that
peptide chains are not multiples of some fixed number of residues.
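The chain lengths of Table VI can be checked against the multiple-of-three rule in a few lines. The dictionary simply transcribes the table; the variable names are illustrative.

```python
# Chain lengths in residues, transcribed from Table VI.
chain_lengths = {
    "Oxytocin": 9,
    "Vasopressin": 9,
    "Melanophore expanding hormone I (hog)": 18,
    "Insulin 'A' chain": 21,
    "Glucagon": 29,
    "Insulin 'B' chain": 30,
    "Melanophore expanding hormone II (hog)": 30,
    "Melanophore expanding hormone (ox)": 48,
    "Ribonuclease": 124,
}

# Chains whose length leaves a remainder on division by three.
exceptions = [name for name, length in chain_lengths.items() if length % 3 != 0]

for name, length in chain_lengths.items():
    print(f"{name}: {length} residues, remainder {length % 3}")
print("Exceptions:", exceptions)
```

Seven of the nine chains are multiples of three; glucagon and ribonuclease are the two exceptions discussed in the text.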
VI. THE CODING PROBLEM
Having examined the protein text, we can now discuss what conclusions we
may draw as to the storage, transfer and replication of the information contained
in the protein molecule.
The gene, and by inference DNA, is thought to contain the information
which eventually appears as a sequence of amino acid residues in the corre-
sponding protein. As shown by a study both of the replacement process and of
the amino acid sequences, each residue has an independent genetic representa-
tion. These representations are presumably aligned in linear order on the DNA
molecule. There is in fact no evidence at present that the gene is anything other
than a linear sequence of amino acid determining sites, although the possibility
that it may also determine the structure of immunopolysaccharides in an
analogous fashion cannot yet be dismissed.
Recent biochemical evidence (which I shall not discuss here) indicates that
it is RNA, not DNA, which is directly involved in the process of protein forma-
tion. Transfer of information therefore involves at least two steps : DNA to
RNA, and RNA to protein.
The straightforward inference would thus be that DNA serves as a template
for the formation of RNA. Absence of cytoplasmic inheritance supports the
view that RNA is not a self-replicating structure. This is also supported by four
lines of biochemical evidence :
1. The initial rate of incorporation of labeled precursors into nuclear RNA
is much greater than into cytoplasmic RNA (115).
2. In Amoeba depleted of RNA, RNA only regenerates if a nucleus is
present (116).
3. A one-way flow of RNA from nucleus to cytoplasm can be demonstrated
(117).
4. The rate of RNA formation is minimal at the time DNA is replicating
(118).
Unfortunately, this conclusion may be an oversimplification. There is no
lack of biochemical evidence pointing in the opposite direction:
1. The composition of nuclear and cytoplasmic RNA is not identical (119).
2. The time curves of precursor incorporation into RNA do not indicate
that the nuclear fraction is the precursor of the cytoplasmic (115).
3. Radioactive precursor is incorporated into the RNA of enucleated
Acetabularia plants (120).
4. Different strains of RNA viruses are self-replicating. This is difficult to
explain if RNA is the product of a DNA template.
The problem is to reconcile these apparently discordant facts. Consider
first the determination of RNA structure by DNA. Since both DNA and RNA
are texts written in a four symbol alphabet, it is natural to suppose that the
coding problem is very simple. It is sufficient to assume that one nucleotide of
DNA determines one nucleotide of RNA (121). Recent evidence indicates,
however, that this is incorrect.
It is possible to suppress protein synthesis in susceptible bacteria with
chloramphenicol. When this is done using amino acid-requiring strains, it can
be demonstrated that amino acids are required for RNA synthesis, even though
no protein synthesis is taking place (122, 123, 124). The natural inference,
supported by several converging lines of evidence, is that it is not the nucleotides
themselves which are the precursors of RNA, but rather compounds containing
both a nucleotide and an amino acid. This leads to a unitary picture of the
synthesis of RNA and of protein. When such precursors are lined up on a
protein-synthesizing template (RNA), the amino acids polymerize to form
protein; when lined up on DNA, the nucleotide portions polymerize to form
RNA (Fig. 5).
If this is correct, an obvious conclusion follows. Since omission of a single
amino acid stops RNA synthesis, the RNA-forming mechanism must distinguish
not four, but a minimum of twenty different kinds of items. But since the
product contains only four, the RNA in general must contain less information
than the template that made it. Several nucleotides in DNA must be involved
in selecting a single nucleotide of RNA. Since the template must contain more
information than the product, RNA cannot be the template for itself; i.e. it
cannot be self-replicating. There is an important exception to this statement.
AMINO ACIDS
NUCLEOTIDES
TEMPLATE
Fig. 5. Schematic representation of the synthesis of RNA and protein from
common precursors (see text). The nature of the template is presumed to deter-
mine whether the aligned precursors polymerize to produce protein or RNA.
If the information in the template is reduced below a certain level, it is possible
to obtain a product identical to the template itself. The formalization is as
follows.
While in process of formation, the RNA molecule can be visualized as a
sequence of nucleotides to which amino acids are attached (Fig. 5). Before
removal of amino acids on polymerization the informational content of the
'proto-RNA,' of length n, is $n \log_2 20$. After removal of the amino acids the
information content is reduced to $n \log_2 4$. If restrictions of some kind exist
on the number of combinations allowed, the number possible for 'proto-RNA'
will be reduced to $b[n \log_2 20]$; ($b < 1$). Such restrictions on 'proto-RNA' will
result in less severe restrictions on the RNA itself, since in general one configuration
of RNA can correspond to numerous different configurations of
'proto-RNA'. Therefore, if there are $20^{bn}$ possible configurations of 'proto-RNA',
RNA itself has $4^{cn}$ possible configurations available ($1 > c > b$).
The information content of RNA will equal that of 'proto-RNA'
$$bn \log_2 20 = cn \log_2 4 \tag{5}$$
when $1 > c = 2.16b$. Since the information content of 'proto-RNA' is now
the same as that of RNA, an RNA template could, formally, be self-replicating.
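The numerical factor in the condition following equation (5) comes from the ratio of the two logarithms, as the short check below shows. The variable names are illustrative; the calculation assumes nothing beyond equation (5) itself.

```python
import math

# Bits per position for a twenty-symbol and a four-symbol alphabet.
bits_per_proto_unit = math.log2(20)
bits_per_nucleotide = math.log2(4)

# Equation (5): b * n * log2(20) = c * n * log2(4), hence c = ratio * b.
ratio = bits_per_proto_unit / bits_per_nucleotide

# c cannot exceed 1, so b is bounded above by 1 / ratio.
b_max = 1 / ratio

print(round(ratio, 3))
print(round(b_max, 3))
```

On this reckoning 'proto-RNA' would have to be restricted to less than about 46 per cent of its unrestricted information content before an RNA template could formally replicate itself.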
It is now possible to reconcile the genetic and biochemical facts outlined
above. Assume that the synthesis of RNA proceeds in two steps. At the first
step, a strand of RNA is synthesized using a DNA template. Information is
thus transferred from DNA to RNA. The next step is supposed to occur in
the cytoplasm. RNA material is added to the nuclear-synthesized RNA, but
in a manner which does not add to the informational content. A model for
this process could be the building up of a complementary strand of DNA, as
in the Watson and Crick scheme for DNA reproduction (125).*
Normally, the process stops at this stage, since the RNA molecule has
insufficient information to act as a template for itself. In the case of viruses,
however, the cytoplasmic process of adding new material to the original RNA
Table VII. The Composition of the Protein and RNA of Viruses
Composition of protein in moles per cent, of RNA as fractions of 1. † value
assumed. It should be noted that the influenza virus contains lipid, and the protein
analysed may in part be of host provenance.

           Tobacco   Tomato        Turnip    Southern      Influenza
           Mosaic    Bushy Stunt   Yellows   Bean Mosaic   A
Protein    (126)     (127)         (128)     (129)         (130)
ala         9.6       8.5           6.6       7.5           5.9
arg         6.4       5.3           1.6       6.4           6.0
asx        10.3      11.1           4.2       7.3          11.7
cys         0.7       0.8           2.3       0.9           -
glx         8.7       5.7           7.1       6.8           7.0
gly         3.9       8.6           4.2       8.9           7.0
his         0.0       1.2           1.5       1.3           1.9
ileu        5.2       3.3           9.0       6.2           8.3
leu         7.1      10.9           8.6       8.3           8.5
lys         1.1       3.4           8.0       3.0           5.2
met         0.0       0.8           2.1       2.6           3.2
phe         5.7       3.6           2.5       3.5           4.7
pro         5.5       3.9          10.2       5.3           4.7
ser        10.0       8.6           8.4       8.7           4.4
thr        11.6      11.0          13.9      11.5           6.5
try         1.1       0.5           0.6       1.0†          1.1
tyr         2.4       2.8           1.5       4.1           3.6
val        10.8      10.0           7.9       6.7           6.1
amide      12.7      11.4           8.0       -             -

RNA        (131)     (127)         (132)     (131)         (133)
Ad          0.30      0.26          0.22      0.26          0.23
Gu          0.25      0.29          0.18      0.26          0.20
Cy          0.19      0.21          0.38      0.23          0.24
Ur          0.27      0.26          0.22      0.25          0.33
results in the production of material identical to the template itself. From this
point of view, an RNA virus can be regarded as a specialized RNA molecule,
which because of restrictions on the sequence of 'proto-RNA' can act as its
own template, utilizing the normal RNA-synthesizing mechanism of its host.
The composition of the RNA of viruses lends some support to these ideas.
* It is obvious that until more is known about RNA structure the question of its replication
can be discussed only in general terms. If RNA is a double-stranded structure, the nucleotide
composition shows that bases in the two chains cannot be uniquely paired as in DNA, but each
base must pair with one of two others, as shown by the equality of 6-keto and 6-amino groups
(89). In attempting to elucidate the details of RNA reproduction, information on the number of
strands, whether each strand contains all the information of the whole structure, and where the
complementary strand is synthesized, is of crucial importance.
Normally the number of 6-keto (Gu + Ur) and 6-amino (Ad + Cy) groups in
RNA is equal (89). Virus RNA does not necessarily obey this rule, indicating
that it differs in this respect, at least, from all the others (Table VII).
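
The comparison of 6-keto and 6-amino groups can be made explicit with a few lines of arithmetic. The following is only an illustrative sketch: the base fractions are those of Table VII, while the dictionary layout and the function name keto_amino_sums are assumptions introduced here and are not part of the original analysis.

```python
# RNA base composition from Table VII, as fractions of 1.
TABLE_VII_RNA = {
    "Tobacco Mosaic": {"Ad": 0.30, "Gu": 0.25, "Cy": 0.19, "Ur": 0.27},
    "Tomato Bushy Stunt": {"Ad": 0.26, "Gu": 0.29, "Cy": 0.21, "Ur": 0.26},
    "Turnip Yellows": {"Ad": 0.22, "Gu": 0.18, "Cy": 0.38, "Ur": 0.22},
    "Southern Bean Mosaic": {"Ad": 0.26, "Gu": 0.26, "Cy": 0.23, "Ur": 0.25},
    "Influenza A": {"Ad": 0.23, "Gu": 0.20, "Cy": 0.24, "Ur": 0.33},
}


def keto_amino_sums(bases):
    """Return the (6-keto, 6-amino) totals, i.e. (Gu + Ur, Ad + Cy)."""
    return bases["Gu"] + bases["Ur"], bases["Ad"] + bases["Cy"]


for virus, bases in TABLE_VII_RNA.items():
    keto, amino = keto_amino_sums(bases)
    print(f"{virus:22s} Gu+Ur = {keto:.2f}   Ad+Cy = {amino:.2f}   "
          f"difference = {keto - amino:+.2f}")
```

On these figures the Turnip Yellows RNA departs furthest from equality (0.40 against 0.60), while the Southern Bean Mosaic RNA is nearly balanced (0.51 against 0.49).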
This hypothetical scheme is presented to show that the apparent contradic-
tions of the genetic and biochemical evidence do not make it logically necessary
to abandon a unitary view of RNA reproduction.
The coding of protein information into RNA has attracted considerable
attention, but cannot as yet be considered as solved. Study of the protein text
indicates that any solution will have to meet several requirements.
Firstly, since exactly twenty amino acids are incorporated into protein,
and pairs of the four nucleotides allow only sixteen possibilities, it is clear
that at least three nucleotides are needed to determine an amino acid.
Gamow (134) has proposed that 20 is a 'magic' number, which is the result of
the existence of twenty possible sites of three nucleotides each. Four kinds of
items, taken three at a time, give twenty different combinations, if order is
disregarded.
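
The counting behind the 'magic' number can be checked by enumeration. This is a minimal sketch; the letters A, B, C, D merely stand for the four nucleotides, and the variable names are illustrative.

```python
from itertools import combinations_with_replacement, product

BASES = "ABCD"  # stand-ins for the four nucleotides

ordered_pairs = list(product(BASES, repeat=2))
ordered_triplets = list(product(BASES, repeat=3))
# Triplets in which order is disregarded (repeated letters allowed).
unordered_triplets = list(combinations_with_replacement(BASES, 3))

print(len(ordered_pairs), len(ordered_triplets), len(unordered_triplets))
```

Pairs give 16 possibilities, too few for twenty amino acids; ordered triplets give 64; triplets taken without regard to order give 20.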
Crick, Griffith and Orgel (135) point out, however, that there is at least
one other way of deriving a 'magic' 20 number. They start by considering the
problem of what it is that delimits one amino acid-determining site from
another, the 'punctuation mark problem'. Assuming that three bases determine
a site, it is a problem why the 3n + 1, 3n + 2, 3n + 3 bases represent a site,
while 3n + 2, 3n + 3, 3n + 4 do not. They solve this problem by assuming that
only certain triplets of nucleotides correspond to an amino acid (sense sites),
while others do not (non-sense sites). The criterion separating these two types
of sites is the following. The set of sense sites are all triplets which, when
placed next to each other in any possible combination, give sense sites only
at positions 3n + 1, 3n + 2, 3n + 3, but not otherwise. For example, the triplet
AAA is a non-sense site, since when placed next to itself it gives the sequence
AAAAAA. The site is not unambiguously defined, as AAA occurs both at
the 1-3 position and at the 2-4 position. They find that there are exactly
twenty triplets (out of sixty-four) which satisfy the criterion of sense sites, as
follows:

ABA   BCA   ADC   BDD
ABB   BCB   ADD   CDA
ACA   BCC   BDA   CDB
ACB   ADA   BDB   CDC
ACC   ADB   BDC   CDD

Other ways of selecting twenty sense sites are also possible. The sense sites,
these authors suggest, may correspond to amino acid-selecting sites of RNA.
The 'punctuation mark problem' could, of course, also be solved if amino
acids were selected in a sequential manner starting from one end of the template.
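
The sense-site criterion can be tested mechanically. The sketch below takes the twenty triplets listed above and verifies that no two of them, placed side by side, produce a sense triplet in either out-of-frame position; the function name is_comma_free is an assumption made for illustration, not terminology fixed by the text.

```python
from itertools import product

# The twenty sense triplets listed by Crick, Griffith and Orgel.
SENSE = """ABA BCA ADC BDD ABB BCB ADD CDA ACA BCC
BDA CDB ACB ADA BDB CDC ACC ADB BDC CDD""".split()


def is_comma_free(codewords):
    """True if no pair of adjacent codewords yields a codeword out of frame."""
    words = set(codewords)
    for first, second in product(words, repeat=2):
        joined = first + second
        # Out-of-frame triplets occupy positions 2-4 and 3-5 of the six letters.
        if joined[1:4] in words or joined[2:5] in words:
            return False
    return True


print(len(set(SENSE)), is_comma_free(SENSE))
print(is_comma_free(["AAA"]))
```

The listed set passes the test, whereas a set containing AAA fails it, in agreement with the example given in the text.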
Secondly, besides the requirement that at least three nucleotides are required
to determine an amino acid site, the study of proteins indicates that these amino
acid determining sites are independent and share no nucleotides with their
neighbors. This conclusion follows from the absence of any intersymbol
correlations in the protein text, and also from the fact that a mutation (as
inferred from a study of homologous proteins) can result in a change at one
site only, leaving the rest of the sequence unchanged. The number of nucleotides
in the template must therefore exceed the number of residues in the correspond-
ing protein by a factor of at least three.
Absence of intersymbol correlation shows that the 'overlapping' codes
discussed by Gamow, Rich and Ycas (6) do not correspond to reality.
The third requirement is somewhat more hypothetical. From the evidence
presented above, it would appear that selection is not the sole factor determining
the frequency of occurrence of the various amino acids. This is strongly
suggested by the different frequencies of amino acids with aliphatic side chains,
and particularly by the characteristic preponderance of leucine over isoleucine.
It is therefore reasonable to believe that the coding principle itself imposes
certain differences in frequency on the various amino acids.
If only one configuration of nucleotides corresponds to each amino acid,
the coding per se cannot make some amino acids frequent and others rare.
This can be done, however, if some amino acids have more than one configura-
tion of nucleotides to which they correspond. For this reason I am inclined to
believe that the type of coding proposed by Crick, Griffith and Orgel (135)
does not correspond to reality.
Gamow and Ycas (7) have proposed a code that formally meets these three
requirements. An amino acid is presumed to be determined by three nucleotides,
taken without regard to order. In addition, the number of nucleotides in the
RNA is assumed to be three times the number of amino acid residues in the
corresponding protein. This has the following consequences:
1. There are twenty such triplets, the same as the number of amino acids.
2. Neighboring triplets share no nucleotides between them. Any sequence
of amino acids is thus permitted.
3. The frequencies of various amino acids, calculated on the assumption
that the sequence in RNA is random, are unequal. This is because the expected
frequency of any triplet is given by the product of the frequencies of the com-
ponent nucleotides and the number of configurations for the given composition.
Thus there are six triplets (all presumed to determine the same amino acid) of
the type ABC, three of AAB and one of AAA.
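
The frequency rule in consequence 3 can be written out as a short calculation. This is a sketch under the stated assumption of a random sequence; the function name triplet_frequencies and the use of equal base frequencies in the example are choices made here for illustration only.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def triplet_frequencies(base_freqs):
    """Expected frequency of each unordered triplet in a random sequence."""
    result = {}
    for combo in combinations_with_replacement(sorted(base_freqs), 3):
        counts = Counter(combo)
        # Number of orderings: 6 for type ABC, 3 for type AAB, 1 for type AAA.
        orderings = factorial(3) // prod(factorial(c) for c in counts.values())
        result["".join(combo)] = orderings * prod(base_freqs[b] for b in combo)
    return result


equal_bases = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
freqs = triplet_frequencies(equal_bases)
print(freqs["AAA"], freqs["AAB"], freqs["ABC"])
print(sum(freqs.values()))
```

Even with all four nucleotides equally frequent, a triplet of type ABC is expected six times as often as one of type AAA, so the coding alone imposes unequal amino acid frequencies.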
The pattern of frequency distribution of the various triplets, calculated in
this manner, corresponds very closely to the amino acid distribution, as shown,
for example, in Fig. 3 for the case of E. coli.
I believe that this type of coding, even if not itself the one which actually
occurs, is similar to the one that corresponds to reality. Its most striking defect
is that it provides no explanation for, and in fact contradicts, the requirement that
in RNA the number of 6-keto groups should equal the number of 6-amino
groups. H. A. Simon (136) has proposed a modification to take care of this
difficulty. If RNA is a paired structure, somewhat similar to DNA, and 6-keto
bases pair with 6-amino ones, then the following four pairs of nucleotides exist
(again disregarding order) :
Ad-Gu; Ad-Ur; Cy-Gu; Cy-Ur.
If one takes these pairs, rather than the individual nucleotides, as units,
one can maintain an hypothesis of determination by sextuplets, analogous to
determination by triplets. The frequency distribution of sextuplets, calculated
for a random RNA sequence, is very similar to that obtained for the triplet
distribution. This suggests that a whole series of codes of this type may exist,
all having similar general properties.
At present the major difficulty is not to produce a coding principle that
explains the known facts, but rather to make a choice between the many that
are possible.
The correctness of a coding principle can, in general, be ascertained from a
consistency of correspondence of the RNA and protein texts. Unfortunately,
such a direct approach is not at present possible. Except perhaps in the case of
RNA viruses, it is not possible to isolate a pure RNA corresponding to a pure
protein, and were this possible, the sequence of nucleotides could not be deter-
mined by any method currently available.
If the composition only of a series of RNA's and the corresponding proteins
is known, it is theoretically possible to check some coding schemes as follows:
If the coding scheme is correct, the various configurations of nucleotides can
be assigned to the amino acids in such a manner as to give, when summed over
the protein, the experimentally determined RNA composition, and this con-
sistently for all RNA-protein pairs. No assumption need be made that the
RNA sequence is random. Actual application of this method requires a large
number of RNA-protein pairs of accurately determined composition, obviously
differing as much as possible from each other, and the facilities of an electronic
computer.
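
The consistency check just described might be sketched as follows. Everything specific in this example is hypothetical: the three 'amino acids' aa1 to aa3, their fractions, and the triplet assignments are toy values chosen only to show the bookkeeping, and the function names are assumptions. A real test would require the full set of twenty assignments and many measured RNA-protein pairs.

```python
from collections import Counter

BASES = ("Ad", "Gu", "Cy", "Ur")


def predicted_rna_composition(protein, assignment):
    """RNA base fractions implied by a protein composition and a triplet assignment.

    protein: amino acid -> fraction of residues.
    assignment: amino acid -> tuple of three bases (order disregarded).
    """
    totals = Counter()
    for amino_acid, fraction in protein.items():
        for base in assignment[amino_acid]:
            totals[base] += fraction / 3.0
    norm = sum(totals.values())
    return {base: totals[base] / norm for base in BASES}


def mismatch(predicted, observed):
    """Largest absolute difference between predicted and observed fractions."""
    return max(abs(predicted[b] - observed[b]) for b in BASES)


# Hypothetical toy data, not taken from any table in the text.
toy_protein = {"aa1": 0.5, "aa2": 0.3, "aa3": 0.2}
toy_assignment = {
    "aa1": ("Ad", "Ad", "Gu"),
    "aa2": ("Cy", "Ur", "Ur"),
    "aa3": ("Ad", "Cy", "Gu"),
}
predicted = predicted_rna_composition(toy_protein, toy_assignment)
print(predicted)
```

An assignment would count as consistent only if the mismatch between predicted and measured composition stayed small for every RNA-protein pair examined, which is why a search over assignments calls for a computer.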
The electronic computer is much the easier of the two to provide. At
present the data are hopelessly inadequate, although analyses of the proteins
and RNA's of viruses may eventually make such an approach possible. However,
in attempting a correlation of viral RNA and protein (Table VII), it should be
remembered that some viral RNA's do not show the equality Ad + Cy =
Gu + Ur characteristic of non-viral RNA (89). This suggests that normal
RNA may be multi-stranded, while viral may not be. It is therefore not im-
possible that viral RNA may contain all the information, but not all the material
of a protein determining structure, and hence differ in composition from it.
An additional difficulty is that it is not certain that all viral RNA is concerned
in the determination of the protein which eventually appears in the virus
particle.
In lieu of anything better, I have attempted to make consistent assignments
of triplets to amino acids on the assumption that the sequence in RNA is
random. The random frequencies of triplets were calculated for liver (Fig. 5),
Tobacco Mosaic and Turnip Yellow virus. I then tried to assign each triplet
to an amino acid in such a manner that each member of the pair would have
approximately the same frequency in the three cases. No satisfactorily consistent
assignments could be obtained by this method. Assuming that the RNA's and
proteins actually correspond, failure indicates one or more of the following:
1. The coding principle used is false.
2. The RNA is not a random sequence.
3. The proteins of viruses are so small that relatively large deviations from
expected frequencies may be found. The molecular weight of TMV protein
is about 17000 (48, 137), that of Southern Bean mosaic about 26000 (129).
Several of the amino acids occur as only a few residues per molecule, so that a
difference of one or two residues from the statistically expected value produces
very large relative deviations.
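
The size of this effect can be estimated roughly. The sketch below assumes a mean residue weight of 110, a conventional round figure that is not taken from the text, and uses the molecular weight of 17000 and the tryptophan value of 1.1 moles per cent given for Tobacco Mosaic virus protein; the function names are illustrative.

```python
MEAN_RESIDUE_WEIGHT = 110.0  # assumed average weight of one amino acid residue


def expected_residues(molecular_weight, mole_percent):
    """Expected number of residues of one amino acid in a single protein molecule."""
    total_residues = molecular_weight / MEAN_RESIDUE_WEIGHT
    return total_residues * mole_percent / 100.0


def relative_change_per_residue(molecular_weight, mole_percent):
    """Relative deviation produced by gaining or losing a single residue."""
    return 1.0 / expected_residues(molecular_weight, mole_percent)


tmv_try = expected_residues(17000, 1.1)
print(f"expected tryptophan residues: {tmv_try:.1f}")
print(f"change per residue: {relative_change_per_residue(17000, 1.1):.0%}")
```

Under these assumptions fewer than two tryptophan residues are expected per molecule, so a difference of a single residue alters the apparent frequency by more than half.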
Since the frequency of occurrence of an individual amino acid is small,
even a larger protein such as hemoglobin may be too small to be a statistically
valid sample for the purpose of calculating frequencies on the basis of a random
RNA sequence. The following case is of interest. The RNA's of liver and of
reticulocytes are virtually identical in composition, and therefore the proteins
(bulk liver protein and hemoglobin) would be expected to have a very similar
composition. Actually, this is not the case (Fig. 6). Considerable differences
exist, as can be seen from the deviations of the points from the line of slope 1.

[Figure 6: plot of amino acid composition, with LIVER PROTEIN as one axis. Inset table of RNA composition:]

        RNA          RNA
        RAT LIVER    RETICULOCYTES
Ad      18.4         17.5
Gu      33.1         34.7
Cy      30.5         29.9
Ur      18.0         17.9

Fig. 6. The composition of bulk liver protein (142) and hemoglobin (93). The
RNA composition of liver is from (89), that of reticulocytes from (143). All in moles per cent.

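
The near identity of the two RNA compositions given with Fig. 6 can be checked directly. This is a small illustrative sketch; the values are those of the figure, and the function name largest_difference is an assumption.

```python
# RNA composition in moles per cent, as given with Fig. 6.
LIVER = {"Ad": 18.4, "Gu": 33.1, "Cy": 30.5, "Ur": 18.0}
RETICULOCYTES = {"Ad": 17.5, "Gu": 34.7, "Cy": 29.9, "Ur": 17.9}


def largest_difference(first, second):
    """Return the base with the largest absolute difference, and that difference."""
    base = max(first, key=lambda b: abs(first[b] - second[b]))
    return base, abs(first[base] - second[base])


base, gap = largest_difference(LIVER, RETICULOCYTES)
print(f"largest difference: {base}, {gap:.1f} moles per cent")
```

No base differs by more than about 1.6 moles per cent between the two RNA's, which is the sense in which they are virtually identical.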
It would be better to use for this purpose the bulk RNA's and proteins of
whole organisms and organs, were it not for the fact that bulk protein and RNA
from various sources is so similar that no strong check on the coding principle
is possible.
The method of assignments from the assumption of a random RNA sequence
fails, then, either strongly to confirm or to deny any proposed coding principle.
It is possible that as more information becomes available some light may
be thrown on the coding problem from a study of replacements of residues in
homologous proteins, if replacements prove to be nonrandom.
The reader will not fail to notice that the inadequacy of the data renders
most of my conclusions tentative. More information of the type considered
here will, of course, become available in the future and will not fail to clarify
matters. I have attempted to organize and analyse such data as exist, in the
hope that the value of this sort of information might become clearer, and in
order to facilitate their examination as more become available.
Obviously, data on composition and sequence are not the only possible
sources of information bearing on coding. Strong hints will eventually be
obtained from a study of RNA structure and sequence, as well as from other,
more conventional, biochemical approaches. The solution of these problems
will surely not be long delayed.
Acknowledgment: It is a pleasure to acknowledge the collaboration of
George Gamow. I have profited from discussions of various aspects of these
problems with Drs F. H. C. Crick, Beatrice S. Magdoff and Herbert A.
Simon. Dr Louis J. Cote has given valuable assistance with statistical problems.
The errors, of course, remain my own.
REFERENCES
1. H. Fraenkel-Conrat: The role of the nucleic acid in the reconstitution of active
tobacco mosaic virus. J. Amer. Chem. Soc. 78, 882-883 (1956).
2. A. Gierer and G. Schramm: Infectivity of ribonucleic acid from tobacco mosaic virus.
Nature, Lond. 177, 702-703 (1956).
3. A. Gierer and G. Schramm: Die Infektiositat der nucleinsaure aus tabakmosaikvirus.
Z. Naturf. 11b, 138-142 (1956).
4. A. L. Dounce: Duplicating mechanism for peptide chain and nucleic acid synthesis.
Enzymologia 15, 251-258 (1952).
5. D. Schwartz: Speculations on gene action and protein specificity. Proc. Nat. Acad.
Sci., Wash. 41, 300-307 (1955).
6. G. Gamow, A. Rich, and M. Ycas: The problem of information transfer from the
nucleic acids to proteins. Advances in Biological and Medical Physics 4, 23-68, Academic
Press, New York (1956).
7. G. Gamow and M. Ycas: Statistical correlation of protein and ribonucleic acid com-
position. Proc. Nat. Acad. Sci., Wash. 41, 1011-1019 (1955).
8. A. P. Ryle, F. Sanger, L. F. Smith, and R. Kitai: The disulfide bonds of insulin.
Biochem. J. 60, 541-556 (1955).
9. W. J. Nickerson and G. Falcone: Identification of protein disulfide reductase as a
cellular division enzyme in yeasts. Science 124, 722-723 (1956).
10. D. Mazia: Materials for the biophysical and biochemical study of cell division. Ad-
vances in Biological and Medical Physics 4, 69-118, Academic Press, New York (1956).
11. E. E. Howe: Properties of amino acids. In Amino Acids and Proteins, ed. by D. M.
Greenberg, pp. 13-55, Charles C. Thomas, Springfield, Ill. (1951).
12. E. Windsor: α-Amino adipic acid as a constituent of a corn protein. J. Biol. Chem. 192,
595-606 (1951).
13. E. Work and D. L. Dewey: The distribution of α, ε-diaminopimelic acid among various
micro-organisms. J. Gen. Microbiol. 9, 394-409 (1953).
14. H. Smith, R. E. Strange, and H. T. Zwartouw: α, ε-Diaminopimelic acid in the
peptide moiety of the cell wall polysaccharide of Bacillus anthracis. Nature, Lond. 178,
856-866 (1956).
15. M. Flavin: The linkage of phosphate to protein in pepsin and ovalbumin. J. Biol. Chem.
210, 771-784 (1954).
16. F. R. Bettelheim: Tyrosine-O-sulfate in a peptide from fibrinogen. J. Amer. Chem. Soc.
76, 2838-2839 (1954).
17. J. M. Barry: Use of glutamine by the mammary gland for the synthesis of casein.
Nature, Lond. 174, 315-316 (1954).
18. F. M. Sinex and D. D. Van Slyke: The source and state of the hydroxylysine of collagen.
J. Biol. Chem. 216, 245-250 (1955).
19. F. M. Sinex: personal communication.
20. M. R. Stetten: Some aspects of the metabolism of hydroxyproline, studied with the aid
of isotopic nitrogen. J. Biol. Chem. 181, 31-37 (1949).
21. G. Wolf, W. W. Heck, and J. C. Leak: The metabolism of hydroxyproline-α-C14 in the
intact rat. Radioactivity in amino acids from proteins. J. Biol. Chem. 223, 95-105 (1956).
22. G. Burnett and E. P. Kennedy: The enzymatic phosphorylation of proteins. J. Biol.
Chem. 211, 969-979 (1954).
23. J. Roche and R. Michel: Thyroid hormones and iodine metabolism. Annu. Rev.
Biochem. 23, 481-500 (1954).
24. M. Flavin: The linkage of phosphate to protein in pepsin and ovalbumin. J. Biol.
Chem. 210, 771-784 (1954).
25. A. Rich and F. H. C. Crick: The structure of collagen. Nature, Lond. 176, 915-916
(1955).
26. H. A. Itano: The hemoglobins. Annu. Rev. Biochem. 25, 331-348 (1956).
27. T. H. J. Huisman, J. H. P. Jonxis, and P. C. Van Der Schaaf: Amino acid composition
of four different kinds of human haemoglobin. Nature, Lond. 175, 902-903 (1955).
28. V. M. Ingram: A specific chemical difference between the globins of normal human and
sickle-cell anaemia haemoglobin. Nature, Lond. 178, 792-794 (1956).
29. J. V. Evans, J. W. B. King, B. L. Cohen, H. Harris, and F. L. Warren: Genetics of
haemoglobin and blood potassium differences in sheep. Nature, Lond. 178, 849-850 (1956).
30. D. W. Green, A. C. T. North, and R. Aschaffenburg: Crystallography of the
β-lactoglobulins of cow's milk. Biochim. Biophys. Acta 21, 583-585 (1956).
31. N. H. Horowitz and M. Fling: The role of the genes in the synthesis of enzymes.
In Enzymes: Units of Biological Structure and Function, ed. by O. H. Gaebler, pp. 139-145,
Academic Press, New York (1956).
32. W. K. Maas and B. D. Davis: Production of an altered pantothenate-synthesizing
enzyme by a temperature-sensitive mutant of Escherichia coli. Proc. Nat. Acad. Sci., Wash.
38, 785-797 (1952).
33. T. L. Deich and E. T. Soreni: Aminokontsevie gruppi gliadinov i ich izmenenie pod
vlianiem mezhrodovoi gibridizatsii. C. R. Acad. Sci. U.R.S.S. 98, 623-626 (1954).
34. J. I. Harris, F. Sanger, and M. A. Naughton: Species differences in insulin. Arch.
Biochem. Biophys. 65, 427-438 (1956).
35. J. C. Kendrew, R. G. Parrish, J. C. Marack, and E. S. Orlans: The species specificity
of myoglobin. Nature, Lond. 174, 946-949 (1954).
36. K. Felix: Zur chemie des zellkerns. Experientia 8, 312-317 (1952).
37. E. O. P. Thompson: The N-terminal sequence of serum albumins; observations on the
thiohydantoin method. J. Biol. Chem. 208, 565-572 (1954).
38. W. F. White, J. Shields, and K. C. Robbins: C-terminal sequence of crystalline bovine
and human serum albumins: relationship of C-terminus to antigenic determinants of
bovine serum albumin. J. Amer. Chem. Soc. 77, 1267-1269 (1955).
39. H. Tuppy and S. Paleus: Study of a peptic degradation product of cytochrome c. I.
Purification and chemical composition. Acta Chem. Scand. 9, 353-364 (1955).
40. V. Du Vigneaud, H. C. Lawler, and E. A. Popenoe: Enzymatic cleavage of glycinamide
from vasopressin and a proposed structure for this pressor-antidiuretic hormone of the
posterior pituitary. J. Amer. Chem. Soc. 75, 1880-1881 (1953).
41. H. Ozawa and K. Satake: On the species difference of N-terminal amino acid sequence
in hemoglobin I. J. Biochem. 42, 641-648 (1955).
42. L. Lorand and W. R. Middlebrook: Species specificity of fibrinogen as revealed by
end-group studies. Science 118, 515-516 (1953).
43. P. H. Bell: Purification and structure of β-corticotropin. J. Amer. Chem. Soc. 76,
5565-5567 (1954).
44. W. F. White and W. A. Landmann: Studies on adrenocorticotropin. XI. A pre-
liminary comparison of corticotropin-A with β-corticotropin. J. Amer. Chem. Soc. 77,
1711-1712 (1955).
45. C. H. Li, I. I. Geschwind, I. D. Raacke, J. I. Harris, and J. J. Dixon: Amino acid
sequence of alpha-corticotropin. Nature, Lond. 176, 687-689 (1955).
46. L. T. Skeggs, W. H. Marsh, J. R. Kahn, and N. P. Shumway: Amino acid composition
and electrophoretic properties of hypertensin I. J. Exp. Med. 102, 435-440 (1955).
47. W. S. Peart: Composition of a hypertensin peptide. Nature, Lond. 177, 132 (1956).
48. C. I. Niu and H. Fraenkel-Conrat: C-terminal amino acid sequences of four strains
of tobacco mosaic virus. Arch. Biochem. Biophys. 59, 538-540 (1955).
49. S. Benzer: Fine structure of a genetic region in bacteriophage. Proc. Nat. Acad. Sci.,
Wash. 41, 344-354 (1955).
50. G. Pontecorvo and J. H. Roper: Resolving power of genetic analysis. Nature, Lond.
178, 83-84 (1956).
51. H. Fraenkel-Conrat: The chemistry of proteins and peptides. Annu. Rev. Biochem.
25, 291-330 (1956).
52. R. H. Locker: C-terminal groups in myosin, tropomyosin and actin. Biochim. Biophys.
Acta 14, 533-542 (1954).
53. E. O. P. Thompson: The N-terminal sequence of carboxypeptidase. Biochim. Biophys.
Acta 10, 633-634 (1953).
54. T. Posternak and H. Pollaczek: De la protection contre l'hydrolyse enzymatique
exercee par les groupes phosphoryles. Etude de la degradation enzymatique d'un peptide
et d'un polyose phosphoryles. Helv. Chim. Acta 24, 921-930 (1941).
55. N. Seno, K. Murai, and K. Shimura: Studies on the N-terminal lysylpeptides of
α-casein. J. Biochem. 42, 699-704 (1955).
56. H. Neurath and W. J. Dreyer: The activation of chymotrypsinogen. Isolation and
identification of a peptide liberated during activation. J. Biol. Chem. 217, 527-539 (1955).
57. F. Turba and G. Gundlach: Aminosaure sequenz in der umgebung des reaktiven
serinrestes in chymotrypsin-molekul. Biochem. Z. 327, 186-188 (1955).
58. (This article consists of paragraphs by different authors) Erweitertes makromolekulares
kolloquium. Angew. Chem. 65, 349-352 (1953).
59. K. Felix, R. Hirohata, and K. Dirr: Uber clupein. Hoppe-Seyl. Z. 218, 269-279 (1933).
60. W. A. Schroeder, L. M. Kay, J. LeGette, L. Honnen, and F. C. Green: The con-
stitution of gelatin. Separation and estimation of peptides in partial hydrolysates. J.
Amer. Chem. Soc. 76, 3556-3564 (1954).
61. T. D. Kroner, W. Tabroff, and J. J. McGarr: Peptides isolated from a partial hydro-
lysate of steer hide collagen. II. Evidence for the prolylhydroxyproline linkage of collagen.
J. Amer. Chem. Soc. 77, 3356-3359 (1955).
62. R. R. Porter: A chemical study of rabbit antiovalbumin. Biochem. J. 46, 473-478
(1950).
63. W. W. Bromer, L. G. Sinn, A. Staub, and O. K. Behrens: The amino acid sequence of
glucagon. J. Amer. Chem. Soc. 78, 3858-3860 (1956).
64. C. I. Niu and H. Fraenkel-Conrat: Determination of C-terminal amino acids and
peptides by hydrazinolysis. J. Amer. Chem. Soc. 77, 5882-5885 (1955).
65. A. R. Thompson: Amino acid sequence in lysozyme. 2. Elution chromatography of
peptides on ion-exchange resins. Biochem. J. 61, 253-263 (1955).
66. R. Acher, U. R. Laurila, and C. Fromageot: Contribution a l'etude de la structure
du lysozyme d'oeuf de poule. Peptide aromatique obtenue par hydrolyse enzymatique.
Biochim. Biophys. Acta 19, 97-109 (1956).
67. K. Ohno: On the structure of lysozyme. III. On the carboxyl-terminal peptide. J.
Biochem. 42, 615-625 (1955).
68. K. Ohno: On the structure of lysozyme II. Characterization of aspartyl, asparaginyl,
and glutaminyl residues in lysozyme. J. Biochem. 41, 345-350 (1954).
69. I. I. Geschwind, C. H. Li, and L. Barnafi: Isolation and structure of melanocyte-
stimulating hormone from porcine pituitary glands. J. Amer. Chem. Soc. 78, 4494-4495
(1956).
70. J. I. Harris and P. Roos: Amino-acid sequence of a melanophore stimulating peptide.
Nature, Lond. 178, 90 (1956).
71. V. M. Ingram: The application of Edman's peptide degradation method to horse
myoglobin and haemoglobin. Biochim. Biophys. Acta 16, 599-600 (1955).
72. M. Flavin and C. B. Anfinsen: The isolation and characterization of cysteic acid
peptides in studies on ovalbumin synthesis. J. Biol. Chem. 211, 375-390 (1954).
73. M. Ottesen and A. Wollenberg: Stepwise degradation of the peptides liberated in the
transformation of ovalbumin to plakalbumin. C. R. Lab. Carlsberg (Chim.) 28, 463-475
(1953).
74. M. Flavin: Cysteine and phosphoserine containing peptide sequences of ovalbumin.
Nature, Lond. 173, 214 (1954).
75. V. Du Vigneaud, C. Ressler, and S. Trippett: The sequence of amino acids in oxytocin,
with a proposal for the structure of oxytocin. J. Biol. Chem. 205, 949-957 (1953).
76. E. O. P. Thompson: Crystalline papain. IV. Free amino groups and N-terminal sequence.
J. Biol. Chem. 207, 563-574 (1954).
77. M. B. Williamson and J. M. Passman: The amino acid sequence at the N terminus of
pepsin. J. Biol. Chem. 222, 151-157 (1956).
78. D. Cole and C. H. Li: N-terminal sequence of prolactin. Fed. Proc. 14, 195 (1955).
79. C. H. W. Hirs, W. H. Stein, and S. Moore: Peptides obtained by chymotryptic hy-
drolysis of performic acid-oxidized ribonuclease. A partial structural formula for the
oxidized protein. J. Biol. Chem. 221, 151-169 (1956).
80. R. Monier and M. Jutisz: Contribution a l'etude de la structure de la salmine
d'Oncorhynchus. Biochim. Biophys. Acta 14, 551-558 (1954).
81. R. Monier and M. Jutisz: Contribution a l'etude de la structure de la salmine
d'Oncorhynchus. II. Etude de quelques peptides resultant de l'hydrolyse trypsique. Biochim.
Biophys. Acta 15, 62-68 (1955).
82. F. Lucas, J. T. B. Shaw, and S. G. Smith: Amino-acid sequence in a fraction of Bombyx
silk fibroin. Nature, Lond. 178, 861 (1956).
83. L. M. Kay and W. A. Schroeder: The chromatographic separation and identification
of some peptides in partial hydrolysates of silk fibroin. J. Amer. Chem. Soc. 76, 3564-
3568 (1954).
84. E. Abderhalden and A. Bahn: Isolierung von tyrosyl-seryl-prolyl-tyrosin beim
stufenweisen abbau von seidenfibroin (Bombyx Mori). Hoppe-Seyl. Z. 219, 72-81 (1933).
85. E. W. Davie and H. Neurath: Identification of a peptide released during autocatalytic
activation of trypsinogen. J. Biol. Chem. 212, 515-529 (1955).
86. R. Consden and A. H. Gordon: A study of the peptides of cystine in partial hydro-
lysates of wool. Biochem. J. 46, 8-20 (1950).
87. A. Polson: Quantitative partition chromatography and the composition of E. coli.
Biochim. Biophys. Acta 2, 575-581 (1948).
88. R. B. Roberts, D. B. Cowie, P. H. Abelson, E. T. Bolton, and R. J. Britten: Studies
of biosynthesis in Escherichia coli. Carnegie Institution of Washington Publication
607, p. 28, Washington, D.C. (1955).
89. D. Elson and E. Chargaff: Evidence of common regularities in the composition of
pentose nucleic acids. Biochim. Biophys. Acta 17, 367-376 (1955).
90. F. Friedberg: The amino acid composition of adenosine triphosphate-creatine trans-
phosphorylase. Arch. Biochem. Biophys. 61, 263-266 (1956).
91. K. Laki, D. R. Komintz, P. Symonds, L. Lorand, and W. H. Seegers: The amino
acid composition of bovine prothrombin. Arch. Biochem. Biophys. 49, 276-282 (1954).
92. D. R. Komintz, A. Hough, P. Symonds, and K. Laki: The amino acid composition
of actin, myosin, tropomyosin and the meromyosins. Arch. Biochem. Biophys. 50,
148-159 (1954).
93. A. Rossi-Fanelli, D. Cavallini, and L. De Marko: Amino acid composition of
human crystallized myoglobin and hemoglobin. Biochim. Biophys. Acta 17, 377-381
(1955).
94. G. Jolles and C. Fromageot: La proteine lysante II de la rate du lapin. II. Composition
en acides amines. Biochim. Biophys. Acta 14, 219-227 (1954).
95. W. G. Gordon and J. Ziegler: Amino acid composition of crystalline α-lactalbumin.
Arch. Biochem. Biophys. 57, 80-86 (1955).
96. K. Lange: Aminosaurezusammensetzung kristallisierte alkohol-dehydrogenase aus
backerhefe. Hoppe-Seyl. Z. 303, 272-275 (1956).
97. J. Muus: The amino acid composition of human salivary amylase. J. Amer. Chem.
Soc. 76, 5163-5165 (1954).
98. W. G. Gordon, W. F. Semmett, and M. Bender: Amino acid composition of γ-casein.
J. Amer. Chem. Soc. 75, 1678-1679 (1953).
99. E. L. Smith, A. Stockell, and J. R. Kimmel: Crystalline papain. III. Amino acid
composition. J. Biol. Chem. 207, 551-561 (1954).
100. E. L. Smith, M. L. McFadden, A. Stockell, and V. Buettner-Janusch: Amino acid
composition of four rabbit antibodies. J. Biol. Chem. 214, 197-207 (1955).
101. E. L. Smith, J. R. Kimmel, D. M. Brown, and E. O. P. Thompson: Isolation and pro-
perties of a crystalline mercury derivative of a lysozyme from papaya latex. J. Biol.
Chem. 215, 67-89 (1955).
102. E. L. Smith and A. Stockell: Amino acid composition of crystalline carboxypeptidase.
J. Biol. Chem. 207, 501-514 (1954).
103. S. F. Velick and E. Ronzoni: The amino acid composition of aldolase and d-glyceralde-
hyde phosphate dehydrogenase. J. Biol. Chem. 173, 627-639 (1948).
104. B. A. Koechlin and H. D. Parish: The amino acid composition of a protein isolated
from lobster nerve. J. Biol. Chem. 205, 597-604 (1953).
105. E. L. Smith, D. M. Brown, M. L. McFadden, V. Buettner-Janusch, and B. U. Jager:
Physical, chemical and immunological studies on globulin from multiple myeloma.
J. Biol. Chem. 216, 601-620 (1955).
106. W. H. Stein and S. Moore: Amino acid composition of lactoglobulin and bovine
serum albumin. J. Biol. Chem. 178, 79-91 (1949).
107. J. F. Velick and L. F. Wicks: The amino acid composition of phosphorylase. J.
Biol. Chem. 190, 741-751 (1951).
108. R. N. Feinstein, M. Hampton, and G. J. Cotter: Species specificity of catalase.
Enzymologia 16, 219-225 (1953).
109. I. Batty, F. W. R. Brambell, W. A. Hemmings, and C. L. Oakley: Selection of
antitoxins by the foetal membranes of rabbits. Proc. Roy. Soc. B 142, 452-471 (1954).
110. B. Hagihara, T. Horio, M. Nozaki, I. Sekuzu, J. Yamashita, and K. Okunuki:
Comparison of properties of crystalline cytochrome c from yeast, beef heart and pig
heart. Nature, Lond. 178, 631-632 (1956).
111. F. Lucas, J. T. B. Shaw, and S. G. Smith: The chemical constitution of some silk
fibroins and its bearing on their physical properties. Shirley Institute Memoirs 28,
77-89 (1955).
112. S. W. Fox and P. G. Homeyer: A statistical evaluation of the kinship of protein mole-
cules. Amer. Nat. 89, 163-168 (1955).
113. M. Ycas: Numerology of peptide chains. Naturwissenschaften 43, 197-198 (1956).
114. B. J. Benfrey and J. L. Purvis: Purification and amino acid analysis of melanophor-
expanding hormone from hog and ox pituitary glands. Biochem. J. 62, 588-593 (1956).
115. R. M. S. Smellie: The metabolism of the nucleic acids. The Nucleic Acids, ed. by E.
Chargaff and J. N. Davidson, vol. 2, pp. 393-434, Academic Press, New York (1955).
116. J. Brachet: Action of ribonuclease and ribonucleic acid on living amoebae. Nature,
Lond. 175, 851-853 (1955).
117. L. Goldstein and W. Plaut: Direct evidence for nuclear synthesis of cytoplasmic
ribose nucleic acid. Proc. Nat. Acad. Sci., Wash. 41, 874-880 (1955).
118. K. Lark and O. Maaloe: Nucleic acid synthesis and the division cycle of Salmonella
typhimurium. Biochim. Biophys. Acta 21, 448-458 (1956).
119. B. Magasanik: Isolation and composition of the pentose nucleic acids and the cor-
responding nucleo-proteins. The Nucleic Acids, ed. by E. Chargaff and J. N. Davidson,
vol. 1, pp. 373-407, Academic Press, New York (1955).
120. J. Brachet and D. Szafarz: L'incorporation d'acide orotique radioactif dans des
fragments nuclees et anuclees d'Acetabularia mediterranea. Biochim. Biophys. Acta 12,
588-589 (1953).
121. L. S. Lockingen and A. G. DeBusk: A model for intracellular transfer of DNA (gene)
specificity. Proc. Nat. Acad. Sci., Wash. 41, 925-934 (1955).
122. A. B. Pardee and L. S. Prestidge: The dependence of nucleic acid synthesis on the
presence of amino acids in Escherichia coli. J. Bact. 71, 677-683 (1956).
123. F. Gros and F. Gros: Role des aminoacides dans la synthese des acides nucleiques
chez Escherichia coli. Biochim. Biophys. Acta 22, 200-201 (1956).
124. M. Ycas and G. Brawerman: Interrelations between nucleic acid and protein bio-
synthesis in micro-organisms. Arch. Biochem. Biophys. 68, 118-129 (1957).
125. J. D. Watson and F. H. C. Crick: Genetical implications of the structure of deoxyri-
bose nucleic acid. Nature, Lond. 171, 964-967 (1953).
126. F. L. Black and C. A. Knight: A comparison of some mutants of tobacco mosaic
virus. J. Biol. Chem. 202, 51-57 (1953).
127. D. De Fremery and C. A. Knight: A chemical comparison of three strains of tomato
bushy stunt virus. J. Biol. Chem. 214, 559-566 (1955).
128. E. Roberts and G. B. Ramasarma: Amino acids of turnip yellow virus. Proc. Soc.
Exp. Biol. Med. 80, 101-103 (1952).
129. B. S. Magdoff, R. J. Block, and D. B. Montie: Amino acid composition of southern
bean mosaic virus. Contr. Boyce Thompson Inst. 18, 371-375 (1956).
130. C. A. Knight: Amino acid composition of highly purified viral particles of influenza
A and B. J. Exp. Med. 86, 125-129 (1947).
131. R. W. Dorner and C. A. Knight: The preparation and properties of some plant
virus nucleic acids. J. Biol. Chem. 205, 959-967 (1953).
132. R. Markham and J. D. Smith: Chromatographic studies of nucleic acids. 4. The
nucleic acid of the turnip yellow mosaic virus, including a note on the nucleic acid of
the tomato bushy stunt virus. Biochem. J. 49, 401-406 (1951).
133. G. L. Ada and B. T. Perry: Specific differences in the nucleic acids from A and B
strains of influenza virus. Nature, Lond. 175, 854 (1955).
134. G. Gamow: Possible mathematical relation between deoxyribonucleic acid and protein.
Dansk. Biol. Medd. 22, No. 3 (1954).
135. F. H. C. Crick, J. S. Griffith, and L. E. Orgel: Codes without commas. Proc. Nat.
Acad. Sci., Wash. 43, 416-421 (1957).
136. H. A. Simon: personal communication.
137. I. Harris and C. A. Knight: Studies on the action of carboxypeptidase on tobacco
mosaic virus. J. Biol. Chem. 214, 215-230 (1955).
138. T. L. Deich and E. T. Szorenyi: Chem. Abstr. 49, 1882 (1955).
139. Zoltan Koros: Free amino groups of gliadin. Magyar Kem. Folyoirat 56, 131-136
(1950).
140. L. K. Ramachandran and W. B. McConnell: The terminal amino acids of wheat
gliadin. Canad. J. Chem. 33, 1463-1466 (1955).
141. T. Deutsch: N-terminal amino acids of gliadins from wheat and rye. Acta Physiol.
Acad. Sci. Hung. 6, 209-224 (1954).
142. B. S. Schweigert, B. T. Guthneck, J. M. Price, J. A. Miller, and E. C. Miller:
Amino acid composition of morphological fractions of rat livers and induced liver
tumors. Proc. Soc. Exp. Biol. Med. 72, 495-501 (1949).
143. G. Rost: Zusammensetzung der ribonucleinsaure der reticulozyten. Naturwissenschaften
43, 499 (1956).
DISCUSSION
Koch: I should like to comment on the result of some recent tracer experiments that have
been conducted in Dr Swick's laboratory at the Argonne National Laboratory (1, 2, 3).
What we have tried to do is to ask ourselves something about the total balance of the turnover
of RNA, DNA, and protein in the tissue which is most often studied by the biochemist;
namely, rat liver. The interesting thing that comes out of this is that when suitable tracer
experiments are done, you can make the definite statement that in a single cell DNA is synthesized
when it is produced and DNA stays as a cell compound until the death of the cell,
whereas on the other hand it is very easy to show that all of the RNA in the cell is turned over,
and it is turned over essentially with about the same half-life that all of the proteins are turned
over in the cell; that is, there are no special classes of proteins that are not turned over, or special
classes of RNA that are not turned over in this tissue.
The immediate conclusion from this is that, inasmuch as the amount of protein is many
times more than the amount of RNA, on a molar or other basis, there can be no one-to-one
hand-off of this kind. In other words, you cannot take the DNA and make the RNA from it
without using it over and over again in a different way than has been suggested here.
Ycas: While it may be true that there is turnover of RNA in rat liver, I believe, on the basis
of work with micro-organisms, that there is no obligatory turnover of RNA associated with
protein synthesis. The RNA, which is part of the protein forming mechanism, is a passive
template, and apparent coupling or dissociation of protein and RNA turnover is adequately
explained, I think, by the assumption that both have common precursors.
Koch: I would just like to add that in the case of micro-organisms it is fairly clear that the
protein turnover does not occur (4). It is also pretty well established that DNA and RNA
turnover do not occur in an actively growing culture. So the concept of turnover in the micro-
organism is not a relevant one. But what it does mean is that you cannot accept some of the
proposals that have been described that inherently require the obligatory breakdown of some-
thing (RNA), concomitant to the synthesis of another type of molecule (protein).
Morowitz: I would like to introduce some evidence for an alternative approach to the
problem of intersymbol influence. In some work recently published by Sidney Fox (5) analyses
are reported on the total protein of soybean, corn, wheat, and rye. These analyses indicate
that a very high proportion of the protein molecules have lysine in an N-terminal position and
arginine in the next position. This approach to statistical constraints involves an experimental
analysis of a population of proteins from a single source as contrasted to Dr Ycas' theoretical
analysis of a population of unrelated proteins.
We have attempted to determine if any constraints are to be found in E. coli protein. The
preliminary results indicate that methionine is found in N-terminal positions in a proportion
consistent with a chance distribution. Cystine and cysteine in N-terminal positions may show
a considerably greater constraint.
Ycas: I think that the method used by Fox and yourself introduces an obvious source of
bias, if what you are trying to do is look for intersymbol correlations. The abundances of
different species of protein in a cell are not equal, and more abundant proteins contribute more
end groups. You have to examine the proteins one by one, giving the same statistical weight
to each.
A similarity in end groups of proteins from related species indicates not an effect of inter-
symbol correlation, but rather descent from a common ancestor. As can be seen from the data
I summarized, proteins change only slowly in evolution.
Branson: There is one question which has been opened up by Dr Gamow's and Dr Ycas'
comments; namely, the whole problem of redundancy in protein molecules. The evidence is
fairly conclusive, I believe, that so far as the antigenic action of a protein is concerned, the
active region is approximately 15 Å on a side. If the same is true of other biological functions,
a great deal of surface area in a protein is passive. At least it is passive for a given specific
function. Thus it is reasonable to inquire how much of a protein molecule you can whittle away
and keep a given biological property.
There is a fairly convincing teleological explanation for this redundancy. In the early
history of living systems, the membranes containing the living material might have been rather
leaky. Thus to retain the small biologically-active components within the cell, they had to be
associated with a large but inactive structure which would not pass out through the large spaces.
In the evolutionary scheme, then, there remain many large units where really the functional
part is relatively small. So that when one amino acid is taken out and another put in, the sub-
stitution does not make much difference so long as it is not in the essential small functioning
unit of the protein molecule.
Ycas: I am also of the opinion that mere size of an enzyme may be quite important for the
totality of its biological functions, even if it seems to make no difference to the catalytic function
as measured in a test tube. Which part of a protein is significant and which is not is a matter
of what function we are measuring. I doubt that at present we know all the functions of
a protein from the point of view of the organism itself.
REFERENCES
1. R. W. Swick and D. T. Handa: The distribution of fixed carbon in amino acids. J.
Biol. Chem. 218, 557 (1956).
2. R. W. Swick, A. L. Koch, and D. T. Handa: The measurement of nucleic acid turnover
in rat liver. Arch. Biochem. Biophys. 63, 226-242 (1956).
3. R. W. Swick and A. L. Koch: The measurement of nucleic acid phosphorus turnover
in rat liver by the constant exposure technique. Arch. Biochem. Biophys. 67, 59-73 (1957).
4. A. L. Koch and H. R. Levy: Protein turnover in growing cultures of Escherichia coli.
J. Biol. Chem. 217, 947-951 (1955).
5. S. Fox: Evolution of protein molecules and thermal synthesis of biochemical substances.
Amer. Scient. 44, 347-359 (1956).
PROTEIN STRUCTURE AND
INFORMATION CONTENT*
L. G. AUGENSTINE
Brookhaven National Laboratory, Upton, New York
I. INTRODUCTION
In stating that a given system has an information content of a certain number
of bits, care must be taken not only to specify the context within which this
number has been derived but also to give meaning and utility to this measure.
Specifying the context is particularly important
since for most systems there are many levels at which the information content
can be derived. For example, the information content for a cell is very low, if
one is concerned only whether it is living or dead, but it is very large if one is
interested in specifying the parameters of each of its individual elementary
particles. In this article, estimates will be made of the information content of
given proteins by taking into account that they are a sequence of amino acids
which can assume only a discrete number of configurations. An attempt will
be made to study some of the factors which affect the information content and
the types of constraints which must operate in the elaboration of proteins.
Some idea of the magnitude and types of the constraints pertinent to proteins
can be obtained from parallel studies on proteins and printed English (for which
the constraints are known). Finally, the information content based upon
structure will be compared with estimates of information content obtained
within the context of protein function.
Although the fact has not always been fully appreciated, information
measures are usually more effective in selecting among alternative hypotheses
than in suggesting new ones. This particular trait arises from the fact that
information estimates, which depend only upon the probabilities associated
with a class of experimental outcomes, will often describe the degree to which a
number of variables interact but indicate little or nothing about the behavior
of the individual variables. As a result no novel synthetic procedures or
selection principles are advanced here to explain the manner in which polypep-
tide sequences and/or configurations are determined. Rather, in this paper
information theory considerations have been used to evaluate alternative
explanations of some aspects of protein construction.
II. ESTIMATION OF STRUCTURAL INFORMATION CONTENT
AND CONSTRAINTS
At the structural level the total information content ($I_t$) of a protein will be
treated as the sum of two terms; one ($I_s$) depends upon the amino acid sequence
* Research carried out at Brookhaven National Laboratory under the auspices of the U.S.
Atomic Energy Commission.
Fig. 1. Values of $I_s/(I_s)_{max}$ as a function of the number of symbols,
N, in proteins and paragraphs. Dots: values calculated from proteins; crosses:
values calculated from English paragraphs. Axes: N (0 to 800) against
$I_s/(I_s)_{max}$ (0.50 to 1.00). Labeled points include myosin, tropomyosin,
insulin (48,000), silk fibroin, edestin, ovalbumin, zein, growth hormone, pepsin,
chymotrypsinogen, β-lactoglobulin, gliadin, bovine serum albumin, lactogenic
hormone, horse myoglobin, ribonuclease, insulin (12,000) and salmine.
Fig. 2. Distribution of the normalized frequency, $n_i/(N/m)$, of letters and
amino acids in the language and protein samples. See the text
for further discussion.
and the other ($I_c$) upon the configurations of the polypeptide chain in the
native molecule. Treating sequence and configuration independently should
lead to overestimates of $I_t$, since the permissible configurations will depend
upon the sequence. However, care has been taken to reduce the interaction
of the two terms as much as possible, so that for the purposes of this paper no
significant discrepancies should occur.
Sequence: There are twenty amino acids which are most commonly incorporated
into proteins. Therefore the maximum value of $I_s$ is 4.32 bits ($\log_2 20$)
per amino acid residue.* It would occur when the twenty amino acids occur
equiprobably. Values less than the maximum would occur due to any constraints
upon the amino acid sequence. Branson (1) calculated $I_s$ of twenty-six
proteins for which the frequency of occurrence of the twenty amino acids had
been determined (disregarding possible sequential dependencies). He found
that those which formed part of a living structure of an organism had an $I_s$
which was greater than 0.70 of the maximum value. His analysis is shown by
the dots in Fig. 1. The X's show the result of a similar analysis on language
samples. The language study was based on ten paragraphs chosen from diverse
sources such as want ads, newspaper articles, textbooks, and magazines and
differs from that usually used in analysis of language in that it is based on the
paragraph rather than on large continuous samples.† In this case, letters have
been treated like amino acids and paragraphs like proteins. Except for the
single value of 0.99 the values from proteins and paragraphs agree quite
well.
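The composition-based calculation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `sequence_information` and the sample string are hypothetical, and, as in the analysis cited, sequential dependencies are disregarded, so $I_s$ is taken as the entropy of the symbol frequencies.

```python
import math
from collections import Counter

def sequence_information(symbols, alphabet_size):
    """Return (I_s in bits per symbol, I_s / (I_s)max) from symbol frequencies only."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Entropy of the observed frequency distribution.
    i_s = -sum((c / total) * math.log2(c / total) for c in counts.values())
    i_max = math.log2(alphabet_size)  # 4.32 bits for 20 amino acids
    return i_s, i_s / i_max

# Hypothetical residue string, for illustration only (not data from the paper).
sample = "ACDKLLGAEVKLSAGDKTLE"
i_s, ratio = sequence_information(sample, alphabet_size=20)
print(f"I_s = {i_s:.2f} bits/residue, I_s/(I_s)max = {ratio:.2f}")
```

The same function applies to a paragraph of English if letters are passed in and `alphabet_size` is set to 26, which is the sense in which letters are treated like amino acids here.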
Similarities between the distribution of amino acid frequencies and letters
can be seen further in Fig. 2. There the ordinate indicates the number of
times that a particular normalized frequency occurs; the normalized frequency
is the number of times, $n_i$, that the ith symbol (either amino acid or letter)
occurs, divided by N/m, the expected number of times that each type of symbol
should occur if all m different kinds of symbols had equiprobable occurrence
in the sample of N symbols. As can be seen in Fig. 2 the distributions of the
normalized frequencies $n_i/(N/m)$ for the letters (solid line) and the amino acids (shaded
area) are almost identical except for the higher incidence of rarely-used letters
in language. This small difference might not have occurred if some of the
rarer amino acids, for which assays are difficult, had been included in the
data.
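As a rough sketch of how the normalized frequencies $n_i/(N/m)$ and their distribution might be tabulated, the following assumes that every symbol in the sample belongs to the stated alphabet; the helper names and the bin width of 0.25 are illustrative choices, not taken from the paper.

```python
from collections import Counter

def normalized_frequencies(symbols, alphabet):
    """Map each symbol of `alphabet` to n_i / (N/m)."""
    counts = Counter(symbols)
    expected = len(symbols) / len(alphabet)  # N/m, the equiprobable expectation
    return {s: counts.get(s, 0) / expected for s in alphabet}

def histogram(values, bin_width=0.25):
    """Count how many values fall in each bin; keys are the lower bin edges."""
    bins = Counter(int(v // bin_width) for v in values)
    return {round(b * bin_width, 10): n for b, n in sorted(bins.items())}

# Hypothetical sample over a four-symbol alphabet, for illustration only.
freqs = normalized_frequencies("AABC", "ABCD")
print(freqs)                      # A occurs twice the expected number of times
print(histogram(freqs.values()))
```

A value of 1.0 marks a symbol that occurs exactly as often as equiprobability would predict; values near zero correspond to the rarely-used symbols discussed above.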
Constraints: The fact that the distribution of amino acids in non-structural
proteins deviates from equiprobability about the same as (or possibly a little less
than) the letters in written English, indicates that the constraints producing such
unequal frequencies should be of the same order of magnitude as (or slightly
less than) those governing English texts. However, this tells nothing about the
* This value disregards any influence of residue 'complexions'. However, it is difficult to
see how factors other than the identity of the residues can be very important, when one considers
the freedom of rotation of the R-groups with respect to the polypeptide chain.
† It was felt that such a small-sample statistics study was preferable to one based upon large
samples (such as a determination of confidence intervals for $I_s$ as a function of the paragraph
size), since by essentially duplicating the analyses applied to proteins, insight as to the limitations
of that procedure could be obtained.
nature of the constraints or the manner in which they arise. The obvious
question arises: is the unequal distribution due to unequal availability of the
amino acids or is it due to constraints imposed in the processes of synthesis, i.e.
by 'intersymbol influence'?* Is the make-up of the pool of amino acids available
to the protein-synthesizing centers indicative of the nature of the processes
involved in amino acid synthesis or have these processes become adapted to the
peculiar demands of the proteins being synthesized? This is essentially the
same as looking at a collection of printer's type and asking the question, did
the printer select his supply of type because this particular distribution of
letters was all that was available to him or did he purposely purchase his
particular assortment because he had found that it satisfied his needs?
The possibility that the unequal availability of amino acids in the cellular
pool may produce the unequal distribution does not seem likely. The experi-
ments of Roberts, Cowie et al. (2, 3) at the Carnegie Institution indicate that it
requires a five to thirty-fold excess of exogenous amino acids, such as valine,
leucine and isoleucine, before the incorporation of these amino acids into
protein is seriously affected in E. coli. In fact, once a substance has been
incorporated into the amino acid pool of yeast, 1000 times the normal con-
centration of exogenous amino acid does not affect its incorporation into
protein (Cowie). Although these are excellent experiments they do suffer
from problems of cell membrane permeability, intracellular diffusion, etc.;
however, they, along with numerous experiments involving amino acid deficient
mutants, suggest that as long as the minimum required amount of each amino
acid is present the frequency distribution of the amino acids in the pool has a
relatively small influence on the distribution of amino acids incorporated into
protein.
Two methods have been utilized in searching for intersymbol influence
in proteins. In the first (reported previously (4)), the behavior of the normalized
amino acid frequencies $n_i/(N/m)$ was studied in individual proteins. The average
normalized frequency of the individual amino acids for the twenty-six proteins
was tabulated. Comparing the normalized frequency for the individual amino
acids in particular proteins with the corresponding average value from the
26 proteins indicated large deviations in many cases. The gross deviations
were examined for correlations between pairs of amino acids, both for positive
and negative effects. Examination of the 26 proteins indicated that although
there are some correlations between the frequencies of individual amino acids
combined in single proteins, none was strong enough to be measurable with
any degree of confidence for a sample as small as 26 proteins.
The normalized letter frequencies in paragraphs were similarly
examined for significant deviations of pairs or groups of letters. Although
strong intersymbol influences are known to exist between letters (e.g. between
* 'Intersymbol influence' is a term commonly used to designate sequential dependencies,
i.e. influences upon the identity of a particular element by neighbouring elements, which are
not the only types of constraints which might be imposed by a synthesizing center. It is easy
to imagine the possibility of unequal 'acceptability' for different symbols at individual sites on
a template in which the factors affecting the specifications of each location are independent of
the neighbors.
q and u) no significant results were detected. Thus it can be concluded that
such analyses do not exclude intersymbol influences of the same type or order
of magnitude as those in language.*
Gamow, Rich, and Ycas (5) have made a more exacting study of possible
inter-symbol influences affecting amino acids. They treated the known amino
acid sequences as a series of dipeptides which they tallied into a 20 x 20 matrix similar to
the 26 x 26 digram matrices common in language analyses. The distribution for
nonstructural proteins in such a 20 x 20 matrix followed quite closely a
Poisson distribution. This they state is compatible with the assumption that
the occurrence of a given amino acid does not affect the identity of its nearest
neighbor. Their comparable analysis for English language gave a distribution
which deviated from a Poisson.
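A digram tally of this kind, together with the comparison against a Poisson expectation, might be sketched as follows. The function names are hypothetical, every symbol is assumed to belong to the stated alphabet, and the comparison shown (number of matrix cells holding k digrams against the Poisson prediction for the same mean cell density) is one plausible reading of the test rather than a reproduction of the published procedure.

```python
import math
from collections import Counter

def digram_counts(sequences, alphabet):
    """Tally adjacent pairs from each sequence into an m x m table keyed by pair."""
    table = {(a, b): 0 for a in alphabet for b in alphabet}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            table[(a, b)] += 1
    return table

def occupancy_vs_poisson(table):
    """Return the mean cell density and rows of (k, observed cells, expected cells)."""
    cells = len(table)
    mean = sum(table.values()) / cells
    observed = Counter(table.values())
    rows = []
    for k in range(max(observed) + 1):
        expected = cells * math.exp(-mean) * mean**k / math.factorial(k)
        rows.append((k, observed.get(k, 0), expected))
    return mean, rows

# Hypothetical two-symbol example, for illustration only.
mean, rows = occupancy_vs_poisson(digram_counts(["ABAB"], "AB"))
for k, obs, exp in rows:
    print(k, obs, round(exp, 2))
```

If neighbors do not influence one another and the symbols are roughly equally frequent, the observed and expected columns should agree closely; strong pairings such as q followed by u crowd many digrams into a few cells and leave many cells empty.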
The Poisson distribution associated with the amino acid dipeptide analysis
is not too significant since the sample of experimentally determined sequences
is not necessarily a reliable representation of the bulk of amino acid sequences
in nature. As Gamow, Rich, and Ycas point out, their available sample is
strongly affected by the composition of ACTH, lysozyme and insulin for which
the complete sequences have been determined and the shorter sequences from
other proteins are biased due to differential bond labilities within the protein
which give rise preferentially to certain amino acids occurring as terminal
peptides in the sequences isolated.
It was felt that a possible explanation of the difference noted between
digram analysis of letters and amino acids was that amino acids were also
grouped into word-like structures but that the average number of symbols
per 'word' was different than that found in English. Therefore, separate
digram analyses were performed on English words having two to five letters,
six to nine letters and those having ten or more. All the samples were selected
so that the average cell density in the 26 x 26 matrix was 0.44, the same as
that of Gamow, Rich, and Ycas, and these also all showed significant deviations
from a Poisson distribution.
Morowitz (6) and some of the Biophysics group at Yale have been investigating
the possibility that a polypeptide chain is a segment selected from either
a single or a small number of repeating sequences which are invariant for a
given chromosomal complement. The particular segments chosen and the
unique fashion in which they are combined and folded would then account
for the highly specific properties of the individual proteins. The possibility
also exists that there was an initial long, or at least restricted, set of sequences
from which present day polypeptide sequences have evolved in a manner similar
to that by which organisms have evolved. Gamow, Rich, and Ycas (5) have
pointed out the most striking evidence for a "phylogenetically common ancestral
sequence" in their comparison of the A and B chains of insulin, where the
same amino acids occur in equivalent positions in both chains four times.
The known sequences containing five amino acids or more (from Table I,
ref. 5) were examined for repeating or matching sequences. (This was done by
superposing the sequences in all possible permutations.) These data indicate
that for proteins from a given species any single repeating sequence must
* See the discussion by Dr Platt at the end of this paper.
be at least forty amino acid residues or longer. Comparing the sequences
of different types of proteins indicated that (a) there is not a master sequence
operating among species, or (b) evolution, i.e. amino acid substitution, has
been so extensive as to make it undetectable, or (c) the master sequence is
200 residues or longer. The additional sequences (for hormones of sub-protein
size) cited by Ycas (7) show that short polypeptide sequences with only minor
amino acid differences do occur in cells of different species. Thus, the occurrence
of repeating or a restricted number of amino acid sequences may be an explana-
tion of the unequal amino acid frequencies observed.
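The superposition search mentioned above can be illustrated with a short sketch. It assumes that "all possible permutations" means sliding one sequence along the other through every relative offset and counting identical residues in equivalent positions; the function name, the minimum overlap of five residues and the test strings are illustrative assumptions.

```python
def superpositions(seq_a, seq_b, min_overlap=5):
    """Slide seq_b along seq_a; return (offset, matches, overlap) for each offset.

    offset is the position in seq_a that lines up with the start of seq_b
    (negative when seq_b overhangs the start of seq_a).
    """
    results = []
    for offset in range(-(len(seq_b) - min_overlap), len(seq_a) - min_overlap + 1):
        start_a = max(0, offset)
        start_b = max(0, -offset)
        overlap = min(len(seq_a) - start_a, len(seq_b) - start_b)
        if overlap < min_overlap:
            continue
        matches = sum(
            seq_a[start_a + i] == seq_b[start_b + i] for i in range(overlap)
        )
        results.append((offset, matches, overlap))
    return results

# Hypothetical strings, for illustration only.
best = max(superpositions("ABCDEFGH", "CDEFG"), key=lambda r: r[1])
print(best)  # offset with the most matching positions
```

A repeating or master sequence would show up as offsets with far more matches than chance coincidence of residues would produce.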
This possible restriction provides a basis for estimating the minimum
value of $I_s$. A single, long, completely-determined sequence would provide
a situation of minimum information content for polypeptides selected from it.
To select N residues from a sequence of S amino acids would require < $\log_2 S$
bits to find N and < $\log_2 (S - N)$ bits to determine the starting point; or
by another selection procedure, < $\log_2 (S - 1)$ to find the starting point and
roughly $\log_2 S/2$ to determine the end point. Either of these methods of
selection gives an estimate of the minimum of $I_s$ which is of the order of
$2 \log_2 S$ bits. This is a very low minimum since according to the best present estimate
(which is obviously too low) S ≈ 200 and thus $2 \log_2 S$ ≈ 15. Therefore,
the minimum of $I_s$ is of the order of 0.1 bit/residue since N > 100 for proteins.
Even if S is found to be $10^6$, $(2 \log_2 S)/N$ will still only be ~ 0.4 bits/residue.
Thus, the search (6) for long master sequences of amino acids is of considerable
interest with respect to information content considerations.
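The arithmetic behind this lower bound is easily checked. The sketch below assumes the approximate form $(2 \log_2 S)/N$ bits per residue and takes N = 100 as a round illustrative protein length; the function name is hypothetical.

```python
import math

def min_sequence_information_per_residue(master_length, n_residues):
    """Approximate lower bound (2 log2 S) / N, in bits per residue."""
    return 2 * math.log2(master_length) / n_residues

# S = 200 gives about 15 bits in all, or roughly 0.15 bit per residue for N = 100.
print(round(2 * math.log2(200), 1))
print(round(min_sequence_information_per_residue(200, 100), 2))
# A master sequence of a million residues still gives only about 0.4 bit per residue.
print(round(min_sequence_information_per_residue(10**6, 100), 2))
```

Because the bound grows only logarithmically with S, even a very long master sequence leaves the per-residue figure far below the 4.32 bits available to an unconstrained sequence.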
Summarizing for $I_s$, we can say that for nonstructural proteins the potential
information due to the amino acid sequence should be of the order of 0.85-0.95
of the possible maximum value. Although the constraints necessary to produce
such an effect should be of the same order of magnitude as those in printed
English, tests comparing language and the available proteins for which amino
acid composition or sequences are known indicate that the constraints operating
in the elaboration of proteins are probably different from those associated
with language. Further, it seems unlikely that the unequal frequency of amino
acids in proteins is due to unequal availability of the amino acids in the cellular
pool. The possibility that polypeptide chains are segments selected from a
single or restricted number of repeating sequences may be an explanation of
the unequal frequencies, in which case $I_s$/residue would be close to zero.
Configuration: With the present state of knowledge the factors affecting
$I_c$ are much more difficult to assess. The number of states available to a polypeptide
chain whose bonds retained all of the lability they had as uncombined
amino acids would be essentially innumerable. In fact, about the only configurations
ruled out would be those resulting in closure of the chain upon
itself. However the D- and L- forms do not both exist in nature and, as has been
pointed out by Pauling, Corey and Branson (8), the α-C, N and O group
in the backbone of the polypeptide chain is essentially the planar, resonance
structure -C(=O)-N-. Other than these primary restrictions the polypeptide
chain, in the absence of intramolecular or secondary bonding structures, is
essentially a random structure.
Kauzmann (9) has given an excellent discussion of the known types of
intramolecular bonds which are responsible for protein folding and which should
therefore affect $I_c$. The most common type is the H-bond, especially those
formed between the carboxyl O and the amide H. These are essentially non-specific
bonds which can form between any pair of amino acid residues in
which the C=O and N-H bonds are oriented at the proper angle. A stronger,
more specific, but less common H-bond can form between the phenolic
OH groups of tyrosine and the carboxyl group of glutamic or aspartic acid
(9, 10). Another common type of bond stems from the van der Waals forces,
which can exist between the atoms in different portions of the same or neigh-
boring chains. The third type discussed by Kauzmann is the so-called hydro-
phobic bond, which is distinct from the more commonly discussed van der
Waals bonds. This results from the tendency of the more hydrophobic amino
acid residues to avoid the aqueous phase and adhere together to form a sort of
intramolecular micelle. These bonds, although they possess a low order
of specificity, may contribute a good deal of stability since they arise as a
result of the fact that the more hydrophobic amino acids cannot participate
in the strong H-bonding with the solvent water molecules. Salt bridges, which
are the ionic bonds formed between the negatively charged (glutamic and
aspartic) and positively charged (lysine and arginine) residues, are another
type. However, Jacobsen and Linderstrøm-Lang (11) have presented evidence
which indicates that these bonds are of negligible importance as intramolecular
protein bonds. One of the most important types of intramolecular bond (at
least according to current theories (12)) is the highly specific S-S bond formed
between cysteine residues in different portions of the same or neighboring
chains. The formation of disulfide bonds as well as the 'strong' H-bonds greatly
reduces the number of physical states available to the molecule since they can
only be formed at a very few sites in the molecule. Since these two types of bond
are the most specific of the intramolecular bonds, they are undoubtedly the
most effective in determining variations in structure between different kinds of
proteins.
Repetitious Structures: Intramolecular bonds formed in such a fashion
as to produce repetitious structures reduce $I_c$ tremendously. In the helical or
pleated sheet structures proposed by Pauling, Corey and Branson (8) (and
illustrated in (13)) the number of free parameters necessary to describe the
configuration completely is extremely low and therefore the information content,
$I_c$, is also very low. In the helices it is only necessary to specify the length
(that is, the total number of residues R), the pitch (3.7 or 5.1 residues per turn)
and the exact orientation of the helix with respect to a reference point in the protein.
An estimate of the lower bound of $I_c$ can be obtained from these factors
as follows: 1) To find the exact number of residues, R, in a helix requires about
$2 \log_2 R$ bits.* 2) The pitch requires 1 bit (3.7 or 5.1 residues/turn of the helix).
* It is rather interesting that the determination of the value of any integer, either + or −
(other than zero), requires exactly 2n bits, where $2^{n-1} < |R| \le 2^n$ (which is close to $2 \log_2 |R|$):
n bits are necessary to find that |R| is in the range indicated, n − 1 bits to find |R| and 1 bit to
determine R, i.e. the sign. For example, let R = −48: six questions which can be answered
by yes or no will show that |R| is 33-64; five more questions will determine that of the 32
possible values |R| = 48 and one yes or no question determines R = −48. Thus, $2^5 < |R| \le 2^6$
and 2n = 12 bits.
3) A reasonable value for the number associated with specifying the interhelical
bonds would seem to be R/2 bits. This arises by assuming R/4 interhelical
bonds, i.e. one bond per turn of the helix, and the previous discussion of intramolecular
bonding indicates that the identity of each interhelical bond requires
about 2 bits of information. Another reasonable value for this factor is R/4; this
would occur for 1 one-bit interhelical bond per turn or 1 two-bit bond every other turn,
which attempts to take into account that disulfide and "strong" H-bonds
are probably the most important interhelical bonds. Actually this factor could
be zero since it may not be possible to specify interhelical bonds independent
of the sequence. 4) The information necessary to specify the orientation of
each helix with respect to some reference point in the protein is the most
difficult factor to estimate. It may be almost zero, since the interhelical bonds
may unequivocally determine the orientation of the helix. On the other hand,
it should not be larger than ($\log_2 R$ + 30) bits, where $\log_2 R$ bits is sufficient
to determine a specific residue and 30 bits to specify its orientation. The 30
bits would be assigned to the six parameters associated with the two vectors
necessary to specify orientation, i.e. 5 bits per parameter. An average 'grain' of 1:32 is undoubtedly
too coarse for specifying the orientation of a single isolated helix, but is probably
adequate for specifying a helix which is oriented in relation to others in the same
molecule.
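The yes-or-no counting argument given in the footnote to item 1 can be sketched as a small function. This is a minimal illustration: the name `bits_to_specify_integer` is hypothetical, and the count is taken to be meaningful only for |R| of 2 or more, since the n − 1 term degenerates for |R| = 1.

```python
def bits_to_specify_integer(r):
    """Return 2n, where 2**(n-1) < |r| <= 2**n.

    n questions locate the range, n - 1 locate |r| within it, 1 gives the sign.
    """
    magnitude = abs(r)
    if magnitude < 2:
        raise ValueError("the counting argument is applied here only for |r| >= 2")
    n = (magnitude - 1).bit_length()  # smallest n with |r| <= 2**n
    return 2 * n

print(bits_to_specify_integer(-48))  # 33 to 64 is the range for n = 6
```

For comparison, $2 \log_2 48$ is about 11.2, so the whole-number count and the logarithmic estimate used in item 1 agree to within one bit in this case.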
The R/2 and 30 and the zero terms have been combined to give 'high'
and 'low' values for the estimation of the minimum of $I_c$. These are calculated
as $I_c$/residue (in bits) by

$$I_c/\mathrm{residue} = \frac{3 \log_2 R}{R} \quad \text{(LOW)} \qquad (1)$$

and

$$I_c/\mathrm{residue} = 0.50 + \frac{30 + 3 \log_2 R}{R} \quad \text{(HIGH)} \qquad (2)$$
The results as a function of R are shown in Fig. 3. Pauling, Corey and
Branson (8) cite examples of helical polypeptides for which R is 11, 18 and 36.
The corresponding region of Fig. 3 has been shaded. From these considerations
it would appear that the minimum value of $I_c$ should be about 1 to 4 bits/residue
depending upon R.
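The two limits can be evaluated for the helix lengths just cited. The sketch assumes the readings $(3 \log_2 R)/R$ for the low value and $0.50 + (30 + 3 \log_2 R)/R$ for the high value, as suggested by the curve labels of Fig. 3; the function name is hypothetical.

```python
import math

def helix_information_bounds(r):
    """Return (low, high) estimates of the minimum I_c per residue for r residues."""
    low = 3 * math.log2(r) / r                    # length and residue terms only
    high = 0.50 + (30 + 3 * math.log2(r)) / r     # adds R/2 bond bits and 30 orientation bits
    return low, high

for r in (11, 18, 36):
    low, high = helix_information_bounds(r)
    print(f"R = {r:2d}: low = {low:.2f}, high = {high:.2f} bits/residue")
```

Under these assumptions the low estimate runs from roughly 0.4 to 0.9 bit/residue and the high estimate from roughly 1.8 to 4.2 bits/residue over R = 36 to R = 11, in line with the range of about 1 to 4 bits/residue quoted above.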
Although many proteins appear to be helical in nature, there are others,
such as ribonuclease (RNase), which from the available evidence would seem
not to be. In RNase the structural specificity appears to be determined pre-
dominantly by the S — S bonds with the other intramolecular bonds adding
stability to the structure. A further discussion of the relative importance of
the specific and non-specific intramolecular bonds in maintaining structure
will be presented later.
It is obvious that an upper limit cannot be assigned to I_c as readily as to
I_s. However, since the structures proposed by Pauling, Corey and Branson
probably represent polypeptide configurations for which I_c is near minimum,
it would appear that one bit/residue is a reasonable lower limit for I_c. From
the estimates of I_s and I_c presented here, it appears that for the proteins of
general interest I_t should have a value in excess of 4.5 bits/residue although
if it is found that polypeptides are chosen from a single, long master sequence
the value could be as low as 1.0 bits/residue.
Estimates of 4.5 bits per residue or greater at the structural level give a
total information content, I_t, for the non-structural proteins in excess of 500
bits (or in excess of 100 bits if the minimum estimate turns out to be the true
one). Such an estimate is in sharp contrast to the estimates of 10 bits or less
[Figure 3: plot of I_c/residue (in bits) against the number of residues per helix, R, with a curve labelled I_c/residue = 0.50 + (30 + 3 log2 R)/R.]
Fig. 3. Limits for estimates of the minimum of I_c as a function of the number
of residues per helix. The shaded area indicates helical polypeptide sizes reported
by Pauling, Corey and Branson (8).
obtained by Quastler and his co-workers (14) as the amount of information
which must be transmitted for the proper functioning of most protein-controlled
systems (e.g. enzymes, immune bodies).
III. ESTIMATION OF STRUCTURAL INFORMATION CONTENT
NECESSARY FOR FUNCTION
A disparity of at least one order of magnitude in passing from
one context or level of organization to another is of considerable interest.
The ten-fold difference indicates that only a small part of the information
potential is actually utilized in information transmission.
Does this indicate that information transmission in such systems is very
noisy and therefore organisms obtain good transmission by utilizing a very
high degree of redundancy? Dancoff (15) proposed a principle of maximum
error in which he postulated that an organism (or for instance a protein-
controlled system) will commit as many errors as are consistent with normal
function, but that the inherent error rate, which is probably quite high for such
reactions, is maintained at a tolerable level by the use of redundancy. Resorting
again to the language analogy — a protein corresponds to a paragraph in
complexity and its function may correspond to the thought which is conveyed
by a paragraph.
Does the difference in information content between the two contexts mean
that in the process of evolution the organisms found that particular polypeptide
configurations contained structures which could perform useful functions,
but that these polypeptide permutations contained a large amount of excess
and useless information which has been perpetuated along with the small
amount of information associated with the necessary structure?
Does it indicate that much of the protein structure is involved in secondary
features of information transmission (e.g. the acquisition, concentration,
and transport of energy) and only a small part of the total information content
of the protein is intimately engaged in the process of information transmission?
Or does it indicate that each enzyme or protein is capable of mediating
many reactions and our experimental ingenuity has not been able to determine
more than just a few of them? (This is analogous to attempting to measure
the information transmitted by a source which is transmitting through many
channels, by monitoring only a single channel.)
The discussion which follows will attempt to throw some light on these
questions. However, two important considerations must always be borne in
mind when one is dealing with proteins. They are first and foremost colloidal
in nature and therefore much of their activity falls in the realm of surface
reactions. In the globular proteins it is quite likely that much of the total
structural information content is in the interior of the molecules and therefore
is unavailable to participate in information transfer occurring at their surface
and can only participate in secondary operations similar to those mentioned
above. The second consideration involves the question, just what is required
for the transmission of one bit of information by a protein system? It seems
very likely that one bit of potential structural information will not always
transmit the same amount of information; rather, the efficiency of transmission
will depend upon the context within which the performance is measured.
For example, it is probably much simpler to attach either a hydroxyl or methyl
group to a benzene molecule (which would involve one bit of determination)
than it is to construct either a 3.7 or 5.1 helix (which also involves one bit
of determination). This is somewhat analogous to the relative difficulties of
determining whether a symbol is 0 or 1, or to determining whether one should
get married or not!
I_s necessary: It appears in some cases that a fairly large fraction of the
potential surface information due to the amino acids present is superfluous.
For instance, it has been found in insulin that a large fraction of the residues
cannot be critical for function. Iodination, sulfonation and chelation, each
of which can mask surface R-groups, have been found not to affect insulin
activity. Those residues which are species-specific can also be ruled out as
being critical for function. Unfortunately, it is difficult to determine the exact
degree to which a particular type of residue is masked by a given treatment,
so that it is impossible to state exactly the fraction of surface residues which
are not critical. In a similar manner, it is possible to mask the lysine and arginine
residues on the surface of trypsin without destroying its activity (16). In fact,
acetyltrypsin is available commercially (17) and has the ideal feature that with
its lysine and arginine R-groups masked, its ability to act as a substrate for
other molecules of trypsin is decreased. Haurowitz (18) has also pointed out
that some of the antigenic properties of proteins are in many cases not affected
by iodination or sulfonation of receptive surface groups.
The work of Raacke (19) has shown that a certain amount of surface
heterogeneity (as demonstrated by electrophoretic behavior) is still compatible
with a fully active protein. Her results plus the uncertainty found in the analyses
of amino acid compositions indicate that an uncertainty of the order of 3 to
10 per cent can occur in the amino acid complement without loss in charac-
teristic function. The results of Roberts and Cowie (mentioned previously)
involving competition in the amino acid pool also indicate that about 3 to 20
per cent variability in amino acid incorporation can occur. However, it should
be borne in mind that each position in the polypeptide sequence may not have
a 3 to 10 per cent tolerance associated with it; rather, those residues which
participate in active sites likely have a zero tolerance.
I_t necessary: Kalnitsky and Rogers (20) have reported that approximately
15 per cent of the ribonuclease molecule can be digested off with carboxy-
peptidase before activity is lost. Work reported by Anfinsen (10, 21) indicates
that this estimate may be a little high. Rather, he reports that the
carboxy-terminal three amino acids (valine, serine, alanine) can be removed with no
loss in activity, but that digestion with pepsin which splits off these three
plus their neighbor, aspartic acid, and also ruptures a "strong" hydrogen
bond in the vicinity produces loss in activity. Partial digestion by subtilisin
(10, 22), which apparently digests central portions of the polypeptide chain,
leaves the activity of the RNase intact as long as the digested portion is not
oxidized. It is also known that fragments obtained either by hydrolysis or
partial enzymatic degradation from myosin (23-25), trypsin (26), chymotrypsin
(27, 28), lysozyme (29), papain (30) and pepsin (31, 32) retain their activity
in certain situations. The results with pepsin and papain are particularly
striking. Hill and Smith report no loss in the molar activity of papain (toward
a synthetic substrate) after an average of 120 of its 180 residues had been removed
by leucine-aminopeptidase (an N-terminal type enzyme). Perlmann has reported
that some of the dialyzable fragments (which represent 20 per cent of the
total original protein) resulting from pepsin auto-digestion retained 1 to 5
per cent of the original activity toward hemoglobin, but about 75 per cent
of the activity of the intact pepsin when tested against the synthetic substrate
acetyl L-phenylalanyl diiodotyrosine. These latter results indicate strongly
that pepsin, at least, has more than one active site and the site specific for pep-
tide linkages adjacent to an aromatic amino acid depends upon the integrity
of only a small portion of the molecule.
I_c necessary: Of parallel interest to the above considerations is the question
of how much configurational information, I_c, is necessary for function. The
work of Anfinsen and others (10, 33) indicates that the configuration of RNase
can be considerably disrupted without loss in activity. They found that rever-
sible denaturation in 8 M urea did not cause permanent loss in activity; in
fact the RNase was still active in 8 M urea in which its specific viscosity was
8.9 as compared with 3.3 in aqueous solution. This large increase in specific
viscosity indicates that the so-called native configuration can be opened con-
siderably without destruction of activity. However, Anfinsen reports that
oxidation with performic acid, which disrupts the disulfide bonds, causes
irreversible inactivation and an increase in specific viscosity to 11.6.
The phenomenon of complete loss in activity upon the appearance of the
full sulfhydryl titer has been observed in most proteins. It has also been known
for a number of years that different degrees of loss in characteristic activity
can occur. A number of workers (34, 35) have studied reversible inactivation
of enzymes in which it has been observed that a partial unfolding of the mole-
cule can occur with a rise in specific viscosity, change in the optical rotation
of the protein solutions, changes in solubility, etc., which upon the proper
treatment can be reversed. The thermodynamics for reversible denaturation
shown in Fig. 4 indicate that quite likely the first step is common from protein
to protein since ΔF* is remarkably constant for all proteins. Reversible
denaturation invariably shows an increase in entropy. However, ΔS* is not constant
from protein to protein but varies by a large amount as shown by the unhatched
areas to the right in Fig. 4.
The author has proposed (12, 36) and discussed elsewhere in this volume
(37) a hypothesis involving three steps, which attempts to explain this
phenomenon by ascribing the constant ΔF* to the initial opening of a disulfide
bond. This first step is followed by the rupture of a number of neighboring
intramolecular bonds (step 2) with a resulting opening of the molecule indicated
by the increase in entropy. According to the proposal, this opening of the mole-
cule is sufficient to disrupt the spatial arrangement of critical amino acids causing
loss in activity, but enough stability and configuration is retained so that under
the proper conditions the original native structure, or at least a structure
compatible with activity, can restitute. In this hypothesis the rupture of a
second disulfide bond (step 3) allows irreversible inactivation to proceed with
essentially complete destruction of the characteristic protein structure.
A conversion (using an equivalence derived in reference (38)) has been
made in Fig. 4 from ΔS* to ΔI_c. By assuming an average amino acid residue
Table I

Protein            M.W. × 10^-3   M.W. ÷ 120   ΔI_c (bits)   ΔI_c/N (bits/residue)
Pepsin                 36            300            78            0.26
Trypsin                20            167            30            0.18
Emulsin                38            317            48            0.15
Amylase                59.5          496            36            0.07
Hemoglobin             67            558           110            0.20
Egg albumin            40            333           226            0.68
Lactoperoxidase        93            775           340            0.44
Insulin                12            103            18            0.18
weight of 120, ΔI_c/residue is given in Table I for those proteins in Fig. 4 for
which the molecular weights are available.
Thus ΔI_c for the loss in specific activity is of the order of 0.25 bits/residue
(the 0.68 value for egg albumin does not correspond to a loss in specific activity).
This indicates that destruction of the right 5 to 25 per cent of I_c (assuming
I_c is close to our minimum estimate of 1 to 4 bits/residue) causes loss of function,
which may be reversible or irreversible depending upon which intramolecular
bonds are disrupted.
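The arithmetic behind Table I and the 5 to 25 per cent figure can be checked with a short sketch. The molecular weights, ΔI_c values and listed per-residue values are those of Table I; the function names are illustrative only. The entropy conversion shown (one bit per R ln 2 of molar entropy) is the standard thermodynamic identification and is assumed here; it is not taken from reference (38).

```python
import math

RESIDUE_WEIGHT = 120  # average residue weight assumed in the text

# Protein: (molecular weight, Delta I_c in bits, listed bits/residue), from Table I.
TABLE_I = {
    "Pepsin": (36_000, 78, 0.26),
    "Trypsin": (20_000, 30, 0.18),
    "Emulsin": (38_000, 48, 0.15),
    "Amylase": (59_500, 36, 0.07),
    "Hemoglobin": (67_000, 110, 0.20),
    "Egg albumin": (40_000, 226, 0.68),
    "Lactoperoxidase": (93_000, 340, 0.44),
    "Insulin": (12_000, 18, 0.18),
}


def bits_per_residue(mol_weight, delta_bits):
    """Delta I_c divided by the residue count N = M.W. / 120."""
    return delta_bits / (mol_weight / RESIDUE_WEIGHT)


def entropy_to_bits(delta_s, gas_constant=1.987):
    """Assumed conversion: molar entropy in cal/(mol K) to bits, Delta S / (R ln 2)."""
    return delta_s / (gas_constant * math.log(2))


for name, (mol_weight, delta_bits, listed) in TABLE_I.items():
    computed = bits_per_residue(mol_weight, delta_bits)
    print(f"{name:16s} computed {computed:.2f}  listed {listed:.2f}")

# Fraction of I_c represented by 0.25 bits/residue when I_c is 4 or 1 bits/residue.
for i_c in (4.0, 1.0):
    print(f"I_c = {i_c}: {0.25 / i_c:.1%}")
```

With I_c between 1 and 4 bits/residue, a loss of 0.25 bits/residue corresponds to roughly 6 to 25 per cent of I_c, in line with the range stated above.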
[Figure 4: bar chart of thermal-inactivation data, at temperatures between 298 °K and 353 °K, for pepsin, proteinase, trypsin (kinase), trypsin, invertase, vibriolysin, tetanolysin, hemolysin (goat), rennin, leucosin, invertase (yeast), emulsin (wet), amylase (malt), solanin, hemoglobin, egg albumin, peroxidase (milk) and insulin.]
Fig. 4. The equivalence between ΔS* and ΔI_c for thermal inactivation. The
shaded areas to the left represent ΔF* and the clear areas to the right ΔS*.
(Adapted from Fig. 1, ref 12, by courtesy of University of Illinois Press.)
Summary
The above discussions indicate that redundancy considerations are not
the explanation of the large excess of structural information content; rather,
that only a small fraction of the potential information on the surface of the
molecule is actively utilized in information transfer. Haurowitz (18), for
instance, has pointed out that experiments with substituted antigens indicate
that the antigenic specificity resides in an area on the surface of the protein
which is approximately 10 to 15 Å in diameter. Results cited here suggest
that the four or so amino acid residues which would occupy such a surface
area (13) may occur as neighbors on the same chain (30-32). Other results
mentioned previously (20, 21, 33) suggest that the critical amino acids do not
occur in sequence in a single polypeptide chain. This follows from the con-
sideration that digestion should be able to consume an average of about 50
per cent of the protein molecule before an active site composed of four or
five adjacent amino acids would be encountered ; whereas one of four or five
amino acids making up an active site should be encountered, on the average,
after about a 20 to 25 per cent digestion of the molecule if the amino acids are
distributed roughly at random. In addition, Kennedy and Koshland (39) have
found that phospho-glucomutase when placed in 6 M urea loses its activity but
recovers it upon dilution, which also indicates separated locations for the
critical amino acids. Therefore it may not be possible to state a general rule
concerning the relationship between the loci of critical amino acids within
polypeptide chains.
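The digestion argument above can be made explicit under a simple idealization, offered here only as a sketch: residues are removed one at a time from one end of a chain of n residues, and the quantity of interest is the expected fraction removed before the first critical residue is reached. The chain length of 100 and the function names are hypothetical. For a site of k adjacent residues placed uniformly at random the expectation is (n - k)/(2n); for k residues scattered at random it is (n - k)/((k + 1)n), which is somewhat below the 20 to 25 per cent quoted in the text (that figure corresponds to the cruder estimate 1/k).

```python
from fractions import Fraction
from itertools import combinations


def clustered_fraction(n, k):
    """Expected fraction digested before reaching a site of k adjacent residues."""
    # The site's first residue is uniform over positions 1..n-k+1,
    # so the number of residues removed before it averages (n - k) / 2.
    return Fraction(n - k, 2 * n)


def scattered_fraction(n, k):
    """Expected fraction digested before reaching the first of k scattered residues."""
    # The expected minimum of k distinct uniform positions in 1..n is (n + 1) / (k + 1).
    return Fraction(n - k, (k + 1) * n)


def scattered_fraction_bruteforce(n, k):
    """Same quantity by direct enumeration of all placements (small n only)."""
    placements = list(combinations(range(1, n + 1), k))
    removed = sum(min(sites) - 1 for sites in placements)
    return Fraction(removed, len(placements) * n)


for k in (4, 5):
    print(k, float(clustered_fraction(100, k)), float(scattered_fraction(100, k)))
```

For n = 100 and k = 4 this gives 0.48 for the clustered site and 0.192 for scattered residues, so the qualitative contrast drawn in the text (about one half against about one fifth) holds under this idealization.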
It seems that the role of intramolecular bonds is to insure that the amino
acids which are critical for function are maintained in the proper spatial
relationship to each other so that function can occur. Here again it is impossible
to state a general rule as to how many of these intramolecular bonds can be
disrupted before loss of function occurs, since apparently all of the hydrogen
bonds can be broken in RNase without loss in function but not so in phospho-
glucomutase. However, the integrity of the more specific secondary bonds
(such as S-S) seems to be much more critical for the maintenance of function.
The digestion experiments with pepsin and papain indicate further that it is
important where in the molecule the bonds are destroyed.
Other than ruling out redundancy as a possible reason for the discrepancy
between the large potential information and the measured performance, it
is difficult to choose among the other possibilities mentioned. The results
with pepsin and papain, which have been mentioned, suggest strongly that much
of the information content may be unnecessary for function, but has been
perpetuated along with the critical content. However, the results with pepsin
indicating that multiple sites do exist make it impossible to assign a certain
fraction of the information content as 'garbage'. How much of the polypeptide
chain is involved in secondary features of information transmission and the
structural complexity necessary for transmitting one bit of information are
factors which are now being actively investigated by a number of workers.
The various estimates of I_t, I_s and I_c are tallied in Table II.

Table II
(values in bits/residue unless given as per cent)

                              I_total  =  I_sequence  +  I_configuration
Maximum                          -           4.32              -
Plausible                      >4.5          3.5             >1.0
Minimum                         1.0          15/N             1.0
Necessary for performing
a single specific function    10-90%         25%            35-90%
IV. CONJECTURES
Some of the results considered in preparing this paper lead to rather interest-
ing speculation. The repetitious minimum entropy polypeptide structures
proposed by Pauling, Corey and Branson (8) have already been mentioned.
Such configurations may be generally applicable to macromolecules, since
helical structures have also been proposed for desoxyribonucleic acid (DNA)
polymers (40) and some viruses (41). Crane (42) states that helical configura-
tions occur in linear (uni-dimensional) crystals, i.e. structures where progression
from each sub-unit to its essentially identical neighbor is by a repeated process
of translation and rotation. Lumry and Eyring (43) predict that once hydrogen-
bonded secondary structures are formed the characteristic protein 'conformation'
is determined by tertiary folding such that the free energy is minimized. How-
ever, this does not explain why crystallization should initially occur and be
maintained in solution; and to the author's knowledge no one has advanced
arguments which provide a complete basis to account for the apparent preval-
ence of minimum entropy biostructures, although there have been discussions
of how living organisms produce 'order from disorder' or 'order as a result
of order' (44). Considering the innumerable configurations available to bio-
logical polymers, the question arises 'Are there criteria which determine that
the seemingly improbable, highly ordered structures occur spontaneously?'
or 'Are these structures imposed at some specific stage in biosynthesis?'
Studies on the reversible denaturation of proteins (34, 35) suggest that the
latter possibility is more probable: that is, mild mistreatment can be reversed;
whereas, once a certain molecular disarray or instability occurs, an unfolded
state results from which the characteristic, native structure does not reconstitute.
Neurath et al. (35) make the interesting point, that even if denaturation is
complete enough so that physical properties such as solubility, crystallizing
ability, or diffusion constants are seriously affected, some of the molecules
may subsequently revert to a biologically active form; whereas, others will
tend to reverse the molecular disarray by forming a more condensed state
but without successfully restoring the native biological properties. This suggests
that, although polypeptide chains have an inherent tendency to form semi-
condensed configurations, the highly ordered, biologically-active structures are
probably not only imposed during biosynthesis, but represent quasi-stable
structures with built-in constraints which tend to cause small fluctuations
to revert, i.e. a limited amount of disorder can be restrained without the inex-
orable Second Law prevailing. Neurath (35) has also reported that the amount
of disarray compatible with reversibility depends upon the type of denaturation.
Further, denaturation is not reversible under all conditions but may await
a change in pH or temperature. However, it is interesting that although an
entropy increase is invariably associated with denaturation, removal of the
denaturing agents can cause a decrease, which appears to contradict the Second
Law; we will later resolve this apparent contradiction.
The quasi-stability of native configurations is suggestive of the situation
in diatomic molecules where stability conditions are readily depicted as a
local 'well' (relative to the surroundings) or null area in a two-dimensional
energy-configuration plot. However, since two dimensions would allow only
a very gross specification of the myriad degrees of freedom of macromolecules,
some form of multi-dimensional space will be necessary to represent their
stability conditions. The biologically significant portion of such a macro-
molecular space will also be a 'well', but in a multi-dimensional surface rather
than a line plot and will be centered near the locus of native structures in configu-
ration space. A fraction of the well will represent conditions consistent with
an active macromolecule and the remainder, conditions characteristic of
reversible inactivation. Anything outside the well will correspond to states
inconsistent with the restitution of a native configuration.
The multi-dimensional space can be of sufficient dimensionality so that
all configurations differing by a 'single step' are neighbors. In such a 'fine-
grain' specification each microstate and its probability density (as a function
of energy, for example) can be represented. However, such a scheme has
drawbacks: first, it has little novelty since any situation can be completely
described by a sufficient number of parameters ; second, a model dealing only
with microstates would be extremely difficult to test experimentally; and
third, the excessive dimensionality makes it useless as an aid in envisioning
possible mechanisms of macromolecular rearrangements.
Thus, a 'coarse-grain' specification, which requires reducing the dimension-
ality by transforming the microstates into a more useful set of macrostates,
is desirable. This general operation can be schematized by the use of the follow-
ing contingency table :
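Table III

                                        <-- Molecular Energy -->
Digits (M_i)   Digits (m_ij)    E_1      E_2     ...   E_k     ...   E_n
M_1            m_11             α_111    α_112   ...   α_11k   ...   α_11n
               m_12             α_121    α_122   ...   α_12k   ...   α_12n
               ...
               m_1j             α_1j1    α_1j2   ...   α_1jk   ...   α_1jn
M_2            m_21             α_211    α_212   ...   α_21k   ...   α_21n
...
M_i            m_ij             α_ij1    α_ij2   ...   α_ijk   ...   α_ijn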
A plausible specification for a multi-dimensional space is given in Table III,
where a sufficient number of binary digits is used so that each microstate
can be unequivocally identified, e.g. the two atoms involved in each bond
as well as the bond length and angle could be identified. Each α_ijk represents
the probability density of a given microstate for molecular energy state E_k,
where the ranges of i, j and k can be essentially infinite.
A transformation to a 'coarse-grain' scheme which seems worth consideration
is as follows. Each macrostate, M_i (depicted by the leftmost column
of digits in Table III), designates only which bonds exist in the macromolecule,
e.g. sulfur atom no. 7 is hooked to carbon no. 179 and sulfur no. 11, C-563
to C-564 and N-201, etc. Mechanistically all microstates, m_ij, contained
in a given macrostate, M_i, are grouped together by ordering the digits (or
analogously ordering the axes in space). To complete the transformation
other bond properties, e.g. length and orientation (the other column of digits
in Table III), and their associated probabilities (the right hand portion of
Table III) are lumped into two gross categories to provide an intuitively
manageable representation. This 'lumped fine structure' for each macrostate, M_i,
can be represented on an 'energy-deviation' (ED_i) plane at the locus (in
transformed configuration space) corresponding to M_i: 'deviation' is a measure of
instability, i.e. the extent to which individual microstates, m_ij, deviate from
the configuration m_is corresponding to maximum stability for macrostate
M_i. An example of a method for constructing such values is: (a) find the set
of digits m_is in the middle column of Table III which represents maximum
stability for macrostate M_i and (b) determine how many of the corresponding
digits of m_is and m_ij differ. This number provides an excellent measure of
'deviation' because each microstate has a unique D-value and 'neighboring'
microstates have adjacent D-values. Assigning probabilities to pairs of 'energy'
and 'deviation' values completes the 'fine' to 'coarse-grain' transformation.
This requires summing the probabilities, α_ijk, of those microstates associated
with a particular D-value. The probability densities for E and D values can
be arranged into contours of equal probability to avoid further complications
of adding a third coordinate to the ED plane. These contours will possibly
be quite irregular in shape and may well be discontinuous, since the only
obvious restriction on their form is that they be non-intersecting.
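The 'fine' to 'coarse-grain' transformation described above can be sketched for a single macrostate. In this illustration, which is an assumption-laden toy and not a reconstruction of any actual calculation, each microstate is a string of binary digits, 'deviation' is the number of digit positions at which it differs from the most stable microstate, and the probabilities of microstates sharing an energy label and a D-value are summed. The digit strings, energy labels, probabilities and function names are all hypothetical.

```python
from collections import defaultdict


def deviation(microstate, reference):
    """Count the digit positions at which a microstate differs from the reference."""
    if len(microstate) != len(reference):
        raise ValueError("digit strings must have equal length")
    return sum(a != b for a, b in zip(microstate, reference))


def lump(alpha, reference):
    """Sum microstate probabilities into (energy, deviation) cells of the ED plane."""
    coarse = defaultdict(float)
    for (digits, energy), probability in alpha.items():
        coarse[(energy, deviation(digits, reference))] += probability
    return dict(coarse)


# Hypothetical macrostate: four two-digit microstates over two energy labels.
alpha = {
    ("00", "E1"): 0.5,
    ("01", "E1"): 0.125,
    ("10", "E1"): 0.125,
    ("11", "E2"): 0.25,
}
coarse = lump(alpha, reference="00")
for cell in sorted(coarse):
    print(cell, coarse[cell])
```

In this toy case the microstates "01" and "10" share the same D-value of 1 at energy E1, so their probabilities are combined into a single cell; this is the lumping that reduces the dimensionality.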
It should be noted that 'lumping' on to 'energy-entropy' planes would
have provided a simpler transformation than that to the 'energy-deviation'
planes. The microstates corresponding to a given 'deviation' can be equated
to an entropy value by the usual −Σ p_i log p_i procedure, where the p_i's are
the probabilities (properly normalized) associated with the microstates. Such
a scheme was considered, but was found to be intuitively less useful than the
ED transformation.
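For comparison, the alternative 'energy-entropy' lumping mentioned above can be sketched as follows. The probabilities of the microstates sharing one deviation value are normalized and an entropy is computed from them; the use of base-2 logarithms (giving bits) and the function name are assumptions made for this illustration.

```python
import math


def class_entropy(probabilities):
    """Entropy in bits of the microstates sharing one deviation value."""
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize within the class; zero-probability microstates contribute nothing.
    normalized = [p / total for p in probabilities if p > 0]
    # sum p * log2(1/p) is the same quantity as -sum p * log2(p).
    return sum(p * math.log2(1 / p) for p in normalized)


print(class_entropy([0.125, 0.125]))  # two equally likely microstates
print(class_entropy([0.5]))           # a single microstate
```

Two equally likely microstates give one bit and a single microstate gives zero, so the entropy value records only how many microstates are effectively populated at a given deviation, not how far they lie from the most stable configuration.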
The 'energy-deviation' scheme is of considerable interest when one con-
siders possible mechanisms of both protein inactivation and enzymatic activity.
Suppose, for instance, that the energy of a molecule in a native configuration
is slowly raised, e.g. by external heat: the point representing 'molecular state'
will be driven to new loci in multi-dimensional space. Undoubtedly a trajectory
is followed such that the locus resides, 'statistically', on the contour which
has the maximum probability permissible or consistent with its energy content
and macrostate at any instant. This means that the locus first progresses over
the ED_i plane of the particular native configuration, M_i. Eventually a locus
will be reached where the probability contour occupied is lower than the corre-
sponding contour on an adjacent ED plane. The molecular state will then
jump to that adjacent macrostate by some form of bond rearrangement.*
Even without an immediate change in molecular energy due to external heat,
the jump will likely be followed by an instantaneous migration of the molecular
state locus on the new ED plane. This would be anticipated since the new locus
might not be the position of maximum probability for that instantaneous
molecular energy. A sufficient increase in temperature would eventually
drive the trajectory out of the fraction of the null region corresponding to an
active molecule: with sufficient mistreatment the locus would be driven com-
pletely out of the null region into the portion of configuration space representing
irreversibly inactivated molecules.
Molecular energy will decrease when external heat is removed, and the
molecular rearrangements will be reversed or not depending upon the sym-
metry of the multi-dimensional surface of the well. Where denaturation is
reversed merely by reversing the denaturing conditions, apparently the inacti-
vation trajectory is retraced or else the null region is a smooth "well" with no
intervening metastable positions in the reversal trajectory. Thus, for reactiva-
tion the two trajectories would not have to be identical but need only form a
closed loop.
Asymmetry in the probability contours of even one of the ED plots traversed,
could cause the inactivation and reversal trajectories to diverge sufficiently
so that metastable, non-active configurations would result. Such situations
have been observed experimentally; for instance, thermal denaturation at
alkaline pH is not reversed upon cooling until the pH is adjusted to acidic
conditions (35). Since a change in pH should alter the ED contours it is easy
to envision how it could make the reversal of denaturation more likely by
changing the transition probabilities between macrostates and thus alter the
reversal trajectory. Such an alteration would resolve the apparent contra-
diction of the Second Law: a changed pH would act as a 'Maxwell Demon
guiding the footsteps of the reversal trajectory'.
Considering its likely statistical nature, it is probable that much of the
trajectory of the locus of molecular states proceeds along essentially negligible
probability gradients, not only with respect to transitions from one macrostate
to another but more particularly with respect to instantaneous displacement
from the locus of arrival on a new ED plane. Such transitions should be readily
reversible and in general of limited consequence except as they lead to regions
of larger gradients. However, a 'low-gradient' region would allow considerable
leeway in trajectories. This would permit multiple pathways which would
account for the spectrum of effects often observed following physical denatura-
tion. In those transitions involving bonds which latch large segments of the
molecule together (12) (e.g. interhelical bonds) gross molecular rearrangements
could occur so that the trajectory would pass through regions of large probability
gradients. Such transitions would not be instantaneously reversible and would
therefore be relatively important in driving the trajectory away from the "active"
portion of or even out of the 'well'.
My proposed inactivation hypothesis discussed later (37) attempts to
* Somewhat more rigorous discussions of factors affecting the trajectory of the locus of
molecular state in similar multi-dimensional plots have been given by Teller (45) and Lumry
and Eyring (46).
specify the identity and sequence of high-gradient transitions. On this basis
energy from an absorbed quantum, ionization or thermal process would
migrate through the molecule in a fashion represented mainly by a 'low
gradient' trajectory. However, once the energy or charge becomes localized
in a bond of low ionization potential involved in latching large segments
of the molecule together, a 'high-gradient' transition, not readily reversible,
would occur. The inactivation efficiency of absorbed energy will thus be
a function both of the locus of the molecular state at the time energy is
absorbed as well as its resulting trajectory; where the trajectory depends
upon the amount of energy introduced, the point of absorption and any
external factors which affect the contours on the ED planes. For instance,
the quantum efficiency of UV varies considerably with pH for a number of
enzymes (47).
The interdependence of energy, configuration and probability proposed
here provides a formalism for depicting enzyme action. It is fairly typical
of enzyme, as well as other types of catalysis, that reactions proceed which
are normally not feasible because of steric or energetic hindrances. It is entirely
possible that because of their large size, enzymes act as large energy reservoirs
whose function is to "deliver" a quantity of energy to a particular site or com-
plex in an irreversible fashion. Another possibility is that energy may not
be delivered per se but as a change in configuration of the enzyme with a
corresponding alteration in the spatial relationship between reactants complexed
to the enzyme. Within these proposals the formation of the enzyme-substrate
complex could have an important function. It could act as an external agent
affecting the ED contours so as to cause a directed alteration in trajectory,
leading finally to a completed enzyme catalysis. Effective, i.e. rapid and
essentially irreversible, enzyme catalysis will likely depend upon (1) an E-S
complex formation which involves a high-gradient transition, so as to enhance
a drastic alteration in the trajectory of molecular state, and (2) the directed
trajectory passing through a high-gradient region, preferably just before
completion of catalysis, in order to make reversibility unlikely.
REFERENCES
1. H. Branson: Information theory and the structure of proteins. In: Information Theory in
Biology, ed. by H. Quastler, 84-104, University of Illinois Press, Urbana (1953).
2. R. Roberts: Carnegie Institution Yearbook, No. 55, 110-148 (1956).
3. D. Cowie: Carnegie Institution Yearbook, No. 55, 110-148 (1956).
4. L. Augenstine, H. Branson, and E. Carver: A search for intersymbol influences in
protein structure. In: Information Theory in Biology, ed. by H. Quastler, 105-118,
University of Illinois Press, Urbana (1953).
5. G. Gamow, A. Rich, and M. Ycas: The problem of information transfer from the
nucleic acids to proteins. In: Advances in Biological and Medical Physics 4, ed. by J. H.
Lawrence and C. Tobias, 23-68 (1956).
6. H. Morowitz, et al.: personal communication.
7. M. Ycas: (preceding paper in this volume).
8. L. Pauling, R. Corey, and H. Branson: The structure of proteins. Proc. Nat. Acad.
Sci., Wash. 37, 205-211, 235-285 (1951).
9. W. Kauzmann: In: The Mechanism of Enzyme Action, ed. by W. McElroy and B. Glass,
Johns Hopkins Press, Baltimore (1954).
10. C. Anfinsen: Informal lecture at the Fourth Buena Vista Conference on Protein Synthesis
(1956).
C. Anfinsen: Advances in Protein Chemistry 11, 1-100 (1956).
D. Steinberg, M. Vaughan, and C. Anfinsen: Kinetic aspects of assembly and degrada-
tion of proteins. Science 124, 389-395 (1956).
11. C. F. Jacobsen and K. Linderstrøm-Lang: Salt linkages in proteins. Nature, Lond. 164,
411-412 (1949).
12. L. Augenstine: Structural interpretations of denaturation data. In: Information Theory
in Biology, ed. by H. Quastler, 119-124, University of Illinois Press, Urbana (1953).
"Trypsin monolayers at the air-water interface," PhD thesis at the University of Illinois
(1956).
13. L. Augenstine: Remarks on Pauling's protein models. In: Information Theory in Biology,
ed. by H. Quastler, 75-83, University of Illinois Press, Urbana (1953).
14. H. Quastler: The specificity of elementary biological functions. In: Information Theory in
Biology, ed. by H. Quastler, 170-190, University of Illinois Press, Urbana (1953).
15. S. Dancoff and H. Quastler: The information content and error rate of living things.
In: Information Theory in Biology, ed. by H. Quastler, 263-273, University of Illinois
Press, Urbana (1953).
16. J. Sri Ram, L. Terminiello, M. Bier, and F. Nord: On the mechanism of enzyme
action. LVIII. Acetyltrypsin, a stable trypsin derivative. Arch. Biochem. Biophys. 52,
464-477 (1954).
17. Worthington Biochemical Corp.: Descriptive manual 8, Freehold, N.J.
18. F. Haurowitz: Protein synthesis and immunochemistry. In: Information Theory in Biology,
ed. by H. Quastler, 125-146, University of Illinois Press, Urbana (1953).
19. I. Raacke: Heterogeneity studies on several proteins by means of zone electrophoresis
on starch. Arch. Biochem. Biophys. 62, 184-195 (1956).
20. G. Kalnitsky and W. Rogers: The activity of ribonuclease after digestion with
carboxypeptidase. Biochim. Biophys. Acta 20, 378-386 (1956).
21. C. Anfinsen: The inactivation of ribonuclease by restricted pepsin digestion. Biochim.
Biophys. Acta 17, 593-594 (1955).
22. S. Kalman, K. Linderstrom-Lang, M. Ottesen, and F. Richards: Degradation of
ribonuclease by subtilisin. Biochim. Biophys. Acta 16, 297 (1955).
23. J. Gergely: Studies on myosin-adenosinetriphosphatase. J. Biol. Chem. 200, 543-550
(1953).
24. A. Szent-Gyorgyi: Meromyosins, the subunits of myosin. Arch. Biochem. Biophys.
42, 305-320 (1953).
25. E. Mihalyi: Trypsin digestion of muscle proteins. II. The kinetics of the digestion.
J. Biol. Chem. 201, 197-209 (1953).
E. Mihalyi and A. Szent-Gyorgyi: Trypsin digestion of muscle proteins. I. Ultra-
centrifugal analysis of the process. J. Biol. Chem. 201, 189-196 (1953).
26. F. Nord, M. Bier, and L. Terminiello: On the mechanism of enzyme action. LXI.
The self digestion of trypsin, Ca-trypsin and acetyltrypsin. Arch. Biochem. Biophys.
65, 120-131 (1956).
27. M. Kunitz: Formation of new crystalline enzymes from chymotrypsin. J. Gen. Physiol.
22, 207-237 (1938).
28. J. Gladner and H. Neurath: Carboxyl terminal groups of proteolytic enzymes. II.
Chymotrypsins. J. Biol. Chem. 206, 911-929 (1954).
29. J. Harris, C. Li, P. Condliffe, and N. Pon: Action of carboxypeptidase on hypophyseal
growth hormone. J. Biol. Chem. 209, 133-143 (1954).
30. R. Hill and E. Smith: Crystalline papain. Biochim. Biophys. Acta 19, 376-377
(1956).
31. G. Perlmann: Formation of enzymatically active dialysable fragments during auto-
digestion of pepsin. Nature, Lond. 173, 406 (1954).
32. G. Perlmann: Discussion of paper by D. Koshland in J. Cell. Comp. Physiology, 47,
Supplement 1, 217-234 (1956).
33. C. Anfinsen, W. F. Harrington, Aa. Hvidt, K. Linderstrom-Lang, M. Ottesen,
J. Schellman: Studies on the structural basis of ribonuclease activity. Biochim. Biophys.
Acta 17, 141-142 (1955).
34. A. Stearn: Kinetics of biological reactions with special reference to enzymic processes.
Advances in Enzymology 9, 25-75 (1949).
35. H. Neurath, G. Cooper, and J. Erickson: The denaturation of proteins and its apparent
reversal. J. Phys. Chem. 46, 203-211 (1942).
H. Neurath, J. Greenstein, F. Putnam, and J. Erickson: The chemistry of protein
denaturation. Chem. Rev. 34, 157-265 (1944).
36. L. Augenstine and R. Ray: Trypsin monolayers at the air-water interface. III. Structural
postulates on inactivation. J. Phys. Chem. 61, 1385-1388 (1957).
37. L. Augenstine: Discussion of a proposed mechanism of protein inactivation. (Part IV
of this volume.)
38. L. Augenstine: Information and thermodynamic entropy. In: Information Theory in
Biology, ed. by H. Quastler, 16-20, University of Illinois Press, Urbana (1953).
39. E. Kennedy and D. Koshland: Properties of the phosphorylated active site of phos-
phoglucomutase. J. Biol. Chem. 228, 419-433 (1957).
40. J. D. Watson and F. H. C. Crick: Molecular structure of nucleic acids. Nature, Lond.
171, 737-738 (1953).
41. H. Fraenkel-Conrat: Rebuilding a virus. Sci. Amer. 194 (6), 42-47 (1956).
42. H. Crane: Principles and problems of biological growth. Sci. Monthly 70, 376-389
(1950).
43. R. Lumry and H. Eyring: Conformation changes of proteins. J. Phys. Chem. 58,
110-120 (1954).
44. E. Schrödinger: What is Life? Cambridge University Press, Cambridge, England (1946).
45. E. Teller: The crossing of potential surfaces. J. Phys. Chem. 41, 109-116 (1937).
46. R. Lumry and H. Eyring: Energy exchange in photoreactions. In: Radiation Biology III,
ed. by A. Hollaender, 1-70, McGraw-Hill Book Co. (1956).
47. P. Finkelstein and A. McLaren: Photochemistry of proteins VI. pH dependence of
quantum yield and UV absorption spectrum of chymotrypsin. J. Poly. Sci. 4, 573-582
(1953).
48. H. Simon: On a class of skew distribution functions. Biometrika 42, 425-440 (1955).
DISCUSSION
Platt: Simon (48) has shown that skewed distributions (Yule distributions), such as those
in Fig. 2, can be obtained from models based on probability assumptions much weaker than
those we were looking for. Thus our inability to determine constraints from a study of the
distribution of amino acid and letter frequencies in proteins and words is not surprising.
However (in agreement with our summarizing statement for that section), Simon points out
that the occurrence of a Yule distribution does not obviate more stringent constraints as the
underlying probability mechanism.
SPECIFIC MECHANISMS OF PROTEIN SYNTHESIS
AND INFORMATION TRANSFER IN THE
DEVELOPING CHICK EMBRYO*
H. R. Mahler, H. Walter, A. Bulbenko and D. W. Allmann
Department of Chemistry, Indiana University, Bloomington, Indiana
Abstract — Some preliminary data on precursors and pathways of protein biosynthesis in
chick embryos have been presented. The tentative conclusions stated are:
1. Egg white proteins are not utilized for the synthesis of embryonic proteins up to and
including the ninth day. Soluble proteins added to the yolk are incorporated effectively, and
preferentially to some of the yolk proteins proper.
2. Proteins, peptides and amino acids injected into the yolk sac are incorporated at
approximately equal rates. Considering the relative available pool sizes of the various pre-
cursors present in the egg, added proteins have to be regarded as the preferred amino acid
source of embryonic proteins.
3. A common precursor formed efficiently from proteins and relatively slowly from added
amino acids and peptides is considered a likely intermediate in the process.
4. Homogenates of adult organs injected into embryos can be used to elicit a response
previously reported for organ transplants, i.e. the apparently specific transfer of labeled
material from donor organs to the corresponding organ in the embryonic host. The super-
natant fraction of the cytoplasm appears to be, at least in part, responsible for the results
observed.
I. INTRODUCTION
It is the purpose of this contribution to describe, in brief, some preliminary
experiments on a controlled biosynthetic activity, namely, the precursors and
pathways of protein formation. It differs from most of the papers in this
symposium in dealing with phenomena rather than with concepts and in the
absence of any attempt to establish a functional correlation between these
biological phenomena and information-theoretical abstractions. It shares
with other papers in this volume the properties of being highly tentative,
and in presenting data and comments on a subject to which it is felt information
theory should eventually make significant contributions. With the hope
that arrival of that time might be hastened and that thought and discussion
might be stimulated, our data are presented for consideration. Some of the
results are derived from single experiments only and thus lack further con-
firmation. All of the approaches and conclusions reported are still under active
investigation and thus subject to revision and modification.
* The investigations reported have been supported by grants-in-aid of the National
Heart Institute, National Institutes of Health, U.S. Public Health Service (Grant No. H 2177)
and of the National Science Foundation. This article is contribution No. 746 from the
Department of Chemistry, Indiana University.
Embryos were chosen for the experiments since their cells exhibit two
fundamental and related properties, both apparently controlled by the nuclear
machinery, which set them apart from other cells of higher organisms. These
are: the capacity for replication, that is, rapid yet controlled growth; the
capacity for differentiation, that is, continuous yet controlled change and
evolution (1). Therefore, one might consider this the system of choice for
attempts at discovering how the information content of the hereditary material,
the genetic potentialities, are translated into progressive biochemical capabilities
and thus into physiological and morphological realities (2). The experiments
were done with chick embryos in ovo because of the ease of handling and the
essentially closed and self-contained nature of the experimental system. Further-
more, there is a relative paucity of reliable, modern information available
about their metabolism and that of embryos of higher vertebrates in general,
as contrasted to the large body of knowledge derived from experimental
embryology.
Our eventual aim is to study the initiation, the mode, and the control of
synthesis of highly specific, respiratory enzymes as an indicator of controlled
biosynthetic events; however, our initial investigations deal with the more
modest one of a definition of parameters for embryonic protein synthesis (3).
For any protein formed de novo, as has been pointed out by Spiegelman (4)
essentially three different mechanisms may be envisaged:
1. The rearrangement of pre-existing protein molecules; namely, the
urprotein hypothesis of Northrop (5), with suitable modifications.
2. The accretion of amino acids on to pre-existing proteins or peptides.
3. De novo synthesis from amino acids.
In the special case of the formation of induced enzymes in rapidly dividing
bacterial cells and cell-free systems derived therefrom, the evidence is over-
whelmingly in favor of the third alternative (4, 6). The situation is not nearly
as straightforward in the vertebrate systems studied. On the one hand, for
example, Work and collaborators investigated the synthesis of milk proteins
(7), Velick, Simpson and co-workers the synthesis of several specific enzyme
proteins for muscle (8, 9), and Loftfield and Harris the synthesis of liver
ferritin (10). All this work was in vivo and by different experimental techniques,
but all these authors presented strong evidence for the last alternative and against
the first two. On the other hand Anfinsen and his co-workers, working with
hen's oviduct in vitro, have demonstrated that in short term incubations
incorporation of amino acids into freshly formed ovalbumin is non-uniform,
which is suggestive of the second alternative, but that after longer periods
there is a redistribution towards uniformity (11). Similar results have also
been obtained for ribonuclease and insulin synthesis by pancreas slices.
In the case of the proteins of the chick embryo proper, Francis and Winnick
have presented data on the incorporation of labeled amino acids in free and
protein-bound form as possible precursors of cardiac muscle protein grown
in tissue culture (12). The amino acids of the proteins did not exchange with
large pools of the corresponding unlabeled acid in the medium, and from this
and from experiments with doubly-labeled proteins it was concluded that
proteins could be transferred from a nutrient embryo extract medium to
heart muscle protein without release of free amino acids. Tracer experiments
of this sort, as will be discussed later, do not, however, prove the direct transfer
of protein, but solely suggest that there may not be free equilibration between
the free added amino acid pool and amino acids formed and utilized metaboli-
cally during precursor protein breakdown and product protein formation
respectively.
Another potentially very fruitful line of investigation is provided by some
experiments of Ebert's, the results of which tentatively suggest the incorporation
of organ specific adult proteins into those of embryos subsequent to chorio-
allantoic grafts of the donor organs (3, 13). These researches were the out-
growth of findings by Murphy (14) and by Danchakoff (15), made some
forty years ago, that such transplants of adult chicken spleen lead to a specific
enlargement of the host organs. A systematic re-investigation of the phenomenon
by Weiss led to the conclusion that transplants of kidney and liver, as well
as injections of organ breis of six-day old chick embryos into four-day old
hosts, could lead to similar effects (2). Weiss correctly pointed out that experi-
ments of this sort did not permit a choice between a 'template' or a 'specific
precursor' type of mechanism. Ebert's investigations are designed to shed
some light on this question as well as on the more general ones of protein
synthesis and organ specific growth control in embryonic development.
In our own investigations we have made use of S³⁵-labeled organ homogenates,
isolated proteins, peptides, and amino acids to gain some insight into
the pattern of embryonic protein biosynthesis. In this work we have been
interested not only in the immediate but also in the original precursors, which
in this case must consist of all or part of the egg white and yolk proteins.
Preliminary accounts of some aspects of this work have appeared (16).
II. METHODS AND RESULTS
1. Preparation of Labeled Precursors
In the experiments to be reported in this and subsequent sections S³⁵-labeled
proteins, peptides, and a mixture of amino acids were prepared biosynthetically
as follows: Torulopsis utilis was grown on S³⁵-sulfate (obtained
from Oak Ridge National Laboratory), according to Wood and Perkinson (17).
After extraction with organic solvents (18) the yeast protein was hydrolysed
with a 1:1 mixture of 6N HCl and 90 per cent formic acid. Humin was removed
by centrifugation and a portion of the neutralized hydrolysate, which also
served as source of amino acids in the experiments to be reported, corresponding
to 50 mc of the original S³⁵, was injected intraperitoneally into a laying White
Leghorn hen in two doses, about five hours apart. Eight hours after the second
injection the blood was withdrawn by heart puncture, allowed to clot, and serum
albumin and serum globulin prepared (19). The oviduct was removed from
the hen, and ovalbumin prepared essentially as described by Steinberg and
Anfinsen (11). All proteins were treated with cysteine at a pH of 8.0 to 8.5
to assure removal of exchangeable S³⁵, and then dialysed. Peptides were
prepared by peptic hydrolysis of the proteins. Aliquots of the radioactive
amino acids, peptides, and proteins were prepared by standard methods and
counted. In the tracer experiments, 0.05 to 0.1 ml aliquots of the radioactive
precursor solutions, containing 0.3 to 1.8 mg and 6000 to 25,000 counts per
minute each, were injected into the yolk or the albuminous portion of some two
to three dozen unincubated, embryonated White Rock eggs. The punctures
were sealed with paraffin wax and the eggs then incubated at 38° C under
conditions of controlled humidity. Starting with the fifth and ending with
the ninth day after the injection, embryos were harvested and a number pooled.
The mixture was homogenized for about three minutes in a Potter-Elvehjem
homogenizer in Ringer's isotonic saline solution, made up to 10 ml (fifth and
sixth days) or 20 ml (seventh through ninth days), and precipitated with tri-
chloracetic acid (final concentration, 8 per cent). Dry protein powders were
then prepared and counted (20).
2. Is There Evidence for Selective Utilization of Egg-white or Yolk Proteins?
In the first set of experiments, chicken serum albumin injected into yolk
or egg-white was used as a protein tracer. Table I shows the results of two
series of experiments.

Table I. Injection of Chicken Serum Albumin into Embryonated Eggs

             Egg white                           Egg yolk
Day after    % of injected     Protein wt of     % of injected     Protein wt of
injection    activity found    embryo in mg      activity found    embryo in mg
             per embryo                          per embryo
    5        0.006   0.008     5.5     7         0.79    1.12      5       6.5
    6        0.012   0.100     13      16        1.34    0.31      11      19
    7        0.015   0.029     28      29        2.84    1.58      17      27
    8        0.016             45                4.04    3.35      43      48
    9        0.088   0.133     72      79        2.86    7.28      53      87

The spread of the data is indicative of the precision,
reliability, and reproducibility usually obtained in experiments of this sort.
Let us now make the following assumptions: (a) that the injected protein is
a true tracer for egg-white and yolk protein respectively, i.e. that no permea-
bility or other pool barriers exist for its equilibration with the corresponding
unlabeled egg proteins; and (b) that there is no selectivity in the uptake mechan-
ism of the embryo either for or against a serum albumin tracer as a typical
precursor protein. Now we can calculate data shown in Table II and compare
the observed mean of the amount of protein actually formed, with that expected
on the basis of the above assumptions. The latter value is calculated by
multiplying the weight of total yolk or egg-white protein, about 3000 mg
each, by the per cent of the injected activity incorporated per embryo (from
Table I).
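The calculation just described can be sketched in a few lines. The sketch assumes that the two entries per day in Table I are averaged before being multiplied by the 3000 mg pool; on that assumption the results agree with the Table II figures to within a few tenths of a milligram. The function and variable names are illustrative only and are not taken from the original work.

```python
# Per cent of injected activity found per embryo (Table I), one value per series.
# The day-8 egg-white entry has only one value in the table.
EGG_WHITE_PCT = {5: [0.006, 0.008], 6: [0.012, 0.100], 7: [0.015, 0.029],
                 8: [0.016], 9: [0.088, 0.133]}
YOLK_PCT = {5: [0.79, 1.12], 6: [1.34, 0.31], 7: [2.84, 1.58],
            8: [4.04, 3.35], 9: [2.86, 7.28]}

POOL_MG = 3000.0  # approximate total yolk or egg-white protein, as stated in the text


def expected_protein_mg(pct_values, pool_mg=POOL_MG):
    """Protein (mg/embryo) expected if the tracer mixes freely with the pool."""
    mean_pct = sum(pct_values) / len(pct_values)  # assumption: series are averaged
    return pool_mg * mean_pct / 100.0


for day in sorted(YOLK_PCT):
    white = expected_protein_mg(EGG_WHITE_PCT[day])
    yolk = expected_protein_mg(YOLK_PCT[day])
    print(f"day {day}: egg-white {white:.2f} mg, yolk {yolk:.1f} mg")
```

Averaging the two series is the simplest reading of "the observed mean"; a different weighting would shift the figures slightly without changing the comparison.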
There are profound discrepancies between the calculated and the observed
values. Those for the egg white are only a small fraction of those expected,
while those for the yolk are uniformly about two-fold greater. It is thus
apparent that at least one of the assumptions cited cannot be valid. The
simplest modification would be to postulate that assumption (b) is not true,
and that over the time-period studied egg white proteins are not precursors
of embryonic proteins. Soluble proteins injected into the yolk can be utilized
for this purpose, and may be more efficient than some of the yolk proteins
proper.

Table II. Amounts of Embryonic Protein Formed Compared to that Calculated
from Tracer Data

             Protein (mg/embryo)
Day after                Calculated*
injection    Observed    Egg-white    Yolk
    5        6           0.21         28.8
    6        15          1.68         24.9
    7        29          0.66         66.3
    8        45          0.48         111.0
    9        76          3.30         152.0

* From injected albumin tracer.
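The size of the discrepancy can be made explicit by dividing each calculated value in Table II by the observed one. The short sketch below does only that; the names are illustrative. On these figures the egg-white ratios all fall below about 0.12, while the yolk ratios lie between roughly 1.7 and 4.8.

```python
# Table II values, mg protein per embryo.
OBSERVED = {5: 6, 6: 15, 7: 29, 8: 45, 9: 76}
CALC_EGG_WHITE = {5: 0.21, 6: 1.68, 7: 0.66, 8: 0.48, 9: 3.30}
CALC_YOLK = {5: 28.8, 6: 24.9, 7: 66.3, 8: 111.0, 9: 152.0}


def calc_to_observed(calculated, observed=OBSERVED):
    """Ratio of calculated to observed embryonic protein for each day."""
    return {day: calculated[day] / observed[day] for day in observed}


white_ratio = calc_to_observed(CALC_EGG_WHITE)
yolk_ratio = calc_to_observed(CALC_YOLK)
for day in sorted(OBSERVED):
    print(f"day {day}: egg-white {white_ratio[day]:.3f}, yolk {yolk_ratio[day]:.2f}")
```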
3. Is There Evidence for Selective Utilization of Amino Acids, Peptides or Proteins?
In the next series of experiments we compared serum albumin, albumin
peptides and amino acids all injected into the yolk, with the same precursors
injected into egg white. The design of the experiment was the same as before
and the results of one run are summarized in Table III.

Table III. Incorporation of Protein Precursors into Chick Embryos*

             Precursors injected into yolk      Precursors injected into egg-white
Day after               albumin    amino                   albumin    amino
injection    albumin    peptides   acids        albumin    peptides   acids
    5        0.75       0.44       0.34         0.0063     0.35       1.19
    6        1.30       0.90       1.53         0.013      0.56       3.03
    7        2.80       1.70       3.86         0.015      1.59       3.48
    8        4.05       4.72       5.15         0.016      2.32       4.94
    9        2.85       8.52       9.18         0.088      5.94       5.65

* Expressed as per cent of injected activity recovered per embryo.
We see that except for albumin injected into egg-white, which has already
been discussed, all the precursors tested appear to be utilized with approxi-
mately equal efficiency regardless of whether they are injected into the yolk
or the egg white. This is not limited to serum albumin, but holds true equally
well for serum globulin and ovalbumin and their peptides as is shown in Table IV.
Table IV. Incorporation into Embryos of Proteins and Peptides Injected into
the Yolk*

Day after                                           S. albumin   S. globulin  Ovalbumin
injection    S. albumin   S. globulin  Ovalbumin    peptides     peptides     peptides
    5        0.75         1.10         0.45         0.44         0.20         0.95
    6        1.30         1.75         0.80         0.90         0.55         1.65
    7        2.80         2.35         0.40         1.70         1.15         2.20
    8        4.05         2.55         1.45         4.72         2.20         -
    9        2.85         4.50         2.95         8.52         4.50         6.60

* Expressed as per cent of injected activity recovered per embryo.
4. Is There Evidence for Organ-specific Transfer?
In order to test the hypothesis of organ-specific transfer advanced by
Ebert we have attempted to extend investigations of this sort to the use of
S³⁵-labeled adult chicken liver and heart homogenates. These were prepared
from deep-frozen organs of a White Leghorn hen injected with a mixture of
S³⁵-amino acids, and treated as described above.
After several months the tissues were thawed and homogenized in a
tris-(hydroxymethyl)-aminomethane buffer solution at pH 7.4 containing 0.9 per
cent KCl, first in a Waring blender and then in a Potter-Elvehjem homogenizer.
The liver and heart homogenates, made up to 10 per cent (weight/volume)
with the same buffer solution, were then treated with cysteine at a pH of 8.0
to 8.5 to assure removal of all exchangeable S³⁵. After dialysis, some undissolved
material was removed by low-speed centrifugation, and the relatively
clear supernatant fluid was used for intravenous injection into 9-day-old
chick embryos. Embryonated White Rock eggs were incubated at 38° C
under controlled humidity conditions for a period of 9 days. They were then
candled, and the location of the blood vessels was marked on the shell of each
egg. An area of about 1 cm² of the shell above the vessel was carefully cut out
by means of a dental drill and burr without injuring the membrane, and the
small square was removed with a razor blade. A drop of mineral oil was placed
on the membrane to render it transparent, and 0.1 ml of the liver or heart
homogenate was intravenously injected in the direction of blood flow. The
eggs were reincubated for 24 hours and the embryos were excised. Hearts
and livers were removed, the organs were pooled, and homogenized; dry
protein powders were prepared for counting as described before. Similarly
aliquots of the homogenates used for injection were prepared and counted.
The results of these experiments are given in Table V. In all, two series
of experiments make up the Table. In the first series, twenty-four embryos
each were injected with heart and liver homogenates; of these, twenty-two and
eleven respectively survived.
In the second series, forty-four out of forty-seven embryos injected with the
heart preparation survived, while the number of survivors was twenty-two
out of twenty-eight for the liver homogenate. Thus the table summarizes
data obtained on 99 survivors out of 123 embryos that were injected: 66/71
for heart; 33/52 for liver.
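The survivor totals quoted above follow directly from the per-series counts given in the text. The sketch below simply adds them up and reports the survival rate for each homogenate; the names are illustrative.

```python
# (survivors, injected) for each of the two series, as reported in the text.
HEART = [(22, 24), (44, 47)]
LIVER = [(11, 24), (22, 28)]


def totals(series):
    """Sum survivors and injected embryos over all series."""
    survivors = sum(s for s, _ in series)
    injected = sum(n for _, n in series)
    return survivors, injected


heart_total = totals(HEART)
liver_total = totals(LIVER)
overall = (heart_total[0] + liver_total[0], heart_total[1] + liver_total[1])
for label, (s, n) in [("heart", heart_total), ("liver", liver_total), ("all", overall)]:
    print(f"{label}: {s}/{n} survived ({100 * s / n:.0f} per cent)")
```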
It can be seen that the relative specific activity of hearts is higher than that
of livers when chicken heart homogenate is injected, whereas the relative
specific activity of the livers is higher than that of hearts when chicken-liver
homogenate is injected.
Table V. Incorporation of Activity from Adult-Tissue Homogenates into Nine-Day
Embryos after Twenty-four-hour Incubation

Injection                          Chicken heart homogenate        Chicken liver homogenate
Item
Count/min per embryo injected           398             398            2780            2780
mg injected per egg                     0.1             0.1             0.1             0.1
Organs investigated              Hearts  Livers  Hearts  Livers  Livers  Hearts  Livers  Hearts
No. of organs cut out                22      11      22      11      11      11      11      11
Dry protein wt of organs
  obtained (mg)                    38.2    72.0    38.8    70.0    84.7    20.9    77.6    22.4
Wt counted (mg)                    18.3    29.8    23.4    30.0    30.1    11.6    30.2    12.6
Count/min observed*                  21      24      22      19     366     173     389     214
Corrected count/min per 30 mg        28      24      25      19     365     286     386     340
Relative specific activity         1.00    0.86    1.00    0.76    1.00    0.78    1.00    0.87

* Counts per minute are within 5 per cent standard deviation.
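The last row of Table V can be checked against the row above it. The sketch below assumes that the relative specific activity is the corrected count of each organ divided by the corrected count of the organ matching the injected homogenate, which is set to 1.00; this definition is inferred from the tabulated numbers rather than stated in the text. Three of the four ratios reproduce the table to two decimals; the fourth comes out near 0.88 against the tabulated 0.87.

```python
# Corrected count/min per 30 mg from Table V, keyed by (homogenate, series).
# Each entry is (organ matching the homogenate, other organ).
CORRECTED = {
    ("heart", 1): (28, 24),
    ("heart", 2): (25, 19),
    ("liver", 1): (365, 286),
    ("liver", 2): (386, 340),
}


def relative_specific_activity(homologous, other):
    """Activity of the other organ relative to the homologous organ (taken as 1.00)."""
    return other / homologous


for (homogenate, series), (homologous, other) in CORRECTED.items():
    rsa = relative_specific_activity(homologous, other)
    print(f"{homogenate} homogenate, series {series}: {rsa:.2f}")
```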
III. CONCLUSIONS
The experiments on soluble protein tracers added to yolk and egg-white
demonstrate quite clearly that proteins added to the egg-white or, probably,
egg-white proteins themselves are incorporated with such low efficiency as
to rule out any important contribution from this source to the protein of the
developing embryo, at least up to and including the ninth day. Incorporation
of protein from the yolk is rapid, and soluble proteins injected into this source
may be utilized preferentially to some of the yolk proteins themselves. This
utilization of yolk rather than egg-white proteins as a source of embryonic
protein during this period is in accord with other investigations, notably the
quantitative protein depletion studies of Rupe and Farmer (21). For the
intervals studied, amino acids, peptides and proteins, even those of relatively
'foreign' origin such as the serum proteins, all apparently provide an equally
acceptable source of S³⁵ for embryonic protein synthesis (within an order of
magnitude or so), provided they are injected into the yolk. Now the protein
tracer must be diluted by at least a portion of the 3.0 g or so of yolk protein —
an estimate of approximately 50 per cent would appear reasonable in view of
the results reported above. On the other hand, amino acids or peptides cannot
be diluted to any appreciable extent since the pools of these substances in the
egg are vanishingly small (22). From this one might conclude that proteins
themselves or substances easily formed from them must be the preferred precur-
sors of embryonic proteins. Since the egg protein ovalbumin is used no more
efficiently than the more "foreign" serum proteins, the pathways of assimilation
for these precursors, available to the embryo, must have at least some inter-
mediates in common. The data on peptides may find a similar interpretation.
These intermediates are not free amino acids, as evidenced by their relatively
low incorporation rates. They may be small peptides or activated forms of
amino acids, formed readily and reversibly from protein precursors, but not
identical and not in equilibrium with the pool of added low-molecular weight
precursors. This view would be in accord with the findings of Francis and
Winnick (12), although not with their interpretation. The occurrence of
pools of modified amino acids, incapable of equilibrating with those in the
medium, has been demonstrated in micro-organisms. Thus Gale, working
with Staphylococcus aureus, found that added glutamic acid could be so transformed,
and the modified form used for protein synthesis (24). Similarly
Cowie and Walton (25) have presented evidence that the pools of amino
acids formed metabolically in Torulopsis utilis and utilized as effective precur-
sors in protein synthesis, are present in some modified form, possibly as com-
plexes adsorbed onto macromolecules, and do not equilibrate freely with
added amino acids in the medium. In all the cases presented, this metaboli-
cally active form of the amino acids may be formed by a variety of pathways
as indicated below.
                        Proteins
                           |
                           v
                [Peptide Intermediates]
                           |
                           v
Free peptides  -->   'Amino Acids'   <--  Free amino acids
                      (modified)
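The dilution argument made earlier in this section can be put in rough numbers. The sketch below is an illustration under stated assumptions, not a calculation from the original work: it takes the day-7 yolk incorporation figures from Table III, the upper end of the 0.3 to 1.8 mg injected per egg, about half of the 3.0 g of yolk protein as the diluting pool, and treats the endogenous free amino acid pool as zero because the text calls it vanishingly small. The function name is hypothetical.

```python
def mass_delivered_mg(pct_incorporated, endogenous_pool_mg, injected_mg):
    """Precursor mass reaching the embryo if the label mixes evenly with the pool."""
    return pct_incorporated / 100.0 * (endogenous_pool_mg + injected_mg)


INJECTED_MG = 1.8            # upper end of the 0.3 to 1.8 mg injected per egg
YOLK_PROTEIN_POOL_MG = 1500  # about 50 per cent of the 3.0 g of yolk protein
AMINO_ACID_POOL_MG = 0.0     # assumption: endogenous free pool taken as negligible

# Day-7 incorporation after injection into the yolk (Table III).
protein_mg = mass_delivered_mg(2.80, YOLK_PROTEIN_POOL_MG, INJECTED_MG)
amino_acid_mg = mass_delivered_mg(3.86, AMINO_ACID_POOL_MG, INJECTED_MG)

print(f"via protein pool:    {protein_mg:.1f} mg")
print(f"via amino acid pool: {amino_acid_mg:.3f} mg")
print(f"ratio: {protein_mg / amino_acid_mg:.0f}")
```

Even with similar percentages of label recovered, the mass supplied through the protein pool comes out several hundred times larger on these assumptions, which is the sense in which proteins appear to be the preferred source.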
Recent investigations, especially by Zamecnik and his collaborators (26),
have disclosed that free amino acids are first 'activated' by enzymes in the
soluble portion of the cytoplasm (27), probably through mixed anhydride
formation with adenylic acid (27, 29, 30) prior to their incorporation into a
protein-bound form (30, 31), which takes place in RNA-rich granules associated
with the microsomal fraction of homogenates (32, 33, 34). Whether or not
the metabolically active form of amino acids alluded to above can be equated
with these aminoacyl adenylates has not yet been established.
An alternative explanation, which has been invoked to account for apparent
preferential utilization of proteins over amino acid precursors in the formation
of specific proteins, postulates proteolysis and protein synthesis sites in such
close spatial juxtaposition as to permit ready transfer of intermediates from
breakdown to synthesis site at the expense of penetration of the latter by
added amino acids. This has been suggested by Loftfield and Harris (10)
as the mechanism operative in ferritin synthesis, and by Walter et al. (20)
in the transformation of serum into organ proteins. Purely spatial factors
of this sort are probably not the determining ones in the present instance,
since it can be demonstrated that the bulk of the proteolytic activity is centred
in the yolk (23), and thus remote from the synthetic activity which is, presum-
ably, occurring in the embryo itself. It is hoped that critical experiments
now in progress will permit a choice to be made between the various alter-
natives suggested.
We have shown that the organ-specific localization phenomenon, previously
observed with chorio-allantoic transplants, can be duplicated by the injection
of homogenates of adult tissue. Similarly Tumanishvili et al. (35) found almost
simultaneously that host organ enlargement could also be elicited by the same
technique. This demonstration of the essential similarity of two approaches
clears the way for an investigation of the problem by means of relatively
straightforward biochemical and enzymological techniques rather than the
more demanding ones of experimental embryology. Obviously only a bare
beginning has been made. The findings will have to be confirmed and extended
and several relatively trivial explanations excluded. Among such explanations
are, for instance, the transfer of whole cells on the one hand, and differential
composition and/or incorporation rates with respect to cystine and methionine
in the two tissues studied, on the other. Ebert claims to have eliminated both
these alternatives in his transplantation experiments; in the light of the available
information, they are not very likely in the present case. Nevertheless they
will have to be rigorously excluded. Our tentative interpretation of the prelimi-
nary results described is identical with that advanced by Ebert: that we are
dealing with a specific transfer of rather large units from the donor preparation
to the embryonic organ.
Preliminary experiments indicate that the injection of either heart or liver
(donor) homogenates leads to an increase in specific activity in the liver as
compared to the heart. The effect in this case is therefore non-specific and
possibly related to the higher mitotic and synthetic activity of liver relative
to heart, i.e. to fuller differentiation. Another line of approach which promises
to be of some interest is to determine the cell fraction or fractions, if any,
responsible for eliciting the effect both with respect to the donor and the acceptor
organ. Impetus is added to this approach by the recent experiments which
have focussed attention on the soluble and microsomal fractions as being
involved in the initial phases of protein synthesis. In preliminary experiments
with fractionated, dialysed heart homogenates the data of Table VI were
obtained.

Table VI. Transfer of Label from Donor Heart Fractions into Organs of Recipient Embryos

Fraction        Relative specific activity of embryonic organs (heart/liver)
Homogenate      1.17, 1.32, 1.23
Nuclei          0.65, 0.74
Mitochondria    0.22 (?)
Microsomes      2.56
Soluble         1.85, 2.50, 1.49

The number of data in each row corresponds to the number of
experiments actually performed. Thus the results for the microsomal and
mitochondrial fractions must be regarded as exceedingly tentative. With this
proviso, components of the soluble fraction of the cytoplasm might be regarded
as responsible for the phenomenon observed with whole heart homogenates.
A similar observation has been reported by Kutsky who found the supernatant
fraction of embryo extract to be most active in stimulating the growth of heart
fibroblasts in vitro (36).
REFERENCES
1. For a very stimulating and up to date review the reader is referred to: B. Ephrussi:
Enzymes in Cellular Differentiation, in: O. H. Gaebler: Enzymes: Units of Biological
Structure and Function, Academic Press, New York City, 29-40 (1956).
2. For an intriguing hypothesis involving control of organ development by diffusible com-
ponents see the following articles: P. Weiss, in A. K. Parpart: The Chemistry and
Physiology of Growth, Princeton University Press, 135-186 (1949).
P. Weiss: Self regulation of organ growth by its own products. Science 115, 487-488
(1952).
P. Weiss: Some introductory remarks on the cellular basis of differentiation. J. Embryol.
Exp. Morph. 1, 181-211 (1953).
3. The present state of knowledge, with special emphasis on protein biosynthesis, is admirably
reviewed in the following: J. D. Ebert: Some aspects of protein biosynthesis in develop-
ment, in: D. Rudnick: Aspects of Synthesis and Order in Growth, Princeton University
Press, 69-112 (1956).
4. S. Spiegelman: On the nature of the enzyme-forming system, in: O. H. Gaebler: Enzymes:
Units of Biological Structure and Function, Academic Press, New York City, 67-89
(1956).
5. R. S. Alcock: The synthesis of proteins in vivo. Physiol. Rev. 16, 1-18 (1936).
J. H. Northrop, M. Kunitz, and R. M. Herriott: Crystalline Enzymes, 2nd ed.,
Columbia University Press, New York (1948).
S. B. Koritz and H. Chantrenne: The relationship of ribonucleic acid to the in vitro
incorporation of radioactive glycine into the proteins of reticulocytes. Biochim. Biophys.
Acta 13, 209-215 (1954).
6. D. S. Hogness, M. Cohn, and J. Monod: Studies on the induced synthesis of
β-D-galactosidase in Escherichia coli. The kinetics and mechanism of sulfur incorporation.
Biochim. Biophys. Acta 16, 99-116 (1955).
7. P. N. Campbell and T. S. Work: The biosynthesis of protein. Uptake of glycine, serine,
valine and lysine by the mammary gland of the rabbit. Biochem. J. 52, 217-227 (1952).
B. A. Askonas, P. N. Campbell, and T. S. Work: The biosynthesis of proteins. Synthesis
of milk proteins by the goat. Biochem. J. 58, 326-331 (1954).
B. A. Askonas, P. N. Campbell, C. Godin, and T. S. Work: Biosynthesis of proteins.
Precursors in the synthesis of casein and β-lactoglobulin. Biochem. J. 61, 105-115 (1955).
C. Godin and T. S. Work: Biosynthesis of proteins. The effect of intravenous peptides
on casein synthesis in a lactating goat. Biochem. J. 63, 69-71 (1956).
8. M. V. Simpson and S. F. Velick: The synthesis of aldolase and glyceraldehyde-3-
phosphate dehydrogenase in the rabbit. J. Biol. Chem. 208, 61-71 (1954).
M. V. Simpson: Further studies on the biosynthesis of aldolase and glyceraldehyde-3-
phosphate dehydrogenase. J. Biol. Chem. 216, 179-183 (1955).
9. M. Heimberg and S. F. Velick: The synthesis of aldolase and phosphorylase in rabbits.
J. Biol. Chem. 208, 725-730 (1954).
S. F. Velick: The metabolism of myosin, the meromyosins, actin and tropomyosin in
the rabbit. Biochim. Biophys. Acta 20, 228-236 (1956).
10. R. B. Loftfield and A. Harris: Participation of free amino acids in protein synthesis.
J. Biol. Chem. 219, 151-159 (1956).
11. D. Steinberg and C. B. Anfinsen: Evidence for intermediates in ovalbumin synthesis.
J. Biol. Chem. 199, 25-42 (1952).
C. B. Anfinsen and D. Steinberg: Studies on the biosynthesis of ovalbumin. J. Biol.
Chem. 189, 739-744 (1951).
M. Vaughan and C. B. Anfinsen: Nonuniform labeling of insulin and ribonuclease
synthesized in vitro. J. Biol. Chem. 211, 367-374 (1954).
M. Flavin and C. B. Anfinsen: The isolation and characterization of cysteic acid
peptides in studies of ovalbumin synthesis. J. Biol. Chem. 211, 375-390 (1954).
D. Steinberg, M. Vaughan, and C. B. Anfinsen: Kinetic aspects of assembly and
degradation of proteins. Science 124, 389-395 (1956).
12. M. D. Francis and T. Winnick: Studies on the pathway of protein synthesis in tissue
culture. J. Biol. Chem. 202, 273-289 (1953).
13. J. D. Ebert: The effects of chorioallantoic transplants of adult chicken tissues on homologous
tissues of the host chick embryo. Proc. Nat. Acad. Sci., Wash. 40, 337-347 (1954).
14. J. B. Murphy: The effect of adult chicken organ grafts on the chick embryo. J. Exp.
Med. 24, 1-6 (1916).
15. V. Danchakoff: Equivalence of different hematopoietic anlagen (by method of stimula-
tion of their stem cells). II. Grafts of spleen on the allantois and response of allantoic
tissues. Amer. J. Anat. 24, 127-189 (1918).
16. H. Walter, A. Bulbenko, and H. R. Mahler: Precursors of embryonic chick proteins.
Nature, Lond. 178, 1176-1177 (1956).
H. Walter, D. W. Allmann, and H. R. Mahler: Influence of adult tissue homogenates
on formation of similar embryonic proteins. Science 124, 1251-1252 (1956).
17. J. L. Wood and J. D. Perkinson Jr. : Yeast biosynthesis of radioactive sulfur compounds.
J. Amer. Chem. Soc. 74, 2444-2445 (1952).
18. R. B. Williams and R. M. C. Dawson: The biosynthesis of L-cystine and L-methionine
labeled with radioactive sulphur S35. Biochem. J. 52, 314-317 (1952).
19. W. Friedberg, H. Walter, and F. Haurowitz: The fate in rats of internally and externally
labeled heterologous proteins. J. Immunol. 75, 315-320 (1955).
20. H. Walter, F. Haurowitz, S. Fleischer, A. Lietze, H. F. Cheng, J. E. Turner, and
W. Friedberg: Metabolic fate of injected homologous serum proteins in rabbits. J.
Biol. Chem. 224, 107-119 (1957).
21. C. O. Rupe and C. J. Farmer: Amino acid studies in the transformation of proteins
of the hen's egg to tissue proteins during incubation. J. Biol. Chem. 213, 899-906 (1955).
22. A. L. Romanoff and A. J. Romanoff: The Avian Egg. J. Wiley and Sons, New York
City. (1949).
23. A. L. Romanoff: Membrane growth and function. Ann. N.Y. Acad. Sci. 55, 288-301
(1952).
24. E. F. Gale: Assimilation of amino acids by gram-positive bacteria and some actions of
antibiotics thereon. Advances in Protein Chemistry 8, 285-391 (1953).
25. D. B. Cowie and B. P. Walton: Kinetics of formation and utilization of metabolic
pools in the biosynthesis of protein and nucleic acid. Biochim. Biophys. Acta 21, 211-226
(1956).
26. M. D. Hoagland, E. B. Keller, and P. C. Zamecnik: Enzymatic carboxyl activation of
amino acids. J. Biol. Chem. 218, 345-358 (1956).
27. P. Siekevitz: Uptake of radioactive alanine in vitro into the proteins of rat liver fractions.
J. Biol. Chem. 195, 549-565 (1952).
28. J. A. DeMoss and G. D. Novelli: An amino acid dependent exchange between inorganic
pyrophosphate and ATP in microbial extracts. Biochim. Biophys. Acta 18, 592-593
(1955).
29. P. Berg and G. Newton: Acyl adenylates: the interaction of adenosinetriphosphate and
L-methionine. J. Biol. Chem. 222, 1025-1034 (1956).
30. E. B. Keller and P. C. Zamecnik: The effect of guanosine diphosphate and triphosphate
on the incorporation of labeled amino acids into proteins. J. Biol. Chem. 221, 45-59
(1956).
31. H. Sachs and H. Waelsch: The effect of pyrophosphate on amino acid incorporation
into rat liver microsomes. Biochim. Biophys. Acta 21, 188-189 (1956).
32. J. W. Littlefield, E. B. Keller, J. Gross, and P. C. Zamecnik: Studies on cytoplasmic
ribonucleoprotein particles from the liver of the rat. J. Biol. Chem. 217, 111-123 (1955).
33. M. V. Simpson and J. R. McLean: The incorporation of labeled amino acids into the
cytoplasmic particles of rat muscle. Biochim. Biophys. Acta 18, 573-575 (1955).
34. G. C. Webster and M. P. Johnson: Effect of ribonucleic acid on amino acid incorporation
by a particulate preparation from pea seedlings. J. Biol. Chem. 217, 641-649 (1955).
35. S. Tumanishvili, K. M. Djandier, and I. K. Skanidze: Specific stimulation of the
growth of the chicken embryo by the effects of tissue extracts. C.R. Acad. Sci., U.R.S.S.
106, 1107-1109 (1956).
36. R. J. Kutsky: Growth stimulating effects by nucleoprotein and cell fractions on chick
heart fibroblasts in vitro. U.S. Atom. Energ. Comm., U.C.R.L. No. 2270 (1953).
DISCUSSION
Quastler: It is useful to compare the informational requirements of various alternative
methods of protein synthesis.
If the whole protein is synthesized directly from amino acids, then each locus on the
template must carry sufficient information to specify a single amino acid, or approximately
four bits; this is well within the informational capacities of chemical reactions. If the incor-
poration occurs in two steps, as has been suggested, then each step might have to specify no
more than two bits.
If the protein is synthesized from peptide chains, then the informational requirements are
much more stringent. Consider the linking of two peptide chains of, say, five amino acids each.
If each of the ten amino acids can be any one of the whole set of amino acids, then the linking
operation must, in some way, identify ten amino acids, for a total of about forty bits — which
is a very large amount of information to be processed in a single act. The requirements are
greater — in fact, almost certainly too great — if two chains of ten amino acids are to be linked.
The following possibilities exist which allow the use of large fragments without imposing high
informational requirements : (a) the terminal amino acid in a chain identifies automatically
the other members — this would imply very strong sequential dependencies within peptide
chains, and consequently a low informational capacity of the whole amino acid sequence;
(b) linkages are formed without reference to the nature of residues remote from the locus of
linkage, and the resulting proteins are torn down again if not functional — in this case, the
probability of producing functional sequences by chance is small, and the efficiency of protein
synthesis is low; or (c) the protein studied is such that the exact sequence of residues is irrelevant.
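
The arithmetic behind these estimates can be sketched as follows. The sketch assumes a set of 20 equiprobable amino acids (the discussion speaks only of "the whole set"), so that one residue corresponds to log2 20, or a little over four bits; the function name and the choice of 20 are illustrative assumptions, not part of the discussion.

```python
import math

def bits_to_specify(n_residues, alphabet_size=20):
    """Bits needed to single out one sequence of n_residues when each
    residue is drawn from alphabet_size equiprobable symbols.
    The default of 20 amino acids is an assumption of this sketch."""
    return n_residues * math.log2(alphabet_size)

per_residue = bits_to_specify(1)          # one locus on the template
two_pentapeptides = bits_to_specify(10)   # linking two chains of five residues
two_decapeptides = bits_to_specify(20)    # linking two chains of ten residues

print(f"one amino acid:        {per_residue:.1f} bits")
print(f"two chains of five:    {two_pentapeptides:.1f} bits")
print(f"two chains of ten:     {two_decapeptides:.1f} bits")
```

Under these assumptions the requirement grows linearly with the number of residues that a single linking operation must identify, which is the reason the larger fragments look prohibitive unless one of the possibilities (a) to (c) applies.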
THE MECHANISM OF ACTION OF METHYL
XANTHINES IN MUTAGENESIS
Arthur L. Koch
Department of Biochemistry, College of Medicine, J. Hillis Miller
Health Center, University of Florida, Gainesville, Florida
Abstract — The biochemical findings relating to the action of methyl xanthines on bacteria
and bacterial extracts have been reviewed. These observations, together with those of Novick
and Szilard on the mutagenic activity of these substances, have suggested that the biological
action results from an inhibition of enzymes of nucleic acid biosynthesis. Consequences of
this hypothesis have been discussed relative to the regulation of growth of cell constituents.
Alternative hypotheses are enumerated.
I. INTRODUCTION
A NUMBER of agents, both chemical substances and radiations, cause mutations.
One particular class appears to be potentially most fruitful in an attempt
to understand the genetic replication process. This class includes purines and
related compounds. Particularly important are the plant alkaloids responsible
for the pharmacological effects of coffee, tea and cocoa. If these substances
are added to a continuously growing culture of bacteria, the mutation rate
is caused to increase markedly (1, 2).
If we compare the structures (Fig. 1) of the alkaloids caffeine, theobromine
and theophylline, with the purine bases normally present in nucleic acids of
all species, adenine and guanine, the similarity is readily apparent. The former
are methyl derivatives of xanthine, the latter amino and deoxy derivatives of
xanthine. It is tacitly assumed that these agents are mutagens because of this
similarity.

Fig. 1. The structure of purine derivatives (caffeine, theobromine, theophylline, xanthine, adenine, guanine).
II. TRACER STUDIES
The first possibility to test was that these compounds, or products derived
from them, are utilized for the synthesis of the nucleic acid of the host (3).
To do this we prepared these substances as well as some others, labeled with
carbon 14 in the 8-position of the heterocyclic nucleus. These were then added
to growing cultures of Escherichia coli under conditions similar to those employed
by Novick and Szilard (1, 2) in their studies.
In Table I the data so obtained are presented. Adenine and guanine as
well as the deaminated derivatives are very well incorporated into the nucleic
acids of both the RNA and DNA type, whereas all methylated substances
are incorporated only to a very small extent, if at all. On the other hand,
the correlation of mutagenesis is the reverse.

Table I. Incorporation and Mutagenicity of Various Purines

                RSA* of DNA purines    Mutagenicity
Adenine         0.3                    +
Guanine         0.20                   ±
Hypoxanthine    0.30
Xanthine        0.20                   -
Theobromine     0.00002                + + +
Caffeine        0.00001                + + +
Theophylline    0.00001                + + +

*RSA = relative specific activity = ratio of the specific activity of the
purine isolated from the bacteria to that of the growth medium.

A mutation is a very rare event, and though these agents, when present
in quite high concentration, may raise the mutation rate by a factor of fifteen
or so, this still only corresponds to one event in 10'^ duplications.
The small amount of radioactivity that is found associated with the DNA
from cells grown in the presence of radioactive mutagens is probably experi-
mental contamination. However, although these experiments are technically
excellent, they cannot begin to exclude the possibility that a methylxanthine
molecule is incorporated into the DNA molecule in the process of the rare
mutational event itself, since the resultant incorporation for one locus would
be many orders of magnitude below the trace amount observed here. Considera-
tion of the structures of these substances, however, makes this possibility
rather unlikely.
In the formation of the normal 9-N-riboside or 9-N-deoxyriboside linkages,
the single replaceable hydrogen which may be in either the 7- or 9-position is
replaced by the glycosyl residue. In the case of caffeine or theobromine, which
are 7-methyl derivatives, this is not possible because of the prior replacement
of the hydrogen by the methyl group. Thus even though the methyl group is
attached to the 7-position it prevents bond formation at the 9-position. Con-
sequently, the methyl group must be removed if the molecule is to be incor-
porated into the nucleic acids.
The isotopic data, as well as other information, are adequate to demonstrate
that there is not a single molecule of enzyme present in these bacteria capable
of removing this methyl group (3). Therefore it would appear that certain of the
mutagenic materials are not and cannot be converted into a form in which they
can be linked covalently to cell materials, not at least by the 9-N-glycosyl
bond which has been universally found in biological materials.
III. PURINE METABOLISM IN ESCHERICHIA COLI
The next possibility we investigated was that the mutagens act by interfering
with nucleic acid biosynthesis. First, however, it is necessary to discuss
the metabolism of the organism under study. Fig. 2 summarizes, from the
available tracer data, the pathways of purine synthesis in growing cultures
of the test organism (4, 5, 6, 7, 8). C14-labeled CO2 (4), glycine (8), and serine
or formate (unpublished) lead to the formation of RNA adenine, DNA adenine,
RNA guanine, and DNA guanine, all of equal specific activity. The activity
in the purines derived from CO2 and glycine is such as to indicate that the
well-accepted scheme for purine biosynthesis is the major pathway in this
organism (4). C14-labeled adenine and hypoxanthine and their derivatives
yield adenine samples of equal, but lower, specific activity in both RNA and
DNA. From these facts it is inferred that there are three pools at which purine
metabolism branches, namely, a 'purine' pool which is common to all cellular
purines, and an 'adenine' and a 'guanine' pool which are precursors of the
corresponding purine in both types of nucleic acid. So far, attempts to find
a precursor which enters purine metabolism at some point beyond the 'adenine'
or 'guanine' pool have failed. Even when the intracellular adenine-C14
ribonucleotides were specifically labelled (5), the incorporation into the purines
of the ribose nucleic acid was equal to that in the deoxyribose nucleic acids.
It should be mentioned that in organisms under conditions of rapid growth,
the soluble intermediate pool concentrations relevant to this scheme are small
(5). It was impossible to demonstrate guanosine, adenine deoxyriboside,
guanine deoxyriboside or phosphorylated derivatives.

Fig. 2. The purine metabolism of Escherichia coli
Although the tracer data delineate the pathways, they do not define the
intermediates. It is, however, possible to conclude from available enzyme data
that 'adenine' and 'guanine' pools are made up at least in part of the free
bases themselves. This follows from the fact that the known enzymes of purine
metabolism which might be involved in the conversion of the hypothetical
'purine' precursor to the two types of nucleic acids catalyze reactions involving
the free purine base. The purine nucleoside hydrolases, purine nucleoside
phosphorylases, purine N-trans-glycosidases, and purine nucleotide pyrophosphorylases
yield the free purine base. These enzymes and the postulated
pathway of direct reduction of the riboside to the deoxyriboside constitute
the only pathways of interconversion of ribose and deoxyribose purine com-
pounds that can be imagined at present. Since the reductive pathway is known
not to occur in E. coli (9) (although the interesting work from Volkin's labora-
tory may be relevant (10)), it appears quite likely that the free purine base is
involved in the 'adenine' and 'guanine' pools.
In addition to these general considerations, the specific observation of
Lampen and Manson (11) that purine deoxyriboside phosphorylase is inhibited
by adenine led us to investigate the inhibition of phosphorylases of E. coli
by methyl xanthines.
IV. ENZYMATIC INHIBITION STUDIES
The main conclusion from these studies (12, 13) was that the organism
possesses enzymes, particularly nucleoside phosphorylases of both types
(ribose and deoxyribose), that are inhibited by purines generally but specifically
by the mutagenic substances. It was also found that even in the presence of
large amounts of inhibitors enzyme action was not completely repressed
(Fig. 3). In all cases this suggested the presence of more than one enzyme
catalyzing the reaction under study. Studies of the effect of pH and the separation
of the bacteria into several chemical fractions supported this notion.
The activity in various fractions was differently affected by caffeine and this
effect was different in acid and at neutrality and at alkaline reaction (see Table
II). This finding explains the relatively low toxicity and bacteriostatic power
of the plant alkaloids.

Fig. 3. The inhibition of purine nucleoside phosphorylase.
The effect of caffeine concentration (mM) on the arsenolysis of adenine riboside is shown
at the left, and on adenine deoxyriboside on the right. The systems contain
arsenate to prevent the complication of back reaction.

Table II. Inhibition of Inosine Arsenolysis by Caffeine

                           Inhibition produced by 10 µmoles     Distribution of activity
                           caffeine per ml (per cent)           (measured at pH 7)
Enzyme preparation* No.    pH 5.0     pH 7.0     pH 9.0         (per cent)
6-1 (soluble)              29         59         35             67
6-2 (particulate)          64         97         46             17
6-3 (phosphate extract)    78         78         6              16
TOTAL                                                           100

* Enzyme Preparation 6-1 was most active at pH 5, preparation 6-2 at pH 9, and preparation
6-3 at pH 7.

In more recent work (13) three new enzymes have been demonstrated in
extracts of this organism: an inosine hydrolase, a purine-pyrimidine trans-
ribosidase, and a purine-purine transribosidase. All are inhibited to some
degree by various purines. The results of the enzymatic studies are summarized
in Table III.
Table III. Enzymes of Nucleic Acid Metabolism
Type
Specificity
Inhibition by
methyl purines
Adenosine deaminase
Ribose
Cytidine deaminase
Deoxyribose
Ribose
Purine phosphorylases
Pyrimidine phosphorylase
Deoxyribose
Ribose
Deoxyribose
Ribose
some
some
Inosine hydrolase
Purine-pyrimidine trans-
Deoxyribose
Ribose
Ribose
+
glycosidase
Purine-purine trans-
Ribose
+
glycosidase
V. WORKING HYPOTHESIS
The mutagenic agents do inhibit enzymes that appear to be directly linked
to the path of nucleic acid synthesis, but how can such an interference affect the
mutation probability? We have proposed (12) that this may result from a
change in the steady state concentrations of the intermediates that are to be
assembled together to form the macromolecular DNA. This must happen
without any change in the flow of intermediates, in accord with the experimental
fact that the growth rate of the bacteria is not affected significantly by the
mutagens when present at concentrations that give rise to large changes in
the mutation rate (1).
Let us first consider the consequences of lowering of the concentration of
whatever adenine deoxyriboside or guanine deoxyriboside derivative is
involved in the polymerization reaction leading to macromolecular DNA. The
Watson-Crick model for DNA assumes that the specificity lies in the forma-
tion of two or three hydrogen bonds between specific pairs of nucleotides:
adenine and thymine, and guanine and cytosine. It has been suggested by
Watson and Crick (14) that the mutational event is the entry of a heterocyclic
base which is not complementary. This would yield a double helix which
is energetically less stable. Upon subsequent duplication this yields two stable
molecules, one of the parental type and one of a new mutant type.

Fig. 4. Guanine paired with thymine.

It is to be recognized that the mutational event is an improbable one, and
therefore quite improbable structures may be involved. Two options for the
unfavorable pairing are available. First, two pyrimidines or two purines may
become situated opposite each other. This gives structures that should be
capable of forming hydrogen bonds, but are either too long or too short.
Alternatively, a purine and a pyrimidine may pair, but the purine may occur
in the uncommon tautomeric form and consequently pairing will occur abnormally.
Watson and Crick (14) suggested adenine in the lactim form binding
with cytosine; more probable is the pairing of guanine with thymine (Fig. 4).
This pair has the proper dimensions; there are no steric difficulties. In this
structure guanine is written with the oxygen in the 6-position in an enol form.
X-ray-diffraction workers have concluded that guanine is ordinarily found in the
keto form, but the evidence is not strong that the keto form is even dominant
(15), and considerations of the resonance possibilities indicate a considerable
stabilization of the enol form because the latter allows aromaticity of the
heterocyclic ring.
Thus, guanine-thymine pairing might well be of likely occurrence. With
this in mind, we have attempted in our enzyme studies to find differences of
the effects of mutagens on the inhibition of reactions of the adenine compounds,
as opposed to the guanine ones, that would be implied if this structure were to
account for the mutational activity of these methylated purines. So far we have
been unable to detect any such differences. We may have been examining the
wrong systems.
For the present we shall tentatively suggest the pair thymine-cytosine (Fig. 5)
as the culprit. This pair is shorter than the conventional structures. In the
very interesting paper by Donohue (16) a large number of possible pairings
are suggested. For our purposes most of these are unsatisfactory because they
give rise to helices possessing a two-fold axis parallel to the helical axis, whereas
in the Watson-Crick structure this two-fold axis is perpendicular to the
helical axis, and thus consistent helices formed by substitution between the
two types cannot occur. One structure (Donohue's no. 22) would fit into the
symmetry of the Watson-Crick model and it is the pairing suggested in Fig. 5.

Fig. 5. Thymine paired with cytosine.

VI. STEADY-STATE CONSIDERATIONS
Whatever may be the critical or quantitatively most significant substitution
in this type of mutational change, the hypothesis we have proposed requires
that the concentration of terminal pools be altered. The experimental data
that we have obtained have been primarily with purine ribonucleoside phos-
phorylase which catalyzes a step which is clearly non-terminal in DNA synthesis,
and very likely the reaction catalyzed by purine deoxyriboside phosphorylase
is also not the transformation of the last small-molecular-weight intermediate
into DNA.
Although it may be that the terminal processes are inhibited, let us examine
some possible situations that might lead to an alteration of the steady-state
concentration of the penultimate substance without influencing the steady-state
flux of DNA synthesis. To do this, the question of bacterial growth itself must
be raised. Bacteria grow autocatalytically. Hinshelwood (17) as well as
others have pointed out that this results from an interaction of catalytic units.
Thus, if the amount of one component, P (protein), controls the rate of synthesis
of another component, N (nucleic acid), then

\[ \frac{dP}{dt} = k_1 N, \qquad \frac{dN}{dt} = k_2 P \tag{1} \]

where k_1 and k_2 are characteristic constants. The steady-state solution of this
pair of equations is

\[ P = P_0\, e^{\sqrt{k_1 k_2}\, t}, \qquad N = N_0\, e^{\sqrt{k_1 k_2}\, t} \tag{2} \]

where P_0 and N_0 depend on the initial conditions and the constants k_1 and k_2.
Thus both P and N increase exponentially at the same rate and each therefore
appears to be 'autocatalytic'.
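
The approach to this common exponential rate can be checked numerically. The sketch below assumes the coupled pair dP/dt = k1 N, dN/dt = k2 P and integrates it with a classical fourth-order Runge-Kutta step; the rate constants, starting amounts, and function name are arbitrary illustrative choices, not values from the text.

```python
import math

def simulate(k1, k2, P0, N0, t_end, dt=1e-3):
    """Integrate dP/dt = k1*N, dN/dt = k2*P from t = 0 to t_end (RK4)."""
    def deriv(P, N):
        return k1 * N, k2 * P

    P, N = P0, N0
    for _ in range(int(round(t_end / dt))):
        a1, b1 = deriv(P, N)
        a2, b2 = deriv(P + 0.5 * dt * a1, N + 0.5 * dt * b1)
        a3, b3 = deriv(P + 0.5 * dt * a2, N + 0.5 * dt * b2)
        a4, b4 = deriv(P + dt * a3, N + dt * b3)
        P += dt * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        N += dt * (b1 + 2 * b2 + 2 * b3 + b4) / 6
    return P, N

k1, k2 = 2.0, 0.5                       # assumed constants
expected_rate = math.sqrt(k1 * k2)      # common exponential rate

# Compare the amounts one time unit apart, late in the growth.
P_a, N_a = simulate(k1, k2, P0=1.0, N0=0.1, t_end=10.0)
P_b, N_b = simulate(k1, k2, P0=1.0, N0=0.1, t_end=11.0)

observed_rate = math.log(P_b / P_a)     # growth rate of P per unit time
ratio = N_b / P_b                       # composition N/P late in growth

print(f"expected rate {expected_rate:.4f}, observed {observed_rate:.4f}")
print(f"N/P settles at {ratio:.4f}, sqrt(k2/k1) = {math.sqrt(k2 / k1):.4f}")
```

In this sketch the starting composition is deliberately off balance; the transient dies away and both components then grow at the rate sqrt(k1 k2) with a fixed ratio N/P, which is the sense in which the composition of the cell stays constant during exponential growth.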
Clearly, processes of this kind are responsible for the maintenance of constant
growth rates and constant composition of cells during the exponential growth
of bacteria. However, the control of the system by this type of interaction
cannot explain the regulation of synthesis of intermediates for the biosynthesis
of either P or N. Additional regulatory processes must be considered. From
equation (2) it is evident that for any constituent of the cell (intermediate or
enzymatic catalyst) the steady concentration increases autocatalytically. If
expressed as amount per unit number of bacteria or per unit bacterial mass,
any cell constituent may be considered constant. Thus, if such a transformation
is made, we can consider a system with time-invariant concentrations of inter-
mediates and catalysts and also time-invariant fluxes. Thus, the steady-state
treatment of reaction rates is immediately applicable to our problem. The most
general formulation is that of Christiansen and has been well described by
Hearon (18, 19).
In essence the rate expression for each step of a concatenated reaction
scheme, in which a substance is produced in one step and utilized in the next,
is written down. Each of the terms in these expressions is the product of the
intermediate with a rate constant and also with either unity or with the concentra-
tion(s) of the other chemical reactant(s). If the product of the two latter factors is
set equal to a quantity W, bearing suitable subscripts to identify the term, and
if the usual steady-state assumptions are made, then the solutions for both the
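flux of the system or the over-all reaction rate v and the concentration of each
intermediate [X_i] may be computed. If the very last reaction is irreversible,
equations (3) and (4) are obtained.

\[ v = \left[ \frac{1}{W_1} + \frac{W_{-1}}{W_1 W_2} + \cdots + \frac{W_{-1} W_{-2} \cdots W_{-(j-1)}}{W_1 W_2 W_3 \cdots W_j} \right]^{-1} \tag{3} \]

\[ [X_i] = v \left[ \frac{1}{W_{i+1}} + \frac{W_{-(i+1)}}{W_{i+1} W_{i+2}} + \cdots + \frac{W_{-(i+1)} \cdots W_{-(j-1)}}{W_{i+1} \cdots W_j} \right] \tag{4} \]
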
The assumption of the irreversibility of the last step is made necessary by
the well-known metabolic stability of DNA. Recent experiments (20) demon-
strate the extreme irreversibility in the normal adult rat. The evidence
for growing cultures of E. coli is less stringent (21, 22) but does permit this
assumption in comparison with the tremendous synthetic rate in these
organisms.
Now if in addition we assume that some step is either rapid in the direction
of synthesis or irreversible, then it may easily be seen that the reaction velocity
v is completely independent of subsequent steps. Thus, the synthetic rate can
be made to depend on the level of a few catalysts or other reactants involved
earlier in the sequence. Consequently, increased protein synthesis would cause
increased synthesis of a very few enzymes critical for nucleic acid biosynthesis,
and this would lead smoothly to increased DNA synthesis without requiring
exact synchronization in the increase of each enzyme on the biosynthetic path-
way. The concentration of the last intermediate X_{j-1} can be seen from equation
(4) to be v/W_j, and thus is completely independent of any step that has no effect
on the reaction velocity, v.
This case does not therefore satisfy the requirements suggested above to
explain the mutagenic effects of the plant alkaloids. The independence of
growth rate in the presence of caffeine could be explained simply by assuming
that the inhibition occurs after some fast or irreversible reaction; but the
action of the inhibitor on any but the final step has no effect on the concentration
of the immediate precursor of the macromolecule, and thus cannot affect the
probability of mutation.
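
This conclusion can be illustrated numerically. The sketch below assumes a linear chain of j steps in which step k carries a forward factor W_k and a reverse factor W_-k, the final step is irreversible, and the source concentration is fixed at unity; it solves the steady-state balances v = W_k [X_(k-1)] - W_-k [X_k] by back-substitution. The function name and all numerical values are illustrative assumptions.

```python
def steady_state(fwd, rev, source=1.0):
    """Steady state of a linear reaction chain.

    fwd[k] is the forward factor of step k+1 and rev[k] the reverse factor
    of step k+1; the final step is irreversible, so rev has one entry fewer
    than fwd. Returns the flux v and the intermediates [X_1, ..., X_(j-1)].
    """
    j = len(fwd)
    assert len(rev) == j - 1
    # coef[k] = [X_k] / v, built backwards from the irreversible last step.
    coef = [0.0] * j
    coef[j - 1] = 1.0 / fwd[j - 1]
    for k in range(j - 2, -1, -1):
        coef[k] = (1.0 + rev[k] * coef[k + 1]) / fwd[k]
    v = source / coef[0]
    return v, [v * c for c in coef[1:]]

# Four steps; step 2 is irreversible (reverse factor zero).
v_control, x_control = steady_state([1.0, 2.0, 4.0, 5.0], [0.5, 0.0, 1.0])

# Inhibit step 3, which lies after the irreversible step but is not final.
v_inhib, x_inhib = steady_state([1.0, 2.0, 0.4, 5.0], [0.5, 0.0, 0.1])

print("flux:             ", v_control, v_inhib)
print("last intermediate:", x_control[-1], x_inhib[-1])
print("substrate of the inhibited step:", x_control[1], x_inhib[1])
```

With these assumed values the flux and the last intermediate are the same in both runs, while the substrate of the inhibited step accumulates. An inhibitor acting in this position would therefore leave the growth rate untouched, but it would also leave the immediate precursor of the macromolecule untouched, which is why this scheme alone cannot account for the mutagenic effect.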
The scheme considered above has two desirable features: it permits a
reciprocal control of nucleic acid by the level of protein synthesis, and it prevents
the accumulation of large amounts of intermediates. Let us now turn to a
possible mechanism that will do these two things but also will fulfill the conditions
imposed by our ideas of the mutation event. Such a mechanism occurs in
systems showing product inhibition. Here the rate of production of the final
product will depend on the level of some enzyme catalyzing a step late in the
reaction sequence, but at the same time, the inhibition prevents the unlimited
synthesis of earlier intermediates.
Product inhibition is of common occurrence. It has been suggested as having
metabolic significance in two cases (23, 24) in which the product of a reaction
sequence inhibits some earlier reaction than its own formation. In the present
case it has been shown that adenine deoxyriboside is an inhibitor of the phos-
phorylase (12) as well as purine bases. Let us assume that all of these agents
are competitive inhibitors of enzyme action, although this remains to be demon-
strated conclusively.
Under such conditions the reaction velocity is given by the well-known
expression for competitive inhibition (see, for example, (25))
$$ v = \frac{V K_I (S)}{K_s K_I + K_s (I) + K_I (S)} $$
where V is the maximal velocity obtainable, K_s is the Michaelis-Menten constant
for the substrate S, and K_I is the constant for the binding of the enzyme with
the inhibitor, I. If K_s(I) is the dominant term in the denominator, this expression
simplifies to give:
$$ v = \frac{K_I V (S)}{K_s (I)} $$
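As a numerical check on this simplification, the following is a minimal sketch, assuming the standard competitive-inhibition rate law v = V K_I (S) / (K_s K_I + K_s (I) + K_I (S)). The function names and the numerical values are hypothetical, chosen only for illustration, and are not taken from the experiments discussed here.

```python
def competitive_rate(V, K_s, K_i, S, I):
    """Full rate law for competitive inhibition."""
    return V * K_i * S / (K_s * K_i + K_s * I + K_i * S)


def inhibitor_dominated_rate(V, K_s, K_i, S, I):
    """Limiting form when the term K_s*(I) dominates the denominator."""
    return V * K_i * S / (K_s * I)


if __name__ == "__main__":
    # Hypothetical values (arbitrary units) with a tightly bound inhibitor.
    V, K_s, K_i = 1.0, 1.0, 0.01
    S, I = 0.5, 10.0
    print(competitive_rate(V, K_s, K_i, S, I))
    print(inhibitor_dominated_rate(V, K_s, K_i, S, I))
    # Doubling (S) and (I) together leaves the limiting rate unchanged,
    # so a constant rate implies (S) proportional to (I).
    print(inhibitor_dominated_rate(V, K_s, K_i, 2 * S, 2 * I))
```

In the limiting form only the ratio (S)/(I) enters, which is the property the steady-state argument relies on.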
In the present case, adenine deoxyriboside is the inhibitor which is formed
from the substrate adenine and deoxyribose-1-PO4. Now, if the net rate of
removal of adenine deoxyriboside is to be maintained constant and determined
solely by the process of removal, then a steady-state will quickly ensue in which
(S) ∝ (I), and in which the rate of formation of I is dependent only on the rate
of utilization. The concentration of I will become adjusted to establish such a
condition.
In the presence of the mutagen, the total inhibitor is effectively derived from
three sources: deoxyribosides, free normal bases, and the mutagen. While
maintaining constant synthesis of DNA, the effect of the mutagen will then be
to decrease the level of the normal reaction product, adenine. Similar relations
will hold for guanine deoxyriboside.
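The effect of an added inhibitor on the level of the normal product can be sketched numerically. The sketch below assumes, beyond what the text states, that the several inhibitors compete for the same site so that their contributions (I_j)/K_j add in the denominator, that the inhibitor term dominates, and that the substrate level (S) is held fixed. All names and numbers are hypothetical.

```python
def product_level(v, V, K_s, S, K_product, other_inhibitors):
    """Concentration of the inhibitory product needed to hold the rate at v.

    Assumed rate law (inhibitor term dominant, inhibitors mutually exclusive):
        v = V * S / (K_s * sum_j(I_j / K_j))
    other_inhibitors is a list of (concentration, K_j) pairs for the
    inhibitors other than the product itself.
    """
    required_load = V * S / (K_s * v)  # needed value of sum_j(I_j / K_j)
    other_load = sum(conc / K_j for conc, K_j in other_inhibitors)
    return max(0.0, K_product * (required_load - other_load))


if __name__ == "__main__":
    # Hypothetical numbers, arbitrary units.
    common = dict(v=1.0, V=100.0, K_s=1.0, S=1.0, K_product=0.5)
    free_bases = (2.0, 0.5)
    mutagen = (3.0, 0.5)
    print(product_level(other_inhibitors=[free_bases], **common))
    print(product_level(other_inhibitors=[free_bases, mutagen], **common))
```

With the rate of synthesis held constant, the second call returns a lower product level than the first, in line with the argument that the mutagen displaces part of the normal product from the inhibitor pool.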
It should be noted that in this case, although not in the case considered
above, any number of intermediates may occur between the step under considera-
tion and the polymerization step, if these reactions are rapidly reversible. Then
a change in adenine deoxyribose concentration will lead to a proportional
change in the precursor immediately used for the formation of the macro-
molecule.
This model can then utilize the enzymatic finding, and the biological facts.
There is, however, one additional fact that should be introduced, viz. certain
specific substances, the purine ribosides (26), are anti-mutagens. That is, these
substances will prevent the action of caffeine and related compounds in causing
mutations. Moreover, they will decrease the so-called 'spontaneous mutation'
rate.
This can be tentatively explained on the basis that these substances are
substrates or immediate precursors of the substrates of the key step, and that
their increase simply affects the system so as to cause an increase in the concen-
tration of purine deoxyribosides and thus a decrease in the mutation rate.
VII. ALTERNATIVE HYPOTHESES
In concluding, I should like to list various hypotheses that one should consider
in this type of chemical mutagenesis. They will be considered in order of the
intimacy of the mutagen with the duplication process.
1. The mutagen is incorporated into the nucleic acid. This is tentatively
rejected as indicated above, from the tracer evidence, and the argument that
methylation in the imidazole ring prevents N-glycoside formation. It should
be noted that production of a self-duplicating 'methylated gene' can be rejected
because the mutants cannot metabolize methyl purines and certainly do not
require them (3).
2. The mutagens inhibit enzymes of nucleic acid biosynthesis, and this causes a
change in the concentration of intermediates. This latter effect changes the
probability of mutation. This is the hypothesis we favor, but it is clear that a
great deal of work will be required to establish it or some variant thereof. It is
also clear from what has been said above that special circumstances must occur
in order that the proposed mechanism can work.
3. The mutagen causes some change in the general metabolism of the organism
and this leads to a change in the mutation probability. It is certainly true that
the mutation probability is dependent on a great many factors. Kihlman (27,
28), working with plants, has suggested such a mechanism to explain chromo-
some breakage induced with caffeine derivatives. He proposes that ATP is
necessary for the aberrations produced by the compound 8-ethoxy caffeine.
However, there appear to be considerable differences between the two systems;
with the bacteria one thinks the process involved is one of 'point mutation', but
certain clearcut differences are evident in the two types of material with regard to
the interaction of oxygen tension and ionizing radiations. (Compare (2) and (27).)
4. The mutagen causes the organism to 'adapt' to its presence, and thus causes
widespread alterations in the amount of enzymes and intermediates. This could
lead to a change in mutation rate. This may be in fact the explanation of the
effect of adenine (12). This substance inhibits the growth of bacteria which
have previously been grown in its absence. Growth resumes when the organism
has 'adaptively' produced an 'adenine deaminase' activity which is not de-
monstrable in bacteria grown in its absence. This shift in metabolism can then
be envisioned to lead to changes in the mutation rate.
This list is probably sufficiently inclusive to include the right answer if there
is only one, but at least the necessary research, both with test tubes and with
pencil and paper, to test these possibilities is feasible.
REFERENCES
1. A. Novick and L. Szilard: Experiments on spontaneous and chemically induced
mutations of bacteria growing in the chemostat. Cold Spr. Harb. Symp. Quant. Biol.
16, 337-343 (1951).
2. A. Novick: Mutagens and anti-mutagens. Brookhaven Symp. Biol. No. 8, 201-214
(1956).
3. A. L. Koch: The metabolism of methyl purines by Escherichia coli. I. Tracer studies.
J. Biol. Chem. 219, 181-188 (1956).
4. A. L. Koch, F. W. Putnam, and E. A. Evans Jr.: The purine metabolism of Escherichia
coli. J. Biol. Chem. 197, 105-112 (1952).
5. A. L. Koch: Biochemical studies of virus reproduction. XI. Acid soluble purine
metabolism. J. Biol. Chem. 203, 227-237 (1953).
6. M. E. Balis, C. T. Lark, and D. Luzzati: Nucleotide utilization by Escherichia coli.
J. Biol. Chem. 212, 641-645 (1955).
7. E. Bolton: Biosynthesis of nucleic acid in E. coli. Proc. Nat. Acad. Sci., Wash. 40,
764-772 (1954).
8. A. L. Koch: The kinetics of glycine incorporation by Escherichia coli. J. Biol. Chem.
217, 931-945 (1955).
9. I. A. Rose and B. S. Schweigert: Incorporation of C14 totally labeled nucleosides
into nucleic acids. J. Biol. Chem. 202, 635-644 (1953).
10. E. Volkin and L. Astrachan: Phosphorus incorporation in Escherichia coli ribonucleic
acid after infection with bacteriophage T2. Virology 2, 149-161 (1956).
11. J. O. Lampen: Symposium on Phosphorus Metabolism, vol. II, ed. by W. D. McElroy and
B. Glass, Johns Hopkins Press, Baltimore, 363-380 (1952).
12. A. L. Koch and W. A. Lamont: The metabolism of methyl purines by Escherichia
coli. II. Enzymatic studies. J. Biol. Chem. 219, 189-201 (1956).
13. A. L. Koch: Some enzymes of nucleoside metabolism of Escherichia coli. J. Biol. Chem.
223, 535-549 (1956).
14. J. D. Watson and F. H. C. Crick: The structure of DNA. Cold Spr. Harb. Symp.
Quant. Biol. 18, 123-131 (1953).
15. D. O. Jordan: The physical properties of nucleic acids. In: The Nucleic Acids, ed. by E.
Chargaff and J. N. Davidson. Vol. I, 447-492, Academic Press, New York (1955).
16. J. Donohue: Hydrogen-bonded helical configurations of polynucleotides. Proc. Nat.
Acad. Sci., Wash., 42, 60-65 (1956).
17. C. N. Hinshelwood: The Chemical Kinetics of the Bacterial Cell, Clarendon Press,
Oxford (1946).
18. J. Z. Hearon: The steady-state kinetics of some biological systems. I. Bull. Math.
Biophys. 11, 29-50 (1949).
19. J. Z. Hearon: Rate behavior of metabolic systems. Physiol. Rev. 32, 499-523 (1952).
20. R. W. Swick, A. L. Koch, and D. T. Handa: The measurement of nucleic acid turnover
in rat liver. Arch. Biochem. Biophys. 63, 226-242 (1956).
21. A. D. Hershey: Conservation of nucleic acids during bacterial growth. J. Gen. Physiol.
38, 145-148 (1954).
22. A. L. Koch and H. R. Levy: Protein turnover in growing cultures of Escherichia coli.
J. Biol. Chem. 217, 947-957 (1955).
23. R. A. Yates and A. B. Pardee: Control of pyrimidine biosynthesis in Escherichia coli
by a feedback mechanism. J. Biol. Chem. 221, 757-770 (1956).
24. H. F. Umbarger: Evidence for a negative-feedback mechanism in the biosynthesis of
isoleucine. Science 123, 848 (1956).
25. P. W. Wilson: Kinetics and mechanism of enzyme reactions, in Respiratory Enzymes,
ed. by H. A. Lardy, 22-67, Burgess Publishing Co., Minn. (1949).
26. A. Novick and L. Szilard: Anti-mutagens. Nature, Lond. 170, 926-927 (1952).
27. B. Kihlman: Chromosome breakage in Allium by 8-ethoxy-caffeine and X-rays. Exp.
Cell Res. 8, 345-368 (1955).
28. B. A. Kihlman: Oxygen and the production of chromosome aberrations by chemicals
and X-rays. Hereditas 41, 384-404 (1955).
EVIDENCE FOR A NEGATIVE FEEDBACK SYSTEM
CONTROLLING LIVER REGENERATION
Andre D. Glinos
Growth Physiology Laboratory,
Walter Reed Army Institute of Research, Washington, D.C.
Abstract — Cell division was induced in the resting liver of the rat by lowering the concentration
of serum constituents through plasmapheresis, and was inhibited in the regenerating liver by
increasing the concentration of the serum by fluid intake restriction.
Electrophoretic analysis of serum proteins and histochemical investigation of the organiza-
tion of cytoplasmic ribonucleoprotein of the liver cells during regeneration suggest that
plasma proteins may participate as information-carrying agents in a negative feedback system
controlling the growth of liver cells.
Liver is an excellent tissue for investigating mechanisms of growth control
because it regenerates very rapidly. In the rat, removal of up to two-thirds of
the total mass of the liver is followed by active cell division leading to complete
restoration of the organ within two weeks.
As early as 1923 Akamatsu (1) reported that tissue cultures of rabbit liver
grew better in plasma from partially hepatectomized animals than in normal
control plasma, and more recently it was shown that cell division can be induced
in the resting liver of a parabiotic rat by a partial hepatectomy performed on
its partner (2, 3, 4). These findings were considered to indicate the presence
or the increase of growth-stimulating factors in the plasma of partially hepatec-
tomized animals.
In our own studies on the possible participation of the humoral system of
communication in the control of this growth, blood serum from animals
undergoing liver regeneration was assayed in tissue culture (5). These cultures
showed a comparable outgrowth in a high concentration of serum of partially
hepatectomized rats and in a low concentration of normal serum. A high
concentration of normal serum showed inhibitory effects. Based on these
findings a hypothesis was formulated with regard to the induction of the
regenerative process in the liver which follows partial hepatectomy.
According to this hypothesis, certain constituents of normal blood serum
exert a growth-inhibitory action at their normal concentration. Partial hepatec-
tomy would be expected to result in a decrease of the serum concentration of
these constituents. Thus in turn regenerative growth is initiated. During
regeneration, as the number of liver cells increases, the concentration of these
constituents will also increase. When the original equilibrium between a given
number of liver cells and a given concentration of the serum constituents is
restored, further growth is expected to cease. The evidence for a negative
feedback system of this type should satisfy the following two conditions:
1. Induction of growth in the resting tissue by plasma dilution.
2. Inhibition of growth in the regenerating tissue by plasma concentration.
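The logic of the hypothesis and of its two test conditions can be illustrated with a toy model. This is only a sketch under assumptions not made in the text: the inhibitory serum constituent is taken to be strictly proportional to the number of liver cells, growth is taken to fall linearly to zero as the inhibitor approaches its normal level, and the rate constant `r` and the parameter `serum_factor` are hypothetical.

```python
def growth_rate(cells, normal_cells=1.0, serum_factor=1.0, r=0.5):
    """Rate of increase in liver cell number per day (toy model).

    The inhibitory serum constituent is assumed proportional to cell number.
    serum_factor < 1 mimics plasma dilution, serum_factor > 1 mimics plasma
    concentration. Growth stops once the inhibitor reaches its normal level.
    """
    inhibitor = serum_factor * cells / normal_cells
    return r * cells * max(0.0, 1.0 - inhibitor)


def regenerate(start_cells, days=14.0, dt=0.01, **kwargs):
    """Integrate the toy model with Euler steps and return the final cell number."""
    cells = start_cells
    for _ in range(int(round(days / dt))):
        cells += dt * growth_rate(cells, **kwargs)
    return cells


if __name__ == "__main__":
    print(regenerate(1.0 / 3.0))               # after removal of two-thirds of the liver
    print(growth_rate(1.0, serum_factor=0.6))  # condition 1: resting liver, diluted plasma
    print(growth_rate(0.5, serum_factor=2.0))  # condition 2: regenerating liver, concentrated plasma
```

In this sketch growth ceases by itself when the original cell number is restored, dilution starts growth in an intact liver, and concentration suppresses growth in a regenerating one.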
Figure 1 illustrates the application of the classical method for plasma
dilution, plasmapheresis, and the results obtained. Normal adult male rats
were used. Blood was withdrawn every twelve hours corresponding to 31 to
38 per cent of the initial total blood volume of the animals in the first group
and 39 to 46 per cent in the second. The bleedings were followed by re-injections
of the blood cells suspended in an equal volume of saline. Under such conditions
cell division was induced in the resting liver of adult rats and was intensified
with increasing dilution of the plasma. In this experiment, then, the evidence
obtained satisfies the first condition for a negative feedback system.

                          TIME IN HOURS
RATE        36-48            60               72-96            MEAN
0%          <.002   .003     .002    <.002    .004    .002     .002
31-38%      <.002   .022     .002    .048     .009    .002     .014
39-46%      .168    .002     .019    .054     .441    .197     .147
MEAN        .048             .031             .163

Fig. 1. Induction of cell division in the resting liver by plasmapheresis.
A total of eighteen adult male rats was used.
The rate of plasmapheresis is expressed as the percentage of the initial total blood
volume of the animal replaced by saline per 12 hours. In the control group
rate refers to the fact that blood was merely withdrawn and re-injected, with the
animals submitted to the same stressful conditions of restriction, anesthesia,
venipuncture etc. as the experimental groups. The mitotic activity was obtained
by counting 50,000 cells, and expressed as the per cent mitotic index. When no
mitosis was found, the mitotic index was recorded as <0.002.
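The group means of Fig. 1 can be rechecked with a few lines of arithmetic. In this sketch the entries reported as "<.002" (no mitosis found) are taken as zero, which is an assumption made here for the calculation; the variable and function names are illustrative.

```python
# Mitotic indices (per cent) read from Fig. 1, six animals per group.
# Assumption: entries reported as "<.002" are taken as 0.
mitotic_index = {
    "0%": [0.0, 0.003, 0.002, 0.0, 0.004, 0.002],
    "31-38%": [0.0, 0.022, 0.002, 0.048, 0.009, 0.002],
    "39-46%": [0.168, 0.002, 0.019, 0.054, 0.441, 0.197],
}


def group_means(data):
    """Mean mitotic index for each rate of plasmapheresis."""
    return {rate: sum(values) / len(values) for rate, values in data.items()}


if __name__ == "__main__":
    for rate, mean in group_means(mitotic_index).items():
        print(rate, round(mean, 3))
```

The means rise with the rate of plasmapheresis, which is the trend the text describes as intensified cell division with increasing dilution.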
With respect to the second condition, the method used to achieve plasma
concentration was restriction of fluid intake as illustrated in Fig. 2. Two experi-
mental groups were used, differing with regard to the weight of the animals
and the extent of the partial hepatectomy. All animals were partially hepatec-
tomized and tube-fed an identical isocaloric fluid diet containing 3 per cent
water. The controls were given drinking water ad libitum but the experimental
animals were deprived of water for the duration of the experiment, which was
sixty-four hours, starting sixteen hours prior to the operation and continuing
for forty-eight hours postoperatively, at which time the animals were sacrificed.
A measure of total body-water loss obtained by this regimen is given by the
difference in weight change between experimental and control animals in each
group. A measure of the plasma concentration achieved is given by the difference
in total protein change. In both the experimental groups an effective inhibition
of cell division in the liver was obtained; this inhibition became greater with
increasing concentration of the serum. On the other hand mitosis in the
intestinal epithelium was not affected. The evidence obtained in this experiment,
then, satisfies the second condition for a negative feedback system.
The smaller extent of total body-water loss and plasma concentration in
the first group can be ascribed to the greater initial weight of the animals in
this group. It is well known that when dehydration proceeds slowly the
maintenance of plasma volume at the expense of extravascular fluid may be quite
successful. This is significant since the extravascular fluid of the liver must
participate in the transmission of information to the liver cells. The serum albumin
fraction in this experiment was found to be low when liver cell division was
present and normal or slightly increased when liver cell division was absent.
In the framework of the present discussion this feature is somewhat suggestive
when it is considered that albumin is synthesized in the liver. In view of these
facts it was thought that an investigation of changes in protein metabolism of
the liver cells, early after partial hepatectomy, could help in elucidating the
possible role of the serum proteins and the extravascular fluid of the liver in
the transmission of information to the liver cells.

DEGREE OF        TREATMENT           NO. OF   WEIGHT   WEIGHT     SERUM PROTEIN   SERUM ALBUMIN   MITOSES
HEPATECTOMY                          RATS              % CHANGE   % CHANGE        % CHANGE        /50000 CELLS
30%              Controls            10       457      - 7.9      - 4.7           -16.0           81.4
(median lobe)    Fluid restriction    9       458      -12.9      + 5.0           -14.0           18.0
10%              Controls             8       331      - 9.1      + 1.8           -26.2           16.5
(caudate lobe)   Fluid restriction    8       313      -17.6      +28.9           + 8.1            0.1

Fig. 2. Inhibition of cell division in the regenerating liver by fluid restriction.
The experimental variables are defined in the text.
The percentage changes refer to differences in weight, total serum protein, and
serum albumin between the values obtained before treatment and those obtained
at sacrifice.
In this, we took advantage of the many observations showing histochemically
detectable changes in the organization of cytoplasmic ribonucleoprotein with
increasing demands on the protein synthetic mechanism of the cells (6, 7).
Briefly stated, these changes consist of the disappearance from the cytoplasm
of discrete basophilic bodies which are associated with ribonucleoprotein; the
cytoplasm then stains uniformly with basic dyes. Rats were sacrificed at frequent
intervals after partial hepatectomy and their livers fixed and stained with
gallocyanin chrome alum. Within thirty minutes after partial hepatectomy the
ribonucleoprotein-associated basophilic bodies started disappearing from the
cells in the periportal area. This change proceeded gradually toward the center,
so that eight hours after the operation all cells, even those in the centrolobular
area, were affected. After this time reconstruction of the basophilic bodies
proceeded in the opposite direction from the center of the lobule towards the
periphery. At 24 hours cells in the centrolobular area had completed the cycle
showing well organized basophilic bodies, whereas cells in the middle and the
periphery of the lobules remained altered (Fig. 3).

Fig. 3. Regenerating liver twenty-four hours after partial hepatectomy.
Central vein at lower left corner. Adjacent centrolobular zone with cells
containing ribonucleoprotein-associated basophilic bodies in their cytoplasm. Middle
and periportal zones with mostly altered cells having a uniformly basophilic
cytoplasm. Two mitotic figures in the middle zone among altered cells.

Confirming the earlier data of Harkness (8), we found that cell division
begins between 16 and 24 hours postoperatively in the periportal area. This
is significant because cells in this area remained altered for the longest time.
The changes in cytoplasmic ribonucleoprotein organization indicate an activa-
tion of the protein synthesizing mechanism of the liver cells after partial hepa-
tectomy, proceeding in a topographical pattern related to the direction of the
intralobular blood flow. According to the Law of Mass Action these changes
would be expected to appear with decreased protein concentration in the
immediate environment of these protein-secreting cells. The cells in the periphery
of the lobules would be expected to react faster and longer since the ones more
centrally located are in an environment richer in protein produced by the more
peripheral cells. This interpretation was, in part, verified experimentally by
showing that changes in the cytoplasmic ribonucleoprotein of the cells in the
periportal area appear rapidly after a sudden decrease of the serum protein
concentration (Fig. 4). After partial hepatectomy, however, these histochemical
changes occur, as we have seen, within thirty minutes, before any appreciable
changes in the plasma proteins.

TREATMENT      FLUID      SERUM PROTEIN   LIVER RIBONUCLEOPROTEIN
                          CHANGE          CHANGE
Addition       Saline     -11.8
               Dextran    -31.2           +
               Serum      + 7.9
Replacement    Saline     -19.2           +
               Dextran    -37.8           +
               Serum      - 8.7

Fig. 4. Induction of cytoplasmic ribonucleoprotein
changes in the liver by plasma dilution.
A total of six male adult rats was used.
Addition refers to a single intravenous injection of 5.5 ml of fluid. Replacement
refers to a 5.5 ml single plasmapheresis treatment. All animals were sacrificed
two hours after treatment.
Serum protein change refers to the percentage difference between the values
obtained before treatment and those obtained at sacrifice.
Liver ribonucleoprotein change refers to the disappearance of the basophilic
bodies from the cytoplasm of the cells in the periportal area.

The relationships between increased pressure in the portal system following
partial hepatectomy and regeneration have been demonstrated by Grindlay
and Bollman (9). It is conceivable that, under conditions of increased pressure
immediately following partial hepatectomy, the transfer of protein and water
from the intravascular to the extravascular space is altered and results in a
rapid lowering of the protein concentration of the interstitial fluid of the liver.
This leads within a short period to increased protein production in the liver
cells and sometime later to cell division.
REFERENCES
1. N. Akamatsu: Über Gewebskulturen von Lebergewebe. Virch. Arch. 240, 308-311
(1923).
2. B. G. Christensen and E. Jacobsen: Studies on liver regeneration. Acta Med. Scand.
234, Suppl., 103-108 (1949).
3. N. L. R. Bucher, J. F. Scott, and J. C. Aub: Regeneration of the liver in parabiotic
rats. Cancer Res. 11, 457-465 (1951).
4. A. S. Wenneker and N. Susman: Regeneration of liver tissue following partial
hepatectomy in parabiotic rats. Proc. Soc. Exp. Biol. Med. 76, 683-686 (1951).
5. A. D. Glinos and G. O. Gey: Humoral factors involved in the induction of liver
regeneration in the rat. Proc. Soc. Exp. Biol. Med. 80, 421-425 (1952).
6. S. Lagerstedt: Cytological studies on the protein metabolism of the liver in the rat.
Acta Anat., VII, suppl. 9, 1-116 (1949).
7. A. F. Howatson and A. W. Ham: Electron microscope study of sections of two rat
liver tumors. Cancer Res. 15, 62-69 (1955).
8. R. D. Harkness: The spatial distribution of dividing cells in the liver of the rat after
partial hepatectomy. J. Physiol. 116, 373-379 (1952).
9. J. H. Grindlay and J. L. Bollman: Regeneration of the liver in the dog after partial
hepatectomy. Role of the venous circulation. Surg. Gynec. Obst. 94, 491-496 (1952).
FLUCTUATIONS IN NEURAL THRESHOLDS*
Lawrence S. Frishkopf and Walter A. Rosenblith†
Research Laboratory of Electronics,
Massachusetts Institute of Technology, Cambridge, Massachusetts
Abstract — Over the past twenty-five years several independent investigations of the responsivity
of nerve tissue have led to the conclusion that the threshold of a resting neuron fluctuates
in time. The conclusion is based on the study of sensory and motor fibers, of monosynaptic
arcs and neuromuscular junctions. A number of these studies have been reviewed and com-
pared. The degree of threshold correlation among neurons of a given 'pool' or population
has been considered for several systems. A number of possible sources of threshold fluctuation,
giving rise to correlated and uncorrelated threshold variations, have been distinguished.
A mathematical model based on the concept of fluctuating thresholds has been described
and applied to the problem of ensemble response from the peripheral auditory nervous system.
The results of three experiments have been described and compared with the predictions of
the model.
I. THE CONCEPT OF A FLUCTUATING THRESHOLD
The threshold of a nerve fiber is defined as the minimum stimulus intensity
that will cause an action potential to propagate. If the threshold of a nerve
fiber were a fixed parameter — not changing in time — its value could be deter-
mined by presenting stimuli of increasing intensity. The fiber would fail to
respond to all stimuli less than some value S_T, and would respond to all stimuli
greater than S_T; S_T would then be the threshold of the fiber. However,
careful experiments on a number of specific neural systems (sensory and motor,
peripheral and central) have shown that such a unique value S_T does not
exist; instead, there is a range of stimulus values, S_1 to S_2, such that a stimulus
S lying within that range, when repeatedly presented at a rate well below that
which would involve the refractory period of the fiber, sometimes evokes and
sometimes fails to evoke a response. We find that the fiber responds in a fraction
p of all trials and that p(S) is a monotonic function that rises from zero to one
as the stimulus increases from S_1 to S_2. Stimuli less than S_1 never evoke a
response; stimuli greater than S_2 always evoke a response. We conclude that
the threshold of a neuron which exhibits this behavior is a time-varying
parameter. The value p approximates the fraction of the time that the threshold is
somewhere below the stimulus value S. An equivalent statement is that p
approximates (and for large sample size, approaches) the probability of finding
the threshold of a fiber below the value S.
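The relation between a fluctuating threshold and the response fraction p(S) can be sketched as follows. The Gaussian form of the fluctuation, its mean and standard deviation, and the function names are assumptions of this sketch, not results quoted in the text.

```python
import random


def response_fraction(thresholds, stimulus):
    """Fraction of trials in which the momentary threshold lies below the stimulus."""
    return sum(1 for t in thresholds if t < stimulus) / len(thresholds)


def sample_thresholds(n_trials, mean=100.0, sd=1.5, seed=0):
    """Hypothetical model: independent Gaussian threshold values, one per trial."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_trials)]


if __name__ == "__main__":
    thresholds = sample_thresholds(1000)
    for stimulus in (95, 98, 100, 102, 105):
        print(stimulus, response_fraction(thresholds, stimulus))
```

Whatever distribution is assumed, the estimated p(S) cannot decrease as S is raised, and it is zero below the lowest and one above the highest threshold value encountered.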
* This work was supported in part by the U.S. Army (Signal Corps), the U.S. Air Force
(Office of Scientific Research, Air Research and Development Command) and the U.S. Navy
(Office of Naval Research).
† Also in the Department of Electrical Engineering, M.I.T.
II. SUMMARY OF STUDIES OF OTHER WORKERS
The class of phenomena that we have been discussing was first observed by
Blair and Erlanger (1). They reported that an electric stimulus, repeatedly
presented to a single sciatic nerve fiber of the frog, will for most stimulus values
either always produce or always fail to produce a response. The transition
between these two situations, however, is not sharp. Upon raising the shock
intensity, a value is reached at which the fiber sometimes responds and some-
times fails to respond to repeated stimulation. In order to obtain a response
every time it is necessary to raise the shock intensity an additional two per
cent, far in excess of the uncontrollable variation in the stimulus. Moreover,
Blair and Erlanger were able, on occasion, to record simultaneously from
two fibers whose potentials could be distinguished by their difference in latencies.
On repeated testing with a near-threshold stimulus, sometimes both would
respond, sometimes one, sometimes the other, and sometimes neither. Such a
result cannot be accounted for on the basis of stimulus instability alone.
The most complete study of this kind that has been published to date was
made by Charles Pecher (2) in 1939. Using a technique similar to that of
Blair and Erlanger, he also found a stimulus range in which a fiber sometimes
responded and sometimes failed to respond to a constant stimulus. Some of
his data appear in Fig. 1. In the column on the left we see the responses to
successive identical stimuli, of which some produce a response and some fail
to do so. In the second column the intensity was raised four per cent. In
Fig. 2 the percentage of responses of a fiber is plotted as a function of stimulus
intensity. Again each point is based on 100 stimulus presentations. The total
range of threshold variation is, on the basis of these data, about seven per cent.
The function shown in Fig. 2 approximates the threshold probability function
p(S) that was discussed earlier.

Fig. 1. Left: ink tracings of recordings from single units of frog sciatic nerve,
showing occurrence and failure of response to repeated presentations of identical
shock stimuli. Right: same, with amplitude of pulse producing the shock raised
4 per cent. Each series shown is part of a longer sequence of 100 presentations.
Thirty-five responses were obtained with the weaker stimulus (left); 85 responses
were obtained with the stronger stimulus (right). After Pecher (2).
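The two response counts quoted for Fig. 1 (35 and 85 responses per 100 presentations, for stimuli 4 per cent apart) are enough for a rough estimate of the spread of the threshold. The sketch below assumes a Gaussian threshold distribution, which is an assumption of the sketch and not a finding reported by Pecher; the function name is illustrative.

```python
from statistics import NormalDist


def threshold_sd_percent(p_low, p_high, step_percent):
    """Standard deviation of the threshold, in per cent of stimulus intensity.

    Assumes a Gaussian threshold distribution: two response probabilities
    measured at stimuli step_percent apart fix the scale of the distribution.
    """
    z = NormalDist()
    return step_percent / (z.inv_cdf(p_high) - z.inv_cdf(p_low))


if __name__ == "__main__":
    # 35 and 85 responses per 100 presentations, stimuli 4 per cent apart.
    print(threshold_sd_percent(0.35, 0.85, 4.0))
```

An estimate from only two points is crude; the full curve of Fig. 2 would constrain the distribution far better.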
[Plot: number of responses per 100 presentations (ordinate, 0 to 100) against
stimulus intensity (abscissa, 97 to 103).]
Fig. 2. Relation between stimulus intensity (abscissa) and the number of respon-
ses obtained in 100 presentations at a fixed intensity from a single unit of frog
sciatic nerve (see Fig. 1). The interpolated solid line approximates the threshold
probability function of a unit. From Pecher (2).
Fig. 3. Left: ink tracings of simultaneous recordings from two units of frog
sciatic nerve to repeated presentations of identical shock stimuli. Units A and B
are identified by their latencies. Right: same, but recording from two other
units, identified by their amplitudes. After Pecher (2).
In the left column of Fig. 3 the responses of two different fibers were
simultaneously recorded from a single electrode; the responses are distinguishable
by their latencies. At a fixed level of stimulation all possible combinations of
response occur: fiber A responds alone, fiber B responds alone, both respond,
neither responds. On the right we see the responses from two other fibers;
here the responses are distinguished by their amplitudes. Again, all possible
combinations occur. Such a result can only be explained as a result of spon-
taneous variation in fiber threshold. If threshold were fixed and the stimulus
unstable, then only three of the four combinations could occur. That combina-
tion would be excluded in which the fiber with higher threshold fires alone.
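The exclusion argument can be made concrete by enumerating outcomes. In this sketch the stimulus and threshold values are hypothetical numbers in arbitrary units, and a fiber is taken to respond whenever the stimulus exceeds its momentary threshold.

```python
def combinations_seen(trials):
    """Set of (A responds, B responds) outcomes over (stimulus, thr_A, thr_B) trials."""
    return {(s > thr_a, s > thr_b) for s, thr_a, thr_b in trials}


# Case 1: fixed thresholds (B higher than A), unstable stimulus.
fixed = [(s, 100.0, 101.0) for s in (99.0, 100.5, 101.5, 100.2, 102.0)]

# Case 2: constant stimulus, independently fluctuating thresholds.
fluctuating = [
    (100.5, 100.0, 101.0),  # A alone
    (100.5, 101.0, 100.2),  # B alone
    (100.5, 100.1, 100.3),  # both
    (100.5, 100.9, 101.2),  # neither
]

if __name__ == "__main__":
    print(sorted(combinations_seen(fixed)))
    print(sorted(combinations_seen(fluctuating)))
```

With fixed thresholds the outcome (False, True), in which the fiber of higher threshold fires alone, never appears however the stimulus varies; it appears only when the thresholds themselves move.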
When responses from two fibers can be distinguished, an opportunity is
offered to test the degree of correlation of threshold fluctuation among different
fibers. If fluctuations occur independently in two fibers, the probability of both
firing to a single stimulus would be the product of their probabilities of firing
separately. Any correlation in threshold variations would alter the probability
of joint firing. These probabilities can be approximated by counting the number
of times that fiber A fires, that fiber B fires, and that both fire, and dividing
each by N, the number of stimuli. In the table below the results of such
measurements by Pecher are given for nine different fiber-pairs.

Table I

Number of   Number of      Number of      Calculated number of     Observed number
stimuli     responses of   responses of   simultaneous responses   of simultaneous
            fiber A        fiber B        (independence assumed)   responses
100          78            25             19.5                     19
188         129            26             17.8                     18
285         205            33             23.7                     18
222         150            79             53.4                     56
370         214            93             53.8                     50
194         113            34             19.8                     19
155         110            62             44.0                     40
218         168            87             67.0                     59
236         152            24             15.5                     17

In all of these instances the computed
and observed frequencies of joint occurrence are in good agreement. The
hypothesis of independent fluctuations is thus supported by this experiment.
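The calculated column of Table I follows directly from the independence assumption: the expected number of joint responses is N times the product of the two response probabilities, that is, n_A n_B / N. A short sketch that recomputes it from the counts in the table (the variable and function names are illustrative):

```python
# Rows of Table I:
# (number of stimuli, responses of A, responses of B, observed joint responses)
TABLE_I = [
    (100, 78, 25, 19),
    (188, 129, 26, 18),
    (285, 205, 33, 18),
    (222, 150, 79, 56),
    (370, 214, 93, 50),
    (194, 113, 34, 19),
    (155, 110, 62, 40),
    (218, 168, 87, 59),
    (236, 152, 24, 17),
]


def expected_joint(n_stimuli, n_a, n_b):
    """Expected joint responses if A and B fire independently:
    N * (n_A / N) * (n_B / N) = n_A * n_B / N."""
    return n_a * n_b / n_stimuli


if __name__ == "__main__":
    for n, n_a, n_b, observed in TABLE_I:
        print(n, round(expected_joint(n, n_a, n_b), 1), observed)
```

Rounded to one decimal place, the recomputed values reproduce the calculated column of the table.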
Pecher tried to determine whether or not for a single fiber the 'response
no-response' pattern to a sequence of periodic stimuli can be accounted for
by the hypothesis that successive responses occur with equal and independent
probability p. He chose a criterion of independence that relates the variables
r and n_r, where n_r is the number of times that a sequence of r successive responses
(bounded at each end by the absence of a response) occurs in a sample of
length N (r
[Histogram: abscissa, firing indices from 0 to 100.]
Fig. 5. Histogram showing the number of spinal motoneurons (triceps surae)
within each firing index interval ; responses were obtained by delivering repeated
shocks to the gastrocnemius nerve. The firing index of a unit is the percentage of
total stimulus presentations to which the unit responds. Units with firing indices
of zero and 100 are not included in this diagram. From Lloyd and McIntyre (5).
same amount; thus some units with a firing index of zero will be shifted into
the intermediate range; some with intermediate firing indices will be shifted into
the range of firing index 100. But because the units are uniformly distributed
the same number will move into the intermediate range as move out of it,
and the distribution of intermediate firing indices will remain unchanged.
Fig. 6. Idealized relation between the threshold probability distribution of a
motoneuron and the levels of synaptic drive to different motoneurons of a
population (see text).
The particular choice of a bell-shaped probability distribution will lead
to the U-shaped histogram of Fig. 5. For it is clear that if we divide the abscissa
in such a way that equal areas under the distribution are subtended, those
intervals will be largest near the tails of the distribution (firing indices near
0 and 100) and smallest at the center of the distribution (firing index near 50).
Since the density of units along the abscissa is uniform, this means that many
more motoneurons will have firing indices between 0 and 10 than between
45 and 55.
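The argument can be illustrated numerically. The sketch below assumes a Gaussian threshold distribution (one possible bell-shaped choice) and units whose synaptic drive is spread uniformly over plus or minus four standard deviations. The span, the number of units and the function names are illustrative assumptions, not values from Lloyd and McIntyre.

```python
import math


def firing_index(drive: float, threshold_mean: float = 0.0,
                 threshold_sd: float = 1.0) -> float:
    """Percentage of trials on which the drive exceeds a Gaussian threshold."""
    z = (drive - threshold_mean) / (threshold_sd * math.sqrt(2.0))
    return 100.0 * 0.5 * (1.0 + math.erf(z))


def histogram_of_firing_indices(n_units: int = 20001, span_sd: float = 4.0,
                                bin_width: int = 10) -> list:
    """Count units per firing-index interval for uniformly spaced drives."""
    counts = [0] * (100 // bin_width)
    for k in range(n_units):
        # Drives are spaced uniformly from -span_sd to +span_sd.
        drive = -span_sd + 2.0 * span_sd * k / (n_units - 1)
        index = firing_index(drive)
        bin_number = min(int(index // bin_width), len(counts) - 1)
        counts[bin_number] += 1
    return counts


counts = histogram_of_firing_indices()
for i, c in enumerate(counts):
    print(f"firing index {10 * i:3d}-{10 * (i + 1):3d}: {c} units")
```

The counts are largest in the two outer intervals and smallest near a firing index of 50, which is the U shape described in the text.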
As in the study by Pecher, the degree of correlation of threshold variation
for members of the same pool of motoneurons was investigated. The extent
of correlated and uncorrelated fluctuations is a measure of the relative importance,
in producing fluctuations, of events extrinsic and intrinsic to the fiber.
In the spinal cord there is reason to believe that threshold fluctuation is, at
least in part, the effect of background activity in other fibers. Such activity
would presumably affect many fibers in a neighborhood; the threshold fluctuations
of these fibers would therefore show definite correlations.
To determine the extent of correlated variation Rall and Hunt (6) recorded
the response of a ventral root together with the response of a single moto-
neuron belonging to an adjacent root; an example of such a recording is
shown in Fig. 7. Fig. 8 shows the results of an experiment based on a thousand
Fig. 7. Simultaneous recording of the responses of a single motoneuron (hori-
zontal deflection) and of an adjacent ventral root (vertical deflection)
upon repeated stimulation of the gastrocnemius nerve with identical shock stimuli.
From Rall and Hunt (6).
such responses. The population response amplitudes were divided into class
intervals, and the number of responses within each class interval was plotted.
For each population response within a class interval, the occurrence or failure
of a unit response was noted and the number of unit responses plotted
(shaded area). The unit responded a total of 697 times out of 1000. If
the population response and the unit response were not correlated, the firing
index of the unit would be about the same in each class interval. This is clearly
not the case. Instead, firing occurs infrequently when the population response
is small, and more often as the population response grows. The probability
of unit firing when the population response amplitude is in a given class interval
— that is, the ratio of shaded to unshaded amplitude — is plotted in the lower
part of the figure. If unit response and population amplitudes were uncorrelated
this function would be a horizontal line at about 0.7. However, it is also clear
that correlation of unit and population response is not complete. In other
words, the thresholds of the units within the population vary with respect
to one another, in addition to their collective (that is, correlated) fluctuation.
If this were not so, a particular unit would respond only after all units of
lower threshold had responded; therefore its probability of response would
be zero if the population response were smaller than a certain value, and would
be one if the population response were larger than that value. The lower curve
would therefore be a step function.
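The lower curve of Fig. 8 is a conditional firing probability computed per class interval. A minimal sketch of that bookkeeping follows; the function name and the toy data are hypothetical, and only the procedure (count trials and unit responses in each amplitude interval, then take the ratio) comes from the text.

```python
def firing_probability_by_class(amplitudes, unit_fired, edges):
    """Return (trials, unit responses, ratio) for each class interval.

    Interval i covers edges[i] <= amplitude < edges[i + 1].
    The ratio is None for an interval that contains no trials.
    """
    n_bins = len(edges) - 1
    trials = [0] * n_bins
    fired = [0] * n_bins
    for amplitude, hit in zip(amplitudes, unit_fired):
        for i in range(n_bins):
            if edges[i] <= amplitude < edges[i + 1]:
                trials[i] += 1
                fired[i] += int(hit)
                break
    return [(t, f, (f / t if t else None)) for t, f in zip(trials, fired)]


# Toy data (hypothetical): six trials, three class intervals.
toy_amplitudes = [1, 1, 5, 5, 5, 9]
toy_unit_fired = [False, False, True, False, True, True]
for row in firing_probability_by_class(toy_amplitudes, toy_unit_fired, [0, 4, 8, 12]):
    print(row)
```

With uncorrelated responses the ratio would stay near the overall firing index (697/1000, about 0.7) in every interval; with complete correlation it would jump from zero to one at a single amplitude.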
III. POSSIBLE SOURCES OF THRESHOLD VARIATIONS
Fatt and Katz (7) have found that at motor endplates miniature end-plate
potentials occur more or less randomly even though no stimulus is present.
They regard these potentials as being the result of spontaneous firings in the
fine terminal branches of a motor nerve. The occurrence of an impulse in the
nerve causes simultaneous firing in about a hundred such terminals, giving
rise to the normal end-plate potential. Spontaneous firing implies the existence
of a local source of varying excitation. Fatt and Katz compute that for
fibers with a diameter of 0.1 μ thermal fluctuations in ionic concentrations
[Figure 8. Top: ordinate, frequency (0 to 180); abscissa, population response amplitude (0 to 28). Bottom: ordinate scale 0 to 1.0; abscissa, population response amplitude.]
Fig. 8. Top: the upper curve is a histogram of population response amplitudes
obtained as in Fig. 7 from triceps surae motoneurons by delivering repeated
identical shock stimuli to the gastrocnemius nerve. The lower curve (shaded) was
obtained from single-unit recordings like those shown in Fig. 7; the number of
single-unit responses associated with population responses in each amplitude
interval is plotted. Bottom: for a given population amplitude interval the number
of single unit responses is divided by the total number of trials in that interval,
and the ratio plotted as a function of population amplitude. The interpolated
solid curve is a sigmoid fit to the data points and approximates the probability of
unit response as a function of population amplitude. Note that when the popula-
tion amplitude is large, the probability of unit response is large, and when the
population response is small, the single unit probability is small, thus signifying a
high degree of correlation among the thresholds of different units of the popula-
tion. From Rall and Hunt (6).
could cause variations of resting potential of 1 mV to 2 mV. Though probably
insufficient to produce excitation, such a variation would cause threshold
fluctuations and contribute to spontaneous firing.
Both Pecher (2) and Hunt (8) have discussed possible sources of threshold
fluctuation. Pecher considers in detail the apparent threshold variation that
would result from statistical variations in the number of ions traversing the
axon membrane when a constant potential is applied across it. Assuming
that the excitatory current that he uses is uniformly distributed over a cross
section of the nerve trunk, he concludes that at threshold about a million ions
traverse a single nerve fiber. The statistical variation in this number of ions is
given by its square root, leading to a variation of about 0.1 per cent. This
is several orders of magnitude below the range of threshold variation that he
observed. However, he points out that the number of ions actually effecting
excitation is probably considerably less than the value mentioned above and
the resultant variability correspondingly greater. Pecher also considers as
a possible source of threshold fluctuations local statistical variations of mem-
brane potential, of the sort discussed by Fatt and Katz.
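Pecher's estimate is a square-root counting argument: if the number of ions is n and its statistical variation is the square root of n, the relative variation is 1 over the square root of n. The short sketch below reproduces the 0.1 per cent figure; the function names are introduced here for illustration, and the second function simply inverts the same relation.

```python
import math


def relative_fluctuation(n_ions: float) -> float:
    """Relative variation sqrt(n) / n = 1 / sqrt(n) in a count of n ions."""
    return 1.0 / math.sqrt(n_ions)


def ions_for_relative_fluctuation(relative: float) -> float:
    """Number of ions at which the relative variation equals `relative`."""
    return 1.0 / relative ** 2


# About a million ions at threshold gives a variation of about 0.1 per cent.
print(f"{100.0 * relative_fluctuation(1.0e6):.2f} per cent")
# A variation of 1 per cent would correspond to about ten thousand ions.
print(f"{ions_for_relative_fluctuation(0.01):.0f} ions")
```

This makes concrete why a smaller effective number of ions gives a correspondingly greater variability.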
Hunt discusses two classes of possible sources of threshold fluctuation
for spinal motoneurons: (a) sources with a local origin such as we have mentioned
above, which give rise to an independent component of threshold variation
and (b) sources whose effect is felt by many fibers and which therefore produce
at least partially correlated variations in threshold. In the latter category
are included the effects of activity of spinal interneurons. By using a drug
(myanesin), in doses that block transmission through polysynaptic paths
without reducing monosynaptic reflex responses, a considerable reduction
in the range of variation of population response amplitudes was obtained.
On the basis of this result it appears likely that internuncial activity is important
in producing correlated threshold changes in spinal motoneurons.
IV. A MATHEMATICAL MODEL
Let us consider a mathematical model which is based on the concept of
fluctuating thresholds, and which attempts to derive the ensemble behavior
of large numbers of neural elements from assumed properties of neural units
in a specific area of the nervous system (9, 10, 11).
This model is based on data obtained from the peripheral auditory system
of the cat. When an electrode is placed near the round window of the cochlea,
responses to clicks can be detected; such responses contain a component that
represents the summated activity of peripheral auditory neurons. Fig. 9 shows
such population responses at a number of intensities. In Fig. 10 the average
peak-to-peak amplitude of such responses has been plotted as a function of
stimulus intensity. The resultant 'intensity function' relates the number of units
firing and the intensity of the stimulus.
The present version of the model (11) postulates the existence of several
independent populations of neural units; within a population all units are
identical. The threshold of a unit is a fluctuating parameter which can be
described by a probability distribution; threshold variations in different units
occur independently. At a rate of stimulation slower than one per second the
'response no-response' sequence obtained from a single unit is assumed to
consist of a series of independent events. Thus we postulate units whose
statistical properties resemble those found by Pecher in the frog's sciatic nerve.
The experiments used to test the model fall into three classes: two-click
experiments (9, 10), measurements of variability of response amplitude (11),
and studies of masking of click responses by noise.
When two clicks are delivered at an interval of less than approximately
100 msec the population response to the second click is smaller than it would
be if the first click had not occurred. This effect is more pronounced the
stronger the first click and the smaller the interclick interval, as illustrated
in Fig. 11. Consider the ratio of the response amplitude $_1R_2$ to a second click
and the response amplitude $R_2$ to the same click presented alone. In Fig. 12
this ratio is plotted, for a fixed second-click intensity, as a function of the
intensity of the first click. The parameter is the interval between clicks, $\Delta t$. If
[Figure 9: response traces labeled by stimulus intensity in dB (re 1.29 volts); time calibration 3 msec; relative gain reduced by 12 dB for part of the series.]
Fig. 9. Ink tracings of responses obtained from an anesthetized cat to clicks over
a 90-dB range. The electrode was located near the round window. Note that the
voltage gain of the recording equipment was reduced by 12 dB (factor of 1/4) for
stimulus intensities above -40 dB. The first peak represents the summated
activity of first-order auditory neurons. With this calibration, click threshold for
humans (verbal report) is about -95 dB.
we assume a one-population model, we obtain the result that the ratio $_1R_2/R_2$
is linearly related to the intensity function for the first click, provided that
the second-click intensity ($S_2$) and $\Delta t$ are held constant. Specifically, we obtain

$$\frac{{}_1R_2}{R_2} = 1 - \frac{R_1(S_1)}{R_T}\,\bigl[1 - g(S_2, \Delta t)\bigr] \qquad (1)$$

Determination of a single intensity function therefore permits us to predict
the dependence of this ratio on $S_1$ for any value of $S_2$ and of $\Delta t$. We may
[Figure 10: amplitude (open circles) and latency (filled circles) of $N_1$ as a function of click intensity (dB re 1.29 V across phone), -100 to -20.]
Fig. 10. Intensity function (open circles). $N_1$ is the first diphasic response
component seen in the traces of Fig. 9. The amplitude measurement is made between
the positive and negative peaks of $N_1$. Each plotted point is the median of about
ten such measurements.
[Figure 11: response traces arranged by first-click intensity and by time interval between clicks (12.2, 30 and 61 msec), with the resting response shown for comparison.]
Fig. 11. Two-click paradigm: the responses shown are to a constant intensity
(-45 dB) second click. The vertical set shows the effect of varying the intensity
of the first click; the horizontal set shows the effect of varying the interval
between clicks. Upper right: response to a -45 dB click presented alone.
From McGill (10).
in each case choose one constant, $g(S_2, \Delta t)$. Fig. 12 shows a number of fits
to the data points which were obtained in this way; $S_2$ is constant and each
curve corresponds to a different value of $\Delta t$.
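The fitting procedure can be sketched in a few lines. The printed form of eq. (1) is damaged in this copy, so the sketch assumes it reads ratio = 1 - (R1/RT)(1 - g), where R1/RT is taken to be the first-click response as a fraction of its maximum; both that reading and the names below are assumptions. The g values are those listed in Fig. 12 for a second click of -45 dB.

```python
# Interclick interval in msec -> fitted constant g, as listed in Fig. 12.
G_BY_INTERVAL_MSEC = {
    6.4: 0.05, 9.1: 0.15, 12.2: 0.27, 21.5: 0.46,
    30.0: 0.54, 61.0: 0.70, 102.0: 0.83,
}


def second_click_ratio(r1_fraction: float, g: float) -> float:
    """Predicted ratio of the second-click response to its value alone.

    r1_fraction is the first-click response divided by its maximum
    (an assumed normalization); g is the fitted constant for the
    chosen second-click intensity and interclick interval.
    """
    if not 0.0 <= r1_fraction <= 1.0:
        raise ValueError("r1_fraction must lie between 0 and 1")
    return 1.0 - r1_fraction * (1.0 - g)


for interval, g in G_BY_INTERVAL_MSEC.items():
    # With a maximal first click the predicted ratio falls to g itself.
    print(f"interval {interval:5.1f} msec: ratio = {second_click_ratio(1.0, g):.2f}")
```

Under this reading a weak first click leaves the ratio near one, and the ratio falls linearly with the first-click intensity function toward g, which grows with the interval as the units recover.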
In a second group of experiments the standard deviation of a hundred
response amplitudes was computed at each stimulus intensity, and the result
was plotted as a function of stimulus intensity. It is readily shown that N
independent units, each with a probability p of firing, will have a standard
deviation of total response proportional to $\sqrt{Np(1-p)}$. As a function of
p this quantity has minima at zero and one and has a maximum at p = 1/2. The
value of p at any stimulus intensity can be obtained from the intensity function.
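The binomial result quoted above is easy to tabulate. The sketch below assumes that each unit contributes the same amplitude (`unit_amplitude`, a hypothetical scale factor) to the summed response; it shows only the shape of the variability function, not the fitted curves of the paper.

```python
import math


def response_sd(n_units: int, p: float, unit_amplitude: float = 1.0) -> float:
    """Standard deviation of the summed response of n independent units."""
    return unit_amplitude * math.sqrt(n_units * p * (1.0 - p))


# Scan p from 0 to 1 in steps of 0.1 for a hypothetical population of 100 units.
probabilities = [k / 10 for k in range(11)]
for p in probabilities:
    print(f"p = {p:.1f}  sd = {response_sd(100, p):.2f}")
```

The standard deviation vanishes at p = 0 and p = 1 and peaks at p = 1/2, so a peak in measured variability marks the stimulus value at which half the units of a population respond.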
[Figure 12: seven blocks, A to G; ordinate scale 0.25 to 1.00; abscissa, intensity of first click in dB below reference level (60, 40, 20). $S_2$ = -45 dB.]

Block   $\Delta t$ (msec)   $g(\Delta t)$
A         6.4               0.05
B         9.1               0.15
C        12.2               0.27
D        21.5               0.46
E        30                 0.54
F        61                 0.70
G       102                 0.83
Fig. 12. $_1R_2/R_2$ (see text) as a function of first click intensity. In each block this
ratio is plotted for a different interclick interval, as indicated at the lower right.
The intensity of the second click was -45 dB throughout. The curves are obtained
from the first click intensity function and eq. (1); the parameter $g(\Delta t)$, whose
values are given at the lower right, is chosen in each case to give the best
fit to the data. After McGill (10).
Fig. 13. Intensity function (upper) and the corresponding amplitude variance
function predicted by the model: (a) for one population; (b) for two disjoint
populations. $\sigma_0$ was chosen arbitrarily. Note that a peak of the variability
function occurs at the stimulus value at which an intensity function component
reaches half its maximum amplitude.
Fig. 13 shows the kind of variability function obtained by assuming one and two
disjoint populations; $\sigma_0$ is the stimulus-independent component of variability
arising from biological and non-biological sources. We have shown (11) that
instability in stimulus intensity, which would also lead to a peaked variability
function, can account for at most three per cent of the observed variability.
A detailed study of the shape of the intensity function led us to postulate
two populations of neural units, one consisting of 'sensitive' units and one of
'insensitive' units. In the three animals tested, variability measurements over
the sensitive range are in good agreement with the theory stated above. One
case is shown in Fig. 14. The intensity function and the probabilities obtained
from it are shown with the derived standard deviation function. Here, $\sigma_0$ is
determined from measurements of baseline variability in the absence of a
stimulus; N is chosen to give the best fit to the data. Over the sensitive range
[Figure 14: abscissa, click intensity (dB re 1.29 V across phone).]
Fig. 14. Comparison of the theoretical variability function (with 70 per cent confidence
limits) and the measured values of $\sigma$, over the range of initial growth of the
intensity function. Each point represented by a solid circle is based on 100
responses; the open circles are based on the first fifty of these responses. The
corresponding intensity function, and the probabilities obtained from it, are also
shown.
(-100 dB to -60 dB) the data fall within the indicated confidence interval
approximately seventy per cent of the time, as they should if the model is
correct. Over the insensitive range of the intensity function (-60 dB to
0 dB), the standard deviation shows a complex behavior which cannot be simply
reconciled with the idea of a single population over that interval.
The third aspect of this study concerns the masking of the neural responses
to clicks by a background noise. Fig. 15 shows the effect of a constant noise
level on response amplitude at several stimulus values. In Fig. 16 we have
plotted these masked and unmasked intensity functions. The observation
was made that a very weak level of continuous noise was sufficient to reduce
almost to zero the $N_1$ response to a fairly intense click. A fixed threshold model
would predict masking of only the units whose thresholds are below the noise
level. If the threshold fluctuates, however, and does so rapidly, nearly all
units of a given population will drop below the noise level and fire in a short
[Figure 15: two columns of traces, response to click alone and click response in presence of noise (-82 dB re 1.29 volts), at click intensities from -60 to -10 dB (re 1.29 volts); relative gain 0 dB and -12 dB; time calibration 3 msec.]
Fig. 15. Ink tracings of responses obtained from an anesthetized cat to clicks
over a 60-dB range, with and without background noise; noise level, -82 dB.
Note that the voltage gain of the recording equipment was reduced by 12 dB
(factor of 1/4) at a click intensity of -30 dB.
interval preceding the click; this assumes, of course, that the noise level lies
within the range of threshold fluctuations of the unit.
By a quantitative treatment based on these qualitative notions we have
been able to show (a) that the hypothesis of a fixed threshold does not account
for the observed data and (b) that over the sensitive range of the intensity
[Figure 16: intensity functions with no noise background (composite of 3 functions) and with noise backgrounds of -92, -82 and -67 dB re 1.29 V across phone; abscissa, click intensity (dB re 1.29 V across phone), -80 to -30.]
Fig. 16. Intensity functions for clicks, with and without noise background; noise
levels -92, -82 and -67 dB. Each point of the masked functions represents the
average $N_1$ amplitude of ten responses to identical stimuli. The upper curve was
obtained by averaging the three unmasked functions which correspond to the
masked functions shown; thus each point represents the average $N_1$ amplitude of
thirty responses to identical stimuli. Typical data on which these curves are
based are shown in Fig. 15.
function a single population of units making threshold 'jumps' at a rate of
about 2000 times per second can account for the data. In addition, it is observed
that low level noise has little effect on the intensity function over the insensitive
range, except to reduce it by the constant contribution of the sensitive population.
The need for a division of units into at least two populations is thus
confirmed. When the noise level is raised into the insensitive range the observed
effect is not nearly so marked, implying either that more than one population is
involved in that interval or that the rate of threshold fluctuation is considerably
slower than for the sensitive units.
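The qualitative masking argument can be made concrete with a very simple sketch. It is not the quantitative treatment of the paper. It assumes that each threshold jump is an independent draw, that a draw falls below the noise level with probability `q_below_noise`, and that a unit which has fired in the window before the click cannot respond to the click. The value of q and the 50 msec window are hypothetical; only the jump rate of about 2000 per second comes from the text.

```python
def fraction_masked(q_below_noise: float, jump_rate_hz: float,
                    window_sec: float) -> float:
    """Probability that a unit fires to the noise within the window.

    Each of the jump_rate_hz * window_sec threshold jumps is treated as an
    independent draw that falls below the noise level with probability
    q_below_noise.
    """
    n_jumps = int(round(jump_rate_hz * window_sec))
    return 1.0 - (1.0 - q_below_noise) ** n_jumps


q = 0.05  # hypothetical chance that a single draw lies below the noise level
print(f"fixed threshold:       {q:.3f} of the units are masked")
print(f"fluctuating threshold: {fraction_masked(q, 2000.0, 0.05):.3f}")
```

Under these assumptions a noise level that would capture only a small fraction of units with fixed thresholds captures nearly all of them when the threshold is redrawn many times before the click.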
It is noteworthy that population analyses based on two very different
experiments, variability and masking, have a great deal in common.
REFERENCES
1. E. A. Blair and J. Erlanger: A comparison of the characteristics of axons through
their individual electrical responses. Amer. J. Physiol. 106, 524-564 (1933).
2. C. Pecher: La fluctuation d'excitabilite de la fibre nerveuse. Arch. Int. Physiol. 49,
129-152 (1939).
3. A. Hald: Statistical Theory with Engineering Applications, p. 344 (eq. 13.3.6), J. Wiley
and Sons, New York (1952).
4. W. A. Rosenblith: Some electrical responses from the auditory nervous system. Pro-
ceedings of the Symposium on Information Networks, Polytechnic Institute of Brooklyn,
223-247 (1954).
5. D. P. C. Lloyd and A. K. McIntyre: Monosynaptic reflex responses of individual
motoneurons. J. Gen. Physiol. 38, 771-787 (1955).
6. W. Rall and C. C. Hunt: Analysis of reflex variability in terms of partially correlated
excitability fluctuation in a population of motoneurons. J. Gen. Physiol. 39, 397-422
(1956).
7. P. Fatt and B. Katz: Spontaneous subthreshold activity at motor nerve endings. J.
Physiol. 117, 109-128 (1952).
8. C. C. Hunt: Temporal fluctuation in excitability of spinal motoneurons and its influence
on monosynaptic reflex response. J. Gen. Physiol. 38, 801-811 (1955).
9. W. J. McGill and W. A. Rosenblith: Electrical responses to two clicks: a simple
statistical interpretation. Bull. Math. Biophys. 13, 69-77 (1951).
10. W. J. McGill: A statistical description of neural responses to clicks recorded at the
round window of the cat. Ph.D. Thesis, Harvard University (1952).
11. L. S. Frishkopf: A probability approach to certain neuroelectric phenomena. Research
Laboratory of Electronics, Massachusetts Institute of Technology, Tech. Rep. No. 307
(1956).
PART III
DETERMINATION OF INFORMATION MEASURES
It is possible (as shown by several papers in this volume) to apply information
theory to biology without introducing any actual information measures. Indeed,
if one considers that it is very difficult to estimate information measures for
living systems, and that the resulting measures are of an irreducibly relative
nature, one might wonder whether it is worth-while to take such measures at
all. However, it is difficult if not impossible to validate firmly the application of
information theory without critical tests based on quantitative measurements;
moreover, one hopes to discover lawful relations in the results of the measure-
ments themselves. So, attempts are being made to estimate information contents
associated with various biological structures and functions. All the papers in
this part are chiefly concerned with such estimations; some from a general
point of view, some with regard to particular systems, ranging in complexity
all the way from simple molecules to whole men.
H. Q.
CHEMISTRY AND BIOCHEMISTRY AT LOW
TEMPERATURES AND DISCRIMINATION OF
STATES AND REACTIVITIES*
Simon Freed
Chemistry Department, Brookhaven National Laboratory,
Upton, New York
Abstract — In order to apply information theory to biochemistry and biology at the molecular
level it is advantageous to reduce the number of classifications and specifications involved by
reducing the temperature of the system. In this way the number of species and states with
their reactivities is reduced. At the same time the chemical noise level falls and in consequence
a resolution may be obtained between components whose properties are practically indis-
tinguishable at ordinary temperatures. Weakly bonded systems and intermediates become
more easily detectable not only because of an increase in their concentration, that is, an increase
in their signal, but in addition because the noise level is weaker at the lower temperature.
Illustrations are given from chemistry where reactions in solutions proceed at the tempera-
tures approaching that of liquid nitrogen. The information content of irreversible reactions
at room temperature may be thought of as being stored in intermediates that participate in
reversible reactions at the low temperatures.
Once the properties of the more stable states have been understood, the way is clear for
investigating the system in its thermally active states since allowance can be made for the
presence of the former. In this way, an ordering of experimentation according to temperature
will bring into activity successive components of the system.
Examples have been selected mainly from work on the preservation of biological systems
at low temperatures which indicate that biochemical and biological processes may likewise
be investigated and that the finer discriminations and specificities associated with lower
temperatures may be brought to light in these fields also.
If we wish to measure a physical property, such as electrical conductivity or
viscosity, with an instrument which we have no intention of modifying, there
is little point in seeking the information content of the instrument. On the
other hand, if we wish to employ chemical substances as probes for uncovering
structures of enzymes by means of enzyme-substrate reactions, we are at once
confronted by the need of the structural and functional information of our
probes. In fact we are discussing properties at the molecular level. Pure
substances at this level are mixtures composed of molecules in various energy
states with their characteristic configurations, motions, and reactivities. The
application of information theory to biology at the molecular level requires
therefore a great expansion in the number of categories and specifications.
It is to reduce this number in a systematic manner and make these categories
more precise that I wish to draw upon the relation that has been recognized
between information and entropy which asserts that the amount of information
* Research performed under the auspices of the U.S. Atomic Energy Commission.
required to specify the system will be less at lower temperatures. The system
will redistribute itself from higher to lower energy levels so that only the more
basic ones remain appreciably occupied. Fewer chemical species are now
present and also active. There has been, in a sense, a reduction in chemical
noise differing in its frequency spectrum from the continuum characteristic
of an electrical conductor. Chemical noise reflects the structural properties
of molecules and may consist of dominant discrete frequencies associated
with virtual continua of modulations. Usually these represent coupling of the
electronic system of the molecule in a given atomic configuration with its
[Figure 1: line drawings of the absorption spectrum at three temperatures, labeled 300, 77 and 4.]
Fig. 1. The variation of absorption spectrum of praseodymium chloride with
temperature. Line drawings of visible absorption spectra of crystals of anhydrous
praseodymium chloride (PrCl3) at room temperature, at that of liquid nitrogen,
and that of liquid helium. Sharper spectra, improved resolution, and fewer lines
are evident at lower temperatures. The fewer lines correspond to fewer energy
states which are occupied by the praseodymium ions. At room temperature the
blocks of diffuse spectra are actually not uniform in intensity but are more
intense as a rule in those regions where the spectrum of the crystal at 77°K
possessed its most intense line spectrum. The greater diffuseness of the lines and
their increased numbers at the higher temperature may be regarded as chemical
noise associated with the spectroscopic signals from the more stable states at the
lowest temperature.
own vibrations, restricted rotations, etc. If the molecules are complex, fluctuations
between different atomic configurations may contribute to the noise.
In addition, coupling of the molecule in each of its states with the molecules
of its environment in different configurations leads to more and more densely
spaced energy levels which I referred to as the continua.
A reduction in temperature removes thermal energy required to activate
some motions and effect changes in configurations, and reduces the number
of perturbations of a given configuration. Not only are fewer species present
but each species is more sharply defined; thus, less information is required
for specifying the system than at higher temperature. Clearly, the system is
now more specific in its reactions than at higher temperature and its specificity
can be related to more sharply defined geometric configurations. The chemical
system has become a more precise probe.
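The claim that less information is needed to specify the system at lower temperature can be illustrated with a Boltzmann distribution over a few energy levels. The level spacings below are hypothetical and chosen only for illustration; the temperatures correspond to room temperature, liquid nitrogen and liquid helium. The Shannon entropy of the occupation probabilities is used here as a rough measure of the information required to say which state a molecule occupies.

```python
import math

K_B_CM = 0.695  # Boltzmann constant in cm^-1 per kelvin (approximate)


def boltzmann_populations(levels_cm, temperature_k):
    """Fractional occupation of each level at thermal equilibrium."""
    ground = min(levels_cm)
    weights = [math.exp(-(e - ground) / (K_B_CM * temperature_k))
               for e in levels_cm]
    total = sum(weights)
    return [w / total for w in weights]


def occupation_entropy_bits(populations):
    """Shannon entropy, in bits, of the occupation probabilities."""
    return -sum(p * math.log2(p) for p in populations if p > 0.0)


LEVELS_CM = [0.0, 50.0, 150.0, 300.0, 600.0]  # hypothetical level energies

for temperature in (300.0, 77.0, 4.0):
    pops = boltzmann_populations(LEVELS_CM, temperature)
    print(f"T = {temperature:5.1f} K  entropy = "
          f"{occupation_entropy_bits(pops):.4f} bits  "
          f"ground-state fraction = {pops[0]:.4f}")
```

As the temperature falls the population collects in the lowest level and the entropy drops toward zero, which is the sense in which fewer states remain appreciably occupied.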
The following illustrations have been selected for the simplicity of their
phenomena rather than for their direct relevance to biology.
The sharp absorption spectrum of a crystal of a rare earth salt (Fig. 1)
shows very clearly that at the lower temperature fewer lines are present; they
are sharper and more clearly resolved and the general diffuse background
prominent at the higher temperature (not shown in the line drawing of the
figure) becomes decidedly weaker. There are then fewer kinds of absorption
centers at the lower temperature and, because the stable states are exposed to
more sharply defined environmental fields, there are fewer kinds of pertur-
bations.
An especially vivid example of a solution showing somewhat similar pheno-
mena is given by the fluorescence spectrum of solutions of europium chloride
in ethanol at various temperatures (1). The spectra were taken to discover the
discrete number of lines in the three separate sets which may furnish the point
[Figure 2: spectra plotted against wavelength; axis ticks at 4000, 4500 and 5000.]
Fig. 2. Absorption spectra of carotene (90% alpha and 10% beta). A — In hep-
tane at room temperature; B — In equal volumes of liquid propane and propene
at 77°K.
group symmetry of the electrical fields about europium ion in the solution.
It is clear that at room temperature the continuous noise is so great as to make
enumeration impossible. As the temperature is lowered a few discrete lines
can be resolved with such definiteness that they serve to eliminate some of the
possible point group symmetries. At the temperature of liquid nitrogen and
even at the temperature of dry ice adequate resolution is clearly achieved and
the number of possible symmetries of the environmental fields is reduced to
one only.
Figure 2 gives the absorption spectrum of a substance of some biological
interest, β-carotene, and illustrates the increased contrast between absorption
and transmission at the lower temperature, that is, the increased signal to noise
ratio.
Figure 3 is presented to illustrate the resolution into components of what
is apparently a single species at room temperature. The figure reproduces
the absorption spectra of chlorophyll b in ethyl ether and methanol (2). Our
first inclination is to ascribe the differences in the spectra to the perturbations
produced on the structure of the chlorophyll molecules by the two types of
solvent molecules. Figure 3b is a magnification of the Soret band in the blue
[Figure 3a: abscissa, wavelength in mμ, 400 to 700.]
Fig. 3a. Absorption spectra of chloro-
phyll b at room temperature. The thin-
lined curve with maxima at shorter wave-
lengths represents a solution of chloro-
phyll in ethyl ether; the thick-lined curve
gives the spectrum when the solvent is
methanol.
[Figure 3b: abscissa, wavelength, 4100 to 5000 Å.]
Fig. 3b. The dependence of the absorption spectra of chlorophyll b on
temperature. Only the Soret band in the blue is shown. Enlarged scale of
wave-lengths. At 300°K the solvent is 20% propyl ether, 80% hexane. At the
lower temperature it is 20% propyl ether, 40% propane, and 40% propene. The
hexane was substituted at 300°K for the hydrocarbons propane-propene since
they are normally gases at room temperature.
region and shows that a solution of chlorophyll b in ether is really a mixture
of two species (etherates) in equilibrium with each other in roughly equal
amounts and clearly resolved at 180°K. A study of the dependence on tempera-
ture of the absorption spectrum of chlorophyll b in methanol reveals that in
this solvent, chlorophyll b also exists as a mixture of solvates which are about
equal in concentration at room temperature and together they yield the com-
posite spectrum. However the spectrum of each alcoholate differs very little
in shape from that of each etherate. Fig. 4 illustrates a form stable at a lower
temperature reacting to produce reversibly a stable intermediate but at still
higher temperature ending in an irreversible reaction.
The following specific observations may prove worthwhile in illustrating
what is probably a rather common phenomenon. Chlorophyll b dissolved in
[Figure 4: spectra of chlorophyll b' in 15% mono-i-propyl amine and 1 : 1 propane-propene at 230°K and 193°K; abscissa, wavelength, 4000 to 6000 Å.]
Fig. 4. Chlorophyll b' in 15% mono-i-propyl amine in 1 : 1 propane-propene,
showing the presence of the red-brown intermediate stable at 193°K which is in
equilibrium with the original chlorophyll. At temperatures higher than about
235°K, an irreversible reaction occurs.
ether is deposited as a green powder by pumping off the ether at room tempera-
ture. When the temperature of the powder is reduced to that of dry ice (about
193°K) and propylamine is condensed upon it at this temperature, it dissolves
quickly, forming a red solution. Note in Fig. 4 the new absorption between
5000 A and 6000 A. A rise in temperature transforms the color into the green
of chlorophyll with its characteristic spectrum which reverts back reversibly
to the red substance when the temperature is reduced. However, if the tem-
perature is kept any length of time at about 235°K or higher, an irreversible
reaction sets in. For example, at room temperature the red color lasts only
a fraction of a second. This evanescent red color is produced in the well
known phase test for chlorophyll.
Figure 5 represents a chemical reaction which appears rapid even between
167°K and 75°K. Chlorophyll b dissolved in di-iso-propylamine is undergoing
transformation probably in an acid-base reaction. The quick readjustment
to equilibrium is shown by the interchange in relative intensities of the bands
in the red region. The band furthest towards the red grows in as the temperature
is reduced, at the expense of the band near it toward shorter wavelengths.
Fig. 5. Chlorophyll b in 15% dipropylamine diluted with equal proportions of
propane and propene. A chemical readjustment toward equilibrium occurs
between 170°K and 75°K. (Horizontal axis: wavelength, 3000 to 7000 Å.)
That these reactions occur rapidly at such temperatures is not very surprising
since little heat of activation is required for this type of reaction. Figure
6 depicts a type of oxidation-reduction at low temperatures. When iodine
is finely divided it rapidly dissolves in isoprene at the temperature of dry-ice,
193°K. A brown solution forms at the solid-liquid interface but it decolorizes
very quickly, becoming colorless a little distance from the iodine surface. In
the light of other investigations it was surmised that the solution is brown
because of the presence of a 1:1 (molecular iodine-hydrocarbon molecule)
addition compound which possesses a characteristic absorption band in the
ultraviolet region. To build up any appreciable concentration of this compound
it would evidently be necessary to make solutions of iodine in isoprene below
193°K. When a solution of isoprene in propane (to which propene had been
added to increase the solubility of isoprene) at the temperature of liquid
nitrogen (77°K) is mixed with a solution of iodine in propane and propene,
the new band anticipated in the ultraviolet does not appear within a day or
two. Figure 6 indicates what happens when such a solution is warmed. At
146°K the absorption band shown is due to the iodine-propene molecular
addition compound which has been identified in a previous experiment. At
150°K appears the anticipated new band arising from the compound iodine-
isoprene. At 154°K, this band quickly disappears irreversibly and at the same
time decoloration of the solution occurs. The molecular iodine has been removed,
presumably by the halogenation of the double-bond system of isoprene, just
as had occurred when solid iodine reacted with isoprene at the temperature
of dry ice. This oxidation appears to require the prior formation of the
intermediate molecular addition compound stable at about 150°K at the
concentrations employed.
Fig. 6. Isoprene dissolved in 1 : 1 propane-propene to which iodine dissolved in
1 : 1 propane-propene has been added. The new absorption band which appears
at 150°K is due to a 1 : 1 molecule addition compound of the iodine to isoprene.
Its disappearance at 154°K is due to an irreversible reaction, probably
halogenation across the double bond. (Horizontal axis: wavelength.)
By investigating the properties and reactions from the lowest practicable
temperature upward we would observe the appearance of new thermally
activated states and their subsequent reactions.
In analogy with the phenomena illustrated we would expect that a knowledge
of biochemical and even biological processes of considerable value may be
gained by investigations at low temperature. Support for these expectations
comes mainly from recent investigations directed toward the preservation of
cells, tissues, and entire organisms. Even more cogent for our purposes are
the instances of partial preservation at low temperatures which becomes more
effective at still lower temperatures. Unless explicit references are given, the
following examples are drawn from the excellent review by Audrey U. Smith (3).
For example, H. F. Smart found that twenty-one species of bacteria, yeasts,
and molds continued to multiply in frozen media at 264.1°K. Sizer and Josephson
found that lipase was active at 248.5°K, tryptic digestion proceeded at
258°K, and that invertase continued to hydrolyze sucrose at 255°K. At 203°K,
however, they could detect no hydrolysis during several weeks. In the preser-
vation of red blood cells, about ten per cent deterioration occurs per year at
dry ice temperature, 193°K, but scarcely any loss is incurred when they are
kept at the temperature of liquid air, 80°K. Ovarian tissue failed to survive
nine days at 193°K but survived more than a year at 80°K under otherwise
similar conditions (4). Revival of rats after cooling to 273.5°K was reported
by Andjus (5, 6).
Irreversible reactions are then clearly progressing at low temperatures,
in red blood cells and ovarian tissue at 193°K and at somewhat higher temperatures
in the enzymatic reactions. If the simple reactions such as those of isoprene
and iodine, chlorophyll and propylamine serve as models, the irreversible
reactions are preceded in their first and intermediate stages by reversible reactions
at still lower temperatures.*
Becquerel found that rotifers, spores of bacteria, non-sporing bacteria,
algae, lichens, mosses, and seeds of higher plants, after having been dried in
a vacuum of 10^^ mm Hg over barium oxide, could be successfully kept at the
temperature of liquid helium (4°K). Parkes showed that human spermatozoa
survived exposure and storage at 80°K. Ovarian, testicular, pituitary, and
adrenal tissue have given functional grafts after storage at 80°K, especially
if glycerine was added. Luyet established that vinegar eels, spermatozoa,
muscle fibres of frogs, and hearts of embryonic chicks could be revived after
sudden cooling to the temperature of liquid air (80°K). It is then not surprising
that enzymes have been cooled to such temperatures without loss of subsequent
potency. It would seem then that a number of biochemical and biological
processes are available for study at low temperatures.
I shall consider both homogeneous and heterogeneous solutions. The
first implies that solvents must maintain all the reactants in solutions fluid
at low temperatures. It would seem well worthwhile to employ conventional
solutions at as low temperatures as possible, and aqueous systems near zero
degrees or under supercooled conditions. It has been shown (8) that proteins
* Lovelock (7) ascribes the deterioration of red cells to a physical mechanism rather than to a
chemical process, namely, that the dissolution of lipoprotein and other components of the cell
membrane proceeds more rapidly than the biochemical processes can repair them at the low
temperature. Since the lipoprotein etc. is presumably bound as an integral part of molecules
composing the membrane material, the physical process may also be initiated by reversible
chemical transformations.
such as enzymes are soluble in some non-aqueous solvents and that a few
enzymes can be recovered with virtually their full potency. Since some of the
solvents have melting points below that of water they can be utilized for investi-
gations of solutions of proteins at relatively low temperatures. It appears
entirely possible that had the solution process been carried out at lower tempera-
ture a larger fraction of the enzymes would have been recovered without
deterioration. Indeed it may prove fruitful to undertake studies at low tempera-
tures of the first stages of reactions which are toxic at ordinary temperatures
since the toxic substances may be removed at temperatures so low that little
permanent injury is done to the enzyme or organism.
In analogy with the dissolution of finely divided chlorophyll and iodine
by solvents at low temperatures it is to be expected that at low temperatures
heterogeneous reactions are also possible between substances in solution and
biological materials having high specific areas. Ready-made for such reactions
with solutions seem sections of tissue with water removed by freeze-drying.
Likewise Becquerel's procedure of removing water by pumping at room
temperature would prepare material for reaction at low temperature. Some
of the reactions with the surfaces constitute a generalized staining. Many
staining processes are acid-base reactions and would be expected to be rather
rapid at low temperatures. As has been remarked, molecular steric factors are
as a rule more specific at the lower temperatures in general; hence finer dis-
criminations between structures within the surfaces are to be anticipated.
REFERENCES
1. E. V. Sayre, D. G. Miller, and S. Freed: Symmetries of electric fields about ions in
solutions. Absorption and fluorescence spectra of europic chloride in water, methanol,
and ethanol. J. Chem. Phys. 26, 109-113 (1957).
2. D. G. Harris and F. P. Zscheile: Effects of solvent upon absorption spectra of
chlorophylls A and B. Bot. Gaz. 104, 515-527 (1943).
3. A. U. Smith: Effects of low temperatures on living cells and tissues. In: Biological
Applications of Freezing and Drying, ed. by R. J. Harris, 1-62, Academic Press, New York
(1954).
4. A. S. Parkes and A. U. Smith: Regeneration of rat ovarian tissue grafted after exposure
to low temperatures. Proc. Roy. Soc. (B) 140, 455-470, London (1953).
5. R. K. Andjus: Sur la possibilite de ranimer le rat adulte refroidi jusqu'a proximite du
point de congelation. C.R. Acad. Sci., Paris 232, 1591-1593 (1951).
6. R. K. Andjus and A. U. Smith: Revival of hypothermic rats after arrest of circulation
and respiration. J. Physiol. 123, 66-67 (1954).
7. J. E. Lovelock: Physical instability and thermal shock in red cells. Nature, Lond. 173,
659-666 (1954).
8. M. J. Loiseleur: Sur quelques proprietes des proteides en solutions organique. Bull.
Soc. Chim. Biol. 14, 1088-1100 (1932).
J. J. Katz: Anhydrous hydrogen fluoride as a solvent for proteins and some other bio-
logically important substances. Arch. Biochem. Biophys. 51, 293-305 (1954).
E. D. Rees and S. J. Singer: A preliminary study of the properties of proteins in some
nonaqueous solvents. Arch. Biochem. Biophys. 63, 144-159 (1956).
DISCUSSION
Mahler : I can see where this might be useful in the study of the rate of formation of
enzyme-substrate complexes. This is a reaction which proceeds much too rapidly to be
measured by most ordinary techniques. It is only with very rare and very stable enzyme
complexes and by using very interesting and very sensitive experimental devices that Chance*,
for instance, has been able to study this at ordinary temperatures. But if one can find the right
kind of solvent for both substrate and enzyme (there is no reason to assume that some of
these solvents might not work), one might be able spectroscopically to study the rate of
formation of enzyme-substrate complexes at low temperatures.
* B. Chance and G. R. Williams: The respiratory chain and oxidative phosphorylation.
In: Advances in Enzymology, ed. by F. F. Nord 17, 65-134. Interscience, New York. (1956).
INFORMATION CONTENT OF TRACER DATA
WITH RESPECT TO STEADY-STATE SYSTEMS*
Mones Berman and Robert L. Schoenfeld
Division of Biophysics, Sloan-Kettering Institute, New York
Abstract — A method for the quantification of information in data from tracer experiments
on steady-state systems is presented. It is shown that if the system is represented by n
compartments a point in an $n^2$ dimensional space can serve to represent a specific model.
Furthermore, uncertainty about the system due to statistical fluctuations and incomplete data can be
represented by regions in the $n^2$ dimensional hyperspace. A unit of information for such a
system is defined and serves as a measure of the amount of information necessary to determine
the system to within a desired accuracy.
In order to express the data in terms of the generalized $n^2$ dimensional space, a set of
invariants is defined for the data. A concise matrix relation is shown to exist between the
invariants of the data and the parameters that characterize the compartmental system. The
matrix relation allows mappings between the data and the system.
The method presented is applicable to any compartmentalized system that shows linear
kinetics.
I. INTRODUCTION
This paper is concerned with the quantification of information contained
in data from tracer experiments performed on steady-state biological systems.
In general, the same set of data may be analysed in terms of different systems
of various degrees of complexity. To define the information content of the
data, therefore, it is necessary to specify the system in terms of which the data
are to be analysed.
It can be assumed for many tracer experiments that the system† consists
of a discrete number of compartments (or pools) each representing a locali-
zation or chemical state of the labeled material, with exchange of molecules
between compartments. The rate of exchange of the unlabeled molecules
between compartments is in general a non-linear function of the amounts of
material in the compartments. If, however, the system is in a steady state and
the amount of the tracer is sufficiently small compared to its unlabeled isotope,
the rate of exchange of the tracer may be treated as a linear function of the
amounts of labeled material in the compartments (1).
The problems that arise in treating the data of tracer experiments are:
first, to define the information content in the data, and second, to translate
the information in the data into values of the system parameters (the turn-over
rates of the compartments). In addition, it is desirable to have a measure of
* This work was supported in part by the U.S. Atomic Energy Commission Grant
AT(30-1)-910.
† For this paper, the word 'system' will be used to mean a specific number of compartments
independently of how they are interconnected. The word 'model' will refer to a specific
configuration of the system.
uncertainty in the values determined for the system parameters. The uncer-
tainty in these values arises from the fact that the collected data may not be
sufficient to define the system completely and that the collected data have
associated fluctuations.
A method for the quantification of the information in data and the systematic
formulation of models consistent with it is presented here. The information
content in the data is expressed by a set of invariants, and a concise matrix
relation is shown to exist between the invariants of the data and the system
parameters. Uncertainties in the data due to incompleteness or fluctuations
are mapped into a generalized co-ordinate space which also represents the
degrees of freedom of the system parameters and their uncertainty. The
uncertainties in the data are expressed in terms of regions in the generalized
co-ordinate space in such a way as to suggest a criterion for their quantification
with respect to the system.
II. DATA INVARIANTS AND SYSTEM PARAMETERS
The response of the system to a tracer injected into any one compartment
can be expressed in terms of the amounts of tracer in the various compartments
as a function of time. If we define the probability per unit time for a transition
from any compartment i to compartment j as $\lambda_{ji}$, then the kinetics of the
tracer in the ith compartment of an n compartmental system can be represented
by the following set of differential equations:
$$\frac{dq_i}{dt} = -\lambda_{ii}\, q_i(t) + \sum_{j \neq i} \lambda_{ij}\, q_j(t) \qquad (i = 1, 2, \ldots, n) \tag{1}$$
where $q_j(t)$ is the amount of tracer material in the jth compartment at time t
and
$$\lambda_{ii} \geq \sum_{j \neq i} \lambda_{ji} \tag{2}$$
is the probability per unit time that any molecule in compartment i will leave
that compartment.
The inequality sign expresses the possibility that a molecule may leave the
entire system from compartment i as in the case for open systems.
The solution of the set of differential equations (1) is:
$$q_i(t) = \sum_{j=1}^{n} A_{ij}\, e^{-\alpha_j t} \tag{3}$$
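As a minimal numerical sketch, one can check that a sum of exponentials of the form of equation (3) satisfies the kinetics of equation (1). The closed two-compartment system below, its rate values, and the function names are hypothetical choices made for illustration only; `rates[i][j]` is assumed to hold $\lambda_{ij}$ (transfer to compartment i from compartment j) and `rates[i][i]` to hold $\lambda_{ii}$.

```python
import math


def tracer_amounts(A, alphas, t):
    """q_i(t) = sum_j A[i][j] * exp(-alphas[j] * t), the form of equation (3)."""
    return [sum(a * math.exp(-al * t) for a, al in zip(row, alphas)) for row in A]


def tracer_derivatives(A, alphas, t):
    """Term-by-term time derivative of the sum of exponentials."""
    return [sum(-al * a * math.exp(-al * t) for a, al in zip(row, alphas)) for row in A]


def kinetics_rhs(rates, q):
    """Right-hand side of equation (1) for the amounts q."""
    n = len(q)
    return [
        -rates[i][i] * q[i] + sum(rates[i][j] * q[j] for j in range(n) if j != i)
        for i in range(n)
    ]


# Hypothetical closed system: lambda_21 = 0.3, lambda_12 = 0.1, no loss to the outside.
rates = [[0.3, 0.1], [0.3, 0.1]]
# Coefficients and exponents for unit tracer injected into compartment 1 at t = 0.
A = [[0.25, 0.75], [0.75, -0.75]]
alphas = [0.0, 0.4]

for t in (0.0, 2.0, 10.0):
    q = tracer_amounts(A, alphas, t)
    lhs = tracer_derivatives(A, alphas, t)
    rhs = kinetics_rhs(rates, q)
    residual = max(abs(l - r) for l, r in zip(lhs, rhs))
    print(t, q, residual)
```

The residual printed for each time should be at rounding level, and the two amounts should sum to the injected quantity because the assumed system is closed.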
In a recent paper (2) we have pointed out that data expressed in the form of
equation (3) have the following properties:
(a) There are at most n $\alpha_j$ in the data and these are invariants of the system
and independent of the initial conditions or site of measurements.
(b) The $A_{ij}$ represent $n^2$ independent variables in the data. Specification of
the initial conditions reduces the $A_{ij}$ to $(n^2 - n)$ independent variables which
are a function of the system parameters only. The $A_{ij}$ thus represent $(n^2 - n)$
invariants of the system parameters.
(c) The n $\alpha_j$ and $n^2$ $A_{ij}$ comprise a necessary and sufficient set of data to
define uniquely the parameters of the system.
(d) A simple matrix relation (3) exists between the $A_{ij}$ and $\alpha_j$ of the data and
the $\lambda_{ij}$ of the system. This relation can be written:
$$|\lambda|\,|A| = |A|\,|\alpha| \tag{4}$$
or
$$|\lambda| = |A|\,|\alpha|\,|A|^{-1} \tag{5}$$
where, written out for three compartments,
$$|\lambda| = \begin{pmatrix} \lambda_{11} & -\lambda_{12} & -\lambda_{13} \\ -\lambda_{21} & \lambda_{22} & -\lambda_{23} \\ -\lambda_{31} & -\lambda_{32} & \lambda_{33} \end{pmatrix}, \quad
|A| = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}, \quad
|\alpha| = \begin{pmatrix} \alpha_1 & 0 & 0 \\ 0 & \alpha_2 & 0 \\ 0 & 0 & \alpha_3 \end{pmatrix}$$
Equation (5) expresses the system parameters in terms of the invariants in
the data. If these invariants are known, the fractional turnover rates, $\lambda_{ij}$,
can all be determined. However, in most cases the experimental data are
incomplete in that certain of the $A_{ij}$ and $\alpha_j$ are not known. For these cases,
an infinity of models mathematically consistent with the data can be obtained
from equation (5) by inserting arbitrary values for the unknown $A_{ij}$ and $\alpha_j$,
preserving the initial conditions and other constraints in the data. Most of
these arbitrary models, however, will be physically meaningless because some of
the fractional turnover rates will be negative. Consequently, it is necessary to
investigate what range of values of the unknown $A_{ij}$ and $\alpha_j$ correspond to
physically meaningful models. This can be done by relating variations in $A_{ij}$
and $\alpha_j$ to variations in the $\lambda_{ij}$.
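The use of equation (5) when the data are complete can be sketched numerically. The example assumes a two-compartment system, the sign convention in which $|\lambda|$ carries $\lambda_{ii}$ on the diagonal and $-\lambda_{ij}$ off the diagonal, and invented values for the $A_{ij}$ and $\alpha_j$; the helper names `matmul`, `inv2` and `recover_lambda` are hypothetical and not part of the original treatment.

```python
def matmul(x, y):
    """Product of two matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def inv2(m):
    """Inverse of a 2 x 2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def recover_lambda(A, alphas):
    """Evaluate |lambda| = |A| |alpha| |A|^-1 for a two-compartment system."""
    alpha = [[alphas[0], 0.0], [0.0, alphas[1]]]
    return matmul(matmul(A, alpha), inv2(A))


# Invented, complete set of data invariants (tracer injected into compartment 1).
A = [[0.25, 0.75], [0.75, -0.75]]
alphas = [0.0, 0.4]

lam = recover_lambda(A, alphas)
print(lam)  # diagonal: lambda_11, lambda_22; off-diagonal: -lambda_12, -lambda_21
```

Under these assumptions the recovered rates are approximately $\lambda_{11} = 0.3$, $\lambda_{12} = 0.1$, $\lambda_{21} = 0.3$ and $\lambda_{22} = 0.1$, all non-negative, so this particular model is physically meaningful.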
One may define (2) a matrix $|P|$ in such a way that the product $|PA|$ will
preserve the known $A_{ij}$. The number of variables in $|P|$ will be equal to the
degrees of freedom in the $A_{ij}$. If both sides of equation (4) are premultiplied by
the matrix $|P|$ this equation can be rewritten:
$$|P \lambda P^{-1}|\,|PA| = |PA|\,|\alpha| \tag{6}$$
which is of the form
$$|\lambda'|\,|A'| = |A'|\,|\alpha| \tag{7}$$
where
$$|A'| = |PA| \tag{8}$$
$$|\lambda'| = |P \lambda P^{-1}| \tag{9}$$
Equation (9) expresses a mapping of the matrix $|\lambda|$ corresponding to variations
in the unknown $A_{ij}$ only. It also represents a general solution of all
models mathematically consistent with the data in terms of a minimum number
of variables. This solution is expressed in terms of an arbitrary model represented
by the matrix $|\lambda|$.
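A small sketch may make the mapping of equation (9) concrete. It assumes a two-compartment system in which only compartment 1 is observed and the tracer is injected there, so that a diagonal matrix $|P|$ with entries 1 and p leaves the first row of $|A|$ and the initial conditions unchanged; this particular choice of $|P|$, the starting model, and the function name are illustrative assumptions, not the authors' procedure.

```python
def transformed_lambda(lam, p):
    """Return P * lam * P^-1 for P = diag(1, p) with p != 0.

    For a diagonal P the (i, j) entry is simply lam[i][j] * P[i] / P[j].
    """
    if p == 0:
        raise ValueError("p must be non-zero so that P is invertible")
    diag = [1.0, p]
    return [[lam[i][j] * diag[i] / diag[j] for j in range(2)] for i in range(2)]


# Hypothetical starting model (open system), using the sign convention
# lambda_ii on the diagonal and -lambda_ij off the diagonal.
lam = [[0.5, -0.1], [-0.3, 0.2]]

for p in (0.5, 1.0, 1.5):
    print(p, transformed_lambda(lam, p))
```

Each value of p gives a different pair $\lambda_{12}$, $\lambda_{21}$ while the diagonal elements, and hence the exponents $\alpha_j$, are unchanged; the single variable p is the one degree of freedom of this assumed example.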
Similarly, we can define a matrix $|D|$ so that the product $|\alpha D|$ will preserve
all the known $\alpha_j$. Incorporating this into equation (4), we get
$$|\lambda A D A^{-1}|\,|A| = |A|\,|\alpha D| \tag{10}$$
which is of the form
$$|\lambda'|\,|A| = |A|\,|\alpha'| \tag{11}$$
where
$$|\alpha'| = |\alpha D|, \qquad |\lambda'| = |\lambda A D A^{-1}| \tag{12}$$
Equation (12) represents a mapping of the matrix $|\lambda|$ in terms of the variations
in the unknown $\alpha_j$ only.
By applying the restriction that every fractional turnover rate must be
positive,
$$\lambda'_{jj} \geq \sum_{i=1,\, i \neq j}^{n} \lambda'_{ij} \tag{13}$$
equations (9) and (12) limit the range of values of the variables in the matrices
$|P|$ and $|D|$. Since these variables are all independent, they represent a co-ordinate
space of dimension equal to their number. Every point in this space specifies a
set of values for the variables in the matrices $|P|$ and $|D|$ and, thus, defines a
model through equations (9) and (12). The restrictions on the range of values
of the variables as expressed by equation (13) correspond to a region in the
co-ordinate space in which all physically meaningful models must lie.
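The restriction of equation (13) can be sketched as a simple feasibility test applied over a grid of the free variable. The example assumes a two-compartment starting model, a diagonal transformation with entries 1 and p as the only degree of freedom, and a coarse grid of p values; all of these, and the function names, are hypothetical choices for illustration.

```python
def is_physical(lam, tol=1e-12):
    """Check that every transfer rate is non-negative and that each
    lambda_jj is at least the sum of the lambda_ij leaving compartment j.

    lam uses the convention lambda_jj on the diagonal, -lambda_ij elsewhere.
    """
    n = len(lam)
    for j in range(n):
        outflow = 0.0
        for i in range(n):
            if i == j:
                continue
            rate = -lam[i][j]  # lambda_ij: transfer to i from j
            if rate < -tol:
                return False
            outflow += rate
        if lam[j][j] + tol < outflow:
            return False
    return True


def transformed_lambda(lam, p):
    """P * lam * P^-1 for the assumed P = diag(1, p), p != 0."""
    diag = [1.0, p]
    return [[lam[i][j] * diag[i] / diag[j] for j in range(2)] for i in range(2)]


lam = [[0.5, -0.1], [-0.3, 0.2]]  # hypothetical starting model
grid = [k / 10 for k in range(1, 31)]  # p = 0.1, 0.2, ..., 3.0
feasible = [p for p in grid if is_physical(transformed_lambda(lam, p))]
print(feasible[0], feasible[-1], len(feasible))
```

In this assumed example the physically meaningful models occupy a bounded interval of p; outside it one of the compartments would have to lose more material by transfer than its total turnover allows.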
The choice of the starting point for the transformations indicated above is
completely arbitrary and does not affect the final result. Any mathematically
consistent model leads to a region in the mapping space corresponding to
proper physical models.
III. UNCERTAINTY MAPPINGS IN GENERALIZED SPACE
We now wish to examine the problem from a somewhat different point of
view. The system is represented by $n^2$ $\lambda_{ij}$, generally independent of each other.
We can, therefore, consider the $\lambda_{ij}$ to represent an $n^2$ dimensional space, and any
point in that space as a specific model of the system. It was also indicated
earlier that the data could be represented by a set of invariants composed of
n $\alpha_j$ and $(n^2 - n)$ $A_{ij}$, or a total of $n^2$ invariants. Hence, the transformation
from the data space to the $\lambda_{ij}$ space is dimensionally consistent and unique.
This means that a complete set of $A_{ij}$ and $\alpha_j$ corresponds to a point in the
$|\lambda|$ space, and vice versa. By definition, however, the values of the $\lambda_{ij}$ must all
be positive. Consequently all the models must lie in a restricted region of the
$|\lambda|$ hyperspace. This restriction carries over to the data space, limiting the region
in which the $A_{ij}$ and $\alpha_j$ may lie.
Any specified $A_{ij}$ or $\alpha_j$ implies a one dimensional constraint in the data
space. This carries over as a one dimensional constraint in the $|\lambda|$ space, and
restricts all models to a surface in the hyperspace. If, however, the value of
$A_{ij}$ or $\alpha_j$ is known only within a certain range, the surface has correspondingly a
certain thickness.
When several $A_{ij}$ or $\alpha_j$ are known, the dimensions of the space in which all
models must lie is reduced by a corresponding number. Statistical uncertainties
for any of the known values correspond to similar uncertainties along the
appropriate co-ordinates in the hyperspace.
Thus, if all $A_{ij}$ and $\alpha_j$ are known exactly, a point in the hyperspace of $n^2$
dimensions specifies the model. If all the data are known to within a certain
statistical precision, the most likely model is estimated as a point in the $n^2$
dimensional space surrounded by a region that corresponds to the statistical
uncertainty. If some $A_{ij}$ or $\alpha_j$ are unknown, the corresponding dimensions in
the $n^2$ dimensional hyperspace extend to the limits imposed by the relation that
all $\lambda_{ij}$ are positive.
IV. UNIT OF UNCERTAINTY
Based on the point of view presented, we can define a unit of uncertainty
to be a certain volume of the hyperspace. The size of the volume so defined is
arbitrary; it may correspond to a volume that is equivalent to the actual
standard deviation in the data, or to some convenient standard deviation that
may serve as a reference. The information necessary to define the system can
then be expressed as the number of binary choices, or bits of information,
necessary to reduce the total uncertainty space to the size of a defined unit.
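One way to read this definition, offered here only as a sketch, is that the number of bits is the base-2 logarithm of the ratio of the total uncertainty volume to the unit volume, on the assumption that every unit cell in the region is equally probable. The function name and the numerical values are hypothetical.

```python
import math


def bits_to_resolve(total_volume, unit_volume):
    """Binary choices needed to narrow a region of size total_volume
    down to one cell of size unit_volume, assuming equiprobable cells."""
    if total_volume <= 0 or unit_volume <= 0:
        raise ValueError("volumes must be positive")
    # No information is needed if the region is already within one unit.
    return max(0.0, math.log2(total_volume / unit_volume))


# Invented example: a one-dimensional uncertainty region of length 8
# resolved to a unit of length 1 needs three successive halvings.
print(bits_to_resolve(8.0, 1.0))
```

Because the measure depends on the arbitrary choice of unit, only differences between such values (for example before and after new data are collected) are independent of that choice.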
V. CONCLUSION
The treatment presented provides a framework in which information in data
from tracer experiments on steady-state systems can be quantified in terms of a
compartmental system and its parameters. Before the information can be
quantified, however, a number of compartments has to be chosen for the system.
Unless this is known from independent sources, the method in choosing the
number of compartments is based on the minimum number of exponential terms
that 'reasonably' describe the data. This, at present, is by no means a unique
procedure.
It was shown in this treatment that a model representing the system can be
expressed as a point in a generalized co-ordinate space, and that any uncertainty
in the system can be represented by a certain region in that space. The nature
of the uncertainty (whether incomplete data or statistical fluctuations in the data)
did not matter in the treatment.
There is, however, one difference in the regions of the hyperspace corre-
sponding to these two sources of uncertainty. The difference is in the probability
that any model in the region represents the true system. In the case of incomplete
data, the probability density over the entire region is assumed constant; that is,
every model in the region is considered equally probable. In the case of statistical
fluctuations, however, a certain point or unit volume represents the most likely
model, and the rest of the points or unit volumes decrease in probability in a
manner governed by the statistics of the data.
The region in the $|\lambda|$ hyperspace can serve to define the information content
in the data of the system as a whole or of each parameter of the system, namely
the turn-over rates, separately. The latter can be obtained by investigating their
values over the bounded region.
One need not necessarily deal with all the dimensions of the hyperspace. One
can express the uncertainties in terms of a subspace whose dimensions are equal
to the degrees of freedom of the system, as implied by equations (9) and (12).
In this case, however, the statistical variations of the collected data cannot be
represented since their dimensions are omitted. Any new data to be collected,
however, can be represented in this subspace. The significance of any new data
can also be evaluated by the relative reduction in the size of the region in the
subspace. A unit of uncertainty may be defined for this subspace as was done for
the hyperspace.
In references (1) and (2) it was shown how information about the system from
steady-state measurements and thermodynamic considerations can be combined
with tracer data to form a unified methodology in reducing the uncertainty
about the system. The treatment presented here can be extended to include such
additional information.
Whereas the concepts presented here are relatively simple, the application to
specific problems involves considerable work. One can handle two or three
compartmental systems with few degrees of freedom fairly easily using a desk
calculator. The handling of more complex systems becomes quite time con-
suming. It is hoped that a programming of this on digital computers can be
worked out for routine applications.
REFERENCES
1. M. Berman: The formulation of biological models from tracer and steady-state data.
Ph.D. Thesis, Polytechnic Institute of Brooklyn (unpublished) (1957).
2. M. Berman and R. Schoenfeld: Invariants in experimental data on linear kinetics and
the formulation of models. J. Appl. Phys., 27, 1361-1370 (1956).
3. H. Margenau and G. M. Murphy: The Mathematics of Physics and Chemistry, chap. 10,
Van Nostrand, New York (1943).
THE DOMAIN OF INFORMATION THEORY
IN BIOLOGY*
Henry Quastler
Brookhaven National Laboratory, Upton, New York
In the proper course of events, a theory is introduced to account for a specific
body of facts ; then nobody will presume to expatiate upon the domain of the
theory. With information theory and biology, the situation is less simple. The
modern development of the theory stems largely from C. E. Shannon's concern
with certain problems of communication engineering (1). I have heard Shannon
say that he was somewhat dubious about the extension of his results to remote
fields, and that he felt that people working in other disciplines might do
better to develop their own theories. This is not what happened. Shannon's
theory has been taken up with enthusiasm by psychologists, linguists, historians,
planners, librarians, sociologists, and by biologists with a wide variety of
interests. Motives for such generalizations were supplied by Wiener, who
pointed out that all control (in the animal and in the machine) depended on
communication, and that all communication involved measurable quantities of
information (2) ; and by Weaver, who emphasized the great generality of the
information concepts in a searching study (1).
It appeared then that information theory was a tool made to order to deal
with a vast variety of problems. This variety, however, is not limitless. There-
fore, a discourse on the domain of information theory is indicated. One part
of this discourse will deal with the negative domain, or with some of the limita-
tions of the theory. The other part will be concerned with positive applications ;
it is largely an attempt to give clearer definition to the somewhat vague hopes
most people have when proposing to apply information theory.
It is curious that applied information theory produces rather violent reactions,
some of them negative. Certainly, it is entirely possible that every biologist
who works with information theory, or any other systems theory, is wasting
his time. But this, of course, applies to anybody who works with a new theory.
It is difficult to see how applying information theory should irritate people —
unless the cause should be the very pleasure of gently playing with the theory.
Every scientist is aware that there is a 'difference between the labor of thought,
and the sport of musing', and knows well the danger inherent in the latter.
To go on with Dr Johnson: 'There is nothing more fatal to a man whose
business is to think, than to have learned the art of regaling his mind with those
airy gratifications .... This is a formidable and obstinate disease of the intellect,
of which, when it has once become radicated in time, the remedy is one of the
hardest tasks of reason and of virtue. Its slightest attacks, therefore, should be
* Research carried out at Brookhaven National Laboratory under the auspices of the U.S.
Atomic Energy Commission.
watchfully opposed' (from The Rambler). Is this why so many scientists do
not mind too much having collected a lot of useless data but dread to be
found working with a useless theory ?
I. APPLICATIONS
Every kind of structure and every kind of process has its informational
aspect and can be associated with information functions. In this sense, the
domain of information theory is universal — that is, information analysis can be
applied to absolutely anything. The question is only what applications are
useful.
1. Use of Basic Concepts
The basic concepts of information theory — measures of information, of
noise, of constraint, of redundancy — establish the possibility of associating
precise (although relative!) measures with things like form, specificity, lawful-
ness, structure, degree of organization. This alluring promise has introduced
the information concepts into the thinking of many biologists. The results of
conceptual applications range from harmless modernisms of language to very
serious reasoning. In particular, the information concepts seem to lend them-
selves readily to dealing with the problems of emergence and destruction of
order in complicated systems.
The problem of emergence of order is usually treated in terms of Darwinian
machines, large more or less random assemblies of parts which can both
function and, in some manner, register the results of their functioning. The
resulting feedback loop produces some order amazingly fast (3, 4). The theory
of random networks is a very active field, and some very competent men expect
that the main contribution of information theory to biology (and to other
fields concerned with very complicated systems) will come from this endeavour.
Closely related is the problem of destruction of orderliness. In biology,
this is the problem of aging and decay; it is the topic of a major fraction of
this conference (5, 6, 7).
2. The Representation Theorem
The use of the basic concepts of information theory becomes more powerful
if one considers that the behavior of information measures follows certain rules;
these rules are the theorems of information theory. There are two basic theorems
which I like to call the 'representation theorem' and the 'noise-and-redundancy
theorem'. The first has to do with the possibility of representing one kind of
information by another kind of information. There are absolutely no qualitative
limitations as to how information can be represented; but there is a quantitative
limitation: any physical entity can assume only a limited number of
distinguishable states, and this limits the degree to which it can represent
information. This degree is further modified by the rules of selecting successive
states. The applicability of the representation theorem depends to a high degree
on knowing the process by which states are selected.
The representation theorem applies every time information is transferred —
because the transfer does involve representation of the information existing
in the transmitter, in the medium and, finally, in the receiver. It can thus be
stated as follows: A source cannot transmit more information than it has, a
receiver cannot register more information than it can display. This sounds
trivial, but the point is that information contents can be precisely estimated
in ways which are not trivial. The representation theorem implies that it is
possible to establish an upper bound of the flow of information simply by
investigating the terminals. It is, thus, a one-sided conservation principle; being
one-sided, it is not as strong as the two-sided conservation principles which are
so commonly used in physics. It becomes stronger in situations where one
may assume that the inequality approaches an equality.
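The one-sided bound can be illustrated numerically. The following is only a minimal sketch, under assumptions that are not part of the text: the source and the receiver are described by a hypothetical joint probability table, and the helper names (entropy, mutual_information) are introduced here purely for illustration. It shows that the information shared between the terminals cannot exceed what either terminal can hold.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """Information shared between source (rows) and receiver (columns), in bits."""
    source_marginal = [sum(row) for row in joint]
    receiver_marginal = [sum(col) for col in zip(*joint)]
    cells = [p for row in joint for p in row]
    return entropy(source_marginal) + entropy(receiver_marginal) - entropy(cells)

# Hypothetical terminals: a source with four equally likely states feeding
# a receiver that can display only two distinguishable states.
joint = [
    [0.25, 0.00],
    [0.25, 0.00],
    [0.00, 0.25],
    [0.00, 0.25],
]

source_bits = entropy([sum(row) for row in joint])          # 2 bits held by the source
receiver_bits = entropy([sum(col) for col in zip(*joint)])  # 1 bit displayable by the receiver
flow_bits = mutual_information(joint)                       # 1 bit actually transferred

# The flow is bounded by whichever terminal holds less.
assert flow_bits <= min(source_bits, receiver_bits) + 1e-12
```

In this toy case the bound is reached: the receiver is the bottleneck, and the inequality becomes an equality because no noise was assumed.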
There are two conditions which are conducive to the establishment of full
conservation of information: one, that information is a valuable and critical
commodity, and two, that noise can be minimized. The concept that informa-
tion is the most precious commodity for living things has been formulated
strikingly by Schroedinger in his assertion that 'living things feed on orderli-
ness' — that they feed because they need fresh supplies of orderliness, not of
energy or matter (8). The need for fresh supplies of orderliness presupposes
that orderliness is somewhere lost, that is, that noise is present. This, however,
does not mean that noise is present everywhere. Some processes may occur in
'clockwork fashion', without loss of information. That is the case which
Schroedinger classifies as 'generation of order from order'. He suspects that
each individual act of transmission of genetic information from parent to
offspring occurs without serious loss of information. This idea agrees with the
current (Watson-Crick) model of DNA duplication; it recurs in Gamow's and
Ycas' models of information transmission from genetic to somatic material (9).
3. The Noise-and-Redundancy Theorem
Information transfer from one body of information to another is not often
with clockwork regularity. As a rule, interferences occur which will more or
less affect the process of information interaction. Interference can be of many
kinds: the worst kind of interference is one the results of which are not pre-
dictable in detail. In this case, some information will be irretrievably lost.
However, in general some but not all order is lost. It is one of the most significant
results of information theory to have shown that order and disorder can be
measured by a common yardstick. Hence, it is possible to investigate the
quantitative relations between total information, noise, and remaining order-
liness. The second basic theorem of information theory states that the amount
of information effectively transmitted is exactly the amount of information
transmitted minus the amount of information lost because of noise. This implies
that a source can transmit a certain amount of information reliably in the
presence of noise provided it transmits more than the desired amount of
information. This surplus must be distributed over the whole activity because it
is never known which portions of the total activity will be interfered with by
noise; necessarily, the surplus takes the form of redundant information. Thus,
the second fundamental theorem states precisely the relation between amount of
information to be transmitted, amount of information which will be lost through
noise, and amount of redundant information needed to make up the loss. Like
the first fundamental theorem, it is a one-sided conservation principle; it limits
the amount of order which can prevail in an 'order-from-disorder' situation.
Again, the one-sided conservation principle will become more powerful if it can
be assumed to approximate a two-sided conservation. However, very stringent
conditions must be fulfilled if one expects to use the second theorem. There is
some reason to believe that these conditions are at least approximated in some
biological situations; this is stated in Dancoff's principle (10).
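The bookkeeping of the second theorem can be sketched for one simple case. The example below assumes a binary symmetric channel (each binary symbol is flipped independently with a fixed probability) and an equiprobable binary source; neither assumption comes from the text, and the function names are hypothetical. Under these assumptions the information lost per symbol is the binary entropy of the flip probability, and the surplus of symbols needed follows directly.

```python
import math

def binary_entropy(p):
    """Entropy, in bits, of a two-way choice with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def effective_bits_per_symbol(flip_prob):
    """Information transmitted (1 bit per symbol) minus information lost to noise."""
    transmitted = 1.0                    # assumed equiprobable binary source
    lost = binary_entropy(flip_prob)     # loss under the assumed symmetric noise
    return transmitted - lost

def symbols_needed(message_bits, flip_prob):
    """Lower bound on the binary symbols required to deliver message_bits reliably."""
    rate = effective_bits_per_symbol(flip_prob)
    if rate <= 0:
        raise ValueError("noise destroys all information; no amount of redundancy helps")
    return math.ceil(message_bits / rate)

# Without noise no surplus is needed; with noise the total must exceed the message.
noiseless = symbols_needed(100, 0.0)
noisy = symbols_needed(100, 0.11)
surplus = noisy - 100   # the redundant part, spread over the whole transmission
```

The bound says only how much redundancy is needed, not how to arrange it; attaining it requires coding over long stretches of the message, which is one of the stringent conditions mentioned above.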
Dancoff's principle deals with the economics of information. In 'noisy'
situations, information is lost and errors will occur unless they are checked
by redundant information. Now, errors may be costly, but so is redundant
information; accordingly, the optimum amount of redundant information
will be not that which makes all errors vanish, but that which minimizes the
sum of the cost of errors plus the cost of redundant information, plus the cost —
in information units — of error checking. Dancoff's principle asserts that any
organism or organization which has gone through competitive evolution has
approximated such an optimum; that is, it will commit as many errors as it
can get away with, and use the minimum of redundant information needed
to hold errors to this level. It follows from Dancoff's principle that the amount
of redundant information in a system is bound to be limited, even if it is a
system of enormous information content like a living thing. This is of great
interest particularly in radiobiology, because what radiation does very effectively
is to destroy information.
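The trade-off can be made concrete with a toy model. The sketch below is not Dancoff's own formulation; it assumes, purely for illustration, that redundancy takes the form of an odd number of independent copies read by majority vote, and that the three costs (errors, extra copies, checking) are additive and expressed in the same arbitrary units. All names and numbers are hypothetical.

```python
from math import comb

def majority_error(copies, p):
    """Probability that a majority vote over an odd number of copies is wrong,
    when each copy is independently corrupted with probability p."""
    return sum(
        comb(copies, k) * p**k * (1 - p) ** (copies - k)
        for k in range(copies // 2 + 1, copies + 1)
    )

def total_cost(copies, p, error_cost, copy_cost, check_cost):
    """Cost of errors plus cost of redundant copies plus cost of checking them."""
    extra = copies - 1
    return error_cost * majority_error(copies, p) + copy_cost * extra + check_cost * extra

def optimal_copies(p, error_cost, copy_cost, check_cost, max_copies=51):
    """Odd number of copies that minimizes the total cost."""
    candidates = range(1, max_copies + 1, 2)
    return min(candidates, key=lambda n: total_cost(n, p, error_cost, copy_cost, check_cost))

# Assumed figures: 10% corruption per copy, an error costing 100 units,
# each extra copy costing 1 unit to keep and 0.5 units to check.
best = optimal_copies(p=0.1, error_cost=100, copy_cost=1, check_cost=0.5)
residual_error = majority_error(best, 0.1)
```

With these assumed figures the minimum falls at three copies, and the residual error rate is still well above zero: the cheapest arrangement is one that keeps committing errors, as the principle asserts.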
4. The Estimation of Information Measures and the Search for Invariants
It may well turn out that the qualitative and semi-qualitative applications
of information concepts are going to be the most important contribution of
information theory to biology. But, even successful qualitative applications
have very little power in excluding the possibility that other sets of concepts
could have been used just as successfully; besides, all scientists like to take
measures. Thus, the problem arises of estimating information measures
associated with biological structures and functions.
One fundamental difficulty appears immediately: information measures
are relative and not absolute; hence, any information measure associated with
a given set of biological objects will depend on the set itself and on the scientist
who does the estimating. To be sure, one can establish objective bounds.
Thus, if a certain genetic locus is known to be capable of having thirty-two
distinct allelic states, which are transmitted to the offspring with equal prob-
ability given the proper conditions, then the information stored in this locus
cannot be less than five bits. If it is also known that the region containing
the locus under consideration comprises no more than, say, 20,000 atoms,
then the total information stored cannot be more than about 60,000 bits (10).
These brackets are safe, but they are too wide to be of interest. They can be
very much reduced if one introduces specific assumptions. For instance, if
the locus is known to contain no more than, say, 2 × 50 nucleic acid residues,
and if one assumes that the genetic information is completely coded in the
sequence of the residues on one strand of a double helix, with the information
carried by each residue corresponding to unconstrained selection from four
possibilities, then the upper bound is reduced to 100 bits — but its validity is
less absolute.
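The arithmetic behind these brackets can be checked in a few lines. This is only a sketch: the function name is hypothetical, and the figure of 60,000 bits for 20,000 atoms is taken from the text as given (it comes from reference 10 and is not derived here).

```python
import math

def bits_for_states(n_states):
    """Information, in bits, of an unconstrained choice among equally likely states."""
    return math.log2(n_states)

# Lower bound: thirty-two distinct, equally probable allelic states.
lower_bound_bits = bits_for_states(32)

# Upper bound under the sequence assumption: 50 residues on one strand,
# each an unconstrained selection from four possibilities.
residues_on_one_strand = 50
sequence_bound_bits = residues_on_one_strand * bits_for_states(4)

# The atom-count bracket quoted in the text corresponds to this many bits per atom.
implied_bits_per_atom = 60000 / 20000

print(lower_bound_bits, sequence_bound_bits, implied_bits_per_atom)
```

The three figures (5, 100 and 60,000 bits) are bounds on the same quantity under progressively weaker assumptions, which is why the narrower bracket is the less secure one.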
Because of the relative nature of information measures, it will always
be up to the ingenuity of the biologist to find ensembles which result in useful
measures. In many cases, even the estimation of a limit is of interest: as in
Ehret's demonstration that a few bits could be sufficient to specify the
nature of cytoplasmic structures (11), or the result easily derived from
D'Arcy Thompson's work (12) that apparently considerable differences in
form could be coded in, say, a few nucleic acid residues.
The relativism of information measures is a basic difficulty in estimation;
besides, the biologist will encounter a number of technical difficulties arising
from the fact that 'message sets' and 'selection rules' are not perfectly known.
A number of approximation methods for such situations have been worked
out (13).
The relative nature of information measures and the technical difficulties
of their estimation cast some doubts on the usefulness of actual information
measures in biology. Only experience will show whether these doubts are
justified or not. Measures will be valuable if they lead to the discovery of
invariants. In psychology, some invariants seem to be crystallizing out of
a number of measurements: there seem to be invariant upper limits for the
channel capacity for single activities; for the range of classes distinguishable
in a single act, etc. (14). In biology, independent estimates of information
transfer associated with three elementary biological functions (allelic, anti-
genic, enzymatic specificity) have yielded closely similar values (15). Much
more material will be needed before we can draw definite conclusions.
The analysis which underlies the estimation of information measures
presents certain novel features. Consider, for instance, the informational
analysis of a hormonal control system. The traditional approach consists
in isolating one hormonal function and one hormone after the other. In
principle, this quest never ends — although physiologists might hope that some
day they will run out of undiscovered hormones. The information theorist
attacks the problem from the opposite end. He will argue that each hormone
molecule constitutes a message from a control organ to a target organ, a
message which is diffusely broadcast through the blood stream. In general
each message must contain two parts, an address and an order. Actually,
one or the other part can be omitted. We can imagine a hormonal control
system in which only the addresses are specified: the 'order' may be completely
determined in the target organ, and be executed automatically upon receipt of
the only kind of hormone molecule with the proper address; or, the address
may be unspecific, but the order such that only the right target organ can
execute it. One would expect that the natural systems be somewhere between
these two extremes. For the sake of simplicity we will consider a system in
which only addresses are specified; the formal results have complete generality.
Thus, each hormone will be represented only by the address of the target
organ. In the interest of detailed and accurate control, it is desirable to have
a maximum number of different addresses. Any duplication of addresses
will lead to concomitant responses in other organs. On the other hand, the
'reading' of every single address involves distinguishing it from all other
addresses; the greater the variety of addresses, the greater the labor in every
single act of recognition. A compromise is indicated between the demand
for a great variety of addresses and the contradictory demand to keep each
address simple. For any kind of system, there will be an optimum number of
different hormones; the actual number will depend on the relative strength
of the two competing needs. By Dancoff's principle, we expect that the actual
number will not be too far from the optimum number.
We can add another line of considerations on the number of possible addresses.
In order to fulfill its function, the hormone molecule has to enter into some
kind of relation with the target organ; most likely, it has to form a complex.
Now, the total surface area of any molecule that can enter into a specific
complexing process is rather limited, and so is the number of molecular con-
figurations available to living organisms; hence, a limited space accommodates
only a limited number of significantly different configurations — and this limits
the number of different hormones possible (and, incidentally, the number of
distinct antigens and antibodies, enzymes and co-enzymes).
The example illustrates the concern with the whole system which is charac-
teristic of many applications of information theory. It also illustrates a rather
profound difference between the information theorist and many of his scientific
colleagues. The information theorist will remain fairly cool at the news that
another enzyme, or hormone, or vitamin has been isolated; his basic question
is: 'How many more are there to be discovered?'
II. LIMITATIONS
Information theory could not possibly apply to a wide variety of situations
if it were sensitive to every detail in every situation. Like thermodynamics
(to which information theory is related) it has a vast domain of application,
and like in thermodynamics, the vastness of