## Section: New Results

### Foundations of information hiding

Participants : Romain Beauxis, Christelle Braun, Konstantinos Chatzikokolakis, Mario Sergio Ferreira Alvim Junior, Catuscia Palamidessi.

Information hiding refers to the problem of protecting private information while performing certain tasks or interactions, so that an adversary cannot infer that information. Anonymity and privacy are particular cases of this property.

Systems for information hiding often use random mechanisms to obfuscate the link between the observables and the information to be protected. These random mechanisms can be described probabilistically, whereas the value of the secret may be totally unpredictable and irregular, and hence expressible only nondeterministically. Nondeterminism can also arise from the interaction of the various components of the system.
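Such a random obfuscation mechanism can be viewed as a function from secrets to distributions over observables. The following minimal sketch (the secrets, observables, and channel entries are purely illustrative, not taken from any protocol in this report) shows a two-secret system where each secret is only probabilistically correlated with what the adversary observes:

```python
import random

# Hypothetical channel: for each secret, a distribution over observables.
# The entries P(observable | secret) are illustrative numbers only.
CHANNEL = {
    "s0": {"o0": 0.7, "o1": 0.3},
    "s1": {"o0": 0.4, "o1": 0.6},
}

def observe(secret: str) -> str:
    """Sample one observable according to the channel row for `secret`."""
    r, acc = random.random(), 0.0
    for obs, p in CHANNEL[secret].items():
        acc += p
        if r < acc:
            return obs
    return obs  # guard against floating-point slack in the row sum

random.seed(0)
print(observe("s0"))  # an observable only probabilistically linked to the secret
```

Because both rows assign positive probability to both observables, a single observation never identifies the secret with certainty; the quantitative approaches discussed below measure how much it narrows the adversary's uncertainty.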

#### Information-hiding in presence of probability and nondeterminism

Formal definitions of the concepts of anonymity and information flow have been investigated in the past either in a totally nondeterministic framework or in a purely probabilistic one. In [15] , we have investigated a notion of anonymity which combines both probability and nondeterminism, and which is suitable for describing the most general situation, in which the protocol and the users can have both probabilistic and nondeterministic behavior. We have also investigated the properties of the definition in the particular cases of purely nondeterministic users and purely probabilistic users. We have formulated the notions of anonymity in terms of probabilistic automata, and we have described protocols and users as processes in the probabilistic π-calculus, whose semantics is again based on probabilistic automata.
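In a probabilistic automaton, each state offers a nondeterministic choice among transitions, and each transition leads to a probability distribution over successor states. The toy encoding below (states, actions, and probabilities are invented for illustration; it is not the formalization of [15]) shows both layers, with a scheduler function resolving the nondeterministic layer:

```python
import random

# Toy probabilistic automaton: state -> list of (action, distribution).
# The *list* is the nondeterministic choice; each *distribution* is the
# probabilistic part of a transition. All names/values are illustrative.
AUTOMATON = {
    "init": [
        ("send_a", {"done_a": 1.0}),   # nondeterministic option: user A sends
        ("send_b", {"done_b": 1.0}),   # nondeterministic option: user B sends
    ],
    "done_a": [("mix", {"out0": 0.5, "out1": 0.5})],  # probabilistic obfuscation
    "done_b": [("mix", {"out0": 0.5, "out1": 0.5})],
}

def run(state: str, scheduler) -> str:
    """Run until a state with no transitions; `scheduler` picks among options."""
    while state in AUTOMATON:
        action, dist = scheduler(AUTOMATON[state])
        r, acc = random.random(), 0.0
        for nxt, p in dist.items():   # resolve the probabilistic part
            acc += p
            if r < acc:
                state = nxt
                break
    return state

random.seed(1)
final = run("init", scheduler=lambda options: options[0])
print(final)  # "out0" or "out1": the final observable hides which user sent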

#### The problem of the scheduler

It has recently been observed that, in security, the combination of
nondeterminism and probability can be harmful: the resolution of the
nondeterminism can reveal the outcome of the probabilistic choices
even though these are supposed to be secret [41] .
This is known as the problem of the *information-leaking
scheduler* . In [17] we have developed a linguistic
(process-calculus) approach to this problem, and we have shown how to
apply it to control the behavior of the scheduler in various anonymity
examples.
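The problem can be seen in a small sketch (the protocol and scheduler below are invented for illustration, not the calculus of [17]): a secret coin flip is followed by a nondeterministic interleaving of two outputs, and a scheduler that is allowed to inspect the secret can encode it in the visible scheduling order:

```python
import random

def protocol(scheduler):
    """A secret probabilistic choice followed by a scheduler-resolved
    interleaving of two otherwise indistinguishable outputs."""
    secret = random.choice([0, 1])       # meant to stay secret
    actions = ["out_x", "out_y"]         # nondeterministic interleaving
    order = scheduler(secret, actions)   # scheduler resolves the nondeterminism
    return secret, order

# A malicious scheduler encodes the secret in the visible order...
leaky = lambda secret, acts: acts if secret == 0 else list(reversed(acts))
# ...while an oblivious scheduler ignores the secret and leaks nothing.
oblivious = lambda secret, acts: acts

random.seed(2)
secret, order = protocol(leaky)
guessed = 0 if order == ["out_x", "out_y"] else 1
print(guessed == secret)  # True: the leaky scheduler reveals the secret exactly
```

Restricting schedulers to the oblivious kind, so that their choices cannot depend on the secret, is exactly the kind of control that a linguistic approach can enforce at the level of the process syntax.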

#### Information theory and Bayes risk

Recent research in quantitative theories for information-hiding tends to converge towards the idea of modeling the system as a noisy channel in the information-theoretic sense. The notion of information leakage, or vulnerability of the system, has been related in some approaches to the concept of mutual information of the channel. A recent work of Smith [51] has shown, however, that if the attack consists of one single try, then mutual information and other concepts based on Shannon entropy are not suitable, and he has proposed to use Rényi's min-entropy instead. In [25] we have considered and compared two different ways of defining the leakage, based on the Bayes risk, a concept related to Rényi min-entropy.
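The two families of measures can be computed side by side on a small channel matrix. In the sketch below (the matrix and the uniform prior are illustrative numbers only), Shannon-based leakage is the mutual information I(S;O) = H(O) − H(O|S), while min-entropy leakage compares the adversary's one-try guessing probability before and after the observation:

```python
import math

# Illustrative channel matrix C[s][o] = P(o | s), uniform prior on secrets.
C = [[0.8, 0.2],
     [0.3, 0.7]]
prior = [0.5, 0.5]

# Marginal distribution on observables.
p_o = [sum(prior[s] * C[s][o] for s in range(2)) for o in range(2)]

# Shannon-based leakage: mutual information I(S;O) = H(O) - H(O|S).
H_O = -sum(p * math.log2(p) for p in p_o)
H_O_given_S = -sum(prior[s] * C[s][o] * math.log2(C[s][o])
                   for s in range(2) for o in range(2))
shannon_leakage = H_O - H_O_given_S

# Min-entropy leakage (Smith): log-ratio of the posterior one-try
# guessing probability to the prior one.
prior_vuln = max(prior)
post_vuln = sum(max(prior[s] * C[s][o] for s in range(2)) for o in range(2))
min_entropy_leakage = math.log2(post_vuln / prior_vuln)

print(round(shannon_leakage, 4), round(min_entropy_leakage, 4))
```

On this channel the two measures give different values, which is the phenomenon motivating the comparison: they answer different operational questions about the adversary.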

In [27] we have analyzed the Crowds anonymity protocol under the novel assumption that the attacker has independent knowledge of the behavioural patterns of individual users. Under such conditions we have studied, reformulated and extended Reiter and Rubin's notion of probable innocence, and we have provided a new formalisation for it based on the concept of protocol vulnerability. Accordingly, we have established new formal relationships between protocol parameters and attackers' knowledge, expressing necessary and sufficient conditions to ensure probable innocence.
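For reference, in the standard attacker model without such extra knowledge, Reiter and Rubin's probable innocence in Crowds reduces to a simple condition on the protocol parameters: with n users, c of them corrupted, and forwarding probability p_f > 1/2, it holds iff n ≥ p_f/(p_f − 1/2) · (c + 1). A sketch of that baseline condition (the concrete parameter values are illustrative):

```python
# Reiter and Rubin's probable-innocence condition for Crowds in the
# standard attacker model: n users, c corrupted, forwarding probability
# p_f > 1/2. Probable innocence holds iff n >= p_f/(p_f - 1/2) * (c + 1).
def probable_innocence(n: int, c: int, p_f: float) -> bool:
    assert 0.5 < p_f <= 1.0, "the condition is only meaningful for p_f > 1/2"
    return n >= p_f / (p_f - 0.5) * (c + 1)

print(probable_innocence(n=20, c=3, p_f=0.75))  # True:  20 >= 3 * (3 + 1) = 12
print(probable_innocence(n=10, c=3, p_f=0.75))  # False: 10 <  12
```

The contribution of [27] is precisely to revisit this kind of relationship when the attacker additionally holds independent knowledge about user behaviour.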

#### Bounds on the leakage of the input distribution

In information hiding, an adversary that tries to infer the secret information has a higher probability of success if it knows the distribution on the secrets. In [24] we have shown that if the system leaks probabilistically some information about the secrets (that is, if there is a probabilistic correlation between the secrets and some observables), then the adversary can approximate such a distribution by repeating the observations. More precisely, it can approximate the distribution on the observables by computing their frequencies, and then derive the distribution on the secrets by using the correlation in the inverse direction. We have illustrated this method, and then we have studied the bounds on the approximation error associated with it, for various natural notions of error. As a case study, we have applied our results to Crowds, a protocol for anonymous communication.
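The inversion step can be sketched on a toy 2×2 channel (the matrix, prior, and sample size below are illustrative, and no error bounds are computed here): the adversary estimates the observable distribution q from frequencies and then solves q = p·C for the secret distribution p:

```python
import random

# Illustrative channel C[s][o] = P(o | s) and a true prior on secrets
# that the adversary does NOT know and wants to approximate.
C = [[0.8, 0.2],
     [0.3, 0.7]]
true_prior = [0.25, 0.75]

# Simulate repeated observations of the system.
random.seed(3)
counts, N = [0, 0], 200_000
for _ in range(N):
    s = 0 if random.random() < true_prior[0] else 1
    o = 0 if random.random() < C[s][0] else 1
    counts[o] += 1
q = [counts[0] / N, counts[1] / N]   # empirical observable distribution

# Invert q = p @ C for the 2x2 case: p0 = (q0*C11 - q1*C10) / det(C).
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
p0 = (q[0] * C[1][1] - q[1] * C[1][0]) / det
p_est = [p0, 1 - p0]
print([round(x, 2) for x in p_est])  # close to the true prior [0.25, 0.75]
```

The quality of the approximation depends on the number of observations and on how well-conditioned the channel matrix is, which is what the error bounds in [24] quantify.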