Anonymization techniques for Semantic Networks of Data
The information in social networks becomes an important data source,
and sometimes it is necessary or beneficial to release such data to
the public. Many real-world social networks contain sensitive
information and serious privacy concerns on graph data have been
raised. The famous result of Narayan and Shmatikov 
has shown that naive anonymization does not work: it is in practise
very easy to re-identify elements of a trivially anonymized (ie
replacing identifying informations such as names, social security
numbers etc. with random numbers) social networks. Later works
 pushed further the study of attacks on
anonymized social networks.
The goal of social network anonymization is to produce a graph in
such a way that some statistical functions produce the same result on
the original graph and on the transformed graph, while other functions
(namely reidentification) should not produce the same result. There
are two main ways to work on the anonymization:
- Clustering: one tries to group together edges and nodes so that
when the the cluster regroups $k$ nodes then there is no way
to distinguish an individual node among them.
- k-anonymity: one tries to modify the original graph in such a
way that there should be at least k-1 other candidate nodes
with similar features (the features are part of the assumption
made on the capability of the attacker).
The aim of the internship is to study a new approach to social data
anonymization. An important focus will be made on semantic networks of
data in which nowadays, data are often organized as graphs with an underlying semantic
to allow efficient querying and support inference engines. Such is the case in, for example, linked data
and semantic web typically relying on RDF. An individual or company may have legitimate motivations to
query a database it does not own, but preventing illegitimate access to private information while allowing
legitimate database usage is a difficult task. An example as resurfaced recently with the debates about
facebook’s privacy leaks and Cambridge Analytica.
In general, the objective is to limit the knowledge a tenant may gather from the database by limiting the
queries it may execute. This is particularly difficult when the database possesses an underlying semantic,
as attributes may have strong correlations and since inference rules can generally be exploited to infer new
- A first starting point is to use a graph rewriting approach
based on category theory . Using this approach it is very easy to
make clones of a graph while being able to ``tune'' the properties of
the clones. The idea is to select 1/k nodes, to make k-clones of this
selection and to link them together in an appropriate way so that the
global graph obtained shares good properties with relation to the
A second option is to explore topological approaches based on
algebraic topology, see .
This intership will be held at the LIG laboratory in Grenoble France
in the team CAPP, coached by Frederic Prost.
Requisite skills and abilities
- C++ programming
- Graph theory.
- Category Theory
- Ability to use/modify large programs.
 Andrea Corradini, Dominique Duval, Rachid Echahed, Fr\'ed\'eric Prost,
Leila Ribeiro: AGREE - Algebraic Graph Rewriting with Controlled Embedding. ICGT 2015: 35-51.
 Lars Backstrom, Cynthia Dwork, Jon M. Kleinberg: Wherefore art thou R3579X?: anonymized social networks, hidden patterns, and structural steganography. Commun. ACM 54(12): 133-141 (2011).
 Arvind Narayanan, Vitaly Shmatikov: De-anonymizing Social Networks. IEEE Symposium on Security and Privacy 2009: 173-187.
 Alberto Speranzon, Shaunak D. Bopardikar: An Algebraic Topological Per-
spective to Privacy: Numerical and Categorical Data, American Control Con-