Research‎ > ‎

Graph Mining

Background

Graphs are well suited to model complex structures present in the real world. Gene regulatory networks, for example, are directed graphs in which nodes represent genes and edges represent regulatory influences. The World Wide Web is well modeled by a directed graph in which nodes are pages and edges are hyperlinks. Email communications; social graphs in which an individual can follow the activities of other people; citation networks in which an article cites other articles are also ideally represented by directed graphs.

Because of these numerous applications, graphs are extensively studied in graph theory, and, more recently, in the context of data mining. Most researches focus either on unlabeled graphs or on graphs whose nodes are associated with a unique label. In a social graph, for example, labels can be the ID of individuals; in a gene regulatory network, they can represent genes’ names; in a Web graph, nodes are often labeled with pages’ URL; etc.

However, in many applications, objects (represented by nodes) can be associated with many features. Individuals in social graphs have many characteristics; as genes in regulatory networks. In citation networks, articles are associated with many data like keywords, authors, publication date, patents, etc. In these data, each characteristic associated to an objet is an attribute of the corresponding node. Graphs in which nodes are annotated with sets of attributes (or itemsets) are named attributed graphs and up to now, only few studies are devoted to their analysis.

One of the classical task when analyzing graph data is the discovery of frequent subgraphs. The interest of such patterns is to highlight how node labels are often organized. Mining attributed graphs allows the identification of structural patterns, but also highlights the relationship between nodes’ attributes.

Two main reasons make the mining of attributed graphs very difficult. First, it is necessary to combine the exploration of the graph structure with the identification of frequent itemsets associated with nodes. The second reason is that, as for labeled graphs, the performance of the mining process is highly affected by the cost of subgraph isomorphism tests.

However, up to now, few attention is given to this problem. Indeed, in labeled graphs, nodes are associated with unique labels, which are, in general, manifold. Therefore, it is not very frequent to have the same label associated with many nodes in a subgraph. This is different in attributed graphs because entities represented by nodes are characterized by multiple attributes, and some of them are very common. Thus, it greatly increases the number of possible automorphisms. In a social graph, for example, a given individual is often connected with a group of other individuals sharing several characteristics (age group, hobbies, social class, musical taste, etc.).

Research rationale

Our research deals with the extraction of frequent patterns from one or more directed attributed graphs. We worked on a method for attributed graphs enumeration that is based on two operations: itemset extension and tree extension [1, 2]. We focused our work on the particular case of automorphic patterns and show how to significantly limit the combinatorial explosion caused by subgraph isomorphisms [3]. In parallel, we proposed an original definition of what could be an (exact) condensed representation in a single-graph setting [4, 5]. In addition, we proposed to exploit expert knowledge to derive constraints that can be used during the data mining phase to improve both pattern relevancy and computational efficiency [6].

References

1. Pasquier C, Sanhes J, Flouvat F, Selmaoui-Folcher N: Frequent Pattern Mining in Attributed trees. In proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’13) J Pei et al (Eds): PAKDD 2013, Part I, LNAI 7818. Gold Coast Australia: Springer-Verlag; 2013:26–37.

2. Pasquier C, Sanhes J, Flouvat F, Selmaoui-Folcher N: Extraction de motifs fréquents dans des arbres attribués. In actes de la 13e Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC’13), Revue des Nouvelles Technologies de l’Information, volume E-24. Toulouse; 2013:193–204.

3. Pasquier C, Flouvat F, Sanhes J, Selmaoui-folcher N: Extraction de motifs dans des graphes orientés attribués en présence d’automorphisme. In actes de la 14e Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC’14), Revue des Nouvelles Technologies de l’Information, volume E-26; 2014:371–382.

4. Sanhes J, Flouvat F, Pasquier C, Selmaoui-Folcher N, Boulicaut J-F: Extraction de motifs condensés dans un seul graphe orienté acyclique attribué. In actes de la 13e Conférence Francophone sur l’Extraction et la Gestion des Connaissances (EGC’13), 29 janvier - 01 février 2013, Toulouse, France; 2013.

5. Sanhes J, Flouvat F, Pasquier C, Selmaoui-Folcher N, Boulicaut J-F: Weighted Path as a Condensed Pattern in a Single Attributed DAG. In proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI’13), Beijing, China, August 3-9, 2013. Beijing, China; 2013.

6. Flouvat F, Sanhes J, Pasquier C, Selmaoui-Folcher N, Boulicaut J-F: Les modèles des experts au service de l’extraction de motifs pertinents. In Reconnaissance de Formes et Intelligence Artificielle (RFIA’14); 2014.

Comments