Commit 838a190

Merge pull request #77 from jajupmochi/gedlib
GEDModel global GEDEnv support
2 parents d1b264b + ccbb921 commit 838a190

19 files changed: 8314 additions & 1012 deletions

README.md

Lines changed: 79 additions & 1 deletion
@@ -94,13 +94,91 @@ A demo of computing graph kernels can be found on [Google Colab](https://colab.r

 ### 2 Graph Edit Distances

+We currently support a GEDModel class compatible with the `scikit-learn` transformer interface,
+which can be used to compute the graph edit distance between attributed graphs.
+The `GEDModel` class is based on the extended [`GEDLIB`](https://github.com/dbblumenthal/gedlib) library. See Section
+[GEDLIB](#4-interface-to-gedlib) for more details.
+
+#### The following GED methods are supported:
+
+- BRANCH
+- BRANCH_FAST
+- BRANCH_TIGHT
+- BRANCH_UNIFORM
+- BRANCH_COMPACT
+- PARTITION
+- HYBRID
+- RING
+- ANCHOR_AWARE_GED
+- WALKS
+- IPFP
+- BIPARTITE
+- SUBGRAPH
+- NODE
+- RING_ML
+- BIPARTITE_ML
+- REFINE
+- BP_BEAM
+- SIMULATED_ANNEALING
+- HED
+- STAR
+
+with `GUROBI`:
+
+- F1
+- F2
+- COMPACT_MIP
+- BLP_NO_EDGE_LABELS
+
+#### The following GED cost functions are supported:
+
+- CHEM_1
+- CHEM_2
+- CMU
+- GREC_1
+- GREC_2
+- PROTEIN
+- FINGERPRINT
+- LETTER
+- LETTER2
+  - Similar to `LETTER`, but uses 6 cost constants instead of 3. See details [here](https://github.com/jajupmochi/gedlib/blob/master/src/edit_costs/letter_2.hpp).
+- NON_SYMBOLIC
+  - Edit costs for graphs containing only non-symbolic (numeric) node and edge
+    labels. These labels are used to compute relabeling (substitution) costs, using
+    e.g., the Euclidean distance. See details [here](https://github.com/jajupmochi/gedlib/blob/master/src/edit_costs/non_symbolic.hpp#L35).
+- GEOMETRIC
+  - Edit costs for graphs containing mixed node and edge attributes (e.g., string (symbolic) and numeric (non-symbolic)).
+    Users can choose the (dis-)similarity measure for each label type, e.g.,
+    `cosine_distance` for numeric vectors. See details [here](https://github.com/jajupmochi/gedlib/blob/master/src/edit_costs/geometric.hpp#L42).
+- CONSTANT
+
+Detailed documentation can be found [here](https://dbblumenthal.github.io/gedlib/index.html).
+
 ### 3 Graph preimage methods

 A demo of generating graph preimages can be found on [Google Colab](https://colab.research.google.com/drive/1PIDvHOcmiLEQ5Np3bgBDdu0kLOquOMQK?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/median_preimege_generator.py) folder.

 ### 4 Interface to `GEDLIB`

-[`GEDLIB`](https://github.com/dbblumenthal/gedlib) is an easily extensible C++ library for (suboptimally) computing the graph edit distance between attributed graphs. [A Python interface](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/gedlib) for `GEDLIB` is integrated in this library, based on [`gedlibpy`](https://github.com/Ryurin/gedlibpy) library.
+[`GEDLIB`](https://github.com/dbblumenthal/gedlib) is an easily extensible C++ library for (suboptimally) computing the
+graph edit distance between attributed graphs. [A Python interface](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/gedlib) for `GEDLIB` is
+integrated in this library, based on [`gedlibpy`](https://github.com/Ryurin/gedlibpy) library. We also extended the
+library, adding the following features:
+
+- Support attributed graphs with the following node and edge label types:
+  - strings, integers, floats, lists / `numpy` arrays of floats and integers. Arbitrary
+    numbers of features can be added.
+
+- Support fast vectorized computation between labels using `Eigen` (e.g., cosine or
+  Euclidean distances).
+  - To benefit from this, we recommend merging numeric labels into
+    a single label with a `numpy` array.
+
+- Support the following GED cost functions:
+  - `LETTER2`, `NON_SYMBOLIC`, `GEOMETRIC`.
+  - See Section [GED](#3-graph-edit-distances) for more details.
+
+- Use modern C++ 17 features, such as `std::optional`, `std::variant`, `std::any`.

 ### 5 Computation optimization methods

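The README hunk above describes `NON_SYMBOLIC` and `GEOMETRIC` relabeling costs in terms of Euclidean and cosine distances between numeric labels, and recommends merging numeric labels into a single vector. The following is a minimal pure-Python sketch of that idea only. It does not call `GEDLIB` or `gklearn`, and the helper names (`merge_numeric_labels`, `euclidean_relabel_cost`, `cosine_distance`) and the attribute keys are hypothetical, chosen for illustration rather than taken from the library.

```python
import math


def merge_numeric_labels(attrs, keys):
    """Collect the numeric attributes named in `keys` into one vector label.

    Hypothetical helper: with numpy available, np.asarray(...) would give the
    single array label that the README recommends.
    """
    return tuple(float(attrs[key]) for key in keys)


def euclidean_relabel_cost(label_a, label_b):
    """Substitution cost as the Euclidean distance between two vector labels."""
    if len(label_a) != len(label_b):
        raise ValueError("labels must have the same dimension")
    return math.dist(label_a, label_b)


def cosine_distance(label_a, label_b):
    """Cosine distance (1 - cosine similarity) between two vector labels."""
    norm = math.hypot(*label_a) * math.hypot(*label_b)
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(a * b for a, b in zip(label_a, label_b))
    return 1.0 - dot / norm


# Two nodes whose separate "x" and "y" attributes are merged into one label.
node_u = merge_numeric_labels({"x": 0, "y": 0, "symbol": "C"}, ["x", "y"])
node_v = merge_numeric_labels({"x": 3, "y": 4, "symbol": "N"}, ["x", "y"])
cost = euclidean_relabel_cost(node_u, node_v)
```

Merging first means each node pair needs one vector distance instead of one lookup per attribute, which is presumably what lets the `Eigen`-based implementation vectorize the computation.
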
gklearn/experiments/ged/check_results_of_ged_env.py

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ def compute_geds_by_GEDEnv(dataset):


 def compute_geds_by_GEDLIB(dataset):
-    from gklearn.gedlib import librariesImport, gedlibpy
+    from gklearn.gedlib import libraries_import, gedlibpy
     from gklearn.ged.util import ged_options_to_string
     import numpy as np

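The hunk above renames the imported module from `librariesImport` to `libraries_import`, so scripts written against the old name stop importing. A hedged sketch of one way downstream code could tolerate both names is shown below. The helper `import_first_available` is hypothetical and not part of `gklearn`, and whether a given installed version still ships the old module name is an assumption to check locally.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in `module_names` that imports successfully."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # parent package is skipped here as well.
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")


# Intended use (commented out so the sketch runs without gklearn installed):
# libraries_import = import_first_available([
#     "gklearn.gedlib.libraries_import",  # name after this commit
#     "gklearn.gedlib.librariesImport",   # name before this commit
# ])
```

Listing the new name first keeps the common case to a single import attempt.
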
gklearn/experiments/ged/ged_model/compare_gedlib_with_coords_in_string_and_attr_format.py

Lines changed: 2 additions & 2 deletions
@@ -401,9 +401,9 @@ def compare_gedlib_with_coords_in_string_and_attr_format(
 seed = 42
 n_graphs = 500
 n_emb_dim = 100
-parellel = True
+parallel = True
 compare_gedlib_with_coords_in_string_and_attr_format(
-    seed=seed, n_graphs=n_graphs, n_emb_dim=n_emb_dim, parallel=parellel
+    seed=seed, n_graphs=n_graphs, n_emb_dim=n_emb_dim, parallel=parallel
 )

 # # Comparison of the two versions:
