4 changes: 4 additions & 0 deletions README.md
@@ -375,13 +375,17 @@ Specially useful in pre-processing steps, these are the general data manipulatio
| IMDB-BIN | Classification | Graph-level classification. | [Source](https://dl.acm.org/doi/10.1145/2783258.2783417) |
| IMDB-MUL | Classification | Graph-level classification. | [Source](https://dl.acm.org/doi/10.1145/2783258.2783417) |
| REDDIT | Classification | Graph-level classification. | [Source](https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf) |
| GraphUniverse-IND | Classification | Synthetic generator, inductive node classification. | [Source](https://openreview.net/forum?id=jRWxvQnqUt&noteId=jRWxvQnqUt) |
| GraphUniverse-TRA | Classification | Synthetic generator, transductive node classification. | [Source](https://openreview.net/forum?id=jRWxvQnqUt&noteId=jRWxvQnqUt) |
| Amazon | Classification | Heterophilic dataset. | [Source](https://arxiv.org/pdf/1205.6233) |
| Minesweeper | Classification | Heterophilic dataset. | [Source](https://arxiv.org/pdf/2302.11640) |
| Empire | Classification | Heterophilic dataset. | [Source](https://arxiv.org/pdf/2302.11640) |
| Tolokers | Classification | Heterophilic dataset. | [Source](https://arxiv.org/pdf/2302.11640) |
| US-county-demos | Regression | In turn each node attribute is used as the target label. | [Source](https://arxiv.org/pdf/2002.08274) |
| ZINC | Regression | Graph-level regression. | [Source](https://pubs.acs.org/doi/10.1021/ci3001277) |

**Remark:** GraphUniverse is a synthetic graph generator for community-structured data that allows control over graph properties such as homophily, feature signal, and degree structure. Live demo: [Demo](https://graphuniverse.streamlit.app/). Package release: [PyPI](https://pypi.org/project/graph-universe/0.1.2/). GitHub repository: [Repo](https://github.com/LouisVanLangendonck/GraphUniverse).
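The remark lists homophily as one of the controllable properties, and the GraphUniverse dataset configs set it through `homophily_range`. One common way to measure it is the edge-homophily ratio: the fraction of edges whose two endpoints share a community label. The sketch below computes that ratio in plain Python; whether GraphUniverse targets exactly this definition is an assumption, since its homophily parameter may be defined differently (for example, class-adjusted). `edge_homophily` is a hypothetical helper, not part of the `graph-universe` package.

```python
def edge_homophily(edges, labels):
    """Return the fraction of edges whose endpoints share a label.

    Assumption: plain edge homophily; GraphUniverse's own definition may differ.
    `edges` is a list of (u, v) node-index pairs, `labels` maps node index -> community.
    """
    if not edges:
        raise ValueError("edge homophily is undefined for a graph with no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-node graph with two communities: {0, 1} and {2, 3}.
labels = [0, 0, 1, 1]
edges = [(0, 1), (2, 3), (1, 2), (0, 3)]
print(edge_homophily(edges, labels))  # 0.5: two intra-community edges out of four
```

A value near 1 means edges mostly stay inside communities; the configs in this PR sample targets between 0.4 and 0.8 (inductive) or fix it at 0.5 (transductive).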


### Simplicial
| Dataset | Task | Description | Reference |
51 changes: 51 additions & 0 deletions configs/dataset/graph/graphuniverse_inductive.yaml
@@ -0,0 +1,51 @@

loader:
  _target_: topobench.data.loaders.GraphUniverseDatasetLoader
  parameters:
    data_domain: graph
    data_type: GraphUniverse
    data_name: GraphUniverse
    data_dir: ${paths.data_dir}/${dataset.loader.parameters.data_domain}/${dataset.loader.parameters.data_type}
    generation_parameters:
      task: community_detection
      universe_parameters:
        K: 20
        feature_dim: 15
        center_variance: 0.2
        cluster_variance: 0.4
        edge_propensity_variance: 1.0
        seed: 42
      family_parameters:
        n_graphs: 1000
        n_nodes_range: [50, 200]
        n_communities_range: [3, 7]
        homophily_range: [0.4, 0.8]
        avg_degree_range: [1.0, 2.0]
        degree_separation_range: [0.5, 1.0]
        power_law_exponent_range: [1.5, 2.5]
        seed: ${dataset.loader.parameters.generation_parameters.universe_parameters.seed}


# Dataset parameters
parameters:
  num_features: ${dataset.loader.parameters.generation_parameters.universe_parameters.feature_dim}
  num_classes: ${dataset.loader.parameters.generation_parameters.universe_parameters.K}
  task: classification
  loss_type: cross_entropy
  monitor_metric: accuracy
  task_level: node

# Splits
split_params:
  learning_setting: inductive
  data_split_dir: ${dataset.loader.parameters.data_dir}/data_splits
  data_seed: 0
  split_type: random # either "k-fold" or "random"
  k: 10 # used by the "k-fold" cross-validation strategy
  train_prop: 0.7 # used by the "random" splitting strategy

# Dataloader parameters
dataloader_params:
  batch_size: 16
  num_workers: 0
  pin_memory: False
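In the inductive config, `learning_setting: inductive` with `split_type: random` and `train_prop: 0.7` means whole graphs are assigned to train, validation, or test, so the model is evaluated on graphs it never saw during training. Below is a minimal sketch of such a graph-level random split, assuming the remaining 30% is divided evenly between validation and test (TopoBench's own splitter may allocate it differently). `random_split` is a hypothetical helper, not a TopoBench function.

```python
import random


def random_split(n_items, train_prop=0.7, seed=0):
    """Shuffle item indices and cut them into train/val/test index lists.

    Assumption: the remainder after `train_prop` is split evenly between
    validation and test; TopoBench's splitter may differ.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # deterministic for a given seed
    n_train = round(train_prop * n_items)
    n_val = (n_items - n_train) // 2
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # any odd leftover item goes to test
    return train, val, test


# Inductive setting: the items are whole graphs (n_graphs: 1000, data_seed: 0).
train, val, test = random_split(1000, train_prop=0.7, seed=0)
print(len(train), len(val), len(test))  # 700 150 150
```

With `split_type: k-fold` and `k: 10`, the same graph indices would instead be partitioned into ten folds; that strategy is not sketched here.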
51 changes: 51 additions & 0 deletions configs/dataset/graph/graphuniverse_transductive.yaml
@@ -0,0 +1,51 @@

loader:
  _target_: topobench.data.loaders.GraphUniverseDatasetLoader
  parameters:
    data_domain: graph
    data_type: GraphUniverse
    data_name: GraphUniverse
    data_dir: ${paths.data_dir}/${dataset.loader.parameters.data_domain}/${dataset.loader.parameters.data_type}
    generation_parameters:
      task: community_detection
      universe_parameters:
        K: 10
        feature_dim: 15
        center_variance: 0.2
        cluster_variance: 0.5
        edge_propensity_variance: 0.5
        seed: 42
      family_parameters:
        n_graphs: 1
        n_nodes_range: [5000, 5000]
        n_communities_range: [10, 10]
        homophily_range: [0.5, 0.5]
        avg_degree_range: [2.5, 2.5]
        degree_separation_range: [0.5, 0.5]
        power_law_exponent_range: [2.5, 2.5]
        seed: ${dataset.loader.parameters.generation_parameters.universe_parameters.seed}


# Dataset parameters
parameters:
  num_features: ${dataset.loader.parameters.generation_parameters.universe_parameters.feature_dim}
  num_classes: ${dataset.loader.parameters.generation_parameters.universe_parameters.K}
  task: classification
  loss_type: cross_entropy
  monitor_metric: accuracy
  task_level: node

# Splits
split_params:
  learning_setting: transductive
  data_split_dir: ${dataset.loader.parameters.data_dir}/data_splits
  data_seed: 0
  split_type: random # either "k-fold" or "random"
  k: 10 # used by the "k-fold" cross-validation strategy
  train_prop: 0.7 # used by the "random" splitting strategy

# Dataloader parameters
dataloader_params:
  batch_size: 1
  num_workers: 0
  pin_memory: False
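The transductive config instead generates a single 5,000-node graph (`n_graphs: 1`), so the split is over nodes rather than graphs: the whole graph is available for message passing, and only the loss and metrics are restricted to each subset, which is consistent with `batch_size: 1`. Below is a minimal sketch that produces boolean train/validation/test node masks of the kind PyG `Data` objects commonly carry, under the same assumption that validation and test share the remainder evenly. `random_node_masks` is a hypothetical helper, not a TopoBench function.

```python
import random


def random_node_masks(num_nodes, train_prop=0.7, seed=0):
    """Build disjoint boolean train/val/test masks over one graph's nodes.

    Assumption: the remainder after `train_prop` is split evenly between
    validation and test; TopoBench's splitter may differ.
    """
    order = list(range(num_nodes))
    random.Random(seed).shuffle(order)  # deterministic node ordering
    n_train = round(train_prop * num_nodes)
    n_val = (num_nodes - n_train) // 2
    train_mask = [False] * num_nodes
    val_mask = [False] * num_nodes
    test_mask = [False] * num_nodes
    for rank, node in enumerate(order):
        if rank < n_train:
            train_mask[node] = True
        elif rank < n_train + n_val:
            val_mask[node] = True
        else:
            test_mask[node] = True
    return train_mask, val_mask, test_mask


# Transductive setting: one graph with n_nodes_range: [5000, 5000].
train_mask, val_mask, test_mask = random_node_masks(5000, train_prop=0.7, seed=0)
print(sum(train_mask), sum(val_mask), sum(test_mask))  # 3500 750 750
```

Every node receives exactly one mask, so all 5,000 nodes take part in message passing while each is scored in only one phase.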
1 change: 1 addition & 0 deletions pyproject.toml
@@ -55,6 +55,7 @@ dependencies=[
"rootutils",
"topomodelx @ git+https://github.com/pyt-team/TopoModelX.git",
"toponetx @ git+https://github.com/pyt-team/TopoNetX.git@c378925",
"graph-universe==0.1.2",
"lightning==2.4.0",
"torch-scatter",
"torch-sparse",
67 changes: 67 additions & 0 deletions topobench/data/loaders/graph/graph_universe_loader.py
@@ -0,0 +1,67 @@
"""Loaders for GraphUniverse [1] datasets.

[1] "GraphUniverse: Enabling Systematic Evaluation of Inductive Generalization" by Louis Van Langendonck, Guillermo Bernardez, Nina Miolane, and Pere Barlet-Ros.
Accepted at the Fourteenth International Conference on Learning Representations (ICLR), 2026.
https://openreview.net/forum?id=jRWxvQnqUt
"""

from graph_universe import GraphUniverseDataset
from omegaconf import DictConfig
from torch_geometric.data import Dataset

from topobench.data.loaders.base import AbstractLoader


class GraphUniverseDatasetLoader(AbstractLoader):
    """Load GraphUniverse datasets.

    Parameters
    ----------
    parameters : DictConfig
        Configuration parameters containing:
            - data_dir: Root directory for data
            - data_name: Name of the dataset
            - data_type: Type of the dataset (e.g., "GraphUniverse")
            - generation_parameters: Settings forwarded to ``GraphUniverseDataset``
              (task, universe_parameters, family_parameters)
    """

    def __init__(self, parameters: DictConfig) -> None:
        super().__init__(parameters)

    def load_dataset(self) -> Dataset:
        """Load the GraphUniverse dataset.

        Returns
        -------
        Dataset
            The loaded GraphUniverse dataset.

        Raises
        ------
        RuntimeError
            If dataset loading fails.
        """
        try:
            # Generate (or reload from `root`) the graphs described by the config.
            dataset = GraphUniverseDataset(
                root=str(self.root_data_dir),
                parameters=self.parameters["generation_parameters"],
            )
        except Exception as e:
            raise RuntimeError(f"Error loading GraphUniverse dataset: {e}") from e

        return dataset

    def load(self, **kwargs) -> tuple[Dataset, str]:
        """Load data.

        Parameters
        ----------
        **kwargs : dict
            Additional keyword arguments (currently unused).

        Returns
        -------
        tuple[torch_geometric.data.Dataset, str]
            Tuple containing the loaded dataset and its raw data directory.
        """
        dataset = self.load_dataset()
        data_dir = dataset.raw_dir

        return dataset, data_dir