Category: A1; Team name: LangDiff; Dataset: Twitch #215
+294
−2
Checklist
Description
Pull request implementing the Twitch dataset [1].
The Twitch dataset consists of multiple social network graphs, one per language, of streamers on the streaming platform Twitch. Each node is a streamer, and edges correspond to followership between streamers. Node features represent the games played. The task is binary node classification: predicting whether a streamer streams mature content, based on the games played.
[1] Benedek Rozemberczki, Carl Allen, & Rik Sarkar. (2021). Multi-scale Attributed Node Embedding.
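To make the layout described above concrete, here is a minimal sketch of a single language graph in plain Python. It is illustrative only: `ToyTwitchGraph`, its field names, and the four-streamer data are hypothetical and are not the classes or data added in this PR. Treating edges as undirected is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyTwitchGraph:
    """Hypothetical container for one language graph (not the class added in this PR)."""
    features: List[List[int]]      # one row per streamer, one 0/1 column per game
    edges: List[Tuple[int, int]]   # follower links between streamer indices
    labels: List[int]              # 1 = streams mature content, 0 = does not

    def validate(self) -> None:
        """Check that features, labels, and edge endpoints agree on the node count."""
        num_nodes = len(self.features)
        if len(self.labels) != num_nodes:
            raise ValueError("need exactly one label per streamer")
        for src, dst in self.edges:
            if not (0 <= src < num_nodes and 0 <= dst < num_nodes):
                raise ValueError(f"edge ({src}, {dst}) references an unknown streamer")

    def neighbors(self, node: int) -> List[int]:
        """Streamers linked to `node`; edges are treated as undirected (an assumption)."""
        linked = {dst for src, dst in self.edges if src == node}
        linked |= {src for src, dst in self.edges if dst == node}
        return sorted(linked)


# Made-up data: 4 streamers, 3 games.
toy = ToyTwitchGraph(
    features=[[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]],
    edges=[(0, 1), (0, 2), (2, 3)],
    labels=[0, 1, 0, 1],
)
toy.validate()
print(toy.neighbors(0))  # [1, 2]
print(toy.neighbors(2))  # [0, 3]
```

A node classifier would take `features` and `edges` as input and predict the entries of `labels`.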
Relevant PRs from PyTorch Geometric
The dataset is available in PyTorch Geometric but is currently broken (pyg-team/pytorch_geometric#10510), so it is implemented in full here.
There is also a related PR, pyg-team/pytorch_geometric#10415, which I think does not fully fix the issue.
Additional context
Submission by Jonas Müller of Team LangDiff