t-SNE Embedding Visualization

This tutorial demonstrates how to visualize embeddings learned by hopwise models using t-SNE (t-Distributed Stochastic Neighbor Embedding). This is particularly useful for understanding how your model represents users, items, entities, and relations in the embedding space.

Prerequisites

Install the required dependencies:

uv pip install hopwise[tsne]

This installs openTSNE and plotly for visualization.

Loading a Trained Model

First, load a trained model checkpoint:

import os
import torch
import numpy as np
from openTSNE import TSNE
import plotly.express as px

from hopwise.utils import init_seed

# Load checkpoint
checkpoint_name = "TransE-Jan-23-2025_16-48-43.pth"
checkpoint = torch.load(os.path.join("saved", checkpoint_name), weights_only=False)
config = checkpoint["config"]
init_seed(config["seed"], config["reproducibility"])

# Extract embeddings to CPU numpy arrays
for weight in checkpoint["state_dict"].keys():
    checkpoint["state_dict"][weight] = checkpoint["state_dict"][weight].to(torch.device("cpu")).numpy()

Configuring t-SNE

Configure the t-SNE algorithm:

tsne = TSNE(
    perplexity=30,
    n_jobs=8,
    initialization="random",
    metric="cosine",
    random_state=config["seed"],
    verbose=True,
)

Note

See the openTSNE documentation for advanced configuration options.

Visualizing User Embeddings

user_weights = checkpoint["state_dict"]["user_embedding.weight"]
tsne_embeddings_users = tsne.fit(user_weights)

fig = px.scatter(
    x=tsne_embeddings_users[:, 0],
    y=tsne_embeddings_users[:, 1],
    color=list(range(len(tsne_embeddings_users))),
    labels={"x": "Dimension 1", "y": "Dimension 2", "color": "User ID"},
    title=f"{config['model']} User Embeddings",
    width=1024,
    height=1024,
    template="plotly_white",
)
fig.show()

Visualizing Entity Embeddings

For knowledge-aware models, visualize entity embeddings:

entity_weights = checkpoint["state_dict"]["entity_embedding.weight"]
tsne_embeddings_entities = tsne.fit(entity_weights)

# Plot entities
fig = px.scatter(
    x=tsne_embeddings_entities[:, 0],
    y=tsne_embeddings_entities[:, 1],
    color=list(range(len(tsne_embeddings_entities))),
    labels={"x": "Dimension 1", "y": "Dimension 2", "color": "Entity ID"},
    title=f"{config['model']} Entity Embeddings",
    width=1024,
    height=1024,
    template="plotly_white",
)
fig.show()

Combining Multiple Embedding Types

Visualize different embedding types together:

import pandas as pd

def combine_embeddings(**kwargs):
    embeddings_list = []
    identifiers_list = []

    for name, embs in kwargs.items():
        embeddings_list.append(embs)
        identifiers_list.extend([f"{name} {i}" for i in range(embs.shape[0])])

    embeddings = np.concatenate(embeddings_list, axis=0)

    df = pd.DataFrame({
        "x": embeddings[:, 0],
        "y": embeddings[:, 1],
        "type": [id.split(" ")[0] for id in identifiers_list],
        "identifier": identifiers_list,
    })

    fig = px.scatter(
        df, x="x", y="y", color="type",
        hover_data=["identifier"],
        title="Combined Embeddings Visualization",
        width=1024, height=1024,
        template="plotly_white",
    )
    fig.show()

combine_embeddings(
    user=tsne_embeddings_users,
    entity=tsne_embeddings_entities,
    relation=tsne_embeddings_relations
)

Full Example

A complete Jupyter notebook example is available at: run_example/tSNE embedding visualisation.ipynb