Using vectors instead of characters
pseudo-rnd-thoughts opened this issue · 2 comments
pseudo-rnd-thoughts commented
I would like to use this outside of bioinformatics, where each element of a sequence is a vector (np.ndarray) rather than a character, and a distance function computes the "distance" between vectors.
All your examples use strings. Is this possible with pyalign?
poke1024 commented
Yes, this is possible. Using pyalign.problems.general, you can pass in any distance or similarity function. Here is an example code snippet that computes an alignment between words, where each word is represented by an embedding vector and word similarity is the cosine similarity between those vectors:
import numpy as np
from numpy.linalg import norm
import spacy
import pyalign

# compute some word embeddings (one vector per token)
nlp = spacy.load("en_core_web_md")
a = np.array([x.vector for x in nlp("old books and newer manuscripts")])
b = np.array([x.vector for x in nlp("recent writings")])

# similarity between two embedding vectors
def cosine_sim(u, v):
    return np.dot(u, v) / (norm(u) * norm(v))

# solve alignment: higher similarity is better, so maximize
pf = pyalign.problems.general(
    cosine_sim,
    direction="maximize")
solver = pyalign.solve.GlobalSolver(
    gap_cost=pyalign.gaps.LinearGapCost(0.2),
    codomain=pyalign.solve.Solution)
problem = pf.new_problem(a, b)
solution = solver.solve(problem)
print(solution)
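For intuition about what a global solver computes with a custom similarity, here is a minimal from-scratch sketch of global (Needleman-Wunsch) alignment over sequences of vectors with a linear gap cost. It is not pyalign's implementation; the function name global_align and the convention that each gap subtracts gap_cost when maximizing are assumptions for illustration.

```python
import numpy as np
from numpy.linalg import norm


def cosine_sim(u, v):
    return float(np.dot(u, v) / (norm(u) * norm(v)))


def global_align(a, b, sim, gap_cost):
    """Hypothetical Needleman-Wunsch sketch over sequences of vectors.

    Maximizes total similarity; each gap (an element of a or b left
    unmatched) subtracts gap_cost (assumed convention).
    Returns (score, pairs) where pairs are the matched (i, j) index pairs.
    """
    n, m = len(a), len(b)
    # score[i, j] = best score for aligning a[:i] with b[:j]
    score = np.zeros((n + 1, m + 1))
    score[1:, 0] = -gap_cost * np.arange(1, n + 1)
    score[0, 1:] = -gap_cost * np.arange(1, m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(
                score[i - 1, j - 1] + sim(a[i - 1], b[j - 1]),  # match a[i-1] with b[j-1]
                score[i - 1, j] - gap_cost,  # a[i-1] left unmatched
                score[i, j - 1] - gap_cost,  # b[j-1] left unmatched
            )
    # trace back from the bottom-right cell to recover the matched pairs
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if np.isclose(score[i, j], score[i - 1, j - 1] + sim(a[i - 1], b[j - 1])):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j] - gap_cost):
            i -= 1
        else:
            j -= 1
    return float(score[n, m]), pairs[::-1]


a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([[1.0, 0.0], [1.0, 1.0]])
score, pairs = global_align(a, b, cosine_sim, gap_cost=0.2)
print(round(score, 4), pairs)  # 1.8 [(0, 0), (2, 1)]
```

Here a[1] = (0, 1) has zero cosine similarity with b[0], so skipping it for one gap (-0.2) and matching a[2] with b[1] (similarity 1.0) scores higher than forcing a match.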
If you pass in a distance function (instead of a similarity, or affinity, function as above), you would use:
pf = pyalign.problems.general(
some_distance_func,
direction="minimize")
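The thread leaves some_distance_func open. Two common choices for embedding vectors, shown here as hypothetical examples, are Euclidean distance and cosine distance (1 minus cosine similarity). Both return 0 for identical vectors and grow as vectors diverge, which is what direction="minimize" expects.

```python
import numpy as np
from numpy.linalg import norm


def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Straight-line distance between two vectors; 0 for identical vectors."""
    return float(norm(u - v))


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity: 0 for parallel vectors, up to 2 for opposite ones."""
    return float(1.0 - np.dot(u, v) / (norm(u) * norm(v)))


u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(euclidean_distance(u, v))  # 1.4142135623730951 (sqrt(2))
print(cosine_distance(u, v))     # 1.0 (orthogonal vectors)
```

Either function could then be passed as the first argument to pyalign.problems.general together with direction="minimize". Cosine distance ignores vector length, while Euclidean distance does not, which matters for embeddings whose norms vary.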
pseudo-rnd-thoughts commented
Amazing, thanks