Mini meaning example
In real NLP systems, vectors often come from embeddings. For
learning, we can invent small vectors representing features such as
animal, pet, wild, and vehicle.
import math
def cosine_similarity(a, b):
dot_product = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))
return dot_product / (norm_a * norm_b)
vectors = {
"cat": [0.9, 0.9, 0.1, 0.0],
"dog": [0.9, 0.8, 0.2, 0.0],
"tiger": [0.9, 0.1, 0.95, 0.0],
"car": [0.0, 0.0, 0.0, 1.0],
}
query = vectors["cat"]
for word, vector in vectors.items():
score = cosine_similarity(query, vector)
print(word, round(score, 4))
Expected idea: cat should be closest to dog, less
close to tiger, and far from car.
Rank sentences by similarity
import math
def cosine_similarity(a, b):
dot_product = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))
return dot_product / (norm_a * norm_b)
sentence_vectors = {
"I love programming": [0.9, 0.8, 0.1, 0.0],
"Coding is enjoyable": [0.88, 0.79, 0.12, 0.02],
"The sky is blue": [0.05, 0.02, 0.1, 0.95],
"Python is fun": [0.86, 0.82, 0.15, 0.03],
}
query = [0.9, 0.8, 0.1, 0.0]
ranked = []
for sentence, vector in sentence_vectors.items():
score = cosine_similarity(query, vector)
ranked.append((score, sentence))
for score, sentence in sorted(ranked, reverse=True):
print(f"{score:.4f} {sentence}")
This ranking pattern is used everywhere: search, recommendation,
semantic matching, question-answer retrieval, and clustering.