How to use cross-encoder/quora-roberta-large with sentence-transformers:
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/quora-roberta-large")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

# Score each (query, passage) pair; higher scores indicate a closer match.
scores = model.predict([(query, passage) for passage in passages])
print(scores)

How to use cross-encoder/quora-roberta-large with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("cross-encoder/quora-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/quora-roberta-large")

This model was trained using the SentenceTransformers CrossEncoder class.
This model was trained on the Quora Duplicate Questions dataset. It predicts a score between 0 and 1 indicating how likely it is that the two given questions are duplicates.
Note: The model is not suitable for estimating general question similarity. For example, the two questions "How to learn Java" and "How to learn Python" will receive a rather low score, because they are not duplicates.
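To make this distinction concrete, the sketch below scores a duplicate pair alongside the related-but-not-duplicate pair from the note. The rephrased Java pair is an invented example; the model card only guarantees the behavior described above:

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/quora-roberta-large")

pairs = [
    # Two rephrasings of the same question: expected to score high.
    ("How can I learn Java?", "What is the best way to learn Java?"),
    # Related topics, but different questions: expected to score low (see note above).
    ("How to learn Java", "How to learn Python"),
]
scores = model.predict(pairs)
for (q1, q2), score in zip(pairs, scores):
    print(f"{score:.3f}  {q1!r} vs. {q2!r}")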
Pre-trained models can be used like this:
from sentence_transformers import CrossEncoder
model = CrossEncoder('cross-encoder/quora-roberta-large')
scores = model.predict([('Question 1', 'Question 2'), ('Question 3', 'Question 4')])
You can also use this model without sentence_transformers, by using the Transformers AutoModelForSequenceClassification class directly:
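A minimal sketch of that route, assuming the model outputs a single logit per pair that maps to the 0-1 duplicate score through a sigmoid (the usual setup for models trained with the CrossEncoder class):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("cross-encoder/quora-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/quora-roberta-large")
model.eval()

# Encode the two questions together as a single (text, text_pair) input.
features = tokenizer(
    ["How to learn Java"],
    ["How can I learn Java?"],
    padding=True,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**features).logits           # shape: (batch_size, 1)
    scores = torch.sigmoid(logits).squeeze(-1)  # duplicate score in [0, 1]

print(scores)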