Mengyao00 committed
Commit f51c1a8 · verified · 1 Parent(s): 7041118

Update README.md (#4)


- Update README.md (af1bb4a1dd12c8120d01b8687288984047cbe120)

Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -17,7 +17,7 @@ library_name: transformers
 
  ### **Description**
 
- The Llama 3.2 NeMo Retriever Embedding 1B model is optimized for **multilingual and cross-lingual** text question-answering retrieval with **support for long documents (up to 8192 tokens) and dynamic embedding size (Matryoshka Embeddings)**. This model was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
+ The Llama Nemotron Retriever Embedding 1B model is optimized for **multilingual and cross-lingual** text question-answering retrieval with **support for long documents (up to 8192 tokens) and dynamic embedding size (Matryoshka Embeddings)**. This model was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
 
  In addition to enabling multilingual and cross-lingual question-answering retrieval, this model reduces the data storage footprint by 35x through dynamic embedding sizing and support for longer token length, making it feasible to handle large-scale datasets efficiently.
 
@@ -25,14 +25,14 @@ An embedding model is a crucial component of a text retrieval system, as it tran
 
  This model is ready for commercial use.
 
- The Llama 3.2 NeMo Retriever Embedding 1B model is a part of the NVIDIA NeMo Retriever collection of NIM, which provides state-of-the-art, commercially-ready models and microservices, optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for domain-specific use cases, such as information technology, human resources help assistants, and research & development assistants.
+ The Llama Nemotron Retriever Embedding 1B model is a part of the NVIDIA NeMo Retriever collection of NIM, which provides state-of-the-art, commercially-ready models and microservices, optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for domain-specific use cases, such as information technology, human resources help assistants, and research & development assistants.
 
- We are excited to announce the open sourcing of this commercial embedding model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at [llama-3.2-nv-embedqa-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2).
+ We are excited to announce the open sourcing of this commercial embedding model. For users interested in deploying this model in production environments, it is also available via the model API in NVIDIA Inference Microservices (NIM) at [llama-nemotron-embed-1b-v2](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v2).
 
 
  ### **Intended use**
 
- The Llama 3.2 NeMo Retriever Embedding 1B model is most suitable for users who want to build a multilingual question-and-answer application over a large text corpus, leveraging the latest dense retrieval technologies.
+ The Llama Nemotron Retriever Embedding 1B model is most suitable for users who want to build a multilingual question-and-answer application over a large text corpus, leveraging the latest dense retrieval technologies.
 
  ### **License/Terms of use**
 
@@ -86,8 +86,8 @@ def average_pool(last_hidden_states, attention_mask):
      return embedding
 
 
- tokenizer = AutoTokenizer.from_pretrained("nvidia/llama-3.2-nv-embedqa-1b-v2")
- model = AutoModel.from_pretrained("nvidia/llama-3.2-nv-embedqa-1b-v2", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("nvidia/llama-nemotron-embed-1b-v2")
+ model = AutoModel.from_pretrained("nvidia/llama-nemotron-embed-1b-v2", trust_remote_code=True)
  model = model.to("cuda:0")
  model.eval()
  query_prefix = "query:"
@@ -145,8 +145,8 @@ print(scores.tolist())
 
  ### **Model Version(s)**
 
- Llama 3.2 NeMo Retriever Embedding 1B v2
- Short Name: llama-3.2-nv-embedqa-1b-v2
+ Llama Nemotron Retriever Embedding 1B v2
+ Short Name: llama-nemotron-embed-1b-v2
 
  ## **Training Dataset & Evaluation**
 
@@ -169,8 +169,8 @@ Properties: We evaluated the NeMo Retriever embedding model in comparison to liter
 
  | Open & Commercial Retrieval Models | Average Recall@5 on NQ, HotpotQA, FiQA, TechQA dataset |
  | ----- | ----- |
- | llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 68.60% |
- | llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 64.48% |
+ | llama-nemotron-embed-1b-v2 (embedding dim 2048) | 68.60% |
+ | llama-nemotron-embed-1b-v2 (embedding dim 384) | 64.48% |
  | llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 68.97% |
  | nv-embedqa-mistral-7b-v2 | 72.97% |
  | nv-embedqa-mistral-7B-v1 | 64.93% |
@@ -183,8 +183,8 @@ We evaluated the multilingual capabilities on the academic benchmark [MIRACL](ht
 
  | Open & Commercial Retrieval Models | Average Recall@5 on multilingual |
  | ----- | ----- |
- | llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 60.75% |
- | llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 58.62% |
+ | llama-nemotron-embed-1b-v2 (embedding dim 2048) | 60.75% |
+ | llama-nemotron-embed-1b-v2 (embedding dim 384) | 58.62% |
  | llama-3.2-nv-embedqa-1b-v1 | 60.07% |
  | nv-embedqa-mistral-7b-v2 | 50.42% |
  | BM25 | 26.51% |
@@ -193,8 +193,8 @@ We evaluated the cross-lingual capabilities on the academic benchmark [MLQA](htt
 
  | Open & Commercial Retrieval Models | Average Recall@5 on MLQA dataset with different languages |
  | ----- | ----- |
- | llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 79.86% |
- | llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 71.61% |
+ | llama-nemotron-embed-1b-v2 (embedding dim 2048) | 79.86% |
+ | llama-nemotron-embed-1b-v2 (embedding dim 384) | 71.61% |
  | llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 78.77% |
  | nv-embedqa-mistral-7b-v2 | 68.38% |
  | BM25 | 13.01% |
@@ -203,8 +203,8 @@ We evaluated the support of long documents on the academic benchmark [Multilingu
 
  | Open & Commercial Retrieval Models | Average Recall@5 on MLDR |
  | ----- | ----- |
- | llama-3.2-nv-embedqa-1b-v2 (embedding dim 2048) | 59.55% |
- | llama-3.2-nv-embedqa-1b-v2 (embedding dim 384) | 54.77% |
+ | llama-nemotron-embed-1b-v2 (embedding dim 2048) | 59.55% |
+ | llama-nemotron-embed-1b-v2 (embedding dim 384) | 54.77% |
  | llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048) | 60.49% |
  | nv-embedqa-mistral-7b-v2 | 43.24% |
  | BM25 | 71.39% |
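The usage snippet touched by this change appears only in fragments in the hunks above (the `average_pool` helper, the checkpoint names, the `query:` prefix, and `print(scores.tolist())`). The sketch below stitches those fragments into an end-to-end flow under the new checkpoint name; the mean-pooling body, the `passage:` prefix, the 8192-token truncation, and cosine scoring of L2-normalized embeddings are assumptions added for illustration, not the README's exact code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def average_pool(last_hidden_states, attention_mask):
    # Zero out padding positions, then mean-pool over the sequence dimension
    # (assumed body for the helper whose signature appears in the diff above).
    masked = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    embedding = masked.sum(dim=1) / attention_mask.sum(dim=1, keepdim=True)
    return embedding

tokenizer = AutoTokenizer.from_pretrained("nvidia/llama-nemotron-embed-1b-v2")
model = AutoModel.from_pretrained("nvidia/llama-nemotron-embed-1b-v2", trust_remote_code=True)
model = model.to("cuda:0")
model.eval()

query_prefix = "query:"
passage_prefix = "passage:"  # assumed counterpart to the query prefix shown in the diff
queries = [f"{query_prefix} How many languages was the model evaluated on?"]
passages = [f"{passage_prefix} The model was evaluated on 26 languages."]

with torch.no_grad():
    batch = tokenizer(queries + passages, padding=True, truncation=True,
                      max_length=8192, return_tensors="pt").to("cuda:0")
    # The remote-code model may expose a different output layout; this assumes
    # a standard last_hidden_state as implied by the average_pool signature.
    outputs = model(**batch)
    embeddings = average_pool(outputs.last_hidden_state, batch["attention_mask"])
    embeddings = F.normalize(embeddings, p=2, dim=1)

# Cosine similarity of the query against each passage.
scores = embeddings[:1] @ embeddings[1:].T
print(scores.tolist())
```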
 
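The evaluation tables above report the same v2 checkpoint at embedding dim 2048 and 384, which is what the dynamic embedding size (Matryoshka Embeddings) support refers to. A minimal sketch of the usual Matryoshka recipe, assuming the smaller size is obtained by truncating the leading components of the full vector and re-normalizing; the README does not show this step in the hunks above.

```python
import torch
import torch.nn.functional as F

def truncate_matryoshka(embeddings: torch.Tensor, dim: int = 384) -> torch.Tensor:
    # Keep the leading `dim` components of each full-size vector, then
    # re-normalize so cosine / dot-product scores remain comparable.
    return F.normalize(embeddings[:, :dim], p=2, dim=1)

# e.g. reduce the 2048-dim embeddings from the snippet above to 384 dims:
# small_embeddings = truncate_matryoshka(embeddings, dim=384)
```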