zstanjj committed 5c4ff72 (verified) · 1 parent: e5d0cb5

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -12,7 +12,7 @@ license: apache-2.0
  We release the HTML pruner model used in **HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems**.
 
  <p align="left">
- Useful links: 📝 <a href="https://arxiv.org/abs/2411.02959" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/SlimPLM" target="_blank">Github</a>
+ Useful links: 📝 <a href="https://arxiv.org/abs/2411.02959" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/papers/2411.02959" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/HtmlRAG" target="_blank">Github</a>
  </p>
 
  We propose HtmlRAG, which uses HTML instead of plain text as the format of external knowledge in RAG systems. To tackle the long context brought by HTML, we propose **Lossless HTML Cleaning** and **Two-Step Block-Tree-Based HTML Pruning**.
@@ -22,7 +22,7 @@ We propose HtmlRAG, which uses HTML instead of plain text as the format of exter
  - **Two-Step Block-Tree-Based HTML Pruning**: The block-tree-based HTML pruning consists of two steps, both of which are conducted on the block tree structure. The first pruning step uses a embedding model to calculate scores for blocks, while the second step uses a path generative model. The first step processes the result of lossless HTML cleaning, while the second step processes the result of the first pruning step.
 
 
- 🌹 If you use this model, please ✨star our **[GitHub repository](https://github.com/plageon/HTMLRAG)** to support us. Your star means a lot!
+ 🌹 If you use this model, please ✨star our **[GitHub repository](https://github.com/plageon/HtmlRAG)** to support us. Your star means a lot!
 
  ## 📦 Installation
 
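The README context in this diff says the first pruning step scores blocks with an embedding model and keeps the useful ones. The following is a minimal sketch of that idea under loudly stated assumptions: the block tree is flattened into hypothetical `(path, text)` pairs, a bag-of-words cosine similarity stands in for the real embedding model, and the budget is counted in words rather than tokens. It is not HtmlRAG's implementation, and the second (path generative) step is not shown.

```python
import math
import re
from collections import Counter

# ASSUMPTION: toy bag-of-words "embedding" standing in for the real
# embedding model used by HtmlRAG's first pruning step.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Missing keys in a Counter count as 0, so only shared terms contribute.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def prune_blocks(blocks, query, max_words):
    """Greedily keep the highest-scoring blocks that fit within max_words.

    `blocks` is a hypothetical flattening of the block tree into
    (path, text) pairs; kept blocks are returned in document order.
    """
    q = embed(query)
    ranked = sorted(
        range(len(blocks)),
        key=lambda i: cosine(embed(blocks[i][1]), q),
        reverse=True,
    )
    kept, used = set(), 0
    for i in ranked:
        n = len(blocks[i][1].split())
        if used + n <= max_words:
            kept.add(i)
            used += n
    return [blocks[i] for i in sorted(kept)]

# Usage with made-up blocks (paths and texts are illustrative only).
blocks = [
    ("html/body/div[1]", "HtmlRAG keeps HTML structure for retrieval"),
    ("html/body/div[2]", "Cookie banner accept all cookies"),
    ("html/body/div[3]", "Block tree pruning removes irrelevant HTML blocks"),
]
result = prune_blocks(blocks, "how does HtmlRAG prune HTML blocks", max_words=13)
print([path for path, _ in result])  # ['html/body/div[1]', 'html/body/div[3]']
```

Greedy selection by score is just one simple way to respect a length budget; a real pruner working on a tree would also need to decide how removing a block affects its parent and siblings.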