| license: mit | |
| datasets: | |
| - Silly-Machine/TuPyE-Dataset | |
| language: | |
| - pt | |
| pipeline_tag: text-classification | |
| base_model: neuralmind/bert-base-portuguese-cased | |
| widget: | |
| - text: 'Bom dia, flor do dia!!' | |
| model-index: | |
| - name: TuPy-Bert-Base-Multilabel | |
| results: | |
| - task: | |
| type: text-classification | |
| dataset: | |
| name: TuPyE-Dataset | |
| type: Silly-Machine/TuPyE-Dataset | |
| metrics: | |
| - type: f1 | |
| value: 0.84 | |
| name: F1-score | |
| verified: true | |
| - type: precision | |
| value: 0.85 | |
| name: Precision | |
| verified: true | |
| - type: recall | |
| value: 0.84 | |
| name: Recall | |
| verified: true | |
| ## Introduction | |
| TuPy-Bert-Base-Multilabel is a fine-tuned BERT model designed specifically for multilabel classification of hate speech in Portuguese. | |
| Derived from [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased), | |
| it detects the following categories of hate speech: ageism, aporophobia, body shame, capacitism (ableism), LGBTphobia, political, | |
| racism, religious intolerance, misogyny, and xenophobia. | |
| For more details on the base model, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/). | |
| Language model performance can vary notably when the training and test data come from different domains. | |
| To build a Portuguese language model specialized for hate speech classification, | |
| the original BERTimbau model was fine-tuned on | |
| the [TuPy Hate Speech DataSet](https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset), sourced from diverse social networks. | |
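To make the multilabel setup concrete, the sketch below encodes a set of categories as a multi-hot vector, a common target format for multilabel classification. The label order and the `encode_labels` helper are hypothetical and chosen only for illustration; the mapping the model actually uses is the `id2label` entry of its config.

```python
# Hypothetical label order, for illustration only; the real mapping
# is stored in the model config (config.id2label).
CATEGORIES = [
    "ageism", "aporophobia", "body shame", "capacitism", "LGBTphobia",
    "political", "racism", "religious intolerance", "misogyny", "xenophobia",
]


def encode_labels(active):
    """Return a multi-hot list with 1.0 at the position of each active category."""
    unknown = set(active) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1.0 if name in active else 0.0 for name in CATEGORIES]


# A text can carry several labels at once, or none at all.
print(encode_labels({"racism", "xenophobia"}))
print(encode_labels(set()))
```

Unlike single-label classification, the vector may contain any number of ones, which is why each category is scored independently.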
| ## Available models | |
| | Model | Arch. | #Layers | #Params | | |
| | ---------------------------------------- | ---------- | ------- | ------- | | |
| `Silly-Machine/TuPy-Bert-Base-Binary-Classifier` | BERT-Base | 12 | 109M | |
| | `Silly-Machine/TuPy-Bert-Large-Binary-Classifier` | BERT-Large | 24 | 334M | | |
| | `Silly-Machine/TuPy-Bert-Base-Multilabel` | BERT-Base | 12 | 109M | | |
| | `Silly-Machine/TuPy-Bert-Large-Multilabel` | BERT-Large | 24 | 334M | | |
| ## Example usage | |
| ```python | |
| from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig | |
| import torch | |
| import numpy as np | |
| from scipy.special import softmax | |
| def classify_hate_speech(model_name, text): | |
|     model = AutoModelForSequenceClassification.from_pretrained(model_name) | |
|     tokenizer = AutoTokenizer.from_pretrained(model_name) | |
|     config = AutoConfig.from_pretrained(model_name) | |
|     # Tokenize input text and prepare model input | |
|     model_input = tokenizer(text, padding=True, return_tensors="pt") | |
|     # Get model output scores | |
|     with torch.no_grad(): | |
|         output = model(**model_input) | |
|     scores = softmax(output.logits.numpy(), axis=1) | |
|     ranking = np.argsort(scores[0])[::-1] | |
|     # Print the results | |
|     for i, rank in enumerate(ranking): | |
|         label = config.id2label[rank] | |
|         score = scores[0, rank] | |
|         print(f"{i + 1}) Label: {label} Score: {score:.4f}") | |
| # Example usage | |
| model_name = "Silly-Machine/TuPy-Bert-Base-Multilabel" | |
| text = "Bom dia, flor do dia!!" | |
| classify_hate_speech(model_name, text) | |
| ``` |
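The example above ranks labels with a softmax, which forces the scores to sum to 1. For multilabel outputs, a common alternative is to apply a sigmoid to each logit independently and keep every label whose score passes a threshold. The sketch below shows that decoding step on made-up logits; the `decode_multilabel` helper, the three-label mapping, and the 0.5 threshold are assumptions for illustration and are not taken from the model card.

```python
import numpy as np


def decode_multilabel(logits, id2label, threshold=0.5):
    """Apply an independent sigmoid per label and keep scores >= threshold."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = 1.0 / (1.0 + np.exp(-logits))  # element-wise sigmoid
    return {id2label[i]: float(p) for i, p in enumerate(probs) if p >= threshold}


# Made-up logits and label mapping, for illustration only.
id2label = {0: "ageism", 1: "racism", 2: "xenophobia"}
logits = [-2.0, 1.5, 0.3]

for label, prob in decode_multilabel(logits, id2label).items():
    print(f"{label}: {prob:.4f}")
```

With the real model, `logits` would be `output.logits[0]` and `id2label` would come from `config.id2label`. The threshold is a tunable choice and may need adjusting per category.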