Silly-Machine/TuPy-Bert-Base-Multilabel

---
license: mit
datasets:
- Silly-Machine/TuPyE-Dataset
language:
- pt
pipeline_tag: text-classification
base_model: neuralmind/bert-base-portuguese-cased
widget:
- text: 'Bom dia, flor do dia!!'
model-index:
- name: TuPy-Bert-Base-Multilabel
  results:
  - task:
      type: text-classification
    dataset:
      name: TuPyE-Dataset
      type: Silly-Machine/TuPyE-Dataset
    metrics:
    - type: f1
      value: 0.84
      name: F1-score
      verified: true
    - type: precision
      value: 0.85
      name: Precision
      verified: true
    - type: recall
      value: 0.84
      name: Recall
      verified: true
---

## Introduction


TuPy-Bert-Base-Multilabel is a fine-tuned BERT model for multilabel classification of hate speech in Portuguese.
Derived from [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased),
it assigns text to one or more hate speech categories: ageism, aporophobia, body shame, capacitism, LGBTphobia, political,
racism, religious intolerance, misogyny, and xenophobia.
For more details about the base model or specific inquiries, please refer to the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

The performance of language models can vary notably when the domain shifts between training and test data.
To build a Portuguese language model specialized for hate speech classification,
the original BERTimbau model was fine-tuned on
the [TuPy Hate Speech DataSet](https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset), which was sourced from diverse social networks.

## Available models

| Model                                              | Arch.      | #Layers | #Params |
| -------------------------------------------------- | ---------- | ------- | ------- |
| `Silly-Machine/TuPy-Bert-Base-Binary-Classifier`   | BERT-Base  | 12      | 109M    |
| `Silly-Machine/TuPy-Bert-Large-Binary-Classifier`  | BERT-Large | 24      | 334M    |
| `Silly-Machine/TuPy-Bert-Base-Multilabel`          | BERT-Base  | 12      | 109M    |
| `Silly-Machine/TuPy-Bert-Large-Multilabel`         | BERT-Large | 24      | 334M    |

## Example usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
import torch
import numpy as np
from scipy.special import softmax

def classify_hate_speech(model_name, text):
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    config = AutoConfig.from_pretrained(model_name)

    # Tokenize input text and prepare model input
    model_input = tokenizer(text, padding=True, return_tensors="pt")

    # Get model output scores
    with torch.no_grad():
        output = model(**model_input)
        scores = softmax(output.logits.numpy(), axis=1)
        ranking = np.argsort(scores[0])[::-1]

    # Print the labels ranked from highest to lowest score
    for i, rank in enumerate(ranking):
        label = config.id2label[rank]
        score = scores[0, rank]
        print(f"{i + 1}) Label: {label} Score: {score:.4f}")

# Example usage
model_name = "Silly-Machine/TuPy-Bert-Base-Multilabel"
text = "Bom dia, flor do dia!!"
classify_hate_speech(model_name, text)
```
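
The example above ranks labels with a softmax, so the scores always sum to 1 across categories. For a multilabel model, a common alternative is to score each label independently with a sigmoid and keep every label above a threshold. The sketch below illustrates that post-processing step on raw logits. It assumes the model was trained with a per-label sigmoid objective, which this card does not confirm. The helper names, the 0.5 threshold, and the sample logits and labels are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def multilabel_predictions(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score meets the
    threshold, sorted from most to least likely."""
    scored = [(id2label[i], sigmoid(logit)) for i, logit in enumerate(logits)]
    selected = [(label, p) for label, p in scored if p >= threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)

# Hypothetical logits and label names, for illustration only
# (not real outputs of TuPy-Bert-Base-Multilabel).
example_id2label = {0: "ageism", 1: "misogyny", 2: "racism"}
example_logits = [-2.0, 1.5, 0.3]
print(multilabel_predictions(example_logits, example_id2label))
```

With the real model, the same helper could be fed `output.logits[0].tolist()` and `model.config.id2label` from the example above. The threshold is a tunable choice and may need adjusting per category.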