
LSTM Seq2Seq Model for Translation

This repository contains the implementation of an LSTM-based Seq2Seq model for translation tasks. The model has been trained on a bilingual dataset and evaluated using BLEU and ChrF scores to measure translation quality.

Model Architecture

The model is a Seq2Seq architecture that uses:

  • Embedding Layer: To convert input tokens into dense vectors.
  • LSTM Encoder: To encode the source language sequences into a hidden representation.
  • LSTM Decoder: To generate the translated target language sequences from the hidden representation.
  • Linear Layer: To map the decoder output to the target vocabulary space.
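The four components above can be sketched as a single PyTorch module. This is a minimal illustration, not the notebook's exact code: the embedding and hidden sizes, the `batch_first` layout, and the use of teacher forcing (feeding the ground-truth target to the decoder) are all assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal LSTM encoder-decoder sketch (sizes are illustrative)."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)  # source token embeddings
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)  # target token embeddings
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)         # map to target vocabulary

    def forward(self, src, tgt):
        # Encode the source sequence; keep only the final (hidden, cell) state.
        _, (h, c) = self.encoder(self.src_emb(src))
        # Decode conditioned on the encoder state (teacher forcing:
        # the ground-truth target prefix is fed as decoder input).
        dec_out, _ = self.decoder(self.tgt_emb(tgt), (h, c))
        # Project decoder outputs onto the target vocabulary space.
        return self.out(dec_out)
```

Given a source batch of shape `(batch, src_len)` and a target batch of shape `(batch, tgt_len)`, the module returns logits of shape `(batch, tgt_len, tgt_vocab)`.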

Training Details

  • Training Loss: Cross-entropy loss with padding tokens ignored.
  • Optimizer: Adam optimizer with a learning rate of 0.001.
  • Number of Epochs: 10.
  • Batch Size: 32.
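The training setup above (cross-entropy ignoring padding, Adam at lr 0.001, batch size 32) can be sketched as follows. The tiny stand-in model, the padding-token id of 0, and the toy batch are assumptions for illustration; the real loop in the notebook iterates over a data loader for 10 epochs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD_IDX = 0   # assumed padding-token id
VOCAB = 50    # toy vocabulary size for the sketch

class TinyModel(nn.Module):
    """Stand-in for the real model: embedding -> LSTM -> output projection."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 16)
        self.lstm = nn.LSTM(16, 32, batch_first=True)
        self.out = nn.Linear(32, VOCAB)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h)

model = TinyModel()
# Cross-entropy loss that skips padding positions; Adam with lr 0.001.
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One toy step on a batch of 32: predict token t+1 from tokens up to t.
tgt = torch.randint(1, VOCAB, (32, 10))
logits = model(tgt[:, :-1])
loss = criterion(logits.reshape(-1, VOCAB), tgt[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

Shifting the target by one position (`tgt[:, :-1]` as input, `tgt[:, 1:]` as labels) is the standard next-token training objective; `ignore_index=PAD_IDX` keeps padded positions from contributing to the loss.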

Evaluation Metrics

The model's performance was evaluated using:

  • BLEU Score: An n-gram overlap metric that measures the similarity between the generated and reference translations.
  • ChrF Score: A character n-gram F-score for evaluating translation quality, which is less sensitive to tokenization than BLEU.

Results

The training and validation loss, along with BLEU and ChrF scores, were plotted to analyze the model's performance:

  • Training Loss: Decreased steadily over the epochs, indicating effective learning.
  • Validation Loss: Showed minimal improvement, suggesting potential overfitting.
  • BLEU Score: Improved gradually but remained relatively low, indicating that further tuning may be needed.
  • ChrF Score: Showed a consistent increase, reflecting better character-level accuracy in translations.

Files Included

  • LSTM_model.ipynb: The Jupyter notebook containing the full implementation of the model, including data loading, training, and evaluation.
  • bleu_scores.csv: CSV file containing BLEU scores for each epoch.
  • chrf_scores.csv: CSV file containing ChrF scores for each epoch.
  • loss_plot.png: Plot of training and validation loss.
  • bleu_score_plot.png: Plot of BLEU scores over epochs.
  • chrf_score_plot.png: Plot of ChrF scores over epochs.

Future Work

  • Hyperparameter Tuning: Experiment with different hyperparameters to improve model performance.
  • Data Augmentation: Use data augmentation techniques to improve the model's ability to generalize.
  • Advanced Architectures: Consider using attention mechanisms or transformer models for better performance.

License

This project is licensed under the MIT License. See the LICENSE file for more details.
