---
title: Image Captioning
emoji: 🖼️
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: apache-2.0
---

# Image Captioning App

A web-based image captioning tool that generates descriptive captions for uploaded images using the BLIP vision-language model. Built with Gradio and deployed on Hugging Face Spaces.

## Live Demo

Try the app: [Image-Captioning](https://huggingface.co/spaces/ashish-soni08/Image-Captioning)

## Features

- **Automatic Caption Generation**: Upload any image and get a descriptive caption instantly
- **Visual Understanding**: The model analyzes objects, scenes, and activities in images
- **Clean Interface**: Intuitive web UI built with Gradio for seamless image uploads
- **Responsive Design**: Works on desktop and mobile devices

## Technology Stack

- **Backend**: Python, Hugging Face Transformers
- **Frontend**: Gradio
- **Model**: [Salesforce/blip-image-captioning-base](https://huggingface.co/Salesforce/blip-image-captioning-base)
- **Deployment**: Hugging Face Spaces

## Quick Start

### Prerequisites

```bash
Python 3.8+
pip
```

### Installation

1. Clone the repository:
```bash
git clone https://github.com/Ashish-Soni08/image-captioning-app.git
cd image-captioning-app
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python app.py
```
4. Open your browser and navigate to `http://localhost:7860`
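
For reference, here is a minimal sketch of roughly what `python app.py` runs, assuming the app wraps the captioning model in a Transformers `pipeline` and exposes it through a simple `gr.Interface`. The function and component names are illustrative; the actual `app.py` may differ.

```python
# Minimal illustrative sketch -- the real app.py may be structured differently.
import gradio as gr
from transformers import pipeline

# Load the BLIP base captioning model once at startup
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def caption_image(image):
    # The pipeline returns a list of dicts such as [{"generated_text": "..."}]
    return captioner(image)[0]["generated_text"]

demo = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil", label="Upload image"),
    outputs=gr.Textbox(label="Caption"),
    title="Image Captioning",
)

if __name__ == "__main__":
    demo.launch()  # serves on http://localhost:7860 by default
```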

## Usage

1. **Upload Image**: Click the "Upload image" button and select an image from your device
2. **Generate Caption**: The app automatically processes the image and generates a caption
3. **View Results**: The descriptive caption appears in the output textbox

### Example

**Input Image:**
```
[A photo of a golden retriever playing in a park]
```
**Generated Caption:**
```
"A golden retriever dog playing with a ball in a grassy park on a sunny day"
```
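
You can also call the hosted Space programmatically with `gradio_client` instead of the browser UI. This is a hedged sketch: `/predict` is the default endpoint name for a `gr.Interface`, but the endpoint name and argument format depend on how the Space defines its interface, and `your_image.jpg` is a placeholder path.

```python
# Programmatic access to the hosted Space (illustrative; endpoint details may differ)
from gradio_client import Client, handle_file

client = Client("ashish-soni08/Image-Captioning")

# Replace the placeholder path with your own image file
caption = client.predict(handle_file("your_image.jpg"), api_name="/predict")
print(caption)
```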

## Model Information

This app uses **Salesforce/blip-image-captioning-base**, a vision-language model for image captioning:
- **Architecture**: BLIP with a ViT-Base vision backbone
- **Model Size**: ~990 MB (PyTorch model file)
- **Training Data**: COCO captions, bootstrapped with captions generated from web data
- **Capabilities**: Both conditional and unconditional image captioning
- **Performance**: State-of-the-art results on image captioning benchmarks at release (+2.8% CIDEr reported in the BLIP paper)
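
The snippet below illustrates the two captioning modes using the Transformers BLIP classes, following the usage pattern on the model card; the image path is a placeholder.

```python
# Illustrative use of the model outside the app; the image path is a placeholder.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example_images/sample.jpg").convert("RGB")  # placeholder path

# Unconditional captioning: the model describes the image freely
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))

# Conditional captioning: the caption continues a text prompt
inputs = processor(image, text="a photograph of", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```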

## Project Structure

```
image-captioning-app/
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation
└── example_images/     # Sample images for testing
```

## License

This project is licensed under the Apache License 2.0.

## Acknowledgments

- [Hugging Face](https://huggingface.co/) for the Transformers library and model hosting
- [Gradio](https://gradio.app/) for the web interface framework
- [Salesforce Research](https://github.com/salesforce/BLIP) for the BLIP model

## Contact

Ashish Soni - ashish.soni2091@gmail.com

Project Link: [GitHub](https://github.com/Ashish-Soni08/image-captioning-app)