Transforming Text to Vision: Integrating Gen AI with Python

Transforming text into visuals is now possible with Generative AI and Python. Follow this detailed guide to learn the steps and revolutionize your projects with cutting-edge AI integration.

Introduction

In this blog, we will explore the seamless integration of Generative AI with Python to transform text into visual content. This step-by-step guide is designed for developers and tech enthusiasts looking to bridge the gap between text-based data and engaging visuals, enhancing user experience and functionality.

Use Case

Enhancing word discovery in a mobile game


In today’s digital age, there is a growing need for applications that generate images from user prompts or input, whether for personalized content, artwork, or visual storytelling. Implementing this can be complex: it requires integrating a machine learning model for image generation and handling user interactions efficiently.

In this blog, we take the case of a word game played on a mobile phone, in which the player searches for words on a grid. Suppose we want to give the player a hint about one of the words hidden in the grid. To do this, we pick one word from the game and pass it to a machine learning model. The model generates an image that fits the word and returns it in its response. We then show this image to the player, who can use it as a visual hint to find the word. Exciting so far, isn’t it? Let’s delve into it!
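As a rough illustration of the first step in this flow, the sketch below picks one word from a puzzle and turns it into a text prompt for an image model. The function names pick_hint_word and build_hint_prompt, and the wording of the prompt itself, are assumptions made for this example and are not part of the game or of any library.

```python
import random


def pick_hint_word(grid_words, rng=None):
    """Pick one hidden word from the current puzzle to illustrate."""
    if not grid_words:
        raise ValueError("grid_words must not be empty")
    rng = rng or random.Random()
    # Sort first so the choice does not depend on set ordering.
    return rng.choice(sorted(grid_words))


def build_hint_prompt(word):
    """Turn a single game word into a text-to-image prompt (hypothetical wording)."""
    cleaned = word.strip().lower()
    if not cleaned.isalpha():
        raise ValueError(f"not a plain word: {word!r}")
    # Ask for a picture of the thing itself, without lettering that
    # would spell out the answer for the player.
    return f"simple illustration depicting: {cleaned}, no text or letters in the image"


if __name__ == "__main__":
    word = pick_hint_word({"apple", "river", "candle"})
    print(build_hint_prompt(word))
```

Asking the model to avoid text in the image is a design choice for this use case: a hint that spells out the word would defeat the purpose of the puzzle.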

Why Backend Integration?

Leveraging a machine learning model is generally recommended on the backend over direct integration on the frontend due to several reasons:

  1. Security: Integrating the model on the backend safeguards sensitive API keys, mitigating risks such as unauthorized access or misuse.

  2. Control: Backend integration provides superior control over API usage, enabling monitoring and regulation of calls for efficient utilization and compliance with usage limits.
  3. Performance: Offloading the processing to the backend enhances frontend performance by reducing computational overhead on client devices, resulting in a smoother user experience.
  4. Scalability: Backend integration facilitates seamless resource scaling based on demand, ensuring optimal performance during peak usage periods. This scalability is essential for accommodating growing user bases and maintaining consistent service quality.
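To make the "Control" point more concrete, here is a minimal sketch of a per-client rate limiter that a backend could consult before forwarding a request to the model. The class name SlidingWindowLimiter, its parameters, and the idea of keying by a client id are assumptions for illustration; the original backend does not include rate limiting.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` per client within `window_seconds`."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls = defaultdict(deque)  # client id -> timestamps of recent calls

    def allow(self, client_id):
        now = self.clock()
        recent = self._calls[client_id]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_calls=2, window_seconds=60)
    for attempt in range(3):
        print(attempt, limiter.allow("player-1"))
```

A request handler could call allow() with the caller's identifier and return an HTTP 429 response when it returns False. This kind of check is only possible because the model call lives on the backend.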

Backend Python Code

One of the key components in this integration is Gradio, a Python library that simplifies the deployment of machine learning models as web applications. Gradio lets us create interfaces for machine learning models with minimal code, and its companion client library, gradio_client, lets a Python program call a model that is already hosted as a Gradio app. By calling that client from our Flask backend, we can integrate machine learning capabilities into our application.

from flask import Flask, request, jsonify, Response
import io
from PIL import Image
from gradio_client import Client  # client for calling hosted Gradio apps

def generate_image(prompt, steps):
    # Initializes the Gradio client for the SDXL-Lightning model published by ByteDance
    client = Client("ByteDance/SDXL-Lightning")
    # Sends the prompt and step setting to the model's "/generate_image_1" endpoint
    # and returns its result (used later as the path to the generated image).
    result = client.predict(prompt, steps, api_name="/generate_image_1")
    return result
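Each call to the hosted model takes time, and in a word game the same word is likely to be requested more than once. One option, sketched below under stated assumptions, is to wrap the generator in a small least-recently-used cache. The class name HintImageCache and the max_entries parameter are hypothetical, and the "4-Step" value in the usage lines is an assumed step setting. The generator is passed in as a callable, so the sketch runs without contacting the model.

```python
from collections import OrderedDict


class HintImageCache:
    """Small LRU cache mapping a normalized prompt to a generated image path."""

    def __init__(self, generate, max_entries=256):
        self._generate = generate  # callable (prompt, steps) -> image path
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, prompt, steps):
        # Normalize so that "Apple" and " apple " share one cache entry.
        key = (prompt.strip().lower(), steps)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = self._generate(prompt, steps)
        self._entries[key] = result
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result


if __name__ == "__main__":
    def fake_generate(prompt, steps):
        # Stand-in for generate_image(); returns a made-up path.
        return f"/tmp/{prompt.strip().lower()}.png"

    cache = HintImageCache(fake_generate, max_entries=2)
    print(cache.get("Apple", "4-Step"))
    print(cache.get(" apple ", "4-Step"))  # served from the cache
```

In the backend, this would be constructed once as HintImageCache(generate_image). One caveat: the cache stores whatever the generator returns, so if that is a path to a temporary file, the file may need to be copied somewhere durable before the path is reused.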

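app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def handle_request():
    # Parse the JSON body and pull out the user's prompt
    data = request.get_json(silent=True) or {}
    prompt = data.get("prompt")
    if not prompt:
        return jsonify({"error": "Missing prompt parameter"}), 400
    # Step setting passed to the model; "4-Step" is an assumed default
    steps = data.get("steps", "4-Step")
    try:
        # generate_image() returns the path of the generated image file
        result_path = generate_image(prompt, steps)
        # Open the image and re-encode it as PNG in memory
        img = Image.open(result_path)
        img_byte_array = io.BytesIO()
        img.save(img_byte_array, format="PNG")
        img_byte_array.seek(0)
        return Response(img_byte_array.getvalue(), mimetype="image/png")
    except Exception as e:
        return jsonify({"error": str(e)}), 500

if __name__ == "__main__":
    app.run(host="localhost", port=6000)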

The code block initializes a Flask application and defines a route at “/generate” to handle POST requests. When a POST request is received, the handle_request() function is invoked.

Within handle_request(), the JSON data from the request body is extracted to retrieve the prompt provided by the user. If the prompt is missing, an error response with status code 400 is returned to notify the client.

Subsequently, the generate_image() function is invoked with the prompt parameter to generate an image based on the provided prompt. After the image is generated, it is opened and converted into a byte array.

Finally, the byte array containing the image is sent as a response with a mimetype of ‘image/png’ to the client. In case of any errors occurring during the process, an error response with status code 500 is returned, along with the corresponding error message.
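The request and response described above can be exercised from a small Python client, which is handy for testing the backend before wiring up a mobile frontend. The sketch below uses only the standard library. The helper names, the 120-second timeout, and the PNG signature check are assumptions added for illustration; the route, the "prompt" key, and the image/png response come from the description above.

```python
import json
import urllib.request

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_generate_request(base_url, prompt):
    """Build the POST request that the /generate route expects."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_png(data):
    """Check the PNG signature at the start of the payload."""
    return data[:8] == PNG_SIGNATURE


def fetch_hint_image(base_url, prompt, timeout=120):
    """Call the backend and return the PNG bytes.

    urlopen raises urllib.error.HTTPError for the 400 and 500 responses.
    """
    request = build_generate_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    if not looks_like_png(data):
        raise ValueError("backend did not return a PNG image")
    return data
```

With the Flask app running locally, fetch_hint_image("http://localhost:6000", "apple") would return the image bytes, which can then be written to a .png file for inspection.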

In essence, this backend setup effectively utilizes Flask for API development and seamlessly integrates with Gradio to infuse machine learning capabilities. Consequently, our application enables dynamic image generation based on textual input, thereby delivering personalized and captivating user experiences while enhancing its overall functionality and appeal.

Frontend Dart Code

The following Dart code represents the frontend implementation of a hint generator application. In the context of this post, this application serves as a practical demonstration of how we can read the word and pass it on to our backend image generation system to receive visual hints based on textual input.

import 'dart:convert';

import 'package:flutter/material.dart';
import 'package:http/http.dart' as http;

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Word Hint Generator',
      theme: ThemeData(primarySwatch: Colors.blue),
      home: MyScreen(),
    );
  }
}

Main Function and MyApp Class:

  • The main() function serves as the entry point for the Dart application. Here, we initialize the application by running an instance of the MyApp widget.
  • MyApp is a stateless widget that represents the root of our application. It sets the title of the application and defines the theme using MaterialApp.

class MyScreen extends StatefulWidget {
  @override
  MyScreenState createState() => MyScreenState();
}

class MyScreenState extends State<MyScreen> {
  final TextEditingController textController = TextEditingController();
  // Holds the raw image bytes returned by the backend (not a URL, despite the name).
  dynamic imageUrl;

  @override
  void dispose() {
    // Release the controller when the screen is removed.
    textController.dispose();
    super.dispose();
  }

  Future<void> generateImage() async {
    String apiUrl = 'https://25c2-164-120-110-140.ngrok-free.app/generate';
    String text = textController.text;
    // The backend reads the word from the "prompt" key of the JSON body.
    Map<String, String> data = {'prompt': text};
    try {
      var response = await http.post(
        Uri.parse(apiUrl),
        headers: {'Content-Type': 'application/json'},
        body: json.encode(data),
      );
      if (response.statusCode == 200) {
        setState(() {
          imageUrl = response.bodyBytes;
        });
      } else {
        print('Error: ${response.reasonPhrase}');
      }
    } catch (e) {
      print('Error: $e');
    }
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('Word Hint Generator')),
      body: Padding(
        padding: const EdgeInsets.all(16.0),
        child: Column(
          crossAxisAlignment: CrossAxisAlignment.stretch,
          children: [
            TextField(
              controller: textController,
              decoration: InputDecoration(labelText: 'Enter Word'),
            ),
            SizedBox(height: 20),
            ElevatedButton(
              onPressed: generateImage,
              child: Text('Generate Hint'),
            ),
            SizedBox(height: 20),
            imageUrl != null
                ? Image.memory(
                    imageUrl,
                    fit: BoxFit.contain,
                  )
                : Container(),
          ],
        ),
      ),
    );
  }
}

MyScreen Class:

  • MyScreen is a stateful widget responsible for rendering the main screen of our application. It extends StatefulWidget to handle state changes dynamically.
  • Alongside MyScreen, we define the MyScreenState class, which manages the state of our application screen.

User Input and Image Generation:

  • The generateImage() function is an asynchronous method that sends a POST request to our backend API endpoint (apiUrl) with the user-entered text as the prompt.
  • Upon receiving a successful response from the backend, the image bytes in the response body are stored in the imageUrl variable, which, despite its name, holds raw image data rather than a URL. If the request fails, an error message is printed to the console.

UI Layout:

  • The UI layout is defined within the build() method of MyScreenState. It consists of an app bar with the title “Word Hint Generator” and a body containing a text field for entering words, a button to generate hints, and a space to display the generated image hint.
  • A TextEditingController is used to control the text field, and the SizedBox widget is used for spacing between UI elements.

Summary

In this technical blog post, we delved into the development of an Image Generator App using Flutter, coupled with ByteDance’s SDXL-Lightning machine learning model for image generation. Our journey began with a practical use case: providing word hints in a mobile game through image prompts. Leveraging Flutter’s UI capabilities, we crafted a user-friendly frontend interface enabling users to input words and generate corresponding image hints effortlessly.

On the backend, we harnessed Flask to manage POST requests for prompt generation and image retrieval. By shifting processing to the backend, we bolstered security, control, performance, and scalability. This architecture ensures optimal resource utilization while safeguarding sensitive API keys and enhancing frontend performance.

Throughout our implementation, we highlighted the seamless synergy between frontend and backend components, showcasing the power of cohesive integration in delivering a robust and user-friendly image generation experience. By adopting this approach, developers can create versatile applications with enhanced functionality and performance, setting new benchmarks in user engagement and satisfaction.

Vaishakhi Panchmatia

As Tech Co-Founder at Yugensys, I’m passionate about fostering innovation and propelling technological progress. By harnessing the power of cutting-edge solutions, I lead our team in delivering transformative IT services and Outsourced Product Development. My expertise lies in leveraging technology to empower businesses and ensure their success within the dynamic digital landscape.

If you are looking to augment your software engineering team with a team dedicated to impactful solutions and continuous advancement, feel free to connect with me. Yugensys can be your trusted partner in navigating the ever-evolving technological landscape.
