# Introduction
This is the second article in my beginner project series. If you haven’t seen the first one on Python, it’s worth checking out: 5 Fun Python Projects for Absolute Beginners.
So, what’s generative AI, or Gen AI? It’s all about creating new content, like text, images, code, audio, or even video, using AI. Before the era of large language and vision models, things were quite different. But now, with the rise of foundation models like GPT, LLaMA, and LLaVA, everything has shifted: you can build creative tools and interactive apps without having to train models from scratch.
I’ve picked these 5 projects to cover a bit of everything: text, image, voice, vision, and some backend concepts like fine-tuning and RAG. You’ll get to try out both API-based solutions and local setups, and by the end, you’ll have touched all the building blocks used in most modern Gen AI apps. So, let’s get started.
# 1. Recipe Generator App (Text Generation)
Link: Build a Recipe Generator with React and AI: Code Meets Kitchen
We’ll start with something simple and fun that only needs text generation and an API key, with no heavy setup. This app lets you input a few basic details like ingredients, meal type, cuisine preference, cooking time, and complexity. It then generates a full recipe using GPT. You’ll learn how to create the frontend form, send the data to GPT, and render the AI-generated recipe back to the user. Here is a more advanced version of the same idea: Create an AI Recipe Finder with GPT o1-preview in 1 Hour. It adds more advanced prompt engineering, GPT-4, recipe suggestions, ingredient substitutions, and a more dynamic frontend.
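To make the flow concrete, here is a minimal sketch of the two steps the app performs: turn the form fields into a prompt, then send it to a chat model. The prompt wording and the model name (`gpt-4o-mini`) are illustrative assumptions, not taken from the video, which builds its frontend in React.

```python
def build_recipe_prompt(ingredients, meal_type, cuisine, cooking_time, complexity):
    """Turn the form fields into a single instruction for the model."""
    return (
        f"Write a {cuisine} {meal_type} recipe, {complexity} difficulty, "
        f"ready in about {cooking_time} minutes, using these ingredients: "
        f"{', '.join(ingredients)}. Include a title, an ingredient list "
        "with quantities, and numbered steps."
    )

def generate_recipe(prompt):
    """Send the prompt to an LLM (requires OPENAI_API_KEY in the environment)."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap in whatever you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    prompt = build_recipe_prompt(
        ["chicken", "rice", "peppers"], "dinner", "Mexican", 30, "easy"
    )
    print(generate_recipe(prompt))
```

The useful habit here is separating prompt construction from the API call, so you can iterate on the prompt template without touching the networking code.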
# 2. Image Generator App (Stable Diffusion, Local Setup)
Link: Build a Python AI Image Generator in 15 Minutes (Free & Local)
Yes, you can generate cool images using tools like ChatGPT, DALL·E, or Midjourney by just typing a prompt. But what if you want to take it a step further and run everything locally with no API costs or cloud restrictions? This project does exactly that. In this video, you’ll learn how to set up Stable Diffusion on your own computer. The creator keeps it super simple: you install Python, clone a lightweight web UI repo, download the model checkpoint, and run a local server. That’s it. After that, you can enter text prompts in your browser and generate AI images instantly, all without internet or API calls.
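The video drives everything through a web UI, but the same local workflow can be scripted with Hugging Face’s `diffusers` library. The checkpoint name below is an assumption (any Stable Diffusion 1.5-compatible checkpoint works), and the first run downloads several gigabytes of weights:

```python
import re

def prompt_to_filename(prompt, ext="png"):
    """Derive a safe output filename from the text prompt."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")[:50]
    return f"{slug}.{ext}"

def generate(prompt):
    # Heavy imports live inside the function so the helper above stays light.
    import torch
    from diffusers import StableDiffusionPipeline  # pip install diffusers torch
    use_gpu = torch.cuda.is_available()
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
        torch_dtype=torch.float16 if use_gpu else torch.float32,
    )
    pipe.to("cuda" if use_gpu else "cpu")
    image = pipe(prompt, num_inference_steps=25).images[0]
    path = prompt_to_filename(prompt)
    image.save(path)
    return path

if __name__ == "__main__":
    print(generate("a cozy cabin in a snowy forest, digital art"))
```

Once the weights are cached locally, generation needs no internet connection at all, which is the whole point of this project.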
# 3. Medical Chatbot with Voice + Vision + Text
Link: Build an AI Voice Assistant App using Multimodal LLM Llava and Whisper
This project isn’t specifically built as a medical chatbot, but the use case fits well. You speak to it, it listens, it can look at an image (like an X-ray or a document), and it responds intelligently, combining all three modes: voice, vision, and text. It’s built using LLaVA (a multimodal vision-language model) and Whisper (OpenAI’s speech-to-text model) in a Gradio interface. The video walks through setting it up on Colab, installing libraries, quantizing LLaVA to run on your GPU, and stitching it all together with gTTS for audio replies.
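The three-stage pipeline can be sketched as below: speech to text with Whisper, image plus question to answer with LLaVA, and answer back to speech with gTTS. The `llava-hf` model id and the `USER: <image> ... ASSISTANT:` prompt template follow the Hugging Face LLaVA integration and are assumptions on my part; the video’s exact setup may differ.

```python
def format_llava_prompt(question):
    """LLaVA-style chat template: the <image> token marks where pixels go."""
    return f"USER: <image>\n{question}\nASSISTANT:"

def transcribe(audio_path):
    import whisper                      # pip install openai-whisper
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"]

def answer(image_path, question):
    from PIL import Image
    from transformers import pipeline   # pip install transformers
    vqa = pipeline("image-to-text", model="llava-hf/llava-1.5-7b-hf")
    out = vqa(Image.open(image_path), prompt=format_llava_prompt(question))
    return out[0]["generated_text"]

def speak(text, out_path="reply.mp3"):
    from gtts import gTTS               # pip install gTTS
    gTTS(text).save(out_path)
    return out_path

if __name__ == "__main__":
    question = transcribe("question.wav")
    reply = answer("xray.png", question)
    speak(reply)
```

Each stage is a plain function, so you can test them independently before wiring them into a Gradio interface the way the video does.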
# 4. Fine-Tuning Modern LLMs
Link: Fine tune Gemma 3, Qwen3, Llama 4, Phi 4 and Mistral Small with Unsloth and Transformers
So far, we’ve been using off-the-shelf models with prompt engineering. That works, but if you want more control, fine-tuning is the next step. This video from Trelis Research is one of the best out there. So instead of suggesting a project that simply swaps in a fine-tuned model, I want you to focus on the actual process of fine-tuning a model yourself. The video shows you how to fine-tune models like Gemma 3, Qwen3, Llama 4, Phi 4, and Mistral Small using Unsloth (a library for faster, memory-efficient training) and Transformers. It’s long (about 1.5 hours), but well worth it. You’ll learn when fine-tuning makes sense, how to prep datasets, run quick evals using vLLM, and debug real training issues.
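One small but essential piece of that workflow is dataset prep: turning raw instruction/response pairs into the single text field that supervised fine-tuning trainers expect. The Alpaca-style template below is a common convention, not necessarily the exact one used in the video:

```python
# Assumed record shape: {"instruction": ..., "response": ...}.
ALPACA_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"

def format_example(example):
    """Map one raw record to the 'text' field used for supervised fine-tuning."""
    return {"text": ALPACA_TEMPLATE.format(
        instruction=example["instruction"].strip(),
        response=example["response"].strip(),
    )}

def prep_dataset(records):
    """Format every record, dropping entries with a missing side."""
    return [format_example(r) for r in records
            if r.get("instruction") and r.get("response")]

# Example: two records, one of which is incomplete and gets filtered out.
raw = [
    {"instruction": "Summarize this note.", "response": "Short summary here."},
    {"instruction": "", "response": "orphaned answer"},
]
train_ready = prep_dataset(raw)
```

Getting this step right, with a consistent template across every example, matters more for fine-tuning quality than most hyperparameter choices, which is a point the video spends real time on.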
# 5. Build Local RAG from Scratch
Link: Local Retrieval Augmented Generation (RAG) from Scratch (step by step tutorial)
Everyone loves a good chatbot, but most fall apart when asked about stuff outside their training data. That’s where RAG is useful. You give your LLM a vector database of relevant documents, and it pulls context before answering. The video walks you through building a fully local RAG system using a Colab notebook or your own machine. You’ll load documents (like a textbook PDF), split them into chunks, generate embeddings with a sentence-transformer model, store them in SQLite-VSS, and connect it all to a local LLM (e.g. Llama 2 via Ollama). It’s the clearest RAG tutorial I’ve seen for beginners, and once you’ve done this, you’ll understand how ChatGPT plugins, AI search tools, and internal company chatbots really work.
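Here is the core RAG loop in miniature: chunk documents, embed them, retrieve the chunk closest to the query, and prepend it to the prompt. The bag-of-words "embedding" below is a toy stand-in so the whole thing runs with no dependencies; in the actual project you’d swap it for a sentence-transformer model and store the vectors in SQLite-VSS, but the retrieval math is the same idea.

```python
import math
from collections import Counter

def chunk(text, size=40):
    """Split text into chunks of roughly `size` words each."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    """Stuff the retrieved context into the prompt sent to the local LLM."""
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Feed `build_prompt(...)` to a local model (e.g. Llama 2 via Ollama) and you have the smallest possible RAG system; everything the tutorial adds on top is a better version of one of these five functions.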
# Wrapping Up
Each of these projects teaches you something essential:
Text → Image → Voice → Fine-tuning → Retrieval
If you’re just getting into Gen AI and want to actually build stuff, not just play with demos, this is your blueprint. Start from the one that excites you most. And remember, it’s okay to break things. That’s how you learn.
Kanwal Mehreen
Kanwal is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.