Make Sense of a 10K+ Line GitHub Repo Without Reading the Code




Image by Author | Canva

 

Navigating and understanding large codebases can be challenging, especially for new developers joining a project or anyone revisiting an older repository. The traditional approach, reading through numerous files and documentation, is time-consuming and error-prone. GitDiagram offers an alternative: it converts a GitHub repository into an interactive diagram that gives a visual representation of the codebase’s architecture. This helps in understanding complex systems and can improve collaboration within development teams. In this article, I will walk you through the step-by-step process of running GitDiagram locally. Let’s get started.

 

Step-by-Step Guide to Using GitDiagram Locally

 

Step 1: Clone the GitDiagram repository

git clone https://github.com/ahmedkhaleel2004/gitdiagram.git
cd gitdiagram

 

Step 2: Install Dependencies

Run pnpm install from the repository root. This fetches and installs the dependencies into node_modules.

Before running pnpm install, make sure you have Node.js and pnpm installed globally.

  • To install Node.js, download it from nodejs.org
  • To install pnpm, run npm install -g pnpm
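Since the remaining steps rely on several command-line tools, it can save time to confirm they are on your PATH first. The following is a minimal sketch, not part of GitDiagram itself; the tool list is an assumption based on the commands used in this walkthrough, and the function name is made up for illustration.

```python
import shutil

# Tools invoked in this walkthrough (an assumption; adjust for your setup).
REQUIRED_TOOLS = ["git", "node", "pnpm", "docker-compose"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is enough.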

 

Step 3: Set Up Environment Variables

 
Copy the example environment file to .env (the project README uses cp .env.example .env), then edit .env to include your OpenAI, Anthropic, or OpenRouter API key and, optionally, your GitHub personal access token.
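Before starting the backend, it can help to confirm that at least one provider key is actually set. The sketch below parses .env-style text and reports which keys have a value. The variable names in PROVIDER_KEYS are hypothetical placeholders, so check the project's own .env for the real names.

```python
def parse_env(text):
    """Parse KEY=VALUE lines from .env-style text, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical variable names, used here for illustration only.
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "OPENROUTER_API_KEY"]


def configured_providers(env, provider_keys=PROVIDER_KEYS):
    """Return the provider key names that have a non-empty value in `env`."""
    return [key for key in provider_keys if env.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as handle:
        found = configured_providers(parse_env(handle.read()))
    print("Configured providers:", found or "none")
```

This is a deliberately simple parser: it does not handle multi-line values or inline comments, which is usually fine for a short file of API keys.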

 

Step 4: Start Backend Services

docker-compose up --build -d

 
The FastAPI server will be available at localhost:8000. A request to that address should return the following response:

{"message":"Hello from GitDiagram API!"}
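To check the backend from a script rather than a browser, a small standard-library probe is enough. This is a sketch that assumes the server listens on http://localhost:8000 and returns exactly the greeting shown above; the function names are made up for illustration.

```python
import json
import urllib.request

# The greeting shown above, as a parsed JSON object.
EXPECTED = {"message": "Hello from GitDiagram API!"}


def is_expected_greeting(body):
    """Return True if `body` is JSON text equal to the expected greeting."""
    try:
        return json.loads(body) == EXPECTED
    except json.JSONDecodeError:
        return False


def backend_is_up(url="http://localhost:8000", timeout=3.0):
    """Fetch `url` and compare the body to the greeting; False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_expected_greeting(response.read().decode("utf-8"))
    except OSError:  # covers URLError, HTTPError, timeouts, refused connections
        return False


if __name__ == "__main__":
    print("Backend reachable:", backend_is_up())
```

If this prints False right after docker-compose up, the container may still be starting, so retrying after a few seconds is reasonable.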

 

Step 5: Initialize the Database

Run the following commands to set up the database:

chmod +x start-database.sh
./start-database.sh
pnpm db:push

 
When prompted to generate a random password, input yes. The Postgres database will start in a container at localhost:5432.
Note: When I first ran pnpm db:push, I got this error:

sh: drizzle-kit: command not found
 ELIFECYCLE  Command failed.
 WARN   Local package.json exists, but node_modules missing, did you mean to install?

 
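It turned out that drizzle-kit was not installed: as the warning says, node_modules was missing. If you see this, run pnpm install (Step 2) so that drizzle-kit is available, then retry.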

 
After that, pnpm db:push worked fine and gave me this output:

No config path provided, using default 'drizzle.config.ts'
Reading config file '/Users/kanwal/Desktop/gitdiagram/drizzle.config.ts'
Using 'postgres' driver for database querying
[✓] Pulling schema from database...
[✓] Changes applied
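If pnpm db:push fails for a different reason, one thing worth ruling out is that the Postgres container is not accepting connections on port 5432. The sketch below only tests whether a TCP connection succeeds; it does not authenticate or run a query, and the function name is made up for illustration.

```python
import socket


def port_is_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


if __name__ == "__main__":
    print("Postgres port reachable:", port_is_open())
```

The same function can be pointed at port 8000 or 3000 to check the backend and frontend.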

 

Step 6: Run the Frontend

 
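Start the frontend development server with pnpm dev (the command given in the project README). You can then access the website at localhost:3000. The rate limits are defined in the decorator of the generate function in backend/app/routers/generate.py, and you can edit them there. Let’s try visualizing the GitHub repo of the FastAPI library.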

Frontend Interface: [screenshot not reproduced]

Output: [generated diagram not reproduced]

 

Concluding Thoughts

 
This is a great idea and a really useful repository. I’ve personally felt the need for something like this in my own projects, so I appreciate the effort and vision behind it.

That said, in the interest of an unbiased opinion, there is definitely room for improvement.

One recurring issue I ran into was:

Syntax error in text mermaid version 11.4.

 
According to the project owner, ahmedkhaleel2004, this error usually means the LLM generated invalid Mermaid.js syntax.

He has said that he tried addressing the issue in numerous ways but found no reliable fix, since it is mostly a limitation of the LLM. A way to validate the Mermaid.js code would help, but he was not sure how to do that.

He also noted that the current prompt (in `prompts.py`, specifically the third one that generates Mermaid code) already tries to enforce correct syntax, but it is not foolproof, and new syntax issues still occur.
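The validation idea mentioned above can be approximated with a cheap pre-check that runs before the diagram is rendered. The sketch below is a heuristic, not a Mermaid parser and not something GitDiagram does: it assumes the generated diagram is a flowchart, and it only checks for a header line and for balanced brackets and quotes on each line. It can flag valid but unusual node shapes, so treat a reported problem as a reason to regenerate rather than as proof of an error.

```python
import re

# Assumption: generated diagrams are flowcharts with an explicit direction.
HEADER = re.compile(r"^(flowchart|graph)\s+(TB|TD|BT|LR|RL)\b")
PAIRS = {")": "(", "]": "[", "}": "{"}


def scan_line(line):
    """Return a description of the first bracket or quote problem, or None."""
    stack = []
    in_quotes = False
    for char in line:
        if char == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # brackets inside quoted labels are plain text
        elif char in "([{":
            stack.append(char)
        elif char in PAIRS:
            if not stack or stack.pop() != PAIRS[char]:
                return f"unmatched '{char}'"
    if in_quotes:
        return "unclosed quote"
    if stack:
        return f"unclosed '{stack[-1]}'"
    return None


def precheck_mermaid(source):
    """Return a list of problems found by a shallow scan of Mermaid text."""
    problems = []
    lines = source.strip().splitlines()
    if not lines or not HEADER.match(lines[0].strip()):
        problems.append("missing flowchart/graph header")
    # Line numbers are relative to the text after leading blank lines are stripped.
    for number, line in enumerate(lines, start=1):
        problem = scan_line(line)
        if problem:
            problems.append(f"line {number}: {problem}")
    return problems
```

An empty list means the text passed this shallow scan; it may still fail in the real Mermaid renderer, which remains the authoritative check.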

A Solution I Found Online That Worked
While digging through the GitHub Issues, I came across a workaround shared by another user that actually worked for me:

 

Add this to the customize diagram prompt: "Ignore the syntax issue from Mermaid version 11.4.1 and regenerate the remainder of the diagram."

 

Using that line helped bypass the error. Some components might still be missing, but it at least produced a partial diagram, which was enough to give a high-level understanding of the codebase.
 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.