Yeah... so... I'm not sure if this was supposed to happen, but it sure did. So let's talk about it and what I'm going to do about it. Earlier this month I decided to do a monthly coding project to get myself to build more: spend less time reading and procrastinating, spend more time coding. The first project? A local RAG "from scratch" project (read more about it here on my website).
Right now, I'm stuck in the messy middle. It's where excitement fizzles, side-quests appear, and the actual coding? Well, it's kind of stalled. But let's break this down: What have I done so far? What will I tackle next? And how do I keep moving forward with the time left in this project?
What I've Done So Far
In my last newsletter, I shared a few early wins:
- Created a script to process local markdown files, extract content, generate embeddings using OpenAI's text-embedding-3-large, and store results in a Pandas DataFrame.
- Set up a local Postgres database with pgvector in Docker for storing and querying embeddings (not connected to Python yet).
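A minimal sketch of what that first script could look like, not the actual code from this project. The function names, the record layout and the toy embedder are all assumptions made up for illustration. The OpenAI call is kept in its own function and imported lazily, so the rest runs offline with the stand-in embedder.

```python
from pathlib import Path
from typing import Callable, Dict, List

# An embedder takes a list of texts and returns one vector per text.
Embedder = Callable[[List[str]], List[List[float]]]


def load_markdown(folder: str) -> List[Dict[str, str]]:
    """Read every .md file under `folder` into {"path", "text"} records."""
    records = []
    for path in sorted(Path(folder).rglob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty notes
            records.append({"path": str(path), "text": text})
    return records


def embed_records(records: List[Dict[str, str]], embed: Embedder) -> List[Dict]:
    """Attach one embedding to each record."""
    vectors = embed([record["text"] for record in records])
    return [{**record, "embedding": vector} for record, vector in zip(records, vectors)]


def openai_embed(texts: List[str]) -> List[List[float]]:
    """Assumed shape of the real call (openai>=1.0, needs OPENAI_API_KEY set)."""
    from openai import OpenAI  # lazy import: only needed for the real thing

    response = OpenAI().embeddings.create(model="text-embedding-3-large", input=texts)
    return [item.embedding for item in response.data]


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Offline stand-in: 2-dimensional toy vectors, only good for testing the plumbing."""
    return [[float(len(text)), float(text.count(" "))] for text in texts]
```

Something like rows = embed_records(load_markdown("notes"), fake_embed) gives a list of dicts that pandas.DataFrame(rows) should turn into one row per file. The folder name "notes" is a placeholder, and swapping fake_embed for openai_embed is what would produce real embeddings.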
That's honestly it. Since then? Progress has slowed to a crawl because I fell down the rabbit hole of embedding models. I wanted to replace OpenAI's embeddings with something else, but wow - that topic is much too complex for a short sprint. I now have 1,500 words of notes on multilingual benchmarks and metrics. Useful? Yes. Immediately actionable? Not so much.
Research Wins (a.k.a., My Procrastination Disguised as Progress)
While I haven't written much code, I've dived deep into research:
- Embedding Models & Multilingual Data: I explored benchmarks for multilingual embeddings since I'm working with both German and English text data. This is especially relevant for my day job, where we're building a vector store for German text. (Apparently bge-m3 and stella-en-1.5B are popular options?)
- Blog Progress: I wrote and published a blog post about setting up a Postgres + pgvector database locally using Docker. It's not directly tied to this project, but it's a tutorial I know I'll need in two months when I've forgotten how I did it. These setup tutorials are lifesavers - without them, my only answer to "How did you install this?" would be "No idea, followed 5 tutorials, mixed results, at some point it worked." 🤷🏻‍♀️
What's Next: The 5-Day, 30-Minute Plan
I'm committing to 30 minutes of coding per day for the next five days. It's time to turn research into action. Here's the plan (because without a to-do list, I won't get anything done):
- Connect Python to Postgres: Choose a library (Psycopg2? SQLAlchemy?) and set up a basic connection.
- Run a Query: Start small - maybe just list tables in the database to confirm the connection.
- Insert Data: Use Python to insert anything into the database. I have an example insert in my DBeaver already that I can use.
- Work with Notebooks: Figure out how to run interactive cells in .py files in VS Code to get away from ipynb files because AI extensions & GitHub hate them. (Do they just appear when you add the right symbols? We'll see!)
- Move to .py Files: Transition the notebook code into said standalone Python scripts.
- Insert Embeddings: Add the embeddings I've generated from OpenAI to the database.
- Query & Compare: Write a function to compute similarity, either in the database with pgvector's distance operators or in Python with cosine similarity, then test it with a query.
- Integrate Everything: Combine the embedding and similarity search functions into a pipeline that takes a query string, computes its embedding, and finds the closest match in the database.
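A rough sketch of how the last few steps of the plan could fit together, under some loud assumptions: the toy embedder, the row layout and the table and column names (documents, content, embedding) are all invented for illustration. The # %% comments are the markers VS Code's Python and Jupyter extensions treat as cell boundaries in a plain .py file, so the same file should work both as a script and as a set of interactive cells.

```python
# %% Cell 1: cosine similarity in plain Python (no database needed)
import math
from typing import Callable, Dict, List, Sequence, Tuple

Embedder = Callable[[List[str]], List[List[float]]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# %% Cell 2: query string -> embedding -> closest stored texts
def search(query: str, rows: List[Dict], embed: Embedder, top_k: int = 1) -> List[Tuple[float, str]]:
    """Return the top_k (similarity, text) pairs, best match first."""
    query_vector = embed([query])[0]
    scored = [(cosine_similarity(query_vector, row["embedding"]), row["text"]) for row in rows]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# %% Cell 3: the same search pushed into Postgres with pgvector
def to_pgvector(vector: Sequence[float]) -> str:
    """Format a Python list as pgvector's text literal, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


# <=> is pgvector's cosine distance operator, so 1 - distance is the similarity.
# Table and column names are hypothetical.
PGVECTOR_QUERY = """
SELECT content, 1 - (embedding <=> %(q)s::vector) AS similarity
FROM documents
ORDER BY embedding <=> %(q)s::vector
LIMIT %(k)s
"""

# %% Cell 4: toy run with a keyword-count "embedding" (assumption, not a real model)
VOCAB = ["postgres", "docker", "embedding"]


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Count how often each vocabulary word appears in each text."""
    return [[float(text.lower().split().count(word)) for word in VOCAB] for text in texts]


if __name__ == "__main__":
    texts = ["postgres in docker", "embedding models"]
    rows = [{"text": t, "embedding": v} for t, v in zip(texts, toy_embed(texts))]
    print(search("how to run postgres", rows, toy_embed))
```

With a psycopg2 cursor, something like cur.execute(PGVECTOR_QUERY, {"q": to_pgvector(query_vector), "k": 5}) should run the database version, assuming the table exists and the stored vectors have the same dimension as the query vector. The in-memory search is handy for checking that the database returns the same ranking.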
Building in Public: Humbling and (Occasionally) Embarrassing
Building in public isn't glamorous. Right now, I'm not even sure I can call myself a "build in public" person because, well, I'm not consistently building anything. But that's what this project is supposed to fix - it's an attempt to build a coding habit. I wanted to be smart and strategic, but I overcomplicated things. Now, I'm scaling back to the basics: show up, code a little, and share the journey, messy bits included.
Looking Ahead: February and Beyond
When I started this project, I gave myself permission to continue into the next month if needed. And honestly? I'm not ready to abandon it. There's so much more to explore with this local RAG project. Plus, I'm not sick of it yet, which is a miracle considering my novelty-craving brain. Maybe my perfectionist side is kicking in - I don't want to leave this project in an embarrassing state, especially since I'm building it publicly.
So here we are. If you've ever gotten stuck in a side-project, let me know how you got unstuck. And if you have tips for making time/finding energy to code after dinner, please let me know - today I napped instead... 🥴
Until next time, keep building (or procrastinating productively).