RAG model for a QA bot
I made this following a blog post replit.com/@nhegday/RAG-From-Scratch . Here is what I learned: RAG stands for Retrieval Augmented Generation. It is a technique that enhances and personalizes an LLM's responses by retrieving relevant context from your own data and adding it to the prompt, without fine-tuning the model itself. I used OpenAI and Pinecone. Pinecone is a vector database.
So here's the step-by-step process:
Read the input data that will serve as the bot's knowledge base.
Split the data into smaller chunks.
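A minimal chunking sketch (the chunk size and overlap here are illustrative defaults, not the values from the original notebook):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    # Fixed-size character chunks with a small overlap, so text cut at a
    # boundary still appears whole in at least one chunk.
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# usage: chunks = chunk_text(document_text)
```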
Embed the chunks: generate a vector for each chunk. I did that using OpenAI's text-embedding-ada-002 model.
Store the embeddings in Pinecone by creating an index and upserting the vectors.
Create a mapping from unique IDs to vectors and their source chunks, so each retrieved vector can be traced back to its text.
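The embed, upsert, and mapping steps above can be sketched roughly as follows. This assumes the SDK generations the post's function names imply (openai pre-1.0 and pinecone-client 2.x, with OPENAI_API_KEY set in the environment); the index name and Pinecone environment string are placeholders, not the notebook's values:

```python
def make_records(chunks, vectors):
    # The "mapping" step: pair each vector with a unique id and keep the chunk
    # text as metadata, so query hits can be traced back to the source text.
    return [
        (f"chunk-{i}", vec, {"text": chunk})
        for i, (chunk, vec) in enumerate(zip(chunks, vectors))
    ]

def embed(texts):
    import openai  # imported lazily so make_records works without the SDK installed
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return [d["embedding"] for d in resp["data"]]

def index_chunks(chunks, index_name="qa-bot"):
    # "qa-bot" and the environment string are assumptions for illustration
    import pinecone
    pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-east1-gcp")
    if index_name not in pinecone.list_indexes():
        # text-embedding-ada-002 vectors are 1536-dimensional
        pinecone.create_index(index_name, dimension=1536, metric="cosine")
    index = pinecone.Index(index_name)
    index.upsert(vectors=make_records(chunks, embed(chunks)))
    return index
```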
At question time, embed the user's question and retrieve the most similar chunks with a Pinecone query.
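A sketch of the retrieval step, under the same SDK assumptions as above. The query response is read in its dict form (`{"matches": [{"id", "score", "metadata"}, ...]}`); the helper that unpacks it is pure, so it can be exercised without live API keys:

```python
def top_chunks(query_response, k=3):
    # Pull the stored chunk text back out of a Pinecone query response,
    # highest similarity score first.
    matches = sorted(query_response["matches"], key=lambda m: m["score"], reverse=True)
    return [m["metadata"]["text"] for m in matches[:k]]

def retrieve(index, question, k=3):
    import openai  # lazy import: keeps top_chunks testable without the SDK
    qvec = openai.Embedding.create(
        model="text-embedding-ada-002", input=[question]
    )["data"][0]["embedding"]
    resp = index.query(vector=qvec, top_k=k, include_metadata=True)
    return top_chunks(resp, k)
```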
Construct the prompt from the instructions, the retrieved chunks, and the user's question.
Get the response via the ChatCompletion.create function with the gpt-3.5-turbo model.
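The last two steps, prompt construction and generation, might look like this. The wording of the instructions is illustrative, not taken from the notebook, and the chat call again assumes the pre-1.0 OpenAI SDK that exposes `ChatCompletion.create`:

```python
def build_prompt(question, context_chunks):
    # Assemble the augmented prompt: instructions + retrieved context + question.
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def answer(question, context_chunks):
    import openai  # lazy import so build_prompt is usable without the SDK
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": build_prompt(question, context_chunks)}],
    )
    return resp["choices"][0]["message"]["content"]
```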
Here's the Colab notebook: colab.research.google.com/drive/1cZbfnz731N..