Open-Source Embeddings with LangChain

Open-source technology gives us access to tools that large organizations build, and lets us develop our own features on top of them. The same holds for the open-source LLMs that companies like Facebook, Mistral AI, and Hugging Face release for us to use, fine-tune, and deploy on our own premises.
In this article, we will use open-source Hugging Face embeddings, downloaded locally, to build a Q&A system that lets us chat with our documents.
We will also use the Llama 2 model, which Facebook has open-sourced. However, since the full model is too large for us to deploy ourselves, we will call it through the model API that Replicate provides.
We use the Pinecone vector DB to store the embeddings we create from the text in our documents. This DB is then searched to retrieve relevant answers. We will also show how you can add more documents to your Pinecone vector DB (i.e., update it).
Lastly, you can either develop this system in Python from scratch or use libraries like LangChain and LlamaIndex, which provide connectors that let you implement the Q&A system in very few lines of code. Here we use LangChain.
Basic Concepts of Retrieval-Augmented Generation
Okay, so let's begin with the basics.
To do question answering over any document, PDF, or other textual data source, the data is first split into chunks of a few sentences ("paragraphs"), and each chunk is converted into an embedding vector with the help of an embedding model.
For those wondering why we do this: if you remember cosine similarity from your school or college days, the idea is to take each paragraph's vector and check how similar it is to the question's vector, using cosine similarity. You can read more about this concept if it is not clear.
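As a quick refresher, here is a minimal sketch of cosine similarity using NumPy (the two vectors below are made-up toy values, not real embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||), ranging from -1 to 1;
    # values closer to 1 mean the vectors point in similar directions.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

question_vec = np.array([0.1, 0.9, 0.3])   # toy "question" embedding
paragraph_vec = np.array([0.2, 0.8, 0.4])  # toy "paragraph" embedding
print(cosine_similarity(question_vec, paragraph_vec))  # ~0.98 -> very similar
```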
So after finding the most relevant paragraphs (text chunks) in the documents, we use a technique called RAG (Retrieval-Augmented Generation).
You know how, when using ChatGPT, you might paste in an article and then ask questions about it? This methodology does the same thing: we take the most relevant paragraphs, put them into the LLM's context, and ask it to answer our question.
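Conceptually, the final prompt looks something like this (a hand-rolled sketch; LangChain will build this for us later, and the chunk text here is purely illustrative):

```python
# Pretend these chunks came back from a vector search over our documents.
retrieved_chunks = [
    "Chunk 1: ...most relevant paragraph...",
    "Chunk 2: ...second most relevant paragraph...",
]
question = "What does the document say about X?"

context = "\n\n".join(retrieved_chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
# This prompt is then sent to the LLM, which answers from the supplied context.
```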
Implementing the Q&A System
Now that we have understood the basics, let's get on with the coding and discuss more as we go. Let's begin by installing a few libraries.
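In a Colab cell, something like the following should pull in everything we need (package names as of the classic LangChain and Pinecone client releases this article was written against):

```python
!pip install langchain replicate pinecone-client sentence-transformers pypdf
```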
Next, create a Replicate account and get your Replicate API key.
Let's import our Replicate API key into Colab and try running the Llama 2 model.
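A minimal sketch, assuming you are in Colab and have copied your token from the Replicate dashboard (the model slug below is illustrative; copy the exact slug, and version hash if your client needs one, from the model's page on Replicate):

```python
import os
from getpass import getpass

# Paste your Replicate API token when prompted (keeps it out of the notebook).
os.environ["REPLICATE_API_TOKEN"] = getpass("Replicate API token: ")

import replicate

# Llama 2 chat models on Replicate stream tokens, so we join the generator.
output = replicate.run(
    "meta/llama-2-70b-chat",  # illustrative slug; check Replicate for the current one
    input={"prompt": "Explain retrieval augmented generation in one sentence."},
)
print("".join(output))
```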
Looks good! Hopefully you got an interesting output from the Llama 2 model.
Now, let's move toward creating our document embeddings. The first step is splitting the document into chunks using LangChain.
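A sketch using LangChain's PDF loader and recursive splitter ("my_document.pdf" is a placeholder for your own file, and the chunk sizes are common defaults rather than anything mandatory):

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF; each page becomes a LangChain Document.
loader = PyPDFLoader("my_document.pdf")  # placeholder path
pages = loader.load()

# Split into overlapping ~1000-character chunks so each one stays
# small enough to embed and retrieve on its own.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.split_documents(pages)
print(f"Split into {len(docs)} chunks")
```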
Also, go to Pinecone and create an account. You will need your Pinecone API key, the environment name, and the index name (the name of the database you create).
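Initialization with the classic pinecone-client looks roughly like this (the environment and index names below are placeholders; use the values from your Pinecone console, and make sure the index dimension matches your embedding model):

```python
import os
import pinecone  # the pinecone-client package

pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],  # set from your Pinecone console
    environment="gcp-starter",               # placeholder; your environment name
)

index_name = "qa-demo"  # placeholder; the index you created in Pinecone
```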
Next, let's define our Hugging Face embedding model, then create the embeddings and save them to the Pinecone DB.
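A sketch assuming `docs` and `index_name` come from the previous steps; all-MiniLM-L6-v2 is just a common lightweight choice (it downloads locally on first use and produces 384-dimensional vectors):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone

# Downloads the sentence-transformers model locally the first time it runs.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Embed every chunk and upsert the vectors into the Pinecone index.
vectordb = Pinecone.from_documents(docs, embeddings, index_name=index_name)
```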
Now let us initialize our LLM from Replicate. LangChain provides lots of options for the embedding model, the LLM, and the vector DB; I will make another tutorial covering the Azure OpenAI embedding and LLM models. Let's initialize our LLM now.
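A minimal sketch using LangChain's Replicate wrapper (again, the model slug is illustrative, and depending on your LangChain version the keyword may be `input=` instead of `model_kwargs=`):

```python
from langchain.llms import Replicate

llm = Replicate(
    model="meta/llama-2-70b-chat",  # illustrative; copy the slug from Replicate
    model_kwargs={"temperature": 0.75, "max_length": 500, "top_p": 1},
)
```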
Looking good! Now let's define our conversational retrieval chain.
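We wire the LLM and the Pinecone retriever together with LangChain's ConversationalRetrievalChain (`k` controls how many chunks get stuffed into the context; 2 is just a starting point):

```python
from langchain.chains import ConversationalRetrievalChain

qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 2}),  # top-2 chunks
    return_source_documents=True,  # also return the chunks used for the answer
)
```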
It is time for us to begin chatting :)
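A simple loop that keeps track of the conversation history, which the chain uses to rephrase follow-up questions:

```python
chat_history = []
while True:
    query = input("Ask a question (or type 'exit'): ")
    if query.lower() == "exit":
        break
    result = qa_chain({"question": query, "chat_history": chat_history})
    print(result["answer"])
    chat_history.append((query, result["answer"]))
```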
Upserting New Text to Vector DB
If you wish to upload another PDF and generate its embeddings, here is how you can update your Pinecone vector DB.
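A sketch that reuses the `embeddings` and `index_name` from earlier ("new_document.pdf" is a placeholder): load and split the new PDF exactly as before, reconnect to the existing index, and append the new vectors.

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone

# Load and chunk the new PDF the same way as the original document.
new_pages = PyPDFLoader("new_document.pdf").load()  # placeholder path
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
new_docs = splitter.split_documents(new_pages)

# Reconnect to the existing index and upsert the new embeddings into it.
vectordb = Pinecone.from_existing_index(index_name=index_name, embedding=embeddings)
vectordb.add_documents(new_docs)
```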
Hope this helps you test out any language model of your choice from Replicate. If you have any more doubts, feel free to reach out to us. We will be happy to help you :)