Open Source Embeddings with Langchain

Open-source tech gives us access to the technology that big organizations build, and lets us build our own features on top of it.

The same is the case with the open-source LLMs that companies like Facebook, Mistral AI, Hugging Face, etc. release for us to use, fine-tune, and deploy on our own premises.

In this article, we will use open-source Hugging Face embeddings, downloaded locally, to build a Q&A system. We will use this system to chat with our documents.

We also use the Llama 2 model, which Facebook has open-sourced. However, since the model is far too large for us to deploy ourselves, we will use the model API that Replicate provides.

We use the Pinecone vector DB to store the embeddings we create from the text in the documents.
This DB is then searched to retrieve relevant answers. We will also show how you can add more documents to your Pinecone vector DB (updating the DB).

Lastly, you can either develop this system in Python from scratch or use libraries like Langchain and LlamaIndex, which give us connectors to implement our Q&A system in very few lines.

Here we are using Langchain to accomplish our goal.


Basic Concepts of Retrieval Augmented Generation



Okay, so let's begin with the basics.
To do question answering on any document, PDF, or other textual data source, the data is split into small chunks (paragraphs), and each chunk is converted into an embedding vector with the help of an embedding model.

For those wondering why we do this: if you remember cosine similarity from your school or college days, we take each paragraph vector and measure how similar it is to the question vector using cosine similarity. You can read more about this concept if it is not clear.
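
To make this concrete, here is a minimal sketch of cosine similarity using numpy. The toy 3-dimensional vectors are just for illustration; real embedding vectors have hundreds of dimensions.

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    # ~1 means the vectors point the same way, ~0 means unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors standing in for a question embedding and a paragraph embedding
question_vec = np.array([0.2, 0.9, 0.1])
paragraph_vec = np.array([0.25, 0.85, 0.05])
print(cosine_similarity(question_vec, paragraph_vec))  # ~0.997, i.e. very similar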

So after finding the most relevant paragraphs (text chunks) from the documents, we use a technique called RAG (Retrieval Augmented Generation).
If you have ever pasted an article into ChatGPT and then asked it questions, this methodology does the same thing: we take the most relevant paragraphs, put them in the context of the LLM, and ask it to answer our questions.
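
Conceptually, the retrieval step just assembles a prompt like this (a simplified sketch; the chunk texts and the question below are placeholder strings):

retrieved_chunks = [
    "Most relevant paragraph found in the document...",
    "Second most relevant paragraph...",
]
question = "What does the document say about inventories?"

# Stuff the retrieved chunks into the LLM's context and ask the question
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n\n".join(retrieved_chunks) + "\n\n"
    "Question: " + question
)
# `prompt` is what actually gets sent to the LLM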


Implementing the Q&A System



Now that we have understood the basics, let's get on with the coding and then discuss more. Let's begin by installing a few libraries:

!pip install pinecone-client langchain pypdf huggingface_hub sentence-transformers replicate

Let's begin by creating our Replicate account and getting our Replicate API key.

Next, let's set our Replicate API key in Colab and try running the Llama 2 model:

import os
os.environ["REPLICATE_API_TOKEN"] = "Your_api_key"

import replicate

# Run the 70B Llama 2 chat model hosted on Replicate
output = replicate.run(
    "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
    input={
        "prompt": "Can you explain what a transformer is (in a machine learning context)?",
        "system_prompt": "You are a pirate",
    },
)

# The model streams back tokens; join them into a single string
''.join(output)

Looks good! Hopefully, you got an interesting output from the Llama 2 model.
Now let's go forward with creating our document embeddings after splitting the document into chunks using Langchain.
Also, go to Pinecone and create your account there. You will need your Pinecone API key, environment name, and index name (the name of the database you create).
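
Note: if you haven't created your index yet, you can do it from the Pinecone dashboard, or programmatically, roughly like this (a sketch assuming the default HuggingFaceEmbeddings model, which produces 768-dimensional vectors; adjust the dimension if you pick a different model):

import pinecone

pinecone.init(api_key='your-pinecone-api-key', environment='your-pinecone-environment')

# Create the index if it doesn't exist yet; the dimension must match the embedding model
if "your-pinecone-index-name" not in pinecone.list_indexes():
    pinecone.create_index("your-pinecone-index-name", dimension=768, metric="cosine")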

import os
import sys
import pinecone

from langchain.llms import Replicate
from langchain.vectorstores import Pinecone
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain

# Initialize Pinecone 
pinecone.init(api_key='your-pinecone-api-key', environment='your-pinecone-environment') 

# Load and preprocess the PDF document 
loader = PyPDFLoader('/content/INDAS7.pdf') 
documents = loader.load() 

# Split the documents into smaller chunks for processing 
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) 
texts = text_splitter.split_documents(documents) 

# Use HuggingFace embeddings for transforming text into numerical vectors 
embeddings = HuggingFaceEmbeddings()
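
Before moving on, you can sanity-check the embedding model by embedding a sample query (the default model is sentence-transformers/all-mpnet-base-v2, which returns 768-dimensional vectors; the question below is just an example):

# Embed a sample sentence and inspect the resulting vector
sample_vector = embeddings.embed_query("What is depreciation?")
print(len(sample_vector))  # 768 for the default model
print(sample_vector[:5])   # first few values of the embedding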

Now that we have defined our Hugging Face embedding model, let's create the embeddings and save them to our Pinecone DB.

# Set up the Pinecone vector database 
index_name = "your-pinecone-index-name" 
index = pinecone.Index(index_name) 

vectordb = Pinecone.from_documents(texts, embeddings, index_name=index_name)
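
Optionally, you can verify that the vector store works with a quick similarity search (the query string here is just an example):

# Retrieve the two chunks most similar to a sample query
docs = vectordb.similarity_search("What does this standard cover?", k=2)
for doc in docs:
    print(doc.page_content[:200])  # preview the retrieved chunks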

Now let's initialize our LLM from Replicate. Langchain provides lots of options for choosing your embedding model, LLM, and vector DB. I will make another tutorial covering the Azure OpenAI embedding and LLM models. Let's initialize our LLM now.

# Initialize the Replicate Llama 2 model
llm = Replicate(
    model="a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5",
    input={"temperature": 0.75, "max_length": 3000},
)
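
You can give the LLM a quick standalone test before wiring in retrieval (the prompt below is just an example):

# Call the LLM directly, without any retrieved context
print(llm("Summarize what a cash flow statement is in one sentence."))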

Looking good! Now let's define our conversational retrieval chain.

# Set up the Conversational Retrieval Chain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    vectordb.as_retriever(search_kwargs={'k': 2}),
    return_source_documents=True,
)

It is time for us to begin chatting :)

# Start chatting with the chatbot
chat_history = []

while True:
    query = input('Prompt: ')
    if query.lower() in ["exit", "quit", "q"]:
        print('Exiting')
        sys.exit()
    result = qa_chain({'question': query, 'chat_history': chat_history})
    print('Answer: ' + result['answer'] + '\n')
    chat_history.append((query, result['answer']))
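
Since we set return_source_documents=True, each result also carries the chunks the answer was based on. If you want to see which parts of the PDF the answer came from, you could add something like this inside the loop:

# Show the retrieved chunks that grounded the answer
for doc in result['source_documents']:
    print(doc.metadata)            # e.g. source file and page number
    print(doc.page_content[:200])  # preview of the supporting chunk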


Upserting New Text to Vector DB


If you wish to upload another PDF and generate its embeddings, here is how you can update your Pinecone vector DB.

# Load and preprocess the new PDF document 
loader = PyPDFLoader('/content/INDAS2.pdf') 
documents = loader.load() 

# Split the documents into smaller chunks for processing 
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0) 
texts = text_splitter.split_documents(documents) 

# Use HuggingFace embeddings for transforming text into numerical vectors 
embeddings = HuggingFaceEmbeddings() 

# Connect to the existing index ("audit-gpt" is the index name used here)
index = pinecone.Index("audit-gpt") 

# text_key must match the metadata key used when the index was first
# populated ("text" is Langchain's default)
vectorstore = Pinecone(index, embeddings.embed_query, "text") 

# Embed and upsert the new chunks into the index
page_content_list = [doc.page_content for doc in texts] 
vectorstore.add_texts(page_content_list)
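
To confirm the new vectors actually landed, you can check the index stats (this assumes the same "audit-gpt" index handle from the block above):

# The total vector count should have grown after the upsert
print(index.describe_index_stats())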

Hope this helps you test out any language model of your choice from Replicate. If you have any more doubts, feel free to reach out to us. We will be happy to help you :)