Results for ""
Now replace the kitchen with the coding environment, swapping every ingredient for elements like Langchain, an LLM, chains, and so on. Just like cooking, making a chatbot is a nonlinear process. Each line of code, each algorithm, and each chain we use will shape our final product, which is, of course, our RETRIEVAL CHATBOT.
In this blog, we will delve deeply into this process, discussing what problems may arise and how we will tackle them. Here, we 'cook' not a dish, but a CHATBOT. A chatbot that can answer questions from a PDF and is also capable of responding to casual chit-chat queries. Sounds exciting, right? Let's dive in!
Building the Chatbot -
Building the chatbot is a complex process with many steps to go through. So we will treat it as a journey, where every step motivates us to create the best chatbot we can. On this adventurous journey, two friends come along to help us: the Langchain framework and Anthropic's LLM.
The first step of our journey is loading the PDF with PyPDFLoader as the document loader. Since it is a large document, we will break it into chunks of size 400 using CharacterTextSplitter.
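A minimal sketch of this step, assuming Langchain's classic import paths and a placeholder file name (both may differ in your setup):

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter

# Load the PDF page by page (the file name is a placeholder).
loader = PyPDFLoader("virat_kohli_a_modern_master.pdf")
pages = loader.load()

# Break the pages into chunks of roughly 400 characters.
splitter = CharacterTextSplitter(chunk_size=400, chunk_overlap=0)
docs = splitter.split_documents(pages)
```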
In the next step of our journey, we have to embed our chunks. For embedding, we will use Sentence Transformer Embeddings, as they generate an embedding for every sentence, which makes the similarity search easier.
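A sketch of the embedding setup; the all-MiniLM-L6-v2 model name is an assumption, and any sentence-transformers model would work:

```python
from langchain.embeddings import SentenceTransformerEmbeddings

# Wraps a sentence-transformers model behind Langchain's embeddings interface.
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
```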
Why Embeddings and Why Is Similarity Important? At this point, you might ask: "Why are embeddings so important in the creation of a chatbot? And why should we care about similarity?"
An embedding maps a sentence to a point in a vector space: it transforms every string of text into a series of numbers. The advantage of sentence embeddings is their ability to capture the semantic meaning of sentences. So, when we measure the distance between a chunk's embedding and the prompt's embedding, we get their semantic similarity.
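A quick illustration of this idea with the sentence-transformers library (the sentences and the model name are just examples): two sentences that mean the same thing score much higher than an unrelated chit-chat sentence.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "Virat Kohli scored a century against KXIP in 2016.",
    "Kohli hit a hundred in the 2016 match versus Kings XI Punjab.",
    "Hello, how are you today?",
]
emb = model.encode(sentences)

# Same meaning, different words -> high cosine similarity.
print(util.cos_sim(emb[0], emb[1]))
# Unrelated chit-chat -> much lower similarity.
print(util.cos_sim(emb[0], emb[2]))
```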
Moving ahead, for storing our embedded vectors, we will need a vector store. We will use FAISS, a vector store from Facebook that is fast at similarity search. From the vector store, we will create a retriever, as it gives us extra parameters for the similarity search, such as the search type and score thresholding.
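A sketch of building the FAISS index and turning it into a retriever; we will come back to the search type and score-thresholding parameters in the Solution section.

```python
from langchain.vectorstores import FAISS

# Build the FAISS index from the embedded chunks.
vectorstore = FAISS.from_documents(docs, embeddings)

# Expose the store as a retriever; search_type and score thresholds
# can also be configured here.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```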
We don't want our chatbot to be unaware of previous questions and answers, or unable to answer follow-up questions, right? So we need memory and chains: memory for storing our history, and chains for understanding the context of that history together with the prompt.
To make our journey much more exciting, we won't use any of the memory classes from Langchain. We are going to use a list of tuples as memory. Every tuple will have two elements: the first is the prompt and the second is the chatbot's answer.
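In code, this memory is nothing more than a plain Python list; the example turns below are made up purely for illustration.

```python
# One tuple per turn: (user prompt, chatbot answer).
chat_history = []

chat_history.append(("Who is this book about?", "It is about Virat Kohli's cricket career."))
chat_history.append(("When did he make his debut?", "He made his international debut in 2008."))
```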
For follow-up questions, we will use ConversationalRetrievalChain from Langchain. We prefer this chain because we want the chatbot to retrieve correctly from the embeddings and be less generative. Another reason is that very few chains support a list of tuples as memory, and ConversationalRetrievalChain is one of them.
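A sketch of wiring the chain together, assuming Langchain's ChatAnthropic wrapper and an ANTHROPIC_API_KEY set in the environment; exact import paths depend on your Langchain version.

```python
from langchain.chat_models import ChatAnthropic
from langchain.chains import ConversationalRetrievalChain

# Claude-2 as the chat model (reads ANTHROPIC_API_KEY from the environment).
llm = ChatAnthropic(model="claude-2", temperature=0)

# The chain condenses the follow-up question using the chat history,
# retrieves relevant chunks, and asks the LLM to answer from them.
qa_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)

prompt = "How many runs did Virat Kohli score against KXIP in 2016?"
result = qa_chain({"question": prompt, "chat_history": chat_history})
print(result["answer"])

# Keep the tuple-based memory up to date for follow-up questions.
chat_history.append((prompt, result["answer"]))
```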
Our journey is about choices - the kind of paths we decide to take. Like choosing the chat model. For our journey, we will use Anthropic's Claude-2 as the chat model. We could use other chat models too, like GPT or a Hugging Face model, but we choose this one to make our journey non-linear and more exciting.
Image Source: Theaigodz on Medium
Now, we will pass the prompt and chat history to the LLM with the help of the chain, and the LLM will generate its output based on the prompt, the chat history, and its own creativity.
Before we proceed further in our journey, let's consider an example. Imagine a scenario where we have a PDF titled "Virat Kohli: A Modern Master". This PDF contains chapters about Virat Kohli's cricket journey, year by year.
If a user asks our chatbot, "How many runs did Virat Kohli score against KXIP in 2016 and what was special about that match?" Then, our chatbot, utilizing the chunks from the PDF and Anthropic's LLM, will generate the output, "Virat Kohli scored 113 runs in just 50 balls with a tremendous strike rate of 226. The special aspect of this match is that Virat played this unbelievable knock when he had 9 stitches on his hand. This shows the level of dedication and determination of Virat Kohli." The chatbot effectively retrieves the response from the PDF document.
Now our chatbot is ready, so our journey ends here, right? No, not that early. A journey without problems would not be a very exciting journey.
Problem -
The main problem in our journey is PROMPTS THAT ARE UNRELATED TO THE PDF. As I mentioned earlier, to make our journey non-linear, we are using Anthropic Claude-2 as the chat model.
When the chat history, chunks of the document, and prompt are passed to Anthropic, it will provide answers only from these chunks. This approach works well when the prompt is related to the PDF.
But what if the prompt is unrelated to the PDF? Here's where a big problem arises. If the prompt is simple chit-chat like "hello", "thanks", or "how are you", then Claude is not smart enough to catch that it is not a question about the PDF, but rather a general chit-chat message.
This is the biggest problem in our journey, as it leads to ambiguous results for general chit-chat. Let's assume we pass "Hello" as a prompt to the chatbot. It could respond in two ways:
1. It might respond with something like, "From the given context, I don't have enough information to answer this question."
2. It might give an answer taken from the context, i.e. from whichever chunks the chain retrieved. These chunks are certainly unrelated to the prompt, yet it presents them as the answer.
As we can clearly see in the image, this is not a correct answer for the given prompt.
Solution -
To solve this new type of problem in our journey, we need to come up with a new type of solution.
As we know, unrelated chunks affect the LLM's answer. What if we simply don't pass unrelated chunks to the LLM? If we do this, there is a much higher chance of getting better results from the LLM, because then only the chat history and the user input remain as context.
As I mentioned earlier, using the document as a retriever gives us extra parameters for the similarity search. So, we will use cosine similarity as the similarity search metric. It gives similarity scores between -1 and 1, as the short sketch after the list below illustrates:
1. -1 means the chunk is totally unrelated to the prompt. The vectors are pointing in the opposite direction.
2. 0 means the vectors are perpendicular to each other.
3. 1 means the chunk is similar to the prompt. The vectors are pointing in the same direction.
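A tiny NumPy sketch of the three cases above, using hand-picked 2-D vectors:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(np.array([1, 0]), np.array([-1, 0])))  # -1.0: opposite directions
print(cosine(np.array([1, 0]), np.array([0, 1])))   #  0.0: perpendicular
print(cosine(np.array([1, 0]), np.array([2, 0])))   #  1.0: same direction
```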
Image Source: Milana Shkhanukova on Medium
If the similarity search score is above 0, then it means the vector of the prompt and the vector of the chunk are in a similar direction. This implies that the chunk is related to the prompt and can be useful for our result.
So, we will take 0.0 as the threshold value. The threshold value acts as a barrier: all scores below it are treated as unrelated results, and scores above it are treated as related results. In our chatbot, with a threshold of 0.0, only chunks scoring above 0.0 are taken as context, and all of those chunks will be related to the prompt.
If we ask a normal chit-chat question, none of the chunks will be related to it. As a result, no unrelated chunk is passed to the chain, and the chatbot gives a precise and correct result.
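A sketch of this fix, assuming the vector store exposes relevance scores in this range (the exact score function depends on the vector store and Langchain version): rebuild the retriever with a score threshold, so a chit-chat prompt retrieves nothing and only the chat history and the question reach the LLM.

```python
# Only chunks whose relevance score clears the threshold are returned.
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.0},
)

qa_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)

# For a chit-chat prompt, no chunk clears the threshold,
# so the expected result is an empty list.
print(retriever.get_relevant_documents("Hello"))
```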
After implementing this, the same prompt to the same chatbot produces a different output and gives the correct answer.
And so, our brief journey comes to an end. We have achieved our goal of creating a retrieval chatbot capable of engaging in casual chit-chat.
But, in every journey, we should learn and experience something new. Here, we learned about score thresholding, so let's discuss it a bit more.
Score Thresholding -
The threshold value acts as a barrier. All values below it are considered unrelated results, and scores above it are considered related results.
In relation to chatbots, we use score thresholding to determine which parts of the document are relevant to the user's prompt and which parts aren't. We determine this using relevance scores or similarity scores.
We can set a threshold value according to our needs and flexibility. However, the midpoint between the upper and lower values is generally considered a good threshold value.
Importance of Score Thresholding -
It is a crucial aspect of many machine learning and information retrieval tasks.
⦁ It enhances precision by letting us decide how precise we want our query results to be.
⦁ It controls the number of results. A high threshold value can yield more precise results, but reduce the total number of results. This can also be achieved by specifying a K value. However, using a K value might sometimes eliminate closely related results or introduce unrelated ones.
⦁ By delivering more precise results, it enhances performance. In the context of a chatbot, it provides more relevant documents for the query, making it easier for the LLM to identify the correct response.
⦁ It is very beneficial for low-end LLMs. Low-end LLMs often deliver inappropriate results as they sometimes fail to comprehend the actual query and respond appropriately. By using thresholding, unrelated results are removed before being given to the LLM. This ultimately makes it easier for the LLM to understand the query and provide a relevant answer.
CONCLUSION -
Building a chatbot is a complex task, so we should break it down into smaller tasks. Everything we do, like determining the chunk size and choosing an embedding model, a chain, and a chat model, should be done according to our needs and constraints.
When making chatbots with low-end LLMs for answering questions, it is crucial to apply a threshold value to achieve high-precision results and prevent our chatbot from giving any ambiguous results to the user.
Image Sources: Milana Shkhanukova on Medium, Theaigodz on Medium