Artificial intelligence has been touching our lives in one way or another, and more profoundly since the advent of cloud computing. Big tech companies have constantly leveraged the power of AI to deliver lower costs to customers, for example by suggesting changes to their cloud resources. The mini movie that gets generated on our iPhones or Android phones is also the result of an AI algorithm working in the background. However, the launch of ChatGPT, powered by OpenAI's GPT models, has made generative AI far more accessible to everybody. Businesses around the world are brainstorming which use cases they can apply the technology to.

One thing is certain: we are only scratching the surface, and the sky is the limit as we progress through the decade. Companies and businesses around the world, even if they do not want to, will be compelled to provide better customer experiences and services. Imagine a travel site that today requires 50+ clicks to plan an itinerary; with the power of generative AI, a bot can plan everything for you after a few questions. This is just one example, and the potential is immense.

In this article, I would like to talk about a use case we solved: reducing the time team members spend reading through a document by giving them a tool that can answer the specific questions they have about it. Instead of reading the document and using the compute of our own brains, we handed that work off to LLMs on AWS Bedrock and achieved a productivity boost.

Let's dive into the architecture.

The architecture uses a series of services across AWS:

  • API Gateway
  • ECS - for the compute and cache layer
  • Vector database - pickle files on S3
  • Foundation models such as Claude and Titan on AWS Bedrock

Among the libraries and components, we used React for the frontend and FastAPI for the backend API, along with LangChain to interact with AWS Bedrock.
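
A rough sketch of how these pieces fit together on the backend is shown below; the region, model IDs, and structure are illustrative assumptions rather than the exact production setup.

```python
import boto3
from fastapi import FastAPI
from langchain.embeddings import BedrockEmbeddings
from langchain.llms import Bedrock

app = FastAPI()
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Titan turns text into vectors; Claude generates the answers
embeddings = BedrockEmbeddings(client=bedrock_runtime,
                               model_id="amazon.titan-embed-text-v1")
llm = Bedrock(client=bedrock_runtime, model_id="anthropic.claude-v2")
```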

The architecture, like the customer journey, is pretty straightforward. There are two user stories associated with it: (1) uploading and preparing the document for interaction, and (2) interacting with the document.

Below are the steps in the customer journey for uploading a file and making it ready for interaction (a code sketch of this pipeline follows the list):

  • A user uploads a file.
  • The file gets stored in an S3 bucket.
  • A backend process is triggered to chunk the file using LangChain.
  • A conversational buffer is also used to keep the customer's prompts and responses.
  • The chunks are individually vectorized by invoking the Titan embedding model.
  • FAISS is used as an in-memory vector database.
  • The vector data is serialized as a pickle file and stored in S3.
  • A notification is sent to the frontend that the file is ready for interaction.
  • The user starts asking questions in natural language, for example: "What is this document about?"
  • In the background, another process is triggered to generate a summary of the document and a set of FAQ(s), so that they are ready for the user.
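
Here is a minimal sketch of that ingestion pipeline. Bucket and key names, chunk sizes, and model IDs are illustrative, and serializing the index assumes a LangChain version whose FAISS store exposes `serialize_to_bytes`.

```python
import boto3
from langchain.embeddings import BedrockEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

s3 = boto3.client("s3")
bedrock_runtime = boto3.client("bedrock-runtime")
embeddings = BedrockEmbeddings(client=bedrock_runtime,
                               model_id="amazon.titan-embed-text-v1")

def prepare_document(doc_text: str, doc_id: str, bucket: str = "doc-vectors") -> None:
    # Split the raw text into overlapping chunks sized for the embedding model
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(doc_text)

    # Each chunk is vectorized by invoking the Titan embedding model on Bedrock
    vectorstore = FAISS.from_texts(chunks, embeddings)

    # Serialize the in-memory index (pickle under the hood) and store it in S3
    s3.put_object(Bucket=bucket,
                  Key=f"{doc_id}/index.pkl",
                  Body=vectorstore.serialize_to_bytes())
```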

The customer journey for interacting with the document is stated below (a sketch of this path follows the list):

  • The customer asks questions in natural language.
  • A backend API vectorizes the question asked by the user.
  • FAISS, the in-memory vector database, loads the respective pickle file from S3 and performs the similarity search.
  • The similar texts, along with the prompt, are sent to the LLM (Claude on AWS Bedrock) to produce a response.
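
A minimal sketch of this path is below; the bucket layout and the number of retrieved chunks (k=4) are assumptions for illustration.

```python
import boto3
from langchain.chains import RetrievalQA
from langchain.embeddings import BedrockEmbeddings
from langchain.llms import Bedrock
from langchain.vectorstores import FAISS

s3 = boto3.client("s3")
bedrock_runtime = boto3.client("bedrock-runtime")
embeddings = BedrockEmbeddings(client=bedrock_runtime,
                               model_id="amazon.titan-embed-text-v1")
llm = Bedrock(client=bedrock_runtime, model_id="anthropic.claude-v2")

def answer_question(question: str, doc_id: str, bucket: str = "doc-vectors") -> str:
    # Rehydrate the in-memory FAISS store for this specific document
    blob = s3.get_object(Bucket=bucket, Key=f"{doc_id}/index.pkl")["Body"].read()
    vectorstore = FAISS.deserialize_from_bytes(serialized=blob, embeddings=embeddings)

    # The retriever vectorizes the question with Titan, FAISS finds the closest
    # chunks, and those chunks plus the question go to Claude on Bedrock
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    )
    return qa.run(question)
```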

Since we use LangChain, a conversational buffer is maintained that keeps the context of the user's entire conversation.
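
One way to wire that buffer in, reusing the `llm` and `vectorstore` objects from the sketch above, is LangChain's ConversationBufferMemory with a ConversationalRetrievalChain; keeping one chain per user session (for example in the ECS cache layer) is an assumption about how sessions are managed.

```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Keeps every prompt/response pair so follow-up questions carry the full history
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chat_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,                               # Claude on Bedrock, as configured above
    retriever=vectorstore.as_retriever(),  # FAISS store rehydrated from the pickle
    memory=memory,
)

# Each call is answered in the context of the earlier turns
first = chat_chain({"question": "What is this document about?"})["answer"]
follow_up = chat_chain({"question": "Summarize that in one sentence."})["answer"]
```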

It's amazing how well the similarity search handles out-of-context questions and finds the best possible match. The system can also be made smarter by using agents (Bedrock Agents) to communicate with other systems as well.

Why choose pickle files instead of a vector database such as Weaviate or OpenSearch?

It's a valid question, and it's a matter of tradeoffs. When a user is interacting with a document, the context needs to be the document itself. A shared vector database such as OpenSearch is much more nuanced in this case: a lot more work would be needed for authorization controls and for making sure the right context is picked. However, if you are building a knowledge base of documents that does not need AuthZ, or the data in the documents is not related at all, then using systems such as Amazon Kendra or vector DBs such as OpenSearch should not be an issue. In fact, moving to pickle files in such a case would not scale and would not be an ideal solution.

Hence, no one strategy fits everything.

How do we summarize or generate FAQ(s) for larger documents - 100 MB+?

It's a great question to ask as well. Unfortunately, all LLMs are restricted by their context window size. To generate a summary or FAQ(s), the entire document must be presented to the LLM so that an appropriate response can be generated, and here the context window becomes the limiting factor. Claude fares a little better with its 100,000-token limit.

There are certain model-specific strategies that can be employed to work around the context window limit. Another strategy is an architecture that breaks the work up, summarizing the document piece by piece and then combining the partial results.
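
As a sketch of that idea, LangChain's map_reduce summarization chain summarizes each chunk separately and then combines the partial summaries, so no single call exceeds the context window; the chunk sizes here are illustrative assumptions.

```python
import boto3
from langchain.chains.summarize import load_summarize_chain
from langchain.llms import Bedrock
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = Bedrock(client=boto3.client("bedrock-runtime"), model_id="anthropic.claude-v2")

def summarize_large_document(doc_text: str) -> str:
    # Split into pieces small enough to fit Claude's context window
    splitter = RecursiveCharacterTextSplitter(chunk_size=8000, chunk_overlap=200)
    docs = splitter.create_documents([doc_text])

    # "map": summarize each chunk; "reduce": combine the partial summaries
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    return chain.run(docs)
```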

However, this should not be done as a synchronous call, since the user might end up waiting forever. Above all, API Gateway has a maximum timeout of roughly 30 seconds, so the connection might time out.

To conclude, we looked at how an application can be architected to let users interact with documents, vectorizing them with the Titan model and keeping the context in a secure manner. There are also strategies to improve the customer experience, such as caching, LangChain conversational buffers, and performing async operations like summarizing and generating FAQ(s).

Sources of Article

AWS Bedrock
