Select Page

RAG-POWERED KNOWLEDGE BASE

Web ScrapingLLMRAGPrompt Engineering

CONTEXT AND OBJECTIVE

As part of the Swiss democratic system, the legislative branch of the government can ask formal questions to the executive branch, which is then required to answer in written form. These answers contain interesting information on various public topics such as healthcare and education. While publicly available, they are difficult to find in practice. Web searches are not reliable and navigating the government website can be time consuming.

The goal of this project was the retrieve in a knowledge base all publicly available documents, and to design a RAG interface to provide fast and convenient access to all this information.

WHAT WAS DONE

Using Python’s BeautifulSoup packages, the relevant documents were scraped from the government website.

They were uploaded on a cloud Retrieval-Augmented Generation (RAG) platform, powered by OpenAI ChatGPT-4 API. The system was configured, via prompt engineering, to act as a chatbot specialized on retriving answers from the executive branch.

 

RAG system web interface

The RAG system acts a chatbot that can be queried on the content of its knowledge base. In this example, the question was about the COVID-19 crisis. The chatbot provides references to the information found in the knowledge base, and gives links to the source documents, for direct download.

Configuring the RAG System

Via prompt engineering, a “custom persona” was design, providing context, rules and guidelines to the system. In total, about 5,000 documents were uploaded to the knowledge base, which represents more than 20 million words.