

GitHub - erikhren/RAG: Upload a file and chat with your data.

Retrieval-Augmented Generation (RAG)

This application lets you interact with your documents in natural language. Upload a file (PDF, CSV, XLSX, or TXT) and use OpenAI's language models to query it directly. It is particularly useful for data analysts, researchers, and business professionals who need to extract insights from their documents quickly and conversationally.

Features

  • File Upload: Supports multiple file formats including PDF, CSV, XLSX, and TXT.
  • Data Processing: Transforms raw text into actionable insights by extracting key information and generating relevant questions.
  • Text Embedding and Retrieval: Uses OpenAI's embedding models to create dense vector representations of the data, which are then indexed for efficient retrieval.
  • Interactive Chat: Allows users to ask questions and receive answers directly related to the uploaded content.
  • Flexible Configuration: Provides options to select different models and parameters to customize the processing according to the user's needs.
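To illustrate the retrieval half of the embedding feature: once each chunk of a document has an embedding, answering a query amounts to finding the stored vectors closest to the query's vector. The sketch below ranks chunks by cosine similarity over a plain Python list. The three-dimensional vectors are made-up stand-ins for real OpenAI embeddings (which have far more dimensions), and the names Index, cosine_similarity, and top_k are hypothetical, not this repository's code.

```python
import math

# Hypothetical in-memory index: each entry pairs a text chunk with its embedding.
# In the real app the vectors would come from an OpenAI embedding model; here they
# are tiny hand-written vectors so the example runs without an API key.
Index = list[tuple[str, list[float]]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(index: Index, query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine_similarity(item[1], query_vec), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


if __name__ == "__main__":
    index: Index = [
        ("Q3 revenue grew 12%.", [0.9, 0.1, 0.0]),
        ("The office moved to Berlin.", [0.0, 0.2, 0.9]),
        ("Q3 costs were flat.", [0.7, 0.3, 0.1]),
    ]
    # Prints ['Q3 revenue grew 12%.', 'Q3 costs were flat.']
    print(top_k(index, [1.0, 0.2, 0.0], k=2))
```

A larger index would typically use a dedicated vector store or approximate nearest-neighbour search instead of sorting every entry, but the ranking idea is the same.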

What is RAG?

Large Language Models (LLMs) are trained on vast datasets, but those datasets do not include your own data. Retrieval-Augmented Generation (RAG) addresses this by supplying relevant pieces of your data to the LLM at query time, alongside the knowledge the model already has.

In the RAG framework, user data is first indexed so that it becomes searchable. When a user submits a query, the system searches this index for the most relevant information. The retrieved data and the user's query are then combined into a prompt, and the LLM generates a response based on both.
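The last step of that flow, combining the retrieved data with the query into a prompt, can be sketched as follows. This assumes the passages have already been retrieved from the index; the template wording and the function name build_prompt are hypothetical and not taken from this repository.

```python
# Hypothetical prompt template; the app's actual wording is not documented here.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved passages and the user's question into one LLM prompt."""
    # Number each passage so the model (and a human reviewer) can tell them apart.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    chunks = ["Q3 revenue grew 12%.", "Q3 costs were flat."]
    print(build_prompt("How did Q3 go?", chunks))
```

Restricting the model to the supplied context is one common design choice; it makes answers easier to trace back to the uploaded document.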

Whether you are building a chatbot or another type of interactive agent, understanding how to use RAG techniques to bring relevant data into your application is crucial.

Getting started

Create a .env file containing your OpenAI API key and the directory where the index will be stored (the default is ./index):

OPENAI_API_KEY='your_api_key'
DIR_PATH='your_directory'
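As an illustration only, the two settings could be read at startup with a small stdlib-only loader like the one below. The functions load_env_file and get_settings are hypothetical; the project may load its .env file differently (for example with a third-party package). Real environment variables take precedence over the file, and DIR_PATH falls back to the documented default of ./index.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY='value' lines from a .env file (comments and blanks skipped)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, e.g. 'your_api_key'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def get_settings(path: str = ".env") -> tuple[str, str]:
    """Return (OPENAI_API_KEY, DIR_PATH); real environment variables win over the file."""
    file_values = load_env_file(path)
    api_key = os.environ.get("OPENAI_API_KEY", file_values.get("OPENAI_API_KEY", ""))
    # ./index is the default index directory documented in this README.
    dir_path = os.environ.get("DIR_PATH", file_values.get("DIR_PATH", "./index"))
    return api_key, dir_path
```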

Prerequisites installation

Windows

NOTE: On recent Windows versions, the command wsl --install installs and configures WSL 2 automatically. If the command is not available on your system, you can perform a manual installation instead.

Linux

Common

Starting up

You can clone the Git repository either on the Windows file system or inside WSL 2. WSL 2 is preferred because its file system performance is better.

Windows

Open PowerShell or a WSL 2 terminal and run the following commands (you can choose a different name for "myprojects"):

mkdir ~/myprojects
cd ~/myprojects
git clone git@github.com:erikhren/RAG.git
cd RAG
code .

Once Visual Studio Code opens, select Extensions in the left menu, type Remote Development extension pack into the search field, and install the top result (it should be authored by Microsoft).

Once the extension is installed, select Remote Explorer from the left menu and choose Containers from the drop-down menu at the top of the navigation pane. Click the + sign and select Open Current Folder in Container. The container should then be built and started.

Alternatively, press Ctrl+Shift+P, select Dev Containers: Open Folder in Container, and choose the current folder.

Python interpreter selection

When the container starts for the first time, the extension might detect the wrong Python interpreter. To fix this, open any Python source file (e.g. cli.py) and check the interpreter shown in the lower right corner. It should read 3.11.0 ('.venv': venv). If it shows a different value, click it and select 3.11.0 ('.venv': venv) from the drop-down.
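If you want to double-check from inside the container which interpreter is actually running, a few lines of standard-library Python are enough. The helper name interpreter_info is hypothetical; it relies on the documented venv behaviour that sys.prefix differs from sys.base_prefix inside a virtual environment.

```python
import sys


def interpreter_info() -> dict[str, object]:
    """Report the running Python version and whether it comes from a virtual environment."""
    return {
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix points
        # at the base installation; outside a venv the two are equal.
        "in_venv": sys.prefix != sys.base_prefix,
        "executable": sys.executable,
    }


if __name__ == "__main__":
    info = interpreter_info()
    print(info)
    if sys.version_info[:2] != (3, 11) or not info["in_venv"]:
        print("Warning: expected Python 3.11 from the .venv virtual environment")
```

Running it with the selected interpreter should print version 3.11.0, in_venv True, and an executable path inside .venv.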

Python path

The Python path environment variable (PYTHONPATH) is set in the remoteEnv section of '.devcontainer/devcontainer.json'. It sets the module root directory to app, so imports of custom modules are resolved relative to that directory.
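If an import of one of your own modules fails, it can help to check where Python would load it from. The sketch below uses importlib.util.find_spec from the standard library; the helper name module_origin is hypothetical, and "json" is only a placeholder for a module under app.

```python
import importlib.util
import sys
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # PYTHONPATH entries are inserted into sys.path right after the script's directory.
    print(sys.path[:3])
    # Placeholder: replace "json" with the name of one of the project's modules.
    print(module_origin("json"))
```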
