
GPT4Docs

An Offline Document Enquiry LLM for Everyone

NO GPU

Works On CPU Only

You are free to benchmark various models

Nowadays everyone is concerned about proprietary data and data leakage with GPT. That's why we, the lovers of Open Source, are bringing fully offline frameworks so that no data leaks.

It uses Streamlit (open source) as the frontend, so on first run it may tell you that usage analytics will be sent. Don't panic: use our script to disable this.

https://docs.streamlit.io/library/advanced-features/configuration
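For reference, the Streamlit configuration page linked above documents a `gatherUsageStats` option in the `[browser]` section of `config.toml`. The sketch below is not the project's own script; it is a minimal illustration, with a hypothetical function name, of how that setting could be written from Python. It overwrites any existing `config.toml` in the target folder, so treat it as a starting point only.

```python
from pathlib import Path


def disable_streamlit_usage_stats(config_dir: Path) -> Path:
    """Write a config.toml that turns off Streamlit usage statistics.

    Hypothetical helper, not part of GPT4Docs. It replaces any existing
    config.toml in config_dir with a file containing only this setting.
    """
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "config.toml"
    # [browser] gatherUsageStats is the documented Streamlit option.
    config_path.write_text(
        "[browser]\ngatherUsageStats = false\n", encoding="utf-8"
    )
    return config_path
```

To apply it for the current user, one would call `disable_streamlit_usage_stats(Path.home() / ".streamlit")`, since Streamlit reads per-user settings from the `.streamlit` folder in the home directory.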

The idea is that the tool must work locally and must not upload your data to any server. It is not like OpenAI, where your documents are loaded onto a remote server. In our case nothing is uploaded; the only network traffic is the download of the framework and the files needed to run the tool.

GPT4Docs is made so that you can follow the instructions below and focus on the work, i.e. exploit the power of an LLM to query documents, without wasting time on configuration.

Install:-

  1. Install Python 3.11.4: https://www.python.org/ftp/python/3.11.4/python-3.11.4-amd64.exe

    During installation, don't forget to check "Add to path"

  2. Open CMD and run "pip install -r [path of requirements.txt in GPT4Docs folder]"

  3. Get the repository with "git clone https://github.com/TinToSer/GPT4Docs.git", otherwise your offline_files folder will be empty (it should be about 87 MB)

--------------Setup Done-----------------

Put the downloaded models in the "models" folder, using the links below. Remember that everything before the first dash in the file name is the Type information, so either keep the file name unchanged or rename the file so that it still starts with the correct Type.

For example:-

llama-2-7b-chat.ggmlv3.q8_0.bin --- "llama" is the Type name

mpt-7b-instruct.ggmlv3.q8_0.bin --- "mpt" is the Type name

https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/blob/main/mpt-7b-instruct.ggmlv3.q8_0.bin

https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/blob/main/llama-2-7b-chat.ggmlv3.q8_0.bin
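The naming rule above (everything before the first dash is the Type) can be illustrated with a short sketch. The function name is hypothetical and this is not necessarily how GPT4Docs parses the name internally; it only shows what the rule implies for a given file name.

```python
from pathlib import Path


def model_type_from_filename(filename: str) -> str:
    """Return the part of the model file name before the first dash.

    Hypothetical helper illustrating the README's naming rule.
    """
    name = Path(filename).name  # drop any folder part such as "models/"
    prefix, dash, _rest = name.partition("-")
    if not dash or not prefix:
        raise ValueError(f"no Type prefix found in file name: {name!r}")
    return prefix


print(model_type_from_filename("llama-2-7b-chat.ggmlv3.q8_0.bin"))
print(model_type_from_filename("models/mpt-7b-instruct.ggmlv3.q8_0.bin"))
```

A file renamed to something without a leading Type and dash (for example "mymodel.bin") raises an error in this sketch, which mirrors why the file names should not be changed carelessly.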

  • You can get various models from Hugging Face or the Facebook (Meta) website, for example models with 7 billion, 30 billion or 70 billion parameters.

  1. Put your PDF files in the "data" folder

  2. Whenever new files are added to the "data" folder, or older files are removed from it, you have to click "Rebuild VectorDB" in the browser app

  3. Double-click "START.bat" and it will run the app in a locally hosted browser page
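If you are unsure whether the "data" folder has changed since the last rebuild, a small check like the following can help. This is only a sketch with hypothetical helper names, not part of GPT4Docs: it compares the set of PDF file names in a folder against an earlier snapshot. It assumes that a change means a file was added or removed, as described in the steps above, and it does not detect edits to a file that keeps its name.

```python
from pathlib import Path


def pdf_snapshot(data_dir: Path) -> set[str]:
    """Return the names of all PDF files directly inside data_dir."""
    return {
        p.name
        for p in data_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    }


def needs_rebuild(previous: set[str], current: set[str]) -> bool:
    """True if PDFs were added or removed since the previous snapshot."""
    return previous != current
```

For example, take `before = pdf_snapshot(Path("data"))` after a rebuild, and later compare it with a fresh snapshot using `needs_rebuild(before, pdf_snapshot(Path("data")))`. If it returns True, click "Rebuild VectorDB" in the app.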

You can share your app with the world by port forwarding, using ngrok etc.

-------------Credit goes to the project linked below; I have only beautified it-----------------------

https://github.com/kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference
