2. Make sure you have the latest NVIDIA driver for your machine, along with the CUDA toolkit. This varies by OS but is fairly easy on most Linux distros.
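A quick way to sanity-check that step (using the standard NVIDIA tools; the script just reports what's missing rather than failing on machines without a GPU):

```shell
# Check that the NVIDIA driver is installed and visible
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  echo "nvidia-smi not found: driver not installed or not on PATH"
fi

# Check that the CUDA toolkit's compiler is installed
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
else
  echo "nvcc not found: CUDA toolkit not installed"
fi
```

If both commands print version info, you're good to go.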
4. Run the model following their instructions. Several flags matter, but you can also just use the server example that was added a few days ago; it gives a fairly solid chat interface.
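Assuming this refers to llama.cpp (the comment doesn't name the project), the invocation looks roughly like the following. The model path is a placeholder; check the project's README for the flags that match your build:

```shell
# Placeholder path: point this at whatever quantized model you downloaded
MODEL=models/ggml-model-q4_0.bin

# Interactive chat in the terminal (flags are illustrative, not exhaustive):
#   ./main -m "$MODEL" -i -r "User:"
# Or the server example, which serves a chat UI over HTTP:
#   ./server -m "$MODEL" --port 8080
echo "would launch with model: $MODEL"
```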
4) Run gpt4all and wait out its obnoxiously slow startup
... and that's it. On my machine it works perfectly well -- about as fast as the hosted web version of GPT. I have a decent GPU, but I never checked whether it's actually being used, since it's fast enough as-is.
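If you do want to check whether the GPU is being exercised, a quick way (again assuming the standard NVIDIA tooling) is to watch utilization while the model is generating:

```shell
# Print current GPU utilization and memory use; run this in a second
# terminal while the model is mid-generation. Falls back gracefully
# if nvidia-smi isn't installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
else
  echo "nvidia-smi not found; can't check GPU usage this way"
fi
```

If utilization stays near 0% during generation, inference is running on the CPU.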