Show HN: I made a self-hosted ChatGPT UI (github.com/cogentapps)
141 points by tottenval on March 14, 2023 | 52 comments



> Chat with GPT is an open-source, unofficial ChatGPT app with extra features and more ways to customize your experience. It connects ChatGPT with ElevenLabs to give ChatGPT a realistic human voice.

Looks like only the GUI aspects are self-hosted; the text and speech capabilities (and the bulk of the computation and IP) are provided by two SaaS services.

Self-hosted (and somewhat open) ML models are what a lot of people actually want, so we should probably be careful with the term "self-hosted" right now, so as not to disappoint people or muddy the discussion about what we want.


It's somewhat ambiguous language - "self-hosted ChatGPT UI" could lead many to believe it's completely self-hosted.

However, sophisticated readers familiar with ChatGPT will know the model and weights haven't been released, and that absent a leak/hack/release by OpenAI, a completely self-hosted ChatGPT solution is impossible. Eventually we'll almost certainly see a "completely self-hosted ChatGPT equivalent" (similar to DALL-E vs Stable Diffusion), but that's another thread for another time.

Based on my native-speaker parsing of English, "Self-hosted ChatGPT UI" is accurate, and I'm not sure how else I would write it to disambiguate between a self-hosted UI and a completely self-hosted ChatGPT with a UI.


> I'm not sure how else I would write it to disambiguate between a self-hosted UI and a completely self-hosted ChatGPT with a UI.

"Show HN: I made a self-hosted UI for the ChatGPT API"

"Show HN: I hade a self-hosted UI for a local GPT model"


Ironically, this sounded like an answer from ChatGPT.

But more to the point, a fully self-hosted solution (LLaMA), even running on a cellphone, is entirely believable. Look at some of the recent developments with llama.cpp and Stanford's Alpaca over the last week.


"Show HN: I made a self-hosted UI for ChatGPT," perhaps?


It's a self-hosted UI for ChatGPT right now, but my primary goal is to build a good open source chat interface that can be adapted to open source chat models as they become available.

Integrating with Alpaca, LLaMA, ChatGLM, OpenChatKit, and whatever comes next should be straightforward once people figure out reliable and fast ways to run the models locally.


I think if you integrated with Llama, this repo would be wildly popular. I downloaded the weights over the weekend and decided I didn't want to spend my free time working on an acceptable UI.


I honestly am not that familiar with this space. How realistic is it that someone could self-host a ChatGPT instance?

Assuming the model was available, how big are the models and what kind of hardware is necessary to run the instance?


OpenAI hasn't published any information about the size or hardware requirements for running ChatGPT. Reading between the lines, the default ChatGPT Turbo model seems to be significantly smaller than GPT-3 (it's a distilled model), but probably still heavier than the Alpaca and Llama 7B models people are running (very slowly) on their single GPU computers this week. You'd probably need multiple A100s to get comparable performance to the ChatGPT API.


Does the llama code that dropped leverage the GPU at all? On an M1 it appears to just run on as many CPU cores as you want to throw at it. The 65B heats up 8 cores real nicely, and it's slow, but I imagine it would be a lot faster on the GPU.


I've seen people saying that limiting it to 4 cores out of the 8 total can actually lead to improved performance. Have you seen that?


8 starts and runs a bit faster for me if plugged in, before the fan kicks on and the CPU starts throttling. Once that happens, it's probably better to stick with 4.
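
For anyone wanting to compare, the thread count is llama.cpp's -t flag, so switching between 4 and 8 is a one-character change (the model path below is just an example):

    # Run the 65B 4-bit model on 4 threads; bump -t to 8 to compare.
    ./main -m ./models/65B/ggml-model-q4_0.bin -t 4 -n 128 -p "Hello"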


All of the llama implementations for Apple are CPU only afaik.


If you run it with 4-bit quantization entirely on the CPU (similar to llama.cpp), ChatGPT should fit in about 90 GB of RAM: assuming a GPT-3-sized model of ~175B parameters at 4 bits each, that's roughly 175e9 params x 0.5 bytes/param ≈ 87.5 GB. That's easy enough to get your hands on for a desktop, but out of reach for notebooks.

Also expect performance of a couple of seconds per token in that setup; for now you need something involving GPUs.


I think you’d need 2x A100 GPUs, which is $4.18 an hour on Runpod. If I was super bored I’d probably be willing to drop $50 for 10 hours to mess around with it.

https://www.runpod.io/gpu-instance/pricing


Probably should just call it OpenAI/ChatGPT Client.


It says self-hosted ChatGPT UI in the title; was that different when you posted?


I think it's within the broad meaning of self-hosting. One thing it means is that there's no random new company that you have to trust with your data.


I think it's pretty clear what he meant by self-hosted ChatGPT UI. To assume the non-UI aspects are also self-hosted is illogical, no?

Edit: I think the title was changed. Dang, can you please show revision history? Otherwise I can't discuss properly.


I have tried this and many, many other ChatGPT frontends. I recently did a search for "chatgpt" on GitHub and filtered for frontends, but I was a bit disappointed with the results. Most of them seemed to be pretty similar and didn't offer anything new or unique.

I'm really interested in finding a frontend with LangChain integration that can switch between chat mode and doc mode or something along those lines. It would be great to have a more versatile tool for communication and collaboration.

Do any of you have any recommendations or know of any projects that fit this description?



What specific features would you like to see?


It's a shame that the screencast has no sound. I was curious about what it would sound like. I could try it myself via the Netlify app, but I don't feel very comfortable sharing my API key somewhere...


I posted a screencast on Reddit earlier in the development process with audio demonstrating the text-to-speech feature. The UI has changed a bit since then, but you can hear what the voices sound like:

https://old.reddit.com/r/OpenAI/comments/11k19en/i_made_an_a...


The ChatGPT API can be a lot more useful when you use it in context: select a chunk of text on any web page, right-click, and choose summarize/translate/ELI5. Or execute your own custom prompt.

I'm building a Chrome extension called SublimeGPT[1] to do exactly that. Right now, you can log in to your existing ChatGPT account, go to any page, and open a chat overlay. The next version will have the context options.

[1] https://sublimegpt.com


You can also just use a bookmarklet (or several, defining different prompts):

    // Summarize the current text selection via the OpenAI chat completions API.
    function __summarize(api_key) {
        var selection = window.getSelection().toString();
        if (selection.length == 0) return;

        var xhr = new XMLHttpRequest();
        xhr.open("POST", "https://api.openai.com/v1/chat/completions");
        xhr.setRequestHeader('Content-Type', 'application/json');
        xhr.setRequestHeader('Authorization', 'Bearer ' + api_key);

        // Replace the page with a plain placeholder while we wait for the answer.
        window.scrollTo({top: 0});
        document.body.innerHTML = 'asking...';
        document.body.style.backgroundColor = "white";
        document.body.style.color = "black";
        document.body.style.fontFamily = 'monospace';
        document.body.style.fontSize = "16px";
        document.body.style.margin = "auto";
        document.body.style.padding = "1rem";
        document.body.style.maxWidth = "60rem";

        xhr.onreadystatechange = function() {
            if (xhr.readyState == 4) {
                if (xhr.status == 200) {
                    var response = JSON.parse(xhr.responseText);
                    var summary = response.choices[0].message.content;
                    document.body.innerHTML = summary;
                } else {
                    // On error, show the API's message if the body parses as JSON.
                    try {
                        var e = JSON.parse(xhr.responseText);
                        document.body.innerHTML = e.error.message;
                    } catch (e) {
                        document.body.innerHTML = 'error asking.. check the console';
                        console.log(xhr);
                    }
                }
            }
        };

        var data = JSON.stringify({
            "model": "gpt-3.5-turbo",
            "messages": [
                {"role": "system", "content": "Summarize the following text as if you are Richard Feynman"},
                {"role": "user", "content": selection}
            ]
        });
        xhr.send(data);
    }
(I have it as a bookmarklet here: https://gist.github.com/jackdoe/ce5a60b97e6d8487553cb00aa43f... change "YOUR API KEY HERE" to your key.)


Sorry, but I have to ask: why XMLHttpRequest instead of fetch?


No reason really. At the time I wasn't sure if the API would be too slow (like the chat web UI) and I'd need a progress bar, but by the time I found out I don't, the code was already written.


You can use streaming.
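
In case it's useful, here's a minimal sketch of what that looks like with fetch. The endpoint and the "stream": true flag are from OpenAI's API docs; the parsing and the function name are just illustrative:

    // Stream tokens from the chat completions endpoint as they arrive.
    async function streamCompletion(apiKey, messages, onToken) {
        const res = await fetch("https://api.openai.com/v1/chat/completions", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "Authorization": "Bearer " + apiKey
            },
            body: JSON.stringify({model: "gpt-3.5-turbo", messages: messages, stream: true})
        });
        const reader = res.body.getReader();
        const decoder = new TextDecoder();
        let buffer = "";
        while (true) {
            const {done, value} = await reader.read();
            if (done) break;
            buffer += decoder.decode(value, {stream: true});
            // The response is server-sent events: "data: {...}" JSON lines.
            const lines = buffer.split("\n");
            buffer = lines.pop(); // keep any partial line for the next read
            for (const line of lines) {
                if (!line.startsWith("data: ") || line === "data: [DONE]") continue;
                const delta = JSON.parse(line.slice(6)).choices[0].delta;
                if (delta.content) onToken(delta.content);
            }
        }
    }

Call it with an onToken that appends to the DOM and the text shows up word by word instead of all at once.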


And when you want to create/edit/delete/import custom prompts? AI is a commodity now and a great UX drives adoption.


Then you download 1xdeveloper's extension :)

Though I just copy and paste the bookmarklet and change the prompt.


Hehe, I'll put it on GitHub when it reaches 1.0.


Looks great! I have something very similar:

https://github.com/Niek/chatgpt-web


Is this allowed under OpenAI's ToS? I just don't want to connect my account and then get it banned.

Edit: It seems like it is just using the API instead of the web interface, and thus charging my account each time. I originally thought it was injecting into the free web interface. But is changing the system prompt going to get me banned?


Changing the system prompt is not going to get you banned, as it's something OpenAI encourages people to do when making API calls to gpt-3.5-turbo. [0]

[0]: https://platform.openai.com/docs/guides/chat/introduction


Thanks for sharing. It's really quick with responses, at least compared to a couple of other frontend projects for ChatGPT/OpenAI API clients I've used in the past few days.


What I think I need is something like this, but in bookmarklet form. I click it, it prompt()s me for the prompt, and displays the output in a textarea so I can quickly paste it. Come to think of it, it should be possible to put the output straight into the clipboard, right? The use case, of course, would be email/forum communication. The problem is that you have to make a UI to embed the API key into the code, because pasting it into a URL-encoded script is bound to be a pain.
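
Something like this could be the core of it (a rough sketch: prompt() in, navigator.clipboard out; clipboard access needs a secure context, and the function name is made up):

    // Ask for a prompt, send it to the API, and copy the answer to the clipboard.
    async function __ask(api_key) {
        var input = prompt("Prompt:");
        if (!input) return;
        var res = await fetch("https://api.openai.com/v1/chat/completions", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "Authorization": "Bearer " + api_key
            },
            body: JSON.stringify({
                model: "gpt-3.5-turbo",
                messages: [{role: "user", content: input}]
            })
        });
        var json = await res.json();
        var answer = json.choices[0].message.content;
        await navigator.clipboard.writeText(answer); // straight into the clipboard
        alert("Copied to clipboard:\n\n" + answer);
    }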


What I think would be cool is taking input automatically from highlighted text in any app, falling back to my clipboard, and then outputting to my clipboard.

That way it works in any app automatically. Seamless system-wide clipboard reading is a big ask though, so ideally you'd want a self-hosted model like llama.cpp.


Apologies if this is so unrelated as to be off-topic, but I'm new to this and so my mental model is incomplete at best and completely wrong at worst. My question is:

How would one create a "domain expert" version of this? The idea would be to feed the model a bunch of specialized, domain-specific content, and then use an app like this as the UX for that.


Either you can try it out with a longer system prompt, or wait until OpenAI releases a fine-tune API for the gpt-3.5-turbo model. System prompts aren't designed to be very long, so fine-tuning is definitely what you'd be looking for. But it's currently only offered for the older models, so it's outdated at this point.

I guess you could also tack on an extra layer before the actual API call, and make your own system that includes key bits of info from a more specific data set in the prompt. But at this rate of new releases from OpenAI, it might be a safe bet to wait the couple of weeks until they update the fine-tune API.


I just did this exact thing; it's very easy.

https://dev.to/dhanushreddy29/fine-tune-gpt-3-on-custom-data...


This has nothing to do with fine-tuning in the technical sense. What that post actually does is vector search, injecting the results into the prompt for GPT.

It is a good approach, but using the word "fine-tuning" for it is confusing, given that OpenAI has an actual process for fine-tuning, which works in a very different way.
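
For the curious, the pattern is roughly this (a sketch: the /v1/embeddings endpoint and model name are real, everything else is illustrative, and in practice you'd precompute and store the document embeddings rather than embed them on every query):

    // Embed a string with the OpenAI embeddings API.
    async function embed(apiKey, text) {
        const res = await fetch("https://api.openai.com/v1/embeddings", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "Authorization": "Bearer " + apiKey
            },
            body: JSON.stringify({model: "text-embedding-ada-002", input: text})
        });
        return (await res.json()).data[0].embedding;
    }

    // Cosine similarity between two embedding vectors.
    function cosine(a, b) {
        let dot = 0, na = 0, nb = 0;
        for (let i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Pick the documents most similar to the question and prepend them to the
    // prompt; the model's weights never change, unlike real fine-tuning.
    async function buildPrompt(apiKey, question, docs) {
        const q = await embed(apiKey, question);
        const scored = await Promise.all(docs.map(async (d) =>
            ({doc: d, score: cosine(q, await embed(apiKey, d))})));
        scored.sort((x, y) => y.score - x.score);
        const context = scored.slice(0, 3).map((s) => s.doc).join("\n---\n");
        return "Answer using this context:\n" + context + "\n\nQuestion: " + question;
    }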


Would be cool if they added support for llama.cpp.


You really want it integrated with an OpenAI API clone rather than directly integrated. Otherwise, interoperability will suffer greatly as new and improved models are released.


I like it. The chat.openai.com frontend is very slow and frequently breaks, so I would consider using this. Have you considered adding different TTS providers? It doesn't get better than ElevenLabs right now, but they are also much more expensive than, for example, the Azure neural voices.


> is very slow and frequently breaks

True, and the free version does it a lot, almost on purpose.

The paid version is a lot faster and doesn't break as often, but it still breaks (e.g. for the last two days, the chat list in the sidebar disappeared and it showed a message saying "don't worry, your chats will show up eventually").


Yes, I plan to add other providers soon, and native text-to-speech as well.


This would be really useful if the API key could be stored in the config file


Do you know if people get charged for prompts on the original ChatGPT site now that the API is out? Or is it still free for users of the original site?


It's still free. On the site: "Free Research Preview."


A simple ChatGPT client can be just an .html and a .js file that runs entirely locally and stores data in browser local storage.
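
As a rough sketch of the idea (the storage key and function are made up; only the API endpoint is real):

    // Keep the whole conversation in localStorage, so nothing leaves the
    // browser except the API call itself.
    const KEY = "chat-history"; // hypothetical storage key
    const history = JSON.parse(localStorage.getItem(KEY) || "[]");

    async function send(apiKey, text) {
        history.push({role: "user", content: text});
        const res = await fetch("https://api.openai.com/v1/chat/completions", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "Authorization": "Bearer " + apiKey
            },
            body: JSON.stringify({model: "gpt-3.5-turbo", messages: history})
        });
        const reply = (await res.json()).choices[0].message;
        history.push(reply);
        localStorage.setItem(KEY, JSON.stringify(history)); // persist across reloads
        return reply.content;
    }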


Thank you!

I can't wait to test this! As others have mentioned, the "free" chat frontend is slow, and the "Plus" one is not much better. Also, at $20/month, based on my usage, it's actually more expensive than using the API.

The last hurdle: as ChatGPT is not GDPR compliant, it would be really interesting/useful to find a way to "hide" the queries from OpenAI and prevent your input from being used in future training. Basically, a self-hosted, non-leaking ChatGPT.



