
The .pth file seems to be the model and its weights, saved as described here:

https://pytorch.org/tutorials/beginner/saving_loading_models...

The .chk file is an md5 hash of the file, and the .json file contains this for the 7B model:

    {"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": -1}


Thanks. So from that PyTorch doc it seems the pickle format stores the names of the model classes, but not the classes themselves. I'm sure someone will figure it out though!
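To make the distinction concrete, here's a toy illustration (a hypothetical TinyModel, nothing to do with LLaMA's actual classes): pickling a whole module records the import path of its class, so loading fails anywhere the class isn't importable, while a state_dict is just a dict of tensors:

    import torch
    import torch.nn as nn

    class TinyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(4, 2)

    model = TinyModel()

    # Pickles a *reference* to __main__.TinyModel; torch.load fails in any
    # process where that class can't be imported.
    torch.save(model, "whole_model.pth")

    # Stores only tensors; no class definition is needed to load it back.
    torch.save(model.state_dict(), "state_dict.pth")
    print(list(torch.load("state_dict.pth").keys()))
    # ['linear.weight', 'linear.bias']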


The code (including the model) is here:

https://github.com/facebookresearch/llama

I already got the 7B model to generate text on my GPU! The first example prompt generated this:

[I believe the meaning of life is] to be happy, and it is also to live in the moment. I think that is the most important thing. I'm not really a party girl. I'm not a girl's girl. I have a really small group of close girlfriends and that's all I need. I believe in equal rights for everyone. I'm not a rebel. I don't really rebel against anything. I'm a very traditional girl, very loyal. I'm a mum's girl and I'm a dad's girl. People have a right to know what's going on. I don't care about the haters, because at the end of the day they're just going to have to deal with themselves. I've been getting more and more into fashion since I was about 16. I know I'm a little different, but so what? I think that's good. I don't think you should be like everyone else. It's my birthday, and I'll cry if I want to. I've always been a huge fan of fashion, and I've always liked to dress up


What GPU is it and how long did it take?


A 4090; about 10 seconds to load the weights and another 15 seconds to generate all the completions from the example script.


Yikes, thanks

I have a 2060, and I'm too afraid and poor to buy a 4090 after import duties and taxes in a tropical country.


I was able to drop the batch size to 5, and the VRAM use then seemed to be around 15GB. Some of that is surely unnecessary, and if you rewrite the outer products to use less VRAM you might get away with even less. Eventually someone will make a library so you can run it without extra work.
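For a rough sense of why ~15GB is plausible (back-of-the-envelope numbers of mine, not measurements):

    # 7B parameters in fp16 take ~13 GB for the weights alone; the rest
    # (KV cache, activations) scales with batch size and sequence length,
    # which is why dropping the batch size frees VRAM.
    n_params = 7e9
    bytes_per_param = 2  # fp16
    print(f"weights: ~{n_params * bytes_per_param / 1024**3:.1f} GB")  # ~13.0 GB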


Yeah, true. Do you think that's a realistic expectation, though? I ask given the events that led to the leaking of the models. I'm genuinely not sure what the optics / real-world ramifications are of being publicly associated with projects that leverage models obtained via torrents, through either hacking or negligence.


If you look at how much infrastructure was quickly built around Stable Diffusion, the same might happen here. It also depends on how useful the model is, but judging from the scores it looks quite useful, and it's "uncensored", unlike the commercial online models, which is valuable on its own. I suspect Facebook won't care and will be happy to get people using an offline model, since that means Microsoft and Google make less money from online models. The model itself is licensed under the GPL, but I have no idea what that means when it comes to model weights.

Edit: It looks like it can code. I tried autocompleting from the first two lines and it wrote the rest. Local GitHub Copilot, here we come?

    //find index of element in sorted array in O(log(N)) time using binary search
    int find_idx(int a[N], int element) {
        int low = 0, high = N-1;
        while (low <= high) {
            int mid = (low + high) / 2;
            if (a[mid] == element)
                return mid;
            else if (a[mid] < element)
                low = mid + 1;
            else
                high = mid - 1;
        }
        return -1;
    }
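(As quoted, the snippet assumes N is a compile-time constant, so it won't compile standalone, but the completion itself is correct binary search.) If you want to try the same thing, here's a sketch of feeding such a prompt, assuming the generate() API in the repo's example.py, where generator is the LLaMA object built by its load() helper:

    # Drop this into the repo's example.py in place of its prompts list;
    # "generator" is the object example.py builds via load(...).
    prompts = [
        "//find index of element in sorted array in O(log(N)) time using binary search\n"
        "int find_idx(int a[N], int element) {"
    ]
    results = generator.generate(prompts, max_gen_len=256, temperature=0.8, top_p=0.95)
    for result in results:
        print(result)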



