Each vector is an array of n floats that represent a location of a thing in an n-dimensional space. The idea of learning an embedding is that you have some learning process that will put items that are similar into similar parts of that vector space.
The vectors don’t necessarily need to represent words, and the model that produces them doesn’t necessarily have to be a language model.
For example, embeddings are widely used to generate recommendations. Say you have a dataset of users clicking on products on a website. You could assume that products that get clicked in the same session are probably similar and use that dataset to learn an embedding for products. This would give you a vector representing each product. When you want to generate recommendations for a product, you take the vector for that product and then search through the set of all product vectors to find those that are closest to it in the vector space.
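As a toy illustration of that last step, here's a minimal brute-force sketch in Python with numpy (the product count, dimensionality and vectors are made up; in a real system they'd be learned embeddings):

    import numpy as np

    # Stand-in for learned product embeddings: 1000 products, 64-dim vectors.
    product_vectors = np.random.rand(1000, 64).astype("float32")

    def most_similar(product_idx, k=5):
        # Normalise so the dot product equals cosine similarity.
        vecs = product_vectors / np.linalg.norm(product_vectors, axis=1, keepdims=True)
        scores = vecs @ vecs[product_idx]     # similarity of every product to the query
        return np.argsort(-scores)[1:k + 1]   # skip position 0, which is the product itself

    print(most_similar(42))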
I’ve worked on e-commerce recs. Typically you would represent each product with a vector. Then finding similar products becomes a nearest-neighbour search over these vectors. Depending on your use-case it’s feasible now to do this search in the db using pgvector, or using something like solr/elastic which both support vector search in recent releases. You could also use something like faiss or one of the many nearest-neighbour libraries or dedicated vector search engines. (Since you are working with Elixir, you might find ExFaiss interesting [1][2][3]).
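If you go the faiss route, the search side is only a few lines. A rough sketch (the vectors here are placeholders; IndexFlatIP does exact search, and you'd swap in an approximate index at scale):

    import faiss
    import numpy as np

    dim = 64
    product_vectors = np.random.rand(1000, dim).astype("float32")  # placeholder embeddings
    faiss.normalize_L2(product_vectors)      # normalise so inner product == cosine similarity

    index = faiss.IndexFlatIP(dim)           # exact inner-product search
    index.add(product_vectors)

    query = product_vectors[42:43]           # vector of the product we want recommendations for
    _, neighbours = index.search(query, 6)   # top hit is the product itself
    print(neighbours[0][1:])                 # the 5 most similar products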
But I would say that for recommendations, searching the vectors is the easy part. The main work in getting good recommendations is generating a good set of product vectors in the first place. The quality of the recommendations is directly related to how you generate the vectors. You could use one of the many open-source language models to generate vectors, but typically that approach isn’t very good for product recommendations: it will just give you items that are textually similar, and textual similarity usually doesn’t translate into good product recommendations.
To get good product recommendations you’d probably want to build a custom embedding that captures some notion of product similarity, trained on signals you get from user behaviour. For example, products clicked in the same session, or added to the cart together, give a signal about product similarity that you can use to train a product embedding for recommendations.
This is a bit more involved, but the main work is in generating the training data. Once you have that, you can use open-source tools such as fasttext [4] to learn the embedding and output the product vectors. (Or, if you want to avoid training your own embedding, I’d guess that there are services that will take your interaction data and generate product vectors from it, but I’m not familiar with any).
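To make the fasttext route concrete, here's a rough sketch (the file name sessions.txt and the product IDs are hypothetical; the idea is that each line is one session written as a space-separated list of product IDs, so the skip-gram model learns that co-clicked products are similar):

    import fasttext

    # Each line of sessions.txt is one click session, e.g. "prod_17 prod_902 prod_54".
    model = fasttext.train_unsupervised("sessions.txt", model="skipgram", dim=100, minCount=1)

    vector = model.get_word_vector("prod_17")          # learned embedding for one product
    similar = model.get_nearest_neighbors("prod_17")   # rough "people also clicked" candidates
    print(similar[:5])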
> "selective treatment allowed Apple to pay an effective corporate tax rate of 1 percent on its European profits in 2003 down to 0.005 percent in 2014."
Tricks include transfer pricing: company A charges company B an arbitrary amount to reduce company B's profit to zero (so it pays no tax), while the corresponding income gets classed as exempt in company A.
That is the rate for Irish corporations, not foreign ones. Ireland, the Netherlands, and Luxembourg all negotiated bilateral deals with individual corporations, so their effective tax rates are at most 3%.
"Show HN is for something you've made that other people can play with. HN users can try it out, give you feedback, and ask questions in the thread.
On topic: things people can run on their computers or hold in their hands. For hardware, you can post a video or detailed article. For books, a sample chapter is ok.
Off topic: blog posts, sign-up pages, newsletters, lists, and other reading material. Those can't be tried out, so can't be Show HNs. Make a regular submission instead."
Add to that the fact that the meeting breaks up everyone's day, making it more difficult for attendees to have long uninterrupted periods of deep, concentrated work.
Mind you, alignment before said work is important. I had a colleague who started on an assignment the same day I did. We had an onboarding day (lots of stuff to explain), and only a few hours in he was rubbing his hands, asking when he could start coding.
But there wasn't anything to code yet, or if there was, it hadn't been explained to him yet, so what exactly was he going to write?
Moral: don't write code if there's no clear "what to build".
Eating also takes time and breaks up people's days. So should we not eat?
Some of the debate I'm seeing here is lacking synthesis of the tradeoffs. First, getting people to communicate is essential (a collective activity). Second, giving individuals time to work productively is essential. Some work is done best solo, some is done best collaboratively. The key is intelligently striking a balance.
Here are 2 good articles that follow up on the arguments presented by VWO in that article.
From the first link below: "They do make a compelling case that A/B testing is superior to one particular not very good bandit algorithm, because that particular algorithm does not take into account statistical significance.
However, there are bandit algorithms that account for statistical significance."
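As a rough illustration of a bandit that does weigh uncertainty (not the specific algorithm from those articles), here's a minimal Thompson sampling sketch with made-up conversion rates:

    import random

    true_rates = [0.04, 0.05]   # hypothetical conversion rates for variants A and B
    successes = [1, 1]          # Beta(1, 1) priors for each arm
    failures = [1, 1]

    for _ in range(10_000):
        # Sample a plausible rate for each arm from its posterior and show the best one.
        samples = [random.betavariate(successes[i], failures[i]) for i in range(2)]
        arm = samples.index(max(samples))
        if random.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    print("traffic per arm:", [successes[i] + failures[i] - 2 for i in range(2)])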
The software is awesome for the purposes of writing the book. The reason I didn't go with softcover as a selling platform is that the new EU VAT rules came in, and as a result I had to use a platform that supported the new VAT collection rules.
So I ended up using softcover to generate the ebooks and html, but using gumroad (and leanpub) to sell them. So it's totally feasible to use softcover for writing your book but self-host the resulting html and keep control of your sales data etc.
In theory, if you are selling to customers in the EU you are supposed to collect VAT from your EU customers and make the appropriate VAT return to the VATMOSS authorities. Or else stop selling to EU customers.
In practice, if you're based outside of the EU then you can probably just ignore these regulations (many smaller US sellers seem to be taking this approach). It's hard to imagine the EU tax authorities chasing you down.
Ok thanks. It sounds like this should be something your payment provider (softcover via proxy through Stripe or whatever they use) would take care of for you.
Other platforms that sell items globally tend to charge EU customers more, which probably accounts for the VAT, but in the end gives the author roughly the same amount per sale?
Yes, a lot of sales platforms now handle EU VAT. I've used Gumroad, Sendowl and Leanpub and they all handle the EU VAT problem. In each case the author sets a price. They then detect whether the customer is in an EU country and apply the VAT rate for that country, adding it on to the customer's total. So the author gets the same amount for each sale, but customers pay different amounts depending on which country they are in.
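The pricing logic is roughly this (the rates below are illustrative, not a current official list):

    VAT_RATES = {"DE": 0.19, "FR": 0.20, "IE": 0.23}   # illustrative destination-country rates

    def customer_total(author_price, country):
        # The author's price stays fixed; VAT is added on top for EU customers.
        return round(author_price * (1 + VAT_RATES.get(country, 0.0)), 2)

    print(customer_total(29.00, "DE"))   # 34.51 -> author still receives 29.00
    print(customer_total(29.00, "US"))   # 29.00, no EU VAT applied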
Nice work! I'm a long-time emacs user who thought that emacs was the best way to write clojure code. I was really happy with emacs as a clojure dev environment but after trying cursive recently it's now my preferred environment for writing clojure.
What made you switch? I mean, if you are already familiar with Emacs, I don't see any reason to switch. CIDER and clj-refactor offer a lot more functionality, and the debugger is fantastic.
When I was using emacs, I would spend a good chunk of time maintaining my editor, not coding. Cursive Just Works, and as a consultant it means I can put my full time into client work.
> This package is not written for speed. It is meant to serve as a working example of an artificial neural network. As such, there is no GPU acceleration. Training using only the CPU can take days or even weeks. The training time can be shortened by reducing the number of updates, but this could lead to poorer performance on the test data. Consider using an existing machine learning package when searching for a deployable solution.
It seems the main aim of this software is educational, not production use.
Actually, I added that disclaimer on performance because of your first comment. I realized people were getting the wrong idea about my little example, and were thinking this could be used in place of packages like Caffe, Torch7, Theano, TensorFlow, etc.