
> There's no reason to predict a branch if you're not going to execute speculatively.

Not quite. Branch prediction is typically used on non-speculative architectures in order to avoid pipeline bubbles. (You could argue that pipelining itself is a form of speculation.)

Here is the branch prediction documentation for one of the processors they claim is not vulnerable. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....

Whether or not they're vulnerable has more to do with how their pipeline is structured. It's possible for an architecture to be vulnerable if a request to the load store unit can be issued in the window between post-branch instruction fetch/execute and branch resolution. Eyeballing the pipeline diagram from the above docs, it looks like you can maybe get a request to the LSU off before the branch resolves. *dramatic music*


"There is nothing to support bitcoin except the hope that you will sell it to someone for more than you paid for it.”

Genuinely asking, isn't this the plan for many people? How is bitcoin different from other commodities or property in this regard? Would Bogle say the same thing about those investments?


There is a big difference. If you buy wheat futures and nobody wants to buy them, you can take delivery of a bunch of wheat. If you buy investment real estate and nobody wants it, you can live in it, or build things on it. If you buy stock in a random Fortune 500 company and nobody wants it, you can take possession of a bunch of desks or factories or inventory or whatever their deal is. And if you take possession of a bunch of US currency you can be assured that you'll be able to use it to settle debts or pay your taxes as long as the US government is around and has better guns than everyone else.

Bitcoin is different from all those things.


> If you buy stock in a random Fortune 500 company and nobody wants it, you can take possession of a bunch of desks or factories or inventory or whatever their deal is

Not really. The value of the stock approaches zero, and when the company closes up shop I doubt you get anything at all as an investor. Assets would typically be sold to pay off outstanding debts.


More fundamentally, the value of a share of stock is tied to earnings. When companies accumulate capital beyond the horizon of their operational needs, they issue dividends.

Yes, companies can go out of business, but that's not the point Bogle is making. He's not saying stocks are riskless. He's saying that their valuations are tied to notional future earnings and the dividends they will generate.


All those things typically have a potential discounted-cash-flow present value (DCF PV) associated with them. Bitcoin is 100% speculative.


Yes, there's really nothing that gives Bitcoin value beyond new investors being willing to pay more for it than previous investors. Most people don't transact using it, and the network is far too slow to support that use case anyway. It is a "store of value" that loses or gains half its value every few months. The community is torn between 3 separate forks, and some of the earliest Bitcoiners have gone all in on Bitcoin Cash. The majority of the hashing power is controlled by a few entities, a far cry from the decentralization originally envisioned. The communities online no longer talk about Bitcoin as a useful tool; they just talk about getting rich off of it. coinmarketcap.com, a site only for checking the price of cryptocurrencies, is a top 500 site globally according to Alexa, meaning there are a lot of people checking the price of bitcoin but few people using it for anything.


> most of the early Bitcoiners have gone all in on Bitcoin Cash.

this is an absurdly untrue statement


Ok, "most" was the wrong word, but Roger Ver and Gavin Andresen were instrumental in the success of Bitcoin and they now support BCH.


Gavin Andresen maybe, but what has Roger Ver done?


In April 2011 Roger Ver bought enough bitcoin to drive the price up from $1.89 to $3.30. His company was the first to accept Bitcoin for payment. He paid for the first radio and billboard advertisements for Bitcoin. He funded BitInstant, Blockchain.info, BitPay, Ripple, and Kraken. He helped found the Bitcoin foundation. He started an online store that accepted only bitcoin. He essentially became a spokesman for Bitcoin.


Bitcoin is processing roughly an order of magnitude more transactions than Bitcoin Cash right now. Though the fact that Bitcoin Cash is only running at about 1/100th of its nominal capacity hasn't stopped the community cheering for another hard fork to increase that capacity limit further, presumably because its price is driven by speculation about how much it could do if anyone actually cared about it.


> How is bitcoin different from other commodities or property in this regard?

The key distinction is that, mercurial though their price fluctuations may be, traditional commodities such as oil or real estate at least have some intrinsic value, and hence an intrinsic floor to their valuations. Meanwhile, to the extent that any of these "coins" have such an intrinsic value (if they can even be thought of as "currencies" at all), it is extremely hard to pin down.

Which is Bogle's central point: to the extent that any of these instruments have "value", it's in the belief that ... that value will keep going up, and up, and ever up.


Property can be used to generate returns (rents). Bitcoin cannot.


Mature stocks will usually start paying a dividend.

Equity in a company means you own part of that company.

Owning Bitcoin means you own a BTC.


If I buy stock in a company, and the company pays dividends, I get the dividend income until I sell the stock. That is, I bought a future income flow. Same thing if I bought an income-producing property.

Raw land? Non-income-producing property? Gold? Not so much.


"This is a great question, in part because it has no easy answer."

https://www.investopedia.com/ask/answers/09/difference-betwe...


Acquisition and administration of public housing projects. The world needs more public housing for people of all economic statuses. It seems incredibly time-consuming to build or convert private housing due to the vast amounts of paperwork and the groups that must be involved to finance and manage these projects. Automation could help to find a business plan, search for funding, run governance, and do accounting for the ongoing operations.


FWIW it should be possible to vectorize the search across the row store. With 24-byte tuples (assuming no inter-row padding) you can fit about 2.7 rows into a 64-byte cache line (a 512-bit SIMD register). Then it's just a matter of proper bit masking and comparison. Easier said than done, I figure, because that remainder is going to be a pain. Another approach is to use gather instructions to load the first column of each row, effectively packing that column into a register as if it were loaded from a column store, and then do a vectorized compare as in the column-store case.

All of that is to underscore that it's not that one format vectorizes and the other doesn't. The key takeaway here is that with the column store, the compiler can automatically vectorize. This is especially a bonus for JVM-based languages because AFAIK there is no decent way to hand-roll SIMD code without crossing a JNI boundary.


This isn't that hard. A sane person would do 3 cache lines at once, since 192 bytes = 8 × 24-byte tuples. You would do 3 AVX-512 loads, then a masked compare at the proper lanes (actually, I think the masked compare instructions can take a memory operand, which might get you uop fusion so the load+compare is one instruction), yielding 3 masks of 16 bits each (of course, most of the bits would be junk). The 16-bit masks can be shift+or'ed together (whether as "k-regs" or in GPRs) and the correct bits can be extracted with PEXT.

The downside of this is that you are still reading 6 times as much data. A straightforward implementation of this should not be CPU bound IMO. If a Skylake Server can't keep up with memory doing 32-bit compares I'll eat my hat.
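If the mask bookkeeping is hard to picture, here's the bit arithmetic modeled in plain Python (just a scalar sketch of where the key lanes land and what the PEXT step does, not actual SIMD code):

    # 8 tuples x 24 bytes = 192 bytes = three 64-byte (16-lane) registers.
    # The 32-bit key of tuple i sits at lane i*6 of the combined 48 lanes.
    key_lanes = [i * 24 // 4 for i in range(8)]        # [0, 6, 12, 18, 24, 30, 36, 42]

    # Compare mask for each of the three registers (16 lanes apiece).
    masks = [0, 0, 0]
    for lane in key_lanes:
        masks[lane // 16] |= 1 << (lane % 16)
    print([hex(m) for m in masks])                     # ['0x1041', '0x4104', '0x410']

    def pext(value, mask):
        """Software model of the BMI2 PEXT instruction."""
        out, bit = 0, 0
        while mask:
            low = mask & -mask                         # lowest set bit of the mask
            if value & low:
                out |= 1 << bit
            bit += 1
            mask &= mask - 1
        return out

    # Say the masked compares reported matches for tuples 2 and 5:
    compare_bits = (1 << key_lanes[2]) | (1 << key_lanes[5])
    key_pattern = masks[0] | (masks[1] << 16) | (masks[2] << 32)
    print(bin(pext(compare_bits, key_pattern)))        # 0b100100 -> tuples 2 and 5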

Gather is not a good idea for this purpose. Gather is very expensive. It's really mainly good for eliminating pointer chasing and keeping a SIMD algorithm in the SIMD domain.


Possible, yes. Performant or pleasant? Maybe not :)



I notice on the tech specs page that the Mini also has Bluetooth [0]. Would love to know if this means all connected Home devices will play (the same) audio via Bluetooth.

[0] https://store.google.com/product/google_home_mini_specs


Undecidable in general, but there are two approaches I've seen work based on AST manipulation and comparison:

1. Solvers that search for applicable rewrite rules to transform X into Y, such as Cosette for SQL. These may not terminate, because undecidability lol http://cosette.cs.washington.edu/

2. Canonicalization of the AST. This is a form of #1 but much more restricted; the hope is that functions that are equivalent end up canonicalized the same way. LLVM and GCC do this for a variety of reasons. In the example given, you'd hope that both functions get canonicalized to either the left- or the right-hand form. https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations... http://llvm.org/docs/MergeFunctions.html
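As a toy illustration of approach #2 (nothing to do with how GCC/LLVM actually implement it), just sorting the operands of commutative operators is already enough to make some trivially equivalent expressions compare equal:

    import ast

    def canonicalize(node):
        # Recurse into +/* and order their operands deterministically.
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            node.left = canonicalize(node.left)
            node.right = canonicalize(node.right)
            if ast.dump(node.left) > ast.dump(node.right):
                node.left, node.right = node.right, node.left
        return node

    def same_shape(src_a, src_b):
        a = canonicalize(ast.parse(src_a, mode="eval").body)
        b = canonicalize(ast.parse(src_b, mode="eval").body)
        return ast.dump(a) == ast.dump(b)

    print(same_shape("x + y * z", "z * y + x"))   # True
    print(same_shape("x + y", "x - y"))           # False

Real canonicalizers work on an IR with far more rewrite rules (constant folding, reassociation, etc.), but the "hope equivalent things converge to the same form" idea is the same.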


> "The growing body of Big-data, HPC, and especially machine learning applications don’t need Windows and don’t perform on X86. So 2017 is the year Nvidia slips its leash and breaks free to become a genuinely viable competitive alternative to x86 based enterprise computing in valuable new markets that are unsuited to x86 based solutions."

Google's TPU paper [0] showed that CPUs were relatively competitive in the machine learning space (within 2x of a K80). It's not true that x86 doesn't perform on these workloads.

The existence of the TPU itself threatens Nvidia's dominance in the ML processor space. Google built an ASIC in a short time period that more than rivals a GPU on these tasks. The TPU performance improvements (section 7) make it look very straightforward to get even better performance with a few more years of development effort. With developers moving to higher level libraries, migration between GPU/CPU/TPU becomes painless, so they'll just go with whatever has the lowest TCO. (Google hosted TPUs?)

Aside from machine learning tasks, the author seems to be advocating for the cpu/gpu combinations that AMD is already selling to game console manufacturers. Granted, Nvidia has a piece of this via the Switch. If Microsoft/Qualcomm goes full-on with their ARM-based x86 emulation, then perhaps a future ARM-based Xbox is in the cards driven by an Nvidia chip? /speculation

[0] https://arxiv.org/abs/1704.04760


This is really great work, and it's awesome that they're developing it as open source. Combined with the results from the HyPer folks, it sure is starting to look like using LLVM to specialize code on the fly is a good idea for any data processing engine.

Looking more closely at the benchmarking results has me scratching my head, though: their reported 16x performance benefit from codegen for TPC-H Q1 has seemingly dropped to 2x when compared to the [REDACTED] database. What's happening?

My guess is that Impala is still somewhat inefficient in a few places that need work (which is OK, this is not a criticism of that). I bet that [REDACTED] is quite efficient, having been in development at least 2x longer than Impala, maybe even closer to 10x. In which case, getting within 2x is fantastic!


How do people in production handle the possibility that your service might miss a webhook notification? If you miss a notification you'll end up with stale data and you won't know it.

Slack retries for a while but will then just give up. Another webhook provider I've looked at says nothing at all about this sort of thing. How do folks deal with this in production systems?

Seems to me like the best way to address this issue is to use the webhook as a hint that you need to run some other process that guarantees you've got all updates.


When I was at IFTTT (a few years ago, so it's definitely changed since then) we tried not to rely on the content of the webhooks and just used them as a hint as you describe to fetch new data. Not every API made this easy though.

If receiving a webhook is critical, you should make your receiver do as little as possible to place the event into a resilient queueing system and then process them separately. That won't save you from bad DNS, TLS, etc. configs but it should help reduce the possibility that you DoS yourself with a flood of webhook events.
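A minimal version of that receive-and-enqueue shape might look like this (Flask and Redis are just arbitrary example choices; any web framework plus a durable queue such as SQS, RabbitMQ, or Kafka would do):

    import json
    from flask import Flask, request
    import redis

    app = Flask(__name__)
    queue = redis.Redis()

    @app.route("/webhooks/<provider>", methods=["POST"])
    def receive(provider):
        # Do as little as possible: persist the raw payload and return fast.
        queue.rpush("webhook_events", json.dumps({
            "provider": provider,
            "headers": dict(request.headers),
            "body": request.get_data(as_text=True),
        }))
        return "", 202

    # A separate worker pops from "webhook_events" and does the real work,
    # retrying on its own schedule without blocking the HTTP receiver.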

Also (shameless plug), you could monitor and log them (we offer retries if your server fails): https://www.runscope.com/product/alerts


I would prefer to implement the sending of webhooks in bulk: if the consumer falls behind, they receive up to 100-1000 webhooks per request (depending on the size and complexity of each individual webhook; IDs only is 1000, complex documents 100). This drastically cuts down on the number of concurrent requests to a single client when load is high, or when the consumer broke down for a period of time.

Unfortunately, developers writing code to receive batch requests are often... inadequate, to say the least. They'll write basic looping code without any error/exception handling; so if the 3rd item in a bulk request of 100 items causes a server-side error for them, they throw a 500 Internal Server Error or similar and fail to continue processing items 4 through 100. You simply cannot batch webhooks as a producer unless you treat a single failed batch as a cue to drop to "batches" of size 1 until you hit the error on an individual request, at which point you can return to bulk. Rinse and repeat.
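Roughly, that fallback looks like this (a sketch; `send` is a stand-in for POSTing a list of events and returning True on a 2xx, and `mark_failed` parks an event for later retries/alerting):

    def deliver(events, send, mark_failed, batch_size=100):
        i = 0
        while i < len(events):
            batch = events[i:i + batch_size]
            if send(batch):
                i += len(batch)
                continue
            # Batch rejected: isolate the offending event one at a time,
            # then return to bulk right after it.
            for j, event in enumerate(batch):
                if not send([event]):
                    mark_failed(event)
                    i += j + 1
                    break
            else:
                # Every event succeeded individually; the batch-level
                # failure was spurious, so move past the whole batch.
                i += len(batch)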

Honestly, being the producer sending webhooks to consumers which are written by random developers is a nightmare. You have to understand that your customers will not write proper code to accept your webhook requests, even if each request is for a single webhook. You also must understand that your customers will not look to blame themselves for shitty code. You can retry 1,000 times over a 48 hour period, and if their code still fails to process the webhook, it will be YOUR fault, not theirs. Truthfully, it is horrible to be on the sending end of webhooks to random developers/customers.


Transactions are obviously too enterprisey for fast moving unicorns; better spend 3 weeks to badly hack together a ridiculous farce.


I don't understand: if it's such a nightmare, why don't you (the producer) create the code/libraries to consume those webhooks, at least for the 2 most common platforms (e.g. PHP, Java)?


Stripe has a retry policy as well.

You can set up something where it will alert you if there are too many failures in a certain time period. That isn't offered by Stripe but you can build it.

If you mean in the case of "catastrophic failure", there is none.

If there is a "catastrophic failure" (machine gets shut off for a week, data center blown up, whatever), there are probably bigger issues or we probably would already know.


Stripe has an "events" API that can be polled to receive the same content that you would have received via Webhook [1].

(Disclaimer: I work there.)

If you missed some Webhooks due to an application failure, it's possible to page through the events and look for omissions. I've spoken to at least one person integrating who had this sort of setup running as a regular process to protect against the possibility of dropped Webhooks. This usually works pretty well, but does start to break down at very large scale, where events are being created faster than you can page back.
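A rough sketch of that kind of reconciliation job with the stripe-python client (the `already_processed`/`process_event` stubs here are stand-ins for your own datastore and handler):

    import stripe

    stripe.api_key = "sk_test_..."          # your secret key

    def already_processed(event_id):
        return False                        # stand-in: check your own datastore

    def process_event(event):
        print("handling", event.id, event.type)   # stand-in: reuse your webhook handler

    def reconcile(max_events=10_000):
        seen = 0
        # /v1/events returns events newest-first; walk back and fill any gaps.
        for event in stripe.Event.list(limit=100).auto_paging_iter():
            if not already_processed(event.id):
                process_event(event)
            seen += 1
            if seen >= max_events:
                break    # at very large scale this breaks down, as noted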

The possibility of dropped events is a major disadvantage of Webhooks in my mind -- if you consider other alternatives for streaming APIs, like a Kafka/Kinesis-style stream (over HTTP) that's simply iterated through periodically with a cursor, you avoid this sort of degenerate case completely, and also get nice things like a vastly reduced number of total HTTP requests and guaranteed event ordering.

(But to be clear, Webhooks are overall pretty good.)

[1] https://stripe.com/docs/api#events


Oh gosh, that is super neat! :)

I never even thought of using it that way. I just use events to check that it is a valid Stripe event (Probably easier / better to set up the ELB to only listen to certain addresses)


Some further related reading: Fowler talks about polling for events in some of his Enterprise Integration stuff http://www.martinfowler.com/articles/enterpriseREST.html

EDIT: Not Fowler, but his site lol.


We[1] had a similar problem with clients reporting lost callbacks[2] (our term for webhooks). To solve it, we built two options.

- Get a notification email every time a callback fails. The email contains the same information the callback was supposed to deliver.

- Retries. We retry for the next 24 hrs (max) at an interval of 5 mins, or until the callback call succeeds (within those 24 hrs). We created a sub-resource called `calls` (/callbacks/[id]/calls) that keeps the status of the call we made. If it succeeds, the status changes to "SUCCESS"; if it fails, it remains "FAILED". If the receiver system is still down after 24 hrs and the call never succeeds, the developer can make a call to GET /callbacks/[id]/calls?status=FAILURE and receive all the failed calls. They can process the content and do a PUT /callbacks/[id]/calls?id=ID1&id=ID2&id=ID3... with body `{ "status": "SUCCESS" }` to mark them as "SUCCESS".

The calls are saved for up to 7 days, so the dev has enough time to fix their server issues and get back all the lost callback calls. This solved most of the client issues.

* An added benefit of this came for devs who could not receive an inbound POST from us into their network due to firewall restrictions. The firewall restriction defeated the purpose of live callbacks, but with the `status` option they only checked for new (`FAILED`) notifications once every 2 hrs or so, and marked the ones processed as `SUCCESS`. This way, they only look for `FAILED` and process when they have one; else, nothing to do.

[1] Whispir - https://www.whispir.com/ [2] https://whispir.github.io/api/#handling-callback-failures
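On the consumer side, the recovery loop described above comes out to something like this (a rough sketch with the `requests` library; the base URL, callback id, and `reprocess` handler are placeholders, and auth is omitted):

    import requests

    BASE = "https://api.example.com"        # placeholder, not our real base URL
    cb_id = "my-callback-id"                # placeholder callback id

    def reprocess(call):
        print("re-processing", call["id"])  # stand-in for the real handler

    # Fetch every call that failed to deliver within the retention window.
    failed = requests.get(f"{BASE}/callbacks/{cb_id}/calls",
                          params={"status": "FAILURE"}).json()

    recovered = []
    for call in failed:
        reprocess(call)
        recovered.append(call["id"])

    # Mark the recovered calls as handled so they drop out of the failure list.
    if recovered:
        requests.put(f"{BASE}/callbacks/{cb_id}/calls",
                     params=[("id", i) for i in recovered],
                     json={"status": "SUCCESS"})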


I have recently moved all received webhooks to a job queue and have been very happy. You can retry the processing on your own terms.


This.

Previous devs were doing expensive things whenever we received webhooks. This meant we DoS'd ourselves every time a sizable number of webhooks came our way.

Set up a tiny server on Heroku that received the webhooks and put them on a queue. A worker with a configurable concurrency level later forwards the events from the queue.

Dropped from four digits' worth of 502s and 504s weekly to virtually none.


Agreed.

It also allows you to do testing by injecting pre-cooked payloads into your queue system.


Maybe webhook providers could provide an endpoint where one could poll for events that failed to deliver.


The good APIs do, but it's still a loss for both sides.

a) The producer of the events has to store them in semi-permanent storage. I've been there and done that: failed webhooks result in a table of tens of millions of rows, even if each event is only retained for 48 hours. It's astounding how many events fail to process. And I've been through extensive verification that there is truly no problem on our side - it's always the client who is wrong. Emails back and forth for weeks with the client screaming "it's your fault!" - only to finally receive an "oops, we found the problem on our end... sorry".

b) Frankly, if the consumer of the events fails on a single webhook more than 5 times in a 24 hour period, that event is a permanent loss. The reason it fails consistently is because that specific event is a permanent failure to process on the consumer's side. It is probably throwing a 500 Internal Server Error or similar - every single time. 0.001% of webhook consumers actually have emergency alerts when webhooks fail on their end, so the job will continue to throw a silent/unlogged/unnoticed/ignored error no matter how many times you retry. These are the same type of developers who will never poll your "failure queue", because they don't even understand that their consumer endpoint throws 500 Internal Server Errors on 10% of your requests. You're trying to provide a service to developers that live in a fantasy world where errors and exceptions never happen on their end.

It's a simple fact that developers who consume webhook requests are a disgrace. Chances are that if a request fails two times, it will never succeed. And yet the best APIs will try hundreds/thousands of times over a 24 hour period - simply to prove to that client that it is their fault that they are not processing webhooks properly. There is only so much a webhook producer can do. There is no magic we can do if the consumer is copy/pasting PHP snippets from Google or Stackoverflow.

Story time. The most memorable situation I can remember is a client who was experiencing 100% webhook consumer failure for more than three weeks. The emails from their team - and subsequent phone calls from their CTO - were absolutely stunning; it got to the point that we were hounding our own business people to drop them as a client, the verbal abuse was that bad. It turns out they had a bunch of PHP developers who were, for some reason, writing their consumer webhook endpoint in C for the first time. They were trying to parse the custom "id" field - which they themselves had sent us as a string in a JSON field - as an integer, and their code choked on reinterpreting their own string as an integer. It hurts to even think about that case.

tldr; Fuck webhook consumers. Incompetent developers who don't know how to handle errors that are 100% their fault.

Funny aside: the most amusing cases come from PHP and .NET developers who expose their internal server errors in production. When you can copy/paste the response they gave you on a webhook because they are calling an undefined function or method... pure bliss.


You could also help customers who apparently have trouble properly connecting to your APIs by giving better error returns (got type A, expected type B), providing client libraries or giving more extensive support (for a price). Blaming the customer is easy, providing a way for even those "incompetent developers" to interface with you in a way that is easy to understand and debug for all parties is hard.


The truly great developers find a better way than only retrying webhooks and prepare a client library that the customer can just plug in to their code :-)


I like what Shopify does here: because your app is tied to a partner account, they can email you saying "this payload has failed 20 times in succession". If it fails too many times, the webhook is uninstalled.

Not to be snarky, but it's a distributed system. There's no way to guarantee you've got all updates! At a certain combination of latency and volume polling becomes impossible, so webhooks (or something analogous) are all you've got :)


> At a certain combination of latency and volume polling becomes impossible, so webhooks (or something analogous) are all you've got :)

Isn't it the opposite? At a certain volume, when each polling request always returns results, polling becomes more efficient than "interrupts". It's only at low volumes that webhooks are more efficient, since polling would have to issue a lot of requests with nothing in the response if low latency is required.


Assuming here you mean something like a classic REST-alike "/events" endpoint which returns a bunch of stuff that's changed since the last time you requested it.

In that case, as the number of events grows, the HTTP transaction overhead goes to zero with polling, yeah.

But now you have a bunch of extra things which will impact your latency:

- The third-party service will do more work preparing the payload, meaning that the earliest event on the list no longer hits the wire right away

- related: someone might be holding a lock on event 63 of 100. Now other events have to wait for it before they can hit the wire

- In your application code, you may have to read the entire request before you can validate it or do anything with it (at least, this goes for APIs which speak JSON)

- You probably have to commit your transaction for the previous page of events before you can start your next request. Otherwise, whichever side of the network is keeping tabs on your current pointer in the list, that pointer may end up in the wrong place. Oops!

- If more events happen during the time it takes you to request a page than will fit on a page, then you're really stuck.

- An error anywhere in the super-http-transaction (network, user code...) now means that an entire page of updates has been delayed rather than just one.

It's possible to remove the sequential-ness constraint from our hypothetical "/events" but not without introducing other fun new problems.


By periodic reconciliation of the full dataset.


Yeah, I feel the best way is just for providers to give an RSS feed as the primary way of listing events and then notify with PubSubHubbub directly. Big advantage: everything already exists and is standard.


The easiest implementation would be a serial number. Then the client can check for holes in the number series.
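For example (a toy sketch; the provider would need to expose the sequence number on each event plus a way to re-fetch the ones you missed):

    # Given the last sequence number we fully processed and the sequence
    # numbers we actually received, report the holes to re-fetch.
    def find_gaps(last_processed_seq, received_seqs):
        expected = set(range(last_processed_seq + 1, max(received_seqs) + 1))
        return sorted(expected - set(received_seqs))

    print(find_gaps(10, [11, 12, 15, 16]))   # [13, 14] -> re-fetch these via the API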

