
I absolutely love AutoHotKey. I've been using it for over 15 years. I used to use it to format my HTML back when I only used a text editor (similar to what this link is doing for Markdown). I never use my right Ctrl key, so I map a lot of stuff to it. I used to have a macro on RightCtrl+B that surrounded the selection with <b></b>, and later <strong></strong> (yes, it was that long ago). I had some nifty ones with logic for adding <a> tags (looking for @ and http://).

These days I use it for fewer things, but it's still really useful. One of those is adding Ctrl+Enter-to-send to chat applications that don't offer it, like Google Hangouts (basically swapping the Enter behavior):

    ; Only when a Google Hangouts window in Chrome is active
    #IfWinActive Google Hangouts ahk_class Chrome_WidgetWin_1

    ; Ctrl+Enter sends the message
    ^Enter::
    Send {Enter}
    Return

    ; Plain Enter inserts a newline (Shift+Enter) instead of sending
    Enter::
    Send +{Enter}
    Return
Another extremely useful one is this, which types what's on the clipboard rather than pasting it:

    ; RightCtrl+V: type out the clipboard contents keystroke by keystroke
    >^v::
    SendRaw %clipboard%
    Return
And when I also hold Shift, it does it slowly (for input-lag situations):

    ; RightCtrl+Shift+V: same, but with a 500 ms delay between keystrokes
    >^+v::
    SetKeyDelay 500
    SendRaw %clipboard%
    Return
Extremely useful for two main reasons: (1) input boxes on websites that prevent pasting, and (2) certain remote sessions that prevent pasting (like Kaseya or any kind of remote console application).

* Get Your Shit Together – https://getyourshittogether.org/

* What to Do Before You Die: A Tech Checklist – https://archive.is/6vjqQ

* Cheat Sheet For If I’m Gone – https://archive.is/lnWX6 and https://github.com/christophercalm/if-im-gone/blob/main/exam... (HN discussion: https://news.ycombinator.com/item?id=31748553)


Many years ago (2012), Delta in-flight wifi would allow DNS queries out without paying. Being a very frequent flyer, I used to run an IP-over-DNS tunnel using iodine [1]. It was slow, but it worked. I wonder if they’ve blocked that hole yet.

[1] https://code.kryo.se/iodine/


I used GnuCash for years before switching to beancount (https://bitbucket.org/blais/beancount/ with smart_importer: https://github.com/beancount/smart_importer) and fava (https://github.com/beancount/fava/). It's much easier to work on your journals (ledger, trades, prices...) since they are just text files. It's really great if you're using the Beancount package for Sublime: https://packagecontrol.io/packages/Beancount.

More importantly, the importers (for all my banks and financial services) not only let me import and reconcile all transactions, but also archive all documents (including PDFs, text files, etc.) in one well-organized directory: each file is saved into a folder that corresponds to my account structure, such as Asset:Current:Cash, Liability:Mortgage, Income:Salary, Expenses:Health:Dentist. It's great to rely on fava (example: https://fava.pythonanywhere.com/example-beancount-file/incom...) to check your accounting (with all files listed in the journal by date, with tags and links and other neat features) and still be able to browse documents in your file browser.
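For a sense of what those plain-text journals look like, here's a minimal made-up Beancount snippet (the accounts, payee, and amounts are invented, and I'm using Beancount's default plural root account names):

    ; Declare accounts before using them
    2024-01-01 open Assets:Current:Cash USD
    2024-01-01 open Expenses:Health:Dentist USD

    ; A transaction: flag, payee, narration, tag, then postings
    ; (the last posting's amount is inferred to balance the entry)
    2024-03-15 * "Smile Dental" "Annual cleaning" #health
      Expenses:Health:Dentist   120.00 USD
      Assets:Current:Cash

Because it's just text, it diffs, greps, and versions like any other source file.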


For kicks, I'm going to throw out a possibly odd analogy.

The length of a string determines how quickly it vibrates (assuming tension is the same). Shorter strings vibrate faster than long strings. If you're in a noisy environment and you want to make sense of all of the chaotic sounds around you, one way to do it would be to take a bunch of strings of different lengths (say a piano) and see which ones resonate more than others. The gentle ringing of those piano strings, some louder than others, tells you which frequencies are more dominant in the surrounding acoustic environment, because they cause their matching strings to resonate more.

As I get older (I'm in my forties), I feel like a lengthening string. When I was a twenty-something programmer, I could tell you how things had changed over a year or two, but trends or cycles on longer timescales were hidden from me. Now I know what a decade or two feels like and can intuitively sense cycles of that scale. At the same time, shorter trends are harder for me to pick up on now; they feel like noise, beneath my notice.

Having people of different ages in your organization is incredibly valuable because they all resonate at different timescales like this and help you pick up chronological patterns at frequencies you'd otherwise miss.


140 https://news.ycombinator.com/item?id=4247615

138 https://news.ycombinator.com/item?id=15603013

108 https://news.ycombinator.com/item?id=18442941

93 https://news.ycombinator.com/item?id=13436420

86 https://news.ycombinator.com/item?id=8902739

81 https://news.ycombinator.com/item?id=11042400

81 https://news.ycombinator.com/item?id=14948078

76 https://news.ycombinator.com/item?id=6199544

65 https://news.ycombinator.com/item?id=12901356

63 https://news.ycombinator.com/item?id=35083

60 https://news.ycombinator.com/item?id=7135833

58 https://news.ycombinator.com/item?id=14691212

57 https://news.ycombinator.com/item?id=35079

57 https://news.ycombinator.com/item?id=18536601

55 https://news.ycombinator.com/item?id=9224

55 https://news.ycombinator.com/item?id=21260001

54 https://news.ycombinator.com/item?id=16402387

53 https://news.ycombinator.com/item?id=9282104

53 https://news.ycombinator.com/item?id=23285438

52 https://news.ycombinator.com/item?id=14791601

51 https://news.ycombinator.com/item?id=9440566

51 https://news.ycombinator.com/item?id=22787313

50 https://news.ycombinator.com/item?id=12900448

49 https://news.ycombinator.com/item?id=11341567

47 https://news.ycombinator.com/item?id=19604657

42 https://news.ycombinator.com/item?id=20609978

42 https://news.ycombinator.com/item?id=2439478

40 https://news.ycombinator.com/item?id=14852771

39 https://news.ycombinator.com/item?id=12509533

38 https://news.ycombinator.com/item?id=22808280

38 https://news.ycombinator.com/item?id=16126082

37 https://news.ycombinator.com/item?id=5397797

37 https://news.ycombinator.com/item?id=21151830

37 https://news.ycombinator.com/item?id=19716969

36 https://news.ycombinator.com/item?id=17022563

36 https://news.ycombinator.com/item?id=19775789

35 https://news.ycombinator.com/item?id=11071754

33 https://news.ycombinator.com/item?id=20571219

33 https://news.ycombinator.com/item?id=7260087

33 https://news.ycombinator.com/item?id=17714304

32 https://news.ycombinator.com/item?id=22043088

32 https://news.ycombinator.com/item?id=18003253

30 https://news.ycombinator.com/item?id=341288

29 https://news.ycombinator.com/item?id=7789438

29 https://news.ycombinator.com/item?id=9048947

29 https://news.ycombinator.com/item?id=14162853

28 https://news.ycombinator.com/item?id=20869111

28 https://news.ycombinator.com/item?id=19720160

28 https://news.ycombinator.com/item?id=287767

28 https://news.ycombinator.com/item?id=1055389


Lots of people make the mistake of thinking there are only two vectors you can go to improve performance: high or wide.

High - throw hardware at the problem, on a single machine

Wide - Add more machines

There's a third direction you can go; I call it "going deep". Today's programs run on software stacks so high and so abstract that we're just now getting around to redeveloping (again, for about the 3rd or 4th time) software that performs as well as software we had in the 1990s and early 2000s.

Going deep means stripping away this nonsense and getting down closer to the metal: using smart algorithms, planning and working through the problem, and seeing if you can size the solution to run on one machine as-is. Modern CPUs, memory, and disk (especially SSDs) are unbelievably fast compared to what we had at the turn of the millennium, yet we treat them like spare capacity to soak up ever-lazier abstractions. We keep thinking that completing the task means successfully scaling out a complex network of compute nodes, but completing the task actually means processing the data and getting meaningful results in a reasonable amount of time.

This isn't really hard to do (though it can be tedious), and it doesn't mean writing system-level C or assembly. It just means seeing what you can do on a single medium-spec consumer machine first, then scaling up or out if you really need to. It turns out a great many problems don't need scalable compute clusters at all. In fact, for the time you'd spend setting one up and building the coordination code (which introduces yet more layers that soak up performance), you'd probably be better off spending that same time solving the problem on a single machine.

Bonus: if your problem gets too big for a single machine (it happens), there may be trivial parallelism in it you can exploit, and now going wide will probably outperform your original design anyway, with coordination code that's much simpler and less performance-degrading. Or you can go high and throw a bigger machine at it, getting more gains with zero planning or effort beyond copying your code and the data to the new machine and plugging it in.

Oh yeah, many of us, especially experienced people or those with lots of school time, are taught to overgeneralize our approaches. It turns out many big compute problems are just big one-off problems and don't need a generalized approach. Survey your data, plan around it, and then write your solution as a specialized approach for exactly the problem you have. It'll likely run much faster that way.

Some anecdotes:

- I wrote an NLP tool that, on a single spare desktop with no exotic hardware, was 30x faster than a distributed system of six high-end compute nodes doing a comparable task. That group eventually took my solution in a go-high direction and now runs it on a big multi-core system with the fastest memory and SSDs they could procure, and it's about 5 times faster than my original code. My code was in Perl; the distributed system it competed against was C++. The difference was the algorithm I was using, and not overgeneralizing the problem. Because my code could complete their task in 12 hours instead of 2 weeks, they could iterate every day. A 14:1 iteration advantage made a huge difference in their workflow, and within weeks they were further ahead than they had been after 2 years of sustained work. Later they ported my code to C++ and realized even further gains. They've never had to even think about distributed systems. As hardware gets faster, they simply copy the code and data over, realize the gains, and it runs faster than they can analyze the results.

Every vendor that's come in since has been forced to demonstrate that their distributed solution is faster than the one already running in-house. Nobody's been able to demonstrate a faster system to date. It has saved them literally tens of millions of dollars in hardware, facility, and staffing costs over the last half-decade.

- Another group had a large graph they needed to run a specific kind of analysis on. They had a massive distributed system holding the graph, about 4 petabytes in size. The analysis they wanted was O(n^2): each node potentially needed to be compared against every other node. So they naively set up some code to do the task, with all kinds of exotic data stores and specialized indexes wired into it. Huge amounts of data were flying around their network trying to run this task, but it was slower than expected.

An analysis of the problem showed that if you segmented the data in some fairly simple ways, you could skip all the drama and do each slice of the task without much fuss on a single desktop. O(n^2) isn't terrible if your n is small, and k slices of O(n^2) work aren't much worse if you can find the parallelism in your task and spread it out easily.

I had a four-year-old Dell consumer-level desktop to use, so I wrote the code and ran the task. Using not much more than Perl and SQLite, I was able to compute a large-ish slice of a few GB in a couple of hours. Some analysis of my code showed I could actually perform the analysis on insert into the DB, and that the slice was small enough to fit into memory, so I set SQLite to :memory: and finished in 30 minutes or so. That problem solved, the rest was pretty embarrassingly parallel, and in short order we had a dozen of these spare desktops running the same code on different data slices, finishing the task two orders of magnitude faster than their previous approach. Some more coordinating code and the system was fully automated. A single budget machine was now theoretically capable of doing the entire task in 2 months of sustained compute time. A dozen budget machines finished it all in a week and a half. Their original estimate for the old distributed approach was 6-8 months with a warehouse full of machines, most of which would have been computing things that amounted to a bunch of nothing.
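For flavor, here's a minimal sketch of that per-slice pattern in Python with an in-memory SQLite database. The schema, the comparison predicate, and the data are invented stand-ins for the real analysis:

    import sqlite3

    def process_slice(nodes):
        """Run the O(n^2) comparison for one pre-segmented slice in RAM."""
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, feature TEXT)")
        db.executemany("INSERT INTO node VALUES (?, ?)", nodes)
        # Self-join over the slice: n^2 is fine when segmentation keeps n small.
        return db.execute(
            "SELECT a.id, b.id FROM node a JOIN node b"
            " ON a.id < b.id AND a.feature = b.feature"
        ).fetchall()

    # Each slice runs independently, so a pile of spare desktops can
    # each chew through their own slices with no coordination to speak of.
    print(process_slice([(1, "x"), (2, "y"), (3, "x")]))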

To my knowledge they still use a version of the original Perl code with SQLite running in memory, without complaint. They could speed things up more with a better in-memory system and a quick code port, but why bother? It completes the task faster than they can feed it data, as the data set is only growing a few GB a day. Easily enough for a single machine to handle.

- Another group was struggling with a large semantic graph and a specific kind of query they needed to perform while walking it. It was ~100 million entities, but they needed interactive-speed query returns. They had built some kind of distributed Titan cluster (obviously a premature optimization).

The solution: convert the graph to an adjacency matrix, stuff it into a PostgreSQL table, build some indexes, and rework the problem as a clever dynamically generated SQL query (again, in Perl). Now they were seeing 0.01-second returns, fast enough for interactivity. Bonus: at 100M rows the dataset was tiny, only about 5GB; with a maximum table size of 32TB and disk space cheap, they were set for the conceivable future. Administration was now easy, performance could be trivially improved with an SSD and some RAM, and they could scale to a point where dealing with Titan was far in their future.
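I don't know their actual schema or query, but the general shape of walking a graph stored as an indexed edge table with one generated query might look like this sketch (the table, column, and parameter names are all hypothetical, and it assumes a reachable PostgreSQL instance):

    import psycopg2

    conn = psycopg2.connect("dbname=graph")
    cur = conn.cursor()

    start_node, max_depth = 42, 3  # hypothetical query parameters

    # Walk outward from one node with a recursive CTE over an indexed
    # edge(src, dst) table, bounding depth to keep returns interactive.
    cur.execute("""
        WITH RECURSIVE walk(id, depth) AS (
            SELECT dst, 1 FROM edge WHERE src = %s
            UNION
            SELECT e.dst, w.depth + 1
            FROM edge e JOIN walk w ON e.src = w.id
            WHERE w.depth < %s
        )
        SELECT DISTINCT id FROM walk
    """, (start_node, max_depth))
    print(cur.fetchall())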

Plus, there's a chance PostgreSQL will start supporting proper horizontal scalability soon, putting that day even further off.

- Finally, an e-commerce company I worked with was building a dashboard reporting system that ran every night, took all of their sales data, and generated various kinds of reports: by SKU, by a certain number of days in the past, etc. It was taking 10 hours to run on a 4-machine cluster.

A dive into the code showed that they were storing the data in a deeply nested data structure for computation, and building and destroying that structure as the computation progressed was taking all the time. Furthermore, metrics on the reports showed that the most expensive-to-compute reports were simply not being used, or were viewed only once a quarter or once a year around fiscal year end. And of the cheap-to-compute reports, where millions were being pre-computed, only a small percentage were ever actually viewed.

The data structure was built on dictionaries pointing to other dictionaries and so on. A quick swap to arrays pointing to arrays (plus some dictionary<->index conversion functions so we didn't blow up the internal logic) transformed the entire thing. Instead of 10 hours, it ran in about 30 minutes, on a single machine. Where memory had been running out and crashing the system, it now never went above 20% utilization. It turns out allocating and deallocating RAM actually takes time, and switching to a smaller, simpler data structure makes things faster.
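A toy Python illustration of that swap (the names and shapes are invented; the point is only the allocation pattern):

    NUM_DAYS = 365
    all_skus = ["SKU-1", "SKU-2", "SKU-3"]  # hypothetical catalog

    # Before: nested dicts, allocating hash entries on every hot-loop update.
    sales = {}
    def add_nested(sku, day, amount):
        sales.setdefault(sku, {}).setdefault(day, 0.0)
        sales[sku][day] += amount

    # After: one flat, preallocated table plus a small sku->index map, so
    # the hot loop does lookups and adds with no per-update allocation.
    sku_index = {sku: i for i, sku in enumerate(all_skus)}
    totals = [[0.0] * NUM_DAYS for _ in all_skus]
    def add_flat(sku, day, amount):
        totals[sku_index[sku]][day] += amount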

We changed some of the cheap-to-compute reports from pre-computed to compute-on-demand, which further reduced what had to run at night. And the infrequent reports were put on quarterly and yearly schedules so they ran only right before they were needed instead of every night. This improved performance even further, and as far as I know, 10 years later, even with huge increases in data volume, they've never had to touch the code or change the ancient hardware it runs on.

Seeing these problems in retrospect, it seems ridiculous that racks in a data center, or entire data centers, were ever seriously considered necessary to make them solvable. A single machine's worth of today's hardware is almost embarrassingly powerful. Here's a machine that for $1k can break 11 TFLOPS [1]. That's insane.

It also turns out that most of our problems aren't compute-bound: throwing more CPUs at them doesn't really help, but disk and memory are a problem. Why anybody would think shuttling data over a network to other nodes, exacerbating every I/O problem along the way, would improve things is beyond me. Getting data across a network and into a CPU that sits idle 99% of the time is not going to improve your performance.

Analyze your problem, walk through it, figure out where the bottlenecks are and fix those. It's likely you won't have to scale to many machines for most problems.

I'm almost tempted to coin a statement, Bane's rule: you don't understand a distributed computing problem until you can get it to fit on a single machine first.

1 - http://www.freezepage.com/1420850340WGSMHXRBLE


I have intimate personal experience with the FCRA. Sadly I don't have an hour to talk about it at the moment, but ping me any time. Short version: it's one of the most absurdly customer-friendly pieces of legislation in the US, assuming you know how to work it. There exist Internet communities where they basically do nothing but assist each other with using the FCRA to get legitimate debts removed from their credit report, which, when combined with the Fair Debt Collection Practices Act, means you can essentially unilaterally absolve yourself of many debts if the party currently owning it is not on the ball for compliance.

The brief version, with the exact search queries you'll want bracketed: you send a [debt validation letter] under the FCRA to the CRAs. This starts a 30-day clock, during which time they have to get to the reporter and receive evidence from the reporter that you actually owe the debt. If that clock expires, the CRAs must remove that tradeline from your report and never reinstate it. Roughly simultaneously with that letter, you send the collection agency an [FDCPA dispute letter], and allege specifically that you have "No recollection of the particulars of the debt" (this stops short of saying "It isn't mine"), request documentation of it, and -- this is the magic part -- remind them that the FDCPA means they have to stop collection activities until they've produced docs for you. Collection activities include responding to inquiries from the CRAs. If the CRA comes back to you with a "We validated the debt with the reporter" prior to you hearing from the reporter directly, you've got documentary evidence of a per-se violation of the FDCPA, which you can use to get the debt discharged and statutory damages (if you sue), or just threaten to do that in return for the reporter agreeing to tell the CRA to delete the tradeline.

No response from the CRA? You watch your mailbox like a hawk for the next 30 days. Odds are you'll get nothing back from the reporter in that timeframe, because most debt collection agencies are poorly organized and can't find the original documentation for the debt in their files quickly enough. Many simply won't have original documentation -- they just have a CSV file from the original lender listing people and amounts.

If you get nothing back from the reporter in 30 days, game over, you win. The CRA is now legally required to delete the tradeline and never put it back. Sometimes you have to send a few pieces of mail to get this to stick. You will probably follow-up on this with a second letter to the reporter, asserting the FDCPA right to not receive any communication from them which is inconvenient, and you'll tell them that all communication is inconvenient. (This letter is sometimes referred to as a [FOAD letter], for eff-off-and-die.) The reporter's only possible choices at that point are to abandon collection attempts entirely or sue you. If they sue you prior to sending validation, that was a very bad move, because that is a per-se FDCPA violation and means your debt will be voided. (That assumes you owe it in the first place. Lots of the people doing these mechanics actually did owe the debt at one point, but are betting that it can't be conveniently demonstrated that they owe the debt.)

If the reporter sends a letter: "Uh, we have you in a CSV file." you wait patiently until day 31 then say "You've failed to produce documentary evidence of this debt under the FDCPA. Accordingly, you're barred from attempting to collect on it. If you dispute that this is how the FDCPA works, meet me in any court of competent jurisdiction because I have the certified mail return receipt from the letter I sent you and every judge in the United States can count to 30." and then you file that with the CRA alleging "This debt on my credit report is invalid." The CRA will get in touch with the debt collection company, have their attempt timeout, and nuke the trade line. You now still technically speaking owe money but you owe it to someone who can't collect on the debt, (licitly [+]) sell it, or report it against your credit.

I just outlined the semi-abusive use of those two laws, but the perfectly legitimate use (for resolving situations like mine, where my credit report was alleging that I owed $X00,000 in debts dating to before I was born) is structurally similar. My dropbox still has 30 PDFs for letters I sent to the 3 CRAs, several banks, and a few debt collection companies disputing the information on my report and taking polite professional notice that there was an easy way out of this predicament for them but that if they weren't willing to play ball on that I was well aware of the mechanics of the hard way.

[+] Owing more to disorganization and incompetence than malice, many debt collection companies will in fact sell debts which they're no longer legally entitled to. This happened to me twice. I sent out two "intent to sue" letters and they fixed the problem within a week.

[Edit: I last did this in 2006 and my recollection on some of the steps I took was faulty, so I've corrected them above and made it a little more flow-charty.]


Few things blow my mind as consistently as insane graphics demos.

https://www.shadertoy.com/view/4dfGzS (or basically anything on that site)

How is that only 400 lines of code?

Or this one, which even generates the sound on the GPU:

https://www.shadertoy.com/view/4ts3z2

With the wide adoption of WebGL, it's a good time to get involved in graphics. Furthermore, GPUs are taking over, especially with the advent of machine learning (Nvidia stock grew ~3x and AMD ~5x last year). The stuff Nvidia has been doing recently is kinda crazy. I wouldn't be surprised if in 15 years, instead of AWS, we're using a GeForce cloud or something, just because Nvidia will have an easier time building a cloud offering than Amazon will have building a GPU.

These are some good resources to get started with graphics/games

# WebGL Programming Guide: Interactive 3D Graphics Programming with WebGL

https://www.amazon.com/WebGL-Programming-Guide-Interactive-G...

Historically, C++ has definitely been THE language for doing graphics, but if you are starting out these days, you would have to have really compelling reasons to start with C++ rather than JavaScript and WebGL. And that's coming from someone who actually likes C++ and used to write it professionally.

# Book of Shaders

https://thebookofshaders.com/

# Game Programming Patterns

http://gameprogrammingpatterns.com/contents.html

https://www.amazon.com/Game-Programming-Patterns-Robert-Nyst...

HN's own @munificent wrote a book discussing the most important design patterns in game design. Good book applicable beyond games.

# Game engine architecture

https://www.amazon.com/Engine-Architecture-Second-Jason-Greg...

# Computer graphics: Principles and Practice

https://www.amazon.com/Computer-Graphics-Principles-Practice...

This one is more of a college textbook, if you'd prefer that; the WebGL one is more accessible and less dry.

# Physically Based Rendering & Real-Time Rendering

These discuss state-of-the-art techniques in computer graphics. I won't claim to have really read them, but from what I've seen they are very solid.


https://www.amazon.com/Physically-Based-Rendering-Third-Impl...

