
I had the chance to hear Richard Hipp talk about SQLite yesterday! He mentioned that the LSM tree storage engine is available as an extension to sqlite3. More specifically, he mentioned that he didn't really get the performance improvements he had hoped for in insertion-heavy use cases.

I think part of this is because of a fundamental limitation of SQLite: as an embedded database, it has to persist data to disk at all times. The design of LSM trees works well for databases with a resident in-memory component, because the top level is an approximation of just dumping every new thing you see at the end of an unordered in-memory array. This is as opposed to a data structure like a B-tree, where you have to /find/ exactly where to put the data first, and then put it there. That finding step means you're doing a lot of random access, which thrashes all of your caches (CPU, disk, etc.). LSM trees avoid this thrashing by just dumping stuff at the end of an array. However, this means you have to scan that array to do lookups (as opposed to something easier like binary search). Then, as your array gets big, you merge and flush it down to a lower "layer" of the LSM tree which is slightly bigger and sorted. And when that one fills, you flush further. These merge-flushes are nice big sequential writes, so that's nice too.

Anyway, with SQLite, the highest layer of your LSM tree would probably (this is conjecture) have to live on disk, because there is no server component to keep it resident, whereas in an in-memory system it'd probably be in your L2/L3 cache or at least your main memory. So this could be one reason why that model didn't work out as well for them.
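
To make that concrete, here's a toy sketch of the idea in Python (my illustration, not SQLite's actual LSM extension): writes are plain appends into an unsorted memtable, and a full memtable gets merge-flushed into a single sorted level.

    import bisect

    class ToyLSM:
        # toy two-level LSM: an unsorted in-memory buffer plus one sorted level
        def __init__(self, memtable_limit=4):
            self.memtable = []            # unsorted; a write is a plain append
            self.level = []               # sorted; stands in for the on-disk level
            self.memtable_limit = memtable_limit

        def put(self, key, value):
            self.memtable.append((key, value))      # O(1), no tree traversal
            if len(self.memtable) >= self.memtable_limit:
                self.flush()

        def flush(self):
            # one big sequential merge instead of many random writes
            # (a real merge would also deduplicate, keeping the newest version)
            self.level = sorted(self.level + self.memtable)
            self.memtable = []

        def get(self, key):
            for k, v in reversed(self.memtable):    # linear scan, newest first
                if k == key:
                    return v
            i = bisect.bisect_left(self.level, (key,))   # binary search the sorted level
            if i < len(self.level) and self.level[i][0] == key:
                return self.level[i][1]
            return None

    db = ToyLSM()
    db.put("a", 1); db.put("b", 2); db.put("c", 3); db.put("d", 4)  # 4th put triggers a flush
    db.put("e", 5)
    print(db.get("b"), db.get("e"))   # -> 2 5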


> Treat it as a naive but intelligent intern

That’s the problem: it’s a _terrible_ intern. A good intern will ask clarifying questions, tell me “I don’t know” or “I’m not sure I did it right”. LLMs do none of that, they will take whatever you ask and give a reasonable-sounding output that might be anything between brilliant and nonsense.

With an intern, I don’t need to measure how good my prompting is; we’ll usually interact to arrive at a common understanding. With an LLM, I need to put a huge amount of thought into the prompt, and I have no idea whether the LLM understood what I’m asking or whether it’s able to do it.


Do you want to train models from scratch, or do you want to build cool things on top of AI models?

If the former, I suggest digging into things like the excellent Fast AI course: https://course.fast.ai/

If the latter, the (relatively new) keyword you are looking for is likely "AI Engineer" - https://www.latent.space/p/ai-engineer

There's an argument that deep knowledge of how to train models isn't actually that useful when working with generative AI (LLMs etc) - knowing how to train or fine-tune a new model is less useful than developing knowledge of the other weird things you have to figure out about prompting, evals and using these models to build production-quality apps.


It's not actually so esoteric. The two main knobs are

- max_concurrent_queries, since each query uses a certain amount of memory

- max_memory_usage, which is the max per-query memory usage

Here's my full config for running clickhouse on a 2GiB server without OOMs. Some stuff in here is likely irrelevant, but it's a starting point.

    diff --git a/clickhouse-config.xml b/clickhouse-config.xml
    index f8213b65..7d7459cb 100644
    --- a/clickhouse-config.xml
    +++ b/clickhouse-config.xml
    @@ -197,7 +197,7 @@
     
         <!-- <listen_backlog>4096</listen_backlog> -->
     
    -    <max_connections>4096</max_connections>
    +    <max_connections>2000</max_connections>
     
         <!-- For 'Connection: keep-alive' in HTTP 1.1 -->
         <keep_alive_timeout>3</keep_alive_timeout>
    @@ -270,7 +270,7 @@
         -->
     
         <!-- Maximum number of concurrent queries. -->
    -    <max_concurrent_queries>100</max_concurrent_queries>
    +    <max_concurrent_queries>4</max_concurrent_queries>
     
         <!-- Maximum memory usage (resident set size) for server process.
              Zero value or unset means default. Default is "max_server_memory_usage_to_ram_ratio" of available physical RAM.
    @@ -335,7 +335,7 @@
              In bytes. Cache is single for server. Memory is allocated only on demand.
              You should not lower this value.
           -->
    -    <mark_cache_size>5368709120</mark_cache_size>
    +    <mark_cache_size>805306368</mark_cache_size>
     
     
         <!-- If you enable the `min_bytes_to_use_mmap_io` setting,
    @@ -981,11 +980,11 @@
         </distributed_ddl>
     
         <!-- Settings to fine tune MergeTree tables. See documentation in source code, in MergeTreeSettings.h -->
    -    <!--
         <merge_tree>
    -        <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
    +        <merge_max_block_size>2048</merge_max_block_size>
    +        <max_bytes_to_merge_at_max_space_in_pool>1073741824</max_bytes_to_merge_at_max_space_in_pool>
    +        <number_of_free_entries_in_pool_to_lower_max_size_of_merge>0</number_of_free_entries_in_pool_to_lower_max_size_of_merge>
         </merge_tree>
    -    -->
     
         <!-- Protection from accidental DROP.
              If size of a MergeTree table is greater than max_table_size_to_drop (in bytes) than table could not be dropped with any DROP query.
    diff --git a/clickhouse-users.xml b/clickhouse-users.xml
    index f1856207..bbd4ced6 100644
    --- a/clickhouse-users.xml
    +++ b/clickhouse-users.xml
    @@ -7,7 +7,12 @@
             <!-- Default settings. -->
             <default>
                 <!-- Maximum memory usage for processing single query, in bytes. -->
    -            <max_memory_usage>10000000000</max_memory_usage>
    +            <max_memory_usage>536870912</max_memory_usage>
    +
    +            <queue_max_wait_ms>1000</queue_max_wait_ms>
    +            <max_execution_time>30</max_execution_time>
    +            <background_pool_size>4</background_pool_size>
    +
     
                 <!-- How to choose between replicas during distributed query processing.
                      random - choose random replica from set of replicas with minimum number of errors

Complements are also really useful for subtracting numbers, not only in the binary system but also in the decimal system. They allow you to subtract by adding.

Let's say you need to calculate

     8467
    -4583

The ten's complement of 4583 is 5417 (the complement is the remainder to the next power of ten, in this case 10000). A cool property of the ten's complement in the decimal system is that the subtraction above can be rewritten as

      8467
     +5417
    -10000

which results in

     13884
    -10000
    ------
      3884

This is much easier to calculate than the direct subtraction. It exploits the fact that subtracting x is the same as adding (10000 - x) and then subtracting 10000.

The same works for binary and the two's complement, where it is even easier, because calculating the two's complement is equal to inverting all digits and adding 1. It works for any number system by using the complement of the system's base.
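
Both tricks in a few lines of Python, just to illustrate:

    # ten's complement: the remainder to the next power of ten
    def tens_complement(x, digits=4):
        return 10**digits - x

    print(8467 + tens_complement(4583) - 10**4)   # 8467 - 4583 = 3884

    # two's complement: invert all bits, then add 1
    def twos_complement(x, bits=8):
        return (~x & (2**bits - 1)) + 1

    # 100 - 42 via addition; the modulo discards the final "subtract 2**bits" step
    print((100 + twos_complement(42)) % 2**8)     # -> 58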


Tip for anyone looking to spend seven-figure or more sums on one-time egress: Direct Connect egress is $0.02/GB. Rent a rack at a Direct Connect facility and get as many 10G fiber Direct Connects as you need, with corresponding flat rate 10G Internet ports with HE/Cogent/whatever transit provider. If you're going to be spending millions on egress, you could just hire someone to set this up for you. With that kind of spend you'd be crazy to pay the full $0.09/GB.

Edit: Note also that Snowball egress is $0.03/GB. Slightly higher rate, much lower setup cost. You'll have to do the math, but they're both clearly attractive options vs. full-price $0.09/GB egress.
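
To put rough numbers on it, here's the arithmetic for a hypothetical 10 PB one-time transfer (ignoring the rack, transit, and labor costs of the Direct Connect route):

    gb = 10_000_000   # ~10 PB, a made-up figure
    for name, per_gb in [("full-price egress", 0.09),
                         ("Snowball", 0.03),
                         ("Direct Connect", 0.02)]:
        print(f"{name}: ${gb * per_gb:,.0f}")
    # full-price egress: $900,000
    # Snowball: $300,000
    # Direct Connect: $200,000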


The recipe is based on e.g. 5 eggs, but I want a smaller batch, so I want to use only 3 eggs[1]. I set the slide rule so that 3 is lined up under 5.

Then for any other ingredient in the recipe, say 7 tablespoons of milk, I look up 7 on the upper scale and read off 4.2 on the lower scale.

It doesn't matter which scale is upper or lower, and you don't have to keep track of whether you're multiplying or dividing, you just pick one to mean "recipe amount" and one to mean "actual amount". You anchor them against each other based on the critical ingredient, and then read off all other amounts. That's part of what makes it so easy to use even for children.

[1]: Or maybe I'm cooking to use up an ingredient about to expire, so I want to use all of my seven eggs, and so on.
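
In code terms, the slide rule is just fixing one ratio and applying it to everything; a trivial sketch with made-up recipe numbers:

    scale = 3 / 5   # 3 actual eggs lined up under 5 recipe eggs
    recipe = {"eggs": 5, "milk_tbsp": 7, "sugar_cups": 2}   # hypothetical recipe
    for ingredient, amount in recipe.items():
        print(ingredient, round(amount * scale, 2))
    # eggs 3.0
    # milk_tbsp 4.2
    # sugar_cups 1.2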


In fact, companies that do what Equals does are successful already:

- NocoDB

- Baserow

- Rowy


FWIW, it is still possible to get TVs without this stuff, albeit at a premium. TVs are still made for business usage in areas like conference rooms, wall displays, etc. They're often found under labels like "commercial digital signage" or "business display"; they seem to avoid the word "TV" (if being cynical, maybe to make them harder for normal people to discover, and to confuse them if they do). But they're often nice panels rated for serious running hours, without this sort of junk (which would give enterprise IT conniptions), and they can have very useful feature support, like 802.1x authentication, which so many devices still lack. Players like NEC will even advertise their use of an RPi compute module and wink at the lack of spyware [0] for some of their products, but lots of major "smart TV" providers also have a commercial lineup.

I think they're well worth considering, particularly for the HN crowd. Granted, for people who truly want built-in Netflix or the like without connecting something like a Roku or Apple TV, maybe it's less optimal. But even they might change their tune back to the concept of separate boxes and normal panels if they dislike all the ads and data tracking.

----

0: https://www.sharpnecdisplays.us/products/displays/me501


"Did you win the Putnam?"

Yes, I did.


Not having any sort of basic disk/storage solution has been painful for us. There have been a lot of situations where I would love to store a big blob of data on the disk while processing it, but if that Heroku dyno reboots - poof, it's gone.

His claims about HN are false. Short version: his post was 2+ hours later than Stripe's, not an hour earlier, and fell in rank because users flagged it, not because moderators did anything (which I suppose is what his mafia/mob language was intended to imply).

The longer version is going to involve a tedious barrage of links, but I want this to be something that people can check for themselves if they care to.

Quoting from https://twitter.com/theryanking/status/1485784877060349953 and https://twitter.com/theryanking/status/1485784882173255680:

> Both had organically made it up to #1 on Hacker News with 100s of upvotes

Their first post, https://news.ycombinator.com/item?id=16215092, made it to #2 (http://hnrankings.info/16215092/) and got 171 upvotes. The second, https://news.ycombinator.com/item?id=16870692, made it to #4 (http://hnrankings.info/16870692/) and got 70 upvotes.

> On April 18, 2018, we posted this: https://news.ycombinator.com/item?id=16870692 [...] An ~hour later, this showed up: https://news.ycombinator.com/item?id=16869290. Ours disappeared.

Stripe's post (https://news.ycombinator.com/item?id=16869290) was submitted at 17:41 UTC, and Bolt's post (https://news.ycombinator.com/item?id=16870692) was submitted over 2 hours later, at 20:08 UTC. If you want external evidence for that, look at the hnrankings.info pages for the two posts (Stripe: http://hnrankings.info/16869290/, Bolt: http://hnrankings.info/16870692/). You'll see that they first pick up Stripe's post on the front page at 17:45 and Bolt's at 20:30. You'll also see that Stripe's post made it to #2, not #1 as he claims.

Here's the HN front page at 19:09 that day: https://web.archive.org/web/20180418190904/https://news.ycom.... Stripe's post is already at #2. Bolt's hasn't been submitted yet. The first snapshot with Bolt on the front page is at 20:35: https://web.archive.org/web/20180418203508/https://news.ycom.... By 21:21, Bolt's briefly makes it higher than Stripe's: https://web.archive.org/web/20180418212112/https://news.ycom....

> Ours disappeared

Bolt's post fell in rank because it was flagged by users—that is the drop you can see in http://hnrankings.info/16870692/ and in this snapshot by 21:21: https://web.archive.org/web/20180418222105/https://news.ycom....

I don't know why users flagged it. (Edit: generally, though, if you want to figure this out, the best place to look is in the comments. The top comment in the Bolt thread is complaining about non-transparent, enterprise-style pricing - https://news.ycombinator.com/item?id=16871886, which is a classic HN complaint. The same complaints had appeared in their earlier thread, for example https://news.ycombinator.com/item?id=16215604 and https://news.ycombinator.com/item?id=16215578. We actually downweighted both of those complaints. That's standard HN moderation when indignant comments are stuck at the top of a thread.)

None of the flags came from YC founders or staff. (Edit: I looked into whether it might have been Stripe people doing the flagging here: https://news.ycombinator.com/item?id=30070445.) HN moderators did not make the post fall in rank, or moderate the post at all. I don't think I'd ever heard of Bolt before today.

The article no longer exists on bolt.com, but if you want to compare it to the stripe.com article, the articles are here:

https://web.archive.org/web/20180418201841/https://blog.bolt...

https://web.archive.org/web/20180418171628/https://stripe.co...

His insinuations about HN are false too. He's clearly insinuating that we use insider powers to favor stripe.com submissions with special treatment on HN. That's precisely what we don't do.

Stripe succeeded on HN because they were favored by the community for many years. They are one of a small number of startups who have reached what you could call 'community darling' status on HN. It's true that this is incredibly hard to do. Another startup that managed it, around the same time, was Cloudflare (not a YC startup). Startups achieve this by doing three things: producing products that the community considers good, producing articles that the community finds interesting, and mastering the art of interacting with the community. You need all three.

I wish more startups would achieve this, YC or not. Whenever I run across one that's trying to succeed on HN, I try to help them do so (YC or not)—why? because it makes HN better if the community finds things it loves here. Among the startups of today, I can think of only two offhand who are showing signs of maybe reaching darling status—fly.io (YC), and Tailscale (not YC).

All 4 of the startups I've mentioned have the advantage that they were|are targeting programmers, which gives them a fast track to rapport with this community—sort of a ladder in the snakes-and-ladders game. However, that's not a sufficient condition for getting a satisfying click with the community, and it isn't a necessary one either. The real problem is that so much of the content that startups produce to try to interest this community just isn't interesting enough (to this community), and often gives off inadvertent signals of being uninteresting—things like seeming too enterprisey, or too slick in the marketing department.

If the bolt.com people had asked us for help, I would have been just as happy to help them as anybody else. I would have told them that the opening of their article (https://web.archive.org/web/20180418201841/https://blog.bolt...) was too enterprisey to appeal to HN. The language is drawn from ecommerce retailing, which makes sense given the little bit we've all learned about Bolt today, but is the kind of thing that comes across as boring on HN. The references to Gartner and Experian feel like reading an enterprise whitepaper, which detracts from credibility with the HN audience.

This is the sort of thing I've told countless startups over the years. It's dismayingly difficult to express this information in a way that people can actually absorb, but I have a set of notes that I've sent to many startups in this position, which I plan to turn into an essay about how to write for HN. If anyone wants a copy, email hn@ycombinator.com and I'll be happy to send it to you.

I had a bunch of other things I wanted to say here, but mercifully, I've forgotten them.


Did you win the Putnam?

If not, please don't be "bolder" than this guy: http://en.wikipedia.org/wiki/Ravi_Vakil


For those unaware, there is an active fork called yt-dlp that is (mostly) a drop-in replacement:

https://github.com/yt-dlp/yt-dlp


Sorry to hear it.

# Here are some of the things that I've done. Here's hoping it's effective.

1) Everyone uses Bitwarden[0] to store their passwords. We have an Organisation account which makes sharing passwords easy. I check master passwords against HaveIBeenPwned, and ask that they use the generated Bitwarden passwords for their accounts.

2) The least tech-savvy amongst my family either get Chromebooks (which I despise, because Google), or they get a Windows machine that I lock down pretty hard [1]. The lock-down may look draconian to power users, but they've yet to mention something they want to do but can't.

3) It's listed in the link in (2), but I make sure everyone runs uBlock Origin. It's far more useful than an antivirus.

4) I have a few catch-all emails I encourage my family to use for subscriptions. When asked for an email, use [website name]@[family member code].[domain].[tld]. That way, unless spearphished, you're likely to know the true provenance of an email.

5) We have a NAS that is 3-2-1 backed-up, and encourage everyone to keep sensitive information there. Hopefully this is enough to avoid cryptolockers extorting us.

# Things I want to do

6) It would be better if we used one of those self-hosted random email generators, to prevent maliciously constructed addresses to our catch-all from instilling false confidence.

7) I'd like to install PiHole [2].

8) I have a Twilio number that goes straight to voicemail, sends me the audio files, and forwards SMS. I'd like to create these for my family (maybe using extension numbers?) so they can use them on forms.

[0] https://bitwarden.com/

[1] https://noteaureus.org/post/tutorials/sysadmin/win4unsavvy/

[2] https://pi-hole.net/


I would look at these topics, in this order:

1. Containerisation - can you build a hello world web app in any language, then Dockerise it?

2. Now break it into two containers - one is the original hello world, but now it calls an API on a 2nd container that responds with hello world in one of 10 different languages. Just hard-code this; the point is that it's now 2 containers for your “app”.

3. Create a Docker repository in GCP's Artifact Registry and upload your images. Now remove them from your local Docker image store and run them again, this time pulling them from your registry on GCP.

4. Use Cloud Build to build them.

5. Spin up a local Kubernetes cluster such as Minikube.

6. Read docos about K8s deployments and service types. Pay special attention to ClusterIP, NodePort, and LoadBalancer.

7. Deploy the first version of your hello world. Maybe try pointing your YAML at the local registry, then at the one on GCP you created.

8. Create a service to access your app.

9. Figure out how to turn on and access (via proxy) the Kubernetes dashboard.

10. Now deploy the 2nd version of your app. Learn a bit more about K8s networking, pod horizontal scaling, pod resource claims, kill some pods, etc.

11. Learn Skaffold.

12. Create a GKE cluster.

13. Deploy the same app.

14. Learn about K8s Ingress.

15. Get familiar with the GKE console.

16. Use your knowledge of Skaffold to understand and use the brand new Google Cloud Deploy.

Edit: autocorrect “fixed” several things



> A reasonable model for the “blowup factor” (actual time divided by estimated time) would be something like a log-normal distribution.

Interestingly, we did extensive time tracking on a multi-year in-house software project and collected data comparing the estimated completion time of tickets with their actual time.

The software department was under a lot of pressure to improve their forecasting, and were somewhat despairing that their estimates were off by a factor of about 1.6 on average, and sometimes a factor of 10 or more. This persisted in the face of all attempts to improve calibration. Managers were worrying that developers had no idea how long a task would take and estimation was futile.

When we plotted the data, in all cases, the actual time was very accurately fit by a lognormal whose scale parameter was precisely the predicted completion time. That is, whether the tickets were predicted to take 1 hour, 3 hours, 13 hours, or whatever, the histogram of their actual completion times followed the exact same shape but with a corresponding scale change on the x axis.

This told me that the developers actually have a really good understanding of the class of problem they're dealing with when they start a task. But sometimes tasks have multiplicative factors that make them take longer than you expect. Sometimes the bug turns out to be two bugs, and so on. Based on this analysis, I urged them not to consider it a prediction failure when a ticket takes 10 times longer than expected; that's just a property of the lognormal distribution, and that estimate likely did a good job of reflecting all available information at the time they made it.

Instead of changing the estimates, I suggested that we pick a safety factor for external-facing commitments that reflects this distribution. Padding the estimate by a factor of 1.6 gives the mean, but if you want to make a commitment you can take to a customer, you can just extend the lognormal up to 95% confidence or 99% confidence or however trustworthy your promised commitment needs to be. Of course, a 99% confidence interval on a lognormal is a pretty big factor. But if that's better than running late, it is what it is.
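
For illustration, here's that safety-factor calculation sketched in Python (my numbers, not the original analysis): if the estimate is the median of the lognormal and the mean blowup is 1.6, then exp(sigma^2/2) = 1.6 pins down sigma, and the quantiles follow directly:

    import math

    sigma = math.sqrt(2 * math.log(1.6))   # mean/median = exp(sigma^2/2) = 1.6
    for conf, z in [(0.50, 0.0), (0.95, 1.6449), (0.99, 2.3263)]:
        print(f"{conf:.0%} confidence: pad the estimate by {math.exp(z * sigma):.1f}x")
    # 50% confidence: pad the estimate by 1.0x
    # 95% confidence: pad the estimate by 4.9x
    # 99% confidence: pad the estimate by 9.5x

The ~9.5x at 99% lines up with the occasional 10x overruns in the data.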

Another interesting thing is that you'd expect, by the central limit theorem, that sufficiently large tasks would eventually become normally distributed rather than lognormal, because they're composed of a large number of subtasks. But it turns out that lognormals are a pretty pathological case; sums of n lognormals can continue to look nearly lognormal until n becomes really, really large.


The original New Yorker article says he was blacklisted after his 2008 felony conviction. I've been told that the CRM software used by university development offices is pretty phenomenal. It was hinted to me that it automatically annotates donor records with SEC stock transaction disclosures, and I wouldn't be surprised if it also annotates donor records with public criminal court proceedings.

As is always the case, you cannot tell instantly whether something is a paradigm shift or truly great. It takes time. As a jazz drummer myself (professionally trained and self-taught), I like what bevinahally said: Jazz is not Mozart and Beethoven, and it doesn't have to be reproduced perfectly. Jazz is alive and sparkling. London especially is a growing scene.

In my view, this time will be viewed as a paradigm shift, because you can see a lot of young and really talented players mixing up what is known as jazz with influences from techno and hip-hop (huh, what a fun twist). Yes, if you are looking for jazz as it is taught in school (say, Coltrane), that still exists, but to me it's boring. It has been done.

I want to give you some examples to listen to, and you will hear what I mean. The drums use ghost notes quite heavily, 7th chords are turned into 11th and 13th chords, and there are fewer changes in tonality. The rhythm is more straight, tighter. It gets you going!

Yussef Kamaal - Calligraphy (https://www.youtube.com/watch?v=1g826StJhLk)

Yussef Dayes X Alfa Mist - Love Is The Message (https://www.youtube.com/watch?v=NwVtIPeYIeQ)

Nihilism Live @jazzrefreshed (https://www.youtube.com/watch?v=yTlZEv9V35o)

Ezra Collective - Enter The Jungle (https://www.youtube.com/watch?v=WGkZM6wjIwk)

Richard Spaven - The Self feat. Jordan Rakei (https://www.youtube.com/watch?v=YattHO96UzI)

Enjoy :) jazz is for everyone


This is a woefully inadequate article, and a woefully inadequate survey. So what if 'poor leadership' is the number one cause of burnout? That insight isn't useful. As an employee, I'd like to figure out how to avoid burnout. And as an employer, I'm not sure how to reduce burnout — get better leadership? How do I do that?

But it turns out the academic literature contains good ideas we can try.

Context: I was writing up my process for preventing burnout recently (https://commoncog.com/blog/nuanced-take-on-preventing-burnou...), and I took some time to look into the academic literature on burnout, to see if it highlighted anything I'd missed.

I'll give a quick summary:

1. The standard test for burnout today is something called the Maslach Burnout Inventory https://en.wikipedia.org/wiki/Maslach_Burnout_Inventory, and it measures burnout along three metrics: exhaustion, inefficacy, and cynicism.

a) Exhaustion: described as wearing out, loss of energy, depletion, debilitation, and fatigue.

b) Inefficacy: described as reduced productivity or capability.

c) Cynicism: negative or inappropriate attitudes towards clients, irritability, loss of idealism, and withdrawal.

2. The MBI is a descriptive model, which helps you identify burnout, but we need a developmental model as well (e.g. what are the various stages of burnout?). The early models of burnout described the pathway in three stages: 1. job stressors (an imbalance between work demands and individual resources), then 2. individual strain (an emotional response of exhaustion and anxiety), and then 3. defensive coping (changes in attitudes and behavior, such as greater cynicism).

Or, to put this simply: first your job demands too much of you, then you feel anxious and emotionally exhausted, then you cope by becoming cynical about work, and then you quit.

3. Maslach found that development of cynicism is the biggest predictor of burnout-related turnover. If you're cynical about your job, you're pretty likely to think about quitting or to actually quit soon. (https://www.ncbi.nlm.nih.gov/pubmed/19426369)

4. If you look at the literature today, however, you'll find that most burnout research has converged on two developmental models: the Job Demands‐Resources (JD‐R) model (https://en.wikipedia.org/wiki/Job_demands-resources_model) and the Conservation of Resources (COR) model (https://en.wikipedia.org/wiki/Conservation_of_resources_theo...). I'll leave you to read the respective Wikipedia articles, but the takeaway from both is that burnout results when the resources provided by the job are outstripped by the demands of the job.

As a CEO or manager, your best bet for reducing burnout is to expand the list of job resources described in JD-R, that is:

> physical, psychological, social, or organisational aspects of the job that are either or: functional in achieving work goals; reduce job demands and the associated physiological and psychological cost; stimulate personal growth, learning, and development. Examples are, career opportunities, supervisor coaching, role-clarity, and autonomy.

As an employee, your best bet for reducing burnout is to increase your personal resources (which are distinct from job-provided resources in the JD-R model). But the problem here is that the research doesn't yet know if there are effective techniques for increasing personal resources. Why? Well ...

5. There are only about two decades' worth of research into burnout, and the research was primarily centered on the care-giving professions. Consequently, burnout was initially thought to stem from social exhaustion (e.g. nurses and doctors dealing with death, or grief); in a 2016 retrospective review article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4911781/), Maslach herself writes that this is an active field of research, and they don't yet know if you can train someone to be more resistant to burnout.

There are some interesting research directions, though. I found this paragraph in an undergrad paper from Australia (where apparently half of all nurses leave the profession prematurely, mostly due to burnout):

> Rather, failure to recover consistently from such work stresses (in non-work time) is a crucial determinant of chronic (maladaptive) fatigue and burnout evolution (Winwood et al. 2007). When such recovery is effected consistently, physiological toughness (Dienstbier 1989,1991) and enhanced stress resistance is developed, with improved performance at work, better sleep and reduced maladaptive health outcomes (Dienstbier 1991).While some individuals may achieve this spontaneously, far more may benefit from specific training to do so effectively and consistently.

The Winwood paper may be found here: https://www.ncbi.nlm.nih.gov/pubmed/17693784 but it's unclear if they've found a set of techniques that work for most people in most professions. I'm still reading up.

PS: I have a technique that I use myself, but it's unclear that it would work for everyone. I wrote it up here: https://commoncog.com/blog/nuanced-take-on-preventing-burnou.... It's certainly protected me from burnout in startupland over the years. But, as I've mentioned, I'm interested in the research because the goal there is to find a set of general techniques that would work for most people. My technique has a sample size of one.


I would pay for this.

Yes, I worked for the company running the .coop registry, and ICANN had very strict rules about code escrow that we had to follow.

> or useable only in specific contexts or with specific words.

A good example of this is the Winograd schema. You might think you can figure out a good algorithm for anaphora resolution (i.e. if you see "Sally called and she said hello.", who is "she"?) that relies only on the structure of a sentence, without considering semantics.

But here's a counterexample:

"The city councilmen refused the demonstrators a permit because they feared violence."

Who are 'they'?

"The city councilmen refused the demonstrators a permit because they advocated violence."

Now who are 'they'?

If you're like most people, even though only the verb changed, the binding of 'they' based on the deeper semantic meaning also changed.
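
To see the failure concretely, here's a toy sketch where a deliberately crude structure-only rule (a stand-in for any syntax-based heuristic) gives the same answer for both sentences:

    # toy structural resolver: always bind "they" to the sentence's subject,
    # crudely taken as the third token here
    def resolve_they(sentence):
        return sentence.split()[2]

    s1 = "The city councilmen refused the demonstrators a permit because they feared violence."
    s2 = "The city councilmen refused the demonstrators a permit because they advocated violence."
    print(resolve_they(s1), resolve_they(s2))
    # -> councilmen councilmen: the same answer for both, even though humans
    # bind "they" to the councilmen in s1 and to the demonstrators in s2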

These sentences are called Winograd schemas[1], and there are plenty more like them.

[1] https://en.wikipedia.org/wiki/Winograd_Schema_Challenge


I am a very loyal mutt user. Coupled with mbsync (for offline IMAP) and notmuch (for indexing, threading and address completion), I think it's a great MUA.

I obsessively measure wakeups/s in powertop for every application I use, and the above setup is very good.

If you want to try other things you can look into the mutt-kz fork (which replaces internal search commands with notmuch).

There is also a notmuch curses client, but I find mutt+notmuch superior.

Mu is a notmuch alternative worth considering. Its Emacs client is excellent.

Finally, also if you use Emacs, Gnus is perhaps the most powerful email client ever. Steep learning curve, though. It's a bit different from everything else, as Gnus was built as a news reader. It treats mail as news, but this has some great advantages I think, as long as you can adapt your mindset. Also Org integration.

Gnus is the only thing that might make me switch away from mutt.

