In The Black Swan, Nassim Nicholas Taleb argues that applying game theory to real-life situations is one instance of the ludic fallacy. https://en.wikipedia.org/wiki/Ludic_fallacy
Fair warning: N. Taleb is more or less an anti-intellectual. He hates on most formal models he didn't write himself and most of his books are musings on heuristics.
When you first design your program, you might indeed use a different data structure than a linked list when you anticipate 10k+ elements.
But often programs are used long after they are designed, and it is great if they can degrade gracefully as you move outside their original design parameters.
True, irrational shared a first-hand experience and then proceeded to speculate as if that experience were evidence that smoking is more prevalent in Canada than in the US.
If you have a repo with a lot of GPG signed commits, or you just don't want to change all your commit IDs (because you reference them in other places), then it'd be very valuable to be able to have a repo that's mixed old and new hashes.
Also, your Git binary, if compiled with only the One True Hash™, wouldn't be able to work with older repos at all, because the hashes it computes would now be different.
(Edit: Another benefit of generalizing this is so that if/when, in the future, the new hash algorithm must be abandoned due to weaknesses, Git tooling will have been already introduced to the notion that hashes can be different and should hopefully be a less involved migration the next time around)
> Yes, this is correct. The struct object_id changes don't actually change the hash. What they do, however, is allow us to remove a lot of the hard-coded instances of 20 and 40 (SHA-1 length in bytes and hex, respectively) in the codebase.
No. We're talking about zoren's hypothetical case where git used a typedef from the beginning, instead of littering char[20]s all over the tree.
My prior comment was explaining why jffry's complaint is nonsensical (a typedef does not prevent moving from a single hash model to a multiple simultaneous hash model).
Hmm, not sure where the disagreement is; that's exactly what I'm saying. Obviously it wasn't done from the beginning, but the code is now changing from the char[20]s everywhere to the typedef precisely to be able to support multiple hash functions.
That's exactly what this change is. You mean why wasn't it that way before the change? Maybe because it wasn't ever needed before? Git's been good with only SHA-1 for 12 years. Think about the flip side of your question... what were the alternatives 12 years ago, or 5 years ago? And why would someone write code for alternatives that aren't expected to be used and maybe don't exist?
In my experience, generalizing ahead of need more often than not causes problems, and I've watched over-engineering result in far more effort to fix when the need it was anticipating does arrive than just waiting until the need is there.
> Think about the flip side of your question... what were the alternatives 12 years ago, or 5 years ago?
SHA-2 and RIPEMD.
> And why would someone write code for alternatives that aren't expected to be used and maybe don't exist?
That's the problem: the software industry is still suffering from MD5 getting cracked [0]! Cryptographic agility is a baseline requirement for security primitives.
> In my experience, generalizing ahead of need more often than not causes problems
I agree, and Linus has valid complaints about security recommendations made over the 25-year history of Linux: most of them kill performance and are only partial fixes, so why bother?
But Linus is also engaging in premature optimization: computers are ~30 billion times faster than when he first started programming Linux. Yes, SHA-2 is relatively slow, but they could have at least not hardcoded SHA-1 into the codebase and protocol.
> I've watched over-engineering result in far more effort to fix when the need it was anticipating does arrive than just waiting until the need is there.
You clearly haven't done any safety-related engineering. That's the thing about cryptography: millions of dollars and human lives are at stake. Despite the smartest people in the world working on these problems, cryptographic primitives/protocols are regularly broken. Due to quantum computing, every common cryptographic primitive we use today will need to be replaced or upgraded at some point.
Thankfully, you don't need to worry about the engineering of a given cryptographic primitive as long as you can swap it out with a new one. But when you hardcode a specific hash function and length into your protocol/codebase you are now assuming the role of a cryptographer.
Even totally ignoring that SHA2 was a thing, anybody looking around would have noticed that MD4 was broken, MD5 was broken, and it would be unlikely that the hash of today would stand forever.
Yes, true. Correct. That is still true, and applies to SHA-2 as well. And Linus was aware of exactly what you say, back in 2005.
My point was that the choice that was made was considered good enough for the purposes for which it was intended. In the context of the OP's comment, criticizing git for not making different code design choices doesn't mean that Linus was wrong, it may mean that the OP doesn't know and/or understand all the considerations Linus had. And Linus has said many times that the security of the hash is not the primary consideration in his design.
Git's choice of SHA-1 was not at the time predicated on having the single most unbreakable hash in existence, the hash's use is not for security purposes, and to talk with such incredulity about Linus' choice may be to misunderstand git's design requirements.
Well, this whole mess proves that Linus was wrong. Typing "unsigned char [20]" everywhere is beyond amateurish to me in any case and raises a concern about the overall quality of code in git and linux kernel.
It shows that Linus is not a cryptographer, to be more precise. Though yes SHA-1 chosen prefix attacks are still very expensive at this point. I wonder how many non-cryptographers knew about SHA-2 back in 2003-2004.
No, it's more than that. "unsigned char[20]" already has at least three potential points of failure (and why isn't it uint8_t anyway?). Moreover, it'll be passed around as unsigned char*, which opens another can of worms. And, oh, by the way, have fun searching for all references to sha1 in your source code now that you weren't pro enough to create a type for your object ids / hashes.
I'm guessing it's part lack of skill in design, part bad software development tools (uEMACS and makefiles or something), and part just being against c++ et al.
Linus regularly treats security as a second-class citizen and is famous for his outrageous harassment [0]:
> Of course, I'd also suggest that whoever was the genius who thought it was a good idea to read things ONE FCKING BYTE AT A TIME with system calls for each byte should be retroactively aborted. Who the fck does idiotic things like that? How did they noty die as babies, considering that they were likely too stupid to find a tit to suck on?
He deserves to eat this shit sandwich.
> I wonder how many non-cryptographers knew about SHA-2 back in 2003-2004.
Any systems engineer should have known about SHA-2. SHA-1 only provides 80 bits of collision resistance, so everyone else assumed that it would need to be replaced.
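(The 80-bit figure is just the generic birthday bound: a brute-force collision against an n-bit hash costs roughly 2^(n/2) work, before any structural weakness is even found.)

    n = 160 (SHA-1)   -> 2^(160/2) = 2^80   collision work
    n = 256 (SHA-256) -> 2^(256/2) = 2^128  collision work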
Linus has explained why he picked SHA-1. I'm not Linus, and I'm not defending his choice, but he has said repeatedly that git's hash is primarily for indexing and error correction, and not primarily for security. Clearly he felt like SHA-1 was "good enough". And if you have something that's "good enough" there are reasons not to write code for alternatives you're not going to use.
>but he has said repeatedly that git's hash is primarily for indexing and error correction, and not primarily for security
And he was wrong as openpgp signatures on commits and tags are a thing.
Not sure when that feature was introduced, however; I doubt it existed in the first version of git. That being said, he should have changed the hash function the moment that feature was introduced.
The first attack on SHA-1 was published in 2003. Git showed up in 2005. Not only should git have allowed for something else, it never should have used SHA-1 in the first place.
Backwards compat requires that both old and new hashes work at the same time. A simple typedef is unlikely to handle all the semantics and space needed for such a change...
It is often hard to generalize when N=1. Now that the N=1 use case is established and we are moving towards N=2, it is painfully obvious to all that a better abstraction is needed.
Typedef or no, we would still need a full audit of the code to find spots where people "inlined" the expansion.
IMO, Linus should have done better here -- no crypto hash lasts forever, but this code is far cleaner than useless layers of abstraction.
I read those comments more than a decade ago. They seemed weak but tolerable then. They seem broken now. Git is supposed to guarantee that the code I see is the code the author saw, in a distributed and decentralized environment. This is Git's entire reason for existing.
A secure design is essential for trusting this functionality. My trust in Git has always been tempered by the weakness of SHA1.
A GPG signature is no stronger than its object ref.
Have you seen how many frameworks believe "auto-pull and compile deps by hash from github" is reasonable? They are assuming this isn't a massive attack vector. They are trying to build on a core feature that Git claims to have.
Recent events moved this from probably foolish to provably so.
That's what comes to mind every time someone brings up Linus' comments from way back when. If SHA-1 is insecure, then there is no way to have security. Forge an object, and GPG sign its commit, and you have broken the apparent security GPG signing was meant to bring. If SHA-1 was not meant for security, then security must have been a non-goal of Git.
The comments are usually brought up to explain why Linus didn't think much of it at the time, whereas they actually demonstrate the shift in thinking around what Git is meant to provide. Security is definitely a goal now, and the hash function is the critical piece of security infrastructure.
GPG signatures actually sign the hash digest of the text they're given. Fun fact, which I think (hope) has changed in recent versions of GPG: that hash, by default, is (was?) SHA-1.
Interesting, I didn't know! Although it makes a lot of sense now that you bring it up.
I don't think it changes anything though, because of git's integrity. Stop me if I'm getting this wrong but, if you wanted to attack a signed git commit through the gpg signature's hash, you would have to modify the commit object itself... which changes the commit hash the object must carry to remain valid. You'd have to get absurdly lucky to have a signature collision that contains a (valid) commit hash collision.
When you GPG-sign a git tag, what GPG signs is text of the form:

object $sha1
type commit
tag $name
tagger $user $timestamp $tz

$text

If you wanted to attack a signed git tag through the gpg signature's hash, you would have to do a second preimage attack on that text with a different commit sha1.

OTOH, if you wanted to attack a signed git commit through the git commit sha1, you would have to do a second preimage attack on the commit text, which is of the form:

tree $sha1
parent $sha1
author $user $timestamp $tz
committer $user $timestamp $tz

$text

See where I'm going? It's the same kind of attack.
Another way to attack it would be to do a second pre-image attack on the pointed tree, which is harder because there is not really free-form text available in a tree object.
Yet another way to attack it would be to do a second pre-image attack on one of the blobs pointed to by a tree, where the format is of the form:
blob $length\0$content
I don't think this is significantly easier than any of the second pre-image attacks mentioned above.
So, in fact, in any case, to attack a gpg signed git tag, you need a second pre-image attack on the hash. If git uses something better than SHA-1, but GPG still uses SHA-1, the weakest link becomes, ironically, GPG.
That being said, second pre-image attacks are pretty much impractical for most hashes at the moment, even older ones that have been deemed broken for many years (like MD5 or even MD4 (TTBOMK)).
That is, even if git were using MD4, you couldn't replace an existing commit, tree or blob with something that has the same MD4.
Edit:
In fact, here's a challenge:
Let's assume that git can use any kind of hash instead of SHA1. Let's assume I have a repository with a single commit with a single tree that contains a single source file.
The challenge is this: attack the hypothetical repository using the hash of your choosing[1] ; replace that source with something that is valid C because people using the content of the repository will be compiling the source. Obviously, you'll need the hash to match for "blob $length\0$content" where $length is the length of $content, in bytes, and $content is your replacement C source code.
But for the commit it's different, because the $text in your example affects the hash of the commit itself. And my understanding is that if you sign the commit, you're signing both the contents and the hash of the content. Am I incorrect?
Yes, Linus wrote that SHA1 isn't here for security, but that was a glaring misunderstanding of security on his part. Integrity protection of source code is a security function.
I think it's mainly due to a different threat model. Linus only pulls from his trusted lieutenants, who are unlikely to try to attack the source in that way (it's way easier to simply hide a bad commit in the lot, no need to fiddle with SHA1). They do the same.
The rest of the code is sent through mailing lists as patches, so the hash is irrelevant.
SHA1 here protects against "random" corruption (which is more than some VCS do), but not against an attacker. At no point is anyone in a position to send trusted contributors bad commit objects.
Now, the use people have of git is very different from the kernel (or git) style, so their threat model is different, and SHA1 may become a security function.