One of the necessary conditions for a secure cipher is security under chosen-plaintext attacks, as specified by the following game:
I give you two plaintexts of the same length of my choice, you select one of them and encrypt it with a key of your choice and give me the resulting ciphertext. If I can determine with probability p>50% which of the two plaintexts you have encrypted, the cipher is considered broken at level p.
I chose the following two plaintexts relative to the C source linked elsewhere in this thread, with
#define DATA_SIZE (1024 * 1024 * 128)
Plaintext 1:
    for (i = 0; i < DATA_SIZE; i++)
        decrypted[i] = i;
Plaintext 2:
    for (i = 0; i < DATA_SIZE; i++)
        decrypted[i] = 0;
and claim that your algorithm is broken at a level of at least 95%. (I actually think it is 100% if I make the plaintext long enough, but I'm hedging my bets by conceding 5%.)
I don't need the full ciphertext; the output of the following code fragment is sufficient:
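(The fragment itself isn't reproduced in this thread. Purely as an illustration of the kind of statistic that would suffice - the function and names below are mine, not the original fragment - something along these lines would do:)

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-in only, NOT the original fragment: histogram the
     * ciphertext bytes and report the deviation from a uniform distribution.
     * A biased keystream, or one where the plaintext cancels the counter,
     * shows up immediately in these numbers. */
    void report_bias(const uint8_t *ciphertext, size_t len)
    {
        size_t count[256] = {0};
        for (size_t i = 0; i < len; i++)
            count[ciphertext[i]]++;

        size_t min = len, max = 0;
        for (int v = 0; v < 256; v++) {
            if (count[v] < min) min = count[v];
            if (count[v] > max) max = count[v];
        }
        printf("expected ~%zu per value, observed min %zu, max %zu\n",
               len / 256, min, max);
    }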
I have not considered this kind of game, and therefore have not designed the cipher to withstand it. I need to run your code and do some tests to look into it further.
BUT:
I currently think that the line:
key[pos_a] ^= key[pos_c] ^ i ^ output[i];
is a problem, as "output" can counteract "i". I would probably replace it with:
> I have not considered this kind of game, and therefore have not designed the cipher to withstand it.
This seems to sum up why most of us should not be rolling our own crypto, and why peer review and cryptanalysis are so important for determining whether something really is secure. :)
In fact I think everybody who's interested in the subject should try to roll their own crypto, for the experience and the fun of it. They just shouldn't use it in the real world.
Most stream ciphers just output a pseudorandom stream to be xored with the plaintext.
Treating it as a PRG simplifies the security analysis. Letting the plaintext affect the state is more likely to hurt than help if you assume that the attacker knows or can influence parts of the plaintext.
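In sketch form, the shape is roughly this (the xorshift update below is a toy placeholder of my choosing, not a secure generator):

    #include <stddef.h>
    #include <stdint.h>

    /* The standard stream-cipher shape: the internal state evolves
     * independently of the plaintext, which only meets the keystream
     * at the final XOR. */
    typedef struct { uint32_t s; } state_t; /* seed with a nonzero key */

    static uint32_t next_word(state_t *st)
    {
        /* Toy xorshift update: a placeholder PRG, NOT cryptographically
         * secure. A real cipher puts its keyed generator here. */
        uint32_t x = st->s;
        x ^= x << 13; x ^= x >> 17; x ^= x << 5;
        st->s = x;
        return x;
    }

    void stream_xor(state_t *st, const uint32_t *in, uint32_t *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] ^ next_word(st); /* plaintext never touches st */
    }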
This to me is the challenge. Done the wrong way, it would allow an attacker to influence the algorithm; done the right way, it could remove any patterns created by the key. It seems all the attacks proposed in this thread target this problem, while the rest seems harder to attack.
During the encryption phase the argument by Strilanc doesn't hold (in that case decrypted[i] is given as the input and does not contain any key admixture), and it is exactly the case where pos_c equals pos_a that is problematic, because in that case the statement
key[pos_a] ^= key[pos_c] ^ i ^ decrypted[i];
overwrites key[pos_a] with the value
i ^ plaintext[i]
which doesn't depend on any part of the key at all (since x ^= x ^ y just sets x to y).
You're right: if pos_c = pos_a, the rotation does prevent the cancellation.
But throwing away a word of entropy most of the time is still very serious, and ending up with a different cancellation in the remaining cases is not really promising.
As always, it's terribly easy to create an encryption algorithm that's complicated enough that you yourself can't break it. That doesn't make it safe. (It might still be a fun exercise for you.)
Have you tried breaking any ciphers, yet? Even obviously weak ones require some thought to break.
Personally I think you would be more likely to get a useful response:
* If you delivered a complete implementation in a popular language like C, Python, or Ruby, rather than publishing pseudocode that may or may not be interpreted correctly by error-prone humans
* If there was a clear goal. "Find a weakness" is vague; "Recover the missing plaintext from this 1MB message" is specific.
I think the author is more interested in the design of the encryption algorithm than in the specifics of the reference implementation provided. So yeah, a dodgy PRNG was used, but is there anything wrong otherwise - i.e. something inside the algorithm itself that reduces its efficacy and makes it vulnerable?
Well, if they provided an actual implementation rather than a reference implementation, then I would break it. Breaking cryptography often depends on breaking implementations, not the underlying theory. That's why AES-CTR is theoretically secure but so deviously hard to implement correctly.
I note in the comments that the random number generator is not secure in any way. It's just a handy thing for testing, not something you would use in real life.
I like the idea of mutating both the output and the key. You are using it for a single pass over the length of the data; you could use the final key as input to another round of crypto. Repeating it for a number of rounds would keep it deterministic, but it would increase the computational load on an attacker and create a more even distribution from input to output. Ah, the problem is that decryption would then need the final key, not the one you started with; if so, it would be good for a one-time hash. So the key idea of the crypto is the nuking of the part of the key that was just used for the xor. The thing that concerns me is that if the attacker has access to your crypto in binary form, he could run it again and again on different inputs (data, key) to infer its structure - you'd want to slow him down.
And there are obvious bugs in that code, for example:
pos_a = key[pos_b] % key_size;
probably should have been:
pos_a = key[pos_b % key_size] % key_size;
Exactly. Unless I misunderstood the code, if the key size is small (compared to the plaintext size) and we know the key length, frequency analysis will work just fine.
How would you go about this? Obviously the key is smaller than the message (otherwise we would use a one-time pad and be done with it). The key needs to be large enough that (a) it can't be guessed, and (b) the branches of possible positions in it quickly become unmanageable.
Well your key doesn't have much entropy to begin with....
On a more serious note, the code you've written uses an 8-bit key. You need to change the line:
(key[pos_c] << 31) | (key[pos_c] >> 1);
to:
(key[pos_c] << 7) | (key[pos_c] >> 1);
The algorithm can obviously also be modified to use 64-bit chunks, which could be faster on 64-bit hardware.
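A width-generic rotate helper avoids this class of bug entirely; a minimal sketch (the helper names are mine):

    #include <stdint.h>

    /* Rotate right by one bit at each word width. The left-shift count
     * must be (width - 1), which is exactly where the 8-bit version of
     * the code went wrong by keeping the 32-bit constant. */
    static inline uint8_t  rotr8(uint8_t v)   { return (uint8_t)((v << 7) | (v >> 1)); }
    static inline uint32_t rotr32(uint32_t v) { return (v << 31) | (v >> 1); }
    static inline uint64_t rotr64(uint64_t v) { return (v << 63) | (v >> 1); }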
This is a classic stream cipher: a stream of bits is generated and xored with the plaintext. In code size RC4 is very similar, but it works on bytes rather than 32-bit words. RC4 has been used in a wide number of areas, like SSL/TLS. It was believed to be secure for many years, but some quite bad attacks are now known (though it is still much harder to break than this algorithm).
I see one very big problem with the algorithm. If I know the first bytes encrypted (a known-plaintext attack), I will recover the xor of two parts of the raw key. The next word encrypted will be a shifted version of the raw key material. This will eventually mix, but the mixing is slow enough that the key can be recovered. Even ciphertext-only attacks will be possible if you know or can guess underlying statistical properties of the plaintext, such as that it is English text, HTTP headers or whatever. In general, too much of the key material leaks into the keystream, making the cipher breakable. In practice, the length of an HTTP header will likely leak enough material to break it.
A better stream cipher would run a key schedule before the encryption: instead of using the key directly, it builds a table of bits mangled from the key bits, so that there is less correlation between the keystream and the key. Any such correlation will make the cipher breakable.
Furthermore, there must be more mixing of bits than the given algorithm has. RC4 appears to have no more mixing, but it uses a much bigger table of 256 bytes rather than a table only the size of the key, which makes it very likely that a byte has already been mixed quite a lot by the next time it is used.
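For a concrete picture of what a key schedule looks like, here is RC4's (the KSA), shown only as an illustration of the idea of expanding the key into a separate state table:

    #include <stddef.h>
    #include <stdint.h>

    /* RC4 key schedule (KSA): expand the key into a 256-byte state table
     * so the keystream is generated from the table, never from the raw
     * key directly. */
    void rc4_ksa(uint8_t S[256], const uint8_t *key, size_t keylen)
    {
        uint8_t j = 0;
        for (int i = 0; i < 256; i++)
            S[i] = (uint8_t)i;
        for (int i = 0; i < 256; i++) {
            j = (uint8_t)(j + S[i] + key[i % keylen]);
            uint8_t tmp = S[i]; S[i] = S[j]; S[j] = tmp;
        }
    }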
"I will recover the xor of two parts of the raw key."
Yes, but how is this useful, given that the relationship between those two values is no longer valid before the next operation? The shift is slow, but the XOR destroying one of the two components that have a known relationship is instant.
Those leaked relationships are extremely important, because you can use them to construct an attack that lets you narrow down the number of guesses you have to make to recover the original key, to the point where cracking the code becomes feasible. In general, leaking information about the key like this is almost a cardinal sin in cipher design, as it gives an opening for an attack. Modern ciphers leak practically no information about the key.
True, if you want to brute-force the algorithm you lose some bits by knowing that somewhere in the key there are two values that xor to a known result. The question for me is whether this information compounds over time. My thought was that it wouldn't, since the key is modified.
With that said, I'm not convinced that your pseudocode actually works. Specifically, the following doesn't make sense to me:
pos_a = key[0];
pos_b = key[1];
pos_c = key[2];
...
pos_a = key[pos_b] % key_size;
It would be really cool if you posted functioning encrypt() and decrypt() methods to GitHub - something people can actually compile/run/analyze. In fact, if you do that, get in touch with me and I'll try to crack it.
Thanks! It's been driving me crazy trying to crack it. It can't be this easy, right...?
I thought a lot about the possible starting point for the algorithm, but I found having it just be "unknown" like this was all that was needed. A test to make sure pos_a, pos_b, and pos_c are never the same might be useful. The code obviously needs to be fixed so that the pos_X values are reduced modulo the key size and can never be larger than it. Something like the sketch below.
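(My rendering of that fix; the collision nudge at the end is just a guess at what such a test might do, assuming key_size >= 3:)

    /* Reduce the initial positions modulo the key size so they can never
     * index past the end of the key... */
    pos_a = key[0] % key_size;
    pos_b = key[1] % key_size;
    pos_c = key[2] % key_size;

    /* ...and step them apart if they collide (hypothetical tie-breaker). */
    while (pos_b == pos_a)
        pos_b = (pos_b + 1) % key_size;
    while (pos_c == pos_a || pos_c == pos_b)
        pos_c = (pos_c + 1) % key_size;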
As eru said elsewhere, you've created a stream cipher. To be secure, the generated keystream (the thing you xor against the plaintext) must not have any biases: the attacker shouldn't be able to guess the contents of any keystream byte (with P > 1/2^8). Otherwise, an attacker can probabilistically recover parts of the plaintext.
I played around with your C implementation a little[1]. I put it into 8-bit mode, and generated 16 random keys and output their keystreams[2]. Here are the first 32 bytes of those keystreams, in hex:
As you can see, there are some significant biases here. For example, as an attacker, once I have a ciphertext, I can guess that bytes 18 and 19 of the keystream were (hex) "1112", and have a very good chance of being right.
While the 32-bit version isn't as bad, I think there are still significant biases.
I generated 24495 keys. Here's the distribution for keystream byte 1 (script at [3]):
I should really get to sleep, so I didn't do the statistics on this, but I'm fairly sure that's not a uniform distribution (i.e., bytes aren't being drawn uniformly from [0:255]). Also, keystream byte 0 only ever takes 128 of the 256 possible values.
I did not look at 64-bit.
All in all, I wouldn't use your cryptosystem :) but there's no shame in that! Crypto is hard, and you're only going to get better.
I'm not really sure how you can get similar results with completely different keys; I'm looking into this. I tried generating a megabyte of encrypted zeroes in 8-bit mode, and the most-used 8-bit value was only 1% more common than the least-used value.
This looks fishy to me. Follow my thought: if you are looking at the very first value coming out of the stream, that value should have zero bias from the algorithm. Why? Because it is an xor of two different values in the key. If both values are properly random, then so is their XOR.
Note that at this point the algorithm has not yet started modifying itself, so the original key is still intact. The self-modifying code may still be broken, but I think we can be fairly sure the first value has exactly the same bias as the random number generator.
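That last claim is easy to sanity-check in isolation - the xor of two independent uniform bytes is itself uniform. A quick Monte Carlo demonstration (rand() is used here only for the demo, not as an endorsement of it):

    #include <stdio.h>
    #include <stdlib.h>

    /* Empirical check: the XOR of two independent, uniformly distributed
     * bytes should itself be (approximately) uniform over 0..255. */
    int main(void)
    {
        enum { TRIALS = 10000000 };
        long count[256] = {0};
        for (long i = 0; i < TRIALS; i++)
            count[(rand() & 0xFF) ^ (rand() & 0xFF)]++;

        long min = count[0], max = count[0];
        for (int v = 1; v < 256; v++) {
            if (count[v] < min) min = count[v];
            if (count[v] > max) max = count[v];
        }
        printf("min %ld, max %ld, expected ~%ld\n", min, max, (long)TRIALS / 256);
        return 0;
    }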
It's not standard with symmetric encryption in general; it's a "known-plaintext attack": http://en.wikipedia.org/wiki/Known-plaintext_attack - with AES, for example, even if you know the plaintext there is no known way to obtain the key (faster than brute-force search).
As everyone said, you have created a stream cipher. But I feel I have to add that your idea (an OTP with a small key) is usually implemented this way: the small key is fed, along with a counter, into some PRNG specifically designed for this (you should try AES in CTR mode), and the counter is incremented every time.
This produces a sequence which is always the same for each session, but because the keyspace is so large (2^128 with AES-128), this is acceptable: these PRNGs are built from cryptographically secure primitives, which makes them very hard to reverse. Therefore, the only efficient way to crack a session is either to catch the key or to brute-force it by trying every value and testing whether it leads to a successful decryption. Not only are there 2^n possibilities, but every possibility is expensive to test, and then you have to accurately check whether the decrypted values are coherent (difficult with packets with a fixed header, and nearly impossible with raw binary data).
The big disadvantage is having to store your small key until decryption.
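A minimal sketch of the CTR idea (block_encrypt() below is a toy stand-in I wrote so the example compiles; a real implementation would use AES-128 here):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Toy stand-in for a real 128-bit block cipher such as AES. NOT secure;
     * it exists only so this sketch is self-contained. */
    static void block_encrypt(const uint8_t key[16], const uint8_t in[16],
                              uint8_t out[16])
    {
        for (int i = 0; i < 16; i++)
            out[i] = in[i] ^ key[i];
        for (int r = 0; r < 4; r++)
            for (int i = 0; i < 16; i++)
                out[i] = (uint8_t)((out[i] << 1) | (out[i] >> 7)) ^ key[(i + r) % 16];
    }

    /* CTR mode: encrypt an incrementing counter to produce the keystream,
     * then XOR it with the data. Decryption is the exact same operation. */
    void ctr_xor(const uint8_t key[16], uint64_t counter,
                 const uint8_t *in, uint8_t *out, size_t len)
    {
        uint8_t block[16], ks[16];
        for (size_t off = 0; off < len; off += 16, counter++) {
            memset(block, 0, sizeof block);
            memcpy(block + 8, &counter, sizeof counter); /* low half = counter */
            block_encrypt(key, block, ks);
            size_t n = (len - off < 16) ? len - off : 16;
            for (size_t b = 0; b < n; b++)
                out[off + b] = in[off + b] ^ ks[b];
        }
    }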
If a cryptographic algorithm is fundamentally secure, a small (128-256 bit) key will be all it needs to be secure. If an algorithm can't be secure with a key that size, giving it a bigger key is likely to make it only marginally better.
> anyone [...] can see an obvious weakness here: the procedural algorithm will produce a pattern that can be found and used to break the key. [...] But what if the algorithm isn’t specific? What if the key describes the algorithm? If the key is data that can be used as an algorithm to produce data, we can create a cycle where the algorithm is self modifying and therefor wont create a pattern.
This doesn't seem right. If "the key describes the algorithm" you need another algorithm to generate the algorithm from the key. This other algorithm is unique and will produce patterns like any other deterministic algorithm. Patterns in the generated algorithms will eventually be reflected in the data.
Creating new algorithms based on the key doesn't change the fact that the input for the encryption function is the key (and in this case apparently the plaintext).
You can reduce all this algorithm generation back into essentially one algorithm anyway: generating "changes to the algorithm" merely obfuscates the presentation of the underlying single algorithm. In the end, there's still a fixed set of input information that goes into an encryption function and no matter how you try to obfuscate it Shannon won't budge.
No algorithm can ever be harder to crack than trying all permutations of the key: in other words, brute force. The purpose of any encryption algorithm is to make it as close to impossible as it can be to crack in any less time-consuming way. One could argue that my algorithm is better than that because the plain text salts the algorithm, but that doesn't account for any potential guesswork, or even for an attacker trying to get a user to send a specific stream that would help crack the code.
I don't have time to analyze it carefully, but you have created a stream cipher. It is actually quite fun to implement ciphers and learn about all of the various attacks. If you are to be serious about this you should really not start from a pure product of your mind, but learn about the latest and greatest techniques. The academic field of cryptography doesn't move /that/ quickly, so it is quite reasonable to keep up with it if you are passionate about it. You can do much better with just a little knowledge about basic attacks. You can basically keep designing algorithms that fewer and fewer people can break until you reach the point where no one can break it with known, modern, academic cryptanalytic techniques. And when you get there... you might be a Cryptographer :)
Essentially, you need to take a key and an IV and feed them into your magic box (algorithm), and all that should come out is an incredibly long random string that does not repeat for a very, very long time. The shorter the period, the worse off you are. I can't be sure, but I think you have a predictable, short period.
Even more interesting, read up on RC4: http://en.wikipedia.org/wiki/RC4 - there were small but detectable biases in the initial part of the keystream, which allowed an attacker to break the algorithm. The solution was to advance the keystream a bit to discard the initial, biased part.
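In code, the discard is trivial. Here is RC4's output loop (the PRGA) with the biased prefix dropped - purely an illustration; 768 is one commonly suggested discard count, and the S table comes from a key schedule like the KSA sketched earlier in the thread:

    #include <stddef.h>
    #include <stdint.h>

    #define DROP 768  /* discard the biased keystream prefix ("RC4-drop") */

    /* RC4 PRGA: generate n keystream bytes after skipping the first DROP. */
    void rc4_stream(uint8_t S[256], uint8_t *out, size_t n)
    {
        uint8_t i = 0, j = 0;
        for (size_t k = 0; k < DROP + n; k++) {
            i = (uint8_t)(i + 1);
            j = (uint8_t)(j + S[i]);
            uint8_t tmp = S[i]; S[i] = S[j]; S[j] = tmp;
            uint8_t b = S[(uint8_t)(S[i] + S[j])];
            if (k >= DROP)
                out[k - DROP] = b;
        }
    }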
And thus we get to the hardest part of cryptographic algorithm design and why these exercises are so hard. It is very easy for someone to build an algorithm they can't break themselves. Obviously, you did everything you knew how to do to prevent it from being broken. It might even take an experienced cryptographer a while to break it (the key stream bias in RC4 was not found for a long time). Check out some of the WEP attacks.
Another thought for you: http://www.rickwash.org/papers/stream.pdf - just how easy it is to be wrong. Page 8: if the swap operation of RC4 is omitted, the keystream becomes cyclic with a period of 2^(n+1). Very small changes can introduce non-obvious periods, etc.
Final thought: check out how WEP got broken. Stream ciphers are hard. This is why it is often very good as an educational exercise to write your own ciphers (I have done it), but you sort of do it once and either decide to get good at it or realize it takes a LOT of knowledge and it is better to just know the attacks and not invest a lot of time in it.
Anyhow, hope you had fun with this. Read up on the various ways stream cipher design can be broken, and check out how AES can be used as a stream cipher. Not sure I had a point, precisely, just wanted to share some thoughts having seen a dozen or more of these crazy algorithms in use in live code meant to protect truly sensitive data. Some programmer always thinks he is clever and ends up creating some XOR cipher (I know that is not you as you have the right goals, in my mind, for doing this).
Fewer and fewer people. The art world is similar. Artists should be aware of what other artists have done, so as not to do something unoriginal unintentionally. But that's hard work, because it requires so much time and energy to investigate. So the hard work of art criticism falls on the non-artist who can dedicate him/herself to the task full-time. My analogy is that the critic is the codebreaker and the artist is the code-maker. Artists make great critics because they have inside knowledge of how the moves (decisions) are made. But artists often lack the historical perspective, awareness of emerging work and (while I can't read anyone's mind) they typically lack the vocabulary to discuss their decisions analytically. But splitting up the work allows for more specialization. They're both specialists with tons of overlap, but they have different goals. So maybe the cryptography world will make the same split, if it hasn't already? Crypto-artists dedicated to creating complexity, and then historian-critic-codebreakers to determine the value (strength?) of what was created. They need each other: one creates, the other determines the value of it.
Anyway, as in art, "naiveté" (no formal education) can sometimes lead to a surprising breakthrough. While it's rare for an uneducated artist to do something interesting, I believe education hinders creativity in some instances. Though most unschooled artists, most of the time, tend to produce the same cliché results as predecessors, they do have the advantage of always thinking outside the box because the box is invisible to them. So outsiders can get breathtakingly weird results sometimes and part of that weirdness is due to their lack of education. It seems to me cryptography is like art in that originality is a goal. In non-objective art especially, you don't want anything recognizable. Like cryptography, non-objective art conceals the true meaning behind the intent. Sometimes cryptography and art even conceal the intent.
So creativity allows for infinite approaches and therefore infinite ways to conceal information. Anyway, I don't plan to become a cryptographer but I think it's a cool analogy (to compare to art) and would be interested in learning more about the "big picture" of cryptography if there are any good books on the subject, even if it's fiction.
This is a good point. The interesting thing here is the raw level of information you need to have to even hope to do something new. In the period from, say, WWII up to when Shannon was doing his seminal work, you could think a little about a fairly broad area and maybe come up with something new. Now, as with many areas of academic thinking, you have to learn so much just to get to a point where you can advance a really small piece of the state of the art just a little bit.
I think a lot of real math and academic work follows this pattern, and it is the reason one has a right to be skeptical, or even dismissive, of someone doing something like this if they were to present it as a serious work. "Here, try to break this". It is great for someone to learn the true scope of their ignorance, but at the same time, it is also so unlikely you have done even the tiniest novel thing that it is almost always OK to dismiss it out of hand.
Cryptography is really just an applied branch of math/information theory. Things that work for advancing the state of art for math are generally the same tools that work for advancing the state of art for cryptography. In the raw, pure and abstract sense. In the practical world, where you have to have real bits on real physical things, the attacks are myriad and the innovators in that space often need minimal training in cryptography (relative to someone doing the theoretical work).
I did, but I couldn't find anything. It's very evenly distributed even if the plain text is just zeroes. Again, I'm not sure someone else can't find a bias just because I couldn't.
Maybe, if no one can find a glaring problem with it. I think using it in a game would be a good place to test it, since the world doesn't fall over if it is broken.
Why bother? Just use a standard encryption. (Unless you feature your algorithm as a puzzle to be broken - that would make for an interesting spy game, but one that's probably more involved and more `work' than most people want from their games.)