Storing binary data in playing cards. Page 5

Using playing cards to store hidden data:

The implied card method for encoding data into playing cards

Using the whole pack for text:

In the previous text examples, we were encoding a short message into the pack, and with each improvement I showed how many unused cards we ended up with afterwards. Technically, only the used part of the pack would need to be kept in order to store the message. To show best how the Implied Encoding Method works, we will now see how much of a longer piece of text we can encode into one pack of cards. This means that we will be using the whole of the pack, and that we will need to be much more careful about strings of consecutive zeroes near the end. If we do not pay attention to this, the last few characters might end up decoding as nonsense. For this reason, it pays to keep an eye on how many cards remain at all times. In many cases, we might have to leave a large number of unused cards at the end of the pack.

Our message will be: "Hello I am trying to encode as many characters of text as possible into a single pack of fifty two playing cards using a method that maps the cards on to carefully chosen binary numbers." As before, we will ignore the spaces in the message. Our message is made up of 151 characters (excluding the spaces), which is more than we could ever hope to fit in.
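The character count can be checked with a few lines of Python. This is only a quick sketch: the name count_letters is illustrative, and it assumes that "characters" means letters only, so the spaces and the final full stop are skipped.

```python
MESSAGE = (
    "Hello I am trying to encode as many characters of text as possible "
    "into a single pack of fifty two playing cards using a method that "
    "maps the cards on to carefully chosen binary numbers."
)

def count_letters(text):
    """Count alphabetic characters only (assumption: spaces and punctuation are ignored)."""
    return sum(1 for ch in text if ch.isalpha())

letter_count = count_letters(MESSAGE)
print(letter_count)
```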

When using Mapping Table 1 (5-bit binary numbers in order of magnitude, mapped on to the letters of the alphabet sorted by frequency), we would manage to encode 29 characters of our message into the pack. We would fit in the letters: "Hello I am trying to encode as many c". There would be 4 cards left over: the 2, 4 and 7 of diamonds, and the 4 of spades.

When using Mapping Table 2 (5-bit binary numbers sorted by the number of 1s in them (low to high), mapped on to the letters of the alphabet sorted by frequency), we would manage to encode 33 characters of our message into the pack. We would fit in the letters: "Hello I am trying to encode as many chara". There would be 4 cards left over: the 6 of clubs, the 7 and queen of hearts, and the 10 of diamonds.

When using Mapping Table 3 (6-bit binary numbers sorted by the number of 1s in them, mapped on to the letters of the alphabet sorted by frequency), we would manage to encode 34 characters of our message into the pack. We would fit in the letters: "Hello I am trying to encode as many charac". There would be 6 cards left over: the 8 of diamonds, the 8 of spades, the king of spades, the 5 of clubs, the king of hearts and the 3 of diamonds. We cannot encode into the last 6 cards because there are too many consecutive zeroes at that point in our message. At this bit length, we are starting to see diminishing returns, as the quantity of consecutive zeroes prevents us from using the last few cards.

When using Mapping Table 4 (7-bit binary numbers sorted by the number of 1s in them, mapped on to the letters of the alphabet sorted by frequency), we would manage to encode only 31 characters ("Hello I am trying to encode as many cha"). The reason for this is that near the end of the encoding, we would have 12 cards left, but would need to encode the sequence "r", "a", which are mapped on to 1000000 and 0000010. After the first 1, that gives us eleven consecutive zeroes to encode in just eleven cards. When those eleven cards are passed to the back, they end up in exactly the same position as before, so won't show up in the decoding. The following characters would then be decoded as nonsense. Had we chosen a different message to encode, we might not have come up against this problem at this point. Whether it happens or not is down to luck. It is a situation where it would be helpful to use a computer to choose the wording of the message in a way that maximises the number of characters we can fit in.
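The "r", "a" problem can be reproduced with a small sketch. It builds only the first eight 7-bit codes (the letters "etaoinsr", as given on this page), joins the codes for "r" and "a", and counts the zeroes after the first 1. The function names are illustrative, and the rule inside run_fits (a run of n zeroes needs at least n + 1 cards remaining) is an assumption drawn from the examples on this page rather than a full implementation of the method.

```python
LETTERS = "etaoinsr"   # letters with one or fewer 1s at 7 bits, per the text
WIDTH = 7

def build_table(letters, width):
    """'e' is all zeroes; each later letter gets a single 1, in order of magnitude."""
    table = {letters[0]: "0" * width}
    for position, letter in enumerate(letters[1:]):
        table[letter] = format(1 << position, f"0{width}b")
    return table

def zero_run_after_first_one(bits):
    """Length of the run of zeroes that directly follows the first 1."""
    tail = bits[bits.index("1") + 1:]
    return len(tail) - len(tail.lstrip("0"))

def run_fits(zero_run, cards_left):
    """Assumed rule: a run of n zeroes needs at least n + 1 cards remaining."""
    return cards_left > zero_run

table = build_table(LETTERS, WIDTH)
bits = table["r"] + table["a"]          # "1000000" + "0000010"
run = zero_run_after_first_one(bits)    # zeroes between the two 1s
cards_after_one = 12 - 1                # 12 cards left, one used by the first 1
print(bits, run, run_fits(run, cards_after_one))
```

Here the run is as long as the number of cards remaining, so the check fails, which is the situation described above.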

The capacity for encoding text in this way:

There are two limits to the number of letters we can encode:

1: There is a fixed number of possible permutations for a pack of cards - there are only so many different ways in which the cards can be ordered. These permutations allow differing capacities for storing bits (from 52 to 1,378), but the higher capacities require unlikely data (e.g. 50 zeroes in a row), which is not going to happen when encoding text. I have not done much research into it, but the average maximum for real-world text seems to be around 35 characters.

2: We can only encode fifty-two 1s in a pack (when using the Basic Implied Encoding Method).

Using the Basic Method, the most 1s a pack can contain is 52. If we did not encode the letter "e" with a zero, the best ratio we could ever achieve would be one letter per card, and the most letters we could encode would never be more than 52. However, as long as our message used only letters whose binary numbers contain a single 1, along with at least one "e", we could theoretically beat 52. This would be quite difficult with short bit lengths. For example, with 7-bit numbers we could only use the letters "etaoinsr".

If we tried with 10-bit numbers, we would have more letters to use in our message ("etaoinsrhld") and this gives us a better chance of encoding a message of more than 52 letters. The mapping for 10-bit numbers containing one or fewer 1s would be like this:

Mapping Table 5:

Containing no 1s:
00 0000 0000 (000 in decimal): e

Containing one 1:
00 0000 0001 (001): t
00 0000 0010 (002): a
00 0000 0100 (004): o
00 0000 1000 (008): i
00 0001 0000 (016): n
00 0010 0000 (032): s
00 0100 0000 (064): r
00 1000 0000 (128): h
01 0000 0000 (256): l
10 0000 0000 (512): d
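Mapping Table 5 follows a simple pattern, so it can be generated rather than typed out. The sketch below assumes the frequency order "etaoinsrhld" used on this page and reproduces the same row layout; the function names are illustrative.

```python
LETTERS = "etaoinsrhld"   # frequency order used in the text
WIDTH = 10

def table_values(letters):
    """Yield (letter, value): 0 for the first letter, then successive powers of two."""
    yield letters[0], 0
    for position, letter in enumerate(letters[1:]):
        yield letter, 1 << position

def format_row(letter, value):
    """Format one row in the 2 + 4 + 4 bit grouping used by Mapping Table 5."""
    bits = format(value, f"0{WIDTH}b")
    grouped = " ".join([bits[:2], bits[2:6], bits[6:]])
    return f"{grouped} ({value:03d}): {letter}"

rows = [format_row(letter, value) for letter, value in table_values(LETTERS)]
for row in rows:
    print(row)
```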

There are quite a few words that we can make from these 11 letters. However, because we are using 10-bit numbers, we risk running out of cards when we have used up just 42 cards. We could not have an "e" (00 0000 0000) as the last letter if we only had 10 cards left, or else it would not exist when we came to decode it. Similarly, we could not have a "d" (10 0000 0000) as the last letter if we only had 9 cards left.

Another major problem occurs when we have many consecutive zeroes from neighbouring letters. The letters "d", "e", "t" ("10 0000 0000" "00 0000 0000" "00 0000 0001"), in that order, result in 28 consecutive zeroes. This means that there would need to be at least 29 cards left in the pack at the point we start encoding the first zero, otherwise it would be decoded as nonsense. Other bad card combinations are not as severe, but will still disrupt the encoding long before the end. Any messages over 52 letters would need to be very carefully thought out, and it is extremely unlikely that any useful message of this length could ever be encoded.
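To see how bad the "d", "e", "t" combination is compared with others, we can search every ordering of three different letters from Mapping Table 5 and measure the longest run of zeroes each one produces. This is a sketch under the same assumptions as the table itself; the function names are illustrative, and the "run plus one" card requirement is taken from the paragraph above.

```python
from itertools import permutations

LETTERS = "etaoinsrhld"
WIDTH = 10

def build_table(letters, width):
    """'e' is all zeroes; each later letter gets a single 1, in order of magnitude."""
    table = {letters[0]: "0" * width}
    for position, letter in enumerate(letters[1:]):
        table[letter] = format(1 << position, f"0{width}b")
    return table

table = build_table(LETTERS, WIDTH)

def longest_zero_run(bits):
    """Longest stretch of consecutive zeroes in a bit string."""
    return max(len(run) for run in bits.split("1"))

def run_for(word):
    """Longest zero run when the letters of word are encoded back to back."""
    return longest_zero_run("".join(table[letter] for letter in word))

# Search all orderings of three different letters for the longest zero run.
worst = max(permutations(LETTERS, 3), key=run_for)
worst_run = run_for(worst)
min_cards = worst_run + 1   # cards needed when the first zero is encoded
print("".join(worst), worst_run, min_cards)
```

Repeated letters are left out of the search on purpose, since a string of consecutive "e"s would trivially give longer runs.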

It is, however, possible to use 10-bit numbers to encode messages of more than 52 letters without resorting to lengthy strings of consecutive "e"s. Without putting too much effort into it, I have reached 55 letters with an extremely contrived message: "Three released deer seen in little heeled shoes and old oil stains". This works out as 1.06 letters per card. After encoding, there were 11 unused cards left, so technically, the whole message is held within just 41 cards, which works out at 1.34 letters per card. However, all 52 cards are needed to encode the message. The data required 20 Decoding Rows to decode. [For future reference, there were only 6 cards that could have been used for the Recursive Method (explained later), so that would not have helped in this situation.]
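The figures for this message can be checked with a short sketch. It confirms that every letter comes from Mapping Table 5, counts the letters, and works out both ratios. It also counts the 1s in the encoded message (every letter except "e" carries exactly one), which comes to the same number as the 41 cards reported as used. The variable names are illustrative.

```python
MESSAGE = "Three released deer seen in little heeled shoes and old oil stains"
ALPHABET = "etaoinsrhld"     # the 11 letters of Mapping Table 5

letters = [ch for ch in MESSAGE.lower() if ch.isalpha()]
letter_count = len(letters)

# Any letters that Mapping Table 5 cannot represent (expected to be none).
outside_table = sorted(set(letters) - set(ALPHABET))

# Every letter except "e" carries exactly one 1 in Mapping Table 5.
ones = sum(1 for ch in letters if ch != "e")

per_card_full_pack = round(letter_count / 52, 2)
per_card_used = round(letter_count / (52 - 11), 2)   # 11 cards were left unused

print(letter_count, outside_table, ones, per_card_full_pack, per_card_used)
```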

Other optimisation ideas:

With bit lengths that leave plenty of unassigned numbers, we can further develop the system by mapping other things on to the numbers with few 1s in them. We could map popular words, for one thing. We could also map numerals, or the most common pairs of letters (digraphs) such as: th, he, in, er, an, re, nd, at, on, nt, ha, es, st, en, ed, to, it, ou, ea, hi, is, or, ti, as, te, et; or the most common triple letter combinations (trigraphs) such as: and, for, the, tha, ent, ion, tio, nde, has, nce and so on. As long as the words, digraphs and trigraphs use fewer 1s than the individual letters within them, it would be worthwhile. Which digraphs, trigraphs, words and numerals to use would require some research. There is also the problem that the mapping table would stop being intuitive and calculable, and so would need to be learned or kept by the decoder.
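The "fewer 1s" test can be sketched as follows. The letter costs come from Mapping Table 5, but the codes given to "th", "the" and "and" are purely hypothetical (this page does not assign any), chosen only because each contains two 1s. The function names are illustrative too.

```python
LETTERS = "etaoinsrhld"

# Number of 1s each single letter costs under Mapping Table 5.
letter_ones = {letter: (0 if letter == "e" else 1) for letter in LETTERS}

def ones_in(value):
    """Count the 1s in the binary form of value."""
    return bin(value).count("1")

def worthwhile(group, code):
    """True if the group's own code has fewer 1s than its letters spelled out."""
    spelled_out = sum(letter_ones[letter] for letter in group)
    return ones_in(code) < spelled_out

# Hypothetical 10-bit codes with two 1s each; not assigned anywhere in the source.
candidates = {"th": 0b0000000011, "the": 0b0000000101, "and": 0b0000000110}
results = {group: worthwhile(group, code) for group, code in candidates.items()}
print(results)
```

Under this strict rule, "and" would pay off with a two-1 code because it otherwise costs three 1s, whereas "th" and "the" would not, since "e" is already free and the other two letters cost only one 1 each.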


my email address is the word website followed by the digits 2024, then at timwarriner.com
