I have recently been thinking about Steganography again and various carriers as well as applications. For those of you that don't know what Steganography is, it simply means 'hidden writing' from the Greek. Some examples of steganography are: tatooing the scalps of messengers and then waiting for their hair to grow back; writing a message on the wood of a wax tablet before pouring the wax in; 'invisible inks'; pin pricks above characters in a cover letter; etc. Basically, we have a 'cover', which could be an image, passage of text, etc., that we are happy for anyone to see and a message that we want to hide within it so that it is undetectable. It turns out that this last part is quite hard.
Anyway, I thought I'd look at techniques to embed data within Twitter as it is popular now and people are starting to monitor it. Hiding within a crowd, however, is a good technique as it takes quite a lot of resources to monitor all activity on a service like Twitter. The techniques described here would work equally well on other social networks, such as LinkedIn, Facebook, etc. How do we embed data within a medium that allows only 140 plaintext characters though? Well, there are several methods, a few of which I'll talk about here. I'm only going to discuss methods that would be quite simple to detect if you knew what you were looking at, but that will go undetected by the majority of people.
The first method is to use a special grammar within your Tweet. If the person you are communicating with knows the grammar then you can alter a message to pass data back and forth. A simple example of this technique would be to choose 2, 4 or 8 words that mean the same thing, but each one represents a value. For example, you could use fast, speedy, quick and rapid to represent 0, 1, 2 and 3 respectively, effictively giving you 2-bits of embedded data. If we had 8 words then we would have 3-bits and so on. This can be extended to word order in the sentence and even the number of words per sentence. However, messages can be difficult to construct in such a way as to be readable and this is not a high data rate. We could probably get only one or two bytes worth of data in an update message.
Another method is suggested by Adrian Crenshaw. He used unicode characters, giving access to two versions of the charcterset. So the lower range represented 0's and the upper range of characters represented 1's. This is a good scheme, as you then transfer as many bits as there are characters in your message. This gives a maximum of 140 bits. The issue with his scheme is that on some devices and Twitter clients the two character sets look quite different and it is definitely detectable. However, a good idea nonetheless.
Following on from this, we can encode bits within the message, so that they aren't seen by the user, by appending whitespace to the end of the message. Whitespaces are things like a space or a tab, i.e. a place where a letter isn't. A simple method to embed your data is to represent a 0 by a space and a 1 by a tab. The good thing is that web browsers will display multiple whitespaces as only a single space, so this will be invisible within a browser. Other clients will print them out, but there's nothing to see. Now, Twitter, and most social media clients, will strip whitespace from the end of your message as they assume that you added them by accident. This will destroy your data. However, if you add the HTML code to the end of your message then it will keep all the whitespace (indeed, you could put any character at the end, but you may see multiple spaces in some clients). The advantage of using the is that it is a whitespace character and won't be displayed in your message. Now, you will need to write a short message and add the non-breaking space at the end, so you won't have that much space, but you should be able to get up to nearly 16 ASCII characters in this way, but certainly over 100 bits if you keep your message short.
We can also be quite blatant with our data. We can rely on the fact that people won't know we're transferring data and won't look very hard. A simple URL shortening service can be exploited in two ways to embed data. The simplest method is to make up a URL. Twitter users rely on http://bit.ly and http://twitpic.com extensively. If we base-64 encode our text or data, then we can add 6 bytes (or characters) to a URL. For example, I could tweet: "Just read this http://bit.ly/UkxSIFVL and saw the photo http://twitpic.com/IEx0ZC4=". Now, these URLs are fake and don't lead anywhere. However, the base-64 encoded text of the two URLs decodes to "RLR UK Ltd." and how many people will follow your link anyway. Even if they do, the two sites here will just put up a helpful message that there was an error with the URL. You can now appologise and provide two real URLs. Meanwhile the message has got across. Obviously more URLs mean more data - up to 36 bytes if you just send 6 URLs.
The second method of using a URL shortening service is to write your own. Now you can provide real URLs but flag particular IP addresses or require the addition of an extra parameter to the URL to make it show a different page to the person you are trying to communicate with, e.g. a password. This isn't really Steganography as such, but could be used to transfer URLs that can be checked by someone else and don't reveal the true target.
The final method I'm going to discuss here is the use of a Stego Profile Image. All social media networks allow you to upload and display a small image on your page. Why not use traditional Steganographic techniques to embed data within this image. If you change your image regularly then it won't look suspicious when you change it to transfer data to someone. There are tools on the Internet to do this for you by replacing the Least Significant Bit (LSB) of every pixel with one bit of your data. This is a simple scheme and easy to detect. There are other much better schemes that are not only harder to detect, but that will give you more 'space' within the image to store your data. To give you some idea, a 4-colour, 73x73 pixel GIF like Twitter's default images can store nearly 4KB of data with no visual impact. However, that's for another blog post...
Anyway, I thought I'd look at techniques to embed data within Twitter as it is popular now and people are starting to monitor it. Hiding within a crowd, however, is a good technique as it takes quite a lot of resources to monitor all activity on a service like Twitter. The techniques described here would work equally well on other social networks, such as LinkedIn, Facebook, etc. How do we embed data within a medium that allows only 140 plaintext characters though? Well, there are several methods, a few of which I'll talk about here. I'm only going to discuss methods that would be quite simple to detect if you knew what you were looking at, but that will go undetected by the majority of people.
The first method is to use a special grammar within your Tweet. If the person you are communicating with knows the grammar then you can alter a message to pass data back and forth. A simple example of this technique would be to choose 2, 4 or 8 words that mean the same thing, but each one represents a value. For example, you could use fast, speedy, quick and rapid to represent 0, 1, 2 and 3 respectively, effictively giving you 2-bits of embedded data. If we had 8 words then we would have 3-bits and so on. This can be extended to word order in the sentence and even the number of words per sentence. However, messages can be difficult to construct in such a way as to be readable and this is not a high data rate. We could probably get only one or two bytes worth of data in an update message.
Another method is suggested by Adrian Crenshaw. He used unicode characters, giving access to two versions of the charcterset. So the lower range represented 0's and the upper range of characters represented 1's. This is a good scheme, as you then transfer as many bits as there are characters in your message. This gives a maximum of 140 bits. The issue with his scheme is that on some devices and Twitter clients the two character sets look quite different and it is definitely detectable. However, a good idea nonetheless.
Following on from this, we can encode bits within the message, so that they aren't seen by the user, by appending whitespace to the end of the message. Whitespaces are things like a space or a tab, i.e. a place where a letter isn't. A simple method to embed your data is to represent a 0 by a space and a 1 by a tab. The good thing is that web browsers will display multiple whitespaces as only a single space, so this will be invisible within a browser. Other clients will print them out, but there's nothing to see. Now, Twitter, and most social media clients, will strip whitespace from the end of your message as they assume that you added them by accident. This will destroy your data. However, if you add the HTML code to the end of your message then it will keep all the whitespace (indeed, you could put any character at the end, but you may see multiple spaces in some clients). The advantage of using the is that it is a whitespace character and won't be displayed in your message. Now, you will need to write a short message and add the non-breaking space at the end, so you won't have that much space, but you should be able to get up to nearly 16 ASCII characters in this way, but certainly over 100 bits if you keep your message short.
We can also be quite blatant with our data. We can rely on the fact that people won't know we're transferring data and won't look very hard. A simple URL shortening service can be exploited in two ways to embed data. The simplest method is to make up a URL. Twitter users rely on http://bit.ly and http://twitpic.com extensively. If we base-64 encode our text or data, then we can add 6 bytes (or characters) to a URL. For example, I could tweet: "Just read this http://bit.ly/UkxSIFVL and saw the photo http://twitpic.com/IEx0ZC4=". Now, these URLs are fake and don't lead anywhere. However, the base-64 encoded text of the two URLs decodes to "RLR UK Ltd." and how many people will follow your link anyway. Even if they do, the two sites here will just put up a helpful message that there was an error with the URL. You can now appologise and provide two real URLs. Meanwhile the message has got across. Obviously more URLs mean more data - up to 36 bytes if you just send 6 URLs.
The second method of using a URL shortening service is to write your own. Now you can provide real URLs but flag particular IP addresses or require the addition of an extra parameter to the URL to make it show a different page to the person you are trying to communicate with, e.g. a password. This isn't really Steganography as such, but could be used to transfer URLs that can be checked by someone else and don't reveal the true target.
The final method I'm going to discuss here is the use of a Stego Profile Image. All social media networks allow you to upload and display a small image on your page. Why not use traditional Steganographic techniques to embed data within this image. If you change your image regularly then it won't look suspicious when you change it to transfer data to someone. There are tools on the Internet to do this for you by replacing the Least Significant Bit (LSB) of every pixel with one bit of your data. This is a simple scheme and easy to detect. There are other much better schemes that are not only harder to detect, but that will give you more 'space' within the image to store your data. To give you some idea, a 4-colour, 73x73 pixel GIF like Twitter's default images can store nearly 4KB of data with no visual impact. However, that's for another blog post...
If you are interested in a "very large" data-hiding capacity steganography, why not visit the following Web site.
ReplyDeletehttp://datahide.org/BPCSe/
From its link you can download a fantastic steganography program for Windows without any charge.