
Twitter Steganography

I have recently been thinking about Steganography again, along with various carriers and applications. For those of you who don't know what Steganography is, it simply means 'hidden writing', from the Greek. Some examples of steganography are: tattooing the scalps of messengers and then waiting for their hair to grow back; writing a message on the wood of a wax tablet before pouring the wax in; 'invisible inks'; pin pricks above characters in a cover letter; etc. Basically, we have a 'cover', which could be an image, passage of text, etc., that we are happy for anyone to see, and a message that we want to hide within it so that it is undetectable. It turns out that this last part is quite hard.

Anyway, I thought I'd look at techniques to embed data within Twitter as it is popular now and people are starting to monitor it. Hiding within a crowd, however, is a good technique as it takes quite a lot of resources to monitor all activity on a service like Twitter. The techniques described here would work equally well on other social networks, such as LinkedIn, Facebook, etc. How do we embed data within a medium that allows only 140 plaintext characters though? Well, there are several methods, a few of which I'll talk about here. I'm only going to discuss methods that would be quite simple to detect if you knew what you were looking at, but that will go undetected by the majority of people.

The first method is to use a special grammar within your Tweet. If the person you are communicating with knows the grammar then you can alter a message to pass data back and forth. A simple example of this technique would be to choose 2, 4 or 8 words that mean the same thing, with each one representing a value. For example, you could use fast, speedy, quick and rapid to represent 0, 1, 2 and 3 respectively, effectively giving you 2 bits of embedded data. If we had 8 words then we would have 3 bits and so on. This can be extended to word order in the sentence and even the number of words per sentence. However, messages can be difficult to construct in such a way as to be readable, and this is not a high data rate. We could probably get only one or two bytes' worth of data in an update message.
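A minimal sketch of the synonym scheme, using the four words from the example above. The function names and the cover sentence are made up for illustration, and both parties are assumed to share the same word list.

```python
# Shared codebook: each synonym stands for a 2-bit value (0-3), as in the text.
SYNONYMS = ["fast", "speedy", "quick", "rapid"]


def embed(template: str, value: int) -> str:
    """Fill the {} slot in the cover sentence with the synonym for `value`."""
    return template.format(SYNONYMS[value])


def extract(tweet: str) -> int:
    """Return the value of the first codebook word found in the tweet."""
    for word in tweet.lower().split():
        word = word.strip(".,!?\"'")  # ignore surrounding punctuation
        if word in SYNONYMS:
            return SYNONYMS.index(word)
    raise ValueError("no codebook word found")


tweet = embed("That was a {} journey home.", 2)
print(tweet)           # That was a quick journey home.
print(extract(tweet))  # 2
```

One slot carries only 2 bits, so a byte needs four such slots in the cover text, which is why the data rate is so low.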

Another method is suggested by Adrian Crenshaw. He used Unicode characters, giving access to two versions of the character set, so that the lower range represented 0s and the upper range of characters represented 1s. This is a good scheme, as you then transfer as many bits as there are characters in your message. This gives a maximum of 140 bits. The issue with his scheme is that on some devices and Twitter clients the two character sets look quite different and it is definitely detectable. However, a good idea nonetheless.
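A rough sketch of the two-character-set idea. It assumes the 'upper' set is the Unicode fullwidth forms (U+FF01 to U+FF5E), which mirror printable ASCII at a fixed offset; that choice is an assumption made for illustration and may not be the ranges Crenshaw actually used. In this sketch spaces carry no bit, so the capacity is slightly under one bit per character.

```python
FULLWIDTH_OFFSET = 0xFEE0  # '!' (U+0021) maps to the fullwidth '!' (U+FF01)


def is_carrier(ch: str) -> bool:
    """Printable ASCII other than space has a fullwidth twin."""
    return "!" <= ch <= "~"


def embed(cover: str, bits: str) -> str:
    """Swap a character for its fullwidth twin wherever the bit is '1'."""
    if len(bits) > sum(is_carrier(ch) for ch in cover):
        raise ValueError("cover text is too short for these bits")
    remaining = iter(bits)
    out = []
    for ch in cover:
        if is_carrier(ch) and next(remaining, "0") == "1":
            ch = chr(ord(ch) + FULLWIDTH_OFFSET)
        out.append(ch)
    return "".join(out)


def extract(stego: str) -> str:
    """Read one bit per carrier: ASCII gives '0', fullwidth gives '1'."""
    bits = []
    for ch in stego:
        if is_carrier(ch):
            bits.append("0")
        elif "\uff01" <= ch <= "\uff5e":
            bits.append("1")
    return "".join(bits)


stego = embed("Hello world", "1011")
print(extract(stego))  # 1011000000
```

The receiver gets trailing zeros for the unused carriers, so the two parties would need to agree on a message length or terminator.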

Following on from this, we can encode bits within the message, so that they aren't seen by the user, by appending whitespace to the end of the message. Whitespaces are things like a space or a tab, i.e. a place where a letter isn't. A simple method to embed your data is to represent a 0 by a space and a 1 by a tab. The good thing is that web browsers will display multiple whitespaces as only a single space, so this will be invisible within a browser. Other clients will print them out, but there's nothing to see. Now, Twitter, and most social media clients, will strip whitespace from the end of your message as they assume that you added it by accident. This will destroy your data. However, if you add the &nbsp; HTML code to the end of your message then it will keep all the whitespace (indeed, you could put any character at the end, but you may see multiple spaces in some clients). The advantage of using the &nbsp; is that it is a whitespace character and won't be displayed in your message. Now, you will need to write a short message and add the non-breaking space at the end, so you won't have that much space, but you should be able to get nearly 16 ASCII characters in this way, and certainly over 100 bits, if you keep your message short.
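The space/tab encoding can be sketched as follows. This is an illustration only: it uses the literal non-breaking space character (U+00A0) as the guard rather than the HTML entity, assumes the cover text does not itself end in a space or tab, and does not model what any particular client does to the text.

```python
NBSP = "\u00a0"     # guard character that stops the trailing run being stripped
TWEET_LIMIT = 140   # character limit mentioned in the post


def embed(cover: str, secret: bytes) -> str:
    """Append one space per 0 bit and one tab per 1 bit, then the guard."""
    bits = "".join(f"{byte:08b}" for byte in secret)
    tweet = cover + bits.replace("0", " ").replace("1", "\t") + NBSP
    if len(tweet) > TWEET_LIMIT:
        raise ValueError("cover plus hidden data exceeds the tweet limit")
    return tweet


def extract(tweet: str) -> bytes:
    """Recover the bytes from the whitespace run just before the guard."""
    if not tweet.endswith(NBSP):
        raise ValueError("guard character missing")
    body = tweet[:-1]
    trailer = body[len(body.rstrip(" \t")):]
    bits = trailer.replace(" ", "0").replace("\t", "1")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


tweet = embed("Nice day today", b"hi")
print(len(tweet))      # 31
print(extract(tweet))  # b'hi'
```

Each hidden byte costs eight characters of the limit, so a 16-byte secret leaves 140 - 128 - 1 = 11 characters for the visible message.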

We can also be quite blatant with our data. We can rely on the fact that people won't know we're transferring data and won't look very hard. A simple URL shortening service can be exploited in two ways to embed data. The simplest method is to make up a URL. Twitter users rely on URL shortening services extensively. If we base-64 encode our text or data, then we can add 6 bytes (or characters) to a URL. For example, I could tweet: "Just read this and saw the photo". Now, these URLs are fake and don't lead anywhere. However, the base-64 encoded text of the two URLs decodes to "RLR UK Ltd.", and how many people will follow your link anyway? Even if they do, the two sites here will just put up a helpful message that there was an error with the URL. You can now apologise and provide two real URLs. Meanwhile the message has got across. Obviously more URLs mean more data: up to 36 bytes if you just send 6 URLs.
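A minimal sketch of packing data into made-up short links. The domain short.example and the six-character path length are placeholders chosen for illustration, not any real service's format, and the URL-safe base-64 alphabet is used so the codes look like ordinary short links.

```python
import base64

SHORTENER = "http://short.example/"  # hypothetical shortening service
CHUNK = 6                            # assumed characters per fake short code


def to_urls(secret: bytes) -> list:
    """Base-64 encode the data and split it across fake short links."""
    token = base64.urlsafe_b64encode(secret).decode("ascii").rstrip("=")
    return [SHORTENER + token[i:i + CHUNK] for i in range(0, len(token), CHUNK)]


def from_urls(urls: list) -> bytes:
    """Join the short codes, restore the padding, and decode."""
    token = "".join(url.rsplit("/", 1)[1] for url in urls)
    token += "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token)


urls = to_urls(b"RLR UK Ltd.")
print(from_urls(urls))  # b'RLR UK Ltd.'
```

Six base-64 characters carry 36 bits, so with these settings the 11-byte message above spreads over three links; a longer short code would fit it in fewer.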

The second method of using a URL shortening service is to write your own. Now you can provide real URLs, but flag particular IP addresses or require the addition of an extra parameter to the URL, e.g. a password, to make it show a different page to the person you are trying to communicate with. This isn't really Steganography as such, but could be used to transfer URLs that can be checked by someone else and don't reveal the true target.
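The routing decision inside such a home-made shortener might look like the sketch below. Everything here is hypothetical: the short code, the target URLs, the key, and the address (taken from a documentation range) are invented, and a real service would wrap this function in a web server.

```python
import hmac

# Hypothetical link table: one short code, two possible destinations.
LINKS = {
    "a1b2c3": {
        "public": "http://example.com/news",
        "private": "http://example.com/drop",
    },
}
SECRET_KEY = "opensesame"       # hypothetical extra URL parameter value
TRUSTED_IPS = {"203.0.113.7"}   # hypothetical flagged address


def resolve(code: str, client_ip: str, key: str = None) -> str:
    """Pick the redirect target for a visitor to the short link."""
    entry = LINKS[code]
    key_ok = key is not None and hmac.compare_digest(
        key.encode(), SECRET_KEY.encode()
    )
    if client_ip in TRUSTED_IPS or key_ok:
        return entry["private"]
    return entry["public"]


print(resolve("a1b2c3", "198.51.100.1"))                    # http://example.com/news
print(resolve("a1b2c3", "198.51.100.1", key="opensesame"))  # http://example.com/drop
```

Anyone checking the link without the key or the flagged address only ever sees the public page.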

The final method I'm going to discuss here is the use of a Stego Profile Image. All social media networks allow you to upload and display a small image on your page. Why not use traditional Steganographic techniques to embed data within this image? If you change your image regularly then it won't look suspicious when you change it to transfer data to someone. There are tools on the Internet to do this for you by replacing the Least Significant Bit (LSB) of every pixel with one bit of your data. This is a simple scheme and easy to detect. There are other much better schemes that are not only harder to detect, but that will give you more 'space' within the image to store your data. To give you some idea, a 4-colour, 73x73 pixel GIF like Twitter's default images can store nearly 4KB of data with no visual impact. However, that's for another blog post...
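A minimal sketch of the simple LSB scheme. It works on a flat list of 8-bit sample values rather than a real image file, which is an assumption made to keep the example self-contained; a palette-based GIF would need different handling, since its pixel values are palette indices. At one bit per pixel, a 73x73 image holds 5,329 bits, or about 666 bytes.

```python
def embed(pixels: list, secret: bytes) -> list:
    """Overwrite the least significant bit of each sample with one data bit."""
    bits = [(byte >> shift) & 1 for byte in secret for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image is too small for this payload")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract(pixels: list, n_bytes: int) -> bytes:
    """Rebuild n_bytes from the least significant bits, most significant bit first."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for sample in pixels[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)


cover = [200] * (73 * 73)      # stand-in for a 73x73 greyscale image
stego = embed(cover, b"hi")
print(extract(stego, 2))       # b'hi'
```

No sample changes by more than 1, which is why the result is visually identical, but the altered bit statistics are also what makes this scheme easy to detect.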



