Stop spam, read books

Like everyone else, I get a lot of spam. Google Mail generally does a good job of filtering it out, but even so two or three spam messages get into my inbox each day, and on bad days I'll find 200 emails sat waiting for me. I also keep getting grief about the amount of spam coming through websites I help manage, so when I heard about reCAPTCHA - a system that's designed to reduce website spam and help digitise books at the same time - I was interested.

Since most spam is automated - spammers send millions of emails at the same time - a good strategy for stopping it is to ask whoever is sending a message to prove that they are not a computer. CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) are designed to do exactly this. You'll have seen CAPTCHAs all over the place: they're the warped, sometimes colourful text at the bottom of the page that you need to read and type in before you can sign up to the latest and greatest website or post a comment on your favourite blog. The idea is that the website shows you a word which is designed to be hard for a computer to read. If you can read the word and type its letters into the box accurately, you're more likely to be a human than a computer. Although spammers can occasionally beat CAPTCHAs (e.g. if the word is not warped or disguised in some way, a computer can use Optical Character Recognition (OCR) to decode it), they're generally pretty effective at stopping spam.
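To make the mechanism concrete, here is a minimal Python sketch of the server side of a home-made CAPTCHA. It assumes the site picks a word, remembers it under a one-time token and compares that word with what the visitor types. The word list, the `CaptchaStore` class and its methods are hypothetical. The hard part, drawing the word as warped text that OCR can't read, is left out.

```python
import secrets

# Hypothetical word list. A real CAPTCHA would render the chosen word as a
# warped image rather than handing the plain text to the browser.
WORDS = ["harbour", "lantern", "meadow", "pillar"]


class CaptchaStore:
    """Minimal sketch: remembers the expected answer for each issued challenge."""

    def __init__(self, words=WORDS):
        self._words = list(words)
        self._pending = {}  # one-time token -> expected word

    def issue(self):
        """Pick a word and return (token, word). The word is what would be drawn
        as a distorted image; the token would go in a hidden form field."""
        token = secrets.token_urlsafe(16)
        word = secrets.choice(self._words)
        self._pending[token] = word
        return token, word

    def verify(self, token, answer):
        """Check a submitted answer, ignoring case and surrounding spaces.
        Each token is removed on first use, so a bot can't keep retrying it."""
        expected = self._pending.pop(token, None)
        if expected is None:
            return False
        return answer.strip().lower() == expected.lower()
```

Because `verify` discards the token whether or not the answer was right, a second guess at the same challenge always fails. A spammer's script therefore gets one attempt per image it manages to fetch.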
reCAPTCHA is a CAPTCHA system designed by the clever people at Carnegie Mellon University. Their idea is to use CAPTCHAs not only to stop spam, but also to do some good for the world:

About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books.
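The figures in that quote can be checked quickly. Using only the numbers quoted above, 60 million CAPTCHAs at about ten seconds each comes to roughly 167,000 hours a day, which is consistent with "more than 150,000 hours":

```python
captchas_per_day = 60_000_000  # figure from the quote
seconds_each = 10              # "roughly ten seconds" per CAPTCHA

total_seconds = captchas_per_day * seconds_each
total_hours = total_seconds / 3600
print(f"{total_hours:,.0f} hours per day")  # prints "166,667 hours per day"
```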
There are lots of projects (e.g. Google Print and Internet Archive) that are attempting to digitise out-of-copyright books to protect them and make them more widely available. Since computers can't read scanned pages without making mistakes (which is why CAPTCHAs work), the university boffins hit upon the idea of using humans to help the computers. Words that the computer cannot recognise on its own are piped into the reCAPTCHA system, and humans are asked to help identify them. By pairing each 'unknown' word with a known test word, reCAPTCHA can serve two purposes at once. The known word checks that you're human, which stops spam, and your answer for the unknown word helps 'read' the book. Because some words are hard to make out due to poor-quality printing, several humans are asked to identify the same 'unknown' word to make sure it is identified correctly.
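The two-word trick can be modelled in a few lines. This is a toy Python sketch of the idea as described above, not reCAPTCHA's actual code. The class name, the word ids and the agreement threshold (`votes_needed`) are all hypothetical. Only the known control word decides whether you pass. Your answer for the unknown word is counted as one vote, and a reading is accepted once enough people agree on it.

```python
import random
from collections import Counter, defaultdict


class BookReader:
    """Toy model: one control word with a known answer, paired with one word
    the OCR software couldn't read."""

    def __init__(self, known_words, unknown_word_ids, votes_needed=3):
        self.known = dict(known_words)         # word id -> correct text
        self.unknown = list(unknown_word_ids)  # word ids with no agreed reading yet
        self.votes = defaultdict(Counter)      # word id -> Counter of human guesses
        self.votes_needed = votes_needed       # hypothetical agreement threshold

    def make_challenge(self, rng=random):
        """Pick a (control, unknown) pair of word ids. A real page would show
        the two in random order so the visitor can't tell which is the test;
        that rendering step isn't modelled here."""
        control = rng.choice(list(self.known))
        unknown = rng.choice(self.unknown)
        return control, unknown

    def submit(self, control, unknown, control_answer, unknown_answer):
        """Pass or fail on the control word alone. Only visitors who pass
        get their reading of the unknown word counted."""
        if control_answer.strip().lower() != self.known[control].lower():
            return False
        self.votes[unknown][unknown_answer.strip().lower()] += 1
        return True

    def transcription(self, word_id):
        """Return the most popular reading once enough people agree, else None."""
        counts = self.votes[word_id]
        if not counts:
            return None
        text, n = counts.most_common(1)[0]
        return text if n >= self.votes_needed else None
```

Asking for agreement between several people, rather than trusting one answer, is what copes with the badly printed words. A single visitor's typo or guess can't become the book's text on its own.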

reCAPTCHA is designed to be easy to add to your own website or application. It's available to developers in many languages (Perl, PHP, Ruby, Java, etc.) and also has plugins for some popular blogging tools. It certainly is easy to use: today I used the Perl module Captcha::reCAPTCHA to add a CAPTCHA to our website's contact form. It took about 20 minutes including testing, and it works really well.
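The reCAPTCHA API that Captcha::reCAPTCHA used in 2007 has since been retired. If you're doing the same thing today, Google's current version is checked by POSTing to its `siteverify` endpoint. Below is a sketch of that server-side check using Python's standard library rather than Perl. The function names are hypothetical. You would call `verify_captcha` from your form handler with your secret key and the token that the widget adds to the form, which the v2 checkbox widget puts in the `g-recaptcha-response` field.

```python
import json
import urllib.parse
import urllib.request

# Google's current server-side verification endpoint.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_body(secret_key, response_token, remote_ip=None):
    """Form-encode the POST body: your secret key, the visitor's token and,
    optionally, the visitor's IP address."""
    params = {"secret": secret_key, "response": response_token}
    if remote_ip:
        params["remoteip"] = remote_ip
    return urllib.parse.urlencode(params).encode("ascii")


def parse_verify_reply(raw_json):
    """Google replies with JSON; return (passed, error_codes)."""
    reply = json.loads(raw_json)
    return bool(reply.get("success")), reply.get("error-codes", [])


def verify_captcha(secret_key, response_token, remote_ip=None, timeout=10):
    """POST the token to Google and report whether the CAPTCHA was solved."""
    body = build_verify_body(secret_key, response_token, remote_ip)
    with urllib.request.urlopen(VERIFY_URL, data=body, timeout=timeout) as resp:
        return parse_verify_reply(resp.read().decode("utf-8"))
```

The request building and reply parsing are split out from the network call so they can be tested without contacting Google.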

Movable Type 4.0 has reCAPTCHA built in, but just you try getting it to work without Josh Carter's excellent guide. Make sure you edit the plugin template as described by Josh: if you can't see your saved public and private reCAPTCHA keys, it probably won't work. Josh has also written a reCAPTCHA plugin for earlier versions of Movable Type.

Why not read a few more books right now, by commenting below?

Posted by Nik on November 11, 2007 at 10:51 AM.