8.10.05
Success with Spammy Check
After a few days of really testing my new referer spam fighting tool, I can say it has worked flawlessly so far. Everything being spammed to my Shameless Plug page is getting rejected, and legitimate plugs are getting through. What's more, similar spam plugs are helping to reject each other. So, for others who may be looking to try something similar, here's how it works.
The key is how spammers use referer. Typical referer spam will load 30 or more links that all point to one specific site. For instance, <a href="http://www.stupid-spammer.com">www.stupid-spammer.com</a> will be added to the "URL" input on a comment section, but the "description" section will have additional links to www.stupid-spammer.com/cool.html, www.stupid-spammer.com/checkout.html, www.stupid-spammer.com/online.html, etc.
What Spammy Check does is compare the actual URL to what's in the description after it's added to a "spam" table. All plug submissions are treated as "guilty until proven innocent", so every plug goes through this filter. To get as clean a comparison as possible, I use a combination of regular expressions and substr() to strip all links (in both the URL and the description) down as bare as possible. <a href="http://www.stupid-spammer.com">www.stupid-spammer.com</a> will become "stupid spammer".
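For anyone who wants to try the stripping step, here is a rough Python sketch of the idea. It is not the code running on this site (that uses regular expressions and substr()); the function name bare_words and the URL_NOISE list are made up for illustration, and the real filter may drop a different set of pieces.

```python
import re

# Hypothetical list of fragments to throw away. The post does not say
# exactly which pieces get stripped, so this set is an assumption.
URL_NOISE = {"a", "href", "http", "https", "www",
             "com", "net", "org", "html", "htm", "php"}

def bare_words(text):
    """Reduce a URL or an HTML link to its bare lowercase keywords."""
    # Split on anything that is not a letter: dots, slashes, hyphens, quotes.
    words = re.findall(r"[a-z]+", text.lower())
    # Drop the noise fragments, then de-duplicate while keeping order.
    kept = [w for w in words if w not in URL_NOISE]
    return list(dict.fromkeys(kept))

link = '<a href="http://www.stupid-spammer.com">www.stupid-spammer.com</a>'
print(bare_words(link))
```

Running it on the example link leaves only the two words that matter for the comparison, "stupid" and "spammer".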
Next, I do a Boolean search of the "spam" table's URLs and descriptions to see if any words from the URL match anything in the table. Since MySQL leaves noise words out of the search, this query will only run against "stupid" and "spammer". The results of that search become a score. If the score exceeds a predetermined threshold, the plug/comment is blocked. If a plug passes the test, it is deleted from the "spam" table and added to the "plug" table.
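The flow can be sketched in Python with plain lists standing in for the two tables. This is only an approximation: the real thing is a MySQL Boolean search, and here the score is simply a count of how often the URL's words turn up in the stored descriptions. The names review_plug and count_hits and the THRESHOLD value are assumptions for the example, not the actual settings.

```python
import re

THRESHOLD = 10  # hypothetical cutoff; the real value is not given in the post

def count_hits(url_words, text):
    """Count the words in text that are one of the URL's keywords."""
    wanted = set(url_words)
    return sum(1 for w in re.findall(r"[a-z]+", text.lower()) if w in wanted)

def review_plug(url_words, description, spam_table, plug_table):
    """Return True if the plug is accepted, False if it is blocked."""
    row = {"words": list(url_words), "description": description}
    spam_table.append(row)  # guilty until proven innocent
    # Stand-in for the Boolean search: total matches across the spam table.
    score = sum(count_hits(url_words, r["description"]) for r in spam_table)
    if score > THRESHOLD:
        return False        # blocked; the row stays in the spam table
    spam_table.pop()        # cleared: take the row back out of "spam"...
    plug_table.append(row)  # ...and store it in "plug"
    return True

spam, plugs = [], []
links = " ".join(f"www.stupid-spammer.com/page{i}.html" for i in range(30))
print(review_plug(["stupid", "spammer"], links, spam, plugs))
print(review_plug(["garden", "blog"], "my garden blog about tomatoes", spam, plugs))
```

Because blocked rows stay in the spam list, each rejected plug adds words that later submissions get scored against.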
What's more, the "spam" table has now learned to look for "stupid", "spammer" and whatever other words were added to the description along with new URLs being posted. That means "www.hotel-spammer.com" and "www.stupid-blockers.com" will be caught as well. Since both of those would likely be spammers too, they will come with a huge list of related sites, which will further block "hotel" and "blockers".
Legitimate traffic will typically pass through because spammers use certain words that identify them as such, words that normal conversation wouldn't include. Hopefully this brief example will help others block idiot referer plugs.
9:42 am in