One of the more troublesome things on the web is the sheer number of poorly behaved bots out there. We've been visible enough to attract the attention of a fair number of spambots over the years, which try to publish writeups or just put junk on their homenodes in an attempt to sell you all on herbal supplements or whatever it is they are peddling. This has been a nuisance for a few years, and we face a unique challenge in fighting spam because we are an open source application: a spammer who is dedicated enough can simply read the source code and understand our form structure.
The previous attempts at stopping this at sign-up were form obfuscation techniques and a checkbox that forced someone to read its label and confirm that they weren't an "Evil Robot Spammer". These efforts weren't effective, and we saw a lot of account creation from certain domains, with very well-defined patterns. To that end, I've implemented the following anti-spam measures in the system, and we're already seeing them bear fruit:
- reCAPTCHA: This is a Google-provided service which looks at various user behavior patterns and estimates whether you are a bot. We are recording the outcome of the spam checks so we can figure out if the thresholds are wrong. This has actually been silently gathering metrics for about three weeks and appears to work without issue. The difference between v2 and v3 of the product is that v3 is a lot quieter. You are surely familiar with having to pick school buses or stop lights out of photo grids to fuel the rise of Skynet's driving ability; v3 is the quieter version that instead looks at a number of secret-sauce factors.
- Eliminated the "Evil Robot Spammer" checkbox: The numbers I have been gathering show that this was actually frustrating legitimate user signups. Now that we no longer need low-rent Turing tests to deter the worst of the bots, removing it is a clear win.
- Spam domain blacklists: Also in the toolkit is the ability for our administrators to ban certain email domains from signing up entirely. We want to be careful with this, so there's a safety net feature which prevents certain good email domains from inadvertently getting added should we automate it in the future. This has been effective in knocking these folks out pretty hard and forcing them to sign up from a new domain. All spamming and scamming is an economic equation, so if we make it too expensive to try and do illegitimate business here, these bots will go elsewhere.
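For the curious, here is a rough sketch of how a score threshold, a domain blacklist with a protected-domain safety net, and outcome logging could fit together at sign-up. This is not the code running on the site. The names (DomainBlocklist, check_signup, PROTECTED_DOMAINS), the 0.5 threshold, and the example domains are all made up for illustration. The only outside detail assumed is that a reCAPTCHA v3 verification result carries a "success" flag and a "score" between 0.0 and 1.0, which is passed in here as an already-parsed dict.

```python
from dataclasses import dataclass, field

# Hypothetical values for illustration only; the site's real threshold
# and domain lists are not stated in this post.
SCORE_THRESHOLD = 0.5
PROTECTED_DOMAINS = frozenset({"gmail.com", "outlook.com"})


@dataclass
class DomainBlocklist:
    """Set of banned email domains with a safety net for known-good ones."""

    blocked: set = field(default_factory=set)

    def ban(self, domain: str) -> bool:
        """Ban a domain; refuse (return False) if it is on the protected list."""
        domain = domain.strip().lower()
        if domain in PROTECTED_DOMAINS:
            return False
        self.blocked.add(domain)
        return True

    def is_blocked(self, email: str) -> bool:
        """True if the part of the address after the last '@' is banned."""
        domain = email.rpartition("@")[2].strip().lower()
        return domain in self.blocked


def check_signup(email: str, recaptcha_result: dict,
                 blocklist: DomainBlocklist, audit_log: list) -> str:
    """Decide on a sign-up and record the outcome for later threshold tuning."""
    # Treat a failed verification as the lowest possible score.
    if recaptcha_result.get("success"):
        score = float(recaptcha_result.get("score", 0.0))
    else:
        score = 0.0

    if blocklist.is_blocked(email):
        decision = "rejected_domain"
    elif score < SCORE_THRESHOLD:
        decision = "rejected_score"
    else:
        decision = "accepted"

    # Every outcome is logged, accepted or not, so thresholds can be reviewed.
    audit_log.append({"score": score, "decision": decision})
    return decision
```

Logging every decision alongside its score is what makes it possible to tell later whether the threshold is too strict or too lenient, and checking the protected list inside ban() means an automated process cannot lock out a major email provider by mistake.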
Finally, there will eventually be a spam account scrubbing feature, but that is going to require me to go column by column through the data schema and clean a lot of it out so that we can safely delete accounts. There are a couple more features ahead of it in line, but having a "bad data" removal tool that runs in an automated fashion is going to be key to keeping this place tidy and not breaking. I have another update coming in the next week which will lay out the roadmap and show more about what is next there.
Thank you to everyone in the community for helping to point out and fight spam by hand over the years; I am optimistic this automated solution is going to eliminate or greatly reduce a lot of the nonsense going forward.
--Jay