Click on the title for all pages that deal with WikiSpam (cf. WhatIsSpam). WikiSpam is a wikiwide problem and won't be solved but wikiwide.

Wikis are characterized by their UniversalAccess and UniversalEditing. Google computes a page's PageRank from the links pointing to it, weighted by the PageRank of the pages doing the linking. Wikis are PageRank machines: they are massively interlinked and often contain hundreds or thousands of pages. These two factors - openness and PageRank - make wikis the ideal target for spam attacks.

CategoryWikiConventions CategoryWikiTechnology CategoryDifficultPerson CategoryCategory

----

== # Introduction ==

Spammers are looking for higher Google results, not for people to follow their links. The most useful wikis to spam are not even the most popular wikis, but rather the abandoned wikis that aren't being watched. While early spammers hit sites with incredibly high PageRank, it is now possible to use Google (ironically) to find, and a robot (automated script) to write links on, millions of smaller single-user wikis whose owners will neither notice nor revert the vandalism. Millions of links from smaller sites are worth more than a few links from larger sites.

WikiSpam often appears as explicit links (e.g. http://example.com); as [bracketed] links, which can be difficult to detect when a spammer replaces the URL in an existing [bracketed] link with the spam link; and as hard-to-see periods linked to the spam site.

== # Possible Solutions ==

Don't panic. Remember that TheCollective ultimately controls everything, right down to access, on an OnlineCommunity. The only problems are the negotiations. We could FishBowl the community, we could leave it wide open, or we could do something in between, but there is some solution to the problem. (See MotivationEnergyAndCommunity for more about the classification scheme used below.)

=== # CommunitySolution ===

Reverting spammed pages by hand is often the simplest and most effective solution. It's not necessarily a bad thing to let your users continue to revert spam, as it gives them some sense of collective defense of the OnlineCommunity, which will increase their sense of responsibility and attachment. Reverting spam is something everyone can do, even the most squeamish of editors.

This fails when the energy of the community is matched or surpassed by the energy of the spammers. In the face of 'bots, and with the realization that your site may later become a GhostTown, other approaches are needed.

----

== # Background (theory & practice) ==

'''Basic definition.''' When we speak of spam, we usually refer to one of two types: SemanticSpam that encourages us to buy something, and LinkSpam that takes advantage of Google's PageRank algorithm. As we primarily fight LinkSpam on web-based SocialSoftware like wikis, we will primarily talk about LinkSpam here, although some techniques will apply against SemanticSpam as well. LinkSpam is the more common type, since it is more profitable to rise higher in the Google rankings and reach thousands of potential readers than it is to reach the dozens of readers on a wiki.

'''Methods.''' For the most part, LinkSpam is done ''manually'', as labour is cheap in the spammier parts of the world. The most sophisticated spammers use robots, often custom tailored to their target, but these people are few since the cost/benefit ratio is high. Most spammers will use an OpenProxy or an AnonymousProxy to avoid HardBan""s of their IP addresses. Some have been known to use ZombieMachine""s, exploiting a security flaw in Windows Remote Desktop.
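
Since everything above turns on PageRank, a rough sketch of the simplified calculation may help make the economics concrete. This is a toy power iteration over an invented link graph, not Google's actual algorithm; the page names and damping factor are purely illustrative.

 # Toy PageRank: each page repeatedly hands a share of its rank to the
 # pages it links to.  Illustrative only; Google's real ranking is far
 # more involved.
 links = {                                   # invented link graph
     "BigSite":    ["Target"],
     "SmallWikiA": ["Target", "SmallWikiB"],
     "SmallWikiB": ["Target", "SmallWikiA"],
     "Target":     [],
 }
 damping = 0.85
 rank = {page: 1.0 / len(links) for page in links}
 for _ in range(50):                         # iterate until roughly stable
     new_rank = {page: (1 - damping) / len(links) for page in links}
     for page, outgoing in links.items():
         targets = outgoing or list(links)   # dangling pages spread rank evenly
         for other in targets:
             new_rank[other] += damping * rank[page] / len(targets)
     rank = new_rank
 print(sorted(rank.items(), key=lambda kv: -kv[1]))
 # "Target" comes out on top: many links from many small pages add up,
 # which is exactly what a LinkSpam campaign exploits.
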
'''Dimensions of analysis.''' Spam and our responses must be analyzed in terms of MotivationEnergyAndCommunity. Spam is primarily motivated by economic factors, whereas community is primarily motivated by less tangible, soft, emotional factors. Solutions pit the energy spammers are willing to expend against the energy produced by community goodwill. Mitigating factors often boil down to TechnologySolution""s. In the HardSecurity manner, we can build better shields and better weapons; in the SoftSecurity manner, we develop better abilities to dodge, absorb, and deflect spam.

'''Communication.''' Because spam is not an attempt at communication, attempting to communicate with a spammer using words or ideas will fail. Spammers are merely interested in the ''act'' of posting links. Consequently, the only way information will transfer from us to a spammer is through ''actions''. Think of it this way: at the point where a conflict has degraded to a fist fight, words are often useless. You must first create physical distance. The same goes for spammers, except they are not in a temporary foul mood but in ''business'', which means they will not go away unless the cost to them greatly increases or the benefit disappears.

'''Essential problem.''' More traditional methods of increasing their costs, like jail, are impractical short of banning most of the world from using the Internet (which ''is'' already happening). This mostly increases costs on their neighbours, who will hopefully take local action to control the problem. However, since China has a Great Firewall strategy, this is doubtful. Internet-centric ways include: downgrading their rankings in the SearchEngine""s; increasing their labour cost to the point where they find cheaper ways of exploiting PageRank; and developing more efficient SearchEngineOptimization methods that do not depend on harassing others on the Internet. (Search relevance is really Google's problem.)

----

== # Solutions ==

=== # Content filtering ===

Because you control the underlying WikiEngine, you can control what content is posted. A theoretically ideal ContentFilter blocks all 'bad' content whilst leaving 'good' content unfettered, but this is impossible, as the range of possible content (good or bad) is both infinite and undecidable; you need people to make decisions. Therefore, content filtering becomes a game to identify new 'bad' content as quickly as possible, as well as finding simple patterns that can be scalably exploited to block a wide range of content. In terms of the energy arms race, this is one-for-one in effort with the attacker. (A minimal sketch of a RegexFilter check follows the list below.)

* '''Scale through numbers.''' The decentralized PeerToPeerBanList (e.g. the recent SharedAntiSpam initiative) and the centralized-but-receptive http://www.chongqed.org RegexFilter list pit the mostly good against the few bad apples. An AntiSpamBot (e.g. ThoughtStorms:WikiMinion) can apply some mechanical muscle to small communities or GhostTown""s.
* '''Algorithmic.''' Blocking with a LanguageFilter (e.g. everything Chinese) or a trained BayesianFilter (unproven) can greatly increase your defensive power. False positives are a major problem (e.g. everyone Chinese).
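
To make the RegexFilter idea concrete, here is a minimal sketch. The patterns and function names are invented for illustration; a real installation would load a maintained list such as chongqed.org's or a SharedAntiSpam feed, and would hook into the engine's save routine.

 import re

 # Invented banned-content patterns; in practice, load a maintained list
 # rather than hand-rolling your own.
 BANNED_PATTERNS = [
     r"casino-?online",
     r"cheap\W{0,3}(viagra|cialis)",
     r"https?://[^\s]*\.example-pharmacy\.com",
 ]
 BANNED_RE = [re.compile(p, re.IGNORECASE) for p in BANNED_PATTERNS]

 def added_text(old_page, new_page):
     """Very rough diff: lines present in the new text but not the old."""
     old_lines = set(old_page.splitlines())
     return "\n".join(l for l in new_page.splitlines() if l not in old_lines)

 def is_spam(old_page, new_page):
     """Reject an edit only if a banned pattern appears in the *added* text,
     so grandfathered content never blocks a legitimate revert."""
     return any(rx.search(added_text(old_page, new_page)) for rx in BANNED_RE)

 old = "Welcome to the wiki.\n"
 new = old + "Buy cheap viagra at http://www.example-pharmacy.com/\n"
 print(is_spam(old, new))   # True: the edit is blocked
 print(is_spam(new, old))   # False: reverting the spam is still possible

Checking only the added text is one way to avoid throttling the community's own reverts, a caveat the SurgeProtector item below raises as well.
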
=== # Network control ===

While cities defend borders against neighbours, on the Internet you defend ports against IP addresses. Defending your server against network attacks seems to be par for the course on the Internet. In terms of the energy arms race, aside from manually blocking IP addresses, this is an advantage in effort over the attacker.

* '''HardBan.''' The traditional approach is to (manually) maintain a BanList of offending IPs. This usually quickly devolves into a RegionalBan against China, which eventually fails due to OpenProxy, AnonymousProxy, and ZombieMachine attacks. False positives are expected as entire countries are casually banned.
* '''Automated defense.''' PortScan""ning editing hosts to see if they are an OpenProxy, and blocking each AnonymousProxy, is a good strategy in the same vein as UseRealNames. Some proprietors with ethical reasons to support total anonymity will not like this.
* '''Spider defense.''' SpiderTrap robots before they annihilate you. More extremely, put up a SearchEngineCloak.
* '''SurgeProtector.''' You can directly control the amount of energy any part of the network can inflict on your site. EditThrottling, ViewThrottling, or directly LinkThrottling ShotgunSpam are all good options. Care must be taken not to also throttle spam reversion, lest one provide an easy way for spammers to defeat the community.

=== # Anti-energy ===

While a spammer manually editing pages from a browser is hard to detect, many spammers use automated scripts targeting thousands or millions of websites, hiding behind a RotatingProxy to avoid a simple IP ban. This kind of ''energy weapon'', massively increasing the amount of energy the spammer has, is devastating to a CommunitySolution, quickly overwhelming the energy in the community. An ''anti-energy weapon'' prevents such a tactic by ''any'' user, thus remaining notionally in the realm of SoftSecurity. Three techniques have been found extremely efficacious on MeatballWiki (a rough sketch of the first and third follows this list):

* '''EditHash.''' Ensure a POST cannot succeed unless it comes from the same IP as a matching GET, preventing the use of a standard RotatingProxy. Other techniques are available for detecting OpenProxies, but this is both effective and cheap. All modern wiki engines should strongly consider adding such a system.
* '''HumanVerification.''' Require that any edit adding URLs to a page pass a simple HumanVerification test. This can be coupled with EditHash if a wiki is commonly used by legitimate RotatingProxy-using members.
* '''Non-triviality.''' Require that any edit pass some measure of non-triviality. For instance, one can insist (a) that the edit summary be non-empty, and (b) that it not match the page text. This has the beneficial side-effect of forcing users to summarise edits. It catches a class of automated scripts which erroneously POST updates that do not contain URLs (and hence bypass HumanVerification).
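
A minimal sketch of the EditHash and non-triviality checks, assuming a generic engine: the secret, the helper names, and the reading of "summary must not match the page text" as a substring test are illustrative assumptions, not MeatballWiki's actual implementation.

 import hmac, hashlib, secrets

 SERVER_SECRET = secrets.token_bytes(32)   # placeholder; a real engine would persist this

 def edit_hash(client_ip, page_name):
     """Token embedded in the edit form (the GET).  It is an HMAC over the
     requesting IP and page name, so it is useless from any other address."""
     msg = "{}|{}".format(client_ip, page_name).encode()
     return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()

 def post_allowed(client_ip, page_name, token, summary, new_text):
     """Checks applied when the edit is submitted (the POST)."""
     # EditHash: the POST must come from the IP that fetched the form,
     # defeating a simple RotatingProxy.
     if not hmac.compare_digest(token, edit_hash(client_ip, page_name)):
         return False
     # Non-triviality: a summary is required, and pasting page text won't do.
     if not summary.strip() or summary.strip() in new_text:
         return False
     return True

 token = edit_hash("203.0.113.7", "FrontPage")             # form fetched from here
 print(post_allowed("203.0.113.7", "FrontPage", token,
                    "fix typo", "Welcome to the wiki."))    # True
 print(post_allowed("198.51.100.9", "FrontPage", token,
                    "fix typo", "Welcome to the wiki."))    # False: different IP
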
=== # Demotivation ===

The most SoftSecurity approach is to eliminate any intrinsic interest the spammer has in attacking. The best way is to stop SearchEngine""s from finding or valuing their links.

* '''NotIndexed.''' You can and should hide your VersionHistory from SearchEngine""s. You can also use an ExternalRedirect or flag all outbound links with NoFollow (and destroy the Web while you're at it).
* '''DelayedIndex.''' You can delay the time it takes for a link to become visible to SearchEngine""s, say by making links NoFollow until a LinkVeto period has passed, or by only presenting a StableCopy to a SearchEngine.
* '''HiddenCommunity.''' You can try putting up an EditMask or, more extremely, a SearchEngineCloak, or use the RobotsExclusionStandard to hide yourself from SearchEngine""s, and thus make yourself an unfindable and unvaluable target.
* '''Cost of participation.''' You can increase the cost of posting by introducing a PricklyHedge, like HumanVerification (e.g. a CaptchaTest). The OpenProxy PortScan also increases the cost (~15 seconds per new host). A good method anywhere but the Internet is to use an AccessFee.

=== # Better peer review ===

Often the best solution is to empower the good guys. A strong CommunitySolution is more resilient, adaptive, and fair than any algorithm.

* '''PreemptiveModeration.''' The traditional approach: a volunteer army to vet content.
* '''RecentLinks.''' Provide a specialized RecentChanges for just external links. You can further specialize this by listing only new domains. Necessary with a LinkVeto.
* '''GlobalRevert.''' Decrease the cost of reverting a spam attack to tip the balance back into the good guys' hands. As this is akin to putting guns in everyone's hands, you can do it with more consequences via CitizenArrest.

=== # Offensive action ===

Some people, particularly the fine folks at http://www.chongqed.org, would like to take a more proactive stance towards spam. While this strategy may be mildly worrisome for those who remember how spammers stalked, harassed, and threatened the maintainers of the email RealTimeBlackholeList""s in the 1990s, there are things that we can do that do not require putting our necks on the line.

* '''GoogleBomb.''' http://www.chongqed.org has a strategy of GoogleBomb""ing spammer keywords to point to http://www.chongqed.org.
* '''AntiSpamBot.''' Use SearchEngine""s to find wikis the same way spammers do. Send a bot to automatically revert their spam. Like a virus scanner for the whole Internet. False positives are problematic.
* '''GhostTown list.''' http://www.chongqed.org maintains a list of private GhostTown""s that end up being WildHoneyPot""s. This list could eventually be used by SearchEngine""s to eliminate a large number of spammers from their listings, although this is very dangerous and potentially litigious.
* '''RealtimeBlackholeList.''' http://www.chongqed.org maintains a centralized RegexFilter of spammers, as submitted by volunteers.
* '''PeerToPeerBanList.''' A very decentralized and difficult-to-attack network of RegexFilter lists. It benefits from the same easy scalability as the WebLog community, without a single maintainer for spammers to phone harassingly at 3am. See SharedAntiSpam.
* '''Fight fire with fire.''' We can create websites that SearchEngine""s down-rate, like link farms, and put the spam links up on those sites to trigger any automatic spam detectors.
* '''Report spam.''' Just report spam links directly to SearchEngine""s.

=== # Erect barriers ===

You can also give up the basic wiki principle of open editing by all and concede that some jerks will spoil the fun for everybody. Fortunately, strategies exist on a gradient, so you can strike a happy medium.

* '''ShieldsUp.''' During a spam attack, close the site to everyone but trusted editors (a close relative of the FishBowl).
* '''Logins.''' The traditional approach is only as strong as the rate at which spammers can create new fake email addresses. Creating new logins does offer a SpeedBump, however.
** '''Staged login.''' Everyone can edit, but only CommunityMember""s can post external links. The definition of a CommunityMember may be as simple as those with UserName""s and a CategoryHomePage (e.g. a FunctionalAccessTrustMetric). This risks creating a culture of screening new members to determine which ones are spammers, a PricklyHedge to membership that may be highly detrimental to the growth of community.
* '''InviteOnly.''' Many strategies revolve around you reaching out to others (cf. UsAndThem).
** '''PrivateCommunity.''' Like many places on the Internet, your wiki is only readable and writable by those invited.
** '''FishBowl.''' Read-only for the public; writable only by those invited.
** '''InvitationClique.''' Only those already invited can invite more people. You can control the rate of growth through economic factors, like invitation tokens.
** '''Petition.''' Have contributors prove they belong to the social group. This works best in professional associations, like academic communities: you just prove you've written a paper in the field.
* '''Economic.''' Charge a nominal AccessFee for participation. This defeats the spammer at the heart of their motivation, whilst giving you a solid identity (their credit card) to counter-attack. Caveat: payment often obliges you to a contract. Caveat: you'll also discourage good guys, who often have an even lower economic incentive than spammers for posting. Many people will rightly refuse to give out credit card details on the Internet. You also make your server a tempting target for crackers eager to steal money, and risk subsequent litigation costs.

=== # Anti-community weapon ===

An anti-community weapon attempts to cleave a community in two. A Chongqed:TarPit attempts to neatly cleave Them from Us, without telling Them we did so. The two problems here are (a) identifying Them, not Us, and (b) not letting on to Them that they've been dropped in a pit. Anti-community weapons can of course be used for other ends: ContentFilter""s are almost always abused to censor political enemies. How to support an AuditTrail without defeating (b) is an open question.

----

== # Wider issues ==

* '''SpammerStalk""ing.''' During the email spam wars, the maintainers of the RealtimeBlackholeList""s were ultimately harassed, threatened, stalked, hacked, and essentially attacked until they backed down. Spam is an economic crime without meaningful consequences. Their victims, on the other hand, have plenty of exposed liabilities. The only secure response to spam is a widely distributed, decentralized one, such as the PeerToPeerBanList.
* '''BanChina.''' We are slowly creating an inverted GreatFirewallOfChina by banning ordinary Chinese users as casualties in our war against spammers.

----

== # WikiEngine security standard ==

You can measure the spam-resistance of a wiki by seeing if it meets this minimal standard:

* Basic ReversibleChange and an AuditTrail, e.g. VersionHistory and RecentChanges.
* Non-content pages should be NotIndexed, ''especially'' history pages (a minimal sketch follows this list).
* If you HardBan, provide OpenProxy and AnonymousProxy defenses.
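
As an illustration of the NotIndexed and DelayedIndex points, here is a minimal sketch of how an engine might keep non-content views out of SearchEngine""s and withhold PageRank from fresh links. The action names, paths, and the 30-day LinkVeto figure are assumptions for the example, not any particular engine's defaults.

 # Non-content views (history, diffs, old revisions) get a robots meta tag
 # and a robots.txt entry, so they never accumulate or leak PageRank.
 NON_CONTENT_ACTIONS = {"history", "diff", "edit", "oldrev", "rollback"}

 def robots_meta(action):
     """Meta tag for the <head> of a rendered view."""
     if action in NON_CONTENT_ACTIONS:
         return '<meta name="robots" content="noindex,nofollow">'
     return '<meta name="robots" content="index,follow">'

 def robots_txt(script_path="/wiki"):
     """robots.txt fragment telling well-behaved crawlers to skip those views."""
     lines = ["User-agent: *"]
     lines += ["Disallow: {}?action={}".format(script_path, a)
               for a in sorted(NON_CONTENT_ACTIONS)]
     return "\n".join(lines)

 def external_link_html(url, first_seen_days_ago, link_veto_days=30):
     """DelayedIndex: outbound links stay rel="nofollow" until the LinkVeto
     period has passed, so freshly planted spam earns no PageRank."""
     rel = ' rel="nofollow"' if first_seen_days_ago < link_veto_days else ""
     return '<a href="{}"{}>{}</a>'.format(url, rel, url)

 print(robots_meta("diff"))
 print(robots_txt())
 print(external_link_html("http://example.com", first_seen_days_ago=3))
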
CategoryWikiStandard

----

== # See also ==

* WikiPedia:Link_spam
* EmacsWiki:BannedContent
* PhpWiki:WikiSpam
* [http://openwiki.com/ow.asp?WikiSpam OpenWiki:WikiSpam]
* CommunityWiki:WikiSpam
* [http://wiki.s23.org/wiki.pl?WikiSpam s23:WikiSpam]
* http://chongqed.org/ and http://wiki.chongqed.org/
* [http://twiki.org/cgi-bin/view/Codev/WikiSpam TWiki:Codev/WikiSpam]
* Wiki:WikiSpam
* [http://cafoscari.wiki.taoriver.net/moin.cgi/WikiSpam CafoScari:WikiSpam]
* [http://www.istori.com/cgi-bin/wiki?ChangeLog KaminskiWiki:ChangeLog]
* MoinMoin MoinMoin:AntiSpamGlobalSolution MoinMoin:AntiSpamFeatures
* ThoughtStorms:WikiImmuneSystem
* [strategy at rubygarden]
* http://www.jpaulmorrison.com/cgi-bin/wiki.pl?ExperienceWithCaptcha

The above text is PrimarilyPublicDomain. An alternate version that appeared at WikiSym 2005 is available on WikiSpamWorkshop.

----

== # Discussion ==

Aye, edit masking is something strongly on the table. I liken it to how a virus (e.g. the SARS corona virus) changes its protein coat to prevent detection by the immune system. I'm considering how this will impact legitimate bots, but I think we can have a white list of bots that are allowed to hit a clean API. (Again, akin to the immune system.) -- SunirShah

: The only problem would be for people who have their browsers set up to report when a page changes. -- ChrisPurcell

They aren't important. It's better to have good RSS feeds than use HTML changes, since the latter is kind of bogus for dynamic community sites. Think about sites with MOTD, fortune cookies, 'who is online' lists, RSS aggregation, etc. -- SunirShah

----

Could the number of newly created pages be added to the surge protection? -- MarkusLude

: If we do that, we'll simply drive the spammer to target existing pages, making reversion and hiding the edits harder. -- ChrisPurcell

----

Anyone tried EugeneEricKim's new Eaton script? http://www.eekim.com/software/eaton/eaton.pl -- PhilJones

----

== Living with the Enemy? ==

Instead of fighting spammers by removing spam, forcing an arms race between our attempts to detect spam and their attempts to add it, we could use spam detection simply to ensure existing content remains unaffected. This could be considered more in line with the SoftSecurity approach to handling attacks. It is hopefully unlikely that blackhat "SEO" spammers will deliberately set out to destroy existing content. (A rough sketch of one possible implementation appears at the end of this section.)

'''PageRank''': While living with spam will inherently decrease your rank (since outgoing links devalue internal ones), the hit will probably not be too severe. On the other hand, Google may notice the spam links and blacklist the wiki, in which case the spammers will be wasting their time (but you will disappear from search engines).

'''Readers''': By keeping spam at the bottom of a page, readers will be mostly affected by the increased download times, not by missing content, as currently happens when a page is spammed. Hopefully, this size increase will stabilise as spammers start to overwrite old spam. However, the esteem a wiki is held in, by its readership and by potential new members, may drop significantly if it starts hosting spam.

'''Host''': Becoming a haven for spam may increase storage and bandwidth costs significantly, especially if spammers take advantage of the new relaxed attitude and increase their rate of spam. Many spammers find pages to attack by searching spammy keywords or even their competition's URLs; spam will thus draw spammers like blood draws sharks, further exacerbating this cost. The host may also be legally responsible for the products they are inadvertently promoting.

'''Google''': As the number of such sites grows, there is a reason to really change the page ranking rules. In other words, the problem is shifted to the search engines and the spammers. As long as we fight for clean links, the makers of indexing engines have no reason to change the rules. However, given the number of non-wiki spam sites, it seems less likely that a few spammed wikis will fundamentally change the way major search engines work. The existing profusion of spam-filled GhostTown""s backs this up.

Perhaps the most promising application for this approach is PersonalWiki""s, which do not worry about PageRank, cannot afford to spend much time removing spam, and do not attach much importance to the high esteem of potential contributors. However, unless the separation of spam from ham is perfect, real contributions will be lost on RecentChanges, and you might as well just lock the site.

''Authors: RadomirDopieralski, ChrisPurcell, JoeChongq''
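
For concreteness, one way the segregation described above might work. This is a minimal sketch only: the marker text and the trivial looks_like_spam test are placeholders (a real site would plug in its RegexFilter), and the sketch never deletes existing lines, so legitimate removals would need separate handling.

 SPAM_MARKER = "---- Quarantined links (detected as spam) ----"

 def looks_like_spam(line):
     """Placeholder test; substitute the site's RegexFilter here."""
     return "http://" in line and "casino" in line.lower()

 def apply_edit(current_page, posted_page):
     """Rather than rejecting a spammy edit outright, keep the existing
     content intact and shunt the spammer's additions below a marker."""
     body, _, quarantined = current_page.partition("\n" + SPAM_MARKER + "\n")
     old_lines = set(body.splitlines())
     added = [l for l in posted_page.splitlines() if l and l not in old_lines]
     ham  = [l for l in added if not looks_like_spam(l)]
     spam = [l for l in added if looks_like_spam(l)]
     new_body = "\n".join(body.splitlines() + ham)
     new_quarantine = "\n".join(quarantined.splitlines() + spam)
     if new_quarantine.strip():
         return new_body + "\n" + SPAM_MARKER + "\n" + new_quarantine + "\n"
     return new_body + "\n"

 page = "Welcome to the wiki.\n"
 attack = "Visit my casino: http://spam.example/\n"   # spammer wipes the page
 print(apply_edit(page, attack))
 # The welcome text survives; the casino link ends up below the marker.
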
=== Discussion ===

To assess the value of the strategy versus taking your wiki out of the search engines altogether or even making it private, it seems necessary to understand the objective of the wiki. Why does it need to make itself available to spammers? And if so, why does it need to be in the search engines? -- SunirShah

: Can we estimate the costs and compare them to the costs of fighting the spam? I think it would be interesting. I didn't see any suggestion for a similar strategy here, so I assumed it wasn't discussed before. I don't know how to estimate the costs, so... -- RadomirDopieralski

I've rewritten the idea above, moving the objections inline. Hopefully I've represented both pros and cons equally. This should make it easier to see the relative gains and costs in each aspect. -- ChrisPurcell

: Now, after a few discussions about the idea, and once the emotions settled down a little, I think you even made it look more promising than it really is. But it's well written, thank you. -- RadomirDopieralski

Apologies if I appeared hot. I really wasn't, merely concise. I get complaints about that sometimes. -- ChrisPurcell

''Your site becomes blacklisted and all spam on it does actual harm to spammers.'' Being blacklisted just takes your PageRank down to zero, as I understand it: thus, spammers are only affected in that they've wasted some time. They don't get harmed. On the other hand, accepting spam ''will'' harm your PageRank, simply by the mechanics of the system, as I understand it.

''I believe the "SEO" spammers do not want to destroy the wiki.'' Yes, they do. Otherwise they would add their spam to the bottom of our pages, not overwrite them. You have to take into account that those doing the spamming are often twenty-year-old geeks with nothing better to do with their time, and they take satisfaction in destroying people's work whilst getting paid. -- ChrisPurcell

: I am not so sure many of them truly want to destroy other people's content; they just don't care about anything except money. Many refuse to see what they are doing as spamming. And many young teens in third world countries probably are doing it to [http://money.cnn.com/magazines/fortune/fortune_archive/2006/05/29/8378124/ support their entire family] (as they suggest in some of their spams). From the email spam world, Ryan Pitylak, a "reformed" spammer, says he just thought of it "as just a game of cat and mouse with corporate email administrators."
: I suspect that carries over to many of the web spammers; it is just a money-making game to them. See the Contact From Spammers section on our [http://wiki.chongqed.org//DiscussSpammers DiscussSpammers] for more insight into their warped minds. -- JoeChongq

: It's much easier to only send POSTs with your text, instead of GETting the original text, appending your spam to it, and POSTing it all. -- RadomirDopieralski

Nice theory, but unfortunately false: we require a unique revision ID to accompany all POSTs to existing pages, to prevent EditConflict""s. It cannot be guessed at, being essentially random. Empirical evidence strongly supports the theory that they are simply doing a GET, editing it by hand, then POSTing it, all via a web browser. (Your theory is still a very nice one. It explains why certain spammers were so big on creating new pages a little while ago: no revision ID needed. We could usefully close that gap.) -- ChrisPurcell

: I think Radomir was thinking in general wiki terms. I certainly agree that is likely what many are doing on wikis without the kind of protection you have here. -- JoeChongq

Certainly, but that doesn't contradict ''my'' point, which is that spammers ''do'' intentionally delete existing text. -- ChrisPurcell

: Purposely deleting content doesn't make sense from an active wiki standpoint: they are deleting legitimate content and angering users. But for [GhostTown]s, it is necessary since they are competing with other spammers. Wiping out other spammers' links helps them by reducing the links of the competition; plus, Google punishes sites with too many links to bad neighborhoods, so if you are the only spammer on the page, so much the better. They are doing it on purpose, but I don't think they are doing it purposely to maliciously destroy the wikis. Leaving legitimate content on the wikis would be better for them, since Google would be less likely to identify the site as a GhostTown or bad site, but they can't take that chance since the content may be their competition. -- JoeChongq

Again, my point was simply that spammers ''do'' intentionally delete existing text, whether or not it angers users. -- ChrisPurcell

----

I have added my thoughts above. My other thoughts are on chongqed's [http://wiki.chongqed.org//WikiForum WikiForum]. A bit of summary from there: spam attracts spam, and anything that gets soft on spam is in effect promoting spamming. Webpage owners don't put up wikis, blogs, and guestbooks for the purpose of giving spammers a place to put links.

'''Readers''': Spammers will rarely replace only each other's spam. If they replace anything, it is going to be the entire page. Normally it is not recognizable from the legitimate article text. I have seen some using the divs of [http://wiki.chongqed.org//CSSHiddenSpam CSSHiddenSpam] to replace each other's or their own earlier spam (which makes absolutely no sense), but it is not common.

'''Google''': [GhostTown]s already prove this PageRank point. Because they are heavily spammed, Google does not rank them very well anymore. Most are found buried deep in search results. Google is not going to drastically change the way they rank pages. Overall, it is not a bad system. The problem is that it is vulnerable to abuse, but any ranking based on any popularity measure is open to abuse. Ranking sites by some measure of popularity is important to help users find what they need. They can't just throw it out. And anyway, we had plenty of guestbook spam before PageRank, wikis, and blogs were invented.
-- JoeChongq

As mentioned on our WikiForum, this Tolerate Spammers idea was suggested by Mattis. Here is [http://wiki.chongqed.org//TolerateOrFightSpammers some discussion] in the same area with him from over a year ago. -- JoeChongq

: There was only one point I really wanted to come back on. You said "If they replace anything it is going to be the entire page." That's true. The point of the proposed system, though, is to detect such edits, and preserve the useful content of the page across them. I wasn't sure if you'd got that aspect of the proposal, and if not, how we could best change the text to put it across. -- ChrisPurcell

I had missed that suggestion, but I don't really see how it would work. Would you be (as this ignore-spammers discussion suggests) leaving the spam intact, as well as resurrecting the useful content? If you can identify these page-replacement spams, keeping the spam on the page makes no sense. Whether possible or not, it just doesn't make sense. Spam attracts spam, so if you leave the spam there, even at the bottom of the page out of the way, you are just inviting other spammers to find that page and spam it further. -- JoeChongq

: The point is to avoid an arms race. Spammers will not try to trick your spam-detection algorithm if their spam gets through anyway. This is all stated in the first sentence of the text above. -- ChrisPurcell

My point is that living with spam is stupid. If you can identify spam enough to segregate it, you can remove it. Few wikis will go with this "support the spammers so they don't bother us" idea. Unless a large portion of wikis started using this, why would spammers even notice that you don't remove their edits? This will only benefit them by making it easier to spam your pages again. Many spammers find pages to attack by searching spammy keywords or even their competition's URLs. Leaving their spam on your pages just attracts more spam. Assuming this was implemented widely, why wouldn't spammers just adapt to make sure their edits are not segregated? Their links would be in the main body while their competition gets stuck at the bottom. With email, would you prefer having a Spam folder full of 1000 spams or 0? Going with the live-with-it theory, either way they don't end up in your inbox, so it is the same.

'''Size''': Spam-inflated pages may cause editing problems due to browser technical limits, as well as being extremely slow for dialup users. MediaWiki warns on large pages: "some browsers may have problems editing pages approaching or longer than 32kb." -- JoeChongq

: I personally think it's an awful idea. I'm just trying to ensure any arguments against it are ''actually'' against it, not a strawman. For instance, if the page can tell what's spam and what's not, why would it bother putting the spam on the edit form?

:: Alone, size is not much of an argument, but it is a fact and, depending on the implementation, may or may not be a strawman (not including the spam on the edit form severely hurts this point). Even if the edit form does not carry the weight of the extra text, the size of the normal view of the page will be increased. Many users around the world are still on dialup, have slow or unreliable connections, or pay for download bandwidth. But now I see you already have a bit of that part in the Readers section.

: "If you can identify spam enough to segregate it, you can remove it." Once again, I say: arms race. Maybe you could check out MotivationEnergyAndCommunity for the longer explanation here.
: Why would spammers go to the effort of working around the anti-spam system when their spam is getting through? There's no incentive; indeed, there's a disincentive, as spamming the main page and leaving their competitors' spam at the bottom decreases the value of their spam, as you've said.

: "Living with spam is stupid." The novelty of the idea is to live with spam. You can't knock it down merely by calling it stupid. If people don't like the premise, they won't use it. ''I'' won't use it. The point of this discussion is to see whether it's viable in the first place.

:: It is not viable; that is why I am attempting to find any way possible to shoot down the idea, including pointing out that it is stupid.

: "Why would spammers even notice that you don't remove their edits?" They wouldn't. Is this not clear yet? The point is not to go to the effort of removing spam: just leave it on the page. The other way to avoid removing spam is not to let it on in the first place, but spammers ''do'' notice that, and it causes an arms race, leaving you right back where you started.

:: On an individual wiki basis, few spammers notice anything unless you are actively trying to annoy them (like chongqed is). They don't know if you remove their spam or if you leave their spam. It doesn't matter to them; they just keep spamming. You say "The point is not to go to the effort of removing spam." With enough protection (which sadly few wikis have by default), there is normally not a lot of spam that gets through. Any method of segregating spam from real content would have to be based on existing spam prevention methods. That leaves reverting spam edits as the only effort-saving advantage. Most wikis currently suck at that even with admin privileges, but rather than implementing the system necessary to give in to spammers, why not improve rollback/revert systems? As for the arms race, if wiki developers took antispam measures more seriously (built in rather than relying on third-party plugins), the race could be ignored by the wiki users and admins. Regular updates (which even lazy admins should be installing for security purposes) would provide improved spam blocking as well as important security fixes.

: I've copied a significant point of yours to the main text, by the way. Don't think I'm not appreciating your arguments. Merely hoping to help refine them. -- ChrisPurcell

:: I understand you want to discuss this, but as someone who fights spam so intensely, the whole idea is just insane. It is giving up, and it has consequences beyond the individual wiki. By not cleaning spam from your wiki, you promote the practice of spamming wikis. And all the irrelevant spam links on your site to shady businesses damage the effectiveness of search engine ranking systems, which is exactly what spammers want. The pollution of the Internet is already horrible with splogs, GhostTowns, etc.; why add active wikis to the problem? Remember the quote at the top of this page: "WikiSpam is a wikiwide problem and won't be solved but wikiwide."

:: As for avoiding the arms race, the race is going to go on whether you participate or not. If you aren't in it, you are going to end up as collateral damage. To many spammers this is a game. Spammers may be slime, but the few spammers who write their own software are still hackers. Solving interesting puzzles such as breaking spam protection is not done only to make money; it is a challenge. There are plenty of targets for spammers on the net that are not protected.
:: Why then are they attempting to break CAPTCHAs and disguise their edits on well-protected sites, where they should be able to presume the spam will be quickly removed? Because they can. That is the same reason they will attempt to get around this proposed spam segregation.

:: The strongest argument against this is the fact that ''spam attracts spam''. If you want a constant stream of spammers hitting your page, then go ahead. Some of those spams are going to get posted to the main page whether the spammer is trying to outdo your segregation rules or not (no spam identification method is perfect). And because you are not fighting anymore, users will be less likely to notice and revert spam (or move it to the segregation area) when it does make it through. -- JoeChongq

: Those are good arguments, much more what I was hoping for. I don't agree that spam that slips the filter will be ignored simply because of the filter; after all, RecentChanges will reflect only changes that are considered non-spam. They'll be ignored because the reputation of the wiki will be non-existent, and there'll be no editors. The rest of your points, I agree with. -- ChrisPurcell

:: For the same reason you say the reputation will be non-existent, I say people won't revert spam as carefully (if there are any people). Assuming there are editors who still care to be involved with the wiki (which I agree is unlikely), their motivation to clean spam won't be as great, because normally the wiki is full of spam (even though it is segregated). If the spam doesn't disrupt the page (which it shouldn't, because any spam that sneaks through must be minor or it would have been segregated), it isn't worth the bother of checking each change and possibly cleaning it up. Not all spam is clearly identifiable as spam. Link substitution, topical keyword linking, or stolen text insertion would all be hard to detect, automatically or manually, as spam.

:: We have seen a spammer recycling existing text found elsewhere on our WikiForum. Likely this case was done manually, but it could be automated. By choosing older text and removing the signature of the original author, the new post looked relatively on topic and well written. The only reason it was discovered as spam (assuming the URL was not clearly spammy, I don't remember for sure) was that I realized the text seemed familiar. Manni had written it originally weeks before.

:: By not ruining the page (any non-segregated spam would not), the spam is less visible if it is not noticed right away in RecentChanges. Users will be less vigilant because it is not destructive. It also means Google continues to find legitimate content, and so the PageRank of the victim wiki will not be damaged (which helps the spammer). -- JoeChongq