John Sample

Bits and Bytes
posts - 103, comments - 354, trackbacks - 16

Reducing .TEXT Blog Spam (Updated)

I had a run in with a blog spammer recently (see here), but there wasn't anyway in .TEXT to block comments by their content.
I didn't want to touch the web code, so I went ahead and hacked something together in the database.

Basically its a table of phrases that will be checked before an entry is made. If there are any hits, the comment won't be posted.
I haven't dug much into the .TEXT internals, so I don't know if this is 100% acccurate, but I tried to make it so it only affects public comments. The blog owner can still use all the blocked words they want.

If you want to use it, you'll need the table in this SQL script:

if not exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[FILTER_WORD]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
 BEGIN
CREATE TABLE [FILTER_WORD] (
 [fw_ID] [int] IDENTITY (1, 1) NOT NULL ,
 [fw_WORD] [varchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
 CONSTRAINT [PK_FILTER_WORD] PRIMARY KEY  CLUSTERED 
 (
  [fw_ID]
 )  ON [PRIMARY] 
) ON [PRIMARY]
END
GO


The table's name is FILTER_WORD with 2 columns fw_ID (incrementing identity) and fw_WORD to hold the word/phrase/whatever you want to block. In my case these are domain names that always appear in the messages.

Now you need to modify one of the stored procedures: blog_InsertEntry
Here is the script in a text file, I don't want to paste the whole thing into this post: http://www.johnsample.com/misc/blog_InsertEntry.txt

Now just fill FILTER_WORD with phrases to block and you're set.

I hacked this thing together in a hurry, so I make no claims as to its suitability, or even that it may screw up and delete your entire DB. All I know is that it works for me, so be careful.

Issues to be aware of:
1. It is case sensitive
2. It will fail if the comment is more than 8000 (or maybe less) characters.
3. It doesn't tell the user why the comment was rejected, or even that the comment was rejected at all.

Now, if you want to test it, I put a test filter in my database: IAMATESTFILTER (Notice I can write it here, but you can't put it in the comments)
I hope this helps everyone!

UPDATE

I fixed the bug that was causing a delay in the posts and stopping email. Rerun the SQL file for the new version (just the stored procedure, no need to rebuild the table).
Filters are also case insensitive now.

UPDATE 2

The filters now check trackback entries.
IP, name, title, and author are also run against the filters.
To block IPs, just them into FILTER_WORD table like all the others.
I was wrong on where the trackbacks come in, this proc will need to be altered also:
http://www.johnsample.com/misc/blog_InsertPingTrackEntry.txt

UPDATE 3

Here's a beta version for Community Server that filters only anonymous posts. It hasn't been tested much (I don't use CS) but from what I can tell it works.
http://www.johnsample.com/misc/cs_weblog_Post_Create.txt
I've only tried it with v1.0.
The only thing that will be filtered are Anonymous posts, modify to your heart's content.
This is the only piece I added in, nothing else was touched:

----------------------------------------
---Filtering

If (@PostAuthor ='Anonymous')
	begin
		declare @temp_text varchar(8000)

		set @temp_text = UPPER(CAST(@Body as varchar(7500))) + UPPER(isnull(@UserHostAddress,'')) + Upper(isnull(@TitleUrl,'')) + Upper(isnull(@PostAuthor,'')) + Upper(isnull(@Subject,'')) + UPPER(CAST(@PropertyValues as varchar(50)))
		
		declare @word_buffer varchar(100)
		declare filterCursor CURSOR LOCAL FAST_FORWARD for 
			select UPPER(fw_WORD) from FILTER_WORD
		OPEN filterCursor
		FETCH NEXT FROM filterCursor INTO @word_buffer
		WHILE @@FETCH_STATUS = 0
			BEGIN
			
				IF (CHARINDEX(@word_buffer,@temp_text) > 0)
					begin
						CLOSE filterCursor
						DEALLOCATE filterCursor
						RAISERROR('Disallowed words or phrases detected. I you feel this message is in error, please contact blog owner.',11,1) 
						RETURN 1
                                                                                                                              
					end
			 FETCH NEXT FROM filterCursor INTO @word_buffer 		
			END
		CLOSE filterCursor
		DEALLOCATE filterCursor
	end
----------------------------------------



posted on Thursday, January 06, 2005 4:39 PM

Feedback

# Other .Text solutions

Here are some more .Text spam blockers:

http://netnerds.net/articles/285.aspx (Trigger)

http://blogs.clearscreen.com/migs/archive/2004/11/02/531.aspx (Captcha)
1/7/2005 8:24 AM | John Sample

# Preventing Blog Spam In .Text

1/10/2005 3:53 PM | A Byte Of Life

# re: Reducing .TEXT Blog Spam

If you can compile the source, another solution is to modify btnSubmit_Click() in the Dottext.Web.UI.Controls.PostComment class to detect the spam and not post it. All I do at the moment is count the number of links in the comment and drop it if it exceeds 10.
1/10/2005 6:56 PM | Ian

# Reducing .Text Blog Spam via SQL

1/11/2005 7:16 AM | STEFANO DEMILIANI WeBlog

# Blog Feedback: Part 2

1/12/2005 1:53 AM | Welcome to Flaphead.com @ Home

# re: Reducing .TEXT Blog Spam

I received an email saying the code above causes a delay in the posts and stops email notifications.
I'm tracking down the bug and will post a fix when I have it.
1/12/2005 7:27 AM | John Sample

# re: Reducing .TEXT Blog Spam (Updated)

The problems have been fixed. Get the new SQL from the text file.
1/12/2005 9:28 AM | John Sample

# Blog Attack in INDC Blog!!!!!!!!!!

1/17/2005 8:10 PM | Agus Kurniawan

# re: I may have to move my blog...

1/19/2005 9:50 AM | Pretty stupid for a smart guy...

# re: Reducing .TEXT Blog Spam (Updated)

This is sooo good. can you update it for Community server?
5/31/2005 4:05 AM | flaphead

# re: Reducing .TEXT Blog Spam (Updated)

If I ever get around to upgrading to community server I'll modify it.
I haven't looked at it much, but I'm surprised there's nothing like this already built in.
6/9/2005 11:00 AM | John Sample

# re: Reducing .TEXT Blog Spam (Updated)

There you go. Feel free to try it out!
6/9/2005 5:36 PM | John Sample

# re: Reducing .TEXT Blog Spam (Updated)

Done the update and testing it now ... fingers crossed
6/10/2005 9:55 AM | flaphead

# re: Reducing .TEXT Blog Spam (Updated)

Not soo keen on the error screen so I commented it out and now it loops.

Would be nice to just do a silent exit do they are none the wiser, so exit with a nice error, like when you post a comment and it says "your comment or rating has been received". you could create a new one to say your comment has been rejected?

dunno how you can do that
6/10/2005 10:02 AM | flaphead

# re: Reducing .TEXT Blog Spam (Updated)

I agree with you on the error message. I just put that in there so I could be sure when a filter hits during testing. Just make sure you don't comment out the "return 1".
When you say "loops", so you mean it gets stuck in some kind of loop?
6/10/2005 10:47 AM | John Sample

# Trackback Spam

6/17/2005 10:41 PM | The Nets Edge

# Blog Spammer SUCK ! - Or, Blocking Spam in .TEXT

2/19/2006 3:05 PM | Joe On .NET

# Blog Spammers SUCK ! - Or, Blocking Spam in .TEXT

2/19/2006 3:07 PM | Joe On .NET

# Blog Spammers SUCK ! - Or, Blocking Spam in .TEXT

Recently I’ve been getting hammered with blog spam on www.JoeOn.net. My blog mirror blogs.msdn.com/joestagner...

# re: Reducing .TEXT Blog Spam (Updated)

You may be having concurency issues on a blog with lots of add comments going on. If you are, change this line:
select UPPER(fw_WORD) from FILTER_WORD

To this line:
select UPPER(fw_WORD) from FILTER_WORD (WITH NO_LOCK)

that way, no locks will be held on the FILTER_WORD table. FAST_FORWARD is supposed to do this, but I have run into issues where it doesn't.
5/17/2006 10:07 PM | Evadman

# re: Reducing .TEXT Blog Spam (Updated)

thankss
9/5/2008 9:02 AM | oyunlar

# re: Reducing .TEXT Blog Spam (Updated)

thanks!!!!!
2/26/2009 7:32 AM | lisa

# re: Reducing .TEXT Blog Spam (Updated)

Great job !!!!
You rules !!!
3/28/2009 6:21 AM | 909

Post Comment

Title  
Name  
Url
Enter the code you see:
Comment