Stop Microsoft
Operating Systems => Linux and UNIX => Topic started by: Stryker on 28 November 2002, 00:35
-
Is it possible to write a script that uses wget to download a bunch of pages, filters out any swear words and such, and then saves them under their original file names?
Our school has a filter and this site is blocked, so I want a way to use it. I figured I could use my server to redirect requests to the site. I sort of have one right now: it uses wget -c "$query_string" to fetch a page. But how would I go about filtering words out of it?
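For what it's worth, the download-and-save half of that question could look something like the sketch below: fetch each page with wget, pipe it through a word filter, and keep the original file name. The word list and the `****` replacement are just placeholders.

```shell
#!/bin/sh
# Sketch: fetch each URL given on the command line, censor listed
# words, and save under the original file name.
# BADWORDS is a placeholder list -- adjust to taste.
BADWORDS='damn|hell'

for url in "$@"; do
    file=$(basename "$url")              # original file name from the URL
    wget -q -O - "$url" \
      | sed -E "s/($BADWORDS)/****/g" > "$file"
done
```

Run it as `./fetchclean.sh http://example.com/page.html` and you get a censored `page.html` in the current directory.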
-
Actually this would be *very* easy using Squid and a Perl script filter. Look at the Ad Zapper script for an example:
http://adzapper.sourceforge.net/
Now if someone were to post porn images, this wouldn't stop it unless you also blocked images from anywhere other than known-good image sites. That could also be done in this kind of content filter/rewriter.
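Roughly, a Squid redirector is just a program you point `redirect_program` at: Squid feeds it one request per line on stdin ("URL client/fqdn ident method"), and it prints back the URL unchanged to allow it, or a replacement to rewrite it. A bare-bones shell sketch of the image-blocking idea, where the whitelisted host and the block image are made-up examples:

```shell
#!/bin/sh
# Sketch of a Squid redirector: allow images only from a known-good
# host, rewrite all other image URLs to a local "blocked" image.
# images.example.edu and the blocked.gif URL are placeholders.
while read -r url rest; do
    case "$url" in
        *.jpg|*.gif|*.png)
            case "$url" in
                http://images.example.edu/*)
                    echo "$url" ;;                        # known-good image host
                *)
                    echo "http://localhost/blocked.gif" ;; # everything else blocked
            esac ;;
        *)
            echo "$url" ;;                                # non-images pass through
    esac
done
```

Note a redirector only rewrites URLs; rewriting the *content* of pages (the word filtering) would still be done by the script the rewritten URL points at.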
-
I thought there was a way to use gawk to filter out the words.
-
"sed" (stream editor) is the way to go. That's why I suggested Ad Zapper. The code is already there (Perl has sed-like stream filtering built in). The nice thing about using Squid to do it is it all happens on the fly and it's fast.