Author Topic: need help with scripting  (Read 493 times)

Stryker

need help with scripting
« on: 28 November 2002, 00:35 »
Is it possible to write a script that uses wget to download a bunch of pages, filters out any swear words or the like, and then saves them under their original file names?

Our school has a filter, and a site I want to use is blocked. So I figured I could use my server to relay requests to this site. I kind of have one right now: it uses wget -c "$query_string" to get a page. But how would I go about filtering words out of it?
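
One way the fetch, filter, and save steps could fit together, as a minimal sketch rather than a finished script. The word list `darn|heck`, the function names, and the `index.html` fallback are all placeholders invented for illustration; only `wget` and the idea of filtering its output come from the post.

```shell
#!/bin/sh
# Hypothetical placeholder list; substitute the real words to censor.
BAD_WORDS='darn|heck'

# Replace every listed word read from stdin with "****" (case-sensitive).
filter_words() {
    sed -E "s/($BAD_WORDS)/****/g"
}

# Derive a local file name from a URL: drop the query string and fragment,
# drop the scheme, then keep the last path component.
# Falls back to index.html when the URL has no file part (assumed default).
local_name() {
    path=${1%%[?#]*}
    path=${path#*://}
    case $path in
        */*) name=${path##*/} ;;   # last path component (may be empty)
        *)   name= ;;              # bare host name, no path at all
    esac
    echo "${name:-index.html}"
}

# Fetch one URL, filter it on the fly, and save it under its original name.
# $1 = URL, $2 = output directory (defaults to the current directory).
fetch_filtered() {
    url=$1
    outdir=${2:-.}
    wget -q -O - "$url" | filter_words > "$outdir/$(local_name "$url")"
}
```

With these defined, something like `fetch_filtered "$QUERY_STRING" /some/output/dir` would fetch a single page; looping over a list of URLs would cover "a bunch of pages".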

voidmain

    • http://voidmain.is-a-geek.net/
need help with scripting
« Reply #1 on: 28 November 2002, 02:03 »
Actually this would be *very* easy using Squid and a Perl script filter. Look at the Ad Zapper script for an example:

http://adzapper.sourceforge.net/

Now, if someone were to post porn images, this wouldn't stop it unless you also blocked images from anywhere other than known good image sites. That could also be done in this type of content filter/rewriter.

Master of Reality

    • http://www.bobhub.tk
need help with scripting
« Reply #2 on: 28 November 2002, 02:17 »
I thought there was a way to use gawk to filter out the words.
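
For reference, an awk version might look roughly like the sketch below. It sticks to plain `gsub`, so it should run under gawk as well as other awks; the function name `censor_awk` and the sample words are made up for illustration.

```shell
#!/bin/sh
# Replace each word in an alternation such as 'darn|heck' with "****".
# $1 = words separated by "|"; the text to filter is read from stdin.
censor_awk() {
    awk -v words="$1" '{
        gsub("(" words ")", "****")   # dynamic regex built from the word list
        print
    }'
}
# With gawk specifically, adding -v IGNORECASE=1 makes the match
# case-insensitive (a gawk extension, not available in other awks).
```

Usage would be along the lines of `wget -q -O - "$url" | censor_awk 'darn|heck'`.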

voidmain

need help with scripting
« Reply #3 on: 28 November 2002, 02:21 »
"sed" (stream editor) is the way to go. That's why I suggested Ad Zapper. The code is already there (Perl has sed-like stream filtering built in). The nice thing about using Squid to do it is it all happens on the fly and it's fast.
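
A rough sketch of the sed approach on its own, outside Squid: keep the words in a plain text file, one per line, and turn that file into a sed script. The function names and the file layout are assumptions for illustration, and the words are assumed to be plain letters with no regex metacharacters or slashes.

```shell
#!/bin/sh
# Turn a word list (one word per line) into a sed script with one
# substitution per word, e.g. "darn" becomes: s/darn/****/g
# Empty lines in the list are skipped.
# $1 = path to the word list; the generated script goes to stdout.
make_sed_script() {
    sed -n '/./s|.*|s/&/****/g|p' "$1"
}

# Filter stdin through a previously generated sed script.
# $1 = path to the sed script.
filter_stream() {
    sed -f "$1"
}
```

Typical use would be `make_sed_script words.txt > words.sed` once, then `wget -q -O - "$url" | filter_stream words.sed` for each page. With GNU sed, appending `I` to the generated `g` flag would make the match case-insensitive.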