Tuesday, November 02, 2004

Stopwords

Added a stopword removal feature to the Indexer. It was a quick change and I didn't get to comment the new code, which stands out because the rest of the Indexer code base is well commented. I still need to come up with a good stopword list. I also need to add stemming, namely Porter stemming, soon.
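For reference, here is roughly what stopword removal looks like. This is only a minimal sketch in Python, not the actual Indexer code: the tiny `STOPWORDS` set and the function names `tokenize`, `remove_stopwords`, and `index_terms` are all made up for illustration, and a real stopword list would be much longer.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens that are not in the stopword set, keeping their order."""
    return [token for token in tokens if token not in stopwords]


def index_terms(text):
    """Tokenize an article and drop stopwords before the terms are indexed."""
    return remove_stopwords(tokenize(text))


if __name__ == "__main__":
    print(index_terms("The price of oil rose in the morning"))
```

Keeping the stopwords in a set makes each lookup constant time on average, so the filter adds very little cost per token. Stemming would slot in as one more step after `remove_stopwords`.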

I indexed about 167 news articles the webcrawler had gathered from a few news sources. Surprisingly, it ran pretty fast. I still need to test it with a much larger input file. The speed also depends on the size of each news article.
