Stopwords
Added a stopword removal feature to the Indexer. It was a quick change and I didn't get to comment the new code yet, even though the rest of the Indexer code base is well commented. I still need to come up with a good stopword list. I also need to add stemming, namely Porter stemming, soon.
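A rough sketch of the idea in Python. This is not the actual Indexer code: the tiny stopword list and the names `tokenize` and `remove_stopwords` are placeholders for illustration only.

```python
import re

# Placeholder list for illustration; the real stopword list is still to be decided.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return the tokens that are not in the stopword set, preserving order."""
    return [t for t in tokens if t not in stopwords]


tokens = remove_stopwords(tokenize("The crawler gathered a few of the news articles"))
print(tokens)  # ['crawler', 'gathered', 'few', 'news', 'articles']
```

Keeping the stopwords in a set makes each lookup constant time on average, so filtering should add little to the overall indexing cost.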
I indexed about 167 news articles that the webcrawler had gathered from a few news sources. Surprisingly, it ran pretty fast. I still need to test it with a much larger input file. The speed also depends on the size of each news article.