Monday, November 29, 2004

Fixed title gathering

Fixed the title gathering mechanism. The previous version used the TITLE tag in the HTML file. However, many news sources do not put the title of the news article in the TITLE tag. Hence, I developed a new mechanism that retrieves the article title from the body of the article's HTML file. It does this by looking in the specific places where news sources embed the article title.
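To illustrate the idea, here is a minimal sketch in Python rather than the actual implementation. The post does not say which places are checked, so the locations used here (an element with a headline-like class, then the first H1, then the TITLE tag as a last resort) are assumptions, as are the names `HEADLINE_CLASSES`, `TitleCandidates` and `extract_title`.

```python
from html.parser import HTMLParser

# Hypothetical class names that often mark a headline (an assumption,
# not the list used by the original system).
HEADLINE_CLASSES = {"headline", "article-title", "storyheadline"}
# Void elements never get a closing tag, so they are never captured.
VOID_TAGS = {"br", "img", "hr", "meta", "input", "link"}


class TitleCandidates(HTMLParser):
    """Collect text from headline-classed elements, <h1>, and <title>."""

    def __init__(self):
        super().__init__()
        self.found = {"headline": [], "h1": [], "title": []}
        self._capture = None  # (tag, kind) of the element being captured
        self._depth = 0       # nesting depth of same-named tags
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._capture is not None:
            if tag == self._capture[0]:
                self._depth += 1
            return
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if classes & HEADLINE_CLASSES:
            kind = "headline"
        elif tag in ("h1", "title"):
            kind = tag
        else:
            return
        self._capture, self._depth, self._buf = (tag, kind), 1, []

    def handle_endtag(self, tag):
        if self._capture is None or tag != self._capture[0]:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buf).split())
            if text:
                self.found[self._capture[1]].append(text)
            self._capture = None

    def handle_data(self, data):
        if self._capture is not None:
            self._buf.append(data)


def extract_title(html):
    """Return the best title candidate, or None if nothing was found."""
    parser = TitleCandidates()
    parser.feed(html)
    parser.close()
    # Prefer in-body headline markers; fall back to <title> last.
    for kind in ("headline", "h1", "title"):
        if parser.found[kind]:
            return parser.found[kind][0]
    return None
```

The priority order is the design choice that matters here: the TITLE tag is kept only as a fallback, since it often holds the site name rather than the article headline.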

The main benefit is that I no longer need to visit each news article individually when judging the quality of a particular clustering of news articles, which was cumbersome and time-consuming. Now I can just inspect an article's title and get a good idea of what the article is about.
