[HTVa11] Everything you would like to know about RSS feeds and you are afraid to ask

National peer-reviewed conference: BDA'11, Base de Données Avancées, October 2011, pp. 1-20, Rabat, Morocco

Keywords: RSS, RSS statistics, publication activity, item structure & length, textual vocabulary composition & evolution

Abstract: We are witnessing the widespread adoption of web syndication technologies such as RSS or Atom for the timely delivery of frequently updated Web content. Almost every personal weblog, news portal, or discussion forum nowadays employs RSS/Atom feeds to complement traditional pull-oriented searching and browsing of web pages with push-oriented delivery of web content. Social media applications such as Twitter or Facebook also employ RSS to notify users about newly available posts from their preferred friends (or followees). Unfortunately, previous work on the statistical characteristics of RSS/Atom feeds does not provide a precise and up-to-date characterization of feeds' behavior and content, a characterization that can be used to benchmark the effectiveness and efficiency of various RSS/Atom processing and analysis techniques. In this paper, we present the first thorough analysis of three complementary features of real-scale RSS/Atom feeds, namely publication activity, item structure and length, and the vocabulary of the textual content, which we believe are crucial for Web 2.0 applications.

Team: isid, vertigo
Collaboration: FORTH-ICS


