Sunday, July 27, 2014

2013 Journal Metrics data computed from Elsevier's Scopus data


Eugene Garfield first published the idea of analyzing citation patterns in scientific publications in his 1955 Science paper, Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas. He subsequently popularized the impact factor metric for journals and many other bibliometric concepts, and founded the Institute for Scientific Information (ISI) to provide products and services around them.

In the last decade, digital libraries, online publishing, text mining, and big data analytics have combined to produce new bibliometric datasets and metrics. Google's Scholar Metrics, for example, uses measures derived from the popular h-index concept. Microsoft's Academic Search uses a PageRank-like algorithm that weights each citation by the metric of its source. Thomson Reuters, which acquired Garfield's ISI in 1992, still relies largely on the traditional impact factor in its citation indexes. These new datasets and metrics have also stimulated a lively debate on the value of such analyses and the dangers of relying too heavily on them.
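
For readers unfamiliar with the h-index, the small Python sketch below shows how it is typically computed from a list of per-paper citation counts; the function name and the example numbers are ours for illustration, not taken from any of the systems mentioned above.

    def h_index(citations):
        """Return the largest h such that h papers have at least h citations each."""
        counts = sorted(citations, reverse=True)
        h = 0
        for rank, cites in enumerate(counts, start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    # Five papers with these citation counts give an h-index of 3.
    print(h_index([10, 8, 5, 2, 1]))  # -> 3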

Elsevier's Journal Metrics site publishes journal citation metrics computed with data from their Scopus bibliographic database, which covers nearly 21,000 titles from over 5,000 publishers in the scientific, technical, medical, and social science fields. Last week the site added data from 2013, using three measures of a journal's impact based on an analysis of its papers' citations.
  • Source Normalized Impact per Paper (SNIP), a measure of contextual citation impact that weights citations based on the total number of citations in a subject field.
  • Impact Per Publication (IPP), an estimate of the average number of citations a paper will receive in three years.
  • SCImago Journal Rank (SJR), a PageRank-like measure that takes into account the "prestige" of the citing sources.
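
As a rough illustration of the PageRank-style idea behind SJR (and the citation weighting mentioned earlier for Microsoft's Academic Search), the toy Python sketch below iteratively redistributes "prestige" over a small journal-to-journal citation matrix. This is our own simplification, not the actual SCImago algorithm, and the journal names and citation counts are invented.

    import numpy as np

    # Toy citation matrix: entry [i][j] is the number of citations from journal i to journal j.
    journals = ["A", "B", "C"]
    C = np.array([[0, 4, 1],
                  [2, 0, 3],
                  [5, 1, 0]], dtype=float)

    # Each journal distributes its prestige in proportion to where its citations go.
    M = C / C.sum(axis=1, keepdims=True)

    d = 0.85                       # damping factor, as in PageRank
    n = len(journals)
    r = np.ones(n) / n             # start with equal prestige
    for _ in range(100):
        r = (1 - d) / n + d * M.T @ r

    print(dict(zip(journals, r.round(3))))
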
We were happy to see that the metrics for the Journal of Web Semantics remain strong, with 2013 values for SNIP, IPP and SJR of 4.51, 3.14 and 2.13, respectively.  Our analysis, described below, shows that these metrics put the journal in the top 5-10% of a set of 130 journals in our "space".

To put these numbers in context, we wanted to compare them with those of other journals that regularly publish similar papers. The Journal Metrics site has a very limited search function, but you can download all of the data as a CSV file. We downloaded the data and used grep to select just the journals in the Computer Science category whose names contained any of the strings web, semantic, knowledge, data, intellig, agent or ontolo. The data for the resulting 130 journals for the last three years is available as a Google spreadsheet.
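
For those who prefer not to use grep, the sketch below does roughly the same filtering in Python. The file name and the column headers ("Subject Area" and "Source Title") are assumptions made for illustration; the actual Journal Metrics CSV may use different ones.

    import csv
    import re

    # Keywords that define our "space" of related journals.
    keywords = re.compile(r"web|semantic|knowledge|data|intellig|agent|ontolo", re.IGNORECASE)

    # File name and column headers are assumed; adjust them to match the downloaded CSV.
    with open("journal_metrics_2013.csv", newline="", encoding="utf-8") as f:
        rows = [row for row in csv.DictReader(f)
                if "Computer Science" in row.get("Subject Area", "")
                and keywords.search(row.get("Source Title", ""))]

    print(len(rows), "matching journals")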

All of these metrics have shortcomings and should be taken with a grain of salt. Some, like Elsevier's, are based on data from a curated set of publications with several years (e.g., three or even five) of data available, so new journals are not included. Others, like Google's basic citation counts, weigh a citation from a paper in Science the same as one from an undergraduate research paper found on the Web. Journals that publish a handful of very high quality papers each year fare better on some measures, but are outscored on others by publications that publish a large number of articles ranging from top quality to mediocre. Nonetheless, taken together, the different metrics offer insights into the significance and utility of a journal's published articles based on citations from the research community.