Monday, November 24, 2014

Preprint: Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery



Tomáš Kliegr, Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: The Linked Hypernyms Dataset (LHD) provides entities described by Dutch, English and German Wikipedia articles with types in the DBpedia namespace. The types are extracted from the first sentences of Wikipedia articles using Hearst pattern matching over part-of-speech annotated text and disambiguated to DBpedia concepts. The dataset covers 1.3 million RDF type triples from English Wikipedia, out of which 1 million RDF type triples were found not to overlap with DBpedia, and 0.4 million with YAGO2s. There are about 770 thousand German and 650 thousand Dutch Wikipedia entities assigned a novel type, which exceeds the number of entities in the localized DBpedia for the respective language. RDF type triples from the German dataset have been incorporated into the German DBpedia. Quality assessment was performed altogether based on 16,500 human ratings and annotations. For the English dataset, the average accuracy is 0.86, for German 0.77 and for Dutch 0.88. The accuracy of raw plain text hypernyms exceeds 0.90 for all languages. The LHD release described and evaluated in this article targets DBpedia 3.8; an LHD version for DBpedia 3.9, containing approximately 4.5 million RDF type triples, is also available.
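To give a feel for what extracting a type from the first sentence of an article involves, here is a much-simplified sketch. It is not the method from the paper, which works over part-of-speech annotated text and then disambiguates the result to DBpedia concepts. This version assumes a single regular expression for the "X is a Y" pattern, and the function name, the stop-word list and the example sentences are all made up for illustration.

```python
import re

# Simplified "X is a/an/the Y" pattern. The noun phrase ends at punctuation
# or at a small, hypothetical list of function words.
HYPERNYM_PATTERN = re.compile(
    r"\b(?:is|was|are|were)\s+(?:a|an|the)\s+(.+?)"
    r"(?=[.,;]|\s+(?:that|which|who|in|of|from|for|by|with|and)\b|$)"
)

def extract_hypernym(first_sentence):
    """Return the head word of the matched noun phrase, or None."""
    match = HYPERNYM_PATTERN.search(first_sentence)
    if match is None:
        return None
    noun_phrase = match.group(1).split()
    # Assumption: the last word of an English noun phrase is its head.
    return noun_phrase[-1] if noun_phrase else None

# Illustrative sentences, not taken from the dataset.
print(extract_hypernym("Karel Čapek was a Czech writer of the early 20th century."))
print(extract_hypernym("Konclude is a reasoner."))
```

A plain-text hypernym such as "writer" would still need to be mapped to a DBpedia concept, which this sketch does not attempt.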

Saturday, November 22, 2014

Preprint: Mining Various Semantic Relationships from Unstructured User-Generated Web Data



Pei-Ling Hsu, Hsiao-Shan Hsieh, Jheng-He Liang and Yi-Shin Chen, Mining Various Semantic Relationships from Unstructured User-Generated Web Data, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: With the emergence of Web 2.0, the amount of user-generated web data has sharply increased. Thus, many studies have proposed techniques to extract wisdom from these user-generated datasets. Some of these works have focused on extracting semantic relationships through the use of search logs or social annotations, but only hierarchical relationships have been considered. The goal of this paper is to detect various semantic relationships (hierarchical and non-hierarchical) between concepts using search logs and social annotations. The experimental results demonstrate that our proposed approach constructs adequate relationships.

Friday, November 21, 2014

Preprint: On the Formulation of Performant SPARQL Queries



Antonis Loizou, Renzo Angles and Paul Groth, On the Formulation of Performant SPARQL Queries, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: The combination of the flexibility of RDF and the expressiveness of SPARQL provides a powerful mechanism to model, integrate and query data. However, these properties also mean that it is nontrivial to write performant SPARQL queries. Indeed, it is quite easy to create queries that tax even the most optimised triple stores. Currently, application developers have little concrete guidance on how to write "good" queries. The goal of this paper is to begin to bridge this gap. It describes five heuristics that can be applied to create optimised queries. The heuristics are informed by formal results in the literature on the semantics and complexity of evaluating SPARQL queries, which ensures that queries following these rules can be optimised effectively by an underlying RDF store. Moreover, we empirically verify the efficacy of the heuristics using a set of openly available datasets and corresponding SPARQL queries developed by a large pharmacology data integration project. The experimental results show improvements in performance across six state-of-the-art RDF stores.

Wednesday, September 24, 2014

Preprint: Lessons Learnt from the Deployment of a Semantic Virtual Research Environment



Peter Edwards, Edoardo Pignotti, Chris Mellish, Alan Eckhardt, Kapila Ponnamperuma, Thomas Bouttaz, Lorna Philip, Kate Pangbourne, Gary Polhill and Nick Gotts, Lessons Learnt from the Deployment of a Semantic Virtual Research Environment, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.


The ourSpaces Virtual Research Environment makes use of Semantic Web technologies to create a platform to support multi-disciplinary research groups. This paper introduces the main semantic components of the system: a framework to capture the provenance of the research process, a collection of services to create and visualise metadata and a policy reasoning service. We also describe different approaches to authoring and accessing metadata within the VRE. Using evidence gathered from data provided by the users of the system we discuss the lessons learnt from deployment with three case study groups.

Tuesday, September 23, 2014

CFP: Special Issue on Knowledge Graphs


JWS Special Issue on Knowledge Graphs


The Journal of Web Semantics invites submissions to a special issue on Knowledge Graphs to be edited by Markus Kroetzsch and Gerhard Weikum. Submissions are due by 28 February March 2015.

Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities. They have become a powerful asset for search, analytics, recommendations, and data integration. Rooted in academic research and community projects such as DBpedia, Freebase, Yago, BabelNet, ConceptNet, Nell, Wikidata, WikiTaxonomy, and others, knowledge graphs are now intensively used at big industrial stakeholders. Examples are the Google Knowledge Graph, Facebook's Graph Search, Microsoft Satori, Yahoo Knowledge, as well as thematically specialized knowledge bases in business, finance, life sciences and more. Many of these knowledge sources are available as Linked Open Data or RDF exports.

The goal of this special issue is to provide a stage for research on recent advances in knowledge graphs and their underlying semantic technologies. Traditional challenges of scalability, information quality, and data integration are of interest, but also specific projects that publish, study, or use knowledge graphs in innovative ways. More specifically, we expect submissions on (but not restricted to) the following topics.
  • Creation and curation of knowledge graphs
    • Automatic and semi-automatic creation of knowledge graphs
    • Data integration, disambiguation, schema alignment
    • Collaborative management of knowledge graphs
    • Quality control: noisy data, uncertainty, incomplete information
    • New kinds of knowledge graphs: common-sense, visual knowledge, etc.
  • Management and querying of knowledge graphs
    • Architectures for managing big graphs
    • Expressive query answering
    • Reasoning with large-scale, dynamic data
    • Data dynamics, update, and synchronization
    • Synthetic graphs and graph benchmarks
  • Applications of knowledge graphs
    • Innovative uses of knowledge graphs
    • Understanding and analyzing knowledge graphs
    • Semantic search
    • Question answering
    • Combining knowledge graphs with other information resources

Guest Editors

  • Markus Kroetzsch (primary contact), TU Dresden, markus.kroetzsch@tu-dresden.de
  • Gerhard Weikum, Max Planck Institute for Informatics, weikum@mpi-inf.mpg.de

Important Dates

We will aim at an efficient publication cycle in order to guarantee prompt availability of the published results. We will review papers on a rolling basis as they are submitted and explicitly encourage submissions well before the submission deadline. Submit papers online at the journal's Elsevier Web site.
  • Submission deadline: 28 February March 2015
  • Author notification: 30 June 2015
  • Final version: 31 August 2015
  • Final notification: 31 October 2015
  • Publication: late 2015/early 2016

Submission guidelines

The Journal of Web Semantics solicits original scientific contributions of high quality. Following the overall mission of the journal, we emphasize the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services.

Submission of your manuscript is welcome provided that it, or any translation of it, has not been copyrighted or published and is not being submitted for publication elsewhere. Manuscripts should be prepared for publication in accordance with instructions given in the JWS guide for authors. The submission and review process will be carried out using Elsevier's Web-based EES system. To ensure that all manuscripts are correctly identified for inclusion into the special issue, it is important that authors select "S.I.: Knowledge Graphs" at the "Article Type" step in the submission process.

Upon acceptance of an article, the author(s) will be asked to transfer copyright of the article to the publisher. This transfer will ensure the widest possible dissemination of information. Elsevier's liberal preprint policy permits authors and their institutions to host preprints on their web sites. Preprints of the articles will be made freely accessible on the JWS preprint server. Final copies of accepted publications will appear in print and at Elsevier's archival online server.

Friday, September 19, 2014

CFP: Special Issue on Geospatial Semantics


Special Issue of the JWS on Geospatial Semantics


The Journal of Web Semantics seeks submissions for a special issue on geospatial semantics to be edited by Yolanda Gil and Raphaël Troncy. Submissions are due by February 16, 2015 (extended from January 31).

Geospatial reasoning has an increasingly larger scope in the semantic web. More and more information is geolocated, more mobile devices produce geocoded records, and more web mashups are created to convey geospatial information. Semantics can enable automated integration of geospatial information, and track the provenance of the data shown to an end user. Semantics can also improve visualizations and querying of geospatial data. Semantics can also support crowdsourcing of geospatial data, particularly to track identity through name and property changes over time. Several recent workshops on geospatial semantics have emphasized the interest in the community on these topics. Of note are workshops organized by the World Wide Web Consortium (W3C) and the Open Geospatial Consortium (OGC) indicating a strong interest in standardization efforts in geospatial semantics. This special issue aims to synthesize the recent trends in research and practice in the area of geospatial semantics.

Topics of interest include but are not limited to:
  • Combining semantic information with more traditional representations and standards for geospatial data
  • Exploiting semantics to enhance visualizations of geospatial information
  • Use of semantics to support geospatial data integration and conflation
  • Semantic mashups of geospatial data
  • Semantic provenance of geospatial data (e.g., PROV)
  • Semantics for mobile geospatial applications
  • Geospatial linked open data
  • Managing privacy of personal geospatial data and whereabouts through semantics
  • Combining semantic web standards (W3C) with geospatial (OGC) standards (e.g., GML)
  • Format for representing geographical data (e.g., GeoJSON)
  • Semantics for crowdsourcing geospatial information
  • Semantics for exploiting geospatial information in social network platforms
  • Scalable reasoning with semantic geospatial data
  • Real world applications of semantic geospatial frameworks

Guest Editors

  • Yolanda Gil, Information Sciences Institute, University of Southern California
  • Raphaël Troncy, Multimedia Communications Department, EURECOM

Important Dates

  • Call for papers: September 20, 2014
  • Submission deadline: February 16, 2015 (extended from January 31)
  • Author notification: mid-April 2015
  • Publication: third quarter of 2015

Submission guidelines

The Journal of Web Semantics solicits original scientific contributions of high quality. Following the overall mission of the journal, we emphasize the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services.

Submission of your manuscript is welcome provided that it, or any translation of it, has not been copyrighted or published and is not being submitted for publication elsewhere. Manuscripts should be prepared for publication in accordance with instructions given in the JWS guide for authors. The submission and review process will be carried out using Elsevier's Web-based EES system. Upon acceptance of an article, the author(s) will be asked to transfer copyright of the article to the publisher. This transfer will ensure the widest possible dissemination of information. Elsevier's liberal preprint policy permits authors and their institutions to host preprints on their web sites. Preprints of the articles will be made freely accessible on the JWS preprint server. Final copies of accepted publications will appear in print and at Elsevier's archival online server.

Wednesday, September 3, 2014

Preprint: SINA: Semantic Interpretation of User Queries for Question Answering on Interlinked Data



Saeedeh Shekarpour, Edgard Marx, Axel-Cyrille Ngonga Ngomo and Sören Auer, SINA: Semantic Interpretation of User Queries for Question Answering on Interlinked Data, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: The architectural choices underlying Linked Data have led to a compendium of data sources which contain both duplicated and fragmented information on a large number of domains. One way to enable non-expert users to access this data compendium is to provide keyword search frameworks that can capitalize on the inherent characteristics of Linked Data. Developing such systems is challenging for three main reasons. First, resources across different datasets or even within the same dataset can be homonyms. Second, different datasets employ heterogeneous schemas and each one may only contain a part of the answer for a certain user query. Finally, constructing a federated formal query from keywords across different datasets requires exploiting links between the different datasets on both the schema and instance levels. We present Sina, a scalable keyword search system that can answer user queries by transforming user-supplied keywords or natural-language queries into conjunctive SPARQL queries over a set of interlinked data sources. Sina uses a hidden Markov model to determine the most suitable resources for a user-supplied query from different datasets. Moreover, our framework is able to construct federated queries by using the disambiguated resources and leveraging the link structure underlying the datasets to query. We evaluate Sina over three different datasets. We can answer 25 queries from the QALD-1 correctly. Moreover, we perform as well as the best question answering system from the QALD-3 competition by answering 32 questions correctly while also being able to answer queries on distributed sources. We study the runtime of SINA in its mono-core and parallel implementations and draw preliminary conclusions on the scalability of keyword search on Linked Data.

Monday, September 1, 2014

Preprint: Global Machine Learning for Spatial Ontology Population, Kordjamshidi and Moens


Parisa Kordjamshidi and Marie-Francine Moens, Global Machine Learning for Spatial Ontology Population, Web Semantics: Science, Services and Agents on the World Wide Web, to appear. 

Abstract: Understanding spatial language is important in many applications such as geographical information systems, human computer interaction or text-to-scene conversion. Due to the challenges of designing spatial ontologies, the extraction of spatial information from natural language still has to be placed in a well-defined framework. In this work, we propose an ontology which bridges between cognitive-linguistic spatial concepts in natural language and multiple qualitative spatial representation and reasoning models. To make a mapping between natural language and the spatial ontology, we propose a novel global machine learning framework for ontology population. In this framework we consider relational features and background knowledge which originates from both ontological relationships between the concepts and the structure of the spatial language. The advantage of the proposed global learning model is the scalability of the inference, and the flexibility for automatically describing text with arbitrary semantic labels that form a structured ontological representation of its content. The machine learning framework is evaluated with SemEval-2012 and SemEval-2013 data from the spatial role labeling task.

Thursday, August 28, 2014

Preprint: Theophrastus: On Demand and Real-Time Automatic Annotation and Exploration of (Web) Documents using Open Linked Data


Pavlos Fafalios and Panagiotis Papadakos, Theophrastus: On Demand and Real-Time Automatic Annotation and Exploration of (Web) Documents using Open Linked Data, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: Theophrastus is a system that supports the automatic annotation of (Web) documents through entity mining and provides exploration services by exploiting Linked Open Data (LOD), in real-time and only when needed. The system aims at assisting biologists in their research on species and biodiversity. It was based on requirements coming from the biodiversity domain and was awarded the first prize in the Blue Hackathon 2013. Theophrastus has been designed to be highly configurable regarding a number of different aspects like entities of interest, information cards and external search systems. As a result it can be exploited in different contexts and other areas of interest. The provided experimental results show that the proposed approach is efficient and can be applied in real-time.

 

Friday, August 15, 2014

New LOD cloud draft includes 558 semantic web datasets



Chris Bizer announced a new draft version of the LOD cloud with 558 linked datasets connected by 2883 linking sets. Last call for new datasets (submit at DataHub) for this version is 2014-08-20.

Sunday, July 27, 2014

2013 Journal Metrics data computed from Elsevier's Scopus data


Eugene Garfield first published the idea of analyzing citation patterns in scientific publications in his 1955 Science paper, Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas. He subsequently popularized the impact factor metric for journals and many other bibliographic concepts and founded the Institute for Scientific Information to provide products and services around them.

In the last decade, digital libraries, online publishing, text mining and big data analytics have combined to produce new bibliometric datasets and metrics. Google's Scholar Metrics, for example, uses measures derived from the popular h-index concept. Microsoft's Academic Search uses a PageRank-like algorithm to weigh citations based on the metric for their source. Thomson Reuters, which acquired Garfield's ISI in 1992, still relies largely on the traditional impact factor in its Citation Index. These new datasets and metrics have also stimulated a lively debate on the value of such analysis and the dangers of putting too much reliance on them.

Elsevier's Journal Metrics site publishes journal citation metrics computed with data from their Scopus bibliographic database, which covers nearly 21,000 titles from over 5,000 publishers in the scientific, technical, medical, and social science fields. Last week the site added data from 2013, using three measures of a journal's impact based on an analysis of its papers' citations.
  • Source Normalized Impact per Paper (SNIP), a measure of contextual citation impact that weights citations based on the total number of citations in a subject field.
  • Impact Per Publication (IPP), an estimate of the average number of citations a paper will receive in three years.
  • SCImago Journal Rank (SJR), a PageRank-like measure that takes into account the "prestige" of the citing sources.
We were happy to see that the metrics for the Journal of Web Semantics remain strong, with 2013 values for SNIP, IPP and SJR of 4.51, 3.14 and 2.13, respectively.  Our analysis, described below, shows that these metrics put the journal in the top 5-10% of a set of 130 journals in our "space".

To put these in context, we wanted to compare them to the metrics of other journals that regularly publish similar papers. The Journal Metrics site has a very limited search function, but you can download all of the data as a CSV file. We downloaded the data and used grep to select just the journals that are in the Computer Science category and whose names contain any of the strings web, semantic, knowledge, data, intellig, agent or ontolo. The data for the resulting 130 journals for the last three years is available as a Google spreadsheet.
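For readers who would rather script the selection than use grep, the same filter can be sketched in a few lines of Python. This is only an illustration: the column names Title and Category, the case-insensitive matching and the sample rows are assumptions, not the actual layout of the Journal Metrics CSV file.

```python
import csv
import io
import re

# The substrings listed above; case-insensitive matching is an assumption.
KEYWORDS = re.compile(r"web|semantic|knowledge|data|intellig|agent|ontolo",
                      re.IGNORECASE)

def select_journals(csv_file, category="Computer Science"):
    """Keep rows in the given category whose title contains a keyword."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader
            if category in row["Category"] and KEYWORDS.search(row["Title"])]

# Hypothetical sample; every row except the first is invented.
SAMPLE = """Title,Category
Journal of Web Semantics,Computer Science
Hypothetical Knowledge Studies,Social Sciences
Hypothetical Data Engineering Letters,Computer Science
Hypothetical Graphics Quarterly,Computer Science
"""

selected = select_journals(io.StringIO(SAMPLE))
print([row["Title"] for row in selected])
```

To run it on a real download, pass an open file instead of the io.StringIO sample and adjust the column names to match the header of that file.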

All of these metrics have shortcomings and should be taken with a grain of salt. Some, like Elsevier's, are based on data from a curated set of publications with several years (e.g., three or even five) of data available, so new journals are not included. Others, like Google's basic citation counts, weigh a citation from a paper in Science the same as one from an undergraduate research paper found on the Web. Journals that publish a handful of very high quality papers each year fare better on some measures but are dominated by publications that publish a large number of articles, from top quality to mediocre, on others. Nonetheless, taken together, the different metrics offer insights into the significance and utility of a journal's published articles based on citations from the research community.

Sunday, July 13, 2014

Preprint: Tailored Semantic Annotation for Semantic Search


Rafael Berlanga, Victoria Nebot and Maria Pérez, Tailored Semantic Annotation for Semantic Search, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

Abstract: This paper presents a novel method for semantic annotation and search of a target corpus using several knowledge resources (KRs). This method relies on a formal statistical framework in which KR concepts and corpus documents are homogeneously represented using statistical language models. Under this framework, we can perform all the necessary operations for an efficient and effective semantic annotation of the corpus. Firstly, we propose a coarse tailoring of the KRs w.r.t. the target corpus with the main goal of reducing the ambiguity of the annotations and their computational overhead. Then, we propose the generation of concept profiles, which allow measuring the semantic overlap of the KRs as well as performing a finer tailoring of them. Finally, we propose how to semantically represent documents and queries in terms of the KRs concepts and the statistical framework to perform semantic search. Experiments have been carried out with a corpus about web resources which includes several Life Sciences catalogues and Wikipedia pages related to web resources in general (e.g., databases, tools, services, etc.). Results demonstrate that the proposed method is more effective and efficient than state-of-the-art methods relying on either context-free annotation or keyword-based search.


Wednesday, July 2, 2014

Preprint: Konclude: System Description


Andreas Steigmiller, Thorsten Liebig, Birte Glimm, Konclude: System Description, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

This paper introduces Konclude, a high-performance reasoner for the Description Logic SROIQV. The supported ontology language is a superset of the logic underlying OWL 2 extended by nominal schemas, which allows for expressing arbitrary DL-safe rules. Konclude's reasoning core is primarily based on the well-known tableau calculus for expressive Description Logics. In addition, Konclude also incorporates adaptations of more specialised procedures, such as consequence-based reasoning, in order to support the tableau algorithm. Konclude is designed for performance and uses well-known optimisations such as absorption or caching, but also implements several new optimisation techniques. The system can furthermore take advantage of multiple CPUs at several levels of its processing architecture. This paper describes Konclude's interface options, reasoner architecture, processing workflow, and key optimisations. Furthermore, we provide results of a comparison with other widely used OWL 2 reasoning systems, which show that Konclude performs eminently well on ontologies from any language fragment of OWL 2.

Tuesday, July 1, 2014

Preprint: Everything you always wanted to know about blank nodes (but were afraid to ask)


Aidan Hogan, Marcelo Arenas, Alejandro Mallea and Axel Polleres, Everything You Always Wanted to Know About Blank Nodes, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

In this paper we thoroughly cover the issue of blank nodes, which have been defined in RDF as 'existential variables'. We first introduce the theoretical precedent for existential blank nodes from first order logic and incomplete information in database theory. We then cover the different (and sometimes incompatible) treatment of blank nodes across the W3C stack of RDF-related standards. We present an empirical survey of the blank nodes present in a large sample of RDF data published on the Web (the BTC–2012 dataset), where we find that 25.7% of unique RDF terms are blank nodes, that 44.9% of documents and 66.2% of domains featured use of at least one blank node, and that aside from one Linked Data domain whose RDF data contains many "blank node cycles", the vast majority of blank nodes form tree structures that are efficient to compute simple entailment over. With respect to the RDF-merge of the full data, we show that 6.1% of blank-nodes are redundant under simple entailment. The vast majority of non-lean cases are isomorphisms resulting from multiple blank nodes with no discriminating information being given within an RDF document or documents being duplicated in multiple Web locations. Although simple entailment is NP-complete and leanness-checking is coNP-complete, in computing this latter result, we demonstrate that in practice, real-world RDF graphs are sufficiently "rich" in ground information for problematic cases to be avoided by non-naive algorithms.

Sunday, June 29, 2014

JWS ranked highly in the 2014 Google Scholar Metrics data



Google has released its 2014 Google Scholar Metrics data, which estimate the visibility and influence of journals and selected conferences based on citations to articles published in 2009-2013 and indexed in Google Scholar as of mid-June 2014. The primary measure is a publication venue's h5-index, a variation on the popular h-index. Google defines a venue's h5-index as the largest number h such that h articles published in the last five years have at least h citations each. A related measure, the h5-median, is also computed for a venue as the median number of citations for the articles that make up its h5-index.
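The two definitions are easy to turn into code. The sketch below is our own illustration rather than Google's implementation. It assumes the input is simply a list of citation counts, one per article published in the five-year window, and the sample numbers are invented.

```python
from statistics import median

def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def h5_median(citations):
    """Median citation count of the h most cited articles."""
    ranked = sorted(citations, reverse=True)
    core = ranked[:h_index(citations)]
    return median(core) if core else 0

# Hypothetical citation counts for nine articles from one venue.
sample = [50, 30, 12, 8, 5, 4, 3, 1, 0]
print(h_index(sample), h5_median(sample))  # prints: 5 12
```

In the sample, five articles have at least five citations each but there are not six with at least six, so the h5-index is 5, and the median of those five counts (50, 30, 12, 8, 5) is 12.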

The Journal of Web Semantics' 2014 h5-index was 36 and its h5-median was 62. This puts the JWS among the top venues for the Google-defined category Databases and Information Systems as well as among the top venues whose names contain one of the words web, semantics, knowledge, intelligence or intelligent.

Here are the 36 articles that make up the JWS's h5-index for 2014.

Friday, June 20, 2014

JWS preprint: Identifying Relevant Concept Attributes to Support Mapping Maintenance under Ontology Evolution




Duy Dinh, Julio Cesar Dos Reis, Cédric Pruski, Marcos Da Silveira and Chantal Reynaud-Delaître, Identifying Relevant Concept Attributes to Support Mapping Maintenance under Ontology Evolution, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

Abstract: The success of distributed and semantic-enabled systems relies on the use of up-to-date ontologies and mappings between them. However, the size, quantity and dynamics of existing ontologies demand a huge maintenance effort pushing towards the development of automatic tools supporting this laborious task. This article proposes a novel method, investigating different types of similarity measures, to identify concepts' attributes that served to define existing mappings. The obtained experimental results reveal that our proposed method allows one to identify the relevant attributes for supporting mapping maintenance, since we found correlations between ontology changes affecting the identified attributes and mapping changes.

Friday, June 6, 2014

CFP: Special issue on machine learning and data mining for the Semantic Web



The Journal of Web Semantics seeks submissions of original research papers for a special issue on machine learning and data mining for the Semantic Web dealing with analytical, theoretical, empirical, and practical aspects of machine learning and data mining for all areas of the Semantic Web. Submissions are due by February 15, 2015 (extended from December 15, 2014).

In the last years, machine learning, as well as data mining approaches have become the main focus of many research works and initiatives related to the Semantic Web and the Web of Data. Challenges imposed by the large scale of Web Data, the uncertainty related to contradictory and incomplete information, and also, by properties and characteristics of Linked Data represent an interesting domain for emerging machine learning and data mining approaches.

For this special issue, we invite high quality contributions from all areas of research that address any aspects of the aforementioned challenges. Topics of interest include but are not limited to the following.
  • Ontology-based data mining
  • Automatic (and semi-automatic) ontology learning and population
  • Distant-supervision (or weak-supervision) methods based on ontologies and knowledge bases
  • Web mining using semantic information
  • Meta-learning for the Semantic Web
  • Cognitive-inspired approaches and exploratory search in the Semantic Web
  • Discovery science involving linked data and ontologies
  • Data mining and machine learning applied to information extraction in the semantic web
  • Big Data analytics involving linked data
  • Inductive reasoning on uncertain knowledge for the Semantic Web
  • Ontology matching and instance matching using machine learning and data mining
  • Data mining and knowledge discovery in the Web of data
  • Knowledge base maintenance using Machine Learning and Data Mining
  • Crowdsourcing and the Semantic Web
  • Mining the social Semantic Web
We solicit contributions that address these challenges, as well as reports on novel applications with the potential to push Semantic Web and machine learning/data mining cooperation forward.

Submission guidelines

The Journal of Web Semantics solicits original scientific contributions of high quality. Following the overall mission of the journal, we emphasize the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services.

Submission of your manuscript is welcome provided that it, or any translation of it, has not been copyrighted or published and is not being submitted for publication elsewhere. Upon acceptance of an article, the author(s) will be asked to transfer copyright of the article to the publisher. This transfer will ensure the widest possible dissemination of information. Manuscripts should be prepared for publication in accordance with instructions given in the JWS guide for authors. The submission and review process will be carried out using Elsevier's Web-based EES system. Final decisions of accepted papers will be approved by an editor in chief.

Final review copies of accepted publications will appear in print and at the archival online server. Author preprints of the articles will be made freely accessible on the JWS preprint server.

Important Dates

  • Call for papers: June 2014
  • Submission deadline: 15 February 2015 (previously 15 December 2014)
  • Author notification: 30 April 2015
  • Submission deadline for revisions: 15 June 2015
  • Author notification: 1 August 2015

Special issue editors

Wednesday, May 28, 2014

Maria-Esther Vidal and Elena Simperl join JWS editorial board

The Journal of Web Semantics welcomes Drs. Maria-Esther Vidal and Elena Simperl as new members of its editorial board.

Dr. Maria-Esther Vidal is a full professor in the Computer Science Department of the Universidad Simón Bolívar (Caracas, VE), where she also serves as the Assistant Dean for Research and Development in Applied Science and Engineering. Dr. Vidal leads the USB Semantic Web Group and has research interests that include publishing and consuming (linked) open data, query rewriting, optimization and execution, ranking, link prediction, and benchmarking and evaluation. She received a Ph.D. in Computer Science from USB in 2000.


Dr. Elena Simperl is a senior lecturer in the Web and Internet Science Research Group at the University of Southampton (Southampton, UK). Her research is at the intersection of semantic technologies and social computing and includes interests in the socially and economically motivated aspects of creating and using semantically-enabled Web content and how it can be used to foster collaboration and participation. She received a Ph.D. in Computer Science from the Freie Universität Berlin in 2006.

Monday, May 19, 2014

Preprints of papers from the 2012 Semantic Web Challenge


Preprints from the JWS special issue on the 2012 Semantic Web Challenge are available on the preprint server.

Thursday, May 15, 2014

JWS preprint: Querying NeXtProt Nanopublications and their Value for Insights on Sequence Variants and Tissue Expression

New paper on the Journal of Web Semantics preprint server:

Christine Chichester, Pascale Gaudet, Oliver Karch, Paul Groth, Lydie Lane, Amos Bairoch, Barend Mons, Antonis Loizou, Querying NeXtProt Nanopublications and their Value for Insights on Sequence Variants and Tissue Expression, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

Abstract: Understanding how genetic differences between individuals impact the regulation, expression, and ultimately function of proteins is an important step toward realizing the promise of personal medicine. There are several technical barriers hindering the transition of biological knowledge into the applications relevant to precision medicine. One important challenge for data integration is that new biological sequences (proteins, DNA) have multiple issues related to interoperability, potentially creating a quagmire in the published data, especially when different data sources do not appear to be in agreement. Thus, there is an urgent need for systems and methodologies to facilitate the integration of information in a uniform manner to allow seamless querying of multiple data types, which can illuminate, for example, the relationships between protein modifications and causative genomic variants. Our work demonstrates for the first time how semantic technologies can be used to address these challenges using the nanopublication model applied to the neXtProt data set, a curated knowledgebase of information about human proteins. We have applied the nanopublication model to demonstrate querying over several named graphs, including the provenance information associated with the curated scientific assertions from neXtProt. We show by way of use cases using sequence variations, post-translational modifications (PTMs) and tissue expression that querying the neXtProt nanopublication implementation is a credible approach for expanding biological insight.

JWS preprint: Querying a Messy Web of Data with Avalanche, Basca and Bernstein


New paper on the Journal of Web Semantics preprint server:

Cosmin Basca and Abraham Bernstein, Querying a Messy Web of Data with Avalanche, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.


Abstract: Recent efforts have enabled applications to query the entire Semantic Web. Such approaches are either based on a centralised store or link traversal and URI dereferencing as often used in the case of Linked Open Data. These approaches make additional assumptions about the structure and/or location of data on the Web and are likely to limit the diversity of resulting usages.

In this article we propose a technique called Avalanche, designed for querying the Semantic Web without making any prior assumptions about the data location or distribution, schema-alignment, pertinent statistics, data evolution, and accessibility of servers. Specifically, Avalanche finds up-to-date answers to queries over SPARQL endpoints. It first gets on-line statistical information about potential data sources and their data distribution. Then, it plans and executes the query in a concurrent and distributed manner trying to quickly provide first answers.

We empirically evaluate Avalanche using the realistic FedBench data-set over 26 servers and investigate its behaviour for varying degrees of instance-level distribution "messiness" using the LUBM synthetic dataset spread over 100 servers. Results show that Avalanche is robust and stable in spite of varying network latency, finding first results for 80% of the queries in under one second. It also exhibits stability for some classes of queries when instance-level distribution messiness increases. We also illustrate how Avalanche addresses the other sources of messiness (pertinent data statistics, data evolution and data presence) by design and show its robustness by removing endpoints during query execution.
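
The statistics-then-plan-then-execute flow described in the abstract can be illustrated with a small sketch. This is not Avalanche's algorithm or code: the endpoints are hypothetical in-memory dictionaries rather than real SPARQL servers, and the names `ENDPOINTS`, `count_matches`, `fetch` and `query` are invented for illustration. It only shows the general idea of collecting per-source counts, dropping sources with no matches, and streaming answers as soon as any source responds.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical in-memory stand-ins for remote SPARQL endpoints (assumption):
# each maps a predicate to the (subject, object) pairs it holds.
ENDPOINTS = {
    "endpoint_a": {"worksFor": [("alice", "uni1")]},
    "endpoint_b": {"worksFor": [("bob", "uni2")], "teaches": [("bob", "course1")]},
    "endpoint_c": {"teaches": [("carol", "course2")]},
}


def count_matches(name, predicate):
    """Statistics step: ask one endpoint how many triples match the predicate."""
    return name, len(ENDPOINTS[name].get(predicate, []))


def fetch(name, predicate):
    """Execution step: retrieve the matching triples from one endpoint."""
    return [(s, predicate, o) for s, o in ENDPOINTS[name].get(predicate, [])]


def query(predicate):
    """Yield answers as soon as any selected endpoint returns them."""
    with ThreadPoolExecutor() as pool:
        # 1. Gather statistics from all endpoints concurrently.
        stats = dict(pool.map(lambda n: count_matches(n, predicate), ENDPOINTS))
        # 2. Plan: keep only endpoints that report at least one match.
        selected = [n for n, count in stats.items() if count > 0]
        # 3. Execute concurrently and stream results in completion order.
        futures = [pool.submit(fetch, n, predicate) for n in selected]
        for future in as_completed(futures):
            yield from future.result()
```

Because results are yielded in completion order, a caller can consume the first answers before slower sources have finished; the order of answers is therefore not fixed.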

Thursday, April 10, 2014

JWS preprint: Linked knowledge sources for topic classification of microposts: a semantic graph-based approach


A new preprint is available on the JWS preprint server.

Andrea Varga, Amparo E. Cano, Matthew Rowe, Fabio Ciravegna and Yulan He, Linked knowledge sources for topic classification of microposts: a semantic graph-based approach, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.


Short text messages, a.k.a. microposts (e.g., tweets), have proven to be an effective channel for revealing information about trends and events, ranging from those related to disaster (e.g., Hurricane Sandy) to those related to violence (e.g., Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals by allowing such parties to respond immediately.

In this work we study the problem of topic classification (TC) of microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of microposts, however, is a challenging task since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of microposts with features extracted only from the microposts content. In contrast, our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called category meta-graph. This novel meta-graph provides a more fine-grained categorisation of concepts, providing a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of microposts.

Furthermore, our goal is also to understand which semantic feature contributes to the performance of a topic classifier. For this reason we propose an approach for automatic estimation of accuracy loss of a topic classifier on new, unseen microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and microposts at a conceptual level, considering the enriched representation of these documents.

Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches using a single KS without linked data and Twitter data only by up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves comparable results to previously used semantic graphs. Furthermore, our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches considering content-based similarity measures.

Saturday, March 29, 2014

W3C CSV on the Web Working Group publishes working drafts



The W3C Data Activity's CSV on the Web Working Group published two first public working drafts. One provides a basic data model for tabular data and metadata and the other describes use cases and requirements derived from them.

Model for Tabular Data and Metadata on the Web

Tabular data is routinely transferred on the web as "CSV", but the definition of "CSV" in practice is very loose. This document outlines a basic data model or infoset for tabular data and metadata about that tabular data. It also contains some non-normative information about a best practice syntax for tabular data, for mapping into that data model, to contribute to the standardisation of CSV syntax by IETF. Various methods of locating metadata are also provided.
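
As a rough illustration of what "a data model for tabular data plus metadata about that data" can mean in practice, here is a minimal sketch. It does not reproduce the working draft's actual model: the `Table` class, the `parse_table` function, the sample metadata keys, and the assumption that the first row is a header are all hypothetical choices made for the example.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical container: column names, rows, and table-level metadata."""
    columns: list
    rows: list
    metadata: dict = field(default_factory=dict)


def parse_table(text, metadata=None):
    """Parse CSV text into a Table, assuming the first row is a header row."""
    records = list(csv.reader(io.StringIO(text)))
    if not records:
        # Empty input: return an empty table rather than failing.
        return Table(columns=[], rows=[], metadata=metadata or {})
    header, body = records[0], records[1:]
    # Each row becomes a mapping from column name to the raw string cell value.
    rows = [dict(zip(header, record)) for record in body]
    return Table(columns=header, rows=rows, metadata=metadata or {})


table = parse_table("name,population\nOslo,634293\n", {"title": "Cities"})
```

Keeping the metadata separate from the cell values means the same CSV text can be paired with different descriptions, for example a title or a source, without changing the file itself.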

CSV on the Web: Use Cases and Requirements

A large percentage of the data published on the Web is tabular data, commonly published as comma separated values (CSV) files. The CSV on the Web Working Group aim to specify technologies that provide greater interoperability for data dependent applications on the Web when working with tabular datasets comprising single or multiple files using CSV, or similar, format. This document lists the use cases compiled by the Working Group that are considered representative of how tabular data is commonly used within data dependent applications. The use cases observe existing common practice undertaken when working with tabular data, often illustrating shortcomings or limitations of existing formats or technologies. This document also provides a set of requirements derived from these use cases that have been used to guide the specification design.

Friday, March 28, 2014

JWS preprint: API-Centric Linked Data Integration: the Open PHACTS Discovery Platform Case Study


Paul Thomas Groth, Antonis Loizou, Alasdair J. G. Gray, Carole Goble, Lee Harland and Steve Pettifer, API-Centric Linked Data Integration: the Open PHACTS Discovery Platform Case Study, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

Data integration is a key challenge faced in pharmacology where there are numerous heterogeneous databases spanning multiple domains (e.g., chemistry and biology). To address this challenge, the Open PHACTS consortium has developed the Open PHACTS Discovery Platform that leverages Linked Data to provide integrated access to pharmacology databases. Between its launch in April 2013 and March 2014, the platform has been accessed over 13.5 million times and has multiple applications that integrate with it. In this work, we discuss how Application Programming Interfaces can extend the classical Linked Data Application Architecture to facilitate data integration. Additionally, we show how the Open PHACTS Discovery Platform implements this extended architecture.

Thursday, March 13, 2014

The Web's 25th anniversary


Greeting from Web inventor Tim Berners-Lee on the Web's 25th anniversary

Tim Berners-Lee invites everyone to celebrate the 25th anniversary of the Web and to join the activities organized by the World Wide Web Consortium and World Wide Web Foundation in 2014 and beyond to address some of the threats to the future of the Web. As Berners-Lee says, "Together we have built an amazing Web. But we still have a lot to do so that the Web remains truly for everyone." For more information about Web25 activities, visit webat25.org.

Saturday, February 22, 2014

JWS preprint: ArguBlogging: an Application for the Argument Web


New preprint on the JWS preprint server:

Floris Bex, Mark Snaith, John Lawrence and Chris Reed, ArguBlogging: an Application for the Argument Web, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

In this paper, we present a software tool for 'ArguBlogging', which allows users to construct debate and discussions across blogs, linking existing and new online resources to form distributed, structured conversations. Arguments and counterarguments can be posed by giving opinions on one's own blog and replying to other bloggers' posts. The resulting argument structure is connected to the Argument Web, in which argumentative structures are made semantically explicit and machine-processable. We discuss the ArguBlogging tool and the underlying infrastructure and ontology of the Argument Web.

Thursday, February 20, 2014

JWS Preprint: Streaming the Web: Reasoning over Dynamic Data


A new preprint is available on the JWS preprint server:

Alessandro Margara, Jacopo Urbani, Frank van Harmelen and Henri Bal, Streaming the Web: Reasoning over Dynamic Data, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

In the last few years a new research area, called stream reasoning, emerged to bridge the gap between reasoning and stream processing. While current reasoning approaches are designed to work on mainly static data, the Web is, on the other hand, extremely dynamic: information is frequently changed and updated, and new data is continuously generated from a huge number of sources, often at high rate. In other words, fresh information is constantly made available in the form of streams of new data and updates. Despite some promising investigations in the area, stream reasoning is still in its infancy, both from the perspective of models and theories development, and from the perspective of systems and tools design and implementation. The aim of this paper is threefold: (i) we identify the requirements coming from different application scenarios, and we isolate the problems they pose; (ii) we survey existing approaches and proposals in the area of stream reasoning, highlighting their strengths and limitations; (iii) we draw a research agenda to guide the future research and development of stream reasoning. In doing so, we also analyze related research fields to extract algorithms, models, techniques, and solutions that could be useful in the area of stream reasoning.

Wednesday, January 1, 2014

Data Activity ⊃ Semantic Web ∪ eGovernment



In December the W3C created the Data Activity as the new home for the Semantic Web and eGovernment activities with Phil Archer as its lead. The Data Activity has eight ongoing groups, which include two new ones: CSV on the Web and Data on the Web Best Practices.

Much has changed in the 15 years since the term Semantic Web was introduced, and researchers and developers have produced a large and mature collection of concepts, languages and technologies. The new, wider focus on exploiting this in support of open data and services is welcome. You can keep track of what the W3C Data Activity is doing on the new Data Activity Blog.