Monday, November 24, 2014

Preprint: Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery



Tomáš Kliegr, Linked Hypernyms: Enriching DBpedia with Targeted Hypernym Discovery, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: The Linked Hypernyms Dataset (LHD) provides entities described by Dutch, English and German Wikipedia articles with types in the DBpedia namespace. The types are extracted from the first sentences of Wikipedia articles using Hearst pattern matching over part-of-speech annotated text and disambiguated to DBpedia concepts. The dataset covers 1.3 million RDF type triples from English Wikipedia, out of which 1 million RDF type triples were found not to overlap with DBpedia, and 0.4 million with YAGO2s. There are about 770 thousand German and 650 thousand Dutch Wikipedia entities assigned a novel type, which exceeds the number of entities in the localized DBpedia for the respective language. RDF type triples from the German dataset have been incorporated into the German DBpedia. Quality assessment was based on a total of 16,500 human ratings and annotations. For the English dataset, the average accuracy is 0.86, for German 0.77 and for Dutch 0.88. The accuracy of raw plain text hypernyms exceeds 0.90 for all languages. The LHD release described and evaluated in this article targets DBpedia 3.8; an LHD version for DBpedia 3.9, containing approximately 4.5 million RDF type triples, is also available.
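To give a rough feel for what extracting a hypernym from an article's first sentence involves, here is a minimal sketch. It is not the paper's method: the abstract describes Hearst patterns matched over part-of-speech annotated text, whereas this stand-in uses a plain regular expression and a small, assumed stop-word list to guess the head noun. The names `IS_A`, `STOP` and `first_sentence_hypernym` are hypothetical and chosen only for illustration.

```python
import re

# Simplified "X is/was a/an/the <noun phrase>" pattern.
# Assumption: a regex stand-in for the POS-based Hearst patterns in the paper.
IS_A = re.compile(
    r"\b(?:is|was|are|were)\s+(?:an?|the)\s+([^.,;()]+)",
    re.IGNORECASE,
)

# Assumed list of words that typically end the noun phrase holding the hypernym.
STOP = {"of", "in", "from", "that", "which", "who", "and",
        "for", "by", "at", "on", "with"}


def first_sentence_hypernym(sentence):
    """Return the guessed head noun of the 'is a ...' phrase, or None."""
    match = IS_A.search(sentence)
    if match is None:
        return None
    head = None
    for word in match.group(1).split():
        if word.lower() in STOP:
            break  # the noun phrase ends here
        head = word  # keep the last word seen before a stop word
    return head


if __name__ == "__main__":
    print(first_sentence_hypernym(
        "Václav Havel was a Czech playwright and politician."))
```

A heuristic like this picks the last word before the first stop word, so it breaks on many real sentences. A full pipeline would also need the disambiguation step the abstract mentions, which maps the plain-text hypernym to a DBpedia concept.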

Saturday, November 22, 2014

Preprint: Mining Various Semantic Relationships from Unstructured User-Generated Web Data



Pei-Ling Hsu, Hsiao-Shan Hsieh, Jheng-He Liang and Yi-Shin Chen, Mining Various Semantic Relationships from Unstructured User-Generated Web Data, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: With the emergence of Web 2.0, the amount of user-generated web data has sharply increased. Thus, many studies have proposed techniques to extract wisdom from these user-generated datasets. Some of these works have focused on extracting semantic relationships through the use of search logs or social annotations, but only hierarchical relationships have been considered. The goal of this paper is to detect various semantic relationships (hierarchical and non-hierarchical) between concepts using search logs and social annotations. The experimental results demonstrate that our proposed approach constructs adequate relationships.

Friday, November 21, 2014

Preprint: On the Formulation of Performant SPARQL Queries



Antonis Loizou, Renzo Angles and Paul Groth, On the Formulation of Performant SPARQL Queries, Web Semantics: Science, Services and Agents on the World Wide Web, to appear.

Abstract: The combination of the flexibility of RDF and the expressiveness of SPARQL provides a powerful mechanism to model, integrate and query data. However, these properties also mean that it is nontrivial to write performant SPARQL queries. Indeed, it is quite easy to create queries that tax even the most optimised triple stores. Currently, application developers have little concrete guidance on how to write "good" queries. The goal of this paper is to begin to bridge this gap. It describes five heuristics that can be applied to create optimised queries. The heuristics are informed by formal results in the literature on the semantics and complexity of evaluating SPARQL queries, which ensures that queries following these rules can be optimised effectively by an underlying RDF store. Moreover, we empirically verify the efficacy of the heuristics using a set of openly available datasets and corresponding SPARQL queries developed by a large pharmacology data integration project. The experimental results show improvements in performance across six state-of-the-art RDF stores.