A.T. Arampatzis, T. Tsoris, C.H.A. Koster, and Th.P. van der Weide. Phrase-based Information Retrieval. In: Information Processing & Management, Nr: 6, Vol: 34, Pages: 693-707, December, 1998.
In this article we describe a retrieval schema which goes beyond the classical information retrieval keyword hypothesis and also takes linguistic variation into account. Guided by the failures and successes of other state-of-the-art approaches, as well as our own experience with the Irena system, our approach is based on phrases and incorporates linguistic resources and processors. In this respect, we introduce the Phrase Retrieval Hypothesis to replace the Keyword Retrieval Hypothesis. We suggest a representation of phrases suitable for indexing, and an architecture for such a retrieval system. Syntactical normalization is introduced to improve retrieval effectiveness. Morphological and lexico-semantical normalizations are adjusted to fit in this model.
A.H.M. ter Hofstede, and H.A. (Erik) Proper. How to Formalize It? Formalization Principles for Information Systems Development Methods. In: Information and Software Technology, Nr: 10, Vol: 40, Pages: 519-540, October, 1998.
Although the need for formalisation of modelling techniques is generally recognised, not much literature is devoted to the actual process involved. This is comparable to the situation in mathematics, where the focus is on proofs but not on the process of proving. This paper tries to address this lacuna and provides essential principles for the process of formalisation in the context of modelling techniques, as well as a number of small but realistic formalisation case studies.
Th.P. van der Weide, T.W.C. Huibers, and P. van Bommel. The Incremental Searcher Satisfaction Model for Information Retrieval. In: The Computer Journal, Nr: 5, Vol: 41, Pages: 311-318, 1998.
In this article, the incremental searcher satisfaction model for Information Retrieval is introduced. In this new model, documents are not presented according to decreasing relevancy only, but also according to their level of novelty in the context of the documents previously presented. Documents which are judged to be insufficiently surprising (according to a searcher-determined threshold) are not presented to the searcher. This is especially useful for Information Retrieval in certain contexts (e.g. Internet applications such as search engines), when a searcher does not want all relevant documents to be shown, but only wants a global idea of the variety of what the corpus may contain on a topic. Important properties of this model are discussed, such as the relation between the reductional effect and the order of presentation.
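The presentation rule described in this abstract can be illustrated with a minimal sketch. It is not the model from the article: the novelty measure used here (one minus the highest Jaccard similarity to any document already shown), the function names, and the sample documents are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 for two empty sets, by convention here)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def present_incrementally(ranked_docs, threshold):
    """Walk documents in decreasing relevance and keep only sufficiently novel ones.

    ranked_docs: list of (doc_id, set_of_terms), already sorted by relevance.
    threshold:   searcher-chosen minimum novelty in [0, 1].
    """
    presented = []
    for doc_id, terms in ranked_docs:
        # Assumed novelty measure: distance to the closest document already shown.
        closest = max((jaccard(terms, seen) for _, seen in presented), default=0.0)
        novelty = 1.0 - closest
        if novelty >= threshold:
            presented.append((doc_id, terms))
    return [doc_id for doc_id, _ in presented]


# Hypothetical documents, listed in decreasing relevance.
ranked = [
    ("d1", {"phrase", "retrieval", "index"}),
    ("d2", {"phrase", "retrieval", "index", "model"}),
    ("d3", {"agent", "broker", "filtering"}),
]
shown = present_incrementally(ranked, threshold=0.5)
print(shown)
```

In this sketch, which documents are suppressed depends on the order in which they are visited, which gives a rough feel for why the abstract singles out the relation between the reductional effect and the order of presentation.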
B.C.M. Wondergem, P. van Bommel, Th.P. van der Weide, and T.W.C. Huibers. De Elektronische Informatiemakelaar. In: Informatie Professional, Nr: 11, Vol: 2, 1998.
New! Improved! Now with extra whitening protection, or even based on liposomes. Slogans for detergents and beauty creams that signal that the product has been radically improved. Or is it merely advertising talk after all? Something similar seems to be going on in the world of search systems and Information Retrieval. Agents! Proactive! Intelligent! Are these empty slogans that merely sound good and sell well? For that matter, what are agents, actually? Can a workable definition of the notion of agent be given at all? What does the structure of an agent-based search system look like? And what exactly is the role that agents can play in a search system? And does the use of agents really lead to better search systems? A checklist of properties for the notion of 'agent'.
B.C.M. Wondergem, P. van Bommel, and Th.P. van der Weide. Nesting and Defoliation of Index Expressions for Information Retrieval. In: Knowledge and Information Systems, Vol: 2, Pages: 33-52, 1998.
In this article, a formalisation of index expressions is presented. Index expressions are more expressive than keywords while maintaining a comprehensible complexity. Index expressions are well-known in Information Retrieval (IR), where they are used for characterising document contents, formulation of user interests, and matching mechanisms. In addition, index expressions have found both practical and theoretical applicability in 2-level hypermedia systems for IR. In these applications, properties of (the structure of) index expressions are heavily relied upon. However, the presupposed mathematical formalisation of index expressions and their properties is still lacking. Our formalism is based on the structural notation of index expressions. It is complete in the sense that several notions of subexpressions and defoliation of index expressions are also formalised. Defoliation, which plays an important role in defining properties of index expressions, is provided as a recursively defined operator. Finally, two other representational formalisms for index expressions are compared to ours.
F.C. Berger. Navigational Query Construction in a Hypertext Environment. University of Nijmegen, 1998.
J.W.G.M. Hubbers. Object-Oriented Analysis for Multi-Faceted Applications with Distributed Control and Localized Data. University of Nijmegen, 1998, ISBN 9090121242.
A Framework of Information Systems Concepts. Edited by: E.D. Falkenberg, A.A. Verrijn-Stuart, K. Voss, W. Hesse, P. Lindgreen, B.E. Nilsson, J.L.H. Oei, C. Rolland, and R.K. Stamper. IFIP WG 8.1 Task Group FRISCO, IFIP, Laxenburg, Austria, EU, 1998, ISBN 3901882014.
P. van Bommel, and Th.P. van der Weide. Multi Media Information Filtering on the WWW. In: Proceedings of the World Automation Congress, May, TSI Press, Anchorage, Alaska, USA, 1998.
The focus of this paper is information filtering on the World Wide Web. A conceptual model for characterizing information objects is discussed. The information to be filtered may be of various kinds, including images and sound. The central idea behind the framework is that Information Modelling techniques (e.g. NIAM, ER, OO) should be used for the characterization of information objects to be retrieved. This brings together two worlds which were separated for a long time: Information Modelling (IM) and Information Retrieval (IR, or: document retrieval). Although IM is in most cases used for traditional (non-document) databases such as relational databases (e.g. SQL), our Information Filtering project PROFILE applies these techniques in order to obtain different characterization levels for information objects.
F.J.M. Bosman, P.D. Bruza, Th.P. van der Weide, and L.V.M. Weusten. Documentation, Cataloging and Query by Navigation: A Practical and Sound Approach. In: Research and Advanced Technology for Digital Libraries, 2nd European Conference on Digital Libraries `98, ECDL `98, Heraklion, Crete, Greece, EU, Edited by: C. Nikolaou, and C. Stephanidis. Lecture Notes in Computer Science, Vol: 1513, Pages: 459-478, September, Springer, 1998.
In this paper we discuss the construction of an automated information system for a medium-sized collection of visual reproductions of art objects. Special attention is paid to the economical aspects of such a system, which appear to be mainly a problem of data entry. An approach is discussed to make this feasible, which also strongly promotes consistency between descriptions.
Another main target of such a system is the capability for effective disclosure. This requires a disclosure mechanism on descriptions which is easy to handle by non-technical users. We show the usefulness of query by navigation for this purpose. It allows the searcher to build a query stepwise in terms of (semi-)natural language. At each step, the searcher is presented with context-sensitive information.
The resulting system is described and we discuss an experiment of its use.
J.W.G.M. Hubbers, and A.H.M. ter Hofstede. Exploring the Jungle of Object-Oriented Conceptual Data Modeling. In: Proceedings of the 9th Australasian Database Conference, ADC`98, Perth, Western Australia, Australia, Edited by: Chris McDonald. Australian Computer Science Communications, Vol: 20(2), Pages: 65-76, February, Springer, 1998.
T.W.C. Huibers, and B.C.M. Wondergem. Towards an Axiomatic Aboutness Theory for Information Retrieval. In: Information Retrieval, Uncertainty and Logics - Advanced Models for the Representation and Retrieval of Information, Edited by: F. Crestani, M. Lalmas, and C.J. van Rijsbergen. Kluwer, Deventer, The Netherlands, EU, 1998.
V. Kamphuis, and J.J. Sarbo. Natural Language Concept Analysis. In: Proceedings of NeMLaP3/CoNLL98: International Conference on New Methods in Language Processing and Computational Natural Language Learning, ACL, Edited by: D.M.W. Powers. Pages: 205-214, 1998.
B.C.M. Wondergem, P. van Bommel, and Th.P. van der Weide. Construction and Applications of the Association Index Architecture. In: Proceedings of the Conferentie Informatiewetenschap (CIW`98), December, 1998.
Information Discovery (ID) is the synthesis of Information Retrieval (IR) and Information Filtering (IF). In ID, information brokers act as intermediaries between users and sources. Information about user interests and document content can be modeled by 2-level hypermedia representations. These representations allow navigational mechanisms which have proven their effectiveness in IR applications.
Information brokers should thus combine two 2-level hypermedia representations to obtain an overall information structure necessary for the synthesis of IR and IF. For this, we propose the so-called Association Index Architecture (AIA), which consists of two 2-level hypermedia representations connected through a third level, coined the association index. The AIA thus forms a 3-level hypermedia representation. Information brokers can perform their actions in the AIA to implement their IR and IF related tasks. The AIA is shown to be a general symbolic architecture for combining knowledge by illustrating how a number of ID applications can be performed in it.
B.C.M. Wondergem, P. van Bommel, T.W.C. Huibers, and Th.P. van der Weide. Agents in Cyberspace - Towards a Framework for Multi-Agent Systems in Information Discovery. In: Proceedings of the 20th BCS-IRSG Colloquium on IR Research, CLIPS-IMAG, Grenoble, France, EU, 1998.
This article proposes a formal framework for Multi-Agent Systems in the context of Information Discovery. Information Discovery is a synthesis of Information Retrieval and Information Filtering. The Information Discovery Paradigm is given. In addition, the different types of agents needed in Information Discovery applications are described in terms of the operations they support and the knowledge and the information they use. A correct filtering topology, consisting of sound filter paths, is identified. Three fields are identified in which Information Retrieval and Information Filtering benefit from their synthesis: query expansion, query generation or autonomous IR, and profile adaption.
B.C.M. Wondergem, P. van Bommel, T.W.C. Huibers, and Th.P. van der Weide. Domain Knowledge in Preferential Models. In: Proceedings of the third Baltic Workshop DB&IS98, Edited by: J. Barzdins. Vol: 1, Pages: 126-138, April, 1998.
In Information Retrieval, user preferences and domain knowledge play an important role. This article shows how to incorporate domain knowledge in a logical framework and provides a mechanism to exploit user preferences for personalizing domain knowledge, based on the inferences made in the matching functions. The matching functions are essentially symbolic logical inferences. The logic used in this article is that of Preferential Models, which is augmented with domain knowledge by providing an enriched aboutness relation. However, the techniques described in this article are applicable to other logics as well. A way to personalize the domain knowledge is given, which also gives the user insight into the workings of the matching functions. In addition, sound inference rules, which are tailor-made for the domain knowledge, are provided.
B.C.M. Wondergem, P. van Bommel, and Th.P. van der Weide. Cumulative Duality in Designing Information Brokers. In: Proceedings of the 9th International Conference on Database and Expert Systems Applications (DEXA), August, 1998.
The focus of this paper is information brokers within Information Discovery (ID). We describe Cumulative Duality matrices, an instrument to deal with design criteria for such information brokers. ID is the synthesis of Information Retrieval and Information Filtering, where information brokers act as middle-agents. There are numerous design criteria for information brokers. Since these stem from ID, they exhibit a dual nature. The duality of the criteria is shown to be cumulative. In the form of a matrix, cumulative duality can be used as a design instrument for information brokers.
B.C.M. Wondergem, P. van Bommel, T.W.C. Huibers, and Th.P. van der Weide. Opportunities for Electronic Commerce in Information Discovery. In: Proceedings of the International IFIP/GI Working Conference on Trends in Distributed Systems for Electronic Commerce, TrEC 98, Edited by: F. Griffel, T. Tu, and W. Lamersdorf. Pages: 126-136, June, 1998.
This article investigates the connection between Electronic Commerce (EC) and Information Discovery (ID). ID is the synthesis of distributed Information Retrieval and Information Filtering, filled in with intelligent agents and information brokers. Currently, no link exists between EC and ID. We argue that this link consists of a cost model for ID. We therefore propose several (types of) cost models, which enable application of EC to the whole of ID. This is illustrated with examples.
P.A. Jones, C.H.A. Koster, P. van Bommel, and Th.P. van der Weide. Critical Reference Counting. Technical report: CSI-R9825, Information Systems Group, Computing Science Institute, University of Nijmegen, The Netherlands, 1998.
T.A. Barrett, and H.A. (Erik) Proper. Component Based Solutions Under Architecture. Technical report, Spring, Origin, Utrecht, The Netherlands, EU, 1998.
Many of today's applications have an almost tangible monolithic nature. They are built as 'islands', purporting to be self-contained, offering little or nothing in the way of integration with other applications. In the past, being large and self-contained may have eliminated the need to interact with other solutions to some extent. However, in the business environments of today the interaction with other applications becomes paramount. As a result of this, many ad-hoc point-to-point integration solutions have been built between different applications. This has already led to an 'application spaghetti' at many of our customer sites. Many of today's applications are poorly structured, which makes their responsiveness to business change sluggish. The application spaghetti with its plethora of point-to-point interfaces further inhibits the responsiveness to change.
In this paper we present a two-pronged approach to tackle these issues. Firstly, we outline an architectural approach to the development of component-based business solutions. Secondly, we propose a reference architecture to help in actually realising such solutions. The architectural approach to system development aims to offer better control of a component's environment.
F.A. Grootjen, V. Kamphuis, and J.J. Sarbo. A Genealogy of Phrase Structure. Technical report: CSI-R9814, University of Nijmegen, 1998.
In standard approaches to NLP, phrase structure is usually specified by a grammar. The coverage of such a grammar, especially in the case of performance data, is often insufficient; the grammar is subject to frequent modifications and this can bring about maintainability problems. In this paper we describe a relational basis underlying phrase structure. Based on that, a model is defined that yields hierarchical structure as the result of more abstract principles related to the combinatorial properties of linguistic units. The model is simple in use and easy to maintain, and provides an important key to the description of non-phrase structure configurations that require greater flexibility.
F.A. Grootjen. NLCA: Towards an algorithmic implementation. Technical report: CSI-R9811, University of Nijmegen, 1998.
In most mainstream approaches to natural language modelling, some form of hierarchical structure (e.g. phrase structure) plays a central role. However, practical application of phrase structure-based parsers in natural language processing has enjoyed only limited success. One reason for this lies in the rigidity of hierarchical structure on the one hand, as opposed to the high flexibility of language use on the other. The relative lack of success of rule-based parsers has inspired a search for alternative methods, such as statistically based or lexicon-driven parsing.
In search of a solution, the NLCA project took one step back, and examined the nature of hierarchical structure in general, and phrase structure in particular. It looked for ways to derive hierarchical structure from input, and to incorporate it in a mathematically well-founded theory of knowledge representation. The result is an approach in which hierarchical structure is found as the yield of the interaction between different, inherent combinatorial properties of linguistic units. The model identifies three different basic relations that underlie these combinatorial properties, at a level of abstraction that, in principle, allows language-independent modelling and analysis. The structural analysis of input is mapped onto formal concepts in the sense of lattice theory and in this way creates a suitable environment for information retrieval.
This paper summarises the basic ideas of NLCA and presents a sketch of an algorithm that implements NLCA for the English language.
H.A. (Erik) Proper. Da Vinci - Architecture-Driven Business Solutions. Technical report, Summer, Origin, Utrecht, The Netherlands, EU, 1998.
This document has emerged out of Origin's past experiences with architecture-driven application development (AD2), and the need to further formalise and consolidate these experiences.
The AD2 related developments range in scope from the actual design and implementation of applications, to the development of a long-term vision of an organisation's business activities and IT support required. The main concern of AD2 is the development of applications to support an organisation's business activities, by considering the entire context of the applications.
Th.P. van der Weide, and P. van Bommel. Individual and collective approaches for searcher satisfaction in IR. Technical report: CSI-R9819, Information Systems Group, Computing Science Institute, University of Nijmegen, The Netherlands, 1998.
The incremental searcher satisfaction model for Information Retrieval has been introduced to capture the relevancy of documents under consideration of documents previously presented. In this paper, different approaches for the construction of increment functions are identified, such as the individual and the collective approach. The requirements posed by these approaches are examined and evaluated with respect to well-known similarity measures used in IR, such as Inclusion, Jaccard's, Dice's, and Cosine coefficient.
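For reference, the four similarity measures named in this abstract can be sketched in their common set-based forms. This is an illustration under assumptions, not the report's own definitions: the report may use weighted vectors rather than term sets, and the asymmetric form of the Inclusion coefficient chosen here (overlap divided by the size of the first set) is one of several variants in use. The sample term sets are hypothetical.

```python
import math


def inclusion(a, b):
    """Share of a's terms that also occur in b (asymmetric; assumed variant)."""
    return len(a & b) / len(a) if a else 0.0


def jaccard(a, b):
    """Overlap divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Twice the overlap divided by the sum of the set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Overlap divided by the geometric mean of the set sizes (binary vectors)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Hypothetical term sets for two documents.
doc_a = {"information", "retrieval", "model"}
doc_b = {"information", "retrieval", "filtering", "agents"}

for measure in (inclusion, jaccard, dice, cosine):
    print(f"{measure.__name__}: {measure(doc_a, doc_b):.3f}")
```

All four functions return 0.0 when a denominator would be zero, a convention chosen here only to keep the sketch total.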
B.C.M. Wondergem, P. van Bommel, and Th.P. van der Weide. Association Index Architecture for Information Brokers. Technical report: CSI-R9820, July, University of Nijmegen, 1998.
Information Discovery (ID) is the synthesis of Information Retrieval (IR) and Information Filtering (IF). In ID, broker agents act as intermediaries between user agents and source agents. Information about user interests and documents in sources can be modeled by 2-level hypermedia representations. These representations allow navigational mechanisms which have proven their effectiveness in IR applications. Broker agents should thus combine two 2-level hypermedia representations to obtain an overall information structure necessary for the synthesis of IR and IF. For this, we propose the so-called Association Index Architecture (AIA), which consists of two 2-level hypermedia representations connected through a third level, coined the association index. The AIA thus forms a 3-level hypermedia representation. Broker agents can perform actions in the AIA to implement their IR and IF related tasks. The AIA is shown to be a general symbolic architecture for combining knowledge by illustrating how a number of ID applications can be performed in it.
B.C.M. Wondergem, P. van Bommel, and Th.P. van der Weide. Boolean Index Expressions for Information Retrieval. Technical report: CSI-R9827, December, University of Nijmegen, 1998.
Keywords still seem to form the basis for document content and query representation. Approaches that use more advanced linguistic structures, such as noun phrases, are still in an experimental phase. In addition, Boolean descriptor languages have often been applied for Information Retrieval. However, the synthesis of logic and linguistics in one descriptor language is still an open issue. In this paper, Boolean index expressions, combining Boolean logic and linguistic structure, are proposed as a good balance between expressiveness and practical issues. Boolean index expressions are obtained by augmenting regular index expressions with logical operators for disjunction, conjunction, and negation. Boolean index expressions are more expressive than both index expressions and the Boolean query language based on keywords. They allow a compact representation of logical combinations of index expressions. In addition, Boolean index expressions are still efficiently parsable and their meaning can be determined through their structure. It is shown how Boolean index expressions can be brought into normal form, allowing fast numerical matching. Matching strategies for Boolean index expressions are obtained by adapting matching strategies for index expressions by providing a case for negations. Our implementation of Boolean index expressions illustrates the issues mentioned.
H.A. (Erik) Proper. Flexibiliteit van informatiemodellen. In: Informatie, Nr: 4, Vol: 40, Pages: 28-33, 1998, In Dutch.
Nowadays most organisations find themselves in a dynamic environment. These dynamics demand a high degree of flexibility from organisations: 'evolve or die'. The disappearance of protectionism, the deregulation of international trade, the introduction of new technology, the privatisation of state-owned companies, and the introduction of the Euro are all examples of factors that bring about these dynamics.
In the context of information modelling it is therefore relevant to reflect on the flexibility of information models. After all, information models are used to capture the structure of information systems. When we are dealing with a high degree of dynamics, this will have its effects on the underlying information models. Designers of information models partly design their own maintenance burden, as witnessed by the many conversion, year-2000, and Euro problems. An information model that applies today may already be outdated tomorrow. It is like shooting at a moving target. In this article we briefly address these problems. What bottlenecks are to be expected? How should we deal with them in practice? This article explores a number of these bottlenecks and possible directions for solutions.
H.A. (Erik) Proper. Kennismanagement onder architectuur. In: ID Nieuws, Vol: 2, Pages: 5-7, November, 1998, In Dutch.
Knowledge management is the newest programme of ID Research. This article offers a closer introduction to ID Research's vision on knowledge management. An important characteristic of this vision is that information and communication technology (ICT) is regarded as 'merely' an enabler of knowledge management, and that knowledge management clearly involves more than an intranet or groupware application.