| Concept-based and context-sensitive searching. | January 1998 |
Motivation
Definition of Structure
Protocol Requirements
Protocol Examples
User Interface Suggestions
I. Motivation
Complex concepts are intrinsic to the way people think. A concept may be represented to a person by a single word, but in fact contains assumptions about the building blocks that concept is made of and the relationship of that concept to a framework of knowledge.
An effective search engine should allow users to represent a specific concept by a single word, to build up more complex concepts from simpler ones, and to assign a concept to a position in an existing framework.
A good search scheme should provide an extensible set of frameworks with mechanisms to assign information-containing sites to positions in the frameworks. Users should be able to choose an appropriate framework and search in broad or narrow context areas.
Our search scheme should provide a mechanism to specify input and output formats, and should allow specific sites to have their inputs and outputs mapped to equivalent standard formats where applicable.
Our search scheme should provide identifiable levels of reliability checking, so that the user can choose what tradeoff to make.
II. Structural Goals & Examples.
An example of a concept might be
A = "The state of Arizona"
or
B = "Trends of employee wages in the 1990's"
This allows us to define
A + B = "Trends of employee wages in the 1990's in the state of Arizona"
Note that concept B itself may have been built from smaller concepts: "Wages" + "Economic Trends" + "1990's".
ii. A Framework is an organized graph of concepts. It may be hierarchical, such as the Geography framework: World breaks down into countries, each of which breaks down into states, each of which breaks down into regions.
It may also have other internal structure. World may break down into both countries and river valleys, and the river valleys each map to one or more countries.
iii. Frameworks and concepts within them can be assigned addresses. Concepts can be mapped to points in various frameworks.
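The structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Framework` class, the `combine` helper, and the choice to model a concept as a set of building blocks are hypothetical and not part of the proposal, and the place names are sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    """A framework as a directed graph: concept -> concepts it breaks down into."""
    name: str
    children: dict[str, set[str]] = field(default_factory=dict)

    def add(self, parent: str, child: str) -> None:
        # A concept may have several parents (a country can sit under both
        # World and a river valley), so this is a graph, not strictly a tree.
        self.children.setdefault(parent, set()).add(child)
        self.children.setdefault(child, set())

    def parents(self, concept: str) -> set[str]:
        return {p for p, kids in self.children.items() if concept in kids}


def combine(*concepts: frozenset[str]) -> frozenset[str]:
    """Assumption: a complex concept is the union of its simpler building blocks."""
    return frozenset().union(*concepts)


geo = Framework("Geography")
geo.add("World", "USA")
geo.add("World", "Mexico")
geo.add("USA", "Arizona")
geo.add("World", "Colorado River Valley")
geo.add("Colorado River Valley", "USA")
geo.add("Colorado River Valley", "Mexico")

a = frozenset({"Arizona"})
b = combine(frozenset({"Wages"}), frozenset({"Economic Trends"}), frozenset({"1990's"}))
a_plus_b = combine(a, b)
```

Here `geo.parents("USA")` contains both "World" and the river valley, which is the non-hierarchical internal structure mentioned above.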
i. Methods consist of lists of field names and types for input and output
ii. Standard methods can be defined and assigned addresses.
iii. Specific servers or users may map a custom method onto a standard one.
For example, the General Product method might contain inputs and outputs of Title, Description, Availability, Location and Price.
The 'Clothing Product' method might contain additional inputs and outputs of Clothing_Type, Color, Size, and Male_Female
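As a rough sketch of how methods and the custom-to-standard mapping might look, consider the Python below. The field names come from the text; the field types, the `Method` class, the `map_to_standard` helper, and the sample site record are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """A method: an ordered list of (field name, field type) pairs."""
    name: str
    fields: tuple[tuple[str, type], ...]

    def field_names(self) -> list[str]:
        return [n for n, _ in self.fields]


# Field types are assumed; the proposal only names the fields.
GENERAL_PRODUCT = Method("General Product", (
    ("Title", str), ("Description", str), ("Availability", str),
    ("Location", str), ("Price", float),
))

# The clothing method extends the general one with additional fields.
CLOTHING_PRODUCT = Method("Clothing Product", GENERAL_PRODUCT.fields + (
    ("Clothing_Type", str), ("Color", str), ("Size", str), ("Male_Female", str),
))


def map_to_standard(record: dict, field_map: dict[str, str], standard: Method) -> dict:
    """Rename a site's custom field names to the standard method's names.

    field_map maps custom name -> standard name; fields the standard
    method does not define are dropped.
    """
    allowed = set(standard.field_names())
    renamed = {field_map.get(k, k): v for k, v in record.items()}
    return {k: v for k, v in renamed.items() if k in allowed}


# Hypothetical record from a site that uses its own field names.
site_record = {"name": "Wool sweater", "cost": 49.0, "Color": "grey", "sku": "hypothetical-123"}
standard = map_to_standard(site_record, {"name": "Title", "cost": "Price"}, CLOTHING_PRODUCT)
```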
i. A particular 'context search' may specify a framework, a concept within that framework, and a method for input and for output.
In our search protocol, the search message starts with
--------   ----------------   --------
8 bytes    16 bytes           8 bytes
framework  concept            method
address    address            address
Only the framework addresses would need to be agreed upon globally. The concept and method addresses would be set by the server or servers supporting each framework.
Some methods may also be assigned globally, but this would be only a small subset of the possible methods addressable.
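A minimal sketch of assembling this header follows. It assumes each address is carried as fixed-width ASCII text (one character per byte); the proposal does not fix the encoding, and `build_header` and the sample address values are hypothetical.

```python
# Field widths in bytes, in the order they appear on the wire.
FIELD_WIDTHS = (("framework", 8), ("concept", 16), ("method", 8))


def build_header(framework: str, concept: str, method: str) -> bytes:
    """Concatenate the three fixed-width address fields into a 32-byte header."""
    values = {"framework": framework, "concept": concept, "method": method}
    parts = []
    for name, width in FIELD_WIDTHS:
        value = values[name]
        if len(value) != width:
            raise ValueError(
                f"{name} address must be exactly {width} characters, got {len(value)}"
            )
        parts.append(value)
    # Assumption: addresses are plain ASCII, so one character is one byte.
    return "".join(parts).encode("ascii")


header = build_header("ffffffff", "0123456789abcdef", "00020003")
```

Rejecting wrong-length fields up front keeps every later field at a fixed offset, which is what lets a receiver split the header without delimiters.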
i. Some frameworks should be general in scope, well-maintained, with reviewed additions.
ii. Some frameworks should be specific in scope, well-maintained, with reviewed additions. Specific frameworks may map to points within general frameworks.
iii. Some frameworks may be open to all additions
iv. Some frameworks may have 'trusted members' who may freely add at X level. Hierarchical frameworks particularly may define 'levels of trust' needed to add to various levels of the hierarchy.
There may exist a 'ring' of mutually trusted members who jointly maintain a framework.
v. There may exist mappings between frameworks.
i. There should exist 'master servers' that know the existence of frameworks, concepts, and the locations of many pieces of information
ii. There should exist 'site servers' that recognize certain concepts and can accept queries and return information in specific formats.
iii. There will exist 'dumb servers' that do not recognize any concepts or frameworks as defined here, but are capable of performing a search. 'Mapping servers' may act as site servers on behalf of these 'dumb servers'.
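One way a mapping server could front a dumb server is to translate a recognized concept into plain search terms and forward them. The sketch below is an assumption about how that might work, not a mechanism the proposal defines; `MappingServer`, `toy_dumb_server`, the concept id, and the page data are all hypothetical.

```python
from typing import Callable

DumbSearch = Callable[[str], list[str]]


class MappingServer:
    """Acts as a site server for a 'dumb server' that only does text search."""

    def __init__(self, concept_keywords: dict[str, str], dumb_search: DumbSearch):
        self.concept_keywords = concept_keywords  # concept id -> search keywords
        self.dumb_search = dumb_search

    def recognizes(self, concept_id: str) -> bool:
        return concept_id in self.concept_keywords

    def query(self, concept_id: str, text: str = "") -> list[str]:
        if not self.recognizes(concept_id):
            return []
        # Translate the concept into terms the dumb server can search for.
        terms = f"{self.concept_keywords[concept_id]} {text}".strip()
        return self.dumb_search(terms)


def toy_dumb_server(terms: str) -> list[str]:
    """Stand-in for a plain full-text search: every word must appear in the page."""
    pages = {
        "page1": "indoor potted plants winter care tips",
        "page2": "garden tools for sale",
    }
    words = terms.lower().split()
    return sorted(name for name, body in pages.items() if all(w in body for w in words))


mapper = MappingServer({"00000000000000a1": "indoor potted plants"}, toy_dumb_server)
hits = mapper.query("00000000000000a1", "winter care")
```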
III. Communication/Protocol Requirements
The general search message always starts with the 32-byte string
8 bytes    16 bytes           4 bytes   4 bytes   depends on input method
--------   ----------------   -------   -------   -----------------------
fwork id   concept id         input     output    rest of input
                              method    method
A few input and output methods will be 'reserved words' to specify types of queries, such as those for a 'master server' listed above. For example,
ffffffff 0000000000000000 0001 0001 xxxxxxx
may specify a string match query, output to be a concept list, for a concept in framework ffffffff matching string xxxxxxx.
ffffffff cccccccccccccccc 0002 0002
may specify input by concept id, output to be 'yes' or 'no' depending on whether this master server has information on concept cccccccccccccccc in framework ffffffff.
ffffffff cccccccccccccccc 0002 0003
may specify input by concept id, output to be a list of 'site servers'.
ffffffff cccccccccccccccc 0003 0004 xxxxxxxxxxxxxxxxxx
may specify input by concept id and site server address, output a list of input parameters accepted by that site server. cccccccccccccccc = concept id, xxxxxxxxxxxxxxxxxx = site server address.
ffffffff cccccccccccccccc 0003 0005 xxxxxxxxxxxxxxxxxx
may specify input by concept id and site server address, output a list of output parameters offered by that site server.
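The reserved queries above could be decoded roughly as follows. This sketch assumes the spaces in the examples are only visual separators, so the wire form is the fields run together; `Query`, `parse_query`, and `describe` are hypothetical names, and the table simply restates the five examples.

```python
from typing import NamedTuple


class Query(NamedTuple):
    framework: str
    concept: str
    input_method: str
    output_method: str
    rest: str


# Reserved (input method, output method) pairs taken from the examples above.
RESERVED = {
    ("0001", "0001"): "string match -> concept list",
    ("0002", "0002"): "concept id -> yes/no",
    ("0002", "0003"): "concept id -> list of site servers",
    ("0003", "0004"): "concept id + site server -> accepted input parameters",
    ("0003", "0005"): "concept id + site server -> offered output parameters",
}


def parse_query(message: str) -> Query:
    """Split a message into the 32-character header fields and the remainder."""
    if len(message) < 32:
        raise ValueError("message is shorter than the 32-byte header")
    return Query(message[0:8], message[8:24], message[24:28], message[28:32], message[32:])


def describe(query: Query) -> str:
    return RESERVED.get((query.input_method, query.output_method), "not a reserved query")


# Adjacent string literals are joined by Python; the split is for readability only.
q = parse_query("ffffffff" "0000000000000000" "0001" "0001" "winter care")
```

Because the header fields sit at fixed offsets, the remainder can hold free text (including spaces) without any escaping.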
A physical server may maintain/mirror multiple frameworks.
V. User Interface Suggestions
A. A client may download an entire hierarchy or framework for fast browsing.
B. A client may map user-defined concepts to specific points in a framework, and to specific input/output formats.
C. A search may contain specific concepts and strings for full-text comparison.
D. A client may make parallel queries of multiple 'site servers' simultaneously.
E. Users may pick favorite frameworks and master servers to start queries at.
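Suggestion D (parallel queries) might be sketched on the client side with a thread pool. The site servers here are stand-in functions rather than network calls, and `query_all` and the server names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

SiteServer = Callable[[str], list[str]]


def query_all(servers: dict[str, SiteServer], query: str) -> dict[str, list[str]]:
    """Send the same query to every site server in parallel and collect the replies."""
    if not servers:
        return {}
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # Submit everything first so the queries overlap, then gather results.
        futures = {name: pool.submit(fn, query) for name, fn in servers.items()}
        return {name: future.result() for name, future in futures.items()}


# Stand-in site servers; a real client would issue network requests here.
servers = {
    "site-a": lambda q: [f"a:{q}"],
    "site-b": lambda q: [f"b:{q}", f"b2:{q}"],
}
results = query_all(servers, "winter care")
```

Threads suit this case because the client mostly waits on replies; a slow server delays only its own entry rather than the whole batch being issued serially.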
Example: An indoor gardening hobbyist searches for
Quality: High
Context: plant
Search string: winter care
where "plant", for this user, defines the specific context of 'indoor potted plants' and "winter care" is a string for full-text comparison. The result will be a list of known highly-maintained sites about 'indoor potted plants' containing the string "winter care". If no results are returned, the user can lower the required quality or change the search string.
It is assumed in this search that the word "plant" has already been defined to mean a particular concept in a particular framework downloaded from an existing 'master server'.
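The example search, including the step of lowering the required quality when nothing is found, might look like the sketch below. Only the "High" level appears in the text; the "Medium" and "Low" levels, the `Site` record, the example.org URLs, and the outdoor concept path are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed quality scale; the proposal only names "High".
QUALITY_RANK = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Site:
    url: str
    concept: str
    quality: str
    text: str


def context_search(sites, concept, search_string, min_quality):
    """Sites in the given context, at or above min_quality, containing the string."""
    needle = search_string.lower()
    return [
        s.url for s in sites
        if s.concept == concept
        and QUALITY_RANK[s.quality] >= QUALITY_RANK[min_quality]
        and needle in s.text.lower()
    ]


def search_with_fallback(sites, concept, search_string, min_quality="High"):
    """Lower the required quality one level at a time until something is found."""
    levels = sorted(QUALITY_RANK, key=QUALITY_RANK.get, reverse=True)  # High, Medium, Low
    for level in levels[levels.index(min_quality):]:
        hits = context_search(sites, concept, search_string, level)
        if hits:
            return level, hits
    return None, []


INDOOR = "Hobbies/Gardening/Plants/IndoorPlants"
sites = [
    Site("http://example.org/a", INDOOR, "Medium", "Winter care for ferns"),
    Site("http://example.org/b", INDOOR, "High", "Repotting in spring"),
    Site("http://example.org/c", "Hobbies/Gardening/Plants/OutdoorPlants", "High", "Winter care of roses"),
]
level, hits = search_with_fallback(sites, INDOOR, "winter care")
```

With this sample data the High-quality pass finds nothing in the indoor context, so the search succeeds only after dropping to Medium.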
A novice user doing a search for the first time must first locate a master server that contains information about the concept of interest. This search would take longer, but once the master server is found, future searches in this general topic area will be efficient.
The novice user's search might go like this:
String: 'plant'
Results:
User chooses "Hobbies/Gardening/Plants" and requests more specific categories.
Results:
User chooses "Indoor Plants" and clicks a button to link the word "plant" to the concept "Hobbies/Gardening/Plants/IndoorPlants" from this framework.
Now he can do his search...
This scheme presupposes that at least a few good master servers are available; until they are, general searching will not be efficient.