Outline: In Search of a Good Search Protocol

Concept-based and context-sensitive searching. January 1998

Motivation * Definition of Structure * Protocol Requirements * Protocol Examples * User Interface Suggestions

I. Motivation

A. Need for user-defined concepts

Complex concepts are intrinsic to the way people think. A concept may be represented to a person by a single word, but in fact contains assumptions about the building blocks that concept is made of and the relationship of that concept to a framework of knowledge.

An effective search engine should allow users to represent a specific concept by a single word, to build up more complex concepts from simpler ones, and to assign a concept to a position in an existing framework.

B. Need for search contexts

In many cases, searching within a particular informational context is more important than searching a particular company's or individual's information. However, there are many possible frameworks that can provide context.

A good search scheme should provide an extensible set of frameworks with mechanisms to assign information-containing sites to positions in the frameworks. Users should be able to choose an appropriate framework and search in broad or narrow context areas.

C. Need for specifying input/output formats

In many cases, field-based searches and retrievals are more powerful than full-text ones. Results from multiple sources with the same set of standard fields can be compared or pre-processed before being presented to the user. Searching multiple sources by the same criteria is likely to provide more complete results more efficiently than searching many sources with different criteria.

Our search scheme should provide a mechanism to specify input and output formats, and to map the inputs and outputs of specific sites to equivalent standard formats where applicable.

D. Need for validity and reliability

The quality of the information retrieved is generally very important to the user, but checking quality requires time and effort. In many cases there will be a trade-off between quantity and quality of information.

Our search scheme should provide identifiable levels of reliability checking, so that the user can choose which trade-off to make.

E. Need for open standards

Whatever mechanisms are chosen for framework and concept building, validity checking, and input/output format registration, public and open standards need to exist. Specific portions of the search universe may be highly controlled and private, but the mechanisms by which new pieces are put into place should be public.

II. Structural Goals & Examples

A. Definition of Concepts and Frameworks

i. Users and/or servers can build up a concept, which can then be addressed and used as a building block for other concepts.
An example of a concept might be

	A = "The state of Arizona"

or	B = "Trends of employee wages in the 1990's"

This allows us to define

	A + B = "Trends of employee wages in the 1990's in the state of Arizona"

Note that concept B itself may have been built from smaller concepts:

	"Wages" + "Economic Trends" + "1990's"

ii. A Framework is an organized graph of concepts. It may be hierarchical, such as the Geography framework: World breaks down into countries, each of which breaks down into states, each of which breaks down into regions.

It may also have other internal structure. World may break down into both countries and river valleys, and the river valleys each map to one or more countries.

iii. Frameworks and concepts within them can be assigned addresses. Concepts can be mapped to points in various frameworks.
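
As an illustration only, the concept and framework ideas above could be modeled as follows. The class names Concept and Framework, the use of "+" for combining, and the sample river valley are assumptions made for this sketch; the outline does not prescribe a data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A named concept, optionally built from simpler concepts."""
    name: str
    parts: tuple = ()

    def __add__(self, other):
        # Combining two concepts yields a new concept that remembers its parts.
        return Concept(f"{self.name} + {other.name}", (self, other))

    def leaves(self):
        """Return the names of the simplest building blocks, left to right."""
        if not self.parts:
            return [self.name]
        return [leaf for part in self.parts for leaf in part.leaves()]


@dataclass
class Framework:
    """A graph of concepts; an edge points from a broader concept to a narrower one."""
    name: str
    children: dict = field(default_factory=dict)

    def add(self, parent, child):
        self.children.setdefault(parent, []).append(child)
        self.children.setdefault(child, [])

    def parents_of(self, child):
        return sorted(p for p, kids in self.children.items() if child in kids)


# Concept B from the text, built from smaller concepts, then combined with A.
wages = Concept("Wages") + Concept("Economic Trends") + Concept("1990's")
arizona = Concept("The state of Arizona")
combined = arizona + wages

# A non-hierarchical Geography: a river valley maps to more than one country.
geography = Framework("Geography")
geography.add("World", "United States")
geography.add("World", "Mexico")
geography.add("World", "Colorado River valley")
geography.add("Colorado River valley", "United States")
geography.add("Colorado River valley", "Mexico")
geography.add("United States", "Arizona")
```

Here combined.leaves() lists the four building blocks of A + B, and geography.parents_of("United States") returns both the river valley and World, which is the kind of non-tree structure item ii describes.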

B. Definition of Methods

i. Methods consist of lists of field names and types for input and output

ii. Standard methods can be defined and assigned addresses.

iii. Specific servers or users may map a custom method onto a standard one.

For example, the 'General Product' method might contain inputs and outputs of Title, Description, Availability, Location, and Price.

The 'Clothing Product' method might contain additional inputs and outputs of Clothing_Type, Color, Size, and Male_Female.
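
A minimal sketch of how methods and custom-to-standard mapping might look in practice. The field types, the Method class, the map_to_standard helper, and the sample site record are all hypothetical; only the field names come from the two example methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """A method: an ordered list of (field name, type) pairs for input and output."""
    name: str
    fields: tuple

    def extend(self, name, extra_fields):
        # A more specific method keeps every standard field and appends its own.
        return Method(name, self.fields + tuple(extra_fields))

    def field_names(self):
        return [field_name for field_name, _ in self.fields]


# Field types are assumed here; the outline gives only the field names.
GENERAL_PRODUCT = Method("General Product", (
    ("Title", str), ("Description", str), ("Availability", str),
    ("Location", str), ("Price", float),
))

CLOTHING_PRODUCT = GENERAL_PRODUCT.extend("Clothing Product", (
    ("Clothing_Type", str), ("Color", str), ("Size", str), ("Male_Female", str),
))


def map_to_standard(record, field_map, standard):
    """Rename a site's custom fields to the standard method's field names.

    field_map maps custom field name -> standard field name; fields the
    standard method does not define are dropped.
    """
    wanted = set(standard.field_names())
    mapped = {field_map.get(key, key): value for key, value in record.items()}
    return {key: value for key, value in mapped.items() if key in wanted}


# Hypothetical record from a site that uses its own field names.
site_record = {"item": "Wool scarf", "cost": 19.5, "Color": "red", "sku": "A-17"}
standard_record = map_to_standard(
    site_record, {"item": "Title", "cost": "Price"}, CLOTHING_PRODUCT
)
```

After mapping, standard_record carries only Title, Price, and Color, so it can be compared directly with records from other sites that map onto the same standard method.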

C. Addressing

i. A particular 'context search' may specify a framework, a concept within that framework, and a method for input and for output.

In our search protocol, the search message starts with

	8 bytes    16 bytes          8 bytes
	--------   ----------------  --------
	framework  concept           method
	address    address           address

Only the framework addresses would need to be agreed upon globally. The concept and method addresses would be set by the server or servers supporting each framework.

Some methods may also be assigned globally, but this would be only a small subset of the possible methods addressable.
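
The 8 + 16 + 8 byte layout above could be packed and unpacked as in the sketch below. It assumes each address is a fixed-width ASCII string; the outline does not say how addresses are encoded, and the function names are made up for illustration.

```python
import struct

# Layout taken from the diagram above: 8 + 16 + 8 = 32 bytes.
HEADER = struct.Struct("8s16s8s")


def pack_header(framework, concept, method):
    """Build the 32-byte prefix; each address must fill its field exactly."""
    parts = [framework.encode("ascii"), concept.encode("ascii"), method.encode("ascii")]
    for value, size in zip(parts, (8, 16, 8)):
        if len(value) != size:
            raise ValueError(f"expected {size} bytes, got {len(value)}")
    return HEADER.pack(*parts)


def unpack_header(message):
    """Split the first 32 bytes back into the three addresses."""
    framework, concept, method = HEADER.unpack(message[:HEADER.size])
    return framework.decode("ascii"), concept.decode("ascii"), method.decode("ascii")


# Placeholder addresses in the style of the protocol examples.
header = pack_header("ffffffff", "c" * 16, "00020003")
```

Rejecting addresses of the wrong length, rather than padding them, keeps the prefix unambiguous; whether padding should be allowed is a design choice the outline leaves open.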

D. Structure-building Process

i. Some frameworks should be general in scope, well-maintained, with reviewed additions.

ii. Some frameworks should be specific in scope, well-maintained, with reviewed additions. Specific frameworks may map to points within general frameworks.

iii. Some frameworks may be open to all additions

iv. Some frameworks may have 'trusted members' who may freely add at a given level. Hierarchical frameworks in particular may define the 'levels of trust' needed to add to various levels of the hierarchy.

There may exist a 'ring' of mutually trusted members who jointly maintain a framework.

v. There may exist mappings between frameworks.
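
One possible reading of 'levels of trust' in a hierarchical framework is sketched below. Representing trust as an integer per depth is an assumption of this example, as are the class and method names; the outline does not define how trust is measured.

```python
class HierarchicalFramework:
    """Tracks concept paths and the trust level needed to add at each depth."""

    def __init__(self, required_trust):
        # required_trust[d] is the trust needed to add a node at depth d (root = 0).
        self.required_trust = required_trust
        self.paths = {()}

    def can_add(self, path, member_trust):
        depth = len(path)
        if depth == 0 or tuple(path[:-1]) not in self.paths:
            return False  # the parent must already exist
        # Depths beyond the table reuse the last (deepest) requirement.
        needed = self.required_trust[min(depth, len(self.required_trust) - 1)]
        return member_trust >= needed

    def add(self, path, member_trust):
        if not self.can_add(path, member_trust):
            raise PermissionError("insufficient trust or missing parent")
        self.paths.add(tuple(path))


# Hypothetical policy: top levels need high trust, deeper levels are more open.
hobbies = HierarchicalFramework(required_trust=[3, 3, 2, 1])
hobbies.add(("Hobbies",), member_trust=3)
hobbies.add(("Hobbies", "Gardening"), member_trust=2)
```

With this policy a member with trust 1 can add a third-level concept under Hobbies/Gardening but cannot add a new second-level category.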

E. Discovery of Information

i. There should exist 'master servers' that know of the existence of frameworks and concepts, and the locations of many pieces of information.

ii. There should exist 'site servers' that recognize certain concepts and can accept queries and return information in specific formats.

iii. There will exist 'dumb servers' that do not recognize any concepts or frameworks as defined here, but are capable of performing a search. 'Mapping servers' may act as site servers on behalf of these 'dumb servers'.
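
The relationship between a mapping server and a dumb server can be sketched as a simple adapter. Everything here is hypothetical: the class names, the idea of approximating a concept by a fixed set of search terms, and the sample documents.

```python
class DumbServer:
    """Stands in for a plain full-text search service with no concept support."""

    def __init__(self, documents):
        self.documents = documents

    def search(self, text):
        words = text.lower().split()
        return [doc for doc in self.documents
                if all(word in doc.lower() for word in words)]


class MappingServer:
    """Presents a site-server interface on behalf of a dumb server."""

    def __init__(self, backend, concept_terms):
        self.backend = backend
        # concept id -> search terms that approximate the concept (assumed mapping)
        self.concept_terms = concept_terms

    def recognizes(self, concept_id):
        return concept_id in self.concept_terms

    def query(self, concept_id, text=""):
        if not self.recognizes(concept_id):
            return []
        terms = f"{self.concept_terms[concept_id]} {text}".strip()
        return self.backend.search(terms)


docs = [
    "Indoor plant winter care tips",
    "Outdoor plant summer watering",
    "Winter tire guide",
]
mapper = MappingServer(DumbServer(docs), {"c" * 16: "indoor plant"})
hits = mapper.query("c" * 16, "winter care")
```

The dumb server never sees a concept id; the mapping server translates the concept into ordinary search terms before forwarding the query.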

III. Communication/Protocol Requirements

A. Structural Queries and Responses

A framework server must recognize

not necessarily in a globally standard format.

B. A 'master server' must recognize, for at least one framework:

C. A 'site server' or 'mapping server' must recognize, for at least one framework:

IV. Protocol Examples

The general search message always starts with a 32-byte prefix, in which the 8-byte method address is split into a 4-byte input method and a 4-byte output method:

	8 bytes   16 bytes          4 bytes  4 bytes  depends on input method
	--------  ----------------  -------  -------  -----------------------
	fwork id  concept id        input    output   rest of input
	                            method   method

A few input and output methods will be 'reserved words' that specify types of queries, such as those for a 'master server' listed above. For example,

	ffffffff 0000000000000000 0001 0001  xxxxxxx

may specify a string-match query whose output is a concept list: the concepts in framework ffffffff that match the string xxxxxxx.

	ffffffff cccccccccccccccc 0002 0002

may specify input by concept id, with the output 'yes' or 'no' depending on whether this master server has information on concept cccccccccccccccc in framework ffffffff.

	ffffffff cccccccccccccccc 0002 0003

may specify input by concept id, with the output a list of 'site servers'.

	ffffffff cccccccccccccccc 0003 0004 xxxxxxxxxxxxxxxxx

may specify input by concept id and site server address, with the output a list of input parameters accepted by that site server. Here cccccccccccccccc is the concept id and xxxxxxxxxxxxxxxxx is the site server address.

	ffffffff cccccccccccccccc 0003 0005 xxxxxxxxxxxxxxxxxx

may specify input by concept id and site server address, with the output a list of output parameters offered by that site server.
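
A receiving server might split such messages and recognize the reserved method pairs roughly as follows. This sketch assumes the spaces in the examples above are for readability only and are not sent on the wire; the table and function names are invented for illustration.

```python
# Reserved (input, output) method pairs, copied from the examples above.
QUERY_KINDS = {
    ("0001", "0001"): "string match -> concept list",
    ("0002", "0002"): "concept id -> yes/no",
    ("0002", "0003"): "concept id -> site server list",
    ("0003", "0004"): "concept id + site server -> accepted input parameters",
    ("0003", "0005"): "concept id + site server -> offered output parameters",
}


def parse_message(message):
    """Split a search message into its fixed 32-character prefix and the rest."""
    if len(message) < 32:
        raise ValueError("message is shorter than the 32-byte prefix")
    fields = {
        "framework": message[0:8],
        "concept": message[8:24],
        "input_method": message[24:28],
        "output_method": message[28:32],
        "rest": message[32:],
    }
    key = (fields["input_method"], fields["output_method"])
    fields["kind"] = QUERY_KINDS.get(key, "unreserved method pair")
    return fields


# First example above, with "plant" as the string to match.
example = "".join(["ffffffff", "0" * 16, "0001", "0001", "plant"])
parsed = parse_message(example)
```

Method pairs outside the reserved table are reported as unreserved, leaving their interpretation to whichever server defines them.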

A physical server may maintain/mirror multiple frameworks.

V. User Interface Suggestions

A. A client may download an entire hierarchy or framework for fast browsing.

B. A client may map user-defined concepts to specific points in a framework, and to specific input/output formats.

C. A search may contain specific concepts and strings for full-text comparison.

D. A client may make parallel queries of multiple 'site servers' simultaneously.

E. Users may pick favorite frameworks and master servers from which to start queries.
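
Suggestion D, querying several site servers in parallel, might be sketched with a thread pool as below. The site servers are represented by plain callables standing in for network calls, and the names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(site_servers, concept_id, text):
    """Send the same query to every site server at once and collect the results.

    site_servers maps a server name to a callable(concept_id, text) -> list.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(site_servers))) as pool:
        futures = {
            name: pool.submit(server, concept_id, text)
            for name, server in site_servers.items()
        }
        results = {}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # One failing server should not sink the whole search.
                results[name] = []
    return results


# Stand-ins for two site servers, one of which is unreachable.
def nursery(concept_id, text):
    return ["Nursery guide: " + text]


def broken(concept_id, text):
    raise ConnectionError("server unreachable")


results = query_all({"nursery": nursery, "broken": broken}, "c" * 16, "winter care")
```

Treating a failed server as an empty result keeps the client responsive; a real client would probably also report which servers failed.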

Example:  An indoor gardening hobbyist searches for

Quality: High     Context: plant    Search string: winter care

where "plant", for this user, defines the specific context of 'indoor potted plants' and "winter care" is a string for full-text comparison. The result will be a list of known highly-maintained sites about 'indoor potted plants' containing the string "winter care". If no results are returned, the user can lower the required quality or change the search string.

It is assumed in this search that the word "plant" has already been defined to mean a particular concept in a particular framework downloaded from an existing 'master server'.
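
The hobbyist's search could be resolved on the client roughly as in this sketch. The alias table, the three-step quality scale, and the sample site index are all invented for illustration; only the concept path and the search terms come from the example.

```python
# User-defined words mapped to a concept path; assumed to have been linked
# earlier from a framework downloaded from a master server.
ALIASES = {"plant": "Hobbies/Gardening/Plants/IndoorPlants"}

QUALITY = {"Low": 1, "Medium": 2, "High": 3}  # hypothetical reliability scale

# Hypothetical site index: (concept path, quality label, page text).
SITES = [
    ("Hobbies/Gardening/Plants/IndoorPlants", "High", "Winter care for ferns"),
    ("Hobbies/Gardening/Plants/IndoorPlants", "Low", "My winter care diary"),
    ("Hobbies/Gardening/Plants/OutdoorPlants", "High", "Winter care for roses"),
]


def search(quality, context, text):
    """Return pages in the user's context that meet the quality bar and contain text."""
    concept = ALIASES[context]
    return [
        page for path, level, page in SITES
        if path == concept
        and QUALITY[level] >= QUALITY[quality]
        and text.lower() in page.lower()
    ]


high_results = search("High", "plant", "winter care")
low_results = search("Low", "plant", "winter care")
```

Lowering the required quality from High to Low widens the result list within the same context, which is the trade-off the user is meant to control.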

A novice user who needs to do a search for the first time must first locate a master server that contains information about the concept he is interested in. This search would take longer, but once the master server is found, future searches in this general topic area will be efficient.

The novice user's search might go like this:

String: 'plant'


User chooses "Hobbies/Gardening/Plants" and requests more specific categories.


User chooses "Indoor Plants" and clicks a button to link the word "plant" to the concept "Hobbies/Gardening/Plants/IndoorPlants" from this framework.

Now he can do his search...

This scheme assumes that at least a few good master servers are available; until they are, general searching will not be efficient.
