Tuesday, July 24, 2007

Atlas Internet Search Infrastructure

This is a brief overview of a large vision: enabling search to become
a part of the Internet's infrastructure. Building on Atlas as an
open protocol, search can become a fully distributed and
interoperable world-wide community. All of the participants can
interact openly and in any role where they believe they can add value
to the network.

A search engine can be constructed from many independent entities
serving different roles instead of one monolithic system. These
entities exchange aggregate information, or knowledge, and can
decide with whom they want to work. To design this working
economy based on knowledge, there must be a balance between these
various entities. Each actor must have an incentive to act both for
its own benefit and for the benefit of the whole, and enough
information to make and validate those decisions. Reputations and
relationships are the essential fabric of Atlas, just as they are in
a real-world free market.

There are three primary roles within Atlas:

Factory - Responsible to the content.
Collector - Responsible to the keyword.
Broker - Responsible to the searcher.

Each of these actors must interact with the others to complete any
search request. Any two roles could be performed by a single entity
(whereas if all three are performed by one entity, the result would
be a traditional, monolithic search engine).

A Factory is akin to a crawler in today's search engines. An Atlas
Factory must fetch and process the content as intelligently as
possible, performing analysis (such as Natural Language Processing)
and normalizing it into distinct units. A Factory shares its highly
refined and processed output with one or more Collectors, chosen
according to which ones it believes make the best use of it.

A Collector absorbs and indexes output from one or more Factories,
with one primary goal: ranking. An Atlas Collector must provide the
most intelligent ranking and relationship analysis possible. A
Collector has to compete for the output of a Factory, as well as
compete to provide the best ranking quality for Brokers.

A Broker must provide a searcher with the best possible results. It
does so by combining diverse ranking results from Collectors and also
by retrieving content from the original Factories. This last step, a
Broker interacting with a Factory, is critical to maintaining a
balanced ecosystem. All Factories must be aware of and approve how
their results are being used and by whom.
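
To make the flow between the three roles concrete, here is a minimal
sketch in Python. It is only an illustration under loud assumptions:
the Atlas wire protocol is not defined yet, so every class, method, and
field name below (units, absorb, rank, retrieve, approved_brokers) is
hypothetical, whitespace tokenizing stands in for real analysis, and
raw term counts stand in for real ranking.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Factory:
    """Responsible to the content: fetches it and normalizes it into units."""
    name: str
    documents: dict                      # doc_id -> raw text (hypothetical shape)
    approved_brokers: set = field(default_factory=set)

    def units(self):
        # Stand-in for real analysis (e.g. NLP): lowercase word tokens.
        for doc_id, text in self.documents.items():
            yield doc_id, text.lower().split()

    def retrieve(self, broker_name, doc_id):
        # The Factory approves who may use its results.
        if broker_name not in self.approved_brokers:
            raise PermissionError(f"{broker_name} is not approved by {self.name}")
        return self.documents[doc_id]


class Collector:
    """Responsible to the keyword: indexes Factory output and ranks it."""

    def __init__(self, name):
        self.name = name
        # keyword -> {(factory_name, doc_id): score}
        self.index = defaultdict(dict)

    def absorb(self, factory):
        for doc_id, tokens in factory.units():
            key = (factory.name, doc_id)
            for token in tokens:
                # Toy ranking signal: how often the keyword occurs.
                self.index[token][key] = self.index[token].get(key, 0) + 1

    def rank(self, keyword):
        return dict(self.index.get(keyword.lower(), {}))


class Broker:
    """Responsible to the searcher: merges rankings, then asks Factories."""

    def __init__(self, name, collectors, factories):
        self.name = name
        self.collectors = collectors
        self.factories = {f.name: f for f in factories}

    def search(self, keyword):
        combined = defaultdict(int)
        for collector in self.collectors:
            for key, score in collector.rank(keyword).items():
                combined[key] += score   # naive merge: sum the scores
        # Highest score first; ties broken by (factory, doc_id) for stability.
        ordered = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
        return [
            (fname, doc_id, self.factories[fname].retrieve(self.name, doc_id))
            for (fname, doc_id), _ in ordered
        ]


factory = Factory(
    "f1",
    {"a": "Open search open protocol", "b": "Closed search"},
    approved_brokers={"b1"},
)
collector = Collector("c1")
collector.absorb(factory)
broker = Broker("b1", [collector], [factory])
results = broker.search("open")
```

Note that the Broker never serves content on its own: the final step
always goes back through the Factory's retrieve call, which is where a
Factory can refuse a Broker it has not approved.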

Reputation and reward are bi-directional between all parties
(Factory-Collector, Collector-Broker, and Broker-Factory). Each entity
may choose to interact on principle (free, Commons), by attribution
("results provided by"), or commercially (as a paid service). The
Atlas protocol is purely a facilitator and does not restrict how the
relationships between any entities are formed. In considering these
motives for the various entities, it's likely that the free networks
will tend to become more specialized, commercial ones will compete on
quality, and attribution-based networks will mature in both directions.

This simple yet powerful division of roles, responsibilities, and
relationships will result in a distributed economic foundation for an
Internet Search Infrastructure. The wire protocol and further
definition of the interactions between these entities are openly
evolving. Anyone interested is welcome to join the discussions and
see the initial proposals at atlas-l over the coming weeks.

Thanks, looking forward to a radically different search ecosystem in
the coming years :)


