Globally Applicable

Advanced Spiders

 

Veljko Milutinovic

Department of Computer Engineering
School of Electrical Engineering
University of Belgrade

vm@etf.rs
http://galeb.etf.rs/~vm

S

 

Software Agents

 

 

*junkware

Active Document Architecture (ADA)

 

 

 

 

CORBA - Common Object Request Broker Architecture:

ADA in ACTION

Rules of interaction in four domains:
User interface
Information management
System management
Task management

(d) Application objects
Components of specific end-user applications

 

Genetic Algorithms and Simulated Annealing
for Internet Search

 

Veljko Milutinovic

 

Department of Electrical Engineering
School of Electrical Engineering
University of Belgrade

POB 35-54, 11120 Belgrade, Serbia, Yugoslavia

 

vm@etf.rs
http://galeb.etf.rs/~vm

S

GAAS

Support for links-based search agents (Spiders),
as an alternative to index-based search (Altavista)

Genetic algorithms invented and developed in AI;
may be efficient if properly applied to Internet search

Simulated annealing introduced in mathematics;
applicable to Internet search, but reported as not too efficient

Industry leader in packages for EBI-related strategic planning:
Comshare, Inc.

Genetic Algorithm for Internet Search

    Select the initial WWW presentation or a set thereof
    Extract all URLs and fetch the corresponding WWW presentations
    Measure the fitness value for each newly fetched WWW presentation
    Continue with a subset of the most promising WWW presentations,
    while occasionally mutating the extracted URLs
    [Chen+Chung+Ramsey+Yang+Ma+Yen97].

    Issues of importance

    Representation of genes (URL is a numerically-encoded string)

    Crossover (one parent: WWW page; the other: selection function)

    Fitness function (typically, Jaccard's score)

    Number of offsprings (limited to a subset of the "best")

    Mutation type (typically, DB-based)
    [Mirkovic+Kraus+Milutinovic97].

    Possible Solutions

    Possible representation approaches:
    1. String
    2. Array of strings, etc…

    Possible crossover approaches:
    1. Link crossover (one explicit parent and one implicit parent)
    2. Classical (two explicit parents), etc…

    Possible fitness functions:
    1. Jaccard's function
    2. Evaluation function, etc…

    Possible number of offsprings:
    1. Limited
    2. Unlimited, etc…

    Possible mutation types:
    1. DB-based
    2. Semantics-based, etc…

     

    Representation of Genomes

     

    Crossover Operator

     

    Fitness Function

     

    Degree of the Crossover

     

    Mutation Operator

     

      

    Generation of the Output Set

     

    Selected Papers About Intelligent Agents on Internet

     

    Chen, H., Chung, Y.-M., Ramsey, M., Yank C., Ma, P.-C., Yen, J.,
    "Intelligent Spider for Internet Searching,"
    Proceedings of the HICSS-97, Maui, Hawai'i, USA, pp. 178-188
    .

     

  1. This paper introduces a new interactive genetic search algorithm,
    which is better than traditional genetic search without on-line adjustments
    (the worst case of which is the best-search algorithm)
  2.  

     

    Relevance of a link is evaluated using the anchor test
    Anchor test measures similarity between anchor text and user query
    Anchor text are the words describing the link to another document
    Anchor text is a small subset of the document
    Search speed versus search quality

    Essence: If weak links avoided - more strong links in unit of time!
    (c) Used by America Online since January 1995


    1. Starting URL(s)
    2. Keywords
    3. Number of URLs expected to return
    4. Category of the searching space


    (b) When a query is submitted
    the appropriate searching space is invoked
    in the available database

     

    Score From Links:

    Homepages are x and y
    Their links are X={x1, x2,...} and Y={y1, y2,...}

    Jaccard's score between x and y is equal to:

    If X=Y then J=1

    If X<>Y then J=0

     

    Reference:

    Goldberg, D.E.,
    "Genetic Algorithms in Search, Optimization, and Machine Learning,"
    Addison-Wesley, Reading, Massachusetts, 1989.

     

    Score From Indexing:

    #1. Total number of homepages is counted - N

    #2. Terms of a homepage are identified - set t

    #3. Total number of terms is counted - L

    #4. The number of words in term j is calculated - wj

    #5. Term frequency number of occurences of term j in homepage x - tfxj

    #6. Homepage frequency number of homepages in set N where term j occurs - dfj

    #7. Combined weight of term j in homepage x - dxj

     

    Details of the Best First Search Implementation:

     

     

    (b) Determining the best homepage in H:

    #1. Determine the Jaccard's score for all elements of set H
    #2. Score is computed as:


    (c) Fetch the homepage from H with the highest JS

    #1. Save it as OUTPUT(k)
    #2. Increase k by 1

    (d) Repeat until all output homepages obtained

     

    Simulated Annealing for Internet Search

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

    Explanation: Symbolic Representation of Simulated Annealing
    SlC - Slow Cooling
    SuH - Sudden Heating
    StC - Stochastic Crossover

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

    Explanation: Simulated Annealing on a MIMD Machine
    The speed-up is considerably slower than linear [Green90],
    due to quasi-minima created by partitioning and interprocess communications.

     

    Important Difference Between SA and GA

    • SA - Single solution being modified over time (inherently serial)
    • GA - A population of candidate solutions maintained (inherently parallel)
    • Single search (typical of SA) is inherently serial
      and therefore difficult to get parallelized.
    • However, it potentially offers a better performance, since no edge effects and interprocess communications are present [Chen98].
    • Consequently, a hybrid approach may be a solution!

     

    Essence of the Hybrid Approaches

    • One solution maintained per processing element (PE).
    • Each PE accepts a solution from other PEs
      for crossover and mutation.
    • If the best solution from the neighborhood is selected, convergence is potentially faster,
      but serialization gets limited.
    • If all PEs receive the visiting solution
      from the same direction, convergence gets slower,
      but parallelization gets easier,
      and enables an overcompensation.

     

    GSA Algorithm Running on Each PE

    of a MasPar Machine

    1 begin

    2    temperature:=_initial_temperature();

    3     r:=_random_solution();

    4     for i:=1 to max_iteration do

    5     begin

    6         direction:=random(0, 7, random_seed);

    7         distance:=random(1, max_distance, random_seed);

    8         v:=XNet_direction(distance).r;

    9         [n0, n1]:=crossover_mutation(r, v);

    10         r:=select(r, n0, n1, temperature);

    11         temperature:= temperature * a ;

    12     end

    13     return r

    14 end

     

     

    References

    [Chen98] Chen, H., Flann, N., Watson, D.,
    Parallel Genetic Simulated Annealing:
    A Massively Parallel SIMD Algorithm,
    IEEE Transactions on Parallel and Distributed Systems,
    Vol. 9, No. 2, February 1998.

    [Green90] Green, D.,
    Parallel Simulated Annealing Techniques,
    Physica,
    Vol. 42, pp. 293-306, 1990.

    [Goffe94] Goffe, W., Ferrier, X., Rogers, X,
    Global Optimization of Statistical Functions
    with Simulated Annealing,
    Journal of Econometrics,
    Vol. 60, No. 1/2, January/February 1994, pp. 65-100.

     

    An Intelligent Spider
    Based on Hybrid Simulated Annealing

     

    The Moore's law applies to Internet, too:

    "Amount of information stored on Internet
    doubles every 18 months."

     

    Reference

     

    Yang, C., Yen, J., Chen, H.,
    "Intelligent Search Engine Based on Hybrid Simulated Annealing,"
    Proceeding of the HICSS-98,
    Kona, Hawaii, USA, January 1998, pp. 415-422.

     

    Initialization

     

    Generation of New Configuration

    For each document x in A or B

    If

    save the document in CurentConfiguration

    Else if

    where rand is a random value between 0 and 1,

    save the document in CurentConfiguration

     

    Major Industrial Research in the USA

    • OpenGroup: MOA,
    • IBM: Aglet,
    • Crystaliz: MuBot,
    • General Magic: Telescript + Odyssey.
    • For more information: AltaVista or EBI

     

    Research at University of Belgrade

     

    Specification of the KQML
    Agent-Communication Language
    (Stanford University)

    • Message-passing is glue that builds large software systems
      out of smaller software systems.
    • Many new standards and toolkits supporting the transport of messages among programs
      (e.g., OMG CORBA, OSF DCE, ISIS, BSD Sockets, even ServiceMail),
      but once connected, what should programs say to one another?
    • Sophisticated programs, especially knowledge-based "agents,"
      interact in may ways beyond the simple query-response paradigm of standards like SQL, leading to a proliferation of incompatible agent communication languages.
    • Knowledge Query and Manipulation Language (KQML).
      KQML is a language that programs can use to describe a variety of different attitudes about information including queries, assertions, action requests, information subscriptions, and processing capabilities.
    • Furthermore, KQML is an enabler of information-flow architectures,
      though forwarding, broadcasting, and brokering messages.

    http://www.sce.carleton.ca/faculty/bieszczad/courses/94587/AgentBasedEngineering.html

     

     

    Agent-mediated Electronic Commerce: A Survey

    (MIT)

    • Software agents are programs to which one can delegate (aspects of) a task.
    • They differ from “traditional” software in that they are personalized, continuously running and semi-autonomous.
    • Consumer Buying Behavior Model:

    Need Identification
    This stage characterizes the consumer becoming aware of some unmet need.
    Within this stage, the consumer can be stimulated through product information.

    Product Brokering
    This stage comprises the retrieval of information to help determine what to buy.
    Encompasses the evaluation of product alternatives based on consumer-provided criteria.
    The result of this stage is called the “consideration set” of products.

    Merchant Brokering
    This stage combines the “consideration set” from the previous stage
    with merchant-specific information to help determine who to buy from.
    This includes the evaluation of merchant alternatives based on consumer-selected criteria
    (e.g., price, warranty, availability, delivery time, reputation, etc.).

    Negotiation - this stage is about how to determine the terms of the transaction.

    Purchase and Delivery - self-explanatory

    Product Service and Evaluation - this post-purchase stage involves product service, customer service,
    and an evaluation of the satisfaction of the overall buying experience.

    http://www.iiia.csic.es/amet98/abstract5.html

     

     

    The Oz Project

    Carnegie Mellon University

    • Oz - a computer system for creation and presenting higly interactive dramas (Bates 92).
    • Highly interactive - the interactor is choosing what to do, say, and think at all times.
    • Drama - even though the interactor is choosing what to do, say, and think, there is a destiny,
      created by the author of the interactive drama.
    • Two different presentation models: textual and animated
    • The textual system uses text as the input and output medium. The world and characters are described through text,
      and the interactor's actions are entered to the computer through text.
    • In the animated system, the world and characters are presented graphically.
      Humans interact with the system physically, through sonar sensors and a mouse.
    • In the future of both systems, people may interact through sounds and speech

    • The architecture includes a simulated physical world, several characters, an interactor, a theory of presentation,
      and a drama manager. A model of each character's body and of the interactor's body are in the physical world.
      Outside the physical world, a model of mind controls each character's actions. The interactor's actions are controlled by the interactor. Sensory information is passed from the physical world to the interactor through an interface controlled by a theory o f presentation. The drama manager influences the characters' minds, the physical world, and the presentation theory.

    http://128.2.242.152/afs/cs.cmu.edu/project/oz/web/oz.html

     

     

    The Edgar Agent
    (University of Kansas and Rutgers University)

    • Example of an intelligent agent that gathers financial information.
    • Used mostly in auditing - value added service of the auditor.
    • Organizations need to know (in real time) how they perform!
    • Simple internals - written in Perl. (UNIX scripting language)
      Uses the Berkeley Sockets Lib and the Steven Brenner CGI Lib.
    • Learning while searching!

     

    Nelson, K., et al,
    Virtual Auditing Agents: The Edgar Agent,
    Minitrack on Enabling Technologies,
    The Internet and the Digital Economy Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998.

     

    The IIR Agent
    (National Taiwan University)

    • A solution for the information overload problem (category~directory).
    • An agent community is proposed (to suggest interesting
      URLs to visit).
    • Agents and subagents based on knowledge categories (community architecture).

     

    Explanation: The architecture of agent community

    Tu, H.-C., et al,
    An Architecture and Category Knowledge
    for Intelligent Information Retieval Agents,
    Minitrack on Enabling Technologies,
    The Internet and the Digital Economy Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 405-414.

     

    Browsing Agents for Improved Commerce (National Sun Yat-Sen University in Taiwan)

    • Five agent types:
      recommendation, new-content, search, customizing, status
    • Essential elements: Control engine + knowledge base
    • Prototype: Book commerce

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

    Lai, H., et al,
    A System Architecture of Intelligent-Guided Browsing on the Web,
    Minitrack on Enabling Technologies,
    The Internet and the Digital Economy Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 423-432.

     

    A Decentralized Coalition Multi-Agent (University of Hong Kong)

    • Deregulation and restructuring in power industry
      for better efficiency!
    • Agents assisting owners
      (of power generation stations + transmission lines)
      and customer groups
      (in negotiations about contracts and coalitions).

     

    Yen, J., et al,
    Multi-Agent Coalition Formation in Power Transmission Planning,
    Minitrack on Enabling Technologies,
    The Internet and the Digital Economy Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 433-443.

     

    Bilateral Negotiation:

     

    for each agent:

    1. Initially, set i = 1.
    2. Sends an offer to the ith agent
      in the agent’s preference list, i.e. L(i).
    3. Waits for replies and offers from other agents.
      If an offer from the agent L(j), j
      £ i is received, i = j.
      If an offer from the agent L(j), j > i or from an agent
      outside the preference list L has been received, replies a dissent message to that agent. If no more offers from other agents have been received, replies a consent message
      to agent L(j) and informs coordinator about the candidate coalition with agent L(j).
    4. If a consent message from agent L(i) has been received, informs coordinator about
      the candidate coalition with agent L(i). If a dissent message from agent L(i)
      has been received and L(i) is not the last agent in the preference list,
      i = i+ 1
      and go to step 2.

     

    for coordinator:

    When coordinator receives messages from both founders of a candidate coalition,
    it informs every agent to stop negotiation and removes from its own preference list
    the agents within the candidate coalition, and then goes to step 2.

    When every agent reaches the end of the list L and no coalition is possible, the process terminates.

     

     

    An Agent for Resolving
    the Multiagent Collaboration Dilemmas
    (NTU, Taiwan)

     

    • Proposing a set of general quantitative criteria (agent Ana meets agent Bob)
      for detecting collaboration cases (open motives and motives behind),
      and for picking strategies to maximize effectiveness (FT in achieving goals).
    • Definition (Collaboration payoff):

    Given R1, ... , Rn over a scalar field F (e.g. real numbers) as resources invested in a collaboration by an agents a1, ... , an respectively, the payoffs to agent ai, i=1, ... , n,
    of the collaboration is a function pi(R1, ... Rn)
    Î F.

    • Definition (Agent collaboration problem):
      Given a set of agents about to participate in a (possibly infinite)
      sequence of collaborations, a solution to the agent collaboration problem (ACP)
      for each paticipating agent is an algorithm that decides
      a sequence of resource contribution decisions to maximize

      the agent’s accumulated profit during the collaborations.

     

    Wang, J. C.-C.,
    A Framework for Resolving the Multiagent Collaboration Dilemmas,
    Minitrack on Enabling Technologies,
    The Internet and the Digital Economy Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 444-451.

     

    An Agents for a Kiosk
    (University of Hong Kong)

     

    • Architecture of an agent-based society:
      Communications procedures and control policies.
    • Case study: Tourism kiosk for Hong Kong.

     

    Yeung, C.,
    A Multi-Agent Based Tourism Kiosk on Internet,
    Minitrack on Enabling Technologies,
    The Internet and the Digital Economy Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 452-461.

     

    An Agent System for Information Infrastructure Management (GMD)

    • Use of Java-based agents
      in network and service management.
    • Essential issue:
      Platform supporting agent mobility, to compare with
      OpenGroup MOA, IBM Aglet, Crystaliz MuBot, General Magic Telescript + Odyssey.

     

    Covaci, S.,
    Mobile Intelligent Agents
    for the Management of the Information Infrastructure,
    Minitrack on Agent Mobility and Communication,
    The Software Technology Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 24-33.

     

    An Agent System for Security Issues
    (Darmstadt University of Technology)

    • Safety versus security!
    • Issues of interest:
      A-to-H, A-to-A, H-to-H, and H-to-A security

     

    Fuenfrocken, S.,
    Integrating Java-Based Mobile Agents into Web Servers
    Under Security Concerns,
    Minitrack on Agent Mobility and Communication,
    The Software Technology Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 34-43.

     

    The Agent for Tracking Agents

    • Agent identification, creation, cloning, and deletion
    • Message handling
    • Algorithms for agent migration and tracking
    • The migration for tracking algorithm:
      1. send a message Update Entry to the local commu-nication agent
        with the target address;
      2. send a message AgentIsMigrating to the target communication agent
        informing it to buffer all incom-ing messages forwarded to the migrating agent;
      3. in the local database, mark the agent’s address with the tag Migrated;
      4. wait for all active threads to terminate;
      5. remotely clone the agent on the target node passing as arguments
        to the creator agent the agents’s id and the variable NumberOfHops
        (an integer giving the number of migrations made by the agent
        up to the current node; its value defaults to 0 and is incremented
        each time a migration occurs);
      6. forward all messages left in the mailbox to the newaddress and flush the mailbox.

     

    Desbiens, J., et al,
    Communication and Tracking Infrastructure of a Mobile Agent System,
    Minitrack on Agent Mobility and Communication,
    The Software Technology Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 54-63.

     

     

    Agents for Improving Interagent Communications
    (University of Karlsruhe)

    • Providing an infrastructure for remote interagent communication
      and for locating agents during their remote execution.
    • Basic messaging/directory mechanisms do not scale up (so: multicast IP layer)

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

     

    Hartroth, J., et al,
    Using IP Multicast to Improve Communication
    in Large Scale Mobile Agent Systems,
    Minitrack on Agent Mobility and Communication,
    The Software Technology Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 64-73.

     

    An Agent for Adaptive Routing
    (Universite' Libre de Bruxelles)

     

    • Introducing AntNet - A distributed mobile-agents-based algorithm
      inspired by the ant colony metaphor to achieve efficiency
      (ant agents sit at nodes and make local research when asked to do so)

     

     

     

    Di Caro, G., et al,
    Mobile Agents for Adaptive Routing,
    Minitrack on Agent Mobility and Communication,
    The Software Technology Track,
    Proceedings of the HICSS-98, Kona, Hawaii, USA, January 1998, pp. 74-83.

     

     

    Research at UB/IFACT

    Two major research domains:
      Algorithmic domain:
      mutation based on spatial and temporal locality
      [Milutinovic+Kraus97]
      Open problems:
      spatial node switch function and temporal page revisit period Tools domain:
      software packages supporting the Lego approach
      to Internet experimenting
      Open problems:
      interface issues and resource absorption Acknowledgments: Laslo Kraus, Jelena Mirkovic, Sasa Slijepcevic, Nela Tomca, etc…

       

      The XML (Extensible Markup Language) Standard

       

      Aprroved by W3C (WWW Consortium)
      Based on SGML (Standard Generalized Markup Language)

       

      HTML determines only the way in which browser displays the document
      XML also defines contents (using tags), for more efficient search

       

      HTML has a fixed set of tags
      XML is a metalanguage to desing one's own ML and special tags

       

      HTML handles complex DTDs only via CGI scripts and Java applets
      XML handles complex document type definitions via language constructs

       

      Clark, D., W3C Approval Gives XML a Push,
      IEEE Computer, April 1998, pp. 18-19.