Recent AI implementation thoughts

This month, AI, and in particular semantics of data, have been in the spotlight. At the beginning of the month, Stephen Wolfram (of “A New Kind Of Science” fame) released his knowledge search engine, Wolfram Alpha.  This has generated a lot of press interest, with some journalists going so far as asking if it is a “Google killer”. Experts, of course, pooh poohed the idea, but there was a considerable amount of noise about it in the technology news sector

Once Wolfram Alpha actually became available, the reviews were mixed. The search engine clearly has quite a bit of information semantically tagged and a decent enough NLP that it can understand a lot of mathematical queries. However, there is often a feeling that much of the NLP is “hard-coded” to understand certain orderings of phrases, particularly those unrelated to mathematics, rather than actually trying to understand their meaning. That’s not to say you can’t get useful information out of Alpha – but the majority of that information is direct from Wolfram’s other brain child, Mathematica, and when it isn’t, things get a lot woollier.

When I’ve tried Alpha, I’ve been impressed at the speed of the results. Considering so much was made of it, Alpha seemed to stand up well enough in the first few days, in which it must have been deluged with visitors. However, the dreaded message of “Wolfram Alpha doesn’t know what to do with your input” will almost certainly continue for a while now, until they improve both the parsing of natural language and increase the amount of semantic data they store.

My main concern about Alpha is that Wolfram, being Wolfram, insists on keeping the information and the semantic tagging on their own servers and not making either their sources of data or the ontology available to be used by others. Interoperability is clearly not something Wolfram is looking for, meaning that all semantic tagging will continue to be done by his company, rather than any form of distributed effort, hindering any real progress.

More recently, Microsoft has announced Bing, a “decision engine” as MS like to refer to it. Bing is surprisingly similar to Alpha in many ways, mainly differing in their intended audience. Whereas Alpha is clearly targetted at mathematicians and scientists, Bing is clearly more interested in shoppers. From early demo videos, it seems Microsoft have developed a search engine that is somewhere between Google Shopping and every price comparison website in existence. They seem to want to provide powerful price comparisons for every sector they can possibly find and join them up to make what could be a very useful system.

Clearly, as we are in the days before the semantic web might take off, I presume there is going to be a large amount of hard-coding for individual websites. However, if they manage to do what they claim it will, read and understand the content of websites and bring related sources together in one searchable website, it is still closer to the original semantic web vision of Tim Berners-Lee than Alpha or, honestly, anything that’s come before. Whether Microsoft will manage what they claim they can do is a different matter. But regardless, it seems to have been a good month for anyone interested in knowledge semantics!

Addendum: Bing just came up today! Haven’t had a chance to look at it properly, but have heard positive things so far, will be an interesting one to keep an eye on.

Computability Theory

My final day of module-specific revision and I was looking at Computability Theory today. This subject analyses the limits of computers, such as what they can and cannot solve. There are several important classes of problems: primitive recursive (also called primitive computable), partially computable and computable (or recursive).

The most basic class, primitive recursive, is made up of three initial functions (successor, nil and projection) and recursion, bounded minimisation and composition. If a program will only use a finite application of these, it will be primitive recursive.

If a program can require an infinite number of applications in certain situations (i.e. will never return a result in some cases) it is known as partially computable. The class of computable functions, on the other hand, is total, meaning it will always return a result. Whilst the class of primitive recursive functions are total, every total function is not necessarily primitive recursive.

There were several important discoveries looked at in the course, including the Halting problem. This states that it is not possible to detect whether a Turing machine will halt with another Turing machine. This was proved using a method known as diagonalisation, where a set of instances is enumerated, however a new member of the set is generated, but will never be enumerated, thus proving the set cannot be enumerated.

Other discoveries including the s-m-n theorem, which states that any program can integrate some of the variables into the algorithm, and two different types of reducibility – Turing reducibility, where A reduces to B if it can be solved with an oracle to B and Many-to-One reducibility, where X reduces to Y if there is a function f that will transform every instance of X into Y.

Finally, how an algorithm can be measured was looked at, in particular Blum’s axioms for a valid complexity measure. These state that a complexity measure should only be able to measure the complexity of an algorithm if the algorithm terminates and that it is recursive (total).

Advanced Computational Complexity

After a short break in which I looked generally at Artificial Intelligence revision topics, today’s subject was Advanced Computational Complexity. Most of this module was dedicated to looking at various complexity classes for problems, such as P (those solvable in polynomial time by a deterministic machine), NP (as before, but by a non-deterministic machine), L (those solvable in logarithmic time by a deterministic machine) and NL (again, as before, but by a non-deterministic machine).

Most of the problems that could come up during the exam will be about determining whether a problem is in a complexity class or is complete for a class. A X-complete problem is one such that it is shown to be in X and every problem in X can be reduced to it given a suitable limit on the running time or space requirements of the reduction. An X-hard problem is one such that every problem in X can be reduced to it, but cannot be shown to be in X.

Two important theorems that we examined were the space hierarchy and time hierarchy theorems. These provide the main bounds upon the hierarchy of the complexities, such as that NL does not equal PSPACE and that P does not equal EXP. A collarary for the time hierarchy theorem that gives a more accessible definition is as follows; for any real numbers 1 \leq a < b, TIME[n^a] \subsetneq TIME[n^b].

Finally, approximations of NP problems were looked at. Two methods were demonstrated, those where a result will always be within a constant factor of the optimal solution and a method known as gap[a, b]-approximation for decision problems. The latter will return YES if there is a solution > a or NO if the best solution will be \leq b. This leaves the eponymous gap, in which we don’t care what the algorithm will return.

Multi-Agent Systems

Today was Multi-Agent Systems. An agent is an application that demonstrates three attributes: being reactive, proactive and social. These mean that an agent should react to changes in the environment it feels are pertinent, should make an active effort to reach its goals and will interact with other agents.

We were taught agents with an emphasis on the BDI infrastructure. In this model, an agent will have some beliefs about the environment, will desire to affect some changes to the environment and will have some intentions about actions they wish to carry out that they believe will cause their desires to be met. This model has been implemented in several forms, one being the Jason architecture.

The environment an agent is placed in is particularly important. They after described using the following types:

  • Accessible or inaccessible (whether the entire environment can be perceived)
  • Deterministic or non-deterministic (whether an action will always have the same results given the same environments)
  • Episodic or non-episodic (whether events can happen at any time)
  • Static or dynamic (whether the environment can change)
  • Discrete or continuous (whether time happens in discrete parts or not)

Some of the methods of agent decision making were covered, in particular the Contract Net Protocol, in which agents bid to fulfil a task. Bidding can often take place as a Monotonic Concession Protocol, for which the optimal strategy is known as “Zeuthen”. Agent argumentation, where agents either attack other agents arguments or their basis is another method of deciding various problems.

Finally, we looked at the three main classes of agents: Cognitive, Reactive and Hybrid. Cognitive agents were popular at the beginning of AI research and mainly used the methods discussed in Automated Reasoning to make decisions. However, these agents were slow to respond, and because of this, Reactive agents became more popular. These agents only responded to changes in the environment and did not try to reason about events. However, these agents quickly became limited and were replaced by Hybrid agents, much like the agents written using the BDI infrastructure.

Advanced Information Systems and Automated Reasoning

Yesterday I was revising Advanced Information Systems. This topic bares a lot of similarities to many of the topics touched on by ATAI, particularly in knowledge representation and the use of ontologies, although it looks at it from a more practical view. Most of the revision was about semi-structured data, which is effectively a theoretical version of JSON, although could be implemented quite easily. The most interesting bit about SSD was the idea of typing the data. To do this, you provide a set of classes you wish the data to conform to and then, using one of several different methods, you attempt to coerce the data to fit in the classes. The different methods of doing this that we covered were:

  • First order logic
  • Datalog
  • Simulation

The first order logic method was the least flexible of the three and also required the most work. Datalog and Simultation work in similar ways to each other; both repeatedly attempt to find which class an element belongs to by testing it’s relationships to the other elements. The main difference between the two is that Datalog requires that the relations be there, whilst Simulation merely allows them to be there, which can make Datalog more powerful when it comes to typing.

Today I was working on Automated Reasoning. This subject is primarily about logic and satisfying clause sets, which are vitally important for applications such as Prolog, HOL Light and other reasoning tools. The course focused on enhancements to satisfiability algorithms and special cases where satisfiability is not necessarily exponential.

Two early algorithms for speeding up satisfiability were Unit Propagation and Pure Literals. Unit Propagation looks for any clauses composed of only a single literal. Since, to satisfy this clause, the literal must be given the correct assignment, it therefore can be done first. Pure Literals are similar, if a variable only occurs in the same literals, it can therefore be given a satisfying assignment.

Next we covered resolution and linear resolution, the later of which is not necessarily complete, unless the clause set is Horn clause set. We also showed that UP can show whether a Horn clause set is unsatisfiable.Horn clause sets are those where every clause can contain at most one positive literal. A clause that contains a single positive literal is known as a definite Horn clause.

Solving satisfiability for 2-SAT clause sets was also covered. It was shown to be solvable in polynomial time using the following algorithm: a graph is created for the clause set, F, where variable in F is two vertices and labelled with the positive and negative literals. Then, for each clause {l, m}, two edges are added, l -> m’ and l’ -> m. The clause set is unsatisfiable if there exists a variable with a directed path from x -> x’ and back again.

Finally, Herbrand’s theorem and the unification algorithm were looked at. The implications of Herbrand’s theorem were looked at, in particular the creation of a semantic tree and deciding whether it is closed (and therefore the clause set is unsatisfiable) or not. The unification algorithm was important as it provides an algorithm for quickly producing the minimum general unifier for a set of clauses, if one exists.

Name change and tagging files

As well as revising for my exams, I’ve been finding more things to procrastinate about. One of the first things I did was to change the name of this blog – I felt it didn’t quite fit (plus my attempt as humour was pretty poor, frankly). The name Open Source Mind might seem an odd one, but I feel it’s not a bad attempt at summing up how I think I think. Since I came across the open source movement sometime in my late high school years, I’ve been fascinated by it and I’ve been interested in participating in it for a long time. So in that sense, the name is accurate. But I also feel that this blog is an out-pouring of my thoughts and that if I ever get any feedback other than spam-bots, I might enhance my understanding or maybe be caused to reconsider an idea – just like the open source world can be.

As an attempt to procrastinate in a tangental way to my work today, I’ve been considering methods that Ubuntu/Gnome could utilise semantic information better. Right now, my main topic of interest is tagging files (mainly as I’ve found an itch that I really want to scratch – a lot of my lecture slides cover similar topics, and being able to cross-reference them easily would be brilliant). I’ve found some discussion about this, which I’ll attempt to summarise.

One attempt has been started to create a “semantic filesystem” known as tagsistant. This was interesting in many ways – it uses the concept of tags to act as folders, with the addition of some simple set theory (AND and OR). My problem with this is:

  1. I like the way my files are arranged right now – I really don’t want to rearrange them
  2. I really don’t want to reformat my hard disk (again)
  3. that I’d much prefer to use a well tested filesystem such as ext3/4

Based on these problems, I decided to look towards the various Linux/Gnome mailing lists to see if there was any way to create this system on top of what’s already available. Fortunately, there is some hope – ext2 and up have featured a technology known as “xattrs” – extensible attributes. These can be added to any file and should (in theory) be associated with those files from then on. This would match one of my requirements, that of arbitary numbers of tags being added to existing files on my filesystem, organised however I like.

However, this creates another problem – how does one index the tags that have been attached to the files? An application such as tracker would appear to be ideal, though obviously some integration would have to be done and standards devised before any real progress was made. This would obviously have to involve all of the interested parties (Nautilus would presumably be heavily involved, as I would imagine them to be the main “adder” of the tags through their filesystem interface).

Interestingly, there is already some (well, a little) discussion of these ideas on the Ubuntu brainstorming site. Reading posts on the mailing lists however indicates that this is still currently a long way from becoming real. But that hasn’t stopped me from getting interested in making it happen. Whilst this concept is possibly not a “strong” semantic topic, tagging of files could provide an easy, powerful new paradigm for file management, whilst not diverging too far from the existing paradigmns that have served us so well.

Advanced Topics of AI

This topic primarily covers various methods of representing or handling knowledge and looks at five different methods:

  • Databases and simple set theory
  • Description logic (DL)
  • Knowledge Grid (KG)
  • Semantic Link Network (SLN)
  • Category theory

This module has a lot in common with the Advanced Information Systems module I will be revising tomorrow, so I chose to skip some of the content of this module (mainly about ontologies such as RDFS and OWL, which I will examine tomorrow). One part that didn’t come up in AIS however is the idea of a closed/open world assumption. These describe the default assumption a system will make in the case that it requires knowledge it does not possess. In a closed world assumption, the system assumes it knows everything there is to know about the domain, thus anything that is not explicitly stated is false. The open world assumption is less defined – each system will choose how it should react to requests for unknown information.

Description logic is a form of logic that is broken into two disinict sections – the ABox and TBox. The TBox contains the rules for the system – describing what properties an instance must fulfil in order to be considered a member of the concepts, whereas the ABox contains the instances and the actual properties held by them. DL is mainly interesting due to it’s subsumption algorithm, which can decide whether or not a class conmpletely contains another in polynomial time.

A Semantic Link Network is yet another method of storing knowledge. In this model, there are three defined sections – types, links and rules. The rules act similarly to the TBox of DL, however, it is more defined in comparison to DL, with the types and properties of the network being explicitly defined in the corresponding sections. A SLN is usually represented as a graph, with the types acting as vertices and the links acting as edges. They also have a normalised form, where the network is broken into closed component (a closed network being one that one contains directly related types).

Finally, we looked at some (simplified) category theory. This mainly looked at how data could be represented using the concepts of categories, particularly using morphisms to represent relations between objects.

Advanced Algorithms

Today’s revision has been on Advanced Algorithms. This module was split into two parts, each taught by different lecturers. In the first term, we started by looking at some sorting algorithms, in particular quicksort. Whilst quicksort is optimally a O(n \log n) algorithm, we noticed that in the worst case it could be O(n^2). To avoid this, we looked at a randomised version of quicksort that randomly selected the element to partition about, which gave it an expected running time of O(n \log n). Then we looked at why, at best, a comparison sort will be \Omega(n \log n) (due to decision trees).

Once that was done, we moved on to Dynamic Programming and Network Flow. Important highlights included understanding how dynamic programming could reduce the running time of recursive problems and the Min Cut/Max Flow theorem, which states that the capacity of the minimum cut for a graph, which disconnects s and t, will be equal to the maximum flow in the graph. The Ford-Fulkerson method also made an appearance for increasing the flow in a graph.

In the second term, the sub-module concentrated on the kernelization of problems. Expressed simply, this is the idea that difficult optimisation problems might have sub-problems within that can be solved quickly, leaving a smaller instance to be solved traditionally. The primary algorithms we looked at were Crown kernelization and  NT-kernelization, both of which are used for finding kernels in instances of vertex cover.

For Crown kernelization, you start with a maximal matching, M_0. Using this matching, I_0 = M_0-exposed vertices and H_0 = N(I_0) \setminus I_0. A maximal matching between I_0 and H_0 is found, M_1 and if \vert M_1 \vert = \vert H_0\vert, a crown has been found. Otherwise, X is created from the vertices accessible from the M_1-exposed vertices by M_1-alternating paths. I = X \cap I_0 and H = X \cap H_0, which is the crown.

NT-kernelization is different. Given a graph, G = (V, E), you create a bipartite graph, B, by copying V' = V and for each edge in E adding two edges, xy' and x'y. Let C_B be a vertex cover on B, C_0 = \lbrace x : x \in C_B \wedge x' \in C_B \rbrace and V_0 = \lbrace x : x \in C_B \vee x' \in C_B \rbrace. Let D be the vertex cover of G[V_0], then a vertex cover of G is D \cup C_0.

Hardware thoughts

Outside of revision, I’ve been thinking quite a bit about hardware. I’ve managed to break my laptop’s power charger (one too many bends in the cable) and I’m trying to avoid spending any unnecessary money this term, so I’ve resurrected my old PC and stuck the latest version of Ubuntu on it (9.04). I’ve been pleasantly surprised – it’s the first time I’ve used Linux on a device with a “proper” graphics card (the venerable Geforce 7600GS, if I remember correctly) and Compiz is so much better with it. I also installed the latest Xubuntu on my Eee PC (the 901) and managed to make it access my WPA encrypted wireless network within half an hour of booting! But at the end of the day, I’m going to be looking to build a new PC, mainly to be a lot more efficient (this one has a P4 and cooks itself at 50C on idle – definitely not efficient!). So I have half an eye on the various gadget sites for advice on what would meet my requirements.

As well as these, I’ve been looking at some mobile devices. I own a Nokia E71, which is a joy to use and provides so much functionality, I can’t see myself getting bored with it for quite a while. However, it’s one problem is it’s lack of a nice GUI. Recent news about the Nokia’s supporting the Qt toolkit has kicked me into action though and it looks pretty nice, though navigation is an issue without a touchscreen. However, installing it is a pain in the arse on Symbian 3.1 edition phones, particularly as you have to download a small installer that hides in a 100Mb zip file. To get around this, I’ve extracted and uploaded the required PIP/Open C installer for anyone to download.

But on top of all this standard hardware, I’ve been considering a project that recently came to mind. I’m not going to divulge much at this stage – it’s very much as the earliest stages of planning, but I have been looking quite intently at the latest Beagle Board as one of the components. Seems very impressive – and there’s been a project to make the Android operating system work on it, which is very exciting and could definitely make my life a lot easier. I’ve worked with the Android operating system in a professional capacity and my initial thoughts were very negative (particularly about the documentation at the time). However, after a couple of weeks using it and learning how to write programs for it from an alternative source, I got myself up to speed and found myself enjoying it. So Android is a definite contender for this little project of mine.

Theory and Practice

As my final Computer Science exams are rapidly approaching, I’ve been busy revising for them. I’ve always been terrible at revision, mainly because I didn’t see much need to do it during my early school days. However, with a little help from my friends, I’ve been making some progress this year and I already feel more comfortable about the upcoming exams. I’ve decided to write down some of my notes as part of my blog, mainly for me to look back at just before the actual exams.

Today I’ve been revising a sub-module known as “Theory & Practice”. It might as well be called “Graph Theory”, frankly, as the entire sub-module revolves around it. Fortunately, my project this year was pretty heavy of the graph theory, so I’m not overly concerned about it. It also helps that I studied some of the subject in my first year, thanks to choosing an optional module in discrete mathematics. The module is primarily about graph connectivity and graph colouring, which are, respectively, measurements of how well connected vertices are and how many colours are needed if you want to assign unique colours to connected vertices.

In graph connectivity, the most important things to know are (what we refer to as) Whitney’s theorems and the handshaking lemma. Whitney’s theorems (\kappa \leq \lambda \leq \delta) describe the relations between the different types of connectivity (vertex connectivity and edge connectivity) and are quite intuitative, once you understand them fully. For graph colouring, there are a couple of algorithms one needs to know in order to demonstrate how they work and what their results will be if applied to various graphs. Whilst not trivial, these aren’t particularly hard to remember, although applying them isn’t always so easy.