Xholon Wikipedia tutorial1

Ken Webb 2011-11-19T14:29:10Z

This page is part of a series that explores the many ways in which Xholon and Wikipedia can interact with each other. The tutorial develops a simple model of the Solar System that can be reused in other Xholon applications, such as models of climate change on the Earth. The Earth is one of the objects in the Solar System, and is critically dependent on energy from the Sun, which is at the center of the Solar System.

Identify potential domain classes

A typical first step in developing a Xholon application is to identify potential domain classes and organize them into an inheritance hierarchy. In this case the domain includes the Solar System, the astronomical objects that exist within the Solar System, the properties of those objects, and the relationships between them.

The Solar System page in the English Wikipedia provides the names of the most important domain objects and types of objects. Like many people, I have a basic knowledge of astronomy and could write down much of the basic information without referring to an external source. As a child I learned that planets (Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune; plus now-downgraded Pluto) orbit the Sun, and it's a cinch that the Wikipedia article will include these exact words, plus things like 'star'. But I'm not sure what the generally recognized terms are for 'moon', 'space', 'small planet-like object', etc. I could come up with my own terms for things, but it's better and easier if I can simply look them up. The vocabulary used in Wikipedia represents a consensus of opinion that develops out of the interactions between numerous editors. And indeed, the words used in the Wikipedia article (2011) agree closely with the contents of my aging (1978) astronomy textbook.

The Wikipedia article contains many more terms than I want to use in my simple model. To help me whittle this down, I'd like to have a list of all the other Wikipedia pages that are referenced by the Solar System article. There are many ways to derive such a list.

Yahoo! Query Language

The Yahoo! Query Language (YQL) is "an expressive SQL-like language that lets you query, filter, and join data across Web services." It's an online service that can query HTML pages on the web. If I use the following URL in a web browser, I will get a list of 1000+ Wikipedia pages that somehow relate to the Solar System. Some of these, especially those at the beginning of the list, are the terms that I want to use in my application.

http://query.yahooapis.com/v1/public/yql?q=SELECT title FROM html WHERE url="http://en.wikipedia.org/wiki/Solar_System" AND xpath="//a[@title]"

Partial results:

<a title="Planetary system"/>
<a title="Star system"/>
<a title="List of gravitationally rounded objects of the Solar System"/>
<a title="This article is semi-protected."/>
<a title="Enlarge"/>
<a title="Planet"/>
<a title="Dwarf planet"/>
<a title="Enlarge"/>
<a title="Enlarge"/>
<a title="List of Solar System objects"/>
<a title="List of Solar System objects by size"/>
<a title="Timeline of discovery of Solar System planets and their moons"/>
<a title="List of gravitationally rounded objects of the Solar System"/>
<a title="List of natural satellites"/>
<a title="List of minor planets"/>
<a title="Template:Lists of Solar System objects"/>
<a title="Template talk:Lists of Solar System objects"/>
<a title="Sun"/>
<a title="Astronomical objects"/>
<a title="Gravity"/>
<a title="Orbit"/>
<a title="Formation and evolution of the solar system"/>
<a title="Molecular cloud"/>
<a title="Orbit"/>
<a title="Mass"/>
<a title="Planets"/>
<a title="Plane of the ecliptic"/>
<a title="Mercury (planet)"/>
<a title="Venus"/>
<a title="Earth"/>
<a title="Mars"/>
<a title="Terrestrial planets"/>
<a title="Gas giant"/>
<a title="Jupiter"/>
<a title="Saturn"/>
<a title="Uranus"/>
<a title="Neptune"/>
<a title="Volatiles"/>
<a title="Asteroid belt"/>
<a title="Kuiper belt"/>
<a title="Scattered disc"/>
<a title="Trans-Neptunian object"/>
<a title="Volatiles"/>
<a title="Ceres (dwarf planet)"/>
<a title="Pluto"/>
<a title="Haumea (dwarf planet)"/>
<a title="Makemake (dwarf planet)"/>
<a title="Eris (dwarf planet)"/>
<a title="Dwarf planets"/>
<a title="Small Solar System body"/>
<a title="Comet"/>
<a title="Centaur (minor planet)"/>
<a title="Interplanetary dust"/>
<a title="Natural satellite"/>
<a title="Moon"/>
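The raw list contains repeats (such as "Orbit") and titles that are clearly not domain terms (such as "Enlarge"). As a rough sketch of how the list could be whittled down automatically, the following JavaScript drops duplicates and a few noise patterns. The function name candidateTerms and the filter rules are my own assumptions, chosen only from the partial results shown above; a real list would likely need more rules.

```javascript
// Hypothetical helper: reduce raw anchor titles to candidate domain terms.
// The noise patterns are assumptions based on the partial results above.
function candidateTerms(titles) {
  const noise = [
    /^Enlarge$/,           // image-viewer links
    /^This article is /,   // page-status notices
    /^Template( talk)?:/,  // navigation templates
    /^List of /,           // list pages rather than domain objects
    /^Timeline of /        // timeline pages rather than domain objects
  ];
  const seen = new Set();
  const result = [];
  for (const title of titles) {
    if (noise.some((re) => re.test(title))) continue; // skip noise
    if (seen.has(title)) continue;                    // skip repeats
    seen.add(title);
    result.push(title);
  }
  return result;
}

// A few titles copied from the partial results.
const sampleTitles = [
  "Planetary system", "Enlarge", "Planet", "Enlarge",
  "List of natural satellites", "Template:Lists of Solar System objects",
  "Sun", "Orbit", "Molecular cloud", "Orbit"
];
console.log(candidateTerms(sampleTitles));
```

The order of the remaining titles is preserved, which matters here because the most useful terms tend to appear near the beginning of the list.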

Alternatively, I can enter one of the following queries in the YQL console. Sometimes with the YQL console, Wikipedia returns a "Redirected to a robots.txt restricted URL" error.

Each query selects all anchor (a) nodes that have a title attribute. The first returns the title, the second returns the content (the link text), and the third returns both:

SELECT title FROM html
WHERE url="http://en.wikipedia.org/wiki/Solar_System"
AND xpath="//a[@title]"

SELECT content FROM html
WHERE url="http://en.wikipedia.org/wiki/Solar_System"
AND xpath="//a[@title]"

SELECT title, content FROM html
WHERE url="http://en.wikipedia.org/wiki/Solar_System"
AND xpath="//a[@title]"
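Any of these queries can also be placed in the q parameter of the URL shown earlier. The following is a minimal sketch of building such a URL in JavaScript. The endpoint is copied from that URL; the function name yqlUrl is hypothetical, and percent-encoding the query with encodeURIComponent is my assumption about how best to embed it (a browser address bar does something similar on its own).

```javascript
// Endpoint copied from the example URL earlier on this page.
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// Hypothetical helper: turn a YQL statement into a request URL.
function yqlUrl(query) {
  // Collapse the line breaks used for readability into single spaces,
  // then percent-encode the statement so it is safe inside a URL.
  const oneLine = query.replace(/\s+/g, " ").trim();
  return YQL_ENDPOINT + "?q=" + encodeURIComponent(oneLine);
}

const titleAndContentQuery = [
  'SELECT title, content FROM html',
  'WHERE url="http://en.wikipedia.org/wiki/Solar_System"',
  'AND xpath="//a[@title]"'
].join("\n");

console.log(yqlUrl(titleAndContentQuery));
```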


Every Wikipedia page includes the popular jQuery JavaScript library. In some browser configurations it's possible to use the following jQuery script. I've tested it with Firefox 3.6, the Firebug 1.7.3 console, and jQuery 1.5. It will provide a list of the titles of the first 450 anchor (a) nodes that have a title. It uses jQuery selector syntax rather than the XPath syntax used by YQL.

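$('div.mw-content-ltr a[title]').each( function(index) {
  if (index < 450) {
    console.log($(this).attr('title')); // assumed completion: the original script is cut off after the if
  }
});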

Derive an inheritance hierarchy

The final Xholon inheritance hierarchy that I've ended up with (for now) is:

<?xml version="1.0" encoding="UTF-8"?>
<!--
This inheritance hierarchy is based on wikipedia content.
 * http://en.wikipedia.org/wiki/Solar_System
 * YQL queries of the wikipedia page
 * wikipedia has "Moons_of_Jupiter" etc.
 * wikipedia has "Rings_of_Jupiter" etc.
-->
<_-.solarsystem> <!-- this is a forest -->

    <StarSystem> <!-- = Star(s) + PlanetarySystem -->
    <PlanetarySystem/> <!-- contains non-stellar objects -->
  <!-- Collections of astronomical objects. In wikipedia "Stars", "Planets" redirect to "Star", "Planet". -->
  <!-- Space separates astronomical objects from each other, and connects them together. -->
  <!-- properties -->

Planet is a type of (subclass of) AstronomicalObject. SolarSystem is a type of StarSystem, which in turn is a type of AstronomicalObject. This is consistent with the Wikipedia description of AstronomicalObject.
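To make the "is a type of" relationships concrete, here is a small JavaScript sketch that records only the three parent links stated in the paragraph above and walks up them. The names parentOf and isA are hypothetical, and this is not how Xholon itself stores or queries its inheritance hierarchy; it only illustrates the idea.

```javascript
// Child -> parent links, limited to the relationships stated in the text.
const parentOf = new Map([
  ["Planet", "AstronomicalObject"],
  ["SolarSystem", "StarSystem"],
  ["StarSystem", "AstronomicalObject"]
]);

// Hypothetical helper: is className the same as, or a subtype of, ancestor?
function isA(className, ancestor) {
  let current = className;
  while (current !== undefined) {
    if (current === ancestor) return true;
    current = parentOf.get(current); // undefined once the root is passed
  }
  return false;
}

console.log(isA("SolarSystem", "AstronomicalObject")); // true, via StarSystem
console.log(isA("Planet", "StarSystem"));              // false
```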

The hierarchy uses the terminology found on the Wikipedia page and conveniently listed in the YQL query results. Terms with spaces and underscores have been systematically converted into camel case (ex: Star_system becomes StarSystem). In this simple model, individual planets, stars, dwarf planets, moons, and asteroids will be identified using Xholon role names rather than classes (ex: <Planet roleName="Earth"/>). Thus, Earth will be an instance of Planet, which is a type of AstronomicalObject. The result is a file, ready for use in a Xholon application, that builds on the domain knowledge and consensus of the large Wikipedia community.
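The naming conversion is mechanical enough to script. The following JavaScript is a minimal sketch of it, not the procedure actually used to produce the file. The function names toClassName and roleElement are hypothetical, and dropping parenthetical suffixes such as "(planet)" is my own assumption. Hyphenated titles such as "Trans-Neptunian object" are not handled.

```javascript
// Hypothetical helper: convert a Wikipedia title to a camel case class name.
function toClassName(title) {
  return title
    .replace(/\s*\(.*?\)/g, "")   // assumption: drop "(planet)"-style suffixes
    .split(/[\s_]+/)              // split on spaces and underscores
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join("");
}

// Hypothetical helper: write an individual object as a class with a role name.
function roleElement(className, roleName) {
  return "<" + className + ' roleName="' + roleName + '"/>';
}

console.log(toClassName("Star_system"));             // StarSystem
console.log(toClassName("Small Solar System body")); // SmallSolarSystemBody
console.log(roleElement("Planet", toClassName("Mercury (planet)")));
```

Only the first letter of each word is changed, so a title that already contains capitals (such as "Solar System") keeps them.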

Test the inheritance hierarchy

You can test this inheritance hierarchy using the online Chameleon Xholon app. It requires Java. Chameleon is a Xholon app that does nothing (yet).

Test steps:

  1. When Chameleon has loaded, select File > Open from the menu.
  2. In the tree structure, navigate to Model > InheritanceHierarchy > XholonClass.
  3. Select the above XML text (starting and ending with, and including, <_-.solarsystem> ).
  4. Drag the text to the XholonClass node in the Xholon tree. Or copy and paste the text. To paste something in Xholon, right-click on the node and select Edit > Paste Last Child.
  5. Press the Refresh button.
  6. Navigate again to Model > InheritanceHierarchy > XholonClass. You should see the classes of the Solar System inheritance hierarchy as children of XholonClass.

Right-click on any of these new nodes, and select Search Engine > Search. The corresponding Wikipedia page will load in your browser.

This tutorial started by using Wikipedia to gather information. We then created a Xholon inheritance hierarchy, which we tested by loading it into a running Xholon app. We came full circle, back to Wikipedia, by using the new nodes to query additional Wikipedia pages.

The next tutorial walks through creating the composite structure for a Xholon app based on the classes in the inheritance hierarchy.
