PHP: XML data extraction using regular expressions — page two

This is page two of an article about extracting XML data using regular expressions. Page one was about extracting the data from a single element. Page three is about extracting element attributes. This page is about extracting multiple elements with the same name into an array.

The source code for the functions described in this article can be downloaded on page one.

Matching multiple XML elements

Here is the code for a function I've written which builds an array of all elements in an XML file that have a specified name:

function element_set($element_name, $xml, $content_only = false) {
    if ($xml == false) {
        return false;
    }
    // Escape the element name so characters such as "." are matched literally.
    $name = preg_quote($element_name, '#');
    // The s modifier lets "." match newlines; ".*?" stops at the nearest closing tag.
    $found = preg_match_all('#<'.$name.'(?:\s+[^>]+)?>' .
            '(.*?)</'.$name.'>#s',
            $xml, $matches, PREG_PATTERN_ORDER);
    if ($found != false) {
        if ($content_only) {
            return $matches[1];  // ignore the enclosing tags
        } else {
            return $matches[0];  // return the full pattern match
        }
    }
    // No match found: return false.
    return false;
}

This function works much like value_in, but it returns an array of results instead of a single string. You provide the name of the element to search for, the XML to search through, and a flag that says whether to keep or discard the enclosing tags of each result. If nothing matches, the function returns false.
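
To see why the pattern uses the lazy quantifier (.*?) together with the s modifier, here is a minimal sketch that calls preg_match_all directly on a two-element sample. The sample string and the variable names are made up for illustration; only the shape of the pattern follows element_set.

```php
<?php
// Two sibling elements on separate lines (hypothetical sample).
$xml = "<item>first</item>\n<item>second</item>";

// Lazy (.*?): stops at the nearest closing tag, so each element
// is matched separately.
preg_match_all('#<item>(.*?)</item>#s', $xml, $lazy, PREG_PATTERN_ORDER);

// Greedy (.*): with the s modifier the dot also crosses the newline,
// so a single match runs from the first opening tag to the last closing tag.
preg_match_all('#<item>(.*)</item>#s', $xml, $greedy, PREG_PATTERN_ORDER);

print_r($lazy[0]);    // full matches, tags included
print_r($lazy[1]);    // captured content only
print_r($greedy[0]);  // one oversized match
```

With PREG_PATTERN_ORDER, index 0 of the result holds the full matches and index 1 holds the first captured group, which is why element_set can choose between them with a single flag.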

Example of use

This time consider a different XML sample:

<catalogue version="2.0" gen="TatBase2000">
    <item>
        <name>Steel nails, 10-pack, rusty</name>
        <price>€3,99</price>
        <dimensions>
            <depth>70mm</depth>
            <width>6mm</width>
        </dimensions>
    </item>
    <item>
        <name>Box of vinyl LPs, water damaged</name>
        <price>¥356</price>
        <dimensions>
            <width>40cm</width>
            <height>40cm</height>
            <depth>20cm</depth>
        </dimensions></item>
    <item>
        <name>Crocodile Dundee, Betamax tape</name>
        <price>$1.00</price>
        <dimensions>
            <width>9.5cm</width>
            <height>15.5cm</height>
            <depth>2.5cm</depth>
        </dimensions>
    </item>
    <item>
        <name>Bangers &amp; Mash, cold</name>
        <price>73p</price>
    </item>
</catalogue>

The above XML has more than one item element, so the value_in function is of no use. The element_set function, however, can produce an array that contains each item element in the XML sample. Call element_set like this:

$item_set = element_set('item', $xml);

Now $item_set should be an array of four strings, each holding one complete item element from the XML sample, enclosing tags included. PHP's print_r function can confirm the content of the $item_set array like this:

print_r($item_set);

Array
(
    [0] => <item>
        <name>Steel nails, 10-pack, rusty</name>
        <price>€3,99</price>
        <dimensions>
            <depth>70mm</depth>
            <width>6mm</width>
        </dimensions>
    </item>

    [1] => <item>
        <name>Box of vinyl LPs, water damaged</name>
        <price>¥356</price>
        <dimensions>
            <width>40cm</width>
            <height>40cm</height>
            <depth>20cm</depth>
        </dimensions>
    </item>

    [2] => <item>
        <name>Crocodile Dundee, Betamax tape</name>
        <price>$1.00</price>
        <dimensions>
            <width>9.5cm</width>
            <height>15.5cm</height>
            <depth>2.5cm</depth>
        </dimensions>
    </item>

    [3] => <item>
        <name>Bangers &amp; Mash, cold</name>
        <price>73p</price>
    </item>
)

(I've added newlines and tabs to make the above output more readable.)

By default, element_set preserves the enclosing tags of the named element (which is the opposite behaviour to the value_in function). Pass a value of true as the third parameter to element_set if you want to discard them instead.
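
The difference made by the third parameter can be sketched as follows. The function here is a condensed copy of element_set, with the element name passed through preg_quote as a small precaution. The two-item sample and its id attribute are hypothetical and exist only to keep the output short.

```php
<?php
// Condensed copy of element_set, following the listing in this article.
function element_set($element_name, $xml, $content_only = false) {
    if ($xml == false) {
        return false;
    }
    $name = preg_quote($element_name, '#');
    $found = preg_match_all('#<'.$name.'(?:\s+[^>]+)?>(.*?)</'.$name.'>#s',
            $xml, $matches, PREG_PATTERN_ORDER);
    if ($found != false) {
        return $content_only ? $matches[1] : $matches[0];
    }
    return false;
}

// Hypothetical sample, shortened from the catalogue above.
$xml = '<catalogue><item><price>73p</price></item>' .
       '<item id="2"><price>$1.00</price></item></catalogue>';

$with_tags = element_set('item', $xml);        // default: enclosing tags kept
$content   = element_set('item', $xml, true);  // enclosing tags discarded

print_r($with_tags);
print_r($content);
```

When the tags are discarded, any attributes on the opening tag go with them, so keep the default if you intend to read attributes later.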

Now that we have an array that contains the content of each item element, we can use value_in to pick out the sub-elements we're interested in. If you wanted to display all of the prices one after the other, you could loop through the array using foreach like this:

foreach ($item_set as $item) {
    $name = value_in('name', $item);
    $price = value_in('price', $item);
    echo '<p>Price of '.html_entity_decode($name).
            ' is '.html_entity_decode($price).'</p>';
}

Note that we need to use PHP's html_entity_decode function (available since PHP 4.3.0) to decode any entities in the XML, such as the ampersand entity in "Bangers &amp; Mash". The resulting output looks like this:

<p>Price of Steel nails, 10-pack, rusty is €3,99</p>
<p>Price of Box of vinyl LPs, water damaged is ¥356</p>
<p>Price of Crocodile Dundee, Betamax tape is $1.00</p>
<p>Price of Bangers & Mash, cold is 73p</p>
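
The decoding step can be tried in isolation with the short sketch below. Spelling out the flags and the character set is my assumption (the loop above relies on PHP's defaults), and the final htmlspecialchars call is an optional extra that is not part of the loop: it shows one way to escape the plain text again if you need the ampersand to be valid inside HTML markup.

```php
<?php
// Text as it appears inside the XML, with the ampersand written as an entity.
$raw = 'Bangers &amp; Mash, cold';

// Decode to plain text, with the flags and charset given explicitly.
$plain = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// Optional extra step: escape the plain text again just before
// placing it into HTML markup.
$safe = htmlspecialchars($plain, ENT_QUOTES, 'UTF-8');

echo $plain, "\n";
echo $safe, "\n";
```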

By using element_set to build an array of elements, then looping through that array and using value_in to extract further content where necessary, you can very quickly produce HTML that contains the data you want from an XML file.
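
The same two-step approach can also collect data into an array instead of printing it straight away. The sketch below rests on several assumptions: value_in is a stand-in written for this example (the real function is on page one and may behave differently, in particular when nothing matches), and the sample XML, including the item with no price, is hypothetical.

```php
<?php
// Condensed copy of element_set, following the listing in this article.
function element_set($element_name, $xml, $content_only = false) {
    if ($xml == false) {
        return false;
    }
    $name = preg_quote($element_name, '#');
    $found = preg_match_all('#<'.$name.'(?:\s+[^>]+)?>(.*?)</'.$name.'>#s',
            $xml, $matches, PREG_PATTERN_ORDER);
    if ($found != false) {
        return $content_only ? $matches[1] : $matches[0];
    }
    return false;
}

// ASSUMPTION: stand-in for value_in from page one, not the original code.
// Returns the content of the first matching element, or false if none.
function value_in($element_name, $xml) {
    $set = element_set($element_name, $xml, true);
    return $set === false ? false : $set[0];
}

// Hypothetical sample: the second item has no price element.
$xml = '<catalogue>' .
       '<item><name>Bangers &amp; Mash, cold</name><price>73p</price></item>' .
       '<item><name>Mystery box</name></item>' .
       '</catalogue>';

$prices = array();
$items = element_set('item', $xml);
if ($items !== false) {
    foreach ($items as $item) {
        $name  = value_in('name', $item);
        $price = value_in('price', $item);
        if ($name === false || $price === false) {
            continue;  // skip items that lack either sub-element
        }
        $prices[html_entity_decode($name)] = html_entity_decode($price);
    }
}
print_r($prices);
```

Checking each result against false before using it keeps incomplete items out of the array, which matters for samples like the catalogue above where not every item carries the same sub-elements.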