Colon In XML/RSS Messing Up PHP’s SimpleXML

I recently used PHP’s SimpleXML to parse a blog’s RSS feed. The parsing worked great, and it was simple and clean. The only problem was that XML nodes with a colon in their names seemed to be discarded by SimpleXML. Example:

<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
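
To make the symptom concrete, here is a minimal sketch that reproduces it. The cut-down feed and the variable names are made up for illustration; only the two sy: elements come from the real feed.

```php
<?php
// Hypothetical, cut-down feed containing the two prefixed elements shown above.
$feed = <<<XML
<rss version="2.0" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
  <channel>
    <title>Example Blog</title>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);
$channel = $xml->channel;

// Unprefixed elements are reachable as ordinary properties.
echo (string) $channel->title, "\n"; // Example Blog

// The prefixed element is not reachable the same way.
var_dump(isset($channel->updatePeriod)); // bool(false)
```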

I’m not sure whether this is a bug or intended behavior, but it was causing me grief. I haven’t found anyone else on the web complaining about it, so my own solution was to remove the colons and traverse the feed accordingly. I wrote a function that takes the feed as a string and returns the string with all colons inside tag names removed.

function removeColonsFromRSS($feed) {
    // Pull the colon out of prefixed tag names,
    // e.g. <sy:updatePeriod> becomes <syupdatePeriod>.
    // <\/? matches both start tags and end tags. The pattern stops
    // right after the name, so tags that carry attributes
    // (e.g. <atom:link href="...">) are handled too.
    $pattern = '/(<\/?[\w.-]+):([\w.-]+)/';
    $replacement = '$1$2';
    return preg_replace($pattern, $replacement, $feed);
}
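
Here is a rough usage sketch, assuming a made-up, cut-down feed. The function is repeated in condensed form only so the snippet runs on its own. For comparison, the last part reads the same value without touching the feed: the part before the colon is an XML namespace prefix, and SimpleXML's children() method accepts a prefix when its second argument is true. I'm treating that as an alternative worth trying, not as a drop-in replacement for the function above.

```php
<?php
// Condensed stand-in for the function above, included so this sketch is self-contained.
function removeColonsFromRSS($feed) {
    $feed = preg_replace('/(<\w+):(\w+>)/', '$1$2', $feed);
    return preg_replace('/(<\/\w+):(\w+>)/', '$1$2', $feed);
}

// Hypothetical, cut-down feed for illustration.
$feed = <<<XML
<rss version="2.0" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
  <channel>
    <title>Example Blog</title>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
  </channel>
</rss>
XML;

// Workaround from this post: strip the colons, then traverse as usual.
$clean = simplexml_load_string(removeColonsFromRSS($feed));
$period = (string) $clean->channel->syupdatePeriod;
echo $period, "\n"; // hourly

// Alternative: leave the feed alone and ask for the children under the "sy" prefix.
$raw = simplexml_load_string($feed);
$sy = $raw->channel->children('sy', true);
$periodViaNamespace = (string) $sy->updatePeriod;
echo $periodViaNamespace, "\n"; // hourly
```

Note that after stripping the colon the element name becomes syupdatePeriod, so the prefix ends up glued onto the property name you traverse with.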