Macnyt RSS proxy
From WiKim
Introduction
Is it possible to create an RSS proxy for the forum on http://www.macnyt.dk?
Macnyt doesn't offer an RSS feed for its forum, only for news. It would be nice to have an RSS feed for the new submissions to the forum on this URL: http://www.macnyt.dk/forum/?page=new_submissions
Status
Beta released at http://www.kika.dk/macnyt/macnytforum.php. --Kim Bach 07:56, 11 June 2006 (CEST)
Well, the sysop at macnyt.dk didn't really like what I did, and you can't really blame him.
If you inspect the code, you can see that it shouldn't put a heavy strain on the server, since it simply does an HTTP GET of the HTML; the rest of the processing is done in PHP. And yes, I know that it could be implemented more efficiently, but heck, it's my first PHP project created from scratch. The sysop also claims that scraping is illegal. Well, I don't think so; it's actually a kind of deep linking, and there is precedent for the legality of that.
My code will remain up (for now), but I urge you to host it locally and to use it sparingly. I must admit that I like it myself; it has increased the usability of macnyt.dk quite a bit for me.
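If you host the script yourself, one way to keep the load on macnyt.dk down is to cache the fetched page for a few minutes, so that many feed-reader polls turn into at most one HTTP GET per interval. This is a sketch and not part of macnytforum.php: `fetch_cached`, the cache file path and the TTL are hypothetical, and the fetcher is passed in as a callable so the caching logic can be tried without network access.

```php
<?php
// Hypothetical helper (not in macnytforum.php): return a cached copy of $url
// if it is younger than $ttl seconds, otherwise fetch it once via $fetch.
function fetch_cached(string $url, string $cache_file, int $ttl, callable $fetch): ?string
{
    $fresh = is_file($cache_file) && (time() - filemtime($cache_file)) < $ttl;
    if ($fresh) {
        $cached = file_get_contents($cache_file);
        if ($cached !== false) {
            return $cached;
        }
    }
    // Cache missing, stale or unreadable: do a single HTTP GET.
    $content = $fetch($url);
    if (is_string($content)) {
        file_put_contents($cache_file, $content);
        return $content;
    }
    // The fetch failed: fall back to a stale copy if one exists.
    if (is_file($cache_file)) {
        $stale = file_get_contents($cache_file);
        return $stale === false ? null : $stale;
    }
    return null;
}
```

In the script you could then do the download with something like `$content = fetch_cached($macnyt_forum_url, "/tmp/macnyt_forum.html", 300, 'file_get_contents');`, where a `null` result means no page was available at all.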
I hope for an official forum RSS feed, but I doubt that it will follow the standards; the current news feed doesn't work with Firefox. --Kim Bach 22:55, 11 June 2006 (CEST)
The current news feed has been fixed, but there is still no official forum RSS feed. I did fix a bug: after a reinstall of the Macnyt server the code broke because of an extra w in the URLs. This worked before the update, but I guess that the * alias has been removed. --Kim Bach 11:12, 4 September 2006 (CEST)
Changelog
- Version 0.1.1 No new features, just some clean-up. --Kim Bach 05:49, 18 June 2006 (CEST)
Analysis
Reverse engineering of the Macnyt New Submissions page shows that it is quite simple to create a scraper and to implement it in PHP. Basically, the scraper looks for these strings:
<td class="forum_text">
<td class="forum_headline"><B><a href="
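The headline marker is enough to pick a post's link and title out of a line. The script does this by counting double quotes and `>` characters; the sketch below does the same job with `stripos` and one `preg_match`. `parse_headline` is a hypothetical helper, and it assumes the href is site-relative and that the anchor text contains no further markup.

```php
<?php
// Headline marker found by inspecting the New Submissions page.
const POST_LINK_PREFIX = '<td class="forum_headline"><B><a href="';

// Returns [link, title] for a headline line, or null if the line is not one.
function parse_headline(string $line, string $site = 'http://www.macnyt.dk'): ?array
{
    $start = stripos($line, POST_LINK_PREFIX); // case-insensitive, like eregi
    if ($start === false) {
        return null;
    }
    $rest = substr($line, $start + strlen(POST_LINK_PREFIX));
    // The href runs up to the next double quote; the title up to the next '<'.
    if (!preg_match('/^([^"]*)"[^>]*>([^<]*)</', $rest, $m)) {
        return null;
    }
    return [$site . $m[1], $m[2]];
}
```

A line containing `<td class="forum_text">` instead returns `null`, which is how the scraper can tell headline lines from description lines.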
Usage
The script is hosted at http://www.kika.dk/macnyt/macnytforum.php; add it manually to your feed reader. Please use it sparingly, so that we don't anger the sysop.
Code
Below is the code (macnytforum.php):
<?php
# Macnyt Danmark Forum RSS feed converter.
#
# Last update:
# 2006-09-04 KB Version 0.1.2 fixed bug in URL, had 4 w's instead of 3! And this broke
# when a new server went online
#
# Copyright Kim Bach, kim(dot)bach(at)gmail.com
#
# Project homepage: http://www.kimbach.org/wiki/index.php/Macnyt RSS proxy
# Version 0.1.2, 04 September 2006
#
# Revision history:
# Date Init Description
# 2006-06-09 KB Created
# 2006-06-11 KB Version 0.1.0 first beta
# 2006-06-18 KB Version 0.1.1 clean up
# 2006-09-04 KB Version 0.1.2 fixed bug in URL, had 4 w's instead of 3! And this broke
# when a new server went online
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
# http://www.gnu.org/copyleft/gpl.html
$page_title = "macnyt";
$meta_descr = "n/a";
$meta_keywd = "n/a";
$macnyt_forum_url = "http://www.macnyt.dk/forum/?page=new_submissions";
$post_link_prefix = "<td class=\"forum_headline\"><B><a href=\"";
$forum_link_prefix = "http://www.macnyt.dk";
$post_description_prefix = "<td class=\"forum_text\">";

// Fetch the New Submissions page with a single HTTP GET (needs allow_url_fopen).
$content = @file_get_contents($macnyt_forum_url);
if ($content !== false) {
    $lines = preg_split("/\r?\n|\r/", $content); // turn the content into lines
    $is_title = false;
    $is_descr = false;
    $is_keywd = false;
    $has_header = false;
    $is_headline = false; // true between an item's headline and its description
    header("Content-Type: text/xml; charset=iso-8859-1");
    echo("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>");
    echo("<!DOCTYPE rss [<!ENTITY % HTMLlat1 PUBLIC \"-//W3C//ENTITIES Latin 1 for XHTML//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent\">]>");
    echo("<rss version=\"2.0\" xml:base=\"http://www.kika.dk/macnyt\">");
    echo("<channel>");
    foreach ($lines as $val) {
        // Collect the page title and meta tags for the channel header.
        if (preg_match('/<title>(.*)<\/title>/i', $val, $title)) {
            $page_title = $title[1];
            $is_title = true;
        }
        if (preg_match('/<meta name="description" content="(.*)"(\s?\/)?>/i', $val, $descr)) {
            $meta_descr = $descr[1];
            $is_descr = true;
        }
        if (preg_match('/<meta name="keywords" content="(.*)"(\s?\/)?>/i', $val, $keywd)) {
            $meta_keywd = $keywd[1];
            $is_keywd = true;
        }
        // Write the channel header once, after all three have been seen.
        if ($is_title && $is_descr && $is_keywd && !$has_header) {
            echo("<title>" . $page_title . "</title>");
            echo("<link>" . $macnyt_forum_url . "</link>");
            echo("<description>" . $meta_descr . "</description>");
            echo("<language>da</language>");
            $has_header = true;
        }
        // Headline line: the link is the href value after the prefix (up to the
        // closing quote), the title is the anchor text (up to the next '<').
        $pos = stripos($val, $post_link_prefix);
        if (!$is_headline && $pos !== false) {
            $rest = substr($val, $pos + strlen($post_link_prefix));
            if (preg_match('/^([^"]*)"[^>]*>([^<]*)</', $rest, $m)) {
                $forum_link = $forum_link_prefix . $m[1];
                $forum_title = $m[2];
                $is_headline = true;
                echo("<item>");
                echo("<title>" . $forum_title . "</title>");
                echo("<link><![CDATA[" . $forum_link . "]]></link>");
            }
        }
        // Description line: the whole line, markup included, becomes the item text.
        if ($is_headline && stripos($val, $post_description_prefix) !== false) {
            // Split any "]]>" so it cannot end the CDATA section early.
            $text = str_replace("]]>", "]]]]><![CDATA[>", $val);
            echo("<description><![CDATA[" . $text . "]]></description>");
            //echo("<category domain=\"http://macwiki.kimbach.org/portal/?q=taxonomy/term/5\">Samarbejdspartnere</category>");
            //echo("<pubDate>Fri, 21 Apr 2006 03:10:09 +0200</pubDate>");
            echo("</item>");
            $is_headline = false;
        }
    }
    // Close a pending item if the page ended before its description.
    if ($is_headline) {
        echo("</item>");
    }
    echo("</channel>");
    echo("</rss>");
}
?>
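Matching fixed strings line by line breaks as soon as Macnyt changes its markup or line breaks. A more tolerant alternative is to parse the page with PHP's DOM extension and query it with XPath. This is a sketch, not what macnytforum.php does: `extract_posts` is a hypothetical helper, and it assumes each headline is an `<a href>` inside `<td class="forum_headline">`, followed in the page by a `<td class="forum_text">` cell holding the post text.

```php
<?php
// Hypothetical alternative parser: find posts with DOMXPath instead of
// scanning raw lines for fixed strings.
function extract_posts(string $html, string $site = 'http://www.macnyt.dk'): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML rejects empty input
    }
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $posts = [];
    foreach ($xpath->query('//td[@class="forum_headline"]//a[@href]') as $a) {
        // The first forum_text cell after this link is taken as the post text.
        $text = $xpath->query('following::td[@class="forum_text"][1]', $a)->item(0);
        $posts[] = [
            'title' => trim($a->textContent),
            'link' => $site . $a->getAttribute('href'),
            'description' => $text ? trim($text->textContent) : '',
        ];
    }
    return $posts;
}
```

The pairing of headline and text is only as good as that assumption: if a post had no text cell of its own, the next post's text would be picked up instead.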

