<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: How to create a mashup and not die trying&#8230;</title>
	<atom:link href="http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/feed/" rel="self" type="application/rss+xml" />
	<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/</link>
	<description>Random musings about nothing in particular</description>
	<pubDate>Mon, 06 Oct 2008 21:18:52 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: Tim</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-1202</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Fri, 01 Feb 2008 01:12:39 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-1202</guid>
		<description>“What is the best way to get information from Internet automatically?”

I have had good success with the free iMacros for Firefox extension:

https://addons.mozilla.org/en-US/firefox/addon/3863

http://wiki.imacros.net/Data_Extraction

Tim</description>
		<content:encoded><![CDATA[<p>“What is the best way to get information from Internet automatically?”</p>
<p>I have had good success with the free iMacros for Firefox extension:</p>
<p><a href="https://addons.mozilla.org/en-US/firefox/addon/3863" rel="nofollow">https://addons.mozilla.org/en-US/firefox/addon/3863</a></p>
<p><a href="http://wiki.imacros.net/Data_Extraction" rel="nofollow">http://wiki.imacros.net/Data_Extraction</a></p>
<p>Tim</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris_C</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-505</link>
		<dc:creator>Chris_C</dc:creator>
		<pubDate>Tue, 11 Dec 2007 06:15:18 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-505</guid>
		<description>Nice work, Franchu.  We are well aware of the Mac client issue and will be addressing it.  Timing is uncertain at this point.

Additionally, you can create a REST robot on openkapow that can return data to you in multiple formats without rewriting the robot.  Formats include:
- REST (XML)
- JSON
- CSV
- HTML
- XHTML</description>
		<content:encoded><![CDATA[<p>Nice work, Franchu.  We are well aware of the Mac client issue and will be addressing it.  Timing is uncertain at this point.</p>
<p>Additionally, you can create a REST robot on openkapow that can return data to you in multiple formats without rewriting the robot.  Formats include:<br />
- REST (XML)<br />
- JSON<br />
- CSV<br />
- HTML<br />
- XHTML</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Franchu</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-140</link>
		<dc:creator>Franchu</dc:creator>
		<pubDate>Fri, 15 Jun 2007 10:23:00 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-140</guid>
		<description>Hi!

Well, I am afraid there is no easy answer to your question.

In an ideal world, the page would be written in XHTML or would follow some accessibility standards, and it would be easy to extract semantic content from it.

In that case, you would need an XML parser, and using XPath you could extract the node of the document that contains the information you are interested in.

Otherwise, you would have to parse the page as a string, looking for the delimiters of the content you want to extract.

The only thing I can do is suggest that you search for more information on web scraping (that's the technical name for what you are trying to do). Keep in mind that the only thing HttpClient does is retrieve the web page and give it to you as a Stream or as a String. After that point, it is up to you to parse the content with web scraping techniques.

I hope this reply helps put you on track towards the answer :)</description>
		<content:encoded><![CDATA[<p>Hi!</p>
<p>Well, I am afraid there is no easy answer to your question.</p>
<p>In an ideal world, the page would be written in XHTML or would follow some accessibility standards, and it would be easy to extract semantic content from it.</p>
<p>In that case, you would need an XML parser, and using XPath you could extract the node of the document that contains the information you are interested in.</p>
<p>Otherwise, you would have to parse the page as a string, looking for the delimiters of the content you want to extract.</p>
<p>The only thing I can do is suggest that you search for more information on web scraping (that&#8217;s the technical name for what you are trying to do). Keep in mind that the only thing HttpClient does is retrieve the web page and give it to you as a Stream or as a String. After that point, it is up to you to parse the content with web scraping techniques.</p>
<p>I hope this reply helps put you on track towards the answer <img src='http://franchu.net/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: discoverall</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-139</link>
		<dc:creator>discoverall</dc:creator>
		<pubDate>Fri, 15 Jun 2007 04:53:59 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-139</guid>
		<description>Hello Franchu,
I asked the question "What is the best way to get information from Internet automatically?" on the free J2EE course...
Thank you for your writing.
Now I have one more question about this.
I have used HttpClient from http://jakarta.apache.org/commons/httpclient/
and I can download a web page, but I would like to get only part of that page.
Example: I have a link to a news article on another website, but I want to get only the article, not the banners, advertisements...
How can I do that?
Thank you very much.</description>
		<content:encoded><![CDATA[<p>Hello Franchu,<br />
I asked the question &#8220;What is the best way to get information from Internet automatically?&#8221; on the free J2EE course&#8230;<br />
Thank you for your writing.<br />
Now I have one more question about this.<br />
I have used HttpClient from <a href="http://jakarta.apache.org/commons/httpclient/" rel="nofollow">http://jakarta.apache.org/commons/httpclient/</a><br />
and I can download a web page, but I would like to get only part of that page.<br />
Example: I have a link to a news article on another website, but I want to get only the article, not the banners, advertisements&#8230;<br />
How can I do that?<br />
Thank you very much.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
