<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How to create a mashup and not die trying&#8230;</title>
	<atom:link href="http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/feed/" rel="self" type="application/rss+xml" />
	<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/</link>
	<description>Random musings about nothing</description>
	<lastBuildDate>Wed, 10 Mar 2010 11:40:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Rosario Lidke</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/comment-page-1/#comment-29472</link>
		<dc:creator>Rosario Lidke</dc:creator>
		<pubDate>Wed, 10 Mar 2010 11:40:08 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-29472</guid>
		<description>regards !! incredibly good post</description>
		<content:encoded><![CDATA[<p>regards !! incredibly good post</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Franchu</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/comment-page-1/#comment-26217</link>
		<dc:creator>Franchu</dc:creator>
		<pubDate>Mon, 14 Dec 2009 17:49:50 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-26217</guid>
		<description>I have just received an email from OpenKapow saying that they have shut down their free servers. Much of what is described in this post remains valid as a concept, even if the tools themselves have changed. The email reads:

Dear openkapow user,

The free openkapow servers have now been stopped and instead we have
launched the new openkapow discussion forum on openkapow.com.

At the same time we have lowered the cost for running robots on the
new pay-as-you-go StrikeIron service on
http://cts.vresp.com/c/?KapowTechnologies/d2e6342b3a/c5c156124f/b4af4c0fa2
and we hope these new prices will be attractive and cost-effective for
most of you. We have at the same time upgraded StrikeIron to run our
latest release 7.1 with improved usability and support for flash and
complex Web sites based on Google Web Toolkit and other AJAX toolkits.


We encourage you to sign up for a free trial and receive 1000 free
StrikeIron hits to start with. Sign up for your free trial now at
http://cts.vresp.com/c/?KapowTechnologies/d2e6342b3a/c5c156124f/25de14810c.


Best regards

Stefan Andreasen
CTO &amp; Founder
Kapow Technologies</description>
		<content:encoded><![CDATA[<p>I have just received an email from OpenKapow saying that they have shut down their free servers. Much of what is described in this post remains valid as a concept, even if the tools themselves have changed. The email reads:</p>
<p>Dear openkapow user,</p>
<p>The free openkapow servers have now been stopped and instead we have<br />
launched the new openkapow discussion forum on openkapow.com.</p>
<p>At the same time we have lowered the cost for running robots on the<br />
new pay-as-you-go StrikeIron service on<br />
<a href="http://cts.vresp.com/c/?KapowTechnologies/d2e6342b3a/c5c156124f/b4af4c0fa2" rel="nofollow">http://cts.vresp.com/c/?KapowTechnologies/d2e6342b3a/c5c156124f/b4af4c0fa2</a><br />
and we hope these new prices will be attractive and cost-effective for<br />
most of you. We have at the same time upgraded StrikeIron to run our<br />
latest release 7.1 with improved usability and support for flash and<br />
complex Web sites based on Google Web Toolkit and other AJAX toolkits.</p>
<p>We encourage you to sign up for a free trial and receive 1000 free<br />
StrikeIron hits to start with. Sign up for your free trial now at<br />
<a href="http://cts.vresp.com/c/?KapowTechnologies/d2e6342b3a/c5c156124f/25de14810c" rel="nofollow">http://cts.vresp.com/c/?KapowTechnologies/d2e6342b3a/c5c156124f/25de14810c</a>.</p>
<p>Best regards</p>
<p>Stefan Andreasen<br />
CTO &#038; Founder<br />
Kapow Technologies</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/comment-page-1/#comment-72</link>
		<dc:creator>Tim</dc:creator>
		<pubDate>Fri, 01 Feb 2008 01:12:39 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-72</guid>
		<description>“What is the best way to get information from Internet automatically?”

I have had good success with the free iMacros for Firefox extension:

https://addons.mozilla.org/en-US/firefox/addon/3863

http://wiki.imacros.net/Data_Extraction

Tim</description>
		<content:encoded><![CDATA[<p>“What is the best way to get information from Internet automatically?”</p>
<p>I have had good success with the free iMacros for Firefox extension:</p>
<p><a href="https://addons.mozilla.org/en-US/firefox/addon/3863" rel="nofollow">https://addons.mozilla.org/en-US/firefox/addon/3863</a></p>
<p><a href="http://wiki.imacros.net/Data_Extraction" rel="nofollow">http://wiki.imacros.net/Data_Extraction</a></p>
<p>Tim</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris_C</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/comment-page-1/#comment-73</link>
		<dc:creator>Chris_C</dc:creator>
		<pubDate>Tue, 11 Dec 2007 06:15:18 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-73</guid>
		<description>Nice work, Franchu. We are well aware of the Mac client issue and will be addressing it, although the timing is uncertain at this point.

Additionally, you can create a REST robot on openkapow that can return data to you in multiple formats without rewriting the robot.  Formats include:
- REST (XML)
- JSON
- CSV
- HTML
- XHTML</description>
		<content:encoded><![CDATA[<p>Nice work, Franchu. We are well aware of the Mac client issue and will be addressing it, although the timing is uncertain at this point.</p>
<p>Additionally, you can create a REST robot on openkapow that can return data to you in multiple formats without rewriting the robot.  Formats include:<br />
- REST (XML)<br />
- JSON<br />
- CSV<br />
- HTML<br />
- XHTML</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Franchu</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/comment-page-1/#comment-71</link>
		<dc:creator>Franchu</dc:creator>
		<pubDate>Fri, 15 Jun 2007 10:23:00 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-71</guid>
		<description>Hi!

Well, I am afraid there is no easy answer to your question.

In an ideal world, the page would be written in XHTML or follow some accessibility standards, and it would be easy to extract semantic content from it.

In that case you would need an XML parser and, using XPath, you could extract the node of the document that contains the information you are interested in.

Otherwise, you would have to parse the string, looking for the delimiters of the content you want to extract.

The only thing I can do is suggest that you search for more information on web scraping (that&#039;s the technical name for what you are trying to do). Keep in mind that the only thing HttpClient does is retrieve the web page and give it to you as a Stream or as a String. After that point, it is up to you to parse the content with web scraping techniques.

Hope this reply was useful to put you on track towards the answer :)</description>
		<content:encoded><![CDATA[<p>Hi!</p>
<p>Well, I am afraid there is no easy answer to your question.</p>
<p>In an ideal world, the page would be written in XHTML or follow some accessibility standards, and it would be easy to extract semantic content from it.</p>
<p>In that case you would need an XML parser and, using XPath, you could extract the node of the document that contains the information you are interested in.</p>
<p>Otherwise, you would have to parse the string, looking for the delimiters of the content you want to extract.</p>
<p>The only thing I can do is suggest that you search for more information on web scraping (that&#8217;s the technical name for what you are trying to do). Keep in mind that the only thing HttpClient does is retrieve the web page and give it to you as a Stream or as a String. After that point, it is up to you to parse the content with web scraping techniques.</p>
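<p>As a rough illustration, here is a minimal sketch of both approaches in Java, using only the standard library. It assumes the page has already been downloaded into a String. The class name NewsExtractor, the method names extractWithXPath and extractBetween, and the sample page with a div whose id is "news" are all hypothetical, made up for this example; a real site will need its own XPath expression or delimiters.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class NewsExtractor {

    // Well-formed (XHTML-like) page: parse it as XML and select the node with XPath.
    // Returns the text content of the first node matching the expression.
    static String extractWithXPath(String xhtml, String expression) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc).trim();
        } catch (Exception e) {
            // Parsing fails here if the page is not well-formed XML.
            throw new IllegalArgumentException("Could not parse page as XML", e);
        }
    }

    // Arbitrary HTML: look for the delimiters that surround the wanted content.
    // Returns null if either delimiter is not found.
    static String extractBetween(String html, String startDelimiter, String endDelimiter) {
        int start = html.indexOf(startDelimiter);
        if (start < 0) {
            return null;
        }
        start += startDelimiter.length();
        int end = html.indexOf(endDelimiter, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Made-up sample page: a banner we do not want and the news we do want.
        String page = "<html><body><div id=\"banner\">Ad</div>"
                + "<div id=\"news\">Headline text</div></body></html>";
        System.out.println(extractWithXPath(page, "//div[@id='news']"));
        System.out.println(extractBetween(page, "<div id=\"news\">", "</div>"));
    }
}
```

<p>Both calls in main print "Headline text" for the sample page. The XPath version only works when the page is well-formed, and a real XHTML page may need extra handling (namespaces, DOCTYPE resolution) that this sketch leaves out. The delimiter version works on any string but breaks as soon as the site changes its markup.</p>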
<p>Hope this reply was useful to put you on track towards the answer <img src='http://franchu.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: discoverall</title>
		<link>http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/comment-page-1/#comment-70</link>
		<dc:creator>discoverall</dc:creator>
		<pubDate>Fri, 15 Jun 2007 04:53:59 +0000</pubDate>
		<guid isPermaLink="false">http://franchu.net/2007/06/05/how-to-create-a-mashup-and-not-die-trying/#comment-70</guid>
		<description>Hello Franchu,
I asked the question &quot;What is the best way to get information from Internet automatically?&quot; on the free J2EE course...
Thank you for your writing.
Now I have one more question about this.
I have used HttpClient from http://jakarta.apache.org/commons/httpclient/
and I can download a web page, but I would like to get only some part of that page.
Example: I have a link to a news story on another website, but I want to get only the news, not the banner, advertising...
How can I do that?
Thank you very much.</description>
		<content:encoded><![CDATA[<p>Hello Franchu,<br />
I asked the question &#8220;What is the best way to get information from Internet automatically?&#8221; on the free J2EE course&#8230;<br />
Thank you for your writing.<br />
Now I have one more question about this.<br />
I have used HttpClient from <a href="http://jakarta.apache.org/commons/httpclient/" rel="nofollow">http://jakarta.apache.org/commons/httpclient/</a><br />
and I can download a web page, but I would like to get only some part of that page.<br />
Example: I have a link to a news story on another website, but I want to get only the news, not the banner, advertising&#8230;<br />
How can I do that?<br />
Thank you very much.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
