Scraping Html From Google Translate
Solution 1:
Basic example using HTML Agility Pack
using System;
using HtmlAgilityPack;

class Translator
{
    private string url;
    private HtmlWeb web;
    private HtmlDocument htmlDoc;

    // langPair = "SL|TL" (source language | target language), e.g. "en|pt"
    public Translator(string langPair)
    {
        this.url = "http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair=" + langPair;
        this.web = new HtmlWeb();
        this.htmlDoc = new HtmlDocument();
    }

    public string Translate(string input)
    {
        // EscapeDataString also encodes '&', '=' and '#', which would otherwise break the query string.
        this.htmlDoc = web.Load(String.Format(this.url, Uri.EscapeDataString(input)));
        HtmlNode htmlNode = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"result_box\"]");
        // SelectSingleNode returns null when the page has no result_box element.
        return htmlNode != null ? htmlNode.InnerText : string.Empty;
    }
}
What's wrong in your example is just the URL you used. Try inspecting the document's Text property to see the HTML that webGet received; you will see that the result_box span is empty.
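To confirm whether a downloaded page actually contains a translation, the same XPath can be run over the raw markup. This is only a minimal sketch: ResultBoxProbe and HasTranslation are hypothetical names, not part of HTML Agility Pack, and the markup passed in is whatever your request returned.

```csharp
using System;
using HtmlAgilityPack;

static class ResultBoxProbe
{
    // Hypothetical helper: true only when the markup has a non-empty element with id "result_box".
    public static bool HasTranslation(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='result_box']");
        return node != null && !string.IsNullOrWhiteSpace(node.InnerText);
    }
}
```

For example, ResultBoxProbe.HasTranslation(web.Load(url).DocumentNode.OuterHtml) returns false when the span came back empty.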
Solution 2:
Rather than relying on screen-scraping, consider using the API that Google makes available for the translate service.
Some documentation can be found here
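As a rough illustration of the API route, here is a sketch that assumes the Cloud Translation Basic (v2) REST endpoint and an API key of your own; which API the link above pointed to is not certain, so check the current documentation before relying on this. TranslateApiSketch and its method names are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

static class TranslateApiSketch
{
    // Assumption: the Cloud Translation Basic (v2) REST endpoint.
    const string Endpoint = "https://translation.googleapis.com/language/translate/v2";

    // Builds the request URL; every query value is percent-encoded.
    public static string BuildUrl(string apiKey, string text, string source, string target)
    {
        return Endpoint
            + "?key=" + Uri.EscapeDataString(apiKey)
            + "&q=" + Uri.EscapeDataString(text)
            + "&source=" + Uri.EscapeDataString(source)
            + "&target=" + Uri.EscapeDataString(target)
            + "&format=text";
    }

    // Reads data.translations[0].translatedText from the JSON response body.
    public static string ParseTranslation(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement
            .GetProperty("data")
            .GetProperty("translations")[0]
            .GetProperty("translatedText")
            .GetString();
    }

    // Sends the GET request and returns the translated text.
    public static async Task<string> TranslateAsync(
        HttpClient http, string apiKey, string text, string source, string target)
    {
        string json = await http.GetStringAsync(BuildUrl(apiKey, text, source, target));
        return ParseTranslation(json);
    }
}
```

Keeping the URL building and JSON parsing separate from the network call means both can be checked without an API key; only TranslateAsync touches the network.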
Update:
I believe the problem with the screen-scraping approach may be that the translate application uses Ajax to call the server and retrieve the translation. The page you get when downloading with HtmlWeb is merely the JS application; it doesn't actually contain the translation. That doesn't get filled in until after the page has made a call to the server.