
Scraping Html From Google Translate

I want to translate a string using Google Translator. My sample string is 'this is my string'. I want to use HTML Agility Pack to parse HTML documents. I tried this: using HtmlAgil

Solution 1:

Basic example using HTML Agility Pack

using System;
using HtmlAgilityPack;

class Translator
{
    private string url;
    private HtmlWeb web;
    private HtmlDocument htmlDoc;

    // langPair = "SL|TL" (source language | target language), e.g. "en|pt"
    public Translator(string langPair)
    {
        this.url = "http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair=" + langPair;
        this.web = new HtmlWeb();
        this.htmlDoc = new HtmlDocument();
    }

    public string Translate(string input)
    {
        // EscapeDataString also escapes '&' and '=', which would otherwise break the query string
        this.htmlDoc = web.Load(String.Format(this.url, Uri.EscapeDataString(input)));
        HtmlNode htmlNode = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=\"result_box\"]");
        // SelectSingleNode returns null when no element matches
        return htmlNode != null ? htmlNode.InnerText : null;
    }
}

What's wrong in your example: just the URL you used. Try inspecting the document.Text property to see the HTML received from webGet; you will see that the result_box span is empty.
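The check described above can be sketched as a small helper. This is only an illustration: the class name ResultBoxProbe and the sample markup in the usage note are made up for this example, and the real page returned by Google may look quite different.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper (not part of HTML Agility Pack) that reports whether
// the downloaded markup actually contains a translation.
static class ResultBoxProbe
{
    // Returns true when the element with id="result_box" is missing
    // or contains no text.
    public static bool IsResultBoxEmpty(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='result_box']");
        return node == null || string.IsNullOrWhiteSpace(node.InnerText);
    }
}
```

Against a live page you would pass in the markup that HtmlWeb downloaded, for example new HtmlWeb().Load(url).DocumentNode.OuterHtml. With a stand-in string such as "<span id=\"result_box\"></span>" the helper returns true, which is the symptom described above.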

Solution 2:

Rather than relying on screen-scraping, you should consider using the API that Google makes available for the translate service.

Some documentation can be found here
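As a rough sketch of what calling an API instead of scraping might look like, the code below assumes the Cloud Translation Basic (v2) REST endpoint and an API key. The answer does not say which API version it had in mind, so the endpoint, the query parameters, and the response shape used here are assumptions to check against the current documentation. The class and method names are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical wrapper; assumes the Cloud Translation Basic (v2) REST API.
static class TranslateApiSketch
{
    // Assumed endpoint; verify against the official documentation.
    const string Endpoint = "https://translation.googleapis.com/language/translate/v2";

    // Builds the GET request URI, escaping every query value.
    public static string BuildRequestUri(string apiKey, string text, string source, string target)
    {
        return Endpoint
            + "?key=" + Uri.EscapeDataString(apiKey)
            + "&q=" + Uri.EscapeDataString(text)
            + "&source=" + Uri.EscapeDataString(source)
            + "&target=" + Uri.EscapeDataString(target)
            + "&format=text";
    }

    // Extracts the first translation, assuming a response shaped like
    // {"data":{"translations":[{"translatedText":"..."}]}}.
    public static string ParseTranslatedText(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement
            .GetProperty("data")
            .GetProperty("translations")[0]
            .GetProperty("translatedText")
            .GetString();
    }

    // Sends the request and returns the translated text.
    public static async Task<string> TranslateAsync(
        HttpClient http, string apiKey, string text, string source, string target)
    {
        string json = await http.GetStringAsync(BuildRequestUri(apiKey, text, source, target));
        return ParseTranslatedText(json);
    }
}
```

A call would look like await TranslateApiSketch.TranslateAsync(new HttpClient(), apiKey, "this is my string", "en", "pt"), where apiKey is a key you have created yourself. Unlike the scraped page, the JSON response carries the translation directly, so no HTML parsing is needed.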

Update:

I believe the problem with the screen-scraping approach may be that the translate application uses Ajax to call the server and retrieve the translation. The page you get when downloading with HtmlWeb is merely the JavaScript application; it doesn't actually contain the translation. The result is not filled in until the page itself has made a call to the server.
