SEO for AngularJS AJAX-loaded content


How to detect Google & other search engine crawlers, bots and spiders



AngularJS and other modern JavaScript frameworks use AJAX calls to the server in order to fill their views/templates dynamically with JSON data returned from the server.

When a browser receives content from a web server, it executes all the scripts and renders the web page, making all the content visible to a human. But a web crawler can't execute all those scripts, so most of the content is not visible to it. If your website's content is not visible to web crawlers, then your site does not exist on the Internet. Because AJAX applications generate content dynamically, to make crawlers see your content you have to create prerendered HTML snapshots and serve them to the crawler.

In order to make your AJAX application crawlable, your site has to follow the AJAX crawling scheme and let the crawler know that it does so.
To achieve that you have to use a hashbang (#!) in your URLs: the hash fragment of the URL must start with an exclamation mark (!). For example:


www.yoursite.com/index.php/#!/some-hash-fragment-part

If you are using AngularJS, use the following code in the app's route configuration:

$locationProvider.hashPrefix('!');

If you are using HTML5 mode, then you have to include the following meta tag in your website's head section:
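
<meta name="fragment" content="!">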

For AngularJS, you have to enable HTML5 mode; the line of code below, placed in the app's route configuration, will do that.

$locationProvider.html5Mode(true);

And in the head section you have to set the site's base reference (normally the site root) by adding the tag:
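
<base href="/">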

The steps mentioned above alert the search engine crawler that the site is using the AJAX crawling scheme.
When the crawler sees that the site serves AJAX content, it cancels the request and sends a new one with the special token _escaped_fragment_ in the request URI.
If the site uses hashbang URLs, the hashbang is replaced with _escaped_fragment_; for example, the URL mentioned above becomes:

www.yoursite.com/index.php/?_escaped_fragment_=/some-hash-fragment-part

If the site is using HTML5 mode, the sample URL becomes:

www.yoursite.com/index.php/some-hash-fragment-part?_escaped_fragment_=

When the web server receives such a request, it has to return the content of the requested page as a prerendered HTML snapshot.
Note: The ?_escaped_fragment_= query string is sent by Google, Bing/Yahoo and Twitter; I have not yet found other search engines that use this method.

Now, at your web server, you have to handle requests whose query string contains _escaped_fragment_.

First you have to detect whether the request comes from a human or a crawler bot. If it is from a human, the server returns the normal content and the web browser renders it. But if it is from a web crawler, the server has to return the content as a prerendered HTML snapshot.

How do you detect whether the visitor is a crawler bot and not a human?
Here I am going to share with you the PHP script that detects a crawler bot.
I use PHP as the server-side language because, in my view, PHP is the most mature, secure, time-tested and proven option, with a server-side market share of more than 80%, while JavaScript is best on the client side.

The script first checks whether the request query string contains _escaped_fragment_; if not, it checks the user agent for known crawler ID strings (the most active bots that I have found). You can add as many ID strings as you find, but it is better to include only those that are relevant.



$uri = '';

if (isset($_GET['_escaped_fragment_'])) {

    // Hashbang mode: the route path arrives in the _escaped_fragment_ parameter;
    // extract it without the leading slash
    $uri = ltrim($_GET['_escaped_fragment_'], '/');

    // HTML5 mode: the route path is in the request URI itself,
    // followed by an empty ?_escaped_fragment_= query string
    //$uri = ltrim(strtok($_SERVER['REQUEST_URI'], '?'), '/');

    // your processing here

} else if (crawlerBot()) { // other crawler bots that do not use _escaped_fragment_

    $uri = ltrim(strtok($_SERVER['REQUEST_URI'], '?'), '/');

    // your processing here

}

function crawlerBot() {

    // Checks for a crawler's ID string in the HTTP user agent.
    // Returns true if one is found.
    // You can add more ID strings whenever you find them;
    // your website's access log is a good place to discover crawler user agents.

    $bots = array(
        'Googlebot', 'bingbot', 'Baiduspider', 'Twitterbot', 'Yahoo! Slurp',
        'facebookexternalhit', 'msnbot', 'YandexBot', 'AhrefsBot',
        'DuckDuckGo-Favicons-Bot', 'Sogou web spider', 'Exabot', 'MJ12bot',
        'proximic', 'TurnitinBot', 'uMBot', 'XoviBot'
    );

    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    foreach ($bots as $bot) {
        if (stripos($agent, $bot) !== false) return true;
    }

    return false;
}
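
At the points marked "your processing here", you would typically look up and return the snapshot for $uri. Here is a minimal sketch of such a step, assuming snapshots are cached as flat HTML files under a snapshots/ directory; the directory name, file-naming convention and fallback behaviour are illustrative assumptions, not part of the script above:

function serveSnapshot($uri) {
    // Map the route path to a safe, flat file name
    // (assumption: the route "products/15" is cached as snapshots/products-15.html)
    $name = ($uri === '') ? 'index' : preg_replace('/[^a-zA-Z0-9_-]+/', '-', $uri);
    $file = 'snapshots/' . $name . '.html';

    if (is_file($file)) {
        header('Content-Type: text/html; charset=utf-8');
        readfile($file); // send the cached snapshot to the crawler
        exit;
    }

    // No snapshot cached yet: fall through and serve the normal page,
    // or generate and cache a snapshot here before serving it
}

With something like this in place, each "your processing here" comment above simply becomes a call to serveSnapshot($uri).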



If you want your AJAX-based website's content to appear in search engine results, you have to create static HTML snapshots for all the dynamic pages of your site. The HTML snapshots get indexed by search engines and appear in the SERPs.

You can get our script and upload it to your site root; it will create HTML snapshots on the fly, cache them, and serve them to search engine crawlers.

