Recently I wanted to create a page that would list the backlinks to certain sites that I had chosen. Yahoo always seems to have the most comprehensive list of backlinks and luckily they also make available a series of APIs that enables you to access their data. Using the Site Explorer Inbound Links API, and an example of a script towards the bottom of that page, I was able to put together a script that closely mirrors the sort of results you would get if you entered a site’s URL into Yahoo’s Site Explorer.
The only problem with these results is that you often end up with lots of links from the same domain if, for example, you have a link to your site in a forum signature, or the link appears in a sidebar of a blog with lots of pages. I wanted to manipulate the results sent back from Yahoo to only display the first one or two links from any one domain regardless of how many links there were.
I started with Yahoo’s example for extracting the information (you’ll need to get an API key from Yahoo to get this to work on your own site):
$api_service_url = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData";
$apiid = "your_api_key_goes_here:_get_it_from_yahoo";
$query = $input_url; // can be hard-coded or receive a value from a function
$entire_site = ""; // "1" to provide results for the entire site
$omit_inlinks = "domain";
$linksperrequest = 100; // 100 is max value
$startposition = 1;
$request_url = $api_service_url."?appid=".$apiid."&query=".urlencode($query)."&entire_site=".$entire_site."&omit_inlinks=".$omit_inlinks."&output=php";
$currentpos = 0;
while ($currentpos++ >= 0) {
$requrl = sprintf("%s&start=%s&results=%s", $request_url, ($currentpos-1)*$linksperrequest+$startposition, $linksperrequest);
if (($content = file_get_contents($requrl)) === FALSE ) {
echo "HTTP error: $requrl";
exit;
} else {
$data = unserialize($content);
if (array_key_exists("ResultSet", $data)) {
for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
$url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
$title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink
}
} else {
echo "Error: Bad response from server";
break; // stop paging: there is no result set to count below
}
if (sizeof($data["ResultSet"]["Result"]) < $linksperrequest) break;
}
}
The API only lets you process 100 results per request, so the script is set up to cycle through 100 results at a time until it reaches the end. As it does so, the URL and page title for each page linking back to you are captured with:
$url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
$title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink
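The paging arithmetic can be sketched on its own. This is only an illustration: backlink_pages() is a hypothetical helper, not part of Yahoo's example, and the 1000-position cap is an assumption taken from the API documentation quoted in the comments below ("The finishing position (start + results - 1) cannot exceed 1000").

```php
<?php
// Hypothetical helper: list the (start, results) pairs to request, stopping
// before the finishing position (start + results - 1) would pass $maxPosition.
function backlink_pages($perRequest = 100, $maxPosition = 1000, $start = 1) {
    $pages = array();
    while ($start <= $maxPosition) {
        // Shrink the last request if a full page would overshoot the cap.
        $results = min($perRequest, $maxPosition - $start + 1);
        $pages[] = array('start' => $start, 'results' => $results);
        $start += $results;
    }
    return $pages;
}

$pages = backlink_pages();
echo count($pages), " requests, last one starts at ", $pages[count($pages) - 1]['start'], "\n";
```

With the default values this produces ten requests, starting at positions 1, 101, and so on up to 901, so the script never asks for position 1001.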
Originally, I thought I might be able to use PHP’s parse_url function to extract just the domain portion of each link and then feed that into array_unique to remove the duplicate domains. But that gave me no way to set a limit on how many links to permit from each domain, and it also threw away the unique portion of each link (i.e., everything after the domain name).
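A small sketch of that first attempt shows the problem. The URLs here are invented sample data, not real results from the API.

```php
<?php
// Sample URLs invented for illustration only.
$urls = array(
    'http://forum.example.com/thread/1',
    'http://forum.example.com/thread/2',
    'http://blog.example.org/about',
);

// Reduce each backlink to its host name.
$hosts = array();
foreach ($urls as $url) {
    $hosts[] = parse_url($url, PHP_URL_HOST);
}

// array_unique() keeps one entry per host, but the paths are already gone
// and there is no way to ask for "up to two links per host".
$unique = array_values(array_unique($hosts));
print_r($unique);
```

The result holds only forum.example.com and blog.example.org: one entry per host, with no page path or title left to link to.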
So I modified the Yahoo script like so:
for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
$url = $data["ResultSet"]["Result"][$i]["Url"]; // backlink URL
$title = $data["ResultSet"]["Result"][$i]["Title"]; // page title for the backlink
$domain = 'http://'.parse_url($url, PHP_URL_HOST);
$backlinks[$domain][] = array($url, $title);
}
and then, with a bit of help from Tony Aslett and Chris..S at CSS Creator, added:
define('BACKLINK_LIMIT',2);
define('BACKLINK_TRUNCATE',1);
define('BACKLINK_ALL',0);
foreach ($backlinks as $domain => $links) {
if (count($links) > BACKLINK_LIMIT) $backlinks[$domain] = array_slice($links, 0, BACKLINK_TRUNCATE);
}
ksort($backlinks); // sort the filtered groups by domain ($domain here would only be the last key from the loop above)
function print_backlinks($domain, $links, $num) {
$limit = $num ? min($num,count($links)) : count($links);
for ($i=0; $i < $limit; $i++) {
list($url,$title) = $links[$i];
echo '<li><a href="'.htmlspecialchars($url).'">'.htmlspecialchars($title).'</a></li>'; // escape before printing into HTML
}
}
echo '<ul>';
foreach ($backlinks as $domain => $links) {
if (count($links) > BACKLINK_LIMIT) {
print_backlinks($domain, $links, BACKLINK_TRUNCATE);
} else {
print_backlinks($domain, $links, BACKLINK_ALL);
}
}
echo '</ul>';
This defines some constants: BACKLINK_LIMIT – the number past which I want to manipulate entries; BACKLINK_TRUNCATE – the number of links to display from the domains that have more than the number of links specified in BACKLINK_LIMIT; BACKLINK_ALL – self-explanatory.
Then a foreach loop filters all the link results from the Yahoo API, and if any domain has more links than the limit specified, array_slice keeps only the first BACKLINK_TRUNCATE links for that domain and discards the rest.
After sorting the filtered array, a second foreach loop prints the results using a function called print_backlinks. If a domain still has more links than the chosen limit, BACKLINK_TRUNCATE is passed to the function, which uses that value as the upper bound for its loop rather than looping through all the results; otherwise BACKLINK_ALL (0) is passed and every link for that domain is printed.
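The group-then-limit step can also be tried in isolation. The following is a minimal sketch rather than the script itself: limit_backlinks() is a hypothetical helper, the sample links are invented, and it keys the groups by bare host name and orders them with ksort, which are assumptions made for the example.

```php
<?php
// Hypothetical helper: group array($url, $title) pairs by host, then cut any
// host with more than $limit links down to its first $truncate links.
function limit_backlinks(array $links, $limit = 2, $truncate = 1) {
    $grouped = array();
    foreach ($links as $link) {
        $host = parse_url($link[0], PHP_URL_HOST);
        $grouped[$host][] = $link;
    }
    foreach ($grouped as $host => $hostLinks) {
        if (count($hostLinks) > $limit) {
            $grouped[$host] = array_slice($hostLinks, 0, $truncate);
        }
    }
    ksort($grouped); // order the groups by host name
    return $grouped;
}

// Sample data invented for illustration.
$sample = array(
    array('http://forum.example.com/t/1', 'Thread 1'),
    array('http://forum.example.com/t/2', 'Thread 2'),
    array('http://forum.example.com/t/3', 'Thread 3'),
    array('http://blog.example.org/a', 'Post A'),
    array('http://blog.example.org/b', 'Post B'),
);

foreach (limit_backlinks($sample) as $host => $hostLinks) {
    echo $host, ': ', count($hostLinks), "\n";
}
```

Here the forum has three links, which is over the limit of two, so it is cut to one; the blog has exactly two, so both are kept.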
The end result should now look like:
function backLink($input_url) {
$api_service_url = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData";
$apiid = "your_api_key_goes_here";
$query = $input_url;
$entire_site = "";
$omit_inlinks = "domain";
$linksperrequest = 100;
$startposition = 1;
$request_url = $api_service_url."?appid=".$apiid."&query=".urlencode($query)."&entire_site=".$entire_site."&omit_inlinks=".$omit_inlinks."&output=php";
$currentpos = 0;
$backlinks = array(); // start empty so the loops below still work when nothing is returned
while ($currentpos++ >= 0) {
$requrl = sprintf("%s&start=%s&results=%s", $request_url, ($currentpos-1)*$linksperrequest+$startposition, $linksperrequest);
if (($content = file_get_contents($requrl)) === FALSE ) {
echo "HTTP error: $requrl";
exit;
} else {
$data = unserialize($content);
if (array_key_exists("ResultSet", $data)) {
for ($i=0; $i<sizeof($data["ResultSet"]["Result"]); $i++) {
$url = $data["ResultSet"]["Result"][$i]["Url"];
$title = $data["ResultSet"]["Result"][$i]["Title"];
$domain = 'http://'.parse_url($url, PHP_URL_HOST);
$backlinks[$domain][] = array($url, $title);
}
} else {
echo "Error: Bad response from server";
break; // stop paging: there is no result set to count below
}
if (sizeof($data["ResultSet"]["Result"]) < $linksperrequest) break;
}
}
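// Guard the constants so backLink() can be called more than once per page
if (!defined('BACKLINK_LIMIT')) define('BACKLINK_LIMIT',2);
if (!defined('BACKLINK_TRUNCATE')) define('BACKLINK_TRUNCATE',1);
if (!defined('BACKLINK_ALL')) define('BACKLINK_ALL',0);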
foreach ($backlinks as $domain => $links) {
if (count($links) > BACKLINK_LIMIT) $backlinks[$domain] = array_slice($links, 0, BACKLINK_TRUNCATE);
}
ksort($backlinks); // sort the filtered groups by domain ($domain here would only be the last key from the loop above)
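// Declare the helper only once so repeated backLink() calls do not redeclare it
if (!function_exists('print_backlinks')) {
function print_backlinks($domain, $links, $num) {
$limit = $num ? min($num,count($links)) : count($links);
for ($i=0; $i < $limit; $i++) {
list($url,$title) = $links[$i];
echo '<li><a href="'.htmlspecialchars($url).'">'.htmlspecialchars($title).'</a></li>'; // escape before printing into HTML
}
}
}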
echo '<ul>';
foreach ($backlinks as $domain => $links) {
if (count($links) > BACKLINK_LIMIT) {
print_backlinks($domain, $links, BACKLINK_TRUNCATE);
} else {
print_backlinks($domain, $links, BACKLINK_ALL);
}
}
echo '</ul>';
}
So although this script’s output closely resembles the results you would get from Yahoo’s Site Explorer, for me it has three advantages:
- Using an array or by linking it to a database select query, you can display the backlink results for any number of sites, not just one, in the same location, or even on the same page.
- It enables me not only to customise the output of the results but also to display them in a template of my own choosing, which means it can be integrated into client-only sections of websites.
- It enables me to filter out lots of repetitive backlinks to make the overall display more readable, and therefore more usable.

I’m not a coder so bear with me but I stumbled on this script when searching for a way to use the yahoo api to download more than 1000 links. It looks like it will work, I’m just having a tough time understanding where I put in my site name that I’d like to query. I see the yahoo api key portion but, again, not being able to php my way out of a wet paper bag, I’m stumped as to where to insert my site name. Thanks for the help!
Hi Randy, the function requires a URL to be input, as in function backLink($input_url), which you can call from anywhere in your page with backLink('http://www.example.com'). In my case I used a value from a database query as it looped through the records.
Sweet
Very cool man, I just stumbled on this, and it makes me want to jump back into PHP again…
Yahoo exp can get annoying how it always lists its own domain…
It seems that an error happens for sites with more than 1,000 back links. Every time I try with such a site I get this error message:
failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request
And it always happens when start=1001. For sites with fewer links, it always works though. Am I doing something wrong, or is there a problem handling more than 1,000 back links?
@Oscar, the limit for links per request is 100 (set by Yahoo) so you can’t make $linksperrequest more than that. But the script is set up so that it will keep being submitted until no more links are found and the results are appended to each other.
But Yahoo’s official API documentation says that:
“The starting result position to return (1-based). The finishing position (start + results - 1) cannot exceed 1000.”
The script increments the start value in a loop, and every time it reaches a total of over 1,000 (let’s say start=1001&results=100) it crashes for me, because Yahoo is returning a 400 Bad Request error.
I appreciate your help, but I just can’t see how the script would circumvent Yahoo’s 1,000 limit. Have you tested it with a site that has more than 1,000 links?
@Oscar: you’re quite right – I haven’t tested it on a site with more than 1000 links and wasn’t aware of that part of the API docs. Sorry for the confusion.
Very useful article. It is easier to get backlinks from Yahoo than from Google.
How do sites like linkdiagnosis.com return thousands of backlink results? They say the Yahoo API is used, but I’m kinda stumped.
You’d have to ask them. Looks like an interesting tool though.
Why my domain not yet pagerank?
Enny, this article’s about Yahoo’s Site Explorer API; pagerank is a Google feature - you’d have to ask them.
Hi,
I’ve come across this tool today… Really nice one!
However when I downloaded it and installed at the server (with proper AppId), I keep getting error saying: “failed to open stream: HTTP request failed! HTTP/1.1 999 Rate Limit Exceeded in /inlinks/backlink.php on line 15”
I know this is connected with query limit but even if I change IP (I have dynamic IP connection) nothing happens… Tried this several times and doesn’t work :/ Any ideas what’s wrong?
Well, I guess I’ll parse the XML data using SimpleXML Class on PHP 5, it’s simple and easier. Anybody knows the similar API for Google instead?
Many thanks.
This is a great code that i’m looking for and thanks yahoo for the API.
@oscar
try the code to stop crashes:
while ( ($currentpos++ >= 0) && ( (($currentpos-1)*$linksperrequest+$startposition) < 999) ) {
works fine for me.