Wednesday, September 3, 2014

A small processor to make sure page URLs are SEO-friendly

If you have been in the business of creating web solutions for a while, you have most likely been asked about what to do to improve the SEO-friendliness of the current project you are working on.

Since SEO is not always clearly defined, I won't go into detail on everything that can be done, but instead pinpoint two things.

These two URLs are not the same when they are indexed by a search engine:

http://www.test.org/Some/Page
http://www.test.org/Some/Page/

Also, these two links are not the same:

http://www.test.org/Some/Page
http://www.test.org/Some/PaGe

Since users are able to type URLs, we cannot ensure that they always type them the way we want them to - but we can force the browser (and search engines) to end up on the URL we want.

One way to do this (there are most likely several others) is to create a small processor that looks at the incoming URL and changes it a bit if needed.

So here we go - create a new class in Visual Studio, and enter the following code:

using System;
using Sitecore.Links;
using Sitecore.Pipelines.HttpRequest;

public class SeoUrlProcessor : HttpRequestProcessor
{
    public override void Process(HttpRequestArgs args)
    {
        if (args == null || args.Context == null)
        {
            return;
        }

        if (Sitecore.Context.Item == null)
        {
            return;
        }

        if (!Sitecore.Context.PageMode.IsNormal)
        {
            return;
        }

        // Extend this list with other sites you find will be broken by this.
        if (Sitecore.Context.Site.Name == "shell" || Sitecore.Context.Site.Name == "publishing")
        {
            return;
        }

        bool urlHasBeenChanged = false;
        string incomingUrl = args.Context.Request.RawUrl;

        // In case language embedding is used, this is needed to make sure incomingUrl contains the language.
        // Otherwise, the URL comparison will not match up.
        if (!string.IsNullOrEmpty(args.Context.Request.Url.Query))
        {
            incomingUrl = incomingUrl.Replace(args.Context.Request.Url.Query, string.Empty);
        }

        UriBuilder builder = new UriBuilder(args.Context.Request.Url)
        {
            Path = incomingUrl
        };

        // This works like a one-time flip-bit, once it has been set to true, it stays as true.
        urlHasBeenChanged |= HandleTrailingSlash(builder);
        urlHasBeenChanged |= HandleCasing(builder);

        if (urlHasBeenChanged)
        {
            RedirectToUrl(args, builder.ToString());
        }
    }

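    private static bool HandleTrailingSlash(UriBuilder builder)
    {
        // Leave the site root ("/") alone; trimming it would redirect the page to itself.
        if (builder.Path.Length <= 1 || !builder.Path.EndsWith("/", StringComparison.Ordinal))
        {
            return false;
        }

        builder.Path = builder.Path.TrimEnd('/');

        return true;
    }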

    private static bool HandleCasing(UriBuilder builder)
    {
        string destinationUrl = LinkManager.GetItemUrl(Sitecore.Context.Item);

        if (!builder.Path.Equals(destinationUrl, StringComparison.InvariantCultureIgnoreCase))
        {
            return false;
        }

        if (builder.Path == destinationUrl)
        {
            return false;
        }

        builder.Path = destinationUrl;

        return true;
    }

    private static void RedirectToUrl(HttpRequestArgs args, string url)
    {
        // Abort the pipeline first: Response.End() normally throws a ThreadAbortException,
        // so code placed after it would not run.
        args.AbortPipeline();

        args.Context.Response.Clear();
        args.Context.Response.StatusCode = 301;
        args.Context.Response.StatusDescription = "Moved Permanently";
        args.Context.Response.RedirectLocation = url;
        args.Context.Response.End();
    }
}

Save the file.

Let me explain the HandleCasing method, since it can be a bit confusing.

First we compare the URLs without looking at casing at all. If they aren't identical, we can be pretty sure the user has hit some kind of alias, since they are standing on this item but the incoming URL is totally different - in that case, we don't want to redirect the user anywhere.

Then, if they are identical apart from casing, we compare them again - this time checking whether they are completely identical - and if they aren't, the casing is wrong somewhere (like the user typing "products" instead of "Products").

In that case, we replace the Path part of the URL with the one Sitecore's link manager generated, which ensures that every view of the page has the URL typed as it is inside Sitecore.
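
The two-step comparison can also be tried outside Sitecore. The following is only a minimal sketch, not part of the processor: CasingCheck and NeedsCasingRedirect are hypothetical names, and the canonicalPath argument stands in for whatever the link manager would return. It uses ordinal comparisons, which is a common choice for URLs, whereas the processor above uses InvariantCultureIgnoreCase.

```csharp
using System;

public static class CasingCheck
{
    // Returns true only when the two paths differ in casing and in nothing else.
    // canonicalPath stands in for the URL generated by Sitecore's link manager
    // (an assumption made for this sketch).
    public static bool NeedsCasingRedirect(string incomingPath, string canonicalPath)
    {
        // Step 1: different even when casing is ignored -> probably an alias, leave it alone.
        if (!incomingPath.Equals(canonicalPath, StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // Step 2: equal when casing is ignored, so any remaining difference is casing.
        return !incomingPath.Equals(canonicalPath, StringComparison.Ordinal);
    }
}
```

With this sketch, NeedsCasingRedirect("/some/PaGe", "/Some/Page") returns true, while both NeedsCasingRedirect("/my-alias", "/Some/Page") and NeedsCasingRedirect("/Some/Page", "/Some/Page") return false.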

Finally, you just need to create an include file for Sitecore to run this.
So create an XML file and paste the following text into it, replacing Namespace, Classname and Assembly with your own values:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:x="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <processor type="Namespace.Classname, Assembly" patch:after="processor[@type='Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel']"/>
      </httpRequestBegin>
    </pipelines>
  </sitecore>
</configuration>

Now you are done. :)

Try accessing a few pages with a trailing slash added to the URL, or with the casing mixed differently than inside Sitecore.
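
If you prefer to check this from code rather than in the browser, something like the sketch below could work. It is only an illustration under a few assumptions: RedirectProbe, Describe and ProbeAsync are hypothetical names, the project can reference System.Net.Http, and the URL you pass in is one of your own pages. Automatic redirects are switched off so that the 301 itself is visible instead of the final response.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RedirectProbe
{
    // Formats one result line; kept separate so it can be checked without a network call.
    public static string Describe(string url, int statusCode, Uri location)
    {
        return string.Format("{0} -> {1} {2}", url, statusCode, location);
    }

    // Requests the URL once, without following redirects, and reports
    // the status code and the Location header (if any).
    public static async Task<string> ProbeAsync(string url)
    {
        // Do not follow redirects, otherwise only the final response would be visible.
        var handler = new HttpClientHandler { AllowAutoRedirect = false };

        using (var client = new HttpClient(handler))
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            return Describe(url, (int)response.StatusCode, response.Headers.Location);
        }
    }
}
```

Calling ProbeAsync with a URL that has a trailing slash or mixed casing should, if the processor is active, report status 301 together with the cleaned-up URL as the location.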
