How to give your Jekyll Site Structured Data for Search with JSON-LD

April 27. Category: Feed

You’ve got Jekyll, and now you need to get your structured data ready for Google, archival systems and more with Schema.org markup. Here’s how you can deploy structured data in JSON-LD format to your Jekyll site.

A word of warning: this is a living document. Schema.org guidelines aren’t set in stone and even if they were they suffer from bad documentation and frequently lack examples. If you’re interested in some of the challenges I faced, I’ll go into detail at the end of the article. If you want to know why this is worth the work even with these challenges, that is at the bottom as well.

All the suggested code here validates in Google’s structured data checking tool. To the best of my knowledge everything here is correct, and I welcome pull requests to fix samples where I may have made a mistake.

Ok let’s get to the code!

The first thing we need to do is build support for managing JSON-LD in our templates. There’s structured markup that can be done inline, using itemprop, itemscope, and itemtype, but we won’t dive into those right now. We’re just looking at the JSON-LD objects we can embed at page level that will tell any machine trying to understand our blog or blog posts important information about both the site and the post itself.

This lives in the HEAD, and we’ll need a different treatment for pages and posts than for the home page. In _includes/head.html you’ll need to check if you are in a page. I do this by checking page.title. I also wanted the capability to override any individual page’s automatic JSON-LD with a manual setup. Here’s how:

    <!-- Check to see if we are in a page. -->
    {% if page.title %}
        {% if page.jsonld %}
            {% include {{page.jsonld}}.html %}
        {% else %}
            {% include postJSONLD.html %}
        {% endif %}
    {% else %}
        {% include homeJSONLD.html %}
    {% endif %}  
</head>

JSON-LD objects can become sizable to manage, and a bit of a pain to deal with when inside larger files. Here I make it simple by pulling the script for building JSON-LD out of the head.html file. I placed this code block at the end of the HEAD block, as illustrated by </head> at the bottom. Now one of three things will happen:

  1. If the page specifies a jsonld value, Jekyll will look for an HTML file with that name in your _includes directory at build time.
  2. If the page does not specify a jsonld value, it will use the default file at postJSONLD.html.
  3. If there is no title, we assume it’s a site home page and use the homeJSONLD.html file.

All files will exist inside _includes and will be valid HTML. Note: you must create these files (and any custom files you specify) before you build your site or the Jekyll build will fail. If you, like me, are maintaining this on GitHub, that means before you git push to your repository.
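Since a missing include stops the build, it can help to look for one before pushing. Here is a minimal sketch in Python, offered as an illustration rather than code from this site. It assumes front matter is delimited by --- lines, that jsonld is written as a plain key: value pair without quotes, and that the site root contains the _includes directory. The function name and the regular expression are hypothetical.

```python
import re
from pathlib import Path

# Matches a front matter line such as "jsonld: jsonld-id" (unquoted value assumed).
JSONLD_KEY = re.compile(r"^jsonld:\s*(\S+)\s*$", re.MULTILINE)


def missing_jsonld_includes(site_root):
    """Return (page name, include name) pairs whose include file is absent."""
    root = Path(site_root)
    missing = []
    for page in sorted(root.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        if not text.startswith("---"):
            continue  # no front matter, nothing to check
        parts = text.split("---", 2)
        if len(parts) < 3:
            continue  # front matter was never closed
        match = JSONLD_KEY.search(parts[1])
        if match is None:
            continue  # page relies on the default include
        include = root / "_includes" / (match.group(1) + ".html")
        if not include.is_file():
            missing.append((page.name, include.name))
    return missing
```

Calling missing_jsonld_includes(".") from the site root returns an empty list when every custom jsonld value has a matching file.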

First you’ll want to set up your per-post JSON-LD file. We’ll break down what’s happening inside next.

<script type="application/ld+json">
    {
        "@context": "http://schema.org",
        "@type": "BlogPosting",
        "headline": "{{ page.title }}",
        "description": "{{ page.excerpt }}",
        "image": [
            {% if page.ogimageoff %}

            {% elsif page.ogimage %}
                "{{page.ogimage}}"
            {% elsif page.image %}
                "{{page.image}}"
            {% else %}
                "https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg"
            {% endif %}
        ],
        "mainEntityOfPage": {
            "@type": "WebPage",
            "@id": "{{ page.url | replace:'index.html','' | prepend: site.baseurl | prepend: site.url }}"
        },
        "datePublished": "{{page.date}}",
        "dateModified": "{% if page.modified %}{{page.modified}}{% else %}{{page.date}}{% endif %}",
        "isAccessibleForFree": "True",
        "isPartOf": {
            "@type": ["CreativeWork", "Product", "Blog"],
            "name": "Fight With Tools",
            "productID": "aramzs.github.io"
        },
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "author": {
            "@type": "Person",
            "name": "Aram Zucker-Scharff",
            "description": "Aram Zucker-Scharff is Director for Ad Engineering at Washington Post, lead dev for PressForward and a consultant. Tech solutions for journo problems.",
            "sameAs": "http://aramzs.github.io/aramzs/",
            "image": {
                "@type": "ImageObject",
                "url": "https://pbs.twimg.com/profile_images/539484037765533698/7l6-pKY-_400x400.jpeg"
            },
            "givenName": "Aram",
            "familyName": "Zucker-Scharff",
            "alternateName": "AramZS",
            "publishingPrinciples": "http://aramzs.github.io/about/"
        },
        "publisher": {
            "@type": "Organization",
            "name": "Fight With Tools",
            "description": "A site discussing how to imagine, build, analyze and use cool code and web tools. Better websites, better stories, better developers. Technology won't save the world, but you can.",
            "sameAs": "http://aramzs.github.io",
            "logo": {
                "@type": "ImageObject",
                "url": "https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg"
            },
            "publishingPrinciples": "http://aramzs.github.io/about/"
        }
    }
</script>

To give some extra context, it’s useful to see what the front matter for this post looks like.

---
layout: post
title:  "How to give your Jekyll Site Structured Data for Search with JSON-LD"
date:   2018-04-21 10:19:51 +0100
categories:  jekyll schema-dot-org
image: https://github.com/AramZS/aramzs.github.io/blob/master/_includes/beamdown.gif?raw=true
vertical: Code
excerpt: "Let's make your Jekyll site work with Schema.org structured data and JSON-LD."
overlay: blue
---

Let’s look at the top level object properties:

        "@context": "http://schema.org",
        "@type": "BlogPosting",
        "headline": "{{ page.title }}",
        "description": "{{ page.excerpt }}",

The context property says we’re using the Schema.org standard. The type describes what this page is: a blog posting. headline and description both take metadata about the post that exists in our markdown and place it properly in the schema format.
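One thing to watch with values dropped straight into a JSON string: a title or excerpt that contains a double quote will produce invalid JSON. The sketch below illustrates the problem and the usual fix (serializing the value as JSON) in Python. The function names are hypothetical. Inside a template, Jekyll’s jsonify filter can play the same role, in which case the surrounding quotes in the template would need to go, since the filter emits its own.

```python
import json


def naive_headline(title):
    # Mirrors plain interpolation: the raw value is pasted between quotes.
    return '{"headline": "%s"}' % title


def escaped_headline(title):
    # json.dumps adds the surrounding quotes and escapes anything unsafe.
    return '{"headline": %s}' % json.dumps(title)


def is_valid_json(text):
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

With a title such as Why "structured data" matters, the naive version fails to parse while the escaped version round-trips to the original string.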

        "image": [
            {% if page.ogimageoff %}

            {% elsif page.ogimage %}
                "{{page.ogimage}}"
            {% elsif page.image %}
                "{{page.image}}"
            {% else %}
                "https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg"
            {% endif %}
        ],

Here I’ve added a number of options to allow us to control the image in the JSON-LD statement, based on my social media metadata setup. I check three page properties in order: ogimageoff, which indicates no image; ogimage, an image intended specifically for social media; and image, the preview image at the top of this page. If I’ve set none of them, I default to a standard image across my site.
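The precedence is easier to see outside of Liquid. This is a small Python sketch of the same decision, assuming the front matter is available as a dictionary. The function name and the default URL are placeholders for illustration, not values from this site.

```python
# Hypothetical site-wide fallback image.
DEFAULT_IMAGE = "https://example.com/default.jpg"


def choose_images(front_matter, default=DEFAULT_IMAGE):
    """Return the list that would fill the "image" array."""
    if front_matter.get("ogimageoff"):
        return []  # explicitly no image
    for key in ("ogimage", "image"):  # social image wins over preview image
        if front_matter.get(key):
            return [front_matter[key]]
    return [default]
```

Note that the ogimageoff case yields an empty array, which is still valid JSON.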

        "mainEntityOfPage": {
            "@type": "WebPage",
            "@id": "{{ page.url | replace:'index.html','' | prepend: site.baseurl | prepend: site.url }}"
        },

This part is a bit unclear. Presumably we’re already describing the mainEntityOfPage and this would not be required, but Google’s tool requires it and Schema.org’s demo, along with some news organizations, handles this property as type WebPage and gives it its own URL as an ID.
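The filter chain that builds the @id reads right to left when it comes to prepending, which can be confusing. Here is a sketch of the same steps in Python, with a hypothetical function name, assuming site.url has no trailing slash and site.baseurl is either empty or starts with a slash.

```python
def canonical_url(site_url, baseurl, page_url):
    """Mimic: page.url | replace:'index.html','' | prepend: baseurl | prepend: url."""
    path = page_url.replace("index.html", "")  # drop the file name from pretty URLs
    return site_url + baseurl + path


def main_entity_of_page(site_url, baseurl, page_url):
    # The WebPage node identifies itself by the page's own canonical URL.
    return {"@type": "WebPage", "@id": canonical_url(site_url, baseurl, page_url)}
```

For example, canonical_url("http://aramzs.github.io", "", "/aramzs/index.html") gives http://aramzs.github.io/aramzs/.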

        "datePublished": "{{page.date}}",
        "dateModified": "{% if page.modified %}{{page.modified}}{% else %}{{page.date}}{% endif %}",
        "isAccessibleForFree": "True",
        "isPartOf": {
            "@type": ["CreativeWork", "Product", "Blog"],
            "name": "Fight With Tools",
            "productID": "aramzs.github.io"
        },
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",

This block describes the date the content was published, the date it was modified and whether it is behind a paywall (it is not). Because Google suggests including dateModified no matter what the state, we check for a modified date and, if none is set, use the published date.

isPartOf describes this blog post as part of a larger object, the blog itself.

license provides a link to the Creative Commons license I have applied to this blog.
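A note on the date values: Schema.org date properties are meant to hold ISO 8601 strings, and the form a date takes in front matter (2018-04-21 10:19:51 +0100) is close to that but not identical. The Python sketch below shows the conversion and the dateModified fallback described above. It assumes dates arrive in the same format as they are written in the front matter, and the function names are hypothetical. In a template, Jekyll’s date_to_xmlschema filter produces this kind of output.

```python
from datetime import datetime

# Assumed input format, matching how the date is written in front matter.
JEKYLL_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def to_iso8601(value):
    """Convert '2018-04-21 10:19:51 +0100' style strings to ISO 8601."""
    return datetime.strptime(value, JEKYLL_DATE_FORMAT).isoformat()


def date_properties(front_matter):
    published = to_iso8601(front_matter["date"])
    modified = front_matter.get("modified")
    return {
        "datePublished": published,
        # Fall back to the published date when no modified date is set.
        "dateModified": to_iso8601(modified) if modified else published,
    }
```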

        "author": {
            "@type": "Person",
            "name": "Aram Zucker-Scharff",
            "description": "Aram Zucker-Scharff is Director for Ad Engineering at Washington Post, lead dev for PressForward and a consultant. Tech solutions for journo problems.",
            "sameAs": "http://aramzs.github.io/aramzs/",
            "image": {
                "@type": "ImageObject",
                "url": "https://pbs.twimg.com/profile_images/539484037765533698/7l6-pKY-_400x400.jpeg"
            },
            "givenName": "Aram",
            "familyName": "Zucker-Scharff",
            "alternateName": "AramZS",
            "publishingPrinciples": "http://aramzs.github.io/about/"
        },

The above is the author block. We establish a Schema.org standard Person object and provide data about it. In this case, the data is about me (since I’m the author). Most of the properties are self-evident, but three are worth examining. publishingPrinciples should link to a page that describes why and how you publish the way you do. image attaches an ImageObject to the Person object, here an image of me. sameAs links to a page that describes the same Person object. That page is on this site, and we’ll get into how it gets built later.

        "publisher": {
            "@type": "Organization",
            "name": "Fight With Tools",
            "description": "A site discussing how to imagine, build, analyze and use cool code and web tools. Better websites, better stories, better developers. Technology won't save the world, but you can.",
            "sameAs": "http://aramzs.github.io",
            "logo": {
                "@type": "ImageObject",
                "url": "https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg"
            },
            "publishingPrinciples": "http://aramzs.github.io/about/"
        }

The final property describes the publisher object. In this case we’re establishing it as this site, so the publishing organization for this post is this site. The properties within the object describe the site.

That’s the object!

Are you a person? Tell me how.

Now let’s explore the process of establishing a Person object under a stand-alone URL and using the custom jsonld property of a page.

At the base of the folder for this site, I’ve built a whoami.md file. This will be the new home of my Schema.org identity. It is topped with page data as follows:

---
layout: page
title: Who is Aram Zucker-Scharff?
navtitle: Who is Aram?
permalink: /aramzs/
excerpt: Aram Zucker-Scharff is Director for Ad Engineering at Washington Post, lead dev for PressForward and a consultant. Tech solutions for journo problems.
jsonld: jsonld-id
date: 2018-04-21 13:00:00 +0100
---

As you can see, I’m using the jsonld property to specify a stand-alone one-off .html file to use for the JSON-LD statement. Now I can create _includes/jsonld-id.html. Here’s what that file looks like:

<script type="application/ld+json">
    {
        "@context": "http://schema.org",
        "@type": "Person",
        "name": "Aram Zucker-Scharff",
        "description": "{{ page.excerpt }}",
        "disambiguatingDescription": "A media-focused developer and strategist.",
        "image": [
            {% if page.ogimageoff %}

            {% elsif page.ogimage %}
                "{{page.ogimage}}"
            {% elsif page.image %}
                "{{page.image}}"
            {% else %}
                "https://pbs.twimg.com/profile_images/539484037765533698/7l6-pKY-_400x400.jpeg"
            {% endif %}
        ],
        "givenName": "Aram",
        "familyName": "Zucker-Scharff",
        "alternateName": "AramZS",
        "publishingPrinciples": "http://aramzs.github.io/about/"
    }
</script>

This object now lives at the top of http://aramzs.github.io/aramzs/. It means that page can now represent my identity as a Schema.org object and it can be referred to by other Schema.org objects as a Person.

This is pretty cool because it now means I can refer to this on my site and on any site as a source of truth for my identity.

Hello Blog

The final step in making the structured markup for this blog as valid as possible is to describe the blog itself as an object. Most of that object is the same as before; the exception is that my home page is a Blog object rather than a BlogPosting. Here’s what it looks like:

<script type="application/ld+json">
    {
        "@context": "http://schema.org",
        "@type": "Blog",
        "url": "{{ site.baseurl | prepend: site.url }}",
        "headline": "{{ site.title }}",
        "about": "{{ site.description }}",
        "image": [
            "https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg"
        ],
        "isAccessibleForFree": "True",
        "isPartOf": {
            "@type": ["CreativeWork", "Product"],
            "name": "Fight With Tools",
            "productID": "aramzs.github.io"
        },
        "discussionUrl": "https://twitter.com/search?f=tweets&vertical=default&q=to%3AChronotope&l=en&src=typd",
        "author": {
            "@type": "Person",
            "name": "Aram Zucker-Scharff",
            "description": "Aram Zucker-Scharff is Director for Ad Engineering at Washington Post, lead dev for PressForward and a consultant. Tech solutions for journo problems.",
            "sameAs": "http://aramzs.github.io/aramzs/",
            "image": {
                "@type": "ImageObject",
                "url": "https://pbs.twimg.com/profile_images/539484037765533698/7l6-pKY-_400x400.jpeg"
            },
            "givenName": "Aram",
            "familyName": "Zucker-Scharff",
            "alternateName": "AramZS",
            "publishingPrinciples": "http://aramzs.github.io/about/"
        },
        "editor": {
            "@type": "Person",
            "name": "Aram Zucker-Scharff",
            "description": "Aram Zucker-Scharff is Director for Ad Engineering at Washington Post, lead dev for PressForward and a consultant. Tech solutions for journo problems.",
            "sameAs": "http://aramzs.github.io/aramzs/",
            "image": {
                "@type": "ImageObject",
                "url": "https://pbs.twimg.com/profile_images/539484037765533698/7l6-pKY-_400x400.jpeg"
            },
            "givenName": "Aram",
            "familyName": "Zucker-Scharff",
            "alternateName": "AramZS",
            "publishingPrinciples": "http://aramzs.github.io/about/"
        },
        "inLanguage": "en-US",
        "license": "http://creativecommons.org/licenses/by-sa/4.0/",
        "additionalType": "CreativeWork",
        "alternateName": "Fight With Tools",
        "publisher": {
            "@type": "Organization",
            "name": "Fight With Tools",
            "description": "A site discussing how to imagine, build, analyze and use cool code and web tools. Better websites, better stories, better developers. Technology won't save the world, but you can.",
            "sameAs": "http://aramzs.github.io",
            "logo": {
                "@type": "ImageObject",
                "url": "https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg"
            },
            "publishingPrinciples": "http://aramzs.github.io/about/"
        }
    }
</script>

I added a property to describe an editor. It’s just me here, but if someone else edits your work regularly, you could include them here. You could even do so on a per-post level. I also linked to a search on Twitter for people directing tweets to me as the discussionUrl for this blog.

You can check out the code I walked through here in the repository for this blog.

Challenges

Schema.org has still not been widely implemented across the web, so we frequently lack examples in the wild, and where examples are live they can be inconsistent or conflict with other sites’ interpretations of the rules. The examples on the Schema.org site do not cover every case and even where they do, they don’t always make sense.

Because of this, there isn’t a clear answer to the question of how these JSON-LD objects should be formed, especially when dealing with specific situations or pages-as-objects (which is the intent of the JSON-LD head-level data). A good example of this is mainEntityOfPage. Theoretically you use this to designate the primary ‘object’ of the page, but isn’t that exactly the intent of the entire JSON-LD object? I’m not sure.

The other challenge is that you can get your structured data object correct but apply it to the wrong page, misrepresenting the object. There’s no real way to check this and extensive examples are hard to find.

Why bother?

With the complexity and lack of clarity involved in JSON-LD, why should you add it to your Jekyll site?

A bunch of reasons! Structured data is important for getting the best chance at topping a Google Search page. That might be reason enough for you, but it isn’t the only one:

  • It’s important for systems that attempt to understand and archive the web.
  • These data structures are useful for voice-controlled systems that might want to use data from your site.
  • Like OpenGraph’s less complex data, JSON-LD data is vital for the Semantic Web and allows computers and humans to better work together.

Thanks for making it to the bottom! JSON-LD and structured data are important to the future of the web, and I hope this article makes implementing them on your Jekyll site easier!

*Developer Testing Note!

Frustrated by trying to set up your JSON-LD locally and being unable to test it? I ran bundle exec jekyll serve --incremental to serve my Jekyll site locally on port 4000, then used a CLI application called ngrok (ngrok http 4000) to make my local server publicly visible so I could check it with the Google Structured Data Testing tool.
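Before reaching for a public tunnel, a quick local check can at least confirm that every JSON-LD block in a built page parses as JSON. The Python sketch below uses only the standard library. The class and function names are hypothetical, and it is no substitute for Google’s tool, since it says nothing about whether the properties are the right ones.

```python
import json
from html.parser import HTMLParser


class JSONLDExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html):
    """Parse each block; raises json.JSONDecodeError on malformed JSON."""
    parser = JSONLDExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]
```

Feeding it the contents of a generated HTML file returns one parsed object per block, so a stray comma or an unescaped quote left behind by the template shows up as an exception right away.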

How to make your Jekyll site show up on social

November 12. Category: Code, Feed, Tumblr

How to make your Jekyll site show up on social: I wrote a thing! And made a new blog!

How to make your Jekyll site show up on social

November 11. Category: Code, Feed, Fight With Tools

Congratulations, you’ve set up a Jekyll site. You may even be, like me, taking advantage of the free hosting provided by GitHub. You’ve written your first post, you’ve set up all the options. But when you go to share it on Facebook, Tumblr, LinkedIn or Twitter, that share may not look so pretty.

Here’s how to make Jekyll posts easier for others to see and share on social networks.

To fix ugly shares and be the envy of all your GitHub followers you’ll have to add some metadata to the HTML HEAD tag. What follows is a walk-through of the tags you need and the Liquid code that generates them. Unless otherwise indicated, this markup goes in the head.html file in your _includes folder. If you’re not already familiar with social and open graph tags, this post should be a useful illustration of how they work.

First, there are standard tags that should be applied on every page.

<!-- The Author meta propagates the byline in a number of social networks -->
<meta name="author" content="Aram Zucker-Scharff" />

The og:title tag sets the title for sharing. I’ve duplicated the logic of the title tag to show either the site title or the post title, depending on which page the user has loaded. You could set a post-level variable for a custom title as well, or change the number of allowed characters.

We’ll do the same for the og:description and og:url tags, duplicating the logic of the description and canonical tags.

I’ve made the Liquid statements below multi-line for easier reading, but I wouldn’t recommend that in production, because the line breaks and indentation end up inside the content attribute.

<meta property="og:title"
    content="{% if page.title %}
      {{ page.title | strip_html | strip_newlines | truncate: 160 }}
    {% else %}
      {{ site.title }}
    {% endif %}">

<meta property="og:description"
    content="{% if page.excerpt %}
        {{ page.excerpt | strip_html | strip_newlines | truncate: 160 }}
      {% else %}
        {{ site.description }}
      {% endif %}">


<meta property="og:url"
    content="{{ page.url | replace:'index.html','' | prepend: site.baseurl | prepend: site.url }}" />
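To make the filter chain on the title and description concrete, here is an approximation of strip_html, strip_newlines and truncate in Python. It is a sketch of their documented behaviour rather than Liquid’s actual implementation (the real strip_html also removes script and style blocks, for instance), and the function names simply mirror the filters.

```python
import re


def strip_html(text):
    # Simplified: remove anything that looks like a tag.
    return re.sub(r"<[^>]*>", "", text)


def strip_newlines(text):
    # Newlines are removed outright, not replaced with spaces.
    return re.sub(r"\r?\n", "", text)


def truncate(text, length=50, ellipsis="..."):
    # The ellipsis counts toward the length, as in Liquid.
    if len(text) <= length:
        return text
    keep = max(length - len(ellipsis), 0)
    return text[:keep] + ellipsis


def og_text(value, length=160):
    return truncate(strip_newlines(strip_html(value)), length)
```

Two details are worth knowing: a 200-character excerpt comes out at exactly 160 characters including the trailing dots, and words on either side of a line break are joined without a space.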

Populating the Open Graph site name and locale tags is fairly straightforward.

<meta property="og:site_name" content="{{ site.title }}" />

<meta property="og:locale" content="en_US" />

These are the site-wide Twitter tags. My twitter:site property is set to my personal account, but you might want to set it to your site’s account, if you have one. Description is set to the same data as the other description tags.

<meta name="twitter:site" content="@chronotope" />
<meta name="twitter:description" content="{% if page.excerpt %}{{ page.excerpt | strip_html | strip_newlines | truncate: 160 }}{% else %}{{ site.description }}{% endif %}" />

To populate all the fields social networks expect, you’ll need some extra properties on your posts. Here’s what the head of this post’s markdown file looks like.

---
layout: post
title:  "How to make your Jekyll site more shareable"
date:   2015-10-29 01:34:51 -0400
categories: jekyll social-media
image: http://41.media.tumblr.com/173cb5c51a1c308ab022a786f69353f3/tumblr_nwncf1T2ht1rl195mo1_1280.jpg
vertical: Code
excerpt: "Jekyll is pretty cool, here's how to make writing with it easier for others to share on social networks."
---

There are a number of meta tags that are either site or article only. To figure out whether we’re on an article, Liquid can branch on page.title in an if/else statement.

{% if page.title %}
  <!-- Article specific OG data -->
  <!-- The OG:Type dictates a number of other tags on posts. -->
  <meta property="og:type" content="article" />
  <meta property="article:published_time" content="{{page.date}}" />

  <!-- page.modified isn't a natural Jekyll property, but it can be added. -->
  {% if page.modified %}
    <meta property="article:modified_time" content="{{page.modified}}" />
  {% endif %}

  <!-- Here my author and publisher tags are the same (yay self-publishing) -->
  <meta property="article:author" content="http://facebook.com/aramzs" />
  <!-- But if your site has its own page, this is where to put it. -->
  <meta property="article:publisher" content="https://www.facebook.com/aramzs" />

  <!-- Article section isn't a required property, but it can be good to have -->
  <meta property="article:section" content="{{page.vertical}}" />

  <!-- I use the page.categories property for OG tags. -->
  {% for tag in page.categories %}
    <meta property="article:tag" content="{{tag}}" />
  {% endfor %}

  <!-- I prefer the summary_large_image Twitter card for posts. -->
  <meta name="twitter:card" content="summary_large_image" />
  <!-- You, you're the creator. -->
  <meta name="twitter:creator" content="@chronotope" />
  <!-- This property is for the article title, not site title. -->
  <meta name="twitter:title" content="{{page.title}}" />

Sharing works better with pictures. You can upload them to your repository, or link to them from other locations. Not every page will have an image, so I’ve built a check to ensure that one has been supplied. If it hasn’t, the tag falls back to the default image I have for the whole site.

This takes care of both the Open Graph and Twitter Image tags. With more page properties you could have custom images for each if you wanted.

  {% if page.image %}
    <meta property="og:image" content="{{page.image}}" />
    <meta name="twitter:image" content="{{page.image}}" />
  {% else %}
    <meta property="og:image" content="https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg" />
    <meta name="twitter:image" content="https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg" />
  {% endif %}

What if you’re not on a post page? There are some default values we can fill in to indicate that we’re on the basic website.

{% else %}
  <!-- OG data for homepage -->
  <meta property="og:image" content="https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg" />
  <meta property="og:type" content="website" />
  <meta name="twitter:card" content="summary" />
  <meta name="twitter:title" content="{{site.title}}" />
  <meta name="twitter:image" content="https://41.media.tumblr.com/709bb3c371b9924add351bfe3386e946/tumblr_nxdq8uFdx81qzocgko1_1280.jpg" />

{% endif %}

That’s all of them! If you’re interested, you can see the whole set of tags, the Liquid script, and the rest of head.html that I use for this very site by checking the repo.
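If you want to confirm that a built page actually carries these tags, a short script can list the ones that are missing or empty. The following Python sketch uses only the standard library. The list of tags it treats as required is my own selection from the ones covered above, not an official requirement of any network, and the names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed checklist, drawn from the tags discussed in this post.
REQUIRED = ("og:title", "og:description", "og:url", "og:image", "og:type", "twitter:card")


class MetaCollector(HTMLParser):
    """Gather content values of <meta> tags keyed by property or name."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        if key:
            self.tags.setdefault(key, []).append(attributes.get("content") or "")


def missing_social_tags(html, required=REQUIRED):
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return [
        name for name in required
        if not any(value.strip() for value in collector.tags.get(name, []))
    ]
```

An empty list means every tag on the checklist is present with a non-blank value.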

Hello World!

October 29. Category: Feed

Hey, check it out. First post! Trying this thing out.