Rich Link Previews For Thoughts_

August 23, 2021 @22:45

If you follow my microblog that I named Thoughts, you may have noticed that I added rich link previews. I found myself taking screenshots of links that I'd post and that is just a silly duplication of work which means it's time to write some software.

URL Metadata Standards?

The first task was deciding how to look up the metadata for a URL, there are several pseudo-standards currently competing for relevance in this space. The W3C has a recommendation called JSON-LD which seems to have been designed by a committee, so it has several moving parts and Facebook created a specification called the Open Graph protocol. I looked at the links that I posted to Thoughts and by far the most frequently available metadata is in the form of Open Graph tags. A keen reader may also note that [I went through some hoops to generate Open Graph images](/blog/open-graph-images-for-thoughts-with-pyppeteer-in-azure-functions.html} for Thoughts, and the rest of the site generator uses Open Graph tags and not JSON-LD.

With the metadata source in mind, the workflow had to be designed.

Posting Changes

=======[ Posting App ]=======    ====[ Azure Function ]====

        +---------------+
+------+    | Extract first |    +---------------------+
| Post |--->| `A' from text |--->| Resolve OG Metadata |
+------+    | and validate. |    +---------------------+
        +---------------+          |
                            |
        +----------------+       /
        | Store Thought  |      /
        | in Azure Table |<----------
        +----------------+

The posting interface changes were pretty simple, and the Azure Function is pretty trivial. First we need to extract and resolve the link from the text entered by the user.

async function findLink(text) {
    // Do not resolve these domains into link attachments.
    const disallowedDomains = [
        'www.going-flying.com',
        'ssl.ub3rgeek.net'
    ];

    try {
        const dom = (new DOMParser())
            .parseFromString(text, 'text/html');
        const firstA = dom.querySelector('a');

        if (! firstA || ! firstA.href) { return false; }

        const hostname = (new URL(firstA.href)).host;
        if (disallowedDomains.includes(hostname)) { return false; }

        const href = encodeURIComponent(firstA.href);
        const resp = await fetch(
            'https://vociferate.azurewebsites.net/api/resolver?url=' + href
        );

        if (! resp.ok) { return false; }
        console.log('Resolved metadata for ' + firstA.href);

        const meta = await resp.json();
        meta['url'] = firstA.href;
        return meta;

    } catch (e) {
        console.error('findLink failed ' + e.message);
        return false;
    }
}

This gets called from the post() function, we then save a link attribute in the Table, just like we would an attachment.

const row = {
    'PartitionKey': 'thought',
    'RowKey': id.toString(),
    'id@odata.type': 'Edm.Int64',
    'id': id.toString(),
    'message': nl2br(postText.value)
};

if (attachments) {
    row['attachment'] = JSON.stringify(attachments);
}

if (link) {
    row['link'] = JSON.stringify(link);
}

await connections.table.insert(row);

The Azure Function that gets called just simply fetches the page using the venerable Requests library and parsed with the ubiquitous BeautifulSoup HTML parsing library. The Open Graph tags are simple META tags in the page so they are very straightforward to get.

'''resolver/__init__.py (c) 2021 Matthew J Ernisse <matt@going-flying.com>
All Rights Reserved.

Redistribution and use in source and binary forms,
with or without modification, are permitted provided
that the following conditions are met:

    * Redistributions of source code must retain the
      above copyright notice, this list of conditions
      and the following disclaimer.
    * Redistributions in binary form must reproduce
      the above copyright notice, this list of conditions
      and the following disclaimer in the documentation
      and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
'''
import json
import logging
import os
import requests
import azure.functions as func

from bs4 import BeautifulSoup
from urllib.parse import unquote as unquote

from ..lib import CacheHeaders


class OpenGraphPage(object):
    ''' Fetch a URL and create a dict-like mapping of the OpenGraph
    properties in the page.
    '''
    _ua = 'Mozilla/5.0 (compatible; ThoughtsBot/1.0; +matt@going-flying.com'

    def __init__(self, url):
        resp = requests.get(url, headers={'User-Agent': self._ua})
        resp.raise_for_status()

        soup = BeautifulSoup(resp.content, features='lxml')
        if not soup:
            raise Exception(f'Failed to parse content of {url}')

        self.soup = soup

    def __contains__(self, attr):
        try:
            if self[attr]:
                return True

            return False

        except KeyError:
            return False

    def __getitem__(self, attr):
        tag = self.soup.find('meta', property=f'og:{attr}')
        if not tag:
            raise KeyError(attr)

        return tag['content']


def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    if 'url' not in req.params.keys():
        return func.HttpResponse(
            'url is a required query parameter',
            status_code=400
        )

    url = req.params.get('url')
    url = unquote(url)

    try:
        og = OpenGraphPage(url)
    except Exception as e:
        logging.error(f'resolver() url {url} threw {e!s}')
        logging.exception(e)
        return func.HttpResponse(status_code=500)

    obj = {
        'description': '',
        'image': '',
        'title': '',
        'site_name': '',
    }

    for tag in obj.keys():
        if tag in og:
            obj[tag] = og[tag]

    try:
        logging.info(f'resolver() resolved {url}')
        return func.HttpResponse(
            json.dumps(obj),
            charset='utf-8',
            headers=CacheHeaders.dynamic,
            mimetype='application/json; charset=utf-8'
        )

    except Exception as e:
        logging.error(f'resolver() threw {e!s} encoding {obj}')
        logging.exception(e)
        return func.HttpResponse(status_code=500)

Viewing Changes

So we now have a link attribute stored for new Thoughts, but to make that data useful we need to display it to the user. I created a new custom Element, much like I did for the video thumbnails to contain and style the link metadata. The big difference between this and the custom elements I used elsewhere is that this one is slotted. This is largely because it is much simpler and doesn't need to encapsulate as much custom behavior. In fact the JavaScript is basically the bare minimum boilerplate required to instantiate a custom element.

class GoingflyingLinkPreview extends HTMLElement {
    constructor() {
        super();
        const template = document.getElementById(
            'goingflying-link-preview'
        ).content;

        const shadowRoot = this.attachShadow({mode: 'open'});
        shadowRoot.appendChild(template.cloneNode(true));
    }
}

The work is all done by the containing element (either goingflying-thought-large or goingflying-thought-small).

if (link && link.title && link.image) {
    const gutter = this.shadowRoot
        .getElementById('gutter') || contEl;
    const linkEl = document.createElement(
        'goingflying-link-preview'
    );
    linkEl.title = link.url;

    linkEl.addEventListener('click', () => {
        window.location = link.url;
    });

    if (link.image) {
        const thumb = document.createElement('img');
        thumb.slot = 'thumbnail';
        thumb.src = link.image;
        linkEl.appendChild(thumb);
    }

    const hero = document.createElement('span');
    hero.slot = 'hero';
    hero.innerHTML = link.title;
    linkEl.appendChild(hero);

    const preview = document.createElement('span');
    preview.slot = 'preview';
    preview.innerHTML = link.description;
    linkEl.appendChild(preview);
    gutter.appendChild(linkEl);
}

After making this work I refactored how the embed code was created because previously it was copying the DOM tree inside the Shadow DOM of the goingflying-thought-* element and that would include my new goingflying-link-preview but wouldn't copy the template element that styled it or the JavaScript that instantiated it. As you may see I chose to omit it from the embed code.

Conclusion

This was a bit more involved than adding videos since I had not planned for it but I think it makes the thoughts a bit more interesting and obviates the need for me to post a screenshot of the link target so the reader would know what I was talking about.

Subscribe via RSS. Send me a comment.