Rich Link Previews For Thoughts

August 23, 2021 @22:45

If you follow Thoughts, my microblog, you may have noticed that I added rich link previews. I found myself taking screenshots of the links I posted, which is a silly duplication of work, and that means it's time to write some software.

URL Metadata Standards?

The first task was deciding how to look up the metadata for a URL. There are several pseudo-standards currently competing for relevance in this space. The W3C has a recommendation called JSON-LD, which seems to have been designed by a committee and so has several moving parts, and there is a specification created by Facebook called the Open Graph protocol. I looked at the links that I had posted to Thoughts, and by far the most frequently available metadata was in the form of Open Graph tags. A keen reader may also note that I jumped through some hoops to generate Open Graph images for Thoughts, and that the rest of the site generator uses Open Graph tags and not JSON-LD.
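
To make the difference between the two formats concrete, here is a minimal sketch that reads both from the same page using only the Python standard library. The sample HTML, the MetadataParser class, and its attribute names are made up for illustration; real pages vary a lot in which tags they include.

```python
import json
from html.parser import HTMLParser

# Made-up page that carries the same title in both formats.
SAMPLE = """<html><head>
<meta property="og:title" content="Example Article">
<meta property="og:image" content="https://example.com/hero.png">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Example Article"}
</script>
</head><body></body></html>"""


class MetadataParser(HTMLParser):
    '''Hypothetical helper: collect og:* META tags and JSON-LD blocks.'''

    def __init__(self):
        super().__init__()
        self.open_graph = {}
        self.json_ld = []
        self._in_ld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get('property') or ''
        if tag == 'meta' and prop.startswith('og:'):
            # Open Graph: one flat key/value pair per META tag.
            self.open_graph[prop[3:]] = attrs.get('content') or ''
        elif tag == 'script' and attrs.get('type') == 'application/ld+json':
            # JSON-LD: a whole JSON document inside a SCRIPT element.
            self._in_ld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == 'script' and self._in_ld:
            self._in_ld = False
            self.json_ld.append(json.loads(''.join(self._buf)))


parser = MetadataParser()
parser.feed(SAMPLE)
parser.close()
print(parser.open_graph)
print(parser.json_ld[0]['headline'])
```

The Open Graph side ends up as a flat dictionary, while the JSON-LD side is an arbitrary JSON document that still has to be interpreted against a vocabulary, which is roughly what "several moving parts" looks like in practice.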

With the metadata source chosen, the next step was designing the workflow.

Posting Changes

=======[ Posting App ]=======    ====[ Azure Function ]====

             +---------------+
 +------+    | Extract first |    +---------------------+
 | Post |--->| `A' from text |--->| Resolve OG Metadata |
 +------+    | and validate. |    +---------------------+
             +---------------+              |
                                            |
            +----------------+             /
            | Store Thought  |            /
            | in Azure Table |<----------
            +----------------+

The changes to the posting interface were simple, and the Azure Function is nearly trivial. First, we need to extract the first link from the text entered by the user and resolve it.

async function findLink(text) {
    // Do not resolve these domains into link attachments.
    const disallowedDomains = [
        'www.going-flying.com',
        'ssl.ub3rgeek.net'
    ];

    try {
        const dom = (new DOMParser())
            .parseFromString(text, 'text/html');
        const firstA = dom.querySelector('a');

        if (! firstA || ! firstA.href) { return false; }

        // Use .hostname rather than .host so an explicit port cannot bypass the list.
        const hostname = (new URL(firstA.href)).hostname;
        if (disallowedDomains.includes(hostname)) { return false; }

        const href = encodeURIComponent(firstA.href);
        const resp = await fetch(
            'https://vociferate.azurewebsites.net/api/resolver?url=' + href
        );

        if (! resp.ok) { return false; }
        console.log('Resolved metadata for ' + firstA.href);

        const meta = await resp.json();
        meta['url'] = firstA.href;
        return meta;

    } catch (e) {
        console.error('findLink failed ' + e.message);
        return false;
    }
}
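
The same extract-and-validate step can be sketched in Python with only the standard library, which may be handy for testing the rule outside a browser. This is an illustration, not the code the posting app runs: FirstAnchor and first_link are hypothetical names, and the http/https scheme check is an extra assumption that the JavaScript above does not make.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Same list as the JavaScript version above.
DISALLOWED = {'www.going-flying.com', 'ssl.ub3rgeek.net'}


class FirstAnchor(HTMLParser):
    '''Remember the href of the first <a> element only.'''

    def __init__(self):
        super().__init__()
        self.found = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and not self.found:
            self.found = True
            self.href = dict(attrs).get('href')


def first_link(text, disallowed=DISALLOWED):
    '''Return the first link in text, or None if it should not be resolved.'''
    parser = FirstAnchor()
    parser.feed(text)
    parser.close()
    if not parser.href:
        return None

    parts = urlsplit(parser.href)
    # Assumption: only absolute http(s) links are worth resolving.
    if parts.scheme not in ('http', 'https') or not parts.hostname:
        return None
    if parts.hostname in disallowed:
        return None
    return parser.href


print(first_link('see <a href="https://example.com/post">this</a>'))
print(first_link('<a href="https://www.going-flying.com/x">my own site</a>'))
```

As in the JavaScript, only the first anchor is considered, so a first anchor without an href means no preview even if later anchors have one.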

After we resolve the link, the post() function saves the result as a new link attribute in the Table, just as it would an attachment.

const row = {
    'PartitionKey': 'thought',
    'RowKey': id.toString(),
    'id@odata.type': 'Edm.Int64',
    'id': id.toString(),
    'message': nl2br(postText.value)
};

if (attachments) {
    row['attachment'] = JSON.stringify(attachments);
}

if (link) {
    row['link'] = JSON.stringify(link);
}

await connections.table.insert(row);
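
Azure Table entities hold flat, primitive-typed properties rather than nested objects, which is presumably why the link metadata is serialized to a JSON string before it is stored. A rough Python equivalent of the row construction, with a hypothetical build_row helper and made-up sample values, shows the round trip:

```python
import json


def build_row(thought_id, message, attachments=None, link=None):
    '''Hypothetical helper mirroring the row layout shown above.'''
    row = {
        'PartitionKey': 'thought',
        'RowKey': str(thought_id),
        'id@odata.type': 'Edm.Int64',
        'id': str(thought_id),
        'message': message,
    }

    # Nested structures are stored as JSON strings in a single property.
    if attachments:
        row['attachment'] = json.dumps(attachments)

    if link:
        row['link'] = json.dumps(link)

    return row


# Made-up values for illustration only.
row = build_row(
    1629758700,
    'A post with a link',
    link={'title': 'Example', 'image': '', 'url': 'https://example.com/'},
)

# A reader of the Table decodes the property to get the metadata back.
print(json.loads(row['link'])['title'])
```

Because the property is simply omitted when there is no link, older rows and link-free rows need no migration; readers only have to check whether the property exists.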

The link resolution Azure Function simply fetches the page using the venerable Requests library and the resulting HTML is parsed with the ubiquitous BeautifulSoup library. Open Graph tags are simple META tags in the HEAD of the page so they are very straightforward to get.

'''resolver/__init__.py (c) 2021 Matthew J Ernisse <matt@going-flying.com>
All Rights Reserved.

Redistribution and use in source and binary forms,
with or without modification, are permitted provided
that the following conditions are met:

    * Redistributions of source code must retain the
      above copyright notice, this list of conditions
      and the following disclaimer.
    * Redistributions in binary form must reproduce
      the above copyright notice, this list of conditions
      and the following disclaimer in the documentation
      and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
'''
import json
import logging
import os
import requests
import azure.functions as func

from bs4 import BeautifulSoup
from urllib.parse import unquote

from ..lib import CacheHeaders


class OpenGraphPage(object):
    ''' Fetch a URL and create a dict-like mapping of the OpenGraph
    properties in the page.
    '''
    _ua = 'Mozilla/5.0 (compatible; ThoughtsBot/1.0; +matt@going-flying.com)'

    def __init__(self, url):
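        # Requests has no default timeout, so set one to keep a slow
        # site from stalling the function.
        resp = requests.get(
            url,
            headers={'User-Agent': self._ua},
            timeout=10
        )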
        resp.raise_for_status()

        soup = BeautifulSoup(resp.content, features='lxml')
        if not soup:
            raise Exception(f'Failed to parse content of {url}')

        self.soup = soup

    def __contains__(self, attr):
        try:
            if self[attr]:
                return True

            return False

        except KeyError:
            return False

    def __getitem__(self, attr):
        tag = self.soup.find('meta', property=f'og:{attr}')
        if not tag:
            raise KeyError(attr)

        return tag['content']


def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    if 'url' not in req.params:
        return func.HttpResponse(
            'url is a required query parameter',
            status_code=400
        )

    url = req.params.get('url')
    url = unquote(url)

    try:
        og = OpenGraphPage(url)
    except Exception as e:
        logging.error(f'resolver() url {url} threw {e!s}')
        logging.exception(e)
        return func.HttpResponse(status_code=500)

    obj = {
        'description': '',
        'image': '',
        'title': '',
        'site_name': '',
    }

    for tag in obj.keys():
        if tag in og:
            obj[tag] = og[tag]

    try:
        logging.info(f'resolver() resolved {url}')
        return func.HttpResponse(
            json.dumps(obj),
            charset='utf-8',
            headers=CacheHeaders.dynamic,
            mimetype='application/json; charset=utf-8'
        )

    except Exception as e:
        logging.error(f'resolver() threw {e!s} encoding {obj}')
        logging.exception(e)
        return func.HttpResponse(status_code=500)
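
One subtlety in the listing above is that __contains__ treats a tag with empty content the same as a missing tag, so the response always carries the same four keys and falls back to an empty string for each. The sketch below illustrates that behavior without any network access. OpenGraphTags is a hypothetical stand-in backed by a plain dict, and summarize is an illustrative name for the loop in main().

```python
class OpenGraphTags:
    '''Hypothetical stand-in for OpenGraphPage, backed by a dict of
    already-extracted og:* properties instead of a fetched page.
    '''

    def __init__(self, tags):
        self._tags = tags

    def __getitem__(self, attr):
        if attr not in self._tags:
            raise KeyError(attr)
        return self._tags[attr]

    def __contains__(self, attr):
        # Present-but-empty counts as absent, like the original class.
        try:
            return bool(self[attr])
        except KeyError:
            return False


def summarize(og):
    '''Fill the fixed response shape, leaving defaults for absent tags.'''
    obj = {'description': '', 'image': '', 'title': '', 'site_name': ''}
    for tag in obj:
        if tag in og:
            obj[tag] = og[tag]
    return obj


og = OpenGraphTags({'title': 'Hello', 'description': '', 'type': 'article'})
print('title' in og, 'description' in og, 'image' in og)
print(summarize(og))
```

Properties outside the fixed set, such as the og:type in this made-up input, are simply ignored, which keeps the stored link attribute small and predictable.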

Viewing Changes

Now that we have a link attribute stored for new Thoughts, we need to make that data useful and display it to the user. I created a new custom element, much like I did for the video thumbnails, to contain and style the link metadata. The big difference between this and the custom elements I used elsewhere is that this one is slotted. This is largely because it is much simpler and doesn't need to encapsulate as much custom behavior. In fact, the JavaScript is basically the bare minimum boilerplate required to instantiate a custom element.

class GoingflyingLinkPreview extends HTMLElement {
    constructor() {
        super();
        const template = document.getElementById(
            'goingflying-link-preview'
        ).content;

        const shadowRoot = this.attachShadow({mode: 'open'});
        shadowRoot.appendChild(template.cloneNode(true));
    }
}

The work is all done by the containing element (either goingflying-thought-large or goingflying-thought-small).

if (link && link.title && link.image) {
    const gutter = this.shadowRoot
        .getElementById('gutter') || contEl;
    const linkEl = document.createElement(
        'goingflying-link-preview'
    );
    linkEl.title = link.url;

    linkEl.addEventListener('click', () => {
        window.location = link.url;
    });

    if (link.image) {
        const thumb = document.createElement('img');
        thumb.slot = 'thumbnail';
        thumb.src = link.image;
        linkEl.appendChild(thumb);
    }

    // The title and description come from arbitrary third-party pages,
    // so insert them as text rather than as markup.
    const hero = document.createElement('span');
    hero.slot = 'hero';
    hero.textContent = link.title;
    linkEl.appendChild(hero);

    const preview = document.createElement('span');
    preview.slot = 'preview';
    preview.textContent = link.description;
    linkEl.appendChild(preview);
    gutter.appendChild(linkEl);
}

After making this work, I refactored how the embed code is generated so that it no longer includes the goingflying-link-preview element. Previously it simply copied the DOM from inside the Shadow DOM of the goingflying-thought-* element.

Conclusion

This was a bit more involved than adding videos since I had not planned for it, but I think it makes the Thoughts a bit more interesting, and it means I no longer need to post a screenshot of the link target so the reader knows what I am referring to. It also goes to show how flexible the system is, being made up of small pieces acting together instead of a single massive whole.
