If you follow my microblog, Thoughts, you may have noticed that I added rich link previews. I found myself taking screenshots of the links I posted, which is a silly duplication of work, and that means it's time to write some software.
URL Metadata Standards?
The first task was deciding how to look up the metadata for a URL. There are several pseudo-standards currently competing for relevance in this space. The W3C has a recommendation called JSON-LD, which seems to have been designed by committee and so has several moving parts, and there is a specification created by Facebook called the Open Graph protocol. I looked at the links that I had posted to Thoughts, and by far the most frequently available metadata was in the form of Open Graph tags. A keen reader may also note that I jumped through some hoops to generate Open Graph images for Thoughts, and that the rest of the site generator uses Open Graph tags and not JSON-LD.
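To give a feel for how little is involved in reading Open Graph tags, here is a minimal sketch using only the Python standard library. It is not the code the site runs; the class name `OpenGraphParser` and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> tags into a plain dict."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attrs = dict(attrs)
        prop = attrs.get('property') or ''
        content = attrs.get('content')
        if prop.startswith('og:') and content is not None:
            # Keep the first occurrence of each property.
            self.properties.setdefault(prop[3:], content)


# Hypothetical page used only for this example.
SAMPLE = '''
<html><head>
<title>Example</title>
<meta property="og:title" content="An Example Page">
<meta property="og:description" content="A short summary.">
<meta property="og:image" content="https://example.com/card.png">
<meta property="og:site_name" content="Example Site">
</head><body></body></html>
'''

parser = OpenGraphParser()
parser.feed(SAMPLE)
print(parser.properties)
```

Each property is a single flat `META` tag, which is a large part of why Open Graph is so easy to consume.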
With the metadata source in mind, the workflow had to be designed.
Posting Changes
=======[ Posting App ]=======    ====[ Azure Function ]====
            +---------------+
+------+    | Extract first |    +---------------------+
| Post |--->| `A' from text |--->| Resolve OG Metadata |
+------+    | and validate. |    +---------------------+
            +---------------+               |
                                            |
        +----------------+                 /
        | Store Thought  |                /
        | in Azure Table |<---------------
        +----------------+
The posting interface changes were pretty simple, and the Azure Function is pretty trivial. First we need to extract and resolve the link from the text entered by the user.
async function findLink(text) {
    // Do not resolve these domains into link attachments.
    const disallowedDomains = [
        'www.going-flying.com',
        'ssl.ub3rgeek.net'
    ];

    try {
        // Parse the post text as HTML and take the first anchor, if any.
        const dom = (new DOMParser())
            .parseFromString(text, 'text/html');
        const firstA = dom.querySelector('a');
        if (! firstA || ! firstA.href) { return false; }

        // hostname (unlike host) never includes a port number.
        const hostname = (new URL(firstA.href)).hostname;
        if (disallowedDomains.includes(hostname)) { return false; }

        const href = encodeURIComponent(firstA.href);
        const resp = await fetch(
            'https://vociferate.azurewebsites.net/api/resolver?url=' + href
        );
        if (! resp.ok) { return false; }

        console.log('Resolved metadata for ' + firstA.href);
        const meta = await resp.json();
        meta['url'] = firstA.href;
        return meta;
    } catch (e) {
        console.error('findLink failed ' + e.message);
        return false;
    }
}
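The validation and encoding steps can be exercised outside the browser. The following is a rough Python sketch of the same idea, not a port of the posting app: `build_resolver_url` is a hypothetical helper, the scheme check is an extra assumption of mine, and `quote(..., safe='')` is only roughly analogous to `encodeURIComponent`.

```python
from urllib.parse import quote, unquote, urlsplit

RESOLVER = 'https://vociferate.azurewebsites.net/api/resolver?url='
DISALLOWED = {'www.going-flying.com', 'ssl.ub3rgeek.net'}


def build_resolver_url(href):
    """Return the resolver URL for href, or None if it should be skipped."""
    parts = urlsplit(href)
    # Assumption: only absolute http(s) links are worth resolving.
    if parts.scheme not in ('http', 'https') or not parts.hostname:
        return None
    if parts.hostname in DISALLOWED:
        return None
    # Percent-encode everything so the link survives as one query parameter.
    return RESOLVER + quote(href, safe='')


target = 'https://example.com/a?b=c&d=e'
resolver_url = build_resolver_url(target)
print(resolver_url)
# Decoding the parameter gives back the original link.
print(unquote(resolver_url[len(RESOLVER):]) == target)
```

Encoding the whole link matters because it may carry its own query string, which would otherwise be read as extra parameters to the resolver.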
After we resolve the link, the post() function saves the results as a new link attribute in the Table, just like we would an attachment.
const row = {
    'PartitionKey': 'thought',
    'RowKey': id.toString(),
    'id@odata.type': 'Edm.Int64',
    'id': id.toString(),
    'message': nl2br(postText.value)
};

if (attachments) {
    row['attachment'] = JSON.stringify(attachments);
}

if (link) {
    row['link'] = JSON.stringify(link);
}

await connections.table.insert(row);
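Table entity properties hold flat primitive values, not nested objects, which is why the link metadata is stored as a JSON string. A small Python sketch of the round trip follows; `build_row` and `read_link` are hypothetical names, and nothing here talks to Azure.

```python
import json


def build_row(thought_id, message, attachments=None, link=None):
    """Build a flat entity dict shaped like the row in the posting app."""
    row = {
        'PartitionKey': 'thought',
        'RowKey': str(thought_id),
        'id@odata.type': 'Edm.Int64',
        'id': str(thought_id),
        'message': message,
    }
    # Nested structures are serialized so each property stays a string.
    if attachments:
        row['attachment'] = json.dumps(attachments)
    if link:
        row['link'] = json.dumps(link)
    return row


def read_link(row):
    """Decode the link property of a stored row, or None if absent."""
    raw = row.get('link')
    return json.loads(raw) if raw else None


example_link = {'title': 'An Example Page', 'url': 'https://example.com/'}
row = build_row(1617000000, 'Hello', link=example_link)
print(read_link(row))
```

Older rows simply lack the link property, so a reader has to treat it as optional.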
The link resolution Azure Function simply fetches the page using the venerable Requests library, and the resulting HTML is parsed with the ubiquitous BeautifulSoup library. Open Graph tags are simple META tags in the HEAD of the page, so they are very straightforward to get.
'''resolver/__init__.py (c) 2021 Matthew J Ernisse <matt@going-flying.com>
All Rights Reserved.
Redistribution and use in source and binary forms,
with or without modification, are permitted provided
that the following conditions are met:
* Redistributions of source code must retain the
above copyright notice, this list of conditions
and the following disclaimer.
* Redistributions in binary form must reproduce
the above copyright notice, this list of conditions
and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE
USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
'''
import json
import logging
import os
import requests
import azure.functions as func
from bs4 import BeautifulSoup
from urllib.parse import unquote
from ..lib import CacheHeaders
class OpenGraphPage(object):
    ''' Fetch a URL and create a dict-like mapping of the OpenGraph
    properties in the page.
    '''
    _ua = 'Mozilla/5.0 (compatible; ThoughtsBot/1.0; +matt@going-flying.com)'

    def __init__(self, url):
        # A timeout keeps a slow or unresponsive site from hanging the function.
        resp = requests.get(
            url,
            headers={'User-Agent': self._ua},
            timeout=10
        )
        resp.raise_for_status()
        soup = BeautifulSoup(resp.content, features='lxml')
        if not soup:
            raise Exception(f'Failed to parse content of {url}')

        self.soup = soup

    def __contains__(self, attr):
        # A property counts as present only if it has a non-empty value.
        try:
            if self[attr]:
                return True
            return False
        except KeyError:
            return False

    def __getitem__(self, attr):
        # Look up <meta property="og:ATTR" content="...">.
        tag = self.soup.find('meta', property=f'og:{attr}')
        if not tag:
            raise KeyError(attr)

        return tag['content']
def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
    if 'url' not in req.params.keys():
        return func.HttpResponse(
            'url is a required query parameter',
            status_code=400
        )

    url = req.params.get('url')
    url = unquote(url)

    try:
        og = OpenGraphPage(url)
    except Exception as e:
        logging.error(f'resolver() url {url} threw {e!s}')
        logging.exception(e)
        return func.HttpResponse(status_code=500)

    obj = {
        'description': '',
        'image': '',
        'title': '',
        'site_name': '',
    }

    for tag in obj.keys():
        if tag in og:
            obj[tag] = og[tag]

    try:
        logging.info(f'resolver() resolved {url}')
        return func.HttpResponse(
            json.dumps(obj),
            charset='utf-8',
            headers=CacheHeaders.dynamic,
            mimetype='application/json; charset=utf-8'
        )
    except Exception as e:
        logging.error(f'resolver() threw {e!s} encoding {obj}')
        logging.exception(e)
        return func.HttpResponse(status_code=500)
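One property of the resolver worth spelling out is that it always returns the same four keys, with empty strings standing in for tags the page does not have. The sketch below shows that shape using a plain dict in place of OpenGraphPage; `build_payload` is a hypothetical name and no HTTP request is made.

```python
import json

FIELDS = ('description', 'image', 'title', 'site_name')


def build_payload(og):
    """Serialize the known fields of og, defaulting missing ones to ''."""
    payload = {name: '' for name in FIELDS}
    for name in FIELDS:
        # Mirror the resolver: copy a value only if it is present and non-empty.
        if name in og and og[name]:
            payload[name] = og[name]
    return json.dumps(payload)


# A page that only declares a title still yields every key.
print(build_payload({'title': 'An Example Page'}))
```

Because the keys are always present, the consumer can test values such as the title and image directly without first checking that they exist.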
Viewing Changes
Now that we have a link attribute stored for new Thoughts, we need to make that data useful and display it to the user. I created a new custom Element, much like I did for the video thumbnails, to contain and style the link metadata. The big difference between this and the custom elements I used elsewhere is that this one is slotted. This is largely because it is much simpler and doesn't need to encapsulate as much custom behavior. In fact, the JavaScript is basically the bare minimum boilerplate required to instantiate a custom element.
class GoingflyingLinkPreview extends HTMLElement {
    constructor() {
        super();
        const template = document.getElementById(
            'goingflying-link-preview'
        ).content;

        const shadowRoot = this.attachShadow({mode: 'open'});
        shadowRoot.appendChild(template.cloneNode(true));
    }
}
The work is all done by the containing element (either goingflying-thought-large or goingflying-thought-small).
if (link && link.title && link.image) {
    const gutter = this.shadowRoot
        .getElementById('gutter') || contEl;

    const linkEl = document.createElement(
        'goingflying-link-preview'
    );

    linkEl.title = link.url;
    linkEl.addEventListener('click', () => {
        window.location = link.url;
    });

    if (link.image) {
        const thumb = document.createElement('img');
        thumb.slot = 'thumbnail';
        thumb.src = link.image;
        linkEl.appendChild(thumb);
    }

    // The metadata comes from third-party pages, so it is inserted
    // as text rather than parsed as markup.
    const hero = document.createElement('span');
    hero.slot = 'hero';
    hero.textContent = link.title;
    linkEl.appendChild(hero);

    const preview = document.createElement('span');
    preview.slot = 'preview';
    preview.textContent = link.description;
    linkEl.appendChild(preview);

    gutter.appendChild(linkEl);
}
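To visualize the light DOM that the code above builds, here is a sketch in Python that produces roughly equivalent markup as a string. It is for illustration only: the site builds these nodes in JavaScript, and `render_link_preview` is a hypothetical name. The slot names and the title and image guard are taken from the code above.

```python
from html import escape


def render_link_preview(link):
    """Return slotted markup for a link, or '' if it lacks a title or image."""
    if not (link and link.get('title') and link.get('image')):
        return ''
    url = escape(link.get('url', ''), quote=True)
    image = escape(link['image'], quote=True)
    # Each child names the slot in the element's template that it fills.
    return (
        f'<goingflying-link-preview title="{url}">'
        f'<img slot="thumbnail" src="{image}">'
        f'<span slot="hero">{escape(link["title"])}</span>'
        f'<span slot="preview">{escape(link.get("description", ""))}</span>'
        '</goingflying-link-preview>'
    )


markup = render_link_preview({
    'url': 'https://example.com/',
    'title': 'A & B',
    'image': 'https://example.com/card.png',
    'description': 'A short summary.',
})
print(markup)
```

The children live outside the shadow root, and the browser projects each one into the matching named slot of the template.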
After making this work, I refactored how the embed code was created so that it does not include the goingflying-link-preview element. Previously it was just copying the DOM from inside the Shadow DOM of the goingflying-thought-* element.
Conclusion
This was a bit more involved than adding videos, since I had not planned for it, but I think it makes the Thoughts a bit more interesting and removes the need for me to post a screenshot of the link target so the reader knows what I am referring to. It also goes to show how flexible the system is, being made up of small pieces acting together instead of a single massive whole.