PIG: RSS All The Things

I’m less certain than you that RSS should do all the things. (And I say this as someone who was once bullish enough to try to get an RSS project/startup off the ground way back in 2009.)

For example, I have yet to meet a feedreader that doesn’t periodically spam me with already-read posts from some feed or other. It happens for a variety of reasons, and it’s a hard problem. But it suggests that <atom:updated> is very far from a complete solution for sprouts. For one thing, not all updates are the same, and as a reader I might want different policies for different feeds, independent of what the feed author recommends. And that would require whole new categories of clients, which create vocabularies I don’t yet possess even to debate potential solutions.

In the fullness of time we may get to all these use cases. But in any short to medium term, depending on RSS for audience segmentation seems unrealistic. It’s rare enough that people subscribe to the one feed on a site; multiplying feeds only introduces paradox-of-choice burdens. Until then, I’d focus RSS on the use cases it already does well.

I think all this is a circuitous way to say I prefer the direction you’re taking with RSS, even as that other post is tantalizing and thought-provoking.


Weird recurrences of posts in feeds and readers are a really ubiquitous RSS annoyance. Much of the trouble of writing an RSS → newsletter/notification bridge was making sure subscribers never get notified about an item more than once, regardless of how items pop in and out of the feed over time.
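A minimal sketch of that never-notify-twice invariant (item shape and function names are hypothetical; a real bridge would persist the seen-set durably rather than hold it in memory):

```python
# Sketch: notify-at-most-once semantics for an RSS -> notification bridge.
# Items may drop out of the feed and reappear later; we key on a stable
# identity (guid, falling back to link) and never notify the same key twice.

def stable_key(item: dict) -> str:
    """Prefer the feed's guid; fall back to the link."""
    return item.get("guid") or item["link"]

def new_items(feed_items: list[dict], seen: set[str]) -> list[dict]:
    """Return only items never notified before, updating the seen-set."""
    fresh = []
    for item in feed_items:
        key = stable_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```

The point is that "new" is defined against everything ever seen, not against the previous poll of the feed, so reappearing items stay silent.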

I agree the range of notification policies one might want is pretty unbounded. For anyone thinking about application-specific RSS extensions, there’s a tension between supporting more applications and retaining enough user-mass on any one application for other creators and clients to bother supporting your extension. You want to be careful about multiplying vocabularies. Multiplying feeds works, but it does impose a burden on users. (Nevertheless, it is what I do!)

Though I agree it doesn’t fully capture all “sprouts” use cases, I’m actually kind of excited to support and start using <atom:updated> for changes I want to announce. I can put updated items back near the top of my feeds, using the updated date in my reverse-chronological ordering. I have no idea how Inoreader (my current main client) will deal with those reissues, though!
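Concretely, the kind of reissue I have in mind (URLs, titles, and dates here are illustrative) is just an item that keeps its original pubDate but carries an atom:updated:

```xml
<!-- Sketch: an RSS 2.0 item announcing a revision via Atom's updated
     element. Feed URL, dates, and titles are illustrative. -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <description>Illustrative only</description>
    <item>
      <title>A post I have since revised</title>
      <link>https://example.com/posts/revised-post</link>
      <guid>https://example.com/posts/revised-post</guid>
      <pubDate>Mon, 01 May 2023 12:00:00 GMT</pubDate>
      <atom:updated>2023-06-15T09:30:00Z</atom:updated>
    </item>
  </channel>
</rss>
```

A reader sorting on the updated date would float this item back to the top; one keying strictly on guid plus pubDate might ignore it, or re-spam it, which is exactly the open question.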


(I’d love to hear about your 2009 venture / adventure!)


How does WebSub fit into this, if at all?

We’d want to start by supporting the more common polling model for “subscribing” to RSS resources. But we do propose a kind of dynamic “hub” for hosting and modifying those resources, which one might think of loosely as an intercompatible blogging/microblogging/comment-hosting service. Ideally, these hubs would support pingbacks and webmentions, and WebSub.

WebSub is probably a lower priority, because the first thing is to demonstrate functionality and get people excited to use it. WebSub is a more efficient subscription protocol than blind polling, but it doesn’t add user-visible features to get excited about.

But if the project goes well, the right thing to do would be to add WebSub support, so that client services that understand it can subscribe more efficiently. We should keep in mind when thinking about the hub and providing a reference implementation that push notification should eventually be supportable.
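For what it’s worth, retrofitting WebSub later should be cheap on the feed side; discovery is just a pair of links (feed and hub URLs illustrative):

```xml
<!-- Sketch: WebSub discovery links inside an RSS channel.
     The atom namespace is declared here for self-containment;
     it would normally sit on the <rss> root. -->
<channel xmlns:atom="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link>https://example.com/</link>
  <description>Illustrative only</description>
  <atom:link rel="self" href="https://example.com/feed.xml"/>
  <atom:link rel="hub" href="https://hub.example.com/"/>
</channel>
```

A WebSub-aware subscriber POSTs a subscription request to the hub and then receives pushes; everyone else keeps polling the self URL as before.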

Ohh, I’m sorry! I misunderstood. :blush:


Hey!

We are both passionate about RSS :smiley:

I submitted an idea about extending RSS by adding OAuth to be able to consume exclusive paid episodes. I would love to get your input or feedback on it: [PIG] Enhancing RSS podcast feeds by using OAuth to access exclusive content

Also, I built a comic platform (https://taddy.org). All comics uploaded to Taddy are distributed over an open-source specification. We made our own specification (ComicSeries), but I’ve been thinking that I would output an RSS feed with a comics namespace (for people who want to consume it via RSS). I’d be curious to know your thoughts on it.

One question about your application: I wasn’t sure what an RSS-first application was, but maybe that’s something you are still exploring.

Really excited to see more people passionate about RSS and open distribution.

I like this proposal in principle, though I find the actual RSS 2.0 specification to be the weakest of all the feed formats. All it has going for it is that its author (who confused matters by unilaterally dubbing his creation “RSS 2.0” despite its having zero overlap with RSS 1.0) had name recognition within the industry, and that it happens to be marginally easier to comprehend than Atom, which was of course invented to repair the flaws of RSS 2.0.

There seems, moreover, to be a general antipathy among software developers toward XML in the last decade or so, which is why you see formats like JSON Feed drafted up. I would be concerned about losing the younger developers (and the truly XML-traumatized) in addition to the Atom people, should the focus be too narrowly trained on RSS 2.0 proper.

That said, when a feed parser ingests a feed, it turns the entries into an internal representation, at which point it’s not the syntax of the serialization that matters, but the semantics. If this were my project, I would propose a set of controlled vocabularies that described the proposed semantic structures in the abstract, with projections into the three extant concrete feed formats, plus any future ones that don’t exist yet.

In my own work prior to and including Intertwingler (my 2023 SoP project), I have authored a number of such vocabularies. These specifications are all arranged so that their namespace URI is that of the specification itself, which also contains a formal, computable representation, embedded directly into the page source. While these are all OWL ontologies under the hood, consumers of these vocabularies need not be Semantic Web applications (precedent: ActivityPub’s ActivityStreams vocabulary). It just happens to be a really mature and robust toolkit for representing data semantics.

(Term reconciliation alone between the formats for their existing semantics would be super valuable.)

So that would be my suggestion: RSS (or Atom or JSON Feed or whatever) All The Things.


Thanks! I just offered some thoughts on your proposal, [PIG] Enhancing RSS podcast feeds by using OAuth to access exclusive content - #4 by swaldman

I think you should absolutely supply the metainformation you are supplying in your JSON feed format in RSS, by defining a namespace for it. That’s RSS’s secret sauce. You can just do that. You don’t have to ask anybody, and it’s not a big deal. It doesn’t break anything. Your content shows up in any standard feed reader no matter what, but if you can demonstrate that the extra metainformation you provide is useful, some feed readers will begin to pick and choose elements to support in their UI. If users, even some enthusiastic niche of users, love the enhancement, more readers will add more features, people might even write specialized clients for comics, and voilà, you’ve invented the new podcast.
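To make that concrete (the namespace URL and element names below are invented for illustration, not an actual spec), an extended item might look like:

```xml
<!-- Sketch: a hypothetical comics namespace layered over a plain RSS item.
     Plain readers show title/link/description; comics-aware readers can
     render the extra elements. All names and URLs are invented. -->
<item xmlns:comics="https://example.com/ns/comics">
  <title>Issue #12: The Heist</title>
  <link>https://example.com/comics/the-heist/12</link>
  <description>The gang finally cracks the vault.</description>
  <comics:seriesName>The Heist</comics:seriesName>
  <comics:issueNumber>12</comics:issueNumber>
  <comics:pageCount>24</comics:pageCount>
  <comics:coverArt url="https://example.com/covers/the-heist-12.jpg"/>
</item>
```

Clients that don’t know the namespace are required to ignore it, so nothing breaks for ordinary readers.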

Podcasts are an example of RSS-first applications, de facto.

Initially websites were the applications (even for podcasts), and RSS feeds just announced new updates. But over time and with extra metadata, podcast-specific RSS clients (which we just call Podcast apps) eclipsed podcast websites in beauty and usefulness, and now the website is just a kind of stub or backstop if it exists at all. Whatever the status of the website, since it’s still basically RSS at its core, podcast episodes and show notes and mp3 links are always part of the open web, accessible via any RSS reader.

If you embed rich comic metadata in RSS, and over time users come to prefer specialized feed readers that render them beautifully thanks to that metadata, but the important heart of the content remains accessible as standard RSS in any reader, then you’ll have made comics an RSS-first application. People will publish comics primarily in your extended feed format, rather than as a website that just uses feeds to announce updates so that users come visit the website. The feed item becomes the canonical content.

Ha!

Somewhere on Mt. Olympus, a great battle has raged between Athena and Dionysus over these questions for thousands of years. You are on the side of Athena, I am a partisan of Dionysus. Although in the end we remain siblings and lovers. (Gross.)

I agree that RSS is technically the shoddiest of all the feed standards. That is why it is the best. It’s not precious. You don’t need an IETF RFC to extend it, just a website where you document your namespace URL. RSS is for amateurs, not for experts. Atom is better thought out than, and technically superior to, RSS. RSS is weird, quirky, full of vestiges no one cares about and choices that don’t really make sense. To properly support authors, you’ve got to bring in a whole new namespace and <dc:creator>? Really?
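That is, to credit an author you end up writing something like (feed content illustrative; the Dublin Core namespace is real):

```xml
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <description>Illustrative only</description>
    <item>
      <title>A post</title>
      <!-- RSS 2.0's own <author> element wants an email address,
           so everyone reaches for Dublin Core instead. -->
      <dc:creator>Jane Doe</dc:creator>
    </item>
  </channel>
</rss>
```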

Really.

If anyone does anything better than we do, we just steal it. You can find all kinds of Atom tags embedded in RSS documents.

RSS is a space for drunken play, a Bacchanal where it’s all fine because whatever you did last night, if it turns out it was a dumb idea, no one is going to be any the worse for wear. Its inauspicious start means you don’t have anything so precious to compete with. RSS’s Zeus literally says stuff like “Perfection is a waste of time”, “It totally doesn’t matter what we call it”, and “If practice deviates from the spec, change the spec”.

The spirit of RSS is the very opposite of high modernism. It’s “throw the kids on the quad and see where they walk, then we’ll pave sidewalks where the grass has worn away”. In our project, we propose simultaneously to be the kids and the steam-roller (or its sidewalk equivalent).

We have lots of metadata we want to add to RSS, because there are a bunch of things we want to do that its existing metadata won’t support. We don’t mean to think it out all that carefully. We just mean to add it, make our things, publicize it, and hope other people walk the same sidewalks. Our one compunction, the one commandment that will slow our fervid play, is Zeus’ admonition:

If you want to add a feature to a format, first carefully study the existing format and namespaces to be sure what you’re doing hasn’t already been done. If it has, use the original version. This is how you maximize interop.

We want to be sure there are sidewalks to everywhere we want to walk, but if there already are sidewalks, we don’t want to build alternative paths, because interaction with the humans is everything and we want to disperse that energy as little as possible.

Other than that, anything goes. Playfully, even a bit carelessly. It’s better, says Dionysus, to get it done half right, than to tarry on Athena’s counsel to get it truly right and end up getting nothing at all.

(Athena, of course, gets a great deal done. Dionysus would have no wine, or only very terrible wine, if it were not for her diligent inventiveness. But serving Athena well is not within the capacity of your neighborhood drunk. Hiccup. Dionysus’ knack is to make it possible for even the lowest drunk to be the life of the party sometimes. Since there are many more drunks than mathematicians, there is something to be said for that.)

Re XML, I agree that’s a big question mark. On the one hand, XML does have its advantages. JSON is a terrible format for anything but over-the-wire serialization in my view, even uglier than XML for humans to make sense of as text, very brittle, scornful of the most basic amenities like comments. But people do hate XML, and it is an ugly, overly corporate format. Its virtue is its very forgiving eXtensibility (if you don’t rigidify it with XSD or such monstrosities).

For that reason, I’m reluctant (though not firmly decided against) to use very XML-specific technologies like XML Signature. I’d rather keep the logical semantics of whatever metadata we define to a basic informal schema of

  • element namespace
  • element name
  • key-value attributes
  • enclosed-elements OR enclosed text (not mixed, embedded HTML is CDATA)
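For instance, a made-up extension element obeying those four rules (all names and URLs invented):

```xml
<!-- Sketch: element namespace + name, key-value attributes, and either
     child elements or text, with embedded HTML wrapped in CDATA. -->
<ex:annotation xmlns:ex="https://example.com/ns/ex"
               kind="correction" severity="minor">
  <ex:summary>Fixed a broken link</ex:summary>
  <ex:body><![CDATA[<p>The original post linked to the wrong page.</p>]]></ex:body>
</ex:annotation>
```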

RSS has a lot of installed base and network effect around it already, in all its XML antiglory. I don’t see a shift to anything else anytime soon. I think JSON feeds were an awful idea.

But it would be nice if, somehow, someday, we could move to something much less horrible than XML. Consider, for example, something like KDL (pronounced “cuddle”; I :cupid: it!)

The set of primitives listed above could easily be mapped to a format like that. And perhaps should, if it weren’t for those damned network effects.
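To illustrate (node and property names invented), the namespace/name/attributes/children shape maps onto KDL quite naturally:

```kdl
// Sketch: the informal schema above rendered as KDL.
// The namespace rides along as an ordinary property.
annotation ns="https://example.com/ns/ex" kind="correction" severity="minor" {
    summary "Fixed a broken link"
    body "<p>The original post linked to the wrong page.</p>"
}
```

No CDATA gymnastics needed: embedded HTML is just a string value.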

Anyway, I agree with you, I’d like to leave open the possibility.


My experience using RDF (going back to 2006) is precisely that you don’t need to go to the IETF or the W3C to create a vocabulary; you can just throw it up online as a computable proposal. (Writing computable ontologies in X?HTML+RDFa wasn’t really viable until ~2009, and the norms around URL hygiene, i.e. that the namespace URI is the same as the spec’s, didn’t really evolve until much later.) One reason I made Intertwingler is that the tooling around RDF absolutely sucks (still).

But this is neither here nor there. The principal value of rendering data semantics as OWL ontologies (or SHACL or plain RDF Schema) is that the terms expand to URIs, and if you do it right, those URIs resolve to the documentation for the term, and if you really do it right, the documentation for the term can also be ingested by machine and computed over. So the ActivityPub people in particular swear up and down that ActivityStreams is not RDF, even though they derived their spec from an OWL ontology which is tucked away on their GitHub repository.

Aside: <dc:creator> expands to the old http://purl.org/dc/elements/1.1/creator and is subclassed as http://purl.org/dc/terms/creator. (We tend to use #fragment identifiers now, post httpRange-14.) Moreover, shoving RDF/XML into non-RDF XML is a bastard thing to do, but if there are no schema constraints (which is implicitly the case in RSS 2.0 and explicitly the case in Atom), there is nothing preventing you from doing that, and that on its own would probably take care of most if not all of actually rendering the extensions.

One other thing worth mentioning is JSON-LD, which is totally brilliant. It has the concept of a context which is a JSON-LD document that maps the names of object keys to other key names. So you can imagine a scenario where a web server punts out a vanilla JSON file, but includes a Link: header to the context, which vanilla clients will ignore, but savvy ones can use to inflate the response into a set of graph statements. So it’d be entirely possible to make a JSON-LD context that extended JSON-Feed (agreed, a naïve proposal).
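A toy sketch of such a context (the key mappings and vocabulary URL are invented; the dcterms and XSD URIs are real): a vanilla JSON Feed’s keys could be mapped onto URIs like so, with the server advertising the context document via an HTTP Link header on plain application/json responses:

```json
{
  "@context": {
    "@vocab": "https://example.com/ns/jsonfeed#",
    "title": "http://purl.org/dc/terms/title",
    "url": "@id",
    "date_published": {
      "@id": "http://purl.org/dc/terms/issued",
      "@type": "http://www.w3.org/2001/XMLSchema#dateTime"
    }
  }
}
```

Vanilla clients never fetch the context; JSON-LD-aware ones use it to expand each item into graph statements.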

But my big take-home from almost two decades of puttering around with RDF is that syntaxes can be construed as the products of transformations between each other (or an internal representation); so long as there aren’t any gaps, the structures are isomorphic. That means concrete syntaxes can be considered secondary to semantics and topology.


Perhaps relevant: GitHub - dariusk/rss-to-activitypub: An RSS to ActivityPub converter.


Thanks!