> Applications using the Listen API must not pre-fetch, cache, index, or store any content on the server side.
> Note that the id and the pub_date (e.g., latest_pub_date_ms, pub_date_ms...) of a podcast or an episode are exempt from the caching restriction.
Is that... common? I've never knowingly come across anything like that before; it seems weird to me. It sort of makes sense, in a 'you must not try to avoid needing to pay us more, because we want more money' sort of way, but... really? Also, it's almost entirely undetectable (basically, except OSS), surely.
[Edit: failure to read my own quote correctly, thanks xd1936] --- And if you really take it seriously - 'must not [...] store any content' - it really limits what you could even use it for, not being able to store the `id` even for a later reference. I don't think that's what's intended, but it seems to be what's written. ---
(Just so I don't sound like a grumpy old git (I'm not old, at least!) - I really, really, really like the docs page https://www.listennotes.com/api/docs/ - the only thing I'd suggest, perhaps, is embedding the OpenAPI 'HTML' contents below the other options, rather than it being a link to follow. Awesome though.)
Map tiling APIs do this, like Mapbox and Google. Otherwise you could circumvent all but their lowest-tier subscription plans with a brain-dead caching proxy and a large disk, which is what they want to avoid.
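To be concrete, the whole "attack" these terms guard against fits in a few lines. A sketch, purely for illustration - the upstream tile endpoint here is hypothetical, and this is exactly what those providers' terms forbid:

```python
# Illustrative only - the kind of proxy these terms exist to forbid.
# The upstream tile URL is hypothetical.
import os
import requests
from flask import Flask, Response

app = Flask(__name__)
UPSTREAM = "https://tiles.example.com/{z}/{x}/{y}.png"  # hypothetical provider
CACHE_DIR = "tile-cache"

@app.route("/tiles/<int:z>/<int:x>/<int:y>.png")
def tile(z, x, y):
    path = os.path.join(CACHE_DIR, f"{z}-{x}-{y}.png")
    if not os.path.exists(path):  # pay for each tile at most once, ever
        upstream = requests.get(UPSTREAM.format(z=z, x=x, y=y), timeout=10)
        upstream.raise_for_status()
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "wb") as f:
            f.write(upstream.content)
    with open(path, "rb") as f:
        return Response(f.read(), mimetype="image/png")

if __name__ == "__main__":
    app.run()
```

Point your map client at this instead of the provider and your bill stops growing with traffic, which is why the pricing only works if the terms rule it out.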
Amazon's API famously does this as well (or used to, it's been a while) by requiring any prices you show to be no more out of date than N minutes, forcing you to basically request on demand every time you need to show one. They'd rather you just send the traffic their way for people to see the price.
Heh, yeah. I think my reaction's still similar though - why shouldn't I be allowed to do that?
The alternative of course is to charge more per tile, or have a base 'access fee' + small incremental charge. Pay per usage doesn't work best for everything, IMO.
(And I'd likely still want to come back occasionally to check it hasn't changed, even if I cached every tile forever. (Which I probably wouldn't, if the hit rate was really low, like it was a one-off - if I'm being cheap about my API usage, why wouldn't I also be cheap about my disk usage?))
Companies that provide data for offline use will have a separate licensing model, usually with subscriptions for updates or perhaps a finite license term. MaxMind's GeoIP database is a popular example.
That's not really an answer though, that was the starting point.
And this isn't a one-off dataset; we're discussing an API pricing model - there will be new podcasts, and existing podcasts' metadata will change. People using this API will want to make repeated calls; they just might also reasonably want to cache results.
If this were my service, I just wouldn't do pay-per-API-call, or at least not only that. Of course, the free tier presents more of a problem then, but I'd probably just restrict it further, making it less attractive, and have a lower entry point than the $100pcm flat fee that covers some but not all extra features - showing images at all (and not in free), for example.
As it is, I reckon loads of users cache results - not maliciously, just because they haven't read that they're not supposed to - and that OP has no idea (because how would they).
Pay-per-use is just the simplest, most straightforward, and possibly fairest way to couple the value your API gives someone with the amount they pay in return.
Or, from the eyes of the user, they get full access to the API yet don't have to pay much if their project gets no traction.
The downside is that users can lie, but it's mainly just low-end users who would. Pay-per-user licenses are similar: a startup or a hackathon is most likely to share the license between a few people, while larger companies are going to be honest because (1) they can afford it and (2) they don't want trouble at scale.
So you can ignore most abuse.
The problem with other payment structures for ListenNotes is that it's a relatively small database. You can clone the whole thing trivially. It doesn't even mirror/host the audio feeds. Its only value is that it put in the work of structuring and normalizing the metadata.
If you built a business on top of ListenNotes, you'd save more and more money as you grew bigger if you simply cloned the whole thing with your own crawler. So the more value you'd get from ListenNotes, the less you'd actually be paying them. Or ListenNotes would have to price their per-call fee so high that they could somehow capture a fair price for that value, yet shut out smaller users.
It turns out "courtesy agreements" generally do work at scale, as larger companies become less and less likely to lie, just like they become less and less likely to pirate Photoshop.
> have a lower entry point than the $100pcm flat fee that covers some but not all extra features - showing images at all (and not in free), for example.
The downside of this is that now you limit what people can build on cheaper tiers. In fact, maybe they can't even build their compelling product without whatever content you're paywalling behind tiers they can't afford on day 1, while the goal is to let someone build anything they want on day 1 so that they're a large end user on day 1000.
After all, the ideal isn't that you scale the value delivered with your customer's income, but rather that you scale the price as they convert that value into income. It's all just trade-offs, of course.
Right, I would assume that even just the tiles for the biggest cities alone would be way more than most would want to store. On the other hand, say on the client side: can you not even cache a tile a user saw 10s ago but that went off screen? Or is it assumed the browser will cache that tile?
> On the other hand, say on the client side: can you not even cache a tile a user saw 10s ago but that went off screen? Or is it assumed the browser will cache that tile?
I don't know the map tile terms, but the quoted limitation for this service specifies server-side caching.
I’ve noticed something similar recently with many paid book search APIs out there and was also grossed out.
You’re not paying for a data source at all; you’re paying for an expensive embedded application.
I don’t see how it’s remotely reasonable. The person managing this API has stricter protections on this data (though they’re not even his podcasts) than we have on our personal data.
You're not paying for the data, you're paying for the service.
This is common. Companies that provide the data for offline use tend to have a separate licensing and subscription fee structure. Companies that provide the API tend to forbid offline caching/storage of the data.
The service, though, is 'convenient access to the data [which is already out there]'. And once I've used it, I don't need it 100/sec just because that's how frequently people are using my downstream service to do something with some popular 'trending' podcast; I'm perfectly happy (and it would be a good practice to be!) caching it for some period, until I need the service again to conveniently see if the data that's already out there has changed.
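For what it's worth, the id/pub_date exemption in the quoted terms seems designed for exactly that "has it changed?" check. A rough sketch of the pattern I mean - note the endpoint path, header name, and field names here are assumptions loosely based on the docs quote, not confirmed:

```python
# Rough sketch: persist only id -> latest_pub_date_ms (both exempt from the
# caching restriction per the quoted terms) and skip downstream work when
# nothing has changed. Endpoint path, header name, and fields are assumptions.
import requests

BASE = "https://listen-api.listennotes.com/api/v2"
HEADERS = {"X-ListenAPI-Key": "YOUR_KEY"}  # placeholder key

last_seen = {}  # podcast id -> latest_pub_date_ms we last acted on

def refresh_if_changed(podcast_id: str):
    resp = requests.get(f"{BASE}/podcasts/{podcast_id}", headers=HEADERS)
    resp.raise_for_status()
    podcast = resp.json()
    if last_seen.get(podcast_id) == podcast.get("latest_pub_date_ms"):
        return None  # nothing new since last time; no re-processing needed
    last_seen[podcast_id] = podcast.get("latest_pub_date_ms")
    return podcast  # changed: re-render / notify / etc., but don't store it
```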
> The service, though, is 'convenient access to the data [which is already out there]'.
The service is whatever is described in the contract you agree to when you purchase it.
If you don't like the terms of the contract, you can always try to negotiate an alternate agreement. Or you can choose not to purchase the service.
The seller isn't obligated to provide their services on your terms, just as you're not obligated to purchase the seller's services on their terms if you don't agree to them.
A single snapshot of an ever-changing database is the culmination of potentially years of research, payroll, and system development that API consumers precisely didn't and don't want to do; that's what gives the dataset, and thus the API, its value.
The price that captures that value would have to be much higher in the model where you only need to access the database at some interval (let's say weekly), and that's not necessarily any more palatable.
I commend the service provider for aggregating the data and making a business - hope that person is able to make a living from it.
It’s an interesting service that I’d be very interested in using to provide a service of my own. And I’d be more than happy to pay for it, but those terms are a non-starter, at least for me.
The year is 2040. There’s no running water. Grocery stores mandate that all purchased liquids must be consumed prior to leaving the premises.
The year is 2060. “Stores” begin synthetically seeding human life in closed environments in accordance with growth hacking best practices. Product market fit is declared a solved problem.
At least for the actual audio, I understand that podcasters get grumpy when people cache that server-side, because they depend on server logs to get listener numbers for advertisers, so if a popular client downloads the audio once and distributes it to all its customers, they can't make money off any of those customers.
Podcasts also often target advertisements geographically (based on IP address, I guess?). Being able to serve each listener individually is part of their value proposition to advertisers.
I worked on a food tracking PWA, and getting it to be useful offline was horrible. We’d have to hit the API at least once a day to grab commonly used foods and refresh our temporary cache. The data did not change at all... eggs don’t suddenly have a different calorie count the next time you eat them lol
A database of all of the world's foods, though, could easily be larger than I'd like a calorie counting app on my phone to be, for example. So it's not necessarily silly - network can be cheaper than disk.
I was tinkering a bit recently in an effort to build a simple system that finds 'related' podcasts, to see if I could watch the network effect play out over time. I did this by building a graph of people (hosts/guests) and episodes, and started folding in tags/topics. None of this is in my wheelhouse, and I found:
- It takes a lot of work to curate a substantial collection of podcasts. There are lists all over the place but it's hard to know what's really in there.
- I attempted to use SpaCy and/or NLTK to do some 'Named Entity Recognition' in order to extract topics/people/organizations from episode titles and descriptions. This was surprisingly brittle (a minimal repro follows this list). The string 'Sean Carroll', for example, wasn't detected as a person by either framework (IIRC). It also seems quite brittle to punctuation and other context (e.g., the beginning or end of a sentence). This was using the default models shipped with both. I started off with just the English models but expanded as lots of names were being skipped silently. That helped less than I had hoped.
- I have yet to find a good UI for exploring a graph. I used Neo4j, and the built-in 'browser' is not intended for that purpose. Gephi has good capability for filtering and analytics, but it takes quite a bit of getting used to, and the graph itself isn't amenable to dynamic navigation.
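On the NER brittleness, this is the kind of minimal check I mean, using spaCy's default small English model (results vary by model and version, so treat the exact outputs as illustrative):

```python
# Minimal spaCy NER check. Requires: pip install spacy
# and: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

for text in [
    "Sean Carroll",                            # bare name, no context
    "Sean Carroll discusses the multiverse.",  # name starting a sentence
    "An interview with Sean Carroll.",         # name mid-sentence
]:
    doc = nlp(text)
    print(repr(text), "->", [(ent.text, ent.label_) for ent in doc.ents])
```

Whether the name comes back tagged PERSON depends heavily on the surrounding context and punctuation, which is exactly the brittleness I ran into with episode titles.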
That's all. Bookmarking this as it would really help.
Love seeing development in the podcast space. One specific problem I've been wanting solved for a long while is difficulty with sharing podcasts with friends across podcast apps. If you're not using the same podcast app as your friend, it's always a pain to manually search and find the podcast in your own app. I'd love a universal podcast url, something like `podcast://<podcast_url>` that individual podcast apps can understand, which links you to the podcast within your desired app, similar to the "default browser" behaviour on mobile and desktop. Has anyone come across something like this?
Podcasts are just RSS feeds. Nothing stopping an app registering itself as a handler for the RSS MIME type, at least on desktop/Android (I don't know how iOS works here). I doubt most users would have an RSS reader installed at this stage, so there's not much risk of the feed getting opened by the wrong app and revealed as a list of links to audio files.
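To make that concrete, a few lines with the feedparser library show that a podcast feed really is just entries with audio enclosures (the feed URL here is a placeholder):

```python
# A podcast feed is plain RSS: each entry carries its audio file as an
# enclosure. The feed URL is a placeholder.
import feedparser  # pip install feedparser

feed = feedparser.parse("https://example.com/podcast.xml")
print(feed.feed.get("title"))
for entry in feed.entries[:5]:
    audio_urls = [enc.get("href") for enc in entry.get("enclosures", [])]
    print(entry.get("title"), "->", audio_urls)
```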
This is a big problem on iOS. My spouse uses the default Podcasts app; I use Overcast. Anytime she sends me something to listen to, iOS tries to open it in Podcasts. When I send something from Overcast, it gets sent as an Overcast URL.
In the distant past (2009 or so), podcast://… would open in iTunes or similar, being equivalent to http://…. So you’d have something like podcast://example.com/podcast.xml. I haven’t the foggiest idea whether this still works, or how HTTPS might have been integrated or not.
Haven't seen this before - an actual figure rather than 'these big names' (where you have no idea if it's just some small team somewhere with a toy test/demo, or a significant piece of the whole organisation's puzzle).
I'm (just idly) curious what number you waited for (assuming you did) before making that public. Because, and obviously it'll vary a bit for different people, there's going to be some number below which it has a negative impact, and some other number (with a 'meh' range in between) above which it has the positive impact that is its raison d'être.
Yes, I assume that's the same. What I mean is: '0' obviously looks bad; '2007', as it was when I commented, sounds good (to me anyway).
If you knew you wanted to have that copy on day zero, you probably wouldn't launch with it, because it doesn't look good, so I just wonder at what point people think it starts to be positive, or at least not negative.
Suggestion for a pivot: add a podcast-playing web application (basically podcast subscriptions; you already have most everything else in place), and charge a more reasonable amount for that plus unlimited regular search. The pro subscription is way too expensive for me.
Edit: I didn't notice that this is about a new API service
Yeah, modern open source speech recognition like Vosk can cost around 2 cents per hour (70 times less than Google STT at $1.40/hour), which should be just enough for search.
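If anyone wants to try it, a minimal Vosk transcription loop looks something like this (the model path and audio file are placeholders; Vosk expects mono 16-bit PCM WAV input):

```python
# Minimal Vosk transcription loop. Requires: pip install vosk, plus a model
# downloaded from https://alphacephei.com/vosk/models into ./model
# Input must be mono 16-bit PCM WAV.
import json
import wave

from vosk import Model, KaldiRecognizer

wf = wave.open("episode.wav", "rb")  # placeholder audio file
rec = KaldiRecognizer(Model("model"), wf.getframerate())

while True:
    data = wf.readframes(4000)
    if not data:
        break
    if rec.AcceptWaveform(data):  # end of an utterance
        print(json.loads(rec.Result()).get("text", ""))
print(json.loads(rec.FinalResult()).get("text", ""))
```

The output is rough compared to commercial STT, but for building a search index over episode audio, rough text is often good enough.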
The hardest part is to make small incremental improvements over a long period of time :)
Like most software projects, this API is never a finished product. It's always work-in-progress.
Small incremental improvements are not glamorous, typically not newsworthy to share to the public.
Some examples of small incremental improvements:
1. Improve API docs. I heard that many API-focused startups have a dedicated team to maintain their API doc page.
2. Dealing with edge cases. As more apps/websites use our API, we'll see edge cases that we could never anticipate, which could be as simple as adding a data field in the response with a 2-line code change, or as involved as changing the search index, which requires re-indexing the whole thing over a few days. There could also be some strange edge cases with billing, e.g., what if a user subscribes to the paid plan, then unsubscribes, then subscribes again, then does something strange, then unsubscribes...
3. Customer support. This involves adding FAQ (tweaking the texts) and preparing email templates to answer frequently asked questions from users.
4. Doing things to keep the service robust & performant, e.g., adding new alerts via Datadog/PagerDuty so we can know what goes wrong in time. We also need a mechanism to know if a particular app sends tons of requests in a short amount of time (e.g., sending requests in an infinite loop), so we can do something about it (e.g., suspend the account) - a toy sketch of the idea follows this list.
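The detection part of #4 can start as simple as a sliding-window counter per API key. A toy in-memory sketch - the threshold is made up, and a real deployment would keep the counters in Redis or the metrics pipeline rather than process memory:

```python
# Toy sliding-window request counter per API key; threshold is arbitrary.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 1000  # made-up "this looks like an infinite loop" threshold

recent = defaultdict(deque)  # api_key -> timestamps of recent requests

def allow_request(api_key: str) -> bool:
    now = time.monotonic()
    window = recent[api_key]
    window.append(now)
    # Drop timestamps that have fallen out of the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    # Over the threshold: alert, throttle, or suspend the account.
    return len(window) <= MAX_REQUESTS
```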
As an avid podcast creator and consumer, I’d love to take a full look at this, but I kept getting the full captcha experience. You know what I mean, “select the squares with sidewalks” :(
@wenbin what do you think about adding the ability to comment at specific timestamps in a podcast (in the player, obviously), similar to SoundCloud (assuming there are no patent issues)?
It's not a replacement for this by any means, but in case anyone would find a reasonably up to date list of around 600,000 podcasts useful, here you go: https://gofile.io/d/MjYVy7 - No episodes, just the name, creator, and feed URLs for further crawling on your own.
For the iTunes API:
1. You can't search episodes.
2. You can't get many podcast search results.
3. Their terms of use may not allow you to do what you want to do.
We used ListenNotes a while back in a web based podcast player and have only good things to say about the API. It's reasonably priced, much easier to deal with than Apple's API and email support is speedy!