Right, I will definitely work on adding a data policy to the terms of service. I think it will be along these lines: you cannot remove the attributions/trademarks if you are republishing the data, but doing analytics on it is OK.
It's a pretty fine line you're treading here. A big part of why the competition is so expensive is that they license the data in bulk. There has been a lot of litigation in this area; see the Meltwater cases in the US and the UK.
If you have, say, 10k news sources from dozens of countries around the globe, I doubt it would be feasible to sign a contract with every individual site, especially if you get the data from something like Google News.
It doesn't have a battle-tested, everything-and-the-kitchen-sink framework like Rails. It has plenty of frameworks, though; they just happen to be far more like ExpressJS for Node, or Sinatra for Ruby. In this era of microservices and thin JSON API-style apps, Rails might be overkill. I'm not saying it's bad; it just might be a bit much.
This strongly reminds me of my experience with AEM (Adobe Experience Manager): a web-based IDE, version control via OSGi bundles, and a content repository via Jackrabbit.
I did not enjoy my time working with it, but it had some neat ideas and did some things well.