Show HN: Build a SQLite database from your Reddit data

With Reddit's upcoming API changes[0], I got nervous that I'd no longer be able to access the many posts and comments I've left there over the years. Inspired by the Dogsheep projects[1], this CLI lets you immediately pull your most recent 1k comments & 1k posts (the max allowed by the paged API) into a nicely-structured SQLite database. It's perfect for loading into Datasette for nice viewing & full-text search of your content.

Taking it a step further, the project's killer feature is the ability to import data from GDPR archives. This allows you to store your full Reddit history (including deleted comments and posts on removed subreddits).

I hope you find this tool useful! I'll be around to answer questions and field comments (or feel free to open a GH issue).

[0]: https://ift.tt/7freIRv...
[1]: https://ift.tt/WkEYjGM

https://ift.tt/9Ug2fsF May 24, 2023 at 11:35PM
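To make the "API pull plus GDPR archive into one SQLite file" idea concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, the column names, and the `open_db`, `load`, and `search` helpers are all hypothetical and are not taken from the tool itself; the sample rows are made up for illustration. Keying rows on the comment id means a comment that appears in both the API pull and the archive is stored only once. The `LIKE` query is only a stand-in for the full-text search that Datasette would provide.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) a database with a hypothetical comments table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "id TEXT PRIMARY KEY, subreddit TEXT, body TEXT, created_utc INTEGER)"
    )
    return conn

def load(conn, rows):
    """Insert rows; a row whose id already exists is replaced, not duplicated."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO comments (id, subreddit, body, created_utc) "
            "VALUES (:id, :subreddit, :body, :created_utc)",
            rows,
        )

def search(conn, term):
    """Return ids of comments containing term, newest first (simple LIKE match)."""
    cur = conn.execute(
        "SELECT id FROM comments WHERE body LIKE ? ORDER BY created_utc DESC",
        (f"%{term}%",),
    )
    return [row[0] for row in cur]

# Made-up sample data: two rows from the API, two from an archive (c2 overlaps).
api_rows = [
    {"id": "c1", "subreddit": "python", "body": "SQLite is great", "created_utc": 1684000000},
    {"id": "c2", "subreddit": "rust", "body": "Borrow checker tips", "created_utc": 1684100000},
]
archive_rows = [
    {"id": "c2", "subreddit": "rust", "body": "Borrow checker tips", "created_utc": 1684100000},
    {"id": "c0", "subreddit": "datasets", "body": "Old comment about sqlite exports", "created_utc": 1500000000},
]

conn = open_db()
load(conn, api_rows)
load(conn, archive_rows)
total = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
```

Because both loads go through the same `INSERT OR REPLACE`, the order in which the API pull and the archive import run should not matter for the final contents in this sketch.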
Show HN: Yakread – An RSS reader powered by machine learning

This is a web-based reading app I've been working on since August. The main differentiator is that Yakread uses machine learning to rank the articles in your feed: as you click on articles from a particular RSS/newsletter subscription, other articles from that subscription will tend to be ranked higher in the future (via a bandit algorithm). Yakread also uses ML to recommend articles that other users have read, so your feed will have articles in it even before you sign up and add your own subscriptions.

For the recommendations, I'm using the collaborative filtering implementation from Spark MLlib[1]. I model RSS feeds instead of individual articles: when you click an article, that counts as a "point" for that article's RSS feed; at recommendation time, the algorithm first selects an RSS feed to recommend, and then it picks one of the popular/recent articles from that feed. To counter popularity bias, I h...
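The post says only that subscriptions are ranked "via a bandit algorithm" and does not name one. As an illustration of how clicks could push a subscription's articles higher, here is a minimal sketch that assumes Beta-Bernoulli Thompson sampling; this is a swapped-in technique, not necessarily what Yakread uses. The `FeedBandit` class, its method names, and the feed names are all hypothetical.

```python
import random

class FeedBandit:
    """Thompson sampling over subscriptions (an assumed algorithm, for illustration)."""

    def __init__(self, feeds, seed=None):
        self.rng = random.Random(seed)
        # One [alpha, beta] pair per feed, starting from a uniform Beta(1, 1) prior.
        self.stats = {feed: [1, 1] for feed in feeds}

    def record(self, feed, clicked):
        """Count a click as a success (alpha) and a skipped impression as a failure (beta)."""
        self.stats[feed][0 if clicked else 1] += 1

    def rank(self):
        """Sample a click rate per feed and return feeds ordered best first."""
        scores = {
            feed: self.rng.betavariate(alpha, beta)
            for feed, (alpha, beta) in self.stats.items()
        }
        return sorted(scores, key=scores.get, reverse=True)

# Made-up usage: one newsletter gets clicked a lot, another gets ignored.
bandit = FeedBandit(["newsletter-a", "blog-b", "podcast-c"], seed=0)
for _ in range(200):
    bandit.record("newsletter-a", clicked=True)
    bandit.record("blog-b", clicked=False)
ranking = bandit.rank()
```

Because each ranking draws fresh samples, a feed with little data still occasionally lands near the top, which gives it a chance to collect clicks; heavily clicked feeds win most of the time.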