A curated collection of interesting GitHub repositories
View the Project on GitHub tom-doerr/repo_posts
Process, filter, and deduplicate large-scale text data with customizable pipelines.
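To make the "process, filter, deduplicate" idea concrete, here is a minimal sketch in plain Python. It does not use the listed repository's API. Every name in it (`normalize`, `min_words`, `dedup_exact`, `run_pipeline`) is hypothetical and chosen only for illustration. It assumes documents are plain strings and uses exact-match deduplication by hash, which is the simplest variant and not necessarily what a large-scale tool would do.

```python
import hashlib
from typing import Callable, Iterable, Iterator

# A step takes a stream of documents and yields a (possibly smaller) stream.
# All names below are hypothetical, not taken from any real library.
Step = Callable[[Iterable[str]], Iterator[str]]


def normalize(docs: Iterable[str]) -> Iterator[str]:
    """Process: collapse runs of whitespace and trim each document."""
    for doc in docs:
        yield " ".join(doc.split())


def min_words(n: int) -> Step:
    """Filter: build a step that keeps documents with at least n words."""
    def step(docs: Iterable[str]) -> Iterator[str]:
        return (doc for doc in docs if len(doc.split()) >= n)
    return step


def dedup_exact(docs: Iterable[str]) -> Iterator[str]:
    """Deduplicate: drop documents whose case-folded text was already seen."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(doc.casefold().encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def run_pipeline(docs: Iterable[str], steps: list[Step]) -> list[str]:
    """Chain the steps lazily, then materialize the result."""
    stream: Iterable[str] = docs
    for step in steps:
        stream = step(stream)
    return list(stream)


raw = [
    "Hello   world  again",
    "hello world again",
    "too short",
    "A  different   document here",
]
cleaned = run_pipeline(raw, [normalize, min_words(3), dedup_exact])
print(cleaned)
```

Because each step is just a function over an iterator, the pipeline is customized by reordering the list or swapping in a different step, and documents stream through one at a time instead of being loaded all at once. Only the set of seen hashes grows with the input.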