Personal | Yeming Tang

Paper Citation Crawler

I developed a web crawler that gathers paper metadata in BibTeX format from ACM conference pages. It fetches and renders pages with Python-Requests and PhantomJS, extracts the desired data with regular expressions and BeautifulSoup, and uses multi-threading and proxy configurations to improve efficiency. The collected BibTeX files are available in my public GitHub repository.
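The extraction and threading steps can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the crawler's actual code: the names `extract_bibtex_entries` and `crawl` are hypothetical, the `fetch` callable stands in for whatever Requests or PhantomJS call retrieves a page, and the sketch assumes the BibTeX text appears unescaped in the page source.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Matches the entry type and citation key that open a BibTeX entry,
# e.g. "@inproceedings{key2015,".
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def extract_bibtex_entries(text):
    """Return every brace-balanced BibTeX entry found in text."""
    entries = []
    pos = 0
    while True:
        match = ENTRY_START.search(text, pos)
        if match is None:
            break
        # A regex alone cannot match nested braces, so walk forward
        # and count them until the entry's opening brace is closed.
        depth = 0
        end = None
        for i in range(match.start(), len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            break  # unbalanced entry: stop rather than guess
        entries.append(text[match.start():end])
        pos = end
    return entries


def crawl(urls, fetch, max_workers=4):
    """Fetch pages concurrently and map each URL to its BibTeX entries.

    fetch is any callable taking a URL and returning the page text
    (hypothetical placeholder for the real HTTP layer).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch, urls))
    return {url: extract_bibtex_entries(page) for url, page in zip(urls, pages)}
```

Passing `fetch` in as a parameter keeps the parsing logic testable without network access; a proxy-aware fetcher could be swapped in without touching the rest.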

Codebox Docker Image

I created a Docker image for Codebox that lets me set up a portable development server and code from various devices. The image has been pulled more than 6.7K times by the community.

Sharet

Sharet is a web-based file-sharing application that lets users upload files and retrieve them through shortened URLs or QR codes. It organizes file metadata, prevents duplicate files using MD5 checksums, and is built as a Docker image for easy deployment.
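The checksum-based duplicate check might look something like the sketch below. It is an in-memory illustration only: the `FileStore` class, its method names, and the choice to derive the short identifier from the digest prefix are all assumptions, not a description of how Sharet stores files or builds its shortened URLs.

```python
import hashlib


class FileStore:
    """In-memory store that keeps one copy of each distinct file content."""

    def __init__(self, id_length=6):
        self.id_length = id_length
        self._id_by_digest = {}  # MD5 hex digest -> short id
        self._file_by_id = {}    # short id -> (filename, content)

    def upload(self, filename, content):
        """Store content (bytes) and return its short id.

        Uploading identical content again returns the existing id
        instead of storing a second copy.
        """
        digest = hashlib.md5(content).hexdigest()
        if digest in self._id_by_digest:
            return self._id_by_digest[digest]
        # Hypothetical id scheme: use a digest prefix, lengthening it
        # if another file already owns the same prefix.
        length = self.id_length
        short_id = digest[:length]
        while short_id in self._file_by_id:
            length += 1
            short_id = digest[:length]
        self._id_by_digest[digest] = short_id
        self._file_by_id[short_id] = (filename, content)
        return short_id

    def retrieve(self, short_id):
        """Return (filename, content) for a short id, or None if unknown."""
        return self._file_by_id.get(short_id)

    def file_count(self):
        """Number of distinct files currently stored."""
        return len(self._file_by_id)
```

MD5 is adequate here for spotting accidental duplicates, but it is not collision-resistant against deliberate attacks, so a deployment exposed to untrusted uploads may prefer SHA-256.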

danmaQ: A Live Bullet Screen Application

danmaQ is a floating text comment app for audience engagement in live streaming broadcasts. Built using Python, HTTP, GUI programming, multi-threading, Nginx, and Git, the system consists of a server, a browser-based client, and a display module. Key features include comment storage, query responses, and comment auditing by administrators. Since its launch in 2015, the app has been hosted by the Tsinghua TUNA Association, serving teachers and students as an interactive communication tool.

Cento: A Chinese Classical Collage Poems Generator

The Cento Project generates Chinese classical collage poems (centos) by selecting lines from different existing poems, applying natural language processing techniques to poetry creation.
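The core selection step, picking lines from different poems, can be illustrated with a deliberately simple sketch. It is not the project's method: the function `generate_cento` and the tiny `SAMPLE_POEMS` corpus are made up for illustration, and the only constraint enforced is that all chosen lines have the same length and come from distinct poems. A real generator would also need to handle rhyme, tonal pattern, and meaning.

```python
import random

# Tiny illustrative corpus of well-known five-character quatrains
# (an assumption for this sketch, not the project's data).
SAMPLE_POEMS = {
    "静夜思": ["床前明月光", "疑是地上霜", "举头望明月", "低头思故乡"],
    "春晓": ["春眠不觉晓", "处处闻啼鸟", "夜来风雨声", "花落知多少"],
    "登鹳雀楼": ["白日依山尽", "黄河入海流", "欲穷千里目", "更上一层楼"],
    "相思": ["红豆生南国", "春来发几枝", "愿君多采撷", "此物最相思"],
}


def generate_cento(poems, num_lines=4, seed=None):
    """Pick num_lines equal-length lines, each from a different poem.

    Returns a list of (poem_title, line) pairs.
    """
    rng = random.Random(seed)
    # Group lines by character count, then by source poem, so that the
    # collage keeps a uniform line length.
    by_length = {}
    for title, lines in poems.items():
        for line in lines:
            by_length.setdefault(len(line), {}).setdefault(title, []).append(line)
    # Keep only lengths offered by enough distinct poems.
    candidates = [group for group in by_length.values() if len(group) >= num_lines]
    if not candidates:
        raise ValueError("not enough distinct poems with lines of equal length")
    group = rng.choice(candidates)
    titles = rng.sample(sorted(group), num_lines)
    return [(title, rng.choice(group[title])) for title in titles]


if __name__ == "__main__":
    for title, line in generate_cento(SAMPLE_POEMS, seed=2015):
        print(f"{line}  ({title})")
```

Passing a `seed` makes a run reproducible, which helps when comparing selection strategies on the same corpus.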