Mapping Qing Relay Stations

The beta version of the site is here.

The Motivation

During the Qing dynasty, a large network of postal and military relay stations connected the political center with China proper, Manchuria, Mongolia, Sinkiang, and Tibet. The stations carried information and military logistics for monitoring the empire, and their routes were also traveled by officials, scholars, and merchants. I felt that if I could transform the system into a GIS database, it would be an invaluable research foundation on which future investigations into Qing commercial or cultural history could be built. My project since then has been one of retracing these routes. I will give a very brief technical overview of the behind-the-scenes process here, and more blog posts on specific routes will follow in the coming months (most likely in Chinese).

Data Aggregation

The main primary source I used was The Jiaqing Administrative Code of the Qing Dynasty (Jiaqing Huidian 嘉慶會典). My initial plan was to extract the information using OCR and then geocode the raw data with a Google API. However, most of the OCR software I tried was quite abysmal at recognizing vertically oriented Chinese characters. I experimented extensively with various software parameters and even considered writing my own OCR tool specialized for this purpose, but to no avail. In the end, I decided that for my data size, it was faster to accept a flawed OCR result and then correct the data by eye. Geocoding was also challenging: the Google API could only identify place names that had not changed since the Qing period, so for the rest I had to do a lot of textual research in travel logs and gazetteers to uncover their present-day locations. For anyone interested, Xiao fang hu zhai yu di cong chao (小方壺齋輿地叢鈔) is a good source to look into.
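The geocoding step described above, combining an API with hand-researched locations, could be organized roughly as follows. This is a minimal Python sketch under stated assumptions, not the code used for the project: `geocode_stations`, `manual_table`, and `geocoder` are hypothetical names, and `geocoder` stands in for whatever wrapper around the Google API you use (no real API call is made here). Hand-researched coordinates are checked first, so a modern place that merely shares a Qing-era name cannot override the textual research.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def geocode_stations(
    names: List[str],
    manual_table: Dict[str, Coord],
    geocoder: Callable[[str], Optional[Coord]],
) -> Tuple[Dict[str, Coord], List[str]]:
    """Resolve relay-station names to coordinates.

    manual_table: hypothetical hand-curated lookup of Qing-era names whose
        present-day location was established from travel logs or gazetteers.
    geocoder: hypothetical callable wrapping a geocoding API; returns
        (lat, lon) or None when the name is not recognized.
    Returns (resolved, unresolved); unresolved names still need research.
    """
    resolved: Dict[str, Coord] = {}
    unresolved: List[str] = []
    for name in names:
        if name in manual_table:  # textual research takes priority over the API
            resolved[name] = manual_table[name]
            continue
        coord = geocoder(name)
        if coord is None:
            unresolved.append(name)
        else:
            resolved[name] = coord
    return resolved, unresolved
```

Names returned in the second value are the ones that still need to be tracked down in travel logs and gazetteers before the next pass.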


World War II Visualization

I have always been interested in how history can talk to the quantitative, so as part of a small data science project I decided to put together a visualization of World War II using R. After some searching, I found several interesting data sources:

  1. Wikipedia: Most WWII battle articles on Wikipedia have a conflict infobox with the date and coordinates of the battle. Naval loss tables can also be scraped from various pages. It would be rather nice to visualize them in time and space using Leaflet.
  2. Wikidata & DBpedia: These two databases hold structured data, which saves us from scraping and cleaning raw data from Wikipedia's HTML pages.
  3. Naval History and Heritage Command: Almost all of the losses of the Imperial Japanese Navy during World War II can be found here, including
  4. Data.World: Data on all the bombs dropped by the Allies since WWI.

From these sources we can assemble four main datasets:

  1. land warfare, with date and location
  2. naval losses
  3. aerial bombing
  4. command network data
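For the structured-data option in item 2, a single SPARQL query against the Wikidata Query Service could replace much of the scraping. The Python sketch below (the post itself used R) only builds the request URL and flattens a JSON response. It assumes battles are modelled as instances of battle (Q178561) linked through one or more "part of" (P361) statements to World War II (Q362), with coordinates in P625 and start dates in P580; these modelling assumptions should be checked against real items, and the transitive path may need narrowing if the query times out.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed modelling: instance of (a subclass of) battle, part of WWII via a
# chain of P361 links, with a coordinate location (P625) and optional start time (P580).
BATTLE_QUERY = """
SELECT ?battle ?battleLabel ?coord ?start WHERE {
  ?battle wdt:P31/wdt:P279* wd:Q178561 ;
          wdt:P361+ wd:Q362 ;
          wdt:P625 ?coord .
  OPTIONAL { ?battle wdt:P580 ?start . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_query_url(query: str = BATTLE_QUERY) -> str:
    """Return a GET URL asking the query service for JSON results."""
    return WIKIDATA_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def parse_point(wkt: str) -> Tuple[float, float]:
    """Convert a WKT literal like 'Point(lon lat)' into (lat, lon)."""
    inner = wkt[wkt.index("(") + 1 : wkt.index(")")]
    lon, lat = (float(x) for x in inner.split())  # WKT puts longitude first
    return lat, lon


def parse_bindings(payload: dict) -> List[Tuple[str, Tuple[float, float], Optional[str]]]:
    """Flatten SPARQL JSON results into (label, (lat, lon), start) rows."""
    rows = []
    for b in payload["results"]["bindings"]:
        rows.append((
            b["battleLabel"]["value"],
            parse_point(b["coord"]["value"]),
            b.get("start", {}).get("value"),  # None when no start time is recorded
        ))
    return rows
```

Fetching the URL (for example with `urllib.request.urlopen`, sending a descriptive User-Agent header) and passing the `json.load` result to `parse_bindings` would yield rows ready for a Leaflet layer.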

Land Warfare

Fortunately, Wikipedia has a list that links to most WWII battles, which saves us from crawling. We scrape the links, clean them, and put them into a data frame.
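A rough Python equivalent of this step (the post's own R code is not shown here) could use the standard-library HTML parser to collect article links from the list page. The names `BattleLinkParser` and `extract_battle_links` are hypothetical, and the keyword filter is a deliberate simplification: keeping only titles containing "Battle" also drops operations and sieges, so real cleaning needs more care.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin

BASE = "https://en.wikipedia.org"


class BattleLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for internal article links."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[tuple] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        page = href[len("/wiki/"):]
        # Keep article links only; skip namespaces such as File:, Category:, Help:.
        if href.startswith("/wiki/") and ":" not in page:
            title = attrs.get("title") or page
            self.links.append((title, urljoin(BASE, href)))


def extract_battle_links(html: str, keyword: str = "Battle") -> List[Dict[str, str]]:
    """Return de-duplicated {title, url} rows whose title contains keyword."""
    parser = BattleLinkParser()
    parser.feed(html)
    seen, rows = set(), []
    for title, url in parser.links:
        if keyword in title and url not in seen:
            seen.add(url)
            rows.append({"title": title, "url": url})
    return rows
```

Each row is a dict, so the result converts directly into a data frame (for example `pandas.DataFrame(rows)`).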

We can now follow the link to each battle and scrape the data from the military infobox on the right-hand side of each page. An XPath locator is particularly handy here for extracting specific elements. We can write a function that does all the work for us. Some pages have no infobox data and would crash the function; to keep it running, we can wrap each call in try() so that faulty pages are skipped and the loop moves on to the next.
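The same idea in Python, as a minimal sketch: `scrape_infobox` uses the XPath subset supported by `xml.etree.ElementTree` to find the infobox and a coordinates span, and `scrape_all` plays the role of R's `try()` by catching errors and recording skipped pages. Both function names are hypothetical. The sketch assumes well-formed, namespace-free XHTML and markup of the form `<table class="infobox vevent">` plus `<span class="geo">lat; lon</span>`; live Wikipedia HTML is not guaranteed to be valid XML, so in practice an HTML-tolerant parser such as lxml (or rvest in R) with full XPath would be the safer choice.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def scrape_infobox(xhtml: str) -> Dict[str, object]:
    """Extract the date and coordinates of one battle page.

    Assumes namespace-free, well-formed XHTML with a
    <table class="infobox vevent"> and a <span class="geo">lat; lon</span>.
    Raises instead of returning partial data when something is missing.
    """
    root = ET.fromstring(xhtml)  # ET.ParseError on malformed markup
    box = root.find(".//table[@class='infobox vevent']")
    if box is None:
        raise ValueError("no infobox on page")
    date = None
    for row in box.iterfind(".//tr"):
        th, td = row.find("th"), row.find("td")
        if th is not None and td is not None and "".join(th.itertext()).strip() == "Date":
            date = "".join(td.itertext()).strip()
    geo = root.find(".//span[@class='geo']")
    if geo is None or not geo.text:
        raise ValueError("no coordinates on page")
    lat, lon = (float(part) for part in geo.text.split(";"))
    return {"date": date, "lat": lat, "lon": lon}


def scrape_all(pages: Dict[str, str]) -> Tuple[List[dict], List[Tuple[str, str]]]:
    """Scrape every page; like R's try(), a failing page is skipped, not fatal."""
    results, skipped = [], []
    for url, xhtml in pages.items():
        try:
            results.append({"url": url, **scrape_infobox(xhtml)})
        except (ET.ParseError, ValueError) as err:
            skipped.append((url, str(err)))
    return results, skipped
```

Keeping the list of skipped URLs, rather than silently dropping them, makes it easy to check later whether a page failed because it genuinely lacks data or because its markup differs from what the parser expects.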



A boring RMQ-ST template.
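For reference, a sparse table (ST) answers static range-minimum queries in O(1) after O(n log n) preprocessing: `table[k][i]` stores the minimum of the 2^k elements starting at `i`, and any range [l, r] is covered by two possibly overlapping blocks of length 2^k with k = floor(log2(r - l + 1)). The sketch below is in Python for consistency with the other examples here; it is not the post's own template.

```python
from typing import List


class SparseTable:
    """Static range-minimum queries: O(n log n) build, O(1) query."""

    def __init__(self, a: List[int]) -> None:
        n = len(a)
        # log[x] = floor(log2(x)), precomputed so queries avoid floating point.
        self.log = [0] * (n + 1)
        for x in range(2, n + 1):
            self.log[x] = self.log[x // 2] + 1
        # table[k][i] = min(a[i : i + 2**k])
        self.table = [list(a)]
        k = 1
        while (1 << k) <= n:
            prev, half = self.table[k - 1], 1 << (k - 1)
            # A block of length 2**k is two adjacent blocks of length 2**(k-1).
            self.table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)])
            k += 1

    def query(self, l: int, r: int) -> int:
        """Minimum of a[l..r], inclusive and 0-indexed; requires l <= r."""
        k = self.log[r - l + 1]
        row = self.table[k]
        # Two blocks of length 2**k that together cover [l, r].
        return min(row[l], row[r - (1 << k) + 1])
```

Because min is idempotent, the two covering blocks may overlap without affecting the answer; the same structure works for max or gcd, but not for sums.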

[BZOJ1003] Logistics Transportation (物流运输)