World War II Visualization

I have always been interested in how history can talk to the quantitative…so as part of a small data science project I thought it was a good idea to put together a visualization of World War II using R. After some searching, I found several interesting data sources:

  1. Wikipedia: Most of the WWII battles in Wikipedia have a conflict infobox with the date and the coordinates of the battle. Naval loss tables can also be scraped from various pages. It would be rather nice to visualize them in time and space using Leaflet.
  2. Wikidata & DBpedia: These two databases have structured data that saves us from scraping and cleaning raw data from HTML Wikipedia pages.
  3. Naval History and Heritage Command: Almost all of the losses of the Imperial Japanese Navy during World War II can be found here, including
  4. Data.World: Data of all the bombs dropped by the Allies since WWI.

We can then put together four main datasets from these sources: (1) land warfare with date and location, (2) naval loss data, (3) aerial bombing data, and (4) command network data.

Land Warfare

Fortunately, there is a list that links to most of the WWII battles on Wikipedia, which saves us from crawling. We scrape the list, clean it, and put the links into a data frame.
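A minimal sketch of this link-collection step, written in Python rather than the R used in this post, and run against a small inline HTML fragment instead of the live list page. The sample markup, `BASE_URL`, and the names `BattleLinkParser` and `extract_battle_links` are assumptions made for illustration; the result is a list of dicts standing in for a data frame.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"  # assumption: links on the list page are site-relative

# Hypothetical stand-in for the downloaded list page.
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Battle_of_Stalingrad" title="Battle of Stalingrad">Battle of Stalingrad</a></li>
  <li><a href="/wiki/Battle_of_Midway" title="Battle of Midway">Battle of Midway</a></li>
  <li><a href="#cite_note-1">[1]</a></li>
  <li><a href="/wiki/Battle_of_Midway" title="Battle of Midway">Midway</a></li>
</ul>
"""


class BattleLinkParser(HTMLParser):
    """Collects article links, dropping footnote anchors, namespaced pages, and duplicates."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        # Cleaning step: keep only article links such as /wiki/Battle_of_X.
        if not href.startswith("/wiki/") or ":" in href:
            return
        url = urljoin(BASE_URL, href)
        if url in self._seen:
            return
        self._seen.add(url)
        name = attr.get("title") or href.rsplit("/", 1)[-1].replace("_", " ")
        self.rows.append({"name": name, "url": url})


def extract_battle_links(html):
    parser = BattleLinkParser()
    parser.feed(html)
    return parser.rows


battle_links = extract_battle_links(SAMPLE_HTML)
for row in battle_links:
    print(row["name"], row["url"])
```

Each row keeps the battle name next to its absolute URL, so the next step can loop over the URLs directly.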

We can now follow the link to each battle and scrape the data from the military infobox on the right-hand side of each page. An XPath locator is particularly handy here for extracting specific elements. We can write a function that does all the work for us. Some of the pages do not have any data and will crash the function. To keep the loop running, the call can be wrapped in try() so that faulty pages are skipped and the scrape moves on to the next one.
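The skip-on-failure pattern can be sketched as follows, again in Python rather than R, with `try`/`except` playing the role of R's try(). The page contents, field labels, and the names `parse_infobox` and `scrape_all` are hypothetical. The markup is a simplified, well-formed stand-in for a real infobox, because the standard-library `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed input; a real Wikipedia page would need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages: one valid, one without an infobox, one malformed.
PAGES = {
    "page_a": (
        "<html><body><table class='infobox'>"
        "<tr><th>Date</th><td>1 January 1940</td></tr>"
        "<tr><th>Location</th><td>10.0, 20.0</td></tr>"
        "</table></body></html>"
    ),
    "page_b": "<html><body><p>No infobox here.</p></body></html>",
    "page_c": "<html><body><table class='infobox'><tr>",
}


def parse_infobox(markup):
    """Return the date and location from the infobox, or raise if they are missing."""
    root = ET.fromstring(markup)
    # XPath locator: every row of the table whose class is exactly 'infobox'.
    rows = root.findall(".//table[@class='infobox']/tr")
    if not rows:
        raise ValueError("no infobox found")
    fields = {}
    for row in rows:
        label, value = row.find("th"), row.find("td")
        if label is not None and value is not None:
            key = "".join(label.itertext()).strip()
            fields[key] = "".join(value.itertext()).strip()
    # A missing label raises KeyError, which the caller treats as a faulty page.
    return {"date": fields["Date"], "location": fields["Location"]}


def scrape_all(pages):
    """Parse every page, recording failures instead of stopping at the first one."""
    records, skipped = [], []
    for name, markup in pages.items():
        try:
            record = parse_infobox(markup)
        except (ET.ParseError, ValueError, KeyError):
            skipped.append(name)
            continue
        record["battle"] = name
        records.append(record)
    return records, skipped


records, skipped = scrape_all(PAGES)
print(records)
print("skipped:", skipped)
```

Keeping the list of skipped pages makes it easy to check afterwards how much of the battle list was lost to missing or broken infoboxes.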





A boring RMQ-ST template.







Graph Theory: Graph Connectivity, 2-SAT








  1. KMP
  2. Aho-Corasick automaton
  3. Manacher (longest palindromic substring)
  4. Suffix array
  5. Suffix automaton