CHGIS Introduction

Summary

The China Historical Geographic Information System, CHGIS, project was launched in January 2001 to establish a database of populated places and historical administrative units for the period of Chinese history between 221 BCE and 1911 CE. CHGIS provides a base GIS platform for researchers to use in spatial analysis, temporal statistical modeling, and representation of selected historical units as digital maps.

The CHGIS project has received major funding from:

and additional assistance from:

The participating institutions have joined together to form CHGIS in order to create a new digital product for free distribution to scholars without restriction. The CHGIS management committee is responsible for the copyright of the published datasets, while each contributing institution retains the rights to further develop their own research materials as they see fit. The result is a no-cost GIS platform for use in teaching, research, and publications.

CHGIS users may obtain the data either by download from designated CHGIS websites, or on CD-ROM. Both the website and CD-ROM provide an account of the project and its development, the current CHGIS datasets, licensing information, and examples of how the data may be used.

There have been three versions of CHGIS data released (2002 - 2005), each successive version replacing the previous one. Further versions will be released until the base coverage of the following provinces has been completed--Anhui, Fujian, Gansu, Guangdong, Guangxi, Hainan, Hebei, Henan, Heilongjiang, Hubei, Hunan, Jiangsu, Jiangxi, Jilin, Liaoning, Ningxia, Shandong, Shaanxi, Shanxi, Sichuan, Yunnan, Zhejiang.

The following provinces lie outside the scope of the current project--Neimeng, Qinghai, Xinjiang, and Xizang.

CHGIS Version 1.0 (published in April 2002) contained datasets for the year 1820 (Qing Dynasty). Since Version 1.0, the project has been working backwards to create a continuous time series of records that track changes in placename, administrative status, and geography, as well as forwards to create a dataset of 1911 counties. Note that the data for the year 1820 CE is a time slice valid for a single year, and is gradually being superseded by the time series data in subsequent versions. The Version 1.0 CD-ROM was published as a demonstration of the project's scope and objectives, and the 1820 data contained on that CD-ROM should be considered a placeholder, rather than the fully documented CHGIS time series data released in later versions. Note also that Version 3 (published in April 2005) replaces Version 2 (published in September 2003).

The main objective of the CHGIS project is to create a flexible tool, in the form of a documented database of places and administrative units, which can be used to investigate any sort of geographically specific data related to China. The unique ID numbers for each of the CHGIS temporal instance records can be used as geocodes in relational databases, or to mark up texts, enabling users to import their own datasets into the CHGIS platform. Users will be able to associate their own data with CHGIS records, and then use the CHGIS database to sort, query, and display their data for different historical periods and at different levels of aggregation.


Project History

The CHGIS project was largely inspired by the work of several scholars:

Prof. Tan Qixiang (photo), [1911-1992] the former director of Fudan University's Center for Historical Geography, CHG, oversaw the production of the groundbreaking Historical Atlas of China. In these eight volumes, Tan and his colleagues at CHG compiled maps on the national, regional, and provincial levels for particular years during each of the Chinese Dynasties. Many of the senior researchers who worked with Tan Qixiang on the original Historical Atlas of China are now working on the production of the CHGIS database, under the editorial direction of Prof. Zou Yilin (photo). Please see the members page for a complete list of the Fudan University, CHG contributors.

Prof. Ge Jianxiong (photo) worked with Tan Qixiang, and was Tan's successor as director of CHG (from 1992 to 2007). Prof. Ge took over the management of materials used in the compilation of the Historical Atlas of China and assembled a team of experts at CHG who are now engaged in the compilation of the CHGIS datasets. Prof. Ge currently serves as the Director of the Fudan University Library, and maintains his supervisory role for the CHGIS project.

Prof. Man Zhimin (photo) is the current director of CHG at Fudan University, and serves as the CHGIS Project Manager at Fudan University. Under Man's direction, the fundamental research for CHGIS is carried out by a team of senior researchers, and the digitization of annotated paper maps is carried out with a staff of GIS technicians and data entry personnel. Prof. Man is an expert in historical climate change in China, and is the co-developer of the CHGIS database model with Harvard Project Manager, Lex Berman.

Prof. G. William Skinner (photo), [1923-2008] developed models of China's hierarchical regional systems which shaped the objectives and methods of CHGIS. His early trilogy Marketing and Social Structure in Rural China (1964-65) showed how periodic markets define local spatial-cum-temporal systems, and positioned the three levels of market towns as the lower rungs of China's urban hierarchy. His essays in The City in Late Imperial China (1977) conceived of China's socioeconomic landscape in terms of regional hierarchies of cities and towns, each serving as the node of a regional or local territorial system. The Structure of Chinese History (1985) theorized the link between China's spatial structure, so conceived, and historiography. After 1989, Skinner developed a method of Regional Systems Analysis for China, Japan and France, making use of GIS layers and associated databases. [Bibliography of Works by Skinner]

Prof. Robert Hartwell (photo) passed away in 1996 and bequeathed datasets that he and Marianne Colson Hartwell created under the auspices of his Chinese Historical Software, Ltd. to Harvard Yenching Institute. These materials included a functioning set of GIS datasets for the Chinese Dynasties, from Tang to Ming, which were based on the concept of "co-location," or the use of GIS representations of modern county-level administrative units as building blocks to depict the approximate shapes of historical areas. Making use of boundary data for 1990 counties to represent historical units that occupied roughly the same areas, Hartwell drew in approximate line boundaries to divide the contemporary units to fit the historical situations and therefore provide an approximation of the historical unit's area. Although the resulting boundaries are, in many cases, problematic representations, the Hartwell GIS remains an interesting heuristic GIS tool for sorting, querying, and creating digital maps for the years 742, 1080, 1200, 1280, 1391.

The CHGIS website at Harvard University has made the Hartwell GIS data available for download, and they may still be used as a means of generating approximate spatial representations of historical administrative units. However the Hartwell GIS data is not historically sourced or otherwise correlated with the CHGIS data.

See also: Peter K. Bol Intro to the Hartwell GIS (WORD format)

Prof. William Lavely, (photo) Sociology Dept. and the Jackson School of Intl Studies at the University of Washington, has done extensive work on demography in China. Prof. Lavely was one of the principal researchers responsible for China in Time and Space, which provided the first freely available combination of both statistical data and nationwide GIS layers for China. In addition to numerous demographic studies, Prof. Lavely has also developed a Coding Scheme for the Language Atlas of China (PDF), which makes use of the CITAS GIS data to map linguistic areas.

Prof. Lawrence Crissman, (photo) Director of the Asian Spatial Information and Analysis Network (ACASIAN), has spent the last decade compiling a series of GIS datasets on China, other Asian countries and the states of the former Soviet Union. The China datasets are of particular importance, as they combine information collected from both contemporary and historical map sheets, remote sensing data, and a temporal database of all official Guobiao (or National Standard) geocodes that have been assigned to administrative units since 1980. The latter database of Guobiao Codes, which identify all administrative units at the Province, Prefecture, and County levels, keeps track of any changes in the units from year to year. The ability to select units by time, and to simultaneously extract information about the preceding and subsequent units that occupied the same territory, was the key element of Crissman's GIS initial draft of a Spatio-Temporal Database Model for CHGIS.

Prof. Peter Bol, Carswell Professor of Chinese History at Harvard University, is a scholar of Song Dynasty intellectual history who actively pursues spatial analysis of historical processes in China. Bol's field work with graduate students was developed into a variety of local history projects, and Bol has been leading the development of the China Biographical Database project.

Lex Berman, (photo) Project Manager of CHGIS and affiliate of the Harvard Center for Geographic Analysis, received a Fulbright Scholarship in the field of Chinese Geography, and has expertise in database modeling, GIS webmapping, and website design and graphics. Serving as Project Manager for CHGIS, Lex integrated GIS datasets from a variety of sources into an authoritative online search engine. Lex also developed the CHGIS spatio-temporal database model, the CHGIS website, webmaps, and XML web service, and designed CHGIS publications for CD-ROM and DVD distribution. As Director of Diamond Bay Research, Lex promotes the free sharing of Asian geodata, metadata, and encoding systems.


Applications

The CHGIS project is designed to provide a GIS platform for scholarly and scientific research. Considerable flexibility has been designed into the GIS, allowing for alternate versions and variations of feature attributes, spatial data, and competing political entities. The CHGIS aims to build a reliable database of administrative units and settlements, but does not wish to impose a closed interpretation on the relationships among those units. The advantage of creating the CHGIS, rather than printing paper maps, is that the relationships between the units can be modified and improved whenever new information becomes available and the new "edition" needs only to be posted on the Internet for users to download. The CHGIS editors have set high editorial standards for the inclusion of material in the CHGIS time series data and regulate the release of the official versions from designated CHGIS participating organizations.

Having downloaded the CHGIS datasets, the user can search the database for administrative units and capitals for any given time in Chinese history, can create customized digital maps for particular times and places, or can join their own datasets for spatial analysis, thematic mapping, or other specialized statistical modeling according to their own interests. Also included in the datasets are layers for historical coastlines, major rivers, and generalized elevations. The method of integrating user datasets into the GIS will be based on a four-step process:

  • query the CHGIS dataset for the specific period, area, and features related to the user data
  • export the selection into a spreadsheet (including the CHGIS unique ID numbers for each record)
  • join the user data (or attributes) to the exported spreadsheet
  • join the user data to the related CHGIS spatial data files

Once the imported records have been directly associated with their corresponding spatial objects, they can be mapped or put through other sorts of spatial analysis. The results of such analyses can be output as tables, displayed as digital maps, or input into other data models.
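The four-step process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (sys_id, name_py, beg_yr, end_yr, x, y), the ID values, the placenames, and the user attribute tax_grain are all hypothetical placeholders, not actual CHGIS fields or data. The point is simply that the unique ID travels with each record through the export and is the key for both joins.

```python
import csv
import io

# Hypothetical CHGIS records: column names, IDs, names and years are placeholders.
chgis_rows = [
    {"sys_id": "1001", "name_py": "Example A Xian", "beg_yr": 1500, "end_yr": 1660, "x": 120.5, "y": 30.6},
    {"sys_id": "1002", "name_py": "Example B Xian", "beg_yr": 1400, "end_yr": 1700, "x": 119.9, "y": 30.1},
    {"sys_id": "1003", "name_py": "Example C Xian", "beg_yr": 1661, "end_yr": 1911, "x": 120.5, "y": 30.6},
]

# Hypothetical user dataset, keyed by the same unique IDs.
user_rows = [
    {"sys_id": "1001", "tax_grain": 5200},
    {"sys_id": "1002", "tax_grain": 3100},
]


def select_valid(rows, year):
    """Step 1: keep the records whose valid period covers the given year."""
    return [r for r in rows if r["beg_yr"] <= year <= r["end_yr"]]


def export_csv(rows, fields):
    """Step 2: write the selection to CSV text, keeping the unique ID column."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def join_on_id(left_rows, right_rows, key="sys_id"):
    """Steps 3 and 4: attach the right-hand attributes to each left-hand row with the same ID."""
    lookup = {r[key]: r for r in right_rows}
    return [{**left, **lookup[left[key]]} for left in left_rows if left[key] in lookup]


selection = select_valid(chgis_rows, 1600)                        # step 1
sheet = export_csv(selection, ["sys_id", "name_py"])              # step 2
exported = list(csv.DictReader(io.StringIO(sheet)))
with_user_data = join_on_id(exported, user_rows)                  # step 3
mapped = join_on_id(with_user_data, chgis_rows)                   # step 4
```

In a desktop GIS the last step would normally be a table join against the layer's attribute table; here the coordinates stand in for the spatial objects.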

Users of CHGIS are encouraged to share their own specialized datasets with the scholarly community. CHGIS will assist in the posting of a selection of user datasets with the help of such organizations as the Harvard Geospatial Library, the Electronic Cultural Atlas Initiative, and other metadata clearinghouses.

Users are also encouraged to submit suggestions for correction or amendments to the underlying CHGIS database, which can be done by email. All submissions will be considered for review by the CHGIS editors, and if used, will be acknowledged in the notes fields of subsequent official CHGIS products.


Database Design

1. Objective

The main task of the CHGIS relational database is to create unique records for all of the administrative units down to the county (xian) level that were part of the historical dynasties of China from the time of unification (221 BCE) to the end of the dynastic period (1911 CE), and to provide documentation of the sources used to create each record. At the same time, records will be created for the various states and confederations independent of those empires, referred to as "Regimes." The purpose is to create a basic database containing all the aforementioned administrative units, which can be queried and linked to digital geographic objects. In addition, settlements below the county seat level are included for some areas and periods. Settlement data will be further expanded once the basic administrative structure is established.

Queries to the database must allow users to select out the valid administrative units for any date covered by the database, or to find particular historical places by name and by feature type. Each administrative unit record in the database will also define its relationship to the hierarchical organization of the territory of the Dynasty or Regime. For example, a related table will show that a particular prefecture record was part of a particular province for a specific period of time. The hierarchical relationships can be queried repetitively to determine the administrative parent jurisdiction or subordinate jurisdictions, from the Dynasty level down to the county level.
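As a rough sketch of the two kinds of query described here (units valid for a given date, and lookup by name and feature type), the following uses an in-memory SQLite table. The table name, column names, and all rows are hypothetical placeholders chosen for illustration; they are not the actual CHGIS schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE main ("
    "sys_id INTEGER PRIMARY KEY, name_py TEXT, type_py TEXT, "
    "beg_yr INTEGER, end_yr INTEGER)"
)
# Placeholder rows: one unit that changes feature type twice, plus one prefecture.
conn.executemany(
    "INSERT INTO main VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alpha Xian", "Xian", 1000, 1199),
        (2, "Alpha Zhou", "Zhou", 1200, 1299),
        (3, "Alpha Xian", "Xian", 1300, 1400),
        (4, "Beta Fu", "Fu", 1100, 1350),
    ],
)


def units_valid_in(conn, year):
    """Return the names of all records whose begin/end years cover the given year."""
    cur = conn.execute(
        "SELECT name_py FROM main WHERE beg_yr <= ? AND end_yr >= ? ORDER BY sys_id",
        (year, year),
    )
    return [row[0] for row in cur]


def find_by_name(conn, name, feature_type):
    """Return (sys_id, beg_yr, end_yr) for every record matching a name and feature type."""
    cur = conn.execute(
        "SELECT sys_id, beg_yr, end_yr FROM main "
        "WHERE name_py = ? AND type_py = ? ORDER BY beg_yr",
        (name, feature_type),
    )
    return cur.fetchall()
```

A name search can return several rows for what a reader thinks of as one place, because each row is a separate dated record.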

In addition to working directly within the relational database, the users must be able to link each record to a geographic object in GIS. In other words, for a particular prefecture record, the user must be able to find a spatial object to represent the prefecture as a digital map. For prefectures, provinces, regimes, and dynasties CHGIS will digitize both polygons (to represent the area of jurisdiction) and points (to represent the location of the administrative seat). Counties and all other settlement types below the county will be digitized as point features.

As part of the digitization process, CHGIS also will produce polygons showing county boundaries for the year 1911 CE.

2. Overview

As administrative changes take place over time, the administrative units undergo various changes, including: name changes, changes in boundary or location, and changes in feature type. The primary task of the data model is to keep track of these changing attributes and allow for attributes that are valid for specified dates to be retrieved by the user. In addition, the database must keep track of the previous and subsequent units that comprise the territory that has been affected by a change. This temporally searchable administrative change database, extended backwards in time through Dynastic periods, is the key concept behind the CHGIS Spatio-Temporal Database Design.

2.1 Placename and Feature Type Changes

Let us examine each of the types of change being tracked in turn, beginning with name changes. Chinese placenames are typically considered to have two components, zhuan ming and tong ming. The zhuan ming (specific name) refers to the given toponym, while the tong ming (generic name) plays the role of an identifier. This identifier is the equivalent of a feature type. For example, the Chinese river Yangzi Jiang would be said to have Yangzi as zhuan ming and Jiang as tong ming. In other words, "Yangzi" serves as the given toponym, and "Jiang" (i.e. River) acts as the feature type. Indeed, "Yangzi" is so famous in Chinese that it can be used meaningfully by itself, even though the correct placename will be listed in reference works in a bound form with the feature type "Jiang."

The same phenomenon occurs in English when we attempt to classify placenames, because there is some ambiguity as to whether the identifier is part of a bound form or not. We always refer to Fort Dearborn, or Fort Knox, never Dearborn or Knox, but we can say either Bernalillo County or simply Bernalillo, without causing any confusion.

In Chinese, generally speaking, the tong ming are always included for townships, counties, and cities, but can be occasionally dropped for prefectures and provinces. The use of tong ming for rural townships, villages and other settlements is somewhat arbitrary. For our purposes, a change in the tong ming constitutes a name change, because such a change almost always reflects a change in the administrative role, i.e. feature type, of the unit in question. Therefore the change of a name from Chongde Xian to Chongde Zhou and back to Chongde Xian would be recorded in three unique records in the database, as shown in the following illustration:

Unique records would also be created if the given toponym changed, as seen in the following example, where Chongde Xian changed its name to Shimen Xian in the year 1661:

2.2 Boundary Changes

The second type of change being tracked is the boundary change. Of course, boundary changes only apply to administrative units which are represented as polygon objects that change over time: prefectures, provinces, dynasties, and regimes. When a boundary change event occurs, the administrative unit in question either gains or loses part of its territory. In the CHGIS tables, we keep track of these changes in the columns beg_chg_type (Begin Change Type) and end_chg_type (End Change Type). These mean in essence:

  • beg_chg_type = the event as a result of which this record was created
  • end_chg_type = the event as a result of which this record ceased to be valid
The values that appear in these fields for boundary changes are: "jurisdictional area increased" or "jurisdictional area shrank." In the following example both boundary and name changes are recorded. Each row reflects one historical instance of change. CHGIS defines the historical instance as an historical object which does not change for a defined period of time. The period of time, or temporal extent, is defined by the begin and end years (a single year being the smallest temporal unit). Therefore, in the following illustration, we see that Run Zhou had four events of "jurisdictional area increase," in the years 624, 625, 687, and 741. Simultaneous with the boundary change in 741, Run Zhou changed its name to Danyang Jun. In the year 756, Danyang Jun had an event of "jurisdictional area shrinkage," and then in 757, the name was changed back to Run Zhou.
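The Run Zhou sequence described above can be turned into historical instance records with a short sketch. The event years, names, and the two boundary change values come from the text; everything else is an assumption made for illustration: the field names beg_yr and end_yr, the label "name changed", the combined label used for 741, and the convention that a record ends in the year before the next event. The actual CHGIS tables may encode these differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    name: str
    beg_yr: int
    end_yr: Optional[int]          # None = end not stated in the narrative
    beg_chg_type: str              # event that created this record
    end_chg_type: Optional[str]    # event that ended this record's validity


# (year, change type, name in force after the event), following the Run Zhou narrative.
EVENTS = [
    (624, "jurisdictional area increased", "Run Zhou"),
    (625, "jurisdictional area increased", "Run Zhou"),
    (687, "jurisdictional area increased", "Run Zhou"),
    (741, "jurisdictional area increased; name changed", "Danyang Jun"),  # label is an assumption
    (756, "jurisdictional area shrank", "Danyang Jun"),
    (757, "name changed", "Run Zhou"),                                    # label is an assumption
]


def build_instances(events):
    """Create one record per event; each event also closes the preceding record."""
    instances = []
    for i, (year, chg_type, name) in enumerate(events):
        if i + 1 < len(events):
            next_year, next_chg_type, _ = events[i + 1]
            # Assumed convention: a record is valid through the year before the next event.
            end_yr, end_chg_type = next_year - 1, next_chg_type
        else:
            end_yr, end_chg_type = None, None
        instances.append(Instance(name, year, end_yr, chg_type, end_chg_type))
    return instances


instances = build_instances(EVENTS)
```

Note how the end_chg_type of one record is always the beg_chg_type of the record that follows it: a single event is seen from both sides.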

2.3 Point Location Changes

The majority of records in the CHGIS database (including more than 20,000 towns and villages and thousands of counties) are not represented by polygons. This is due to the lack of accurate historical evidence about their areas of jurisdiction. Although these records do not have polygon objects to represent them in GIS, they all have point objects. Therefore a third category of change being tracked in the database is the change in point location. For practical purposes, this includes only county level and higher points which have a change in the location of the administrative seat. In other words, when a particular county, such as Lin'an Xian, moved its administrative office from one location to a new location, this results in two unique records in the database:

The same method is used to track changes in administrative seat locations for higher level units, such as prefectures, provinces, and regimes. These higher level units are represented by both administrative areas (polygons) and administrative seats (points) in separate GIS layers. In other words, the historical instance record for a particular place, such as Fujian Lu, actually refers to two different GIS objects, one polygon and one point. Consider the following example, which shows a series of historical instances for Fujian Lu. Please note the two additional fields on the right hand side of the table, bou_id (boundary object ID) and pt_id (point object ID).

In each of the rows you see a unique point object ID. The reason for this is that each of these changes reflects an event of "administrative seat location moved." Each of the GIS point objects representing these moves has specific begin and end dates for its period of validity. Note also the pres_loc (present location) field, which indicates that in the year 1125 the administrative office of Fujian Lu moved from a present-day location near Laocheng, Fuzhou Shi to another location further inland, Jian'ou Xian. In the year 1128, the seat of Fujian Lu was once again located at Fuzhou Shi, Laocheng. Even when the first and third locations are identical, they exist as two separate point objects in the GIS layer, because they each have unique begin and end dates. They are identical spatially, but they do NOT overlap temporally.

Also note the field bou_id (boundary object ID) which is not unique for the five instances. This indicates that the first two records (ID# 98014) are represented by a different polygon from the following three records (ID# 98000). In the year 1128, at the same time the administrative seat was moved, the jurisdictional area increased. The polygons (actually a group of polygons regionalized as one object in GIS to account for coastal islands) that represent the two periods look almost the same, but there is a difference. In 1128, Fujian Lu expanded to include Penghu Island (circled in red).

3. Relational Database

Up to this point, all of the information we have been discussing is recorded in a single table, referred to as the "Historical Instance Table," or simply the "Main Table." Each row in the Main Table represents a single historical instance, which CHGIS defines as a record describing a span of time during which the placename, administrative type, and associated spatial objects all remain unchanged. The span of time for each historical instance is defined by specific begin and end dates (the unit of measure being one year). Should any changes occur, such as those described above, a new record is added to the Main Table to reflect the new historical instance.

There are several other tables in the relational database model, the most essential of which are the Source Notes Table, Part Of Table, and GIS Info Table.

3.1 Source Notes Table

The Source Notes Table contains source citations, direct quotes, and commentaries, in Chinese characters, for all of the CHGIS Time Series records. We do not plan to attempt a translation of the notes, which have been compiled by the senior researchers at Fudan University's Center for Historical Geography (CHG). The copyright for all the notes and commentaries is held by the individual scholars and by the CHG, and anyone wishing to make use of them in any form is required to obtain permission directly from CHG.

For any given Time Series record in the Main Table, links to the source notes are found in the two fields called bou_note_id (boundary note ID#) and pt_note_id (point note ID#). As explained in the preceding section, each row in the Main Table can be represented by both a polygon (for jurisdictional area) and a point location (for administrative seat). The source notes provide separate, detailed citations and commentaries on the boundary and point changes.

For boundary changes, the bou_note_id for a given record in the Main Table can be looked up in the Source Notes Table. Point location changes are also linked via the pt_note_id. In the case of prefecture or higher level units, there will be both boundary and point change source notes. For county level units there will be source notes for point location changes only, and the bou_note_id field will be empty.

The source notes may also be browsed online, using the CHGIS placenames search tool. Note that only the time series records will have related source notes, and that records with identical begin and end dates will not.

3.2 Part Of Table

The Part Of Table records the parent jurisdiction for each of the administrative units in the Main Table, from the county level up to the top level. This works based on a principle of containership, where a given CHILD unit is listed as part of a particular PARENT unit for a specific period of time. For example, a particular county is listed in the Part Of Table as CHILD, and the immediately superior unit which had jurisdiction over the county is listed as the PARENT. In the Part Of Table, the placenames and unique system ID #s are provided for both the CHILD unit (CHILD_ID, CHILD_NMPY) and the PARENT unit (PRT_ID, PRT_NMPY). In the following example Youxi Xian is shown as being Part Of Guiji Jun, during the years 14-22. Guiji Jun is shown as being Part Of Xinmang during the years 9-23.
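A minimal sketch of how such a table can be walked upward, using the Youxi Xian example from the text. The CHILD_ID, CHILD_NMPY, PRT_ID, and PRT_NMPY column names and the year ranges are taken from the description above; the numeric ID values and the beg_yr / end_yr column names are hypothetical.

```python
# Part Of rows for the example in the text; ID values are placeholders.
PART_OF = [
    {"CHILD_ID": 1, "CHILD_NMPY": "Youxi Xian", "PRT_ID": 2, "PRT_NMPY": "Guiji Jun",
     "beg_yr": 14, "end_yr": 22},
    {"CHILD_ID": 2, "CHILD_NMPY": "Guiji Jun", "PRT_ID": 3, "PRT_NMPY": "Xinmang",
     "beg_yr": 9, "end_yr": 23},
]


def parent_chain(rows, child_id, year):
    """Follow PARENT links upward from a unit, using only rows valid in the given year."""
    chain = []
    seen = {child_id}
    current = child_id
    while True:
        match = next(
            (r for r in rows
             if r["CHILD_ID"] == current and r["beg_yr"] <= year <= r["end_yr"]),
            None,
        )
        if match is None or match["PRT_ID"] in seen:   # top reached, or guard against cycles
            return chain
        chain.append(match["PRT_NMPY"])
        seen.add(match["PRT_ID"])
        current = match["PRT_ID"]


def children_of(rows, parent_id, year):
    """List the names of the units directly subordinate to a parent in the given year."""
    return [r["CHILD_NMPY"] for r in rows
            if r["PRT_ID"] == parent_id and r["beg_yr"] <= year <= r["end_yr"]]
```

For example, parent_chain(PART_OF, 1, 15) walks from Youxi Xian to Guiji Jun to Xinmang, while the same call for the year 23 returns an empty list because the county's relationship row ends in 22.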

In the example above, the period of time that Guiji Jun was Part Of Xinmang begins earlier and ends later than the period of time that Youxi Xian was Part Of Guiji Jun. This is a case of temporal containership. Temporal relationships between objects that change over time closely resemble spatial relationships. Consider the following illustration:

The right side of the illustration helps us to visualize the ways in which our dated historical instances co-exist on the temporal plane. Records that follow one another in sequence are adjacent. If the end date of one record is later than the begin date of the record that follows it, then they must overlap. If there is a gap between the end date of one record and the begin date of another record, they must be discontinuous. And when the valid span of time for a particular record lies completely within that of another record, it is an example of containership.
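The four temporal relationships can be expressed as a small classifier over pairs of (begin, end) years. This is a sketch under two assumptions that are not stated in the text: years are treated as inclusive, and "adjacent" means the later record begins in the year immediately after the earlier one ends. If the data instead repeat the same year as one record's end and the next record's begin, the adjacency test would need to be adjusted.

```python
def temporal_relation(a, b):
    """Classify two (begin_year, end_year) spans, both inclusive."""
    (a_beg, a_end), (b_beg, b_end) = a, b
    # One span lies completely within the other.
    if (a_beg >= b_beg and a_end <= b_end) or (b_beg >= a_beg and b_end <= a_end):
        return "containership"
    first, second = sorted([a, b])      # order by begin year
    if first[1] >= second[0]:           # earlier span ends on or after the later one begins
        return "overlap"
    if second[0] - first[1] == 1:       # assumed convention for "follows in sequence"
        return "adjacent"
    return "discontinuous"
```

With the Part Of example from the text, temporal_relation((14, 22), (9, 23)) reports containership, since the years 14-22 lie entirely within 9-23.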

When dealing with the historical places recorded in CHGIS, we need to keep in mind that Parent Units change asynchronously from their Children. That means that a Parent Unit might expand, contract, change its name, or the location of its capital seat within the span of time for which a Child Unit is valid. From the theoretical perspective, the temporal divisions of Parent and Child records might look something like the following:

Taking into consideration the illustration above, we can see that it is not possible to simply list one Part Of relationship for the Child Unit that began at time 23 and ended at time 92. Because the Parent Unit itself changed three times during that period, we need to capture the relationships to all of the temporally overlapping parent records. In this case, that requires four unique rows in the Part Of Table:

It is very important to note that the first Part Of record begins at time 23, because we are indicating that the Child Unit, Red, was within the jurisdiction of the Parent, Blue, so the begin date must match the begin date of the Child. However, the subsequent two changes of the Parent, Blue, occurred at times temporally contained by the Child Unit's historical instance, and therefore the begin and end dates match those of the contained parents. Similarly the end date of the Child Unit is used as the final date. As a rule, the dates of relationship rows must match the Begin and End dates of the Child, except in cases where the Parent Unit's dates are temporally contained within the valid period of the Child Unit.
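One way to read this rule is as an interval intersection: each relationship row covers the years shared by the Child instance and one Parent instance. The sketch below applies that reading to the Child span 23 to 92 from the text. The four Parent spans are hypothetical values invented only to give a parent that changes three times within the child's lifetime; they are not taken from the illustration.

```python
def part_of_rows(child, parents):
    """Return one (begin, end) relationship span per parent instance that overlaps the child."""
    child_beg, child_end = child
    rows = []
    for p_beg, p_end in sorted(parents):
        beg = max(child_beg, p_beg)   # clip to the child's begin date
        end = min(child_end, p_end)   # clip to the child's end date
        if beg <= end:                # keep only parents that actually overlap the child
            rows.append((beg, end))
    return rows


child = (23, 92)                                         # Child Unit span from the text
parents = [(10, 40), (41, 60), (61, 80), (81, 120)]      # hypothetical Parent instances

rows = part_of_rows(child, parents)
# rows == [(23, 40), (41, 60), (61, 80), (81, 92)]
```

The first row starts at the child's begin date and the last row stops at the child's end date, while the inner boundaries follow the parent's own changes, which is the pattern the rule describes.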

Let's take the above example and see how it is actually implemented in the CHGIS Part Of Table. The example shown below is for Fu Zhou, which has an historical instance recorded in the Main Table valid from 758 CE and ending in 932 CE.

As you can see, for the single historical instance of Fu Zhou (758 CE to 932 CE), there were four temporally overlapping Parent Instances. The first relationship row matches the Begin Date of the Child, and the last relationship row matches the End Date of the Child, while the intervening relationships match the dates of the temporally contained Parents.

3.3 GIS Info Table

Information that is specific to each spatial object in GIS--such as the x - y coordinates of points, the source or basemap used for the spatial object, the object type, and the names of the editors who worked on the data--are all kept in the GIS Info Table. The GIS Info Table is compiled directly from the GIS datasets, and allows the user to browse and search all the spatial objects in one table, rather than opening up each of the GIS layers separately.

This table is linked from the Main Table using the bou_id (boundary object ID) for Polygons and pt_id (point object ID) for Points. As explained above, the historical instance recorded in a single row of the Main Table may have a Point ID, or both a Point ID and a Boundary ID. These IDs can be looked up in the GIS Info Table to find specific information about the spatial objects which represent each historical instance, including the filename of the original GIS layer that contains the spatial object. In short, the GIS Info Table acts as a linktable between the Main Table and the GIS layers.
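The link-table role can be sketched as a simple lookup keyed by object ID. The boundary IDs 98014 and 98000 appear in the Fujian Lu example earlier in this document; the point IDs, the attribute names obj_type and layer_file, and the file names are hypothetical placeholders, not actual CHGIS values.

```python
# Hypothetical GIS Info Table, keyed by spatial object ID.
GIS_INFO = {
    98014: {"obj_type": "POLYGON", "layer_file": "example_polygons.shp"},
    98000: {"obj_type": "POLYGON", "layer_file": "example_polygons.shp"},
    500001: {"obj_type": "POINT", "layer_file": "example_points.shp"},
    500002: {"obj_type": "POINT", "layer_file": "example_points.shp"},
}

# Main Table rows: a higher-level unit has both IDs, a county has only a point ID.
fujian_lu = {"name_py": "Fujian Lu", "bou_id": 98000, "pt_id": 500001}
linan_xian = {"name_py": "Lin'an Xian", "bou_id": None, "pt_id": 500002}


def spatial_objects(main_row, gis_info):
    """Look up the GIS Info entries for whichever object IDs a Main Table row carries."""
    found = {}
    for field in ("bou_id", "pt_id"):
        obj_id = main_row.get(field)
        if obj_id is not None:
            found[field] = gis_info[obj_id]
    return found
```

Calling spatial_objects(fujian_lu, GIS_INFO) returns both a polygon and a point entry, while the county row yields only the point entry.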

See more documentation of CHGIS Data Models