3 Getting the DIVA-GIS Data

DIVA-GIS provides free country-level road and administrative boundary data for India, which can be accessed through the following links:

These data are sourced from GADM, which also offers them for download in additional formats here.

3.1 Downloading the Data Automatically

You can download the data manually, but where’s the fun in that? Let’s make our code reproducible by making downloads automatic!

Notice that the links point to zip files, which must be unzipped before we can see their contents. To do that, we will:

  1. Download the datasets into a folder called data
  2. Unzip the downloaded files
  3. Remove the zip files as they are no longer needed
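
The three steps above could be sketched in base R as follows. The helper name `download_and_unzip` and its arguments are hypothetical, and the zip URLs are not reproduced here, so substitute the links given earlier in this section.

```r
# Hypothetical helper: download a zip archive, extract it, then delete the archive.
download_and_unzip <- function(url, exdir) {
  # Create the target folder (and its parent "data" folder) if needed
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)

  # 1. Download the zip file; mode = "wb" keeps the binary file intact on Windows
  zip_path <- file.path(exdir, basename(url))
  download.file(url, destfile = zip_path, mode = "wb")

  # 2. Unzip the downloaded file into the same folder
  unzip(zip_path, exdir = exdir)

  # 3. Remove the zip file as it is no longer needed
  file.remove(zip_path)

  invisible(exdir)
}
```

For example, `download_and_unzip(admin_url, file.path("data", "india-admin-areas"))`, where `admin_url` is a placeholder for the administrative areas link, would fill the folder inspected in the next section.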

3.2 Inspecting the Data

We can now inspect the contents of each unzipped folder, starting with the administrative areas data:

##  [1] "IND_adm0.cpg" "IND_adm0.csv" "IND_adm0.dbf" "IND_adm0.prj"
##  [5] "IND_adm0.shp" "IND_adm0.shx" "IND_adm1.cpg" "IND_adm1.csv"
##  [9] "IND_adm1.dbf" "IND_adm1.prj" "IND_adm1.shp" "IND_adm1.shx"
## [13] "IND_adm2.cpg" "IND_adm2.csv" "IND_adm2.dbf" "IND_adm2.prj"
## [17] "IND_adm2.shp" "IND_adm2.shx" "IND_adm3.cpg" "IND_adm3.csv"
## [21] "IND_adm3.dbf" "IND_adm3.prj" "IND_adm3.shp" "IND_adm3.shx"
## [25] "license.txt"

and the roads data:

## [1] "IND_roads.dbf" "IND_roads.prj" "IND_roads.shp" "IND_roads.shx"
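
Listings like the two above can be produced with `list.files()`. This is a minimal sketch that assumes the data were unzipped into `data/india-admin-areas` and `data/india-roads`, the folder names that appear in the file paths printed in this section.

```r
# Folder names assumed from the file paths printed in this section
admin_dir <- file.path("data", "india-admin-areas")
roads_dir <- file.path("data", "india-roads")

# List the files extracted into each folder
admin_files <- list.files(admin_dir)
roads_files <- list.files(roads_dir)

print(admin_files)
print(roads_files)
```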

Apart from the .csv and license.txt files in the administrative area folder, there are 5 types of files (.cpg, .dbf, .shp, .prj, .shx), which correspond to the character encoding file, the attribute database file, the shapefile geometry, the projection system file, and the shape index file. The roads data does not include a .cpg file. The main file we will be focusing on here is the shapefile (.shp), a vector data format from ESRI (a well-known Geographic Information Systems (GIS) company) that is widely used in the field of GIS.

More details on shapefiles can be found here from ESRI, and here from GDAL.
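
As a small illustration, the files that share a base name together make up one layer, and only the extension differs. The sketch below splits the roads file names (copied from the listing above) into base name and extension using the `tools` package that ships with R.

```r
# File names copied from the roads listing above
roads_files <- c("IND_roads.dbf", "IND_roads.prj", "IND_roads.shp", "IND_roads.shx")

# Split each name into its base name and its extension
parts <- data.frame(
  layer = tools::file_path_sans_ext(roads_files),
  extension = tools::file_ext(roads_files),
  stringsAsFactors = FALSE
)
print(parts)

# All four files share one base name, so together they form a single layer
print(unique(parts$layer))
```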

3.3 Reading the Data into R

Now that we have some understanding of the file formats, we can try reading some of the files into R:

## Reading layer `IND_adm0' from data source `D:\windows\Users\rrwen\Desktop\idea-geoaggregate-indian-roads\data\india-admin-areas\IND_adm0.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 70 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 68.18625 ymin: 6.754256 xmax: 97.41516 ymax: 35.50133
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
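
Output like the above is what `st_read()` from the sf package prints when it reads a shapefile. A minimal sketch, assuming sf is installed and the path is written relative to the project folder rather than as the absolute path shown in the output:

```r
# Relative path to the level 0 shapefile (folder name assumed from the output above)
adm0_path <- file.path("data", "india-admin-areas", "IND_adm0.shp")

# Read the shapefile only if it has already been downloaded and unzipped
if (file.exists(adm0_path)) {
  adm0 <- sf::st_read(adm0_path)
}
```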

When you read the data into R, it will provide you with some general information about the data:

  • Number of features and fields (rows/geometric objects and columns/variables)
  • Geometry type (point, polygon, linestring, and multi-variants of those)
  • Dimension of the geometric data (2D XY or 3D XYZ)
  • Bounding box (bbox) or the encompassing rectangular area of the data
  • Spatial reference ID (epsg SRID) for defining the projection system used
  • String defining additional parameters for the projection system
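
The same facts can also be pulled out of an sf object after it has been read. The helper below is a hypothetical sketch (the name `describe_layer` is not part of sf) that uses the sf accessors `st_geometry_type()`, `st_bbox()`, and `st_crs()`.

```r
# Hypothetical helper: collect the facts printed by st_read() into a list
describe_layer <- function(x) {
  crs <- sf::st_crs(x)
  list(
    features = nrow(x),           # rows / geometric objects
    fields = ncol(x) - 1,         # columns, excluding the geometry column
    geometry_type = unique(as.character(sf::st_geometry_type(x))),
    bbox = sf::st_bbox(x),        # xmin, ymin, xmax, ymax
    epsg = crs$epsg,              # spatial reference ID
    proj4string = crs$proj4string # projection parameters as a string
  )
}
```

Calling `describe_layer()` on an sf object returns these values as a named list, which is convenient when comparing several layers.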

India has 29 states and 7 union territories (UTs), for a total of 36 states/UTs (knowindia.gov), which means we should have 36 geometric objects in one of the datasets inside data/india-admin-areas.

Looks like the file IND_adm0.shp (level 0) only has 1 feature, which is probably not the data level we are looking for.

Let’s try the level 1 administrative areas next:

## Reading layer `IND_adm1' from data source `D:\windows\Users\rrwen\Desktop\idea-geoaggregate-indian-roads\data\india-admin-areas\IND_adm1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 36 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 68.18625 ymin: 6.754256 xmax: 97.41516 ymax: 35.50133
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

Hey, this looks like it could be the states/UTs! There are 36 features (geometric objects) here, but just to be sure, let’s check the level 2 and 3 data as well:

## Reading layer `IND_adm2' from data source `D:\windows\Users\rrwen\Desktop\idea-geoaggregate-indian-roads\data\india-admin-areas\IND_adm2.shp' using driver `ESRI Shapefile'
## Simple feature collection with 594 features and 11 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 68.18625 ymin: 6.754256 xmax: 97.41516 ymax: 35.50133
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## Reading layer `IND_adm3' from data source `D:\windows\Users\rrwen\Desktop\idea-geoaggregate-indian-roads\data\india-admin-areas\IND_adm3.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2299 features and 13 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 68.18625 ymin: 6.754256 xmax: 97.41516 ymax: 35.50133
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

The level 2 data has 594 features (too many for states/UTs data), while the level 3 data has even more at 2299 features (far too many objects!).
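
Rather than reading each level by hand, the feature counts can be compared in one pass. This is a sketch that assumes the sf package is installed and the four level files sit in `data/india-admin-areas`. Levels that are missing on disk are reported as `NA`.

```r
admin_dir <- file.path("data", "india-admin-areas")
adm_levels <- 0:3
shp_paths <- file.path(admin_dir, sprintf("IND_adm%d.shp", adm_levels))

# Count the features (rows) in each level that is present on disk
feature_counts <- vapply(shp_paths, function(path) {
  if (!file.exists(path)) return(NA_integer_)
  nrow(sf::st_read(path, quiet = TRUE))
}, integer(1))
names(feature_counts) <- paste0("level_", adm_levels)
print(feature_counts)

# Which level, if any, has exactly 36 features (one per state/UT)?
print(names(feature_counts)[which(feature_counts == 36)])
```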

So it looks like the level 1 data is what we are looking for! Let’s map it to further inspect it (more on this in the next section):
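
A quick map of the boundaries can be drawn with base plotting. This sketch assumes the sf package is installed and is not necessarily how the original map was produced.

```r
adm1_path <- file.path("data", "india-admin-areas", "IND_adm1.shp")

if (file.exists(adm1_path)) {
  adm1 <- sf::st_read(adm1_path, quiet = TRUE)
  # Draw only the boundaries, ignoring the attribute columns
  plot(sf::st_geometry(adm1))
}
```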

Looking at the map above, it was no coincidence that level 1 had 36 features!

To recap, the data come in levels 0 to 3, from larger, less detailed boundaries at level 0 to smaller, more refined boundaries at level 3. Since we are looking for state/UT boundaries (a total of 36), we will use level 1 for our walkthrough, as it has 36 geometric features.

Note: The associated state/UT names can also be extracted from the column NAME_1, which holds the names of the level 1 administrative boundaries:

##  [1] Andaman and Nicobar    Andhra Pradesh         Arunachal Pradesh     
##  [4] Assam                  Bihar                  Chandigarh            
##  [7] Chhattisgarh           Dadra and Nagar Haveli Daman and Diu         
## [10] Delhi                  Goa                    Gujarat               
## [13] Haryana                Himachal Pradesh       Jammu and Kashmir     
## [16] Jharkhand              Karnataka              Kerala                
## [19] Lakshadweep            Madhya Pradesh         Maharashtra           
## [22] Manipur                Meghalaya              Mizoram               
## [25] Nagaland               Orissa                 Puducherry            
## [28] Punjab                 Rajasthan              Sikkim                
## [31] Tamil Nadu             Telangana              Tripura               
## [34] Uttar Pradesh          Uttaranchal            West Bengal           
## 36 Levels: Andaman and Nicobar Andhra Pradesh Arunachal Pradesh ... West Bengal
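
The names above come from a single column of the level 1 data. A minimal sketch, assuming the sf package is installed. Note that recent sf versions may return the column as character rather than as the factor shown above.

```r
adm1_path <- file.path("data", "india-admin-areas", "IND_adm1.shp")

if (file.exists(adm1_path)) {
  adm1 <- sf::st_read(adm1_path, quiet = TRUE)
  # NAME_1 holds the level 1 (state/UT) names
  state_names <- adm1$NAME_1
  print(state_names)
  # Number of distinct names
  print(length(unique(state_names)))
}
```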

3.4 Reading the Data for Our Walkthrough

Based on the inspection above, go ahead and read the roads data and the level 1 administrative areas into sf objects:

## Reading layer `IND_roads' from data source `D:\windows\Users\rrwen\Desktop\idea-geoaggregate-indian-roads\data\india-roads\IND_roads.shp' using driver `ESRI Shapefile'
## Simple feature collection with 19148 features and 5 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 68.49822 ymin: 7.925284 xmax: 97.33479 ymax: 35.50128
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## Reading layer `IND_adm1' from data source `D:\windows\Users\rrwen\Desktop\idea-geoaggregate-indian-roads\data\india-admin-areas\IND_adm1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 36 features and 9 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 68.18625 ymin: 6.754256 xmax: 97.41516 ymax: 35.50133
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
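
The two reads above could look like the following sketch. The object names `roads` and `states` are assumptions for illustration, and the paths are written relative to the project folder.

```r
roads_path <- file.path("data", "india-roads", "IND_roads.shp")
states_path <- file.path("data", "india-admin-areas", "IND_adm1.shp")

if (all(file.exists(c(roads_path, states_path)))) {
  roads <- sf::st_read(roads_path)
  states <- sf::st_read(states_path)
  # Both layers should share the same coordinate reference system (EPSG 4326)
  print(sf::st_crs(roads) == sf::st_crs(states))
}
```

Checking that both layers share a coordinate reference system up front avoids surprises when the roads and boundaries are later combined.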

The next section will focus on producing some basic maps for visual exploration.