GRASS GIS and R
The movement away from proprietary GIS packages like Arc that are convenient but expensive and toward an open-source program like GRASS has not quite achieved the level of SAS/SPLUS/STATA/etc to R, but the analogy holds: there is a large and growing community of users that are continually providing new packages in a free software product that is harder to learn (at first) but ultimately more powerful and satisfying. And the best bit is that GRASS is now designed to integrate (relatively) smoothly with R (R runs within GRASS), allowing powerful statistical analysis of spatial data and basically one-stop shopping for species distribution modeling. We'll just cover the basics of starting with GRASS and interfacing with R, but I recommended these resources for continuing on:
Neteler, M. and Mitasova, H. 2008. Open source GIS: a grass GIS approach (3rd Ed.). Springer.
Hengl, T. 2009. A practical guide to geostatistical mapping. Open Access (FREE!):
http://blog.revolution-computing.com/2010/04/a-free-book-on-geostatistical-mapping-with-r.html
Installing GRASS GIS
As of this writing, you have 3 basic options for working with GRASS: 1) you can run GRASS in its native UNIX environment on a Linux box or Mac (recommended; find a linux box and run everything via a remote terminal is the best option, thereby not worrying about local computer performance); 2) you can be a guinea pig for the *new* native Windows GRASS, available here:
http://grass.itc.it/grass64/binary/mswindows/native/
which does not require a unix emulator but is still in beta stage and has no easy integration with R; or 3) install the Cygwin linux-like environment for Windows and run GRASS from Cygwin, which used to be a decent alternative to the UNIX version but recent Cygwin changes have made installation and particularly R integration difficult. As much as I wanted to make things easy using a Windows install, I have had so many problems this week working with winGRASS and the new Cygwin changes to GRASS that I'm instead going to show you the Linux version, which at this point seems to be the only option for smooth R integration. For intrepid Windows junkies I post how I proceeded with the latest Cygwin interface in the box below. Mac and Linux users can download GRASS binaries here:
http://grass.osgeo.org/download/software.php
and Linux and skip the box below.
---------------------------------------------------------------------------------------------------------------------------------------------------------------
GRASS on Windows via Cygwin
The
page supposedly describing GRASS 6.3 installation via Cygwin is here: http://grass.osgeo.org/grass63/binary/mswindows/cygwin/#prereq
What
they don't tell you on that page is that the default package
installer for GRASS does not get all the components you need for
visualization
(that is, the Windows X server that allows you to open new frames for
map
display). So I recommend you start here:
http://x.cygwin.com/docs/ug/setup-cygwin-x-installing.html
Follow these instructions carefully, in particular the X packages listed in step 15. During the download and installation process you may get notices that the download could not be completed: choose another download mirror site and try it again. The install process remembers what you've already successfully downloaded, so it doesn't take as long in repeated attempts (although repeated cycles through the process is a bit frustrating). At the last step, choose to create the icons and then fire up Cygwin from the desktop icon.
You
now have an environment that simulates a bash shell UNIX
environment. All the
files that Cygwin can 'see' are embedded in your home directory under
C:\cygwin. To start the X windows server, type 'startxwin' at the
bash
prompt. A new white background dialogue box appears that will run
GRASS.
However, you don't have it yet. Put the 'setup.exe' installer you
downloaded
at the beginning in the C:\ directory (if it's not there already), and
from the
DOS command line (Start-Run: Cmd), type "setup.exe –X". The Cygwin
installer starts again, and this time at step #3 add this location to
the list
of sites:
http://grass.ibiblio.org/grass63/binary/mswindows/cygwin/
and select it (just it, no other mirror necessary). Now under the 'Database' package folder, select "grass: Geographic…", and click Next until the installer finishes. You now hopefully have grass63 installed (or possibly grass62; some mirrors have different versions available). From your X window, type "grass63" (or grass62), and now we're ready to get started, you can proceed with defining a LOCATION AND MAPSET below.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Making the Arc-GRASS transition
I will assume you already have at least a little GIS know-how and have worked with maps in Arc (not essential if you haven't), and are in the position of having an existing raster dataset in Arc that defines your region of interest (e.g., a DEM). You can port your own raster from Arc to GRASS by saving the Arc raster as a ESRI ASCII grid file (same input for Maxent). (In ArcToolbox: Conversion Tools -> From Raster -> Raster to ASCII). This should be a .txt file. The file is put in the GRASS data directory (see below for Linux; in Cygwin this is created in your home directory (mine is C:\cygwin\home\Fridley\grassdata). Here we'll use the elev.asc file from the Maxent session (in grids.zip), renamed to "elev.txt".
There are other tips for importing Arc vector data, etc here:
http://grass.osgeo.org/wiki/Tips_for_Arc_users
`
Defining a LOCATION and MAPSET
Because of its size and formatting complexity, working with GIS data is always a hassle, but the advantage of GRASS is that the hassle is mostly front-loaded: it takes some time to start work in a new region and upload new files, but once they're in the projection and formatting-related problems disappear. (Anyone who works in Arc knows that the Arc situation is the opposite: easy to get files in, sometimes impossible to get them to talk to each other in an analysis.) Once you get the hang of it creating new projects is easy, but you do need to know a few basic parameters of your region of interest to get started.
Whether you're in Linux or Cygwin or winGRASS, GRASS projects have a nested directory structure. All objects are stored in a central database directory (e.g., /home/user/grassdata), which is where you put maps to be imported (here, our elev.txt file). Nested within this directory are separate directories for different projects called LOCATIONs. A Location is both a folder and a geographical region defined by standard spatial properties of spatial extent, resolution, measurement unit (e.g., UTMs), and projection/datum, all of which must be defined when a new Location is created. A MAPSET is a directory of maps for a particular Location. Mapsets are most useful in a multi-user environment, where a central set of read-only maps (often called PERMANENT) can be copied and manipulation by individual users, each with their own Mapset folder (/user1, /user2, etc). I typically create one Mapset called PERMANENT and don’t worry about creating others. Finally, within the Mapset folder are separate folders for different data types (rasters, vectors, volume maps), which fortunately you don't have to worry about because these are all visible to in a standard GRASS session. This diagram from the GRASS Quickstart Wiki (http://grass.osgeo.org/grass64/manuals/html64_user/helptext.html) summarizes how GRASS projects are organized:

Creating a new Location and Mapset for your region of interest is straightforward once you have basic spatial information for it. Here is an example working from the command line of GRASS 5.4 in Linux (instructions for the GUI interface for winGRASS are in the above link).
Start grass:
$ grass54
The define location screen opens (this is always the first screen). Type in a new location name:
LOCATION: BIO793_GSMNP
Keep the Mapset as PERMANENT, and indicate ('y') that you would indeed to create this Location, and give it a one-line description. For the Smokies most geocoordinate data are in UTMs (option C). For datum, specify nad27 (type 'list' to see options). A geodetic datum is a particular spatial reference grid; NAD27 is the older North American Datum, and also common is WGS84 (World Geodetic System of 1984) that covers the globe, which is basically equal to NAD83. Specify option 1 for datum transformation (contiguous North America). Our UTM zone is 17 (north).
Next we specify the UTM boundaries of the location. You can get this info using the spatial attributes property on an existing map in Arc, or you can examine the header information for elev.txt, which lists the X,Y coordinates of the lower-left (ll) corner and calculate E-W-N-S boundaries by multiplying the number of rows or columns by the resolution (here, 30 m). Our west edge is 227845, south edge is 3922854, north edge is 3963294 (=3922854+[30*1348]), and east edge is 315235 (=227845+[30*2913]). We'll work from a spatial resolution of 30 m, which matches our input DEM. After accepting these parameters, our new location is created. GRASS then returns to the define Location and Mapset page; select our new location and GRASS starts from the command line. Note the default GRASS data directory on the define location screen; this is again where maps to be imported should be placed (here, elev.txt). That's it—the hard part is now over.
Importing rasters
Standard
GIS tasks in GRASS are usually very simple. For example, to import
our
elevation raster, type:
r.in.arc input=elev.txt output=elev
which
creates the raster file elev and puts it in our raster folder in our
PERMANENT
Mapset. Note that the syntax for GRASS commands is based on UNIX: a
particular
command function (r.in.arc)
is followed by required and optional arguments (here, two required
arguments
are input and output). If you omit required
arguments, GRASS will usually prompt you for more information.
Raster-based
commands start with 'r.', vector commands with 'v.', and general
commands (like
file manipulation, location information, etc.) with 'g.'. For the
latest
version (GRASS 6.4), the list of available functions with links for each
are
here:
http://grass.osgeo.org/grass64/manuals/html64_user/index.html
For map display in both Linux and Cygwin, GRASS uses the standard X windows display drivers, initiated by:
d.mon start=x1
which opens a monitor window. You can open more windows (x2, x3, etc) as needed, and close them with replacing start with stop. To display our imported elevation map:
d.rast elev
You can add a legend with
d.legend elev
or try just d.legend without arguments to see the range of formatting options. A handy tool for summarizing a raster is r.univar, which lists the total cells and simple stats of the distribution of raster values.
Topographic calculations
Aside from interfacing directly with R, a main advantage of GRASS is the built-in functionality for using DEMs to create standard topographic indices that are often used as proxy variables for critical environmental factors like water and radiation. For example, the r.slope.aspect function uses a DEM to create slope and aspect maps, as well as other DEM derivatives (curvature):
r.slope.aspect elevation=elev slope=slope aspect=aspect
Generation of new raster maps 'slope' and 'aspect' took a few seconds on my Linux box.
For
hydrologic variables, make sure your region includes all upslope
contributing
area. A typical example is the topographic convergence index (TCI)
of Beven
and Kirkby (1979, A physically-based variable contributing area model of
basin hydrology,
Hydrol. Sci. Bull. 1: 43–69), which calculates 'potential wetness' as
the total
upslope contributing area corrected by the steepness of the local slope:
TCI =
log(upslope area / tan(slope angle)). This is a one-line command
in GRASS,
requiring only an input DEM and specification of the output file:
r.topidx intput=elev output=tci
Despite our DEM containing over 2 million pixels, this took only about two minutes on my Linux box. You can do the same calculations using GRID in Arc (or Arc Toolbox), but it takes several steps and (in my experience) much longer. Users with decent rainfall and ET data can create dynamic (spatial water flow) maps using the TOPMODEL hydrologic model with the r.topmodel command. There are many other raster functions related to hydrologic modeling: r.watershed can delimit separate watersheds from a DEM; r.terraflow creates flow accumulation and direction maps (used to estimate stream locations or find ponding basins), and see the raster command manual above for others.
Another
very useful function in GRASS is r.sun, a
solar irradiance model that uses a DEM along
with earth-sun geometry to calculate daily solar radiation for each
pixel
(direct, diffuse, and reflected). You specify a day and overall
latitude, and
the algorithm computes the intensity and daylight length (including a
terrain
shadowing effect) to derive total daily radiation (in watts per m2 per
day).
You can optionally include other variables or maps that allow for ground
albedo,
atmospheric turbidity, and cloudiness. Here's an example for
calculating
direct-beam radiation for each pixel on June 20 (Julian day 171), using
shadows
but assuming clear-sky conditions:
r.sun –s eleven-elev aspin=aspect slopein=slope
day=171 lat=35.00
beam_rad=rad171
Currently, it is trickier to sum up radiation values for multiple days (e.g., a full month or differences in total annual radiation between pixels). I paste an addendum shell script below that automates the r.sun routine for any list of days you specify. You can also do this with the R interface, but for large rasters these calculations are much faster in GRASS than GRASS/R (several minutes in this case).
Extracting values from point samples in GRASS
Outside of the R environment, it is often useful to extract values of certain raster maps for use in subsequent modeling. In GRASS this is accomplished using the r.what command along with a text file in your working directory that specifies the coordinates of your locations of interest; you put a text file of coordinates in, and get a text file of coordinates plus raster values back. It is a slightly confusing process but the following steps work for me:
1. In Excel, save your coordinates as 3 columns: X, Y, and (optionally) a site label. For a 4th column, just paste a comma (",") in a a cell at the end of the line. Save it as a tab-delimited text file with no header (no column labels). When you open it in a text editor, it should look like the treedataxy.txt file. For example line 1 is:
275736 3942439 ATBN-01-0403,
2. Put this file in your GRASS working directory.
3. In GRASS:
r.what input=elev,tci null="NA" <treedataxy.txt >output.txt
Which will extract elevation and TCI values according to the coordinates in treedataxy.txt, and place them in a new text file output.txt. Missing values will get a NA value.
4. The output.txt file now has new rows under each original row for elevation and TCI values. To format this correctly (one row per site), open the file in a good text editor (my favorite is TextPad), and use the 'find and replace' function to replace the end of the line comma with nothing. In TextPad: hit F8, make sure 'regular expression' is checked, find ",\n", replace with [blank]. This kills the commas and the line breaks, and it's now easy to examine in Excel (import text with | delimiter) or feed into R.
Of
course you often desire values from many rasters at once, often those
whose
names are almost identical (but may differ in number, such as days of
radiation); for this you can use r.what
along with the g.mlist
command to specify a
set of rasters that share part of their names. Pay close attention
to the
syntax here for the input argument:
r.what
input="`g.mlist pattern='radiation*' sep=,`" null="NA"
<treedataxy.txt >output.txt
This grabs values for all rasters with "radiation" at the beginning of their name.
Exporting rasters out of GRASS
Any
new maps created in GRASS but needed elsewhere (like Arc) can be
exported with
various 'r.out…'
commands specific to particular raster formats. Exporting to Arc
via a ASCII
GRID file is accomplished via:
r.out.arc input=tci output=tci.txt
which places the tci.txt file in your working directory. In Arc Toolbox, use Conversion Tools -> To Raster -> ASCII to Raster to import the raster back into an Arc raster file.
An example shell script for automating GRASS functions using r.sun
Save this as a text file in the grass data directory with a '.sh' suffix, and then execute in GRASS by typing its name at the GRASS prompt.
# Compute a sum of daily global radiation for a given period of the year
echo "Enter elevation map:"
read elev
g.copy rast=$elev,SOLelev
echo "Enter slope map:"
read sl
g.copy rast=$sl,SOLslope
echo "Enter aspect map:"
read asp
g.copy rast=$asp,SOLaspect
echo "Enter Latitude of given region:"
read lat
#loops over Julian days, from 'i' to 'lastday'
i=1
lastday=30
while [ $i -le $lastday ]
do
# generate convenient map names
DAY=`echo $i | awk '{printf "%03i", $1}'`
echo "Computing radiation for day $DAY..."
r.sun -s elevin=SOLelev aspin=SOLaspect slopein=SOLslope\
lat="$lat" day="$i"\
beam_rad=b_rad.$DAY diff_rad=d_rad.$DAY\
refl_rad=r_rad.$DAY
r.mapcalc totalrad.$DAY = "b_rad.$DAY + d_rad.$DAY + r_rad.$DAY"
r.timestamp totalrad.$DAY date="$i days"
r.colors totalrad.$DAY col=gyr
i=`expr $i + 1`
g.remove rast=d_rad.$DAY,r_rad.$DAY,b_rad.$DAY
done
#sum all days for total radiation, output raster as "globalrad"
echo "Summing all daily rasters..."
r.series input="`g.mlist pattern='totalrad*' sep=,`" output=globalrad method=sum
#note: if more than 200 or so rasters, have to break this up into pieces
#cleanup:
g.remove rast=SOLelev,SOLaspect,SOLslope
echo "Finished."
GRASS and R
Starting the R environment from within a GRASS
session gives
you a powerful approach for spatial modeling and is one of the clear
advantages
for using GRASS over Arc and other GIS software. In UNIX/Mac,
run R from the
GRASS shell environment by starting up a GRASS session in the desired
LOCATION
and MAPSET, then at the GRASS prompt type 'R' to start an R
session. The R
package required to interface with GRASS depends on which version of
GRASS
you're using: for GRASS 5.x, use the "GRASS" library; for GRASS 6.x,
use the "spgrass6" library. For example:
install.packages("spgrass6",
dependencies = TRUE)
Although the way R interfaces with GRASS is different with version 6 (supposedly faster), the commands are similar regardless of version. For Windows users, the situation looks a bit dire, as I can find very little in the way of recent web updating for integrating with R via the new Cygwin changes, and I am unaware of a reasonable integration of winGRASS with R. (Those wanting to try can explore old GRASS-R posts on the web; please let me know of any successes.) A general introduction to GRASS-R is here:
http://grass.osgeo.org/statsgrass/grass_geostats.html
Preliminaries
In this example I'm starting GRASS 5.4 from our BIO793_GSMNP2 location and the PERMANENT mapset. After the R GRASS (or spgrass6) package is installed and R is running from the GRASS environment, load the library and then use the gmeta command to load the current GRASS environment parameters into R:
library(GRASS) #or library(spgrass6)
G = gmeta()
This creates an object of class=grassmeta,
which
tells R which versions of the generic functions to use (like plot), and it
defines the spatial
environment. A summary command specific to GRASS meta-objects
shows:
summary(G)
Data from GASS 5.0 LOCATION BIO793_GSMNP2 with 2913 columns and 1348 rows; UTM, zone: 17.
The
west-east range is 227845, 315235, and the south-north: 3922854,
3963294;
West-east cell sizes are 30 units, and south-north 30 units.
The str(G) command shows that G is just a
list containing spatial information. Note that
all GRASS environment commands are available in R using the system
command. For example, to see a
list of available rasters use the g.list command as if in GRASS:
system("g.list
rast")
-----------------------------------------------------
raster files available in mapset PERMANENT:
aspect elev slope tci
-----------------------------------------------------
To manipulate GRASS rasters in R, the rast.get command converts a raster into a large R list object. The c(F) argument tells R that you want the raw raster values rather than a categorical dataset (you can also use catlabels=F), and the G argument is needed to specify the GRASS spatial environment:
elev = rast.get(G,"elev",c(F))
str(elev)
List of 1
$ elev: num [1:3926724] NA NA NA NA NA NA NA NA …
The NAs describe missing cells. Note that
the raster values
are a vector nested inside a list, so to access particular values you
need to
specify both the list name and the vector name. Note the
difference:
length(elev)
[1] 1
length(elev$elev)
[1] 3926724
As one single vector, there is no X or Y information in our R object. To create an entire data frame with X, Y, and Z (here, elevation) data, we extract the 'east' and 'north' elements of our G object:
elev.frame = data.frame(east(G),north(G),elev$elev)
str(elev.frame)
'data.frame': 3926724 obs. of 3 variables:
$ east.G. : num 227860 227890 227920 227950 227980 …
$ north.G. : num 3963279 3963279 3963279 3963279 3963279 …
$ elev.elev : num NA NA NA NA NA NA NA NA NA ...
The naming conventions are a bit silly, so change
via:
names(elev.frame)
=
c("x","y","elev")
Depending on the objectives of the analysis, it
may make
sense to create one data frame for the region that contains all
rasters
(although beware of creating very large objects). You can grab
multiple
rasters with rast.get and then put them each as a separate column in a
data
frame:
rast.list
= rast.get(G,
rlist=c("elev","aspect","slope","tci","rad171"),
catlabels=as.logical(rep(F,5)))
rf = data.frame(east(G),north(G),rast.list$elev,rast.list$aspect,rast.list$slope,rast.list$tci,rast.list$rad171)
names(rf) = c("x","y","elev","aspect","slope","tci","rad")
str(rf)
Now you can access all raster variables as you
normally do
those in data frames. For example, compare elevation summary
statistics with
GRASS output:
summary(rf$elev)
system("r.univar elev")
The generic plotting function for GRASS objects is
a 3D
image, where you specify the region (G) and the R vector for a
particular
raster:
plot(G,rf$elev,col=terrain.colors(30))
R has functions for specifying certain color palettes: the above is for basic terrain (the argument is for the number of categories to display) and you can also try topo.colors(), heat.colors(), and rainbow(). There is a built-in function for making contour lines for raster in GRASS-R 5 called contour.G, which we can superimpose our on image:
contour.G(G,rf$elev,add=T)
Spatial analysis
One you have a data frame of X, Y, and various 'Z' coordinates, spatial analysis via a variety of R packages is straightforward. Given the number of calculations involved (for example, using pairwise comparisons of millions of pixels), you often want to take randomly sampled rows of the spatial data frame rather than the entire set, and it's generally a good idea to remove missing values before the analysis. A polynomial trend surface fitted by least squares (or generalized least squares using surf.gls) can be fit to a raster using the surf.ls function in the spatial library, and displayed with a GRASS 5 wrapper called trmat.G:
library(spatial)
xyz = na.omit(rf[,1:3]) #remove missing values of elevation
xyz2 = xyz[sample(1:2292309),10000),] #10,000 random points (not NA)
names(xyz2) = c("x","y","z")
trend = surf.ls(4,xyz2) #4th-order polynomial
plot(G,rf$elev)
contour.G(G,trmat.G(G,trend),add=T) #superpose contour lines of trend model on raster
The spatial package includes a function for
creating
variograms that takes the fitted trend-surface as input, along with
desired
number of distance bins. correlogram is the equivalent
function for
distance-based covariances.
variogram(trend,nint=30)
Mapping predictions from a species distribution model
Predictions from a fitted species distribution model can be easily mapped in the GRASS-R interface using the predict function for any model building exercise and a data frame of raster variables that were used in the model. Here's an example where we return to our Smokies tree dataset and are interested in examining the probability of Fraser magnolia occurrence across the entire Park, based on elevation, moisture (tci), and site radiation that we calculated for June 20. Make sure that the treedata.csv file is in the working directory. I additionally need to import the radiation raster (rad171.txt):
system("r.in.arc input=rad171.txt output=rad171")
rad171 = rast.get(G,"rad171",c(F))
rf$rad = rad171$rad171
Let's create the dataset for magnolia:
dat = read.csv("treedata.csv")
dat2 = subset(dat,dat$species=="Magnolia fraseri")
b.plots = unique(as.character(dat2$plotID)) #plots with magnolia
u.plots = unique(as.character(dat$plotID)) #all plots
nob.plots = u.plots[is.element(u.plots,b.plots)==F] #plots without magnolia
dat3 = subset(dat,is.element(as.character(dat$plotID),nob.plots)) #dataset of no magnolia
dat4 = subset(dat3,duplicated(dat3$plotID)==F) #one row per plot
dat4$cover = 0 #cover of magnolia is zero in these plots
dat5 = rbind(dat2,dat4) #new dataframe of presences and absences
We need to extract values of our new variable rad
corresponding to presences and absences from our existing raster. We
can use
the utme and utmn coordinates to subset the raster values in R, but
note they
don't exactly equal our X and Y values. An easier option is to
route it
through GRASS using the r.what command:
frame1
=
data.frame(dat5$utme,dat5$utmn) #data frame of only x
and y coordinates
write.table(frame1,file="magout.txt",sep="\t",row.names=F,col.names=F) #write a text file to working directory for r.what to access
system("r.what input=rad171 <magout.txt >output1.txt") #create a new text file with appended rad171 values
rad1 = read.table("output1.txt",sep="|") #read new text file
dat5$rad = as.numeric(as.character(rad1[,4])) #final column of text file is appended rad171 value; beware it is imported as a factor rather than numeric values
Now we construct a GAM model with binomial error:
library(mgcv)
gam1 = gam(cover>0~s(elev)+s(tci)+s(rad),family=binomial,data=dat5,na.action=na.exclude)
summary(gam1)
Using the predict function we can insert a
another
column into our raster data frame:
rf$predict1 = as.numeric(predict(gam1,newdata=list(elev=rf$elev,tci=rf$tci,rad=rf$rad),type="response"))
Each pixel in the new raster predict1 is now a value between 0 and 1 representing our predictions about magnolia presence:
hist(rf$predict1)
plot(G,rf$predict1,col=heat.colors(30))
We can save our prediction map using the rast.put
function,
for further display elsewhere (such as in GRASS or exported into Arc
via
r.out.arc):
rast.put(G,lname=gampred,layer=rf$predict1,DCELL=T) #DCELL=T
keeps 2-decimal
precision
Display of the map is faster in native GRASS, which we can again access with the system command. The colors of the map in GRASS can be manipulated with the r.colors command.
system("g.list rast")
system("r.colors map=gampred color=gyr")
system("d.mon start=x1")
system("d.rast
gampred")
system("d.legend gampred")