Lesson 3, Topic 1

Mapping Slums – Running the random forest model

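After you have exported your dataset, the next step is to run the random forest (RF) model on it. Random forest is a classification and regression method that combines many so-called “decision trees”. Each tree is grown on a bootstrap sample of the training data and considers only a random subset of the covariates at each split. For classification, the forest predicts the class that receives the most votes from its trees.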
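The way a random forest combines its trees for classification can be illustrated with a toy sketch in base R. Each tree casts one vote per pixel, and the class with the most votes wins. The vote matrix below is invented for illustration (three pixels, five trees) and does not come from the Lagos data.

```r
# Toy per-tree predictions: rows = pixels, columns = trees (made-up values)
votes <- matrix(
  c("slum",     "slum",     "non-slum", "slum",     "slum",
    "non-slum", "non-slum", "slum",     "non-slum", "non-slum",
    "slum",     "non-slum", "slum",     "non-slum", "slum"),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("pixel_", 1:3), paste0("tree_", 1:5))
)

# Majority vote: count the votes per class and keep the most frequent class.
# With two classes and an odd number of trees, ties cannot occur.
majority_vote <- function(v) {
  counts <- table(v)
  names(counts)[which.max(counts)]
}

predicted <- apply(votes, 1, majority_vote)

# Share of trees voting "slum"; randomForest reports such vote fractions
# when predict() is called with type = "prob"
slum_share <- rowMeans(votes == "slum")

print(predicted)
print(slum_share)
```

Here pixel_1 and pixel_3 are classified as slum (4 and 3 of 5 votes), and pixel_2 is classified as non-slum.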
In this lesson you’ll apply the RF model to the previously prepared data to obtain a classification that indicates whether each pixel of the satellite image is a slum or not. The classification uses several covariates, such as the distance to a public school or the access to a public water point. Each covariate contributes differently to the decision whether a particular pixel is a slum. The model’s variable importance measures quantify these contributions.
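The following sketch shows one way a covariate’s impact can be measured: permutation importance. It shuffles one covariate at a time and records how much the accuracy drops. When `importance = TRUE` is set, randomForest computes a related measure (MeanDecreaseAccuracy) by permuting each covariate in every tree’s out-of-bag samples. To keep the example in base R, it makes several assumptions. The data are synthetic, the column names `dist_school` and `noise` are hypothetical, and a logistic regression stands in for the random forest.

```r
set.seed(1)
n <- 2000
# Synthetic data: one informative covariate and one unrelated covariate (hypothetical names)
dat <- data.frame(
  dist_school = runif(n, 0, 5),
  noise       = runif(n, 0, 5)
)
# For illustration only: pixels closer to a school are made more likely to be slums
dat$slum <- rbinom(n, 1, plogis(2 - 1.5 * dat$dist_school))

# Logistic regression as a stand-in classifier
fit <- glm(slum ~ dist_school + noise, data = dat, family = binomial)

# Share of correctly classified pixels at a 0.5 probability threshold
accuracy <- function(model, data) {
  prob <- predict(model, newdata = data, type = "response")
  mean(as.integer(prob > 0.5) == data$slum)
}

# Evaluated on the same data here for brevity (randomForest uses out-of-bag pixels instead)
baseline <- accuracy(fit, dat)

# Permutation importance: accuracy lost when one covariate is shuffled
perm_importance <- sapply(c("dist_school", "noise"), function(v) {
  shuffled <- dat
  shuffled[[v]] <- sample(shuffled[[v]])   # break the link between v and the label
  baseline - accuracy(fit, shuffled)
})

print(round(perm_importance, 3))
```

Shuffling the informative covariate lowers accuracy clearly, while shuffling the unrelated one changes it very little.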

The code used for this topic is shown below.


R code used for this topic
rm(list=ls())   # clear all objects from the workspace

#####Set your working directory and other required directories
setwd("C:/......../PROJECTS/MOOC_Teach/Lagos_MOOC/Data_Mooc/R")

output_folder <- "C:......./PROJECTS/MOOC_Teach/Lagos_MOOC/Data_Mooc/R/Result_mooc"

####Install packages#####
######Install the packages below if you do not have them yet, before loading them into the R environment#######
######e.g. install.packages(c("randomForest", "caTools", "caret"))
######To understand the function of each package, go to the help window and type the package name

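######load libraries#######
######Only randomForest, caTools and caret are used by the code in this topic;
######rgdal was retired from CRAN in 2023, so skip it here if it cannot be installed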
library(rgdal)
library(randomForest)
library(caTools)
library(tidyverse)
library(caret)
library(rpart)
library(car)
library(InformationValue)
library(raster)
library(sp)
library(MASS)
library(foreach)
library(doParallel)
library(parallel)
library(gtools)
library(ggplot2)


#####RunRandomForest########

####import dataset csv using the path on your computer ###########
All_data = read.csv("input_mooc/All_data_mooc2.csv", header = TRUE)
print(All_data)

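##########Split dataset########
set.seed(123)   # fixed seed so the random split is reproducible (the value 123 is arbitrary)
# sample.split() keeps the proportion of each grid_code class similar in both subsets
split = sample.split(All_data$grid_code, SplitRatio = 0.75)
Train_lag = subset(All_data, split == TRUE)   # 75% of the rows for training
Ref_lag = subset(All_data, split == FALSE)    # remaining 25% as reference (test) data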
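
######Train the random forest classification model######
# factor() makes randomForest treat grid_code as a class label (classification mode)
Model_Lag_RF <- randomForest(factor(grid_code) ~ Conflict2Eq + DEM1 + GovtbuildCS + MainRoadEQd + Slope1 +
                               SocioeconID + WatelineEQd + churchCSTdi + climaterisk + dumpCSTdist +
                               factIndCSTd + hzdindex + marktCSTdis + mosqueCSTdi + pubschCSTdi +
                               pubwaterCST + smallsettCS,
                             data = Train_lag,
                             ntree = 500,              # number of trees in the forest
                             mtry = 10,                # covariates tried at each split (out of 17)
                             na.action = na.exclude,   # drop rows with missing values
                             importance = TRUE,        # compute permutation-based variable importance
                             proximity = TRUE)         # compute the proximity matrix between rows
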
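print(Model_Lag_RF)   # summary incl. the out-of-bag (OOB) error rate and confusion matrix

#####show the attributes of the Model_Lag_RF#######
attributes(Model_Lag_RF)

Model_Lag_RF$confusion   # OOB confusion matrix with the error rate per class (class.error)

######show the variable importance
importance(Model_Lag_RF)   # per-class, MeanDecreaseAccuracy and MeanDecreaseGini columns

round(importance(Model_Lag_RF), 2)

print(varImp(Model_Lag_RF))   # variable importance as reported by caret's varImp()

varImpPlot(Model_Lag_RF)   # dot charts of the importance measures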

######################################
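The code in this topic sets aside `Ref_lag` but reports only the out-of-bag accuracy. The following base R sketch shows how held-out predictions could be summarised in a confusion matrix, overall accuracy and per-class recall. The function name `holdout_metrics` and the toy labels are hypothetical.

```r
# Summarise observed vs. predicted classes of held-out pixels (hypothetical helper)
holdout_metrics <- function(observed, predicted) {
  # Use the union of both label sets so no class is silently dropped
  lv <- sort(unique(c(as.character(observed), as.character(predicted))))
  observed  <- factor(observed, levels = lv)
  predicted <- factor(predicted, levels = lv)
  keep <- !is.na(observed) & !is.na(predicted)   # pairs with a missing value are dropped
  cm <- table(Observed = observed[keep], Predicted = predicted[keep])
  list(
    confusion = cm,
    accuracy  = sum(diag(cm)) / sum(cm),   # share of correctly classified pixels
    recall    = diag(cm) / rowSums(cm)     # share of each observed class recovered
  )
}

# Toy example with made-up labels (0 = non-slum, 1 = slum)
obs  <- c(0, 0, 0, 1, 1, 1, 1, 0)
pred <- c(0, 0, 1, 1, 1, 0, 1, 0)
res  <- holdout_metrics(obs, pred)
print(res$confusion)
print(res$accuracy)
print(res$recall)
```

With the objects from this topic, a call could look like `holdout_metrics(Ref_lag$grid_code, predict(Model_Lag_RF, newdata = Ref_lag))`. caret’s `confusionMatrix()` reports these and further statistics.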