Writing XML from R

Here’s a quick example showing how to write XML using the XML package in R. There is a more modern package called xml2 that I haven’t had time to try yet.

After loading the necessary libraries and determining how many records we want to write

library(dplyr)
library(randomNames)
library(uuid)
library(XML)
library(parallel)


set.seed(0)
Nrecords <- 500 # number of records for the test data

we create random contacts that have a name, an id and up to four nicknames.

# create a data frame with random names, an id and up to four nicknames
# (the nicknames are stored as one comma-separated string per row)
contacts <- data.frame(name = randomNames(Nrecords), stringsAsFactors = FALSE) %>%
  rowwise() %>%
  mutate(id = UUIDgenerate(),
         otherNames = paste(randomNames(sample(1:4, size = 1), which.names = 'first'), collapse = ','))

Next we write a function that writes an XML file for each record. The nicknames are listed as separate values.

# a function that takes an id, a name and a comma-separated string of
# nicknames and writes out one XML file for that contact
# (assumes the directory ~/tmp/xml already exists)
writeData <- function(id, name, moreNames)
{
  fileName <- paste0("~/tmp/xml/", id, ".xml")
  contactXML <- xmlOutputDOM(tag = "Contacts", nsURI = "http://example.org/dddd/eee")
  contactXML$addTag("id", id)
  contactXML$addTag("name", name)
  # split the comma-separated nicknames into a character vector
  otherNames <- strsplit(moreNames, ',')[[1]]
  # open <otherNames> and leave it open while the nicknames are added
  contactXML$addTag("otherNames", close = FALSE)
  for(j in seq_along(otherNames))
  {
    contactXML$addTag("nickName", otherNames[j])
  }
  contactXML$closeTag()
  #saveXML(contactXML$value(),file = fileName, prefix = '\n')
  # prefix = '' omits the default XML declaration line
  saveXML(contactXML$value(), file = fileName, prefix = '')
}
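
The function depends on the nicknames surviving a round trip through a comma-separated string: they are collapsed with paste() when the data frame is built and split again with strsplit() inside writeData. A minimal sketch of that round trip, using made-up nicknames rather than values from the generated data:

```r
# Hypothetical example values, not taken from the generated contacts
nicknames <- c("Ana", "Ben", "Cleo")

# Collapse into one string, as done when building the data frame
joined <- paste(nicknames, collapse = ",")

# Split back into a character vector, as done inside writeData
recovered <- strsplit(joined, ",")[[1]]

print(joined)
print(recovered)

# TRUE as long as no nickname itself contains a comma
identical(nicknames, recovered)
```

A nickname that itself contained a comma would be split into two values, so this encoding only suits names without commas.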

Then we run the function in a single process.

# single process
system.time(
  mapply(writeData,contacts$id,contacts$name,contacts$otherNames)
)
# user  system elapsed 
# 1.992   0.040   2.031 

And, for comparison, with four cores. Note that mcmapply forks separate worker processes rather than starting threads.

# four concurrent worker processes
system.time(
  mcmapply(writeData,contacts$id,contacts$name,contacts$otherNames,mc.cores = 4)
)
# 
# user  system elapsed 
# 1.068   0.072   0.614 

We get a speedup of about 3.3 in elapsed time (2.031 s versus 0.614 s), which falls short of the ideal factor of four, presumably due to some overheads relating to writing files and other inefficiencies.
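
For reference, the speedup can be worked out directly from the two elapsed times reported by system.time(). This is only a small arithmetic sketch; the variable names are my own, and the numbers are copied from the runs above, so they will differ on other machines.

```r
# Elapsed times in seconds, copied from the two system.time() runs
elapsedSingle <- 2.031
elapsedFour   <- 0.614
nCores        <- 4

speedup    <- elapsedSingle / elapsedFour  # how many times faster
efficiency <- speedup / nCores             # fraction of the ideal speedup

round(speedup, 2)     # 3.31
round(efficiency, 2)  # 0.83
```
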
The XML result looks like this:
XML file

To learn more about XML, head to the XML tutorials.
