0

I have the following code to import some data.

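library(rvest)
library(plyr)

url <- "https://finance.yahoo.com/industry/Scientific_Technical_Instruments"

# Parse every table on the page into a list of data frames
read <- read_html(url) %>%
  html_table()

# Stack the list elements into a single data frame
data <- ldply(read, data.frame)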

However, this creates a data frame with 20 columns when there should be just 10. The column names have not been imported correctly, which produces a number of NA values.

Is there a way in R to shift the column names across, then remove the NA columns created?

user113156

2 Answers

1

Your object read is a list with headers as the first element and data as the second. Your problem is that your column names in read[[1]] are not syntactically valid names for data frame columns.

You need to sanitise your names by using make.names. E.g.

data <- data.frame(read[[2]])
names(data) <- make.names(names(read[[1]]))
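
To see what make.names does to awkward headers, here is a minimal sketch. The header strings are hypothetical stand-ins chosen for illustration, not the actual column names from the Yahoo page.

```r
# Hypothetical header strings, for illustration only
raw_names <- c("Market Cap", "1-Day Price Chg %", "P/E")

# make.names() replaces each invalid character with "." and prepends "X"
# when the result would not otherwise be a syntactically valid name
clean_names <- make.names(raw_names)

# unique = TRUE additionally disambiguates duplicated names
unique_names <- make.names(c("Name", "Name"), unique = TRUE)

print(clean_names)
print(unique_names)
```

The cleaned names can then be assigned with names(data) <- clean_names, provided their count matches the number of columns.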

A one-liner version of the same thing, using setNames:

data <- setNames(data.frame(read[[2]]), make.names(names(read[[1]])))
Otto Kässi
1
my_data <- data.frame(read[[2]])
colnames(my_data) <- colnames(read[[1]])
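
The question also asks about removing the NA columns. One possible approach is to drop every column that contains only NA values; the sketch below uses a made-up toy data frame (toy and its values are assumptions, not the scraped data).

```r
# Toy data frame standing in for the scraped table; values are made up
toy <- data.frame(x = c(1, 2), y = c(NA, NA), z = c("p", NA))

# TRUE for each column that has at least one non-NA value
keep <- colSums(!is.na(toy)) > 0

# drop = FALSE keeps the result a data frame even if one column remains
toy_clean <- toy[, keep, drop = FALSE]

print(names(toy_clean))
```

Here y is removed because it is entirely NA, while z is kept because it has one real value.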
G. Cocca