How to add strings from multiple columns to multiple rows in a single column in R

Question Detail

I have a data frame in R (with more than 30 columns and 5000 rows) that looks like this:

Q 1 2 3 4 5
1 A A B C C
2 D E F
3 H I C C
4 X Y Y Y Z
5 A E F C Z

I would like to:
1) remove duplicates in each column
2) move each string in each column into additional rows (e.g. rows 6, 7, 8, etc.) in the first column
3) remove duplicates in the first column

I’ve searched around, but the existing questions usually ask for the strings to be combined into one column and separated by “;” or “-”, which is not what I’m looking for.

Any solutions? I’m also open to saving my data.frame to a .txt file and using the Linux/Mac terminal to solve the problem.

Many thanks in advance!

[UPDATE:]
I thought the data should be reduced first, hence step 1. But I agree with everyone that the de-duplication can all be done in step 3.

The final data.frame should look like this:
Q 1
1 A
2 D
3 H
4 X
5 E
6 I
7 B
8 F
9 C
10 Z
etc…

The strings are actually gene names, which may be repeated within rows and columns. Formatting them in Excel is not ideal because the gene names can get changed. I hope this clears things up. Thank you!

Question Answer

DF <- read.table(text = "Q 1 2 3 4 5
1 A A B C C
2 D E F '' ''
3 H I C C ''
4 X Y Y Y Z
5 A E F C Z", header = TRUE, stringsAsFactors = FALSE)

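# Drop the Q column, flatten the remaining columns (column by column) and keep the unique values.
vals <- unique(as.character(as.matrix(DF[-1])))
# The blank cells were read in as empty strings, so filter them out.
data.frame(x = vals[nzchar(vals)])

Row numbers are assigned automatically. Rename the column (x here) to whatever you want.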
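
As a rough check on the point from the question's update (per-column de-duplication is not needed before the final one), here is a minimal sketch. It assumes the blanks are stored as empty strings and rebuilds the toy data with data.frame() instead of read.table(); the names cols, per_column, stacked, three_step, one_pass and result are only illustrative.

```r
# Toy data mirroring the question; blanks are stored as empty strings (assumption).
DF <- data.frame(
  Q  = 1:5,
  X1 = c("A", "D", "H", "X", "A"),
  X2 = c("A", "E", "I", "Y", "E"),
  X3 = c("B", "F", "C", "Y", "F"),
  X4 = c("C", "",  "C", "Y", "C"),
  X5 = c("C", "",  "",  "Z", "Z"),
  stringsAsFactors = FALSE
)

cols <- DF[-1]                                        # drop the row-id column Q

# Steps 1-3 as asked: de-duplicate per column, stack, de-duplicate again.
per_column <- lapply(cols, unique)                    # step 1
stacked    <- unlist(per_column, use.names = FALSE)   # step 2
three_step <- unique(stacked)                         # step 3

# Single pass: stack every column, de-duplicate once.
one_pass <- unique(unlist(cols, use.names = FALSE))

# Compare the two routes; both keep values in first-seen, column-by-column order.
same <- identical(three_step, one_pass)

# Drop the empty string before building the one-column result.
result <- data.frame(x = one_pass[nzchar(one_pass)], stringsAsFactors = FALSE)
```

With 30+ columns and 5000 rows either route should be fast, so the single pass is mostly a matter of simplicity.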

Do you mean to have something like this?

# df[!is.na(df)] returns every non-missing cell as a vector, column by column.
final_df <- data.frame(V1 = unique(df[!is.na(df)]))

Output is:

   V1
1   A
2   D
3   H
4   X
5   E
6   I
7   Y
8   B
9   F
10  C
11  Z
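
The one-liner above removes NA values but would still keep empty strings if the blanks were read in that way. A possible way to cover both cases is a small helper; stack_unique() is a hypothetical name introduced here, not part of base R or of the original answer, and trimming whitespace is an assumption about what counts as noise in the gene names.

```r
# Hypothetical helper (not from the original answers): stack the chosen
# columns of a data frame into one column of unique, non-missing strings.
stack_unique <- function(data, cols = names(data)) {
  # Convert each column to character (handles factors), then stack column by column.
  values <- unlist(lapply(data[cols], as.character), use.names = FALSE)
  values <- trimws(values)                            # assumption: stray spaces are noise
  values <- values[!is.na(values) & nzchar(values)]   # drop NA and empty strings
  data.frame(V1 = unique(values), stringsAsFactors = FALSE)
}

# Sample data from this answer, rebuilt here so the block runs on its own.
df <- data.frame(
  X1 = c("A", "D", "H", "X", "A"),
  X2 = c("A", "E", "I", "Y", "E"),
  X3 = c("B", "F", "C", "Y", "F"),
  X4 = c("C", NA,  "C", "Y", "C"),
  X5 = c("C", NA,  NA,  "Z", "Z"),
  stringsAsFactors = FALSE
)

final_df <- stack_unique(df)

# If the first column holds row ids (like Q in the question), skip it, e.g.:
# final_df <- stack_unique(DF, cols = names(DF)[-1])
```

Passing cols explicitly keeps an id column out of the result without having to subset the data frame first.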

Sample data (run this first so that df exists):

df <- structure(list(X1 = c("A", "D", "H", "X", "A"), X2 = c("A", "E", 
"I", "Y", "E"), X3 = c("B", "F", "C", "Y", "F"), X4 = c("C", 
NA, "C", "Y", "C"), X5 = c("C", NA, NA, "Z", "Z")), .Names = c("X1", 
"X2", "X3", "X4", "X5"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
