I have a data frame in R (with more than 30 columns and 5000 rows) that looks like this:
Q 1 2 3 4 5
1 A A B C C
2 D E F
3 H I C C
4 X Y Y Y Z
5 A E F C Z
I would like to:
1) remove duplicates in each column
2) move each string in each column into additional rows (e.g. rows 6, 7, 8, etc.) in the first column
3) remove duplicates in the first column
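
To make the three steps concrete, here is a minimal sketch of what I mean, run on the small example above. Several things are assumptions for illustration only: the names df, X1 to X5, stacked, result and gene are made up, Q is treated as a row-label column, and the blank cells are assumed to be trailing and stored as NA (they could just as well be empty strings, which the filter below also drops). Since every string ends up in one column anyway, the sketch skips the per-column deduplication in step 1 and removes duplicates only once at the end.

```r
# Example data mirroring the table above.
# Assumption: blank cells are trailing and stored as NA.
df <- data.frame(
  Q  = 1:5,
  X1 = c("A", "D", "H", "X", "A"),
  X2 = c("A", "E", "I", "Y", "E"),
  X3 = c("B", "F", "C", "Y", "F"),
  X4 = c("C", NA,  "C", "Y", "C"),
  X5 = c("C", NA,  NA,  "Z", "Z"),
  stringsAsFactors = FALSE
)

# Step 2: stack all columns except the row label Q into one
# character vector, column by column.
stacked <- unlist(df[-1], use.names = FALSE)

# Drop empty cells, whether they are NA or empty strings.
stacked <- stacked[!is.na(stacked) & stacked != ""]

# Steps 1 and 3: keep only the first occurrence of each string.
result <- data.frame(gene = unique(stacked), stringsAsFactors = FALSE)
```

With the real data, the same three lines after the example data frame should apply unchanged, as long as the first column is the row label and all other columns are character (not factor) columns.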
I've looked around, but the existing questions usually ask for the strings to be combined into one column and separated by ";" or "-", which is not what I'm looking for.
Any solutions? I'm also open to saving my data.frame to a .txt file and using the Linux/Mac terminal to solve the problem.
Many thanks in advance!
I thought the data should be reduced first, hence step 1. But I agree with everyone that the duplicates can be removed at step 3 instead.
The final data.frame should look like this:
The strings are actually gene names, which may be repeated within rows and columns. Formatting them in Excel is not ideal because the gene names can get changed there. I hope this clears things up. Thank you!