
Changing Column Order in R

by 아참형인간 2021. 5. 11.

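When working with a data frame in R, the order of its columns sometimes makes the overall structure of the data hard to grasp or the data awkward to process. In such cases you reorder the columns and either create a new data frame or overwrite the existing one. When there are only a few columns, dplyr::select can reorder them easily, but for a data frame with many columns this becomes cumbersome.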

The function to use in such cases is dplyr::relocate.

The sample data set is COVID-19 data for South Korea. The COVID19 package provides worldwide data on confirmed cases, deaths, vaccinations, and more. For usage, see the package manual (https://cran.r-project.org/web/packages/COVID19/COVID19.pdf).

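## Install the COVID19 package if it is not already available
if(!require(COVID19)) {
  install.packages('COVID19')
  ## Load the COVID19 package after installing it
  library(COVID19)
}
## Store the COVID-19 data for Korea in the data.covid19 data frame
data.covid19 <- covid19('KOR')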
## We have invested a lot of time and effort in creating COVID-19 Data Hub, please cite the following when using it:
## 
##   Guidotti, E., Ardia, D., (2020), "COVID-19 Data Hub", Journal of Open
##   Source Software 5(51):2376, doi: 10.21105/joss.02376.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {COVID-19 Data Hub},
##     year = {2020},
##     doi = {10.21105/joss.02376},
##     author = {Emanuele Guidotti and David Ardia},
##     journal = {Journal of Open Source Software},
##     volume = {5},
##     number = {51},
##     pages = {2376},
##   }
## 
## To retrieve citation and metadata of the data sources see ?covid19cite. To hide this message use 'verbose = FALSE'.

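Only the 2021 data from the full data set will be used. To do this, load the dplyr package, extract the year from the date (data.covid19$date) with lubridate::year, and keep only the rows where the result is 2021.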

library(dplyr)
## Keep only the 2021 data (year() returns a number, so compare with 2021, not '2021')
data.covid19 <- data.covid19 %>% filter(lubridate::year(date) == 2021)
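
The same filtering step can be tried without downloading anything. The following is a minimal sketch on a small made-up tibble; `toy` and its values are hypothetical stand-ins, not the COVID19 data, and it assumes the dplyr and lubridate packages are installed.

```r
library(dplyr)

# Hypothetical toy data standing in for data.covid19
toy <- tibble(
  date = as.Date(c("2020-12-30", "2020-12-31", "2021-01-01", "2021-01-02")),
  confirmed = c(10L, 20L, 30L, 40L)
)

# Inside filter(), a column can be referenced by its bare name.
# lubridate::year() returns a number, so it is compared with 2021.
toy_2021 <- toy %>% filter(lubridate::year(date) == 2021)

nrow(toy_2021)
```

Only the two rows dated in 2021 remain in `toy_2021`.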

Check the number of rows and columns in the sample data set. It has 130 rows and 36 columns.

dim(data.covid19)
## [1] 130  36
head(data.covid19)
## # A tibble: 6 x 36
## # Groups:   id [1]
##   id    date       vaccines   tests confirmed recovered deaths  hosp  vent   icu
##   <chr> <date>        <dbl>   <int>     <int>     <int>  <int> <dbl> <int> <int>
## 1 KOR   2021-01-01       NA 4092391     62593     43578    942    NA    NA    NA
## 2 KOR   2021-01-02       NA 4123412     63244     44507    962    NA    NA    NA
## 3 KOR   2021-01-03       NA 4154366     64264     45240    981    NA    NA    NA
## 4 KOR   2021-01-04       NA 4185876     64979     46172   1007    NA    NA    NA
## 5 KOR   2021-01-05       NA 4245610     65818     46995   1027    NA    NA    NA
## 6 KOR   2021-01-06       NA 4312786     66686     47649   1046    NA    NA    NA
## # ... with 26 more variables: population <int>, school_closing <int>,
## #   workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   latitude <dbl>, longitude <dbl>, key <lgl>, key_apple_mobility <chr>,
## #   key_google_mobility <chr>

How can the latitude and longitude columns, which hold the country's latitude and longitude, be placed right after the id column?

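dplyr::select is a function for choosing columns of a data frame. Rather than using it to drop columns, you can list every column name in the order you want, and the resulting data frame has its columns in that order. You can give either column names or column indices.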

data.covid19 %>% select("id", "latitude", "longitude", "date", "vaccines", "tests", "confirmed", "recovered", "deaths", "hosp", "vent", "icu",
  "population", "school_closing", "workplace_closing", "cancel_events", "gatherings_restrictions", "transport_closing",
  "stay_home_restrictions", "internal_movement_restrictions", "international_movement_restrictions", "information_campaigns",
  "testing_policy", "contact_tracing", "stringency_index", "iso_alpha_3", "iso_alpha_2", "iso_numeric", "currency",
  "administrative_area_level", "administrative_area_level_1", "administrative_area_level_2", "administrative_area_level_3",
  "key", "key_apple_mobility", "key_google_mobility")
## # A tibble: 130 x 36
## # Groups:   id [1]
##    id    latitude longitude date       vaccines   tests confirmed recovered
##    <chr>    <dbl>     <dbl> <date>        <dbl>   <int>     <int>     <int>
##  1 KOR         37      128. 2021-01-01       NA 4092391     62593     43578
##  2 KOR         37      128. 2021-01-02       NA 4123412     63244     44507
##  3 KOR         37      128. 2021-01-03       NA 4154366     64264     45240
##  4 KOR         37      128. 2021-01-04       NA 4185876     64979     46172
##  5 KOR         37      128. 2021-01-05       NA 4245610     65818     46995
##  6 KOR         37      128. 2021-01-06       NA 4312786     66686     47649
##  7 KOR         37      128. 2021-01-07       NA 4378047     67358     48369
##  8 KOR         37      128. 2021-01-08       NA 4439552     67999     49324
##  9 KOR         37      128. 2021-01-09       NA 4503693     68664     50409
## 10 KOR         37      128. 2021-01-10       NA 4532010     69114     52552
## # ... with 120 more rows, and 28 more variables: deaths <int>, hosp <dbl>,
## #   vent <int>, icu <int>, population <int>, school_closing <int>,
## #   workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>
data.covid19 %>% select(1, 32, 33, 2:31, 34:36)
## # A tibble: 130 x 36
## # Groups:   id [1]
##    id    latitude longitude date       vaccines   tests confirmed recovered
##    <chr>    <dbl>     <dbl> <date>        <dbl>   <int>     <int>     <int>
##  1 KOR         37      128. 2021-01-01       NA 4092391     62593     43578
##  2 KOR         37      128. 2021-01-02       NA 4123412     63244     44507
##  3 KOR         37      128. 2021-01-03       NA 4154366     64264     45240
##  4 KOR         37      128. 2021-01-04       NA 4185876     64979     46172
##  5 KOR         37      128. 2021-01-05       NA 4245610     65818     46995
##  6 KOR         37      128. 2021-01-06       NA 4312786     66686     47649
##  7 KOR         37      128. 2021-01-07       NA 4378047     67358     48369
##  8 KOR         37      128. 2021-01-08       NA 4439552     67999     49324
##  9 KOR         37      128. 2021-01-09       NA 4503693     68664     50409
## 10 KOR         37      128. 2021-01-10       NA 4532010     69114     52552
## # ... with 120 more rows, and 28 more variables: deaths <int>, hosp <dbl>,
## #   vent <int>, icu <int>, population <int>, school_closing <int>,
## #   workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>
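
To see that the name-based and index-based forms of select are equivalent, here is a minimal sketch on a five-column tibble. The `toy` tibble and its values are made up for illustration; only dplyr is assumed.

```r
library(dplyr)

# Hypothetical five-column tibble (not the COVID19 data)
toy <- tibble(id = "KOR", date = as.Date("2021-01-01"),
              tests = 100L, latitude = 37, longitude = 128)

# Reorder by listing every column name in the desired order
by_name <- toy %>% select(id, latitude, longitude, date, tests)

# The same reorder by position: columns 4 and 5 go right after column 1
by_index <- toy %>% select(1, 4, 5, 2:3)

names(by_name)
identical(names(by_name), names(by_index))
```

Both calls give the column order id, latitude, longitude, date, tests. Even with five columns, every column has to be spelled out or counted by hand.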

Writing it out this way, however, is tedious and error-prone. For a data frame with this many columns, relocate makes it easy to rearrange the columns.

  1. Move specific columns to the front
data.covid19 %>% relocate(c(latitude, longitude))
## # A tibble: 130 x 36
## # Groups:   id [1]
##    latitude longitude id    date       vaccines   tests confirmed recovered
##       <dbl>     <dbl> <chr> <date>        <dbl>   <int>     <int>     <int>
##  1       37      128. KOR   2021-01-01       NA 4092391     62593     43578
##  2       37      128. KOR   2021-01-02       NA 4123412     63244     44507
##  3       37      128. KOR   2021-01-03       NA 4154366     64264     45240
##  4       37      128. KOR   2021-01-04       NA 4185876     64979     46172
##  5       37      128. KOR   2021-01-05       NA 4245610     65818     46995
##  6       37      128. KOR   2021-01-06       NA 4312786     66686     47649
##  7       37      128. KOR   2021-01-07       NA 4378047     67358     48369
##  8       37      128. KOR   2021-01-08       NA 4439552     67999     49324
##  9       37      128. KOR   2021-01-09       NA 4503693     68664     50409
## 10       37      128. KOR   2021-01-10       NA 4532010     69114     52552
## # ... with 120 more rows, and 28 more variables: deaths <int>, hosp <dbl>,
## #   vent <int>, icu <int>, population <int>, school_closing <int>,
## #   workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>
  2. Move specific columns after a given column
data.covid19 %>% relocate(c(latitude, longitude), .after = id)
## # A tibble: 130 x 36
## # Groups:   id [1]
##    id    latitude longitude date       vaccines   tests confirmed recovered
##    <chr>    <dbl>     <dbl> <date>        <dbl>   <int>     <int>     <int>
##  1 KOR         37      128. 2021-01-01       NA 4092391     62593     43578
##  2 KOR         37      128. 2021-01-02       NA 4123412     63244     44507
##  3 KOR         37      128. 2021-01-03       NA 4154366     64264     45240
##  4 KOR         37      128. 2021-01-04       NA 4185876     64979     46172
##  5 KOR         37      128. 2021-01-05       NA 4245610     65818     46995
##  6 KOR         37      128. 2021-01-06       NA 4312786     66686     47649
##  7 KOR         37      128. 2021-01-07       NA 4378047     67358     48369
##  8 KOR         37      128. 2021-01-08       NA 4439552     67999     49324
##  9 KOR         37      128. 2021-01-09       NA 4503693     68664     50409
## 10 KOR         37      128. 2021-01-10       NA 4532010     69114     52552
## # ... with 120 more rows, and 28 more variables: deaths <int>, hosp <dbl>,
## #   vent <int>, icu <int>, population <int>, school_closing <int>,
## #   workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>
  3. Move specific columns before a given column
data.covid19 %>% relocate(c(latitude, longitude), .before = vaccines)
## # A tibble: 130 x 36
## # Groups:   id [1]
##    id    date       latitude longitude vaccines   tests confirmed recovered
##    <chr> <date>        <dbl>     <dbl>    <dbl>   <int>     <int>     <int>
##  1 KOR   2021-01-01       37      128.       NA 4092391     62593     43578
##  2 KOR   2021-01-02       37      128.       NA 4123412     63244     44507
##  3 KOR   2021-01-03       37      128.       NA 4154366     64264     45240
##  4 KOR   2021-01-04       37      128.       NA 4185876     64979     46172
##  5 KOR   2021-01-05       37      128.       NA 4245610     65818     46995
##  6 KOR   2021-01-06       37      128.       NA 4312786     66686     47649
##  7 KOR   2021-01-07       37      128.       NA 4378047     67358     48369
##  8 KOR   2021-01-08       37      128.       NA 4439552     67999     49324
##  9 KOR   2021-01-09       37      128.       NA 4503693     68664     50409
## 10 KOR   2021-01-10       37      128.       NA 4532010     69114     52552
## # ... with 120 more rows, and 28 more variables: deaths <int>, hosp <dbl>,
## #   vent <int>, icu <int>, population <int>, school_closing <int>,
## #   workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>
  4. Move specific columns to the end
data.covid19 %>% relocate(c(latitude, longitude), .after = last_col())
## # A tibble: 130 x 36
## # Groups:   id [1]
##    id    date       vaccines  tests confirmed recovered deaths  hosp  vent   icu
##    <chr> <date>        <dbl>  <int>     <int>     <int>  <int> <dbl> <int> <int>
##  1 KOR   2021-01-01       NA 4.09e6     62593     43578    942    NA    NA    NA
##  2 KOR   2021-01-02       NA 4.12e6     63244     44507    962    NA    NA    NA
##  3 KOR   2021-01-03       NA 4.15e6     64264     45240    981    NA    NA    NA
##  4 KOR   2021-01-04       NA 4.19e6     64979     46172   1007    NA    NA    NA
##  5 KOR   2021-01-05       NA 4.25e6     65818     46995   1027    NA    NA    NA
##  6 KOR   2021-01-06       NA 4.31e6     66686     47649   1046    NA    NA    NA
##  7 KOR   2021-01-07       NA 4.38e6     67358     48369   1081    NA    NA    NA
##  8 KOR   2021-01-08       NA 4.44e6     67999     49324   1100    NA    NA    NA
##  9 KOR   2021-01-09       NA 4.50e6     68664     50409   1125    NA    NA    NA
## 10 KOR   2021-01-10       NA 4.53e6     69114     52552   1140    NA    NA    NA
## # ... with 120 more rows, and 26 more variables: population <int>,
## #   school_closing <int>, workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>,
## #   latitude <dbl>, longitude <dbl>
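
The four placements covered so far can be checked quickly on a small tibble. This is a minimal sketch under assumptions: `toy` and its values are hypothetical, and relocate requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical five-column tibble (not the COVID19 data)
toy <- tibble(id = "KOR", date = as.Date("2021-01-01"),
              tests = 100L, latitude = 37, longitude = 128)

# With no .before/.after, the selected columns move to the front
front <- toy %>% relocate(latitude, longitude)

# .after and .before place the columns relative to another column
after_id <- toy %>% relocate(latitude, longitude, .after = id)
before_tests <- toy %>% relocate(latitude, longitude, .before = tests)

# last_col() selects the last column, so this moves id to the end
to_end <- toy %>% relocate(id, .after = last_col())

names(front)
names(after_id)
names(before_tests)
names(to_end)
```

Note that .before and .after cannot be supplied in the same call; choose whichever reads more naturally for the target position.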
  5. Move all columns of a given data type to the front
## Move character columns to the front
data.covid19 %>% relocate(where(is.character))
## # A tibble: 130 x 36
## # Groups:   id [1]
##    id    iso_alpha_3 iso_alpha_2 currency administrative_area_~ key_apple_mobil~
##    <chr> <chr>       <chr>       <chr>    <chr>                 <chr>           
##  1 KOR   KOR         KR          KRW      Korea, South          <NA>            
##  2 KOR   KOR         KR          KRW      Korea, South          <NA>            
##  3 KOR   KOR         KR          KRW      Korea, South          <NA>            
##  4 KOR   KOR         KR          KRW      Korea, South          <NA>            
##  5 KOR   KOR         KR          KRW      Korea, South          <NA>            
##  6 KOR   KOR         KR          KRW      Korea, South          <NA>            
##  7 KOR   KOR         KR          KRW      Korea, South          <NA>            
##  8 KOR   KOR         KR          KRW      Korea, South          <NA>            
##  9 KOR   KOR         KR          KRW      Korea, South          <NA>            
## 10 KOR   KOR         KR          KRW      Korea, South          <NA>            
## # ... with 120 more rows, and 30 more variables: key_google_mobility <chr>,
## #   date <date>, vaccines <dbl>, tests <int>, confirmed <int>, recovered <int>,
## #   deaths <int>, hosp <dbl>, vent <int>, icu <int>, population <int>,
## #   school_closing <int>, workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_numeric <int>, administrative_area_level <int>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   latitude <dbl>, longitude <dbl>, key <lgl>
## Move logical columns to the front
data.covid19 %>% relocate(where(is.logical))
## # A tibble: 130 x 36
## # Groups:   id [1]
##    administrative_are~ administrative_ar~ key   id    date       vaccines  tests
##    <lgl>               <lgl>              <lgl> <chr> <date>        <dbl>  <int>
##  1 NA                  NA                 NA    KOR   2021-01-01       NA 4.09e6
##  2 NA                  NA                 NA    KOR   2021-01-02       NA 4.12e6
##  3 NA                  NA                 NA    KOR   2021-01-03       NA 4.15e6
##  4 NA                  NA                 NA    KOR   2021-01-04       NA 4.19e6
##  5 NA                  NA                 NA    KOR   2021-01-05       NA 4.25e6
##  6 NA                  NA                 NA    KOR   2021-01-06       NA 4.31e6
##  7 NA                  NA                 NA    KOR   2021-01-07       NA 4.38e6
##  8 NA                  NA                 NA    KOR   2021-01-08       NA 4.44e6
##  9 NA                  NA                 NA    KOR   2021-01-09       NA 4.50e6
## 10 NA                  NA                 NA    KOR   2021-01-10       NA 4.53e6
## # ... with 120 more rows, and 29 more variables: confirmed <int>,
## #   recovered <int>, deaths <int>, hosp <dbl>, vent <int>, icu <int>,
## #   population <int>, school_closing <int>, workplace_closing <int>,
## #   cancel_events <int>, gatherings_restrictions <int>,
## #   transport_closing <int>, stay_home_restrictions <int>,
## #   internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   latitude <dbl>, longitude <dbl>, key_apple_mobility <chr>,
## #   key_google_mobility <chr>
  6. Move all columns of one data type after or before the columns of another data type
## Move character columns after the last integer column
data.covid19 %>% relocate(where(is.character), .after = where(is.integer))
## # A tibble: 130 x 36
## # Groups:   id [1]
##    date       vaccines   tests confirmed recovered deaths  hosp  vent   icu
##    <date>        <dbl>   <int>     <int>     <int>  <int> <dbl> <int> <int>
##  1 2021-01-01       NA 4092391     62593     43578    942    NA    NA    NA
##  2 2021-01-02       NA 4123412     63244     44507    962    NA    NA    NA
##  3 2021-01-03       NA 4154366     64264     45240    981    NA    NA    NA
##  4 2021-01-04       NA 4185876     64979     46172   1007    NA    NA    NA
##  5 2021-01-05       NA 4245610     65818     46995   1027    NA    NA    NA
##  6 2021-01-06       NA 4312786     66686     47649   1046    NA    NA    NA
##  7 2021-01-07       NA 4378047     67358     48369   1081    NA    NA    NA
##  8 2021-01-08       NA 4439552     67999     49324   1100    NA    NA    NA
##  9 2021-01-09       NA 4503693     68664     50409   1125    NA    NA    NA
## 10 2021-01-10       NA 4532010     69114     52552   1140    NA    NA    NA
## # ... with 120 more rows, and 27 more variables: population <int>,
## #   school_closing <int>, workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_numeric <int>, administrative_area_level <int>, id <chr>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, currency <chr>,
## #   administrative_area_level_1 <chr>, key_apple_mobility <chr>,
## #   key_google_mobility <chr>, administrative_area_level_2 <lgl>,
## #   administrative_area_level_3 <lgl>, latitude <dbl>, longitude <dbl>,
## #   key <lgl>
## Move logical columns before the first numeric column
data.covid19 %>% relocate(where(is.logical), .before = where(is.numeric))
## # A tibble: 130 x 36
## # Groups:   id [1]
##    id    date       administrative_are~ administrative_ar~ key   vaccines  tests
##    <chr> <date>     <lgl>               <lgl>              <lgl>    <dbl>  <int>
##  1 KOR   2021-01-01 NA                  NA                 NA          NA 4.09e6
##  2 KOR   2021-01-02 NA                  NA                 NA          NA 4.12e6
##  3 KOR   2021-01-03 NA                  NA                 NA          NA 4.15e6
##  4 KOR   2021-01-04 NA                  NA                 NA          NA 4.19e6
##  5 KOR   2021-01-05 NA                  NA                 NA          NA 4.25e6
##  6 KOR   2021-01-06 NA                  NA                 NA          NA 4.31e6
##  7 KOR   2021-01-07 NA                  NA                 NA          NA 4.38e6
##  8 KOR   2021-01-08 NA                  NA                 NA          NA 4.44e6
##  9 KOR   2021-01-09 NA                  NA                 NA          NA 4.50e6
## 10 KOR   2021-01-10 NA                  NA                 NA          NA 4.53e6
## # ... with 120 more rows, and 29 more variables: confirmed <int>,
## #   recovered <int>, deaths <int>, hosp <dbl>, vent <int>, icu <int>,
## #   population <int>, school_closing <int>, workplace_closing <int>,
## #   cancel_events <int>, gatherings_restrictions <int>,
## #   transport_closing <int>, stay_home_restrictions <int>,
## #   internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   latitude <dbl>, longitude <dbl>, key_apple_mobility <chr>,
## #   key_google_mobility <chr>
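
When .after or .before selects several columns, the outputs above suggest that .after anchors on the last match and .before on the first. The following minimal sketch checks that on a made-up tibble; `toy` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical tibble mixing character, integer, logical and double columns
toy <- tibble(id = "KOR", tests = 100L, currency = "KRW",
              deaths = 5L, key = NA, latitude = 37)

# Character columns to the front, keeping their relative order
chr_first <- toy %>% relocate(where(is.character))

# Character columns after the LAST integer column (deaths)
chr_after_int <- toy %>% relocate(where(is.character), .after = where(is.integer))

# Logical columns before the FIRST numeric column (tests)
lgl_before_num <- toy %>% relocate(where(is.logical), .before = where(is.numeric))

names(chr_first)
names(chr_after_int)
names(lgl_before_num)
```

In `chr_after_int`, id and currency land right after deaths rather than after tests, and in `lgl_before_num`, key lands right before tests.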
  7. Move columns whose names start or end with specific characters
## Move columns whose names start with 'lat' to the front
data.covid19 %>% relocate(starts_with('lat'))
## # A tibble: 130 x 36
## # Groups:   id [1]
##    latitude id    date       vaccines   tests confirmed recovered deaths  hosp
##       <dbl> <chr> <date>        <dbl>   <int>     <int>     <int>  <int> <dbl>
##  1       37 KOR   2021-01-01       NA 4092391     62593     43578    942    NA
##  2       37 KOR   2021-01-02       NA 4123412     63244     44507    962    NA
##  3       37 KOR   2021-01-03       NA 4154366     64264     45240    981    NA
##  4       37 KOR   2021-01-04       NA 4185876     64979     46172   1007    NA
##  5       37 KOR   2021-01-05       NA 4245610     65818     46995   1027    NA
##  6       37 KOR   2021-01-06       NA 4312786     66686     47649   1046    NA
##  7       37 KOR   2021-01-07       NA 4378047     67358     48369   1081    NA
##  8       37 KOR   2021-01-08       NA 4439552     67999     49324   1100    NA
##  9       37 KOR   2021-01-09       NA 4503693     68664     50409   1125    NA
## 10       37 KOR   2021-01-10       NA 4532010     69114     52552   1140    NA
## # ... with 120 more rows, and 27 more variables: vent <int>, icu <int>,
## #   population <int>, school_closing <int>, workplace_closing <int>,
## #   cancel_events <int>, gatherings_restrictions <int>,
## #   transport_closing <int>, stay_home_restrictions <int>,
## #   internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   longitude <dbl>, key <lgl>, key_apple_mobility <chr>,
## #   key_google_mobility <chr>
## Move columns whose names end with 'tude' to the front
data.covid19 %>% relocate(ends_with('tude'))
## # A tibble: 130 x 36
## # Groups:   id [1]
##    latitude longitude id    date       vaccines   tests confirmed recovered
##       <dbl>     <dbl> <chr> <date>        <dbl>   <int>     <int>     <int>
##  1       37      128. KOR   2021-01-01       NA 4092391     62593     43578
##  2       37      128. KOR   2021-01-02       NA 4123412     63244     44507
##  3       37      128. KOR   2021-01-03       NA 4154366     64264     45240
##  4       37      128. KOR   2021-01-04       NA 4185876     64979     46172
##  5       37      128. KOR   2021-01-05       NA 4245610     65818     46995
##  6       37      128. KOR   2021-01-06       NA 4312786     66686     47649
##  7       37      128. KOR   2021-01-07       NA 4378047     67358     48369
##  8       37      128. KOR   2021-01-08       NA 4439552     67999     49324
##  9       37      128. KOR   2021-01-09       NA 4503693     68664     50409
## 10       37      128. KOR   2021-01-10       NA 4532010     69114     52552
## # ... with 120 more rows, and 28 more variables: deaths <int>, hosp <dbl>,
## #   vent <int>, icu <int>, population <int>, school_closing <int>,
## #   workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>
## Move columns whose names contain 'tude' to the front
data.covid19 %>% relocate(contains('tude'))
## # A tibble: 130 x 36
## # Groups:   id [1]
##    latitude longitude id    date       vaccines   tests confirmed recovered
##       <dbl>     <dbl> <chr> <date>        <dbl>   <int>     <int>     <int>
##  1       37      128. KOR   2021-01-01       NA 4092391     62593     43578
##  2       37      128. KOR   2021-01-02       NA 4123412     63244     44507
##  3       37      128. KOR   2021-01-03       NA 4154366     64264     45240
##  4       37      128. KOR   2021-01-04       NA 4185876     64979     46172
##  5       37      128. KOR   2021-01-05       NA 4245610     65818     46995
##  6       37      128. KOR   2021-01-06       NA 4312786     66686     47649
##  7       37      128. KOR   2021-01-07       NA 4378047     67358     48369
##  8       37      128. KOR   2021-01-08       NA 4439552     67999     49324
##  9       37      128. KOR   2021-01-09       NA 4503693     68664     50409
## 10       37      128. KOR   2021-01-10       NA 4532010     69114     52552
## # ... with 120 more rows, and 28 more variables: deaths <int>, hosp <dbl>,
## #   vent <int>, icu <int>, population <int>, school_closing <int>,
## #   workplace_closing <int>, cancel_events <int>,
## #   gatherings_restrictions <int>, transport_closing <int>,
## #   stay_home_restrictions <int>, internal_movement_restrictions <int>,
## #   international_movement_restrictions <int>, information_campaigns <int>,
## #   testing_policy <int>, contact_tracing <int>, stringency_index <dbl>,
## #   iso_alpha_3 <chr>, iso_alpha_2 <chr>, iso_numeric <int>, currency <chr>,
## #   administrative_area_level <int>, administrative_area_level_1 <chr>,
## #   administrative_area_level_2 <lgl>, administrative_area_level_3 <lgl>,
## #   key <lgl>, key_apple_mobility <chr>, key_google_mobility <chr>
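
The name-matching helpers can also be combined with .after or .before. This is a minimal sketch on a made-up tibble; `toy` and its values are hypothetical, and the column names merely imitate those of the COVID19 data.

```r
library(dplyr)

# Hypothetical tibble whose column names imitate the COVID19 data
toy <- tibble(id = "KOR", date = as.Date("2021-01-01"),
              latitude = 37, longitude = 128,
              key_apple_mobility = NA_character_)

# Only latitude starts with "lat", so only that column moves to the front
starts <- toy %>% relocate(starts_with("lat"))

# Both latitude and longitude end with "tude"
ends <- toy %>% relocate(ends_with("tude"))

# Columns whose names contain "key", placed right after id
keys_after_id <- toy %>% relocate(contains("key"), .after = id)

names(starts)
names(ends)
names(keys_after_id)
```

As in the introduction, the result can be assigned back to the same name to overwrite the original data frame with the new column order.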
