This function processes a raw data frame from Infosiga, applying specific cleaning and transformation rules based on the type of data (sinistros, pessoas, or veiculos).
Usage
clean_infosiga(df_infosiga, file_type = c("sinistros", "pessoas", "veiculos"))
Arguments
- df_infosiga
A raw data frame as loaded by load_infosiga().
- file_type
A string indicating the type of data to be cleaned. Must be one of 'sinistros', 'pessoas', or 'veiculos'.
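The default `c("sinistros", "pessoas", "veiculos")` in the usage line is the usual R idiom for an argument resolved with `match.arg()`. The sketch below shows how such an argument behaves. It assumes `clean_infosiga()` validates `file_type` this way, which the documentation does not confirm, and `check_file_type()` is a hypothetical stand-in rather than a function from the package.

```r
# Hypothetical stand-in for the argument handling; not the package's code.
check_file_type <- function(file_type = c("sinistros", "pessoas", "veiculos")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for values outside the allowed set.
  match.arg(file_type)
}

default_type <- check_file_type()           # argument omitted
pessoas_type <- check_file_type("pessoas")  # explicit, valid value

# An unsupported value raises an error; it is captured here for illustration.
bad_type <- tryCatch(
  check_file_type("rodovias"),
  error = function(e) "error"
)
```

If the package follows this idiom, omitting `file_type` would select 'sinistros'. Passing the value explicitly, as in the examples on this page, avoids relying on that.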
Details
The function performs a series of data cleaning tasks, including:
- Standardizing categorical variables: Recodes text values to a consistent format (e.g., "SINISTRO FATAL" to "Sinistro fatal").
- Type conversion: Converts columns to their appropriate types, such as dates (lubridate::dmy), numbers, and factors with ordered levels (e.g., age groups).
- Handling missing values: Replaces "NAO DISPONIVEL" strings with NA.
- Joining with external data: Merges the data with an internal municipalities dataset (municipios) to add geographical information like IBGE codes and administrative regions.
- Column renaming and selection: Renames columns for clarity (e.g., ano_fab to ano_fabricacao) and selects a final set of relevant variables, dropping intermediate or raw ones.
The specific cleaning pipeline applied depends on the file_type argument.
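The cleaning tasks listed above can be illustrated on a toy data frame. This is a minimal base R sketch, not the package's implementation. The column names (except `ano_fab`), the values, the `dd/mm/yyyy` separator, the age group labels, and the `municipios_toy` lookup table are all assumptions made for the example. `as.Date()` stands in for `lubridate::dmy()` so that the block runs without extra packages.

```r
# Toy input: column names and values are invented for illustration.
raw <- data.frame(
  tipo_registro = c("SINISTRO FATAL", "SINISTRO NAO FATAL", "NAO DISPONIVEL"),
  data_sinistro = c("01/02/2023", "15/03/2023", "NAO DISPONIVEL"),
  faixa_etaria  = c("18-24", "NAO DISPONIVEL", "0-17"),
  municipio     = c("SAO PAULO", "CAMPINAS", "SAO PAULO"),
  ano_fab       = c("2010", "2018", "NAO DISPONIVEL"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table standing in for the internal `municipios` dataset.
municipios_toy <- data.frame(
  municipio = c("SAO PAULO", "CAMPINAS"),
  cod_ibge  = c("3550308", "3509502"),
  stringsAsFactors = FALSE
)

clean <- raw

# 1. Missing values: turn the "NAO DISPONIVEL" marker into NA in every column.
clean[] <- lapply(clean, function(x) replace(x, x == "NAO DISPONIVEL", NA))

# 2. Categorical text: upper case to sentence case, keeping NA as NA.
sentence_case <- function(x) {
  out <- paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
  out[is.na(x)] <- NA
  out
}
clean$tipo_registro <- sentence_case(clean$tipo_registro)

# 3. Type conversion: day-month-year strings to Date, year to integer,
#    age groups to an ordered factor.
clean$data_sinistro <- as.Date(clean$data_sinistro, format = "%d/%m/%Y")
clean$ano_fab       <- as.integer(clean$ano_fab)
clean$faixa_etaria  <- factor(clean$faixa_etaria,
                              levels = c("0-17", "18-24"),
                              ordered = TRUE)

# 4. Join: a left join keeps every input row and adds the IBGE code.
#    Note that merge() does not guarantee the original row order.
clean <- merge(clean, municipios_toy, by = "municipio", all.x = TRUE)

# 5. Rename for clarity, then keep only the final set of columns.
names(clean)[names(clean) == "ano_fab"] <- "ano_fabricacao"
clean <- clean[, c("data_sinistro", "tipo_registro", "faixa_etaria",
                   "municipio", "cod_ibge", "ano_fabricacao")]

str(clean)
```

Replacing the missing-value marker first keeps the later conversions simple: the date, integer, and factor conversions then pass `NA` through instead of failing or warning on the text marker.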
Examples
if (FALSE) { # \dontrun{
# First, download and load the data
data_dir <- tempdir()
download_infosiga(destpath = data_dir)
raw_sinistros_df <- load_infosiga(file_type = "sinistros", path = data_dir)
# Clean the 'sinistros' data
cleaned_sinistros_df <- clean_infosiga(raw_sinistros_df, file_type = "sinistros")
# Clean the 'pessoas' data
raw_pessoas_df <- load_infosiga(file_type = "pessoas", path = data_dir)
cleaned_pessoas_df <- clean_infosiga(raw_pessoas_df, file_type = "pessoas")
} # }