
This function cleans a raw data frame from Infosiga, applying cleaning and transformation rules specific to the kind of data: sinistros (crashes), pessoas (people), or veiculos (vehicles).

Usage

clean_infosiga(df_infosiga, file_type = c("sinistros", "pessoas", "veiculos"))

Arguments

df_infosiga

A raw data frame as loaded by load_infosiga().

file_type

A character string giving the type of data to clean. Must be one of "sinistros", "pessoas", or "veiculos".

Value

A cleaned and processed tibble with standardized columns and types.

Details

The function performs a series of data cleaning tasks, including:

  • Standardizing categorical variables: Recodes text values to a consistent format (e.g., "SINISTRO FATAL" to "Sinistro fatal").

  • Type conversion: Converts columns to appropriate types, such as dates (parsed with lubridate::dmy()), numbers, and ordered factors (e.g., age groups).

  • Handling missing values: Replaces "NAO DISPONIVEL" strings with NA.

  • Joining with external data: Merges the data with an internal municipalities dataset (municipios) to add geographical information like IBGE codes and administrative regions.

  • Column renaming and selection: Renames columns for clarity (e.g., ano_fab to ano_fabricacao) and selects a final set of relevant variables, dropping intermediate or raw ones.
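The cleaning steps above can be sketched in base R on a toy data frame. This is a hedged illustration, not the package's implementation. The column names (tipo_registro, data_sinistro, faixa_etaria, municipio), the dd/mm/yyyy date format, the age-group levels, and the municipios_toy lookup table with placeholder codes are all hypothetical. Base as.Date() stands in for lubridate::dmy() so the sketch runs without extra packages.

```r
# Toy raw data frame. All column names and values are hypothetical and
# mix crash- and vehicle-level fields purely for illustration.
raw <- data.frame(
  tipo_registro = c("SINISTRO FATAL", "SINISTRO NAO FATAL", "NAO DISPONIVEL"),
  data_sinistro = c("05/01/2023", "17/02/2023", "NAO DISPONIVEL"),
  faixa_etaria  = c("18-24", "65+", "25-29"),
  municipio     = c("SAO PAULO", "CAMPINAS", "SANTOS"),
  ano_fab       = c("2010", "2015", "NAO DISPONIVEL"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the internal `municipios` dataset
# (codes are placeholders, not real IBGE codes).
municipios_toy <- data.frame(
  municipio = c("SAO PAULO", "CAMPINAS", "SANTOS"),
  cod_ibge  = c(1L, 2L, 3L),
  stringsAsFactors = FALSE
)

clean <- raw

# Handling missing values: "NAO DISPONIVEL" -> NA in every character column
chr_cols <- vapply(clean, is.character, logical(1))
clean[chr_cols] <- lapply(clean[chr_cols], function(x) {
  x[x %in% "NAO DISPONIVEL"] <- NA_character_
  x
})

# Standardizing categorical variables: "SINISTRO FATAL" -> "Sinistro fatal"
to_sentence_case <- function(x) {
  ok <- !is.na(x)  # leave NA values untouched
  x[ok] <- paste0(toupper(substr(x[ok], 1, 1)),
                  tolower(substr(x[ok], 2, nchar(x[ok]))))
  x
}
clean$tipo_registro <- to_sentence_case(clean$tipo_registro)

# Type conversion: a Date column, an integer column and an ordered factor
clean$data_sinistro <- as.Date(clean$data_sinistro, format = "%d/%m/%Y")
clean$ano_fab <- as.integer(clean$ano_fab)
clean$faixa_etaria <- factor(clean$faixa_etaria,
                             levels = c("18-24", "25-29", "65+"),
                             ordered = TRUE)

# Joining with external data: a left join keeps every raw row
clean <- merge(clean, municipios_toy, by = "municipio", all.x = TRUE)

# Column renaming and selection of the final set of variables
names(clean)[names(clean) == "ano_fab"] <- "ano_fabricacao"
clean <- clean[, c("municipio", "cod_ibge", "tipo_registro",
                   "data_sinistro", "faixa_etaria", "ano_fabricacao")]
```

Whether clean_infosiga() uses a left join or this exact order of steps is not stated on this page; the sketch only shows one plausible way to express each listed task.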

The specific cleaning pipeline applied depends on the file_type argument.
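Dispatching on file_type can be sketched with match.arg() and switch(). The c("sinistros", "pessoas", "veiculos") default in Usage is the usual R idiom for match.arg(), but whether clean_infosiga() actually works this way is an assumption, and the three helper functions below are hypothetical placeholders for the real per-type pipelines.

```r
# Hypothetical per-type cleaners; each just tags its output so the
# dispatch can be observed. The package's real helpers are not documented here.
clean_sinistros_sketch <- function(df) { df$pipeline <- "sinistros"; df }
clean_pessoas_sketch   <- function(df) { df$pipeline <- "pessoas";   df }
clean_veiculos_sketch  <- function(df) { df$pipeline <- "veiculos";  df }

clean_infosiga_sketch <- function(df_infosiga,
                                  file_type = c("sinistros", "pessoas", "veiculos")) {
  # match.arg() errors on values outside the choices and, when file_type
  # is not supplied, falls back to the first choice ("sinistros").
  file_type <- match.arg(file_type)
  switch(file_type,
    sinistros = clean_sinistros_sketch(df_infosiga),
    pessoas   = clean_pessoas_sketch(df_infosiga),
    veiculos  = clean_veiculos_sketch(df_infosiga)
  )
}

out <- clean_infosiga_sketch(data.frame(x = 1), file_type = "pessoas")
```

With this pattern, an invalid value such as "motos" stops with an error listing the allowed choices instead of silently producing an unprocessed data frame.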

Examples

if (FALSE) { # \dontrun{
# First, download and load the data
data_dir <- tempdir()
download_infosiga(destpath = data_dir)
raw_sinistros_df <- load_infosiga(file_type = "sinistros", path = data_dir)

# Clean the 'sinistros' data
cleaned_sinistros_df <- clean_infosiga(raw_sinistros_df, file_type = "sinistros")

# Clean the 'pessoas' data
raw_pessoas_df <- load_infosiga(file_type = "pessoas", path = data_dir)
cleaned_pessoas_df <- clean_infosiga(raw_pessoas_df, file_type = "pessoas")
} # }