Question

A data analyst is assessing a dataset with inconsistent categorical entries, such as "USA," "U.S.A," "United States," and "US" for the country field. Which of the following is the best approach for handling this inconsistency?

A Filtering duplicates based on the entire row
B Using normalization to scale data entries
C Applying data transformation to encode all entries numerically
D Standardizing categorical entries to a single representation
E Converting all entries to uppercase to eliminate case sensitivity
Practice Next

Hey! Ask a query