Question
A data analyst is assessing a dataset with inconsistent categorical entries, such as "USA," "U.S.A," "United States," and "US" for the country field. Which of the following is the best approach for handling this inconsistency?
Solution
Standardizing categorical entries to a single representation is the best approach. It consolidates the different formats of the same entity into one label. For example, mapping "USA," "U.S.A," "United States," and "US" to "United States" ensures that every entry is interpreted as the same country. This step is essential in data cleaning: inconsistent categorical values split one entity across several groups, which leads to inaccurate counts, skewed results, and duplicated lines in reports. A uniform categorical format enables reliable grouping, sorting, and filtering.
The other options are incorrect because:
• Option 1 (Filtering duplicates) removes identical rows but does not reconcile different spellings within a single field.
• Option 2 (Using normalization) refers to rescaling numeric values, not to making categorical labels consistent.
• Option 3 (Applying data transformation), such as encoding the raw values, would turn each variant into its own category and carry the inconsistency forward rather than correct it.
• Option 5 (Converting to uppercase) resolves case differences only; "U.S.A," "US," and "UNITED STATES" would still be distinct values.
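The standardization step can be sketched in Python as a lookup from a normalized key to one canonical label. This is a minimal illustration, not a prescribed method: the alias table `COUNTRY_ALIASES` and the helpers `normalize_key` and `standardize_country` are hypothetical names, and a real alias table would need to be built and reviewed for the actual dataset. The example also shows why uppercasing alone (Option 5) is not enough.

```python
from collections import Counter

# Hypothetical alias table: normalized key -> canonical label.
# A real table would be built and reviewed for the actual dataset.
COUNTRY_ALIASES = {
    "USA": "United States",
    "US": "United States",
    "UNITEDSTATES": "United States",
}


def normalize_key(raw: str) -> str:
    """Uppercase and drop periods/spaces so spelling variants share one key."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def standardize_country(raw: str) -> str:
    """Map a raw entry to its canonical label.

    Values not in the alias table are returned stripped but otherwise
    unchanged, so they can be reviewed instead of silently altered.
    """
    return COUNTRY_ALIASES.get(normalize_key(raw), raw.strip())


if __name__ == "__main__":
    raw_values = ["USA", "U.S.A", "United States", "US", "usa", "Canada"]

    # Uppercasing alone still leaves several distinct labels.
    print(len({v.upper() for v in raw_values}))  # → 5

    # Mapping to one canonical label lets all US variants group together.
    print(Counter(standardize_country(v) for v in raw_values).most_common())
    # → [('United States', 5), ('Canada', 1)]
```

Stripping every non-alphanumeric character is an aggressive choice that suits short codes like these; for other fields it could merge values that should stay distinct, so the key function should be checked against the real data.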