Question
Which data cleaning technique is most appropriate for handling missing data when missing values are randomly distributed across a dataset?
Solution
When missing values are randomly distributed, imputing them with the mean (for roughly symmetric continuous data) or the median (for skewed distributions) is an effective technique. It keeps every row and column in the dataset and leaves the variable's central tendency unchanged, so the analysis avoids the bias and information loss that the other options can introduce. Because every gap is filled with the same value, the variable's variance shrinks somewhat, so the approach works best when only a modest share of values is missing.
Option A is incorrect because removing rows can cause significant data loss, especially if many rows contain missing values. Option C is incorrect because dropping columns with missing values removes features and may discard useful information. Option D is incorrect because placeholder values can introduce bias or mislead the analysis, especially if the placeholder skews the distribution. Option E is incorrect because ignoring missing values leaves gaps that make accurate analysis difficult.
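To make mean and median imputation concrete, here is a minimal sketch using only the Python standard library. The helper name impute, the use of None to mark a missing value, and the sample ages and incomes lists are all assumptions made for illustration; they are not part of the original question.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Return a copy of `values` with None replaced by the mean or
    median of the observed (non-missing) entries.

    Hypothetical helper for illustration only.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Illustrative data (assumed): None marks a missing value.
ages = [25.0, None, 35.0, 42.0]               # roughly symmetric, so use the mean
incomes = [30000.0, 32000.0, None, 250000.0]  # skewed by one large value, so use the median

ages_filled = impute(ages, "mean")
incomes_filled = impute(incomes, "median")

print(ages_filled)
print(incomes_filled)
```

In the incomes list the single large value would pull the mean far above the typical income, which is why the median is the safer fill value for skewed data. With pandas, a similar result can usually be obtained by passing a column's mean or median to fillna.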