The basis of error analysis: the most commonly used method of bias analysis

Date: 2023-05-05 02:36:45 Views: 220923 Author: 3897

An Example of Bias in Data Analysis

Between the 16th and 19th centuries, in Western Europe, tens of thousands of women were executed during witch-hunts. Due to the difficult nature of identifying witches, special tests were used to determine whether or not a woman was a witch.

One such test involved throwing the woman into the water with her hands tied behind her back. If she floated, she was a witch, assumed to have been saved by the devil, and was sentenced to death. If she drowned, she was innocent.

Confirmation Bias

Although the link between witch-hunting and data analytics may not be immediately clear, they are both subject to a cognitive bias known as confirmation bias.

Confirmation bias occurs when a person searches for, or interprets, information to conform with their prior beliefs.

Witch-hunting has often been used as an example of confirmation bias. There is no practical significance in proving that a woman is innocent of witchcraft if she dies in the process. But this wasn’t the point of the test. Instead, it was designed purely as a method of confirming the prior belief, guilt.

Exploratory Data Analysis

Exploratory data analysis (EDA) is the initial investigation of data, usually using statistics and graphical representations, to summarise and understand it.

In essence, EDA is the process of answering questions about the data. Some of these may be very simple:

How many rows/columns does the data have?

What are the column types?

Is there any data missing?
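The three simple questions above can each be answered in a few lines of code. The following is a minimal Python sketch using only the standard library. The CSV extract, its column names (date, screen, tickets_sold) and the helper functions infer_type and summarise are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical box office extract; column names and values are made up.
RAW = """date,screen,tickets_sold
2023-05-01,A,120
2023-05-02,A,
2023-05-03,B,95
"""


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    present = [v for v in values if v != ""]
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in present:
                caster(v)
            return name
        except ValueError:
            continue
    return "str"


def summarise(text):
    """Answer the three basic questions: shape, column types, missing values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    return {
        "n_rows": len(rows),
        "n_columns": len(columns),
        "types": {c: infer_type([r[c] for r in rows]) for c in columns},
        "missing": {c: sum(1 for r in rows if r[c] == "") for c in columns},
    }


summary = summarise(RAW)
print(summary)
```

In practice a dataframe library would usually do this work; the point here is only that these first questions are descriptive and carry no expectation about what the answer should be.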

However, as EDA progresses, the questions become more complex. Take the example of a dataset that shows box office information for a new film. One question might be:

Are there more ticket sales on weekends?

You may notice that I have written this question so that it can be answered with a simple “yes” or “no”. This may seem trivial, but it is here that bias can start to creep in. I could have written the question differently:

Do more ticket sales occur on particular days of the week?

Instead of devising a question to ascertain the distribution of ticket sales through the week, I have asked a question with the purpose of validating my existing belief, that more tickets are sold during the weekend.
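The difference between the two framings can be made concrete with a small Python sketch. The daily sales figures below are invented for illustration, and the function names weekend_sells_more and average_by_day are hypothetical. The first function answers the closed yes/no question; the second answers the open question about the distribution across the week.

```python
from collections import defaultdict
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical daily ticket sales for one week (made-up numbers).
SALES = [
    (date(2023, 5, 1), 120),  # Monday
    (date(2023, 5, 2), 310),  # Tuesday
    (date(2023, 5, 3), 130),  # Wednesday
    (date(2023, 5, 4), 125),  # Thursday
    (date(2023, 5, 5), 180),  # Friday
    (date(2023, 5, 6), 260),  # Saturday
    (date(2023, 5, 7), 240),  # Sunday
]


def weekend_sells_more(sales):
    """Closed question: does an average weekend day outsell an average weekday?"""
    weekend = [n for d, n in sales if d.weekday() >= 5]
    weekday = [n for d, n in sales if d.weekday() < 5]
    return sum(weekend) / len(weekend) > sum(weekday) / len(weekday)


def average_by_day(sales):
    """Open question: average tickets sold on each day of the week."""
    per_day = defaultdict(list)
    for d, n in sales:
        per_day[DAYS[d.weekday()]].append(n)
    return {name: sum(v) / len(v) for name, v in per_day.items()}


print("Weekend sells more?", weekend_sells_more(SALES))
for name, avg in average_by_day(SALES).items():
    print(f"{name:<9} {avg:.0f}")
```

With these made-up numbers the closed question is answered "yes", yet the open question shows that the single busiest day is a Tuesday, something the yes/no framing never surfaces.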

The context may be miles apart, but the logic is the same as that of the witch tests, which were devised only to prove that a woman was a witch, not to find out whether she was one.

The Problem of Bias

This may seem like a trivial point. At the end of the day, you will end up determining whether weekends sell more tickets or not.

The problem isn’t the analysis in and of itself, but what happens after. By its very nature, EDA is meant to be a stepping stone to another question: so what?

We use the information discovered during EDA to drive change; perhaps adapt an existing process or create a new one. But change by its very nature involves the upheaval of prior beliefs.

If confirmation bias stops people from searching for evidence that contravenes their prior beliefs, then no amount of EDA is going to drive meaningful change.

What would happen instead?

Imagine we found out ticket sales were higher during the week. Instead of acting on this (maybe increasing advertising during the weekend to boost weekend ticket sales), most people would jump down the rabbit hole of explainability.

Using further (often contrived) analysis, they would try to explain why their belief is right despite the evidence. Then, instead of being a vehicle for change, EDA becomes an ego boost, so to speak, for an invalid theory.

Mitigating Confirmation Bias

Humans have a natural tendency towards confirmation bias, and we often can’t tell that we are doing it. Therefore, we need to make sure that we are taking active and purposeful measures to mitigate it.

Small changes, such as reframing the question, as above, can make a large difference as we are subconsciously removing ourselves from such biases.

None of this is to say that we should completely disregard our prior beliefs. We usually have them for a reason and if there is evidence to the contrary we should question it. Rather, we need to make sure that when we do question it we do so from a fair and balanced perspective.

This is often something that is very difficult to do, so another technique to avoid confirmation bias is to get a second opinion. Ask someone else to look at the evidence and see what they think.

EDA is a powerful tool founded in data. But data not only contains inherent bias; it is also biased by the people analysing it. It is our responsibility to understand where and when we might have biases and to mitigate them.

If you enjoyed this article, you may like to read Why All Data Scientists Should Understand Behavioural Economics to find out more about confirmation bias and how it, along with other types of cognitive bias, affects data analytics and data science.

Translated from: https://towardsdatascience.com/confirmation-bias-is-the-enemy-of-exploratory-data-analysis-c6eaea983958
