
The 50 Most Valuable Matplotlib Visualizations, with Full Python Source Code (Part 2)

做一个柔情的程序猿 2021-01-21

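Introduction

The charts are grouped into 7 scenarios according to the goal of the visualization. For example, to visualize the relationship between two variables, see the charts under "Correlation". To show how a value changes over time, see the "Change" section, and so on.

Important features of an effective chart:

  • It conveys the right and necessary information without distorting the facts.

  • It is simple in design, so you do not have to strain to understand it.

  • Its aesthetics support the information rather than overshadow it.

  • It is not overloaded with information.

This collection is meant mainly as a starting point, and I hope you find it helpful.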


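Distribution

Histogram for a Continuous Variable

A histogram shows the frequency distribution of a given variable. The chart below stacks the frequency bars by a categorical variable, which gives a better view of the continuous variable and the categorical variable together.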
    # Imports used by the examples in this article
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare data: one list of x values per group
    x_var = 'displ'
    groupby_var = 'class'
    df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
    vals = [grp[x_var].values.tolist() for _, grp in df_agg]
    group_names = [name for name, _ in df_agg]

    # Draw
    plt.figure(figsize=(16,9), dpi= 80)
    colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
    n, bins, patches = plt.hist(vals, 30, stacked=True, density=False, color=colors, label=group_names)

    # Decoration
    plt.legend()
    plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
    plt.xlabel(x_var)
    plt.ylabel("Frequency")
    plt.ylim(0, 25)
    plt.xticks(ticks=bins[::3], labels=[round(b,1) for b in bins[::3]])
    plt.show()
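
To make the stacking concrete, here is a minimal plain-Python sketch of the counting that a stacked histogram relies on: equal-width bins shared by all groups, with one row of counts per group. It is not Matplotlib's implementation; the function name `stacked_hist_counts` and the sample displacement values are hypothetical and only serve as an illustration.

```python
def stacked_hist_counts(groups, n_bins):
    """Count values per equal-width bin for each group.

    groups: dict mapping a group name to a list of numbers.
    Returns (edges, counts), where counts[name][k] is the number of values
    of that group that fall into bin k. Assumes at least two distinct values.
    """
    all_values = [v for values in groups.values() for v in values]
    lo, hi = min(all_values), max(all_values)
    width = (hi - lo) / n_bins
    edges = [lo + k * width for k in range(n_bins + 1)]
    counts = {}
    for name, values in groups.items():
        row = [0] * n_bins
        for v in values:
            # The last bin is closed on the right, so the maximum lands in it.
            k = min(int((v - lo) / width), n_bins - 1)
            row[k] += 1
        counts[name] = row
    return edges, counts


# Hypothetical engine displacements for two vehicle classes.
groups = {"compact": [1.8, 2.0, 2.0, 2.8], "suv": [4.0, 4.7, 5.3, 2.7]}
edges, counts = stacked_hist_counts(groups, n_bins=3)

# The total bar height in each bin is the sum over all groups.
totals = [sum(column) for column in zip(*counts.values())]
```

Because every group shares the same bin edges, the per-group rows can be drawn on top of each other, which is what `stacked=True` does in the plot above.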

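Histogram for a Categorical Variable

A histogram of a categorical variable shows the frequency distribution of that variable. By coloring the bars, you can relate the distribution to a second categorical variable represented by the colors.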
    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare data: encode each category as an integer code so that
    # bar positions and tick labels are guaranteed to line up
    x_var = 'manufacturer'
    groupby_var = 'class'
    categories = sorted(df[x_var].unique())
    codes = {cat: i for i, cat in enumerate(categories)}
    df_agg = df.loc[:, [x_var, groupby_var]].groupby(groupby_var)
    vals = [grp[x_var].map(codes).tolist() for _, grp in df_agg]
    group_names = [name for name, _ in df_agg]

    # Draw: one unit-wide bin centered on each integer code
    plt.figure(figsize=(16,9), dpi= 80)
    colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
    bin_edges = np.arange(len(categories) + 1) - 0.5
    n, bins, patches = plt.hist(vals, bin_edges, stacked=True, density=False, color=colors, label=group_names)

    # Decoration
    plt.legend()
    plt.title(f"Stacked Histogram of ${x_var}$ colored by ${groupby_var}$", fontsize=22)
    plt.xlabel(x_var)
    plt.ylabel("Frequency")
    plt.ylim(0, 40)
    plt.xticks(ticks=range(len(categories)), labels=categories, rotation=90)
    plt.show()
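
The numbers behind a stacked categorical histogram are a cross-tabulation of the two categorical variables. The sketch below builds that table with the standard library only; `crosstab_counts` and the six sample records are hypothetical and do not come from the mpg dataset.

```python
from collections import Counter


def crosstab_counts(rows, x_key, group_key):
    """Return the sorted x levels and, per group, the count at each x level."""
    pair_counts = Counter((row[x_key], row[group_key]) for row in rows)
    x_levels = sorted({row[x_key] for row in rows})
    group_levels = sorted({row[group_key] for row in rows})
    # A Counter returns 0 for pairs that never occur, so empty cells need no special case.
    table = {g: [pair_counts[(x, g)] for x in x_levels] for g in group_levels}
    return x_levels, table


# Hypothetical records standing in for rows of a vehicle table.
rows = [
    {"manufacturer": "audi", "class": "compact"},
    {"manufacturer": "audi", "class": "compact"},
    {"manufacturer": "audi", "class": "midsize"},
    {"manufacturer": "jeep", "class": "suv"},
    {"manufacturer": "ford", "class": "suv"},
    {"manufacturer": "ford", "class": "pickup"},
]
x_levels, table = crosstab_counts(rows, "manufacturer", "class")
```

Each entry of `table` is one colored layer of the stack, and the position of a value in the list matches the position of its label in `x_levels`.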

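Density Plot

A density plot is a common tool for visualizing the distribution of a continuous variable. By grouping the data by the "response" variable, you can inspect the relationship between X and Y. The example below is for illustration and shows how the distribution of city mileage varies with the number of cylinders.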
    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot (fill=True needs seaborn 0.11 or later; older releases used shade=True)
    plt.figure(figsize=(16,10), dpi= 80)
    sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], fill=True, color="g", label="Cyl=4", alpha=.7)
    sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], fill=True, color="deeppink", label="Cyl=5", alpha=.7)
    sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], fill=True, color="dodgerblue", label="Cyl=6", alpha=.7)
    sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], fill=True, color="orange", label="Cyl=8", alpha=.7)

    # Decoration
    plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22)
    plt.legend()
    plt.show()
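
A density curve of this kind is a kernel density estimate: each observation contributes a small bell curve, and the curves are averaged. The following is a minimal sketch of a Gaussian kernel density estimate in plain Python. The fixed bandwidth of 1.5 and the mileage values are assumptions chosen for illustration; seaborn selects a bandwidth automatically and its result will differ.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centered on that observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density


# Hypothetical city mileage values for one cylinder group.
cty = [18, 21, 20, 21, 16, 18]
density = gaussian_kde(cty, bandwidth=1.5)

# Sanity check: a density should integrate to about 1 (Riemann sum over 5..35).
step = 0.01
area = sum(density(5 + k * step) * step for k in range(3000))
```

A smaller bandwidth gives a bumpier curve that follows individual points, and a larger one gives a smoother curve that may hide structure.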

Density Curves with Histogram

A density curve with a histogram combines the information conveyed by the two plots, so you can show both in a single figure instead of two.
    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot
    # Note: sns.distplot is deprecated in recent seaborn releases;
    # sns.histplot(..., kde=True) is its suggested successor.
    plt.figure(figsize=(13,10), dpi= 80)
    sns.distplot(df.loc[df['class'] == 'compact', "cty"], color="dodgerblue", label="Compact", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
    sns.distplot(df.loc[df['class'] == 'suv', "cty"], color="orange", label="SUV", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
    sns.distplot(df.loc[df['class'] == 'minivan', "cty"], color="g", label="minivan", hist_kws={'alpha':.7}, kde_kws={'linewidth':3})
    plt.ylim(0, 0.35)

    # Decoration
    plt.title('Density Plot of City Mileage by Vehicle Type', fontsize=22)
    plt.legend()
    plt.show()

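Joy Plot

A joy plot lets the density curves of different groups overlap, which is a good way to visualize the distributions of a large number of groups relative to one another. It is pleasing to look at and conveys the right information clearly. It is easy to build with the joypy package, which is based on matplotlib. (Note from 『Python数据之道』: the joypy library must be installed.)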
    # Install the package first: pip install joypy
    import joypy

    # Import Data
    mpg = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot (joyplot creates its own figure, so no plt.figure() call is needed)
    fig, axes = joypy.joyplot(mpg, column=['hwy', 'cty'], by="class", ylim='own', figsize=(14,10))

    # Decoration
    plt.title('Joy Plot of City and Highway Mileage by Class', fontsize=22)
    plt.show()

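Distributed Dot Plot

A distributed dot plot shows the univariate distribution of points, split by group. The darker the dots, the more data points are concentrated in that region. Coloring the median differently makes the real position of each group immediately apparent.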
    # Prepare Data
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
    cyl_colors = {4:'tab:red', 5:'tab:green', 6:'tab:blue', 8:'tab:orange'}
    df_raw['cyl_color'] = df_raw.cyl.map(cyl_colors)

    # Mean and Median city mileage by make
    # (selecting 'cty' before aggregating keeps the string column out of the mean)
    df = df_raw.groupby('manufacturer')[['cty']].mean()
    df.sort_values('cty', ascending=False, inplace=True)
    df.reset_index(inplace=True)
    df_median = df_raw.groupby('manufacturer')[['cty']].median()

    # Draw horizontal lines
    fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
    ax.hlines(y=df.index, xmin=0, xmax=40, color='gray', alpha=0.5, linewidth=.5, linestyles='dashdot')

    # Draw the Dots: all observations in white, the group median in red
    for i, make in enumerate(df.manufacturer):
        df_make = df_raw.loc[df_raw.manufacturer==make, :]
        ax.scatter(y=np.repeat(i, df_make.shape[0]), x='cty', data=df_make, s=75, edgecolors='gray', c='w', alpha=0.5)
        ax.scatter(y=i, x='cty', data=df_median.loc[df_median.index==make, :], s=75, c='firebrick')

    # Annotate (raw string so the mathtext spacing commands are not treated as escapes)
    ax.text(33, 13, r"$red \; dots \; are \; the \: median$", fontdict={'size':12}, color='firebrick')

    # Decorations
    red_patch = plt.plot([],[], marker="o", ms=10, ls="", mec=None, color='firebrick', label="Median")
    plt.legend(handles=red_patch)
    ax.set_title('Distribution of City Mileage by Make', fontdict={'size':22})
    ax.set_xlabel('Miles Per Gallon (City)', alpha=0.7)
    ax.set_yticks(df.index)
    ax.set_yticklabels(df.manufacturer.str.title(), fontdict={'horizontalalignment': 'right'}, alpha=0.7)
    ax.set_xlim(1, 40)
    plt.xticks(alpha=0.7)
    plt.gca().spines["top"].set_visible(False)
    plt.gca().spines["bottom"].set_visible(False)
    plt.gca().spines["right"].set_visible(False)
    plt.gca().spines["left"].set_visible(False)
    plt.grid(axis='both', alpha=.4, linewidth=.1)
    plt.show()
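
The row order and the red markers in a dot plot like this come from two per-group statistics: the mean, used for sorting, and the median, used for the highlight. As a rough stdlib-only sketch of that preparation step, with a hypothetical helper `summarize_by_group` and made-up mileage values:

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_group(pairs):
    """Return (group, mean, median) tuples sorted by mean, highest first."""
    groups = defaultdict(list)
    for name, value in pairs:
        groups[name].append(value)
    summary = [(name, mean(vals), median(vals)) for name, vals in groups.items()]
    summary.sort(key=lambda row: row[1], reverse=True)
    return summary


# Hypothetical (manufacturer, city mileage) observations.
pairs = [
    ("honda", 28), ("honda", 24), ("honda", 25),
    ("audi", 18), ("audi", 21), ("audi", 15),
    ("jeep", 14), ("jeep", 9), ("jeep", 17), ("jeep", 15),
]
summary = summarize_by_group(pairs)
```

When the mean and median of a group differ noticeably, as for the hypothetical "jeep" rows here, the distribution is skewed, and that is the kind of detail the red median dot makes visible.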

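Box Plot

A box plot is a good way to visualize a distribution while keeping the median, the 25th and 75th percentiles, and the outliers in view. However, be careful when reading the size of a box, because it can misrepresent the number of points in that group. Annotating each box with its number of observations helps overcome this drawback.

For example, the first two boxes on the left are about the same size even though they contain 5 and 47 observations respectively. Writing the number of observations in each group is therefore necessary.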

    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot
    plt.figure(figsize=(13,10), dpi= 80)
    sns.boxplot(x='class', y='hwy', data=df, notch=False)

    # Add N Obs inside boxplot (optional)
    def add_n_obs(df, group_col, y):
        medians = df.groupby(group_col)[y].median()
        n_obs = df.groupby(group_col)[y].size()
        xticklabels = [t.get_text() for t in plt.gca().get_xticklabels()]
        # Look up each group by its tick label so the counts match the boxes
        for x, label in enumerate(xticklabels):
            plt.text(x, medians[label]*1.01, "#obs : "+str(n_obs[label]), horizontalalignment='center', fontdict={'size':14}, color='white')

    add_n_obs(df, group_col='class', y='hwy')

    # Decoration
    plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22)
    plt.ylim(10, 40)
    plt.show()
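
For readers who want to check what a box actually encodes, the sketch below computes the usual box plot statistics with the standard library: quartiles, whiskers limited to 1.5 times the interquartile range, and outliers beyond them. The helper name `box_stats` and the highway mileage values are hypothetical, and the 1.5 × IQR rule is the common default rather than something stated in this article.

```python
from statistics import quantiles


def box_stats(values):
    """Compute quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    # method="inclusive" interpolates linearly between the observed data points.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "n": len(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical highway mileage values for one vehicle class.
hwy = [20, 22, 23, 24, 25, 26, 27, 29, 44]
stats = box_stats(hwy)
```

Note that `n` is the only field that the box itself does not show, which is why annotating the number of observations is worthwhile.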

Dot + Box Plot

A dot + box plot conveys information similar to a grouped box plot. In addition, the dots give a sense of how many data points lie in each group.
    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot
    plt.figure(figsize=(13,10), dpi= 80)
    sns.boxplot(x='class', y='hwy', data=df, hue='cyl')
    sns.stripplot(x='class', y='hwy', data=df, color='black', size=3, jitter=1)

    # Separate neighboring classes with faint vertical lines
    for i in range(len(df['class'].unique())-1):
        plt.vlines(i+.5, 10, 45, linestyles='solid', colors='gray', alpha=0.2)

    # Decoration
    plt.title('Box Plot of Highway Mileage by Vehicle Class', fontsize=22)
    plt.legend(title='Cylinders')
    plt.show()

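Violin Plot

A violin plot is a visually pleasing alternative to a box plot. The shape or area of the violin depends on the number of observations it holds. However, violin plots can be harder to read and are not commonly used in professional settings.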
    # Import Data
    df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Draw Plot
    plt.figure(figsize=(13,10), dpi= 80)
    sns.violinplot(x='class', y='hwy', data=df, scale='width', inner='quartile')

    # Decoration
    plt.title('Violin Plot of Highway Mileage by Vehicle Class', fontsize=22)
    plt.show()

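Population Pyramid

A population pyramid can be used to show the distribution of groups ordered by volume. It can also show the stage-by-stage filtering of a population, as in the example below, which shows how many people pass through each stage of a marketing funnel.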
    # Read data
    df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/email_campaign_funnel.csv")

    # Draw Plot
    plt.figure(figsize=(13,10), dpi= 80)
    group_col = 'Gender'
    order_of_bars = df.Stage.unique()[::-1]
    colors = [plt.cm.Spectral(i/float(len(df[group_col].unique())-1)) for i in range(len(df[group_col].unique()))]

    # One set of horizontal bars per group, drawn on the same axes
    for c, group in zip(colors, df[group_col].unique()):
        sns.barplot(x='Users', y='Stage', data=df.loc[df[group_col]==group, :], order=order_of_bars, color=c, label=group)

    # Decorations
    plt.xlabel("$Users$")
    plt.ylabel("Stage of Purchase")
    plt.yticks(fontsize=12)
    plt.title("Population Pyramid of the Marketing Funnel", fontsize=22)
    plt.legend()
    plt.show()

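Categorical Plots

The categorical plots provided by the seaborn library can be used to visualize the count distribution of two or more categorical variables in relation to each other.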
    # Load Dataset
    titanic = sns.load_dataset("titanic")

    # Plot: counts of the "alive" variable, one panel per deck
    g = sns.catplot(x="alive", col="deck", col_wrap=4,
                    data=titanic[titanic.deck.notnull()],
                    kind="count", height=3.5, aspect=.8,
                    palette='tab20')

    plt.gcf().suptitle('sf')
    plt.show()

    # Load Dataset
    titanic = sns.load_dataset("titanic")

    # Plot: age distribution by embarkation town and sex, one panel per class
    sns.catplot(x="age", y="embark_town",
                hue="sex", col="class",
                data=titanic[titanic.embark_town.notnull()],
                orient="h", height=5, aspect=1, palette="tab10",
                kind="violin", dodge=True, cut=0, bw=.2)
    plt.show()


Composition

Waffle Chart

A waffle chart can be created with the pywaffle package and is used to show the composition of groups within a larger population.
    # Install the package first: pip install pywaffle
    # Reference: https://stackoverflow.com/questions/41400136/how-to-do-waffle-charts-in-python-square-piechart
    from pywaffle import Waffle

    # Import
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('class').size().reset_index(name='counts')
    n_categories = df.shape[0]
    colors = [plt.cm.inferno_r(i/float(n_categories)) for i in range(n_categories)]

    # Draw Plot and Decorate
    # itertuples() yields (index, class, counts), so the label uses n[1] and n[2]
    fig = plt.figure(
        FigureClass=Waffle,
        plots={
            '111': {
                'values': df['counts'],
                'labels': ["{0} ({1})".format(n[1], n[2]) for n in df[['class', 'counts']].itertuples()],
                'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12},
                'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18}
            },
        },
        rows=7,
        colors=colors,
        figsize=(16, 9)
    )
    plt.show()

    # Install the package first: pip install pywaffle
    from pywaffle import Waffle

    # Import
    # df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    # By Class Data
    df_class = df_raw.groupby('class').size().reset_index(name='counts_class')
    n_categories = df_class.shape[0]
    colors_class = [plt.cm.Set3(i/float(n_categories)) for i in range(n_categories)]

    # By Cylinders Data
    df_cyl = df_raw.groupby('cyl').size().reset_index(name='counts_cyl')
    n_categories = df_cyl.shape[0]
    colors_cyl = [plt.cm.Spectral(i/float(n_categories)) for i in range(n_categories)]

    # By Make Data
    df_make = df_raw.groupby('manufacturer').size().reset_index(name='counts_make')
    n_categories = df_make.shape[0]
    colors_make = [plt.cm.tab20b(i/float(n_categories)) for i in range(n_categories)]

    # Draw Plot and Decorate
    fig = plt.figure(
        FigureClass=Waffle,
        plots={
            '311': {
                'values': df_class['counts_class'],
                'labels': ["{1}".format(n[0], n[1]) for n in df_class[['class', 'counts_class']].itertuples()],
                'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Class'},
                'title': {'label': '# Vehicles by Class', 'loc': 'center', 'fontsize':18},
                'colors': colors_class
            },
            '312': {
                'values': df_cyl['counts_cyl'],
                'labels': ["{1}".format(n[0], n[1]) for n in df_cyl[['cyl', 'counts_cyl']].itertuples()],
                'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Cyl'},
                'title': {'label': '# Vehicles by Cyl', 'loc': 'center', 'fontsize':18},
                'colors': colors_cyl
            },
            '313': {
                'values': df_make['counts_make'],
                'labels': ["{1}".format(n[0], n[1]) for n in df_make[['manufacturer', 'counts_make']].itertuples()],
                'legend': {'loc': 'upper left', 'bbox_to_anchor': (1.05, 1), 'fontsize': 12, 'title':'Manufacturer'},
                'title': {'label': '# Vehicles by Make', 'loc': 'center', 'fontsize':18},
                'colors': colors_make
            }
        },
        rows=9,
        figsize=(16, 14)
    )
    plt.show()
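
A waffle chart has to turn group counts into a whole number of squares. One common way to do this is the largest-remainder method, sketched below in plain Python. This is only an illustration of the idea, not a description of how pywaffle allocates its blocks; `waffle_cells`, the three counts, and the grid size of 70 cells are hypothetical.

```python
def waffle_cells(counts, total_cells):
    """Split total_cells among groups in proportion to their counts.

    Uses the largest-remainder method: round every share down, then hand the
    leftover cells to the groups with the largest fractional parts.
    """
    total = sum(counts.values())
    exact = {k: v * total_cells / total for k, v in counts.items()}
    cells = {k: int(share) for k, share in exact.items()}
    leftover = total_cells - sum(cells.values())
    by_remainder = sorted(counts, key=lambda k: exact[k] - cells[k], reverse=True)
    for k in by_remainder[:leftover]:
        cells[k] += 1
    return cells


# Hypothetical group counts shown on a 7 x 10 grid.
counts = {"compact": 47, "midsize": 41, "suv": 62}
cells = waffle_cells(counts, total_cells=70)
```

Rounding each share independently could produce 69 or 71 squares. Distributing the leftover cells explicitly guarantees that the grid is filled exactly.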

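Pie Chart

A pie chart is the classic way to show composition. However, it is generally discouraged nowadays because the areas of the slices can be misleading. If you do use a pie chart, it is strongly recommended to write the percentage or number for each slice explicitly.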
    # Import
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('class').size()

    # Make the plot with pandas
    df.plot(kind='pie', subplots=True, figsize=(8, 8))
    plt.title("Pie Chart of Vehicle Class - Bad")
    plt.ylabel("")
    plt.show()

    # Import
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('class').size().reset_index(name='counts')

    # Draw Plot
    fig, ax = plt.subplots(figsize=(12, 7), subplot_kw=dict(aspect="equal"), dpi= 80)

    data = df['counts']
    categories = df['class']
    explode = [0,0,0,0,0,0.1,0]

    def func(pct, allvals):
        # Round instead of truncating so the recovered count is not off by one
        absolute = int(np.round(pct/100.*np.sum(allvals)))
        return "{:.1f}% ({:d})".format(pct, absolute)

    wedges, texts, autotexts = ax.pie(data,
                                      autopct=lambda pct: func(pct, data),
                                      textprops=dict(color="w"),
                                      colors=plt.cm.Dark2.colors,
                                      startangle=140,
                                      explode=explode)

    # Decoration
    ax.legend(wedges, categories, title="Vehicle Class", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
    plt.setp(autotexts, size=10, weight=700)
    ax.set_title("Class of Vehicles: Pie Chart")
    plt.show()
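
The labels on the annotated pie combine a percentage with an absolute count. The sketch below builds such labels directly from the counts and also shows why converting a percentage back to a count with `int()` can be off by one. The helper `pie_labels` and the three counts are hypothetical.

```python
def pie_labels(counts):
    """Build 'percent (count)' labels straight from the raw counts."""
    total = sum(counts.values())
    return {
        name: "{:.1f}% ({:d})".format(100 * n / total, n)
        for name, n in counts.items()
    }


# Hypothetical group counts.
counts = {"compact": 47, "midsize": 41, "suv": 62}
labels = pie_labels(counts)

# Floating-point pitfall: 0.29 * 100 is slightly below 29,
# so truncating loses a unit while rounding does not.
truncated = int(0.29 * 100)
rounded = round(0.29 * 100)
```

Working from the counts avoids the round trip through a percentage altogether. When only the percentage is available, as inside an `autopct` callback, rounding is the safer conversion.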

Treemap

A treemap is similar to a pie chart, but it does the job better because it does not mislead about the contribution of each group.

    # Install the package first: pip install squarify
    import squarify

    # Import Data
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data
    df = df_raw.groupby('class').size().reset_index(name='counts')
    labels = df.apply(lambda x: str(x['class']) + "\n (" + str(x['counts']) + ")", axis=1)
    sizes = df['counts'].values.tolist()
    colors = [plt.cm.Spectral(i/float(len(labels))) for i in range(len(labels))]

    # Draw Plot
    plt.figure(figsize=(12,8), dpi= 80)
    squarify.plot(sizes=sizes, label=labels, color=colors, alpha=.8)

    # Decorate
    plt.title('Treemap of Vehicle Class')
    plt.axis('off')
    plt.show()

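Bar Chart

A bar chart is the classic way to visualize items based on counts or any given metric. In the chart below I used a different color for each item, but you would normally pick a single color for all items unless you are coloring them by group. The color names are stored in all_colors in the code below. You can change the color of the bars by setting the color parameter in plt.bar().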
    import random

    # Import Data
    df_raw = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")

    # Prepare Data: one randomly chosen named color per bar
    df = df_raw.groupby('manufacturer').size().reset_index(name='counts')
    n = len(df)
    all_colors = list(plt.cm.colors.cnames.keys())
    random.seed(100)
    c = random.choices(all_colors, k=n)

    # Plot Bars
    plt.figure(figsize=(16,10), dpi= 80)
    plt.bar(df['manufacturer'], df['counts'], color=c, width=.5)
    for i, val in enumerate(df['counts'].values):
        plt.text(i, val, float(val), horizontalalignment='center', verticalalignment='bottom', fontdict={'fontweight':500, 'size':12})

    # Decoration
    plt.xticks(rotation=60, horizontalalignment='right')
    plt.title("Number of Vehicles by Manufacturers", fontsize=22)
    plt.ylabel('# Vehicles')
    plt.ylim(0, 45)
    plt.show()



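This article is reposted from 做一个柔情的程序猿. If you believe it infringes your rights, email contact@modb.pro with supporting evidence. Once verified, 墨天轮 will remove the content promptly.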
