PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout

HsiaoYuan Hsu 1,2, Xiangteng He 1,2, Yuxin Peng 1,2, Hao Kong and Qing Zhang

Wangxuan Institute of Computer Technology, Peking University

National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University

Meituan

kslh99@stu.pku.edu.cn, {hexiangteng, pengyuxin, konghao}@pku.edu.cn, zhangqing31@meituan.com

Abstract

Content-aware visual-textual presentation layout aims at arranging space on the given canvas for pre-defined elements, including text, logo, and underlay, which is key to automatic template-free creative graphic design. In practical applications, e.g., poster designs, the canvas is originally non-empty, and both inter-element and inter-layer relationships should be considered when generating a proper layout. A few recent works deal with them simultaneously, but they still suffer from poor graphic performance, such as a lack of layout variety or spatial non-alignment. Since content-aware visual-textual presentation layout is a novel task, we first construct a new dataset named PKU PosterLayout, which consists of 9,974 poster-layout pairs and 905 images, i.e., non-empty canvases. It is more challenging and useful owing to its greater layout variety, domain diversity, and content diversity. Then, we propose design sequence formation (DSF), which reorganizes elements in layouts to imitate the design processes of human designers, and we present a novel CNN-LSTM-based conditional generative adversarial network (GAN) to generate proper layouts. Specifically, the discriminator is design-sequence-aware and will supervise the "design" process of the generator. Experimental results verify the usefulness of the new benchmark and the effectiveness of the proposed approach, which achieves the best performance by generating suitable layouts for diverse canvases. The dataset and the source code are available at https://github.com/PKU-ICST-MIPL/PosterLayout-CVPR2023.

1. Introduction

Nowadays, visual-textual presentation, which renders informative and decorative elements on an image, i.e., canvas, is widely used to convey information in forms such as advertisement posters [5, 13, 16], magazines [20, 22], and so on [4, 10, 15]. The basis of these creative works is the layout that indicates the spatial structure of the arranged elements, as shown in Fig. 1, which is also a key factor influencing their effectiveness and aesthetics. Because of their popularity and usefulness, not only experienced designers but also novices commonly need to create them. People resort to pre-defined templates when they lack the prerequisite skills or need mass production. However, these templates severely limit the flexibility and diversity of the presentations. These drawbacks of relying on templates highlight the importance and practicality of template-free creative graphic design, which can be preliminarily satisfied by automatically generating visual-textual presentation layouts.

Figure 1. Content-aware visual-textual presentation layout: (a) Non-empty canvas; (b) Content-aware layout; (c) An example of rendered presentation applying (b). Legend: Logo, Text, Underlay.

Corresponding author.

With advances in deep learning and big data, more and more data-driven approaches for visual-textual presentation layout have emerged in this decade. However, most of them have been devoted only to mining the relationships between elements and have seldom considered those between layers, i.e., layout and canvas. Without proper constraints, elements are prone to cover the salient contents in the canvas, causing a severe occlusion problem. For example, in advertisement poster design, one of the most content-rich presentations, the product in the canvas clearly should not be over-occluded. A few works [1, 23] deal with inter-element and inter-layer relationships simultaneously, but they still suffer from poor graphic performance, such as a lack of layout variety or spatial non-alignment. To this end, we propose a CNN-LSTM-based generative adversarial network (GAN) conditioned on the input canvases to generate layouts, which achieves balanced performance on both graphic and content-aware metrics.

arXiv:2303.15937v1 [cs.CV] 28 Mar 2023

CNN-LSTM has proved effective in time series forecasting and behavior analysis tasks [6, 14]. To enable this time-sensitive model in layout generation, we propose design sequence formation (DSF) to generate design sequences that imitate the design processes of human designers. In particular, elements in layouts are reorganized to involve implicit temporal features, and less important ones can be discarded painlessly. It is in line with the logic of human-computer interaction [5] and has the potential to help train the LSTM model on a training set of size smaller than 20,000 [18]. GAN is a generative model that contains a discriminator and a generator competing against each other to learn the distribution of training data. In the proposed design sequence GAN (DS-GAN), the discriminator is design-sequence-aware and will supervise the "design" process, i.e., generated layouts, of the generator under the constraints of the given canvas. As far as we know, this paper is the first adoption of CNN-LSTM in layout generation.
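The reorder-then-truncate idea behind DSF can be illustrated with a minimal sketch. The ordering rule used here (a hypothetical class priority followed by descending area) and the names `Element`, `CLASS_PRIORITY`, and `form_design_sequence` are assumptions for illustration only; this passage does not specify the actual DSF rule.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical priority (assumption): a lower value is placed earlier.
CLASS_PRIORITY = {"text": 0, "logo": 1, "underlay": 2}


@dataclass(frozen=True)
class Element:
    cls: str                                # "text", "logo", or "underlay"
    box: Tuple[float, float, float, float]  # (left, top, right, bottom), normalized

    @property
    def area(self) -> float:
        left, top, right, bottom = self.box
        return max(0.0, right - left) * max(0.0, bottom - top)


def form_design_sequence(layout: List[Element], max_len: int) -> List[Element]:
    """Order elements into a sequence and keep only the first max_len.

    Elements at the tail of the ordering are treated as the least
    important ones and are the first to be discarded.
    """
    ordered = sorted(layout, key=lambda e: (CLASS_PRIORITY[e.cls], -e.area))
    return ordered[:max_len]


layout = [
    Element("underlay", (0.1, 0.7, 0.9, 0.9)),
    Element("text", (0.2, 0.75, 0.8, 0.85)),
    Element("logo", (0.05, 0.05, 0.2, 0.12)),
    Element("text", (0.1, 0.1, 0.9, 0.3)),
]
sequence = form_design_sequence(layout, max_len=3)
print([e.cls for e in sequence])  # ['text', 'text', 'logo']
```

With a fixed `max_len`, every layout becomes a sequence of bounded length, which is the kind of input a recurrent model such as an LSTM consumes step by step.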

Since content-aware visual-textual presentation layout remains a novel task, there is only one public dataset in the field, and it has insufficient variety. In this paper, we first construct and release a new dataset and benchmark named PKU PosterLayout, which consists of 9,974 poster-layout pairs and 905 images, i.e., non-empty canvases. Each layout is represented by a set of elements labeled with class and bounding box. We collect data from multiple sources to guarantee diversity and variety in content, domain, and layout, making it a challenging benchmark that we expect to encourage further research. Besides the dataset, we propose and clearly define new metrics to accompany the existing ones, for a total of eight graphic and content-aware metrics. They evaluate the layouts in terms of utilization, non-occlusion, and aesthetics. Both quantitative and visualized results show that the proposed approach outperforms other approaches by generating proper layouts on diverse canvases.
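As a rough illustration of the representation described above (a layout as a set of elements, each with a class label and a bounding box) and of what a non-occlusion check might look at, consider the sketch below. The field names, the integer pixel-box convention, and the toy `occluded_fraction` helper are assumptions made for illustration; the benchmark's file format and its eight metrics are not specified in this passage.

```python
from typing import List, NamedTuple


class LayoutElement(NamedTuple):
    label: str   # e.g. "text", "logo", "underlay"
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def occluded_fraction(saliency: List[List[int]], layout: List[LayoutElement]) -> float:
    """Fraction of salient cells (value 1) covered by at least one element."""
    salient = covered = 0
    for y, row in enumerate(saliency):
        for x, value in enumerate(row):
            if value != 1:
                continue
            salient += 1
            if any(e.left <= x < e.right and e.top <= y < e.bottom for e in layout):
                covered += 1
    return covered / salient if salient else 0.0


# 4x4 toy canvas whose salient content occupies the central 2x2 block.
saliency = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
layout = [
    LayoutElement("text", left=0, top=0, right=4, bottom=2),  # covers rows 0 and 1
    LayoutElement("logo", left=3, top=3, right=4, bottom=4),  # bottom-right corner
]
print(occluded_fraction(saliency, layout))  # 0.5
```

A lower value means that less of the salient content is hidden by layout elements.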

We summarize the contributions of this paper as follows:

• A new and more challenging dataset and benchmark for content-aware visual-textual presentation layout, PKU PosterLayout, which consists of 9,974 poster-layout pairs and 905 images, with greater diversity and variety in content, domain, and layout.

• An algorithm for design sequence formation (DSF) that converts plain layout data into design sequences involving temporal features by imitating the design process of human designers.

• A CNN-LSTM-based GAN, design sequence GAN (DS-GAN), which is conditioned on images and learns the distribution of design sequences to generate content-aware visual-textual presentation layouts. It makes a good trade-off between graphic and content-aware metrics and outperforms the other approaches.
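For reference, the adversarial game played by a conditional GAN is commonly written as the textbook minimax objective below. The symbols are assumptions introduced here rather than notation from the paper: $c$ denotes the input canvas, $x$ a real design sequence paired with it, $z$ a noise vector, and $G$ and $D$ the generator and the discriminator. The loss actually used to train DS-GAN may differ from this form.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{(x,c)\sim p_{\mathrm{data}}}\bigl[\log D(x \mid c)\bigr]
+ \mathbb{E}_{z\sim p_{z},\; c\sim p_{\mathrm{data}}}\bigl[\log\bigl(1 - D(G(z \mid c) \mid c)\bigr)\bigr]
```

The first term rewards the discriminator for accepting real sequences given their canvas. The second rewards it for rejecting generated ones, while the generator is trained to make that second term small.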

2. Related Work

Research on content-agnostic visual-textual presentation, which assumes the given canvas is empty, has developed over a relatively long time. O'Donovan et al. [15] proposed an energy-based model that penalizes the parts of layouts that violate pre-defined, complex design principles and thus could obtain a more desirable layout after non-linear inverse optimization. The authors further presented a system [16] adopting this model with simpler principles, such as the size of elements and pair alignment, to alleviate the time-consuming problem of heuristics.
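Scoring a layout by penalizing violated design principles can be sketched as a weighted sum of penalty terms. The two terms below (left-edge misalignment between element pairs and deviation from a target element area), together with all names and weights, are hypothetical simplifications; they are not the energy terms of O'Donovan et al.

```python
from itertools import combinations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom), normalized


def alignment_penalty(boxes: List[Box]) -> float:
    """Sum over element pairs of the distance between their left edges."""
    return sum(abs(a[0] - b[0]) for a, b in combinations(boxes, 2))


def size_penalty(boxes: List[Box], target_area: float) -> float:
    """Sum of absolute deviations of element areas from a target area."""
    return sum(
        abs((right - left) * (bottom - top) - target_area)
        for left, top, right, bottom in boxes
    )


def energy(boxes: List[Box], w_align: float = 1.0, w_size: float = 1.0,
           target_area: float = 0.1) -> float:
    """Weighted sum of penalties; a lower energy means fewer violations."""
    return w_align * alignment_penalty(boxes) + w_size * size_penalty(boxes, target_area)


aligned = [(0.1, 0.1, 0.6, 0.3), (0.1, 0.5, 0.6, 0.7)]
ragged = [(0.1, 0.1, 0.6, 0.3), (0.3, 0.5, 0.8, 0.7)]
print(energy(aligned) < energy(ragged))  # True
```

An optimizer would then search over box coordinates for a layout with low energy, which is where the cost of such heuristic approaches comes from.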

Li et al. proposed LayoutGAN [12], taking a big step forward in data-driven approaches by introducing GANs into layout tasks. It adopted a differentiable wireframe rendering layer that flattens layouts and canvases into wireframe images, so that the discrimination process remains an image classification problem. Unlike a conventional GAN, it started from a random initial layout that is primitively valid and modulated it into an eligible one instead of synthesizing layouts from fully random noise. The authors further presented an attribute-conditioned LayoutGAN [13] that guides the layout with the given element attributes, such as minimum size, fixed aspect ratio, and reading order of elements. Moreover, it employed element dropout in the discrimination process, forcing the discriminator to be aware of the local patterns of layouts, which is helpful in visual-textual presentation layout. Besides the element attributes, Zheng et al. [22] demonstrated the efficacy of considering the visual and textual semantics of the elements and presentation topics. They proposed an embedding network fusing cross-modal features to condition the GAN.
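Element dropout, as described above, amounts to showing the discriminator partial layouts. A minimal sketch follows. The function name `drop_elements`, the per-element drop probability, and the choice to always keep at least one element are assumptions for illustration and are not taken from the attribute-conditioned LayoutGAN paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def drop_elements(elements: Sequence[T], drop_prob: float, rng: random.Random) -> List[T]:
    """Return a copy of `elements` with each item removed with probability drop_prob.

    Assumption: at least one element is always kept, so the discriminator
    never receives an empty layout.
    """
    if not elements:
        return []
    kept = [e for e in elements if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(list(elements))]


rng = random.Random(0)  # seeded only to make the example repeatable
layout = ["text_0", "text_1", "logo_0", "underlay_0"]
partial = drop_elements(layout, drop_prob=0.5, rng=rng)
print(partial)
```

Because the discriminator must judge subsets of a layout, it cannot rely on global statistics alone and has to attend to how individual elements are placed.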

Kikuchi et al. proposed LayoutGAN++ [9], demonstrating an improvement in handling user-specific constraints by optimizing layouts in latent space. It dispensed with wireframe images in light of the finding that the rendering layer is unstable with a dataset of limited size. Similarly, Lee et al. [10] were concerned with user-specific constraints and dealt with them using a graph neural network that models elements as nodes and their relationships as edges. It should be clarified that these user-specific constraints are merely inter-layout and are insufficient for the task of interest in this paper. Specifically, content-aware visual-textual presentation layout concerns both inter-layout and inter-layer relationships, i.e., layout and canvas, and is driven by the canvas with no mandatory constraints attached. However, the ideas behind these content-agnostic approaches are still
