Generative AI

The latest developments in AIGC, including AI image generation, video generation, and music creation.

Build with Nano Banana 2, our best image generation and editing model

Nano Banana 2 (Gemini 3.1 Flash Image) delivers Pro-level intelligence and fidelity for all image applications.

Google launches Nano Banana 2 model with faster image generation

Google is making Nano Banana 2 a default model in the Gemini app and in AI Mode

Nano Banana 2: Combining Pro capabilities with lightning-fast speed

A new frontier in artificial intelligence has emerged with the unveiling of an advanced image generation mode...

Kling 3.0 tops the global video generation model leaderboard

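36Kr has learned that Artificial Analysis, a well-known AI benchmarking organization, recently released its latest global leaderboard of video generation models. The Kling 3.0 series model (Kling 3.0 Pro) ranks first in the text-to-video track with an Arena ELO score of 1240, and among the top 15...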

OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model

arXiv:2602.12304v2 Announce Type: replace-cross Abstract: Existing mainstream video customization methods focus on gener...

Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models

arXiv:2602.10953v2 Announce Type: replace-cross Abstract: Diffusion Language Models (DLMs) generate text by iteratively ...

Monocular Normal Estimation via Shading Sequence Estimation

arXiv:2602.09929v3 Announce Type: replace-cross Abstract: Monocular normal estimation aims to estimate the normal map fr...

World Simulation with Video Foundation Models for Physical AI

arXiv:2511.00062v2 Announce Type: replace-cross Abstract: We introduce [Cosmos-Predict2.5], the latest generation of the...

Diversity Boosts AI-Generated Text Detection

arXiv:2509.18880v3 Announce Type: replace-cross Abstract: Detecting AI-generated text is an increasing necessity to comb...

Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise

arXiv:2310.17167v2 Announce Type: replace-cross Abstract: This paper introduces two key contributions aimed at improving...

Retrieval Challenges in Low-Resource Public Service Information: A Case Study on Food Pantry Access

arXiv:2602.21598v1 Announce Type: cross Abstract: Public service information systems are often fragmented, inconsistentl...

Revisiting RAG Retrievers: An Information Theoretic Benchmark

arXiv:2602.21553v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems rely critically on the re...

Provably Safe Generative Sampling with Constricting Barrier Functions

arXiv:2602.21429v1 Announce Type: cross Abstract: Flow-based generative models, such as diffusion models and flow matchi...

Make Every Draft Count: Hidden State based Speculative Decoding

arXiv:2602.21224v1 Announce Type: cross Abstract: Speculative decoding has emerged as a pivotal technique to accelerate ...

EPSVec: Efficient and Private Synthetic Data Generation via Dataset Vectors

arXiv:2602.21218v1 Announce Type: cross Abstract: High-quality data is essential for modern machine learning, yet many v...

Microsoft research published in Nature: etching human civilization into glass to preserve it for 10,000 years

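Editor: 冷猫. Humanity has an obsession: preserving forever the civilizational data we take pride in. Ever since the Golden Record aboard Voyager 1, the idea has carried a romantic aura. The record portrays life on Earth in sounds and images. At launch, its producer, Dr. Sagan, said: "Only if there are advanced spacefaring civilizations in interstellar space will the...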

When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators

arXiv:2602.19946v2 Announce Type: replace-cross Abstract: Recent text-to-image (T2I) diffusion models produce visually s...

Language Modeling and Understanding Through Paraphrase Generation and Detection

arXiv:2602.08274v3 Announce Type: replace-cross Abstract: Language enables humans to share knowledge, reason about the w...

HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment

arXiv:2512.24787v2 Announce Type: replace-cross Abstract: Slate recommendation, which presents users with a ranked item ...

Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

arXiv:2511.17844v3 Announce Type: replace-cross Abstract: Fine-tuning large-scale text-to-video diffusion models to add ...

Latent-Augmented Discrete Diffusion Models

arXiv:2510.18114v2 Announce Type: replace-cross Abstract: Discrete diffusion models have emerged as a powerful class of ...

PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models

arXiv:2509.25774v3 Announce Type: replace-cross Abstract: While reinforcement learning has advanced the alignment of tex...

Polychromic Objectives for Reinforcement Learning

arXiv:2509.25424v3 Announce Type: replace-cross Abstract: Reinforcement learning fine-tuning (RLFT) is a dominant paradi...

Diffusion Generative Recommendation with Continuous Tokens

arXiv:2504.12007v5 Announce Type: replace-cross Abstract: Recent advances in generative artificial intelligence, particu...

A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers

arXiv:2502.01310v4 Announce Type: replace-cross Abstract: Neural network-based optimal transport (OT) is a recent and fr...

ICE-ID: A Novel Historical Census Dataset for Longitudinal Identity Resolution

arXiv:2506.13792v2 Announce Type: replace Abstract: We introduce ICE-ID, a benchmark dataset comprising 984,028...

TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer

arXiv:2602.20643v1 Announce Type: cross Abstract: Mobility trajectories are essential for understanding urban dynamics a...

LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

arXiv:2602.20497v1 Announce Type: cross Abstract: Diffusion models have achieved remarkable success in image and video g...

VINA: Variational Invertible Neural Architectures

arXiv:2602.20480v1 Announce Type: cross Abstract: The distinctive architectural features of normalizing flows (NFs), not...

Fast Spectrogram Event Extraction via Offline Self-Supervised Learning: From Fusion Diagnostics to Bioacoustics

arXiv:2602.20317v1 Announce Type: cross Abstract: Next-generation fusion facilities like ITER face a "data deluge," gene...

Shape-informed cardiac mechanics surrogates in data-scarce regimes via geometric encoding and generative augmentation

arXiv:2602.20306v1 Announce Type: cross Abstract: High-fidelity computational models of cardiac mechanics provide mechan...

InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation

arXiv:2602.20294v1 Announce Type: cross Abstract: Simulating real personalities with large language models requires grou...

CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

arXiv:2602.20213v1 Announce Type: cross Abstract: The evaluation of Large Language Models (LLMs) for code generation rel...

Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling

arXiv:2602.20210v1 Announce Type: cross Abstract: Crystal modeling spans a family of conditional and unconditional gener...

When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks

arXiv:2602.20193v1 Announce Type: cross Abstract: Standard evaluations of backdoor attacks on text-to-image (T2I) models...

PreScience: A Benchmark for Forecasting Scientific Contributions

arXiv:2602.20459v1 Announce Type: new Abstract: Can AI systems trained on the scientific record up to a fixed point in t...

Diffusion Modulation via Environment Mechanism Modeling for Planning

arXiv:2602.20422v1 Announce Type: new Abstract: Diffusion models have shown promising capabilities in trajectory generat...

Seedance 2.0 might be gen AI video’s next big hope, but it’s still slop

When Irish filmmaker Ruairi Robinson began uploading a series of short clips created with Seedance 2.0 - TikTok develope...