Generative AI
The latest developments in AIGC, including AI image generation, video generation, and music creation.
Build with Nano Banana 2, our best image generation and editing model
Nano Banana 2 (Gemini 3.1 Flash Image) delivers Pro-level intelligence and fidelity for all image applications.
Google launches Nano Banana 2 model with faster image generation
Google is making Nano Banana 2 a default model in Gemini app and in AI mode
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
A new frontier in artificial intelligence has emerged with the unveiling of an advanced image generation mode...
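Kling 3.0 tops the global video generation model leaderboard
36Kr has learned that Artificial Analysis, a globally recognized AI benchmarking organization, recently released its latest global leaderboard of video generation models. The Kling 3.0 series (Kling 3.0 Pro) ranked first in the text-to-video track with an Arena ELO benchmark score of 1240, and among the top 15...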
OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model
arXiv:2602.12304v2 Announce Type: replace-cross Abstract: Existing mainstream video customization methods focus on gener...
Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models
arXiv:2602.10953v2 Announce Type: replace-cross Abstract: Diffusion Language Models (DLMs) generate text by iteratively ...
Monocular Normal Estimation via Shading Sequence Estimation
arXiv:2602.09929v3 Announce Type: replace-cross Abstract: Monocular normal estimation aims to estimate the normal map fr...
World Simulation with Video Foundation Models for Physical AI
arXiv:2511.00062v2 Announce Type: replace-cross Abstract: We introduce [Cosmos-Predict2.5], the latest generation of the...
Diversity Boosts AI-Generated Text Detection
arXiv:2509.18880v3 Announce Type: replace-cross Abstract: Detecting AI-generated text is an increasing necessity to comb...
Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise
arXiv:2310.17167v2 Announce Type: replace-cross Abstract: This paper introduces two key contributions aimed at improving...
Retrieval Challenges in Low-Resource Public Service Information: A Case Study on Food Pantry Access
arXiv:2602.21598v1 Announce Type: cross Abstract: Public service information systems are often fragmented, inconsistentl...
Revisiting RAG Retrievers: An Information Theoretic Benchmark
arXiv:2602.21553v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems rely critically on the re...
Provably Safe Generative Sampling with Constricting Barrier Functions
arXiv:2602.21429v1 Announce Type: cross Abstract: Flow-based generative models, such as diffusion models and flow matchi...
Make Every Draft Count: Hidden State based Speculative Decoding
arXiv:2602.21224v1 Announce Type: cross Abstract: Speculative decoding has emerged as a pivotal technique to accelerate ...
EPSVec: Efficient and Private Synthetic Data Generation via Dataset Vectors
arXiv:2602.21218v1 Announce Type: cross Abstract: High-quality data is essential for modern machine learning, yet many v...
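Microsoft research published in Nature: etching human civilization into glass to preserve it for ten thousand years
Editor: 冷猫. Humanity has an obsession with preserving the civilizational data we are proud of forever. Ever since the Golden Record aboard Voyager 1, the whole endeavor has carried a romantic tint. The Golden Record depicts life on Earth through sound and images. At launch, its producer Dr. Sagan said: "Only if advanced spacefaring civilizations exist in interstellar space will the...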
When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators
arXiv:2602.19946v2 Announce Type: replace-cross Abstract: Recent text-to-image (T2I) diffusion models produce visually s...
Language Modeling and Understanding Through Paraphrase Generation and Detection
arXiv:2602.08274v3 Announce Type: replace-cross Abstract: Language enables humans to share knowledge, reason about the w...
HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment
arXiv:2512.24787v2 Announce Type: replace-cross Abstract: Slate recommendation, which presents users with a ranked item ...
Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation
arXiv:2511.17844v3 Announce Type: replace-cross Abstract: Fine-tuning large-scale text-to-video diffusion models to add ...
Latent-Augmented Discrete Diffusion Models
arXiv:2510.18114v2 Announce Type: replace-cross Abstract: Discrete diffusion models have emerged as a powerful class of ...
PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models
arXiv:2509.25774v3 Announce Type: replace-cross Abstract: While reinforcement learning has advanced the alignment of tex...
Polychromic Objectives for Reinforcement Learning
arXiv:2509.25424v3 Announce Type: replace-cross Abstract: Reinforcement learning fine-tuning (RLFT) is a dominant paradi...
Diffusion Generative Recommendation with Continuous Tokens
arXiv:2504.12007v5 Announce Type: replace-cross Abstract: Recent advances in generative artificial intelligence, particu...
A Statistical Learning Perspective on Semi-dual Adversarial Neural Optimal Transport Solvers
arXiv:2502.01310v4 Announce Type: replace-cross Abstract: Neural network-based optimal transport (OT) is a recent and fr...
ICE-ID: A Novel Historical Census Dataset for Longitudinal Identity Resolution
arXiv:2506.13792v2 Announce Type: replace Abstract: We introduce ICE-ID, a benchmark dataset comprising 984,028...
TrajGPT-R: Generating Urban Mobility Trajectory with Reinforcement Learning-Enhanced Generative Pre-trained Transformer
arXiv:2602.20643v1 Announce Type: cross Abstract: Mobility trajectories are essential for understanding urban dynamics a...
LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration
arXiv:2602.20497v1 Announce Type: cross Abstract: Diffusion models have achieved remarkable success in image and video g...
VINA: Variational Invertible Neural Architectures
arXiv:2602.20480v1 Announce Type: cross Abstract: The distinctive architectural features of normalizing flows (NFs), not...
Fast Spectrogram Event Extraction via Offline Self-Supervised Learning: From Fusion Diagnostics to Bioacoustics
arXiv:2602.20317v1 Announce Type: cross Abstract: Next-generation fusion facilities like ITER face a "data deluge," gene...
Shape-informed cardiac mechanics surrogates in data-scarce regimes via geometric encoding and generative augmentation
arXiv:2602.20306v1 Announce Type: cross Abstract: High-fidelity computational models of cardiac mechanics provide mechan...
InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation
arXiv:2602.20294v1 Announce Type: cross Abstract: Simulating real personalities with large language models requires grou...
CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
arXiv:2602.20213v1 Announce Type: cross Abstract: The evaluation of Large Language Models (LLMs) for code generation rel...
Multimodal Crystal Flow: Any-to-Any Modality Generation for Unified Crystal Modeling
arXiv:2602.20210v1 Announce Type: cross Abstract: Crystal modeling spans a family of conditional and unconditional gener...
When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
arXiv:2602.20193v1 Announce Type: cross Abstract: Standard evaluations of backdoor attacks on text-to-image (T2I) models...
PreScience: A Benchmark for Forecasting Scientific Contributions
arXiv:2602.20459v1 Announce Type: new Abstract: Can AI systems trained on the scientific record up to a fixed point in t...
Diffusion Modulation via Environment Mechanism Modeling for Planning
arXiv:2602.20422v1 Announce Type: new Abstract: Diffusion models have shown promising capabilities in trajectory generat...
Seedance 2.0 might be gen AI video’s next big hope, but it’s still slop
When Irish filmmaker Ruairi Robinson began uploading a series of short clips created with Seedance 2.0 - TikTok develope...