American robot doing parkour two years ago.
submitted by /u/enigmatic_erudition
[link] [comments]
In this post, we explore the Five V's Framework—a field-tested methodology that has helped 65% of AWS Generative AI Innovation Center customer projects successfully transition from concept to production, with some launching in just 45 days. The framework provides a structured approach through Value, Visualize, Validate, Verify, and Venture phases, shifting focus from "What can AI do?" to "What do we need AI to do?" while ensuring solutions deliver measurable business outcomes and sustainable operational excellence.
( 122
min )
Countdown to GTC Washington, DC: What to Watch Next Week 🔗 Next week, Washington, D.C., becomes the center of gravity for artificial intelligence. NVIDIA GTC Washington, D.C., lands at the Walter E. Washington Convention Center Oct. 27-29 — and for those who care about where computing is headed, this is the moment to pay attention.
Read Article
( 5
min )
Python's flexibility with data types is convenient when coding, but it can lead to runtime errors when your code receives unexpected data formats.
( 27
min )
arXiv:2510.20222v1 Announce Type: new
Abstract: In real-world time series forecasting tasks, category information plays a pivotal role in capturing inherent data patterns. This paper introduces QKCV (Query-Key-Category-Value) attention, an extension of the traditional QKV framework that incorporates a static categorical embedding C to emphasize category-specific information. As a versatile plug-in module, QKCV enhances the forecasting accuracy of attention-based models (e.g., Vanilla Transformer, Informer, PatchTST, TFT) across diverse real-world datasets. Furthermore, QKCV demonstrates remarkable adaptability in fine-tuning univariate time series foundation model by solely updating the static embedding C while preserving pretrained weights, thereby reducing computational overhead and achieving superior fine-tuning performance.
( 2
min )
arXiv:2510.20228v1 Announce Type: new
Abstract: We introduce SpLIIF to generate implicit neural representations and enable arbitrary downscaling of weather variables. We train a model from sparse weather stations and topography over Japan and evaluate in- and out-of-distribution accuracy predicting temperature and wind, comparing it to both an interpolation baseline and CorrDiff. We find the model to be up to 50% better than both CorrDiff and the baseline at downscaling temperature, and around 10-20% better for wind.
( 2
min )
arXiv:2510.20271v1 Announce Type: new
Abstract: Topological features capture global geometric structure in imaging data, but practical adoption in deep learning requires both computational efficiency and differentiability. We present optimized GPU kernels for the Euler Characteristic Curve (ECC) computation achieving 16-2000\"O speedups over prior GPU implementations on synthetic grids, and introduce a differentiable PyTorch layer enabling end-to-end learning. Our CUDA kernels, optimized for Ampere GPUs use 128B-coalesced access and hierarchical shared-memory accumulation. Our PyTorch layer learns thresholds in a single direction via a Differentiable Euler Characteristic Transform-style sigmoid relaxation. We discuss downstream relevance, including applications highlighted by prior ECC work, and outline batching/multi-GPU extensions to broaden adoption.
( 2
min )
arXiv:2510.19829v1 Announce Type: cross
Abstract: Electroencephalography (EEG) plays a crucial role in brain-computer interfaces (BCIs) and neurological diagnostics, but its real-world deployment faces challenges due to noise artifacts, missing data, and high annotation costs. We introduce SSL-SE-EEG, a framework that integrates Self-Supervised Learning (SSL) with Squeeze-and-Excitation Networks (SE-Nets) to enhance feature extraction, improve noise robustness, and reduce reliance on labeled data. Unlike conventional EEG processing techniques, SSL-SE-EEG} transforms EEG signals into structured 2D image representations, suitable for deep learning. Experimental validation on MindBigData, TUH-AB, SEED-IV and BCI-IV datasets demonstrates state-of-the-art accuracy (91% in MindBigData, 85% in TUH-AB), making it well-suited for real-time BCI applications. By enabling low-power, scalable EEG processing, SSL-SE-EEG presents a promising solution for biomedical signal analysis, neural engineering, and next-generation BCIs.
( 2
min )
arXiv:2510.19854v1 Announce Type: cross
Abstract: Accurate tropical cyclone (TC) short-term intensity forecasting with a 24-hour lead time is essential for disaster mitigation in the Atlantic TC basin. Since most TCs evolve far from land-based observing networks, satellite imagery is critical to monitoring these storms; however, these complex and high-resolution spatial structures can be challenging to qualitatively interpret in real time by forecasters. Here we propose a concise, interpretable, and descriptive approach to quantify fine TC structures with a multi-resolution analysis (MRA) by the discrete wavelet transform, enabling data analysts to identify physically meaningful structural features that strongly correlate with rapid intensity change. Furthermore, deep-learning techniques can build on this MRA for short-term intensity guidance.
( 2
min )
arXiv:2510.20334v1 Announce Type: cross
Abstract: The objective of this work is to develop a method for detecting rare gamma quanta against the background of charged particles in the fluxes from sources in the Universe with the help of the deep learning and normalizing flows based method designed for anomaly detection. It is shown that the suggested method has a potential for the gamma detection. The method was tested on model data from the TAIGA-IACT experiment. The obtained quantitative performance indicators are still inferior to other approaches, and therefore possible ways to improve the implementation of the method are proposed.
( 2
min )
arXiv:2510.20436v1 Announce Type: cross
Abstract: We present a fully decentralized routing framework for multi-robot exploration missions operating under the constraints of a Lunar Delay-Tolerant Network (LDTN). In this setting, autonomous rovers must relay collected data to a lander under intermittent connectivity and unknown mobility patterns. We formulate the problem as a Partially Observable Markov Decision Problem (POMDP) and propose a Graph Attention-based Multi-Agent Reinforcement Learning (GAT-MARL) policy that performs Centralized Training, Decentralized Execution (CTDE). Our method relies only on local observations and does not require global topology updates or packet replication, unlike classical approaches such as shortest path and controlled flooding-based algorithms. Through Monte Carlo simulations in randomized exploration environments, GAT-MARL provides higher delivery rates, no duplications, and fewer packet losses, and is able to leverage short-term mobility forecasts; offering a scalable solution for future space robotic systems for planetary exploration, as demonstrated by successful generalization to larger rover teams.
( 2
min )
arXiv:2406.05088v2 Announce Type: replace
Abstract: The rapid development of time series forecasting research has brought many deep learning-based modules in this field. However, despite the increasing amount of new forecasting architectures, it is still unclear if we have leveraged the full potential of these existing modules within a properly designed architecture. In this work, we propose a novel hierarchical neural architecture search approach for time series forecasting tasks. With the design of a hierarchical search space, we incorporate many architecture types designed for forecasting tasks and allow for the efficient combination of different forecasting architecture modules. Results on long-term-time-series-forecasting tasks show that our approach can search for lightweight high-performing forecasting architectures across different forecasting tasks.
( 2
min )
arXiv:2509.06213v3 Announce Type: replace
Abstract: We investigate reinforcement learning in the Game Of Hidden Rules (GOHR) environment, a complex puzzle in which an agent must infer and execute hidden rules to clear a 6$\times$6 board by placing game pieces into buckets. We explore two state representation strategies, namely Feature-Centric (FC) and Object-Centric (OC), and employ a Transformer-based Advantage Actor-Critic (A2C) algorithm for training. The agent has access only to partial observations and must simultaneously infer the governing rule and learn the optimal policy through experience. We evaluate our models across multiple rule-based and trial-list-based experimental setups, analyzing transfer effects and the impact of representation on learning efficiency.
( 2
min )
arXiv:2501.00565v3 Announce Type: replace-cross
Abstract: Even in low dimensions, sampling from multi-modal distributions is challenging. We provide the first sampling algorithm for a broad class of distributions -- including all Gaussian mixtures -- with a query complexity that is polynomial in the parameters governing multi-modality, assuming fixed dimension. Our sampling algorithm simulates a time-reversed diffusion process, using a self-normalized Monte Carlo estimator of the intermediate score functions. Unlike previous works, it avoids metastability, requires no prior knowledge of the mode locations, and relaxes the well-known log-smoothness assumption which excluded general Gaussian mixtures so far.
( 2
min )
arXiv:2502.13085v3 Announce Type: replace-cross
Abstract: Estimating Mutual Information (MI), a key measure of dependence of random quantities without specific modelling assumptions, is a challenging problem in high dimensions. We propose a novel mutual information estimator based on parametrizing conditional densities using normalizing flows, a deep generative model that has gained popularity in recent years. This estimator leverages a block autoregressive structure to achieve improved bias-variance trade-offs on standard benchmark tasks.
( 2
min )
arXiv:2510.20436v1 Announce Type: new
Abstract: We present a fully decentralized routing framework for multi-robot exploration missions operating under the constraints of a Lunar Delay-Tolerant Network (LDTN). In this setting, autonomous rovers must relay collected data to a lander under intermittent connectivity and unknown mobility patterns. We formulate the problem as a Partially Observable Markov Decision Problem (POMDP) and propose a Graph Attention-based Multi-Agent Reinforcement Learning (GAT-MARL) policy that performs Centralized Training, Decentralized Execution (CTDE). Our method relies only on local observations and does not require global topology updates or packet replication, unlike classical approaches such as shortest path and controlled flooding-based algorithms. Through Monte Carlo simulations in randomized exploration environments, GAT-MARL provides higher delivery rates, no duplications, and fewer packet losses, and is able to leverage short-term mobility forecasts; offering a scalable solution for future space robotic systems for planetary exploration, as demonstrated by successful generalization to larger rover teams.
( 2
min )
arXiv:2502.13085v3 Announce Type: replace
Abstract: Estimating Mutual Information (MI), a key measure of dependence of random quantities without specific modelling assumptions, is a challenging problem in high dimensions. We propose a novel mutual information estimator based on parametrizing conditional densities using normalizing flows, a deep generative model that has gained popularity in recent years. This estimator leverages a block autoregressive structure to achieve improved bias-variance trade-offs on standard benchmark tasks.
( 2
min )
arXiv:2509.06213v3 Announce Type: replace-cross
Abstract: We investigate reinforcement learning in the Game Of Hidden Rules (GOHR) environment, a complex puzzle in which an agent must infer and execute hidden rules to clear a 6$\times$6 board by placing game pieces into buckets. We explore two state representation strategies, namely Feature-Centric (FC) and Object-Centric (OC), and employ a Transformer-based Advantage Actor-Critic (A2C) algorithm for training. The agent has access only to partial observations and must simultaneously infer the governing rule and learn the optimal policy through experience. We evaluate our models across multiple rule-based and trial-list-based experimental setups, analyzing transfer effects and the impact of representation on learning efficiency.
( 2
min )
In this post, we explore how companies can systematically incorporate responsible AI practices into their generative AI project prioritization methodology to better evaluate business value against costs while addressing novel risks like hallucination and regulatory compliance. The post demonstrates through a practical example how conducting upfront responsible AI risk assessments can significantly change project rankings by revealing substantial mitigation work that affects overall project complexity and timeline.
( 120
min )
Fine-tuning has become much more accessible in 2024–2025, with parameter-efficient methods letting even 70B+ parameter models run on consumer GPUs.
( 26
min )
The nights grow longer and the shadows get bolder with Vampire The Masquerade: Bloodlines 2 on GeForce NOW, launching with GeForce RTX 5080-power. Members can sink their teeth into the action role-playing game from Paradox Interactive as part of nine games coming to the cloud this week, including NINJA GAIDEN 4. Be among the first
Read Article
( 7
min )
arXiv:2510.18989v1 Announce Type: new
Abstract: Nonlinear PDE solvers require fine space-time discretizations and local linearizations, leading to high memory cost and slow runtimes. Neural operators such as FNOs and DeepONets offer fast single-shot inference by learning function-to-function mappings and truncating high-frequency components, but they suffer from poor out-of-distribution (OOD) generalization, often failing on inputs outside the training distribution. We propose an adversarial teacher-student distillation framework in which a differentiable numerical solver supervises a compact neural operator while a PGD-style active sampling loop searches for worst-case inputs under smoothness and energy constraints to expand the training set. Using differentiable spectral solvers enables gradient-based adversarial search and stabilizes sample mining. Experiments on Burgers and Navier-Stokes systems demonstrate that adversarial distillation substantially improves OOD robustness while preserving the low parameter cost and fast inference of neural operators.
( 2
min )
arXiv:2510.19374v1 Announce Type: cross
Abstract: We revisit Cox's proportional hazard models and LASSO in the aim of improving feature selection in survival analysis. Unlike traditional methods relying on cross-validation or BIC, the penalty parameter $\lambda$ is directly tuned for feature selection and is asymptotically pivotal thanks to taking the square root of Cox's partial likelihood. Substantially improving over both cross-validation LASSO and BIC subset selection, our approach has a phase transition on the probability of retrieving all and only the good features, like in compressed sensing. The method can be employed by linear models but also by artificial neural networks.
( 2
min )
arXiv:2303.04201v4 Announce Type: replace
Abstract: Determining causal effects of interventions onto outcomes from real-world, observational (non-randomized) data, e.g., treatment repurposing using electronic health records, is challenging due to underlying bias. Causal deep learning has improved over traditional techniques for estimating individualized treatment effects (ITE). We present the Doubly Robust Variational Information-theoretic Deep Adversarial Learning (DR-VIDAL), a novel generative framework that combines two joint models of treatment and outcome, ensuring an unbiased ITE estimation even when one of the two is misspecified. DR-VIDAL integrates: (i) a variational autoencoder (VAE) to factorize confounders into latent variables according to causal assumptions; (ii) an information-theoretic generative adversarial network (Info-GAN) to generate counterfactuals; (iii) a doubly robust block incorporating treatment propensities for outcome predictions. On synthetic and real-world datasets (Infant Health and Development Program, Twin Birth Registry, and National Supported Work Program), DR-VIDAL achieves better performance than other non-generative and generative methods. In conclusion, DR-VIDAL uniquely fuses causal assumptions, VAE, Info-GAN, and doubly robustness into a comprehensive, performant framework. Code is available at: https://github.com/Shantanu48114860/DR-VIDAL-AMIA-22 under MIT license.
( 3
min )
arXiv:2508.21664v2 Announce Type: replace-cross
Abstract: This paper demonstrates the feasibility of trajectory learning for ensemble forecasts by employing the continuous ranked probability score (CRPS) as a loss function. Using the two-scale Lorenz '96 system as a case study, we develop and train both additive and multiplicative stochastic parametrizations to generate ensemble predictions. Results indicate that CRPS-based trajectory learning produces parametrizations that are both accurate and sharp. The resulting parametrizations are straightforward to calibrate and outperform derivative-fitting-based parametrizations in short-term forecasts. This approach is particularly promising for data assimilation applications due to its accuracy over short lead times.
( 2
min )
arXiv:2510.19374v1 Announce Type: new
Abstract: We revisit Cox's proportional hazard models and LASSO in the aim of improving feature selection in survival analysis. Unlike traditional methods relying on cross-validation or BIC, the penalty parameter $\lambda$ is directly tuned for feature selection and is asymptotically pivotal thanks to taking the square root of Cox's partial likelihood. Substantially improving over both cross-validation LASSO and BIC subset selection, our approach has a phase transition on the probability of retrieving all and only the good features, like in compressed sensing. The method can be employed by linear models but also by artificial neural networks.
( 2
min )
In this post, we explore how product teams can leverage Amazon Bedrock and AWS services to transform their creative workflows through generative AI, enabling rapid content iteration across multiple formats while maintaining brand consistency and compliance. The solution demonstrates how teams can deploy a scalable generative AI application that accelerates everything from product descriptions and marketing copy to visual concepts and video content, significantly reducing time to market while enhancing creative quality.
( 123
min )
In this post, we explore advanced cost monitoring strategies for Amazon Bedrock deployments, introducing granular custom tagging approaches for precise cost allocation and comprehensive reporting mechanisms that build upon the proactive cost management foundation established in Part 1. The solution demonstrates how to implement invocation-level tagging, application inference profiles, and integration with AWS Cost Explorer to create a complete 360-degree view of generative AI usage and expenses.
( 121
min )
In this post, we introduce a comprehensive solution for proactively managing Amazon Bedrock inference costs through a cost sentry mechanism designed to establish and enforce token usage limits, providing organizations with a robust framework for controlling generative AI expenses. The solution uses serverless workflows and native Amazon Bedrock integration to deliver a predictable, cost-effective approach that aligns with organizational financial constraints while preventing runaway costs through leading indicators and real-time budget enforcement.
( 122
min )
In this post, we demonstrate how Amazon Nova Premier with Amazon Bedrock can systematically migrate legacy C code to modern Java/Spring applications using an intelligent agentic workflow that breaks down complex conversions into specialized agent roles. The solution reduces migration time and costs while improving code quality through automated validation, security assessment, and iterative refinement processes that handle even large codebases exceeding token limitations.
( 131
min )
In this post, we detail how Metagenomi partnered with AWS to implement the Progen2 protein language model on AWS Inferentia, achieving up to 56% cost reduction for high-throughput enzyme generation workflows. The implementation enabled cost-effective generation of millions of novel enzyme variants using EC2 Inf2 Spot Instances and AWS Batch, demonstrating how cloud-based generative AI can make large-scale protein design more accessible for biotechnology applications .
( 123
min )
Last week, over 1,000 attendees joined NVIDIA AI Day Sydney to learn about sovereign AI — including 16 breakout sessions on agentic and physical AI, robotics and AI factories.
( 8
min )
Professors Facundo Batista and Dina Katabi, along with three additional MIT alumni, are honored for their outstanding professional achievement and commitment to service.
( 5
min )
Every data engineering team operates on a set of accepted principles, a playbook of "best practices" intentionally designed to ensure scalability, governance, and performance.
The post Best practices that break data platforms appeared first on Data Science Central.
( 24
min )
In the epoch of LLMs, it may seem like the most classical machine learning concepts, methods, and techniques like feature engineering are no longer in the spotlight.
( 23
min )
arXiv:2510.17887v1 Announce Type: new
Abstract: We present a comprehensive, physics aware deep learning framework for constructing fast and accurate surrogate models of rarefied, shock containing micro nozzle flows. The framework integrates three key components, a Fusion DeepONet operator learning architecture for capturing parameter dependencies, a physics-guided feature space that embeds a shock-aligned coordinate system, and a two-phase curriculum strategy emphasizing high-gradient regions. To demonstrate the generality and inductive bias of the proposed framework, we first validate it on the canonical viscous Burgers equation, which exhibits advective steepening and shock like gradients.
( 2
min )
arXiv:2510.17937v1 Announce Type: new
Abstract: We present UniRL-Zero, a unified reinforcement learning (RL) framework that boosts, multimodal language model understanding and reasoning, diffusion model multimedia generation, and their beneficial interaction capabilities within a unified model. Our work defines six scenarios for unified model reinforcement learning, providing systematic baselines for reinforcement learning of unified understanding and generation model. Our code is available at https://github.com/G-U-N/UniRL.
( 2
min )
arXiv:2510.18238v1 Announce Type: new
Abstract: There has been increasing research interest in AI/ML for social impact, and correspondingly more publication venues have refined review criteria for practice-driven AI/ML research. However, these review guidelines tend to most concretely recognize projects that simultaneously achieve deployment and novel ML methodological innovation. We argue that this introduces incentives for researchers that undermine the sustainability of a broader research ecosystem of social impact, which benefits from projects that make contributions on single front (applied or methodological) that may better meet project partner needs. Our position is that researchers and reviewers in machine learning for social impact must simultaneously adopt: 1) a more expansive conception of social impacts beyond deployment and 2) more rigorous evaluations of the impact of deployed systems.
( 2
min )
arXiv:2510.18388v1 Announce Type: new
Abstract: This paper investigates the approximation properties of shallow neural networks with activation functions that are powers of exponential functions. It focuses on the dependence of the approximation rate on the dimension and the smoothness of the function being approximated within the Barron function space. We examine the approximation rates of ReLU$^{k}$ activation functions, proving that the optimal rate cannot be achieved under $\ell^{1}$-bounded coefficients or insufficient smoothness conditions.
We also establish optimal approximation rates in various norms for functions in Barron spaces and Sobolev spaces, confirming the curse of dimensionality. Our results clarify the limits of shallow neural networks' approximation capabilities and offer insights into the selection of activation functions and network structures.
( 2
min )
arXiv:2510.18615v1 Announce Type: new
Abstract: We present a new approach for distilling boosted trees into decision trees, in the objective of generating an ML model offering an acceptable compromise in terms of predictive performance and interpretability. We explain how the correction approach called rectification can be used to implement such a distillation process. We show empirically that this approach provides interesting results, in comparison with an approach to distillation achieved by retraining the model.
( 2
min )
arXiv:2510.17916v1 Announce Type: cross
Abstract: The Free Energy Principle (FEP) states that self-organizing systems must minimize variational free energy to persist, but the path from principle to implementable algorithm has remained unclear. We present a constructive proof that the FEP can be realized through exact local credit assignment. The system decomposes gradient computation hierarchically: spatial credit via feedback alignment, temporal credit via eligibility traces, and structural credit via a Trophic Field Map (TFM) that estimates expected gradient magnitude for each connection block. We prove these mechanisms are exact at their respective levels and validate the central claim empirically: the TFM achieves 0.9693 Pearson correlation with oracle gradients. This exactness produces emergent capabilities including 98.6% retention after task interference, autonomous recovery from 75% structural damage, self-organized criticality (spectral radius p ~= 1.0$), and sample-efficient reinforcement learning on continuous control tasks without replay buffers. The architecture unifies Prigogine's dissipative structures, Friston's free energy minimization, and Hopfield's attractor dynamics, demonstrating that exact hierarchical inference over network topology can be implemented with local, biologically plausible rules.
( 3
min )
arXiv:2510.18143v1 Announce Type: cross
Abstract: Small Language Models (SLMs) offer compelling advantages in deployment cost and latency, but their accuracy often lags behind larger models, particularly for complex domain-specific tasks. While supervised fine-tuning can help bridge this performance gap, it requires substantial manual effort in data preparation and iterative optimization. We present PaDA-Agent (Pattern-guided Data Augmentation Agent), an evaluation-driven approach that streamlines the data augmentation process for SLMs through coordinated operations. Unlike state-of-the-art approaches that focus on model training errors only and generating error-correcting samples, PaDA-Agent discovers failure patterns from the validation data via evaluations and drafts targeted data augmentation strategies aiming to directly reduce the generalization gap. Our experimental results demonstrate significant improvements over state-of-the-art LLM-based data augmentation approaches for Llama 3.2 1B Instruct model fine-tuning.
( 2
min )
arXiv:2510.18608v1 Announce Type: cross
Abstract: The birth of Foundation Models brought unprecedented results in a wide range of tasks, from language to vision, to robotic control. These models are able to process huge quantities of data, and can extract and develop rich representations, which can be employed across different domains and modalities. However, they still have issues in adapting to dynamic, real-world scenarios without retraining the entire model from scratch. In this work, we propose the application of Continual Learning and Compositionality principles to foster the development of more flexible, efficient and smart AI solutions.
( 2
min )
arXiv:2510.18760v1 Announce Type: cross
Abstract: Data restoration from degraded observations, of sparsity hypotheses, is an active field of study. Traditional iterative optimization methods are now complemented by deep learning techniques. The development of unfolded methods benefits from both families. We carry out a comparative study of three architectures on parameterized chromatographic signal databases, highlighting the performance of these approaches, especially when employing metrics adapted to physico-chemical peak signal characterization.
( 2
min )
In this post, we walk through how to take an ML model built in SageMaker Canvas and deploy it using SageMaker Serverless Inference, helping you go from model creation to production-ready predictions quickly and efficiently without managing any infrastructure. This solution demonstrates a complete workflow from adding your trained model to the SageMaker Model Registry through creating serverless endpoint configurations and deploying endpoints that automatically scale based on demand .
( 40
min )
In this post, we explore how Amazon Nova Sonic's speech-to-speech capabilities can be combined with Amazon Bedrock AgentCore to create sophisticated multi-agent voice assistants that break complex tasks into specialized, manageable components. The approach demonstrates how to build modular, scalable voice applications using a banking assistant example with dedicated sub-agents for authentication, banking inquiries, and mortgage services, offering a more maintainable alternative to monolithic voice assistant designs.
( 38
min )
In this post, we demonstrate how to deploy and manage machine learning training workloads using the Amazon SageMaker HyperPod training operator, which enhances training resilience for Kubernetes workloads through pinpoint recovery and customizable monitoring capabilities. The Amazon SageMaker HyperPod training operator helps accelerate generative AI model development by efficiently managing distributed training across large GPU clusters, offering benefits like centralized training process monitoring, granular process recovery, and hanging job detection that can reduce recovery times from tens of minutes to seconds.
( 41
min )
Coastal communities in the U.S. have a 26% chance of flooding within a 30-year period. This percentage is expected to increase due to climate-change-driven sea-level rise, making these areas even more vulnerable. Michael Beck, professor and director of the UC Santa Cruz Center for Coastal Climate Resilience, focuses on modeling and mapping the benefits of
Read Article
( 7
min )
SentinelStep enables AI agents to handle monitoring tasks that run for hours or days, like watching for emails or tracking prices. It works by managing when agents should check and their context, avoiding wasted resources and missed updates.
The post Tell me when: Building agents that can wait, monitor, and act appeared first on Microsoft Research.
( 11
min )
Building AI agents that work in production requires more than powerful models.
( 26
min )
arXiv:2510.15985v1 Announce Type: new
Abstract: Sepsis is a life-threatening infectious syndrome associated with high mortality in intensive care units (ICUs). Early and accurate sepsis prediction (SP) is critical for timely intervention, yet remains challenging due to subtle early manifestations and rapidly escalating mortality. While AI has improved SP efficiency, existing methods struggle to capture weak early temporal signals. This paper introduces a Multi-Endogenous-view Representation Enhancement (MERE) mechanism to construct enriched feature views, coupled with a Cascaded Dual-convolution Time-series Attention (CDTA) module for multi-scale temporal representation learning. The proposed MEET-Sepsis framework achieves competitive prediction accuracy using only 20% of the ICU monitoring time required by SOTA methods, significantly advancing early SP. Extensive validation confirms its efficacy. Code is available at: https://github.com/yueliangy/MEET-Sepsis.
( 2
min )
arXiv:2510.15986v1 Announce Type: new
Abstract: Sleep disorders have a major impact on patients' health and quality of life, but their diagnosis remains complex due to the diversity of symptoms. Today, technological advances, combined with medical data analysis, are opening new perspectives for a better understanding of these disorders. In particular, explainable artificial intelligence (XAI) aims to make AI model decisions understandable and interpretable for users. In this study, we propose a clustering-based method to group patients according to different sleep disorder profiles. By integrating an explainable approach, we identify the key factors influencing these pathologies. An experiment on anonymized real data illustrates the effectiveness and relevance of our approach.
( 2
min )
arXiv:2510.16026v1 Announce Type: new
Abstract: We provide an accessible description of a peer-reviewed generalizable causal machine learning pipeline to (i) discover latent causal sources of large-scale electronic health records observations, and (ii) quantify the source causal effects on clinical outcomes. We illustrate how imperfect multimodal clinical data can be processed, decomposed into probabilistic independent latent sources, and used to train taskspecific causal models from which individual causal effects can be estimated. We summarize the findings of the two real-world applications of the approach to date as a demonstration of its versatility and utility for medical discovery at scale.
( 2
min )
arXiv:2510.16440v1 Announce Type: new
Abstract: This report presents the winning solution for Task 1 of Colliding with Adversaries: A Challenge on Robust Learning in High Energy Physics Discovery at ECML-PKDD 2025. The task required designing an adversarial attack against a provided classification model that maximizes misclassification while minimizing perturbations. Our approach employs a multi-round gradient-based strategy that leverages the differentiable structure of the model, augmented with random initialization and sample-mixing techniques to enhance effectiveness. The resulting attack achieved the best results in perturbation size and fooling success rate, securing first place in the competition.
( 2
min )
arXiv:2510.16877v1 Announce Type: new
Abstract: Using a nearly-frozen pretrained model, the continual representation learning paradigm reframes parameter updates as a similarity-matching problem to mitigate catastrophic forgetting. However, directly leveraging pretrained features for downstream tasks often suffers from multicollinearity in the similarity-matching stage, and more advanced methods can be computationally prohibitive for real-time, low-latency applications. Inspired by the fly olfactory circuit, we propose Fly-CL, a bio-inspired framework compatible with a wide range of pretrained backbones. Fly-CL substantially reduces training time while achieving performance comparable to or exceeding that of current state-of-the-art methods. We theoretically show how Fly-CL progressively resolves multicollinearity, enabling more effective similarity matching with low time complexity. Extensive simulation experiments across diverse network architectures and data regimes validate Fly-CL's effectiveness in addressing this challenge through a biologically inspired design. Code is available at https://github.com/gfyddha/Fly-CL.
( 2
min )
arXiv:2510.17120v1 Announce Type: new
Abstract: We introduce a novel regularization scheme for autoencoders based on matricial free energy. Our approach defines a differentiable loss function in terms of the singular values of the code matrix (code dimension x batch size). From the standpoint of free probability an d random matrix theory, this loss achieves its minimum when the singular value distribution of the code matrix coincides with that of an appropriately sculpted random metric with i.i.d. Gaussian entries. Empirical simulations demonstrate that minimizing the negative matricial free energy through standard stochastic gradient-based training yields Gaussian-like codes that generalize across training and test sets. Building on this foundation, we propose a matricidal free energy maximizing autoencoder that reliably produces Gaussian codes and show its application to underdetermined inverse problems.
( 2
min )
arXiv:2510.17214v1 Announce Type: new
Abstract: Effective and accurate diagnosis of fuel cell health status is crucial for ensuring the stable operation of fuel cell stacks. Among various parameters, high-frequency impedance serves as a critical indicator for assessing fuel cell state and health conditions. However, its online testing is prohibitively complex and costly. This paper employs a deep sparse auto-encoding network for the prediction and classification of high-frequency impedance in fuel cells, achieving metric of accuracy rate above 92\%. The network is further deployed on an FPGA, attaining a hardware-based recognition rate almost 90\%.
( 2
min )
arXiv:2510.17380v1 Announce Type: new
Abstract: Optimizing the energy management within a smart grids scenario presents significant challenges, primarily due to the complexity of real-world systems and the intricate interactions among various components. Reinforcement Learning (RL) is gaining prominence as a solution for addressing the challenges of Optimal Power Flow in smart grids. However, RL needs to iterate compulsively throughout a given environment to obtain the optimal policy. This means obtaining samples from a, most likely, costly simulator, which can lead to a sample efficiency problem. In this work, we address this problem by substituting costly smart grid simulators with surrogate models built using Phisics-informed Neural Networks (PINNs), optimizing the RL policy training process by arriving to convergent results in a fraction of the time employed by the original environment.
( 2
min )
arXiv:2510.17496v1 Announce Type: new
Abstract: We introduce I-RAVEN-X, a symbolic benchmark designed to evaluate generalization and robustness in analogical and mathematical reasoning for Large Language Models (LLMs) and Large Reasoning Models (LRMs). I-RAVEN-X extends I-RAVEN by increasing operand complexity, attribute range, and introducing perceptual uncertainty. Compared to LLMs, empirical results show that LRMs achieve improved productivity and systematicity on longer reasoning relations and wider attribute ranges, respectively. However, LRMs are still significantly challenged by reasoning under uncertainty and cannot effectively explore multiple probabilistic outcomes.
( 2
min )
arXiv:2510.15883v1 Announce Type: cross
Abstract: Traditional stochastic control methods in finance struggle in real world markets due to their reliance on simplifying assumptions and stylized frameworks. Such methods typically perform well in specific, well defined environments but yield suboptimal results in changed, non stationary ones. We introduce FinFlowRL, a novel framework for financial optimal stochastic control. The framework pretrains an adaptive meta policy learning from multiple expert strategies, then finetunes through reinforcement learning in the noise space to optimize the generative process. By employing action chunking generating action sequences rather than single decisions, it addresses the non Markovian nature of markets. FinFlowRL consistently outperforms individually optimized experts across diverse market conditions.
( 2
min )
arXiv:2510.15892v1 Announce Type: cross
Abstract: This study introduces geometric algebra to decompose credit system relationships into their projective (correlation-like) and rotational (feedback-spiral) components. We represent economic states as multi-vectors in Clifford algebra, where bivector elements capture the rotational coupling between unemployment, consumption, savings, and credit utilization. This mathematical framework reveals interaction patterns invisible to conventional analysis: when unemployment and credit contraction enter simultaneous feedback loops, their geometric relationship shifts from simple correlation to dangerous rotational dynamics that characterize systemic crises.
( 2
min )
arXiv:2510.15903v1 Announce Type: cross
Abstract: This study presents a comprehensive empirical comparison between quantum machine learning (QML) and classical machine learning (CML) approaches in Automated Market Makers (AMM) and Decentralized Finance (DeFi) trading strategies through extensive backtesting on 10 models across multiple cryptocurrency assets. Our analysis encompasses classical ML models (Random Forest, Gradient Boosting, Logistic Regression), pure quantum models (VQE Classifier, QNN, QSVM), hybrid quantum-classical models (QASA Hybrid, QASA Sequence, QuantumRWKV), and transformer models. The results demonstrate that hybrid quantum models achieve superior overall performance with 11.2\% average return and 1.42 average Sharpe ratio, while classical ML models show 9.8\% average return and 1.47 average Sharpe ratio. The QASA Sequence hybrid model achieves the highest individual return of 13.99\% with the best Sharpe ratio of 1.76, demonstrating the potential of quantum-classical hybrid approaches in AMM and DeFi trading strategies.
( 2
min )
arXiv:2510.16140v1 Announce Type: cross
Abstract: The CMAP (cultural mapping and pattern analysis) visualization toolkit introduced in this paper is an open-source suite for analyzing and visualizing text data - from qualitative fieldnotes and in-depth interview transcripts to historical documents and web-scaped data like message board posts or blogs. The toolkit is designed for scholars integrating pattern analysis, data visualization, and explanation in qualitative and/or computational social science (CSS). Despite the existence of off-the-shelf commercial qualitative data analysis software, there is a dearth of highly scalable open source options that can work with large data sets, and allow advanced statistical and language modeling. The foundation of the toolkit is a pragmatic approach that aligns research tools with social science project goals- empirical explanation, theory-guided measurement, comparative design, or evidence-based recommendations- guided by the principle that research paradigm and questions should determine methods. Consequently, the CMAP visualization toolkit offers a range of possibilities through the adjustment of relatively small number of parameters, and allows integration with other python tools.
( 3
min )
arXiv:2510.16985v1 Announce Type: cross
Abstract: Bengali social media platforms have witnessed a sharp increase in hate speech, disproportionately affecting women and adolescents. While datasets such as BD-SHS provide a basis for structured evaluation, most prior approaches rely on either computationally costly full-model fine-tuning or proprietary APIs. This paper presents the first application of Parameter-Efficient Fine-Tuning (PEFT) for Bengali hate speech detection using LoRA and QLoRA. Three instruction-tuned large language models - Gemma-3-4B, Llama-3.2-3B, and Mistral-7B - were fine-tuned on the BD-SHS dataset of 50,281 annotated comments. Each model was adapted by training fewer than 1% of its parameters, enabling experiments on a single consumer-grade GPU. The results show that Llama-3.2-3B achieved the highest F1-score of 92.23%, followed by Mistral-7B at 88.94% and Gemma-3-4B at 80.25%. These findings establish PEFT as a practical and replicable strategy for Bengali and related low-resource languages.
( 2
min )
arXiv:2411.02058v3 Announce Type: replace
Abstract: A data-driven approach based on unsupervised machine learning is proposed to infer the intrinsic dimension $m^{\ast}$ of the high-dimensional trajectories of the Fermi-Pasta-Ulam-Tsingou (FPUT) model. Principal component analysis (PCA) is applied to trajectory data consisting of $n_s = 4,000,000$ datapoints, of the FPUT $\beta$ model with $N = 32$ coupled oscillators, revealing a critical relationship between $m^{\ast}$ and the model's nonlinear strength. By estimating the intrinsic dimension $m^{\ast}$ using multiple methods (participation ratio, Kaiser rule, and the Kneedle algorithm), it is found that $m^{\ast}$ increases with the model nonlinearity. Interestingly, in the weakly nonlinear regime, for trajectories initialized by exciting the first mode, the participation ratio estimates $m^{\ast} = 2, 3$, strongly suggesting that quasi-periodic motion on a low-dimensional Riemannian manifold underlies the characteristic energy recurrences observed in the FPUT model.
( 3
min )
arXiv:2505.24760v2 Announce Type: replace
Abstract: We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and various common games. Its key innovation is the ability to generate virtually infinite training data with adjustable complexity, unlike most previous reasoning datasets, which are typically fixed. This procedural generation approach allows for continuous evaluation across varying difficulty levels. Our experimental results demonstrate the efficacy of RG in both evaluating and reinforcement learning of reasoning models.
( 2
min )
arXiv:2508.18130v2 Announce Type: replace
Abstract: Transformers are the de-facto choice for sequence modelling, yet their quadratic self-attention and weak temporal bias can make long-range forecasting both expensive and brittle. We introduce FreezeTST, a lightweight hybrid that interleaves frozen random-feature (reservoir) blocks with standard trainable Transformer layers. The frozen blocks endow the network with rich nonlinear memory at no optimisation cost; the trainable layers learn to query this memory through self-attention. The design cuts trainable parameters and also lowers wall-clock training time, while leaving inference complexity unchanged. On seven standard long-term forecasting benchmarks, FreezeTST consistently matches or surpasses specialised variants such as Informer, Autoformer, and PatchTST; with substantially lower compute. Our results show that embedding reservoir principles within Transformers offers a simple, principled route to efficient long-term time-series prediction.
( 2
min )
arXiv:2012.01194v2 Announce Type: replace-cross
Abstract: In this article, we introduce and analyze a deep learning based approximation algorithm for SPDEs. Our approach employs neural networks to approximate the solutions of SPDEs along given realizations of the driving noise process. If applied to a set of simulated noise trajectories, it yields empirical distributions of SPDE solutions, from which functionals like the mean and variance can be estimated. We test the performance of the method on stochastic heat equations with additive and multiplicative noise as well as stochastic Black-Scholes equations with multiplicative noise and Zakai equations from nonlinear filtering theory. In all cases, the proposed algorithm yields accurate results with short runtimes in up to 100 space dimensions.
( 2
min )
arXiv:2504.20322v2 Announce Type: replace-cross
Abstract: Fine-grained visual classification aims to recognize objects belonging to many subordinate categories of a supercategory, where appearance alone often fails to distinguish highly similar classes. We propose a unified framework that integrates image, text, and metadata via cross-contrastive pre-training. We first align the three modality encoders in a shared embedding space and then fine-tune the image and metadata encoders for classification. On NABirds, our approach improves over the baseline by 7.83% and achieves 84.44% top-1 accuracy, outperforming strong multimodal methods.
( 2
min )
arXiv:2505.21580v2 Announce Type: replace-cross
Abstract: Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph, such as an inhomogeneous random graph model (IRG). For general fast goodness-of-fit tests in high dimensions, kernelised Stein discrepancy (KSD) tests are a powerful tool. Here, we develop a KSD-type test for IRG models that can be carried out with a single observation of the network. The test applies to a network of any size, but is particularly interesting for small networks for which asymptotic tests are not warranted. We also provide theoretical guarantees.
( 2
min )
arXiv:2508.07559v2 Announce Type: replace-cross
Abstract: We study the approximation complexity of high-dimensional second-order elliptic PDEs with homogeneous boundary conditions on the unit hypercube, within the framework of Barron spaces. Under the assumption that the coefficients belong to suitably defined Barron spaces, we prove that the solution can be efficiently approximated by two-layer neural networks, circumventing the curse of dimensionality. Our results demonstrate the expressive power of shallow networks in capturing high-dimensional PDE solutions under appropriate structural assumptions.
( 2
min )
arXiv:2510.17120v1 Announce Type: cross
Abstract: We introduce a novel regularization scheme for autoencoders based on matricial free energy. Our approach defines a differentiable loss function in terms of the singular values of the code matrix (code dimension x batch size). From the standpoint of free probability an d random matrix theory, this loss achieves its minimum when the singular value distribution of the code matrix coincides with that of an appropriately sculpted random metric with i.i.d. Gaussian entries. Empirical simulations demonstrate that minimizing the negative matricial free energy through standard stochastic gradient-based training yields Gaussian-like codes that generalize across training and test sets. Building on this foundation, we propose a matricidal free energy maximizing autoencoder that reliably produces Gaussian codes and show its application to underdetermined inverse problems.
( 2
min )
arXiv:2505.21580v2 Announce Type: replace
Abstract: Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph, such as an inhomogeneous random graph model (IRG). For general fast goodness-of-fit tests in high dimensions, kernelised Stein discrepancy (KSD) tests are a powerful tool. Here, we develop a KSD-type test for IRG models that can be carried out with a single observation of the network. The test applies to a network of any size, but is particularly interesting for small networks for which asymptotic tests are not warranted. We also provide theoretical guarantees.
( 2
min )
arXiv:2012.01194v2 Announce Type: replace-cross
Abstract: In this article, we introduce and analyze a deep learning based approximation algorithm for SPDEs. Our approach employs neural networks to approximate the solutions of SPDEs along given realizations of the driving noise process. If applied to a set of simulated noise trajectories, it yields empirical distributions of SPDE solutions, from which functionals like the mean and variance can be estimated. We test the performance of the method on stochastic heat equations with additive and multiplicative noise as well as stochastic Black-Scholes equations with multiplicative noise and Zakai equations from nonlinear filtering theory. In all cases, the proposed algorithm yields accurate results with short runtimes in up to 100 space dimensions.
( 2
min )
NVIDIA and Google Cloud are expanding access to accelerated computing to transform the full spectrum of enterprise workloads, from visual computing to agentic and physical AI. Google Cloud today announced the general availability of G4 VMs, powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. Plus, NVIDIA Omniverse and NVIDIA Isaac Sim are now
Read Article
( 7
min )
AI engineering has shifted from a futuristic niche to one of the most in-demand tech careers on the planet.
( 23
min )
arXiv:2510.15294v1 Announce Type: new
Abstract: In this paper we present a neural network-based method for the automatic detection of phase transitions and classification of hidden percolation patterns in a (1+1)-dimensional replication process. The proposed network model is based on the combination of CNN, TCN and GRU networks, which are trained directly on raw configurations without any manual feature extraction. The network reproduces the phase diagram and assigns phase labels to configurations. It shows that deep architectures are capable of extracting hierarchical structures from the raw data of numerical experiments.
( 2
min )
arXiv:2510.15535v1 Announce Type: new
Abstract: The extensive adoption of Deep Neural Networks has led to their increased utilization in challenging scientific visualization tasks. Recent advancements in building compressed data models using implicit neural representations have shown promising results for tasks like spatiotemporal volume visualization and super-resolution. Inspired by these successes, we develop compressed neural representations for multivariate datasets containing tens to hundreds of variables. Our approach utilizes a single network to learn representations for all data variables simultaneously through parameter sharing. This allows us to achieve state-of-the-art data compression. Through comprehensive evaluations, we demonstrate superior performance in terms of reconstructed data quality, rendering and visualization quality, preservation of dependency information among variables, and storage efficiency.
( 2
min )
arXiv:2505.11325v2 Announce Type: replace-cross
Abstract: Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient sampling procedure to construct Bayesian posteriors for such estimates based on Martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the uncertainty quantification of our method in inference applications.
( 2
min )
arXiv:2510.15093v1 Announce Type: cross
Abstract: We present a generalized, data-driven collisional operator for one-component plasmas, learned from molecular dynamics simulations, to extend the collisional kinetic model beyond the weakly coupled regime. The proposed operator features an anisotropic, non-stationary collision kernel that accounts for particle correlations typically neglected in classical Landau formulations. To enable efficient numerical evaluation, we develop a fast spectral separation method that represents the kernel as a low-rank tensor product of univariate basis functions. This formulation admits an $O(N \log N)$ algorithm via fast Fourier transforms and preserves key physical properties, including discrete conservation laws and the H-theorem, through a structure-preserving central difference discretization. Numerical experiments demonstrate that the proposed model accurately captures plasma dynamics in the moderately coupled regime beyond the standard Landau model while maintaining high computational efficiency and structure-preserving properties.
( 2
min )
arXiv:2505.11325v2 Announce Type: replace-cross
Abstract: Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient sampling procedure to construct Bayesian posteriors for such estimates based on Martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the uncertainty quantification of our method in inference applications.
( 2
min )
AI has ignited a new industrial revolution. NVIDIA and TSMC are working together to build the infrastructure that powers the world’s AI factories, right here in America. NVIDIA founder and CEO Jensen Huang today visited TSMC’s semiconductor manufacturing facility in Phoenix to celebrate the first NVIDIA Blackwell wafer produced on U.S. soil, representing that Blackwell
Read Article
( 6
min )
Open Source AI Week kicks off on Monday with a series of hackathons, workshops and meetups spotlighting the latest advances in AI, machine learning and open-source innovation. The event brings together leading organizations, researchers and open-source communities to share knowledge, collaborate on tools and explore how openness accelerates AI development. NVIDIA continues to expand access
Read Article
( 5
min )
To reduce waste, the Refashion program helps users create outlines for adaptable clothing, such as pants that can be reconfigured into a dress. Each component of these pieces can be replaced, rearranged, or restyled.
( 7
min )
Exciting news for BigQuery ML (BQML) users.
( 22
min )
Vector databases have become essential in most modern AI applications.
( 26
min )
I've recently been experimenting with one of my favorite old-school neural networks, a tiny program that runs on my laptop and knows only about the data I give it. Without internet training, char-rnn doesn't have outside references to draw on (for better or for worse) but
( 4
min )
AI Weirdness: the strange side of machine learning
( 2
min )
This post shows how TP ICAP used Amazon Bedrock Knowledge Bases and Amazon Bedrock Evaluations to build ClientIQ, an enterprise-grade solution with enhanced security features for extracting CRM insights using AI, delivering immediate business value.
( 41
min )
In the post Principal Financial Group increases Voice Virtual Assistant performance using Genesys, Amazon Lex, and Amazon QuickSight, we discussed the overall Principal Virtual Assistant solution using Genesys Cloud, Amazon Lex V2, multiple AWS services, and a custom reporting and analytics solution using Amazon QuickSight.
( 38
min )
In this post, we discuss an approach that can guide you to build comprehensive and empirically driven evaluations that can help you make better decisions when selecting the right model for your task.
( 43
min )
In this post, we show how Splash Music is setting a new standard for AI-powered music creation by using its advanced HummingLM model with AWS Trainium on Amazon SageMaker HyperPod. As a selected startup in the 2024 AWS Generative AI Accelerator, Splash Music collaborated closely with AWS Startups and the AWS Generative AI Innovation Center (GenAIIC) to fast-track innovation and accelerate their music generation FM development lifecycle.
( 41
min )
arXiv:2510.14125v1 Announce Type: new
Abstract: We introduce a neural network-driven robust optimisation framework that integrates data-driven domain as a constraint into the nonlinear programming technique, addressing the overlooked issue of domain-inconsistent solutions arising from the interaction of parametrised neural network models with optimisation solvers. Applied to a 1180 MW capacity combined cycle gas power plant, our framework delivers domain-consistent robust optimal solutions that achieve a verified 0.76 percentage point mean improvement in energy efficiency. For the first time, scaling this efficiency gain to the global fleet of gas power plants, we estimate an annual 26 Mt reduction potential in CO$_2$ (with 10.6 Mt in Asia, 9.0 Mt in the Americas, and 4.5 Mt in Europe). These results underscore the synergetic role of machine learning in delivering near-term, scalable decarbonisation pathways for global climate action.
( 2
min )
arXiv:2510.14449v1 Announce Type: new
Abstract: Multi-class wine classification presents fundamental trade-offs between model accuracy, feature dimensionality, and interpretability - critical factors for production deployment in analytical chemistry. This paper presents a comprehensive empirical study of One-vs-Rest logistic regression on the UCI Wine dataset (178 samples, 3 cultivars, 13 chemical features), comparing from-scratch gradient descent implementation against scikit-learn's optimized solvers and quantifying L1 regularization effects on feature sparsity. Manual gradient descent achieves 92.59 percent mean test accuracy with smooth convergence, validating theoretical foundations, though scikit-learn provides 24x training speedup and 98.15 percent accuracy. Class-specific analysis reveals distinct chemical signatures with heterogeneous patterns where color intensity varies dramatically (0.31 to 16.50) across cultivars. L1 regularization produces 54-69 percent feature reduction with only 4.63 percent accuracy decrease, demonstrating favorable interpretability-performance trade-offs. We propose an optimal 5-feature subset achieving 62 percent complexity reduction with estimated 92-94 percent accuracy, enabling cost-effective deployment with 80 dollars savings per sample and 56 percent time reduction. Statistical validation confirms robust generalization with sub-2ms prediction latency suitable for real-time quality control. Our findings provide actionable guidelines for practitioners balancing comprehensive chemical analysis against targeted feature measurement in resource-constrained environments.
( 3
min )
arXiv:2510.14780v1 Announce Type: new
Abstract: This paper addresses the problem of estimating causal directed acyclic graphs in linear non-Gaussian acyclic models with latent confounders (LvLiNGAM). Existing methods assume mutually independent latent confounders or cannot properly handle models with causal relationships among observed variables.
We propose a novel algorithm that identifies causal DAGs in LvLiNGAM, allowing causal structures among latent variables, among observed variables, and between the two. The proposed method leverages higher-order cumulants of observed data to identify the causal structure. Extensive simulations and experiments with real-world data demonstrate the validity and practical utility of the proposed algorithm.
( 2
min )
arXiv:2510.13819v1 Announce Type: cross
Abstract: This paper studies user localization aided by a Reconfigurable Intelligent Surface (RIS). A feedback link from the Base Station (BS) to the user is adopted to enable dynamic power control of the user pilot transmissions in the uplink. A novel multi-agent algorithm for the joint control of the RIS phase configuration and the user transmit power is presented, which is based on a hybrid approach integrating NeuroEvolution (NE) and supervised learning. The proposed scheme requires only single-bit feedback messages for the uplink power control, supports RIS elements with discrete responses, and is numerically shown to outperform fingerprinting, deep reinforcement learning baselines and backpropagation-based position estimators.
( 2
min )
arXiv:2510.14303v1 Announce Type: cross
Abstract: In recent years, the rapid increase in academic publications across various fields has posed severe challenges for academic paper analysis: scientists struggle to timely and comprehensively track the latest research findings and methodologies. Key concept extraction has proven to be an effective analytical paradigm, and its automation has been achieved with the widespread application of language models in industrial and scientific domains. However, existing paper databases are mostly limited to similarity matching and basic classification of key concepts, failing to deeply explore the relational networks between concepts. This paper is based on the OpenAlex opensource knowledge graph. By analyzing nearly 8,000 open-source paper data from Novosibirsk State University, we discovered a strong correlation between the distribution patterns of paper key concept paths and both innovation points and rare paths. We propose a prompt engineering-based key concept path analysis method. This method leverages small language models to achieve precise key concept extraction and innovation point identification, and constructs an agent based on a knowledge graph constraint mechanism to enhance analysis accuracy. Through fine-tuning of the Qwen and DeepSeek models, we achieved significant improvements in accuracy, with the models publicly available on the Hugging Face platform.
( 3
min )
arXiv:2510.14694v1 Announce Type: cross
Abstract: We are grateful to the discussants, Levis and Kennedy [2025], Luo and Geng [2025], Wang and van der Laan [2025], and Yang and Kim [2025], for their thoughtful comments on our paper (Nabi et al., 2025). In this rejoinder, we summarize our main contributions and respond to each discussion in turn.
( 2
min )
arXiv:2509.00102v2 Announce Type: replace
Abstract: Transformer-based foundation models for Electrocardiograms (ECGs) have recently achieved impressive performance in many downstream applications.
( 2
min )
arXiv:2503.09649v3 Announce Type: replace-cross
Abstract: Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. This paper provides a gentle introduction to this approach in bioinformatics, and is the first to review key applications in proteomics, genome-wide association studies (GWAS), single-cell and multi-omics studies in their legal as well as methodological and infrastructural challenges. As the evolution of biobanks in genetics and systems biology has proved, accessing more extensive and varied data pools leads to a faster and more robust exploration and translation of results. More widespread use of federated learning may have a similar impact in bioinformatics, allowing academic and clinical institutions to access many combinations of genotypic, phenotypic and environmental information that are undercovered or not included in existing biobanks.
( 3
min )
arXiv:2510.14694v1 Announce Type: cross
Abstract: We are grateful to the discussants, Levis and Kennedy [2025], Luo and Geng [2025], Wang and van der Laan [2025], and Yang and Kim [2025], for their thoughtful comments on our paper (Nabi et al., 2025). In this rejoinder, we summarize our main contributions and respond to each discussion in turn.
( 2
min )
arXiv:2510.14780v1 Announce Type: cross
Abstract: This paper addresses the problem of estimating causal directed acyclic graphs in linear non-Gaussian acyclic models with latent confounders (LvLiNGAM). Existing methods assume mutually independent latent confounders or cannot properly handle models with causal relationships among observed variables.
We propose a novel algorithm that identifies causal DAGs in LvLiNGAM, allowing causal structures among latent variables, among observed variables, and between the two. The proposed method leverages higher-order cumulants of observed data to identify the causal structure. Extensive simulations and experiments with real-world data demonstrate the validity and practical utility of the proposed algorithm.
( 2
min )
arXiv:2503.09649v3 Announce Type: replace-cross
Abstract: Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. This paper provides a gentle introduction to this approach in bioinformatics, and is the first to review key applications in proteomics, genome-wide association studies (GWAS), single-cell and multi-omics studies in their legal as well as methodological and infrastructural challenges. As the evolution of biobanks in genetics and systems biology has proved, accessing more extensive and varied data pools leads to a faster and more robust exploration and translation of results. More widespread use of federated learning may have a similar impact in bioinformatics, allowing academic and clinical institutions to access many combinations of genotypic, phenotypic and environmental information that are undercovered or not included in existing biobanks.
( 3
min )
Organizations often face challenges when implementing single-shot fine-tuning approaches for their generative AI models. The single-shot fine-tuning method involves selecting training data, configuring hyperparameters, and hoping the results meet expectations without the ability to make incremental adjustments. Single-shot fine-tuning frequently leads to suboptimal results and requires starting the entire process from scratch when improvements are […]
( 37
min )
In this post, we'll demonstrate how to implement a Quick Service Restaurants (QSRs) drive-thru solution using Amazon Nova Sonic and AWS services. We'll walk through building an intelligent system that combines voice AI with interactive menu displays, providing technical insights and implementation guidance to help restaurants modernize their drive-thru operations.
( 43
min )
This post provides a comprehensive hands-on guide to fine-tune Amazon Nova Lite for document processing tasks, with a focus on tax form data extraction. Using our open-source GitHub repository code sample, we demonstrate the complete workflow from data preparation to model deployment.
( 42
min )
In this article, you will learn three proven ways to speed up model training by optimizing precision, memory, and data flow — without adding any...
( 26
min )
An increasing number of AI and machine learning-based systems feed on text data — language models are a notable example today.
( 22
min )
GeForce NOW is more than just a platform to stream fresh games every week — it offers celebrations for the gamers who make it epic, with member rewards to sweeten the deal. This week, GeForce NOW Ultimate members can score a free Borderlands 4 reward and upgrade their setup with the latest SteelSeries Nimbus Cloud
Read Article
( 8
min )
arXiv:2510.12254v1 Announce Type: new
Abstract: Text-to-Image (T2I) models have demonstrated their versatility in a wide range of applications. However, adaptation of T2I models to specialized tasks is often limited by the availability of task-specific data due to privacy concerns. On the other hand, harnessing the power of rich multimodal data from modern mobile systems and IoT infrastructures presents a great opportunity. This paper introduces Federated Multi-modal Knowledge Transfer (FedMMKT), a novel framework that enables co-enhancement of a server T2I model and client task-specific models using decentralized multimodal data without compromising data privacy.
( 2
min )
arXiv:2510.12672v1 Announce Type: new
Abstract: Large Language Models are susceptible to jailbreak attacks that bypass built-in safety guardrails (e.g., by tricking the model with adversarial prompts). We propose Concept Alignment and Concept Manipulation \textbf{CALM}, an inference-time method that suppresses harmful concepts by modifying latent representations of the last layer of the model, without retraining. Leveraging \gls*{cw} technique from Computer Vision combined with orthogonal projection, CALM removes unwanted latent directions associated with harmful content while preserving model performance. Experiments show that CALM reduces harmful outputs and outperforms baseline methods in most metrics, offering a lightweight approach to AI safety with no additional training data or model fine-tuning, while incurring only a small computational overhead at inference.
( 2
min )
arXiv:2510.11789v1 Announce Type: cross
Abstract: We study the convergence rate of learning pairwise interactions in single-layer attention-style models, where tokens interact through a weight matrix and a non-linear activation function. We prove that the minimax rate is $M^{-\frac{2\beta}{2\beta+1}}$ with $M$ being the sample size, depending only on the smoothness $\beta$ of the activation, and crucially independent of token count, ambient dimension, or rank of the weight matrix. These results highlight a fundamental dimension-free statistical efficiency of attention-style nonlocal models, even when the weight matrix and activation are not separately identifiable and provide a theoretical understanding of the attention mechanism and its training.
( 2
min )
arXiv:2510.12077v1 Announce Type: cross
Abstract: We study neural network compressibility by using singular learning theory to extend the minimum description length (MDL) principle to singular models like neural networks. Through extensive experiments on the Pythia suite with quantization, factorization, and other compression techniques, we find that complexity estimates based on the local learning coefficient (LLC) are closely, and in some cases, linearly correlated with compressibility. Our results provide a path toward rigorously evaluating the limits of model compression.
( 2
min )
arXiv:2510.12560v1 Announce Type: cross
Abstract: End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
( 2
min )
After being trained with this technique, vision-language models can better identify a unique item in a new scene.
( 7
min )
In this post, we share four high-impact, widely adopted use cases built with Nova in Amazon Bedrock, supported by real-world customers deployments, offerings available from AWS partners, and experiences. These examples are ideal for organizations researching their own AI adoption strategies and use cases across industries.
( 39
min )
In this post, we explore how Amazon Bedrock AgentCore Memory transforms raw conversational data into persistent, actionable knowledge through sophisticated extraction, consolidation, and retrieval mechanisms that mirror human cognitive processes. The system tackles the complex challenge of building AI agents that don't just store conversations but extract meaningful insights, merge related information across time, and maintain coherent memory stores that enable truly context-aware interactions.
( 40
min )
Misconfiguration issues in distributed training with Amazon EKS can be prevented following a systematic approach to launch required components and verify their proper configuration. This post walks through the steps to set up and verify an EKS cluster for training large models using DLCs.
( 44
min )
This post provides a comprehensive guide on integrating the Almond kernel into SageMaker Studio, offering a solution for Scala development within the platform.
( 39
min )
The former department chair was an early innovator in the use of artificial intelligence to both study and influence how children learn music.
( 6
min )
Media Lab PhD student Kimaya Lecamwasam researches how music can shape well-being.
( 7
min )
Across the tech sector, the term ‘AI agent’ has become Silicon Valley’s latest Rorschach test—revealing more about the speaker’s worldview than any shared technical definition. Across Fortune 500 companies, I’ve seen CTOs, CMOs, business leaders, and AI researchers invoke the phrase to mean radically different things—a linguistic inkblot that’s already draining millions through misaligned investments.… Read More »The rise of accountable AI agents: How knowledge graphs solve the autonomy problem
The post The rise of accountable AI agents: How knowledge graphs solve the autonomy problem appeared first on Data Science Central.
( 20
min )
The NVIDIA Inception startup projects that space-based data centers will offer 10x lower energy costs and reduce the need for energy consumption on Earth.
( 7
min )
arXiv:2510.09705v1 Announce Type: new
Abstract: Static feature exclusion strategies often fail to prevent bias when hidden dependencies influence the model predictions. To address this issue, we explore a reinforcement learning (RL) framework that integrates bias mitigation and automated feature selection within a single learning process. Unlike traditional heuristic-driven filter or wrapper approaches, our RL agent adaptively selects features using a reward signal that explicitly integrates predictive performance with fairness considerations. This dynamic formulation allows the model to balance generalization, accuracy, and equity throughout the training process, rather than rely exclusively on pre-processing adjustments or post hoc correction mechanisms. In this paper, we describe the construction of a multi-component reward function, the specification of the agents action space over feature subsets, and the integration of this system with ensemble learning. We aim to provide a flexible and generalizable way to select features in environments where predictors are correlated and biases can inadvertently re-emerge.
( 2
min )
arXiv:2510.09723v1 Announce Type: new
Abstract: In this paper, we introduce Narrative Learning, a methodology where models are defined entirely in natural language and iteratively refine their classification criteria using explanatory prompts rather than traditional numerical optimisation. We report on experiments to evaluate the accuracy and potential of this approach using 3 synthetic and 3 natural datasets and compare them against 7 baseline explainable machine learning models. We demonstrate that on 5 out of 6 of these datasets, Narrative Learning became more accurate than the baseline explainable models in 2025 or earlier because of improvements in language models. We also report on trends in the lexicostatistics of these models' outputs as a proxy for the comprehensibility of the explanations.
( 2
min )
arXiv:2510.09739v1 Announce Type: new
Abstract: The lexical hypothesis posits that personality traits are encoded in language and is foundational to models like the Big Five. We created a bottom-up personality model from a classic adjective list using machine learning and compared its descriptive utility against the Big Five by analyzing one million Reddit comments. The Big Five, particularly Agreeableness, Conscientiousness, and Neuroticism, provided a far more powerful and interpretable description of these online communities. In contrast, our machine-learning clusters provided no meaningful distinctions, failed to recover the Extraversion trait, and lacked the psychometric coherence of the Big Five. These results affirm the robustness of the Big Five and suggest personality's semantic structure is context-dependent. Our findings show that while machine learning can help check the ecological validity of established psychological theories, it may not be able to replace them.
( 2
min )
arXiv:2510.09752v1 Announce Type: new
Abstract: Patent drafting presents significant challenges due to its reliance on the extensive experience and specialized expertise of patent attorneys, who must possess both legal acumen and technical understanding of an invention to craft patent applications in a formal legal writing style. This paper presents a demonstration of Patentformer, an AI-powered automated patent drafting platform designed to support patent attorneys by rapidly producing high-quality patent applications adhering to legal writing standards.
( 2
min )
arXiv:2510.09845v1 Announce Type: new
Abstract: This work demonstrates the possibilities for improving wildfire and air quality management in the western United States by leveraging the unprecedented hourly data from NASA's TEMPO satellite mission and advances in self-supervised deep learning. Here we demonstrate the efficacy of deep learning for mapping the near real-time hourly spread of wildfire fronts and smoke plumes using an innovative self-supervised deep learning-system: successfully distinguishing smoke plumes from clouds using GOES-18 and TEMPO data, strong agreement across the smoke and fire masks generated from different sensing modalities as well as significant improvement over operational products for the same cases.
( 3
min )
arXiv:2510.09977v1 Announce Type: new
Abstract: Online sensing plays an important role in advancing modern manufacturing. The real-time sensor signals, which can be stored as high-resolution time series data, contain rich information about the operation status. One of its popular usages is online process monitoring, which can be achieved by effective anomaly detection from the sensor signals. However, most existing approaches either heavily rely on labeled data for training supervised models, or are designed to detect only extreme outliers, thus are ineffective at identifying subtle semantic off-track anomalies to capture where new regimes or unexpected routines start. To address this challenge, we propose an matrix profile-based unsupervised anomaly detection algorithm that captures fabrication cycle similarity and performs semantic segmentation to precisely identify the onset of defect anomalies in additive manufacturing. The effectiveness of the proposed method is demonstrated by the experiments on real-world sensor data.
( 2
min )
arXiv:2510.10739v1 Announce Type: new
Abstract: We introduce a general stochastic differential equation framework for modelling multiobjective optimization dynamics in iterative Large Language Model (LLM) interactions. Our framework captures the inherent stochasticity of LLM responses through explicit diffusion terms and reveals systematic interference patterns between competing objectives via an interference matrix formulation. We validate our theoretical framework using iterative code generation as a proof-of-concept application, analyzing 400 sessions across security, efficiency, and functionality objectives. Our results demonstrate strategy-dependent convergence behaviors with rates ranging from 0.33 to 1.29, and predictive accuracy achieving R2 = 0.74 for balanced approaches. This work proposes the feasibility of dynamical systems analysis for multi-objective LLM interactions, with code generation serving as an initial validation domain.
( 2
min )
arXiv:2510.10915v1 Announce Type: new
Abstract: Time series anomaly detection(TSAD) is a critical task in signal processing field, ensuring the reliability of complex systems. Reconstruction-based methods dominate in TSAD. Among these methods, VAE-based methods have achieved promising results. Existing VAE-based methods suffer from the limitation of single-window feature and insufficient leveraging of long-term time and frequency information. We propose a Conditional Variational AutoEncoder with Long-term dependency and Probabilistic time-frequency fusion, named LPCVAE. LPCVAE introduces LSTM to capture long-term dependencies beyond windows. It further incorporates a Product-of-Experts (PoE) mechanism for adaptive and distribution-level probabilistic fusion. This design effectively mitigates time-frequency information loss. Extensive experiments on four public datasets demonstrate it outperforms state-of-the-art methods. The results confirm that integrating long-term time and frequency representations with adaptive fusion yields a robust and efficient solution for TSAD.
( 2
min )
arXiv:2510.11209v1 Announce Type: new
Abstract: We propose a new reservoir computing method for forecasting high-resolution spatiotemporal datasets. By combining multi-resolution inputs from coarser to finer layers, our architecture better captures both local and global dynamics. Applied to Sea Surface Temperature data, it outperforms standard parallel reservoir models in long-term forecasting, demonstrating the effectiveness of cross-layers coupling in improving predictive accuracy. Finally, we show that the optimal network dynamics in each layer become increasingly linear, revealing the slow modes propagated to subsequent layers.
( 2
min )
arXiv:2510.10526v1 Announce Type: cross
Abstract: This research develops a sentiment-driven quantitative trading system that leverages a large language model, FinGPT, for sentiment analysis, and explores a novel method for signal integration using a reinforcement learning algorithm, Twin Delayed Deep Deterministic Policy Gradient (TD3). We compare the performance of strategies that integrate sentiment and technical signals using both a conventional rule-based approach and a reinforcement learning framework. The results suggest that sentiment signals generated by FinGPT offer value when combined with traditional technical indicators, and that reinforcement learning algorithm presents a promising approach for effectively integrating heterogeneous signals in dynamic trading environments.
( 2
min )
arXiv:2507.11620v2 Announce Type: replace
Abstract: Event time series are sequences of discrete events occurring at irregular time intervals, each associated with a domain-specific observational modality. They are common in domains such as high-energy astrophysics, computational social science, cybersecurity, finance, healthcare, neuroscience, and seismology. Their unstructured and irregular structure poses significant challenges for extracting meaningful patterns and identifying salient phenomena using conventional techniques. We propose novel two- and three-dimensional tensor representations for event time series, coupled with sparse autoencoders that learn physically meaningful latent representations. These embeddings support a variety of downstream tasks, including anomaly detection, similarity-based retrieval, semantic clustering, and unsupervised classification. We demonstrate our approach on a real-world dataset from X-ray astronomy, showing that these representations successfully capture temporal and spectral signatures and isolate diverse classes of X-ray transients. Our framework offers a flexible, scalable, and generalizable solution for analyzing complex, irregular event time series across scientific and industrial domains.
( 3
min )
arXiv:2510.11789v1 Announce Type: new
Abstract: We study the convergence rate of learning pairwise interactions in single-layer attention-style models, where tokens interact through a weight matrix and a non-linear activation function. We prove that the minimax rate is $M^{-\frac{2\beta}{2\beta+1}}$ with $M$ being the sample size, depending only on the smoothness $\beta$ of the activation, and crucially independent of token count, ambient dimension, or rank of the weight matrix. These results highlight a fundamental dimension-free statistical efficiency of attention-style nonlocal models, even when the weight matrix and activation are not separately identifiable and provide a theoretical understanding of the attention mechanism and its training.
( 2
min )
arXiv:2510.12077v1 Announce Type: new
Abstract: We study neural network compressibility by using singular learning theory to extend the minimum description length (MDL) principle to singular models like neural networks. Through extensive experiments on the Pythia suite with quantization, factorization, and other compression techniques, we find that complexity estimates based on the local learning coefficient (LLC) are closely, and in some cases, linearly correlated with compressibility. Our results provide a path toward rigorously evaluating the limits of model compression.
( 2
min )
An algorithm can change the face of food assistance policy in the Global South, says MIT assistant professor and J-WAFS researcher Ali Aouad.
( 6
min )
Acting as a “virtual spectrometer,” SpectroGen generates spectroscopic data in any modality, such as X-ray or infrared, to quickly assess a material’s quality.
( 7
min )
Co-founded by an MIT alumnus, Watershed Bio offers researchers who aren’t software engineers a way to run large-scale analyses to accelerate biology.
( 6
min )
In this post, we explore how to build a conversational device management system using Amazon Bedrock AgentCore. With this solution, users can manage their IoT devices through natural language, using a UI for tasks like checking device status, configuring WiFi networks, and monitoring user activity.
( 37
min )
This post shows how Salesforce integrated Amazon Bedrock Custom Model Import into their machine learning operations (MLOps) workflow, reused existing endpoints without application changes, and benchmarked scalability. We share key metrics on operational efficiency and cost optimization gains, and offer practical insights for simplifying your deployment strategy.
( 38
min )
At Oracle AI World, NVIDIA and Oracle announced they are deepening their collaboration to bolster sovereign AI initiatives and accelerate government digital transformation worldwide. By combining NVIDIA’s AI computing platforms with Oracle’s scalable cloud infrastructure, the collaboration enables organizations, such as Abu Dhabi’s Department of Government Enablement (DGE), in partnership with Deloitte and Core42, to
Read Article
( 7
min )
AI is transforming the way enterprises build, deploy and scale intelligent applications. As demand surges for enterprise-grade AI applications that offer speed, scalability and security, industries are swiftly moving toward platforms that can streamline data processing and deliver intelligence at every layer of the business. At Oracle AI World, Oracle today announced a new OCI
Read Article
( 8
min )
The next AI revolution starts where rockets launch. NVIDIA DGX Spark’s first stop: Starbase, Texas. NVIDIA founder and CEO Jensen Huang arrived at the SpaceX facility — amid towering engines and gleaming steel — to hand-deliver the company’s just-launched DGX Spark to Elon Musk. Huang arrived walking past rows of engineers who waved and grinned.
Read Article
( 8
min )
arXiv:2510.09705v1 Announce Type: cross
Abstract: Static feature exclusion strategies often fail to prevent bias when hidden dependencies influence the model predictions. To address this issue, we explore a reinforcement learning (RL) framework that integrates bias mitigation and automated feature selection within a single learning process. Unlike traditional heuristic-driven filter or wrapper approaches, our RL agent adaptively selects features using a reward signal that explicitly integrates predictive performance with fairness considerations. This dynamic formulation allows the model to balance generalization, accuracy, and equity throughout the training process, rather than rely exclusively on pre-processing adjustments or post hoc correction mechanisms. In this paper, we describe the construction of a multi-component reward function, the specification of the agents action space over feature subsets, and the integration of this system with ensemble learning. We aim to provide a flexible and generalizable way to select features in environments where predictors are correlated and biases can inadvertently re-emerge.
( 2
min )
arXiv:2510.09902v1 Announce Type: cross
Abstract: This essay develops a parallel between the Fundamental Theorem of Galois Theory and the Stone--Weierstrass theorem: both can be viewed as assertions that tie the distinguishing power of a class of objects to their expressive power. We provide an elementary theorem connecting the relevant notions of "distinguishing power". We also discuss machine learning and data science contexts in which these theorems, and more generally the theme of links between distinguishing power and expressive power, appear. Finally, we discuss the same theme in the context of linguistics, where it appears as a foundational principle, and illustrate it with several examples.
( 2
min )
In this post, we explore how Physical AI represents the next frontier in intelligent automation, where artificial intelligence transcends digital boundaries to perceive, understand, and manipulate the tangible world around us.
( 38
min )
In this post, we demonstrate the development of a conceptual Medical Reports Analysis Dashboard that combines Amazon Bedrock AI capabilities, LangChain's document processing, and Streamlit's interactive visualization features. The solution transforms complex medical data into accessible insights through a context-aware chat system powered by large language models available through Amazon Bedrock and dynamic visualizations of health parameters.
( 39
min )
In this post, we'll show how Kitsa, a health-tech company specializing in AI-driven clinical trial recruitment and site selection, used Amazon Quick Automate to transform their clinical trial site selection solution. Amazon Quick Automate, a capability of Amazon Quick Suite, enables enterprises to build, deploy and maintain resilient workflow automations at scale.
( 38
min )
In this post, we explore how Amazon Quick Suite's Model Context Protocol (MCP) client enables secure, standardized connections to enterprise applications and AI agents, eliminating the need for complex custom integrations. You'll discover how to set up MCP Actions integrations with popular enterprise tools like Atlassian Jira and Confluence, AWS Knowledge MCP Server, and Amazon Bedrock AgentCore Gateway to create a collaborative environment where people and AI agents can seamlessly work together across your organization's data and applications.
( 42
min )
Learn why customers choose AgentCore to build secure, reliable AI solutions using their choice of frameworks and models for production workloads.
( 39
min )
At the OCP Global Summit, NVIDIA is offering a glimpse into the future of gigawatt AI factories. NVIDIA will unveil specs of the NVIDIA Vera Rubin NVL144 MGX-generation open architecture rack servers, which more than 50 MGX partners are gearing up for along with ecosystem support for NVIDIA Kyber, which connects 576 Rubin Ultra GPUs,
Read Article
( 8
min )
arXiv:2510.08722v1 Announce Type: new
Abstract: Instance discrimination is a self-supervised representation learning paradigm wherein individual instances within a dataset are treated as distinct classes. This is typically achieved by generating two disparate views of each instance by applying stochastic transformations, which encourages the model to learn representations that are invariant to the common underlying object across these views.
( 2
min )
arXiv:2510.08944v1 Announce Type: new
Abstract: Real-world time series data exhibit non-stationary behavior, regime shifts, and temporally varying noise (heteroscedastic) that degrade the robustness of standard regression models. We introduce the Variability-Aware Recursive Neural Network (VARNN), a novel residual-aware architecture for supervised time-series regression that learns an explicit error memory from recent prediction residuals and uses it to recalibrate subsequent predictions. VARNN augments a feed-forward predictor with a learned error-memory state that is updated from residuals over a short context steps as a signal of variability and drift, and then conditions the final prediction at the current time step. Across diverse dataset domains, appliance energy, healthcare, and environmental monitoring, experimental results demonstrate VARNN achieves superior performance and attains lower test MSE with minimal computational overhead over static, dynamic, and recurrent baselines. Our findings show that the VARNN model offers robust predictions under a drift and volatility environment, highlighting its potential as a promising framework for time-series learning.
( 2
min )
arXiv:2510.08610v1 Announce Type: cross
Abstract: Code completion can help developers improve efficiency and ease the development lifecycle. Although code completion is available in modern integrated development environments (IDEs), research lacks in determining what makes a good context for code completion based on the information available to the IDEs for the large language models (LLMs) to perform better. In this paper, we describe an effective context collection strategy to assist the LLMs in performing better at code completion tasks. The key idea of our strategy is to preprocess the repository into smaller code chunks and later use syntactic and semantic similarity-based code chunk retrieval with relative positioning. We found that code chunking and relative positioning of the chunks in the final context improve the performance of code completion tasks.
( 2
min )
arXiv:2510.09042v1 Announce Type: cross
Abstract: In this work, we propose a meta-learning-based Koopman modeling and predictive control approach for nonlinear systems with parametric uncertainties. An adaptive deep meta-learning-based modeling approach, called Meta Adaptive Koopman Operator (MAKO), is proposed. Without knowledge of the parametric uncertainty, the proposed MAKO approach can learn a meta-model from a multi-modal dataset and efficiently adapt to new systems with previously unseen parameter settings by using online data. Based on the learned meta Koopman model, a predictive control scheme is developed, and the stability of the closed-loop system is ensured even in the presence of previously unseen parameter settings. Through extensive simulations, our proposed approach demonstrates superior performance in both modeling accuracy and control efficacy as compared to competitive baselines.
( 2
min )
arXiv:2502.06379v3 Announce Type: replace
Abstract: A recent line of research has exploited pre-trained generative diffusion models as priors for solving Bayesian inverse problems. We contribute to this research direction by designing a sequential Monte Carlo method for linear-Gaussian inverse problems which builds on "decoupled diffusion", where the generative process is designed such that larger updates to the sample are possible. The method is asymptotically exact and we demonstrate the effectiveness of our Decoupled Diffusion Sequential Monte Carlo (DDSMC) algorithm on both synthetic as well as protein and image data. Further, we demonstrate how the approach can be extended to discrete data.
( 2
min )
arXiv:2509.03709v3 Announce Type: replace
Abstract: We provide our perspective on X-Learning (XL), a novel distributed learning architecture that generalizes and extends the concept of decentralization. Our goal is to present a vision for XL, introducing its unexplored design considerations and degrees of freedom. To this end, we shed light on the intuitive yet non-trivial connections between XL, graph theory, and Markov chains. We also present a series of open research directions to stimulate further research.
( 2
min )
arXiv:2405.04147v2 Announce Type: replace-cross
Abstract: Most of the recent results in polynomial functional regression have been focused on an in-depth exploration of single-parameter regularization schemes. In contrast, in this study we go beyond that framework by introducing an algorithm for multiple parameter regularization and presenting a theoretically grounded method for dealing with the associated parameters. This method facilitates the aggregation of models with varying regularization parameters. The efficacy of the proposed approach is assessed through evaluations on both synthetic and some real-world medical data, revealing promising results.
( 2
min )
arXiv:2501.17122v3 Announce Type: replace-cross
Abstract: The two-timescale gradient descent-ascent (GDA) is a canonical gradient algorithm designed to find Nash equilibria in min-max games. We analyze the two-timescale GDA by investigating the effects of learning rate ratios on convergence behavior in both finite-dimensional and mean-field settings. In particular, for finite-dimensional quadratic min-max games, we obtain long-time convergence in near quasi-static regimes through the hypocoercivity method. For mean-field GDA dynamics, we investigate convergence under a finite-scale ratio using a mixed synchronous-reflection coupling technique.
( 2
min )
arXiv:2405.04147v2 Announce Type: replace
Abstract: Most of the recent results in polynomial functional regression have been focused on an in-depth exploration of single-parameter regularization schemes. In contrast, in this study we go beyond that framework by introducing an algorithm for multiple parameter regularization and presenting a theoretically grounded method for dealing with the associated parameters. This method facilitates the aggregation of models with varying regularization parameters. The efficacy of the proposed approach is assessed through evaluations on both synthetic and some real-world medical data, revealing promising results.
( 2
min )
arXiv:2502.06379v3 Announce Type: replace-cross
Abstract: A recent line of research has exploited pre-trained generative diffusion models as priors for solving Bayesian inverse problems. We contribute to this research direction by designing a sequential Monte Carlo method for linear-Gaussian inverse problems which builds on "decoupled diffusion", where the generative process is designed such that larger updates to the sample are possible. The method is asymptotically exact and we demonstrate the effectiveness of our Decoupled Diffusion Sequential Monte Carlo (DDSMC) algorithm on both synthetic as well as protein and image data. Further, we demonstrate how the approach can be extended to discrete data.
( 2
min )
arXiv:2510.07663v1 Announce Type: new
Abstract: Predicting long-term loan defaults is hard because borrower behavior often changes and data distributions shift over time. This paper presents HYDRA-EI, a hybrid ensemble incremental learning framework. It uses several stages of feature processing and combines multiple models. The framework builds relational, cross, and frequency-based features. It uses graph attention, automatic cross-feature creation, and transformations from the frequency domain. HYDRA-EI updates weekly using new data and adjusts the model weights with a simple performance-based method. It works without frequent manual changes or fixed retraining. HYDRA-EI improves model stability and generalization, which makes it useful for long-term credit risk tasks.
( 2
min )
arXiv:2510.07919v1 Announce Type: new
Abstract: Overall architecture of the personalized multi-objective ranking system. It comprises: (1) a Feature Center and Prerank Model for initial feature processing and candidate generation; (2) a Multi-Task Learning (MTL) model predicting various user feedback signals; (3) a Multi-Task Fusion (MTF) module (our proposed GRADE framework) that learns personalized weights ($w_1, \dots, w_n$); these weights are then applied to calculate final scores and sorted to generate a blended ranking by the Blended Ranking Model, which ultimately delivers results to users.
( 2
min )
arXiv:2510.08295v1 Announce Type: new
Abstract: Conventional time-series generation often ignores domain-specific physical constraints, limiting statistical and physical consistency. We propose a hierarchical framework that embeds the inherent hierarchy of physical laws-conservation, dynamics, boundary, and empirical relations-directly into deep generative models, introducing a new paradigm of physics-informed inductive bias. Our method combines Fourier Neural Operators (FNOs) for learning physical operators with Conditional Flow Matching (CFM) for probabilistic generation, integrated via time-dependent hierarchical constraints and FNO-guided corrections. Experiments on harmonic oscillators, human activity recognition, and lithium-ion battery degradation show 16.3% higher generation quality, 46% fewer physics violations, and 18.5% improved predictive accuracy over baselines.
( 2
min )
arXiv:2510.08382v1 Announce Type: new
Abstract: In this paper we will give a characterization of the learnability of forgiving 0-1 loss functions in the finite label multiclass setting. To do this, we create a new combinatorial dimension that is based off of the Natarajan Dimension \citep{natarajan1989learning} and we show that a hypothesis class is learnable in our setting if and only if this Generalized Natarajan Dimension is finite. We also show a connection to learning with set-valued feedback. Through our results we show that the learnability of a set learning problem is characterized by the Natarajan Dimension.
( 2
min )
arXiv:2510.08413v1 Announce Type: new
Abstract: Many prompt engineering techniques have been successful in practice, even when optimizing over a large prompt space with with a small amount of task-specific data. Recent work has partially explained this success by showing generalization bounds which apply PAC-Bayes theory to the discrete prompt space, but they are non-vacuous only in data-rich scenarios. We argue that such widespread success can be more fully explained through more carefully considering data- or distribution-dependent perplexity, which acts as an effective prior and steers the optimization towards prompts that are more ``natural'' for the task at hand. We derive novel generalization bounds that are non-vacuous for data-scarce prompt optimization via more useful priors, formally analyzing how perplexity regularization tightens these bounds by limiting exploration. Empirically, we explore both the bounds' effectiveness and the practical benefits of perplexity regularization in improving prompt generalization.
( 2
min )
arXiv:2510.08549v1 Announce Type: new
Abstract: We propose ERA, a new paradigm that constrains the sampling entropy above given thresholds by applying specially designed activations to the outputs of models. Our approach demonstrates broad effectiveness across different domains: 1) for large language models(LLMs), boosting the AIME 2025 score for Qwen2.5-Math-7B by 37.4%; 2) for continuous control reinforcement learning agents, improving performance by more than 30% over strong baselines such as SAC on the challenging HumanoidBench; 3) for image classification, enhancing ImageNet top-1 accuracy by 0.69% for ResNet-50. These gains are achieved with a computational overhead of less than 7%. Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
( 2
min )
arXiv:2510.07501v1 Announce Type: cross
Abstract: Truncation by death, a prevalent challenge in critical care, renders traditional dynamic treatment regime (DTR) evaluation inapplicable due to ill-defined potential outcomes. We introduce a principal stratification-based method, focusing on the always-survivor value function. We derive a semiparametrically efficient, multiply robust estimator for multi-stage DTRs, demonstrating its robustness and efficiency. Empirical validation and an application to electronic health records showcase its utility for personalized treatment optimization.
( 2
min )
arXiv:2510.07871v1 Announce Type: cross
Abstract: In this report, we describe the technical details of our submission to the IROS 2025 RoboSense Challenge Social Navigation Track. This track focuses on developing RGBD-based perception and navigation systems that enable autonomous agents to navigate safely, efficiently, and socially compliantly in dynamic human-populated indoor environments. The challenge requires agents to operate from an egocentric perspective using only onboard sensors including RGB-D observations and odometry, without access to global maps or privileged information, while maintaining social norm compliance such as safe distances and collision avoidance. Building upon the Falcon model, we introduce a Proactive Risk Perception Module to enhance social navigation performance. Our approach augments Falcon with collision risk understanding that learns to predict distance-based collision risk scores for surrounding humans, which enables the agent to develop more robust spatial awareness and proactive collision avoidance behaviors. The evaluation on the Social-HM3D benchmark demonstrates that our method improves the agent's ability to maintain personal space compliance while navigating toward goals in crowded indoor scenes with dynamic human agents, achieving 2nd place among 16 participating teams in the challenge.
( 3
min )
arXiv:2510.08573v1 Announce Type: cross
Abstract: We construct a neural network to perform regression on the local dark-matter density field given line-of-sight peculiar velocities of dark-matter halos, biased tracers of the dark matter field. Our architecture combines a convolutional U-Net with a point-cloud DeepSets. This combination enables efficient use of small-scale information and improves reconstruction quality relative to a U-Net-only approach. Specifically, our hybrid network recovers both clustering amplitudes and phases better than the U-Net on small scales.
( 2
min )
arXiv:2505.01670v2 Announce Type: replace-cross
Abstract: This work introduces a novel approach to fMRI-based visual image reconstruction using a subject-agnostic common representation space. We show that the brain signals of the subjects can be aligned in this common space during training to form a semantically aligned common brain. This is leveraged to demonstrate that aligning subject-specific lightweight modules to a reference subject is significantly more efficient than traditional end-to-end training methods. Our approach excels in low-data scenarios. We evaluate our methods on different datasets, demonstrating that the common space is subject and dataset-agnostic.
( 2
min )
arXiv:2510.07501v1 Announce Type: new
Abstract: Truncation by death, a prevalent challenge in critical care, renders traditional dynamic treatment regime (DTR) evaluation inapplicable due to ill-defined potential outcomes. We introduce a principal stratification-based method, focusing on the always-survivor value function. We derive a semiparametrically efficient, multiply robust estimator for multi-stage DTRs, demonstrating its robustness and efficiency. Empirical validation and an application to electronic health records showcase its utility for personalized treatment optimization.
( 2
min )
arXiv:2510.08382v1 Announce Type: cross
Abstract: In this paper we will give a characterization of the learnability of forgiving 0-1 loss functions in the finite label multiclass setting. To do this, we create a new combinatorial dimension that is based off of the Natarajan Dimension \citep{natarajan1989learning} and we show that a hypothesis class is learnable in our setting if and only if this Generalized Natarajan Dimension is finite. We also show a connection to learning with set-valued feedback. Through our results we show that the learnability of a set learning problem is characterized by the Natarajan Dimension.
( 2
min )
arXiv:2510.08573v1 Announce Type: cross
Abstract: We construct a neural network to perform regression on the local dark-matter density field given line-of-sight peculiar velocities of dark-matter halos, biased tracers of the dark matter field. Our architecture combines a convolutional U-Net with a point-cloud DeepSets. This combination enables efficient use of small-scale information and improves reconstruction quality relative to a U-Net-only approach. Specifically, our hybrid network recovers both clustering amplitudes and phases better than the U-Net on small scales.
( 2
min )
Receiving the Robert A. Muh award, the technologist and author heralded a bright future for AI, breakthroughs in longevity, and more.
( 7
min )
NVIDIA Blackwell swept the new SemiAnalysis InferenceMAX v1 benchmarks, delivering the highest performance and best overall efficiency. InferenceMax v1 is the first independent benchmark to measure total cost of compute across diverse models and real-world scenarios. Best return on investment: NVIDIA GB200 NVL72 delivers unmatched AI factory economics — a $5 million investment generates $75
Read Article
( 8
min )
Microsoft Azure today announced the new NDv6 GB300 VM series, delivering the industry’s first supercomputing-scale production cluster of NVIDIA GB300 NVL72 systems, purpose-built for OpenAI’s most demanding AI inference workloads. This supercomputer-scale cluster features over 4,600 NVIDIA Blackwell Ultra GPUs connected via the NVIDIA Quantum-X800 InfiniBand networking platform. Microsoft’s unique systems approach applied radical engineering
Read Article
( 7
min )
Lock, load and stream — the battle is just beginning. EA’s highly anticipated Battlefield 6 is set to storm the cloud when it launches tomorrow with GeForce RTX 5080 power, delivering the high-intensity combat and heart-pounding chaos the series is known for. Catch it as part of the six games joining the cloud, including Bethesda’s
Read Article
( 7
min )
In this post, we demonstrate how to integrate Amazon SageMaker HyperPod with Anyscale platform to address critical infrastructure challenges in building and deploying large-scale AI models. The combined solution provides robust infrastructure for distributed AI workloads with high-performance hardware, continuous monitoring, and seamless integration with Ray, the leading AI compute engine, enabling organizations to reduce time-to-market and lower total cost of ownership.
( 40
min )
In this post, we introduce Amazon Nova customization for text content moderation through Amazon SageMaker AI, enabling organizations to fine-tune models for their specific moderation needs. The evaluation across three benchmarks shows that customized Nova models achieve an average improvement of 7.3% in F1 scores compared to the baseline Nova Lite, with individual improvements ranging from 4.2% to 9.2% across different content moderation tasks.
( 47
min )
arXiv:2510.06367v1 Announce Type: new
Abstract: Neural ODEs are a widely used, powerful machine learning technique in particular for physics. However, not every solution is physical in that it is an Euler-Lagrange equation. We present Helmholtz metrics to quantify this resemblance for a given ODE and demonstrate their capabilities on several fundamental systems with noise. We combine them with a second order neural ODE to form a Lagrangian neural ODE, which allows to learn Euler-Lagrange equations in a direct fashion and with zero additional inference cost. We demonstrate that, using only positional data, they can distinguish Lagrangian and non-Lagrangian systems and improve the neural ODE solutions.
( 2
min )
arXiv:2510.06632v1 Announce Type: new
Abstract: Non-Negative Matrix Factorization (NMF) is an unsupervised learning method offering low-rank representations across various domains such as audio processing, biomedical signal analysis, and image recognition. The incorporation of $\alpha$-divergence in NMF formulations enhances flexibility in optimization, yet extending these methods to multi-layer architectures presents challenges in ensuring convergence. To address this, we introduce a novel approach inspired by the Boltzmann probability of the energy barriers in chemical reactions to theoretically perform convergence analysis. We introduce a novel method, called Chem-NMF, with a bounding factor which stabilizes convergence. To our knowledge, this is the first study to apply a physical chemistry perspective to rigorously analyze the convergence behaviour of the NMF algorithm. We start from mathematically proven asymptotic convergence results and then show how they apply to real data. Experimental results demonstrate that the proposed algorithm improves clustering accuracy by 5.6% $\pm$ 2.7% on biomedical signals and 11.1% $\pm$ 7.2% on face images (mean $\pm$ std).
( 2
min )
arXiv:2510.06372v1 Announce Type: cross
Abstract: We provide an upper bound on the number of neurons required in a shallow
neural network to approximate a continuous function on a compact set with a
given accuracy. This method, inspired by a specific proof of the
Stone-Weierstrass theorem, is constructive and more general than previous
bounds of this character, as it applies to any continuous function on any
compact set.
( 2
min )
arXiv:2510.06440v1 Announce Type: cross
Abstract: The New York State Department of Transportation (NYSDOT) has a network of roadside traffic cameras that are used by both the NYSDOT and the public to observe road conditions. The NYSDOT evaluates road conditions by driving on roads and observing live cameras, tasks which are labor-intensive but necessary for making critical operational decisions during winter weather events. However, machine learning models can provide additional support for the NYSDOT by automatically classifying current road conditions across the state. In this study, convolutional neural networks and random forests are trained on camera images and weather data to predict road surface conditions. Models are trained on a hand-labeled dataset of ~22,000 camera images, each classified by human labelers into one of six road surface conditions: severe snow, snow, wet, dry, poor visibility, or obstructed. Model generalizability is prioritized to meet the operational needs of the NYSDOT decision makers, and the weather-related road surface condition model in this study achieves an accuracy of 81.5% on completely unseen cameras.
( 3
min )
arXiv:2510.06868v1 Announce Type: cross
Abstract: We consider image transmission via deep joint source-channel coding (DeepJSCC) over multi-hop additive white Gaussian noise (AWGN) channels by training a DeepJSCC encoder-decoder pair with a pre-trained deep hash distillation (DHD) module to semantically cluster images, facilitating security-oriented applications through enhanced semantic consistency and improving the perceptual reconstruction quality. We train the DeepJSCC module to both reduce mean square error (MSE) and minimize cosine distance between DHD hashes of source and reconstructed images. Significantly improved perceptual quality as a result of semantic alignment is illustrated for different multi-hop settings, for which classical DeepJSCC may suffer from noise accumulation, measured by the learned perceptual image patch similarity (LPIPS) metric.
( 2
min )
arXiv:2502.07939v2 Announce Type: replace-cross
Abstract: This paper introduces Discrete Markov Probabilistic Models (DMPMs), a novel discrete diffusion algorithm for discrete data generation. The algorithm operates in discrete bit space, where the noising process is a continuous-time Markov chain that flips labels uniformly at random. The time-reversal process, like the forward noise process, is a jump process with its intensity governed by a discrete analogue of the classical score function. Crucially, this intensity is proven to be the conditional expectation of a function of the forward process, underlining theoretical alignment with score-based generative models. We establish convergence bounds for the algorithm under minimal assumptions, ensuring robustness and efficiency, which we demonstrate through experiments on low-dimensional Bernoulli-distributed datasets and high-dimensional binary MNIST data. The results highlight competitive performance in generating discrete structures compared to the state-of-the-art. This work bridges theoretical foundations and practical applications, advancing the development of effective and theoretically grounded discrete generative modeling.
( 2
min )
arXiv:2505.22296v2 Announce Type: replace-cross
Abstract: Adding sequence parallelism into LLaMA-Factory, we open-sourced 360-LLaMA-Factory at https://github.com/Qihoo360/360-LLaMA-Factory. 360-LLaMA-Factory has received wide recognition and used in models such as Light-R1 arXiv:2503.10460, TinyR1 arXiv:2503.04872, Kaggle AIMO math models and also in large companies' training frameworks. This technical report delves deeper into the different sequence parallel modes behind 360-LLaMA-Factory and discusses our implementation insights.
( 2
min )
arXiv:2508.08833v2 Announce Type: replace-cross
Abstract: In this paper, we introduce a systematic framework beyond conventional method to assess LLMs' mathematical-reasoning robustness by stress-testing them on advanced math problems that are mathematically equivalent but with linguistic and parametric variation. These transformations allow us to measure the sensitivity of LLMs to non-mathematical perturbations, thereby enabling a more accurate evaluation of their mathematical reasoning capabilities. Using this new evaluation methodology, we created PutnamGAP, a new benchmark dataset with multiple mathematically-equivalent variations of competition-level math problems. With the new dataset, we evaluate multiple families of representative LLMs and examine their robustness. Across 18 commercial and open-source models we observe sharp performance degradation on the variants. OpenAI's flagship reasoning model, O3, scores 51.5% on the originals but drops by 4.7 percentage points on surface-renaming variants, and by 12.9 percentage points on parametric variants, while smaller models fare far worse. Overall, the results show that the proposed new evaluation methodology is effective for deepening our understanding of the robustness of LLMs and generating new insights for further improving their mathematical reasoning capabilities.
( 3
min )
arXiv:2510.06372v1 Announce Type: new
Abstract: We provide an upper bound on the number of neurons required in a shallow
neural network to approximate a continuous function on a compact set with a
given accuracy. This method, inspired by a specific proof of the
Stone-Weierstrass theorem, is constructive and more general than previous
bounds of this character, as it applies to any continuous function on any
compact set.
( 2
min )
arXiv:2502.07939v2 Announce Type: replace
Abstract: This paper introduces Discrete Markov Probabilistic Models (DMPMs), a novel discrete diffusion algorithm for discrete data generation. The algorithm operates in discrete bit space, where the noising process is a continuous-time Markov chain that flips labels uniformly at random. The time-reversal process, like the forward noise process, is a jump process with its intensity governed by a discrete analogue of the classical score function. Crucially, this intensity is proven to be the conditional expectation of a function of the forward process, underlining theoretical alignment with score-based generative models. We establish convergence bounds for the algorithm under minimal assumptions, ensuring robustness and efficiency, which we demonstrate through experiments on low-dimensional Bernoulli-distributed datasets and high-dimensional binary MNIST data. The results highlight competitive performance in generating discrete structures compared to the state-of-the-art. This work bridges theoretical foundations and practical applications, advancing the development of effective and theoretically grounded discrete generative modeling.
( 2
min )
The MIT–MBZUAI Collaborative Research Program will unite faculty and students from both institutions to advance AI and accelerate its use in pressing scientific and societal challenges.
( 4
min )
New tool from MIT CSAIL creates realistic virtual kitchens and living rooms where simulated robots can interact with models of real-world objects, scaling up training data for robot foundation models.
( 8
min )
Telecommunication networks are critical infrastructure for every nation, underpinning the global flow of communications and data across billions of smartphones and connected devices. The next generation of these networks, known as 6G, will for the first time be built to support AI traffic — and can be powered by AI from the ground up. The
Read Article
( 8
min )
In this post, we show how Vxceed used Amazon Bedrock to develop this AI-powered multi-agent solution that generates personalized sales pitches for field sales teams at scale.
( 39
min )
Machine learning operations (MLOps) is the combination of people, processes, and technology to productionize ML use cases efficiently. To achieve this, enterprise customers must develop MLOps platforms to support reproducibility, robustness, and end-to-end observability of the ML use case’s lifecycle. Those platforms are based on a multi-account setup by adopting strict security constraints, development best […]
( 40
min )
arXiv:2510.06025v1 Announce Type: new
Abstract: Out-of-Distribution (OOD) detection is critical to AI reliability and safety, yet in many practical settings, only a limited amount of training data is available. Bayesian Neural Networks (BNNs) are a promising class of model on which to base OOD detection, because they explicitly represent epistemic (i.e. model) uncertainty. In the small training data regime, BNNs are especially valuable because they can incorporate prior model information. We introduce a new family of Bayesian posthoc OOD scores based on expected logit vectors, and compare 5 Bayesian and 4 deterministic posthoc OOD scores. Experiments on MNIST and CIFAR-10 In-Distributions, with 5000 training samples or less, show that the Bayesian methods outperform corresponding deterministic methods.
( 2
min )
arXiv:2510.06028v1 Announce Type: new
Abstract: The paper provides data-dependent bounds on the test error of the Gibbs algorithm in the overparameterized interpolation regime, where low training errors are also obtained for impossible data, such as random labels in classification. The bounds are stable under approximation with Langevin Monte Carlo algorithms. Experiments on the MNIST and CIFAR-10 datasets verify that the bounds yield nontrivial predictions on true labeled data and correctly upper bound the test error for random labels. Our method indicates that generalization in the low-temperature, interpolation regime is already signaled by small training errors in the more classical high temperature regime.
( 2
min )
arXiv:2510.06029v1 Announce Type: new
Abstract: We introduce molFTP (molecular fragment-target prevalence), a compact representation that delivers strong predictive performance. To prevent feature leakage across cross-validation folds, we implement a dummy-masking procedure that removes information about fragments present in the held-out molecules. We further show that key leave-one-out (key-loo) closely approximates true molecule-level leave-one-out (LOO), with deviation below 8% on our datasets. This enables near full data training while preserving unbiased cross-validation estimates of model performance. Overall, molFTP provides a fast, leakage-resistant fragment-target prevalence vectorization with practical safeguards (dummy masking or key-LOO) that approximate LOO at a fraction of its cost.
( 2
min )
arXiv:2510.06106v1 Announce Type: new
Abstract: Deep neural networks have achieved remarkable success, yet our understanding of how they learn remains limited. These models can learn high-dimensional tasks, which is generally statistically intractable due to the curse of dimensionality. This apparent paradox suggests that learnable data must have an underlying latent structure. What is the nature of this structure? How do neural networks encode and exploit it, and how does it quantitatively impact performance - for instance, how does generalization improve with the number of training examples? This thesis addresses these questions by studying the roles of locality and compositionality in data, tasks, and deep learning representations.
( 2
min )
arXiv:2510.06165v1 Announce Type: new
Abstract: Feature attributions are post-training analysis methods that assess how various input features of a machine learning model contribute to an output prediction. Their interpretation is straightforward when features act independently, but becomes less direct when the predictive model involves interactions such as multiplicative relationships or joint feature contributions. In this work, we propose a general theory of higher-order feature attribution, which we develop on the foundation of Integrated Gradients (IG). This work extends existing frameworks in the literature on explainable AI. When using IG as the method of feature attribution, we discover natural connections to statistics and topological signal processing. We provide several theoretical results that establish the theory, and we validate our theory on a few examples.
( 2
min )
arXiv:2510.05121v1 Announce Type: cross
Abstract: This study investigates the effectiveness of Large Language Models (LLMs) for the extraction of structured knowledge in the form of Subject-Predicate-Object triples. We apply the setup for the domain of Economics application. The findings can be applied to a wide range of scenarios, including the creation of economic trade knowledge graphs from natural language legal trade agreement texts. As a use case, we apply the model to regional trade agreement texts to extract trade-related information triples. In particular, we explore the zero-shot, one-shot and few-shot prompting techniques, incorporating positive and negative examples, and evaluate their performance based on quantitative and qualitative metrics. Specifically, we used Llama 3.1 model to process the unstructured regional trade agreement texts and extract triples. We discuss key insights, challenges, and potential future directions, emphasizing the significance of language models in economic applications.
( 2
min )
arXiv:2505.24085v2 Announce Type: replace
Abstract: Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with elevated health risks, where timely detection is pivotal for mitigating stroke-related morbidity. This study introduces an innovative hybrid methodology integrating unsupervised deep learning and gradient boosting models to improve AF detection. A 19-layer deep convolutional autoencoder (DCAE) is coupled with three boosting classifiers-AdaBoost, XGBoost, and LightGBM (LGBM)-to harness their complementary advantages while addressing individual limitations. The proposed framework uniquely combines DCAE with gradient boosting, enabling end-to-end AF identification devoid of manual feature extraction. The DCAE-LGBM model attains an F1-score of 95.20%, sensitivity of 99.99%, and inference latency of four seconds, outperforming existing methods and aligning with clinical deployment requirements. The DCAE integration significantly enhances boosting models, positioning this hybrid system as a reliable tool for automated AF detection in clinical settings.
( 2
min )
arXiv:2412.10513v2 Announce Type: replace-cross
Abstract: Decision trees are a popular machine learning method, known for their inherent explainability. In Explainable AI, decision trees can be used as surrogate models for complex black box AI models or as approximations of parts of such models. A key challenge of this approach is determining how accurately the extracted decision tree represents the original model and to what extent it can be trusted as an approximation of their behavior. In this work, we investigate the use of the Probably Approximately Correct (PAC) framework to provide a theoretical guarantee of fidelity for decision trees extracted from AI models. Based on theoretical results from the PAC framework, we adapt a decision tree algorithm to ensure a PAC guarantee under certain conditions. We focus on binary classification and conduct experiments where we extract decision trees from BERT-based language models with PAC guarantees. Our results indicate occupational gender bias in these models.
( 3
min )
arXiv:2505.12128v2 Announce Type: replace-cross
Abstract: Matrix factorization mechanisms for differentially private training have emerged as a promising approach to improve model utility under privacy constraints. In practical settings, models are typically trained over multiple epochs, requiring matrix factorizations that account for repeated participation. Existing theoretical upper and lower bounds on multi-epoch factorization error leave a significant gap. In this work, we introduce a new explicit factorization method, Banded Inverse Square Root (BISR), which imposes a banded structure on the inverse correlation matrix. This factorization enables us to derive an explicit and tight characterization of the multi-epoch error. We further prove that BISR achieves asymptotically optimal error by matching the upper and lower bounds. Empirically, BISR performs on par with state-of-the-art factorization methods, while being simpler to implement, computationally efficient, and easier to analyze.
( 2
min )
arXiv:2506.06299v3 Announce Type: replace-cross
Abstract: Public opinion manipulation has entered a new phase, amplifying its roots in rhetoric and propaganda. Advances in large language models (LLMs) and autonomous agents now let influence campaigns reach unprecedented scale and precision. Researchers warn AI could foster mass manipulation. Generative tools can expand propaganda output without sacrificing credibility and inexpensively create election falsehoods that are rated as more human-like than those written by humans. Techniques meant to refine AI reasoning, such as chain-of-thought prompting, can just as effectively be used to generate more convincing falsehoods. Enabled by these capabilities, another disruptive threat is emerging: swarms of collaborative, malicious AI agents. Fusing LLM reasoning with multi-agent architectures, these systems are capable of coordinating autonomously, infiltrating communities, and fabricating consensus cheaply. By adaptively mimicking human social dynamics, they threaten democracy.
( 3
min )
arXiv:2510.05338v1 Announce Type: cross
Abstract: In this review, we assess the use of Bayesian methods in model predictive control (MPC), focusing on neural-network-based modeling, control design, and uncertainty quantification. We systematically analyze individual studies and how they are implemented in practice. While Bayesian approaches are increasingly adopted to capture and propagate uncertainty in MPC, reported gains in performance and robustness remain fragmented, with inconsistent baselines and limited reliability analyses. We therefore argue for standardized benchmarks, ablation studies, and transparent reporting to rigorously determine the effectiveness of Bayesian techniques for MPC.
( 2
min )
arXiv:2510.06025v1 Announce Type: cross
Abstract: Out-of-Distribution (OOD) detection is critical to AI reliability and safety, yet in many practical settings, only a limited amount of training data is available. Bayesian Neural Networks (BNNs) are a promising class of model on which to base OOD detection, because they explicitly represent epistemic (i.e. model) uncertainty. In the small training data regime, BNNs are especially valuable because they can incorporate prior model information. We introduce a new family of Bayesian posthoc OOD scores based on expected logit vectors, and compare 5 Bayesian and 4 deterministic posthoc OOD scores. Experiments on MNIST and CIFAR-10 In-Distributions, with 5000 training samples or less, show that the Bayesian methods outperform corresponding deterministic methods.
( 2
min )
arXiv:2510.06028v1 Announce Type: cross
Abstract: The paper provides data-dependent bounds on the test error of the Gibbs algorithm in the overparameterized interpolation regime, where low training errors are also obtained for impossible data, such as random labels in classification. The bounds are stable under approximation with Langevin Monte Carlo algorithms. Experiments on the MNIST and CIFAR-10 datasets verify that the bounds yield nontrivial predictions on true labeled data and correctly upper bound the test error for random labels. Our method indicates that generalization in the low-temperature, interpolation regime is already signaled by small training errors in the more classical high temperature regime.
( 2
min )
arXiv:2510.06106v1 Announce Type: cross
Abstract: Deep neural networks have achieved remarkable success, yet our understanding of how they learn remains limited. These models can learn high-dimensional tasks, which is generally statistically intractable due to the curse of dimensionality. This apparent paradox suggests that learnable data must have an underlying latent structure. What is the nature of this structure? How do neural networks encode and exploit it, and how does it quantitatively impact performance - for instance, how does generalization improve with the number of training examples? This thesis addresses these questions by studying the roles of locality and compositionality in data, tasks, and deep learning representations.
( 2
min )
arXiv:2510.06165v1 Announce Type: cross
Abstract: Feature attributions are post-training analysis methods that assess how various input features of a machine learning model contribute to an output prediction. Their interpretation is straightforward when features act independently, but becomes less direct when the predictive model involves interactions such as multiplicative relationships or joint feature contributions. In this work, we propose a general theory of higher-order feature attribution, which we develop on the foundation of Integrated Gradients (IG). This work extends existing frameworks in the literature on explainable AI. When using IG as the method of feature attribution, we discover natural connections to statistics and topological signal processing. We provide several theoretical results that establish the theory, and we validate our theory on a few examples.
( 2
min )
Assistant Professor Priya Donti’s research applies machine learning to optimize renewable energy.
( 6
min )
The approach combines physics and machine learning to avoid damaging disruptions when powering down tokamak fusion machines.
( 7
min )
Incorporating machine learning, MIT engineers developed a way to 3D print alloys that are much stronger than conventionally manufactured versions.
( 7
min )
In this post, we demonstrate how Amazon Nova Act automates QuickSight data story creation, saving time so you can focus on making critical, data-driven business decisions.
( 37
min )
In this post, we demonstrated how a financial services company can use an FM to process large volumes of customer records and get specific data-driven product recommendations. We also showed how to implement an automated monitoring solution for Amazon Bedrock batch inference jobs. By using EventBridge, Lambda, and DynamoDB, you can gain real-time visibility into batch processing operations, so you can efficiently generate personalized product recommendations based on customer credit data.
( 39
min )
arXiv:2510.03284v1 Announce Type: new
Abstract: This paper proposes Edge-FIT (Federated Instruction Tuning on the Edge), a scalable framework for Federated Instruction Tuning (FIT) of Large Language Models (LLMs). Traditional Federated Learning (TFL) methods, like FedAvg, fail when confronted with the massive parameter size of LLMs [3], [6]. Our Edge-FIT framework combines federated learning with 4-bit Quantized Low-Rank Adaptation (QLORA), mitigating the core issues of communication and computational overhead. We demonstrate this by filtering the general-purpose Databricks Dolly 15k dataset for the IoT domain. Experimental results show the Edge-FIT tuned Llama 2(7B) achieves an F1-Score of 0.89. We also demonstrate a viable trade-off using the 3.8B Phi-3-mini model, validating Edge-FIT as a scalable framework for decentralized LLM deployment on home compute gateways.
( 2
min )
arXiv:2510.03298v1 Announce Type: new
Abstract: We introduce Constraint-Aware Federated Learning with Lagrangian Dual Optimization (CAFL-L), a principled extension of FedAvg that explicitly incorporates device-level resource constraints including energy, communication, memory, and thermal budgets. CAFL-L employs Lagrangian dual optimization to dynamically adapt training hyperparameters -- freezing depth, local steps, batch size, and communication compression -- while preserving training stability through token-budget preservation via gradient accumulation. Experiments on a character-level language model demonstrate that CAFL-L achieves superior constraint satisfaction compared to standard FedAvg (reducing memory usage by 20% and communication by 95%) while maintaining competitive validation performance, making it practical for deployment on resource-constrained edge devices.
( 2
min )
arXiv:2510.03310v1 Announce Type: new
Abstract: LLMs are emerging tools for simulating human behavior in business, economics, and social science, offering a lower-cost complement to laboratory experiments, field studies, and surveys. This paper evaluates how well LLMs replicate human behavior in operations management. Using nine published experiments in behavioral operations, we assess two criteria: replication of hypothesis-test outcomes and distributional alignment via Wasserstein distance. LLMs reproduce most hypothesis-level effects, capturing key decision biases, but their response distributions diverge from human data, including for strong commercial models. We also test two lightweight interventions -- chain-of-thought prompting and hyperparameter tuning -- which reduce misalignment and can sometimes let smaller or open-source models match or surpass larger systems.
( 2
min )
arXiv:2510.03394v1 Announce Type: new
Abstract: Reinforcement learning with verifiable rewards (RLVR) is a promising approach for training large language models (LLMs) with stronger reasoning abilities. It has also been applied to a variety of logic puzzles. In this work, we study the Korean word-chain game using RLVR. We show that rule-derived rewards can naturally conflict, and demonstrate through experiments that a curriculum-learning scheme mitigates these conflicts. Our findings motivate further studies of puzzle tasks in diverse languages.
( 2
min )
arXiv:2510.03494v1 Announce Type: new
Abstract: We study finite-horizon offline reinforcement learning (RL) with function approximation for both policy evaluation and policy optimization. Prior work established that statistically efficient learning is impossible for either of these problems when the only assumptions are that the data has good coverage (concentrability) and the state-action value function of every policy is linearly realizable ($q^\pi$-realizability) (Foster et al., 2021). Recently, Tkachuk et al. (2024) gave a statistically efficient learner for policy optimization, if in addition the data is assumed to be given as trajectories. In this work we present a statistically efficient learner for policy evaluation under the same assumptions. Further, we show that the sample complexity of the learner used by Tkachuk et al. (2024) for policy optimization can be improved by a tighter analysis.
( 2
min )
arXiv:2510.04020v1 Announce Type: new
Abstract: To address the dual challenges of inherent stochasticity and non-differentiable metrics in physical spatiotemporal forecasting, we propose Spatiotemporal Forecasting as Planning (SFP), a new paradigm grounded in Model-Based Reinforcement Learning. SFP constructs a novel Generative World Model to simulate diverse, high-fidelity future states, enabling an "imagination-based" environmental simulation. Within this framework, a base forecasting model acts as an agent, guided by a beam search-based planning algorithm that leverages non-differentiable domain metrics as reward signals to explore high-return future sequences. These identified high-reward candidates then serve as pseudo-labels to continuously optimize the agent's policy through iterative self-training, significantly reducing prediction error and demonstrating exceptional performance on critical domain metrics like capturing extreme events.
( 2
min )
arXiv:2510.04088v1 Announce Type: new
Abstract: This article introduces the theory of offline reinforcement learning in large state spaces, where good policies are learned from historical data without online interactions with the environment. Key concepts introduced include expressivity assumptions on function approximation (e.g., Bellman completeness vs. realizability) and data coverage (e.g., all-policy vs. single-policy coverage). A rich landscape of algorithms and results is described, depending on the assumptions one is willing to make and the sample and computational complexity guarantees one wishes to achieve. We also discuss open questions and connections to adjacent areas.
( 2
min )
arXiv:2510.04090v1 Announce Type: new
Abstract: Supervised learning (SL) methods are indispensable for neural network (NN) training used to perform classification tasks. While resulting in very high accuracy, SL training often requires making NN parameter number dependent on the number of classes, limiting their applicability when the number of classes is extremely large or unknown in advance. In this paper we propose a methodology that allows one to train the same NN architecture regardless of the number of classes. This is achieved by using predefined vector systems as the target latent space configuration (LSC) during NN training. We discuss the desired properties of target configurations and choose randomly perturbed vectors of An root system for our experiments. These vectors are used to successfully train encoders and visual transformers (ViT) on Cinic-10 and ImageNet-1K in low- and high-dimensional cases by matching NN predictions with the predefined vectors. Finally, ViT is trained on a dataset with 1.28 million classes illustrating the applicability of the method to training on datasets with extremely large number of classes. In addition, potential applications of LSC in lifelong learning and NN distillation are discussed illustrating versatility of the proposed methodology.
( 3
min )
arXiv:2510.04386v1 Announce Type: new
Abstract: Continuous glucose monitoring (CGM) generates dense data streams critical for diabetes management, but most used forecasting models lack interpretability for clinical use. We present SSM-CGM, a Mamba-based neural state-space forecasting model that integrates CGM and wearable activity signals from the AI-READI cohort. SSM-CGM improves short-term accuracy over a Temporal Fusion Transformer baseline, adds interpretability through variable selection and temporal attribution, and enables counterfactual forecasts simulating how planned changes in physiological signals (e.g., heart rate, respiration) affect near-term glucose. Together, these features make SSM-CGM an interpretable, physiologically grounded framework for personalized diabetes management.
( 2
min )
arXiv:2510.04927v1 Announce Type: new
Abstract: Training automatic modulation classification (AMC) models on centrally aggregated data raises privacy concerns, incurs communication overhead, and often fails to confer robustness to channel shifts. Federated learning (FL) avoids central aggregation by training on distributed clients but remains sensitive to class imbalance, non-IID client distributions, and limited labeled samples. We propose FedSSL-AMC, which trains a causal, time-dilated CNN with triplet-loss self-supervision on unlabeled I/Q sequences across clients, followed by per-client SVMs on small labeled sets. We establish convergence of the federated representation learning procedure and a separability guarantee for the downstream classifier under feature noise. Experiments on synthetic and over-the-air datasets show consistent gains over supervised FL baselines under heterogeneous SNR, carrier-frequency offsets, and non-IID label partitions.
( 2
min )
arXiv:2510.04974v1 Announce Type: new
Abstract: We present StructuralDecompose, an R package for modular and interpretable time series decomposition. Unlike existing approaches that treat decomposition as a monolithic process, StructuralDecompose separates the analysis into distinct components: changepoint detection, anomaly detection, smoothing, and decomposition. This design provides flexibility and robust- ness, allowing users to tailor methods to specific time series characteristics. We demonstrate the package on simulated and real-world datasets, benchmark its performance against state-of-the- art tools such as Rbeast and autostsm, and discuss its role in interpretable machine learning workflows.
( 2
min )
arXiv:2510.03725v1 Announce Type: cross
Abstract: While deep learning methods for detecting informal settlements have already been developed, they have not yet fully utilized the potential offered by recent pretrained neural networks. We compare two types of pretrained neural networks for detecting the favelas of Rio de Janeiro: 1. Generic networks pretrained on large diverse datasets of unspecific images, 2. A specialized network pretrained on satellite imagery. While the latter is more specific to the target task, the former has been pretrained on significantly more images. Hence, this research investigates whether task specificity or data volume yields superior performance in urban informal settlement detection.
( 2
min )
arXiv:2510.03728v1 Announce Type: cross
Abstract: Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. Our evaluation shows that ContrastASC demonstrates improved few-shot adaptation to unseen categories while maintaining strong closed-set performance.
( 2
min )
arXiv:2510.04912v1 Announce Type: cross
Abstract: In Kigali, Rwanda, motorcycle taxis are a primary mode of transportation, often navigating unpredictably and disregarding traffic rules, posing significant challenges for autonomous driving systems. This study compares four object detection models--YOLOv5, Faster R-CNN, SSD, and RetinaNet--for motorbike detection using a custom dataset of 198 images collected in Kigali. Implemented in PyTorch with transfer learning, the models were evaluated for accuracy, localization, and inference speed to assess their suitability for real-time navigation in resource-constrained settings. We identify implementation challenges, including dataset limitations and model complexities, and recommend simplified architectures for future work to enhance accessibility for autonomous systems in developing countries like Rwanda.
( 2
min )
arXiv:2412.10246v2 Announce Type: replace
Abstract: Large language models (LLMs) frequently generate confident yet inaccurate responses, introducing significant risks for deployment in safety-critical domains. We present a novel, test-time approach to detecting model hallucination through systematic analysis of information flow across model layers. We target cases when LLMs process inputs with ambiguous or insufficient context. Our investigation reveals that hallucination manifests as usable information deficiencies in inter-layer transmissions. While existing approaches primarily focus on final-layer output analysis, we demonstrate that tracking cross-layer information dynamics ($\mathcal{L}$I) provides robust indicators of model reliability, accounting for both information gain and loss during computation. $\mathcal{L}$I integrates easily with pretrained LLMs without requiring additional training or architectural modifications.
( 2
min )
arXiv:2402.06674v5 Announce Type: replace-cross
Abstract: Membership inference attacks (MIAs) are used to test practical privacy of machine learning models. MIAs complement formal guarantees from differential privacy (DP) under a more realistic adversary model. We analyse MIA vulnerability of fine-tuned neural networks both empirically and theoretically, the latter using a simplified model of fine-tuning. We show that the vulnerability of non-DP models when measured as the attacker advantage at a fixed false positive rate reduces according to a simple power law as the number of examples per class increases. A similar power-law applies even for the most vulnerable points, but the dataset size needed for adequate protection of the most vulnerable points is very large.
( 2
min )
arXiv:2408.05606v2 Announce Type: replace-cross
Abstract: Recommender systems aim to estimate the dynamically changing user preferences and sequential dependencies between historical user behaviour and metadata. Although transformer-based models have proven to be effective in sequential recommendations, their state growth is proportional to the length of the sequence that is being processed, which makes them expensive in terms of memory and inference costs. Our research focused on three promising directions in sequential recommendations: enhancing speed through the use of State Space Models (SSM), as they can achieve SOTA results in the sequential recommendations domain with lower latency, memory, and inference costs, as proposed by arXiv:2403.03900 improving the quality of recommendations with Large Language Models (LLMs) via Monolithic Preference Optimization without Reference Model (ORPO); and implementing adaptive batch- and step-size algorithms to reduce costs and accelerate training processes.
( 2
min )
arXiv:2507.21726v2 Announce Type: replace-cross
Abstract: Tree tensor networks (TTNs) are widely used in low-rank approximation and quantum many-body simulation. In this work, we present a formal analysis of the differential geometry underlying TTNs. Building on this foundation, we develop efficient first- and second-order optimization algorithms that exploit the intrinsic quotient structure of TTNs. Additionally, we devise a backpropagation algorithm for training TTNs in a kernel learning setting. We validate our methods through numerical experiments on a representative machine learning task.
( 2
min )
arXiv:2508.04884v2 Announce Type: replace-cross
Abstract: In this work, we study the problem of choosing the discretisation schedule for sampling from masked discrete diffusion models in terms of the information geometry of the induced probability path. Specifically, we show that the optimal schedule under the Fisher-Rao geometry recovers the popularly-used cosine schedule.
( 2
min )
arXiv:2510.03494v1 Announce Type: cross
Abstract: We study finite-horizon offline reinforcement learning (RL) with function approximation for both policy evaluation and policy optimization. Prior work established that statistically efficient learning is impossible for either of these problems when the only assumptions are that the data has good coverage (concentrability) and the state-action value function of every policy is linearly realizable ($q^\pi$-realizability) (Foster et al., 2021). Recently, Tkachuk et al. (2024) gave a statistically efficient learner for policy optimization, if in addition the data is assumed to be given as trajectories. In this work we present a statistically efficient learner for policy evaluation under the same assumptions. Further, we show that the sample complexity of the learner used by Tkachuk et al. (2024) for policy optimization can be improved by a tighter analysis.
( 2
min )
arXiv:2510.04088v1 Announce Type: cross
Abstract: This article introduces the theory of offline reinforcement learning in large state spaces, where good policies are learned from historical data without online interactions with the environment. Key concepts introduced include expressivity assumptions on function approximation (e.g., Bellman completeness vs. realizability) and data coverage (e.g., all-policy vs. single-policy coverage). A rich landscape of algorithms and results is described, depending on the assumptions one is willing to make and the sample and computational complexity guarantees one wishes to achieve. We also discuss open questions and connections to adjacent areas.
( 2
min )
arXiv:2508.04884v2 Announce Type: replace
Abstract: In this work, we study the problem of choosing the discretisation schedule for sampling from masked discrete diffusion models in terms of the information geometry of the induced probability path. Specifically, we show that the optimal schedule under the Fisher-Rao geometry recovers the popularly-used cosine schedule.
( 2
min )
arXiv:2407.10214v3 Announce Type: replace-cross
Abstract: We identify a large class of positive-semidefinite kernels for which a certain polynomial rate of convergence of maximum mean discrepancies of Farey sequences is equivalent to the Riemann hypothesis. This class includes all Mat\'ern kernels of order at least one-half.
( 2
min )
arXiv:2509.01001v2 Announce Type: replace-cross
Abstract: Single-cell technologies provide an unprecedented opportunity for dissecting the interplay between the cancer cells and the associated tumor microenvironment, and the produced high-dimensional omics data should also augment existing survival modeling approaches for identifying tumor cell type-specific genes predictive of cancer patient survival. However, there is no statistical model to integrate multiscale data including individual-level survival data, multicellular-level cell composition data and cellular-level single-cell omics covariates. We propose a class of Bayesian generalized promotion time cure models (GPTCMs) for the multiscale data integration to identify cell-type-specific genes and improve cancer prognosis. We demonstrate with simulations in both low- and high-dimensional settings that the proposed Bayesian GPTCMs are able to identify cell-type-associated covariates and improve survival prediction.
( 2
min )
In this post, we demonstrate how PowerSchool built and deployed a custom content filtering solution using Amazon SageMaker AI that achieved better accuracy while maintaining low false positive rates. We walk through our technical approach to fine tuning Llama 3.1 8B, our deployment architecture, and the performance results from internal validations.
( 40
min )
Microsoft’s Eric Horvitz and guests Bruce Wittmann, Tessa Alexanian, and James Diggans discuss the Paraphrase Project—a red-teaming effort that exposed and secured a biosecurity vulnerability in AI-driven protein design. The work offers a model for addressing AI’s dual-use risks.
The post Ideas: More AI-resilient biosecurity with the Paraphrase Project appeared first on Microsoft Research.
( 36
min )
Microsoft researchers reveal a confidential research effort that explored how open-source AI tools could be used to bypass biosecurity checks—and helped create fixes now influencing global standards.
The post When AI Meets Biology: Promise, Risk, and Responsibility appeared first on Microsoft Research.
( 11
min )
arXiv:2510.02763v1 Announce Type: new
Abstract: We present a self-supervised machine learning framework for detecting and mapping harmful algal bloom (HAB) severity and speciation using multi-sensor satellite data. By fusing reflectance data from operational instruments (VIIRS, MODIS, Sentinel-3, PACE) with TROPOMI solar-induced fluorescence (SIF), our framework, called SIT-FUSE, generates HAB severity and speciation products without requiring per-instrument labeled datasets. The framework employs self-supervised representation learning, hierarchical deep clustering to segment phytoplankton concentrations and speciations into interpretable classes, validated against in-situ data from the Gulf of Mexico and Southern California (2018-2025). Results show strong agreement with total phytoplankton, Karenia brevis, Alexandrium spp., and Pseudo-nitzschia spp. measurements. This work advances scalable HAB monitoring in label-scarce environments while enabling exploratory analysis via hierarchical embeddings: a critical step toward operationalizing self-supervised learning for global aquatic biogeochemistry.
( 2
min )
arXiv:2510.02417v1 Announce Type: cross
Abstract: DNA is a promising medium for digital information storage for its exceptional density and durability. While prior studies advanced coding theory, workflow design, and simulation tools, challenges such as synthesis costs, sequencing errors, and biological constraints (GC-content imbalance, homopolymers) limit practical deployment. To address this, our framework draws from quantum parallelism concepts to enhance encoding diversity and resilience, integrating biologically informed constraints with deep learning to enhance error mitigation in DNA storage. NeuroDNAAI encodes binary data streams into symbolic DNA sequences, transmits them through a noisy channel with substitutions, insertions, and deletions, and reconstructs them with high fidelity. Our results show that traditional prompting or rule-based schemes fail to adapt effectively to realistic noise, whereas NeuroDNAAI achieves superior accuracy. Experiments on benchmark datasets demonstrate low bit error rates for both text and images. By unifying theory, workflow, and simulation into one pipeline, NeuroDNAAI enables scalable, biologically valid archival DNA storage
( 2
min )
arXiv:2510.02707v1 Announce Type: cross
Abstract: Adversarial attacks present a significant threat to modern machine learning systems. Yet, existing detection methods often lack the ability to detect unseen attacks or detect different attack types with a high level of accuracy. In this work, we propose a statistical approach that establishes a detection baseline before a neural network's deployment, enabling effective real-time adversarial detection. We generate a metric of adversarial presence by comparing the behavior of a compressed/uncompressed neural network pair. Our method has been tested against state-of-the-art techniques, and it achieves near-perfect detection across a wide range of attack types. Moreover, it significantly reduces false positives, making it both reliable and practical for real-world applications.
( 2
min )
arXiv:2402.08992v2 Announce Type: replace-cross
Abstract: High-probability guarantees in stochastic optimization are often obtained only under strong noise assumptions such as sub-Gaussian tails. We show that such guarantees can also be achieved under the weaker assumption of bounded variance by developing a stochastic proximal point method. This method combines a proximal subproblem solver, which inherently reduces variance, with a probability booster that amplifies per-iteration reliability into high-confidence results. The analysis demonstrates convergence with low sample complexity, without restrictive noise assumptions or reliance on mini-batching.
( 2
min )
arXiv:2407.17446v2 Announce Type: replace-cross
Abstract: In this paper, we propose a novel generalisation of the signature of a path, motivated by fractional calculus, which is able to describe the solutions of linear Caputo controlled FDEs. We also propose another generalisation of the signature, inspired by the previous one, but more convenient to use in machine learning. Finally, we test this last signature in a toy application to the problem of handwritten digit recognition, where significant improvements in accuracy rates are observed compared to those of the original signature.
( 2
min )
arXiv:2407.17446v2 Announce Type: replace
Abstract: In this paper, we propose a novel generalisation of the signature of a path, motivated by fractional calculus, which is able to describe the solutions of linear Caputo controlled FDEs. We also propose another generalisation of the signature, inspired by the previous one, but more convenient to use in machine learning. Finally, we test this last signature in a toy application to the problem of handwritten digit recognition, where significant improvements in accuracy rates are observed compared to those of the original signature.
( 2
min )
arXiv:2402.08992v2 Announce Type: replace-cross
Abstract: High-probability guarantees in stochastic optimization are often obtained only under strong noise assumptions such as sub-Gaussian tails. We show that such guarantees can also be achieved under the weaker assumption of bounded variance by developing a stochastic proximal point method. This method combines a proximal subproblem solver, which inherently reduces variance, with a probability booster that amplifies per-iteration reliability into high-confidence results. The analysis demonstrates convergence with low sample complexity, without restrictive noise assumptions or reliance on mini-batching.
( 2
min )
Organizations are increasingly integrating generative AI capabilities into their applications to enhance customer experiences, streamline operations, and drive innovation. As generative AI workloads continue to grow in scale and importance, organizations face new challenges in maintaining consistent performance, reliability, and availability of their AI-powered applications. Customers are looking to scale their AI inference workloads across […]
( 43
min )
In this post, we demonstrate how to access AgentCore Gateway through a VPC interface endpoint from an Amazon Elastic Compute Cloud (Amazon EC2) instance in a VPC. We also show how to configure your VPC endpoint policy to provide secure access to the AgentCore Gateway while maintaining the principle of least privilege access.
( 44
min )
arXiv:2510.01867v1 Announce Type: new
Abstract: We consider a generalization of the celebrated Online Convex Optimization (OCO) framework with online adversarial constraints. We present two algorithms having simple modular structures that yield universal dynamic regret and cumulative constraint violation bounds, improving upon the state-of-the-art results. Our results hold in the most general case when both the cost and constraint functions are chosen arbitrarily by an adversary, and the constraint functions need not contain any common feasible point. The results are established by reducing the constrained learning problem to an instance of the standard OCO problem with specially constructed surrogate cost functions.
( 2
min )
arXiv:2510.01253v1 Announce Type: cross
Abstract: Large language models (LLMs) demonstrate strong mathematical reasoning, but reliance on closed-source APIs for OR tasks raises privacy concerns, and training open-source models from scratch incurs high compute costs. We introduce OR-Toolformer, which fine-tunes Llama-3.1-8B-Instruct with a semi-automatic data synthesis pipeline that generates diverse OR problem-answer pairs and augments the model with external solvers to produce API calls. On three of four standard benchmarks, OR-Toolformer achieves up to 80.1% execution accuracy, exceeding size-matched baselines by over 4.3%. In zero-shot evaluation on two unseen OR problem types, it attains 54% average accuracy, a 21 percentage-point improvement over the strongest baseline. These findings validate the efficacy of tool-augmented fine-tuning LLMs for accurate and generalizable OR problem modeling and solving.
( 2
min )
arXiv:2510.01328v1 Announce Type: cross
Abstract: Theories with a sign problem due to a complex action or Boltzmann weight can sometimes be numerically solved using a stochastic process in the complexified configuration space. However, the probability distribution effectively sampled by this complex Langevin process is not known a priori and notoriously hard to understand. In generative AI, diffusion models can learn distributions, or their log derivatives, from data. We explore the ability of diffusion models to learn the distributions sampled by a complex Langevin process, comparing score-based and energy-based diffusion models, and speculate about possible applications.
( 2
min )
arXiv:2505.11627v2 Announce Type: replace
Abstract: Extreme weather is straining electricity systems, exposing the limitations of reactive responses, and prompting the need for proactive resilience planning. Most existing approaches to enhance electricity system resilience employ simplified uncertainty models and decouple proactive and reactive decisions. This paper proposes a novel tri-level optimization model that integrates proactive actions, adversarial disruptions, and reactive responses. Conformal prediction is used to construct distribution-free system-disruption uncertainty sets with coverage guarantees. The tri-level problem is solved by using duality theory to derive a bi-level reformulation and employing Bender's decomposition. Numerical experiments demonstrate that our approach outperforms conventional robust and two-stage methods.
( 2
min )
In this post, we demonstrate how organizations can enhance their employee productivity by integrating Kore.ai’s AI for Work platform with Amazon Q Business. We show how to configure AI for Work as a data accessor for Amazon Q index for independent software vendors (ISVs), so employees can search enterprise knowledge and execute end-to-end agentic workflows involving search, reasoning, actions, and content generation.
( 41
min )
Today, we’re excited to announce the Amazon Bedrock AgentCore Model Context Protocol (MCP) Server. With built-in support for runtime, gateway integration, identity management, and agent memory, the AgentCore MCP Server is purpose-built to speed up creation of components compatible with Bedrock AgentCore. You can use the AgentCore MCP server for rapid prototyping, production AI solutions, […]
( 37
min )
Bakshi will help shape and scale entrepreneurship education and platform at MIT.
( 6
min )
Optimized for generative AI, TX-GAIN is driving innovation in biodefense, materials discovery, cybersecurity, and other areas of research and development.
( 5
min )
Editor’s note: This blog has been updated to include an additional game for October, The Outer Worlds 2. October is creeping in with plenty of gaming treats. From thrilling adventures to spine‑tingling scares, the cloud gaming lineup is packed with 18 new games, including the highly anticipated shooter Battlefield 6, launching on GeForce NOW this month.
Read Article
( 6
min )
arXiv:2510.00321v1 Announce Type: new
Abstract: The exponential growth of internet generated data has fueled advancements in artificial intelligence (AI), machine learning (ML), and deep learning (DL) for extracting actionable insights in marketing,telecom, and health sectors. This chapter explores ML applications across three domains namely healthcare, marketing, and telecommunications, with a primary focus on developing a framework for optimal ML algorithm selection. In healthcare, the framework addresses critical challenges such as cardiovascular disease prediction accounting for 28.1% of global deaths and fetal health classification into healthy or unhealthy states, utilizing three datasets. ML algorithms are categorized into eager, lazy, and hybrid learners, selected based on dataset attributes, performance metrics (accuracy, precision, recall), and Akaike Information Criterion (AIC) scores. For validation, eight datasets from the three sectors are employed in the experiments. The key contribution is a recommendation framework that identifies the best ML model according to input attributes, balancing performance evaluation and model complexity to enhance efficiency and accuracy in diverse real-world applications. This approach bridges gaps in automated model selection, offering practical implications for interdisciplinary ML deployment.
( 3
min )
arXiv:2510.00374v1 Announce Type: new
Abstract: We present GDLNN, a new graph machine learning architecture, for graph classification tasks. GDLNN combines a domain-specific programming language, called GDL, with neural networks. The main strength of GDLNN lies in its GDL layer, which generates expressive and interpretable graph representations. Since the graph representation is interpretable, existing model explanation techniques can be directly applied to explain GDLNN's predictions. Our evaluation shows that the GDL-based representation achieves high accuracy on most graph classification benchmark datasets, outperforming dominant graph learning methods such as GNNs. Applying an existing model explanation technique also yields high-quality explanations of GDLNN's predictions. Furthermore, the cost of GDLNN is low when the explanation cost is included.
( 2
min )
arXiv:2510.00873v1 Announce Type: new
Abstract: This brief study focuses on the application of autoencoders to improve the quality of low-amplitude signals, such as gravitational events. A pre-existing autoencoder was trained using cosmic event data, optimizing its architecture and parameters. The results show a significant increase in the signal-to-noise ratio of the processed signals, demonstrating the potential of autoencoders in the analysis of small signals with multiple sources of interference.
( 2
min )
arXiv:2510.01022v1 Announce Type: new
Abstract: We introduce a novel version of the geometric scattering transform for geometric graphs containing scalar and vector node features. This new scattering transform has desirable symmetries with respect to rigid-body roto-translations (i.e., $SE(3)$-equivariance) and may be incorporated into a geometric GNN framework. We empirically show that our equivariant scattering-based GNN achieves comparable performance to other equivariant message-passing-based GNNs at a fraction of the parameter count.
( 2
min )
arXiv:2510.00572v1 Announce Type: cross
Abstract: Intrusion Detection Systems (IDS) face persistent challenges due to evolving cyberattacks, high-dimensional traffic data, and severe class imbalance in benchmark datasets such as NSL-KDD. To address these issues, we propose IntrusionX, a hybrid deep learning framework that integrates Convolutional Neural Networks (CNNs) for local feature extraction and Long Short-Term Memory (LSTM) networks for temporal modeling. The architecture is further optimized using the Squirrel Search Algorithm (SSA), enabling effective hyperparameter tuning while maintaining computational efficiency. Our pipeline incorporates rigorous preprocessing, stratified data splitting, and dynamic class weighting to enhance the detection of rare classes. Experimental evaluation on NSL-KDD demonstrates that IntrusionX achieves 98% accuracy in binary classification and 87% in 5-class classification, with significant improvements in minority class recall (U2R: 71%, R2L: 93%). The novelty of IntrusionX lies in its reproducible, imbalance-aware design with metaheuristic optimization.
( 2
min )
arXiv:2510.01061v1 Announce Type: cross
Abstract: Distribution matching is central to many vision and graphics tasks, where the widely used Wasserstein distance is too costly to compute for high dimensional distributions. The Sliced Wasserstein Distance (SWD) offers a scalable alternative, yet its Monte Carlo estimator suffers from high variance, resulting in noisy gradients and slow convergence. We introduce Reservoir SWD (ReSWD), which integrates Weighted Reservoir Sampling into SWD to adaptively retain informative projection directions in optimization steps, resulting in stable gradients while remaining unbiased. Experiments on synthetic benchmarks and real-world tasks such as color correction and diffusion guidance show that ReSWD consistently outperforms standard SWD and other variance reduction baselines. Project page: https://reservoirswd.github.io/
( 2
min )
arXiv:2510.01168v1 Announce Type: cross
Abstract: We study a class of constrained nonconvex--nonconcave minimax problems in which the inner maximization involves potentially complex constraints. Under the assumption that the inner problem of a novel lifted minimax problem satisfies a local Kurdyka-{\L}ojasiewicz (KL) condition, we show that the maximal function of the original problem enjoys a local H\"older smoothness property. We also propose a sequential convex programming (SCP) method for solving constrained optimization problems and establish its convergence rate under a local KL condition. Leveraging these results, we develop an inexact proximal gradient method for the original minimax problem, where the inexact gradient of the maximal function is computed via the SCP method applied to a locally KL-structured subproblem. Finally, we establish complexity guarantees for the proposed method in computing an approximate stationary point of the original minimax problem.
( 2
min )
arXiv:2506.07614v4 Announce Type: replace-cross
Abstract: We study the problem of sampling from strongly log-concave distributions over $\mathbb{R}^d$ using the Poisson midpoint discretization (a variant of the randomized midpoint method) for overdamped/underdamped Langevin dynamics. We prove its convergence in the 2-Wasserstein distance ($W_2$), achieving a cubic speedup in dependence on the target accuracy ($\epsilon$) over the Euler-Maruyama discretization, surpassing existing bounds for randomized midpoint methods. Notably, in the case of underdamped Langevin dynamics, we demonstrate the complexity of $W_2$ convergence is much smaller than the complexity lower bounds for convergence in $L^2$ strong error established in the literature.
( 2
min )
arXiv:2510.01022v1 Announce Type: cross
Abstract: We introduce a novel version of the geometric scattering transform for geometric graphs containing scalar and vector node features. This new scattering transform has desirable symmetries with respect to rigid-body roto-translations (i.e., $SE(3)$-equivariance) and may be incorporated into a geometric GNN framework. We empirically show that our equivariant scattering-based GNN achieves comparable performance to other equivariant message-passing-based GNNs at a fraction of the parameter count.
( 2
min )
arXiv:2510.01168v1 Announce Type: cross
Abstract: We study a class of constrained nonconvex--nonconcave minimax problems in which the inner maximization involves potentially complex constraints. Under the assumption that the inner problem of a novel lifted minimax problem satisfies a local Kurdyka-{\L}ojasiewicz (KL) condition, we show that the maximal function of the original problem enjoys a local H\"older smoothness property. We also propose a sequential convex programming (SCP) method for solving constrained optimization problems and establish its convergence rate under a local KL condition. Leveraging these results, we develop an inexact proximal gradient method for the original minimax problem, where the inexact gradient of the maximal function is computed via the SCP method applied to a locally KL-structured subproblem. Finally, we establish complexity guarantees for the proposed method in computing an approximate stationary point of the original minimax problem.
( 2
min )
NVIDIA AI Days — hosted for and in different pockets of the world — are drawing in hundreds of enthusiasts, developers, researchers and startups to discuss and explore the latest technologies making AI breakthroughs possible.
( 8
min )
Many users want to run large language models (LLMs) locally for more privacy and control, and without subscriptions, but until recently, this meant a trade-off in output quality. Newly released open-weight models, like OpenAI’s gpt-oss and Alibaba’s Qwen 3, can run directly on PCs, delivering useful high-quality outputs, especially for local agentic AI. This opens
Read Article
( 8
min )
In this post, we share how Hapag-Lloyd developed and implemented a machine learning (ML)-powered assistant predicting vessel arrival and departure times that revolutionizes their schedule planning. By using Amazon SageMaker AI and implementing robust MLOps practices, Hapag-Lloyd has enhanced its schedule reliability—a key performance indicator in the industry and quality promise to their customers.
( 41
min )
We’re excited to announce that Rox is generally available, with Rox infrastructure built on AWS and delivered across web, Slack, macOS, and iOS. In this post, we share how Rox accelerates sales productivity with AI agents powered by Amazon Bedrock.
( 37
min )
arXiv:2509.25207v1 Announce Type: new
Abstract: Recent advancements in large language models (LLMs) have shown promise in feature engineering for tabular data, but concerns about their reliability persist, especially due to variability in generated outputs. We introduce a multi-level diagnosis and evaluation framework to assess the robustness of LLMs in feature engineering across diverse domains, focusing on the three main factors: key variables, relationships, and decision boundary values for predicting target classes. We demonstrate that the robustness of LLMs varies significantly over different datasets, and that high-quality LLM-generated features can improve few-shot prediction performance by up to 10.52%. This work opens a new direction for assessing and enhancing the reliability of LLM-driven feature engineering in various domains.
( 2
min )
arXiv:2509.25284v1 Announce Type: new
Abstract: Dynamic resource allocation in heterogeneous wireless networks (HetNets) is challenging for traditional methods under varying user loads and channel conditions. We propose a deep reinforcement learning (DRL) framework that jointly optimises transmit power, bandwidth, and scheduling via a multi-objective reward balancing throughput, energy efficiency, and fairness. Using real base station coordinates, we compare Proximal Policy Optimisation (PPO) and Twin Delayed Deep Deterministic Policy Gradient (TD3) against three heuristic algorithms in multiple network scenarios. Our results show that DRL frameworks outperform heuristic algorithms in optimising resource allocation in dynamic networks. These findings highlight key trade-offs in DRL design for future HetNets.
( 2
min )
arXiv:2509.25382v1 Announce Type: new
Abstract: In this work, we explore the latent space of a denoising variational autoencoder with a mixture-of-Gaussians prior (VAE-MoG), trained on gravitational wave data from event GW150914. To evaluate how well the model captures the underlying structure, we use Hamiltonian Monte Carlo (HMC) to draw posterior samples conditioned on clean inputs, and compare them to the encoder's outputs from noisy data. Although the model reconstructs signals accurately, statistical comparisons reveal a clear mismatch in the latent space. This shows that strong denoising performance doesn't necessarily mean the latent representations are reliable highlighting the importance of using posterior-based validation when evaluating generative models.
( 2
min )
arXiv:2509.25311v1 Announce Type: cross
Abstract: We implement physics-informed-neural-networks (PINNs) to compute holographic entanglement entropy and entanglement wedge cross section. This technique allows us to compute these quantities for arbitrary shapes of the subregions in any asymptotically AdS metric. We test our computations against some known results and further demonstrate the utility of PINNs in examples, where it is not straightforward to perform such computations.
( 2
min )
arXiv:2509.25482v1 Announce Type: cross
Abstract: We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
( 2
min )
arXiv:2509.25630v1 Announce Type: cross
Abstract: Efficient sampling from complex and high dimensional target distributions turns out to be a fundamental task in diverse disciplines such as scientific computing, statistics and machine learning. In this paper, we revisit the randomized Langevin Monte Carlo (RLMC) for sampling from high dimensional distributions without log-concavity. Under the gradient Lipschitz condition and the log-Sobolev inequality, we prove a uniform-in-time error bound in $\mathcal{W}_2$-distance of order $O(\sqrt{d}h)$ for the RLMC sampling algorithm, which matches the best one in the literature under the log-concavity condition. Moreover, when the gradient of the potential $U$ is non-globally Lipschitz with superlinear growth, modified RLMC algorithms are proposed and analyzed, with non-asymptotic error bounds established. To the best of our knowledge, the modified RLMC algorithms and their non-asymptotic error bounds are new in the non-globally Lipschitz setting.
( 2
min )
arXiv:2509.25661v1 Announce Type: cross
Abstract: This study considers multiple reconfigurable intelligent surfaces (RISs)-aided multiuser downlink systems with the goal of jointly optimizing the transmitter precoding and RIS phase shift matrix to maximize spectrum efficiency. Unlike prior work that assumed ideal RIS reflectivity, a practical coupling effect is considered between reflecting amplitude and phase shift for the RIS elements. This makes the optimization problem non-convex. To address this challenge, we propose a deep deterministic policy gradient (DDPG)-based deep reinforcement learning (DRL) framework. The proposed model is evaluated under both fixed and random numbers of users in practical mmWave channel settings. Simulation results demonstrate that, despite its complexity, the proposed DDPG approach significantly outperforms optimization-based algorithms and double deep Q-learning, particularly in scenarios with random user distributions.
( 2
min )
arXiv:2509.25858v1 Announce Type: cross
Abstract: The topic of aging decline on performance of NBA players has been discussed in this study. The autoencoder with K-means clustering machine learning method was adopted to career trend classification of NBA players, and the LSTM deep learning method was adopted in performance prediction of each NBA player. The dataset was collected from the basketball game data of veteran NBA players. The contribution of the work performed better than the other methods with generalization ability for evaluating various types of NBA career trend, and can be applied in different types of sports in the field of sport analytics.
( 2
min )
arXiv:2509.26005v1 Announce Type: cross
Abstract: We introduce a formal active learning methodology for guiding the placement of Lagrangian observers to infer time-dependent vector fields -- a key task in oceanography, marine science, and ocean engineering -- using a physics-informed spatio-temporal Gaussian process surrogate model. The majority of existing placement campaigns either follow standard `space-filling' designs or relatively ad-hoc expert opinions. A key challenge to applying principled active learning in this setting is that Lagrangian observers are continuously advected through the vector field, so they make measurements at different locations and times. It is, therefore, important to consider the likely future trajectories of placed observers to account for the utility of candidate placement locations. To this end, we present BALLAST: Bayesian Active Learning with Look-ahead Amendment for Sea-drifter Trajectories. We observe noticeable benefits of BALLAST-aided sequential observer placement strategies on both synthetic and high-fidelity ocean current models.
( 2
min )
arXiv:2509.26145v1 Announce Type: cross
Abstract: Depression is a major global public health challenge and its early identification is crucial. Social media data provides a new perspective for depression detection, but existing methods face limitations such as insufficient accuracy, insufficient utilization of time series features, and high annotation costs. To this end, this study proposes the LMILAtt model, which innovatively integrates Long Short-Term Memory autoencoders and attention mechanisms: firstly, the temporal dynamic features of user tweets (such as depressive tendency evolution patterns) are extracted through unsupervised LSTM autoencoders. Secondly, the attention mechanism is used to dynamically weight key texts (such as early depression signals) and construct a multi-example learning architecture to improve the accuracy of user-level detection. Finally, the performance was verified on the WU3D dataset labeled by professional medicine. Experiments show that the model is significantly better than the baseline model in terms of accuracy, recall and F1 score. In addition, the weakly supervised learning strategy significantly reduces the cost of labeling and provides an efficient solution for large-scale social media depression screening.
( 2
min )
arXiv:2509.26207v1 Announce Type: cross
Abstract: Transformer-based models have become the state of the art across multiple domains, from natural language processing to machine listening, thanks to attention mechanisms. However, the attention layers require a large number of parameters and high-end hardware for both training and inference. We propose a novel pruning technique targeted explicitly at the attention mechanism, where we decouple the pruning of the four layers in the attention block, namely: query, keys, values and outputs' projection matrices. We also investigate pruning strategies to prune along the head and channel dimensions, and compare the performance of the Audio Spectrogram Transformer (AST) model under different pruning scenarios. Our results show that even by pruning 50\% of the attention parameters we incur in performance degradation of less than 1\%
( 2
min )
arXiv:2304.11127v4 Announce Type: replace
Abstract: Recent scientific advances require complex experiment design, necessitating the meticulous tuning of many experiment parameters. Tree-structured Parzen estimator (TPE) is a widely used Bayesian optimization method in recent parameter tuning frameworks such as Hyperopt and Optuna. Despite its popularity, the roles of each control parameter in TPE and the algorithm intuition have not been discussed so far. The goal of this paper is to identify the roles of each control parameter and their impacts on parameter tuning based on the ablation studies using diverse benchmark datasets. The recommended setting concluded from the ablation studies is demonstrated to improve the performance of TPE. Our TPE implementation used in this paper is available at https://github.com/nabenabe0928/tpe/tree/single-opt.
( 2
min )
arXiv:2306.12612v2 Announce Type: replace
Abstract: Neural networks are typically sensitive to small input perturbations, leading to unexpected or brittle behaviour. We present RobustNeuralNetworks.jl: a Julia package for neural network models that are constructed to naturally satisfy a set of user-defined robustness metrics. The package is based on the recently proposed Recurrent Equilibrium Network (REN) and Lipschitz-Bounded Deep Network (LBDN) model classes, and is designed to interface directly with Julia's most widely-used machine learning package, Flux.jl. We discuss the theory behind our model parameterization, give an overview of the package, and provide a tutorial demonstrating its use in image classification, reinforcement learning, and nonlinear state-observer design.
( 2
min )
arXiv:2408.10502v2 Announce Type: replace-cross
Abstract: Despite the widespread occurrence of classification problems and the increasing collection of point process data across many disciplines, study of error probability for point process classification only emerged very recently. Here, we consider classification of renewal processes. We obtain asymptotic expressions for the Bhattacharyya bound on misclassification error probabilities for heavy-tailed renewal processes.
( 2
min )
arXiv:2507.09828v3 Announce Type: replace-cross
Abstract: Bayesian optimization is a powerful tool for optimizing an expensive-to-evaluate black-box function. In particular, the effectiveness of expected improvement (EI) has been demonstrated in a wide range of applications. However, theoretical analyses of EI are limited compared with other theoretically established algorithms. This paper analyzes a randomized variant of EI, which evaluates the EI from the maximum of the posterior sample path. We show that this posterior sampling-based random EI achieves the sublinear Bayesian cumulative regret bounds under the assumption that the black-box function follows a Gaussian process. Finally, we demonstrate the effectiveness of the proposed method through numerical experiments.
( 2
min )
arXiv:2509.25630v1 Announce Type: new
Abstract: Efficient sampling from complex and high dimensional target distributions turns out to be a fundamental task in diverse disciplines such as scientific computing, statistics and machine learning. In this paper, we revisit the randomized Langevin Monte Carlo (RLMC) for sampling from high dimensional distributions without log-concavity. Under the gradient Lipschitz condition and the log-Sobolev inequality, we prove a uniform-in-time error bound in $\mathcal{W}_2$-distance of order $O(\sqrt{d}h)$ for the RLMC sampling algorithm, which matches the best one in the literature under the log-concavity condition. Moreover, when the gradient of the potential $U$ is non-globally Lipschitz with superlinear growth, modified RLMC algorithms are proposed and analyzed, with non-asymptotic error bounds established. To the best of our knowledge, the modified RLMC algorithms and their non-asymptotic error bounds are new in the non-globally Lipschitz setting.
( 2
min )
arXiv:2509.26005v1 Announce Type: new
Abstract: We introduce a formal active learning methodology for guiding the placement of Lagrangian observers to infer time-dependent vector fields -- a key task in oceanography, marine science, and ocean engineering -- using a physics-informed spatio-temporal Gaussian process surrogate model. The majority of existing placement campaigns either follow standard `space-filling' designs or relatively ad-hoc expert opinions. A key challenge to applying principled active learning in this setting is that Lagrangian observers are continuously advected through the vector field, so they make measurements at different locations and times. It is, therefore, important to consider the likely future trajectories of placed observers to account for the utility of candidate placement locations. To this end, we present BALLAST: Bayesian Active Learning with Look-ahead Amendment for Sea-drifter Trajectories. We observe noticeable benefits of BALLAST-aided sequential observer placement strategies on both synthetic and high-fidelity ocean current models.
( 2
min )
arXiv:2509.25482v1 Announce Type: cross
Abstract: We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
( 2
min )
arXiv:2509.26040v1 Announce Type: cross
Abstract: We survey the field of nonparametric inference under shape constraints, providing a historical overview and a perspective on its current state. An outlook and some open problems offer thoughts on future directions.
( 2
min )
arXiv:2408.10502v2 Announce Type: replace
Abstract: Despite the widespread occurrence of classification problems and the increasing collection of point process data across many disciplines, study of error probability for point process classification only emerged very recently. Here, we consider classification of renewal processes. We obtain asymptotic expressions for the Bhattacharyya bound on misclassification error probabilities for heavy-tailed renewal processes.
( 2
min )
arXiv:2507.09828v3 Announce Type: replace
Abstract: Bayesian optimization is a powerful tool for optimizing an expensive-to-evaluate black-box function. In particular, the effectiveness of expected improvement (EI) has been demonstrated in a wide range of applications. However, theoretical analyses of EI are limited compared with other theoretically established algorithms. This paper analyzes a randomized variant of EI, which evaluates the EI from the maximum of the posterior sample path. We show that this posterior sampling-based random EI achieves the sublinear Bayesian cumulative regret bounds under the assumption that the black-box function follows a Gaussian process. Finally, we demonstrate the effectiveness of the proposed method through numerical experiments.
( 2
min )
After seeing generated text evolve from the days of tiny neural networks to today's ChatGPT-style large language models, I have to conclude: there's something special about the tiny guys.
Maybe it's the way the tiny neural networks string together text letter by letter just
( 4
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Quantum computing promises to reshape industries — but progress hinges on solving key problems. Error correction. Simulations of qubit designs. Circuit compilation optimization tasks. These are among the bottlenecks that must be overcome to bring quantum hardware into the era of useful applications. Enter accelerated computing. The parallel processing of accelerated computing offers the power
Read Article
( 6
min )
Building robots that can effectively operate alongside human workers in factories, hospitals and public spaces presents an enormous technical challenge. These robots require humanlike dexterity, perception, cognition and whole-body coordination to navigate unpredictable real-world environments in real time.
( 8
min )
In this post, we demonstrate how to implement real-time fraud prevention using GraphStorm v0.5's new capabilities for deploying graph neural network (GNN) models through Amazon SageMaker. We show how to transition from model training to production-ready inference endpoints with minimal operational overhead, enabling sub-second fraud detection on transaction graphs with billions of nodes and edges.
( 43
min )
arXiv:2509.24076v1 Announce Type: new
Abstract: Pairwise distance-based costs are crucial for self-supervised and contrastive feature learning. Mixture Density Networks (MDNs) are a widely used approach for generative models and density approximation, using neural networks to produce multiple centers that define a Gaussian mixture. By combining MDNs with contrastive costs, this paper proposes data density approximation using four types of kernelized matrix costs: the scalar cost, the vector-matrix cost, the matrix-matrix cost (the trace of Schur complement), and the SVD cost (the nuclear norm), for learning multiple centers required to define a mixture density.
( 2
min )
arXiv:2509.24801v1 Announce Type: new
Abstract: A major challenge in physics-informed machine learning is to understand how the incorporation of prior domain knowledge affects learning rates when data are dependent. Focusing on empirical risk minimization with physics-informed regularization, we derive complexity-dependent bounds on the excess risk in probability and in expectation. We prove that, when the physical prior information is aligned, the learning rate improves from the (slow) Sobolev minimax rate to the (fast) optimal i.i.d. one without any sample-size deflation due to data dependence.
( 2
min )
arXiv:2509.24827v1 Announce Type: new
Abstract: In this paper we summarize the results of the Putnam-like benchmark published by Google DeepMind. This dataset consists of 96 original problems in the spirit of the Putnam Competition and 576 solutions of LLMs. We analyse the performance of models on this set of problems to verify their ability to solve problems from mathematical contests.
( 2
min )
arXiv:2509.24974v1 Announce Type: new
Abstract: Data scarcity drives the need for more sample-efficient large language models. In this work, we use the double descent phenomenon to holistically compare the sample efficiency of discrete diffusion and autoregressive models. We show that discrete diffusion models require larger capacity and more training epochs to escape their underparameterized regime and reach the interpolation threshold. In the strongly overparameterized regime, both models exhibit similar behavior, with neither exhibiting a pronounced second descent in test loss across a large range of model sizes. Overall, our results indicate that autoregressive models are more sample-efficient on small-scale datasets, while discrete diffusion models only become competitive when given sufficient capacity and compute.
( 2
min )
arXiv:2509.25100v1 Announce Type: new
Abstract: We introduce ORPO-Distill, a general-purpose method for cross-architecture LLM distillation that formulates the problem as a preference optimization task. Un- like standard CoT distillation, the approach transfers knowledge through diverse reasoning traces. It employs an Odds-Ratio Preference Optimization objective that contrasts teacher and student traces for more effective learning, and adopts a mixed-policy strategy for utilizing student-generated outputs, outperforming both off- and on-policy alternatives. Experiments on five datasets and multiple student models show consistent improvements over conventional black-box KD baselines.
( 2
min )
arXiv:2509.22701v1 Announce Type: cross
Abstract: This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the proposed continuous transfer learning model evolves dynamically during operations, minimizing retraining needs. Evaluated on Google Cluster Data, the model achieves over 99% accuracy, reducing computational overhead and improving scheduling latency for constrained tasks. This scalable solution enables real-time optimization, advancing machine learning integration in cluster management and paving the way for future adaptive scheduling strategies.
( 2
min )
arXiv:2509.22708v1 Announce Type: cross
Abstract: Generative Zero-Shot Learning approach (GZSL) has demonstrated significant potential in 3D point cloud semantic segmentation tasks. GZSL leverages generative models like GANs or VAEs to synthesize realistic features (real features) of unseen classes. This allows the model to label unseen classes during testing, despite being trained only on seen classes. In this context, we introduce the Generalized Zero-Shot Learning based-upon Mixture-of-Experts (GZSL-MoE) model. This model incorporates Mixture-of-Experts layers (MoE) to generate fake features that closely resemble real features extracted using a pre-trained KPConv (Kernel Point Convolution) model on seen classes. The main contribution of this paper is the integration of Mixture-of-Experts into the Generator and Discriminator components of the Generative Zero-Shot Learning model for 3D point cloud semantic segmentation, applied to the COVERED dataset (CollabOratiVE Robot Environment Dataset) for Human-Robot Collaboration (HRC) environments. By combining the Generative Zero-Shot Learning model with Mixture-of- Experts, GZSL-MoE for 3D point cloud semantic segmentation provides a promising solution for understanding complex 3D environments, especially when comprehensive training data for all object classes is unavailable. The performance evaluation of the GZSL-MoE model highlights its ability to enhance performance on both seen and unseen classes. Keywords Generalized Zero-Shot Learning (GZSL), 3D Point Cloud, 3D Semantic Segmentation, Human-Robot Collaboration, COVERED (CollabOratiVE Robot Environment Dataset), KPConv, Mixture-of Experts
( 3
min )
arXiv:2509.22719v1 Announce Type: cross
Abstract: In recent years, Transformer-based architectures have become the dominant method for Computer Vision applications. While Transformers are explainable and scale well with dataset size, they lack the inductive biases of Convolutional Neural Networks. While these biases may be learned on large datasets, we show that introducing these inductive biases through learned masks allow Vision Transformers to learn on much smaller datasets without Knowledge Distillation. These Transformers, which we call Inductively Biased Image Transformers (IBiT), are significantly more accurate on small datasets, while retaining the explainability Transformers.
( 2
min )
arXiv:2509.23317v1 Announce Type: cross
Abstract: We investigate the recovery dynamics of healthy cardiac activity after physical exertion using multimodal biosignals recorded with a polycardiograph. Multifractal features derived from the singularity spectrum capture the scale-invariant properties of cardiovascular regulation. Five supervised classification algorithms - Logistic Regression (LogReg), Suport Vector Machine with RBF kernel (SVM-RBF), k-Nearest Neighbors (kNN), Decision Tree (DT), and Random Forest (RF) - were evaluated to distinguish recovery states in a small, imbalanced dataset. Our results show that multifractal analysis, combined with multimodal sensing, yields reliable features for characterizing recovery and points toward nonlinear diagnostic methods for heart conditions.
( 2
min )
arXiv:2509.23715v1 Announce Type: cross
Abstract: Ensuring that both new and experienced drivers master current traffic rules is critical to road safety. This paper evaluates Large Language Models (LLMs) on Romanian driving-law QA with explanation generation. We release a 1{,}208-question dataset (387 multimodal) and compare text-only and multimodal SOTA systems, then measure the impact of domain-specific fine-tuning for Llama 3.1-8B-Instruct and RoLlama 3.1-8B-Instruct. SOTA models perform well, but fine-tuned 8B models are competitive. Textual descriptions of images outperform direct visual input. Finally, an LLM-as-a-Judge assesses explanation quality, revealing self-preference bias. The study informs explainable QA for less-resourced languages.
( 2
min )
arXiv:2509.24134v1 Announce Type: cross
Abstract: We present AstroCo, a Conformer-style encoder for irregular stellar light curves. By combining attention with depthwise convolutions and gating, AstroCo captures both global dependencies and local features. On MACHO R-band, AstroCo outperforms Astromer v1 and v2, yielding 70 percent and 61 percent lower error respectively and a relative macro-F1 gain of about 7 percent, while producing embeddings that transfer effectively to few-shot classification. These results highlight AstroCo's potential as a strong and label-efficient foundation for time-domain astronomy.
( 2
min )
arXiv:2509.24227v1 Announce Type: cross
Abstract: Prostate cancer (PCa) is the most frequently diagnosed malignancy in men and the eighth leading cause of cancer death worldwide. Multiparametric MRI (mpMRI) has become central to the diagnostic pathway for men at intermediate risk, improving de-tection of clinically significant PCa (csPCa) while reducing unnecessary biopsies and over-diagnosis. However, mpMRI remains limited by false positives, false negatives, and moderate to substantial interobserver agreement. Time-dependent diffusion (TDD) MRI, a novel sequence that enables tissue microstructure characterization, has shown encouraging preclinical performance in distinguishing clinically significant from insignificant PCa. Combining TDD-derived metrics with machine learning may provide robust, zone-specific risk prediction with less dependence on reader training and improved accuracy compared to current standard-of-care. This study protocol out-lines the rationale and describes the prospective evaluation of a home-developed AI-enhanced TDD-MRI software (PROSTDAI) in routine diagnostic care, assessing its added value against PI-RADS v2.1 and validating results against MRI-guided prostate biopsy.
( 3
min )
arXiv:2509.24404v1 Announce Type: cross
Abstract: This project presents an AI-based system for tone replication in music production, focusing on predicting EQ parameter settings directly from audio features. Unlike traditional audio-to-audio methods, our approach outputs interpretable parameter values (e.g., EQ band gains) that musicians can further adjust in their workflow. Using a dataset of piano recordings with systematically varied EQ settings, we evaluate both regression and neural network models. The neural network achieves a mean squared error of 0.0216 on multi-band tasks. The system enables practical, flexible, and automated tone matching for music producers and lays the foundation for extensions to more complex audio effects.
( 2
min )
arXiv:2509.24823v1 Announce Type: cross
Abstract: We propose a high-payload image watermarking method for textual embedding, where a semantic description of the image - which may also correspond to the input text prompt-, is embedded inside the image. In order to be able to robustly embed high payloads in large-scale images - such as those produced by modern AI generators - the proposed approach builds upon a traditional watermarking scheme that exploits orthogonal and turbo codes for improved robustness, and integrates frequency-domain embedding and perceptual masking techniques to enhance watermark imperceptibility. Experiments show that the proposed method is extremely robust against a wide variety of image processing, and the embedded text can be retrieved also after traditional and AI inpainting, permitting to unveil the semantic modification the image has undergone via image-text mismatch analysis.
( 2
min )
arXiv:2509.25126v1 Announce Type: cross
Abstract: We study spectral learning for orthogonally decomposable (odeco) tensors, emphasizing the interplay between statistical limits, optimization geometry, and initialization. Unlike matrices, recovery for odeco tensors does not hinge on eigengaps, yielding improved robustness under noise. While iterative methods such as tensor power iterations can be statistically efficient, initialization emerges as the main computational bottleneck. We investigate perturbation bounds, non-convex optimization analysis, and initialization strategies, clarifying when efficient algorithms attain statistical limits and when fundamental barriers remain.
( 2
min )
arXiv:2506.04053v2 Announce Type: replace
Abstract: Sliced Mutual Information (SMI) is widely used as a scalable alternative to mutual information for measuring non-linear statistical dependence. Despite its advantages, such as faster convergence, robustness to high dimensionality, and nullification only under statistical independence, we demonstrate that SMI is highly susceptible to data manipulation and exhibits counterintuitive behavior. Through extensive benchmarking and theoretical analysis, we show that SMI saturates easily, fails to detect increases in statistical dependence (even under linear transformations designed to enhance the extraction of information), prioritizes redundancy over informative content, and in some cases, performs worse than simpler dependence measures like the correlation coefficient.
( 2
min )
arXiv:2507.06765v2 Announce Type: replace
Abstract: This document proposes a parametric activation function (ac.f.) aimed at improving multidimensional nonlinear data regression. It is a established knowledge that nonlinear ac.f's are required for learning nonlinear datasets. This work shows that smoothness and gradient properties of the ac.f. further impact the performance of large neural networks in terms of overfitting and sensitivity to model parameters. Smooth but vanishing-gradient ac.f's such as ELU or SiLU (Swish) have limited performance and non-smooth ac.f's such as RELU and Leaky-RELU further impart discontinuity in the trained model. Improved performance is demonstrated with a smooth "Leaky Exponential Linear Unit", with non-zero gradient that can be trained. A novel diffusion-loss metric is also proposed to gauge the performance of the trained models in terms of overfitting.
( 2
min )
arXiv:2507.16008v2 Announce Type: replace
Abstract: Physics-informed neural networks (PINNs) have gained prominence in recent years and are now effectively used in a number of applications. However, their performance remains unstable due to the complex landscape of the loss function. To address this issue, we reformulate PINN training as a nonconvex-strongly concave saddle-point problem. After establishing the theoretical foundation for this approach, we conduct an extensive experimental study, evaluating its effectiveness across various tasks and architectures. Our results demonstrate that the proposed method outperforms the current state-of-the-art techniques.
( 2
min )
arXiv:2508.01407v2 Announce Type: replace
Abstract: Accurate, explainable and physically credible forecasting remains a persistent challenge for multivariate time-series whose statistical properties vary across domains. We propose DORIC, a Domain-Universal, ODE-Regularized, Interpretable-Concept Transformer for Time-Series Forecasting that generates predictions through five self-supervised, domain-agnostic concepts while enforcing differentiable residuals grounded in first-principles constraints.
( 2
min )
arXiv:2507.19438v2 Announce Type: replace-cross
Abstract: Machine learning interatomic potentials have become an indispensable tool for materials science, enabling the study of larger systems and longer timescales. State-of-the-art models are generally graph neural networks that employ message passing to iteratively update atomic embeddings that are ultimately used for predicting properties. In this work we extend the message passing formalism with the inclusion of a continuous variable that accounts for fractional atomic existence. This allows us to calculate the gradient of the Gibbs free energy with respect to both the Cartesian coordinates of atoms and their existence. Using this we propose a gradient-based grand canonical optimization method and document its capabilities for a Cu(110) surface oxide.
( 2
min )
arXiv:2509.25126v1 Announce Type: new
Abstract: We study spectral learning for orthogonally decomposable (odeco) tensors, emphasizing the interplay between statistical limits, optimization geometry, and initialization. Unlike matrices, recovery for odeco tensors does not hinge on eigengaps, yielding improved robustness under noise. While iterative methods such as tensor power iterations can be statistically efficient, initialization emerges as the main computational bottleneck. We investigate perturbation bounds, non-convex optimization analysis, and initialization strategies, clarifying when efficient algorithms attain statistical limits and when fundamental barriers remain.
( 2
min )
arXiv:2509.24076v1 Announce Type: cross
Abstract: Pairwise distance-based costs are crucial for self-supervised and contrastive feature learning. Mixture Density Networks (MDNs) are a widely used approach for generative models and density approximation, using neural networks to produce multiple centers that define a Gaussian mixture. By combining MDNs with contrastive costs, this paper proposes data density approximation using four types of kernelized matrix costs: the scalar cost, the vector-matrix cost, the matrix-matrix cost (the trace of Schur complement), and the SVD cost (the nuclear norm), for learning multiple centers required to define a mixture density.
( 2
min )
Explosive growth of AI data centers is expected to increase greenhouse gas emissions. Researchers are now seeking solutions to reduce these environmental harms.
( 9
min )
arXiv:2509.21393v1 Announce Type: new
Abstract: Physics Informed Neural Networks offer a mesh free framework for solving PDEs but are highly sensitive to loss weight selection. We propose two dimensional analysis based weighting schemes, one based on quantifiable terms, and another also incorporating unquantifiable terms for more balanced training. Benchmarks on heat conduction, convection diffusion, and lid driven cavity flows show that the second scheme consistently improves stability and accuracy over equal weighting. Notably, in high Peclet number convection diffusion, where traditional solvers fail, PINNs with our scheme achieve stable, accurate predictions, highlighting their robustness and generalizability in CFD problems.
( 2
min )
arXiv:2509.21405v1 Announce Type: new
Abstract: This work addresses object identification under known dynamics in unmanned aerial vehicle applications, where learning and classification are combined through a physics-informed residual neural network. The proposed framework leverages physics-informed learning for state mapping and state-derivative prediction, while a softmax layer enables multi-class confidence estimation. Quadcopter, fixed-wing, and helicopter aerial vehicles are considered as case studies. The results demonstrate high classification accuracy with reduced training time, offering a promising solution for system identification problems in domains where the underlying dynamics are well understood.
( 2
min )
arXiv:2509.21659v1 Announce Type: new
Abstract: Partial differential equation (PDE)-governed inverse problems are fundamental across various scientific and engineering applications; yet they face significant challenges due to nonlinearity, ill-posedness, and sensitivity to noise. Here, we introduce a new computational framework, RED-DiffEq, by integrating physics-driven inversion and data-driven learning. RED-DiffEq leverages pretrained diffusion models as a regularization mechanism for PDE-governed inverse problems. We apply RED-DiffEq to solve the full waveform inversion problem in geophysics, a challenging seismic imaging technique that seeks to reconstruct high-resolution subsurface velocity models from seismic measurement data. Our method shows enhanced accuracy and robustness compared to conventional methods. Additionally, it exhibits strong generalization ability to more complex velocity models that the diffusion model is not trained on. Our framework can also be directly applied to diverse PDE-governed inverse problems.
( 2
min )
arXiv:2509.21703v1 Announce Type: new
Abstract: Understanding urban human mobility patterns at various spatial levels is essential for social science. This study presents a machine learning framework to downscale origin-destination (OD) taxi trips flows in New York City from a larger spatial unit to a smaller spatial unit. First, correlations between OD trips and demographic, socioeconomic, and commuting characteristics are developed using four models: Linear Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Neural Networks (NN). Second, a perturbation-based sensitivity analysis is applied to interpret variable importance for nonlinear models. The results show that the linear regression model failed to capture the complex variable interactions. While NN performs best with the training and testing datasets, SVM shows the best generalization ability in downscaling performance. The methodology presented in this study provides both analytical advancement and practical applications to improve transportation services and urban development.
( 2
min )
arXiv:2509.22395v1 Announce Type: new
Abstract: The decline in interest rates and economic stabilization has heightened the importance of accurate mortality rate forecasting, particularly in insurance and pension markets. Multi-step-ahead predictions are crucial for public health, demographic planning, and insurance risk assessments; however, they face challenges when data are limited. Hybrid systems that combine statistical and Machine Learning (ML) models offer a promising solution for handling both linear and nonlinear patterns. This study evaluated the impact of different multi-step forecasting approaches (Recursive, Direct, and Multi-Input Multi-Output) and ML models on the accuracy of hybrid systems. Results from 12 datasets and 21 models show that the selection of both the multi-step approach and the ML model is essential for improving performance, with the ARIMA-LSTM hybrid using a recursive approach outperforming other models in most cases.
( 2
min )
arXiv:2509.22484v1 Announce Type: new
Abstract: We present a machine learning pipeline for biomarker discovery in Multiple Sclerosis (MS), integrating eight publicly available microarray datasets from Peripheral Blood Mononuclear Cells (PBMC). After robust preprocessing we trained an XGBoost classifier optimized via Bayesian search. SHapley Additive exPlanations (SHAP) were used to identify key features for model prediction, indicating thus possible biomarkers. These were compared with genes identified through classical Differential Expression Analysis (DEA). Our comparison revealed both overlapping and unique biomarkers between SHAP and DEA, suggesting complementary strengths. Enrichment analysis confirmed the biological relevance of SHAP-selected genes, linking them to pathways such as sphingolipid signaling, Th1/Th2/Th17 cell differentiation, and Epstein-Barr virus infection all known to be associated with MS. This study highlights the value of combining explainable AI (xAI) with traditional statistical methods to gain deeper insights into disease mechanism.
( 2
min )
arXiv:2509.21327v1 Announce Type: cross
Abstract: Predicting the spread of wildfires is essential for effective fire management and risk assessment. With the fast advancements of artificial intelligence (AI), various deep learning models have been developed and utilized for wildfire spread prediction. However, there is limited understanding of the advantages and limitations of these models, and it is also unclear how deep learning-based fire spread models can be compared with existing non-AI fire models. In this work, we assess the ability of five typical deep learning models integrated with weather and environmental variables for wildfire spread prediction based on over ten years of wildfire data in the state of Hawaii. We further use the 2023 Maui fires as a case study to compare the best deep learning models with a widely-used fire spread model, FARSITE. The results show that two deep learning models, i.e., ConvLSTM and ConvLSTM with attention, perform the best among the five tested AI models. FARSITE shows higher precision, lower recall, and higher F1-score than the best AI models, while the AI models offer higher flexibility for the input data. By integrating AI models with an explainable AI method, we further identify important weather and environmental factors associated with the 2023 Maui wildfires.
( 3
min )
arXiv:2509.21387v1 Announce Type: cross
Abstract: Prior works have shown that neural networks can be heavily pruned while preserving performance, but the impact of pruning on model interpretability remains unclear. In this work, we investigate how magnitude-based pruning followed by fine-tuning affects both low-level saliency maps and high-level concept representations. Using a ResNet-18 trained on ImageNette, we compare post-hoc explanations from Vanilla Gradients (VG) and Integrated Gradients (IG) across pruning levels, evaluating sparsity and faithfulness. We further apply CRAFT-based concept extraction to track changes in semantic coherence of learned concepts. Our results show that light-to-moderate pruning improves saliency-map focus and faithfulness while retaining distinct, semantically meaningful concepts. In contrast, aggressive pruning merges heterogeneous features, reducing saliency map sparsity and concept coherence despite maintaining accuracy. These findings suggest that while pruning can shape internal representations toward more human-aligned attention patterns, excessive pruning undermines interpretability.
( 2
min )
arXiv:2509.21613v1 Announce Type: cross
Abstract: Multi-Objective Reinforcement Learning (MORL) presents significant challenges and opportunities for optimizing multiple objectives in Large Language Models (LLMs). We introduce a MORL taxonomy and examine the advantages and limitations of various MORL methods when applied to LLM optimization, identifying the need for efficient and flexible approaches that accommodate personalization functionality and inherent complexities in LLMs and RL. We propose a vision for a MORL benchmarking framework that addresses the effects of different methods on diverse objective relationships. As future research directions, we focus on meta-policy MORL development that can improve efficiency and flexibility through its bi-level learning paradigm, highlighting key research questions and potential solutions for improving LLM performance.
( 2
min )
arXiv:2509.22011v1 Announce Type: cross
Abstract: We present a rigorous asymptotic analysis of Echo State Networks (ESNs) in a teacher student setting with a linear teacher with oracle weights. Leveraging random matrix theory, we derive closed form expressions for the asymptotic bias, variance, and mean-squared error (MSE) as functions of the input statistics, the oracle vector, and the ridge regularization parameter. The analysis reveals two key departures from classical ridge regression: (i) ESNs do not exhibit double descent, and (ii) ESNs attain lower MSE when both the number of training samples and the teacher memory length are limited. We further provide an explicit formula for the optimal regularization in the identity input covariance case, and propose an efficient numerical scheme to compute the optimum in the general case. Together, these results offer interpretable theory and practical guidelines for tuning ESNs, helping reconcile recent empirical observations with provable performance guarantees
( 2
min )
arXiv:2509.22598v1 Announce Type: cross
Abstract: We prove that all standard subregular language classes are linearly separable when represented by their deciding predicates. This establishes finite observability and guarantees learnability with simple linear models. Synthetic experiments confirm perfect separability under noise-free conditions, while real-data experiments on English morphology show that learned features align with well-known linguistic constraints. These results demonstrate that the subregular hierarchy provides a rigorous and interpretable foundation for modeling natural language structure. Our code used in real-data experiments is available at https://github.com/UTokyo-HayashiLab/subregular.
( 2
min )
arXiv:2411.07102v2 Announce Type: replace-cross
Abstract: In this work, we address unconstrained finite-sum optimization problems, with particular focus on instances originating in large scale deep learning scenarios. Our main interest lies in the exploration of the relationship between recent line search approaches for stochastic optimization in the overparametrized regime and momentum directions. First, we point out that combining these two elements with computational benefits is not straightforward. To this aim, we propose a solution based on mini-batch persistency. We then introduce an algorithmic framework that exploits a mix of data persistency, conjugate-gradient type rules for the definition of the momentum parameter and stochastic line searches. The resulting algorithm provably possesses convergence properties under suitable assumptions and is empirically shown to outperform other popular methods from the literature, obtaining state-of-the-art results in both convex and nonconvex large scale training problems.
( 2
min )
arXiv:2505.02091v2 Announce Type: replace-cross
Abstract: Solving non-convex resource allocation problems poses significant challenges in wireless communication systems, often beyond the capability of traditional optimization techniques. To address this issue, we propose LLM-OptiRA, the first framework that leverages large language models (LLMs) to automatically detect and transform non-convex components into solvable forms, enabling fully automated resolution of non-convex resource allocation problems in wireless communication systems. LLM-OptiRA not only simplifies problem-solving by reducing reliance on expert knowledge, but also integrates error correction and feasibility validation mechanisms to ensure robustness. Experimental results show that LLM-OptiRA achieves an execution rate of 96% and a success rate of 80% on GPT-4, significantly outperforming baseline approaches in complex optimization tasks across diverse scenarios.
( 2
min )
arXiv:2505.16965v2 Announce Type: replace-cross
Abstract: Text segmentation based on the semantic meaning of sentences is a fundamental task with broad utility in many downstream applications. In this paper, we propose a graphical model-based unsupervised learning approach, named BP-Seg for efficient text segmentation. Our method not only considers local coherence, capturing the intuition that adjacent sentences are often more related, but also effectively groups sentences that are distant in the text yet semantically similar. This is achieved through belief propagation on the carefully constructed graphical models. Experimental results on both an illustrative example and a dataset with long-form documents demonstrate that our method performs favorably compared to competing approaches.
( 2
min )
arXiv:2509.22011v1 Announce Type: new
Abstract: We present a rigorous asymptotic analysis of Echo State Networks (ESNs) in a teacher student setting with a linear teacher with oracle weights. Leveraging random matrix theory, we derive closed form expressions for the asymptotic bias, variance, and mean-squared error (MSE) as functions of the input statistics, the oracle vector, and the ridge regularization parameter. The analysis reveals two key departures from classical ridge regression: (i) ESNs do not exhibit double descent, and (ii) ESNs attain lower MSE when both the number of training samples and the teacher memory length are limited. We further provide an explicit formula for the optimal regularization in the identity input covariance case, and propose an efficient numerical scheme to compute the optimum in the general case. Together, these results offer interpretable theory and practical guidelines for tuning ESNs, helping reconcile recent empirical observations with provable performance guarantees
( 2
min )
In this solution, we demonstrate how the user (a parent) can interact with a Strands or LangGraph agent in conversational style and get information about the immunization history and schedule of their child, inquire about the available slots, and book appointments. With some changes, AI agents can be made event-driven so that they can automatically send reminders, book appointments, and so on.
( 40
min )
In this post, we demonstrate how to build a multi-agent SRE assistant using Amazon Bedrock AgentCore, LangGraph, and the Model Context Protocol (MCP). This system deploys specialized AI agents that collaborate to provide the deep, contextual intelligence that modern SRE teams need for effective incident response and infrastructure management.
( 47
min )
We will also collaborate on new AI experiences for FT readers.
( 2
min )
We’re joining Thorn, All Tech Is Human, and other leading companies in an effort to prevent the misuse of generative AI to perpetrate, proliferate, and further sexual harms against children.
( 2
min )
Increasing enterprise support with more security features and controls, updates to our Assistants API, and tools to better manage costs.
( 2
min )
We’re adding new features to help developers have more control over fine-tuning and announcing new ways to build custom models with OpenAI.
( 4
min )