sdk Archives - AI News

Amazon Nova Act: A step towards smarter, web-native AI agents

Ryan Daws — Tue, 01 Apr 2025 16:57:43 +0000

Amazon has introduced Nova Act, an advanced AI model engineered for smarter agents that can execute tasks within web browsers.

While large language models popularised the concept of “agents” as tools that answer queries or retrieve information via methods such as Retrieval-Augmented Generation (RAG), Amazon envisions something more robust. The company defines agents not just as responders but as entities capable of performing tangible, multi-step tasks in diverse digital and physical environments.

“Our dream is for agents to perform wide-ranging, complex, multi-step tasks like organising a wedding or handling complex IT tasks to increase business productivity,” said Amazon.

Current market offerings often fall short: many agents require continuous human supervision, and their functionality depends on comprehensive API integration, which is not feasible for all tasks. Nova Act is Amazon’s answer to these limitations.

Alongside the model, Amazon is releasing a research preview of the Amazon Nova Act SDK. Using the SDK, developers can create agents capable of automating web tasks like submitting out-of-office notifications, scheduling calendar holds, or enabling automatic email replies.

The SDK aims to break down complex workflows into dependable “atomic commands” such as searching, checking out, or interacting with specific interface elements like dropdowns or popups. Detailed instructions can be added to refine these commands, allowing developers to, for instance, instruct an agent to bypass an insurance upsell during checkout.

To further enhance accuracy, the SDK supports browser manipulation via Playwright, API calls, Python integrations, and parallel threading to overcome web page load delays.
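As a rough illustration of the “atomic commands” idea and of running sessions in parallel, here is a minimal Python sketch. It does not use the Nova Act SDK: `StubAgent`, its `act` method, the `run_checkout` helper, and the `https://shop.example` URL are all hypothetical stand-ins invented for this example, and the stub only records instructions rather than driving a browser.

```python
from concurrent.futures import ThreadPoolExecutor


class StubAgent:
    """Hypothetical stand-in for a browser agent; it only records commands."""

    def __init__(self, start_url):
        self.start_url = start_url
        self.log = []

    def act(self, instruction):
        # A real agent would drive a browser here; the stub just logs the step.
        self.log.append(instruction)
        return f"done: {instruction}"


def run_checkout(product):
    """Run one workflow as a sequence of small, atomic commands."""
    agent = StubAgent("https://shop.example")  # placeholder URL (assumption)
    agent.act(f"search for {product}")
    agent.act("select the first result")
    # Extra detail narrows a command, e.g. declining an upsell at checkout.
    agent.act("add to cart, and decline the insurance upsell if offered")
    return product, agent.log


def run_checkouts_in_parallel(products):
    """Run one independent agent session per product on a thread pool."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(run_checkout, products))
```

Calling `run_checkouts_in_parallel(["kettle", "toaster"])` returns a dictionary mapping each product to the three commands its session issued. Keeping every step short is the point of the pattern: a failure can be traced to a single instruction, and waiting on one slow page does not block the other sessions.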

Nova Act: Exceptional performance on benchmarks

Unlike other generative models that showcase middling accuracy on complex tasks, Nova Act prioritises reliability. Amazon highlights its model’s impressive scores of over 90% on internal evaluations for specific capabilities that typically challenge competitors.

Nova Act scored 0.939 on the ScreenSpot Web Text benchmark, which measures how accurately a model follows natural language instructions for text-based interactions, such as adjusting font sizes. Competing models trail behind, with Claude 3.7 Sonnet at 0.900 and OpenAI’s CUA at 0.883.

Similarly, Nova Act scored 0.879 in the ScreenSpot Web Icon benchmark, which tests interactions with visual elements like rating stars or icons. While the GroundUI Web test, designed to assess an AI’s proficiency in navigating various user interface elements, showed Nova Act slightly trailing competitors, Amazon sees this as an area ripe for improvement as the model evolves.

Amazon stresses its focus on delivering practical reliability. Once an agent built using Nova Act functions as expected, developers can deploy it headlessly, integrate it as an API, or even schedule it to run tasks asynchronously. In one demonstrated use case, an agent automatically orders a salad for delivery every Tuesday evening without requiring ongoing user intervention.
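A recurring job like the Tuesday salad order needs some way to work out when it should next fire. The sketch below shows one way to compute that with Python’s standard library; the function name `next_weekly_run` and the choice of 18:00 as “evening” are assumptions for illustration, and the source does not describe how Amazon schedules agents.

```python
from datetime import datetime, timedelta

TUESDAY = 1  # datetime.weekday() counts Monday as 0


def next_weekly_run(now, weekday=TUESDAY, hour=18):
    """Return the next occurrence of `weekday` at `hour`:00 strictly after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to the requested weekday (0 days if it is already that day).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Today's slot has already passed, so use next week's.
        candidate += timedelta(days=7)
    return candidate
```

A scheduler could sleep until the returned time, launch the headless agent, and then call the function again to find the following run.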

Amazon sets out its vision for scalable and smart AI agents

One of Nova Act’s standout features is its ability to transfer its user interface understanding to new environments with minimal additional training. Amazon shared an instance where Nova Act performed admirably in browser-based games, even though its training had not included video game experiences. This adaptability positions Nova Act as a versatile agent for diverse applications.

This capability is already being leveraged in Amazon’s own ecosystem. Within Alexa+, Nova Act enables self-directed web navigation to complete tasks for users, even when API access is not comprehensive enough. This represents a step towards smarter AI assistants that can function independently, harnessing their skills in more dynamic ways.

Amazon is clear that Nova Act represents the first stage in a broader mission to craft intelligent, reliable AI agents capable of handling increasingly complex, multi-step tasks.

Expanding beyond simple instructions, Amazon’s focus is on training agents through reinforcement learning across varied, real-world scenarios rather than overly simplistic demonstrations. This foundational model serves as a checkpoint in a long-term training curriculum for Nova models, indicating the company’s ambition to reshape the AI agent landscape.

“The most valuable use cases for agents have yet to be built,” Amazon noted. “The best developers and designers will discover them. This research preview of our Nova Act SDK enables us to iterate alongside these builders through rapid prototyping and iterative feedback.”

Nova Act is a step towards making AI agents truly useful for complex, digital tasks. From rethinking benchmarks to emphasising reliability, its design philosophy is centred around empowering developers to move beyond what’s possible with current-generation tools.


The post Amazon Nova Act: A step towards smarter, web-native AI agents appeared first on AI News.

GTC 2021: Nvidia debuts accelerated computing libraries, partners with Google, IBM, and others to speed up quantum research

Ryan Daws — Tue, 09 Nov 2021 13:06:58 +0000

Nvidia has unveiled 65 new and updated software development kits at GTC 2021, alongside a partnership with industry leaders to speed up quantum research.

The company’s roster of accelerated computing kits now exceeds 150 and supports the almost three million developers in Nvidia’s Developer Program.

Four of the major new SDKs are:

ReOpt – Automatically optimises logistical processes using advanced, parallel algorithms. This includes vehicle routes, warehouse selection, and fleet mix. The dynamic rerouting capabilities – shown in an on-stage demo – can reduce travel time, save fuel costs, and minimise idle periods.
cuNumeric – Implements the popular NumPy application programming interface and enables scaling to multi-GPU and multi-node systems with zero code changes.
cuQuantum – Designed for quantum computing, it enables large quantum circuits to be simulated faster. Researchers can use it to simulate areas such as near-term variational quantum algorithms for molecules and error correction algorithms to identify fault tolerance, and to accelerate popular quantum simulators from Atos, Google, and IBM.
CUDA-X accelerated DGL container – Helps developers and data scientists working on graph neural networks to quickly set up a working environment. The container makes it easy to work in an integrated, GPU-accelerated GNN environment combining DGL and PyTorch.

Some existing AI-related SDKs that have received notable updates are:

DeepStream 6.0 – introduces a new graph composer that makes computer vision accessible with a visual drag-and-drop interface.
Triton 2.15, TensorRT 8.2 and cuDNN 8.4 – assist with the development of deep neural networks by providing new optimisations for large language models and inference acceleration for gradient-boosted decision trees and random forests.
Merlin 0.8 – boosts recommendation systems with its new capabilities for predicting a user’s next action with little or no user data and support for models larger than GPU memory.

Accelerating quantum research

Nvidia has established a partnership with Google, IBM, and a number of small companies, national labs, and university research groups to accelerate quantum research.

“It takes a village to nurture an emerging technology, so Nvidia is collaborating with Google Quantum AI, IBM, and others to take quantum computing to the next level,” explained the company in a blog post.

The first library from the new cuQuantum SDK is Nvidia’s initial contribution to the partnership. The library, called cuStateVec, accelerates the state vector simulation method, which tracks the full state of the system in memory and can scale to tens of qubits.
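To make the state vector method concrete, here is a minimal pure-Python sketch that keeps every amplitude of a small register in a list and applies gates by updating pairs of amplitudes. It is an illustration of the general technique only and does not use or reflect the cuStateVec API. The function names are invented for this example, and the memory estimate assumes 16 bytes per amplitude (double-precision complex).

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a full state vector."""
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:  # i has target bit 0; j is its partner with bit 1
            j = i | stride
            a, b = state[i], state[j]
            new_state[i] = gate[0][0] * a + gate[0][1] * b
            new_state[j] = gate[1][0] * a + gate[1][1] * b
    return new_state


def apply_cnot(state, control, target):
    """Flip the target bit wherever the control bit is 1 by swapping amplitudes."""
    new_state = list(state)
    c, t = 1 << control, 1 << target
    for i in range(len(state)):
        if (i & c) and not (i & t):
            j = i | t
            new_state[i], new_state[j] = state[j], state[i]
    return new_state


def bell_state():
    """Prepare (|00> + |11>) / sqrt(2) from |00> with a Hadamard and a CNOT."""
    state = [0j] * 4  # 2 qubits -> 2**2 amplitudes
    state[0] = 1 + 0j
    h = 1 / math.sqrt(2)
    hadamard = [[h, h], [h, -h]]
    state = apply_single_qubit_gate(state, hadamard, target=0)
    return apply_cnot(state, control=0, target=1)


def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector; 16 bytes per amplitude is an assumption."""
    return bytes_per_amplitude * (1 << n_qubits)
```

The number of amplitudes doubles with each added qubit, so under the 16-byte assumption a 30-qubit state already needs 16 GiB. That growth is why the method tops out at tens of qubits and why accelerating the amplitude updates matters.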

cuStateVec has been integrated into Google Quantum AI’s state vector simulator qsim and can be used through the open-source framework Cirq.

“Quantum computing promises to solve tough challenges in computing that are beyond the reach of traditional systems,” commented Catherine Vollgraff Heidweiller at Google Quantum AI.

“This high-performance simulation stack will accelerate the work of researchers around the world who are developing algorithms and applications for quantum computers.”

In December, cuStateVec will also be integrated with Qiskit Aer—a high-performance simulator framework for quantum circuits from IBM.

Among the national labs using cuQuantum to accelerate their research are Oak Ridge, Argonne, Lawrence Berkeley National Laboratory, and Pacific Northwest National Laboratory. University research groups include those at Caltech, Oxford, and MIT.

Nvidia is helping developers to get started by creating a ‘DGX quantum appliance’ that puts its simulation software in a container optimised for its DGX A100 systems. The software will be available early next year via the company’s NGC Catalog.



The post GTC 2021: Nvidia debuts accelerated computing libraries, partners with Google, IBM, and others to speed up quantum research appeared first on AI News.

Paravision boosts its computer vision and facial recognition capabilities

Ryan Daws — Wed, 29 Sep 2021 13:06:14 +0000

US-based Paravision has announced updates to boost its computer vision and facial recognition capabilities across mobile, on-premise, edge, and cloud deployments.

“From cloud to edge, Paravision’s goal is to help our partners develop and deploy transformative solutions around face recognition and computer vision,” said Joey Pritikin, Chief Product Officer at Paravision.

“With these sweeping updates to our product family, and with what has become possible in terms of accuracy, speed, usability and portability, we see a remarkable opportunity to unite disparate applications with a coherent sense of identity that bridges physical spaces and cyberspace.”

A new Scaled Vector Search (SVS) capability acts as a search engine to provide accurate, rapid, and stable face matching on large databases that may contain tens of millions of identities. Paravision claims the SVS engine supports hundreds of transactions per second with extremely low latencies.

Another scaling solution called Streaming Container 5 enables the processing of video at over 250 frames per second from any number of streams. The solution features advanced face tracking to ensure that identities remain accurate even in busy environments.

With more enterprises than ever looking to the latency-busting and privacy-enhancing benefits of edge computing, Paravision has partnered with Teknique to co-create a series of hardware and software reference designs that enable the rapid development of face recognition and computer vision capabilities at the edge.

Teknique is a leader in the development of hardware based on designs from California-based fabless semiconductor company Ambarella.

Paravision’s Face SDK has been enhanced for smart cameras powered by Ambarella CVflow chipsets. The update enables facial recognition on CVflow-powered cameras to achieve up to 40 frames per second full pipeline performance.

A new Liveness and Anti-spoofing SDK also adds new safeguards for Ambarella-powered facial recognition solutions. The toolkit uses Ambarella’s visible light, near-infrared, and depth-sensing capabilities to determine whether the camera is seeing a live subject or whether it’s being tricked by recorded footage or a dummy image.

On the mobile side, Paravision has released its Face SDK for Android. The SDK includes face detection, landmarks, quality assessment, template creation, and 1-to-1 or 1-to-many matching. Reference applications with UI/UX recommendations and tools are also included.
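The difference between 1-to-1 and 1-to-many matching can be sketched in a few lines of Python. The example assumes that a face template is a numeric vector and that templates are compared by cosine similarity against a threshold. Both are common practice but are assumptions here, since the source does not describe how Paravision’s SDK represents or compares templates. The function names and the 0.8 threshold are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two template vectors, from -1 (opposite) to 1 (identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(probe, enrolled, threshold=0.8):
    """1-to-1: does the probe match the single claimed identity?"""
    return cosine_similarity(probe, enrolled) >= threshold


def identify(probe, gallery, threshold=0.8):
    """1-to-many: return the best-matching identity at or above the threshold, else None."""
    best_id, best_score = None, threshold
    for identity, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id
```

With a gallery of `{"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}`, the probe `[0.9, 0.1, 0.0]` identifies as `"alice"` and fails verification against Bob’s template. The linear scan in `identify` is only practical for small galleries; searching tens of millions of identities would need an indexed search structure instead.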

Last but certainly not least, Paravision has announced the availability of its first person-level computer vision SDK. The new SDK is designed to go “beyond face recognition” to detect the presence and position of individuals and unlock new use cases.

Paravision’s examples of real-world applications for the computer vision SDK include occupancy analysis, tailgating detection, and custom intention or subject attributes.

“With Person Detection, users could determine whether employees are allowed access to a specific area, are wearing a mask or hard hat, or appear to be in distress,” the company explains. “It can also enable useful business insights such as metrics about queue times, customer throughput or to detect traveller bottlenecks.”

With these extensive updates, Paravision is securing its place as one of the most exciting companies in the AI space.

Paravision is ranked the US leader across several of NIST’s Face Recognition Vendor Test evaluations including 1:1 verification, 1:N identification, performance for paperless travel, and performance with face masks.



The post Paravision boosts its computer vision and facial recognition capabilities appeared first on AI News.