Putting AI to the (Accessibility) Test

A robot sitting on a bench with his head down, looking at a book, trying to decipher the words

Generative AI. It seems to be all the rage through 2024, with every large company seeming to have one: Meta and its smart glasses, Apple and its new Apple Intelligence, Microsoft and Copilot, and even Freedom Scientific and FS Companion. And that’s leaving out a lot: the OG (ChatGPT), Gemini, and Claude, to name but a few. It clearly has its uses; it helps summarize emails, write code, and much more.

One of the interesting arguments in the generative AI field is around accessibility. There are many smart individuals out there who have tackled the question of AI and designing accessible content.

What about AI’s capability to help with testing for accessibility? Over the past few months, I’ve been incorporating generative AI into my workflow as a blind Accessibility Engineer here at TPGi, and there are three vital ways it has helped. In some ways, it has reduced the need for a sighted tester. In others, it has allowed me to ask more pointed questions of a sighted tester to get the information I need to perform the task, whether that’s an audit or remediation validation. Without further ado, let’s find out how AI can help you as an accessibility tester.

But First, a Couple of Warnings

As accessibility testers, we’re given access to many different environments from our clients. Only some of these are in production. We must think about the client’s privacy before engaging generative AI to aid in testing. In essence, we have to ask the question: “Will the client mind if someone else sees this?”

If you, as the tester, believe that you are compromising client confidentiality in any way, do not engage the services of generative AI. Client privacy and the right to control whether their information can be used in model training are, of course, just as important to them as they are to the rest of us. Please don’t neglect privacy for ease.

Speaking of privacy, no actual client data or information was used in the creation of this article; the examples provided are based on real conversations but are hypothetical and created for this article.

A potential trap of AI is its propensity to “hallucinate”, or provide inaccurate information, especially when pictures are involved. Always be mindful that the information received from the AI model may be incorrect. A good way to combat this is to run the same content through the AI multiple times and, if possible, through multiple AI models and compare the results. Not only are hallucinations more likely to be caught, but one model may provide slightly different information than another, which may reveal a piece of information previously unknown to you.

OK. On to the Good Stuff!

As I’m a blind tester, this will be written through a blindness lens. To that end, there are two main components that make this work in my favor: JAWS PictureSmart and the Open AI add-on for NVDA. Notably, I use PictureSmart almost exclusively, as the add-on for NVDA requires some initial set-up and paying a couple of dollars for the tokens needed. And, to return briefly to privacy concerns, it’s worth noting that PictureSmart is a private AI, so data is not used for model training purposes.

One could also make use of the Be My Eyes application; since AI is so common, there are multiple options available. Once the technology is all set up, it’s simple to send content to the AI and get a description back. Here’s how that can help in your testing!

Analyzing Screenshots

The phrase “see screenshot attached”, or some variation of it, has more than likely come up many times during your accessibility consulting. As a blind person, I commonly have two paths forward: inform the client of said blindness and ask for an alternative, or ask someone to help. With generative AI, it’s simple to use PictureSmart to do the first screening of the content. This may end the need to inform someone of your disability, as well as prevent said disability from being a barrier to completing a task. Co-workers, who in my experience are always willing and eager to help, are often involved in their own projects, and their time is valuable to their clients; this is one more step towards total job independence. Now, my need to interrupt someone else’s day, even if only for a moment, is reduced.

This occurs more in remediation (the process of resolving accessibility issues)—often there are multiple screenshots per component or feature. However, it can also occur on a fresh audit that has been scoped and divided into components, especially when the client has a rapidly-changing website, or the textual steps to reproduce are unclear. Let’s say that the component we’re working on is the individual search result on a Google search page. The instructions you get are to type something in the search box and look for a business. Not the most descriptive, is it? Well, let’s also pretend you have this screenshot:

Google search result for “nearest seafood restaurant”

If we run this through JAWS PictureSmart, we get:

The image is a screenshot of a Google search results page for “nearest seafood restaurant” around Caldwell, ID. It shows a list of seafood restaurants. Three listed are: 1. O Crab in Nampa, ID, rated 3.9; 2. Hook & Reel Cajun Seafood & Bar in Meridian, ID, rated 4.3; 3. Land Ocean Restaurant Meridian in Meridian, ID, rated 4.4. There is a map to the right with locations marked. Filters like “Near me” and “Directions” are visible at the top.

We can then activate the “Ask questions about this image” link to get more detail. A good question to ask might be “describe to me in detail about ‘O crab'”. The response we receive is:

‘O Crab’ is a seafood restaurant located in Nampa, ID, with a rating of 3.9 stars from about 1,000 reviews. The description mentions that the seafood was perfect and that the server was very kind and on point. The listing includes a picture of a seafood dish featuring crab legs and other seafood items. The restaurant seems to be positioned as a moderately priced option, indicated by the “$$” symbol.

This would provide enough information for me to feel confident that there is indeed a business on this page, and I can begin to audit. That textual information also gives me things to check for, one of which is: is the “$$” symbol meaningful to assistive technologies? So, not only has it given us information about the component, it has also given us some visual items to look out for! Speaking of looking out for items, the next thing it can help with is…

Checking Visual Information Against Programmatic Information

Now it’s time to check and make sure the visual information the AI deduced and returned to us is being conveyed to screen readers. Simply navigating to the restaurant with JAWS (screen reader) gives us:

O Crab

Rated 3.9 out of 5.

(1K) · moderately expensive · Seafood

Nampa, ID

Closed ⋅ Opens 11 AM

‘Really enjoyed the crab lobster and crawfish with potatoes and corn.’

Comparing this with the AI-provided output earlier tells me a few things:

  • There is something that conveys the moderate price in text (“The restaurant seems to be positioned as a moderately priced option, indicated by the “$$” symbol.”)
  • They’re hiding the picture of the seafood (“The listing includes a picture of a seafood dish featuring crab legs and other seafood items.”)
  • The validation of reviews: (“with a rating of 3.9 stars from about 1,000 reviews.”)

Although this information is extremely helpful, it doesn’t negate the need for manual testing; these elements, and indeed this text, raise a couple of issues I would call out—is “(1K)” descriptive? I don’t think so. I would still flag this and bring it to the client’s attention. I would, however, know that it’s referring to the user reviews, so I could include that in my details.
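
If I were sketching a suggested fix in my write-up, one possible approach (this is illustrative markup I’ve invented for this article, not Google’s actual code) is to keep the visible “(1K)” but expose a fuller text alternative to assistive technology:

  <!-- Hypothetical markup: the visible "(1K)" is hidden from screen readers
       and a fuller label is exposed on the link instead. -->
  <a href="#reviews" aria-label="3.9 stars, about 1,000 reviews">
    <span aria-hidden="true">3.9 ★ (1K)</span>
  </a>

Whatever the final markup turns out to be, the comparison between the AI description and the screen reader output is what told me where to look in the first place.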

A few honorable mentions that AI can help with are verifying whether links are underlined, whether text is emphasized, and other visual attributes. It will take more pointed questions, but this can come in extremely handy if you want to do a quick check. Simply asking “are the links underlined?” will usually net a usable answer; confirming with the JAWS key+F keystroke (which outputs information on the font and point size at the cursor’s position) can clue you in. If the AI and JAWS don’t match in these tests, it might be worth noting to your team when you pass the component off for color contrast and other visual checks.
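
To give a sense of what that question is probing for, here’s a minimal, made-up example (the selectors, class name, and color are invented for illustration). If body-text links drop the underline and rely on color alone, that’s exactly the kind of thing I’d note when passing the component off:

  <style>
    /* Underline removed; only color distinguishes the link: worth flagging. */
    p a { color: #0b57d0; text-decoration: none; }
    /* Underline kept: likely fine, though contrast still needs a sighted check. */
    p a.underlined { text-decoration: underline; }
  </style>
  <p>Read the <a href="/report">full report</a> or the
    <a class="underlined" href="/summary">summary</a>.</p>

Another benefit that AI provides, and in my personal opinion the most beneficial of all, is visual checking to verify that alt attributes are correct!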

Checking Alt Attributes

As a personal note, I absolutely love the ability to get AI-generated descriptions of images. I love using it on Facebook and other forms of social media to finally understand that funny meme. AI can also help with deciding whether an alt attribute is satisfactory, or whether there’s a need to mark an image as decorative. Let’s move away from crab and refer to TPGi’s own website for a quick demonstration!

Part of the TPGi homepage showing a stylized ARC logo and the related markup revealed in the Dev Tools panel below, highlighted for clarity
Screenshot of a page on the TPGi site with a stylized ARC logo.

On the TPGi homepage, there’s an image that’s described in the alt attribute as “ARC logo on a cloud next to various icons representing tools that are part of the ARC Platform.” It’s an example of branding or logo usage that is present on many sites, but not all provide a description of the image as we find here (each variant is sketched in markup after the list):

  • On some sites we might find “CompanyName Logo” in the alt text
  • It might have an empty alt
  • It could even be “9019384721329384571.PNG” or something equally horrible and meaningless (likely where the content management system has not forced the person who uploaded it to provide a description, so it tries to fall back to the only text it can associate with the image, in this case a string of gibberish characters).
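
To make those variants concrete, here’s a quick markup sketch (the src value is made up for this article; the alt values are the ones described above):

  <!-- Hypothetical markup showing the variants above, from most to least useful. -->
  <img src="arc-logo.png"
       alt="ARC logo on a cloud next to various icons representing tools that are part of the ARC Platform">
  <img src="arc-logo.png" alt="CompanyName Logo">          <!-- generic; says little about the image -->
  <img src="arc-logo.png" alt="">                          <!-- empty alt; skipped as decorative -->
  <img src="arc-logo.png" alt="9019384721329384571.PNG">   <!-- filename fallback; meaningless to a listener -->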

No matter what it’s labeled as, let’s see what an AI-generated description makes of it. With PictureSmart, simply pressing the hotkeys (JAWS key+Space, P, Enter) on the image will send it off, and we’ll get a description back:

The image features a stylized cloud with the letters ‘ARC’ inside it. Surrounding the cloud are various colorful icons: a cloud with two overlapping circles, a person with connected lines and boxes, a customer support person with a headset, data charts with upward-trending bars, a pair of books, and a graduation cap. The overall design uses bright colors and conveys themes of cloud computing, communication, education, and analytics.

If we choose to get more information, it may return something like:

The image features a dark blue cloud shape with the letters ‘ARC’ prominently displayed in white. The letter ‘C’ is partially filled with a light blue and yellow section. Surrounding the cloud are various colorful icons:

1. A cloud with overlapping circles in yellow and magenta, suggesting collaboration or connectivity.
2. A flowchart icon with a person connected to two boxes via lines, signifying data organization or workflow.
3. A graph with bars and a line showing upward progress, indicating growth or analytics.
4. A silhouette of a person wearing headphones with a speech bubble, representing customer service or communication.
5. Two books, one magenta and one blue, symbolizing learning or knowledge.
6. A graduation cap in blue with an orange tassel, highlighting education or achievement.

The bottom of the image has a series of colorful stripes that match the tones used in the icons, creating a cohesive design theme.

Since in this case we’re comparing the existing alt text against the AI description to see whether it is valid, I’d feel confident that the alt holds up: the items mentioned in the detailed description are in fact tools present within the ARC Platform.

In Conclusion

AI is a tool. Whether it’s getting an alternative look at something to be evaluated, more information about an image, or asking it a thousand formatting questions, it can be useful. AI is a very good assistant. It can provide you with answers, if you know what questions to ask, but it can’t do all the work for you. It is yet another tool in the toolbox, and, I think, an important one.

Since using PictureSmart in my workflow, I’ve had to ask a great deal fewer “where can this be found?” questions. I’ve also been able to take a more informed gander at some of the visual issues that I would normally pass off to a sighted colleague. Perhaps the most important benefit to me is that I finally get to decide which images I feel are decorative. While I trust my fellow testers, there have often been times when I’ve gently pushed back and let the people I’m auditing with know that I, as a blind user, see merit in knowing what an image contains rather than just that it’s present, and I can now make that judgement call myself.

Circling back to AI and accessibility, it’s not the be-all and end-all of making the web a more accessible place. We still have lots of work to do before AI can even help with running accessibility tests. But AI does know things (or at least it wants to give you the impression that it ‘knows’), and it can answer very direct and pointed questions.

While you, the tester, are still in the driver’s seat, it can be quite helpful for someone to provide you with a bit of support. It just so happens that a computer has stepped in to help me. Did it end the need for a sighted person to help me test? No. Color contrast, text spacing, and other things that still require a sighted eye have not yet been tackled. It has, though, provided me with a feeling of greater independence, and that is important. Even a small step forward is still a step in the right direction.

Image credit: Andrea De Santis


About TJ Squires

TJ Squires is an Accessibility Engineer who’s been with TPGi since 2021, making the digital world a better place, one suggested accessibility fix at a time. Residing in small-town Idaho, TJ spends his off-hours perfecting the art of cooking, piloting an imaginary spaceship, getting lost in books, and catering to the whims of his fiancée and their dog—both of whom remind him daily that he's not in charge.