Using Cognitect’s aws-api library to work with AWS is nice and easy (and simple). This post shows how to use it with AWS Transcribe for speech-to-text.

aws-api is an idiomatic, data-oriented Clojure library for invoking AWS APIs.


Previously I’d used Amazonica for working with general AWS services, or more focused libraries like Faraday for working with DynamoDB. Amazonica takes a conceptually similar approach to aws-api in that its code is (mostly) generated by reflecting on AWS-published API specs. Faraday wraps Dynamo’s API in an idiomatic Clojure API, with some other niceties e.g. serializing Clojure data to/from Dynamo JSON.

Compared to these projects, aws-api’s approach has some key similarities and differences:

  • The majority of Amazonica and aws-api’s API surface is derived from AWS API definitions.
  • Amazonica and Faraday depend directly on AWS SDK libraries, but aws-api takes a “clean room” approach by calling AWS API endpoints directly — no AWS SDK dependencies.
  • Faraday provides a very Clojure-y wrapper for AWS Dynamo SDK. Amazonica provides generated (and bespoke) wrappers for AWS APIs that look/feel like idiomatic Clojure. aws-api takes a very data-centric approach, providing one function to invoke any AWS API operation.

I like that aws-api requires no transitive dependencies on AWS SDK. (No more transitive Jackson conflicts!) It has relatively few dependencies overall.

It’s extremely data-centric and doesn’t attempt to obscure the nature of data in AWS API requests/responses behind a Clojure-y façade. Considering it as a fairly mechanical translation of AWS API specs, it’s very much data-as-code.

The :TitleCase keywords were a little jarring at first but I came to appreciate only needing to reference the AWS API docs, as opposed to understanding both the AWS API and a bespoke Clojure wrapper. For example, I’d often need to reference Faraday’s source to see how its custom data shapes mapped to their Dynamo request/response counterparts, because they were often different for the sake of idiomacy (is this a word 🤷‍♂️).

Because it’s so data-centric, it’s trivial to enumerate all types of operations you can perform against each AWS service, along with their inputs and outputs. This makes it a very REPL-friendly tool.

API Interactions

First, we need to authenticate ourselves. The default provider covers most use cases:

(require '[ :as creds])
(def credentials

If you’ve used AWS API you’re probably familiar with the concept of the default credentials provider chain, and maybe (or unfortunately) its order of precedence. Something I found very useful was the ability to create a custom credential provider chain (which I’d done with much less elegance before aws-api). Here I defined a single, chained provider that works locally and when deployed to EC2 or ECS:

(def credentials
     (creds/profile-credentials-provider "my-custom-profile")]))

The next thing we’ll need is a client to our AWS API, in this case Transcribe:

(require '[ :as aws])
(def transcribe
  (aws/client {:api :transcribe
               :region :us-east-1
               :credentials-provider credentials}))

Because aws-api is so data-centric, there’s only one function for interacting with the API: invoke. Everything else is data. If you want to see all operations for an API, you can use the ops function.

(aws/ops transcribe)
 :DeleteVocabulary {:name "DeleteVocabulary",
                    :documentation "<p>Deletes a vocabulary from Amazon Transcribe. </p>",
                    :request {:VocabularyName string},
                    :required [:VocabularyName],
                    :response nil}

You can then pick/filter those results and get detailed documentation for each with the doc function. This tells us what the operation does, what inputs it requires, and what we can expect in the response. We can print docs for each of these operations in the REPL:

(aws/doc transcribe :StartTranscriptionJob)

<p>Starts an asynchronous job to transcribe speech to text. </p>


{:TranscriptionJobName string,
 :LanguageCode string,
 :MediaSampleRateHertz integer,
 :MediaFormat string,
 :Media {:MediaFileUri string},
 :OutputBucketName string,
 {:VocabularyName string,
  :ShowSpeakerLabels boolean,
  :MaxSpeakerLabels integer,
  :ChannelIdentification boolean}}


[:TranscriptionJobName :LanguageCode :MediaFormat :Media]


 {:MediaFormat string,
  :CompletionTime timestamp,
  :MediaSampleRateHertz integer,
  :TranscriptionJobName string,
  {:VocabularyName string,
   :ShowSpeakerLabels boolean,
   :MaxSpeakerLabels integer,
   :ChannelIdentification boolean},
  :CreationTime timestamp,
  :Transcript {:TranscriptFileUri string},
  :TranscriptionJobStatus string,
  :FailureReason string,
  :LanguageCode string,
  :Media {:MediaFileUri string}}}

Invocation is easy, because the :op and :request values are easily interpretable from the output of the ops or doc functions:

  transcribe                        ;; the client
  {:op :StartTranscriptionJob       ;; the operation
   :request                         ;; the operation input
   {:TranscriptionJobName job-name
    :LanguageCode         "en-US"
    :MediaFormat          "wav"
    :Media                {:MediaFileUri s3-path}}})

Doing Things

I needed to transcribe some speech to text, and of course AWS has a service for that called Transcribe. It’s not very fast but it’s cheap and fairly easy to use. You can put your audio in a S3 bucket and request a transcription. Unfortunately you have to poll (or use SNS + Lambda?) for the resulting transcription, which will also go into a S3 bucket.

We can create a transcription job:

(defn start-transcribe [s3-path job-name]
    {:op :StartTranscriptionJob
     {:TranscriptionJobName job-name
      :LanguageCode         "en-US"
      :MediaFormat          "wav"
      :Media                {:MediaFileUri s3-path}
      ;; NOTE output bucket is optional
      ;; AWS otherwise puts the transcript in a random bucket
      :OutputBucketName     "my-unique-bucket"}}))

{:TranscriptionJob {:MediaFormat "wav",
                    :TranscriptionJobName "my-random-job-name",
                    :CreationTime #inst"2019-02-25T12:51:45.000-00:00",
                    :TranscriptionJobStatus "IN_PROGRESS",
                    :LanguageCode "en-US",
                    :Media {:MediaFileUri ""}}}

But this only tells us the job is in progress, and we must wait for it to complete. I define another simple helper function to get the details of a job:

(defn get-transcribe-job [job-name]
  (aws/invoke transcribe
              {:op      :GetTranscriptionJob
               :request {:TranscriptionJobName job-name}}))

(get-transcribe-job "my-random-job-name")
{:TranscriptionJob {:MediaFormat "wav",
                    :MediaSampleRateHertz 8000,
                    :TranscriptionJobName "my-random-job-name",
                    :Settings {:ChannelIdentification false},
                    :CreationTime #inst"2019-02-25T13:08:06.000-00:00",
                    :Transcript {},
                    :TranscriptionJobStatus "IN_PROGRESS",
                    :LanguageCode "en-US",
                    :Media {:MediaFileUri ""}}}

We can see it’s still in progress, but when it’s finished it will contain a path to the transcript JSON in S3. I use a simple function to read the JSON from S3:

(defn read-file [bucket key]
  (-> (aws/invoke s3 {:op :GetObject, :request {:Bucket bucket :Key key}})

And finally, a convenience function to poll for the results of a given job:

(defn await-transcription [job-name]
  (loop [job (get-transcribe-job job-name)]
    (case (get-in job [:TranscriptionJob :TranscriptionJobStatus])
      (do (Thread/sleep 5000) ;; I have no shame
          (recur (get-transcribe-job job-name)))

      (-> (read-file "my-unique-bucket" (str job-name ".json"))
          (json/read-str) ;;
          (get-in ["results" "transcripts" 0 "transcript"]))

      (throw (ex-info "Transcribe failed" job)))))

(await-transcription "my-random-job-name") ;; spins for ~1 minute
=> "Bravo, November Papa Romeo, Oscar."    ;; sample.wav contained an audio captcha

I was impressed with how easy it was to interact with an AWS API I’d never used before via aws-api. Everything I needed was right there in the REPL with a handful of functions: client, ops, and invoke. The immediacy and transparency of working with AWS via aws-api makes for a great Clojure experience.