Fixing GNU bash associative array insert speed

Bash uses linear search to insert values in to associative arrays. This is all well and good for small numbers of keys. I was adding millions1. I went poking around the bash source code today (2020-04-18) to confirm my suspicion and gauge the difficulty of adding an option to do something more sensible.

In less than a day after I reported it, there is a patch My timing code and pre and post patch timings are here:

Here the steps I took and where I might go if I get serious about fixing the problem:

1 Get the source code

1.1 Find it

find the homepage
A quick bit of googling lead to the homepage
use git
For a minute it looked like GNU was still stuck in the bad old days of having to download a tarball and then apply a series of patches, but fortunately, it there is a git repo

1.2 Download it

git clone

1.3 Build it

Bash follows a time honored build convention


1.4 Analyze it

/* Create an entry for STRING, in TABLE.  If the entry already
   exists, then return it (unless the HASH_NOSRCH flag is set). */
hash_insert (string, table, flags)
     char *string;
     HASH_TABLE *table;
     int flags;
  int bucket;
  unsigned int hv;

  if (table == 0)
    table = hash_create (0);

  item = (flags & HASH_NOSRCH) ? (BUCKET_CONTENTS *)NULL
                               : hash_search (string, table, 0);
/* Return a pointer to the hashed item.  If the HASH_CREATE flag is passed,
   create a new hash table entry for STRING, otherwise return NULL. */
hash_search (string, table, flags)
     const char *string;
     HASH_TABLE *table;
     int flags;
  int bucket;
  unsigned int hv;

  if (table == 0 || ((flags & HASH_CREATE) == 0 && HASH_ENTRIES (table) == 0))

  bucket = HASH_BUCKET (string, table, hv);

  for (list = table->bucket_array ? table->bucket_array[bucket] : 0; list; list = list->next)
      /* This is the comparison function */
      if (hv == list->khash && STREQ (list->key, string))
          return (list);

2 Next steps

2.1 DONE Reach out to the maintainers

see if they would even entertain the idea of a patch

2.2 CANCELED Look for appropriate in-memory hash insert/lookup functions

2.3 CANCELED Code it

2.4 CANCELED test it

2.5 CANCELED submit patch


  1. yes, there are many better tools for this job, but not in the constrained environment where this had to run. ↩︎