Honey, I Dropped the Table

Uploaded on May 30, 2023

I’m rewriting Knife City in Rails. I’ve never built anything in Rails so while I was struggling with something today, I decided that there might be a problem with the db structure that I was messing around with, so I figured that I would drop the image table and try importing it again. So I dropped the table and.. whoops I dropped the production table.

I had been using ChatGPT to help me in learning how to use Rails, so what made sense was to search “How to undo drop table MySQL” – it told me to run mysql> show variables like 'log_bin'; so I did. Unfortunately, log_bin was disabled. Fuck. Did I just lose all of my most important data? All of the data that I’ve spent hours of my time documenting, and have been working on for the last 3+ years? Shit shit shit shit.

Well, the first thing I realized is that I did have a backup of the db from 9 days ago. I also still had the images saved somewhere on the machine – and they were named corresponding to their db values. I could look up every file in the directory on the database – if it didn’t exist in the database, that means that I found a missing file. Now all I need to do is add all those entries into the database.

A Python script was the most obvious way to do this, so I went to ChatGPT and had it generate the script for me. The following were my prompts to ChatGPT to retrieve my data.

  • How to undo drop table mysql
    • The response was basically “you’re fucked”
  • Create a python script to go through each file in directory and see if its file name including its extension exists as a value of column image_hash in table images on database [redacted]. include a place to put db options for port username, password etc.
    • This is really the bulk of the work – this got extremely close to exactly what I needed, and I just massaged the data from here.
  • can you modify this code: [output of above]
    to instead of outputing a bunch of strings, to put it in an array of dicts that includes the file name and file creation time ?
    • There’s a column for upload time that’s in y-m-d format (which needs to be refactored for the new Knife City). At first I was resigned to just setting the date of all the uploads to be today (May 29) But then I realized I could use file metadata to return an accurate date, and would have even been able to give an accurate upload time if my data wasn’t structured poorly.
    • Also – chatgpt knows what it just sent you – I didn’t need to paste the output of a previous response.
  • you forgot the trailing directories – [directory redacted] – please fix this line and only output the fixed line
    • ChatGPT wrote a bug here! It had originally crawled through a directory with my files in it using a path provided through a variable. But when ChatGPT wrote code that was supposed to check the meta data of the file to figure out the upload time, it forgot to add in the trailing path. (It looked for file.png for ex. instead of /path/to/file.png)
    • ChatGPT also failed to only output the corrected line and instead output the entire script again. Great use of computing time!
  • can you change this script so that the creation time is a year-month-day format and can you ignore any filename that doesn’t have an extension in its name?
    • Creation time was originally an int displaying the seconds since Jan 1 1970. Further optimizing to ignore bad data.
  • can you include the width and height of each image in the output of each file as well
  • im getting OSError: Unsupported BMP header type (538976296) – if we run into any exception like this, can we just set the width/height to 0,0 ?
    • Some of the files on the system are bad files which is why they don’t exist in the db – sometimes the file system started writing the file and then it stopped for some reason (bad EOF? Network issue?), so when I need to read files and assume they’re images with height/width metadata and they don’t have that data, we need to include exception handling.
    • Could be a resource issue but this output a response suuuper slow and when it finally wrote the code, and started explaining its output, it opened a tick quote and when it was supposed to close it, the app crashed on me and made me start a new session. Wonder what happened
  • instead of printing file info, can you enter a new row into the db table images
    for each row the column vals are:
    image_author = adlai
    image_hash = (object.file_name)
    image_title = "a"
    image_description_json = "blocks: [
    {
    text: "",
    key: "foo",
    type: "unstyled",
    entityRanges: [],
    },
    ],"
    image_description_text="a"
    image_alt_text="a"
    upload_datetime=(object.creation_time)
    ip: ''
    crumbs_up: 0
    crumbs_down: 0
    image_width=(object.width)
    image_height=(object.height)
    • Here chatGPT knew what the output was from the previous question I had asked it and used the attribute names from that output in order to write the sql insert statements.
    • The output here was finally what I needed – I took this code and I ssh’ed into my server and wrote it to a file with vim. I ran the file and it generated the sql rows I needed!
    • Unfortunately due to a race condition, it uploaded the files all out of order, which disrupts the narrative flow of scrolling through Knife City.
  • write a python script to read all the entries in table images greater than id 1747 in table images, and rearrange them chronologically based on upload_datetime. upload_datetime is in y-m-d format. I want everything else to stay the same
    • It didn’t understand that I wanted this done in the database, not just to have the rows output locally.
  • i want you to modify the database to represent the order that you print them in though
    • It suggested creating a new column just for this migration, which is a terrible recommendation.
  • i don’t like this – instead of doing it this way, once we get the chronological list of entries, delete all of the rows we just sorted, and re-add all the rows in the same chronological order. Also, if records share the upload_datetime, reupload them based on the original id in chronological order
    • It complied, but the race condition was still taking place? I would delete the recently added row and add them back in and they’d get added in the same order.
  • interesting,, the code looks good, but for some reason (maybe due to some race condition) the images are still entering the database in the incorrect order. is there some way to insert values into the db that would ensure their place in the db is reflective of the order theyre in in rows in the snippet?
    • Talking to it like a human, lol.
  • no, we can’t accomplish it with another column, that’s a bad, messy solution. please find another way to ensure that the rows are inserted in the correct order
    • Talking shit
    • It suggested locking the database, which is a reasonable solution, but didn’t work. Race condition still happening
  • okay, that didn’t work either.lets try this, let’s specifically set the ids on the insert commands based on the starting point of id=1748 so that the order of `rows` is preserved.
    • This output me exactly what I wanted.

The only thing left now was to make new titles and descriptions for all the images. Luckily, because the tag data is abstracted to another table and just references the image, all the tag data remained intact and I didn’t have to reupload that.

Main Takeaways

  • ChatGPT saves me so much time – way more time effective than going through docs/ Stack Overflow
  • ChatGPT is extremely good at getting really close to what you need
  • It struggles finishing the job
  • When it comes up with a way to do something, sometimes it’s a solution that would work but is a bad decision in terms of code maintainability
  • When you tell it that you want to do it something another way than the way it suggests, it struggles adapting to the change in direction.