Sometime last week, I faced a pretty interesting bug for the first time in my career. It was fascinating because it made me dive deeper into Django and learn how Django handles the files in `request.FILES`.
I was working on a task that required me to receive an image in a request and pass it through several machine learning checks before finally uploading it to Cloudinary.
One would think that since the machine learning checks were already available behind an API, it should be simple, right? Well, so I thought, but it wasn't.
What was the bug?
```python
import cloudinary

image = request.FILES.get('image')
ImageCheck.run_checks(image)
cloudinary.uploader.upload(image)  # it fails here
```
While trying to pass the image through checks before saving it, Cloudinary kept throwing an error,
Now, this was existing code that was already pushing images to Cloudinary; the only thing I had to add was the checks to ensure the image wasn't fake or tampered with.
So I feared I had broken the code.
The first step I took was to remove the checks. I tried running the Cloudinary upload again, and it worked? So what was the issue? I was already getting frustrated because I was working against a deadline, and I hate bugs (well, who doesn't 😩).
The next step was to check the checks (😂😂). While debugging them, I found that only the first check worked; every other check kept failing with a similar error: no image.
So my assumption was that somehow the machine learning model was manipulating the image, so I tried uploading the image twice to Cloudinary without the checks.
```python
import cloudinary

image = request.FILES.get('image')
cloudinary.uploader.upload(image)
# attempting to upload again fails
cloudinary.uploader.upload(image)
```
And well that failed too.
My next idea was to save it to the cloud first, download it again, run the checks, and delete it from the cloud if they failed. Now, not only was this a stupid idea, it wouldn't work because I still wouldn't be able to delete the image from Cloudinary using the API.
I was already losing my mind until I had the idea of saving the image locally for a while, opening it whenever I needed it for each check, and deleting it after the upload. I achieved that using Django's default storage:
```python
import cloudinary
from django.core.files.storage import default_storage

image = request.FILES.get('image')
saved_image = default_storage.save(image.name, image)
ImageCheck.run_checks(default_storage.open(saved_image))
cloudinary.uploader.upload(default_storage.open(saved_image))
```
While this solved my issue, I wasn't satisfied with it, for several reasons:
- It was defeating the purpose of trying not to save it on the local server.
- It was time-consuming because I had to call `default_storage.open(saved_image)` every time I wanted to use the image.
- I had to delete it after each request.
- And I still didn't know what the issue was exactly.
So what exactly was the issue?
While researching Django further, I found out that files sent in requests are either kept as `InMemoryUploadedFile` objects, or automatically saved to disk as `TemporaryUploadedFile` objects if they exceed a certain size.
After speaking to friends, I found that after reading an `InMemoryUploadedFile` object, the cursor is left at the end of the file, making it read as an empty file on the next read.
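This behaviour isn't specific to Django; any Python file-like object works the same way. A minimal stdlib sketch:

```python
import io

# any file-like object behaves this way, including Django's uploaded files
image = io.BytesIO(b'fake image bytes')

first_read = image.read()   # reads the whole "file"
second_read = image.read()  # cursor is now at the end

print(second_read)  # b'' — looks like an empty file

image.seek(0)  # rewind to the start
print(image.read() == first_read)  # True — readable again
```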
At last! We know the problem.
Funny enough, the solution to this annoying issue was simple: bring the cursor back to the start of the file so you can re-read it, by calling the file's `.seek(0)` method.
```python
import cloudinary

image = request.FILES.get('image')
ImageCheck.run_checks(image)
image.seek(0)  # rewind so the file can be read again
cloudinary.uploader.upload(image)
```
Now, this was a much better solution because:
- It didn't need extra storage.
- And it took less time.
`TemporaryUploadedFile` objects are saved to local storage like in my first solution, but are automatically deleted after the request has been handled.