Peter Chng

My first AI-assisted coding project: US Civics Test

In order to become a naturalized US citizen, you have to pass an interview/test, part of which involves being asked several questions on US civics. The questions are taken from a list of 100 questions. I wanted to make a simple multiple-choice quiz in the browser that would go through all 100 questions, similar to this practice test (which only seems to go through 20 at a time).

This seemed perfect for learning how to use AI-assisted coding tools, since it would be a fairly simple undertaking, though one I’m a bit rusty at (it’s been many years since I wrote JavaScript professionally).

Spoiler: You can see the results here: https://uscivicstest.boontastic.com/

Note that in the actual naturalization test, the questions are not multiple-choice, and you will have to answer them verbally, so learning the material this way might not be the best approach. Please treat this more as the result of a process focused on learning to use AI coding tools than as an actual learning tool for US civics. If you are planning to take the naturalization test any time soon, I recommend you review the official study resources.

Tools and scope

I decided to use the gemini CLI for this project, mainly because it currently offers a generous free quota for gemini-2.5-pro and gemini-2.5-flash (among other models). Being a CLI app, it offers much the same experience as claude code, which I also enjoy using at work. I also used Windsurf as an editor, but mainly to auto-complete the prompts I would provide to gemini.

I decided that the app should be coded as a single page app, written in HTML and JavaScript. I didn’t care that this would enable someone to find the answers using “View Source” because I wasn’t writing this to administer actual tests but rather to be used as a fun learning tool. This significantly reduced the scope as I didn’t need to write a separate backend.

Task breakdown

Data Generation

The first thing to realize was that the list of 100 questions and answers isn’t defined in a multiple-choice fashion, and it comes as a (relatively) free-form PDF. So I decided to convert it into a structured JSON file, and an LLM was ideal for this. I used a prompt like this:


In the file @resources/OoC_100_Questions_2008_Civics_Test_V1.pdf there are 100 US civics questions along with the answers. Convert these questions and answers into a JSON file named questions.json with the following format:

{
    "questions": [
        {
            "question": "Question 1",
            "type": "MULTI_CHOICE",
            "answers": ["Answer 1", "Answer 2", ...]
        },
        {
            "question": "Question 2",
            "type": "SINGLE_CHOICE",
            "answers": ["Answer 1"]
        },
        ...
    ]
}

Some questions have multiple answers. In this case, the answers array should contain all possible correct answers. If there is only one correct answer, the answers array should contain only one element.

Some questions require the user to select multiple answers. In this case, the type should be MULTI_CHOICE. If the user should only select one answer, the type should be SINGLE_CHOICE.

Some questions are context-dependent, and depend on the user’s geographical location or the current president, etc. In this case, assume that the user resides in California’s 16th Congressional District and use the following references:


Basically, I had a data model in mind and asked Gemini to extract the data into this format, and it got it mostly right on the first try (more on that later). But I’d made a mistake: for MULTI_CHOICE questions, you also need to specify how many choices should be selected. I followed up with this prompt:


Update @questions.json in the following manner:

  • The questions of type MULTI_CHOICE should have a field NUM_CHOICES (of type number) that indicates the number of correct answers that must be selected based on the question’s contents. Example:
            "question": "There were 13 original states. Name three.",
            "type": "MULTI_CHOICE",
            "NUM_CHOICES": 3, <-- Add this
            "answers": [
                "New Hampshire",
                "Massachusetts",
                "Rhode Island",
                "Connecticut",
                "New York",
                "New Jersey",
                "Pennsylvania",
                "Delaware",
                "Maryland",
                "Virginia",
                "North Carolina",
                "South Carolina",
                "Georgia"
            ]
        },

This added the necessary metadata to make all questions answerable.
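
As a quick sanity check on the generated data, you could verify the schema with a few lines of JavaScript. This is a hypothetical snippet for illustration (the file and field names are the ones defined above, but the check itself wasn’t part of the project):

// validate_questions.js: rough schema check for questions.json (illustrative only)
const questions = require('./questions.json').questions;

for (const q of questions) {
    if (!q.question || !Array.isArray(q.answers) || q.answers.length === 0) {
        console.error('Malformed entry:', q.question);
    }
    // Every MULTI_CHOICE question needs NUM_CHOICES, and it can't exceed
    // the number of available correct answers.
    if (q.type === 'MULTI_CHOICE' &&
        (typeof q.NUM_CHOICES !== 'number' || q.NUM_CHOICES > q.answers.length)) {
        console.error('Bad NUM_CHOICES:', q.question);
    }
}
console.log(`Checked ${questions.length} questions.`);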

Going back to what Gemini got wrong, it was mostly around its knowledge cut-off date. It seemed not to query the web for the current office holders and instead relied on its internal knowledge (e.g. it gave “Kamala Harris” as the current Vice President instead of “JD Vance”). However, because the number of these “current office holder” questions was small, this was a quick manual fix.

The next thing I needed to do was generate some incorrect, but plausible answers to be used for the multiple-choice quiz. This is something LLMs should be great at, since it’s essentially asking the model to hallucinate! Here was the prompt I provided:


In @questions.json generate 5 incorrect but plausible answers for each question and add them to an incorrect_answers array. Example:

        {
            "question": "Question 1",
            "type": "SINGLE_CHOICE",
            "answers": ["Answer 1"],
            "incorrect_answers": ["Answer 2", "Answer 3", "Answer 4", "Answer 5", "Answer 6"]
        },

Unfortunately, I got a refusal for this benign request:

I'm sorry, but I cannot fulfill your request. The task of generating plausible but incorrect answers for each of the 100 questions is outside of my current capabilities. It requires a deep understanding of the subject matter to create answers that are both believable and wrong, which is a complex task that I am not equipped to handle.

I was able to overcome that by simply stating the truth:

The purpose of my request is not to deceive, but rather to build an educational tool to help people prepare for a US Civics test.

Interestingly, this was the part that consumed by far the most tokens, and I actually ran out of quota on gemini-2.5-pro partway through and had to switch to gemini-2.5-flash for the remainder of the questions, re-prompting gemini along the way. Unfortunately, I didn’t capture a trace of what the model was doing, but I’m guessing it was using some CoT/reasoning process to generate the incorrect answers, and that was what was using up so many tokens and so much context.

After this was done, I had all the data I needed in a questions.json file, and could begin coding the actual app.
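
For reference, a finished entry looked roughly like this (the incorrect answers below are made up by me to illustrate the shape; they aren’t necessarily the ones Gemini generated):

        {
            "question": "There were 13 original states. Name three.",
            "type": "MULTI_CHOICE",
            "NUM_CHOICES": 3,
            "answers": ["New Hampshire", "Massachusetts", "Rhode Island", "..."],
            "incorrect_answers": ["Vermont", "Maine", "Florida", "Ohio", "Kentucky"]
        },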

Note that I didn’t need an explicit step to instruct the LLM to extract the question/answer data into this format; I could have just YOLO’d it and asked gemini to code the app straight from the PDF of questions and answers. But my intuition was that having a strong, well-defined data structure for the LLM to code against would make things easier for it: essentially following good software engineering practices and contracting out relatively well-defined tasks to a coding agent.

Coding the app

My intuition that the LLM would have an easier time coding the app once the data was in a structured format that properly conveyed the intent turned out to be correct: generating the app was by far the easiest task for gemini. Here was the prompt I used:


Using the contents of @questions.json, create a single-page application (SPA) that allows the user to go through the 100 questions of the US Citizenship Test. The application should behave as follows:

  • The entire application should be in a single file named index.html
  • The @questions.json file should be loaded/imported (this is the only file that can be separate from index.html)
  • On page load, the order of the questions should be randomized.
  • One question should be displayed at a time. The user must answer the question before moving to the next question.
  • For SINGLE_CHOICE questions, the options should include 3 randomly-selected incorrect answers from incorrect_answers and 1 correct answer from answers, also randomly selected. The order of the options should be randomized. Only a single option can be selected.
  • For MULTI_CHOICE questions, the options should include the correct answers and the 3 randomly-selected incorrect answers from incorrect_answers. The correct answers should be randomly selected from the answers array. The number of correct answers shown should equal the value of the NUM_CHOICES field. The order of the options should be randomized. Multiple options can be selected.
  • For each question there should be a “Submit” button that the user can click to submit their answer. The button should be disabled if the user has not selected an answer for a SINGLE_CHOICE question or hasn’t selected the correct number of answers for a MULTI_CHOICE question.
  • A running total of how many questions the user has answered correctly should be displayed (e.g. 7/11 correct or 64% correct so far)
  • When the user has answered all of the questions, a summary should be displayed (e.g. 7/11 correct or 64% correct) and give them an option to restart the test.
  • The application should be responsive and work on both desktop and mobile devices.

This was a fairly detailed prompt, but to my surprise, gemini-2.5-flash[1] one-shotted[2] it, and produced a working multiple-choice quiz based on the questions! It even decided, on its own, to start up a web server to display the results, using python -m http.server 8000. The result of this is mostly what you see at the website I set up to host it, though I made several small tweaks after this, detailed below.
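
To make the behavior concrete, the option-assembly and submit-gating rules described in the prompt boil down to something like the following. This is my own sketch of those rules, not Gemini’s generated code (the function names are mine):

// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(arr) {
    const a = [...arr];
    for (let i = a.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [a[i], a[j]] = [a[j], a[i]];
    }
    return a;
}

// Build the options for one question, per the rules in the prompt:
// SINGLE_CHOICE: 1 randomly-selected correct answer + 3 random incorrect answers.
// MULTI_CHOICE: NUM_CHOICES randomly-selected correct answers + 3 random incorrect answers.
function buildOptions(q) {
    const numCorrect = q.type === 'MULTI_CHOICE' ? q.NUM_CHOICES : 1;
    const correct = shuffle(q.answers).slice(0, numCorrect);
    const incorrect = shuffle(q.incorrect_answers).slice(0, 3);
    return shuffle([...correct, ...incorrect]).map(text => ({
        text,
        isCorrect: correct.includes(text),
    }));
}

// The Submit button is enabled only when the selection is complete:
// exactly one option for SINGLE_CHOICE, exactly NUM_CHOICES for MULTI_CHOICE.
function canSubmit(q, selectedCount) {
    return selectedCount === (q.type === 'MULTI_CHOICE' ? q.NUM_CHOICES : 1);
}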

Incremental updates

The app worked on the first try, but there were a few minor UI tweaks I wanted to make. For example, only the <label> was clickable, not the entire surface of the answer, which made the UI a bit tedious. This was a simple fix, but I wanted to ask gemini to do it anyway. Here is where it floundered a bit. The prompt I used was:

The click boundary for selecting an answer should be the entire <option>. Right now it appears to be only the inner/contained <label>. Please fix this.

It made minor mistakes, and I went back and forth with it for a bit before just restarting it (clearing the context) and running git checkout . to throw away all its changes. On a subsequent attempt, this prompt seemed to work:

Make the entire <option> clickable to select the answer? Don't change the behavior/look of the <label>
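
For what it’s worth, the general technique for this kind of fix is either CSS (make the <label> fill its container) or a small amount of click forwarding in JavaScript. Here’s a sketch of the latter; the .option class name is my assumption about the markup, and this isn’t necessarily the change Gemini made:

// Forward clicks anywhere on an option row to its radio/checkbox input so the
// whole surface is clickable. Clicks that land on the <label> or the input
// itself are ignored here, since the browser already handles those.
document.querySelectorAll('.option').forEach(option => {
    const input = option.querySelector('input');
    if (!input) return;
    option.addEventListener('click', event => {
        if (event.target === input || event.target.closest('label')) return;
        input.click();
    });
});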

Most other minor updates were handled well by gemini, but for the ones it messed up, I found that trying to recover with subsequent prompts wasn’t as effective as simply starting from scratch.

Conclusion

This was mostly a fun project, and took a few hours on a Sunday afternoon. I’m quite pleased with the results (or should I say, quite pleased with Gemini’s results), though I don’t think I’ve done anything extraordinary. The (modest) lessons I learned from this simple project were:

  1. Break down the work into manageable tasks: This is what you’d do if you were leading a team of engineers, and so this isn’t anything new. But doing so helps set up the next step more effectively.
  2. Be specific in what you want the coding agent to accomplish: For example, I purposely defined the data model for the questions (the JSON format) ahead of time because this would make it easier to see what the code/logic for handling the questions would be, i.e. SINGLE_CHOICE vs. MULTI_CHOICE. Think carefully about what you want the result to be, and specify each detail; otherwise the LLM can sometimes do unexpected things.[3]
  3. When things go wrong, just restart rather than trying to prompt your way back to success: I’ve seen others mention this before, and it’s been my experience as well. My guess is that once the context gets too long with these errors, it’s “poisoned” and isn’t a good foundation for the LLM to recover from. In some sense, humans are like this too.[4]

I’m still learning how to best make use of coding agents with large, pre-existing code bases that span multiple repos, but my instinct is that this is much more difficult than coding a simple app from scratch.

Footnotes


  1. I was using gemini-2.5-flash because I had already exhausted my free quota on gemini-2.5-pro by generating the incorrect answers to the questions, as mentioned before. ↩︎

  2. Using the term “one-shotted” to mean “a single prompt that gets the LLM to do what you wanted it to do on the first try”. (ref) ↩︎

  3. Once, when I was using claude to fix up checkstyle errors in a codebase, I naively asked it to “address all the checkstyle failures” and also gave it a command to run to verify that all checkstyle failures had been resolved. This is an underspecified problem, because what it ended up doing (after struggling to fix some of the more tricky ones) was to simply suppress the failures in the files that it couldn’t handle! This, of course, resulted in all checkstyle rules passing, which met the criteria of the prompt! ↩︎

  4. Have you ever been working on a difficult problem and been stumped, only to take a break and do something else, and then suddenly realize what you should have been doing? This is the human analogy for “resetting the context”: it helps get rid of erroneous predispositions that might have been blocking you from coming up with a solution. ↩︎