The drama in trying to convert election PDFs to spreadsheets
My phone pinged: “Tell Mark Essien they are coming for him.”
I had no idea what that even meant. It was a forwarded WhatsApp message that someone had sent to a friend of mine.
All I was doing was trying to convert PDFs from the 2023 Nigerian presidential election into spreadsheets.
But let’s rewind a bit and give some context.
In 2020, a small group of about 5 young people went to a crossroads in Lagos, Nigeria, to protest against police brutality. Their protest was called #EndSARS. SARS (the Special Anti-Robbery Squad) was a notorious unit of the Nigeria Police Force, known for extortion and for extra-judicial killings of anybody it labeled an armed robber.
For 4 days that tiny group protested, with barely anybody noticing.
On the last day, as they were about to leave, someone tweeted about their protest. Then someone else retweeted, and someone else. Soon there were thousands of retweets, with social media hailing them as heroes.
A few people who saw the protest on Twitter joined them. It was still a small crowd, but it was growing. As the news spread on social media, the crowd grew. Within a couple of days, thousands of people were protesting under the #EndSARS hashtag. The government stayed silent, believing it would blow over shortly.
But the protests grew and grew. Young people in other cities started joining. Nigerian social media was aflame; this was the only topic. Then the protests turned violent. Protesters started attacking police stations, and policemen started shooting protesters.
The protesters marched to notorious SARS prisons and forced them open. Prisoners escaped. Malls were burnt down. Policemen threw off their uniforms and Government officials fled.
As the violence flared, social media started calling for calm. But it was too late: many hoodlums had already infiltrated the protests, and they were bent on revenge for everything they blamed the Government for.
The Government tried to stop the protests, but there was nobody to talk to. There was no head of protest, no figurehead, nobody with any real authority. They called it a “headless mob”.
Social media started calling for calm and asking the president to speak. After many days of silence, he finally spoke, urging calm, disbanding SARS and promising to address the concerns of the protesters.
Things calmed down. But there was still a huge crowd gathered at the Lekki Tollgate, on a major six-lane highway that passes through the heart of Lagos. Concerts were playing at the site, and people were serving food and drinks.
Then, ominously, on the 20th of October 2020, some people drove there in unmarked cars and removed all the cameras installed at the tollgate.
That night, as the DJs played, all the floodlights at the tollgate switched off. Social media users warned about military vehicles driving along the highway towards the location.
Short clips from mobile phones capture what happened next. It’s dark, people are running, there are gunshots everywhere. People gather around the wounded, screaming. More running, lights flashing.
There are no cameras, so there is no clear reconstruction.
But the next morning, the army is in control of the tollgate, and the protesters are gone. Social media is filled with messages that many people were killed and the army took away their bodies and cleared the scene. The army and government said nothing like that happened.
An uneasy calm starts.
There is a lot of anger, but the protests are over. Everything seems to be back to normal. But there is an undercurrent of youth anger.
And the presidential election is in 2023.
Nigeria has always been a two-party state, much like the U.S. It uses pretty much the same presidential system as the U.S., and just like the Democrats and Republicans, variants of the same two parties have always faced each other. Since 2011, it has been the PDP (former ruling party) against the APC (current ruling party).
In 2022, one of the candidates abruptly left the PDP, claiming that the process was rigged against him winning the nomination. He joined a small party that had only ever won one state governorship election in its entire history.
Suddenly, the social media mob adopted this candidate, Peter Obi, as their preferred candidate. Many of the same handles that had called out #EndSARS now became #Obidient. Tweets and WhatsApp messages multiplied, calling on others to join the #Obidient movement.
Peter Obi himself seemed as surprised as everyone else that the headless mob had adopted him, but he quickly accepted the Obidients, and started using the slogans as part of his campaign.
The political establishment mocked the movement, saying the “four people tweeting in a room” had never won even a council election in Nigeria.
The movement grew louder and louder on social media, with its members venting their anger at the existing political establishment. The politicians mocked back, saying that elections were not won on social media but at the grassroots, referring to the villagers who normally vote for whoever gives them the most money.
Election Day came. The Labour Party candidate was Peter Obi, the candidate adopted by the young people on social media. The ruling party's candidate was Bola Tinubu, former Governor of the most populous state (Lagos), who has been involved in selecting every Governor of the state since his own time in office. He was also rumored to be getting a cut of all Lagos State tax revenue via a consulting firm he set up when he was Governor. The former ruling party's candidate was Atiku Abubakar, a former vice president and a very wealthy man. His wealth came from his holdings in the ports, which he helped privatize when he was vice president.
For the first time, Nigeria had adopted electronic transmission from the polling units. That meant that photos of each polling unit's result sheet could be viewed in real time on a website.
As the results started coming in, the country was electrified. Tens of thousands of votes were coming in for Peter Obi. Then hundreds of thousands of votes started coming in. Then a million votes came in and another million.
The senatorial election results started coming in, and the Labour Party was winning seats. An unknown party with no previous Governor or Senate seat was sweeping seats across wide swathes of the country.
But there was no way of knowing who had won. Even though the Independent National Electoral Commission (INEC) had agreed to electronic transmission of polling unit results, the senators and the political establishment had blocked electronic transmission of the actual vote counts. They wanted the counts to stay on paper, for reasons we can guess at.
With 170,000 polling units, it was going to take a long time to count the results.
But across social media, videos and photos were spreading of the Labour Party winning polling unit after polling unit, people yelling in happiness as the party won. Even in the rural areas, where the politicians had claimed they had absolute control, the party was winning unit after unit.
But other disturbing videos were also coming out - of polling units destroyed, and people threatening to hurt anyone who voted for the Labour party.
Social media waited for the electoral commission to announce the results. People tried to tally the results online, but with 170,000 PDFs in a completely unstructured format, it was close to impossible.
Then INEC started announcing the state results over a multi-day counting marathon. At the end, the result was announced:
- The Obidients' Labour Party: 6.1 million votes
- The former ruling party, PDP: 6.9 million votes
- The ruling party, APC: 8.7 million votes
The Obidients had won in the capital city and had defeated the ruling party and their candidate in Lagos - the state they had ruled for the last 16 years. But according to the official tally, they had lost in the country.
But even as the results were being announced, people on social media were discovering that if they manually added all the results from the servers of the electoral umpire together, the total did not add up to the votes announced. They showed examples from Rivers State, where just a few polling units added together exceeded the votes announced by the electoral umpire. People cried foul.
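The check people were doing by hand is simple arithmetic: add up the polling unit figures for a state and see whether any party's partial sum already exceeds the total announced for that state. Here is a minimal Python sketch of that check; the function names, the party keys and the numbers in the example are made up for illustration and are not real Rivers State data.

```python
from collections import Counter


def sum_units(unit_results):
    """Add up per-party votes across a list of polling unit results."""
    totals = Counter()
    for unit in unit_results:
        totals.update(unit)  # Counter.update adds counts, it does not overwrite
    return totals


def overshoots(unit_results, announced):
    """Return {party: (partial_sum, announced_total)} for every party whose
    sum over the given units already exceeds its announced total.
    A party missing from `announced` is treated as announced with 0 votes."""
    totals = sum_units(unit_results)
    return {
        party: (votes, announced.get(party, 0))
        for party, votes in totals.items()
        if votes > announced.get(party, 0)
    }


# Hypothetical figures for two polling units and an "announced" state total
units = [{"LP": 120, "APC": 40}, {"LP": 300, "APC": 25}]
print(overshoots(units, {"LP": 400, "APC": 90}))  # {'LP': (420, 400)}
```

Even with only part of a state's units added up, a single overshoot like this is enough to show the announced figure cannot be right, because the remaining units can only add votes.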
The amended electoral rules required the parties to challenge election results within 21 days after they were announced. But to file, evidence was needed. The only evidence about how the election really went was spread out across 170,000 photographs of polling unit results. There was no digital version of the results, just photos.
The clock was ticking, and various Obidient groups immediately sprang up to figure out how to extract the data from the photos. I was in one of the groups, and we decided to try various things.
The first thing was OCR. The photos had been taken in many different ways, with hardly any structure. Each party's result had a number beside it, but the photo angles were all different, and many sheets were blurry or washed out by the camera flash.
All the open source OCR software gave bad results. The best results came from Amazon Rekognition, but even that was not good enough: it would occasionally misread the scores, and that was simply not going to work.
After experimenting with OCR for about a day, we gave up. We had about 8 days left to go.
We had a brainstorming meeting, and decided to try a new approach. We would simply ask the Obidients to help us do the conversion. If hundreds of Obidients did the transcription, it would go fast.
So we quickly sketched out a website.
And then started coding. The frontend was in React, and the backend in PHP Laravel.
A couple of days later, the app was done: a simple site that showed a picture of a polling unit sheet and asked people to type the votes they saw into some text fields. The values were then saved in the database.
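The core of such an app is small. Below is a hypothetical in-memory Python sketch of the two operations the site needed: hand a volunteer a sheet, and store what they typed. Our real app was React and Laravel backed by a database; the class and method names here are invented for illustration, and picking uniformly at random among sheets that have no entry yet is an assumption about how sheets could be served, not a description of our code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Submission:
    unit_id: str
    votes: dict  # party -> number the volunteer typed in
    volunteer: str


@dataclass
class TranscriptionStore:
    unit_ids: list  # hypothetical polling unit identifiers
    submissions: dict = field(default_factory=dict)  # unit_id -> [Submission]

    def next_sheet(self, rng=random):
        """Pick uniformly at random among sheets nobody has entered yet;
        once every sheet has an entry, pick uniformly among all of them."""
        pending = [u for u in self.unit_ids if not self.submissions.get(u)]
        return rng.choice(pending or self.unit_ids)

    def submit(self, unit_id, votes, volunteer):
        """Record the numbers one volunteer typed for one sheet."""
        if unit_id not in self.unit_ids:
            raise KeyError(f"unknown polling unit {unit_id!r}")
        entry = Submission(unit_id, dict(votes), volunteer)
        self.submissions.setdefault(unit_id, []).append(entry)


store = TranscriptionStore(["PU-001", "PU-002"])
store.submit("PU-001", {"LP": 57, "APC": 12, "PDP": 9}, volunteer="v1")
print(store.next_sheet())  # "PU-002": it is the only sheet without an entry
```

Keeping every submission, rather than overwriting, is what makes a later validation pass possible.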
I tweeted out the link.
Within minutes, I was getting replies from the Obidients. People were jumping on the site, and the first results started going up. The progress bar started moving, slowly at first, and then faster and faster. We went from transcribing one result a minute to one every 10 seconds, and soon we were transcribing one sheet per second.
Traffic grew on the site, and soon our completely unoptimized backend was struggling to catch up. But we were moving forward. We quickly grew to be transcribing 20,000 sheets per day.
The site had a results page that updated live as the sheets were transcribed. The results kept growing, and by the time we had counted 6 million votes, the Obidients were clearly in the lead. We kept transcribing, and by the time we had reached 50% of the count (10 million votes), Peter Obi was strongly in the lead.
I tweeted that out, and all hell broke loose. Suddenly, thousands of spam Twitter accounts came out of nowhere, attacking the effort. Threats and attacks of all kinds hit us.
Then the bots came. Thousands of entries started pouring into the site, all entering huge numbers for the ruling party. Different IP addresses, different proxies, all entering fake numbers.
When we started the project, we had a plan. We would first transcribe all 170,000 polling unit results, then we would do a second pass of all the results again as a validation step. If the same numbers were entered twice, then it was likely that the entered numbers were correct. But with the bots entering fake numbers, we now had a new battle to fight.
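The validation rule amounts to a consensus check per polling unit. Here is a minimal Python sketch, assuming each submission is a dict of party to votes; `validate_unit` and `min_agreement` are hypothetical names, and rejecting ties is an extra safeguard assumed here so that two conflicting readings never produce a result.

```python
from collections import Counter


def validate_unit(entries, min_agreement=2):
    """Return the agreed result for one polling unit, or None.

    entries: list of dicts mapping party -> votes, one per submission.
    A reading is accepted only if at least `min_agreement` submissions match
    exactly on every party, and no different reading is equally common.
    """
    if not entries:
        return None
    # Dicts are unhashable, so normalise each entry to a sorted tuple of pairs
    readings = Counter(tuple(sorted(e.items())) for e in entries)
    ranked = readings.most_common(2)
    best, count = ranked[0]
    if count < min_agreement:
        return None  # nobody has confirmed this reading yet
    if len(ranked) > 1 and ranked[1][1] == count:
        return None  # two conflicting readings are tied
    return dict(best)
```

Two matching entries, typed in any order, are accepted; a single entry, or a one-to-one disagreement, returns `None` and the sheet stays unvalidated.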
We immediately enabled captcha, and that slowed the bots down a bit. Then we quickly added a check: if anybody entered weirdly large numbers, we ignored all their entries from then on. Then we started showing the bots some sheets whose results we already knew; if they entered wrong numbers, we stopped accepting their results.
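The two filters described above (blocking anyone who enters implausible numbers, and slipping in sheets with known results as a test) can be sketched as a small gatekeeper. This is a hypothetical Python illustration, not our Laravel code: the `max_votes` ceiling is an assumed parameter (a unit's registered-voter count would be one natural choice), and the `gold` dictionary stands in for whatever store of known results you have.

```python
def is_implausible(votes, max_votes):
    """True if any count is negative or the entry's total exceeds the assumed ceiling."""
    return any(v < 0 for v in votes.values()) or sum(votes.values()) > max_votes


class Gatekeeper:
    """Hypothetical filter applied before a submission is stored."""

    def __init__(self, gold):
        self.gold = gold      # unit_id -> known correct result (test sheets)
        self.blocked = set()  # volunteers whose entries are ignored from now on

    def accept(self, volunteer, unit_id, votes, max_votes):
        if volunteer in self.blocked:
            return False
        if is_implausible(votes, max_votes):
            self.blocked.add(volunteer)  # weirdly large numbers: stop trusting them
            return False
        expected = self.gold.get(unit_id)
        if expected is not None and expected != votes:
            self.blocked.add(volunteer)  # failed a sheet whose answer we know
            return False
        return True


gate = Gatekeeper(gold={"PU-TEST": {"LP": 50, "APC": 5}})
gate.accept("bot-7", "PU-123", {"APC": 99999}, max_votes=750)  # False, bot-7 now blocked
gate.accept("bot-7", "PU-124", {"LP": 40}, max_votes=750)      # False: blocked for good
```

The known-result check matters most once entries start looking plausible, since a size ceiling alone cannot catch a believable but wrong number.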
Whoever was behind the bots kept adapting and counteracting what we were doing. They went from using a script to using what felt like hundreds of humans. They went from entering absurdly large numbers to entering plausible numbers. They started entering some correct and some wrong numbers.
But the combination of techniques and the large number of Obidients also working meant that we were also getting a huge number of correct entries.
It seemed the people on the other side realized that they would not win on technology, so they started a new campaign. Hundreds of accounts claiming to be Obidients suddenly popped up, saying that we were working for the ruling party, and that our work was designed to prove that the Obidients had lost the election. Every single post I made drew replies from dozens of those accounts.
And they doubled down: they created a fake screenshot showing the ruling party winning on our site, and started spreading it. A serving Government minister even tweeted it out, and a badly written article hurriedly appeared in a major local newspaper.
Their technique worked. The Obidients started doubting. Prominent Obidient accounts started threads questioning if we were working for the Government.
Meanwhile, we had completed transcribing 150,000 polling unit sheets, and we needed to move to the validation phase where we would correct all the damage the bots had done.
But the damage had been done. The crowd we had pulled in to do the work no longer trusted us, and we had nobody left to help us validate the entries. Work started but moved very slowly, with fewer than 2,000 entries validated in a day. At that rate it would take us 3 months to finish.
Was our project going to fail? If we did not announce results, the Obidients would be sure that we were working for the Government. And the Government already saw us as enemies. How could we solve this?
I started looking through the data. We had 800,000 submissions in our database for 170,000 polling units. Sampling at random, I noticed that a large share of the entries were correct. Strangely enough, the polling units with wrong entries had lots and lots of wrong entries, while the vast majority of units, which had only one entry each, were mostly correct.
I tested a bit more, and then I realized: we had a bug in the code. When you opened the website for the first time, it did not return a truly random entry the way it was supposed to. It had a strong tendency to return entries from a small set. Subsequent entries, however, were properly random.
That meant that every time the website was fully refreshed, the first polling unit shown was probably one that already had hundreds of entries. But if you kept working, you would be served new units.
And I realized what could have happened: whoever was controlling the bots must have had them refresh after every entry, perhaps thinking it would make them harder to detect. Instead, they ended up entering most of their wrong values on a small set of units that we could easily clean up. Up to 90% of our data was perfectly clean.
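Because the bad entries were concentrated on a small set of units, separating those units out is straightforward. The sketch below is one plausible way to do it in Python, under assumptions: submissions arrive as `(unit_id, votes)` pairs, and `max_normal=3` is an arbitrary cut-off chosen for illustration, not the threshold we used. Units above the cut-off are set aside for review instead of trusted.

```python
def split_by_attention(submissions, max_normal=3):
    """Separate polling units that drew an abnormal number of submissions.

    submissions: iterable of (unit_id, votes) pairs.
    Returns (clean, suspect): `clean` maps unit_id -> list of votes for units
    with at most `max_normal` entries; `suspect` is a sorted list of the other
    unit_ids, to be set aside for review rather than trusted.
    """
    by_unit = {}
    for unit_id, votes in submissions:
        by_unit.setdefault(unit_id, []).append(votes)
    clean = {u: vs for u, vs in by_unit.items() if len(vs) <= max_normal}
    suspect = sorted(u for u, vs in by_unit.items() if len(vs) > max_normal)
    return clean, suspect


# Hypothetical data: one unit hammered by repeated fake entries
subs = [("PU-1", {"LP": 57})] + [("PU-2", {"APC": 900})] * 5 + [("PU-3", {"LP": 8})]
clean, suspect = split_by_attention(subs)
print(suspect)  # ['PU-2']
```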
We quickly stopped accepting entries and started crunching the results. A few hours of heavy server lifting, and we were done. With a large group of volunteers, we had transcribed 170,000 polling unit sheets to CSV format in 5 days.
We shared the results with the team for them to submit as part of their evidence. And we were done.
We published the results as spreadsheets here: https://drive.google.com/drive/folders/173oHgms6wYy5WKz_i3Lhl5mXcmobCWHz?usp=sharing
Peter Obi filed his petition on the last day, and I was happy to see some familiar things in the filing.
Now we wait to see how the tribunal will decide.
Email me at firstname.lastname@example.org or follow me on Twitter: twitter.com/markessien.