A researcher says the Pentagon exposed huge amounts of web-monitoring data in a security failure.
Anyone with a free Amazon Web Services account could have looked at the hoard of information stored in the cloud by the U.S. Defense Department, according to Chris Vickery, a researcher at cybersecurity firm UpGuard who discovered the exposure.
Amazon Web Services is a cloud platform that individuals, businesses and the government use for things like storing data and boosting computing power. Amazon says on its website that it is best practice to restrict access to information stored in the cloud to "people that absolutely need it."
The military databases hold at least 1.8 billion internet posts scraped from social media, news sites, forums and other publicly available websites, Vickery told CNN Tech. The posts are in multiple languages and originate from countries across the world, including the United States.
The information, which Vickery said goes back as far as 2009, is held by U.S. Central Command (Centcom) and U.S. Pacific Command (Pacom). There's no indication that malicious attackers accessed the databases. The Defense Department secured the data by October 1 after Vickery alerted officials to the problem in mid-September, he said.
The information that was exposed had been publicly available -- it was not, for instance, sensitive user data. Still, the failure to fully secure the data raises concerns about government cybersecurity practices.
"[It's] a pretty serious leak when you're talking about intelligence information being stored in an Amazon cloud service and not properly safeguarded," said Timothy Edgar, a former White House official in the Obama administration and former U.S. intelligence official.
Edgar said he frequently questioned the security and implementation of cloud technology while working in intelligence. "That's exactly what we were worried about," he said.
Cloud computing allows a large organization like a government agency or business to readily access information stored on remote servers from far-flung locations. It is increasingly how data is stored.
The Defense Department confirmed the exposure in an email to CNN Tech.
"We determined that the data was accessed via unauthorized means by employing methods to circumvent security protocols," said Maj. Josh Jacques, a spokesperson for U.S. Central Command. "Once alerted to the unauthorized access, Centcom implemented additional security measures to prevent unauthorized access."
How the data was discovered
Amazon's cloud storage containers, called S3 buckets, are private by default, meaning only authorized users can access them. To make one more widely accessible, someone would have to configure it to be available to all Amazon Web Services users. Even then, users would need to know or find the name of the bucket in order to access it.
Vickery identifies information that companies and organizations inadvertently expose by searching for bucket names that contain specific keywords. In this case, he looked for buckets whose names contained "com."
Three S3 buckets were configured to allow anyone with an Amazon Web Services account to access them. They were labeled "centcom-backup," "centcom-archive" and "pacom-archive," Vickery said.
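For readers who manage their own buckets, the kind of setting described here can be checked in a few lines of Python. This is a minimal sketch, not a description of how the Pentagon's buckets were actually configured: it assumes the exposure came from an access control list (ACL) grant to one of Amazon's predefined groups, which the reporting does not specify, and it inspects a dictionary shaped like the ACL response returned by the AWS SDK for Python. The function name `broad_grants` and both sample ACLs are hypothetical, invented for illustration.

```python
# URIs of Amazon's predefined S3 grantee groups.
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
BROAD_GROUPS = {AUTHENTICATED_USERS, ALL_USERS}


def broad_grants(acl):
    """Return (group URI, permission) pairs that open a bucket beyond named users.

    `acl` is assumed to be a dict shaped like an S3 bucket ACL response:
    {"Owner": {...}, "Grants": [{"Grantee": {...}, "Permission": "..."}]}.
    """
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in BROAD_GROUPS:
            found.append((grantee["URI"], grant.get("Permission")))
    return found


# Hypothetical ACL: the owner has full control, and any AWS account holder can read.
exposed_acl = {
    "Owner": {"ID": "owner-id-placeholder"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
         "Permission": "READ"},
    ],
}

# Hypothetical ACL matching the private-by-default state: only the owner has access.
private_acl = {
    "Owner": {"ID": "owner-id-placeholder"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
         "Permission": "FULL_CONTROL"},
    ],
}

print(broad_grants(exposed_acl))
print(broad_grants(private_acl))
```

A non-empty result means the bucket is readable or writable by a broad group. ACLs are only one of the ways access to a bucket can be granted, so an empty result here does not by itself prove a bucket is private.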
Last week, Amazon introduced new S3 security features, including displaying an indicator next to any bucket that is publicly accessible.
This is not the first exposure of data Vickery has discovered. He previously found major leaks from Verizon and a Republican analytics firm. Both firms closed the security holes once alerted to the issue.
"The overall goal is to make people aware that data breaches and companies exposing data haphazardly is a huge, epidemic-sized problem," Vickery said. "If something of this size and importance suffers from the same problem, we need to start taking it a lot more seriously."
This isn't the first time Centcom has experienced an online security failure. In 2015, hackers took over the agency's Twitter account.
What's inside
The data that was exposed includes information from Twitter, Facebook and other public websites.
The posts originate from many countries and are written in different languages, with an emphasis on Arabic, Farsi, and other Central and South Asian dialects spoken in Afghanistan and Pakistan, according to Vickery. Although the content goes back eight years, the uploads appear to have begun in 2013 and were ongoing when Vickery found the data.
Vickery analyzed a small fraction of it. Posts included comments from YouTube, Twitter and Facebook; local U.S. websites that focus on sports and guns; scam alert websites; and forums containing offensive content.
UpGuard, Vickery's firm, shared some English-language posts with CNN Tech.
Topics included: American history, President Trump, former presidential candidate Hillary Clinton, "killer clowns," Russia, former President Obama, Russian president Vladimir Putin, American pop stars, and the Pope.
Inside one Centcom data bucket is a folder labeled Outpost. Vickery's analysis indicates the folder contains information from a third-party contractor called Vendor X. This company no longer has an active presence online.
According to the LinkedIn profile of Erik Kjell Berg, former vice president of product at Vendor X, Outpost is "a multilingual social analytics platform designed to positively influence change in high-risk youth in unstable regions of the world, built exclusively for the Dept. of Defense."
Berg and other former executives of Vendor X did not respond to requests for comment.
Jacques, the spokesperson for U.S. Central Command, said Centcom has used commercial off-the-shelf and web-based programs for information gathering. "The information we gather is widely available to anyone who conducts similar online activities," he said.
What the data is used for
The purpose of the data collection effort is not clear.
Jacques said it is "used for measurement and engagement activities of our online programs on public sites." He declined to elaborate, although he said it "is not collected nor processed for any intelligence purposes."
Edgar worked in the Office of the Director of National Intelligence under President George W. Bush and later advised President Obama on privacy and cybersecurity issues.
He said the rules around open-source information gathering by government agencies remain at least partly unclear.
"There have been continuing question marks about the role of collecting publicly available information from social media," he said. "Government intelligence officers say we shouldn't inhibit ourselves when we're talking about collecting information about potential terrorists. If the rules allow it, we should do it. But that kind of approach can get problematic because it doesn't offer a whole lot of guidance."
Another expert, Andrea Little Limbago, chief social scientist at cybersecurity firm Endgame, said it's not uncommon for the Pentagon to collect vast amounts of internet data.
"At times, you do need to cast a wide net, and then do the analytics to narrow down what you're trying to find," said Limbago, who was an analyst with the Defense Department between 2007 and 2011.
She said she would be surprised if the Defense Department was targeting U.S. individuals without the proper authorization.