Google Opens Cloud Dataflow To All Developers, Launches European Zone For BigQuery


Google launched a couple of updates to its cloud-based big data products at the Hadoop Summit in Brussels on 16 April. These included the open beta of Cloud Dataflow, Google’s new service for processing huge amounts of data, as well as an update to BigQuery that makes the company’s big data database service available in Google’s European data centers and introduces row-level permissions.


Cloud Dataflow, which can process data both as streams and in batches, automatically scales according to the developer’s needs (though it’s worth noting that Google plans to implement some controls so costs can’t get out of hand when a developer pushes more data through the system than necessary). Developers write their Cloud Dataflow code once and then Google handles all the infrastructure for them.


While Cloud Dataflow is new, BigQuery has been around since 2010. Starting today, however, its users can also host their data in Google’s European data centers. Google’s director of product management Tom Kershaw told TechCrunch that a lot of Google’s customers have been asking for this. Given the concerns around data sovereignty in Europe, it’s actually surprising that Google didn’t roll this feature out earlier.


The other update to BigQuery is that the database now supports row-level permissions. In many companies, different departments need to access the same data, but while the marketing department may need to work with some rows and tables in your database, for example, you may not want to give them access to sensitive business data. Today, this typically means IT will make a copy of the data and share this copy with another department. Once you’ve made this copy, though, the two datasets drift out of sync. “When we create datasets that are copied, we create outdated data, and wrong data,” Kershaw said. “You end up running analytics on this data and it’s wrong and old.” With row-level permissions, administrators can now ensure that different departments only get access to the information they need without the drawbacks of making copies.
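
To make the difference between copying data and filtering it concrete, here is a minimal Python sketch of the general idea. It is not BigQuery’s API or Google’s implementation; the `Row` fields, the `POLICY` mapping and the `visible_rows` helper are all hypothetical names chosen for illustration. The point is only that every department reads from one shared table, with the filter applied at query time, so nobody ends up working from a stale copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    department: str  # hypothetical column used to decide who may see the row
    metric: str
    value: int


# One shared table: the single source of truth, never copied.
TABLE = [
    Row("marketing", "ad_clicks", 1200),
    Row("finance", "revenue", 98000),
    Row("marketing", "signups", 310),
]

# Hypothetical policy: which department values each group is allowed to read.
POLICY = {
    "marketing": {"marketing"},
    "finance": {"marketing", "finance"},
}


def visible_rows(group, table=None):
    """Return only the rows the group may read, evaluated at query time."""
    table = TABLE if table is None else table
    allowed = POLICY.get(group, set())  # unknown groups see nothing
    return [row for row in table if row.department in allowed]


if __name__ == "__main__":
    # Marketing sees its own rows but not the finance row.
    print([row.metric for row in visible_rows("marketing")])

    # A new row added to the shared table is visible on the next query,
    # because no department is working from a separate copy.
    TABLE.append(Row("marketing", "newsletter_opens", 45))
    print([row.metric for row in visible_rows("marketing")])
```

Because the filter runs against the shared table each time, the out-of-sync problem Kershaw describes does not arise in this toy model. A real system would enforce the policy inside the database rather than in application code.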


With this update, BigQuery can now also ingest up to 100,000 rows per second per table. That’s a lot of information, but not an unusual number when you analyze huge log files, something that’s become a major use case for BigQuery, Kershaw tells me.


Google’s big data lineup currently consists of BigQuery, Cloud Dataflow and the messaging service Cloud Pub/Sub. Given Google’s interest and in-house expertise in this area, chances are we will see more updates and new tools for working with large amounts of data at Google I/O next month.


To read the full article please visit TechCrunch