Apache Sentry

Friday March 25, 2016

Sentry Graduates to a Top-Level Apache Project

We are very excited to announce that Apache Sentry has graduated from the Apache Incubator and is now an Apache Top-Level Project! Sentry, which provides centralized fine-grained access control on metadata and data stored in Apache Hadoop clusters, was introduced as an Apache Incubator project back in August 2013. Over the past two and a half years, the development community has grown significantly and now draws contributors from various organizations. At graduation, there were more than 50 contributors, 31 of whom had become committers.


What’s Sentry

While Hadoop has strong security at the filesystem level, it lacked the granular support needed to adequately secure access to data by users and BI applications. This problem forces users to make a choice: either leave data unprotected or lock out users entirely. Most of the time, the preferred choice is the latter, severely inhibiting access to data in Hadoop. Sentry provides the ability to enforce role-based access control to data and/or privileges on data for authenticated users in a fine-grained manner. For example, Sentry’s SQL permissions allow access control at the server, database, table, view and even column scope at different privilege levels, including select and insert, for Apache Hive and Apache Impala. With role-based authorization, precise levels of access can be granted to the right users and applications.



What’s new

During incubation, Sentry had six releases and continued to expand its unified authorization policy management across Hadoop components. Highlights include:

  • Support for multiple permission models, and enforcement of the same permission model across multiple compute frameworks and data access paths

  • Support for Solr (Search)

  • Synchronizing SQL table permissions with HDFS file permissions

  • Audit log support for data governance purposes

  • Sentry High Availability (HA)

  • Import/export tool for replicating permissions to other clusters

  • Support for Apache Kafka, Apache Solr and Apache Sqoop

Future Work

Graduation is a terrific milestone, but only the beginning for Sentry. We are looking forward to continuing to help grow the Sentry community and fostering a strong ecosystem around the project.


We are targeting significant enhancements across the areas of:

  • Ease of Sentry enablement and management of permissions

  • Feature parity with access control capabilities of mature relational database systems

  • Attribute-Based Access Control (ABAC), including permissions based on data sensitivity tags

  • Integration with additional Hadoop ecosystem frameworks so that existing permissions can be enforced across additional access paths


How to Get Involved

The Sentry community now includes new core committers, an active developer mailing list where future releases and patches are discussed, and increasing interest in running additional frameworks on Sentry. We strongly encourage new people to join Sentry and contribute by joining the discussions on the mailing list, filing bugs through Jira, reviewing others' code or even providing new patches.

Wednesday September 30, 2015

Apache Sentry 1.6.0-incubating released!

On September 22, the Sentry community announced the release of Apache Sentry 1.6.0-incubating. Significant work has gone into this release: the team resolved 115 defects and improvements and added 4 new features.

  • The most notable feature added to this release was "Sqoop2 integration with Sentry". This new feature allows for the Sqoop authorization policies to be managed by Sentry.
  • With the feature "Add Audit Log Support for Solr Sentry Handlers", data governance is achieved for Solr activity.
  • Sentry 1.6.0-incubating also supports CredentialProvider, so that the passwords can be specified in an external encrypted file.
  • Sentry also provides a tool to dump its entire content in a Sentry-specific format and load the data into another instance of the Sentry Service. This is very helpful for backups, migration from one backend store to another, and debugging the content of the underlying database.
The complete list of bugfixes, improvements and features in Sentry 1.6.0-incubating can be viewed at: http://s.apache.org/w5
- Dapeng Sun (Release Manager)

 

Thursday April 16, 2015

Sentry 1.5.1 release announcement


Apache Sentry 1.5.1-Incubating released!


On July 16, the Apache Sentry community announced the release of Apache Sentry 1.5.1-incubating. Significant work has gone into this release, which includes 8 major features, 18 improvements and 131 defect fixes.

The following are the important features in this release:
1. Column-level access control: This feature provides access control at the column level for components such as Hive.
2. More granular privileges in the DBModel: This feature adds more privilege types such as CREATE, DROP, INDEX and LOCK.
3. High availability for the Sentry service: Provides the option of running redundant Sentry instances in the same cluster with a hot standby. This allows fast, seamless failover to a new Sentry instance in the case of outages.
4. Solr Sentry plug-in integration with the DB store: Solr can persist its security policies into a database store, as the Hive plugin does. This feature leverages the generic authorization model framework (also new in this release).

The complete list of bugfixes, improvements and features in Sentry 1.5.1-incubating can be viewed at: http://s.apache.org/hVw

Guoquan Shen (Release Manager)

Note: Updated from 1.5.0 to 1.5.1 because the 1.5.0 release is now defunct.


Friday October 10, 2014

Sentry 1.4.0 release announcement

Apache Sentry 1.4.0-Incubating released!


On August 19, the Sentry community announced the release of Apache Sentry 1.4.0-incubating. Significant work has gone into this release: the team fixed 116 defects, made 8 improvements, and added 10 new features.


The most notable feature added in this release is the persistence of security policies into a database store (a.k.a. the DB based provider). This new feature allows for dynamic policy changes in Sentry. With the corresponding enhancement in the Hive authorization hook, Sentry 1.4.0-incubating is integrated with Hive’s GRANT and REVOKE DDL processing to dynamically capture policy changes into the Sentry policy. Sentry also provides a hook into the Hive Metastore Service to protect it against changes not authorized by the security policy.

The complete list of bugfixes, improvements and features in Sentry 1.4.0-incubating can be viewed at: http://s.apache.org/ApB

Tuong Truong (Release Manager) 

Friday September 12, 2014

Apache Sentry user meetup @NYC during Strata + Hadoop World

We are planning a Sentry user meetup on 16th October in New York, during the Strata + Hadoop World conference. The venue is still being decided.

If you are in the NY area or attending the conference, please drop by to meet other Sentry users and contributors. You can RSVP for the event at http://s.apache.org/1Bd

Monday July 21, 2014

Apache Sentry architecture overview


Apache Sentry is an authorization module for Hadoop that provides the granular, role-based authorization required to provide precise levels of access to the right users and applications.

It currently works out of the box with Apache Hive/HCatalog, Apache Solr and Cloudera Impala. In the future this could be extended to other Hadoop ecosystem components such as HDFS and HBase.

This document provides a high-level overview of the Apache Sentry architecture and its integration with Hive.

What is Apache Sentry

While Hadoop has strong security at the filesystem level, it lacked the granular support needed to adequately secure access to data by users and BI applications. This problem forces users to make a choice: either leave data unprotected or lock out users entirely. Most of the time, the preferred choice is the latter, severely inhibiting access to data in Hadoop.

Sentry provides the ability to control and enforce access to data and/or privileges on data for authenticated users. It offers fine-grained access control to data and metadata in Hadoop. In its initial release for Hive and Impala, Sentry allows access control at the server, database, table, and view scopes at different privilege levels including select, insert, and all. Column-level security can be implemented by creating a view over the subset of allowed columns. One can restrict the base table and grant privileges on the view so that the columns with sensitive data do not have to be exposed to unauthorized users.

Sentry supports ease of administration through role-based authorization; you can easily grant multiple groups access to the same data at different privilege levels. For example, for a particular data set you may give your fraud detection team rights to view all columns, your analysts rights to view only non-sensitive or non-PII (personally identifiable information) columns, and your ingest processing pipeline rights to insert new data into HDFS.

How does Sentry work

The goal of Apache Sentry is to address the authorization requirement. It’s a policy engine that a data processing tool can use to validate access. It’s highly extensible and can support arbitrary data models. Currently it supports the relational data model for Apache Hive and Cloudera Impala, as well as the hierarchical data model used by Apache Solr.

Sentry provides a means of defining and persisting the policies for accessing resources. Currently the policies can be stored in flat files or in DB-backed storage that is accessed through an RPC service. The data processing tool (e.g. Hive) identifies the user's request to access a piece of data in a certain mode, e.g. read a row from a table or drop a table. The tool then asks Sentry to validate this access. Sentry builds a map of the privileges allowed for the requesting user and then determines whether the given request should be allowed. The requesting tool then allows or prohibits the user's access based on Sentry’s decision.
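
The request flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not Sentry's actual API: the names PolicyEngine, privilege_map and run_in_tool are hypothetical, and the grants are flattened per user for brevity, whereas Sentry itself assigns privileges through roles and groups.

```python
from collections import defaultdict


class PolicyEngine:
    """Toy stand-in for Sentry: answers allow/deny questions for a tool."""

    def __init__(self, grants):
        # grants: iterable of (user, resource, action) tuples (illustrative data)
        self._grants = list(grants)

    def privilege_map(self, user):
        """Build a map of resource -> set of allowed actions for one user."""
        allowed = defaultdict(set)
        for grant_user, resource, action in self._grants:
            if grant_user == user:
                allowed[resource].add(action)
        return allowed

    def is_allowed(self, user, resource, action):
        """Default deny: only explicitly granted access is allowed."""
        return action in self.privilege_map(user).get(resource, set())


def run_in_tool(engine, user, resource, action):
    """What a data processing tool (e.g. Hive) does with the decision."""
    if not engine.is_allowed(user, resource, action):
        raise PermissionError(f"{user} may not {action} {resource}")
    return f"{action} on {resource} executed for {user}"


engine = PolicyEngine([("bob", "sales.customer_info", "select")])
print(run_in_tool(engine, "bob", "sales.customer_info", "select"))
```

The tool never decides on its own; it only enforces the answer it gets back, which is what lets the same engine serve several tools.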

The following actors play a part in Sentry authorization:

  • Resource
  • Privileges
  • Roles
  • Users and Groups

Resource

A resource is an object that you want to regulate access to. In the relational model a resource can be a Server, Database, Table or URI (i.e. an HDFS or local path).

Privileges

By default Sentry does not allow access to any resource unless it is explicitly granted. A privilege is essentially a rule that grants access to a resource. It spells out how a given resource may be accessed. For example, a privilege can state that the table customer_info in the database sales may be accessed in read mode.

Roles

A role is a collection of privileges. It is a template that combines the privileges required for a logical role in data processing. For example, a data analyst in your organisation requires read and write access to the sales table, read access to the customer table and full access to the sandbox database. Roles let you group all these rules under a single template that can be assigned to an analyst in one step. They also make analyst permissions easier to maintain. For example, if analysts need their access to the customer table changed from read mode to write mode, you make that one change in the analyst role and it takes effect for all analysts.

Groups

A group is a collection of users. Sentry group mapping is extensible. By default Sentry leverages Hadoop’s group mapping (which in turn can be OS groups or LDAP groups). Sentry allows you to associate roles with groups. The notion of groups further simplifies administration, because you can combine a number of users into a single group. For example, Bob, John and Kim are analysts in your organisation. You can put all of them into a single group called analyst and then grant the analyst role (discussed in the previous section) to that group. This saves the trouble of assigning the role to each user. If Bob moves out of the analyst role, you can simply remove him from the analyst group to revoke his access as an analyst. Likewise, if John takes on an additional role as a manager, you can simply add him to the manager group to grant him all managerial access.

Note that Sentry only supports this template-based policy granting. You can’t grant a privilege directly to a user or group. You are required to combine privileges under roles, and a role can only be granted to a group, not directly to a user.
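
To make the indirection concrete, here is a small Python sketch of resolving a user to privileges through groups and roles. The dictionaries and the privileges_for function are illustrative assumptions, not Sentry data structures; the reports database and manager_role are invented for the example.

```python
# Hypothetical in-memory policy; names follow the examples in the text.
group_members = {
    "analyst": {"bob", "john", "kim"},
    "manager": set(),
}
group_roles = {
    "analyst": {"analyst_role"},
    "manager": {"manager_role"},
}
role_privileges = {
    "analyst_role": {
        ("sales", "read"), ("sales", "write"),
        ("customer", "read"),
        ("sandbox", "all"),
    },
    "manager_role": {("reports", "all")},  # invented for illustration
}


def privileges_for(user):
    """Resolve user -> groups -> roles -> privileges."""
    result = set()
    for group, members in group_members.items():
        if user in members:
            for role in group_roles.get(group, ()):
                result |= role_privileges.get(role, set())
    return result


# Bob leaves the analyst team: one membership change revokes everything.
group_members["analyst"].discard("bob")
# John also becomes a manager: one membership change adds the manager role.
group_members["manager"].add("john")

print(sorted(privileges_for("john")))
```

Note that there is no way in this model to attach a privilege to a user or a group directly, which mirrors the restriction described above.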

Sentry architecture


Bindings

As mentioned before, the Sentry policy engine is a plugin invoked by a downstream tool like Hive. The binding module is the bridge between the invoking tool and Sentry authorization. This layer takes the authorization request in the requestor's native format and converts it into an authorization request that the Sentry policy engine can handle. For example, consider the following Hive query:

INSERT INTO TABLE report_db.monthly_sales
SELECT customer_name, transaction_date, amount
FROM prod_db.customer JOIN prod_db.transaction
ON (customer.id = transaction.cid)

This query needs write access to the table monthly_sales in the database report_db, and read access to the tables customer and transaction in prod_db. It’s the responsibility of the binding layer to extract this information from Hive’s compiler structure and pass it down to the policy engine.
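
As a rough illustration of what the binding layer hands to the policy engine, the Python sketch below turns the tables a query reads and writes into a list of privilege requests. The required_privileges function and the tuple format are assumptions made for this example; a real binding would obtain the table sets from Hive's compiler output rather than from literals.

```python
def required_privileges(read_tables, write_tables):
    """Turn the tables a query touches into (db, table, action) requests."""
    requests = []
    for name in sorted(write_tables):
        db, table = name.split(".", 1)
        requests.append((db, table, "insert"))
    for name in sorted(read_tables):
        db, table = name.split(".", 1)
        requests.append((db, table, "select"))
    return requests


# Inputs are written by hand here; they stand in for what the binding
# would extract from the compiled form of the query shown above.
requests = required_privileges(
    read_tables={"prod_db.customer", "prod_db.transaction"},
    write_tables={"report_db.monthly_sales"},
)
for request in requests:
    print(request)
```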

Policy Engine

This is the core of Sentry’s authorization. The policy engine gets the requested privileges from the binding layer and the required privileges from the provider layer. It looks at the requested and required privileges and makes the decision whether the action should be allowed.
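
One way to picture the comparison is a prefix match over hierarchical privilege strings, as in the Python sketch below. It borrows the server->db->table->action notation used in the policy file examples in this post, but the matching rules here are a deliberately simplified assumption and not Sentry's actual implementation; parse, implies and is_allowed are hypothetical names.

```python
def parse(privilege):
    """Split 'server=s1->db=d1->table=t1' into [('server', 's1'), ...]."""
    return [tuple(part.strip().split("=", 1)) for part in privilege.split("->")]


def implies(granted, requested):
    """True if a granted privilege covers a requested one (simplified rules)."""
    granted_parts, requested_parts = parse(granted), parse(requested)
    if len(granted_parts) > len(requested_parts):
        return False  # the grant is more specific than the request
    for (g_key, g_val), (r_key, r_val) in zip(granted_parts, requested_parts):
        if g_key != r_key:
            return False
        if g_val not in ("*", "all") and g_val != r_val:
            return False
    # Any remaining request parts are covered by the broader grant.
    return True


def is_allowed(granted_privileges, requested):
    """Allow the request if any single granted privilege implies it."""
    return any(implies(g, requested) for g in granted_privileges)


granted = [
    "server=server1->db=analyst1",
    "server=server1->db=jranalyst1->table=*->action=select",
]
print(is_allowed(granted, "server=server1->db=jranalyst1->table=t1->action=select"))
print(is_allowed(granted, "server=server1->db=jranalyst1->table=t1->action=insert"))
```

Under these assumed rules a grant on a whole database or server covers everything beneath it, while a grant that names an action covers only that action.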

Policy provider

The provider is an abstraction for making the authorization metadata available for the policy engine. This allows the metadata to be pulled out of the underlying repository independent of the way that metadata is stored.

Currently Sentry supports file based storage and DB based storage out of the box.

File based provider

The file based provider stores metadata in an ini format file. The file can reside on a local file system or HDFS. The policy file contains a groups section that holds the group to role mapping, and a roles section that holds the role to privilege mapping. Here is an example of a policy file:

[groups]
# Assigns each Hadoop group to its set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
admin = admin_role

[roles]
analyst_role = server=server1->db=analyst1, \
   server=server1->db=jranalyst1->table=*->action=select, \
   server=server1->uri=hdfs://ha-nn-uri/landing/analyst1, \
   server=server1->db=default->table=tab2
# Implies everything on server1.
admin_role = server=server1
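
For readers who want to inspect such a file programmatically, the following Python sketch parses the example into nested dictionaries. It is a hand-written reader for illustration only, not the parser Sentry uses; parse_policy and POLICY_TEXT are hypothetical names, and the handling of backslash continuations is an assumption based on the example above.

```python
POLICY_TEXT = """\
[groups]
# Assigns each Hadoop group to its set of roles
manager = analyst_role, junior_analyst_role
analyst = analyst_role
admin = admin_role

[roles]
analyst_role = server=server1->db=analyst1, \\
   server=server1->db=jranalyst1->table=*->action=select, \\
   server=server1->uri=hdfs://ha-nn-uri/landing/analyst1, \\
   server=server1->db=default->table=tab2
# Implies everything on server1.
admin_role = server=server1
"""


def parse_policy(text):
    """Parse ini-style policy text into {section: {name: [values]}}."""
    sections, current, pending = {}, None, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("\\"):
            pending += line[:-1].strip() + " "  # join continuation lines
            continue
        line, pending = pending + line, ""
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            raise ValueError(f"unexpected line: {line!r}")
        # Split on the first '=' only; privileges contain '=' themselves.
        name, value = line.split("=", 1)
        current[name.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return sections


policy = parse_policy(POLICY_TEXT)
for role, privileges in policy["roles"].items():
    print(role, privileges)
```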

DB based provider

The file provider is hard to modify programmatically, has race conditions when modified, and is tedious to maintain. Products like Hive and Impala need to support an industry-standard SQL interface for administering the authorization policies, which requires a programmatic way to manage them.

The Sentry policy store and Sentry Service persist the role to privilege and group to role mappings in an RDBMS and provide programmatic APIs to create, query, update and delete them. This enables various Sentry clients to retrieve and modify the privileges concurrently and securely.

The Sentry Policy Store works with a number of back-end databases (MySQL, Postgres, etc.). It uses the ORM library DataNucleus to read from and write to the database.

Sentry Service supports Kerberos authentication. Other authentication mechanisms can be added subsequently, if needed. You can further restrict the connection by specifying a list of users that are allowed to connect to the service.

Currently the Sentry service supports trusted authorization. The users that connect to the service are essentially superusers (e.g. hive or Impala). The connecting user can specify the effective user for each RPC request. The set of admin users allowed to execute a request is configurable. For example, the service user hive connects to the Sentry store and submits a create role request on behalf of user Bob. If Bob is not configured as an admin user, the request is rejected.

The current RPC interface supported by the Sentry service is available at https://github.com/apache/incubator-sentry/blob/master/sentry-provider/sentry-provider-db/src/main/resources/sentry_policy_service.thrift
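
The two checks in this trusted model can be sketched in Python as follows. SentryServiceSketch and its create_role method are hypothetical and do not reflect the real Thrift interface; the user names are illustrative.

```python
class SentryServiceSketch:
    """Toy model of the trusted-service checks; not the real Thrift API."""

    def __init__(self, allowed_connect_users, admin_users):
        self.allowed_connect_users = set(allowed_connect_users)
        self.admin_users = set(admin_users)
        self.roles = set()

    def create_role(self, connecting_user, effective_user, role_name):
        # Only configured service users (e.g. hive, impala) may connect.
        if connecting_user not in self.allowed_connect_users:
            raise PermissionError(f"{connecting_user} may not connect")
        # The service user is trusted to state who the request is for,
        # but that effective user must be a configured admin.
        if effective_user not in self.admin_users:
            raise PermissionError(f"{effective_user} is not an admin user")
        self.roles.add(role_name)
        return role_name


service = SentryServiceSketch(
    allowed_connect_users={"hive", "impala"},
    admin_users={"alice"},
)
print(service.create_role("hive", "alice", "analyst_role"))
```

With this setup, the same request submitted by hive on behalf of bob raises PermissionError, because bob is not in the configured admin list.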

Sentry Hive integration

Query authorization

The Sentry policy engine is plugged into Hive via a semantic hook. HiveServer2 executes this hook after the query is successfully compiled.

The hook gets the list of objects the query is trying to access in read and write mode. The Sentry Hive binding converts this into an authorization request based on the SQL privilege model.

Policy manipulation

Policy manipulation is handled in two steps. During query compilation, Hive invokes Sentry’s authorization task factory, which generates a Sentry-specific task that is executed during query processing. This task invokes the Sentry store client, which sends an RPC request to the Sentry service to make the authorization policy changes.

HCatalog integration


Sentry is integrated into the Hive Metastore via pre-listener hooks. The metastore executes this hook prior to executing a metadata manipulation request. The metastore binding creates a Sentry authorization request for the metadata modification request coming from the metastore/HCatalog client.


Thursday December 05, 2013

Getting Started with Sentry in Hive

Apache Sentry (incubating) is a highly modular system for providing fine-grained role based authorization to both data and metadata stored on an Apache Hadoop cluster. It currently works out of the box with Apache Hive and Cloudera Impala. In this blog post, you will learn how to use Sentry with Hive.

Sentry uses a policy provider to define the access control to Hive. Sentry currently ships with a file-based policy provider; see below for an example. A single global policy file can be used to control access to an entire HiveServer2 instance, and multiple dependent per-database policy files can be linked to the global one. Let's look at the structure of a policy file with an example.

Global policy file:

[groups]
admin_group = admin_role
dep1_admin = uri_role

[roles]
admin_role = server=server1
uri_role = hdfs:///ha-nn-uri/data

[databases]
db1 = hdfs://ha-nn-uri/user/hive/sentry/db1.ini

Per db policy file: (at hdfs://ha-nn-uri/user/hive/sentry/db1.ini)

[groups]
dep1_admin = db1_admin_role
dep1_analyst = db1_read_role

[roles]
db1_admin_role = server=server1->db=db1
db1_read_role = server=server1->db=db1->table=*->action=select

As you can see above, there are usually three sections in the global policy file:

  • A [groups] section that provides group-to-role mapping
  • A [roles] section that provides role-to-privileges mapping
  • A [databases] (optional) section that provides database-to-per-database policy file mapping. This allows for maintaining per-database privileges separately.

Sentry provides authorization through a hook in HiveServer2. When a user makes a connection to HiveServer2, it authenticates the connecting user and persists the user information for the session. For the subsequent operations that user performs, Sentry authorizes the operation by mapping the user to the groups he/she belongs to and determining whether the group(s) have necessary privileges on the relevant objects.

Hive security landscape with Sentry

Next, let's look at how Sentry fits into the security landscape of Hive. The infographic below shows how the different authentication and authorization pieces fit together.

[Figure: sentry-blog.png]

Here are the main points to take away:
  • Sentry requires that HiveServer2 be configured to use strong authentication. HiveServer2 supports Kerberos as well as LDAP (and AD) authentication mechanisms.
  • At the Sentry authorization level, there are two supported forms of user-group mappings:
    • HadoopGroup mapping, which uses the underlying Hadoop groups
      • Hadoop groups in turn support Shell-based mapping as well as LDAP group mapping. Please note that in the case of Sentry with Hive, the mapping of users to groups is performed on the HiveServer2 host
    • LocalGroups, where the users and groups can be defined locally in the policy file using [users] section (for testing purposes only)

Demo

In this demo, we will be using Kerberos authentication for HiveServer2 with HadoopGroups as the Sentry group provider, which by default uses Shell mapping. We briefly go over Sentry and see how to configure and use it in this configuration. (Note: Cloudera Manager 4.7 and CDH 4.4 are shown here; for future versions, the steps will be similar.)

[Video: video_snapshot.png]

Conclusion

Sentry brings in fine-grained authorization support for both data and metadata in a Hadoop cluster. It is already being used in production systems to secure the data and provide fine-grained access to its users. It is also integrated with the version of Hive shipping in CDH (upstream contribution is pending), Cloudera Impala, and Cloudera Search. Also, here is a short demo if you are interested in using it with Hue. 
