Limiting SQL query to defined fields/columns in Graphene-SQLAlchemy

Posted: 2018-11-03 07:03:11

Question:

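This question has been posted as a GH issue at https://github.com/graphql-python/graphene-sqlalchemy/issues/134, but I thought I'd post it here too to tap into the SO crowd.

A full working demo can be found at https://github.com/somada141/demo-graphql-sqlalchemy-falcon.

Consider the following SQLAlchemy ORM class: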

class Author(Base, OrmBaseMixin):
    __tablename__ = "authors"

    author_id = sqlalchemy.Column(
        sqlalchemy.types.Integer(),
        primary_key=True,
    )

    name_first = sqlalchemy.Column(
        sqlalchemy.types.Unicode(length=80),
        nullable=False,
    )

    name_last = sqlalchemy.Column(
        sqlalchemy.types.Unicode(length=80),
        nullable=False,
    )

Wrapped simply in an SQLAlchemyObjectType as such:

class TypeAuthor(SQLAlchemyObjectType):
    class Meta:
        model = Author

And exposed through:

author = graphene.Field(
    TypeAuthor,
    author_id=graphene.Argument(type=graphene.Int, required=False),
    name_first=graphene.Argument(type=graphene.String, required=False),
    name_last=graphene.Argument(type=graphene.String, required=False),
)

@staticmethod
def resolve_author(
    args,
    info,
    author_id: Union[int, None] = None,
    name_first: Union[str, None] = None,
    name_last: Union[str, None] = None,
):
    query = TypeAuthor.get_query(info=info)

    if author_id:
        query = query.filter(Author.author_id == author_id)

    if name_first:
        query = query.filter(Author.name_first == name_first)

    if name_last:
        query = query.filter(Author.name_last == name_last)

    author = query.first()

    return author

A GraphQL query such as:

query GetAuthor{
  author(authorId: 1) {
    nameFirst
  }
}

will cause the following raw SQL to be emitted (taken from the echo logs of the SQLA engine):

SELECT authors.author_id AS authors_author_id, authors.name_first AS authors_name_first, authors.name_last AS authors_name_last
FROM authors
WHERE authors.author_id = ?
 LIMIT ? OFFSET ?
2018-05-24 16:23:03,669 INFO sqlalchemy.engine.base.Engine (1, 1, 0)

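As we can see, we may only want the nameFirst field, i.e., the name_first column, but the entire row is fetched. Of course the GraphQL response only contains the requested fields, i.e.,

{
  "data": {
    "author": {
      "nameFirst": "Robert"
    }
  }
}

but we have still fetched the entire row, which becomes a major issue when dealing with wide tables.

Is there a way to automatically communicate which columns are needed to SQLAlchemy so as to preclude this form of over-fetching?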

Answer 1:

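My question was answered on the GitHub issue (https://github.com/graphql-python/graphene-sqlalchemy/issues/134).

The idea is to identify the requested fields from the info argument (of type graphql.execution.base.ResolveInfo) that is passed to the resolver function, using a get_field_names function such as the one below: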

from graphql.language.ast import FragmentSpread


def get_field_names(info):
    """
    Parses a query info into a list of composite field names.
    For example the following query:
        {
          carts {
            edges {
              node {
                id
                name
                ...cartInfo
              }
            }
          }
        }
        fragment cartInfo on CartType { whatever }

    Will result in an array:
        [
            'carts',
            'carts.edges',
            'carts.edges.node',
            'carts.edges.node.id',
            'carts.edges.node.name',
            'carts.edges.node.whatever'
        ]
    """

    fragments = info.fragments

    def iterate_field_names(prefix, field):
        name = field.name.value

        if isinstance(field, FragmentSpread):
            _results = []
            new_prefix = prefix
            sub_selection = fragments[field.name.value].selection_set.selections
        else:
            _results = [prefix + name]
            new_prefix = prefix + name + "."
            if field.selection_set:
                sub_selection = field.selection_set.selections
            else:
                sub_selection = []

        for sub_field in sub_selection:
            _results += iterate_field_names(new_prefix, sub_field)

        return _results

    results = iterate_field_names('', info.field_asts[0])

    return results

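The above function was taken from https://github.com/graphql-python/graphene/issues/348#issuecomment-267717809. That issue contains other versions of this function, but I felt this was the most complete.

The identified fields are then used to limit the fields retrieved by the SQLAlchemy query, as such:

from sqlalchemy.orm import load_only

fields = get_field_names(info=info)
# `fields` must hold ORM attribute names for `load_only` to accept them.
query = TypeAuthor.get_query(info=info).options(load_only(*fields))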

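When applied to the example query above:

query GetAuthor{
  author(authorId: 1) {
    nameFirst
  }
}

the get_field_names function will return ['author', 'author.nameFirst']. However, as the "raw" SQLAlchemy ORM fields are snake-cased, get_field_names needs to be updated to strip the author prefix and convert the field names through the graphene.utils.str_converters.to_snake_case function.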
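The prefix stripping and case conversion described above could look roughly like the sketch below. The helper names camel_to_snake and to_column_names are hypothetical, and camel_to_snake is a regex-based stand-in for graphene.utils.str_converters.to_snake_case so that the snippet runs without Graphene installed. It assumes that only direct children of the resolved field with no sub-selection of their own map to columns, so relationships and anything nested under them are skipped.

```python
import re
from typing import List


def camel_to_snake(name: str) -> str:
    # Stand-in for graphene.utils.str_converters.to_snake_case (assumption:
    # plain camelCase names such as "nameFirst").
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


def to_column_names(field_names: List[str], root: str) -> List[str]:
    """Keep the leaf fields directly under `root`, converted to snake_case."""
    prefix = root + "."
    # Any name that is the parent of another name has a sub-selection,
    # so it is treated as a relationship rather than a column.
    parents = {name.rsplit(".", 1)[0] for name in field_names if "." in name}
    columns = []
    for field_name in field_names:
        if not field_name.startswith(prefix) or field_name in parents:
            continue  # Skips the root entry itself and any relationships.
        remainder = field_name[len(prefix):]
        if "." in remainder:
            continue  # Nested selections are not columns of this table.
        columns.append(camel_to_snake(remainder))
    return columns


columns = to_column_names(["author", "author.nameFirst"], root="author")
print(columns)  # prints ['name_first']
```

The resulting list is what would be unpacked into load_only in place of the raw output of get_field_names.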

Long story short, the above approach yields a raw SQL query like this:

INFO:sqlalchemy.engine.base.Engine:SELECT authors.author_id AS authors_author_id, authors.name_first AS authors_name_first
FROM authors
WHERE authors.author_id = ?
 LIMIT ? OFFSET ?
2018-06-09 13:22:16,396 INFO sqlalchemy.engine.base.Engine (1, 1, 0)

Update

In case anyone lands here wondering about the implementation, I got around to implementing my own version of the get_field_names function:

from typing import List, Dict, Union, Type

import graphql
from graphql.language.ast import FragmentSpread
from graphql.language.ast import Field
from graphene.utils.str_converters import to_snake_case
import sqlalchemy.orm

from demo.orm_base import OrmBaseMixin

def extract_requested_fields(
    info: graphql.execution.base.ResolveInfo,
    fields: List[Union[Field, FragmentSpread]],
    do_convert_to_snake_case: bool = True,
) -> Dict:
    """Extracts the fields requested in a GraphQL query by processing the AST
    and returns a nested dictionary representing the requested fields.

    Note:
        This function should support arbitrarily nested field structures
        including fragments.

    Example:
        Consider the following query passed to a resolver and running this
        function with the `ResolveInfo` object passed to the resolver.

        >>> query = "query getAuthor{author(authorId: 1){nameFirst, nameLast}}"
        >>> extract_requested_fields(info, info.field_asts, True)
        {'author': {'name_first': None, 'name_last': None}}

    Args:
        info (graphql.execution.base.ResolveInfo): The GraphQL query info passed
            to the resolver function.
        fields (List[Union[Field, FragmentSpread]]): The list of `Field` or
            `FragmentSpread` objects parsed out of the GraphQL query and stored
            in the AST.
        do_convert_to_snake_case (bool): Whether to convert the fields as they
            appear in the GraphQL query (typically in camel-case) back to
            snake-case (which is how they typically appear in ORM classes).

    Returns:
        Dict: The nested dictionary containing all the requested fields.
    """

    result = {}
    for field in fields:

        # Set the `key` as the field name.
        key = field.name.value

        # Convert the key from camel-case to snake-case (if required).
        if do_convert_to_snake_case:
            key = to_snake_case(name=key)

        # Initialize `val` to `None`. Fields without nested-fields under them
        # will have a dictionary value of `None`.
        val = None

        # If the field is of type `Field` then extract the nested fields under
        # the `selection_set` (if defined). These nested fields will be
        # extracted recursively and placed in a dictionary under the field
        # name in the `result` dictionary.
        if isinstance(field, Field):
            if (
                hasattr(field, "selection_set") and
                field.selection_set is not None
            ):
                # Extract field names out of the field selections.
                val = extract_requested_fields(
                    info=info,
                    fields=field.selection_set.selections,
                    do_convert_to_snake_case=do_convert_to_snake_case,
                )
            result[key] = val
        # If the field is of type `FragmentSpread` then retrieve the fragment
        # from `info.fragments` and recursively extract the nested fields. As
        # we don't want the name of the fragment appearing in the result
        # dictionary (since it does not match anything in the ORM classes),
        # the extracted fields are merged directly into `result`.
        elif isinstance(field, FragmentSpread):
            # Retrieve the referenced fragment.
            fragment = info.fragments[field.name.value]
            # Extract field names out of the fragment selections.
            val = extract_requested_fields(
                info=info,
                fields=fragment.selection_set.selections,
                do_convert_to_snake_case=do_convert_to_snake_case,
            )
            # Merge rather than assign, so that fields collected before the
            # fragment spread are not discarded.
            result.update(val)

    return result

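This parses the AST into a dict, preserving the structure of the query and (hopefully) matching that of the ORM.

Running it against the info object of a query such as:

query getAuthor{
  author(authorId: 1) {
    nameFirst,
    nameLast
  }
}

produces:

{'author': {'name_first': None, 'name_last': None}}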

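while a more complex query like this:

query getAuthor{
  author(nameFirst: "Brandon") {
    ...authorFields
    books {
      ...bookFields
    }
  }
}

fragment authorFields on TypeAuthor {
  nameFirst,
  nameLast
}

fragment bookFields on TypeBook {
  title,
  year
}

produces:

{'author': {'books': {'title': None, 'year': None},
  'name_first': None,
  'name_last': None}}

Now these dictionaries can be used to tell apart what is a field on the primary table (Author in this case), since those have a value of None, such as name_first, and what is a field on a relationship of that primary table, such as the field title on the books relationship.

A simplistic approach to automatically applying those fields can take the form of the following function: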

def apply_requested_fields(
    info: graphql.execution.base.ResolveInfo,
    query: sqlalchemy.orm.Query,
    orm_class: Type[OrmBaseMixin]
) -> sqlalchemy.orm.Query:
    """Updates the SQLAlchemy Query object by limiting the loaded fields of the
    table and its relationship to the ones explicitly requested in the GraphQL
    query.

    Note:
        This function is fairly simplistic in that it assumes that (1) the
        SQLAlchemy query only selects a single ORM class/table and that (2)
        relationship fields are only one level deep, i.e., that requested fields
        are either table fields or fields of the table relationship, e.g., it
        does not support fields of relationship relationships.

    Args:
        info (graphql.execution.base.ResolveInfo): The GraphQL query info passed
            to the resolver function.
        query (sqlalchemy.orm.Query): The SQLAlchemy Query object to be updated.
        orm_class (Type[OrmBaseMixin]): The ORM class of the selected table.

    Returns:
        sqlalchemy.orm.Query: The updated SQLAlchemy Query object.
    """

    # Extract the fields requested in the GraphQL query.
    fields = extract_requested_fields(
        info=info,
        fields=info.field_asts,
        do_convert_to_snake_case=True,
    )

    # We assume that the top level of the `fields` dictionary only contains a
    # single key referring to the GraphQL resource being resolved.
    tl_key = list(fields.keys())[0]
    # We assume that any keys that have a value of `None` (as opposed to
    # dictionaries) are fields of the primary table.
    table_fields = [
        key for key, val in fields[tl_key].items()
        if val is None
    ]

    # We assume that any keys that have a value being a dictionary are
    # relationship attributes on the primary table with the keys in the
    # dictionary being fields on that relationship. Thus we create a list of
    # `[relationship_name, relationship_fields]` lists to be used in the
    # `joinedload` definitions.
    relationship_fieldsets = [
        [key, val.keys()]
        for key, val in fields[tl_key].items()
        if isinstance(val, dict)
    ]

    # Assemble a list of `joinedload` definitions on the defined relationship
    # attribute name and the requested fields on that relationship.
    options_joinedloads = []
    for relationship_fieldset in relationship_fieldsets:
        relationship = relationship_fieldset[0]
        rel_fields = relationship_fieldset[1]
        options_joinedloads.append(
            sqlalchemy.orm.joinedload(
                getattr(orm_class, relationship)
            ).load_only(*rel_fields)
        )

    # Update the SQLAlchemy query by limiting the loaded fields on the primary
    # table as well as by including the `joinedload` definitions.
    query = query.options(
        sqlalchemy.orm.load_only(*table_fields),
        *options_joinedloads
    )

    return query
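
Limitation (2) in the docstring above, relationships being only one level deep, could perhaps be relaxed by first flattening the nested dictionary into one entry per relationship path. The sketch below is a pure-Python illustration of that flattening step only. The function name flatten_requested_fields and the publisher relationship in the sample data are hypothetical, and turning each path into chained loader options on the SQLAlchemy side is not shown.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


def flatten_requested_fields(
    fields: Dict[str, Optional[dict]],
    path: Path = (),
) -> Dict[Path, List[str]]:
    """Map each relationship path to the plain fields requested under it.

    The empty path `()` stands for the primary table itself.
    """
    flattened: Dict[Path, List[str]] = {path: []}
    for key, val in fields.items():
        if val is None:
            # Leaf entry: a column on the table reached via `path`.
            flattened[path].append(key)
        else:
            # Nested dictionary: a relationship, so recurse one level deeper.
            flattened.update(flatten_requested_fields(val, path + (key,)))
    return flattened


# Hypothetical input: the contents under the top-level 'author' key, with an
# extra `publisher` relationship on books added for illustration.
requested = {
    "books": {"title": None, "year": None, "publisher": {"name": None}},
    "name_first": None,
    "name_last": None,
}
for rel_path, columns in flatten_requested_fields(requested).items():
    print(rel_path, columns)
```

Each key is a tuple of relationship attribute names to follow from the primary ORM class, and each value lists the columns to load at that point.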

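Comments:

This is exactly what I was looking for, thanks for posting it! My only question is: why isn't this output available in the info object, rather than requiring such an intricate custom solution? It's a genuine question. It seems you would frequently want to make sure you fetch the right data from the database (preventing unnecessary/wasteful queries) for further processing, but I'm new to Graphene/GraphQL, so I wonder whether I'm just missing something about how Graphene/GraphQL is "meant" to work?

@Ascendant I don't think the issue lies with Graphene/GraphQL. Those solutions don't care how you fetch your data or how performant the fetching is. The issue lies with the Graphene-SQLAlchemy package, whose primary concern is mapping the SQLAlchemy schema to a GraphQL schema. Generating queries like this while covering all edge cases would be overly complex (if not impossible), so the actual fetching falls on the developer. My solution above is not a catch-all, and given the complexity of SQLAlchemy there are many cases where it won't work.

Isn't this a more general problem in GraphQL whenever you query a many-to-many in your database, regardless of what ORM (if any) you use? If you have a many-to-many relationship between a and b, and your query is { a { b } }, then it seems you would want to join a and b (with a GROUP BY) in the resolver for a; otherwise the resolver for b within a will hit the database for every row in a. But the resolver for a cannot know that it needs to join b without a solution like the one you posted.

How would you use the apply_requested_fields function? Could you provide an example of applying it to a standard query, together with other clauses such as .filter_by(...), .one(), .all(), etc.?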
