SELECT
在 Polars SQL 中,SELECT
语句用于从表中检索数据到 DataFrame
中。Polars SQL 中 SELECT
语句的基本语法如下:
SELECT column1, column2, ...
FROM table_name;
在这里,column1
、column2
等是您希望从表中选择的列。您也可以使用通配符 *
来选择所有列。table_name
是您想要从中检索数据的表的名称。在下面的章节中,我们将介绍一些更常见的 SELECT 变体。
df = pl.DataFrame(
{
"city": [
"New York",
"Los Angeles",
"Chicago",
"Houston",
"Phoenix",
"Amsterdam",
],
"country": ["USA", "USA", "USA", "USA", "USA", "Netherlands"],
"population": [8399000, 3997000, 2705000, 2320000, 1680000, 900000],
}
)
ctx = pl.SQLContext(population=df, eager=True)
print(ctx.execute("SELECT * FROM population"))
shape: (6, 3)
┌─────────────┬─────────────┬────────────┐
│ city ┆ country ┆ population │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 │
╞═════════════╪═════════════╪════════════╡
│ New York ┆ USA ┆ 8399000 │
│ Los Angeles ┆ USA ┆ 3997000 │
│ Chicago ┆ USA ┆ 2705000 │
│ Houston ┆ USA ┆ 2320000 │
│ Phoenix ┆ USA ┆ 1680000 │
│ Amsterdam ┆ Netherlands ┆ 900000 │
└─────────────┴─────────────┴────────────┘
GROUP BY
GROUP BY
语句用于根据一个或多个列对表中的行进行分组,并对每个组计算聚合函数。
result = ctx.execute(
"""
SELECT country, AVG(population) as avg_population
FROM population
GROUP BY country
"""
)
print(result)
shape: (2, 2)
┌─────────────┬────────────────┐
│ country ┆ avg_population │
│ --- ┆ --- │
│ str ┆ f64 │
╞═════════════╪════════════════╡
│ Netherlands ┆ 900000.0 │
│ USA ┆ 3.8202e6 │
└─────────────┴────────────────┘
ORDER BY
ORDER BY
语句用于按一个或多个列对查询结果集进行升序或降序排序。
result = ctx.execute(
"""
SELECT city, population
FROM population
ORDER BY population
"""
)
print(result)
shape: (6, 2)
┌─────────────┬────────────┐
│ city ┆ population │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════════╪════════════╡
│ Amsterdam ┆ 900000 │
│ Phoenix ┆ 1680000 │
│ Houston ┆ 2320000 │
│ Chicago ┆ 2705000 │
│ Los Angeles ┆ 3997000 │
│ New York ┆ 8399000 │
└─────────────┴────────────┘
JOIN
income = pl.DataFrame(
{
"city": [
"New York",
"Los Angeles",
"Chicago",
"Houston",
"Amsterdam",
"Rotterdam",
"Utrecht",
],
"country": [
"USA",
"USA",
"USA",
"USA",
"Netherlands",
"Netherlands",
"Netherlands",
],
"income": [55000, 62000, 48000, 52000, 42000, 38000, 41000],
}
)
ctx.register_many(income=income)
result = ctx.execute(
"""
SELECT country, city, income, population
FROM population
LEFT JOIN income on population.city = income.city
"""
)
print(result)
shape: (6, 4)
┌─────────────┬─────────────┬────────┬────────────┐
│ country ┆ city ┆ income ┆ population │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 │
╞═════════════╪═════════════╪════════╪════════════╡
│ USA ┆ New York ┆ 55000 ┆ 8399000 │
│ USA ┆ Los Angeles ┆ 62000 ┆ 3997000 │
│ USA ┆ Chicago ┆ 48000 ┆ 2705000 │
│ USA ┆ Houston ┆ 52000 ┆ 2320000 │
│ USA ┆ Phoenix ┆ null ┆ 1680000 │
│ Netherlands ┆ Amsterdam ┆ 42000 ┆ 900000 │
└─────────────┴─────────────┴────────┴────────────┘
函数
Polars 提供了多种 SQL 函数,包括:
- 数学函数:
ABS
,EXP
,LOG
,ASIN
,ACOS
,ATAN
, 等。 - 字符串函数:
LOWER
,UPPER
,LTRIM
,RTRIM
,STARTS_WITH
,ENDS_WITH
。 - 聚合函数:
SUM
,AVG
,MIN
,MAX
,COUNT
,STDDEV
,FIRST
等。 - 数组函数:
EXPLODE
,UNNEST
,ARRAY_SUM
,ARRAY_REVERSE
, 等。
有关支持的函数完整列表,请参阅 API 文档。以下示例演示了如何在查询中使用函数
result = ctx.execute(
"""
SELECT city, population
FROM population
WHERE STARTS_WITH(country,'U')
"""
)
print(result)
shape: (5, 2)
┌─────────────┬────────────┐
│ city ┆ population │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════════╪════════════╡
│ New York ┆ 8399000 │
│ Los Angeles ┆ 3997000 │
│ Chicago ┆ 2705000 │
│ Houston ┆ 2320000 │
│ Phoenix ┆ 1680000 │
└─────────────┴────────────┘
表函数
在之前的示例中,我们首先生成了一个 DataFrame
并将其注册到 SQLContext
中。Polars 还支持在 SQL 查询中使用表函数 read_xxx
直接读取 CSV、Parquet、JSON 和 IPC 文件。
result = ctx.execute(
"""
SELECT *
FROM read_csv('docs/assets/data/iris.csv')
"""
)
print(result)
shape: (150, 5)
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
│ sepal_length ┆ sepal_width ┆ petal_length ┆ petal_width ┆ species │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ Setosa │
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ Setosa │
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ Setosa │
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ Setosa │
│ 5.0 ┆ 3.6 ┆ 1.4 ┆ 0.2 ┆ Setosa │
│ … ┆ … ┆ … ┆ … ┆ … │
│ 6.7 ┆ 3.0 ┆ 5.2 ┆ 2.3 ┆ Virginica │
│ 6.3 ┆ 2.5 ┆ 5.0 ┆ 1.9 ┆ Virginica │
│ 6.5 ┆ 3.0 ┆ 5.2 ┆ 2.0 ┆ Virginica │
│ 6.2 ┆ 3.4 ┆ 5.4 ┆ 2.3 ┆ Virginica │
│ 5.9 ┆ 3.0 ┆ 5.1 ┆ 1.8 ┆ Virginica │
└──────────────┴─────────────┴──────────────┴─────────────┴───────────┘