Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a query language that runs queries against a relational database. Malloy currently connects to BigQuery and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.
Note: These APIs are still in development and are subject to change.
Binary installers for the latest released version are available at the Python Package Index (PyPI). Currently the package is only available through the test repository.
python3 -m pip install malloy
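After installing, a quick check can confirm that the package is visible to the interpreter you intend to use. The sketch below relies only on the standard library; `describe_install` is a hypothetical helper name, and the distribution name `malloy` is taken from the install command above.

```python
import importlib.util
from importlib import metadata


def describe_install(package: str) -> str:
    """Report whether a top-level package is importable and, if known, its version."""
    # find_spec returns None when the package cannot be located on sys.path.
    if importlib.util.find_spec(package) is None:
        return f"{package}: not installed"
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this exact name.
        version = "unknown version"
    return f"{package}: installed ({version})"


print(describe_install("malloy"))
```

If this reports the package as not installed, the most likely cause is that `pip` ran under a different interpreter or virtual environment than the one running the script.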
- Malloy Language GitHub - Primary location for the Malloy language source, documentation, and information
- Malloy Language - A quick introduction to the language
- eCommerce Example Analysis - A walkthrough of the basics on an ecommerce dataset (BigQuery public dataset)
- Modeling Walkthrough - An introduction to modeling via the Iowa liquor sales public data set (BigQuery public dataset)
- Malloy on YouTube - Watch demos / walkthroughs of Malloy
- Join our Malloy Slack Community! Use this community to ask questions, meet other Malloy users, and share ideas with one another.
- Use GitHub issues to provide feedback, suggest improvements, report bugs, and start new discussions.
Run a named query from a Malloy file:
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    # Directory containing the sample .malloy file and its data.
    home_dir = "/path/to/samples/duckdb/imdb"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # Load the model file and run one of the queries it defines by name.
        data = await runtime.load_file(home_dir + "/5_movie_complex.malloy").run(
            named_query="horror_combo")

        dataframe = data.df()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
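The example above builds the model path by string concatenation, so a wrong `home_dir` only shows up once the file is loaded. A minimal sketch of a fail-early alternative follows; `model_path` is a hypothetical helper, not part of the malloy package, and it uses only `pathlib`.

```python
from pathlib import Path


def model_path(home_dir: str, filename: str) -> str:
    """Join home_dir and filename, raising early if the file does not exist."""
    path = Path(home_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Malloy model not found: {path}")
    # Return a plain string, matching the string path passed in the example above.
    return str(path)
```

Under that assumption, the call in the example would become `runtime.load_file(model_path(home_dir, "5_movie_complex.malloy"))`.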
Get SQL from an inline query, using a Malloy file as the source:
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/faa"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # get_sql compiles the query and returns the SQL text together with
        # the connection it targets; the query itself is not executed here.
        [sql, connection] = await runtime.load_file(home_dir + "/flights.malloy").get_sql(query="""
            query: flights -> {
              where: carrier ? 'WN' | 'DL', dep_time ? @2002-03-03
              group_by:
                flight_date is dep_time.day
                carrier
              aggregate:
                daily_flight_count is flight_count
                aircraft.aircraft_count
              nest: per_plane_data is {
                top: 20
                group_by: tail_num
                aggregate: plane_flight_count is flight_count
                nest: flight_legs is {
                  order_by: 2
                  group_by:
                    tail_num
                    dep_minute is dep_time.minute
                    origin_code
                    dest_code is destination_code
                    dep_delay
                    arr_delay
                }
              }
            }
            """)

        print(sql)


if __name__ == "__main__":
    asyncio.run(main())
Write an inline Malloy model source and run a query against it:
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/auto_recalls"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # Define the source inline, then run an inline query against it.
        data = await runtime.load_source("""
            source: auto_recalls is table('duckdb:auto_recalls.csv') {
              declare:
                recall_count is count()
                percent_of_recalls is recall_count/all(recall_count)*100
            }
            """).run(query="""
            query: auto_recalls -> {
              group_by: Manufacturer
              aggregate:
                recall_count
                percent_of_recalls
            }
            """)

        dataframe = data.df()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
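To make the `percent_of_recalls` measure in the model above concrete, here is the same arithmetic in plain Python: `recall_count` is the row count within each `Manufacturer` group, and `all(recall_count)` is the count across every group. This is an illustrative sketch only; the function and the sample rows are made up and do not come from the `auto_recalls.csv` data.

```python
from collections import Counter


def percent_of_recalls(manufacturers):
    """Return each manufacturer's share of all recall rows, as a percentage."""
    counts = Counter(manufacturers)   # recall_count per Manufacturer group
    total = sum(counts.values())      # all(recall_count): count over every group
    return {name: count / total * 100 for name, count in counts.items()}


# Made-up sample rows, one entry per recall record.
sample = ["Ford", "Ford", "Honda", "Toyota"]
print(percent_of_recalls(sample))
# → {'Ford': 50.0, 'Honda': 25.0, 'Toyota': 25.0}
```

The percentages sum to 100 across the grouped rows because the denominator ignores the `group_by`.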
git submodule init
git submodule update
python3 -m pip install -r requirements.dev.txt
scripts/gen-services.sh
scripts/gen-protos.sh
python3 -m pytest