✅ an Accounting Analytics exercise

a Web-centric (JavaScript) approach

Objectives

  • Load and inspect a financial dataset.
  • Wrangle and filter data based on financial metrics.
  • Compute and interpret summary statistics.
  • Create visualizations to explore data distributions and relationships.
  • Summarize data by groups and visualize results.
  • Build an interactive dashboard using JavaScript for web-based analytics.
  • Compare Python and JavaScript approaches for accounting analytics.

Prerequisites

  • We have completed five weeks on core JavaScript programming, including data types, loops, functions, and arrays / objects.
  • For this exercise, read the documentation for the following JavaScript libraries before completing the exercises below

Question: Load a CSV dataset of selected U.S. firms (available here)

listOfFirms = await aq.loadCSV(urlData);

Question: Select only firms with more than $50 billion of assets, and generate new columns based on existing ones (e.g., log of assets)

viewof filtered_data = listOfFirms
  .filter((d) => d.Assets > 50000)
  .derive({
    log_AT: (d) => op.log(d.Assets), // log of assets
    grossMargin: (d) => ((d.Sales - d.CostOfSales) / d.Sales) * 100, // gross profit margin
    AssetTO: (d) => d.Sales / d.Assets // Asset Turnover
  })
.orderby(['Industry','grossMargin'])
  .view()

Above, we dynamically load this CSV file, and added three new columns (i.e., log of assets, gross profit margin, and asset turnover ratio).

Question: Calculate the mean, median, and standard deviation of the log of assets (log_AT)

viewof summary_stats = filtered_data
  .rollup({
    mean: op.mean("log_AT"),
    median: op.median("log_AT"),
    std: op.stdev("log_AT"),
    count: op.count()
  })
  .view()

Question: Plot a histogram for log(Assets)

Plot.rectY(
  filtered_data,
  Plot.binX({ y: "count" }, { x: "log_AT", thresholds: 30 })  // # of bins
).plot()

Question: Calculate the median gross profit margin and asset turnover for each industry

viewof gmByIndustry = filtered_data
  .groupby("Industry")
  .rollup({
    "median Margin": op.median("grossMargin"),
    "median Asset Turnover": op.median("AssetTO"),
    count: op.count()
  })
  .orderby(aq.desc("median Margin"))
  .view()

Question: Bar plot of the median gross profit, by industry (ordered by margin, descending)

Plot.plot({
  marks: [
    Plot.barY(gmByIndustry, {
      x: "Industry",
      y: "median Margin",
      sort: { x: "-y" }
    })
  ]
})

Question: Scatter plot of the gross profit vs asset turnover, by industry

Plot.plot({
  grid: true,
  color: {
    legend: true
  },
  marks: [
    Plot.dot(filtered_data, {
      x: "AssetTO",
      y: "grossMargin",
      stroke: "Industry"
    })
  ]
})

Adding interactivity

Code
Plot.rectY(
  filtered_data,
  Plot.binX({ y: "count" }, { x: "log_AT", thresholds: bins })  // # of bins
).plot()

Teaching Notes

Overview

This exercise is a modified and expanded version of the Diamonds Exercise from Dr. Quinn Swanquist’s AC 547 – Accounting Analytics (Python) course at the University of Alabama, publicly available at Dr. Swanquist’s website and this GitHub repository.

Instead of diamonds, we use this financial dataset containing attributes of publicly traded U.S. firms. This exercise is completed in JavaScript, using Arquero for data wrangling and Observable Plot for visualizations.

This allows us to compare with the Python approach to data analysis, which uses Pandas and Seaborn (see Dr. Swanquist’s solution, the original archived here in case it gets revised).

The goal is to explore how these tools handle data wrangling, visualization, and dashboard creation, highlighting their strengths and differences in an accounting analytics context.

Assumptions / Prerequisites

  • For this exercise, we assume students have completed a few weeks of classes on variables, data types, loops, array/list, and objects. Many of these concepts are similar in both Web and Python programming.

Comparing Python and JavaScript

A visual comparison of these two approaches suggests they are similar in capability. Both sets of code are concise and readable.

The Python/Pandas approach benefits from Pandas’ longer history and larger user base compared to Arquero. On the other hand, Arquero, being newer, has the advantage of a modern design that avoids some of the longstanding issues that have plagued Pandas.

Arquero (JavaScript)

Arquero, a library for data manipulation, is heavily inspired by the “data verb” grammar of R’s popular dplyr library. It is designed to bring high-performance data processing to the modern, web-centric JavaScript environment.

Modern Design: Benefiting from its recent development, Arquero’s design avoids some of the API inconsistencies that have developed in Pandas over its long history. This results in a more streamlined and consistent set of commands (verbs) for data transformation.

Client-Side Power: Arquero’s primary advantage is that it runs natively in the browser. This allows for complex data filtering, transformation, and analysis to happen on the client-side without needing a server-side Python kernel.

Interactivity: It is optimized for creating dynamic, interactive web applications and dashboards. It integrates perfectly with JavaScript visualization libraries (like D3.js, Vega-Lite, or Observable Plot), allowing developers to build charts that users can manipulate in real-time.

Pandas (Python)

While the Python/Pandas ecosystem is well-established and supported by numerous libraries, the Python/Pandas solution illustrates a few drawbacks that often frustrate students who are new to programming.

  • Pandas employs Boolean indexing along with bitwise operators (instead of the more intuitive logical operators) for filtering. A typical filter requires creating a mask (e.g., mask = df[‘column’] > value) and then applying it (e.g., df[mask]), or more commonly, chaining it inline as df[df[‘column’] > value]. This inline form references the DataFrame df twice, once inside the condition and once as the outer indexer, which creates a visually repetitive structure that can feel redundant, cumbersome, and perplexing to beginners.

image

  • In Pandas workflows involving .groupby() followed by aggregation (e.g., .sum(), .mean()), the resulting DataFrame often features a MultiIndex where the grouping column(s) become part of the index. This structure is efficient for advanced operations but can complicate further manipulation, visualization, or export for beginners. Invoking .reset_index() converts the index back to regular columns (or a default integer index), which is a frequent requirement for cleaner, more intuitive data handling. But for accounting and business students new to programming, grasping the index concept, why it changes and how to reset it, poses a notable challenge, as it diverges from spreadsheet-like expectations.

    image

Live copy of this webpage