PMunch/futhark

Better `sanitizeName`

Opened this issue · 5 comments

At the moment, Futhark applies renameCallback before other fixed renaming rules, including nimIdentNormalize pass, which nullify all the NEP1 transformations user supplied renameCallback apply.

I'd say nimIdentNormalize is not really NEP1 friendly

I've recently pushed PR to anycase package to support compile time string transformations https://github.com/epszaw/anycase/issues/6

What about replacing nimIdentNormalize in sanitizeName with something that is more renameCallback and NEP1 friendly? Or maybe provide an additional renameTailCallback to fix things after the fixed transformations?

I tested my idea replacing nimIdentNormalize call in futhark.nim and GDAL library, and it seems to be working nicely

changes to futhark.nim sanitizeName proc

import anycase

...

proc sanitizeName(usedNames: var HashSet[string], origName: string, kind: string, renameCallback: RenameCallback, partof = ""): string {.compileTime.} =
  result = origName
  if not renameCallback.isNil:
    result = result.renameCallback(kind, partof)
  if result.startsWith("_"):
    if result.startsWith("__"):
      result = "compiler_" & result[2..^1]
    else:
      result = "internal_" & result[1..^1]

  # result = result.nimIdentNormalize()
  result = case kind:
    of "enum", "struct", "union", "const", "typedef": result.pascal # AnyCase
    of "proc", "arg", "enumval", "field": result.camel # anyCase
    else: result.nimIdentNormalize() # [A|a]nycase

  var renamed = false
  if usedNames.contains(result) or result in builtins:
    result.add kind
    renamed = true
  if usedNames.contains(result) or result in builtins:
    result.add hash(origName).toHex
    renamed = true
  if renamed:
    hint "Renaming \"" & origName & "\" to \"" & result & "\"" & (if partof.len != 0: " in " & partof else: "")
  usedNames.incl result

Results for GDAL library. The dirty part is still under user control via renameCallback

import std/[sugar, os, strutils]
import futhark


func renameCb(n, k: string, p = ""): string =
  result = n
  for prefix in ["GDAL_", "OGR_"]:
    if result.startsWith prefix:
      result = result.replace(prefix, "")
      break
  for prefix in ["GDAL", "OGR", "OCT", "OSR", "CPL", "VSI"]:
    if result.startsWith prefix:
      result = result.replace(prefix, prefix.toLower)
      break
  result = result.replace("_", "")


importc:
  outputPath currentSourcePath.parentDir / "gdal_c.nim"
  renameCallback renameCb
  "gdal/gdal.h"
  "gdal/ogr_api.h"
  "gdal/ogr_srs_api.h"
  "gdal/cpl_conv.h"
  "gdal/cpl_vsi.h"


proc main =
  var major, minor, patch: cint
  assert ogrGetGEOSVersion(major.addr, minor.addr, patch.addr)
  echo "GEOS version: ", major, ".", minor, ".", patch
  osrGetPROJVersion(major.addr, minor.addr, patch.addr)
  echo "PROJ version: ", major, ".", minor, ".", patch
  
  gdalAllRegister()
  let spatialReference = osrNewSpatialReference(nil)
  assert spatialReference.osrImportFromEPSG(4326) == 0

  var geometry: OgrGeometryH
  var wkt = "POINT(1 2)".cstring
  assert gCreateFromWkt(wkt.addr, spatialReference, geometry.addr) == 0

  dump geometry.gGetGeometryType()
  dump geometry.gGetGeometryName()
  let json = geometry.gExportToJson()
  dump json
  vsiFree(json)

  gDestroyGeometry(geometry)
  osrDestroySpatialReference(spatialReference)


main()

generated gdal_c.zip and rename hints

Hint: Renaming "__time_t" to "Timettypedef" [User]
Hint: Renaming "ptr" to "ptrarg" [User]
Hint: Renaming "is" to "isarg" [User]
Hint: Renaming "ptr" to "ptrarg" [User]
Hint: Renaming "ptr" to "ptrarg" [User]

output

GEOS version: 3.10.2
PROJ version: 8.2.1
geometry.gGetGeometryType() = wkbPoint
geometry.gGetGeometryName() = POINT
json = { "type": "Point", "coordinates": [ 2.0, 1.0 ] }

This code has a slight issue, it won't properly handle name collisions. For anyone who didn't follow the IRC discussion when this came up the reason why sanitizeName calls nimIdentNormalize is so that the hash table lookup of names is guaranteed to get identifiers which will collide in Nim (but which might not collide in C because it's case and underscore sensitive). The proper workaround for this is to still use the normalized name in these data-structures but change to storing the original (or anycased) names and normalizing them when doing lookups.

As a motivating example I wrote a test case in the normalize branch which shows off the kind of thing the current solution was meant to handle. It tries to wrap this perfectly valid (albeit a bit extreme) C header file:

#define my_var 1
#define myVar 2
#define myvar 3
#define MYVAR 4
#define MY_VAR 5
#define MyVar 6
#define My_Var 7

The current version wraps this fine and renames fields to avoid collisions as we can see in the output:

/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "myVar" to "myvarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "myvar" to "myvarconst00000000C690172C" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "MY_VAR" to "Myvarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "MyVar" to "Myvarconst000000002E4AA817" [User]
/home/peter/Projects/futhark/src/futhark.nim(190, 5) Hint: Renaming "My_Var" to "Myvarconst00000000038D4D97" [User]

The names get appended some information before falling back to a hash of the original identifier.

With your version of sanitizeName we instead see this output:

/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "myVar" to "MyVarconst" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "MY_VAR" to "MyVarconst00000000AEC77163" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "MyVar" to "MyVarconst000000002E4AA817" [User]
/home/peter/Projects/futhark/src/futhark.nim(197, 5) Hint: Renaming "My_Var" to "MyVarconst00000000038D4D97" [User]
/home/peter/Projects/futhark/src/futhark.nim(908, 3) Hint: Caching Futhark output in /home/peter/.cache/nim/tnormalize_d/futhark_33DDD4433451E496.nim [User]
/home/peter/Projects/futhark/src/futhark.nim(141, 21) Hint: Declaration of Myvar already exists, not redeclaring [User]
/home/peter/Projects/futhark/src/futhark.nim(141, 21) Hint: Declaration of MYVAR already exists, not redeclaring [User]

As we can see it doesn't properly rename all the identifiers, and some of them are aliased together in the wrapper making #define myvar 3 and #define MYVAR 4 now point to #define my_var 1. This means that these values are not possible to get with the wrapper, and with --nodeclguards the code fails to compile at all because of the re-definition.

Futhark no longer performs the nimIdentNormalize step on the output, so the original names from the C sources should be preserved, and you should now be able to create a renameCallback which restyles identifiers as you seem fit. Please try this out and tell me how it works!

I tried running this generation script that worked withut errors using futhark < 0.13.2

import std/[os]

import futhark
importc:
  outputPath currentSourcePath.parentDir / "gdal_c.nim"
  "gdal/gdal.h"
  "gdal/ogr_api.h"
  "gdal/ogr_srs_api.h"

with 0.13.2 I'm getting

/home/arkanoid/.nimble/pkgs2/futhark-0.13.2-33b5d9892d6ef514a5cc7e92585d319a15c23ab3/futhark.nim(694, 14) template/generic instantiation of `importcImpl` from here
(1, 21) Error: undeclared identifier: 'GDALRPCInfoV2_536873295'

gdal_c.nim seems generated anyway correctly, it might be truncated but it seems to contain the calls I need. C names are maintained now! Nice :)

EDIT:

a quick git diff shows that the newly generated gdal_c.nim contains MORE symbols compared to the previous one

Yes I've seen something similar while wrapping Gtk. It generates the file properly, but for some reason the macro output is invalid. So if you run it twice so it grabs the cached file then everything works fine