bind

Generates wrappers from C/C++ source code.

Usage

 (import bind)

Requirements

Documentation

This extension provides a parser for a restricted subset of C and C++ that allows the easy generation of foreign variable declarations, procedure bindings and C++ class wrappers. The parser is invoked via the bind form, which extracts binding information and generates the necessary code. An example:

(bind "double sin(double);")

(print (sin 3.14))

The parser would generate code that is equivalent to

 (define sin (foreign-lambda double "sin" double))

Another example, here using C++. Consider the following class:

// file: foo.h

class Foo {
 private:
  int x_;
 public:
  Foo(int x);
  void setX(int x);
  int getX();
};

To generate a wrapper class that provides generic functions for the constructor and the setX and getX methods, we can use the following code:

; file: test-foo.scm

(import bind coops cplusplus-object)

(bind* "#include \"foo.h\"")

(define x (new <Foo> 99))
(print (getX x))              ; prints 99
(setX x 42)
(print (getX x))              ; prints 42
(delete x)

Provided the file foo.o contains the implementation of the class Foo, the given example could be compiled like this (assuming a UNIX-like environment):

% csc test-foo.scm foo.o -c++

To use the C++ interface, the coops extension is needed. Additionally, the class <c++-object> must be available, which is provided in the cplusplus-object extension.

To use this facility, you can either use the syntactic forms provided by the bind extension or run the chicken-bind standalone program to process a C/C++ file and generate a file containing wrapper code which then can be compiled with the CHICKEN compiler.

As a debugging aid, you can pass -debug F to the Scheme compiler to see the generated wrapper code.

chicken-bind accepts a number of command-line options that enable or disable various features. Enter

 % chicken-bind -help

for more information.

General operation

The parser will generally perform the following functions:

Basic token-substitution of macros defined via #define is performed. The preprocessor commands #ifdef, #ifndef, #else, #endif, #undef and #error are handled. The preprocessor commands #if and #elif are not supported and will signal an error when encountered by the parser, because C expressions (even if constant) are not parsed. The preprocessor command #pragma is allowed but will be ignored.

During processing of C code, the macro CHICKEN is defined (similar to the C compiler option -DCHICKEN).

Macro and type definitions are available in subsequent bind declarations.

Each declared C variable generates a procedure of the same name that takes zero or one argument. Called with no arguments, the procedure returns the current value of the variable; called with one argument, it sets the variable to that value.

C and C++ style comments are supported.

Variables declared as const generate normal Scheme variables, bound to the initial value of the variable.

Function, member-function and constructor/destructor definitions may be preceded by the ___safe qualifier, which marks the function as (possibly) performing a callback into Scheme. If a wrapped function calls back into Scheme code and ___safe has not been given, very strange and hard-to-debug problems will occur.

Functions and member functions that are prefixed with ___discard and whose result type maps to a Scheme string (c-string) will have their result type changed to c-string* instead.

Constants (as declared by #define or enum) are not visible outside of the current compilation unit unless the export-constants option has been enabled. Only numeric or character constants are directly supported.

Function-arguments may be preceded by ___in, ___out and ___inout qualifiers to specify values that are passed by reference to a function, or returned by reference. Only basic types (booleans, numbers and characters) can be passed using this method. During the call a pointer to a temporary piece of storage containing the initial value (or a random value, for ___out parameters) will be allocated and passed to the wrapped function. This piece of storage is subject to garbage collection and will move, should a callback into Scheme occur that triggers a garbage collection. Multiple ___out and ___inout parameters will be returned as multiple values, preceded by the normal return value of the function (if not void). Here is a simple example:

(bind* #<<EOF
#ifndef CHICKEN
#include <math.h>
#endif

double modf(double x, ___out double *iptr);
EOF
)

(let-values ([(frac int) (modf 33.44)])
  ...)
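
To make the calling convention concrete, the following C sketch mimics by hand what a wrapper for a function with an ___out parameter has to do: reserve temporary storage, pass its address, and hand back the normal result followed by the out value. The names divide_with_remainder, call_result and call_divide are hypothetical and exist only for this illustration; the code that bind actually generates looks different.

```c
/* Hypothetical C function with an out parameter, in the style of modf(). */
int divide_with_remainder(int a, int b, int *rem)
{
  *rem = a % b;
  return a / b;
}

/* The two values a Scheme caller would receive, in order:
   the normal return value first, then the out value. */
struct call_result {
  int quotient;   /* normal return value */
  int remainder;  /* value written through the out parameter */
};

/* Hand-written stand-in for a wrapper: temporary storage is reserved,
   its address is passed, and both values are returned together. */
struct call_result call_divide(int a, int b)
{
  int tmp;  /* uninitialised on entry, like storage for an ___out parameter */
  struct call_result r;

  r.quotient = divide_with_remainder(a, b, &tmp);
  r.remainder = tmp;
  return r;
}
```

On the Scheme side the caller never supplies the out argument, which is why modf is called with a single argument in the example above.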

Function-arguments may be preceded by ___length(ID), where ID designates the name of another argument that must refer to a number vector or string argument. The value of the former argument will be computed at run-time and thus can be omitted:

(import srfi-4)

(bind* #<<EOF
double sumarray(double *arr, ___length(arr) int len)
{
  double sum = 0;

  while(len--) sum += *(arr++);

  return sum;
}
EOF
)

(print (sumarray (f64vector 33 44 55.66)))

The length variable may be positioned anywhere in the argument list. Length markers may only be specified for arguments passed as SRFI-4 number vectors, byte-vectors (as provided by the lolevel library unit) or strings.

Structure and union definitions containing actual field declarations generate getter procedures (and SRFI-17 setters when declared ___mutable or when the mutable-fields option has been enabled). The names of these procedures are computed by concatenating the struct (or union) name, a hyphen ("-") and the field name. Structure definitions with fields may not be used in positions where a type specifier is normally expected. The field accessors operate on struct/union pointers only. Additionally, a zero-argument procedure named make-<structname> will be generated that allocates enough storage to hold an instance of the structure (or union). Prefixing the definition with ___abstract will omit the creation procedure.

(bind* #<<EOF
struct My_struct { int x; ___mutable float y; };

typedef struct My_struct My_struct;

My_struct *make_struct(int x, float y) 
{
  My_struct *s = (My_struct *)malloc(sizeof(My_struct));
  s->x = x;
  s->y = y;
  return s;
}
EOF
)

will generate the following definitions:

(make-My_struct) -> PTR
(My_struct-x PTR) -> INT
(My_struct-y PTR) -> FLOAT
(set! (My_struct-y PTR) FLOAT)
(make_struct INT FLOAT) -> PTR

As of version 1.0, nested structs are supported. Nested unions are not, but pointers to both unions and structs are.

Mutable struct-members of type c-string or c-string* will copy the passed argument string when assigned (using strdup(3)), since data may move in subsequent garbage collections.
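
The effect of such a copying setter can be sketched in plain C. This is only an illustration under assumptions: struct person, copy_string and person_set_name are hypothetical names, and copy_string is written out by hand to show what strdup(3) does. The sketch does not free the previous value, since the text above does not say whether the generated setter does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct with a string member, corresponding to
   struct person { ___mutable char *name; }; */
struct person {
  char *name;
};

/* Copy a string into freshly malloc'ed memory, as strdup(3) does. */
char *copy_string(const char *s)
{
  size_t n = strlen(s) + 1;
  char *copy = malloc(n);

  if (copy != NULL) memcpy(copy, s, n);
  return copy;
}

/* Stand-in for a generated setter: the struct keeps its own copy, so it
   stays valid even if the caller's buffer later moves or is overwritten.
   The old value of p->name is deliberately left alone in this sketch. */
void person_set_name(struct person *p, const char *s)
{
  p->name = copy_string(s);
}
```

Because the struct owns a separate copy, releasing that memory is the responsibility of foreign code or of the user.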

All specially handled tokens preceded with ___ are defined as C macros in the header file chicken.h and will usually expand into nothing, so they don't invalidate the processed source code.
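
Because the markers are ordinary macros, an annotated source file can also be made acceptable to a plain C compiler that never sees chicken.h. The sketch below assumes such a setup: the #ifndef fallbacks and the function sum_ints are hypothetical and not part of bind.

```c
/* Fallback definitions for builds that do not include chicken.h.
   They expand to nothing, so the C compiler sees ordinary C. */
#ifndef ___out
#define ___out
#endif
#ifndef ___length
#define ___length(id)
#endif

/* Annotated definition: a binding generator can read the markers,
   while the C compiler ignores them entirely. */
int sum_ints(int *arr, ___length(arr) int len, ___out int *count)
{
  int sum = 0;
  int i;

  for (i = 0; i < len; i++) sum += arr[i];
  *count = len;  /* reported back through the out parameter */
  return sum;
}
```

After preprocessing, the signature is simply int sum_ints(int *arr, int len, int *count).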

C++ namespace declarations of the form namespace NAME {... } are recognized but will be completely ignored.

Keep in mind that this is not a fully general C/C++ parser. Taking an arbitrary header file and feeding it to bind will in most cases not work or generate ridiculous amounts of code. This FFI facility is for carefully written header files, and for declarations directly embedded into Scheme code.

Syntactic forms

bind

[syntax] (bind STRING ...)

Parses the C code in STRING ... and expands into wrapper code providing access to the declarations defined in it.

bind*

[syntax] (bind* STRING ...)

Similar to bind, but also embeds the code in the generated Scheme expansion using foreign-declare.

bind-type

[syntax] (bind-type TYPENAME SCHEMETYPE [CONVERTARGUMENT [CONVERTRESULT]])

Declares a foreign type transformation, similar to define-foreign-type. There should be two to four arguments: a C type name, a Scheme foreign type specifier and optional argument- and result-value conversion procedures.

;;;; foreign type that converts to unicode (assumes 4-byte wchar_t):
;
; - Note: this is rather kludgy and is only meant to demonstrate the `bind-type'
;         syntax

(import srfi-4 bind)

(define mbstowcs (foreign-lambda int "mbstowcs" nonnull-u32vector c-string int))

(define (str->ustr str)
  (let* ([len (string-length str)]
         [us (make-u32vector (add1 len) 0)] )
    (mbstowcs us str len)
    us) )

(bind-type unicode nonnull-u32vector str->ustr)

(bind* #<<EOF
static void foo(unicode ws)
{
  printf("%ls\n", ws);
}
EOF
)

(foo "this is a test!")

bind-opaque-type

[syntax] (bind-opaque-type TYPENAME SCHEMETYPE)

Similar to bind-type, but provides automatic argument- and result conversions to wrap a value into a structure:

(bind-opaque-type myfile (pointer "FILE"))

(bind "myfile fopen(char *, char *);")

(fopen "somefile" "r")   ==> <myfile>

(bind-opaque-type TYPENAME TYPE) is basically equivalent to (bind-type TYPENAME TYPE TYPE->RECORD RECORD->TYPE) where TYPE->RECORD and RECORD->TYPE are compiler-generated conversion functions that wrap objects of type TYPE into a record and back.

bind-file

[syntax] (bind-file FILENAME ...)

Reads the content of the given files and generates wrappers using bind. Note that FILENAME ... is not evaluated.

bind-file*

[syntax] (bind-file* FILENAME ...)

Reads the content of the given files and generates wrappers using bind*.

bind-rename

[syntax] (bind-rename CNAME SCHEMENAME)

Defines to what a certain C/C++ name should be renamed. CNAME specifies the C/C++ identifier occurring in the parsed text and SCHEMENAME gives the name used in generated wrapper code.

bind-rename/pattern

[syntax] (bind-rename/pattern REGEX REPLACEMENT)

Declares a renaming pattern to be used for C/C++ identifiers occurring in bound code. REGEX should be a string or SRE and REPLACEMENT a string, which may optionally contain back-references to matched sub-patterns.

bind-options

[syntax] (bind-options OPTION VALUE ...)

Enables various translation options, where OPTION is a keyword and VALUE is a value given to that option. Note that VALUE is not evaluated.

Possible options are:

export-constants
export-constants: BOOLEAN

Define global variables for constant-declarations (with #define or enum), making the constant available outside the current compilation unit.

class-finalizers
class-finalizers: BOOLEAN

Automatically generates calls to set-finalizer! so that any unused references to instances of subsequently defined C++ class wrappers will be destroyed. This should be used with care: if the embedded C++ object which is represented by the reclaimed coops instance is still in use in foreign code, unpredictable things will happen.

mutable-fields
mutable-fields: BOOLEAN

Specifies that all struct or union fields should generate setter procedures (the default is to generate setter procedures only for fields declared ___mutable).

constructor-name
constructor-name: STRING

Specifies an alternative name for constructor methods (the default is constructor), a default-method for which is defined in the cplusplus-object extension.

destructor-name
destructor-name: STRING

Specifies an alternative name for destructor methods (the default is destructor), a default-method for which is defined in the cplusplus-object extension.

exception-handler
exception-handler: STRING

Defines C++ code to be executed when an exception is triggered inside a C++ class member function. The code should be one or more catch forms that perform any actions that should be taken in case an exception is thrown by the wrapped member function:

(bind-options exception-handler: "catch(...) { return 0; }")

(bind* #<<EOF
class Foo {
 public:
  Foo *bar(bool f) { if(f) throw 123; else return this; }
};
EOF
)

(define f1 (new <Foo>))
(print (bar f1 #f))
(print (bar f1 #t))
(delete f1)

will print <Foo> and #f, respectively.

full-specialization
full-specialization: BOOLEAN

Enables full specialization mode. In this mode all wrappers for functions, member functions and static member functions are created as fully specialized coops methods. This can be used to handle overloaded C++ functions properly. Only a certain set of foreign argument types can be mapped to coops classes, as listed in the following table:

Type                  Class
char                  <char>
bool                  <bool>
c-string              <string>
unsigned-char         <exact>
byte                  <exact>
unsigned-byte         <exact>
[unsigned-]int        <exact>
[unsigned-]short      <exact>
[unsigned-]long       <integer>
[unsigned-]integer    <integer>
float                 <inexact>
double                <inexact>
number                <number>
(enum _)              <exact>
(const T)             (as T)
(function ...)        <pointer>
c-pointer             <pointer>
(pointer _)           <pointer>
(c-pointer _)         <pointer>
u8vector              <u8vector>
s8vector              <s8vector>
u16vector             <u16vector>
s16vector             <s16vector>
u32vector             <u32vector>
s32vector             <s32vector>
f32vector             <f32vector>
f64vector             <f64vector>

All other foreign types are specialized as #t.

Full specialization can be enabled globally, or only for a section of code by enclosing it in a pair of bind-options forms:

(bind-options full-specialization: #t)

(bind #<<EOF
...
int foo(int x);
int foo(char *x);
...
EOF
)

(bind-options full-specialization: #f)

Alternatively, member function definitions may be prefixed by ___specialize for specializing only specific members.

prefix
 prefix: STRING

Sets a prefix that should be added to all generated Scheme identifiers. For example,

(bind-options prefix: "mylib:")

(bind "#define SOME_CONST 42")

would generate the following code:

(define-constant mylib:SOME_CONST 42)

To switch prefixing off, use the value #f. Prefixes are not applied to class names.

default-renaming
default-renaming: STRING

Chooses a standard name-transformation, converting underscores (_) to hyphens (-) and transforming CamelCase into camel-case. All uppercase characters are also converted to lowercase. The result is prefixed with the argument STRING (equivalent to the prefix option).
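
The transformation can be approximated in a few lines of C. This is a sketch of the rules as described here, not bind's actual implementation: rename_identifier is a hypothetical helper, and it assumes that a hyphen is inserted only between a lowercase letter and a directly following uppercase letter.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the renamed form of `name` into `out` (capacity `cap`), preceded
   by `prefix`. Returns 0 on success, -1 if `out` is too small.
   Assumed rules: '_' becomes '-', a '-' is inserted between a lowercase
   letter and a following uppercase letter, and letters are lowercased. */
int rename_identifier(const char *prefix, const char *name,
                      char *out, size_t cap)
{
  size_t n = strlen(prefix);
  size_t i;

  if (n >= cap) return -1;
  memcpy(out, prefix, n);

  for (i = 0; name[i] != '\0'; i++) {
    unsigned char c = (unsigned char)name[i];

    /* CamelCase boundary: lowercase letter followed by uppercase letter. */
    if (i > 0 && isupper(c) && islower((unsigned char)name[i - 1])) {
      if (n + 1 >= cap) return -1;
      out[n++] = '-';
    }
    if (n + 1 >= cap) return -1;
    out[n++] = (c == '_') ? '-' : (char)tolower(c);
  }
  out[n] = '\0';
  return 0;
}
```

Under these assumptions, getX becomes get-x, and My_struct with the prefix "mylib:" becomes mylib:my-struct. How bind treats runs of capitals or digits may differ from this sketch.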

foreign-transformer
foreign-transformer: PROCEDURE

Applies the supplied procedure before emitting code. Because the procedure is evaluated at compile time, a named procedure has to be defined with define-for-syntax to be referenced here. The procedure takes two arguments: a bind-foreign-lambda s-expression and a rename procedure (for hygienic macros). The car of the bind-foreign-lambda form is a renamed version of either foreign-lambda* or foreign-safe-lambda*.

 (define-for-syntax (my-transformer form rename)
   (pp form) `(lambda () #f)) ;; prints and disables all bindings!
 (bind-options foreign-transformer: my-transformer)
 (bind "void foo();")

foreign-transformer may be useful with large header files that require custom type conversion, where bind-type isn't flexible enough. It is possible to write foreign-transformers that allow returning structs by value, for example, by converting to a blob or u8vector.

bind-foreign-lambdas are similar to foreign-lambdas, but use s-expressions instead of flat C strings to allow simple modification. See some of the tests for how these may be used: http://bugs.call-cc.org/browser/release/4/bind/trunk/tests.

bind-include-path

[syntax] (bind-include-path STRING ...)

Appends the paths given in STRING ... to the list of available include paths to be searched when an #include ... form is processed by bind.

Grammar

The parser understands the following grammar:

PROGRAM = PPCOMMAND
        | DECLARATION ";"

PPCOMMAND = "#define" ID [TOKEN ...]
          | "#ifdef" ID
          | "#ifndef" ID
          | "#else"
          | "#endif"
          | "#undef" ID
          | "#error" TOKEN ...
          | "#include" INCLUDEFILE
          | "#import" INCLUDEFILE
          | "#pragma" TOKEN ...

DECLARATION = FUNCTION
            | VARIABLE
            | ENUM
            | TYPEDEF
            | CLASS
            | CONSTANT
            | STRUCT
            | NAMESPACE
            | USING

STRUCT = ("struct" | "union") ID ["{" {["___mutable"] TYPE {"*"} ID {"," {"*"} ID}} "}"]

NAMESPACE = "namespace" NAMESPACEDEF
NAMESPACEDEF = ID ("{" DECLARATION ... "}" | "::" NAMESPACEDEF) 

USING = "using" "namespace" ID

INCLUDEFILE = "\"" ... "\""
            | "<" ... ">"

FUNCTION = {"___safe" | "___specialize" | "___discard"} [STORAGE] TYPE ID "(" ARGTYPE "," ... ")" [CODE]
         | {"___safe" | "___specialize" | "___discard"} [STORAGE] TYPE ID "(" "void" ")" [CODE]

ARGTYPE = [IOQUALIFIER] TYPE [ID ["=" NUMBER]]

IOQUALIFIER = "___in" | "___out" | "___inout" | LENQUALIFIER

LENQUALIFIER = "___length" "(" ID ")"

VARIABLE = [STORAGE] ENTITY ["=" INITDATA]

ENTITY = TYPE ID ["[" ... "]"]

STORAGE = "extern" | "static" | "volatile" | "inline"

CONSTANT = "const" TYPE ID "=" INITDATA

ENUM = "enum" [ID] "{" ENUMDEF "}"
     | "enum" ":" ID "{" ENUMDEF "}"
     | "enum" ID ":" TYPE "{" ENUMDEF "}"
     | "enum" ("class" | "struct") ID [":" TYPE] "{" ENUMDEF "}"
ENUMDEF = ID ["=" (NUMBER | ID)] "," ...

TYPEDEF = "typedef" TYPE ["*" ...] [ID]

TYPE = ["const"] BASICTYPE [("*" ... | "&" | "<" TYPE "," ... ">" | "(" "*" [ID] ")" "(" TYPE "," ... ")")]

BASICTYPE = ["unsigned" | "signed"] "int" 
          | ["unsigned" | "signed"] "char" 
          | ["unsigned" | "signed"] "short" ["int"]
          | ["unsigned" | "signed"] "long" ["int"]
          | ["unsigned" | "signed"] "___byte" 
          | "size_t"
          | "float"
          | "double"
          | "void"
          | "bool"
          | "___bool"
          | "___scheme_value"
          | "___scheme_pointer"
          | "___byte_vector"
          | "___pointer_vector"
          | "___pointer" TYPE "*"
          | "C_word"
          | "___fixnum"
          | "___number"
          | "___symbol"
          | "___u32"
          | "___s32"
          | "___s64"
          | "__int64"
          | "int64_t"
          | "uint32_t"
          | "uint64_t"
          | "struct" ID
          | "union" ID
          | "enum" ID
          | ID

CLASS = ["___abstract"] "class" ID [":" [QUALIFIER] ID "," ...] "{" MEMBER ... "}"

MEMBER = [QUALIFIER ":"] ["virtual"] (MEMBERVARIABLE | CONSTRUCTOR | DESTRUCTOR | MEMBERFUNCTION)

MEMBERVARIABLE = TYPE ID ["=" INITDATA]

MEMBERFUNCTION = {"___safe" | "static" | "___specialize" | "___discard"} TYPE ID "(" ARGTYPE "," ... ")" ["const"] ["=" "0"] [CODE]
               | {"___safe" | "static" | "___specialize" | "___discard"} TYPE ID "(" "void" ")" ["const"] ["=" "0"] [CODE]

CONSTRUCTOR = ["___safe"] ["explicit"] ID "(" ARGTYPE "," ... ")" [BASECONSTRUCTORS] [CODE]

DESTRUCTOR = ["___safe"] "~" ID "(" ["void"] ")" [CODE]

QUALIFIER = ("public" | "private" | "protected")

NUMBER = <a C integer or floating-point number, in decimal, octal or hexadecimal notation>

INITDATA = <everything up to end of chunk>

BASECONSTRUCTORS = <everything up to end of chunk>

CODE = <everything up to end of chunk>

The following table shows how argument-types are translated:

C type                  Scheme type
[unsigned] char         char
[unsigned] short        [unsigned-]short
[unsigned] int          [unsigned-]integer
[unsigned] long         [unsigned-]long
___u32                  unsigned-integer32
___s32                  integer32
___s64                  integer64
int64_t                 integer64
__int64                 integer64
uint32_t                unsigned-integer32
uint64_t                unsigned-integer64
float                   float
double                  double
size_t                  unsigned-integer
bool                    int
___bool                 int
___fixnum               int
___number               number
___symbol               symbol
___scheme_value         scheme-object
C_word                  scheme-object
___scheme_pointer       scheme-pointer
char *                  c-string
signed char *           s8vector
[signed] short *        s16vector
[signed] int *          s32vector
[signed] long *         s32vector
unsigned char *         u8vector
unsigned short *        u16vector
unsigned int *          u32vector
unsigned long *         u32vector
float *                 f32vector
double *                f64vector
___byte_vector          byte-vector
___pointer_vector       pointer-vector
CLASS *                 (instance CLASS <CLASS>)
CLASS &                 (instance-ref CLASS <CLASS>)
TYPE *                  (pointer TYPE)
TYPE &                  (ref TYPE)
TYPE<T1, ...>           (template TYPE T1 ...)
TYPE1 (*)(TYPE2, ...)   (function TYPE1 (TYPE2 ...))

The following table shows how result-types are translated:

C type                  Scheme type
void                    void
[unsigned] char         char
[unsigned] short        [unsigned-]short
[unsigned] int          [unsigned-]integer
[unsigned] long         [unsigned-]long
___u32                  unsigned-integer32
___s32                  integer32
___s64                  integer64
int64_t                 integer64
__int64                 integer64
uint64_t                unsigned-integer64
__uint64                unsigned-integer64
float                   float
double                  double
size_t                  unsigned-integer
bool                    bool
___bool                 bool
___fixnum               int
___number               number
___symbol               symbol
___scheme_value         scheme-object
char *                  c-string
TYPE *                  (c-pointer TYPE)
TYPE &                  (ref TYPE)
TYPE<T1, ...>           (template TYPE T1 ...)
TYPE1 (*)(TYPE2, ...)   (function TYPE1 (TYPE2 ...))
CLASS *                 (instance CLASS <CLASS>)
CLASS &                 (instance-ref CLASS <CLASS>)

The ___pointer argument marker disables automatic simplification of pointers to number-vectors and C-strings: normally arguments of type int * are handled as SRFI-4 s32vector number vectors. To force treatment as a pointer argument, precede the argument type with ___pointer. The same applies to strings: char * is by default translated to the foreign type c-string, but ___pointer char * is translated to (c-pointer char).

C notes

Foreign variable definitions for macros are not exported from the current compilation unit, but definitions for C variables and functions are.

bind does not embed the text into the generated C file; use bind* for that.

Functions with a variable number of arguments are not supported.

C++ notes

Each C++ class defines a coops class, which is a subclass of <c++-object>. Instances of this class contain a single slot named this, which holds a pointer to a heap-allocated C++ instance. The name of the coops class is obtained by putting the C++ class name between angle brackets (<...>). coops classes are not seen by C++ code.

The C++ constructor is invoked by the constructor generic, which accepts as many arguments as the constructor. If no constructor is defined, a default-constructor will be provided taking no arguments.

To release the storage allocated for a C++ instance, invoke the delete generic (the name can be changed by using the destructor-name option).

Static member functions are wrapped in a Scheme procedure named <class>::<member>.

Member variables and non-public member functions are ignored.

Virtual member functions are not seen by C++ code. Overriding a virtual member function with a coops method will not work when the member function is called by C++.

Operator functions and default arguments are not supported.

Exceptions must be explicitly handled by user code and may not be thrown beyond an invocation of C++ by Scheme code.

Generally, the following interface to the creation and destruction of wrapped C++ instances is provided:

constructor

[procedure] constructor CLASS INITARGS

A generic function that, when invoked, will construct a C++ object represented by the class CLASS, which should inherit from <c++-object>. INITARGS is a list of arguments that should be passed to the constructor and which must match the argument types for the wrapped constructor.

destructor

[procedure] destructor OBJECT

A generic function that, when invoked, will destroy the wrapped C++ object OBJECT.

new
[procedure] new CLASS ARG1 ...

A convenience procedure that invokes the constructor generic function for CLASS.

delete
[procedure] delete OBJECT

A convenience procedure that invokes the destructor generic method of OBJECT.

Authors

felix winkelmann and Kristian Lein-Mathisen

Repository

This egg is hosted on the CHICKEN Subversion repository:

https://anonymous@code.call-cc.org/svn/chicken-eggs/release/5/bind

If you want to check out the source code repository of this egg and you are not familiar with Subversion, see this page.

License

This code is placed into the public domain.

Version History

1.2.6
support for nested namespaces and typed and enums (contributed by rnlf)
1.2.5
add missing (but documented) support for chicken-bind commandline option (by rnlf)
1.2.4
bugfixes and enhancements by evhan
1.2
bugfixes and egg file updates to add missing dependencies
1.1
update for changes in CHICKEN core
1.0
initial release for CHICKEN 5, based on version 1.2 from CHICKEN 4