I've started the changes to the API with the Spider modifications. Since these changes are mainly additions to the Spider base class, they were a reasonable task to start with. A new from_crawler class method was introduced to hand the current context of execution (the Crawler internals) to each spider at initialization time. This was implemented as non-intrusively as possible, since maintaining backward compatibility in a component that every Scrapy project relies on is a major concern.
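Roughly, the hook looks like the following simplified sketch (not the exact code in my branch, but it shows the idea: build the spider as usual and then attach the running Crawler to it):

class Spider(object):

    def __init__(self, name=None, **kwargs):
        self.name = name
        self.__dict__.update(kwargs)

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Instantiate the spider as before, then give it access to the
        # crawler so it can reach the settings, signals and stats.
        spider = cls(*args, **kwargs)
        spider._set_crawler(crawler)
        return spider

    def _set_crawler(self, crawler):
        self.crawler = crawler
        self.settings = crawler.settings

Old-style instantiation (calling Spider(name) directly, without a crawler) keeps working, which is what makes the change non-intrusive.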
After that, I decided to focus on some of the changes in the Crawler class. I began adjusting the __init__ and crawl methods to the new parameters defined earlier in my proposal. Changing the signatures of public functions is a sensitive matter, since it can easily break backward compatibility, and Python doesn't support function overloading, so we have to find another way to accept different inputs in the same routine.
Taking this into account, and keeping in mind that its usage has to be as simple as possible to actually be helpful, I wrote the following decorator to allow different positional arguments in the same function:
import inspect
import warnings
from functools import wraps

import six

from scrapy.exceptions import ScrapyDeprecationWarning


def deprecated_signature(old_implementation, types=None,
                         warning_msg="Calling {fname} with the given "
                                     "arguments is no longer supported, please refer "
                                     "to the documentation for its new usage."):
    """Decorator factory that defines a deprecated implementation with
    different signature for a given function.

    An additional dictionary of argument names and their types can be
    provided to differentiate functions with the same argument count but
    different expected types for them.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                call_bindings = inspect.getcallargs(old_implementation,
                                                    *args, **kwargs)
                if types is not None:
                    _check_types(call_bindings, types)
            except TypeError:
                f = func
            else:
                warnings.warn(warning_msg.format(fname=func.__name__),
                              ScrapyDeprecationWarning, stacklevel=2)
                f = old_implementation
            return f(*args, **kwargs)

        wrapper.__name__ = func.__name__
        wrapper.__doc__ = func.__doc__
        return wrapper

    return decorator


def _check_types(bindings, types):
    for arg, value in six.iteritems(bindings):
        if arg in types and not isinstance(value, types[arg]):
            raise TypeError
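To give a sense of how it's meant to be used, here's a minimal made-up example (the names Shape and area are placeholders, not actual Scrapy code):

# Old behaviour: area(size) computed the area of a square.
def _old_area(self, size):
    return size * size


class Shape(object):

    # New behaviour: area(width, height). Calls that still match the old
    # one-argument signature are routed to _old_area and emit a warning.
    @deprecated_signature(_old_area)
    def area(self, width, height):
        return width * height


shape = Shape()
shape.area(3, 4)  # new signature, handled by the decorated method
shape.area(5)     # old signature, emits ScrapyDeprecationWarning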
An issue with this approach is that we need a reference to the old_implementation function object, which isn't reachable if we are decorating a method whose old implementation is another method of the same class, since those attributes only exist once the whole class has been defined. A more flexible and convenient implementation allows dynamically loading both old_implementation and the types of the arguments:
from importlib import import_module  # plus the imports from the previous version


def deprecated_signature(old_implementation, types=None, module='',
                         warning_msg="Calling {fname} with the given "
                                     "arguments is no longer supported, please refer "
                                     "to the documentation for its new usage."):
    """Decorator factory that defines a deprecated implementation with
    different signature for a given function.

    An additional dictionary of argument names and their types can be
    provided to differentiate functions with the same argument count but
    different expected types for them.

    If `old_implementation` is a string, that reference will be dynamically
    loaded from the `module` provided. If that last argument is an empty
    string (default) it will be expanded to the module of the decorated
    routine. If it's None, the module will be stripped out of the
    `old_implementation` string.

    Values from the `types` dictionary can also be dynamically loaded, but
    it will always be assumed that each string contains the full path for
    the object.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                old_call = _load_or_return_object(old_implementation,
                                                  module or func.__module__)
                call_bindings = inspect.getcallargs(old_call, *args, **kwargs)
                if types is not None:
                    _check_types(call_bindings, types)
            except TypeError:
                f = func
            else:
                warnings.warn(warning_msg.format(fname=func.__name__),
                              ScrapyDeprecationWarning, stacklevel=2)
                f = old_call
            return f(*args, **kwargs)

        wrapper.__name__ = func.__name__
        wrapper.__doc__ = func.__doc__
        return wrapper

    return decorator


def _check_types(bindings, types):
    types = {k: _load_or_return_object(v) for k, v in six.iteritems(types)}
    for arg, value in six.iteritems(bindings):
        if arg in types and not isinstance(value, types[arg]):
            raise TypeError


def _load_or_return_object(path_or_object, module=None):
    if isinstance(path_or_object, six.string_types):
        if module is None:
            module, path_or_object = path_or_object.rsplit('.', 1)
        return _load_object(path_or_object, module)
    return path_or_object


def _load_object(path, module):
    parent = import_module(module)
    for children in path.split('.'):
        parent = getattr(parent, children)
    return parent
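With this version the deprecated implementation can be another method of the same class, referenced by name and resolved lazily at call time. Another made-up sketch (UrlFetcher and its methods are placeholders):

class UrlFetcher(object):

    def _fetch_single(self, url):
        # Legacy behaviour: a single url string.
        return [url]

    # The old implementation is given as a string and only looked up when the
    # method is actually called, so it can live in the same class; the types
    # mapping (also a dotted path here) tells old and new calls apart.
    @deprecated_signature('UrlFetcher._fetch_single',
                          types={'url': 'six.string_types'})
    def fetch(self, urls):
        return list(urls)


fetcher = UrlFetcher()
fetcher.fetch(['http://a', 'http://b'])  # new signature
fetcher.fetch('http://a')                # old signature, emits the warning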
The next step is to merge CrawlerProcess into Crawler, but I judged it best to leave that for later, since it will probably break a lot of tests and I won't be able to verify subsequent changes until I've fixed them.
I then proceeded to work on the scrapy.cfg config file. I've deprecated two settings, SPIDER_MODULES and SPIDER_MANAGER_CLASS, in favor of options in this config file. Currently I'm working on populating those options from the settings file for backward compatibility, along with a simple check on required options and expansion of defaults.
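The fallback logic is roughly like the following sketch (the [spiders] section and its option names are placeholders rather than the final layout; get_config is Scrapy's reader for scrapy.cfg):

from scrapy.utils.conf import get_config


def spider_options(settings):
    """Read spider options from scrapy.cfg, falling back to the deprecated
    settings for backward compatibility."""
    cfg = get_config()
    options = {
        'spider_modules': settings.get('SPIDER_MODULES'),
        'spider_manager_class': settings.get('SPIDER_MANAGER_CLASS'),
    }
    if cfg.has_section('spiders'):
        options.update(cfg.items('spiders'))
    missing = [name for name, value in options.items() if not value]
    if missing:
        raise ValueError("Missing required options: %s" % ', '.join(missing))
    return options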
I haven't created a pull request yet since I'm constantly rewriting the commit history, but my progress can be tracked on my local repository, in the api-cleanup branch.